Journalists and researchers have been using data journalism skills for many years. Because there is a dispute over who coined the term and when it first appeared or spread, I preferred to find out for myself when the term “Data Journalism,” and the searches related to it as we know them today, actually came into use.
To avoid the controversy over who owns the term and how it is used, I turned to Google Trends, which shows when searches for specific words spread and over what time range. I ran a quick search on the site for the term “Data Journalism” and found that the largest number of searches for it happened in 2008, and that there is a steady increase in searches for the term. Another indication of how widely the term has spread is the Google search engine itself, which displays 161 million results for “Data Journalism” in English against 24 million results for the equivalent search in Arabic. This suggests that the term has spread virally in recent years, and its use may grow sharply.
Journalists have been dealing with data since the emergence of the press, but the growth in the size of data, and in the skills and experience needed to handle it, shaped the term “Data Journalism” as we know it now. Leaks such as the WikiLeaks documents, which contained a huge amount of data, and subsequent leaks like the “Panama Papers” revealed the need to develop tools and methods for mining data sets of enormous size, which are too difficult for the human brain to handle without modern information technology.
The huge number of questions about data journalism, and the growing number of searches for related tools and software, pushed me to look for a clear answer about the stages of the work, as a data science intern spending her first week at InfoTimes. I would like to share the InfoTimes internship experience with you through this note, which gathers the results of the intensive research I did and of extensive discussions with my colleagues at InfoTimes, in order to draw a map of the steps for producing a data-supported journalistic story.
Finding data
Finding data is the starting point for any journalistic story. There are two ways to find the data your story needs. In the first, you already have an idea for a story that needs to be supported by data, like a head in need of a body. In the second, you look into data you already have, searching for hidden stories that deserve to be told, which is like finding a diamond in a coal mine.
Whichever way you follow, you need a database. You may build it yourself to support your idea, but building a database is costly: it takes a long time to find, collect, and arrange the data needed to answer the main questions behind your idea. Instead of building a database, you can use data that has already been collected and prepared by others.
After picking a suitable scenario, you will go through a 5-step journey to reach an integrated, data-supported journalistic story that is ready for publication. Remember that the steps below are the same whichever way you chose to get the data. What distinguishes these steps is that they are sequential and their order can’t be changed; only the time needed to complete each step varies. That time depends on the kind of effort the step requires: whether it is done manually or with code, and whether that code already exists or has to be written for the first time by a specialized programmer.
The principal step in building any data-supported journalistic story is finding data; the more varied your sources, the better your chances of completing your story, or even of finding more than one story hidden in the data.
Verifying data
After getting the data, your first step is to review the facts it contains before you start processing and analyzing it. You can do this on two levels: first, verifying the source of the data; second, verifying its content. To verify the source, ask questions such as: What was the source of the data? How was it collected? When was it collected? What was the purpose of the collection? What was the method of collection? What were the motives behind its publication? Is there any relationship between the people connected to the data and its publication? If we establish that the source is a well-known person or a trusted entity whose job is collecting data, that it is neutral, and that it has explained its collection method and the time period covered, we can start processing the data.
- Apply source verification when the data is provided by another entity or organization; it is illogical to ask yourself such questions if you are the source of the data. The second level of verification is done by spot-checking a random sample of the information in the data and confirming its integrity: totals, figures that look doubtful because they are exaggerated or extreme, and figures that can be verified against multiple sources, such as census numbers, national income, and the budget deficit.
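As a rough illustration of the content check described above, the sketch below tests whether the parts of a table add up to the total its publisher reports, and flags figures that look extreme. The district names, the numbers, and the "more than five times the median" rule are all hypothetical assumptions chosen for the example; a flagged figure is only a prompt to check another source, not proof of an error.

```python
from statistics import median

# Hypothetical figures: population by district, plus the total the publisher reports.
district_population = {
    "District A": 120_000,
    "District B": 95_500,
    "District C": 1_430_000,
    "District D": 101_200,
}
reported_total = 1_746_700


def total_matches(parts, reported, tolerance=0):
    """Return True when the parts add up to the reported total (within tolerance)."""
    return abs(sum(parts.values()) - reported) <= tolerance


def doubtful_figures(parts, factor=5):
    """Return the names whose value is more than `factor` times the median value."""
    mid = median(parts.values())
    return [name for name, value in parts.items() if value > factor * mid]


print(total_matches(district_population, reported_total))
print(doubtful_figures(district_population))
```

The threshold is arbitrary on purpose: the point is to surface candidates for manual verification against a census or budget document, not to decide automatically which figures are wrong.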
Organizing data
At this phase, you decide where you are going with the data: you build your main idea and start to exclude side ideas that may disrupt the flow of your data, such as extra columns, or that lead you to a side destination that is not important for your story. Specify the main line of your story and exclude the side lines, which are an additional burden on you and, later, on the reader if the extra data ends up in the visual elements.
- At the organizing phase, always keep the original data file untouched and make all modifications on a copy, so that you can refer back to the original if required.
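One possible way to follow this advice with a CSV file is sketched below: the original is copied first, and only the copy is trimmed down to the columns the story needs. The function names, the `_working` suffix, and the column names in the usage note are assumptions made for the example, not part of any InfoTimes workflow.

```python
import csv
import shutil
from pathlib import Path


def make_working_copy(original, suffix="_working"):
    """Copy the original file next to itself and return the path of the copy."""
    original = Path(original)
    copy_path = original.with_name(original.stem + suffix + original.suffix)
    shutil.copy2(original, copy_path)
    return copy_path


def keep_columns(path, columns):
    """Rewrite the CSV at `path` so that it holds only the listed columns."""
    path = Path(path)
    with path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    with path.open("w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" silently drops every column not in `columns`.
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

For a hypothetical file `budget.csv`, calling `work = make_working_copy("budget.csv")` and then `keep_columns(work, ["region", "amount"])` leaves `budget.csv` exactly as it was and produces `budget_working.csv` with two columns.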
Cleaning data
At this phase, we move to a more advanced stage of data processing, in which we remove duplicated columns and columns that contain wrong values, and correct errors in addition, subtraction, and calculations in general. The goal is to avoid problems in the final file and to prepare it for analysis: correcting typing mistakes, unifying the spelling of similar words, trying to fill in missing values, and making sure there are no extra spaces before or after words. Such operations can be done with a trim formula, by splitting combined date and time values, by unifying the formats of dates, times, and numbers, by ordering the data with sort, and with some other steps.
- What does each record represent? (What does this dataset include?)
- Is it possible that some things might not be included?
- What’s the time frame of the data?
- Are there any fields you don’t understand?
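The cleaning operations described in this section (trimming spaces, unifying spellings, unifying date formats, removing duplicates, sorting) can be sketched in a few lines. The spelling table, the list of date layouts, and the `date` and `city` field names are hypothetical; a real file would need its own. Dates that match no known layout are left as `None` rather than guessed, so that missing values stay visible.

```python
from datetime import datetime

# Hypothetical spelling variants mapped to one agreed spelling.
SPELLING = {"cairo": "Cairo", "al qahira": "Cairo", "giza": "Giza"}
# Date layouts assumed to occur in the raw file.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def clean_text(value):
    """Trim both ends and collapse runs of inner whitespace to one space."""
    return " ".join(value.split())


def unify_city(value):
    """Return the agreed spelling of a city name, or the trimmed text if unknown."""
    text = clean_text(value)
    return SPELLING.get(text.lower(), text)


def unify_date(value):
    """Return the date as YYYY-MM-DD, or None if no known layout matches."""
    text = clean_text(value)
    for layout in DATE_FORMATS:
        try:
            return datetime.strptime(text, layout).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows):
    """Clean each row, drop exact duplicates, and sort by date, then city."""
    cleaned = {(unify_date(r["date"]), unify_city(r["city"])) for r in rows}
    # Rows with an unreadable date sort first so they are easy to review.
    ordered = sorted(cleaned, key=lambda pair: (pair[0] or "", pair[1]))
    return [{"date": d, "city": c} for d, c in ordered]
```

Two raw rows such as `{"date": "2017-02-03", "city": "  cairo "}` and `{"date": "03/02/2017", "city": "Al  Qahira"}` collapse into a single cleaned row, which is exactly the kind of hidden duplicate this phase is meant to catch.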
Analyzing data
This step requires some basic statistical skills to answer the questions you have in mind as a journalist or researcher investigating a story or article. Here we use simple calculations such as ratios, percentages, percentage change, the arithmetic mean, the maximum and minimum values, and other calculations that are useful in analysis. For example, if we are searching a database of kidnapped children, we can determine the most common method of kidnapping with a simple calculation, without having to review the hundreds or thousands of records that the database contains.
- Do the math!
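The calculations named in this section are short enough to write out. In the sketch below, a percentage is part / whole × 100 and a percentage change is (new − old) / old × 100. The category labels "A", "B", "C" and the yearly counts are invented placeholders, not real figures.

```python
from collections import Counter
from statistics import mean


def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return part / whole * 100


def percent_change(old, new):
    """Change from `old` to `new`, as a percentage of `old`."""
    return (new - old) / old * 100


# Hypothetical category recorded for each case (for example, a method or a place).
methods = ["A", "B", "A", "C", "A", "B"]
# Hypothetical number of cases per year.
yearly_counts = {2014: 40, 2015: 50, 2016: 45}

# Most common category and its share of all records.
top_method, top_count = Counter(methods).most_common(1)[0]
top_share = percentage(top_count, len(methods))

# Mean, maximum, and minimum of the yearly counts.
average = mean(yearly_counts.values())
peak_year = max(yearly_counts, key=yearly_counts.get)
lowest_year = min(yearly_counts, key=yearly_counts.get)

# Percentage change between the first two years.
change_2014_2015 = percent_change(yearly_counts[2014], yearly_counts[2015])

print(top_method, top_share, average, peak_year, lowest_year, change_2014_2015)
```

`Counter.most_common` answers the "most common method" question from the example above in one call, however many records the database holds.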
Visualization
This step is the outcome of the previous effort: converting the statistical analysis of the data into an interactive visual form that lets the reader understand the data easily and deeply. Here we always advise following the rule “show, don’t tell.” The step depends heavily on the size of the data and on how its parts relate to each other; consequently, different visualization forms are used to display different results. There are many patterns of visual display, such as static graphics or interactive ones like scrollytelling. The choice also depends on the nature of the medium that will be used to display the results.
The visual patterns used in newspapers and other print publications differ from those used on websites. They also differ with the nature of the data displayed: visualization techniques for geographic data differ from those for comparing price changes or the evolution of an issue over time.
Writing the story
We conclude our journey with the data in the final part, writing the story: the text that narrates the results and explains to readers the importance of what you found after analyzing and examining the data. This text helps readers understand the detailed graphics and numbers that your story was built on. At this phase, you must have basic journalistic skills in writing. Your text should contain adequate answers to the basic journalistic questions (who, what, when, where, and how), all without drowning the reader in details that don’t mean anything.
- Avoid too many numbers in the text. Characterize the findings.
- Give readers the detail in graphics.
- Do you have the right context? The “compared to what”?
Publishing it
Before publishing your story, review some essential points that affect how the public interacts with it, such as choosing the right time to publish. It is not a good idea to publish when your platform (your website) receives the least interaction from the public; with simple, readily available tools, you can find out the times at which your audience prefers to visit your platform. Also check how your story looks and behaves by opening your platform first on your mobile phone and then on your computer, to make sure the story is displayed as you want and that all interactive visual elements are in the right position and work properly.
- Edit the digital content for usability, clarity, style, and spelling/grammar, and make sure it complements the story.
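If your platform lets you export visit timestamps, one simple way to estimate a good publishing time is to count visits per hour of the day. The sketch below assumes timestamps in ISO format (`YYYY-MM-DD HH:MM:SS`) and uses invented sample values; the function name `busiest_hours` is hypothetical, and most analytics tools report the same information without any code.

```python
from collections import Counter
from datetime import datetime


def busiest_hours(timestamps, top=3):
    """Return the `top` hours of the day (0-23) with the most visits."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [hour for hour, _ in hours.most_common(top)]


# Hypothetical export of visit times from a website.
visits = [
    "2017-03-01 20:15:00",
    "2017-03-01 20:40:00",
    "2017-03-02 20:05:00",
    "2017-03-02 09:30:00",
    "2017-03-03 09:10:00",
    "2017-03-03 13:00:00",
]

print(busiest_hours(visits))
```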