When data become news. A content analysis of data journalism pieces.


Conference Paper

The Future of Journalism: Risks, Threats and Opportunities

September 10-11, 2015, Cardiff University, UK

When Data Become News: A content analysis of data journalism pieces

Wiebke Loosen a, Julius Reimer a & Fenja Schmidt b

a Hans-Bredow-Institute for Media Research, Hamburg, Germany

b Institute of Journalism and Communication Studies, University of Hamburg, Germany

Corresponding author: Wiebke Loosen, [email protected]

Abstract

For journalism the phenomena of ‘big data’ and an increasingly data-driven society are doubly

relevant: First, it is a topic worth covering so that the related developments and their

consequences are made understandable and debatable for the public. Second, the

‘computational turn’ has already begun to affect practices of news production and is giving rise

to novel ways to identify and tell stories. Thus, what we observe is the emergence of a new

journalistic sub-field mostly described as ‘computational/data journalism’. This study focuses on

the output of data journalism – with the aim of contributing to a better understanding of its

reporting styles. The method used is a classical ‘handmade’ standardised content analysis. The

sample consists of all the pieces that were nominated for the Data Journalism Award (DJA) – an

award issued annually by the Global Editors Network – in 2013 and 2014 (n = 120). Categories

of analysis look at, amongst other aspects, data sources and types, visualisation strategies,

interactive features, topics, and types of nominated media outlets. Results show that over 40

percent of the data-driven pieces were published on the websites of (daily or weekly)

newspapers; just over 20 percent came mainly from non-profit organisations for investigative

journalism like ProPublica. Almost half of the cases cover a political topic, and social and

scientific issues appear frequently too. Financial data and geodata are the types of data used

most often and most of the data relates to a national context. More than two-thirds of the

projects use data from official sources like Eurostat. Further analyses address differences

between 2013 and 2014 and look deeper into visualisation strategies and interactive features.

Introduction and Literature Review

The emergence of data journalism can be understood as journalism’s response to the

datafication of society and the “data deluge” (Lewis, 2015: 322) generated by it. Indeed, the field

of journalism is grappling with the big data phenomenon, creating novel ways to identify and tell

stories (Anderson, 2013; Coddington, 2015; Lewis & Usher, 2014). The intense discussions of

these developments within journalism itself (e.g., Fink & Anderson, 2015; Weinacht & Spiller,

2014) indicate their practical relevance (as well as their ability to stimulate journalistic self-

reflection). Meanwhile the trend is already transforming journalism education, e.g. with data

journalism courses being offered in journalism programs at several universities (Anderson,

2013; Weinacht & Spiller, 2014). The extensive attention paid to data journalism by practitioners

has also fuelled a “rapidly growing body” (Lewis, 2015: 322) of scientific studies on the topic –

“an explosion in data journalism-oriented scholarship” (Fink & Anderson, 2015: 476). These

mainly concentrate on two aspects:

Firstly, scholars have tried to clarify what data journalism is and how it is similar to and different

from investigative journalism, computer-assisted reporting, computational journalism, etc.:

These definitional approaches (e.g., Anderson, 2013; Appelgren & Nygren, 2014; Coddington,

2015; Fink & Anderson, 2015; Gray et al., 2012) highlight the following presumed characteristics

of data journalism:

● It builds on (usually large) sets of quantitative (digital) data as ‘raw material’ which is

subjected to some form of (statistical) analysis in order to find stories in it or tell stories

with it;

● the results “often need visualization” (Gray et al., 2012: n.p.), i.e. they are presented in

the form of maps, bar charts and other graphics;

● it is “characterised by its participatory openness” (Coddington, 2015: 337) and “so-called

crowdsourcing” (Appelgren & Nygren, 2014: 394) in that users help with collecting,

analysing or interpreting the data – going along with journalists relinquishing their

“interpretive authority” (Weinacht & Spiller, 2014: 414; own translation);

● and it follows an open data and open source approach by publishing the raw data a story

is built on.

However, scholars’ definitional approaches often contradict one another: While Anderson (2013:

1005), for instance, places data(-driven) journalism within the broader field of “computational

journalism”, Coddington (2015) explicitly distinguishes between these two concepts (as well as

computer-assisted reporting). Although he (2015: 333) acknowledges that computational and

data journalism “are not mutually exclusive” and that they “inevitably overlap”, he identifies

“significant differences between these forms of practice” and makes the point that “not all data

journalism is computational” (Coddington 2015: 336). In addition, computational journalism, for

Coddington (2015: 337–341), is characterised to a much lesser extent by professional expertise

as well as transparency. Consequently, Fink & Anderson (2015: 478) lament the “lack of a

shared definition of data journalism”, which (presumably not only) Coddington (2015: 332)

considers “fundamental” for building “a coherent body of scholarship”.

Secondly, the actors involved in the production of data journalism have been the focus of

many of these studies: Data journalists in Sweden (Appelgren & Nygren, 2014), Belgium (De

Maeyer et al., 2015), the United States (Fink & Anderson, 2015; Parasie, 2014; Parasie &

Dagiral, 2013), Norway (Karlsen & Stavelin, 2014) and Germany (Weinacht & Spiller, 2014)

were interviewed and observed with regard to their (journalistic) role conceptions and self-

understanding as well as their organisational implementation into newsrooms.

However, there are hitherto no systematically gathered insights regarding data journalism as “an

emerging form of storytelling” (Appelgren & Nygren, 2014: 394), i.e. the structural elements of

data-journalistic pieces which could be constitutive of a whole new “pattern of reporting”

(Schmidt & Weischenberg, 1994; own translation) that data journalism represents. The only

exception to this we came across is the study by Parasie & Dagiral (2013) who also analysed

data-journalistic pieces. Their data set, however, is very limited in spatio-temporal terms since it

comprises only pieces from one outlet, which were published before March 2011.

Analysing data-journalistic content on a broader scale not only closes this gap and adds

substantially to the body of research on this new phenomenon; it is also an important step

towards a more consensual definition, since it appears as though the focus on the actors in the

field alone is not sufficient to clarify the “diffuse term” (Fink & Anderson, 2015: 470) that

data(-driven) journalism continues to embody.

Research Objectives

Against the background of the research gaps identified above, this study focuses on the output

of data journalism. The attempts to define and systematise data journalism mentioned above

suggest that a content analysis of data-driven pieces must, at the very least, look at what are

often considered its core characteristics as discussed above:

a) its foundation on ‘data’ as the ‘raw material’ to tell stories on a certain topic;

b) its particular strategies to process and visualise this data; and

c) the use of interactive features that allow users to explore stories and the underlying data

sets according to their own interests.

Based on these characteristics, the content analysis (e.g., Krippendorff, 2013) principally needs

to collect data on which topics are covered by data journalism pieces, by what means data-

driven stories are told and presented, and on what kind of data (sources) they rely. Here the aim

of this study is twofold: First, we seek to advance current research by contributing to a better

understanding of data journalism as a distinctive reporting style. Second, we also understand

this study as a methodological attempt to develop variables in order to make data journalism’s

products analysable.

Therefore, we pursue the following research objectives:

● to identify the forms of presentation and the structural elements in data-driven pieces

which are regularly stated in different definitions of data journalism with respect to

o types of data and data processing,

o visualisation elements and

o interactive features;

● to describe the topics that are covered in the data-driven projects;

● to assess what kind of media organisations are particularly active in the field of data

journalism.

Methodology

Selection of Material for Analysis

The literature review has shown that data journalism is still a “diffuse term” (Fink & Anderson,

2015: 470), which makes it difficult to identify suitable pieces for a content

analysis of data journalism products without first settling on a definition. Under these circumstances we have chosen a pragmatic

as well as an inductive approach for this first systematic content analysis of data journalism

products. This will help us avoid the possibility of starting with either too narrow or too broad a

definition of what counts as data journalism. That is why our material for analysis relies on a

definition from within data journalism itself: Our database consists of pieces that were

nominated for the Data Journalism Award (DJA) – an award issued annually by the Global

Editors Network1 – in 2013 and 2014. This guarantees that we will analyse projects that count

as data journalism within the field itself and have been chosen to represent significant or

innovative forms of data journalism. Similar approaches to sampling have already proven useful

for analysing particular reporting styles and aspects of storytelling: For instance, Wahl-

1 Cf. http://www.globaleditorsnetwork.org/about-us/ (accessed March 13, 2015).

Jorgensen (2013a, 2013b) as well as Lanosga (2014) turned to nominees and winners of the

Pulitzer Prize for researching emotionality in the news and investigative reporting.

Table 1 gives an overview of the sample; if a nomination referred to a media outlet as a whole

and not to a specific project the case was excluded from the analysis as our unit of analysis is a

single data-driven piece.

Tab. 1: Dataset overview

Year    Submissions   Nominated projects   Projects suited for the analysis   Award-winning projects (% of analysed projects)
2013    >300*         72                   56                                 6 (10.7)
2014    520           75                   64                                 9 (14.1)
Total   >820          147                  120                                15 (12.5)

* The GEN does not specify the number of submissions for 2013, but only states that “more than 300 entries” had been submitted (http://www.globaleditorsnetwork.org/programmes/dja; accessed February 17, 2014).

With respect to the research objectives formulated above this sample also allows us to identify

differences between a) the years 2013 and 2014 as well as b) those data journalism pieces that

were only nominated and those actually awarded.

Codebook

Most variables in the codebook were developed inductively and based on an explorative

analysis of a subsample from 2013. Some categories were inspired by Parasie & Dagiral’s

(2013) study and others were suggested by fellow researcher Julian Ausserhofer and data

journalist Lorenz Matzat. A pretest was conducted with two coders and a subsample of 10

percent of cases. All variables reached an intercoder reliability coefficient (Holsti and

Krippendorff’s Alpha) equivalent to or higher than 0.7 which is generally considered sufficient for

exploratory research (Lombard et al., 2002).
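
To illustrate how the two reliability coefficients named above can be obtained for a single nominal variable coded by two coders, the following Python sketch computes Holsti's coefficient and Krippendorff's alpha. The codes in `coder_a` and `coder_b` are hypothetical and are not taken from the pretest; the function names are likewise illustrative, and the alpha computation assumes exactly two coders, nominal categories and no missing values.

```python
from collections import Counter


def holsti(coder_a, coder_b):
    """Holsti's coefficient: 2 * agreements / (decisions of coder A + decisions of coder B)."""
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 2 * agreements / (len(coder_a) + len(coder_b))


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    n_units = len(coder_a)
    n_values = 2 * n_units  # every unit contributes two coded values
    # Observed disagreement: share of units on which the coders differ.
    d_o = sum(a != b for a, b in zip(coder_a, coder_b)) / n_units
    # Expected disagreement: chance that two values drawn without replacement
    # from the pooled codes of both coders differ.
    pooled = Counter(coder_a) + Counter(coder_b)
    d_e = 1 - sum(c * (c - 1) for c in pooled.values()) / (n_values * (n_values - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when only one category was ever used")
    return 1 - d_o / d_e


# Hypothetical topic codes for 12 pieces (P = politics, S = society,
# H = health and science, B = business); not taken from the study.
coder_a = ["P", "P", "S", "H", "P", "B", "S", "P", "H", "S", "P", "B"]
coder_b = ["P", "P", "S", "H", "P", "B", "S", "S", "H", "S", "P", "P"]

print(f"Holsti: {holsti(coder_a, coder_b):.3f}")
print(f"Krippendorff's alpha: {krippendorff_alpha_nominal(coder_a, coder_b):.3f}")
```

With these made-up codes (10 agreements in 12 units) Holsti's coefficient is about 0.83 and alpha about 0.77; alpha is lower because it corrects for the agreement expected by chance.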

The final codebook contains 29 categories in four dimensions (see table 2). With the

presentation of the results we will provide deeper insights on a selection of the analysed

categories as it goes beyond the scope of this paper to present all our findings here.2

Tab. 2: Dimensions and variables of the codebook3

Dimension: formal characteristics
Variables: medium; type of medium; winner of DJA; topic; reference to a specific event*; headline; question(s) posed to data; referring article(s)**; length of article; language; number of people involved mentioned by name; external partners

Dimension: data set
Variables: additional information on data*; data source(s); type(s) of data source(s); access to data; kind of data; geographical reference; changeability of dataset***; time period covered; unit of analysis

Dimension: analysis and journalistic editing of content
Variables: personalised case example****; call for public intervention or criticism**; purpose of data analysis*****; visualisation

Dimension: context of use
Variables: interactive functions; online access to the database**; opportunities of communication

* Suggested by data journalist Lorenz Matzat
** Adopted from Parasie & Dagiral (2013: 5–14)
*** Suggested by (data) journalism researcher Julian Ausserhofer
**** Inspired by Holtermann 2011
***** Inspired by Gray et al. (2012: n.p.)

2 For more detailed results as well as information on the study in general see Loosen et al. (forthcoming).

3 The authors will provide the complete codebook on request.

Results

To ensure a straightforward entry into the data, we approach the presentation of the results by

organising the research questions in reverse order. We start with the identified media

organisations among the nominees and the staff involved and give insights into the topics

covered by the data-driven pieces in our sample as well as into several of their formal elements.

Against this background we will then present the results gathered with respect to our ‘key

variables’, those dealing with types of data and data processing, visualisation elements and

interactive features.

Where we found differences of statistical or substantive significance between particular groups

of stories (between those from 2013 and those from 2014, between DJA-awarded projects and

those only nominated, between pieces on specific topics, etc.), this is indicated in the text.

Types of organisations among the nominees and number of identifiable

actors involved

Data journalism is the domain of newspapers – at least according to our particular sample. Over

the two years they represent by far the biggest group among the nominees (see table 3).

Table 3: Type of medium

                                                    2013          2014          Awarded       Total
                                                                                (2013 + 2014)
                                                    (n = 56)      (n = 64)      (n = 15)      (n = 120)
                                                    Freq      %   Freq      %   Freq      %   Freq      %
Website of print newspaper                            23   41.1     28   43.8      8   53.3     51   42.5
Website of investigative journalistic organisation     8   14.3     16   25.0      4   26.7     24   20.0
Website of print magazine                              4    7.1     11   17.2      -      -     15   12.5
Genuine online medium                                  5    8.9      2    3.1      1    6.7      7    5.8
Website of public broadcasting company                 5    8.9      2    3.1      2   13.3      7    5.8
Website of university medium                           3    5.4      2    3.1      -      -      5    4.2
Website of non-journalistic organisation               2    3.6      2    3.1      -      -      4    3.3
Website of private broadcasting company                2    3.6      1    1.6      -      -      3    2.5
Website of news agency                                 3    5.4      -      -      -      -      3    2.5
Other*                                                 1    1.8      -      -      -      -      1    0.8

* In this case: private website of freelance journalist Gregor Aisch

The print organisations nominated most often include The New York Times, the US magazine

Mother Jones, the Argentinian newspaper La Nación and The Guardian. Another important

group are investigative journalistic organisations such as ProPublica – contributing the most

projects in total (12 cases) – and The Center for Public Integrity. ProPublica and The Guardian

are the only organisations which are represented with more than one project in both years.

Our sample includes projects from twenty different countries; half of these countries, however, are

represented with a single project. The top three countries are the United States (47.5%), Great

Britain (12.5%) and Germany (8.3%). It is not surprising, then, that more than two thirds of the

nominated pieces are in English (67.5%). This might be partly explained by the fact that data

journalism has a longer history in English-speaking countries. The next most frequently

occurring projects are bi- or multilingual (14.2%). In most of these cases, the project exists in

two versions: in English and in the medium’s native language (most common: Spanish and

German with 4 cases each). The predominance of the English language proved constant over

the two years.

Our results also illustrate that data journalism, more often than not, is a collaborative effort. In

cases where the project contained credits (n = 100), almost five individuals are named as

authors or contributors on average. More than a third of all projects have been realised in

association with external partners either contributing to the analysis or designing the

visualisations of the data being reported on.

The average number of people involved in the production of data-driven projects increased from

about four in 2013 to nearly six people in 2014 (M = 4.13, SD = 3.84 vs. M = 5.55, SD = 3.97).

This difference is only significant at the 10% level (t = 1.812, dF = 98, p < .10) but could be

interpreted as an indication that the production of data journalism is increasingly personnel

intensive – at least as far as our sample is concerned. The difference between projects only

nominated (M = 4.66, SD = 3.92, n = 86) and those awarded (M = 6.21, SD = 4.04, n = 14) is

even larger but is not statistically significant (however, this could be due to the different sample

sizes).
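
The comparison of nominated and awarded projects can be retraced from the reported summary statistics alone. The sketch below assumes a pooled-variance Student's t-test, which is what the reported degrees of freedom (dF = 98 = 100 − 2) suggest but which is not stated explicitly; the function name `pooled_t` is illustrative. The 2013/2014 comparison is not recomputed because the number of credited projects per year is not reported.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, computed from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / standard_error, df


# Values as reported above: nominated only (n = 86) vs. awarded (n = 14).
t, df = pooled_t(4.66, 3.92, 86, 6.21, 4.04, 14)
print(f"t = {t:.3f}, df = {df}")
```

Under these assumptions t is roughly 1.37, which lies below the two-sided 10% critical value of about 1.66 for 98 degrees of freedom and is therefore in line with the difference being reported as not significant.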

Topics covered in data journalism pieces and formal elements

The data journalism in our sample is dominated by politics. Almost half of the pieces analysed

(48.3%) cover a political topic or combine political aspects with other topics. Typical subjects are

election trends or results: “Exit Polls 2012: How the Vote has Shifted”4, “Bundestagswahl 2013

in Berlin – Alle Stimmen der 1709 Wahllokale (The 2013 General Election in Berlin – Every Vote

from 1,709 Polling Stations)”, and “Municipales 2014 (Local Elections 2014)” are prime

examples of this subgroup. Political issues are often combined with financial ones, as seen in:

“Ethics Explorer – A Guide to the Financial Interests of Elected Officials”, “Gastos en el Senado

4 A list of (and links to) all projects nominated for a DJA in 2013 and 2014 is available on: http://community.globaleditorsnetwork.org/projects_by_global_event/744 (accessed August 31, 2015).

2004-2013 (Senate Expenses 2004-2013)” and “Il prezzo della politica italiana: 5 miliardi di euro

in 20 anni (The Price of Italy's Politics: 5bn Euros in 20 Years)”.

Societal issues such as census results and crime reports are also a preferred topic for data

journalism, accounting for one third of cases (33.3%). So are health and science (21.7%; e.g.

“Hooked – Canada’s Pill Problem”, “Life on the Line: 911 Breakdowns at LAFD”, “Innovative

Energy Projects in Developing Countries”) as well as business and economy (20.0%). This

illustrates that data journalism is mainly concerned with those domains where data are

(becoming) routinely available.

The subjects with the least coverage in our sample were education (7.5%), sports (2.5%;

actually a topic that is predestined for data journalism due to its traditional ‘data-centricity’) and

culture (1.7%; examples for this category are: “Le marché de l'art pour les nuls” (The Art Market

for Dummies) and “Front Row to Fashion Week”).

More than half of data-driven stories are topic-oriented (53.3%), i.e. they are not driven by a

particular recent event. Political pieces are more likely to refer to a specific event (58.6%) like an

election, while those dealing with society (37.5%), economy (33.3%) or health and science

(26.9%) are much less likely to do so.

Data journalism does not rely only on data as its genuine ‘fuel’ but often provides additional

context and interpretation: Almost half of the analysed pieces in our sample contain one

accompanying text contribution (48.3%); more than a fifth (22.5%) even come with a whole

dossier of multiple articles. However, we also found that 17.5 percent of pieces had no

additional articles.

One way to counter the abstractness of quantitative data is to complement it with a personalised

case example – also a regular technique for non-data-driven journalism: For instance, a health-

related article will start with the story of one patient. This storytelling technique could be found in

40.8 percent of the pieces analysed, while the rates were considerably lower for economic and

education topics (20.8% and 22.2%).

Types of data and data processing

By far the largest share of pieces in our sample is based on data from official institutions like Eurostat, Land

Statistical Offices, and ministries for education or defense (see table 4). The second largest

group is data journalism that uses data from ‘other, non-commercial organisations’ which include

universities, research institutes and NGOs. Roughly 20 percent rely on their own data sources

which means that the respective media organisation collected the data itself, e.g. through a

survey or by searching its own archives.

Table 4: Type of data source (multiple coding possible)

                                    2013          2014          Awarded       Total
                                                                (2013 + 2014)
                                    (n = 56)      (n = 64)      (n = 15)      (n = 120)
                                    Freq      %   Freq      %   Freq      %   Freq      %
Official institution                  37   66.1     44   68.8     10   66.7     81   67.5
Other, non-commercial organisation    19   33.9     34   53.1      6   40.0     53   44.2
Own source                            13   23.2      9   14.1      3   20.0     22   18.3
Private company                        8   14.3     12   18.8      3   20.0     20   16.7
Source not indicated                   3    5.4      5    7.8      -      -      8    6.7

It is considered a quality criterion in data journalism that data sources are indicated; yet,

approximately seven percent of the cases did not indicate where they got the data from.

However, this is not the case for any of the awarded pieces.
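
Because several source types can be coded for one piece, the percentages in such a table are shares of pieces and need not add up to 100. A minimal sketch of this tabulation, using five hypothetical pieces and shortened category labels (neither is taken from the sample; `share_table` is an illustrative name):

```python
from collections import Counter


def share_table(coded_pieces):
    """Frequency and percentage of pieces per category; a piece may carry several codes."""
    n = len(coded_pieces)
    # Count each category at most once per piece.
    counts = Counter(code for codes in coded_pieces for code in set(codes))
    return {code: (freq, round(100 * freq / n, 1)) for code, freq in counts.most_common()}


# Hypothetical coding of five pieces (not taken from the sample).
pieces = [
    {"official institution", "non-commercial organisation"},
    {"official institution"},
    {"own source"},
    {"official institution", "private company"},
    {"non-commercial organisation"},
]

for code, (freq, pct) in share_table(pieces).items():
    print(f"{code}: {freq} ({pct}%)")
```

In this toy example the percentages sum to 140, since two of the five pieces carry two codes each.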

More than half of the stories in which a source is indicated (n = 112) analyse data from only one

kind of source (53.6%). Those pieces building on two (39.3%) or more (7.2%) different kinds of

sources most frequently combined official data with data from either non-commercial

organisations (34 cases), their own organisation (10 cases) or private companies (9 cases).

Political topics are more likely to be covered with the help of data from an official institution

(79.3%). Stories from 2014 are built on data from other, non-commercial organisations

significantly more often (53.1%) than those from 2013 (33.9%; χ² = 4.463, dF = 1, p < .05).

Early data journalism focused on data from official institutions and other well-known sources

and only now is it beginning to discover data sources beyond these.

Sixty percent of the projects used data collected on a national level. However, data journalism is

also adaptable to ‘smaller’ scales: Almost a quarter of cases are based on data from a regional

context (24.2%) and 18.3 percent analyse information gathered on a local level (multiple coding

was possible). Stories awarded with a DJA are even more likely to refer to data on a national

level (80.0%) than those that were only nominated (57.1%). However, this difference is only

significant at the 10% level (χ² = 2.857, dF = 1, p < .10). Stories from 2014 were significantly

less likely to draw on regional data (9.4%) than those from 2013 (41.1%; χ² = 16.373, dF = 1, p

< .01). One explanation for this pattern could be the fact that the large newspapers particularly

active in the field of data journalism increasingly try to reach spatially extended audiences for

which data on a national level are assumed to have a higher news value than those on a

regional scale.
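
The χ² values reported in this section can be retraced with a Pearson chi-square test for a 2×2 table without continuity correction; that this is the variant used here is an assumption, although it reproduces the reported statistics. In the sketch below the counts for non-commercial sources come from table 4, whereas the counts for national and regional data are derived from the reported percentages (e.g. 80.0% of 15 awarded pieces = 12). The function name `chi2_2x2` is illustrative.

```python
import math


def chi2_2x2(hits_a, n_a, hits_b, n_b):
    """Pearson chi-square (no continuity correction) for two groups and one yes/no feature."""
    a, b = hits_a, n_a - hits_a  # group A: with / without the feature
    c, d = hits_b, n_b - hits_b  # group B: with / without the feature
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


comparisons = {
    "non-commercial sources, 2014 vs 2013": (34, 64, 19, 56),  # counts from table 4
    "national data, awarded vs nominated": (12, 15, 60, 105),  # derived from 80.0% and 57.1%
    "regional data, 2014 vs 2013": (6, 64, 23, 56),  # derived from 9.4% and 41.1%
}

for label, counts in comparisons.items():
    chi2, p = chi2_2x2(*counts)
    print(f"{label}: chi2 = {chi2:.3f}, p = {p:.4f}")
```

Run as written, this gives χ² values of 4.463, 2.857 and 16.373. The second value corresponds to a p of about .09, i.e. significance at the 10% level only.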

Many projects (60.0%) use data referring to a simple unit of analysis (e.g., a person, a flight, a

vote); complex units like a nation or a company are covered in almost half of all cases (46.7%).

Only 10.8 percent of cases deal with an aggregated unit of analysis like a household or a class

of schoolchildren.

The largest group of the analysed pieces (also) relies on data that is publicly available. This is due to the fact

that most data originates from official institutions. However, in two-fifths of the pieces, journalists

did not indicate how they accessed the data (see table 5). In 18.3 percent of cases, the data

had to be requested from the source since it was not publicly available beforehand. Freedom of

Information requests also belong in this category and were sometimes explicitly mentioned in

the additional information about the data. Only very few cases are based on data scraped or

collected otherwise by the journalists themselves (e.g., “Gastos en el Senado 2004-2013

(Senate Expenses 2004-2013)”). Moreover, despite the public attention devoted to it, work using

leaked data is rare.

Table 5: Access to data (multiple coding possible)

                              2013          2014          Awarded       Total
                                                          (2013 + 2014)
                              (n = 56)      (n = 64)      (n = 15)      (n = 120)
                              Freq      %   Freq      %   Freq      %   Freq      %
Publicly available data         22   39.3     28   43.8      7   46.7     50   41.7
Access to data not indicated    20   35.7     28   43.8      3   20.0     48   40.0
Requested data                  12   21.4     10   15.6      4   26.7     22   18.3
Scraped data                     3    5.4      5    7.8      1    6.7      8    6.7
Own data collection              5    8.9      1    1.6      1    6.7      6    5.0
Leaked data                      1    1.8      3    4.7      -      -      4    3.3

The data journalism we analysed relied to a large extent on financial data (45.4%), geodata

(42.9%) and measured values which are compiled by sensors or measuring tools (39.5%; e.g.,

aircraft noise, weather data, train speeds) (see table 6). While this last category gained

prominence over the years, award-winning projects rely on this kind of data to a below average

extent. Other types of data used more frequently in 2014 are: personal data – i.e. information

which can be attributed to individual persons – and metadata – i.e. ‘data about data’, for

instance about individual instances of application use and data content. In contrast, the use of

sociodemographic data – i.e. data that is only available in the form of mean values for larger

groups – has decreased. However, none of these differences is statistically significant. The kind

of data used least frequently originates from polls or surveys.

Table 6: Kind of data (multiple coding possible)

                            2013          2014          Awarded       Total
                                                        (2013 + 2014)
                            (n = 55)      (n = 64)      (n = 15)      (n = 119)
                            Freq      %   Freq      %   Freq      %   Freq      %
Financial data                25   45.5     29   45.3      8   53.3     54   45.4
Geo data                      26   47.3     25   39.1      6   40.0     51   42.9
Measured values               19   34.5     28   43.8      4   26.7     47   39.5
Sociodemographic data         21   38.2     16   25.0      4   26.7     37   31.1
Personal data                 12   21.8     21   32.8      5   33.3     33   27.7
Metadata                       7   12.7     13   20.3     1*    6.7     20   16.8
Poll ratings / survey data     8   14.5      7   10.9      1    6.7     15   12.6
Other data                     -      -      -      -      1    6.7    2**    1.7

* “Homes for the Taking”
** Legislative texts in “Gay Rights by State”, experts’ reports in “Who's Pulling the Strings of D.C. Puppet Corporations?”

Only about a quarter of the pieces nominated for a DJA were based on a single type of data

(26.6%); most stories referred to two (40.0%) or three (25.0%) different kinds. Most frequently,

geodata was combined with either measured values (25 cases; e.g. radiation levels in becquerel

or noise exposure in decibel) or with sociodemographic or financial information (20 cases each);

furthermore, sociodemographic statistics appeared together with either financial data (17 cases)

or measured values (15 cases).
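
Counts of such combinations follow directly from the multiple coding: every pair of data kinds coded for the same piece is tallied once. A minimal sketch with four hypothetical pieces (the codes are made up for illustration and do not reproduce the sample; `pair_counts` is an illustrative name):

```python
from collections import Counter
from itertools import combinations


def pair_counts(coded_pieces):
    """Count how often two kinds of data were coded together for the same piece."""
    counts = Counter()
    for kinds in coded_pieces:
        # Sorting makes ("geodata", "measured values") and its reverse the same key.
        for pair in combinations(sorted(set(kinds)), 2):
            counts[pair] += 1
    return counts


# Hypothetical coding of four pieces (not taken from the sample).
pieces = [
    ["geodata", "measured values"],
    ["geodata", "measured values", "financial data"],
    ["financial data", "sociodemographic data"],
    ["geodata"],
]

for pair, freq in pair_counts(pieces).most_common():
    print(f"{pair[0]} + {pair[1]}: {freq}")
```

A piece with three kinds of data contributes to three pairs, so the pair counts can exceed the number of pieces that combine data kinds.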

Almost all of our cases use ‘static’ data that does not change (93.3%); only seven pieces are

built on data that are updated regularly (e.g., “Bloomberg Billionaires: Today’s Ranking of the

World’s Richest People”), and only one project worked with real-time data (“Tweetómetro”, which

used data directly from the Twitter API).

Table 7 gives an overview of what is actually shown with the data:

Table 7: Purpose of data analysis (multiple coding possible)

                            2013          2014          Awarded       Total
                                                        (2013 + 2014)
                            (n = 56)      (n = 64)      (n = 15)      (n = 120)
                            Freq      %   Freq      %   Freq      %   Freq      %
Compare values                46   82.1     56   87.5     15  100.0    102   85.0
Show connections and flows    18   32.1     23   35.9      4   26.7     41   34.2
Show changes over time        26   46.4     30   46.9      8   53.3     56   46.7
Show hierarchy                 8   14.3      6    9.4      1    6.7     14   11.7

In the vast majority of cases, the data is analysed with a focus on comparing values (e.g., to

show differences between men and women or neighbourhoods) and almost half of the pieces

show changes over time (e.g. “Climate Change: How Hot Will It Get in My Lifetime?”).

Connections and flows are illustrated in more than a third of all projects. Less frequent are

pieces that use data to show hierarchies – as in “Women as Academic Authors” which ranks the

most important female scientists. In stories that deal with, among others, a political topic, data is

used significantly more often to show connections and flows than in the average data-

driven project (46.6% vs. 34.2%): For instance, the project “Rede de Escândalos (Network of

Scandals)” shows connections between Brazilian politicians and their involvement in different

political scandals, and “Consider the Source” follows cash flows from corporations to non-profit

organisations.

Visualisation and other structural elements

If we think of data journalism as a distinct style of reporting it is crucial to learn about the

particular ways it tells stories. The following results refer to these aspects and deal with the

kinds of visualisations and interactive features that are applied to data-driven pieces.

Table 8 shows that there seems to be a more or less stable set of visualisation elements, which mainly includes pictures (60.0%), simple static charts (54.2%), and maps (49.2%); over a quarter of the projects (also) work with tables (26.7%). Animated visualisations are rarer (15.8%). This partly echoes the statements of the data journalists interviewed by Appelgren & Nygren (2014: 403), who “described maps as the standard visualizing method”.

The share of animations as well as that of pictures is relatively high among the award-winning projects as well as among pieces from 2014. However, these differences are statistically significant only for pictures, and for the award comparison only at the 10% level (awarded vs. only nominated: χ² = 2.857, df = 1, p < .10; 2014 vs. 2013: χ² = 8.058, df = 1, p < .01).

Moreover, comparing pieces (also) covering societal topics with those (also) focusing on politics, we find that the former more often contain simple static charts (65.0% vs. 54.2%) and maps (65.0% vs. 49.2%), while offering a search function less frequently (15.0% vs. 26.7%). Political stories, in turn, are even less likely to contain an animation (6.9%) than the average piece (15.8%).

Table 8: Visualisation (multiple coding possible)

| Visualisation | 2013 (n = 56): Freq | % | 2014 (n = 64): Freq | % | Awarded 2013 + 2014 (n = 15): Freq | % | Total (n = 120): Freq | % |
|---|---|---|---|---|---|---|---|---|
| Pictures | 26 | 46.4 | 46 | 71.9 | 12 | 80.0 | 72 | 60.0 |
| Simple static chart(s) | 31 | 55.4 | 34 | 53.1 | 7 | 46.7 | 65 | 54.2 |
| Map | 29 | 51.8 | 30 | 46.9 | 7 | 46.7 | 59 | 49.2 |
| Table | 14 | 25.0 | 18 | 28.1 | 4 | 26.7 | 32 | 26.7 |
| Combined static diagram(s) | 11 | 19.6 | 11 | 17.2 | 3 | 20.0 | 22 | 18.3 |
| Animated visualisation | 6 | 10.7 | 13 | 20.3 | 4 | 26.7 | 19 | 15.8 |
| No visualisation | - | - | - | - | - | - | - | - |
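
The chi-square values reported for pictures can be re-derived from the frequencies in Table 8. The following is a minimal sketch, assuming an uncorrected Pearson chi-square test on a 2×2 table (the paper does not name the test variant). The function name `pearson_chi2_2x2` is our own, and the figure of 60 pieces with pictures among the 105 non-awarded pieces is an assumption obtained by subtracting the awarded column from the total.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Pieces with / without pictures, taken from Table 8.
# 2013: 26 of 56 pieces; 2014: 46 of 64 pieces.
chi2_year = pearson_chi2_2x2(26, 56 - 26, 46, 64 - 46)

# Awarded: 12 of 15 pieces.
# Non-awarded (assumed to be total minus awarded): 60 of 105 pieces.
with_pictures_rest = 72 - 12
without_pictures_rest = (120 - 15) - with_pictures_rest
chi2_award = pearson_chi2_2x2(12, 15 - 12, with_pictures_rest, without_pictures_rest)

print(f"2014 vs. 2013:           chi2 = {chi2_year:.3f}")
print(f"awarded vs. non-awarded: chi2 = {chi2_award:.3f}")
```

Under these assumptions the sketch reproduces the reported values of 8.058 and 2.857. With one degree of freedom, the usual critical values are 2.706 (p = .10), 3.841 (p = .05) and 6.635 (p = .01), which is consistent with the reported levels of p < .01 for the year comparison and p < .10 for the award comparison.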


On average, the pieces contained more than two different⁵ visualisations (M = 2.24, SD = 1.05). The numbers are only slightly (and not statistically significantly) higher for awarded projects than for those only nominated, and for stories from 2014 than for pieces from 2013. Typical combinations of visualising elements include simple static charts with pictures (38 cases) or with a map (33 cases).
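
Because each visualisation type was coded at most once per piece, the reported mean can be checked against the "Total" column of Table 8: the sum of all codings divided by the number of pieces gives the average number of different visualisation types. The sketch below is only an illustrative check; the variable names are our own.

```python
# Total frequencies from Table 8 (n = 120 pieces, multiple coding possible).
visualisation_counts = {
    "Pictures": 72,
    "Simple static chart(s)": 65,
    "Map": 59,
    "Table": 32,
    "Combined static diagram(s)": 22,
    "Animated visualisation": 19,
}
n_pieces = 120

# Each type counts once per piece, so the sum of frequencies equals the
# total number of (piece, visualisation type) codings in the sample.
total_codings = sum(visualisation_counts.values())
mean_types_per_piece = total_codings / n_pieces

print(f"{total_codings} codings / {n_pieces} pieces = {mean_types_per_piece:.2f}")
```

The result agrees with the reported M = 2.24. The standard deviation cannot be recovered this way, since it depends on how the codings are distributed across individual pieces, which the table does not show.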

Interactive features

Interactive elements are often discussed as a ‘key characteristic’ of data journalism (e.g., Coddington, 2015; Gray et al., 2012; Weinacht & Spiller, 2014). However, we found that 18.3% of cases have no interactive functions at all; political stories are even more likely to come without them (24.1%). Yet this is true for only one of the award-winning projects (“Reshaping New York”; see Table 9), leading us to speculate that the use of interactive functions is considered a quality criterion for the DJA.

Table 9: Interactive functions (multiple coding possible)

| Interactive function | 2013 (n = 56): Freq | % | 2014 (n = 64): Freq | % | Awarded 2013 + 2014 (n = 15): Freq | % | Total (n = 120): Freq | % |
|---|---|---|---|---|---|---|---|---|
| Zoom / details on demand | 32 | 57.1 | 35 | 54.7 | 10 | 66.7 | 67 | 55.8 |
| Filtering | 30 | 53.6 | 32 | 50.0 | 7 | 46.7 | 62 | 51.7 |
| Search | 17 | 30.4 | 15 | 23.4 | 1 | 6.7 | 32 | 26.7 |
| No interactive functions | 7 | 12.5 | 15 | 23.4 | 1 | 6.7 | 22 | 18.3 |
| Personalisation | 13 | 23.2 | 9 | 14.1 | 4 | 26.7 | 22 | 18.3 |
| Playful interaction | 2 | 3.6 | 1 | 1.6 | - | - | 3 | 2.5 |

⁵ We did not take into account whether elements of the same kind were included more than once: several pictures, for instance, were counted as one visualisation.

The interactive features most often integrated are zoom functions for maps, details on demand (e.g., the number of victims for each case of a reported school shooting), and filtering functions, which allow the user to filter the provided data with respect to different variables (e.g., to select only voting results from one state or one year). Personalisation tools, where the user has to enter personal data such as their ZIP code or age to tailor the piece with customised data, are less common (18.3% of cases). Only three projects include an opportunity for playful interaction (e.g., “Heart Saver”, a game in which the user has to send ambulances as fast as possible to fictional characters having a heart attack).

Despite the large number of projects that offer no interactive option at all, the average piece contains 1.55 different⁶ features (SD = 1.10), with seven stories offering the maximum of four interactive elements.
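
The effect of the non-interactive pieces on this average can be illustrated with the "Total" column of Table 9. The sketch below first reproduces the overall mean and then, as an additional figure that is derived here and not reported in the paper, computes the mean for only those pieces that offer at least one interactive function. The variable names are our own.

```python
# Total frequencies from Table 9 (n = 120 pieces, multiple coding possible).
# The "No interactive functions" row is kept separate: it marks pieces
# without any feature and is not itself a feature.
feature_counts = {
    "Zoom / details on demand": 67,
    "Filtering": 62,
    "Search": 32,
    "Personalisation": 22,
    "Playful interaction": 3,
}
n_pieces = 120
n_without_interaction = 22

total_codings = sum(feature_counts.values())

# Mean over all pieces, including those with no interactive function.
mean_all = total_codings / n_pieces

# Derived figure (not in the paper): mean over interactive pieces only.
n_interactive = n_pieces - n_without_interaction
mean_interactive_only = total_codings / n_interactive

print(f"all pieces:         {mean_all:.2f}")
print(f"interactive pieces: {mean_interactive_only:.2f}")
```

The first value matches the reported 1.55; the second suggests that pieces with any interactivity offer closer to two different features on average.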

Conclusion/Discussion

What does the reporting style of data journalism look like? According to our study, the ‘typical’ data-driven piece

- is published by a newspaper,
- covers a political topic,
- relies on public data from official sources,
- builds its story on financial data and/or geodata, preferably collected on a national scale,
- is based on a simple unit of analysis such as single persons,
- compares values in order to show differences and similarities between different objects of study (e.g., people of different gender, neighbourhoods),
- combines two types of visualisations, preferably pictures with maps or simple charts, and
- allows the user to zoom into a map, request details and/or filter data.

⁶ We did not take into account whether features of the same kind (e.g., a zoom-in function) were offered more than once (e.g., in more than one map included in the story).


Overall, this shows that data journalism as a reporting style is firmly characterised by those elements that cursory observations, literature reviews and actor studies have already hinted at. However, these characteristics do not apply to all data-journalistic projects, and we found a variety of (combinations of) other story elements that are used less frequently but still often enough to be significant. This also means that our study could not conclusively clarify what the reporting style of data journalism is in terms of a universal definition. By the same token, we confirmed the diversity of forms, topics and combinations of story elements that other researchers’ partly contradictory definitions already implied. Evidently, data journalism as an emerging reporting style is both still evolving and flexible, in that different types of data, analyses and visualisation strategies can be combined (or omitted) when it suits the topic and story.

However, our sample has particular limitations, as it is based on pieces that have a double bias: First, as nominees for a data journalism award they represent a special group. Second, these pieces are based on self-selection, as any data journalist is able to submit their data-driven pieces to be considered for nomination by the organising committee.

Despite these limitations, the sample also has two particular advantages: First, we can assume that the analysed cases, as nominees for a DJA, fulfil a certain quality standard and that the awarded pieces in particular are seen by experts in the field as a ‘gold standard’ and as such could influence the future development of the field. Second, the comparison between two successive years allows us, to a certain degree, to trace the field’s development. However, we did not find any significant differences (at the 5% level) between 2013 and 2014 or between nominees and awarded pieces with regard to the sheer (average) number of certain story elements: visualisations, topics touched, sources and types of data used, and interactive functions. Consequently, it seems unlikely that data-driven pieces win awards for following the principle ‘the more, the better’, and we observe no trend in that direction between 2013 and 2014. We can assume, therefore, that data-driven means are neither considered award-worthy nor applied by journalists as ends in themselves, but that they clearly have to support the story being told. This echoes Coddington’s (2015: 339) observation that data journalists subordinate the use of data “to the professional journalistic value of narrative and the ‘story.’ [...] [D]ata journalism discourse foregrounds telling the story over using data”.

References

Anderson, Chris W. (2013). Towards a sociology of computational and algorithmic journalism. New Media & Society, 15(7), pp. 1005–1021.

Appelgren, Ester; Nygren, Gunnar (2014). Data journalism in Sweden. Introducing new methods and genres of journalism into “old” organizations. Digital Journalism, 2(3), pp. 394–405.

Coddington, Mark (2015). Clarifying journalism’s quantitative turn. A typology for evaluating data journalism, computational journalism, and computer-assisted reporting. Digital Journalism, 3(3), pp. 331–348.

De Maeyer, Juliette; Libert, Manon; Domingo, David; Heinderyckx, François; Le Cam, Florence (2015). Waiting for data journalism. A qualitative assessment of the anecdotal take-up of data journalism in French-speaking Belgium. Digital Journalism, 3(3), pp. 432–446.

Fink, Katherine; Anderson, Christopher W. (2015). Data journalism in the United States. Beyond the “usual suspects”. Journalism Studies, 16(4), pp. 467–481.

Gray, Jonathan; Bounegru, Liliana; Chambers, Lucy (eds.) (2012). The data journalism handbook. How journalists can use data to improve the news. (Early release). Sebastopol: O’Reilly.

Holtermann, Hannes (2011). Datenjournalismus: eine neue Form der journalistischen Wertschöpfung aus Daten [Data journalism: a new form of journalistically creating value from data]. Unpublished Master Thesis. Hamburg.

Karlsen, Joakim; Stavelin, Eirik (2014). Computational journalism in Norwegian newsrooms. Journalism Practice, 8(1), pp. 34–48.

Krippendorff, Klaus (2013). Content analysis: an introduction to its methodology. Los Angeles: SAGE.

Lanosga, Gerry (2014). New views of investigative reporting in the twentieth century. American Journalism, 31(4), pp. 490–506.

Lewis, Seth C. (2015). Journalism in an era of big data. Digital Journalism, 3(3), pp. 321–330.

Lewis, Seth C.; Usher, Nikki (2014). Code, collaboration, and the future of journalism. A case study of the Hacks/Hackers global network. Digital Journalism, 2(3), pp. 383–393.


Lombard, Matthew; Snyder-Duch, Jennifer; Bracken, Cheryl Campanella (2002). Content analysis in mass communication. Assessment and reporting of intercoder reliability. Human Communication Research, 28(4), pp. 587–604.

Loosen, Wiebke; Schmidt, Fenja; Reimer, Julius (2015, forthcoming). “When data become news”: eine explorative Inhaltsanalyse daten-journalistischer Projekte [“When Data Become News”: an exploratory content analysis of data-journalistic projects]. Hamburg: Hans-Bredow-Institut.

Parasie, Sylvain (2014). Data-driven revelation? Epistemological tensions in investigative journalism in the age of “big data”. Digital Journalism, DOI: 10.1080/21670811.2014.976408.

Parasie, Sylvain; Dagiral, Eric (2013). Data-driven journalism and the public good. “Computer-assisted-reporters” and “programmer-journalists” in Chicago. New Media & Society, 15(6), pp. 853–871.

Schmidt, Siegfried J.; Weischenberg, Siegfried (1994). Mediengattungen, Berichterstattungsmuster, Darstellungsformen [Media genres, patterns of reporting, presentation forms]. In: Merten, Klaus; Schmidt, Siegfried J.; Weischenberg, Siegfried (eds.): Die Wirklichkeit der Medien [The reality of the media]. Opladen: Westdeutscher Verlag, pp. 212–236.

Wahl-Jorgensen, Karin (2013a). Subjectivity and story-telling in journalism. Examining expressions of affect, judgement and appreciation in Pulitzer Prize-winning stories. Journalism Studies, 14(3), pp. 305–320.

Wahl-Jorgensen, Karin (2013b). The strategic ritual of emotionality: a case study of Pulitzer Prize-winning articles. Journalism, 14(1), pp. 129–145.

Weinacht, Stefan; Spiller, Ralf (2014). Datenjournalismus in Deutschland. Eine explorative Untersuchung zu Rollenbildern von Datenjournalisten [Data-journalism in Germany. An exploratory study on the role conceptions of data-journalists]. Publizistik, 59(4), pp. 411–433.