Auditing Wikipedia's Hyperlinks Network on Polarizing Topics

Cristina Menghini
DIAG, Sapienza University
Rome, Italy
menghini@diag.uniroma1.it

Aris Anagnostopoulos
DIAG, Sapienza University
Rome, Italy
aris@diag.uniroma1.it

Eli Upfal
Dept. of Computer Science, Brown University
Providence, RI, USA
eli@cs.brown.edu

arXiv:2007.08197v4 [cs.SI] 8 Mar 2021

ABSTRACT
People eager to learn about a topic can access Wikipedia to form a preliminary opinion. Despite the solid revision process behind the encyclopedia's articles, the users' exploration process is still influenced by the hyperlinks' network. In this paper, we shed light on this overlooked phenomenon by investigating how articles describing complementary subjects of a topic interconnect, and thus may shape readers' exposure to diverging content. To quantify this, we introduce the exposure to diverse information, a metric that captures how users' exposure to multiple subjects of a topic varies click after click by leveraging navigation models.

For the experiments, we collected six topic-induced networks about polarizing topics and analyzed the extent to which their topologies induce readers to examine diverse content. More specifically, we take two sets of articles about opposing stances (e.g., gun control and gun rights) and measure the probability that users move within or across the sets, by simulating their behavior via a Wikipedia-tailored model. Our findings show that the networks hinder users from symmetrically exploring diverse content. Moreover, on average, the probability that the networks nudge users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting subjects. Taken together, these findings return a new and intriguing picture of the structural influence of Wikipedia's network on the exploration of polarizing issues.

KEYWORDS
Wikipedia, Hyperlinks Network, Polarization, User Behavior

ACM Reference Format:
Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal. 2021. Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. In Proceedings of The Web Conference 2021, April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
Knowledge on Wikipedia is distributed across articles interconnected via hyperlinks. According to Wikipedia's Linking Manual [49], "Internal links can add to the cohesion and utility of Wikipedia, allowing readers to deepen their understanding of a topic by conveniently accessing other articles." Consequently, while reading an article, users are directly exposed to its content and indirectly exposed to the content of the pages it points to.

Wikipedia's pages are the result of the collaborative efforts of a committed community that, following policies and guidelines [4, 20], generates and maintains up-to-date and high-quality content [28, 40]. Even though tools support the community in curating pages and adding links, the community lacks a systematic way to contextualize the pages within the more general articles' network. Indeed, it is important to stress that having access to high-quality pages does not imply a comprehensive exposure to an argument, especially for a broad or polarizing topic.

Users use Wikipedia differently, according to their information needs. Singer et al. [45] show that users curious about a topic explore it by browsing the encyclopedia. In fact, they rely on hyperlinks to find content correlated with or complementary to the subject of interest. Therefore, it is crucial to evaluate the extent to which the current link structure encourages users to browse related topics to develop a more comprehensive view and perspective of a subject. This theme becomes particularly important when users look for an overview of polarizing topics spanning multiple articles.

Wikipedia's Neutral Point of View (NPOV) encourages editors to work such that articles' content fairly and proportionately represents all the significant views that have been published by reliable sources on the subject [51]. Although the NPOV document gathers many suggestions to properly curate the direct content of pages, it does not refer to the impact links might have in determining users' exposure to indirect content.

Suppose we consider the topic abortion. It is a broad issue, which is distributed across multiple articles on Wikipedia. Moreover, due to its polarizing nature, it is possible to recognize pages about events, people, subjects, or organizations that are associated with either pro-choice or pro-life stances. Users willing to learn about abortion might access the encyclopedia to collect information and then develop their own ideas. Consider a user who enters the network by reading the article Abortion-rights movement, which portrays and outlines campaigns supporting abortion. We assume that the article's body does not endorse the page's subject, due to the NPOV principle. So, we expect that the user acquires objective knowledge about organizations supporting abortion and, maybe, also realizes the existence of anti-abortion movements. Now, imagine that the user decides to continue her exploration of the topic and, to do so, she follows the hyperlinks within the current page. If the linkage to pages regarding subjects close to the pro-life view is weak, our user has little chance of collecting the diverse views that contribute to her development of a comprehensive perspective on the topic. It


follows that the lack of sufficient linkage among pages expressing diverse stances on a topic can run counter to the NPOV's goals.

To the best of our knowledge, there are no studies that investigate the validity of the NPOV principles concerning users' exposure to indirect content (i.e., the content suggested by hyperlinks). Hence, analyzing Wikipedia's links is particularly important to understand whether broad topics, which conceptually span multiple articles, are effectively, proportionately, and fairly presented to readers, not only in terms of direct content (i.e., the article's body).

Previous works addressed the issue of users' polarization on social networks and showed that it is hard for users to interact with content created/shared by users of opposing views [1, 9, 10, 19, 24]. Ribeiro et al. [42] empirically showed that the YouTube recommender system contributes to radicalizing users' pathways. Given the nature and role of Wikipedia as a primary source of knowledge acquisition, broad exposure to different views of a topic appears critical to guaranteeing fair and balanced access to well-rounded knowledge.

This paper provides a first observational study on Wikipedia that aims to quantify how the hyperlinks' network topology can profoundly affect user exposure to diverse stances on polarizing topics. Having a comprehensive view of the connections among Wikipedia pages, and of how they shape reader exposure to information, is a difficult task for humans. Therefore, it requires introducing algorithmic methods to audit and quantify the mutual level of exposure among articles of diverse content, especially for polarizing matters. That is fundamental for the improvement of the encyclopedia and its role in promoting a self-critical society.

By studying the hyperlinks network, we first aim to discover to what extent the network's topology pushes users to explore diverse content rather than keeping them within knowledge bubbles.1 Second, we aim to gain insights that may help to design a system supporting editors in (1) contextualizing pages within the more general encyclopedia's network and (2) adding links connecting articles of opposing/complementary views.

1 By knowledge bubbles we mean the sets of pages presenting one side of a contentious subject (i.e., pages about pro-life or pro-choice movements).

In summary, this paper tackles the following research questions:

RQ1 How do readers consume articles about polarizing topics? (Sect. 5)
RQ2 To what extent does the hyperlinks' network expose readers to diverse information? (Sect. 6)

By answering them, we make the following contributions:

• We initiate a discussion that aims to shed light on the role that the hyperlink network plays in connecting articles belonging to different categories. We focus our work on analyzing this phenomenon on a set of polarizing topics such as abortion, guns, and evolution.
• We define two metrics, the exposure to diverse information and the (mutual) exposure to diverse information, to quantify the strength of connections among sets of articles (e.g., pages about abortion-rights and anti-abortion). These metrics quantify to what extent the network topology assists readers in visiting pages of contrasting subjects, and whether it does so equally for all of them (see Sections 4.1, 4.1.2, and 4.2). To this end, they embed readers' possible behavior, relying on their behavioral patterns [43, 45], features determining the success of wikilinks [16, 31], and readers' clickstream data [53].
• We find that the structure of the network makes it easier for users to explore knowledge bubbles with a homogeneous view than opposing stances. Moreover, we show that readers' interest is biased toward one side of the topic, based on the internal and external traffic on Wikipedia (see Sect. 4.1.1, 5, and 6).

To our knowledge, this is the first work that analyzes Wikipedia readers' exposure to diverse information through the link network.

Before moving on, we want to emphasize that this work does not claim how the hyperlinks network should be; rather, we aim to study whether the current connections among articles encumber users in visiting complementary pages about a polarizing topic. Also, our conclusions come from a network-based analysis. More advanced investigation combining network properties and articles' content is left for future work. The code to replicate the paper is stored in an anonymous folder.2

2 RELATED WORKS
We divide the work related to this paper into five categories: Improving Wikipedia, Wikipedia Navigation, Wikipedia Categorization, Polarization on Social Media, and Cultural Bias on Wikipedia.

Improving Wikipedia. The scientific community has proposed semi-automated procedures to improve Wikipedia's quality. These works check the veracity of references [18, 41], suggest articles' structure [39], look for hoaxes [30], or recommend links [38, 54]. Although link recommendation tools enrich the editing process, they do not provide editors with a measure to evaluate the relationship among articles containing diverse opinions. In this work, we define such metrics (Sect. 4.2).

Wikipedia Navigation. The literature still lacks a model that generalizes Wikipedia users' behavior. Previous studies [25, 27, 31, 46] focused on modeling and predicting human navigation inside Wikipedia, relying on traces from navigation games, i.e., Wikispeedia [43, 48] and WikiGame [13, 29, 46].3 While such games provide valuable insights about how users exploit links to go from one concept to another, Singer et al. [45] and [15, 17] showed that users display different behavioral patterns depending on their information needs and the links' position within pages. Thus, we exploit the insights provided by Singer et al. [45] to define a general model mimicking localized and more in-depth topic exploration. We further enrich the model by characterizing users' next-link choices according to the findings in [15, 17] (Sections 4.1.2 and 4.1.3).

Wikipedia Categorization. In this work, we need to collect articles expressing the distinct facets of a polarizing topic. Wikimedia provides a supervised classifier, i.e., ORES,4 that, based on features derived from the articles' text, categorizes an article into a manually designed category taxonomy5 [3]. Alternatively, one can use topic models [5, 6]. Unfortunately, none of the above approaches provides us with the requested granularity. So, we exploit the collection procedure employed by Shi et al. [44], who needed the same data to study how polarization in teams impacts articles' content about polarizing topics (see Sect. 3).

2 https://drive.google.com/drive/folders/1CJr_YiFE2YlyAtB9yKaGe8CLwVLWx9Ta?usp=sharing
3 These games ask readers to go from one article to another using wikilinks.
4 https://ores.wikimedia.org
5 https://www.mediawiki.org/wiki/ORES/Articletopic#Taxonomy

Figure 1: From Wikipedia's graph to a topic-induced network. The image on the left shows the original Wikipedia graph. On the right, we have the final topic-induced network. The dashed circles in $W$ identify the set of nodes that we use to build the topic-induced network $G$. Red refers to the set of nodes $P$. We use blue to indicate $\bar{P}$, and green and yellow for $\mathcal{N}$ and $s$, respectively. To keep the image tidy, we do not specify the edges' direction.

Polarization on Social Media. There is a large spectrum of works related to detecting [1, 10, 12, 19, 36], modeling, quantifying, and mitigating [2, 9, 21–24, 32, 34, 35, 37] polarization on social media. We focus on the work of Garimella et al. [24], which relates most closely to our metric of exposure to diverse information (ExDIN). They introduced a graph polarization measure based on random walks, i.e., the Random Walk Controversy score (RWC). On a social graph, it quantifies to what extent opinionated users are more exposed to their own opinion than to the opposite one, thanks to a chain of retweets (represented by the random walks). While RWC is conceived for networks of users and measures the overall polarization of a graph, ExDIN works on information networks and quantifies how the network's topology impacts the users' exposure to diverse information when they navigate the graph.

Cultural bias on Wikipedia. Callahan and Herring [8] showed the presence of cultural bias in the same articles of different languages. Other studies highlighted differences between women's and men's biographies [26, 47]. These content-based analyses call for a thorough investigation of the phenomenon. To this end, we decide to investigate the presence of bias in the hyperlink network by quantifying the diversity of pages it suggests to users browsing the network of articles.

3 DATA COLLECTION
To audit a polarizing topic on Wikipedia, we encode it by building a topic-induced network. This representation embeds both the network structure and readers' interactions with the topic.

3.1 Topic Induced Networks
In this section, we explain how to build a topic-induced network. We suggest that the reader follow the process by looking at Figure 1.

First, we consider the directed English Wikipedia graph $W = (A, L)$. The nodes of the graph are the encyclopedia's pages classified as Articles [50]. The edges represent the links connecting pages and are known as wikilinks.6 This set of links includes those in the infoboxes.7
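The redirect handling described in footnote 6 (an edge from $u$ to a redirect page $r$ that redirects to $v$ becomes the edge $(u, v)$) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the toy page titles, the handling of redirect chains, and the choice to merge duplicate links into one are all assumptions made for the example.

```python
def resolve_redirects(edges, redirects):
    """Rewire edges so that no edge ends at a redirect page.

    edges: iterable of (source, target) page titles.
    redirects: dict mapping a redirect page r to the page v it redirects to.
    """
    def final_target(page):
        # Assumption: redirects may be chained; stop if a cycle is detected.
        seen = set()
        while page in redirects and page not in seen:
            seen.add(page)
            page = redirects[page]
        return page

    resolved = set()  # assumption: duplicate links are merged into one edge
    for u, r in edges:
        if u in redirects:
            # Redirect pages are removed from the node set, so they are not sources.
            continue
        v = final_target(r)
        if u != v:  # links within the same page are excluded
            resolved.add((u, v))
    return resolved


# Hypothetical toy data: "R" redirects to "C" and "R2" redirects to "A".
edges = [("A", "R"), ("B", "C"), ("A", "C"), ("C", "R2")]
redirects = {"R": "C", "R2": "A"}
resolved = resolve_redirects(edges, redirects)
print(sorted(resolved))
```

After this step the redirect pages no longer appear as nodes, while the links that passed through them are preserved.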

Among the vertices, we identify a set of pages $\mathcal{T} \subset A$ about the different polarizing sides of a given topic. We partition $\mathcal{T}$ into two sets $P$ and $\bar{P}$ (i.e., $P \cap \bar{P} = \emptyset$ and $P \cup \bar{P} = \mathcal{T}$). Each of them gathers pages related to the same side of the topic. Then, we define the set of nodes $\mathcal{N}$ that includes all vertices at one-hop distance from the vertices in $\mathcal{T}$. The reason we consider nodes representing pages outside $\mathcal{T}$ is twofold: (1) We want to include in the graph those nodes related to the topic that do not appear in $\mathcal{T}$ because they describe subjects neutral to the topic.8 (2) When we consider readers exploring the network, we want to account for the possibility that they reach pages about entities of opposing opinion by passing through articles not strictly related to the topic (see Sect. 4.1).

To reduce the complexity of our analysis, we cluster all the pages in $S = A \setminus (\mathcal{T} \cup \mathcal{N})$ into one super node $s$. Note that nodes in $S$ are only connected to vertices in $\mathcal{N}$. For each node $v \in \mathcal{N}$, we can have multiple edges going to $s$. We compress them into a unique edge $(v, s)$. Likewise, $s$ can point multiple times to the same node $v \in \mathcal{N}$, so we compress these edges into a unique edge $(s, v)$. In both cases, the weights of $(v, s)$ and $(s, v)$ are the sum of the weights of the aggregated edges.
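The construction above (one-hop neighborhood, super node, and summed edge weights) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the function name and toy graph are hypothetical, "one-hop" is taken to cover both in- and out-neighbors of the topic pages, and links between two collapsed pages are dropped, which the text does not specify.

```python
from collections import defaultdict


def topic_induced_edges(edges, topic_pages):
    """Collapse a weighted directed graph around a set of topic pages.

    edges: dict mapping (u, v) -> weight for the full graph W.
    topic_pages: set of pages in T (the union of the two sides).
    Returns (neighbors, collapsed): the one-hop set N and a dict
    (u, v) -> summed weight in which every other page is replaced by "s".
    """
    # Assumption: one-hop distance covers both in- and out-neighbors of T.
    neighbors = set()
    for (u, v) in edges:
        if u in topic_pages and v not in topic_pages:
            neighbors.add(v)
        if v in topic_pages and u not in topic_pages:
            neighbors.add(u)

    kept = topic_pages | neighbors

    def label(page):
        return page if page in kept else "s"

    collapsed = defaultdict(float)
    for (u, v), w in edges.items():
        a, b = label(u), label(v)
        if a == "s" and b == "s":
            continue  # assumption: links among collapsed pages are dropped
        collapsed[(a, b)] += w  # parallel edges to or from s are summed
    return neighbors, dict(collapsed)


# Hypothetical toy graph: p1 and q1 are topic pages, n1 is their neighbor,
# x and y are unrelated pages that end up inside the super node.
toy_edges = {
    ("p1", "n1"): 1.0, ("n1", "q1"): 1.0,
    ("n1", "x"): 1.0, ("n1", "y"): 2.0,
    ("x", "n1"): 1.0, ("y", "n1"): 1.0, ("x", "y"): 1.0,
}
neighbors, collapsed = topic_induced_edges(toy_edges, {"p1", "q1"})
print(neighbors, collapsed)
```

In the toy graph, the two links from n1 to x and y become a single edge (n1, s) whose weight is the sum of the two original weights.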

Finally, we build a directed weighted network $G = (V, E)$ that we call the topic-induced network, whose set of vertices $V$ is $\mathcal{T} \cup \mathcal{N} \cup \{s\}$, of cardinality $n + 1$, and whose edges $E$ are the links connecting the pages. The edge weights are transition probabilities, defined as follows. Let $M$ be an $(n + 1) \times (n + 1)$ right-stochastic transition matrix associated with $G$, that is, a matrix such that each entry $m_{ij}$ is a probability, with $m_{ij} = 0$ if $(i, j) \notin E$, and such that, for every row $i$,

$$\sum_{j=1}^{n+1} m_{ij} = 1.$$

The entry $m_{ij}$ describes the probability that, being on article $i$, a reader clicks page $j$. In Section 4.1.2, we propose different characterizations of the transition matrix.
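To make the right-stochastic constraint concrete, the sketch below builds a matrix whose rows sum to 1 by normalizing non-negative edge weights. The normalization rule, the function name, and the toy weights are assumptions for illustration only; the characterizations actually used in the paper are the ones proposed in Section 4.1.2.

```python
def transition_matrix(nodes, weights):
    """Build a right-stochastic matrix from non-negative edge weights.

    nodes: ordered list of the n + 1 vertices (articles plus the super node).
    weights: dict mapping (i, j) -> non-negative weight for each edge in E.
    """
    index = {node: k for k, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for (i, j), w in weights.items():
        matrix[index[i]][index[j]] += w
    for row in matrix:
        total = sum(row)
        if total == 0:
            # A row of zeros cannot be normalized into a probability distribution.
            raise ValueError("every node needs at least one outgoing edge")
        for k in range(size):
            row[k] /= total  # entries with no edge stay at 0
    return matrix


# Hypothetical toy network with two articles and the super node s.
nodes = ["a", "b", "s"]
weights = {("a", "b"): 1, ("a", "s"): 3, ("b", "a"): 2, ("s", "a"): 1, ("s", "b"): 1}
M = transition_matrix(nodes, weights)
for row in M:
    print(row)
```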

Summarizing, to extract the topic-induced network of a given topic, we first extract data from a complete English Wikipedia database dump.9 From this dump, we build the graph $W$. To collect the corpus of articles expressing different opinions about the topic (i.e., $\mathcal{T}$), we rely on the collection strategy adopted by the authors of [44] (see Sect. 2). In particular, the subcorpus belonging to $P$ consists of all articles categorized under a Wikipedia category describing a viewpoint, and its subcategories. For instance, the corpus of abortion articles consists of two subcorpora: pro-life ($P$) and pro-choice ($\bar{P}$) articles. The pro-life subcorpus consists of all articles categorized under the seed category "Anti-abortion movement" and its subcategories. For instance, the article "Fetal rights" is directly under the seed category, whereas the article "Crisis pregnancy center" is located under the subcategory "Anti-abortion organizations". The pro-choice corpus is collected in a similar fashion, starting from the category "Abortion-rights movement". Note that, because we want $P$ and $\bar{P}$ to be disjoint, articles belonging to both "Anti-abortion movement" and "Abortion-rights movement" are assigned to $\mathcal{N}$.10 Once we have the list of pages in $\mathcal{T}$, we proceed to build the topic-induced network as described in the first part of this section. The articles we collect cover different entities, such as organizations, people, and events. The inclusion of a heterogeneous set of pages for each viewpoint allows us to capture the different ways a user can learn about a topic.

6 We exclude links within the same page. Moreover, while building the graph, we resolve all the redirects [52]. Specifically, for any given node $r$ pointed to by $u$ and redirecting to $v$, we replace the edges $(u, r)$ and $(r, v)$ with $(u, v)$. The final effect of this operation is that we exclude all the redirecting nodes from $A$ while retaining their connections to the rest of the graph.
7 An infobox is a fixed-format table, usually added to the top right-hand corner of articles, to consistently present a summary of some unifying aspect that the articles share.
8 For instance, articles that present an overall introduction/description of the topic.
9 Unless specified otherwise, we refer to the dump of September 2020.

Table 1: Networks' statistics. The notation $Q(P, \bar{P}) \in [0, 1]$ indicates the modularity among the partitions. Higher $Q$ means that connections within partitions exceed those among them.

Topic      $|V \setminus \{s\}|$  $|P|$  $|\bar{P}|$  $|\mathcal{N}|$  $|E|$  $|E_{P \to \bar{P}}|$  $|E_{\bar{P} \to P}|$
Abortion   56861                  481    291          56093            1.9M   205                    97
Cannabis   32743                  45     231          32470            1.1M   8                      6
Guns       65743                  167    187          65393            2.5M   98                     115
Evolution  84788                  342    1334         83113            1.99M  391                    135
Racism     129963                 1024   1022         127953           4.8M   746                    560
LGBT       150563                 459    640          149479           4.6M   195                    143

Topic      $|E_{P \to \mathcal{N}}|$  $|E_{\bar{P} \to \mathcal{N}}|$  $|E_{\mathcal{N} \to P}|$  $|E_{\mathcal{N} \to \bar{P}}|$  $Q(P, \bar{P})$    Unreach($P$)  Unreach($\bar{P}$)
Abortion   21843                      14492                            21396                      29889                            0.29 (0.30, 0.41)  2.1           4.81
Cannabis   1089                       15055                            656                        27823                            0.27 (0.14, 0.03)  11.36         3.49
Guns       18342                      12304                            56702                      16608                            0.26 (0.24, 0.30)  3.63          0.00
Evolution  18289                      45472                            15601                      58720                            0.20 (0.22, 0.27)  1.69          5.62
Racism     64359                      41566                            74354                      58195                            0.32 (0.21, 0.31)  2.72          2.55
LGBT       28100                      22678                            92975                      81706                            0.34 (0.30, 0.13)  2.44          5.35

Figure 2: Links' position distribution within pages. Given $P$ and $\bar{P}$, the orange boxplots show the distribution of links within pages in $P$ (resp. $\bar{P}$) that point to articles in $\bar{P}$ (resp. $P$). The green boxes represent the placement of links among pages belonging only to $P$ (resp. $\bar{P}$). The value on the y-axis is the relative position, rescaled with $\tanh$ to score links at the top of the page similarly. The higher the value, the higher the position in the page.
[Plot: y-axis "Links position" (0.00 to 0.75); x-axis groups Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio., Racism/Anti-racism, Discrimination/Support; legend: Opposite opinion, Same opinion.]
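The category-based collection just described can be sketched as a traversal of the category tree followed by a step that moves shared articles out of both sides. This is a minimal sketch, not the authors' pipeline: the function names and the in-memory dictionaries are hypothetical stand-ins for data extracted from the dump, the titles prefixed with "Example" are made up, and the manual check of subcategories is not modeled.

```python
from collections import deque


def collect_articles(seed, subcategories, members):
    """Gather the articles under a seed category and all of its subcategories.

    subcategories: dict category -> list of child categories.
    members: dict category -> list of article titles directly in that category.
    """
    articles, visited, queue = set(), {seed}, deque([seed])
    while queue:
        category = queue.popleft()
        articles.update(members.get(category, []))
        for child in subcategories.get(category, []):
            if child not in visited:  # category graphs can contain cycles
                visited.add(child)
                queue.append(child)
    return articles


def split_sides(side_a, side_b):
    """Return disjoint P and P-bar, plus the shared articles that go to N."""
    shared = side_a & side_b
    return side_a - shared, side_b - shared, shared


# Toy data: category and article names from the text, plus made-up "Example" titles.
subcategories = {"Anti-abortion movement": ["Anti-abortion organizations"]}
members = {
    "Anti-abortion movement": ["Fetal rights", "Example shared article"],
    "Anti-abortion organizations": ["Crisis pregnancy center"],
    "Abortion-rights movement": ["Example pro-choice article", "Example shared article"],
}
pro_life, pro_choice, shared = split_sides(
    collect_articles("Anti-abortion movement", subcategories, members),
    collect_articles("Abortion-rights movement", subcategories, members),
)
print(sorted(pro_life), sorted(pro_choice), sorted(shared))
```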

Before moving on, we need to make two remarks. (1) Throughout the paper, when we talk about articles expressing an opinion or describing a viewpoint of a topic, we do not mean that they endorse the position of any subject they describe; rather, they objectively describe entities that are close to one side of the issue. (2) Since subcategories are often redundant or not entirely related to the parent category, we check them manually. In this way, we avoid cases like having articles about anti-racism falling into the racism category. Moreover, we do not consider categories whose names do not include topic-specific keywords.

10 We report the size of the intersections between partitions in the next section.

3.2 General Statistics on Topics' Networks
Following the procedure explained in the previous section, we collect the topic-induced networks related to six different topics that we pick from the List of controversial issues on Wikipedia11 and other resources that indicate some controversial issues in our society. These topics are abortion, cannabis, guns, evolution, LGBT, and racism. These are critical topics that often polarize as follows: pro-choice vs. pro-life, cannabis activism vs. cannabis prohibition, gun control vs. gun rights, creationism vs. evolutionary biology, support for LGBT rights vs. opposition to LGBT rights, and racism vs. anti-racism. Information about the seed categories of each topic is in Table 2. The full category lists and sample titles are provided in the code folder (Sect. 1).

11 https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues

Table 2: The table indicates what opinion of a topic the partitions $P$ and $\bar{P}$ correspond to.

Topic      $P$             $\bar{P}$             Seed $P$                            Seed $\bar{P}$
Abortion   Pro-life        Pro-choice            Anti-abortion movement              Abortion-rights movement
Cannabis   Prohibition     Activism              Cannabis prohibition                Cannabis activism
Guns       Control         Rights                Gun control advocacy groups         Gun rights advocacy groups
Evolution  Creationism     Evolutionary biology  Creationism                         Evolutionary biology
Racism     Racism          Anti-racism           Racism                              Anti-racism
LGBT       Discrimination  Support               Discrimination against LGBT people  LGBT rights movement

For the rest of the paper, we refer to the opinions about a topic using $P$ and $\bar{P}$. In Table 2, for each topic, we match each set to the real opinion it represents.

Before presenting the general statistics of the retrieved networks, we remark that, when we assign the articles to partitions, we put in the set $\mathcal{N}$ those assigned to both partitions. The sizes of the intersections among partitions (i.e., the numbers of common articles) are the following: abortion 2, cannabis 3, evolution 2, guns 1, LGBT 5, and racism 7. Because we do not remove these articles (i.e., they belong to $\mathcal{N}$), they can still act as bridges connecting $P$ and $\bar{P}$ in sessions longer than one click. Instead, when we consider the direct connections among partitions (one click), we discard them, since they are not explicitly categorized into one partition.

In Table 1, we show some statistics on the six topic-induced networks. Immediately, we observe that the sizes of $P$ and $\bar{P}$ differ substantially for all the topics except racism and guns. It means that one of the two opinions is represented by more articles. In terms of content, this does not necessarily imply that either of the two views is incomplete or insufficiently represented. Indeed, a topic may span a few articles or may require more pages to be complete. On the other hand, the unbalanced sizes can affect an opinion's exposure within the entire Wikipedia network. Practically, if a set of articles is large and well connected to the rest of the network, the chances that users who browse randomly reach it are higher than those of reaching a small partition. Moreover, if readers exploit the random article functionality of Wikipedia, a more represented opinion has a higher chance of being randomly sampled.

The topics showing the highest imbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about evolutionary biology than about creationism. If we consider the edges across partitions, the number of cross-partition edges is higher for bigger sets. This is reasonable because more nodes can point to the opposite side. Despite that, for evolution the edges from creationism to evolutionary biology are ∼3 times more, and for LGBT the edges from discrimination to rights are 36% more. Despite the low number of edges across the cannabis partitions, we decide not to discard the topic.

Above, we said that one of the two partitions might connect better to the rest of the encyclopedia. We observe that the sizes of $P$ and $\bar{P}$ are not linear in the number of edges that point out of or into the nodes in the partitions. For instance, the number of articles about pro-choice (291) is about 60% of the number related to the pro-life movement (481). Although the nodes in pro-life are about 1.7 times as many as those in pro-choice, the number of links pointing to pages about pro-choice is 36% more than those pointing to pro-life articles. This happens, with different magnitude, also for guns and LGBT. We will see later (Sect. 6) that the fact that one side of a topic is better blended into the network has implications for readers' exposure to the two sides of the topic.

We also investigate how many pages in $P$ and $\bar{P}$ cannot be reached by users unless they enter Wikipedia directly on those pages. The sets of articles with the highest share of unreachable nodes are those in the category of cannabis prohibition (11.36%), followed by evolutionary biology (5.62%) and LGBT rights (5.35%).

Furthermore, we compute the modularity $Q$ among $P$ and $\bar{P}$. Higher $Q$ means that connections within partitions exceed those among them. In Table 1, we report three values computed on differently weighted graphs, with the probability of clicking each link of a page assigned as follows: (1) uniform, (2) proportional to the position of the link within the page, and (3) proportional to readers' clickstream (see Sect. 4.1.2). Overall, if we consider the position of links and readers' clickstream, the partitions appear more modular.
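The text does not state which modularity formula is used, so the following is only a minimal sketch under an explicit assumption: it applies the standard modularity for directed, weighted graphs, $Q = \frac{1}{m}\sum_{u,v}\left[w_{uv} - \frac{k^{out}_u k^{in}_v}{m}\right]\delta(c_u, c_v)$, to the nodes of the two partitions. The function name, the edge-dictionary layout, and the toy graph are hypothetical and not taken from the paper's code.

```python
def directed_modularity(weights, community):
    """Directed, weighted modularity (ASSUMED formula, see lead-in).

    weights:   dict {(u, v): w} with the weight of each edge u -> v
    community: dict {node: label}; every edge endpoint must appear here
    """
    m = sum(weights.values())  # total edge weight
    out_w, in_w = {}, {}
    for (u, v), w in weights.items():
        out_w[u] = out_w.get(u, 0.0) + w
        in_w[v] = in_w.get(v, 0.0) + w
    q = 0.0
    for u in community:
        for v in community:
            if community[u] != community[v]:
                continue  # only same-community pairs contribute
            observed = weights.get((u, v), 0.0)
            expected = out_w.get(u, 0.0) * in_w.get(v, 0.0) / m
            q += observed - expected
    return q / m


# Toy graph: two partitions, each a 2-cycle, with no cross-partition links.
toy_edges = {("a", "b"): 1.0, ("b", "a"): 1.0, ("c", "d"): 1.0, ("d", "c"): 1.0}
toy_labels = {"a": "P", "b": "P", "c": "P_bar", "d": "P_bar"}
q_separated = directed_modularity(toy_edges, toy_labels)
```

Changing the edge weights (uniform, position-based, or click-based) while keeping the same partition labels would give the three values that the paragraph above refers to.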

Based on that, we study how links across and within partitions are positioned in pages. First, we define the position of a link. Given a page, we have its list of links in order of appearance. We get the relative rank within the list for each link and rescale it by the tanh. In this way, we have values in [0, 1], and the links at the top of the list get a more similar score. The set of links includes those in the infoboxes. We regard them as being at the top of the article, according to results in [15, 17]. If a link appears more than once, we average its position.
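The position score described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: ranks are taken as 0-based, the score of an occurrence is $\tanh(n - \text{rank})$ for a page with $n$ link occurrences (mirroring $r(j \mid i)$ from Sect. 4.1.2), and the function name is hypothetical.

```python
import math
from collections import defaultdict


def link_position_scores(links_in_order):
    """Return {target: score} for one page.

    links_in_order lists link targets in order of appearance (infobox links
    are assumed to have been placed first). Repeated targets are averaged.
    """
    n = len(links_in_order)
    per_target = defaultdict(list)
    for rank, target in enumerate(links_in_order):  # rank 0 = top of the page
        r = n - rank                                # larger r = higher on the page
        per_target[target].append(math.tanh(r))     # tanh saturates near 1 at the top
    return {t: sum(v) / len(v) for t, v in per_target.items()}


scores = link_position_scores(["A", "B", "A", "C"])
```

Because tanh saturates quickly, the first few links of a long page receive almost identical scores, which is the behavior the paragraph describes for links at the top of the list.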

In Figure 2, we show the position distributions. According to the t-test, whose confidence level is fixed to $\alpha = 0.95$, the average position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from gun control to gun control is significantly higher than that of links to gun rights. For evolutionary biology, links to creationism are placed statistically significantly lower than those to evolutionary biology. The same happens for LGBT.

For the sake of completeness of the analysis, even if not used further in the paper, for each topic we study the quality of the pages populating it. In particular, we use the ORES API to get the "article quality." We observe that, for all the topics, between 60% and 70% of the articles are classified as Stub or Start. Then, 22–29% are in B-class, 0–5% are Featured Articles, and the remaining belong to the C-class (footnote 12).

4 METRICS

In this section, we define the models and metrics that we use to answer the research questions formulated in Sect. 1. First, we describe how we characterize readers' consumption, either by analyzing real users' data or by simulating their behavior (see Sect. 4.1). Then, we introduce the core metrics of the paper, ExDIN and M-ExDIN; see Sect. 4.2.

4.1 Content Consumption

To understand readers' consumption of polarizing topics, we need different modeling strategies, which we describe in the following subsections.

4.1.1 Metrics Based on Clickstream. We build two metrics upon the information we extract from users' clickstream data, which are made publicly available by Wikimedia and preserve users' privacy [14, 54] (footnote 13).

From these data we infer $c_{ji}$, counting how many times a hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an internal Wikipedia page ($j \in A$, recalling that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from outside Wikipedia (e.g., a search engine). Thus, we define the variable $\delta_j$, which indicates whether $j$ is an external page or belongs to the topic-induced network: $\delta_j = 1$ if $j$ is external and $0$ otherwise. Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal pages pointing to it; see Figure 3. We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be the total clicks to the page. $\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks from external websites; therefore, the difference between $c_i$ and this summation is the number of visits from internal (Wikipedia) pages. Now we are ready to define the following metrics.

Reader Search Rate (RSR). Given a page $i \in V$, the empirical probability that a visit to page $i$ is from an external website is

$$RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j c_{ji}}{c_i}. \tag{1}$$

12. https://en.wikipedia.org/wiki/Template:Grading_scheme

13. A description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream. The provided information is enough to extract the clickstream-based metrics.

WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal

Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is

$$CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \tag{2}$$

where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$ and thus counted multiple times in $c_i$.)
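To make Eqs. (1) and (2) concrete, here is a small sketch that computes both rates for one page from a table of (source, target) click counts. The dictionary layout, the label used for external sources, and the function name are illustrative assumptions and do not reproduce the actual clickstream file format.

```python
def rsr_and_ctr(page, clicks, external_sources):
    """Return (RSR, CTR) for `page`.

    clicks:           dict {(source, target): count}
    external_sources: set of source labels that are outside Wikipedia
    """
    # c_i: all clicks arriving at the page, from external and internal sources
    total_in = sum(c for (src, dst), c in clicks.items() if dst == page)
    # numerator of Eq. (1): clicks arriving from external sources only
    from_external = sum(
        c for (src, dst), c in clicks.items()
        if dst == page and src in external_sources
    )
    # numerator of Eq. (2): clicks leaving the page toward its out-links
    clicked_out = sum(c for (src, dst), c in clicks.items() if src == page)
    if total_in == 0:
        return None, None  # rates are undefined for a page with no visits
    return from_external / total_in, clicked_out / total_in


toy_clicks = {
    ("search-engine", "i"): 60,  # external visits to page i
    ("a", "i"): 40,              # internal visits to page i
    ("i", "b"): 25,              # clicks from i to its out-links
    ("i", "c"): 5,
}
rsr, ctr = rsr_and_ctr("i", toy_clicks, external_sources={"search-engine"})
```

In this toy example, 60 of the 100 visits to page `i` are external and 30 of the 100 visits continue to another article.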

4.1.2 Model Clicks Within Pages. When readers visit a page, they have the possibility of clicking any of the links present. However, according to the information needs they want to satisfy, each of the links may have a different probability of being clicked [45]. We now propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j \mid i)$ as the rank of $j$ among all links in $i$, and $r(j \mid i) = |N_{out}(i)| - pos(j \mid i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce $\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to values between 0 and 1, such that links at the top of the page get similar scores.

The Clicks Within Pages (CwP) models are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:

(1) $M_u$ (Uniform), whose entry $m(i, j) = \frac{1}{|N_{out}(i)|}$ mimics readers who click each link in a page uniformly at random.

(2) $M_p$ (Position), whose entry $m(i, j) = \frac{\tanh r(j \mid i)}{\sum_{k \in N_{out}(i)} \tanh r(k \mid i)}$ captures the scenario in which readers click with higher probability links appearing first in the page. This model is based on previous work that shows how the link position is a good predictor of the success of a link [16, 31].

(3) $M_c$ (Clicks), whose entry $m(i, j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$ represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10 (footnote 14), the minimum number of times a link must be clicked to be included in the dataset [53].
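The three modes above can be sketched for a single row of the transition matrix, i.e., for the out-links of one article. This is a minimal illustration with hypothetical function names. It assumes 0-based ranks, so that $r(j \mid i)$ ranges from $|N_{out}(i)|$ down to 1 and every link keeps a nonzero weight; the source does not say whether ranks start at 0 or 1.

```python
import math


def uniform_row(out_links):
    """M_u: every out-link is equally likely."""
    n = len(out_links)
    return {j: 1.0 / n for j in out_links}


def position_row(out_links):
    """M_p: weight tanh(r(j|i)), with out_links ordered by appearance."""
    n = len(out_links)
    weights = {j: math.tanh(n - rank) for rank, j in enumerate(out_links)}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def clicks_row(out_links, clicks, floor=10):
    """M_c: empirical click shares; counts below `floor` are replaced by it."""
    counts = {j: max(clicks.get(j, 0), floor) for j in out_links}
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()}


links = ["a", "b", "c"]                 # out-links of one article, top to bottom
row_u = uniform_row(links)
row_p = position_row(links)
row_c = clicks_row(links, {"a": 80, "b": 3})  # "c" never appears in the data
```

Each function returns one right-stochastic row; stacking the rows of all articles gives the matrix $M$ used by the navigation model.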

For the sake of completeness, we recall that $G$ includes a super node $s$. To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph (footnote 15); see Sect. 3.1.

4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We can do it by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that Wikipedia's network is conceived to let users move to fulfill their own information needs. Thus, we want to understand how different navigation behaviors can affect readers' exposure to diverse information.

To do that, it would be optimal to have access to users' session logs. Because these data are not available to the public, we define a parametric model that simulates users' navigation by embedding different behaviors according to a chosen parameter. We emphasize that the scope of this model is not to perfectly replicate users' behavior on Wikipedia. Rather, we want to see how users simulated from a reasonable and general model are exposed to diverse information.

In other words, we want to define a stochastic process with $n + 1$ states, corresponding to the $n + 1$ pages in $V$, that approximates the probability of reaching any of the articles starting at random from $p \in P$ (or from $\bar{P}$).

14. We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting any number between 1 and 10 does not affect the results.

15. The computation of these quantities is straightforward, so we omit it from the body of the paper.

Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses on a page turn into a click toward another article.

We model this by considering the process $X^{\ell}$, $\ell = 0, 1, \ldots, L$, on the set of nodes $V$ induced by transition matrix $M$, with starting state $X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = (\pi_P)_i \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users' session length (the number of clicks) is finite, we evaluate the process on a finite number of steps $L$. We have that $\Pr(X^{\ell} = j) = (\boldsymbol{\pi}^{\ell}_P)_j$, where the (row) vector $\boldsymbol{\pi}^{\ell}_P$ is given by the following variation of the Personalized Random Walk with Restart (RWR).

Definition 1 (Navigation Model). Let $M^0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have

$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M^0, \tag{3}$$

and for $\ell \geq 1$,

$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^{\ell}_P \cdot M^{\ell} + \alpha\, (\boldsymbol{\pi}^0_P \cdot M^{\ell}), \tag{4}$$

where $M^{\ell} = \mathrm{norm}((D\,(M^{\ell-1})^T)^T)$ and $D = \mathrm{diag}(1 + \boldsymbol{\pi}^{\ell-1}_P)^{-1}$. Here the superscript on $M$ is an iteration index, not a matrix power, and $\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently such that it sums to 1.

This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated in each step. It takes into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^{\ell}_P$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ if the reader starts uniformly at random from a node in $P$. We assume that readers within the same session do not click the same link more than once. Thus, we desire that at step $\ell + 1$ the nodes that are reached with high probability at step $\ell$ see their probability of being reached deflated, and those with lower probability have more chances of being clicked. We achieve this by dividing the rows of $M$ elementwise by the vector of probabilities $\boldsymbol{\pi}^{\ell}_P + \mathbf{1}$, where $\mathbf{1}$ is a smoothing factor to avoid divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
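The recursion of Definition 1 can be sketched in plain Python as follows. This is an illustrative reading of Eqs. (3) and (4), not the authors' code: matrices are lists of rows, all function names are hypothetical, and the toy matrix has no rows that sum to zero (the super node of $G$ is not modeled).

```python
def normalize_rows(m):
    """norm(M): rescale each row so that it sums to 1 (zero rows are kept)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else list(row))
    return out


def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def navigate(m0, pi0, alpha, steps):
    """Return [pi^1, ..., pi^steps] for restart parameter alpha."""
    m_prev = normalize_rows(m0)      # M^{l-1}
    pi_prev = list(pi0)              # pi^{l-1}
    pi_cur = vec_mat(pi0, m_prev)    # Eq. (3): pi^1 = pi^0 M^0
    history = [pi_cur]
    for _ in range(1, steps):
        # M^l: divide column j of M^{l-1} by (1 + pi^{l-1}_j), renormalize rows
        m_cur = normalize_rows(
            [[m_prev[i][j] / (1.0 + pi_prev[j]) for j in range(len(pi_prev))]
             for i in range(len(m_prev))]
        )
        walk = vec_mat(pi_cur, m_cur)    # continue from the current page
        restart = vec_mat(pi0, m_cur)    # jump back and click from the start
        pi_next = [(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)]  # Eq. (4)
        history.append(pi_next)
        m_prev, pi_prev, pi_cur = m_cur, pi_cur, pi_next
    return history


toy_m0 = [[0.0, 0.5, 0.5],
          [1.0, 0.0, 0.0],
          [0.5, 0.5, 0.0]]
start = [1.0, 0.0, 0.0]                   # the session starts on node 0
star_like = navigate(toy_m0, start, alpha=1.0, steps=3)
sequential = navigate(toy_m0, start, alpha=0.0, steps=2)
```

With `alpha=1.0`, every click is taken from the starting page, so the probability mass stays on its out-neighbors. With `alpha=0.0`, the walk simply continues from wherever the previous click landed.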


Figure 4: Navigation model for different $\alpha$: (a) star-like ($\alpha = 1$); (b) star-like and random navigation ($0 < \alpha < 1$); (c) random navigation ($\alpha = 0$). The green nodes represent the starting navigation pages.

Overall, as we will see in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.

Looking deeper at the model:

• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader whose navigation consists in just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely semantically related to each other [49].

• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open sequential articles and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.

• If $\alpha = 0$ (Figure 4(c)), users sequentially click links, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.

Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume that the jumping-back action consists in clicking the back button of the browser in use until the session's starting page is reached again. The restart parameter indirectly embeds the back-button action, which, because of the absence of back-links on Wikipedia, cannot be tracked on the graph.

The behaviors replicated through the model recall those described in [43, 45].

4.2 Exposure Metrics

At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they can generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.

Figure 5: Pageviews distribution ($y$-axis: Log(Pageviews)). For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics, in order, are abortion (Pro-life, Pro-choice), cannabis (Prohibition, Activism), guns (Control, Rights), evolution (Creationism, Evol. Bio.), racism (Racism, Anti-racism), and LGBT (Discrimination, Support).

Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^{\ell}_P$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \geq 1$), starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is

$$e^{\ell}_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^{\ell} = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^{\ell}_P)_j, \tag{5}$$

and it describes the probability that a reader starting in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.

We employ this metric in two ways:

(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M_u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.

(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).

Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate the ExDIN, if we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want the ExDIN w.r.t. each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^{\ell}_P$ and sum up the probabilities of the nodes within each set.
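Eq. (5) and its extension to several target sets amount to summing entries of the reach-probability vector. The sketch below assumes that nodes are identified by integer indices into $\boldsymbol{\pi}^{\ell}_P$; the function names and the toy vector are hypothetical.

```python
def exdin(pi_l, target):
    """Eq. (5): total probability of being in `target` at step l."""
    return sum(pi_l[j] for j in target)


def exdin_per_set(pi_l, named_sets):
    """Exposure toward each named set, plus the exposure toward their union."""
    per_set = {name: exdin(pi_l, nodes) for name, nodes in named_sets.items()}
    union = set().union(*named_sets.values())
    return per_set, exdin(pi_l, union)


# Toy reach probabilities for four nodes after some number of clicks
pi_toy = [0.1, 0.2, 0.3, 0.4]
per_set, total = exdin_per_set(pi_toy, {"Q": {1}, "Z": {2, 3}})
```

Taking the union before summing avoids counting a node twice when the target sets overlap.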

Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.

Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^{\ell}_{P \to \bar{P}}$ and $e^{\ell}_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is

$$\epsilon^{\ell} = \frac{\min\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}}{\max\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}} \in [0, 1]. \tag{6}$$

If either $e^{\ell}_{P \to \bar{P}}$ or $e^{\ell}_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.

This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.

Figure 6: Percentage of pageviews coming from the opposite side ($x$-axis: % of pageviews from the opposite partition, from 0.0 to 2.5). Topics, in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.

The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor the navigation either from $P \to \bar{P}$ or from $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can think of using M-ExDIN to measure the bias in the network w.r.t. two sets of nodes.
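Definition 3 translates directly into a few lines of code. The sketch below uses a hypothetical function name and made-up exposure values; it also returns 0 when both exposures are 0, which follows the convention stated after Eq. (6).

```python
def mutual_exposure(e_p_to_pbar, e_pbar_to_p):
    """Eq. (6): ratio of the smaller exposure to the larger one, in [0, 1]."""
    low = min(e_p_to_pbar, e_pbar_to_p)
    high = max(e_p_to_pbar, e_pbar_to_p)
    if low == 0:
        return 0.0  # convention: epsilon = 0 if either exposure is 0
    return low / high


balanced = mutual_exposure(0.02, 0.02)   # same flow in both directions
skewed = mutual_exposure(0.002, 0.02)    # one direction is 10 times stronger
blocked = mutual_exposure(0.0, 0.02)     # no flow at all in one direction
```

The metric is symmetric in its arguments, so it measures how unbalanced the two flows are but not which direction is favored; the two exposures themselves are needed for that.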

Even though the mutual exposure to diverse information captures the balance among the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $P$ is 10 times larger than $\bar{P}$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^{\ell}_{\bar{P} \to P} \approx 10 \cdot e^{\ell}_{P \to \bar{P}}$, and as a result $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced, and we proceed as follows. If $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$. Then we use the new set for all computations. Because of the randomness of the sampling, we repeat the measurements multiple times.
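The size-balancing step can be sketched as repeated downsampling of the larger partition. The sampling scheme (uniform, without replacement), the number of repetitions, and the function name are assumptions made for illustration; the text only says that the measurements are repeated multiple times.

```python
import random


def balanced_samples(p, p_bar, repeats=5, seed=0):
    """Return (smaller set, list of equally sized samples of the larger set)."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    small, large = (p, p_bar) if len(p) <= len(p_bar) else (p_bar, p)
    pool = sorted(large)                       # random.sample needs a sequence
    samples = [set(rng.sample(pool, len(small))) for _ in range(repeats)]
    return small, samples


toy_p = {"a", "b"}
toy_p_bar = {"u", "v", "w", "x", "y"}
small, samples = balanced_samples(toy_p, toy_p_bar, repeats=3)
```

Each sample would then replace the larger partition when computing ExDIN and M-ExDIN, and the resulting values could be averaged over the repetitions.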

5 RQ1: READERS' TOPIC CONSUMPTION

Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that, when we count the click views of a page, we consider the average over the number of months the page existed (footnote 16). Accordingly, when computing the occurrences for the transition matrix based on clickstream, we consider the average clicks of the link over the number of months it existed. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.

16. Based on the temporal graphs extracted by [11].
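The averaging rule can be illustrated with a short sketch in which the denominator is the number of months a page or link existed rather than the full length of the collection period. The input layout (one count per month of existence, keyed by a month label) and the function name are assumptions.

```python
def monthly_average(counts_by_month):
    """Average clicks over the months in which the page or link existed.

    counts_by_month: dict {month label: clicks}, containing one entry per
    month of existence (months with zero clicks must be listed with 0).
    """
    if not counts_by_month:
        return 0.0
    return sum(counts_by_month.values()) / len(counts_by_month)


# A link that existed for only two months of the collection period
recent_link = {"2020-08": 30, "2020-09": 50}
avg_recent = monthly_average(recent_link)
```

Dividing the same 80 clicks by the whole collection period instead would understate the popularity of a recently added link.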

5.1 Pageviews Distribution

To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except abortion, the difference of the means of the opinions' pageviews is statistically significant for $p < 0.05$. This finding indicates that users tend to visit more pages expressing or supporting one of the two viewpoints. From a network perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.

In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.

5.2 External or Internal Access to the Topic

We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more from external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages towards the two partitions.

Referring to Figure 7(c), 44.8% of visits come from internal pages. The clickstream from internal pages is broken down to show the proportion of flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that also from external websites most of the traffic is towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content are concentrated on gun rights.

The abundance of traffic towards one of the two opinions does not characterize only the guns topic. Indeed, among all the topics, 59–74% of visits are accumulated by one partition. Moreover, readers' preferences appear consistent among external and internal accesses; that is, they both point more towards the same view of the topic. For both internal and external views, the distribution of accesses toward partitions is approximately the same (i.e., the percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.

In general, the largest share of visits to the topics' articles comes from external pages. In particular, only 23.6% and 33.5% of the traffic to evolution and racism, respectively, is generated by internal Wikipedia navigation. The same quantity for the remaining topics ranges between 44% and 47%.

Figure 7: Cumulative pages' traffic. Panels: (a) Abortion, (b) Cannabis, (c) Guns, (d) Evolution, (e) Racism, (f) LGBT. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

Figure 8: Average click-through rate ($x$-axis: Avg(Click Through Rate) × 100). In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics, in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.

We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the public mainstream debate.

5.3 How Much Readers Navigate Links

Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the average pages' click-through rate (Sect. 4.1.1).

We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated the least internal traffic are those about evolutionary biology and LGBT rights.

Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52). This suggests that articles having higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.

| Topic     | $\bar{C}(P)$ | $\bar{C}(P \to P)$ | $\bar{C}(P \to \bar{P})$ | $\bar{C}(\bar{P})$ | $\bar{C}(\bar{P} \to \bar{P})$ | $\bar{C}(\bar{P} \to P)$ |
|-----------|-------|-------|-------|-------|-------|-------|
| Abortion  | 68.89 | 90.82 | 57.64 | 70.26 | 89.81 | 45.37 |
| Cannabis  | 70.81 | 95.01 | 37.50 | 65.78 | 96.52 | 16.67 |
| Guns      | 52.34 | 78.69 | 35.68 | 59.63 | 75.35 | 39.28 |
| Evolution | 71.15 | 84.49 | 64.47 | 72.69 | 99.00 | 55.13 |
| Racism    | 56.36 | 88.41 | 34.32 | 71.87 | 90.63 | 65.44 |
| LGBT      | 61.66 | 89.42 | 52.5  | 72.42 | 92.52 | 59.17 |

Table 3: Average % of links within pages clicked less than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

In addition, we count the number of links clicked fewer than 10 times over the last three years; see Table 3. As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.

6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS

The main contribution of this paper is to examine to what extent Wikipedia's current topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.

6.1 Exposure to Diversity

To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, if we set it equal to 1, we study the exposure to diversity over one-click sessions.

Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.

For instance let evolution be the topic we analyze If readers

start uniformly at random from a page about creationism the prob-

ability of visiting an article of the same partition is 576 On the

contrary the chances of entering a page about evolutionary biology

WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia Cristina Menghini Aris Anagnostopoulos and Eli Upfal

[Figure 9: ExDIN heatmaps. Rows: uniform, position, and clicks CwP models. One 2×2 panel per topic, over the two partitions: Pro-life / Pro-choice, Prohibition / Activism, Control / Rights, Creationism / Evol. Bio., Racism / Anti-racism, Discrimination / Support.]

Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to \bar{P}$, $\bar{P} \to P$. The $y$-axis indicates the source and the $x$-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.

is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

It is worth pointing out that the current network’s topology not only nudges users to read more about the same opinion but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have higher chances of reading one article about creationism (3 times more) than users reading about creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic.

Taken together, these measurements highlight that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous views and makes the measure of mutual exposure to diverse information smaller than 1.
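To make the asymmetry concrete, the sketch below scores mutual exposure as the ratio between the smaller and the larger of the two cross-partition exposures. This ratio form is an assumption made for illustration; the definition actually used is the one given in Sect. 4.2. The function name `mutual_exposure` is hypothetical, and the two input values are the one-click, uniform-model percentages for evolution (0.18% from the text, 0.61% as read from Figure 9).

```python
import math

def mutual_exposure(p_to_q, q_to_p):
    """Assumed symmetry score: smaller cross-partition exposure divided by
    the larger one (1.0 means perfectly symmetric exposure)."""
    larger = max(p_to_q, q_to_p)
    if larger == 0:
        return math.nan  # undefined when neither side reaches the other
    return min(p_to_q, q_to_p) / larger

# One-click, uniform-model values for evolution (in percent).
creationism_to_evolution = 0.18
evolution_to_creationism = 0.61  # read from Figure 9

score = mutual_exposure(creationism_to_evolution, evolution_to_creationism)
```

Under this assumed form the score for evolution is roughly 0.3, in line with the threefold asymmetry noted above and with a mutual exposure below 1.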

The findings above report the intrinsic capability of the network to spur users towards diverse content. To combine it with readers’ next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users are more likely to click links at the top of the page, the probabilities are only slightly modified with respect to the uniform model. These modest variations are coherent with the links’ placement within pages (Figure 2).

For a few topics, such as guns, the links’ position plays a more significant role, worsening the user exposure to diverse information. Indeed, in pages about guns control, links belonging to the guns rights partition seem to be mentioned later in the page. As a consequence, the probability of reaching an article supporting guns rights, starting uniformly at random from an article in guns control, drops by 30% with respect to the probability observed using the uniform CwP model. Therefore, we conclude that, for some topics, the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located towards the beginning of a page have fewer opportunities to read about contrasting opinions.
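The effect of link placement can be sketched with a toy position-biased click model. The 1/rank decay used here is a hypothetical weighting chosen only for illustration; the position CwP model of the paper is defined in Sect. 4 and may weight positions differently. The function `click_probabilities` and the four-link page are likewise assumptions.

```python
def click_probabilities(n_links, weight):
    """Turn a per-position weight function into click probabilities
    for links ranked 1 (top of the page) to n_links (bottom)."""
    weights = [weight(rank) for rank in range(1, n_links + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Four links on a hypothetical page; the cross-partition link is the last one.
uniform = click_probabilities(4, lambda rank: 1.0)
position = click_probabilities(4, lambda rank: 1.0 / rank)  # assumed decay

cross_link_uniform = uniform[-1]
cross_link_position = position[-1]
relative_drop = 1 - cross_link_position / cross_link_uniform
```

When the only link to the opposite partition sits at the bottom of the page, any weighting that favors the top lowers its click probability relative to the uniform model, which is the mechanism described for guns.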

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.

For all the topics, we observe a significant increase in the probability of visiting pages of the same opinion. A simple interpretation of this result is that real users more often click the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the “topology of the network seems to drive users to explore knowledge bubbles of homogeneous view,” we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, for lack of information, we cannot answer this question, but we hope it will be addressed in future work.

Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life ones doubles. This is a sign that users may be willing to explore content proposing diverse views.
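A minimal sketch of a click-driven choice model follows, assuming that the probability of following a link is proportional to the number of times past readers clicked it. The click counts, the page names, and the function `click_model` are hypothetical; the clickstream data used in the paper is the one cited as [53].

```python
from fractions import Fraction

def click_model(click_counts):
    """Transition probabilities proportional to observed click counts."""
    total = sum(click_counts.values())
    if total == 0:
        # assumption: fall back to uniform when no clicks were recorded
        return {dest: Fraction(1, len(click_counts)) for dest in click_counts}
    return {dest: Fraction(count, total) for dest, count in click_counts.items()}

# Hypothetical click counts on the out-links of one creationism page.
counts = {"same_side_a": 60, "same_side_b": 30, "other_side": 10}
same_side = {"same_side_a", "same_side_b"}

probs = click_model(counts)
same_side_clicks = sum(p for dest, p in probs.items() if dest in same_side)
same_side_uniform = Fraction(len(same_side), len(counts))
```

With these made-up counts, the same-side probability is higher under the click-driven model than under the uniform one, which is the direction of change reported for all topics.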

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia’s Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new/current links have on users’ exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to prevent this property from becoming harmful when it keeps users from accessing diverse content.


[Figure 10: line plots, one column per topic (Abortion, Cannabis, Guns, Evolution, Racism, LGBT) and three rows; x-axis: number of clicks (1 to 15). Legend: uniform, position, and clicks CwP models for α = 0, α = 0.2, and α = 1.]

Figure 10: Dynamic (Mutual) ExDIN for 1 ≤ ℓ ≤ 15. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers’ shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).

6.2 Dynamic Exposure to Diversity

In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. Depending on the combination of models employed to measure the ExDIN and M-ExDIN, we obtain different insights about the effect of the current network’s topology on users’ exposure to diverse content over a navigation session.

In Figure 10, for sessions of length up to 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. Each topic shows its own trends. For this reason, we highlight and explain the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done earlier in the paper. We start by describing how, given the current network’s topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker’s shape), they are shifted up or down. This implies that, when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model (markers’ shapes) corresponding to the highest exposure is $M_c$, followed by $M_u$ and $M_p$.

(2) If users navigate with a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because these users are only subject to the exposure of their starting page. So the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session. Then the trend inverts. In one case, it reaches or exceeds the starting exposure. In the other, it grows and then levels off below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., one of the cross-partition ExDIN curves of LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.
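The role of the restart parameter in the patterns above can be sketched with a small exact computation. The sketch assumes that, at each click, the reader makes the click from the starting page with probability α and from the current page otherwise, that links are chosen uniformly, and that the exposure at click ℓ is the probability of standing on a page of the destination partition. These are assumptions for illustration (the navigation models are defined in Sect. 4), and the graph and the function `exposure_curve` are hypothetical.

```python
def exposure_curve(links, source, target, alpha, length):
    """Average probability, over start pages in `source`, of standing on a
    `target` page after each of `length` clicks."""
    curve = [0.0] * length
    for start in source:
        current = {start: 1.0}
        for step in range(length):
            # With probability alpha the click is made from the start page,
            # otherwise from the page reached at the previous click.
            origin = {page: (1 - alpha) * p for page, p in current.items()}
            origin[start] = origin.get(start, 0.0) + alpha
            reached = {}
            for page, p in origin.items():
                out = links[page]  # the toy graph has no dead ends
                for dest in out:
                    reached[dest] = reached.get(dest, 0.0) + p / len(out)
            on_target = sum(p for page, p in reached.items() if page in target)
            curve[step] += on_target / len(source)
            current = reached
    return curve

# Hypothetical four-page graph with two partitions.
links = {"a1": ["a2", "b1"], "a2": ["a1"], "b1": ["b2"], "b2": ["b1", "a1"]}
side_a, side_b = {"a1", "a2"}, {"b1", "b2"}

star_like = exposure_curve(links, side_a, side_b, alpha=1.0, length=3)
random_walk = exposure_curve(links, side_a, side_b, alpha=0.0, length=3)
```

In this toy graph the star-like curve stays flat, while the random-walk curve moves away from its one-click value. The direction and size of that change depend entirely on the hypothetical graph, so only the qualitative contrast between the two behaviors should be read from it.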

Given this observation, we analyze the ExDIN of guns. With the current network’s topology, starting from the guns control partition, the users with the highest exposure to guns rights pages (2% probability) are those characterized by a star-like behavior. As soon as users navigate in a random fashion, this probability drops. This is because the guns rights partition has a number of incoming edges that prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in guns rights. Indeed, for users randomly navigating the graph, the exposure to the guns control partition is higher than or comparable to that of star-like behaving users. The probability of reaching guns control after many steps becomes high because of the larger number of incoming edges of the partition.

From a complementary analysis, we observe that, for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This demonstrates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.

Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other’s pages. Recall that, to do so, we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide an equal exposure across opinions. For topics like abortion and guns, the longer the users’ session is, the more the network topology prevents readers from symmetrically exploring different opinions.

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are: (1) the knowledge on the encyclopedia is complete, but articles are underlinked [49]; that is, the content and the keywords that could become anchors exist, so we need a strategy to densify the network while ensuring mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete; that is, there are no words to attach links to, so the addition of complementary content may be necessary. An in-depth investigation of these conditions may be an interesting direction for future work.

7 CONCLUSIONS

Our work provides a first analysis of how the current Wikipedia network topology assists readers in exploring opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable in the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we also study the users’ overall topic consumption. Our main findings suggest the following.

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network’s topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite ones. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially greater tendency, albeit a small one, to visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network’s topology is typically biased toward one opinion. In Section 6.2, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who make only one click.

In our study, we analyze sets of articles assigned to opinions according to editor-crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also include the content of the articles along with the link information. Secondly, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia’s network would provide further understanding of the users’ behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users’ knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Better understanding how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate them, and our hope is that this work is a step towards this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism Design Research in Online Markets” and the MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets.”

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.

[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.

[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.

[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.


[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.

[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.

[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.

[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).

[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.

[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.

[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.

[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.

[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.

[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.

[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.

[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.

[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.

[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.

[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.

[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.

[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM ’17).

[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.

[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.

[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).

[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.

[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.

[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.

[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia’s coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.

[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.

[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.

[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.

[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.

[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.

[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.

[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.

[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.

[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW ’18.

[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.

[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.

[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.

[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia’s Verifiability. In The World Wide Web Conference.

[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.

[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.

[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).

[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.

[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. International Journal on Semantic Web and Information Systems 9, 41–70.

[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).

[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.

[49] Wikipedia. [n.d.]. Manual of Style/Linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking

[50] Wikipedia. [n.d.]. Namespace. https://en.wikipedia.org/wiki/Wikipedia:Namespace

[51] Wikipedia. [n.d.]. Neutral Point of View. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view

[52] Wikipedia. [n.d.]. Redirect. https://en.wikipedia.org/wiki/Wikipedia:Redirect

[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).

[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.


follows that the lack of sufficient linkage among pages expressing diverse stances of a topic can be against the NPOV’s goals.

To the best of our knowledge, there are no studies that investigate the validity of the NPOV principles concerning users’ exposure to indirect content (i.e., the content suggested by hyperlinks). Hence, analyzing Wikipedia’s links is particularly important to understand whether broad topics, which conceptually span multiple articles, are effectively, proportionately, and fairly presented to readers, not only in terms of direct content (i.e., the article’s body).

Previous works addressed the issue of users’ polarization on social networks and showed that it is hard for users to interact with content created/shared by users of opposing views [1, 9, 10, 19, 24]. Ribeiro et al. [42] empirically showed that the YouTube recommender system contributes to radicalizing users’ pathways. Given the nature and role of Wikipedia as a primary source of knowledge acquisition, the lack of broad exposure to different views of a topic appears to be a critical obstacle to fair and balanced access to well-rounded knowledge.

This paper provides a first observational study on Wikipedia that aims to quantify how the hyperlink network topology can profoundly affect user exposure to diverse stances on polarizing topics. Gaining a comprehensive view of the connections among Wikipedia pages, and of how they shape reader exposure to information, is a difficult task for humans. Therefore, it requires introducing algorithmic methods to audit and quantify the mutual level of exposure among articles of diverse content, especially for polarizing matters. This is fundamental for the improvement of the encyclopedia and its role in promoting a self-critical society.

By studying the hyperlinks network, we first aim to discover to what extent the network’s topology pushes users to explore diverse content rather than keeping them within knowledge bubbles.¹ Secondly, we aim to gain insights that may help to design a system supporting editors in (1) contextualizing pages within the more general encyclopedia’s network and (2) adding links connecting articles of opposing/complementary views.

In summary, this paper tackles the following research questions:

RQ1: How do readers consume articles about polarizing topics? (Sect. 5)

RQ2: To what extent does the hyperlinks’ network expose readers to diverse information? (Sect. 6)

By answering them, we make the following contributions:

• We initiate a discussion that aims to shed light on the role that the hyperlink network plays in connecting articles belonging to different categories. We focus our work on analyzing this phenomenon on a set of polarizing topics, such as abortion, guns, and evolution.

• We define two metrics, the exposure to diverse information and the (mutual) exposure to diverse information, to quantify the strength of connections among sets of articles (e.g., pages about abortion-rights and anti-abortion). These metrics quantify to what extent the network topology assists readers in visiting pages of contrasting subjects and whether it does so equally for all of them (see Sections 4.1, 4.1.2, and 4.2). To this end, they embed readers’ possible behavior, relying on their behavioral patterns [43, 45], features determining the success of wikilinks [16, 31], and readers’ clickstream data [53].

• We find that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous views rather than opposing stances. Moreover, we show that readers’ interest is biased toward one side of the topic, based on the internal and external traffic on Wikipedia (see Sect. 4.1.1, 5, and 6).

¹ By knowledge bubbles we mean the sets of pages presenting one side of a contentious subject (i.e., pages about pro-life or pro-choice movements).

To our knowledge, this is the first work that analyzes Wikipedia readers’ exposure to diverse information through the link network. Before moving on, we want to emphasize that this work does not claim how the hyperlinks network should be; rather, we aim to study whether the current connections among articles hinder users in visiting complementary pages about a polarizing topic. Also, our conclusions come from a network-based analysis. More advanced investigation combining network properties and articles’ content is left for future work. The code to replicate the paper is stored in an anonymous folder.²

2 RELATED WORKS

We divide this paper's related work into four categories: Improving Wikipedia, Navigating Wikipedia, Wikipedia Categorization, and Polarization on Social Media.

Improving Wikipedia. The scientific community proposed semi-automated procedures to improve Wikipedia's quality. These works check the veracity of references [18, 41], suggest articles' structure [39], look for hoaxes [30], or recommend links [38, 54]. Although link recommendation tools enrich the editing process, they do not provide editors a measure to evaluate the relationship among articles containing diverse opinions. In this work, we define such metrics (Sect. 4.2).

Wikipedia Navigation. The literature still lacks a model that generalizes Wikipedia's users' behavior. Previous studies [25, 27, 31, 46] focused on modeling and predicting human navigation inside Wikipedia, relying on traces from navigation games, i.e., Wikispeedia [43, 48] and WikiGame [13, 29, 46].3 While such games provide valuable insights about how users exploit links to go from one concept to another, Singer et al. [45] and [15, 17] showed that users display different behavioral patterns depending on their information needs and the links' position within pages. Thus, we exploit the insights provided by Singer et al. [45] to define a general model mimicking localized and more in-depth topic exploration. We further enrich the model by characterizing users' next-link choices according to findings in [15, 17] (Sections 4.1.2 and 4.1.3).

Wikipedia Categorization. In this work, we need to collect articles expressing the distinct facets of a polarizing topic. Wikimedia provides a supervised classifier, i.e., ORES,4 that, based on features derived from the articles' text, categorizes an article into a manually designed category taxonomy5 [3]. Alternatively, one can use topic models [5, 6]. Unfortunately, none of the above approaches provide

2 https://drive.google.com/drive/folders/1CJr_YiFE2YlyAtB9yKaGe8CLwVLWx9Ta?usp=sharing

3 These games ask readers to go from one article to another using wikilinks.

4 https://ores.wikimedia.org

5 https://www.mediawiki.org/wiki/ORES/Articletopic#Taxonomy

Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia

Figure 1: From Wikipedia's graph to a topic-induced network. The image on the left shows the original Wikipedia's graph. On the right, we have the final topic-induced network. The dashed circles in $W$ identify the set of nodes that we use to build the topic-induced network $G$. The color red refers to the set of nodes $P$. We use blue to indicate $\bar{P}$, and green and yellow for $\mathcal{N}$ and $s$, respectively. To keep the image tidy, we do not specify the edges' direction.

us the requested granularity. So, we exploit the collection procedure employed by Shi et al. [44], who needed the same data to study how polarization in teams impacts articles' content about polarizing topics (see Sect. 3).

Polarization on Social Media. There is a large spectrum of works that detect [1, 10, 12, 19, 36], model, quantify, and mitigate [2, 9, 21–24, 32, 34, 35, 37] polarization on social media. We focus on the work of Garimella et al. [24], which relates most closely to our metric of exposure to diverse information (ExDIN). They introduced a graph polarization measure based on random walks, i.e., the Random Walk Controversy score (RWC). On a social graph, it quantifies to what extent opinionated users are more exposed to their own opinion than to the opposite one, thanks to a chain of retweets (represented by the random walks). While RWC is conceived for networks of users and measures the overall polarization of a graph, ExDIN works on information networks and quantifies how the network's topology impacts the users' exposure to diverse information when they navigate the graph.

Cultural bias on Wikipedia. Callahan and Herring [8] showed the presence of cultural bias in the same articles of different languages. Other studies highlighted differences between women's and men's biographies [26, 47]. These content-based analyses call for a thorough investigation of the phenomenon. To this end, we investigate the presence of bias in the hyperlink network by quantifying the diversity of pages it suggests to users browsing the network of articles.

3 DATA COLLECTION

To audit a polarizing topic on Wikipedia, we encode it by building a topic-induced network. This representation embeds both the network structure and readers' interactions with the topic.

3.1 Topic Induced Networks

In this section, we explain how to build a topic-induced network. We suggest that the reader follow the process by looking at Figure 1.

First, we consider the directed English Wikipedia's graph $W = (A, L)$. The nodes of the graph are the encyclopedia's pages classified as Articles [50]. The edges represent the links connecting pages and are known as wikilinks.6 This set of links includes those in the infoboxes.7

Among the vertices, we identify a set of pages $\mathcal{T} \subset A$ about the different polarizing sides of a given topic. We partition $\mathcal{T}$ into two sets $P$ and $\bar{P}$ (i.e., $P \cap \bar{P} = \emptyset$ and $P \cup \bar{P} = \mathcal{T}$). Each of them gathers pages related to the same side of the topic. Then, we define the set of nodes $\mathcal{N}$ that includes all vertices at one-hop distance from the vertices in $\mathcal{T}$. The reason we consider nodes representing pages outside $\mathcal{T}$ is twofold: (1) We want to include in the graph those nodes related to the topic that do not appear in $\mathcal{T}$ because they describe subjects neutral to the topic.8 (2) When we consider readers exploring the network, we want to account for the possibility that they reach pages about entities of opposing opinion by passing through articles not strictly related to the topic (see Sect. 4.1).

To reduce the complexity of our analysis, we cluster all the pages in $S = A \setminus (\mathcal{T} \cup \mathcal{N})$ into one super node $s$. Note that nodes in $S$ are only connected to vertices in $\mathcal{N}$. For each node $v \in \mathcal{N}$, we can have multiple edges going to $s$. We compress them into a unique edge $(v, s)$. Likewise, $s$ can point multiple times to the same node $v \in \mathcal{N}$, so we compress these edges into a unique edge $(s, v)$. In both cases, the weights of $(v, s)$ and $(s, v)$ are the sums of the weights of the aggregated edges.
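The compression into the super node can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the function name `collapse_outside`, the edge-dictionary layout, and the toy page names are hypothetical, and dropping links between two outside pages is an assumption, since the text does not say how they are handled.

```python
from collections import defaultdict

def collapse_outside(edges, topic, neighbors, super_node="s"):
    """Merge every page outside topic | neighbors into one super node.

    edges: dict mapping (source, target) -> weight for the full graph W.
    Parallel edges created by the merge have their weights summed.
    """
    kept = set(topic) | set(neighbors)
    merged = defaultdict(float)
    for (u, v), w in edges.items():
        u2 = u if u in kept else super_node
        v2 = v if v in kept else super_node
        if u2 == super_node and v2 == super_node:
            continue  # assumption: links among outside pages are dropped
        merged[(u2, v2)] += w
    return dict(merged)

# Toy graph: "t" is a topic page, "a" and "b" are one-hop neighbors,
# "x" and "y" lie outside and end up inside the super node.
edges = {("t", "a"): 1.0, ("a", "x"): 1.0, ("a", "y"): 2.0,
         ("x", "a"): 1.0, ("y", "b"): 1.0, ("x", "y"): 1.0}
G = collapse_outside(edges, topic={"t"}, neighbors={"a", "b"})
```

In the toy graph, the two edges from "a" to the outside pages become a single edge ("a", "s") whose weight is their sum.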

Finally, we build a directed weighted network $G = (V, E)$ that we call a topic-induced network, whose set of vertices $V$ is $\mathcal{T} \cup \mathcal{N} \cup \{s\}$, of cardinality $n + 1$, and whose edges $E$ are the links connecting the pages. The edge weights are transition probabilities, as follows. Let $M$ be an $(n + 1) \times (n + 1)$ right-stochastic transition matrix associated to $G$, that is, a matrix such that each entry $m_{ij}$ is a probability, with $m_{ij} = 0$ if $(i, j) \notin E$, and such that $\sum_{j=1}^{n+1} m_{ij} = 1$. The entry $m_{ij}$ describes the probability that, being on article $i$, a reader clicks page $j$. In Section 4.1.2, we propose different characterizations of the transition matrix.

Summarizing, to extract the topic-induced network of a given topic, we first extract data from a complete English Wikipedia database dump.9 From this dump, we build the graph $W$. To collect the corpus of articles expressing different opinions about the topic (i.e., $\mathcal{T}$), we rely on the collection strategy adopted by the authors of [44] (see Sect. 2). In particular, the subcorpus belonging to $P$ consists of all articles categorized under a Wikipedia category describing a viewpoint and its subcategories. For instance, the corpus of abortion articles consists of two subcorpora: pro-life ($P$) and pro-choice ($\bar{P}$) articles. The pro-life subcorpus consists of all articles categorized under the seed category “Anti-abortion movement” and its subcategories. For instance, the article “Fetal rights” is directly under the seed category, whereas the article “Crisis pregnancy center” is located under the subcategory “Anti-abortion organizations”. The pro-choice corpus is collected in a similar fashion, starting from the category “Abortion-rights movement”. Note that, because we want

6 We exclude links within the same page. Moreover, while building the graph, we resolve all the redirects [52]. Specifically, for any given node $r$ pointed to by $u$ and redirecting to $v$, we replace the edges $(u, r)$ and $(r, v)$ with $(u, v)$. The final effect of this operation is that we exclude all the redirecting nodes from $A$ while retaining their connections to the rest of the graph.

7 An infobox is a fixed-format table, usually added to the top right-hand corner of articles, to consistently present a summary of some unifying aspect that the articles share.

8 For instance, articles that present an overall introduction/description of the topic.

9 Unless differently specified, we refer to the dump of September 2020.

WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal

Topic      $|V \setminus \{s\}|$  $|P|$  $|\bar{P}|$  $|\mathcal{N}|$  $|E|$
Abortion   56861                  481    291          56093            1.9M
Cannabis   32743                  45     231          32470            1.1M
Guns       65743                  167    187          65393            2.5M
Evolution  84788                  342    1334         83113            1.99M
Racism     129963                 1024   1022         127953           4.8M
LGBT       150563                 459    640          149479           4.6M

Number of edges $|E_{X \to Y}|$ from set $X$ to set $Y$:

Topic      $P \to \bar{P}$  $\bar{P} \to P$  $P \to \mathcal{N}$  $\bar{P} \to \mathcal{N}$  $\mathcal{N} \to P$  $\mathcal{N} \to \bar{P}$
Abortion   205              97               21843                14492                      21396                29889
Cannabis   8                6                1089                 15055                      656                  27823
Guns       98               115              18342                12304                      56702                16608
Evolution  391              135              18289                45472                      15601                58720
Racism     746              560              64359                41566                      74354                58195
LGBT       195              143              28100                22678                      92975                81706

Topic      $Q(P, \bar{P})$    Unreach($P$) (%)  Unreach($\bar{P}$) (%)
Abortion   0.29 (0.30, 0.41)  2.1               4.81
Cannabis   0.27 (0.14, 0.03)  11.36             3.49
Guns       0.26 (0.24, 0.30)  3.63              0.00
Evolution  0.20 (0.22, 0.27)  1.69              5.62
Racism     0.32 (0.21, 0.31)  2.72              2.55
LGBT       0.34 (0.30, 0.13)  2.44              5.35

Table 1: Networks' statistics. The notation $Q(P, \bar{P}) \in [0, 1]$ indicates the modularity among the partitions. Higher $Q$ means that connections within partitions exceed those among them.

[Figure 2: boxplots of link position (y-axis “Links position”, ticks from 0.00 to 0.75) for Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio, Racism, Anti-racism, Discrimination, and Support; legend: Opposite opinion, Same opinion.]

Figure 2: Links' position distribution within pages. Given $P$ and $\bar{P}$, the orange boxplots show the distribution of links within pages in $P$ (resp. $\bar{P}$) that point to articles in $\bar{P}$ (resp. $P$). The green boxes represent links' placement among pages only belonging to $P$ (resp. $\bar{P}$). The value of the y-axis is the relative position re-scaled with the $\tanh$ to similarly score links at the top of the page. The higher the value, the higher the position in the page.

$P$ and $\bar{P}$ to be disjoint, articles belonging to both “Anti-abortion movement” and “Abortion-rights movement” are assigned to $\mathcal{N}$.10 Once we have the list of pages in $\mathcal{T}$, we proceed to build the topic-induced network as described in the first part of this section. The articles we collect cover different entities, such as organizations, people, and events. Including a heterogeneous set of pages for each viewpoint allows us to capture the different ways a user can learn about a topic.

Before moving on, we need to make two remarks. (1) Throughout the paper, when we talk about articles expressing an opinion or describing a viewpoint of a topic, we do not mean that they endorse the position of any subject they describe; rather, they objectively talk about entities that are close to one side of the issue. (2) Since subcategories are often redundant or not entirely related to the parent category, we check them manually. In this way, we avoid cases like having articles about anti-racism falling into the racism category. Moreover, we do not consider categories whose names do not include topic-specific keywords.

3.2 General Statistics on Topics' Networks

Following the procedure explained in the previous section, we collect the topic-induced networks related to six different topics that we pick from the List of controversial issues on Wikipedia11 and other resources that indicate some controversial issues in our society. These topics are abortion, cannabis, guns, evolution, LGBT, and racism. These are critical topics that often polarize as follows: pro-choice vs. pro-life, cannabis activism vs. cannabis prohibition, gun control vs. gun rights, creationism vs. evolutionary biology, support to LGBT rights vs. opposition to LGBT rights, and racism

10 We report the size of the intersections between partitions in the next section.

11 https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues

Topic      $P$             $\bar{P}$             Seed $P$                            Seed $\bar{P}$
Abortion   Pro-life        Pro-choice            Anti-abortion movement              Abortion-rights movement
Cannabis   Prohibition     Activism              Cannabis prohibition                Cannabis activism
Guns       Control         Rights                Gun control advocacy groups         Gun rights advocacy groups
Evolution  Creationism     Evolutionary biology  Creationism                         Evolutionary biology
Racism     Racism          Anti-racism           Racism                              Anti-racism
LGBT       Discrimination  Support               Discrimination against LGBT people  LGBT rights movement

Table 2: The table indicates what opinion of a topic the partitions $P$ and $\bar{P}$ correspond to.

vs. anti-racism. Information about the seed categories of each topic is in Table 2. The full category lists and sample titles are provided in the code folder (Sect. 1).

For the rest of the paper, we refer to the opinions about a topic using $P$ and $\bar{P}$. In Table 2, for each topic, we match each set to the real opinion it represents.

Before presenting the general statistics of the retrieved networks, we remark that, when we assign the articles to partitions, we put in the set $\mathcal{N}$ those assigned to both partitions. The sizes of the intersections among partitions (i.e., the number of common articles) are the following: abortion 2, cannabis 3, evolution 2, guns 1, LGBT 5, and racism 7. Because we do not remove these articles (i.e., they belong to $\mathcal{N}$), they can still act as bridges connecting $P$ and $\bar{P}$ in sessions longer than 1 click. Instead, when we consider the direct connections among partitions (1 click), we discard them, since they are not explicitly categorized into one partition.

In Table 1, we show some statistics on the six topic-induced networks. Immediately, we observe that the sizes of $P$ and $\bar{P}$ differ substantially for all the topics except for racism and guns. It means that one of the two opinions is represented by more articles. In terms of content, this does not necessarily imply that either of the two views is incomplete or insufficiently represented. Indeed, a topic may span a few articles or may require more pages to be complete. On the other hand, the unbalanced sizes can affect an opinion's exposure within the entire Wikipedia network. Practically, if a set of articles is large and well connected to the rest of the network, the chances that users who browse randomly reach it are higher than those of going to a small partition. Moreover, if readers exploit the random article functionality of Wikipedia, a more represented opinion gets more chances of being randomly sampled.

The topics showing the highest imbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about evolutionary biology than about creationism. If we consider the edges across partitions, the number of cross-partition edges is higher for bigger sets. This is reasonable because more nodes can point to the opposite side. Despite that, for evolution the edges from creationism to evolutionary biology are ∼3 times more, and for LGBT the edges from discrimination to rights are 36% more. Despite the low number of edges across cannabis partitions, we decide not to discard the topic.

Above, we said that one of the two partitions might connect better to the rest of the encyclopedia. We observe that the sizes of $P$ and $\bar{P}$ are not linear in the number of edges that point out of or to the nodes in the partitions. For instance, the number of articles about pro-choice (291) is half of the nodes related to the pro-life movement (481). Although the nodes in pro-life are twice as many as those in pro-choice, the number of links pointing to pages about pro-choice is 36% more than those pointing to pro-life articles. This happens, with different magnitude, also for guns and LGBT. We will see later that the fact that a side of a topic is better blended in the network has implications on the readers' exposure to one of the two sides of the topic (Sect. 6).

We also investigate how many pages in $P$ and $\bar{P}$ cannot be reached by users unless they enter Wikipedia directly on those pages. The sets of articles with the highest share of unreachable nodes are in the category of cannabis prohibition (11.36%), followed by evolutionary biology (5.62%) and LGBT rights (5.35%).

Furthermore, we compute the modularity $Q$ among $P$ and $\bar{P}$. Higher $Q$ means that connections within partitions exceed those among them. In Table 1, we report three values computed on differently weighted graphs, with the probabilities of clicking the links of each page assigned as follows: (1) uniform, (2) proportional to the position of the link within the page, and (3) proportional to readers' clickstream (see Sect. 4.1.2). Overall, if we consider the position of links and readers' clickstream, it seems that the partitions are more modular.
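As a rough illustration of what such a modularity score measures, the sketch below uses the standard modularity for directed weighted graphs, $Q = \frac{1}{m} \sum_{ij} \left[ A_{ij} - \frac{k_i^{out} k_j^{in}}{m} \right] \delta(c_i, c_j)$, where $m$ is the total edge weight. The paper does not state which formulation or which subgraph it uses, so the formula choice, the function name `directed_modularity`, and the toy graph are all assumptions.

```python
from collections import defaultdict

def directed_modularity(edges, community):
    """edges: {(u, v): weight}; community: {node: label} for every node."""
    m = sum(edges.values())
    out_w, in_w = defaultdict(float), defaultdict(float)
    for (u, v), w in edges.items():
        out_w[u] += w
        in_w[v] += w
    # Observed weight that stays inside a community.
    q = sum(w for (u, v), w in edges.items() if community[u] == community[v])
    # Expected inside weight under a degree-preserving null model.
    out_c, in_c = defaultdict(float), defaultdict(float)
    for node, label in community.items():
        out_c[label] += out_w[node]
        in_c[label] += in_w[node]
    q -= sum(out_c[label] * in_c[label] for label in out_c) / m
    return q / m

# Two tightly linked pairs joined by a single cross edge.
edges = {("p1", "p2"): 1.0, ("p2", "p1"): 1.0,
         ("q1", "q2"): 1.0, ("q2", "q1"): 1.0,
         ("p1", "q1"): 1.0}
community = {"p1": "P", "p2": "P", "q1": "Pbar", "q2": "Pbar"}
Q = directed_modularity(edges, community)
```

Passing click-model weights instead of unit weights would give the weighted variants that the three reported values correspond to, under the same assumptions.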

Based on that, we study how links across and within partitions are positioned in pages. First, we define the position of a link. Given a page, we have its list of links in order of appearance. We get the relative rank within the list for each link and re-scale it by the $\tanh$. In this way, we have values in $[0, 1]$, and the links at the top of the list get more similar scores. The set of links includes those in the infoboxes. We regard them as being at the top of the article, according to results in [15, 17]. If a link appears more than once, we average its position.
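The position score described above can be sketched as follows. The exact definition of the relative rank is not given in the text, so this sketch assumes it is $(n - \text{index}) / n$ for a page with $n$ links (first link 1.0, last link $1/n$), and it assumes repeated links have their relative ranks averaged before the $\tanh$ is applied. The function name `link_position_scores` is hypothetical.

```python
import math
from collections import defaultdict

def link_position_scores(links):
    """Score each target in `links`, a page's links in order of appearance.

    Assumption: relative rank = (n - index) / n, so higher means closer to
    the top; repeated targets are averaged before the tanh re-scaling.
    """
    n = len(links)
    ranks = defaultdict(list)
    for index, target in enumerate(links):
        ranks[target].append((n - index) / n)
    return {t: math.tanh(sum(r) / len(r)) for t, r in ranks.items()}

# "A" appears twice (first and third link), so its two ranks are averaged.
scores = link_position_scores(["A", "B", "A", "C"])
```

Under this assumption the largest possible score is $\tanh(1) \approx 0.76$, which is compatible with the 0.00 to 0.75 range shown on the y-axis of Figure 2.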

In Figure 2, we show the position distributions. According to the t-test, whose confidence level is fixed to $\alpha = 0.95$, the average position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from guns control to guns control is significantly higher than that of links to guns rights. For evolutionary biology, links to creationism are placed statistically significantly lower than those to evolutionary biology. The same happens for LGBT.

For the sake of completeness of the analysis, even if not used further in the paper, for each topic we study the quality of the pages populating it. In particular, we use the ORES API to get the “articlequality”. We observe that, overall, for all the topics, between 60% and 70% of the articles are classified as stubs or start. Then, 22–29% are in B-class, 0–5% are Featured Articles, and the remaining belong to the C-class.12

4 METRICS

In this section, we define the models and metrics that we use to answer the research questions formulated in Sect. 1. First, we describe how we characterize readers' consumption, either by analyzing real users' data or by simulating their behavior (see Sect. 4.1). Then, we introduce the core metrics of the paper, ExDIN and M-ExDIN (see Sect. 4.2).

4.1 Content Consumption

To understand readers' consumption of polarizing topics, we need different modeling strategies that we describe in the following subsections.

4.1.1 Metrics Based on Clickstream. We build two metrics upon the information we extract from users' clickstream data, which are made publicly available by Wikimedia and preserve users' privacy [14, 54].13

From these data, we infer $c_{ji}$, counting how many times a hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an internal Wikipedia page ($j \in A$, recalling that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from outside Wikipedia (e.g., a search engine). Thus, we define the variable $\delta_j$, which indicates whether $j$ is an external page or it belongs to the topic-induced network: $\delta_j = 1$ if $j$ is external, and 0 otherwise. Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal pages pointing to it (see Figure 3). We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be the total clicks to the page. The sum $\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks from external websites; therefore, the difference between $c_i$ and this summation is the number of visits from internal (Wikipedia) pages. Now we are ready to define the following metrics.

Reader Search Rate (RSR). Given a page $i \in V$, the empirical probability that a visit to page $i$ is from an external website is

$$RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j c_{ji}}{c_i}. \tag{1}$$

12 https://en.wikipedia.org/wiki/Template:Grading_scheme

13 Description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream. The provided information is enough to extract the clickstream-based metrics.


Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is

$$CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \tag{2}$$

where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$, and thus counted multiple times in $c_i$.)
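Both rates can be computed in one pass over clickstream records, as in the sketch below. It assumes rows shaped like the public clickstream dump, (prev, curr, type, n), where type is "link", "external", or "other"; treating only "link" rows as clicks within a page, and counting "other" rows as arrivals only, are choices of this sketch rather than something the paper specifies. The function name `rsr_and_ctr` and the toy counts are hypothetical.

```python
from collections import defaultdict

def rsr_and_ctr(rows):
    """rows: iterable of (prev, curr, link_type, n) clickstream records."""
    visits = defaultdict(int)      # c_i: all recorded arrivals at page i
    external = defaultdict(int)    # arrivals at i from outside Wikipedia
    clicks_out = defaultdict(int)  # clicks on links placed inside page i
    for prev, curr, link_type, n in rows:
        visits[curr] += n
        if link_type == "external":
            external[curr] += n
        elif link_type == "link":
            clicks_out[prev] += n
    rsr = {i: external[i] / c for i, c in visits.items()}
    ctr = {i: clicks_out[i] / c for i, c in visits.items()}
    return rsr, ctr

# Made-up counts, for illustration only.
rows = [("other-search", "Abortion", "external", 80),
        ("Fetal_rights", "Abortion", "link", 20),
        ("Abortion", "Fetal_rights", "link", 30)]
rsr, ctr = rsr_and_ctr(rows)
```

With these toy rows, "Abortion" receives 100 visits, 80 of them external, and 30 of those visits continue with a click on one of its links.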

4.1.2 Model Clicks Within Pages. When readers visit a page, they have the possibility of clicking any of the present links. However, according to the information needs they want to satisfy, each of the links may have a different probability of being clicked [45]. Now we propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j \mid i)$ as the rank of $j$ among all links in $i$, and $r(j \mid i) = |N_{out}(i)| - pos(j \mid i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce $\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to values between 0 and 1, such that links at the top of the page get similar scores.

The Clicks Within Pages models (CwP) are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:

(1) $M^u$ (Uniform), whose entry $m(i, j) = \frac{1}{|N_{out}(i)|}$ mimics readers who click each link in a page uniformly at random.

(2) $M^p$ (Position), whose entry $m(i, j) = \frac{\tanh r(j \mid i)}{\sum_{k \in N_{out}(i)} \tanh r(k \mid i)}$ captures the scenario in which readers click with higher probability links appearing first in the page. This model is based on previous work that shows how the link position is a good predictor of the success of a link [16, 31].

(3) $M^c$ (Clicks), whose entry $m(i, j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$ represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10,14 the minimum number of times the link must be clicked to be included in the dataset [53].

For the sake of completeness, we recall that $G$ includes a super node $s$. To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph15 (see Sect. 3.1).
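The three CwP modes can be sketched as functions that build one row of the transition matrix for a single page. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, each page is assumed to list a target at most once, and $pos(j \mid i)$ is counted from 0 so that the last link keeps a non-zero weight $\tanh(1)$.

```python
import math

def uniform_row(out_links):
    """M^u: every link on the page is equally likely."""
    return {j: 1 / len(out_links) for j in out_links}

def position_row(out_links):
    """M^p: weight tanh(r(j|i)) with r = |N_out(i)| - pos, pos counted from 0."""
    n = len(out_links)
    scores = {j: math.tanh(n - pos) for pos, j in enumerate(out_links)}
    total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}

def clicks_row(out_links, clicks, floor=10):
    """M^c: empirical click shares; counts below `floor` are raised to `floor`."""
    counts = {j: max(clicks.get(j, 0), floor) for j in out_links}
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()}

links = ["A", "B", "C"]                        # out-links in order of appearance
u_row = uniform_row(links)
p_row = position_row(links)
c_row = clicks_row(links, {"A": 70, "B": 20})  # "C" is absent from the click data
```

In the clicks example, the unobserved link "C" receives the floor value 10, so the row becomes 0.7, 0.2, and 0.1. The rows for the super node $s$ would need the aggregation mentioned above, which this sketch does not cover.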

4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We can do it by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that Wikipedia's network is conceived to let users move while fulfilling their own information needs. Thus, we want to understand how different users' navigation behavior can affect readers' exposure to diverse information.

To do that, it would be optimal to have access to users' session logs. Because these data are not available to the public, we define a parametric model that simulates users' navigation by embedding

14 We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting any number between 1 and 10 does not affect the results.

15 The computation of these quantities is straightforward, so we omit it from the body of the paper.

[Figure 3: diagram of a page $i$ with incoming arrows labeled “external” and “internal” and an outgoing arrow labeled “internal”.]

Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses on a page turn into a click toward another article.

different behaviors according to the chosen parameter. We emphasize that the scope of this model is not to perfectly replicate users' behavior on Wikipedia. Rather, we want to see how users simulated from a reasonable and general model are exposed to diverse information.

In other words, we want to define a stochastic process with $n + 1$ states, corresponding to the $n + 1$ pages in $V$, that approximates the probability of reaching any of the articles starting at random from $p \in P$ (or from $\bar{P}$).

We model this by considering the process $X^\ell$, $\ell = 0, 1, \dots, L$, on the set of nodes $V$ induced by transition matrix $M$, with starting state $X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = (\pi_P)_i \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users' session length (the number of clicks) is finite, we evaluate the process on a finite number of steps $L$. We have that $\Pr(X^\ell = j) = (\pi^\ell_P)_j$, where the (row) vector $\boldsymbol{\pi}^\ell_P$ is given by the following variation of the Personalized Random Walk with Restart (RWR).

Definition 1 (Navigation Model). Let $M_0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have

$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M_0, \tag{3}$$

and for $\ell \geq 1$,

$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^\ell_P \cdot M_\ell + \alpha\, (\boldsymbol{\pi}^0_P \cdot M_\ell), \tag{4}$$

where $M_\ell = \mathrm{norm}((D (M_{\ell-1})^T)^T)$ and $D = \mathrm{diag}(1 + \boldsymbol{\pi}^{\ell-1}_P)^{-1}$. The operator $\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently such that it sums to 1.

This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated in each step. It takes into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^\ell_P$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ if the walk starts uniformly at random from a node in $P$. We assume that readers within the same session do not click the same link more than once. Thus, we want that, at step $\ell + 1$, the nodes that are clicked with high probability at step $\ell$ see their probability of being reached deflated, and those with lower probability have more chances of being clicked. We achieve this by dividing the rows of $M$ by the vector of probabilities $\boldsymbol{\pi}^\ell_P + 1$, where 1 is a smoothing factor to avoid divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
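The recursion of Definition 1 can be sketched with plain Python lists, as below. This is a minimal illustration rather than the authors' code: the helper names are hypothetical, dense lists stand in for the sparse matrices a real network would need, and the damping at step $\ell$ uses $\boldsymbol{\pi}^{\ell-1}_P$ as written in the definition of $D$. Rows that sum to zero are left untouched, which is a choice of this sketch.

```python
def normalize_rows(M):
    """Make each row sum to 1; all-zero rows are copied unchanged."""
    out = []
    for row in M:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else row[:])
    return out

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def navigate(M0, pi0, alpha, L):
    """Return [pi^1, ..., pi^L] following Eqs. (3) and (4)."""
    pis = [pi0, vec_mat(pi0, M0)]          # pi^0 and pi^1
    M_prev = M0
    for step in range(1, L):
        damp = pis[step - 1]               # pi^(l-1), as in the definition of D
        # M_l: divide column j of M_(l-1) by (1 + pi^(l-1)_j), renormalize rows.
        M_cur = normalize_rows([[m / (1 + damp[j]) for j, m in enumerate(row)]
                                for row in M_prev])
        walk = vec_mat(pis[step], M_cur)   # continue from the current page
        restart = vec_mat(pi0, M_cur)      # one click from the starting page
        pis.append([(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)])
        M_prev = M_cur
    return pis[1:]

M0 = [[0.0, 0.5, 0.5],
      [0.5, 0.0, 0.5],
      [0.5, 0.5, 0.0]]
pi0 = [1.0, 0.0, 0.0]                      # the session starts on page 0
steps = navigate(M0, pi0, alpha=0.5, L=3)
```

Each returned vector is a probability distribution over pages. Setting `alpha=1.0` keeps every step one click away from the starting page, and `alpha=0.0` gives a plain walk with the deflated transition matrices.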


[Figure 4 panels: (a) Star-like ($\alpha = 1$); (b) Star-like and random navigation ($0 < \alpha < 1$); (c) Random navigation ($\alpha = 0$).]

Figure 4: Navigation model for different $\alpha$. The green nodes represent the starting navigation pages.

Overall, as we will later see in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.

Looking deeper at the model:

• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader whose navigation consists in just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles likely semantically related to each other [49].

• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open sequential articles and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and, from time to time, jump back to the starting page.

• If $\alpha = 0$ (Figure 4(c)), users sequentially click links, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.

Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume the jumping-back action to consist in clicking the back button of the browser in use until reaching the session's starting page again. The restart parameter indirectly embeds the back-button action, which, because of the absence of back-links on Wikipedia, cannot be tracked on the graph.

The behaviors replicated through the model recall those described in [43, 45].

4.2 Exposure Metrics

At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they can generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.

[Figure 5: boxplots of Log(Pageviews) (y-axis ticks from 0 to 10) for Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio, Racism, Anti-racism, Discrimination, and Support.]

Figure 5: Pageviews distribution. For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics, in order, are abortion, cannabis, guns, evolution, racism, and LGBT.

Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^{\ell}_{P}$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is

$$e^{\ell}_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^{\ell} = j) = \sum_{j \in \bar{P}} \boldsymbol{\pi}^{\ell}_{P}(j), \qquad (5)$$

and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.

We employ this metric in two ways:

(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M_u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.

(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).

Moreover, we note that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. If we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want the ExDIN with respect to each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^{\ell}_{P}$ and sum up the probabilities of the nodes within each set.
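
A minimal sketch of how the ExDIN could be computed on a toy graph follows. It assumes a plain random walk without restarts and a uniform next-click rule; the helper names (`uniform_cwp`, `reach_distribution`, `exdin`), the four-node graph, and the sets `P`, `Q`, `Z` are hypothetical and serve only to illustrate the sums in Definition 2 and its multi-set extension.

```python
def uniform_cwp(adjacency, n):
    """Row-stochastic matrix in which every link of a page is equally likely."""
    M = [[0.0] * n for _ in range(n)]
    for source, targets in adjacency.items():
        for target in targets:
            M[source][target] = 1.0 / len(targets)
    return M


def reach_distribution(P, M, steps):
    """Probability of being on each node after `steps` clicks,
    starting uniformly at random from a page in P (no restarts)."""
    n = len(M)
    pi = [1.0 / len(P) if i in P else 0.0 for i in range(n)]
    for _ in range(steps):
        pi = [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]
    return pi


def exdin(pi, target_set):
    """Sum the reach probabilities of the nodes in the target set."""
    return sum(pi[j] for j in target_set)


# Hypothetical graph with four pages: 0 and 1 form P, 2 forms Q, 3 forms Z.
adjacency = {0: [1, 2], 1: [0], 2: [3], 3: [0, 2]}
M_u = uniform_cwp(adjacency, 4)
P, Q, Z = {0, 1}, {2}, {3}

# Topological exposure: one click under the uniform model.
topological = exdin(reach_distribution(P, M_u, 1), Q | Z)

# Two clicks: exposure to each set and to their union.
pi_2 = reach_distribution(P, M_u, 2)
per_set = {"Q": exdin(pi_2, Q), "Z": exdin(pi_2, Z)}
total = exdin(pi_2, Q | Z)
```

Because `Q` and `Z` are disjoint, the per-set exposures add up to the exposure to their union.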

Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.

Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^{\ell}_{P \to \bar{P}}$ and $e^{\ell}_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is

$$\epsilon^{\ell} = \frac{\min\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}}{\max\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}} \in [0, 1]. \qquad (6)$$

WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia Cristina Menghini Aris Anagnostopoulos and Eli Upfal

Figure 6: Percentage of pageviews coming from the opposite side (x-axis: % of pageviews from the opposite partition, from 0.0 to 2.5; bars: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism). Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.

If either $e^{\ell}_{P \to \bar{P}}$ or $e^{\ell}_{\bar{P} \to P}$ is 0, then $\epsilon^{\ell} = 0$.

This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$. The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation from $P \to \bar{P}$ or from $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can think of using M-ExDIN to measure the bias in the network with respect to two sets of nodes.
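
As a small illustration of Definition 3, the ratio can be written as a helper function. The name `mutual_exposure` and the example values are hypothetical; the only convention added here is that the function returns 0 when both exposures are 0, which extends the rule stated in the definition to the degenerate case.

```python
def mutual_exposure(e_p_to_pbar, e_pbar_to_p):
    """M-ExDIN: ratio between the smaller and the larger of the two exposures."""
    largest = max(e_p_to_pbar, e_pbar_to_p)
    if largest == 0:
        return 0.0  # both exposures are 0
    return min(e_p_to_pbar, e_pbar_to_p) / largest


balanced = mutual_exposure(0.02, 0.02)     # same flow in both directions
skewed = mutual_exposure(0.02, 0.05)       # one direction is favored
one_way = mutual_exposure(0.0, 0.05)       # no flow in one direction
```

Values close to 1 indicate symmetric flows, and values close to 0 indicate that one direction dominates.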

Even though the mutual exposure to diverse information captures the balance between the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^{\ell}_{P \to \bar{P}} \approx 10 \cdot e^{\ell}_{\bar{P} \to P}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced and we proceed as follows. If $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$, and we use the new set for all computations. Because of the randomness of this procedure, we repeat the measurements multiple times.
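
The balancing step can be sketched as follows. This is an illustration under stated assumptions: `repeated_balanced_measure` and `toy_measure` are hypothetical names, the default of 100 repetitions mirrors the number reported in the caption of Figure 10, and `measure` stands for any function that computes ExDIN or M-ExDIN on the two (now equally sized) sets.

```python
import random
import statistics


def repeated_balanced_measure(P, P_bar, measure, repeats=100, seed=0):
    """Sample |P| pages from the larger set P_bar, evaluate `measure` on the
    balanced pair, and report mean and standard deviation over the repeats."""
    if len(P) > len(P_bar):
        raise ValueError("expected |P| <= |P_bar|; swap the two sets otherwise")
    rng = random.Random(seed)
    larger = sorted(P_bar)  # fixed order, so a given seed is reproducible
    values = []
    for _ in range(repeats):
        P_bar_prime = set(rng.sample(larger, len(P)))
        values.append(measure(P, P_bar_prime))
    return statistics.mean(values), statistics.pstdev(values)


def toy_measure(P, P_bar_prime):
    # Placeholder for an exposure metric: share of sampled pages with even id.
    return sum(1 for page in P_bar_prime if page % 2 == 0) / len(P_bar_prime)


mean_value, std_value = repeated_balanced_measure(
    {100, 101, 102}, set(range(30)), toy_measure
)
```

Reporting the standard deviation alongside the mean makes the variability introduced by the sampling visible.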

5 RQ1: READERS' TOPIC CONSUMPTION

Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. When we count the click views of a page, we consider the average over the number of months the page existed (based on the temporal graphs extracted by [11]). Accordingly, when computing the occurrences for the transition matrix based on the clickstream, we consider the average clicks of the link over the number of months it existed. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.
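
The averaging described above can be sketched in a few lines. The function names, the month labels, and the click counts are hypothetical; the sketch only shows how dividing by the number of months in which a link existed prevents older links from dominating the transition probabilities simply because they accumulated clicks for longer.

```python
def average_monthly_clicks(clicks_by_month):
    """Average clicks over the months in which the link (or page) existed.

    `clicks_by_month` maps a month label to the clicks recorded in that
    month and contains only the months in which the item existed.
    """
    if not clicks_by_month:
        return 0.0
    return sum(clicks_by_month.values()) / len(clicks_by_month)


def transition_row(outgoing):
    """Turn the per-link monthly averages of one page into click probabilities."""
    averages = {target: average_monthly_clicks(months)
                for target, months in outgoing.items()}
    total = sum(averages.values())
    if total == 0:
        return {target: 0.0 for target in averages}
    return {target: avg / total for target, avg in averages.items()}


# Hypothetical page "A": the link to "B" existed for four months,
# the link to "C" was added later and existed for two months only.
outgoing_from_a = {
    "B": {"2020-01": 10, "2020-02": 20, "2020-03": 30, "2020-04": 20},
    "C": {"2020-03": 10, "2020-04": 30},
}
row = transition_row(outgoing_from_a)
```

In this toy case the link to `"B"` collected twice as many clicks in total, yet both links average 20 clicks per month of existence and therefore receive the same transition probability.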

5.1 Pageviews Distribution

To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except abortion, the difference of the means of the opinions' pageviews is statistically significant for $p < 0.05$. This finding demonstrates that users tend to visit more pages expressing/supporting one of the two viewpoints. From a network's perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.

In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.

5.2 External or Internal Access to the Topic

We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more from external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages towards the two partitions.

Referring to Figure 7(c), 44.8% of the views come from internal pages. The clickstream from internal pages is broken down to show the proportion of flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that most of the traffic from external websites also goes towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is concentrated on gun rights.

The abundance of traffic towards one of the two opinions does not characterize only the guns topic. Indeed, among all the topics, 59–74% of the visits are accumulated by one partition. Moreover, readers' preferences appear consistent among external and internal accesses; that is, both point more towards the same view of the topic. For both internal and external views, the distribution of accesses toward partitions is approximately the same (i.e., the percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.

In general, the largest share of visits to the topics' articles comes from external pages. In particular, only 23.6% and 33.5% of the traffic to evolution and racism, respectively, is generated by internal Wikipedia navigation. The same quantity for the remaining topics ranges between 44% and 47%.

Figure 7: Cumulative pages' traffic. Panels: (a) Abortion, (b) Cannabis, (c) Guns, (d) Evolution, (e) Racism, (f) LGBT. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

Figure 8: Average click-through rate (x-axis: Avg(Click-Through Rate) × 100, from 0 to 30). In this plot we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.

We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the public mainstream debate.

5.3 How Much Readers Navigate Links

Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the pages' average click-through rate (Sect. 4.1.1).

We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that, over the years, have generated less internal traffic are those about evolutionary biology and LGBT rights.

Examining pages' connections, we found that those with a higher CTR have more links (the Pearson correlation coefficient is 0.52).

Table 3: Average percentage of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

Topic       $\bar{C}(P)$   $\bar{C}(P \to P)$   $\bar{C}(P \to \bar{P})$   $\bar{C}(\bar{P})$   $\bar{C}(\bar{P} \to \bar{P})$   $\bar{C}(\bar{P} \to P)$
Abortion    68.89   90.82   57.64   70.26   89.81   45.37
Cannabis    70.81   95.01   37.50   65.78   96.52   16.67
Guns        52.34   78.69   35.68   59.63   75.35   39.28
Evolution   71.15   84.49   64.47   72.69   99.00   55.13
Racism      56.36   88.41   34.32   71.87   90.63   65.44
LGBT        61.66   89.42   52.5    72.42   92.52   59.17

This suggests that articles with a higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.
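
As a rough illustration of this check, the sketch below computes a per-page click-through rate and its Pearson correlation with the number of outgoing links. It rests on assumptions: the CTR is taken here as the share of page views followed by a click to another page (the formal definition is given in Sect. 4.1.1), and the three pages and their counts are made up for the example.

```python
from math import sqrt


def click_through_rate(pageviews, clicks_to_other_pages):
    """Share of page views that were followed by a click on an internal link."""
    if pageviews == 0:
        return 0.0
    return clicks_to_other_pages / pageviews


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical pages: views, clicks towards other pages, number of links.
pages = [
    {"views": 1000, "clicks": 100, "out_links": 20},
    {"views": 500, "clicks": 100, "out_links": 60},
    {"views": 2000, "clicks": 500, "out_links": 50},
]
ctrs = [click_through_rate(p["views"], p["clicks"]) for p in pages]
out_degrees = [p["out_links"] for p in pages]
r = pearson(ctrs, out_degrees)
```

A positive `r` means that pages with more links tend to have a higher CTR in the sample; it does not by itself establish that adding links causes more clicks.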

In addition, we count the number of links clicked fewer than 10 times over the last three years (see Table 3). As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.
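
The quantities reported in Table 3 can be approximated with a sketch like the following. The function names, the two toy pages, and the rule of skipping pages that have no link towards the selected set are assumptions made for the example; only the threshold of 10 clicks comes from the text.

```python
def unclicked_percentage(link_clicks, threshold=10):
    """Percentage of links whose click count is below `threshold`.
    Returns None when the page has no links to evaluate."""
    if not link_clicks:
        return None
    low = sum(1 for clicks in link_clicks.values() if clicks < threshold)
    return 100.0 * low / len(link_clicks)


def average_unclicked(pages, keep=None, threshold=10):
    """Average the percentage over pages; `keep` optionally restricts the
    links of each page to those pointing into a given set of targets."""
    values = []
    for link_clicks in pages:
        if keep is not None:
            link_clicks = {t: c for t, c in link_clicks.items() if t in keep}
        value = unclicked_percentage(link_clicks, threshold)
        if value is not None:  # assumption: pages without such links are skipped
            values.append(value)
    return sum(values) / len(values) if values else 0.0


# Two hypothetical pages in P; targets p1, p2 are in P, q1, q2 in the other set.
pages_in_P = [
    {"p1": 3, "p2": 50, "q1": 0, "q2": 12},
    {"p1": 0, "q1": 40},
]
overall = average_unclicked(pages_in_P)
towards_same = average_unclicked(pages_in_P, keep={"p1", "p2"})
towards_other = average_unclicked(pages_in_P, keep={"q1", "q2"})
```

The three values correspond to the roles of $\bar{C}(P)$, $\bar{C}(P \to P)$, and $\bar{C}(P \to \bar{P})$ for the toy data.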

6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS

The main contribution of this paper is to examine to what extent Wikipedia's current topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information, and (2) how their exposure to plural opinions may change throughout a navigation session.

6.1 Exposure to Diversity

To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, setting it equal to 1 means that we study the exposure to diversity over one-click sessions.

The plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.

For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. On the contrary, the chances of entering a page about evolutionary biology are 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

Figure 9: ExDIN. Columns: abortion (Pro-life, Pro-choice), cannabis (Prohibition, Activism), guns (Control, Rights), evolution (Creationism, Evol. Bio), racism (Racism, Anti-racism), LGBT (Discrimination, Support). Rows: Uniform, Position, and Clicks CwP models. Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to P$, $P \to \bar{P}$. The $y$-axis indicates the source and the $x$-axis is the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.

It is worth pointing out that the current network's topology not only nudges users to read more about the same opinion, but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have higher chances of reading one article about creationism (3 times more) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network facilitates users' exploration of knowledge bubbles of homogeneous view and makes the mutual exposure to diverse information smaller than 1.
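
The ratios quoted for evolution can be retraced with a short worked computation. It assumes that the one-click values under the uniform model are read as percentages, namely 5.76% (creationism to creationism), 0.18% (creationism to evolutionary biology), 3.27% (evolutionary biology to evolutionary biology), and 0.61% (evolutionary biology to creationism); the last value is taken from Figure 9 and, like the subscript labels below, is part of this illustration rather than of the original derivation.

```latex
\begin{align*}
\frac{e^{1}_{\text{crea}\to\text{crea}}}{e^{1}_{\text{crea}\to\text{evol}}}
  &= \frac{5.76}{0.18} = 32, &
\frac{e^{1}_{\text{evol}\to\text{evol}}}{e^{1}_{\text{evol}\to\text{crea}}}
  &= \frac{3.27}{0.61} \approx 5.4, \\[4pt]
\frac{e^{1}_{\text{evol}\to\text{crea}}}{e^{1}_{\text{crea}\to\text{evol}}}
  &= \frac{0.61}{0.18} \approx 3.4, &
\epsilon^{1}
  &= \frac{\min\{0.18,\, 0.61\}}{\max\{0.18,\, 0.61\}} \approx 0.30 .
\end{align*}
```

Under these assumptions, the mutual exposure for evolution after one click is roughly 0.3, well below the value of 1 that would indicate symmetric exposure.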

The findings above report the intrinsic capability of the network to spur users towards diverse content. To combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users click links at the top of the page with higher probability, the probabilities are only slightly modified with respect to the uniform model. These modest variations are coherent with the links' placement within pages (Figure 2).

For a few topics, such as guns, the links' position plays a more significant role, worsening the users' exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, drops by 30% with respect to the probability observed using the uniform CwP model. Therefore, we conclude that, for some topics, the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located towards the beginning of a page are less likely to read about contrasting opinions.

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume that readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.

For all the topics, we observe a significant increase in the probability of visiting pages of the same opinion. A simple interpretation of this result is that real users more often click the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view," we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer this question, but we hope it will be addressed in future work.

Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life ones doubles. This is a sign that users may be willing to explore content proposing diverse views.

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new/current links have on users' exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to prevent this property from becoming harmful when it keeps users from accessing diverse content.


Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. Panels: Abortion, Cannabis, Guns, Evolution, Racism, LGBT; x-axis: Number of Clicks; legend: uniform, position, and clicks CwP models, each for $\alpha = 0$, $\alpha = 0.2$, and $\alpha = 1$. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).

6.2 Dynamic Exposure to Diversity

In this section, we suppose that users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. Depending on the combination of models employed to measure the ExDIN and M-ExDIN, we provide different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.

In Figure 10, for sessions of length 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. Each topic shows its own trends. For this reason, we highlight and explain the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done above in the paper. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that, when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model corresponding to the highest exposure is $M_c$, followed by $M_u$ and $M_p$.

(2) If users navigate mirroring a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because this kind of user is only subject to the exposure of the starting navigation page. So the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session. Then the trend inverts. In one case, it reaches or exceeds the starting exposure. In the other, it grows and stabilizes below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., the ExDIN from $P$ to $\bar{P}$ of LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.
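
The qualitative patterns above can be reproduced on a toy graph with the following sketch. It is an illustration under stated assumptions, not the model used for Figure 10: the update rule (restart to the starting distribution with probability `alpha`, then one click according to the matrix `M`), the function name `exposure_curve`, and the four-node graph are all hypothetical.

```python
def exposure_curve(pi0, M, alpha, targets, max_clicks=15):
    """ExDIN towards `targets` for sessions of 1..max_clicks clicks."""
    n = len(M)
    pi, curve = list(pi0), []
    for _ in range(max_clicks):
        # With probability alpha the reader restarts from the starting
        # distribution; otherwise they continue from the current page.
        mix = [alpha * pi0[i] + (1 - alpha) * pi[i] for i in range(n)]
        pi = [sum(mix[i] * M[i][j] for i in range(n)) for j in range(n)]
        curve.append(sum(pi[j] for j in targets))
    return curve


# Hypothetical uniform click matrix: pages 0, 1 are in P, pages 2, 3 in P_bar.
M = [
    [0.0, 0.5, 0.5, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
]
P_bar = {2, 3}
pi0 = [0.5, 0.5, 0.0, 0.0]  # start uniformly at random in P

curves = {alpha: exposure_curve(pi0, M, alpha, P_bar)
          for alpha in (0.0, 0.2, 1.0)}
```

In this sketch the curve for `alpha = 1.0` is flat, because every click is made from the starting page, while the curves for smaller values of `alpha` change with the session length as the walk spreads over the graph.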

Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the gun control partition, the users with the highest exposure to gun rights pages (2% probability) are those characterized by a star-like behavior. As soon as users navigate in a random fashion, this probability drops. This is because the gun rights partition has a number of incoming edges that prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in gun rights. Indeed, for users randomly navigating the graph, the exposure to the gun control partition is higher than or comparable to that of star-like behaving users. The probability of reaching gun control after many steps becomes high because of the higher number of incoming edges of the partition.

From a complementary analysis, we observe that, for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This demonstrates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, the figures picturing these phenomena could not be displayed.

Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. We recall that, to do so, we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide an equal exposure across opinions. If we consider topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are: (1) the knowledge on the encyclopedia is complete but articles are underlinked [49], that is, the content and the keywords that could become anchors exist, so we need a strategy to densify the network that ensures mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete, that is, there are no words to attach links to, so the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.

7 CONCLUSIONS

Our work provides a first analysis to understand how Wikipedia's current network topology helps readers explore opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable for the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we study the users' overall topic consumption. Our main findings suggest the following:

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite ones. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially higher tendency, albeit small in absolute terms, to visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.2, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that, after a few clicks, the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who only do one click.

In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also include the content of the articles in addition to the link information. Second, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of the users' behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step for finding ways to attenuate it, and our hope is that this work is a step towards this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets."

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.

[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.

[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.

[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.

Auditing Wikipedia’s Hyperlinks Network on Polarizing Topics. WWW ’21, April 19–23, 2021, Ljubljana, Slovenia

[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.

[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia?. In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia?. In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM ’17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).

[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia’s coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.

[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW ’18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia’s Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.

[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking
[50] Wikipedia. [n.d.]. Namespace. https://en.wikipedia.org/wiki/Wikipedia:Namespace
[51] Wikipedia. [n.d.]. Neutral Point of View. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
[52] Wikipedia. [n.d.]. Redirect. https://en.wikipedia.org/wiki/Wikipedia:Redirect
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.

  • Abstract
  • 1 Introduction
  • 2 Related Works
  • 3 Data Collection
    • 3.1 Topic Induced Networks
    • 3.2 General Statistics on Topics Networks
  • 4 Metrics
    • 4.1 Content Consumption
    • 4.2 Exposure Metrics
  • 5 RQ1: Readers Topic Consumption
    • 5.1 Pageviews Distribution
    • 5.2 External or Internal Access to the Topic
    • 5.3 How Much Readers Navigate Links
  • 6 RQ2: Exposure Across Topic Viewpoints
    • 6.1 Exposure to Diversity
    • 6.2 Dynamic Exposure to Diversity
  • 7 Conclusions
  • References


Figure 1: From Wikipedia’s graph to a topic-induced network. The image on the left shows the original Wikipedia graph. On the right, we have the final topic-induced network. The dashed circles in $W$ identify the sets of nodes that we use to build the topic-induced network $G$. The color red refers to the set of nodes $P$. We use blue to indicate $\bar{P}$, and green and yellow for $\mathcal{N}$ and $s$, respectively. To keep the image tidy, we do not specify the edges’ direction.

us the requested granularity. So we exploit the collection procedure employed by Shi et al. [44], who needed the same data to study how polarization in teams impacts articles’ content about polarizing topics; see Sect. 3.

Polarization on Social Media. There is a large spectrum of works on detecting [1, 10, 12, 19, 36], modeling, quantifying, and mitigating [2, 9, 21–24, 32, 34, 35, 37] polarization on social media. We focus on the work of Garimella et al. [24], which relates most closely to our metric of exposure to diverse information (ExDIN). They introduced a graph polarization measure based on random walks, i.e., the Random Walk Controversy score (RWC). On a social graph, it quantifies to what extent opinionated users are more exposed to their own opinion than to the opposite one, thanks to a chain of retweets (represented by the random walks). While RWC is conceived for networks of users and measures the overall polarization of a graph, ExDIN works on information networks and quantifies how the network’s topology impacts the users’ exposure to diverse information when they navigate the graph.

Cultural bias on Wikipedia. Callahan and Herring [8] showed the presence of cultural bias in the same articles of different languages. Other studies highlighted differences between women’s and men’s biographies [26, 47]. These content-based analyses call for a thorough investigation of the phenomenon. To this end, we decide to investigate the presence of bias in the hyperlink network by quantifying the diversity of pages it suggests to users browsing the network of articles.

3 DATA COLLECTION

To audit a polarizing topic on Wikipedia, we encode it by building a topic-induced network. This representation embeds both the network structure and readers’ interactions with the topic.

3.1 Topic Induced Networks

In this section, we explain how to build a topic-induced network. We suggest that the reader follow the process by looking at Figure 1.

First, we consider the directed English Wikipedia graph $W = (A, L)$. The nodes of the graph are the encyclopedia’s pages classified as Articles [50]. The edges represent the links connecting pages and are known as wikilinks.⁶ This set of links includes those in the infoboxes.⁷

Among the vertices, we identify a set of pages $\mathcal{T} \subset A$ about the different polarizing sides of a given topic. We partition $\mathcal{T}$ into two sets $P$ and $\bar{P}$ (i.e., $P \cap \bar{P} = \emptyset$ and $P \cup \bar{P} = \mathcal{T}$). Each of them gathers pages related to the same side of the topic. Then, we define the set of nodes $\mathcal{N}$ that includes all vertices at one-hop distance from the vertices in $\mathcal{T}$. The reason we consider nodes representing pages outside $\mathcal{T}$ is twofold: (1) We want to include in the graph those nodes related to the topic that do not appear in $\mathcal{T}$ because they describe subjects neutral to the topic.⁸ (2) When we consider readers exploring the network, we want to account for the possibility that they reach pages about entities of opposing opinion by passing through articles not strictly related to the topic (see Sect. 4.1).

To reduce the complexity of our analysis, we cluster all the pages in $S = A \setminus (\mathcal{T} \cup \mathcal{N})$ in one super node $s$. Note that nodes in $S$ are only connected to vertices in $\mathcal{N}$. For each node $v \in \mathcal{N}$, we can have multiple edges going to $s$. We compress them into a unique edge $(v, s)$. Conversely, $s$ can point multiple times to the same node $v \in \mathcal{N}$, so we compress these edges into a unique edge $(s, v)$. In both cases, the weights of $(v, s)$ and $(s, v)$ are the sum of the weights of the aggregated edges.
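
The neighborhood and compression steps can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the code used in the study: the function names, the edge-dictionary input, and the label `"s"` are hypothetical; "one-hop distance" is taken to mean adjacency in either direction; and links between two pages outside $\mathcal{T} \cup \mathcal{N}$ are dropped rather than kept as a self-loop on $s$.

```python
from collections import defaultdict


def one_hop(edges, topic):
    """Nodes adjacent to the topic set (either direction), excluding the topic itself."""
    neighbors = set()
    for u, v in edges:
        if u in topic and v not in topic:
            neighbors.add(v)
        if v in topic and u not in topic:
            neighbors.add(u)
    return neighbors


def collapse_outside(edges, topic, neighbors, super_node="s"):
    """Merge every node outside topic | neighbors into a single super node.

    edges maps (u, v) -> weight. Parallel edges created by the merge are
    summed, so (v, s) and (s, v) carry the total weight of the edges they replace.
    """
    keep = set(topic) | set(neighbors)
    merged = defaultdict(float)
    for (u, v), w in edges.items():
        u2 = u if u in keep else super_node
        v2 = v if v in keep else super_node
        if u2 == super_node and v2 == super_node:
            continue  # assumption: links among outside pages are ignored
        merged[(u2, v2)] += w
    return dict(merged)
```

For example, with one topic page `p`, one neighbor `n1`, and two outside pages `x` and `y`, the edges `n1→x` and `n1→y` become a single edge `n1→s` whose weight is their sum.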

Finally, we build a directed weighted network $G = (V, E)$ that we call the topic-induced network, whose set of vertices $V$ is $\mathcal{T} \cup \mathcal{N} \cup \{s\}$, of cardinality $n + 1$, and whose edges $E$ are the links connecting the pages. The edge weights are transition probabilities, as follows. Let $M$ be an $(n + 1) \times (n + 1)$ right-stochastic transition matrix associated to $G$, that is, a matrix such that each entry $m_{ij}$ is a probability, with $m_{ij} = 0$ if $(i, j) \notin E$, and such that $\sum_{j=1}^{n+1} m_{ij} = 1$. The entry $m_{ij}$ describes the probability that, being on article $i$, a reader clicks page $j$. In Section 4.1.2, we propose different characterizations of the transition matrix.

Summarizing, to extract the topic-induced network of a given topic, we first extract data from a complete English Wikipedia database dump.⁹ From this dump, we build the graph $W$. To collect the corpus of articles expressing different opinions about the topic (i.e., $\mathcal{T}$), we rely on the collection strategy adopted by the authors of [44] (see Sect. 2). In particular, the subcorpus belonging to $P$ consists of all articles categorized under a Wikipedia category describing a viewpoint and its subcategories. For instance, the corpus of abortion articles consists of two subcorpora: pro-life ($P$) and pro-choice ($\bar{P}$) articles. The pro-life subcorpus consists of all articles categorized under the seed category “Anti-abortion movement” and its subcategories. For instance, the article “Fetal rights” is directly under the seed category, whereas the article “Crisis pregnancy center” is located under the subcategory “Anti-abortion organizations”. The pro-choice corpus is collected in a similar fashion, starting from the category “Abortion-rights movement”. Note that, because we want

⁶ We exclude links within the same page. Moreover, while building the graph, we resolve all the redirects [52]. Specifically, for any given node $r$ pointed to by $u$ and redirecting to $v$, we replace the edges $(u, r)$ and $(r, v)$ with $(u, v)$. The final effect of this operation is that we exclude all the redirecting nodes from $A$ while retaining their connections to the rest of the graph.

⁷ An infobox is a fixed-format table usually added to the top right-hand corner of articles to consistently present a summary of some unifying aspect that the articles share.

⁸ For instance, articles that present an overall introduction/description of the topic.

⁹ Unless differently specified, we refer to the dump of September 2020.

WWW ’21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal

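| Topic | $\lvert V \setminus s \rvert$ | $\lvert P \rvert$ | $\lvert \bar{P} \rvert$ | $\lvert \mathcal{N} \rvert$ | $\lvert E \rvert$ | $\lvert E_{P \to \bar{P}} \rvert$ | $\lvert E_{\bar{P} \to P} \rvert$ | $\lvert E_{P \to \mathcal{N}} \rvert$ | $\lvert E_{\bar{P} \to \mathcal{N}} \rvert$ | $\lvert E_{\mathcal{N} \to P} \rvert$ | $\lvert E_{\mathcal{N} \to \bar{P}} \rvert$ | $Q(P, \bar{P})$ | Unreach($P$) (%) | Unreach($\bar{P}$) (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abortion | 56861 | 481 | 291 | 56093 | 1.9M | 205 | 97 | 21843 | 14492 | 21396 | 29889 | 0.29 (0.30, 0.41) | 2.1 | 4.81 |
| Cannabis | 32743 | 45 | 231 | 32470 | 1.1M | 8 | 6 | 1089 | 15055 | 656 | 27823 | 0.27 (0.14, 0.03) | 11.36 | 3.49 |
| Guns | 65743 | 167 | 187 | 65393 | 2.5M | 98 | 115 | 18342 | 12304 | 56702 | 16608 | 0.26 (0.24, 0.30) | 3.63 | 0.00 |
| Evolution | 84788 | 342 | 1334 | 83113 | 1.99M | 391 | 135 | 18289 | 45472 | 15601 | 58720 | 0.20 (0.22, 0.27) | 1.69 | 5.62 |
| Racism | 129963 | 1024 | 1022 | 127953 | 4.8M | 746 | 560 | 64359 | 41566 | 74354 | 58195 | 0.32 (0.21, 0.31) | 2.72 | 2.55 |
| LGBT | 150563 | 459 | 640 | 149479 | 4.6M | 195 | 143 | 28100 | 22678 | 92975 | 81706 | 0.34 (0.30, 0.13) | 2.44 | 5.35 |

Table 1: Networks’ statistics. The notation $Q(P, \bar{P}) \in [0, 1]$ indicates the modularity among the partitions. Higher $Q$ means that connections within partitions exceed those among them.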

[Figure 2 plot: boxplots of link position (y-axis, ticks from 0.00 to 0.75) for the partition pairs Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio., Racism/Anti-racism, and Discrimination/Support. Legend: Opposite opinion, Same opinion.]

Figure 2: Links’ position distribution within pages. Given $P$ and $\bar{P}$, the orange boxplots show the distribution of links within pages in $P$ (resp. $\bar{P}$) that point to articles in $\bar{P}$ (resp. $P$). The green boxes represent links’ placement among pages only belonging to $P$ (resp. $\bar{P}$). The value on the y-axis is the relative position re-scaled with $\tanh$ to similarly score links at the top of the page. The higher the value, the higher the position in the page.

$P$ and $\bar{P}$ to be disjoint, articles belonging to both “Anti-abortion movement” and “Abortion-rights movement” are assigned to $\mathcal{N}$.¹⁰ Once we have the list of pages in $\mathcal{T}$, we proceed to build the topic-induced network as described in the first part of this section. The articles we collect gather pages about different entities, such as organizations, people, and events. The inclusion of a heterogeneous set of pages for each viewpoint allows us to capture the different ways a user can learn about a topic.
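
The category-based collection described above can be sketched as a breadth-first walk over the category tree. This is an illustrative sketch only: the two input dictionaries (`subcats`, `members`) are hypothetical stand-ins for whatever category data is extracted from the dump, and the manual vetting of subcategories mentioned in the text is not modeled.

```python
from collections import deque


def collect_articles(seed, subcats, members):
    """Return all articles under a seed category and its subcategories.

    subcats: dict category -> list of direct subcategories (hypothetical input)
    members: dict category -> list of article titles (hypothetical input)
    """
    seen, queue, articles = {seed}, deque([seed]), set()
    while queue:
        cat = queue.popleft()
        articles.update(members.get(cat, []))
        for child in subcats.get(cat, []):
            if child not in seen:  # category graphs may contain cycles
                seen.add(child)
                queue.append(child)
    return articles


def split_partitions(side_a, side_b):
    """Make the two sides disjoint; articles found on both sides go to the neutral set."""
    shared = side_a & side_b
    return side_a - shared, side_b - shared, shared
```

Starting from the seed “Anti-abortion movement”, such a walk would pick up “Fetal rights” directly and “Crisis pregnancy center” through the subcategory “Anti-abortion organizations”.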

Before moving on, we need to make two remarks. (1) Throughout the paper, when we talk about articles expressing an opinion or describing a viewpoint of a topic, we do not mean that they endorse the position of any subject they describe; rather, they objectively talk about entities that are close to one side of the issue. (2) Since subcategories are often redundant or not entirely related to the parent category, we check them manually. In this way, we avoid cases like having articles about anti-racism falling into the racism category. Moreover, we do not consider categories whose names do not include topic-specific keywords.

3.2 General Statistics on Topics’ Networks

Following the procedure explained in the previous section, we collect the topic-induced networks related to six different topics that we pick from the List of controversial issues on Wikipedia¹¹ and other resources that indicate some controversial issues in our society. These topics are abortion, cannabis, guns, evolution, LGBT, and racism. These are critical topics that often polarize as follows: pro-choice vs. pro-life, cannabis activism vs. cannabis prohibition, gun control vs. gun rights, creationism vs. evolutionary biology, support to LGBT rights vs. opposition to LGBT rights, and racism

¹⁰ We report the size of the intersections between partitions in the next section.

¹¹ https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues

| Topic | $P$ | $\bar{P}$ | Seed $P$ | Seed $\bar{P}$ |
|---|---|---|---|---|
| Abortion | Pro-life | Pro-choice | Anti-abortion movement | Abortion-rights movement |
| Cannabis | Prohibition | Activism | Cannabis prohibition | Cannabis activism |
| Guns | Control | Rights | Gun control advocacy groups | Gun rights advocacy groups |
| Evolution | Creationism | Evolutionary biology | Creationism | Evolutionary biology |
| Racism | Racism | Anti-racism | Racism | Anti-racism |
| LGBT | Discrimination | Support | Discrimination against LGBT people | LGBT rights movement |

Table 2: The table indicates what opinion of a topic the partitions $P$ and $\bar{P}$ correspond to.

vs. anti-racism. Information about the seed categories of each topic is in Table 2. The full category lists and sample titles are provided in the code folder (see Sect. 1).

For the rest of the paper, we refer to the opinions about a topic using $P$ and $\bar{P}$. In Table 2, for each topic, we match each set to the real opinion it represents.

Before presenting the general statistics of the retrieved networks, we remark that when we assign the articles to partitions, we put in the set $\mathcal{N}$ those assigned to both partitions. The sizes of the intersections among partitions (i.e., the number of common articles) are the following: abortion is 2, cannabis is 3, evolution is 2, guns is 1, LGBT is 5, racism is 7. Because we do not remove these articles (i.e., they belong to $\mathcal{N}$), they can still act as bridges connecting $P$ and $\bar{P}$ in sessions longer than 1 click. Instead, when we consider the direct connections among partitions (1 click), we discard them, since they are not explicitly categorized into one partition.

In Table 1, we show some statistics on the six topic-induced networks. Immediately, we observe that the sizes of $P$ and $\bar{P}$ differ substantially for all the topics except for racism and guns. It means that one of the two opinions is represented by more articles. In terms of content, this does not necessarily imply that either of the two views is incomplete or insufficiently represented. Indeed, a viewpoint may span only a few articles or may require more pages to be complete. On the other hand, the unbalanced sizes can affect an opinion’s exposure within the entire Wikipedia network. Practically, if a set of articles is large and well connected to the rest of the network, the chances that users who browse randomly reach it are higher than those of reaching a small partition. Moreover, if readers exploit the random article functionality of Wikipedia, a more represented opinion gets more chances of being randomly sampled.

The topics showing the highest imbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about evolutionary biology than about creationism. If we consider the edges across partitions, the number of cross-partition edges is generally higher for bigger sets. This is reasonable because more nodes can point to the opposite side. Despite that, for evolution, the edges from creationism to evolutionary biology are ∼3 times more than those in the opposite direction, and for LGBT, the edges from discrimination to rights are 36% more. Despite the low number of edges across cannabis partitions, we decide not to discard the topic.

Above, we said that one of the two partitions might connect better to the rest of the encyclopedia. We observe that the sizes of $P$ and $\bar{P}$ are not linear in the number of edges that point out of or to the nodes in the partitions. For instance, the number of articles about pro-choice (291) is well below the number of nodes related to the pro-life movement (481). Although pro-life has about 1.65 times as many nodes as pro-choice, the number of links pointing to pages about pro-choice is 36% more than those pointing to pro-life articles. This happens, with different magnitude, also for guns and LGBT. We will see later that the fact that a side of a topic is better blended in the network has implications on the readers’ exposure to one of the two sides of the topic (Sect. 6).

We also investigate how many pages in $P$ and $\bar{P}$ cannot be reached by users unless they enter Wikipedia directly on those pages. The sets of articles with the highest share of unreachable nodes are cannabis prohibition (11.36%), followed by evolutionary biology (5.62%) and LGBT rights (5.35%).

Furthermore, we compute the modularity $Q$ among $P$ and $\bar{P}$. Higher $Q$ means that connections within partitions exceed those among them. In Table 1, we report three values computed on differently weighted graphs, with the probabilities of clicking the links of each page assigned as follows: (1) uniform, (2) proportional to the position of the link within the page, and (3) proportional to readers’ clickstream (see Sect. 4.1.2). Overall, if we consider the position of links and readers’ clickstream, it seems that the partitions are more modular.
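
To make the quantity concrete, the following Python sketch computes a directed, weighted modularity for a two-way labeling of nodes. It uses the usual directed generalization of Newman’s modularity, $Q = \sum_c \left[ \frac{W_{cc}}{m} - \frac{W_c^{out} W_c^{in}}{m^2} \right]$, which is an assumption: the exact variant and normalization behind the values in Table 1 are not specified in the text, and this standard form can take negative values. The function name and input format are hypothetical.

```python
from collections import defaultdict


def directed_modularity(edges, label):
    """Directed, weighted modularity of a labeled node set.

    edges: dict mapping (u, v) -> non-negative weight.
    label: dict mapping node -> community label (e.g. "P" or "P_bar").
    Assumption: edges touching unlabeled nodes are ignored.
    """
    sub = {e: w for e, w in edges.items() if e[0] in label and e[1] in label}
    m = sum(sub.values())
    if m == 0:
        return 0.0
    within = 0.0
    out_w = defaultdict(float)  # total out-weight per community
    in_w = defaultdict(float)   # total in-weight per community
    for (u, v), w in sub.items():
        out_w[label[u]] += w
        in_w[label[v]] += w
        if label[u] == label[v]:
            within += w
    # within-community weight expected under a degree-preserving null model
    expected = sum(out_w[c] * in_w[c] for c in set(label.values())) / m
    return (within - expected) / m
```

Passing uniform, position-based, or click-based edge weights to the same function would give three values per topic, in the spirit of the three numbers reported in the $Q$ column.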

Based on that, we study how links across and within partitions are positioned in pages. First, we define the position of a link. Given a page, we have its list of links in order of appearance. We get the relative rank within the list for each link and re-scale it by $\tanh$. In this way, we have values in $[0, 1]$, and the links at the top of the list get a more similar score. The set of links includes those in the infoboxes. We regard them as being at the top of the article, according to results in [15, 17]. If a link appears more than once, we average its position.
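
A small Python sketch of this scoring follows. The exact normalization is an assumption: here the relative rank of the link at 0-based position `pos` in a list of `n` links is taken to be `(n - pos) / n`, so the first link scores $\tanh(1) \approx 0.76$, and repeated links have their relative ranks averaged before applying $\tanh$. The function name is hypothetical.

```python
import math
from collections import defaultdict


def link_position_scores(links):
    """Score each link target of one page by its tanh-rescaled relative rank.

    links: link targets in order of appearance (infobox links first).
    Returns a dict target -> score; higher means closer to the top of the page.
    """
    n = len(links)
    ranks = defaultdict(list)
    for pos, target in enumerate(links):
        ranks[target].append((n - pos) / n)  # 1.0 for the first link, 1/n for the last
    # a target linked several times gets the average of its relative ranks
    return {t: math.tanh(sum(r) / len(r)) for t, r in ranks.items()}
```

Because $\tanh$ flattens as its argument grows, differences between links near the top of the page are compressed relative to differences between links near the bottom.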

In Figure 2, we show the position distributions. According to a t-test at confidence level 0.95 (significance level $\alpha = 0.05$), the average position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from gun control to gun control is significantly higher than that of links to gun rights. For evolutionary biology, the links to creationism are placed statistically significantly lower than those to evolutionary biology. The same happens for LGBT.

For the sake of completeness of the analysis, even if not used further in the paper, for each topic we study the quality of the pages populating it. In particular, we use the ORES API to get the “article quality”. We observe that, overall, for all the topics, between 60% and 70% of the articles are classified as stub or start. Then, 22–29% are in B-class, 0–5% are Featured Articles, and the remaining belong to the C-class.¹²

4 METRICS

In this section, we define the models and metrics that we use to answer the research questions formulated in Sect. 1. First, we describe how we characterize readers’ consumption, either by analyzing real users’ data or by simulating their behavior (see Sect. 4.1). Then, we introduce the core metrics of the paper, ExDIN and M-ExDIN; see Sect. 4.2.

4.1 Content Consumption

To understand readers’ consumption of polarizing topics, we need different modeling strategies that we describe in the following subsections.

4.1.1 Metrics Based on Clickstream. We build two metrics upon the information we extract from users’ clickstream data, which are made publicly available by Wikimedia and preserve users’ privacy [14, 54].¹³

From these data, we infer $c_{ji}$, counting how many times a hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an internal Wikipedia page ($j \in A$, recalling that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from outside Wikipedia (e.g., a search engine). Thus, we define the variable $\delta_j$, which indicates whether $j$ is an external page or belongs to the topic-induced network: $\delta_j = 1$ if $j$ is external and 0 otherwise. Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal pages pointing to it; see Figure 3. We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be the total clicks to the page. The sum $\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks from external websites; therefore, the difference between $c_i$ and this summation is the number of visits from internal (Wikipedia) pages. Now we are ready to define the following metrics.

Reader Search Rate (RSR). Given a page $i \in V$, the empirical probability that a visit to page $i$ is from an external website is

$$RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j c_{ji}}{c_i}. \tag{1}$$

¹² https://en.wikipedia.org/wiki/Template:Grading_scheme

¹³ Description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream. The provided information is enough to extract the clickstream-based metrics.


Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is

$$CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \tag{2}$$

where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$ and thus counted multiple times in $c_i$.)
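
Equations (1) and (2) can be computed in one pass over clickstream records, as in the Python sketch below. It is a minimal illustration under assumptions: each record is taken to be a `(prev, curr, kind, n)` tuple in the style of the public clickstream release, rows with `kind == "external"` are treated as the $\delta_j = 1$ case, rows with `kind == "link"` as clicks on wikilinks, and the function name is hypothetical.

```python
from collections import defaultdict


def rsr_ctr(rows):
    """Compute RSR_i (Eq. 1) and CTR_i (Eq. 2) for every visited page.

    rows: iterable of (prev, curr, kind, n) tuples, where n is the number
    of times curr was reached from prev.
    """
    clicks_in = defaultdict(int)    # c_i: all visits to page i
    external_in = defaultdict(int)  # visits to i arriving from outside Wikipedia
    clicks_out = defaultdict(int)   # link clicks leaving page i
    for prev, curr, kind, n in rows:
        clicks_in[curr] += n
        if kind == "external":
            external_in[curr] += n
        elif kind == "link":
            clicks_out[prev] += n
    rsr = {i: external_in[i] / c for i, c in clicks_in.items()}
    ctr = {i: clicks_out[i] / c for i, c in clicks_in.items()}
    return rsr, ctr
```

Pages that never appear as a destination have $c_i = 0$ and are left out of both dictionaries, since the two ratios are undefined for them.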

4.1.2 Model Clicks Within Pages. When readers visit a page, they have the possibility of clicking any of the present links. However, according to the information needs they want to satisfy, each of the links may have a different probability of being clicked [45]. Now we propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j|i)$ as the rank of $j$ among all links in $i$, and $r(j|i) = |N_{out}(i)| - pos(j|i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce $\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to values between 0 and 1, such that links at the top of the page get similar scores.

The Clicks Within Pages (CwP) models are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:

(1) $M_u$ (Uniform), whose entry $m(i, j) = \frac{1}{|N_{out}(i)|}$ mimics readers who click each link in a page uniformly at random.

(2) $M_p$ (Position), whose entry $m(i, j) = \frac{\tanh r(j \mid i)}{\sum_{k \in N_{out}(i)} \tanh r(k \mid i)}$ captures the scenario in which readers click with higher probability the links appearing first in the page. This model is based on previous work showing that the link position is a good predictor of the success of a link [16, 31].

(3) $M_c$ (Clicks), whose entry $m(i, j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$ represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10 (footnote 14), the minimum number of times the link must be clicked to be included in the dataset [53].
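The three modes can be sketched for a single page as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, each row of $M$ is stored as a dictionary, links are assumed to be listed in page order without duplicates, and $pos(j \mid i)$ is assumed to start at 0 so that $r(j \mid i) \ge 1$ for every link.

```python
import math
from typing import Dict, List


def uniform_row(links: List[str]) -> Dict[str, float]:
    """M_u: every link on the page is equally likely."""
    return {j: 1.0 / len(links) for j in links}


def position_row(links: List[str]) -> Dict[str, float]:
    """M_p: weight each link by tanh(r), with r = |N_out(i)| - pos(j|i).

    Assumption: `links` is in page order and pos starts at 0, so r >= 1.
    """
    n = len(links)
    scores = {j: math.tanh(n - pos) for pos, j in enumerate(links)}
    total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}


def clicks_row(links: List[str], clicks: Dict[str, int], floor: int = 10) -> Dict[str, float]:
    """M_c: empirical click shares, with counts below `floor` raised to `floor`."""
    counts = {j: max(clicks.get(j, 0), floor) for j in links}
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()}


links_i = ["a", "b", "c"]          # hypothetical page with three links, top to bottom
clicks_i = {"a": 70, "b": 20}      # "c" is absent: clicked fewer than 10 times

m_u = uniform_row(links_i)
m_p = position_row(links_i)
m_c = clicks_row(links_i, clicks_i)
```

Because $\tanh$ saturates quickly, links near the top of a long page receive almost identical weights in `position_row`, while only the last few links are visibly penalized.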

For the sake of completeness, we recall that $G$ includes a super node $s$. To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph (footnote 15; see Sect. 3.1).

4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We could do so by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that Wikipedia's network is conceived to let users move while fulfilling their own information needs. Thus, we want to understand how different navigation behaviors can affect readers' exposure to diverse information.

To do that, it would be ideal to have access to users' session logs. Because these data are not available to the public, we define a parametric model that simulates users' navigation by embedding different behaviors according to the chosen parameter. We emphasize that the scope of this model is not to perfectly replicate users' behavior on Wikipedia. Rather, we want to see how users simulated from a reasonable and general model are exposed to diverse information.

14 We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting any number between 1 and 10 does not affect the results.

15 The computation of these quantities is straightforward, so we omit it from the body of the paper.

Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses on a page turn into a click toward another article.

In other words, we want to define a stochastic process with $n + 1$ states, corresponding to the $n + 1$ pages in $V$, that approximates the probability of reaching any of the articles starting at random from $p \in P$ (or from $\bar{P}$).

We model this by considering the process $\{X^\ell, \ell = 0, 1, \dots, L\}$ on the set of nodes $V$ induced by transition matrix $M$, with starting state $X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = (\pi_P)_i \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users' session length (the number of clicks) is finite, we evaluate the process on a finite number of steps $L$. We have that $\Pr(X^\ell = j) = (\boldsymbol{\pi}^\ell_P)_j$, where the (row) vector $\boldsymbol{\pi}^\ell_P$ is given by the following variation of the Personalized Random Walk with Restart (RWR).

Definition 1 (Navigation Model). Let $M^0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have

$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M^0, \tag{3}$$

and, for $\ell \ge 1$,

$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^\ell_P \cdot M^\ell + \alpha\, (\boldsymbol{\pi}^0_P \cdot M^\ell), \tag{4}$$

where $M^\ell = \mathrm{norm}\big((D\, (M^{\ell-1})^T)^T\big)$ and $D = \mathrm{diag}\big((1 + \boldsymbol{\pi}^{\ell-1}_P)^{-1}\big)$. $\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently such that it sums to 1.

This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated in each step. It takes into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^\ell_P$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ if the walk starts uniformly at random from a node in $P$. We assume that readers within the same session do not click the same link more than once. Thus, we want the nodes that are clicked with high probability at step $\ell$ to see their probability of being reached deflated at step $\ell + 1$, and those with lower probability to have more chances of being clicked. We achieve this by dividing the rows of $M$ by the vector of probabilities $\boldsymbol{\pi}^\ell_P + 1$, where 1 is a smoothing factor to avoid divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
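The recursion of Definition 1 can be sketched in a few lines of pure Python. This is a hedged illustration, not the authors' code: the function names and the toy triangle graph are hypothetical, matrices are plain nested lists, all-zero rows are left untouched by `norm` (an assumption, since the definition does not discuss dangling nodes), and the deflation follows the indices of Definition 1, where $M^\ell$ is built from $\boldsymbol{\pi}^{\ell-1}_P$.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def vec_mat(v: Vector, m: Matrix) -> Vector:
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def norm(m: Matrix) -> Matrix:
    """Rescale each row to sum to 1 (all-zero rows are left untouched)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else list(row))
    return out


def navigate(m0: Matrix, pi0: Vector, alpha: float, steps: int) -> List[Vector]:
    """Return [pi^1, ..., pi^steps] following Eqs. (3) and (4)."""
    history = [vec_mat(pi0, m0)]                      # Eq. (3): pi^1 = pi^0 . M^0
    m_prev, pi_prev, pi_cur = m0, pi0, history[0]
    for _ in range(1, steps):
        # M^l = norm(M^{l-1} . D) with D = diag(1 / (1 + pi^{l-1})):
        # column j is deflated by the probability of having reached j.
        m_cur = norm([[x / (1.0 + pi_prev[j]) for j, x in enumerate(row)]
                      for row in m_prev])
        walk = vec_mat(pi_cur, m_cur)                 # keep walking from the current page
        restart = vec_mat(pi0, m_cur)                 # jump back, then click from the start
        pi_next = [(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)]  # Eq. (4)
        history.append(pi_next)
        m_prev, pi_prev, pi_cur = m_cur, pi_cur, pi_next
    return history


# Toy uniform CwP model on a triangle; the walk starts from node 0.
m0 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
pi0 = [1.0, 0.0, 0.0]
trajectory = navigate(m0, pi0, alpha=0.2, steps=3)
```

With `alpha=1.0`, every step reduces to $\boldsymbol{\pi}^0_P \cdot M^\ell$, so the walker only ever reaches direct neighbors of the starting page; with `alpha=0.0`, the restart term vanishes and the process is a plain walk over the progressively deflated matrices.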


Figure 4: Navigation model for different $\alpha$: (a) star-like ($\alpha = 1$); (b) star-like random navigation ($0 < \alpha < 1$); (c) random navigation ($\alpha = 0$). The green nodes represent the starting navigation pages.

Overall, as we will see in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (measured as the number of clicks) and next-link choices.

Looking deeper at the model:

• When $\alpha = 1$ (Figure 4(a)), the model emulates readers whose navigation consists of just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely to be semantically related to each other [49].

• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open sequential articles and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.

• If $\alpha = 0$ (Figure 4(c)), users click links sequentially, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.

Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume that the jumping-back action consists of clicking the back button of the browser in use until the session's starting page is reached again. The restart parameter indirectly embeds the back-button action, which, because of the absence of back-links on Wikipedia, cannot be tracked on the graph.

The behaviors replicated through the model recall those described in [43, 45].

4.2 Exposure Metrics

At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.

Figure 5: Pageviews distribution (y-axis: Log(Pageviews); groups: Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio, Racism, Anti-racism, Discrimination, Support). For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics, in order, are abortion, cannabis, guns, evolution, racism, and LGBT.

Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^\ell_P$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is

$$e^\ell_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^\ell = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^\ell_P)_j, \tag{5}$$

and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.

We employ this metric in two ways:

(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M_u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.

(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).

Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate the ExDIN, if we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want the ExDIN w.r.t. each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^\ell_P$ and sum up the probabilities of the nodes within each set.
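Equation (5) and its extension to several target sets amount to summing entries of $\boldsymbol{\pi}^\ell_P$. The sketch below illustrates this with hypothetical numbers; the function name `exdin`, the probability vector, and the node indices are illustrative assumptions, and the target sets are assumed to be disjoint.

```python
from typing import Iterable, List


def exdin(pi_l: List[float], target: Iterable[int]) -> float:
    """Eq. (5): total probability mass that pi^l places on the target set."""
    return sum(pi_l[j] for j in target)


pi_l = [0.10, 0.30, 0.05, 0.15, 0.40]      # hypothetical pi^l_P over five pages
sets = {"Q": [1], "Z": [2, 3], "L": [4]}   # hypothetical target sets (node indices)

# Exposure to each set separately: e_{P->Q}, e_{P->Z}, e_{P->L}.
per_set = {name: exdin(pi_l, nodes) for name, nodes in sets.items()}

# Total exposure, taking the union of Q, Z, and L as the target set.
total = exdin(pi_l, set().union(*sets.values()))
```

When the sets are disjoint, the total exposure equals the sum of the per-set exposures; with overlapping sets, only the union-based computation avoids double counting.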

Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.

Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^\ell_{P \to \bar{P}}$ and $e^\ell_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is

$$\epsilon^\ell = \frac{\min\{e^\ell_{P \to \bar{P}},\ e^\ell_{\bar{P} \to P}\}}{\max\{e^\ell_{P \to \bar{P}},\ e^\ell_{\bar{P} \to P}\}} \in [0, 1]. \tag{6}$$


Figure 6: Percentage of pageviews coming from the opposite side (x-axis: % of pageviews from the opposite partition). Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.

If either $e^\ell_{P \to \bar{P}}$ or $e^\ell_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.

This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.

The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation $P \to \bar{P}$ or $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can think of using M-ExDIN to measure the bias in the network w.r.t. two sets of nodes.
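Equation (6), together with the convention for zero exposure, can be written as a small function. The name `mutual_exdin` and the sample exposures below are hypothetical and only illustrate the three regimes discussed above.

```python
def mutual_exdin(e_p_to_pbar: float, e_pbar_to_p: float) -> float:
    """Eq. (6): min/max ratio of the two exposures, defined as 0 if either is 0."""
    if e_p_to_pbar == 0 or e_pbar_to_p == 0:
        return 0.0
    return min(e_p_to_pbar, e_pbar_to_p) / max(e_p_to_pbar, e_pbar_to_p)


balanced = mutual_exdin(0.02, 0.02)   # symmetric exposure
skewed = mutual_exdin(0.001, 0.01)    # one direction ten times likelier than the other
one_way = mutual_exdin(0.0, 0.01)     # no exposure at all in one direction
```

Because the ratio is symmetric in its arguments, the value says how unbalanced the two flows are, not which direction is favored; the two ExDIN values must be inspected to recover the direction.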

Even though the mutual exposure to diverse information captures the balance between the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^\ell_{P \to \bar{P}} \approx 10 \cdot e^\ell_{\bar{P} \to P}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced, and we proceed as follows: if $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$, and we use the new set for all computations. Because the sampling is random, we repeat the measurements multiple times.
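The size-balancing step can be sketched as follows. This is an illustrative sketch under assumptions: the helper names, the toy metric, and the node identifiers are hypothetical, sampling is assumed to be uniform and without replacement, and the default of 100 repetitions mirrors the number reported in the caption of Figure 10.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple


def balance(p: Sequence[int], p_bar: Sequence[int],
            rng: random.Random) -> Tuple[List[int], List[int]]:
    """Downsample the larger set (without replacement) to the size of the smaller one."""
    size = min(len(p), len(p_bar))
    return rng.sample(list(p), size), rng.sample(list(p_bar), size)


def repeated(metric: Callable[[List[int], List[int]], float],
             p: Sequence[int], p_bar: Sequence[int],
             runs: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of `metric` over `runs` random balanced samples."""
    rng = random.Random(seed)
    values = [metric(*balance(p, p_bar, rng)) for _ in range(runs)]
    return mean(values), stdev(values)


p = list(range(5))           # hypothetical small partition
p_bar = list(range(5, 55))   # hypothetical partition ten times larger


def toy_metric(a: List[int], b: List[int]) -> float:
    """Stand-in for ExDIN or M-ExDIN: share of sampled nodes with an even id."""
    return sum(1 for j in b if j % 2 == 0) / len(b)


avg, sd = repeated(toy_metric, p, p_bar)
```

In practice `toy_metric` would be replaced by a function that runs the navigation model on the sampled sets and returns the ExDIN or M-ExDIN; the reported standard deviation then reflects the variability introduced by the sampling alone.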

5 RQ1: READERS' TOPIC CONSUMPTION

Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that when we count the click views of a page, we consider the average over the number of months the page existed (footnote 16). Accordingly, when computing the occurrences for the transition matrix based on the clickstream, we consider the average clicks of the link over the number of months it exists. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.

16 Based on the temporal graphs extracted by [11].
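The monthly averaging described above can be sketched as below. The function name and the sample counts are hypothetical, and the sketch assumes that the dictionary holds exactly one entry for each month in which the page or link existed (with 0 for months without recorded clicks).

```python
from typing import Dict


def monthly_average(counts_by_month: Dict[str, int]) -> float:
    """Average a count over the months in which the page or link existed."""
    if not counts_by_month:
        return 0.0
    return sum(counts_by_month.values()) / len(counts_by_month)


# Hypothetical link that existed for three months of the observation window.
link_clicks = {"2019-01": 12, "2019-02": 30, "2019-03": 18}
avg_clicks = monthly_average(link_clicks)
```

Dividing by the months of existence, rather than by the full observation window, keeps a recently added link from being penalized for the months in which it could not have been clicked.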

5.1 Pageviews Distribution

To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except abortion, the difference of the means of the opinions' pageviews is statistically significant for $p < 0.05$. This finding demonstrates that users tend to visit more pages expressing or supporting one of the two viewpoints. From a network perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.

In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.

5.2 External or Internal Access to the Topic

We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more from external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages towards the two partitions.

Referring to Figure 7(c), 44.8% of views come from internal pages. The clickstream from internal pages is broken down to see the proportion of flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that, also from external websites, most of the traffic is towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content are concentrated on gun rights.

not characterize only the guns topic Indeed among all the topics

the 59ndash74 of visits is accumulated by one partition Moreover

readersrsquo preferences appear consistent among external and internal

accesses that is they both point more towards the same view of

the topic For both internal and external views the distribution

of accesses toward partitions is approximately the same (ie the

percentage of visits from external to 119875 (resp 119875 ) is the same of

from internal to 119875 (resp119875 )) The only exception is evolution whoseexternal visits to creationism is 453 lower than internal accesses

We note that partitions with higher views are not necessarily the

biggest in the topic-induced networks

In general, the largest amount of visits to the topics' articles comes from external pages. In particular, only 23.6% and 33.5% of the traffic to evolution and racism is generated by internal Wikipedia navigation. The same quantity for the remaining topics ranges between 44% and 47%.

Figure 7: Cumulative pages' traffic. Panels: (a) Abortion, (b) Cannabis, (c) Guns, (d) Evolution, (e) Racism, (f) LGBT. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

Figure 8: Average click-through rate (x-axis: Avg(Click-Through Rate) × 100). In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.

We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the public mainstream debate.

5.3 How Much Readers Navigate Links

Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the pages' average click-through rate (Sect. 4.1.1).

We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So, most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated the least internal traffic are those about evolutionary biology and LGBT rights.

Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52). This suggests that articles having higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.

Topic       $\bar{C}(P)$   $\bar{C}(P \to P)$   $\bar{C}(P \to \bar{P})$   $\bar{C}(\bar{P})$   $\bar{C}(\bar{P} \to \bar{P})$   $\bar{C}(\bar{P} \to P)$
Abortion    68.89          90.82                57.64                      70.26                89.81                            45.37
Cannabis    70.81          95.01                37.50                      65.78                96.52                            16.67
Guns        52.34          78.69                35.68                      59.63                75.35                            39.28
Evolution   71.15          84.49                64.47                      72.69                99.00                            55.13
Racism      56.36          88.41                34.32                      71.87                90.63                            65.44
LGBT        61.66          89.42                52.5                       72.42                92.52                            59.17

Table 3: Average % of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.
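For reference, the Pearson coefficient mentioned above can be computed as in the following sketch. The per-page values are invented for illustration and do not come from the paper's data; the function name is likewise hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-page values: number of outgoing links and CTR.
out_degree = [10, 20, 30, 40, 50]
ctr = [0.10, 0.18, 0.12, 0.25, 0.22]
r = pearson(out_degree, ctr)
```

A positive coefficient indicates that pages with more links tend to have a higher CTR, but it does not by itself establish that additional links cause additional clicks.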

In addition, we count the number of links clicked fewer than 10 times over the last three years; see Table 3. As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.

6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS

The main contribution of this paper is to examine to what extent the current Wikipedia topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.

6.1 Exposure to Diversity

To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, setting it equal to 1 means that we study the exposure to diversity over one-click sessions.

Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.

For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. On the contrary, the chance of entering a page about evolutionary biology is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

Figure 9: ExDIN (rows: Uniform, Position, Clicks; panels: Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio, Racism/Anti-racism, Discrimination/Support). Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to \bar{P}$, $\bar{P} \to P$. The $y$-axis indicates the source and the $x$-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.

It is worth pointing out that the current network's topology not only nudges users to read more about the same opinion, but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have a higher chance of reading an article about creationism (3 times more) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network facilitates the exploration of knowledge bubbles of homogeneous view and makes the measure of mutual exposure to diverse information smaller than 1.

The findings above report the intrinsic capability of the network to spur users towards diverse content. To combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that, if users click links at the top of the page with higher probability, the probabilities are only slightly modified w.r.t. the uniform model. These modest variations are coherent with the links' placement within pages (Figure 2).

For a few topics, such as guns, the links' position plays a more significant role, worsening the users' exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, drops by 30% w.r.t. the probability observed using the uniform CwP model. Therefore, we conclude that, for some topics, the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located towards the beginning of a page have fewer opportunities to read about contrasting opinions.

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% with the uniform model to 11.57%.

For all the topics, we observe a significant increase in the probability of visiting pages of the same opinion. Simply interpreting this result, we can say that real users click more on the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view," we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of the lack of information, we cannot answer this question, but we hope it will be addressed in future work.

Furthermore, we observe that, for some topics like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing diverse views.

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new or current links have on users' exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to prevent this property from becoming harmful when it keeps users from accessing diverse content.


Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$ (x-axis: number of clicks; panels: Abortion, Cannabis, Guns, Evolution, Racism, LGBT; curves: uniform, position, and clicks CwP models for $\alpha = 0$, $\alpha = 0.2$, and $\alpha = 1$). The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).

6.2 Dynamic Exposure to Diversity

In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. Depending on the combination of models employed to measure the ExDIN and M-ExDIN, we provide different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.

In Figure 10, for sessions of length 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can notice that each of the topics shows its own trends. For this reason, we highlight and explain the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done above in the paper. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that, when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model (markers' shapes) corresponding to higher exposure is $M_c$, followed by $M_u$ and $M_p$.

(2) If users navigate mirroring a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because these kinds of users are only subject to the exposure of their starting navigation page. So, the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who navigate the network randomly (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks the ExDIN is lower than at the beginning of the session, and then the trend inverts. In one case, the ExDIN reaches or exceeds the starting exposure; in the other, it grows and then stabilizes below the initial exposure. The better the destination partition is connected to the rest of the graph, the more easily users navigating at random are able to reach it. Sometimes, as for the ExDIN across the partitions of LGBT, the curves start to decrease after many steps. This happens when the pages within the destination partition have already been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the magnitude of the ExDIN is greater because of the possibility of jumping back to the starting point at each step.
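The interplay between the restart parameter $\alpha$ and the session length described in items (1)-(4) can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the implementation used in the paper: it assumes a uniform click model, a reader who jumps back to the starting page with probability $\alpha$ before every click, and a hypothetical toy graph (`TOY_GRAPH`, `OPPOSITE`). The curve it returns, the probability of being on a page of the opposite partition after each click, is only a proxy for the ExDIN defined in Sect. 4.2.

```python
from collections import defaultdict

# Hypothetical toy graph: page -> list of linked pages (not data from the paper).
TOY_GRAPH = {
    "a1": ["a2", "b1"],
    "a2": ["a1"],
    "b1": ["b2"],
    "b2": ["b1", "n1"],
    "n1": ["a1", "b1"],
}
OPPOSITE = {"b1", "b2"}  # assumed pages of the opposite partition


def exposure_curve(graph, start, opposite, alpha, steps):
    """Probability of being on a page of `opposite` after each click.

    Before every click the reader jumps back to `start` with probability
    `alpha` (alpha=1: star-like, alpha=0: plain random walk), then follows
    one outgoing link chosen uniformly at random.
    """
    dist = {start: 1.0}
    curve = []
    for _ in range(steps):
        # Restart: move a share alpha of the probability mass back to start.
        mixed = {page: (1.0 - alpha) * prob for page, prob in dist.items()}
        mixed[start] = mixed.get(start, 0.0) + alpha
        # Click: spread the mass of each page over its outgoing links.
        dist = defaultdict(float)
        for page, prob in mixed.items():
            links = graph.get(page, [])
            if not links:
                dist[page] += prob  # assumption: dead ends keep the reader
                continue
            for target in links:
                dist[target] += prob / len(links)
        curve.append(sum(prob for page, prob in dist.items() if page in opposite))
    return curve


if __name__ == "__main__":
    for alpha in (1.0, 0.2, 0.0):
        curve = exposure_curve(TOY_GRAPH, "a1", OPPOSITE, alpha, 6)
        print(alpha, [round(value, 3) for value in curve])
```

In this toy setting the star-like reader ($\alpha = 1$) sees a constant exposure, because every click starts from the same page, whereas the curve of the random reader ($\alpha = 0$) depends on how the opposite partition is wired into the rest of the graph.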

Given these observations, we analyze the ExDIN of guns. With the current network's topology, starting from the guns control partition, the users with the highest exposure to guns rights pages (2% probability) are those characterized by a star-like behavior. As soon as users navigate in a random fashion, this probability drops. This is because the guns rights partition has a comparatively small number of incoming edges (see Table 1), which prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in guns rights. Indeed, for users randomly navigating the graph, the exposure to the guns control partition is higher than or comparable to that of star-like behaving users. The probability of reaching guns control after many

WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal

steps becomes high because of the larger number of incoming edges of that partition.

From a complementary analysis, we observe that, for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they entered. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves, which indicates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, we could not include the figures illustrating these phenomena.

Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. To do so, we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide equal exposure across opinions. For topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic over the other. Moreover, we stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios may determine this phenomenon: (1) the knowledge on the encyclopedia is complete but articles are underlinked [49], that is, the content and the keywords that could become anchors exist, so we need a strategy to densify the network and ensure mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete, that is, there are no words to attach links to, so adding complementary content may be necessary. An in-depth investigation of these conditions is interesting future work.

7 CONCLUSIONS

Our work provides a first analysis of how the current Wikipedia network topology helps readers explore opposite stances on polarizing topics that span sets of articles. We formalize the problem by introducing two metrics, the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies how easy it is to jump across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable in the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we study users' overall consumption of the topics. Our main findings suggest the following:

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite view. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially higher tendency, albeit small in absolute terms, to visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.1, we observe that mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who make only one click.

In our study, we analyze sets of articles assigned to opinions according to editor-crafted categories [44]. Although this approach is a solid starting point for the analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that also takes into account the content of the articles along with the link information. Second, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of users' behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our lives, and filter bubbles aggravate this phenomenon. Better understanding how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate it, and we hope that this work is a step toward this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and the MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets".

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 U.S. election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.

Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia

[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia?. In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia?. In Proceedings of the 26th International Conference on World Wide Web.

[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.

[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.

[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9. 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.
[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.
[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.
[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22. (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.

  • Abstract
  • 1 Introduction
  • 2 Related Works
  • 3 Data Collection
    • 3.1 Topic Induced Networks
    • 3.2 General Statistics on Topics Networks
  • 4 Metrics
    • 4.1 Content Consumption
    • 4.2 Exposure Metrics
  • 5 RQ1: Readers Topic Consumption
    • 5.1 Pageviews Distribution
    • 5.2 External or Internal Access to the Topic
    • 5.3 How Much Readers Navigate Links
  • 6 RQ2: Exposure Across Topic Viewpoints
    • 6.1 Exposure to Diversity
    • 6.2 Dynamic Exposure to Diversity
  • 7 Conclusions
  • References


Topic      $|V^s|$  $|P|$  $|\bar{P}|$  $|\mathcal{N}|$  $|E|$   $|E_{P\to\bar{P}}|$  $|E_{\bar{P}\to P}|$  $|E_{P\to\mathcal{N}}|$  $|E_{\bar{P}\to\mathcal{N}}|$  $|E_{\mathcal{N}\to P}|$  $|E_{\mathcal{N}\to\bar{P}}|$  $Q(P,\bar{P})$     Unreach($P$) (%)  Unreach($\bar{P}$) (%)
Abortion   56861    481    291          56093            1.9M    205                  97                    21843                    14492                          21396                     29889                          0.29 (0.30, 0.41)  2.1               4.81
Cannabis   32743    45     231          32470            1.1M    8                    6                     1089                     15055                          656                       27823                          0.27 (0.14, 0.03)  11.36             3.49
Guns       65743    167    187          65393            2.5M    98                   115                   18342                    12304                          56702                     16608                          0.26 (0.24, 0.30)  3.63              0.00
Evolution  84788    342    1334         83113            1.99M   391                  135                   18289                    45472                          15601                     58720                          0.20 (0.22, 0.27)  1.69              5.62
Racism     129963   1024   1022         127953           4.8M    746                  560                   64359                    41566                          74354                     58195                          0.32 (0.21, 0.31)  2.72              2.55
LGBT       150563   459    640          149479           4.6M    195                  143                   28100                    22678                          92975                     81706                          0.34 (0.30, 0.13)  2.44              5.35

Table 1: Networks' statistics. The notation $Q(P, \bar{P}) \in [0, 1]$ indicates the modularity among the partitions. Higher $Q$ means that connections within partitions exceed those among them.

[Figure 2: boxplots of links' position (y-axis, from 0.00 to 0.75) for each pair of partitions: Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio., Racism/Anti-racism, Discrimination/Support. Legend: Opposite opinion, Same opinion.]

Figure 2: Links' position distribution within pages. Given $P$ and $\bar{P}$, the orange boxplots show the distribution of links within pages in $P$ (resp. $\bar{P}$) that point to articles in $\bar{P}$ (resp. $P$). The green boxes represent the placement of links among pages belonging only to $P$ (resp. $\bar{P}$). The value on the y-axis is the relative position re-scaled with tanh, so that links at the top of the page receive similar scores. The higher the value, the higher the position in the page.

$P$ and $\bar{P}$ to be disjoint, articles belonging to both "Anti-abortion movement" and "Abortion-rights movement" are assigned to $\mathcal{N}$ (see footnote 10). Once we have the list of pages in $\mathcal{T}$, we proceed to build the topic-induced network as described in the first part of this section. The articles we collect cover different entities, such as organizations, people, and events. Including a heterogeneous set of pages for each viewpoint allows us to capture the different ways a user can learn about a topic.

Before moving on, we need to make two remarks. (1) Throughout the paper, when we talk about articles expressing an opinion or describing a viewpoint of a topic, we do not mean that they endorse the position of any subject they describe; rather, they objectively describe entities that are close to one side of the issue. (2) Since subcategories are often redundant or not entirely related to the parent category, we check them manually. In this way, we avoid cases such as articles about anti-racism falling into the racism category. Moreover, we do not consider categories whose names do not include topic-specific keywords.

3.2 General Statistics on Topics' Networks

Following the procedure explained in the previous section, we collect the topic-induced networks related to six different topics that we pick from the List of controversial issues on Wikipedia (see footnote 11) and other resources that indicate controversial issues in our society. These topics are abortion, cannabis, guns, evolution, LGBT, and racism. These are critical topics that often polarize as follows: pro-choice vs. pro-life, cannabis activism vs. cannabis prohibition, gun control vs. gun rights, creationism vs. evolutionary biology, support to LGBT rights vs. opposition to LGBT rights, and racism

10. We report the size of the intersections between partitions in the next section.

11. https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues

Topic      $P$             $\bar{P}$             Seed $P$                            Seed $\bar{P}$
Abortion   Pro-life        Pro-choice            Anti-abortion movement              Abortion-rights movement
Cannabis   Prohibition     Activism              Cannabis prohibition                Cannabis activism
Guns       Control         Rights                Gun control advocacy groups         Gun rights advocacy groups
Evolution  Creationism     Evolutionary biology  Creationism                         Evolutionary biology
Racism     Racism          Anti-racism           Racism                              Anti-racism
LGBT       Discrimination  Support               Discrimination against LGBT people  LGBT rights movement

Table 2: The table indicates what opinion of a topic the partitions $P$ and $\bar{P}$ correspond to.

vs. anti-racism. Information about the seed categories of each topic is in Table 2. The full category lists and sample titles are provided in the code folder (Sect. 1).

For the rest of the paper, we refer to the opinions about a topic using $P$ and $\bar{P}$. In Table 2, for each topic, we match each set to the real opinion it represents.

Before presenting the general statistics of the retrieved networks, we remark that, when we assign the articles to partitions, we put into the set $\mathcal{N}$ those assigned to both partitions. The sizes of the intersections among partitions (i.e., the numbers of common articles) are the following: abortion 2, cannabis 3, evolution 2, guns 1, LGBT 5, and racism 7. Because we do not remove these articles (i.e., they belong to $\mathcal{N}$), they can still act as bridges connecting $P$ and $\bar{P}$ in sessions longer than 1 click. Instead, when we consider the direct connections among partitions (1 click), we discard them, since they are not explicitly categorized into one partition.
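The assignment rule above can be written down with plain set operations. The following is a minimal sketch, assuming the category memberships are available as Python sets; the function name `split_partitions` and the page titles in the usage note are hypothetical and not taken from the released code.

```python
def split_partitions(pages_p, pages_p_bar, all_pages):
    """Make the two opinion partitions disjoint.

    Pages categorized under both opinions are moved to the rest of the
    network (the set called N in the text), together with every page that
    belongs to neither opinion.
    """
    shared = pages_p & pages_p_bar           # articles assigned to both sides
    p = pages_p - shared                     # pages only in P
    p_bar = pages_p_bar - shared             # pages only in P-bar
    rest = set(all_pages) - p - p_bar        # N, including the shared pages
    return p, p_bar, rest, shared
```

For instance, `split_partitions({"x", "y", "both"}, {"z", "both"}, {"x", "y", "z", "both", "other"})` keeps `"both"` in the network as part of the rest, so it can still act as a bridge in longer sessions, while it is excluded from both partitions.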

In Table 1, we show some statistics on the six topic-induced networks. We immediately observe that the sizes of $P$ and $\bar{P}$ differ substantially for all the topics except racism and guns. This means that one of the two opinions is represented by more articles. In terms of content, this does not necessarily imply that one of the two views is incomplete or insufficiently represented: a topic may span a few articles or may require more pages to be complete. On the other hand, the unbalanced sizes can affect an opinion's exposure within the entire Wikipedia network. In practice, if a set of articles is large and well connected to the rest of the network, the chances that users who browse randomly reach it are higher than those of reaching a small partition. Moreover, if readers use the random article functionality of Wikipedia, a more represented opinion has more chances of being randomly sampled.

The topics showing the highest imbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about evolutionary biology than about creationism. If we consider the edges across partitions, the number of cross-partition edges is typically higher for bigger sets. This is reasonable because more nodes can point to the opposite side. Despite that, for evolution the edges from creationism to evolutionary biology are about 3 times as many as those in the opposite direction, and for LGBT the edges from discrimination to rights are 36% more. Despite the low number of edges across the cannabis partitions, we decide not to discard the topic.
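The edge statistics discussed here (and reported in Table 1) amount to counting edges by the blocks of their endpoints. A minimal sketch, assuming the network is given as an iterable of `(source, target)` pairs and that pages outside both partitions count as the rest of the network; the function name and block labels are illustrative choices:

```python
from collections import Counter


def count_edges_by_block(edges, p, p_bar):
    """Count directed edges by the block of their source and target page."""

    def block(page):
        if page in p:
            return "P"
        if page in p_bar:
            return "P_bar"
        return "N"  # every other page of the topic-induced network

    return Counter((block(source), block(target)) for source, target in edges)
```

The entry `counts[("P", "P_bar")]` then plays the role of $|E_{P \to \bar{P}}|$, `counts[("N", "P")]` that of $|E_{\mathcal{N} \to P}|$, and so on.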

Above, we said that one of the two partitions might connect better to the rest of the encyclopedia. We observe that the sizes of $P$ and $\bar{P}$ are not proportional to the number of edges that point out of or into the nodes of the partitions. For instance, the number of articles about pro-choice (291) is roughly 60% of the number related to the pro-life movement (481). Although the nodes in pro-life are about 1.65 times as many as those in pro-choice, the number of links pointing to pages about pro-choice is 36% higher than the number pointing to pro-life articles. This also happens, with different magnitudes, for guns and LGBT. We will see later that the fact that one side of a topic is better blended into the network has implications for the readers' exposure to the two sides of the topic (Sect. 6).

We also investigate how many pages in $P$ and $\bar{P}$ cannot be reached by users unless they enter Wikipedia directly on those pages. The sets of articles with the highest share of unreachable nodes are cannabis prohibition (11.36%), followed by evolutionary biology (5.62%) and LGBT rights (5.35%).

Furthermore, we compute the modularity $Q$ among $P$ and $\bar{P}$. A higher $Q$ means that connections within partitions exceed those among them. In Table 1, we report three values computed on differently weighted graphs, with the probabilities of clicking the links of each page assigned as follows: (1) uniform, (2) proportional to the position of the link within the page, and (3) proportional to readers' clickstream (see Sect. 4.1.2). Overall, if we consider the position of links and readers' clickstream, the partitions appear more modular.
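To make the notion of modularity concrete, the sketch below computes the standard modularity of a weighted directed graph for a given grouping of pages. It is an illustration under assumptions rather than the exact computation behind Table 1: it restricts the graph to edges whose endpoints both belong to one of the listed groups, and it takes the edge weights (for example, click probabilities) as given.

```python
def directed_modularity(weighted_edges, communities):
    """Modularity of a weighted directed graph for the given communities.

    weighted_edges: iterable of (source, target, weight) triples.
    communities: list of disjoint sets of nodes.
    Edges touching nodes outside the communities are ignored (assumption).
    """
    membership = {node: idx for idx, group in enumerate(communities) for node in group}
    total = 0.0
    inside = [0.0] * len(communities)   # weight of edges within each group
    out_w = [0.0] * len(communities)    # weight leaving nodes of each group
    in_w = [0.0] * len(communities)     # weight entering nodes of each group
    for source, target, weight in weighted_edges:
        if source not in membership or target not in membership:
            continue
        c_source, c_target = membership[source], membership[target]
        total += weight
        out_w[c_source] += weight
        in_w[c_target] += weight
        if c_source == c_target:
            inside[c_source] += weight
    if total == 0:
        return 0.0
    return sum(
        inside[c] / total - out_w[c] * in_w[c] / total ** 2
        for c in range(len(communities))
    )
```

On two groups that link only internally the value is high, and it drops as soon as a cross-group edge is added, which mirrors the reading of $Q$ given in the text.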

Based on that, we study how links across and within partitions are positioned in pages. First, we define the position of a link. Given a page, we have its list of links in order of appearance. We take the relative rank of each link within the list and re-scale it by tanh. In this way, we obtain values in $[0, 1]$, and the links at the top of the list get more similar scores. The set of links includes those in the infoboxes, which we regard as being at the top of the article, according to the results in [15, 17]. If a link appears more than once, we average its positions.
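The position score just described can be sketched in a few lines. The exact definition of the relative rank is not spelled out in this section, so the code assumes one plausible choice, $(n - \text{index}) / n$ for a list of $n$ links (1 for the first link, $1/n$ for the last); the function name is hypothetical.

```python
import math
from collections import defaultdict


def link_positions(links_in_order):
    """Position score of each link target on a page.

    links_in_order: link targets in order of appearance (repeats allowed).
    Returns the tanh-rescaled relative rank, averaged over repeated links.
    """
    n = len(links_in_order)
    scores = defaultdict(list)
    for index, target in enumerate(links_in_order):
        relative_rank = (n - index) / n  # assumed definition, see lead-in
        scores[target].append(math.tanh(relative_rank))
    return {target: sum(values) / len(values) for target, values in scores.items()}
```

With this choice the scores stay below $\tanh(1) \approx 0.76$, and a link that appears both at the top and further down receives the average of its scores.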

In Figure 2, we show the position distributions. According to the t-test, whose significance level is fixed to $\alpha = 0.95$, the average position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from guns control to guns control is significantly higher than that of links to guns rights. For evolutionary biology, links to creationism are placed statistically significantly lower than those to evolutionary biology. The same happens for LGBT.

For the sake of completeness, even if not used further in the paper, for each topic we study the quality of the pages populating it. In particular, we use the ORES API to get the "article quality". We observe that, for all the topics, between 60% and 70% of the articles are classified as stub or start. Then, 22-29% are in B-class, 0-5% are Featured Articles, and the remaining belong to the C-class (see footnote 12).

4 METRICS

In this section, we define the models and metrics that we use to answer the research questions formulated in Sect. 1. First, we describe how we characterize readers' consumption, either by analyzing real users' data or by simulating their behavior (see Sect. 4.1). Then, we introduce the core metrics of the paper, ExDIN and M-ExDIN (see Sect. 4.2).

4.1 Content Consumption

To understand readers' consumption of polarizing topics, we need different modeling strategies, which we describe in the following subsections.

4.1.1 Metrics Based on Clickstream. We build two metrics upon the information we extract from users' clickstream data, which are made publicly available by Wikimedia and preserve users' privacy [14, 54] (see footnote 13).

From these data, we infer $c_{ji}$, counting how many times a hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an internal Wikipedia page ($j \in A$, recalling that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from outside Wikipedia (e.g., a search engine). Thus, we define the variable $\delta_j$, which indicates whether $j$ is an external page or belongs to the topic-induced network: $\delta_j = 1$ if $j$ is external, and 0 otherwise.

Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal pages pointing to it (see Figure 3). We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be the total number of clicks to the page. The quantity $\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks from external websites; therefore, the difference between $c_i$ and this summation is the number of visits from internal (Wikipedia) pages. Now we are ready to define the following metrics.

Reader Search Rate (RSR). Given a page $i \in V$, the empirical probability that a visit to page $i$ is from an external website is

$$RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j \, c_{ji}}{c_i}. \tag{1}$$

12. https://en.wikipedia.org/wiki/Template:Grading_scheme

13. A description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream. The provided information is enough to extract the clickstream-based metrics.


Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is

\[ CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \tag{2} \]

where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$, and thus counted multiple times in $c_i$.)
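The two rates can be illustrated with a minimal sketch. It assumes the clickstream has already been parsed into hypothetical records of the form (source, target, source_is_external, count); the record layout, the function name, and the toy counts are illustrative assumptions, not the format of the published dataset.

```python
from collections import defaultdict


def rsr_and_ctr(clicks):
    """Compute RSR_i (Eq. 1) and CTR_i (Eq. 2) for every visited page.

    clicks: iterable of (source, target, source_is_external, count).
    """
    total_in = defaultdict(int)     # c_i: all visits to page i
    external_in = defaultdict(int)  # sum over j of delta_j * c_ji
    clicked_out = defaultdict(int)  # sum over N_out(i) of c_ij

    for source, target, source_is_external, count in clicks:
        total_in[target] += count
        if source_is_external:
            external_in[target] += count
        else:
            clicked_out[source] += count

    rsr = {i: external_in[i] / c for i, c in total_in.items() if c > 0}
    ctr = {i: clicked_out[i] / c for i, c in total_in.items() if c > 0}
    return rsr, ctr


# Toy records (hypothetical page names and counts).
records = [
    ("search-engine", "A", True, 60),
    ("B", "A", False, 40),
    ("A", "B", False, 25),
    ("search-engine", "B", True, 75),
]
rsr, ctr = rsr_and_ctr(records)
```

In this toy input, page A receives 100 visits, 60 of them external, and 25 of those visits turn into a click toward B.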

4.1.2 Model Clicks Within Pages. When readers visit a page, they have the possibility of clicking any of the links present. However, according to the information needs they want to satisfy, each of the links may have a different probability of being clicked [45]. We now propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j|i)$ as the rank of $j$ among all links in $i$, and $r(j|i) = |N_{out}(i)| - pos(j|i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce $\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to values between 0 and 1 such that links at the top of the page get similar scores.

The Clicks Within Pages (CwP) models are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:

(1) $M_u$ (Uniform), whose entry $m(i,j) = \frac{1}{|N_{out}(i)|}$, mimics readers who click each link in a page uniformly at random.

(2) $M_p$ (Position), whose entry $m(i,j) = \frac{\tanh r(j|i)}{\sum_{k \in N_{out}(i)} \tanh r(k|i)}$, captures the scenario in which readers click with higher probability the links appearing first in the page. This model is based on previous work that shows how the link position is a good predictor of the success of a link [16, 31].

(3) $M_c$ (Clicks), whose entry $m(i,j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$, represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10,¹⁴ the minimum number of times the link must be clicked to be included in the dataset [53].
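The three modes above can be sketched for a single page as follows. This is an illustrative sketch, not the authors’ code: the function name `cwp_row`, the convention that `pos` is counted from 0 at the top of the page (so that every link keeps a positive weight), and the toy click counts are all assumptions.

```python
import math


def cwp_row(out_links, mode, clicks=None, min_clicks=10):
    """Transition probabilities from one page to each of its out-links.

    out_links: non-empty list of targets, in order of appearance on the page.
    mode: "uniform", "position", or "clicks".
    clicks: dict target -> observed clicks (used only by mode "clicks").
    """
    n = len(out_links)
    if mode == "uniform":
        weights = [1.0] * n
    elif mode == "position":
        # r(j|i) = |N_out(i)| - pos(j|i), with pos counted from 0 (assumption).
        weights = [math.tanh(n - pos) for pos in range(n)]
    elif mode == "clicks":
        clicks = clicks or {}
        # Counts below the dataset threshold are replaced by the threshold.
        weights = [float(max(clicks.get(j, 0), min_clicks)) for j in out_links]
    else:
        raise ValueError(f"unknown mode: {mode}")
    total = sum(weights)
    return {j: w / total for j, w in zip(out_links, weights)}


links = ["X", "Y", "Z"]  # hypothetical link targets, top of page first
uniform = cwp_row(links, "uniform")
position = cwp_row(links, "position")
by_clicks = cwp_row(links, "clicks", clicks={"X": 70, "Y": 20})
```

Because tanh saturates quickly, the position mode gives nearly identical weights to all links except the last few on the page, which matches the stated intent that links at the top get similar scores.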

For the sake of completeness, we recall that $G$ includes a super node $s$ (see Sect. 3.1). To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph.¹⁵

4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We can do it by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that Wikipedia’s network is conceived to let users move around to fulfill their own information needs. Thus, we want to understand how different navigation behaviors can affect readers’ exposure to diverse information.

To do that, it would be optimal to have access to users’ session logs. Because these data are not available to the public, we define a parametric model that simulates users’ navigation by embedding different behaviors according to the chosen parameter. We emphasize that the scope of this model is not to perfectly replicate users’ behavior on Wikipedia. Rather, we want to see how users simulated from a reasonable and general model are exposed to diverse information.

¹⁴ We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting any number between 1 and 10 does not affect the results.

¹⁵ The computation of these quantities is straightforward, so we omit it from the body of the paper.

[Figure 3: diagram of a node i with incoming views from external and internal pages and outgoing clicks to internal pages.]
Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses on a page turn into a click toward another article.

In other words, we want to define a stochastic process with $n + 1$ states, corresponding to the $n + 1$ pages in $V$, that approximates the probability of reaching any of the articles starting at random from $p \in P$ (or from $\bar{P}$).

We model this by considering the process $\{X^\ell, \ell = 0, 1, \ldots, L\}$ on the set of nodes $V$ induced by transition matrix $M$, with starting state $X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = ((\pi_P)_i) \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users’ session length (the number of clicks) is finite, we evaluate the process on a finite number of steps $L$. We have that $\Pr(X^\ell = j) = (\boldsymbol{\pi}^\ell_P)_j$, where the (row) vector $\boldsymbol{\pi}^\ell_P$ is given by the following variation of the Personalized Random Walk with Restart (RWR).

Definition 1 (Navigation Model). Let $M^0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have

\[ \boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M^0 \tag{3} \]

and, for $\ell \ge 1$,

\[ \boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^{\ell}_P \cdot M^{\ell} + \alpha\, (\boldsymbol{\pi}^0_P \cdot M^{\ell}), \tag{4} \]

where $M^{\ell} = \mathrm{norm}((D\, (M^{\ell-1})^T)^T)$ and $D = \mathrm{diag}(\mathbf{1} + \boldsymbol{\pi}^{\ell-1}_P)^{-1}$. Here $\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently such that it sums to 1.

This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated in each step. It takes into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^\ell_P$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ if the reader starts uniformly at random from a node in $P$. We assume that readers within the same session do not click the same link more than once. Thus, we desire that, at step $\ell + 1$, the nodes that are clicked with high probability at step $\ell$ see their probability of being reached deflated, and those with lower probability have more chances of being clicked. We achieve this by dividing the rows of $M$ element-wise by the vector of probabilities $\boldsymbol{\pi}^\ell_P + \mathbf{1}$ (so that the entries pointing to node $j$ are divided by one plus the probability of $j$), where 1 is a smoothing factor to avoid divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
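The recursion of Definition 1 can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors’ implementation: matrices are small dense lists of lists, the helper names (`normalize_rows`, `vec_mat`, `navigate`) are hypothetical, and the three-page graph is a toy input.

```python
def normalize_rows(matrix):
    """Make every non-zero row sum to 1 (right-stochastic matrix)."""
    out = []
    for row in matrix:
        s = sum(row)
        out.append([v / s for v in row] if s > 0 else list(row))
    return out


def vec_mat(vec, matrix):
    """Row vector times matrix."""
    cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
            for j in range(cols)]


def navigate(m0, pi0, alpha, steps):
    """Return [pi^1, ..., pi^steps] following Eqs. (3) and (4)."""
    pis = [vec_mat(pi0, m0)]  # Eq. (3)
    prev_pi, m_prev = pi0, m0
    for _ in range(1, steps):
        # M^l: divide column j of M^(l-1) by 1 + pi^(l-1)_j, renormalize rows.
        m_cur = normalize_rows(
            [[v / (1.0 + prev_pi[j]) for j, v in enumerate(row)]
             for row in m_prev]
        )
        walk = vec_mat(pis[-1], m_cur)  # keep clicking from the current page
        restart = vec_mat(pi0, m_cur)   # jump back to the start, then click
        nxt = [(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)]
        prev_pi, m_prev = pis[-1], m_cur
        pis.append(nxt)
    return pis


# Toy 3-page graph with uniform clicks; the reader starts on page 0.
m0 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
pi0 = [1.0, 0.0, 0.0]
pis = navigate(m0, pi0, alpha=0.2, steps=5)
star = navigate(m0, pi0, alpha=1.0, steps=4)
```

With `alpha=1.0` every step is taken from the starting page, so in this toy graph the starting page itself is never reached again.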


[Figure 4 panels: (a) Star-like ($\alpha = 1$); (b) Star-like and Random Navigation ($0 < \alpha < 1$); (c) Random Navigation ($\alpha = 0$).]
Figure 4: Navigation model for different $\alpha$. The green nodes represent the starting navigation pages.

Overall, as we will see later in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.

Looking deeper at the model:

• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader whose navigation consists of just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely semantically related to each other [49].

• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open sequential articles and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.

• If $\alpha = 0$ (Figure 4(c)), the user sequentially clicks links, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.
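As a quick check of the two extreme regimes listed above, substituting the boundary values of the restart parameter into Eq. (4) gives the following special cases. This is a direct algebraic consequence of Definition 1, written in its notation.

```latex
\begin{align*}
\alpha = 1:&\quad \boldsymbol{\pi}^{\ell+1}_P = \boldsymbol{\pi}^{0}_P \cdot M^{\ell}
  && \text{(every click is taken from the starting distribution)}\\
\alpha = 0:&\quad \boldsymbol{\pi}^{\ell+1}_P = \boldsymbol{\pi}^{\ell}_P \cdot M^{\ell}
  && \text{(a plain random walk on the updated matrix)}
\end{align*}
```

For intermediate values, the next-step distribution is the convex combination of these two cases with weights $\alpha$ and $1 - \alpha$.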

Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume that the jumping-back action consists of clicking the back button of the browser in use until the session’s starting page is reached again. The restart parameter indirectly embeds the back-button action, which, because of the absence of back-links on Wikipedia, cannot be tracked on the graph.

The behaviors replicated through the model recall those described in [43, 45].

4.2 Exposure Metrics

At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they can generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.

[Figure 5: box plots of Log(Pageviews) for Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio., Racism, Anti-racism, Discrimination, and Support.]
Figure 5: Pageviews distribution. For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics, in order, are abortion, cannabis, guns, evolution, racism, and LGBT.

Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P$, $\bar{P}$ in $V$, let $\boldsymbol{\pi}^\ell_P$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is

\[ e^\ell_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^\ell = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^\ell_P)_j \tag{5} \]

and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.

We employ this metric in two ways:

(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M_u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.

(2) (Readers’ exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).

Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate the ExDIN, if we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want to have the ExDIN w.r.t. each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^\ell_P$ and sum up the probabilities of the nodes within each set.
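Eq. (5) and its multi-set extension amount to summing entries of the step-$\ell$ distribution over a target set. The following sketch illustrates this; the function name `exdin`, the page labels, and the probability values are hypothetical.

```python
def exdin(pi, index_of, targets):
    """Sum the step-l reach probabilities over a target set (Eq. 5)."""
    return sum(pi[index_of[page]] for page in targets)


# Hypothetical 5-page network and a step-l distribution started from P.
index_of = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
pi_l = [0.40, 0.25, 0.20, 0.10, 0.05]

Q, Z = {"c"}, {"d", "e"}
e_to_q = exdin(pi_l, index_of, Q)        # exposure to Q alone
e_to_z = exdin(pi_l, index_of, Z)        # exposure to Z alone
e_total = exdin(pi_l, index_of, Q | Z)   # exposure to the union
```

When the target sets are disjoint, as here, the exposure to their union equals the sum of the per-set exposures.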

Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.

Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^\ell_{P \to \bar{P}}$ and $e^\ell_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is

\[ \epsilon^\ell = \frac{\min\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}}{\max\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}} \in [0, 1]. \tag{6} \]

If either $e^\ell_{P \to \bar{P}}$ or $e^\ell_{\bar{P} \to P}$ is 0, then $\epsilon^\ell = 0$.

[Figure 6: horizontal bar chart; horizontal axis: % of pageviews from the opposite partition (0.0 to 2.5); one bar per partition.]
Figure 6: Percentage of pageviews coming from the opposite side. Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.

This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.

The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation from $P \to \bar{P}$ or from $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can think of using M-ExDIN to measure the bias in the network w.r.t. two sets of nodes.
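The ratio in Eq. (6), including the convention for a zero exposure, can be written as a small helper. This is an illustrative sketch; the function name and the sample values are hypothetical.

```python
def mutual_exdin(e_forward, e_backward):
    """Eq. (6): min/max ratio of the two exposures; 0 if either one is 0."""
    if e_forward == 0 or e_backward == 0:
        return 0.0
    return min(e_forward, e_backward) / max(e_forward, e_backward)


balanced = mutual_exdin(0.2, 0.2)   # same exposure in both directions
skewed = mutual_exdin(0.1, 0.4)     # one direction is four times likelier
blocked = mutual_exdin(0.0, 0.3)    # no path in one direction
```

Note that the ratio is symmetric in its arguments, so it reports how unbalanced the two flows are but not which direction is favored; the two ExDIN values must be inspected for that.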

Even though the mutual exposure to diverse information captures the balance among the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $P$ is 10 times larger than $\bar{P}$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^\ell_{\bar{P} \to P} \approx 10 \cdot e^\ell_{P \to \bar{P}}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced, and we proceed as follows. If $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$. Thus, we use the new set for all computations. Because of the randomness of the phenomenon, we repeat the measurements multiple times.
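The size-balancing procedure can be sketched as repeated subsampling of the larger set. The sketch below makes several assumptions that are not in the text: sampling is uniform without replacement, the results are summarized by mean and standard deviation, and `metric` is a placeholder for any function of the two (balanced) sets, such as an ExDIN or M-ExDIN computation.

```python
import random
import statistics


def balanced_runs(p, p_bar, metric, runs=100, seed=0):
    """Evaluate `metric` on size-matched samples; return (mean, stdev)."""
    rng = random.Random(seed)
    size = min(len(p), len(p_bar))
    values = []
    for _ in range(runs):
        # Sampling `size` items from the smaller set returns all of it.
        p_s = set(rng.sample(sorted(p), size))
        p_bar_s = set(rng.sample(sorted(p_bar), size))
        values.append(metric(p_s, p_bar_s))
    return statistics.mean(values), statistics.stdev(values)


# Toy sets: P has 5 pages, P-bar has 50 (hypothetical page ids).
p = set(range(5))
p_bar = set(range(100, 150))
size_ratio = lambda a, b: len(b) / len(a)  # stand-in for a real metric
mean_ratio, std_ratio = balanced_runs(p, p_bar, size_ratio)
```

The stand-in metric only confirms that both sets have the same size after sampling; in practice it would be replaced by the exposure computation.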

5 RQ1: READERS’ TOPIC CONSUMPTION

Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that, when we count the click views of a page, we consider the average over the number of months the page existed.¹⁶ Accordingly, when computing the occurrences for the transition matrix based on clickstream, we consider the average clicks of the link over the number of months it exists. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.

¹⁶ Based on the temporal graphs extracted by [11].
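The averaging over the months in which a page or link existed can be sketched as follows. The representation is an assumption made for illustration: one entry per month, with `None` marking months in which the page or link did not exist and 0 marking months in which it existed but was not clicked.

```python
def average_over_lifetime(monthly_counts):
    """Average the counts over the months in which the item existed."""
    present = [c for c in monthly_counts if c is not None]
    return sum(present) / len(present) if present else 0.0


# A link created in the third month of a 5-month window (toy counts).
avg = average_over_lifetime([None, None, 120, 80, 100])
```

Dividing by the full window length instead would penalize recently created links, which is what the lifetime average avoids.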

5.1 Pageviews Distribution

To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except for abortion, the difference of the means of the opinions’ pageviews is statistically significant for $p < 0.05$. This finding demonstrates that users tend to visit more pages expressing or supporting one of the two viewpoints. From a network perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.

In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.

5.2 External or Internal Access to the Topic

We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more by external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages towards the two partitions.

Referring to Figure 7(c), 44.8% of views come from internal pages. The clickstream from internal pages is broken down to see the proportion of flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that, also from external websites, most of the traffic is towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is concentrated on gun rights.

The abundance of traffic towards one of the two opinions does not characterize only the guns topic. Indeed, among all the topics, 59–74% of visits is accumulated by one partition. Moreover, readers’ preferences appear consistent among external and internal accesses; that is, they both point more towards the same view of the topic. For both internal and external views, the distribution of accesses toward partitions is approximately the same (i.e., the percentage of visits from external pages to $P$ (resp. $\bar{P}$) is the same as that from internal pages to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.

In general, the largest share of visits to the topics’ articles comes from external pages. In particular, only 23.6% and 33.5% of the traffic to evolution and racism, respectively, is generated by internal Wikipedia navigation. The same quantity for the remaining topics ranges between 44% and 47%.

[Figure 7: flow diagrams from External and Internal sources to the two partitions; panels (a) Abortion, (b) Cannabis, (c) Guns, (d) Evolution, (e) Racism, (f) LGBT.]
Figure 7: Cumulative pages’ traffic. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

[Figure 8: horizontal bar chart; horizontal axis: Avg(Click Through Rate) × 100 (0 to 30); one bar per partition.]
Figure 8: Average click-through rate. In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.

We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the public mainstream debate.

5.3 How Much Readers Navigate Links

Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the pages’ average click-through rate (Sect. 4.1.1).

We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia’s average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that, over the years, have generated less internal traffic are those about evolutionary biology and LGBT rights.

Table 3: Average percentage of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$; $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

Topic      $\bar{C}(P)$  $\bar{C}(P \to P)$  $\bar{C}(P \to \bar{P})$  $\bar{C}(\bar{P})$  $\bar{C}(\bar{P} \to \bar{P})$  $\bar{C}(\bar{P} \to P)$
Abortion   68.89         90.82               57.64                     70.26               89.81                           45.37
Cannabis   70.81         95.01               37.50                     65.78               96.52                           16.67
Guns       52.34         78.69               35.68                     59.63               75.35                           39.28
Evolution  71.15         84.49               64.47                     72.69               99.00                           55.13
Racism     56.36         88.41               34.32                     71.87               90.63                           65.44
LGBT       61.66         89.42               52.5                      72.42               92.52                           59.17

Examining pages’ connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52). This suggests that articles having higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.
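For reference, the Pearson coefficient between out-degree and CTR can be computed as in the sketch below. The data are invented toy values, not the paper’s measurements, and the helper name `pearson` is an assumption; the sketch only shows the computation behind a coefficient such as the one reported above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy per-page values (hypothetical): number of links and click-through rate.
out_degree = [12, 30, 45, 80, 150]
page_ctr = [0.08, 0.15, 0.12, 0.22, 0.30]
r = pearson(out_degree, page_ctr)
```

A positive coefficient indicates that pages with more links tend to have a higher CTR; it does not by itself establish that the extra links cause the extra clicks.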

In addition, we count the number of links clicked fewer than 10 times over the last three years; see Table 3. As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.

6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS

The main contribution of this paper is to examine to what extent the current Wikipedia topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.

6.1 Exposure to Diversity

To evaluate the exposure to diversity induced by the network’s topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users’ session length, setting it equal to 1 means that we study the exposure to diversity over one-click sessions.

The plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.

For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. On the contrary, the chance of entering a page about evolutionary biology is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

[Figure 9: three rows of 2×2 heatmaps (rows: Uniform, Position, Clicks), one column per topic, with partitions Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio., Racism/Anti-racism, and Discrimination/Support.]
Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to P$, $P \to \bar{P}$. The $y$-axis indicates the source and the $x$-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.

It is worth pointing out that the current network’s topology not only nudges users to read more about the same opinion, but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have higher chances of reading one article about creationism (3 times more) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic.

Taken together, these measurements highlight that the structure of the network facilitates users’ exploration of knowledge bubbles of homogeneous view and makes the measure of mutual exposure to diverse information smaller than 1.
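As a worked instance of Eq. (6), consider the one-click exposures for evolution under the uniform model: 0.18% from creationism to evolutionary biology, as quoted above, and roughly 0.61% in the opposite direction. The second value is read from the corresponding cell of Figure 9 and is consistent with the "3 times more" statement; treat it as approximate.

```latex
\[
\epsilon^{1} = \frac{\min\{0.18,\ 0.61\}}{\max\{0.18,\ 0.61\}}
             = \frac{0.18}{0.61} \approx 0.30
\]
```

Under these values, the exposure in the less favored direction is about 30% of that in the favored one, well below the balanced value of 1.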

The findings above report the intrinsic capability of the network to spur users towards diverse content. If we want to combine it with readers’ next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that, if users click with higher chances the links at the top of the page, the probabilities are only slightly modified w.r.t. the uniform model. These modest variations are coherent with the links’ placement within pages (Figure 2).

For a few topics, such as guns, the links’ position plays a more significant role, worsening the users’ exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, has a 30% drop w.r.t. the probability observed using the uniform CwP model. Therefore, we conclude that, for some topics, the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located towards the beginning of a page have fewer opportunities to read about contrasting opinions.

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.

For all the topics, we verify a significant increase in the probability of visiting pages of the same opinion. A simple interpretation of this result is that real users click more on the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the “topology of the network seems to drive users to explore knowledge bubbles of homogeneous view,” we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of lack of information, we cannot answer this question, but we hope it will be addressed in future work.

Furthermore, we observe that, for some topics like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing diverse views.

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia’s Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new or current links have on users’ exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to prevent this property from becoming harmful when it keeps users from accessing diverse content.

[Figure 10: line plots, one column per topic (Abortion, Cannabis, Guns, Evolution, Racism, LGBT); horizontal axis: Number of Clicks (1 to 15); legend: uniform, position, and clicks CwP models, each for alpha = 0, alpha = 0.2, and alpha = 1.]
Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers’ shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).

6.2 Dynamic Exposure to Diversity

In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. According to the combinations of models employed to measure the ExDIN and M-ExDIN, we provide different insights about the effect of the current network’s topology on users’ exposure to diverse content over a navigation session.

In Figure 10, for sessions of length 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can notice that each of the topics shows its own trends. For this reason, we decide to highlight and provide an explanation for the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done above in the paper. We start by describing how, given the current network’s topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model (markers' shapes) corresponding to higher exposure is $M^c$, followed by $M^u$ and $M^p$.

(2) If users navigate mirroring a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because these kinds of users are only subject to the exposure of their starting navigation page. So the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session. Then the trend inverts. In one case, it reaches or exceeds the starting exposure. In the other, it grows and becomes steady below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., the ExDIN from $P$ to $\bar{P}$ of LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users randomly exploring the network, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.

Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the guns control partition, the users with higher exposure to guns rights pages (2% probability) are those characterized by a star-like behavior. As soon as the users navigate in a random fashion, this probability drops. This is because the guns rights partition has a number of incoming edges that prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in guns rights. Indeed, for users randomly navigating the graph, the exposure to the guns control partition is higher than or comparable to that of star-like behaving users. The probability of reaching guns control after many steps becomes high because of the higher number of incoming edges of the partition.
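The patterns described above can be reproduced qualitatively on a toy graph. The following is a minimal sketch, not the authors' implementation: it assumes a fixed transition matrix (the paper's navigation model updates the matrix at each step), and the graph `TOY_M`, the node names, and the helper names `step`, `navigate`, and `mass_on` are all hypothetical. The quantity tracked is simply the probability mass on the opposite partition after each click, used here as a stand-in for the ExDIN defined in Sect. 4.2.

```python
def step(dist, M):
    """One click: row vector `dist` times row-stochastic matrix `M` (dict of dicts)."""
    nxt = {}
    for i, p in dist.items():
        for j, m in M[i].items():
            nxt[j] = nxt.get(j, 0.0) + p * m
    return nxt


def navigate(M, start, alpha, clicks):
    """Return the distributions after 1..clicks clicks.

    Assumed recurrence with a fixed M:
    pi^1 = pi^0 M, and pi^(l+1) = (1 - alpha) pi^l M + alpha pi^0 M.
    """
    restart = step(start, M)  # pi^0 M, reused as the restart term
    current = restart
    history = [current]
    for _ in range(clicks - 1):
        walked = step(current, M)
        current = {
            n: (1 - alpha) * walked.get(n, 0.0) + alpha * restart.get(n, 0.0)
            for n in set(walked) | set(restart)
        }
        history.append(current)
    return history


def mass_on(dist, nodes):
    """Probability of being on any node of `nodes`."""
    return sum(p for n, p in dist.items() if n in nodes)


# Hypothetical toy network: a1, a2 on one side, b1 on the other, "rest" elsewhere.
TOY_M = {
    "a1": {"a2": 0.9, "b1": 0.1},
    "a2": {"a1": 0.5, "rest": 0.5},
    "b1": {"rest": 1.0},
    "rest": {"b1": 0.5, "rest": 0.5},
}
START = {"a1": 1.0}
OPPOSITE = {"b1"}

curves = {
    alpha: [mass_on(d, OPPOSITE) for d in navigate(TOY_M, START, alpha, 5)]
    for alpha in (0.0, 0.2, 1.0)
}
```

On this toy graph, the $\alpha = 1$ curve stays flat, because every step restarts from the starting page, whereas the $\alpha = 0$ curve first dips and then rises as the walker reaches the opposite side through the rest of the graph.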

From complementary analysis, we observe that for sessions longer than 2 page visits, the topology of the graph is not such as to keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This demonstrates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.

Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. We recall that to do so we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100, meaning that for none of them does the network topology provide an equal exposure across opinions. If we consider topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios determining this phenomenon may be: (1) the knowledge on the encyclopedia is complete but articles are underlinked [49], that is, the content and the keywords that could become anchors exist, and thus we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete, that is, there are no words to attach links to, and thus the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.

7 CONCLUSIONS

Our work provides a first analysis of how the current Wikipedia network topology assists readers in exploring opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable in the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we study users' overall consumption of the topics. Our main findings suggest the following:

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite ones. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially greater tendency, albeit small, to visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.1, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks, the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who only do one click.

In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also include the content of the articles along with the link information. Secondly, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of users' behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step for finding ways to attenuate it, and our hope is that this work is a step towards this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism Design Research in Online Markets” and MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets.”

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.

[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.

[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.

[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.

[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.

[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.

[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.

[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).

[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.

[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.

[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.

[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.

[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.

[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.

[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.

[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.

[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.

[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.

[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.

[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.

[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).

[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.

[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.

[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).

[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.

[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.

[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.

[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.

[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.

[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.

[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.

[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.

[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.

[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.

[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.

[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.

[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.

[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.

[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.

[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.

[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.

[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.

[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.

[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).

[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.

[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.

[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).

[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.

[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.

[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.

[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.

[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.

[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).

[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.

  • Abstract
  • 1 Introduction
  • 2 Related Works
  • 3 Data Collection
    • 3.1 Topic Induced Networks
    • 3.2 General Statistics on Topics Networks
  • 4 Metrics
    • 4.1 Content Consumption
    • 4.2 Exposure Metrics
  • 5 RQ1: Readers Topic Consumption
    • 5.1 Pageviews Distribution
    • 5.2 External or Internal Access to the Topic
    • 5.3 How Much Readers Navigate Links
  • 6 RQ2: Exposure Across Topic Viewpoints
    • 6.1 Exposure to Diversity
    • 6.2 Dynamic Exposure to Diversity
  • 7 Conclusions
  • References


that we have one of the two opinions represented by more articles. In terms of content, this does not necessarily imply that either of the two views is incomplete or insufficiently represented. Indeed, a topic may span a few articles or may require more pages to be complete. On the other hand, the unbalanced sizes can affect an opinion's exposure within the entire Wikipedia network. Practically, if a set of articles is large and well connected to the rest of the network, the chances that users who browse randomly reach it are higher than those of reaching a small partition. Moreover, if readers exploit the random-article functionality of Wikipedia, a more represented opinion gets more chances of being randomly sampled.

The topics showing the highest imbalance are cannabis, where there are five times more pages about activism than about prohibition, and evolution, where there are four times more pages about evolutionary biology than about creationism. If we consider the edges across partitions, the number of cross-partition edges is higher for bigger sets. This is reasonable because more nodes can point to the opposite side. Despite that, for evolution, the edges from creationism to evolutionary biology are ∼3 times more, and for LGBT, the edges from discrimination to rights are 36% more. Despite the low number of edges across cannabis partitions, we decide not to discard the topic.

Above, we said that one of the two partitions might connect better to the rest of the encyclopedia. We observe that the sizes of $P$ and $\bar{P}$ are not linear in the number of edges that point out of or to the nodes in the partitions. For instance, the number of articles about pro-choice (291) is half of the nodes related to the pro-life movement (481). Although the nodes in pro-life are twice as many as those in pro-choice, the number of links pointing to pages about pro-choice is 36% more than those pointing to pro-life articles. This happens, with different magnitude, also for guns and LGBT. We will see later that the fact that a side of a topic is better blended into the network has implications for the readers' exposure to one of the two sides of the topic (Sect. 6).

We also investigate how many pages in $P$ and $\bar{P}$ cannot be reached by users unless they enter Wikipedia directly on those pages. The sets of articles with the highest share of unreachable nodes are in the category of cannabis prohibition (11.36%), followed by evolutionary biology (5.62%) and LGBT rights (5.35%).

Furthermore, we compute the modularity $Q$ among $P$ and $\bar{P}$. Higher $Q$ means that connections within partitions exceed those among them. In Table 1, we report three values computed on differently weighted graphs, with the probabilities of clicking the links of each page assigned as follows: (1) uniform, (2) proportional to the position of the link within the page, and (3) proportional to readers' clickstream (see Sect. 4.1.2). Overall, if we consider the position of links and readers' clickstream, it seems that the partitions are more modular.
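To make the role of the edge weights concrete, here is a minimal sketch of a two-block modularity computation. The text does not state which modularity variant is used, so the standard directed, weighted form $Q = \frac{1}{m} \sum_{ij} \left[ A_{ij} - \frac{k_i^{out} k_j^{in}}{m} \right] \delta(c_i, c_j)$ is assumed here; the function name `two_block_modularity` and the toy graph are hypothetical.

```python
def two_block_modularity(edges, side):
    """Directed, weighted modularity of a two-way split.

    edges: {(u, v): weight}, e.g. click probabilities from one of the CwP models.
    side:  {node: 0 or 1}, the partition each node belongs to.
    """
    total = sum(edges.values())
    out_w, in_w = {}, {}
    for (u, v), w in edges.items():
        out_w[u] = out_w.get(u, 0.0) + w
        in_w[v] = in_w.get(v, 0.0) + w
    # Observed weight that stays inside a partition.
    within = sum(w for (u, v), w in edges.items() if side[u] == side[v])
    # Weight expected inside each partition under the degree-preserving null model.
    expected = 0.0
    for block in (0, 1):
        out_sum = sum(w for n, w in out_w.items() if side[n] == block)
        in_sum = sum(w for n, w in in_w.items() if side[n] == block)
        expected += out_sum * in_sum / total
    return (within - expected) / total


# Hypothetical toy graph: two tight pairs joined by a single cross edge.
toy_edges = {("a", "b"): 1, ("b", "a"): 1, ("c", "d"): 1, ("d", "c"): 1, ("a", "c"): 1}
toy_side = {"a": 0, "b": 0, "c": 1, "d": 1}
q = two_block_modularity(toy_edges, toy_side)
```

Re-running the same function with weights taken from the uniform, position, or clickstream model is one way to obtain three values per topic, as reported in Table 1.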

Based on that, we study how links across and within partitions are positioned in pages. First, we define the position of a link. Given a page, we have its list of links in order of appearance. We get the relative rank within the list for each link and re-scale it by the tanh function. In this way, we have values in [0, 1], and the links at the top of the list get more similar scores. The set of links includes those in the infoboxes. We regard them as being at the top of the article, according to the results in [15, 17]. If a link appears more than once, we average its position.

In Figure 2, we show the position distributions. According to the t-test, whose significance level is fixed to $\alpha = 0.95$, the average position of links in pro-choice pointing to pro-choice is significantly different from the average position of links pointing to pro-life. Also, the position of links from guns control to guns control is significantly higher than that of links to guns rights. For evolutionary biology, links to creationism are placed statistically significantly lower than those to evolutionary biology. The same happens for LGBT.

For the sake of completeness of the analysis, even if not used further in the paper, for each topic we study the quality of the pages populating it. In particular, we use the ORES API to get the “article quality.” We observe that, overall, for all the topics, between 60% and 70% of the articles are classified as stubs or start. Then, 22–29% are in B-class, 0–5% are Featured Articles, and the remaining belong to the C-class.¹²

4 METRICS

In this section, we define the models and metrics that we use to answer the research questions formulated in Sect. 1. First, we describe how we characterize readers' consumption, either by analyzing real users' data or by simulating their behavior (see Sect. 4.1). Then, we introduce the core metrics of the paper, ExDIN and M-ExDIN (see Sect. 4.2).

4.1 Content Consumption

To understand readers' consumption of polarizing topics, we need different modeling strategies, which we describe in the following subsections.

4.1.1 Metrics Based on Clickstream. We build two metrics upon the information we extract from users' clickstream data, which are made publicly available by Wikimedia and preserve users' privacy [14, 54].¹³

From these data, we infer $c_{ji}$, counting how many times a hyperlink to $i \in V$ is clicked from page $j$. The page $j$ may be either an internal Wikipedia page ($j \in A$, recalling that $V = \mathcal{T} \cup \mathcal{N} \cup S$ includes all the Wikipedia pages) or external, if it corresponds to a page from outside Wikipedia (e.g., a search engine). Thus, we define the variable $\delta_j$, which indicates whether $j$ is an external page or belongs to the topic-induced network: $\delta_j = 1$ if $j$ is external, and 0 otherwise.

Given a page $i$, we indicate with $\mathcal{J}$ the set of external and internal pages pointing to it (see Figure 3). We define $c_i = \sum_{j \in \mathcal{J}} c_{ji}$ to be the total clicks to the page. $\sum_{j \in \mathcal{J}} \delta_j c_{ji}$ is the total number of clicks from external websites; therefore, the difference between $c_i$ and this summation is the number of visits from internal (Wikipedia) pages. Now we are ready to define the following metrics.

Reader Search Rate (RSR). Given a page $i \in V$, the empirical probability that a visit to page $i$ is from an external website is

$$RSR_i = \frac{\sum_{j \in \mathcal{J}} \delta_j c_{ji}}{c_i}. \quad (1)$$

¹² https://en.wikipedia.org/wiki/Template:Grading_scheme

¹³ A description of the data is at https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream. The provided information is enough to extract the clickstream-based metrics.


Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is

$$CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \quad (2)$$

where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$, and thus counted multiple times in $c_i$.)
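As an illustration of Equations (1) and (2), the following minimal sketch computes both rates from a list of (source, target, count) rows. The row layout, the `EXTERNAL` label, and the function name `rsr_ctr` are assumptions made for this example and do not reproduce the exact schema of the Wikimedia clickstream files.

```python
from collections import defaultdict

EXTERNAL = "external"  # hypothetical label for any referrer outside Wikipedia


def rsr_ctr(clicks):
    """clicks: iterable of (source, target, count) rows.

    Returns {page: (RSR, CTR)} for every page that receives at least one visit.
    """
    visits = defaultdict(int)         # c_i: total clicks into page i
    from_external = defaultdict(int)  # clicks into i whose referrer is external
    outgoing = defaultdict(int)       # clicks leaving i toward other articles
    for source, target, count in clicks:
        visits[target] += count
        if source == EXTERNAL:
            from_external[target] += count
        else:
            outgoing[source] += count
    return {
        page: (from_external[page] / total, outgoing[page] / total)
        for page, total in visits.items()
    }


toy_clicks = [
    (EXTERNAL, "A", 80),
    ("B", "A", 20),
    ("A", "B", 30),
    (EXTERNAL, "B", 30),
]
rates = rsr_ctr(toy_clicks)
```

In this toy input, page A receives 100 visits, 80 of them from outside, and 30 of its visits continue to another article, so its RSR is 0.8 and its CTR is 0.3.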

4.1.2 Model Clicks Within Pages. When readers visit a page, they have the possibility of clicking any of the links present. However, according to the information needs they want to satisfy, each of the links may have a different probability of being clicked [45]. We now propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j|i)$ as the rank of $j$ among all links in $i$, and $r(j|i) = |N_{out}(i)| - pos(j|i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce $\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to values between 0 and 1, such that links at the top of the page get similar scores.

The Clicks Within Pages (CwP) models are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:

(1) $M^u$ (Uniform), whose entry $m(i, j) = \frac{1}{|N_{out}(i)|}$ mimics readers who click each link in a page uniformly at random.

(2) $M^p$ (Position), whose entry $m(i, j) = \frac{\tanh r(j|i)}{\sum_{k \in N_{out}(i)} \tanh r(k|i)}$ captures the scenario in which readers click with higher probability links appearing first in the page. This model is based on previous work that shows how the link position is a good predictor of the success of a link [16, 31].

(3) $M^c$ (Clicks), whose entry $m(i, j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$ represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10,¹⁴ the minimum number of times the link must be clicked to be included in the dataset [53].
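The three models above can be sketched as functions that build one row of the transition matrix for a single page. This is an illustrative sketch under stated assumptions: the function names are hypothetical, links are assumed to be unique and listed in order of appearance, and $pos(j|i)$ is taken to start at 0 so that the last link keeps a nonzero score (the text does not specify the indexing).

```python
import math


def uniform_row(links):
    """M^u: every link on the page is equally likely."""
    return {j: 1.0 / len(links) for j in links}


def position_row(links):
    """M^p: score tanh(r(j|i)) with r = |N_out(i)| - pos, pos counted from 0 (assumption)."""
    n = len(links)
    scores = {j: math.tanh(n - pos) for pos, j in enumerate(links)}
    total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}


def clicks_row(links, click_counts, floor=10):
    """M^c: empirical click shares, with counts below `floor` replaced by `floor`."""
    counts = {j: max(click_counts.get(j, 0), floor) for j in links}
    total = sum(counts.values())
    return {j: c / total for j, c in counts.items()}


# Hypothetical page with three links; "z" has no recorded clicks.
links = ["x", "y", "z"]
row_u = uniform_row(links)
row_p = position_row(links)
row_c = clicks_row(links, {"x": 70, "y": 20})
```

Because tanh saturates quickly, the top links in `position_row` receive nearly identical scores, which matches the stated intent that links at the top of the page get similar scores.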

For the sake of completeness, we recall that $G$ includes a super node $s$. To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph¹⁵ (see Sect. 3.1).

4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We can do it by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that Wikipedia's network is conceived to let users move to fulfill their own information needs. Thus, we want to understand how different users' navigation behaviors can affect readers' exposure to diverse information.

To do that it would be optimal to have access to usersrsquo log ses-

sion Because these data are not available to the public we define a

parametric model that simulates usersrsquo navigation by embedding

14 We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting it to any number between 1 and 10 does not affect the results.

15 The computation of these quantities is straightforward, so we omit it from the body of the paper.

Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses to a page turn into a click toward another article.

different behaviors according to the chosen parameter. We emphasize that the purpose of this model is not to perfectly replicate users' behavior on Wikipedia. Rather, we want to see how users simulated from a reasonable and general model are exposed to diverse information.

In other words, we want to define a stochastic process with $n + 1$ states, corresponding to the $n + 1$ pages in $V$, that approximates the probability of reaching any of the articles when starting at random from $p \in P$ (or from $\bar{P}$).

We model this by considering the process $\{X^\ell\}$, $\ell = 0, 1, \dots, L$, on the set of nodes $V$ induced by the transition matrix $M$, with starting state $X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = (\pi_P)_i \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users' session length (the number of clicks) is finite, we evaluate the process over a finite number of steps $L$. We have that $\Pr(X^\ell = j) = (\boldsymbol{\pi}^\ell_P)_j$, where the (row) vector $\boldsymbol{\pi}^\ell_P$ is given by the following variation of the Personalized Random Walk with Restart (RWR).

Definition 1 (Navigation Model). Let $M_0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have

$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M_0 \qquad (3)$$

and, for $\ell \ge 1$,

$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^\ell_P \cdot M_\ell + \alpha\, (\boldsymbol{\pi}^0_P \cdot M_\ell), \qquad (4)$$

where $M_\ell = \mathrm{norm}((D\, (M_{\ell-1})^T)^T)$ and $D = \mathrm{diag}(1 + \boldsymbol{\pi}^{\ell-1}_P)^{-1}$. The operator $\mathrm{norm}(M)$ transforms the matrix $M$ into a right-stochastic matrix by normalizing each row independently so that it sums to 1.

This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated at each step to take into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^\ell_P$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ by a reader who starts uniformly at random from a node in $P$. We assume that readers do not click the same link more than once within a session. Thus, we want the nodes that are clicked with high probability at step $\ell$ to see their probability of being reached deflated at step $\ell + 1$, and those with lower probability to have more chances of being clicked. We achieve this by dividing each row of $M$ element-wise by the vector $\boldsymbol{\pi}^\ell_P + 1$, where 1 is a smoothing factor that avoids divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
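The recursion of Definition 1 can be sketched as follows. This is an illustrative reading of Eqs. (3) and (4), not the authors' code: the names `row_normalize` and `navigate`, the dense NumPy representation, and the treatment of all-zero rows are assumptions. Dividing the matrix by the vector $1 + \boldsymbol{\pi}^{\ell-1}_P$ relies on NumPy broadcasting, which scales column $j$ by $1 / (1 + \pi_j)$, matching the diagonal matrix $D$.

```python
import numpy as np

def row_normalize(m):
    """Make each non-zero row of m sum to 1 (rows of zeros are left as is)."""
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)

def navigate(m0, pi0, alpha, steps):
    """Return [pi^1, ..., pi^steps] for a session of `steps` clicks.

    m0    : initial right-stochastic transition matrix (a CwP model)
    pi0   : starting distribution, e.g. uniform over the pages in P
    alpha : restart parameter in [0, 1]
    """
    m = np.asarray(m0, dtype=float)
    pi0 = np.asarray(pi0, dtype=float)
    pi_prev = pi0            # pi^{l-1}
    pi = pi0 @ m             # Eq. (3): pi^1
    history = [pi]
    for _ in range(1, steps):
        # Deflate columns of already-likely pages, then renormalize rows.
        m = row_normalize(m / (1.0 + pi_prev))
        # Eq. (4): continue the walk or restart from the starting pages.
        pi_prev, pi = pi, (1.0 - alpha) * (pi @ m) + alpha * (pi0 @ m)
        history.append(pi)
    return history
```

With `alpha=1.0` every step is a fresh click from the starting pages (the star-like reader), whereas `alpha=0.0` yields a plain walk whose transition matrix is re-weighted after each click.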

Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia

Figure 4: Navigation model for different $\alpha$: (a) star-like ($\alpha = 1$); (b) star-like random navigation ($0 < \alpha < 1$); (c) random navigation ($\alpha = 0$). The green nodes represent the starting navigation pages.

Overall, as we will see later in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.

Looking deeper at the model:

• When $\alpha = 1$ (Figure 4(a)), the model emulates a reader whose navigation consists only of opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely to be semantically related to each other [49].

• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open articles sequentially and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.

• If $\alpha = 0$ (Figure 4(c)), the user clicks links sequentially, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.

Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume that the jumping-back action consists of clicking the browser's back button until the session's starting page is reached again. The restart parameter indirectly embeds the back-button action, which, because of the absence of back-links on Wikipedia, cannot be tracked on the graph.

The behaviors replicated through the model recall those described in [43, 45].

4.2 Exposure Metrics
At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.

Figure 5: Pageviews distribution (y-axis: Log(Pageviews); one pair of boxplots per topic: Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio, Racism/Anti-racism, Discrimination/Support). For each topic we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics, in order, are abortion, cannabis, guns, evolution, racism, and LGBT.

Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^\ell_P$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is

$$e^\ell_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^\ell = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^\ell_P)_j \qquad (5)$$

and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.

We employ this metric in two ways:

(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M_u$ (see Sect. 4.1.2), it quantifies only the topological property of the network to connect pages belonging to different sets.

(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).

Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. If we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want the ExDIN with respect to each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^\ell_P$ and sum up the probabilities of the nodes within each set.
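Computing the ExDIN from a reach-probability vector amounts to summing entries, for one target set or for several. The sketch below illustrates this under simple assumptions: nodes are identified by integer indices into the vector, and the helper name `exdin` and the dictionary interface are hypothetical.

```python
import numpy as np

def exdin(pi, target_sets):
    """Exposure of the starting set to each target set (hypothetical helper).

    pi          : vector of reach probabilities at a given step
    target_sets : mapping from a set name to an iterable of node indices
    """
    pi = np.asarray(pi, dtype=float)
    return {name: float(pi[sorted(nodes)].sum())
            for name, nodes in target_sets.items()}
```

For disjoint target sets, the total exposure to their union is the sum of the per-set values, which corresponds to defining the opposite set as the union of the targets.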

Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.

Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^\ell_{P \to \bar{P}}$ and $e^\ell_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is

$$\epsilon^\ell = \frac{\min\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}}{\max\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}} \in [0, 1]. \qquad (6)$$

WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal

Figure 6: Percentage of pageviews coming from the opposite side (x-axis: % of pageviews from the opposite partition, from 0.0 to 2.5; bars: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism). Topics, in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.

If either $e^\ell_{P \to \bar{P}}$ or $e^\ell_{\bar{P} \to P}$ is 0, then $\epsilon^\ell = 0$.

This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.

The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections in either direction. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor navigation either from $P$ to $\bar{P}$ or from $\bar{P}$ to $P$. From this perspective, if we observe a tendency in the network to facilitate exploration from one of the sets to the other, we may say that the network topology is biased toward one direction. Thus, we can use M-ExDIN to measure the bias in the network with respect to two sets of nodes.
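The ratio in Definition 3, including the convention that a zero exposure in either direction gives a mutual exposure of 0, can be sketched as below. The function name `mutual_exposure` is hypothetical, and the inputs are assumed to be non-negative probabilities.

```python
def mutual_exposure(e_p_to_pbar, e_pbar_to_p):
    """M-ExDIN: ratio of the smaller to the larger exposure, in [0, 1]."""
    low, high = sorted((e_p_to_pbar, e_pbar_to_p))
    if low == 0:
        # Covers the case where one (or both) exposures are zero.
        return 0.0
    return low / high
```

Values near 1 indicate that readers of either side reach the other with similar probability; values near 0 indicate that one direction is strongly favored.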

Even though the mutual exposure to diverse information captures the balance between the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $P$ is 10 times larger than $\bar{P}$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^\ell_{\bar{P} \to P} \approx 10 \cdot e^\ell_{P \to \bar{P}}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced and, if so, proceed as follows. If $|\bar{P}| < |P|$, we define $P'$, obtained by sampling $|\bar{P}|$ articles from $P$, and use the new set for all computations. Because the sampling is random, we repeat the measurements multiple times.
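The balancing procedure can be sketched as a small resampling loop. This is only an illustration: the helper `balanced_runs`, its `metric` callback, and the fixed seed are assumptions, and the default of 100 repetitions mirrors the number reported in the caption of Figure 10 rather than a value stated in this paragraph.

```python
import random
import statistics

def balanced_runs(p, p_bar, metric, repeats=100, seed=0):
    """Evaluate `metric` on size-balanced versions of two page sets.

    The larger set is randomly subsampled to the size of the smaller one,
    `metric(sample_p, sample_p_bar)` is evaluated, and the procedure is
    repeated. Returns the mean and standard deviation of the metric.
    """
    rng = random.Random(seed)
    p, p_bar = sorted(p), sorted(p_bar)
    size = min(len(p), len(p_bar))
    values = []
    for _ in range(repeats):
        sample_p = p if len(p) == size else rng.sample(p, size)
        sample_p_bar = p_bar if len(p_bar) == size else rng.sample(p_bar, size)
        values.append(metric(sample_p, sample_p_bar))
    return statistics.mean(values), statistics.pstdev(values)
```

In practice `metric` would run the navigation model from the sampled sets and return the ExDIN or M-ExDIN; reporting the standard deviation alongside the mean shows how sensitive the result is to the particular sample drawn.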

5 RQ1: READERS' TOPIC CONSUMPTION
Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that when we count the click views of a page, we consider the average over the number of months the page existed (footnote 16). Accordingly, when computing the occurrences for the transition matrix based on the clickstream, we consider the average clicks of a link over the number of months it has existed. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.

16 Based on the temporal graphs extracted by [11].

5.1 Pageviews Distribution
To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except abortion, the difference between the mean pageviews of the two opinions is statistically significant ($p < 0.05$). This finding shows that users tend to visit more pages expressing or supporting one of the two viewpoints. From a network perspective, to increase the exposure to opposing opinions, it is desirable for frequently visited pages to be well connected to articles expressing opposing opinions.

In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.

5.2 External or Internal Access to the Topic
We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more from external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages toward the two partitions.

Referring to Figure 7(c), 44.8% of views come from internal pages. The clickstream from internal pages is broken down to show the proportion of flow toward gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that, also from external websites, most of the traffic goes toward gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content are concentrated on gun rights.

The abundance of traffic toward one of the two opinions does not characterize only the guns topic. Indeed, across all the topics, 59–74% of visits are accumulated by one partition. Moreover, readers' preferences appear consistent between external and internal accesses; that is, they both point more toward the same view of the topic. For both internal and external views, the distribution of accesses toward partitions is approximately the same (i.e., the percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.

In general, the largest share of visits to the topics' articles comes from external pages. In particular, only 23.6% and 33.5% of traffic to evolution and racism, respectively, is generated by internal Wikipedia


Figure 7: Cumulative pages' traffic for (a) Abortion (Pro-life, Pro-choice), (b) Cannabis (Prohibition, Activism), (c) Guns (Control, Rights), (d) Evolution (Creationism, Evol. Bio), (e) Racism (Racism, Anti-racism), and (f) LGBT (Discrimination, Support). Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to the partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

Figure 8: Average click-through rate (x-axis: Avg(Click-Through Rate) × 100, from 0 to 30). In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics, in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.

navigation. The same quantity for the remaining topics ranges between 44% and 47%.

We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information that is generally less covered in the mainstream public debate.

5.3 How Much Readers Navigate Links
Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the pages' average click-through rate (Sect. 4.1.1).

We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated less internal traffic are those about evolutionary biology and LGBT rights.

Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52).

Topic | $\bar{C}(P)$ | $\bar{C}(P \to P)$ | $\bar{C}(P \to \bar{P})$ | $\bar{C}(\bar{P})$ | $\bar{C}(\bar{P} \to \bar{P})$ | $\bar{C}(\bar{P} \to P)$
Abortion | 68.89 | 90.82 | 57.64 | 70.26 | 89.81 | 45.37
Cannabis | 70.81 | 95.01 | 37.50 | 65.78 | 96.52 | 16.67
Guns | 52.34 | 78.69 | 35.68 | 59.63 | 75.35 | 39.28
Evolution | 71.15 | 84.49 | 64.47 | 72.69 | 99.00 | 55.13
Racism | 56.36 | 88.41 | 34.32 | 71.87 | 90.63 | 65.44
LGBT | 61.66 | 89.42 | 5.25 | 72.42 | 92.52 | 59.17

Table 3: Average % of links within pages clicked less than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

This suggests that articles with a higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.
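The relationship between out-degree and CTR is summarized above by a Pearson correlation coefficient. As a reminder of what that statistic measures, here is a small standard-library sketch; the function name `pearson` and the idea of passing per-page out-degrees and CTRs as two lists are assumptions made for the example, not a description of the authors' pipeline.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient of 1 or -1 indicates a perfectly linear relationship, so a value around 0.5 corresponds to a moderate positive association between the number of links on a page and its CTR.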

In addition, we count the number of links clicked fewer than 10 times over the last three years; see Table 3. As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of links pointing to creationism and 64.47% of those pointing to evolutionary biology.

6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS

The main contribution of this paper is to examine to what extent Wikipedia's current topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.

6.1 Exposure to Diversity
To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, setting it to 1 means that we study the exposure to diversity over one-click sessions.

Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.

For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. On the contrary, the chance of entering a page about evolutionary biology


Figure 9: ExDIN. One heatmap per topic (Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio, Racism/Anti-racism, Discrimination/Support) and per CwP model (rows: Uniform, Position, Clicks). Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to P$, $P \to \bar{P}$. The $y$-axis indicates the source and the $x$-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.

is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

It is worth pointing out that the current network's topology not only nudges users into reading more about the same opinion, but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have a higher chance of reading an article about creationism (3 times more) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous view and makes the mutual exposure to diverse information smaller than 1.
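As a worked illustration of this asymmetry, the one-click mutual exposure for evolution can be estimated from the percentages quoted in the text. This is only a sketch: the exposure from evolutionary biology to creationism is assumed here to be roughly 0.61%, a value chosen because it agrees with the "5 times" and "3 times" ratios stated above, and all results are rounded.

```latex
\[
\frac{e^{1}_{P \to P}}{e^{1}_{P \to \bar{P}}} = \frac{5.76}{0.18} = 32,
\qquad
\frac{e^{1}_{\bar{P} \to \bar{P}}}{e^{1}_{\bar{P} \to P}} \approx \frac{3.27}{0.61} \approx 5.4,
\qquad
\epsilon^{1} \approx \frac{\min\{0.18,\ 0.61\}}{\max\{0.18,\ 0.61\}} \approx 0.30 .
\]
```

Here $P$ stands for creationism and $\bar{P}$ for evolutionary biology. Under these assumed values, the mutual exposure is well below 1, which is the numerical counterpart of the statement that the two sides are not explored symmetrically.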

The findings above report the intrinsic capability of the network to spur users toward diverse content. If we want to combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that, if users are more likely to click links at the top of the page, the probabilities are only slightly modified with respect to the uniform model. These modest variations are coherent with the placement of links within pages (Figure 2).

For a few topics, such as guns, the links' position plays a more significant role, worsening the users' exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, drops by 30% with respect to the probability observed using the uniform CwP model. Therefore, we conclude that, for some topics, the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who are more likely to click the links located toward the beginning of a page have fewer opportunities to read about contrasting opinions.

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume that readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.

For all the topics, we observe a significant increase in the probability of visiting pages of the same opinion. Interpreting this result plainly, we can say that real users click more often on the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view," we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer this question, but we hope it will be addressed in future work.

Furthermore, we observe that, for some topics like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing diverse views.

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new or current links have on users' exposure to diverse information. This is not meant to alter this core and essential property of Wikipedia, but rather to prevent it from becoming harmful when it keeps users from accessing diverse content.


Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$ (x-axis: Number of Clicks; panels: Abortion, Cannabis, Guns, Evolution, Racism, LGBT; legend: uniform, position, and clicks CwP models, each with $\alpha = 0$, $\alpha = 0.2$, and $\alpha = 1$). The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).

6.2 Dynamic Exposure to Diversity
In this section, we suppose that users navigate the network for sessions longer than one click and see how their (mutual) exposure to diversity may change. Depending on the combination of models employed to measure the ExDIN and M-ExDIN, we obtain different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.

In Figure 10, for sessions of length 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. Each of the topics shows its own trends. For this reason, we highlight and explain the most recurrent patterns. For a better understanding, we suggest cross-checking the following explanations with the analysis done earlier in the paper. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that, when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model (markers' shapes) corresponding to the highest exposure is $M_c$, followed by $M_u$ and $M_p$.

(2) If users navigate with a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because this kind of user is only subject to the exposure of the starting navigation page. So the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session and then inverts the trend. In one case, it reaches or exceeds the starting exposure. In the other, it grows and then stabilizes below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., the ExDIN from $P$ to $\bar{P}$ of LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have already been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.

Given these observations, we analyze the ExDIN of guns. With the current network's topology, starting from the gun control partition, the users with the highest exposure to gun rights pages (2% probability) are those characterized by a star-like behavior. As soon as users navigate in a random fashion, this probability drops. This is because the gun rights partition has a limited number of incoming edges, which prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in gun rights. Indeed, for users randomly navigating the graph, the exposure to the gun control partition is higher than or comparable to that of star-like users. The probability of reaching gun control after many


steps becomes high because the partition has more incoming edges.

From a complementary analysis, we observe that, for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they entered. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This suggests that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.

Now, we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. We recall that, to do so, we use the Mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide an equal exposure across opinions. If we consider topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios determining this phenomenon may be: (1) the knowledge on the encyclopedia is complete, but articles are underlinked [49], that is, the content and the keywords that could become anchors are there, and thus we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete, that is, there are no words to attach links to, and thus the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.

7 CONCLUSIONS

Our work provides a first analysis to understand how the current Wikipedia network topology assists readers in exploring opposite stances of polarizing topics spanning over sets of articles. We formalize the problem by introducing two metrics: the Exposure to diverse information and the Mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable for the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we also study the overall users' topics consumption. Our main findings suggest the following:

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite view. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially greater, albeit still small, tendency to visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.1, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that, after a few clicks, the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who only do one click.

In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, an analysis that should also include the content of the articles along the line information. Secondly, a longitudinal study with the goal of understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of the users' behavior. Finally, in the light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step for finding ways to attenuate it, and our hope is that this work is a step towards this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets."

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.

[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.

[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.

[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.

[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE Signal Processing Magazine 27, 6 (2010), 55–65.

[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.

[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th International Conference on World Wide Web.

[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American Society for Information Science and Technology (2011).

[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.

[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on Twitter. In Fifth International AAAI Conference on Weblogs and Social Media.

[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.

[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.

[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.

[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.

[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on Wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.

[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.

[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.

[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for Wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.

[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly 80, S1 (2016), 298–320.

[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.

[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).

[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.

[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.

[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).

[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.

[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.

[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM Conference on Hypertext and Social Media. 89–98.

[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration. 105–113.

[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.

[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.

[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of Wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.

[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.

[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International Conference on World Wide Web.

[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.

[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.

[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.

[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.

[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.

[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.

[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.

[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.

[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on YouTube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.

[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 213–222.

[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature Human Behaviour (2019).

[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read Wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.

[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.

[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).

[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st International Conference on World Wide Web.

[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.

[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.

[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.

[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.

[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22. (2017).

[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing Wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.

  • Abstract
  • 1 Introduction
  • 2 Related Works
  • 3 Data Collection
    • 3.1 Topic Induced Networks
    • 3.2 General Statistics on Topics Networks
  • 4 Metrics
    • 4.1 Content Consumption
    • 4.2 Exposure Metrics
  • 5 RQ1: Readers' Topic Consumption
    • 5.1 Pageviews Distribution
    • 5.2 External or Internal Access to the Topic
    • 5.3 How Much Readers Navigate Links
  • 6 RQ2: Exposure Across Topic Viewpoints
    • 6.1 Exposure to Diversity
    • 6.2 Dynamic Exposure to Diversity
  • 7 Conclusions
  • References

Click Through Rate (CTR). Given a page $i \in V$, the empirical probability that a reader clicks a link within the page is

$$CTR_i = \frac{\sum_{j \in N_{out}(i)} c_{ij}}{c_i}, \tag{2}$$

where $N_{out}(i)$ is the set of pages $i$ points to. (Multiple clicks from the same page are counted as originating from different visits to $i$, and thus counted multiple times in $c_i$.)
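As a minimal sketch of Eq. (2), the snippet below computes the CTR of a single page from click counts. The function name, the dictionary layout, and the numbers are hypothetical and only serve as an illustration; it assumes that $c_i$ is the number of views of page $i$ and $c_{ij}$ the number of clicks from $i$ to $j$.

```python
def click_through_rate(clicks_out, views):
    """CTR_i of Eq. (2): clicks toward out-links divided by views of page i.

    clicks_out: dict mapping each out-link j to its click count c_ij
    views:      number of views c_i of the page (assumed > 0)
    """
    if views <= 0:
        raise ValueError("the page has no recorded views")
    return sum(clicks_out.values()) / views


# Hypothetical page with 1000 views and three out-links.
example_clicks = {"A": 120, "B": 30, "C": 50}
example_ctr = click_through_rate(example_clicks, 1000)
```

With these made-up counts, 200 of the 1000 views turn into a click, so the CTR is 0.2.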

4.1.2 Model Clicks Within Pages. When readers visit a page, they have the possibility of clicking any of the present links. However, according to the information needs they want to satisfy, each of the links may have a different probability of being clicked [45]. Now, we propose three models to describe the probability distribution of clicking a link $j$ within an article $i$. First, let $i$ be an article in $V$ and $j \in N_{out}(i)$. We define $pos(j|i)$ as the rank of $j$ among all links in $i$, and $r(j|i) = |N_{out}(i)| - pos(j|i)$, such that a higher value indicates a higher ranking position. Moreover, we introduce $\tanh x = \frac{e^{2x} - 1}{e^{2x} + 1}$, which we use to transform ranking positions to values between 0 and 1, such that links at the top of the page get similar scores.

The Clicks Within Pages models (CwP) are directly applicable on $G$ by setting the transition matrix $M$ in one of the following modes:

(1) $M_u$ (Uniform), whose entry $m(i,j) = \frac{1}{|N_{out}(i)|}$ mimics readers who click each link in a page uniformly at random.

(2) $M_p$ (Position), whose entry $m(i,j) = \frac{\tanh r(j|i)}{\sum_{k \in N_{out}(i)} \tanh r(k|i)}$ captures the scenario in which readers click with higher probability links appearing first in the page. This model is based on previous work that shows how the link position is a good predictor of the success of a link [16, 31].

(3) $M_c$ (Clicks), whose entry $m(i,j) = \frac{c_{ij}}{\sum_{k \in N_{out}(i)} c_{ik}}$ represents the empirical probability that users in $i$ will click the link toward $j$. When $c_{ij} < 10$, we substitute it with 10 (see footnote 14), the minimum number of times the link must be clicked to be included in the dataset [53].
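The three CwP modes can be sketched as functions that build one row of the transition matrix for a page $i$. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the out-links are assumed to be unique and listed from the top of the page to the bottom, and $pos(j|i)$ is assumed to be 1-based. Under that assumption the last link gets $r = 0$ and hence weight $\tanh 0 = 0$; a page with a single link falls back to the uniform row, which is a choice made here to avoid a division by zero.

```python
import math

SMOOTHING = 10  # value substituted for links clicked fewer than 10 times


def uniform_row(out_links):
    """Uniform mode: every out-link is equally likely."""
    n = len(out_links)
    return {j: 1.0 / n for j in out_links}


def position_row(out_links):
    """Position mode: weight tanh(r(j|i)) with r(j|i) = |N_out(i)| - pos(j|i).

    out_links must be ordered from the top of the page to the bottom;
    pos is taken as 1-based (an assumption of this sketch).
    """
    n = len(out_links)
    weights = {j: math.tanh(n - pos) for pos, j in enumerate(out_links, start=1)}
    total = sum(weights.values())
    if total == 0:  # single out-link: all weights are zero
        return uniform_row(out_links)
    return {j: w / total for j, w in weights.items()}


def clicks_row(out_links, clicks):
    """Clicks mode: empirical click shares, with counts below 10 set to 10."""
    smoothed = {j: max(clicks.get(j, 0), SMOOTHING) for j in out_links}
    total = sum(smoothed.values())
    return {j: c / total for j, c in smoothed.items()}
```

For instance, `clicks_row(["a", "b", "c"], {"a": 80, "b": 3})` smooths the counts to 80, 10, and 10, so link `a` receives probability 0.8 and the other two 0.1 each.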

For the sake of completeness, we recall that $G$ includes a super node $s$. To fill its corresponding entries in the transition matrices, we need to aggregate over the edges we compressed to build the graph (see footnote 15 and Sect. 3.1).

4.1.3 Readers Navigation Model. The main goal of this paper is to audit the mutual exposure to diverse information across $P$ and $\bar{P}$. We can do it by simply looking at a snapshot of the graph and counting the links going from $P$ to $\bar{P}$ and vice versa. To go a step further, we recall that the Wikipedia network is conceived to let users move while fulfilling their own information needs. Thus, we want to understand how different users' navigation behaviors can affect readers' exposure to diverse information.

To do that, it would be optimal to have access to users' session logs. Because these data are not available to the public, we define a parametric model that simulates users' navigation by embedding different behaviors according to the chosen parameter. We emphasize that the scope of this model is not to perfectly replicate users' behavior on Wikipedia. Rather, we want to see how users, simulated from a reasonable and general model, are exposed to diverse information.

Footnote 14. We aim to model users on the current version of Wikipedia. Thus, to include all the links, we assign a smoothing factor equal to 10 to links clicked fewer than 10 times. This implies a small probability of clicking these links. Setting the smoothing factor to 10 is a deliberate choice. However, we experimentally verified that setting any number between 1 and 10 does not affect the results.

Footnote 15. The computation of these quantities is straightforward, so we omit it from the body of the paper.

Figure 3: Information from the clickstream dataset. For each node, we extract the number of views coming from internal and external websites. Moreover, we know how many accesses on a page turn into a click toward another article. [Diagram labels: external, internal, i, internal.]

In other words, we want to define a stochastic process with $n + 1$ states, corresponding to the $n + 1$ pages in $V$, that approximates the probability of reaching any of the articles starting at random from $p \in P$ (or from $\bar{P}$).

We model this by considering the process $\{X^\ell\}$, $\ell = 0, 1, \ldots, L$, on the set of nodes $V$ induced by the transition matrix $M$, with starting state $X^0$ selected from the probability distribution $\boldsymbol{\pi}^0_P = (\pi_P)_i \in \mathbb{R}^{1 \times n}$ over $V$. We recall that the transition matrix $M$ can vary according to the CwP models (Sections 3.1 and 4.1.2). Based on the assumption that users' session length (the number of clicks) is finite, we evaluate the process on a finite number of states $L$. We have that $\Pr(X^\ell = j) = (\boldsymbol{\pi}^\ell_P)_j$, where the (row) vector $\boldsymbol{\pi}^\ell_P$ is given by the following variation of the Personalized Random Walk with Restart (RWR).

Definition 1 (Navigation Model). Let $M^0$ be the transition matrix embedding a click-within-pages model, $\boldsymbol{\pi}^0_P$ the distribution of the starting state over $P$, and $\alpha \in [0, 1]$ the restart parameter. We have

$$\boldsymbol{\pi}^1_P = \boldsymbol{\pi}^0_P \cdot M^0 \tag{3}$$

and, for $\ell \ge 1$,

$$\boldsymbol{\pi}^{\ell+1}_P = (1 - \alpha)\, \boldsymbol{\pi}^\ell_P \cdot M^\ell + \alpha\, (\boldsymbol{\pi}^0_P \cdot M^\ell), \tag{4}$$

where $M^\ell = \mathrm{norm}\big((D\, (M^{\ell-1})^T)^T\big)$ and $D = \mathrm{diag}\big(1 + \boldsymbol{\pi}^{\ell-1}_P\big)^{-1}$. $\mathrm{norm}(M)$ transforms matrix $M$ into a right-stochastic matrix by normalizing each row independently, such that it sums to 1.

This process is a variation of the standard random-surfer (PageRank) model, with the difference that the transition matrix is updated in each step. It takes into account the probability that an article has already been visited in a previous iteration. Specifically, the vector $\boldsymbol{\pi}^\ell_P$ that we get at the end of each iteration represents the likelihood that each node is reached at step $\ell$ if the walk starts uniformly at random from a node in $P$. We assume that readers within the same session do not click the same link more than once. Thus, we desire that, at step $\ell + 1$, the nodes that are clicked with high probability at step $\ell$ see their probability of being reached deflated, and those with lower probability have more chances of being clicked. We achieve this by dividing each row of $M$, element by element, by the vector of probabilities $\boldsymbol{\pi}^\ell_P + 1$, where 1 is a smoothing factor to avoid divisions by 0, and then normalizing the matrix to get the updated stochastic matrix to use in the next iteration.
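The recursion of Definition 1 can be sketched in a few lines of plain Python. This is an illustrative reading of Eqs. (3) and (4), not the authors' implementation: the function names are hypothetical, matrices are lists of rows, and the sketch follows the indices of the equations literally, so that $M^\ell$ is built from $M^{\ell-1}$ and $\boldsymbol{\pi}^{\ell-1}_P$. Since $D$ is diagonal, $(D\,(M^{\ell-1})^T)^T$ equals $M^{\ell-1} D$, that is, column $j$ is divided by $1 + (\boldsymbol{\pi}^{\ell-1}_P)_j$.

```python
def normalize_rows(M):
    """Rescale each row to sum to 1; all-zero rows are left unchanged."""
    result = []
    for row in M:
        total = sum(row)
        result.append([x / total for x in row] if total > 0 else list(row))
    return result


def row_times_matrix(v, M):
    """Product of a row vector v with a matrix M (list of rows)."""
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


def navigation_model(M0, pi0, alpha, L):
    """Return [pi^1, ..., pi^L] following Eqs. (3) and (4)."""
    pis = [row_times_matrix(pi0, M0)]  # Eq. (3)
    M_prev, pi_prev = M0, pi0          # M^(l-1) and pi^(l-1)
    for _ in range(1, L):
        # M^l: divide column j by 1 + pi^(l-1)_j, then renormalize the rows.
        M_l = normalize_rows(
            [[m / (1.0 + pi_prev[j]) for j, m in enumerate(row)] for row in M_prev]
        )
        pi_l = pis[-1]
        walk = row_times_matrix(pi_l, M_l)      # keep walking from pi^l
        restart = row_times_matrix(pi0, M_l)    # jump back to the start first
        pis.append([(1 - alpha) * w + alpha * r for w, r in zip(walk, restart)])  # Eq. (4)
        M_prev, pi_prev = M_l, pi_l
    return pis
```

On a toy graph where node 0 links to nodes 1 and 2, and both link back to node 0, starting from node 0 with $\alpha = 1$ keeps the probability mass on nodes 1 and 2 at every step, whereas $\alpha = 0$ brings the walker back to node 0 at the second click.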

Figure 4: Navigation model for different $\alpha$. The green nodes represent the starting navigation pages. Panels: (a) Star-like ($\alpha = 1$); (b) Star-like Rand. Navigation ($0 < \alpha < 1$); (c) Random Navigation ($\alpha = 0$).

Overall, as we will later see in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.

Looking deeper at the model:

• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader whose navigation consists in just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely semantically related to each other [49].

• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open sequential articles and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.

• If $\alpha = 0$ (Figure 4(c)), the users sequentially click links, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin following hyperlinks may be difficult.

Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume the jumping-back action to consist in clicking the back button of the browser in use until reaching again the session's starting page. The restart parameter indirectly embeds the back-button action, which, because of the absence of back-links on Wikipedia, cannot be tracked on the graph.

The behaviors replicated through the model recall those described in [43, 45].

4.2 Exposure Metrics

At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they can generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.

Figure 5: Pageviews distribution. For each topic, we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions, except for abortion, are statistically different at confidence level $\alpha = 0.95$. The topics in order are: abortion, cannabis, guns, evolution, racism, and LGBT. [Boxplot; y-axis: Log(Pageviews), from 0 to 10; x-axis groups: Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio., Racism, Anti-racism, Discrimination, Support.]

Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^\ell_P$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is

$$e^\ell_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^\ell = j) = \sum_{j \in \bar{P}} (\boldsymbol{\pi}^\ell_P)_j \tag{5}$$

and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.

We employ this metric in two ways:

(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M_u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.

(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers, characterized by the CwP and Navigation models, change their exposure to diverse information over a session (i.e., a sequence of clicks).

Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate the ExDIN, if we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want to have the ExDIN w.r.t. each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^\ell_P$ and sum up the probabilities of the nodes within each set.
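A minimal sketch of Eq. (5) and of its multi-set use follows. The function name, the node labels, and the probabilities are made up for illustration; the vector $\boldsymbol{\pi}^\ell_P$ is represented as a dictionary from node to probability.

```python
def exdin(pi, target):
    """Eq. (5): total probability of being in one of the target nodes."""
    return sum(pi[j] for j in target)


# Hypothetical reach probabilities at some step, starting from P.
pi_step = {"p1": 0.05, "q1": 0.10, "q2": 0.02, "z1": 0.03, "other": 0.80}
Q = {"q1", "q2"}
Z = {"z1"}

# Exposure with respect to each set separately.
per_set = {"Q": exdin(pi_step, Q), "Z": exdin(pi_step, Z)}

# Total exposure, taking the union of the sets as the destination.
total = exdin(pi_step, Q | Z)
```

When the destination sets are disjoint, as in this toy example, the total exposure is the sum of the per-set exposures.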

Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.

Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^\ell_{P \to \bar{P}}$ and $e^\ell_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is

$$\epsilon^\ell = \frac{\min\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}}{\max\{e^\ell_{P \to \bar{P}},\, e^\ell_{\bar{P} \to P}\}} \in [0, 1]. \tag{6}$$

Figure 6: Percentage of pageviews coming from the opposite side. Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism. [Bar chart; x-axis: % of pageviews from the opposite partition, from 0.0 to 2.5; categories: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism.]

If either $e^\ell_{P \to \bar{P}}$ or $e^\ell_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.

This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.
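Eq. (6), together with the convention for zero exposure, can be sketched as a small helper. The function name and the example values are hypothetical.

```python
def mutual_exposure(e_forward, e_backward):
    """Eq. (6): ratio between the smaller and the larger of the two exposures.

    Returns 0 when either exposure is 0, as stated after Definition 3.
    """
    if e_forward == 0 or e_backward == 0:
        return 0.0
    return min(e_forward, e_backward) / max(e_forward, e_backward)


# Made-up exposures: 2% in one direction, 0.5% in the other.
example_epsilon = mutual_exposure(0.02, 0.005)
```

In this made-up case the two directions differ by a factor of four, so the mutual exposure is 0.25; equal exposures would give 1.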

The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation from $P \to \bar{P}$ or from $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can think of using M-ExDIN to measure the bias in the network w.r.t. two sets of nodes.

Even though the mutual exposure to diverse information captures the balance among the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $P$ is 10 times larger than $\bar{P}$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^\ell_{\bar{P} \to P} \approx 10 \cdot e^\ell_{P \to \bar{P}}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced, and we proceed as follows. If $|\bar{P}| < |P|$, we define $P'$, obtained by sampling $|\bar{P}|$ articles from $P$. Thus, we use the new set for all computations. Because of the randomness of the phenomenon, we repeat the measurements multiple times.
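The size-balancing step can be sketched as follows. The function names are hypothetical, and two choices are assumptions of this sketch rather than details given in the text: whichever set is larger is the one that gets downsampled, and the repeated measurements are summarized by their mean.

```python
import random


def balance_partitions(P, P_bar, rng):
    """Downsample the larger set to the size of the smaller one."""
    P, P_bar = sorted(P), sorted(P_bar)  # lists, so that sampling is reproducible
    if len(P) > len(P_bar):
        P = rng.sample(P, len(P_bar))
    elif len(P_bar) > len(P):
        P_bar = rng.sample(P_bar, len(P))
    return set(P), set(P_bar)


def repeated_measure(P, P_bar, measure, repetitions=20, seed=0):
    """Average measure(P', P_bar') over several random balanced samples."""
    rng = random.Random(seed)
    values = [measure(*balance_partitions(P, P_bar, rng)) for _ in range(repetitions)]
    return sum(values) / len(values)
```

Here `measure` stands for any function of the two balanced sets, for example one that runs the navigation model and returns the ExDIN or the M-ExDIN.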

5 RQ1: READERS' TOPIC CONSUMPTION

Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that, when we count the click views of a page, we consider the average over the number of months the page existed.¹⁶ Accordingly, when computing the occurrences for the transition matrix based on the clickstream, we consider the average clicks of the link over the number of months it exists. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.

¹⁶ Based on the temporal graphs extracted by [11].
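A minimal sketch of this averaging, assuming that months in which a page or link did not exist are marked with `None` (the data layout and function name are illustrative, not taken from the paper):

```python
def average_monthly_count(monthly_counts):
    """Average a monthly count over the months in which the item existed.

    `monthly_counts` maps a month label to a click count, or to None when
    the page (or link) did not exist in that month.
    """
    existing = [count for count in monthly_counts.values() if count is not None]
    if not existing:
        return 0.0
    # Months of non-existence are excluded from the denominator, so a page
    # created late is not penalized for the months before its creation.
    return sum(existing) / len(existing)
```

For example, a link that did not exist in the first month and then received 10 and 20 clicks averages 15 clicks per month rather than 10.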

5.1 Pageviews Distribution

To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except abortion, the difference between the means of the opinions' pageviews is statistically significant at $p < 0.05$. This finding indicates that users tend to visit more pages expressing or supporting one of the two viewpoints. From a network perspective, to increase the exposure to opposing opinions, it is desirable for frequently visited pages to be well connected to articles expressing opposing opinions.

In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.
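The breakdown by origin can be sketched as below. The sketch assumes simplified clickstream rows of the form (source, target, count), where external origins appear as ordinary source labels, and it assumes that the denominator is the total number of recorded visits to the partition; both are illustrative choices rather than details confirmed by the paper.

```python
def opposite_side_share(clicks, partition, opposite):
    """Fraction of visits to `partition` that come from pages in `opposite`.

    `clicks` is an iterable of (source, target, count) rows.
    """
    partition, opposite = set(partition), set(opposite)
    total = 0
    from_opposite = 0
    for source, target, count in clicks:
        if target in partition:
            total += count
            if source in opposite:
                from_opposite += count
    return from_opposite / total if total else 0.0
```

Multiplying the result by 100 gives a percentage comparable to the quantity plotted in Figure 6.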

5.2 External or Internal Access to the Topic

We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more from external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages towards the two partitions.

Referring to Figure 7(c), 44.8% of views come from internal pages. The clickstream from internal pages is broken down to show the proportion of flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that most of the traffic from external websites also goes towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is concentrated on gun rights.

The abundance of traffic towards one of the two opinions does not characterize only the guns topic. Indeed, across all the topics, 59–74% of visits are accumulated by one partition. Moreover, readers' preferences appear consistent between external and internal accesses, that is, they both point more towards the same view of the topic. For both internal and external views, the distribution of accesses toward partitions is approximately the same (i.e., the percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.

In general, the largest share of visits to the topics' articles comes from external pages. In particular, only 23.6% and 33.5% of the traffic to evolution and racism, respectively, is generated by internal Wikipedia navigation. The same quantity for the remaining topics ranges between 44% and 47%.

Figure 7: Cumulative pages' traffic. Panels: (a) Abortion (Pro-life, Pro-choice), (b) Cannabis (Prohibition, Activism), (c) Guns (Control, Rights), (d) Evolution (Creationism, Evol. Bio.), (e) Racism (Racism, Anti-racism), (f) LGBT (Discrimination, Support). Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to the partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

Figure 8: Average click-through rate. In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Horizontal axis: Avg(Click-Through Rate) × 100. Topics, in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.

We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. It is instead more complicated to draw interpretations for the remaining topics. One explanation may be that users look for information that is generally less covered in the mainstream public debate.

5.3 How Much Readers Navigate Links

Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the pages' average click-through rate (Sect. 4.1.1).

We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So, most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated less internal traffic are those about evolutionary biology and LGBT rights.

Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52). This suggests that articles with a higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.

Topic       $\bar{C}(P)$   $\bar{C}(P \to P)$   $\bar{C}(P \to \bar{P})$   $\bar{C}(\bar{P})$   $\bar{C}(\bar{P} \to \bar{P})$   $\bar{C}(\bar{P} \to P)$
Abortion    68.89          90.82                57.64                      70.26                89.81                            45.37
Cannabis    70.81          95.01                37.50                      65.78                96.52                            16.67
Guns        52.34          78.69                35.68                      59.63                75.35                            39.28
Evolution   71.15          84.49                64.47                      72.69                99.00                            55.13
Racism      56.36          88.41                34.32                      71.87                90.63                            65.44
LGBT        61.66          89.42                52.5                       72.42                92.52                            59.17

Table 3: Average percentage of links within pages clicked fewer than 10 times (values in %). $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$; $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

In addition, we count the number of links clicked fewer than 10 times over the last three years (see Table 3). As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolutionary biology, readers left un-clicked (fewer than 10 clicks) 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.
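The per-page quantity behind these percentages can be sketched as follows, assuming a mapping from each outgoing link of a single page to its total click count; the threshold of 10 comes from the text, while the data layout and names are illustrative. Averaging the result over all pages of a partition would give a figure of the kind reported in Table 3.

```python
def rarely_clicked_share(link_clicks, threshold=10, targets=None):
    """Percentage of a page's links clicked fewer than `threshold` times.

    `link_clicks` maps each target article linked from the page to its total
    click count. If `targets` is given, only links pointing to that set of
    articles are considered.
    """
    counts = [
        clicks
        for target, clicks in link_clicks.items()
        if targets is None or target in targets
    ]
    if not counts:
        return 0.0
    rarely = sum(1 for clicks in counts if clicks < threshold)
    return 100.0 * rarely / len(counts)
```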

6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS

The main contribution of this paper is to examine to what extent Wikipedia's current topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.

6.1 Exposure to Diversity

To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, if we set it equal to 1, we study the exposure to diversity over one-click sessions.

Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.

Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions (for example, $P \to P$ or $P \to \bar{P}$). The $y$-axis indicates the source and the $x$-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model (uniform, position, clicks); the panels in each row cover, in order, abortion (pro-life, pro-choice), cannabis (prohibition, activism), guns (control, rights), evolution (creationism, evol. bio.), racism (racism, anti-racism), and LGBT (discrimination, support). Darker colors indicate a higher probability of being in the corresponding square in one click.

For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. By contrast, the chance of entering a page about evolutionary biology is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.
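The one-click quantities discussed above can be approximated with the following sketch. It assumes a reader who starts uniformly at random in the source set and then clicks one outgoing link uniformly at random, and it treats pages without outgoing links as contributing zero; these are simplifying assumptions about the uniform CwP model, whose exact definition is given earlier in the paper. The adjacency-list layout and function name are illustrative.

```python
def one_click_exposure(out_links, sources, targets):
    """Probability of landing in `targets` after one uniformly random click.

    `out_links` maps each page to the list of pages it links to. The start
    page is drawn uniformly at random from `sources`.
    """
    sources = list(sources)
    targets = set(targets)
    if not sources:
        return 0.0
    total = 0.0
    for page in sources:
        links = out_links.get(page, [])
        if links:
            # Fraction of this page's links that point into the target set.
            total += sum(1 for t in links if t in targets) / len(links)
    return total / len(sources)
```

Because links may also point to articles outside both partitions, the probabilities of staying in the same set and of crossing to the other one need not sum to 1, which is compatible with the small percentages reported above.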

It is worth pointing out that the current network's topology not only nudges users to read more about the same opinion, but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have a higher chance of reading an article about creationism (3 times more) than users reading about creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous view and makes the measure of mutual exposure to diverse information smaller than 1.

The findings above report the intrinsic capability of the network to spur users towards diverse content. To combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that, if users are more likely to click links at the top of the page, the probabilities are only slightly modified w.r.t. the uniform model. These modest variations are coherent with the links' placement within pages (Figure 2).

For a few topics, such as guns, the links' position plays a more significant role, worsening the users' exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, drops by 30% w.r.t. the probability observed using the uniform CwP model. Therefore, we conclude that, for some topics, the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who are more likely to click the links located towards the beginning of a page have fewer opportunities to read about contrasting opinions.

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.

For all the topics, we observe a significant increase in the probability of visiting pages of the same opinion. A simple interpretation of this result is that real users more often click links closely related to the page they are reading. From another perspective, combining this finding with the previous remark that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view", we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer this question, but we hope it will be addressed in future work.

Furthermore, we observe that, for some topics like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing diverse views.

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new or current links have on users' exposure to diverse information. This is not meant to alter this core, intrinsic property of Wikipedia, but rather to prevent it from becoming harmful when it keeps users from accessing diverse content.

Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. Panels, from left to right: abortion, cannabis, guns, evolution, racism, LGBT; the $x$-axis is the number of clicks. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter ($\alpha = 0$, $\alpha = 0.2$, $\alpha = 1$). The markers' shape indicates the CwP model in use (uniform, position, clicks). Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).

6.2 Dynamic Exposure to Diversity

In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. Depending on the combination of models employed to measure the ExDIN and M-ExDIN, we obtain different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.

In Figure 10, for sessions of length up to 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. Each topic shows its own trends. For this reason, we highlight and explain the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done earlier in the paper. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that, when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model corresponding to the highest exposure is $M_c$, followed by $M_u$ and $M_p$.

(2) If users navigate with a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because these users are only subject to the exposure of their starting page. So, the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session; then the trend inverts. In one case, it reaches or exceeds the starting exposure. In the other, it grows and then levels off below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., the ExDIN from $P$ to $\bar{P}$ of LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have already been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.

Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the gun control partition, the users with the highest exposure to gun rights pages (2% probability) are those characterized by a star-like behavior. As soon as users navigate in a random fashion, this probability drops. This is because the gun rights partition has a number of incoming edges that prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in gun rights. Indeed, for users randomly navigating the graph, the exposure to the gun control partition is higher than or comparable to that of star-like-behaving users. The probability of reaching gun control after many steps becomes high because of the larger number of incoming edges of the partition.
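The role of the restart parameter $\alpha$ in the patterns above can be illustrated with a small sketch. It rests on assumptions that are ours, not the paper's: before each click the reader jumps back to the starting page with probability $\alpha$ (so $\alpha = 1$ gives the star-like behavior and $\alpha = 0$ a plain random walk), the next link is chosen uniformly at random, a session ends on pages without outgoing links, and the exposure at step $k$ is the probability of being in the target set right after the $k$-th click. The navigation models defined earlier in the paper remain the reference.

```python
def exposure_over_session(out_links, sources, targets, steps, alpha):
    """Probability of being in `targets` after each of `steps` clicks.

    The start page is drawn uniformly at random from `sources`.
    """
    sources = list(sources)
    targets = set(targets)
    exposures = [0.0] * steps
    for start in sources:
        current = {start: 1.0}  # probability of being on each page
        for k in range(steps):
            # With probability alpha the reader restarts from the start page,
            # otherwise the next click is made from the current page.
            origin = {page: (1.0 - alpha) * p for page, p in current.items()}
            origin[start] = origin.get(start, 0.0) + alpha * sum(current.values())
            following = {}
            for page, p in origin.items():
                links = out_links.get(page, [])
                for target in links:
                    following[target] = following.get(target, 0.0) + p / len(links)
            current = following
            exposures[k] += sum(p for page, p in current.items() if page in targets)
    return [e / len(sources) for e in exposures] if sources else exposures
```

With $\alpha = 1$ every click is made from the starting page, so the sketch returns the same value at every step, in line with the steady curves described for star-like users.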

From a complementary analysis, we observe that, for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This indicates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures depicting these phenomena could not be displayed.

Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. We recall that, to do so, we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide an equal exposure across opinions. If we consider topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are: (1) the knowledge in the encyclopedia is complete but articles are underlinked [49], that is, the content and the keywords that could become anchors exist, and thus we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge in the encyclopedia is incomplete, that is, there are no words to attach links to, and thus the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.

7 CONCLUSIONS

Our work provides a first analysis of how Wikipedia's current network topology assists readers in exploring opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics, the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable for the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we also study the users' overall topic consumption. Our main findings suggest the following.

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially greater tendency, albeit small, to visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.2, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who make only one click.

In our study, we analyze sets of articles assigned to opinions according to editor-crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one which should also take into account the content of the articles along with the link information. Secondly, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of the users' behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate it, and our hope is that this work is a step towards this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets."

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.

[5] David Blei Lawrence Carin and David Dunson 2010 Probabilistic topic models

IEEE signal processing magazine 27 6 (2010) 55ndash65[6] DavidMBlei Andrew YNg andMichael I Jordan 2003 Latent dirichlet allocation

Journal of machine Learning research 3 Jan (2003) 993ndash1022

[7] Ulrik Brandes Patrick Kenis Juumlrgen Lerner and Denise Van Raaij 2009 Net-

work analysis of collaboration structure in Wikipedia In Proceedings of the 18thinternational conference on World wide web

[8] Ewa S Callahan and Susan C Herring 2011 Cultural bias in Wikipedia content

on famous persons Journal of the American society for information science andtechnology (2011)

[9] Uthsav Chitra and Christopher Musco 2020 Analyzing the Impact of Filter

Bubbles on Social Network Polarization In Proceedings of the 13th InternationalConference on Web Search and Data Mining ACM

[10] Michael D Conover Jacob Ratkiewicz Matthew Francisco Bruno Gonccedilalves

Filippo Menczer and Alessandro Flammini 2011 Political polarization on twitter

In Fifth international AAAI conference on weblogs and social media[11] Cristian Consonni David Laniado and AlbertoMontresor 2019 WikiLinkGraphs

A complete longitudinal and multi-language dataset of the Wikipedia link net-

works In Proceedings of the International AAAI Conference on Web and SocialMedia Vol 13 598ndash607

[12] Alessandro Cossard Gianmarco De Francisci Morales Kyriaki Kalimeri Yelena

Mejova Daniela Paolotti and Michele Starnini 2020 Falling into the Echo Cham-

ber The Italian Vaccination Debate on Twitter In Proceedings of the InternationalAAAI Conference on Web and Social Media

[13] Alexander Dallmann Thomas Niebler Florian Lemmerich and Andreas Hotho

2016 Extracting Semantics from Random Walks on Wikipedia Comparing

Learning and Counting Methods In Wiki ICWSM

[14] Dimitar Dimitrov and Florian Lemmerich 2019 Democracy and difference Dif-

ferent topic different traffic How search and navigation interplay on Wikipedia

The Journal of Web Science 6 (2019) 67ndash94[15] Dimitar Dimitrov Philipp Singer Florian Lemmerich and Markus Strohmaier

2016 Visual positions of links and clicks on wikipedia In Proceedings of the 25thInternational Conference Companion on World Wide Web 27ndash28

[16] Dimitar Dimitrov Philipp Singer Florian Lemmerich and Markus Strohmaier

2017 What makes a link successful on wikipedia In Proceedings of the 26thInternational Conference on World Wide Web 917ndash926

[17] Dimitar Dimitrov Philipp Singer Florian Lemmerich and Markus Strohmaier

2017 What Makes a Link Successful on Wikipedia In Proceedings of the 26thInternational Conference on World Wide Web

[18] Besnik Fetahu Katja Markert Wolfgang Nejdl and Avishek Anand 2016 Finding

news citations for wikipedia In Proceedings of the 25th ACM International onConference on Information and Knowledge Management 337ndash346

[19] Seth Flaxman Sharad Goel and Justin M Rao 2016 Filter bubbles echo chambers

and online news consumption Public opinion quarterly 80 S1 (2016) 298ndash320

[20] Andrea Forte Vanesa Larco and Amy Bruckman 2009 Decentralization in

Wikipedia governance Journal of Management Information Systems 26 1 (2009)49ndash72

[21] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael

Mathioudakis 2017 Reducing Controversy by Connecting Opposing Views In

Proceedings of the Tenth ACM International Conference on Web Search and DataMining (WSDM rsquo17)

[22] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael

Mathioudakis 2018 Political discourse on social media Echo chambers gate-

keepers and the price of bipartisanship In Proceedings of the 2018 World WideWeb Conference 913ndash922

[23] Kiran Garimella Aristides Gionis Nikos Parotsidis and Nikolaj Tatti 2017 Bal-

ancing information exposure in social networks In Advances in Neural Informa-tion Processing Systems 4663ndash4671

[24] Kiran Garimella Gianmarco De Francisci Morales Aristides Gionis and Michael

Mathioudakis 2018 Quantifying controversy on social media ACM Transactionson Social Computing (2018)

[25] Patrick Gildersleve and Taha Yasseri 2018 Inspiration captivation and misdi-

rection Emergent properties in networks of online navigation In InternationalWorkshop on Complex Networks Springer 271ndash282

[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.

[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.

[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.

[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.

[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.

[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of Wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.

[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.

[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.

[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.

[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.

[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.

[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.

[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.

[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.

[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.

[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.

[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on YouTube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.

[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.

[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).

[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read Wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.

[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.

[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).

[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.

[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.

[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.

[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.

[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.

[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).

[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing Wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.

  • Abstract
  • 1 Introduction
  • 2 Related Works
  • 3 Data Collection
    • 3.1 Topic Induced Networks
    • 3.2 General Statistics on Topics Networks
  • 4 Metrics
    • 4.1 Content Consumption
    • 4.2 Exposure Metrics
  • 5 RQ1: Readers' Topic Consumption
    • 5.1 Pageviews Distribution
    • 5.2 External or Internal Access to the Topic
    • 5.3 How Much Readers Navigate Links
  • 6 RQ2: Exposure Across Topic Viewpoints
    • 6.1 Exposure to Diversity
    • 6.2 Dynamic Exposure to Diversity
  • 7 Conclusions
  • References

Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia

Figure 4: Navigation model for different $\alpha$: (a) star-like ($\alpha = 1$); (b) star-like random navigation ($0 < \alpha < 1$); (c) random navigation ($\alpha = 0$). The green nodes represent the starting navigation pages.

Overall, as we will later see in Section 4.2, this approach allows us to investigate how the exposure to diverse information varies for users who behave differently in terms of navigation session length (meant as the number of clicks) and next-link choices.

Looking deeper at the model:

• When $\alpha = 1$ (Figure 4(a)), the model emulates the reader whose navigation consists of just opening links from the starting page. We call this behavior star-like. With this kind of exploration, readers locally explore articles that are likely semantically related to each other [49].

• For $0 < \alpha < 1$ (Figure 4(b)), we simulate two cases: (1) readers open sequential articles and then jump back to the starting page; (2) readers keep multiple paths open. The closer $\alpha$ is to 1, the more users show a star-like behavior. Instead, the closer $\alpha$ is to 0, the more users navigate in a DFS-oriented fashion. Thus, readers move randomly according to the CwP model and from time to time jump back to the starting page.

• If $\alpha = 0$ (Figure 4(c)), the user sequentially clicks links, so each click depends only on the CwP model. In this case, especially if related articles are not densely connected, the exploration can lead to articles less related to the starting page, and returning to the origin by following hyperlinks may be difficult.

Because Wikipedia does not have a button that allows readers to go back to the previous page, we assume the jumping-back action to consist of clicking the back button of the browser in use until reaching again the session's starting page. The restart parameter indirectly embeds the back-button action, which, because of the absence of back-links on Wikipedia, cannot be tracked on the graph.

The behaviors replicated through the model recall those described in [43, 45].
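The role of the restart parameter can be illustrated with a short sketch. It assumes, as a simplification not taken from the formal definition of the model, that at each click the reader continues from the starting page with probability $\alpha$ and from the current page with probability $1 - \alpha$, and that the CwP model is given as a row-stochastic matrix `M`. The three-page graph and the function names are hypothetical.

```python
def step(dist, M):
    """One click: push the probability mass in `dist` through the row-stochastic matrix M."""
    out = [0.0] * len(M)
    for i, mass in enumerate(dist):
        if mass:
            for j, p in enumerate(M[i]):
                out[j] += mass * p
    return out


def navigate(start, M, alpha, clicks):
    """Return [pi^1, ..., pi^clicks] for a reader who restarts from `start` with probability alpha."""
    history, current = [], list(start)
    for _ in range(clicks):
        # Assumption: before each click the reader is at the start with prob. alpha,
        # otherwise at the page reached by the previous click.
        mixed = [alpha * s + (1 - alpha) * c for s, c in zip(start, current)]
        current = step(mixed, M)
        history.append(current)
    return history


# Hypothetical 3-page graph: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
M = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
start = [1.0, 0.0, 0.0]

star_like = navigate(start, M, alpha=1.0, clicks=3)    # always clicks from the starting page
random_walk = navigate(start, M, alpha=0.0, clicks=3)  # each click depends only on M
```

Under this reading, $\alpha = 1$ yields the same distribution at every click, whereas $\alpha = 0$ lets the probability mass drift away from the neighborhood of the starting page.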

4.2 Exposure Metrics

At this point, we have all the ingredients to define the exposure to diverse information. The metrics aim to quantify how much the network structure allows readers to reach one or multiple sets of articles. To do that, we rely on both the CwP and Navigation models. The application of the following metrics is not limited to polarizing topics. In fact, they can generalize to the analysis of any sets of nodes in a graph. For this reason, we adopt a more general notation in their definition.

Figure 5: Pageviews distribution (y-axis: Log(Pageviews); groups: Pro-life, Pro-choice, Prohibition, Activism, Control, Rights, Creationism, Evol. Bio., Racism, Anti-racism, Discrimination, Support). For each topic we have a purple and a yellow boxplot. They represent the average (over all pages in the group $P$ or $\bar{P}$) number of pageviews. All the distributions except for abortion are statistically different at confidence level $\alpha = 0.95$. The topics in order are abortion, cannabis, guns, evolution, racism, and LGBT.

Definition 2 (Exposure to diverse information (ExDIN)). Given two sets of pages $P, \bar{P}$ in $V$, let $\boldsymbol{\pi}^{\ell}_{P}$ be the vector indicating, for each article, the probability of being reached at step $\ell$ ($\ell \ge 1$) starting from a random page in $P$. We say that the exposure of $P$ to $\bar{P}$ is

$e^{\ell}_{P \to \bar{P}} = \sum_{j \in \bar{P}} \Pr(X^{\ell} = j) = \sum_{j \in \bar{P}} \boldsymbol{\pi}^{\ell}_{P}(j)$, (5)

and describes the probability that a reader in $P$ reaches an arbitrary node in $\bar{P}$ at the $\ell$th click.

We employ this metric in two ways:

(1) (Topological exposure to diverse information) If $\ell$ is 1 and the CwP model is $M_u$ (see Sect. 4.1.2), it only quantifies the topological property of the network to connect pages belonging to different sets.

(2) (Readers' exposure to diverse information) For any parameter and model that we pick, the metric tells us how the readers characterized by the CwP and Navigation models change their exposure to diverse information over a session (i.e., a sequence of clicks).
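The topological case (one click, uniform clicks) can be sketched as follows. The adjacency lists and page identifiers are hypothetical, and the function assumes that the starting page is drawn uniformly from the source set and that every out-link of a page is equally likely to be clicked.

```python
def exposure(source, target, links):
    """One-click exposure of `source` to `target` under uniform clicks.

    Assumes the reader starts from a uniformly random page in `source`
    and clicks one of its out-links uniformly at random.
    """
    total = 0.0
    for page in source:
        out = links.get(page, [])
        if out:  # pages without out-links contribute no probability mass
            total += sum(1 for q in out if q in target) / len(out)
    return total / len(source)


# Hypothetical topic-induced network with four pages.
links = {0: [1, 2], 1: [0], 2: [3], 3: [2]}
P, P_bar = {0, 1}, {2, 3}

within = exposure(P, P, links)         # probability of staying in P
across = exposure(P, P_bar, links)     # probability of moving from P to P_bar
backward = exposure(P_bar, P, links)   # probability of moving from P_bar to P
```

In this toy graph the only path across the two sets goes from page 0 to page 2, so the exposure is asymmetric: it is positive from `P` to `P_bar` and zero in the opposite direction.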

Moreover, we notice that Definition 2 can be extended to multiple sets. Consider the case where we want to understand how one set of nodes $P$ is exposed to three sets of nodes $Q$, $Z$, and $L$. To calculate the ExDIN, if we want to know the total exposure to the three sets, we define $\bar{P} = Q \cup Z \cup L$. Otherwise, if we want to have the ExDIN w.r.t. each set, namely $e_{P \to Q}$, $e_{P \to Z}$, $e_{P \to L}$, we take $\boldsymbol{\pi}^{\ell}_{P}$ and sum up the probabilities of the nodes within each set.
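The multi-set extension amounts to summing the same reach vector over different index sets. The sketch below uses a hypothetical vector `pi` over five pages and hypothetical set memberships; the sum of the per-set values equals the exposure to the union only when the sets are disjoint, which is assumed here.

```python
def exposure_by_set(pi, groups):
    """Sum the reach probabilities in `pi` over the nodes of each named set."""
    return {name: sum(pi[j] for j in nodes) for name, nodes in groups.items()}


# Hypothetical reach vector pi^l_P over five pages (page 0 belongs to P itself).
pi = [0.10, 0.20, 0.05, 0.15, 0.50]
groups = {"Q": [1], "Z": [2, 3], "L": [4]}  # assumed pairwise disjoint

per_set = exposure_by_set(pi, groups)   # e_{P->Q}, e_{P->Z}, e_{P->L}
total = sum(per_set.values())           # exposure to the union of Q, Z, and L
```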

Now that we have a metric to compute the exposure to diverse information, we want to compare the flows among the sets. Thus, we introduce the mutual exposure to diverse information.

Definition 3 (Mutual exposure to diverse information (M-ExDIN)). Let $e^{\ell}_{P \to \bar{P}}$ and $e^{\ell}_{\bar{P} \to P}$ be the exposure to diverse information of sets $P$ and $\bar{P}$. We say that the mutual exposure between the sets is

$\epsilon^{\ell} = \dfrac{\min\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}}{\max\{e^{\ell}_{P \to \bar{P}},\, e^{\ell}_{\bar{P} \to P}\}} \in [0, 1]$. (6)

WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal

Figure 6: Percentage of pageviews coming from the opposite side (x-axis: % of pageviews from the opposite partition). Topics in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.

If either $e^{\ell}_{P \to \bar{P}}$ or $e^{\ell}_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.

This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.

The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation from $P \to \bar{P}$ or from $\bar{P} \to P$. From this perspective, if we observe a tendency in the network of facilitating the exploration from one of the sets to the other, we may say that the network topology is biased toward a direction. Thus, we can think of using M-ExDIN to measure the bias in the network w.r.t. two sets of nodes.
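The ratio in Definition 3 can be sketched in a few lines. The function name is hypothetical, and the two exposure values in the usage line are illustrative numbers rather than measurements; the zero case is handled explicitly so that two null exposures do not cause a division by zero.

```python
def mutual_exposure(e_forward, e_backward):
    """M-ExDIN: ratio between the smaller and the larger of the two exposures."""
    low, high = min(e_forward, e_backward), max(e_forward, e_backward)
    if low == 0:
        return 0.0  # by definition, epsilon is 0 if either exposure is 0
    return low / high


# Illustrative values: 0.18% exposure in one direction, 0.61% in the other.
epsilon = mutual_exposure(0.0018, 0.0061)
```

A value of `epsilon` well below 1, as in this example, indicates that one direction of navigation is favored over the other.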

Even though the mutual exposure to diverse information captures the balance among the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^{\ell}_{P \to \bar{P}} \approx 10 \cdot e^{\ell}_{\bar{P} \to P}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced, and we proceed as follows. If $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$. Thus, we use the new set for all computations. Because of the randomness of the phenomenon, we repeat the measurements multiple times.
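The size-balancing step can be sketched as follows, assuming uniform sampling without replacement from the larger set. The function names, the seed, and the placeholder metric are hypothetical; in practice `metric` would be an ExDIN or M-ExDIN computation on the two balanced sets.

```python
import math
import random


def balance(P, P_bar, rng):
    """Downsample the larger set so that both sets have the same number of pages."""
    size = min(len(P), len(P_bar))
    # Sampling `size` items from the smaller set returns that whole set.
    return set(rng.sample(sorted(P), size)), set(rng.sample(sorted(P_bar), size))


def repeated(metric, P, P_bar, runs, seed=0):
    """Mean and standard deviation of `metric` over repeated balanced samples."""
    rng = random.Random(seed)
    values = [metric(*balance(P, P_bar, rng)) for _ in range(runs)]
    mean = sum(values) / runs
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / runs)
    return mean, std


# Hypothetical sets: P has 3 pages, P_bar has 30.
P, P_bar = set(range(3)), set(range(10, 40))
sample_P, sample_P_bar = balance(P, P_bar, random.Random(0))

# Placeholder metric: share of pages that belong to the second set.
mean, std = repeated(lambda a, b: len(b) / (len(a) + len(b)), P, P_bar, runs=100)
```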

5 RQ1: READERS' TOPIC CONSUMPTION

Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that when we count the click views of a page, we consider the average over the number of months the page existed.16 Accordingly, when computing the occurrences for the transition matrix based on clickstream, we consider the average clicks of the link over the number of months it exists. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.

16 Based on the temporal graphs extracted by [11].
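The averaging rule can be sketched as below, assuming that each page or link comes with one click count per month and that months in which it did not exist are marked with `None`. This data layout and the function name are hypothetical.

```python
def average_monthly_clicks(monthly_counts):
    """Average clicks over the months in which the page or link existed.

    `monthly_counts` holds one entry per month; None marks a month in which
    the page (or link) did not exist and is therefore not counted.
    """
    observed = [c for c in monthly_counts if c is not None]
    return sum(observed) / len(observed) if observed else 0.0


# Hypothetical link created in the third month of a five-month window.
link_clicks = [None, None, 30, 10, 20]
weight = average_monthly_clicks(link_clicks)
```

Dividing by the number of months of existence, rather than by the length of the whole window, avoids penalizing links that were added recently.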

5.1 Pageviews Distribution

To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except for abortion, the difference of the means of opinions' pageviews is statistically significant for $p < 0.05$. This finding demonstrates that users tend to visit more pages expressing/supporting one of the two viewpoints. From a network's perspective, to increase the exposure to opposing opinions, it is desirable for pages that are frequently visited to be well connected to articles expressing opposing opinions.

In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.

5.2 External or Internal Access to the Topic

We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more by external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages towards the two partitions.

Referring to Figure 7(c), 44.8% of views come from internal pages. The clickstream from internal pages is broken down to show the proportion of flow towards gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that, also from external websites, most of the traffic is towards gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is concentrated on gun rights.

The abundance of traffic towards one of the two opinions does not characterize only the guns topic. Indeed, among all the topics, 59–74% of visits are accumulated by one partition. Moreover, readers' preferences appear consistent among external and internal accesses; that is, they both point more towards the same view of the topic. For both internal and external views, the distribution of accesses toward partitions is approximately the same (i.e., the percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as that from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.

In general, the largest amount of visits to topics' articles comes from external pages. In particular, only 23.6% and 33.5% of traffic to evolution and racism is generated by the internal Wikipedia's

Figure 7: Cumulative pages' traffic. Panels: (a) Abortion, (b) Cannabis, (c) Guns, (d) Evolution, (e) Racism, (f) LGBT. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

Figure 8: Average click-through rate (x-axis: Avg(Click Through Rate) × 100). In this plot we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.

navigation. The same quantity for the remaining topics ranges between 44% and 47%.

We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the public mainstream debate.

5.3 How Much Readers Navigate Links

Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the average pages' click-through rate (Sect. 4.1.1).

We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated less internal traffic are those about evolutionary biology and LGBT rights.

Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52).

Topic       $\bar{C}(P)$   $\bar{C}(P \to P)$   $\bar{C}(P \to \bar{P})$   $\bar{C}(\bar{P})$   $\bar{C}(\bar{P} \to \bar{P})$   $\bar{C}(\bar{P} \to P)$
Abortion    68.89          90.82                57.64                      70.26                89.81                            45.37
Cannabis    70.81          95.01                37.50                      65.78                96.52                            16.67
Guns        52.34          78.69                35.68                      59.63                75.35                            39.28
Evolution   71.15          84.49                64.47                      72.69                99.00                            55.13
Racism      56.36          88.41                34.32                      71.87                90.63                            65.44
LGBT        61.66          89.42                5.25                       72.42                92.52                            59.17

Table 3: Average % of links within pages clicked less than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$. $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

This suggests that articles having higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.
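For reference, a Pearson correlation coefficient between out-degree and CTR can be computed as in the sketch below. The two lists are hypothetical values chosen only to illustrate the call; they are not the data behind the coefficient reported in the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical pages: number of out-links and click-through rate of each page.
out_degree = [12, 30, 45, 80, 150]
ctr = [0.10, 0.18, 0.12, 0.22, 0.28]
r = pearson(out_degree, ctr)
```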

In addition, we count the number of links clicked fewer than 10 times over the last three years (see Table 3). As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of links pointing to creationism and 64.47% of those pointing to evolutionary biology.
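The per-page quantity behind Table 3 can be sketched as follows, assuming that for each page we know its out-links and the total clicks each link received. The link names, click counts, and function name are hypothetical; links missing from the click log are treated as having zero clicks.

```python
def unclicked_share(page_links, clicks, threshold=10, targets=None):
    """Percentage of a page's links clicked fewer than `threshold` times.

    If `targets` is given, only links pointing into that set are considered.
    Returns None when the page has no link of the requested kind.
    """
    links = [link for link in page_links if targets is None or link in targets]
    if not links:
        return None
    rare = sum(1 for link in links if clicks.get(link, 0) < threshold)
    return 100.0 * rare / len(links)


# Hypothetical page with four out-links; "c" never appears in the click log.
page_links = ["a", "b", "c", "d"]
clicks = {"a": 250, "b": 3, "d": 12}

overall = unclicked_share(page_links, clicks)                      # all links
same_side = unclicked_share(page_links, clicks, targets={"a", "b"})
other_side = unclicked_share(page_links, clicks, targets={"c", "d"})
```

Averaging these per-page percentages over all pages of a partition would give one cell of the table under the stated assumptions.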

6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS

The main contribution of this paper is to examine to what extent Wikipedia's current topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.

6.1 Exposure to Diversity

To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recalling that $\ell$ indicates the users' session length, if we set it equal to 1, we study the exposure to diversity over one-click sessions.

Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP is $M_u$.

For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. On the contrary, the chances of entering a page about evolutionary biology

Figure 9: ExDIN. Panels, one column per topic: abortion (Pro-life, Pro-choice), cannabis (Prohibition, Activism), guns (Control, Rights), evolution (Creationism, Evol. Bio.), racism (Racism, Anti-racism), LGBT (Discrimination, Support); one row per CwP model (Uniform, Position, Clicks). Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to P$, $P \to \bar{P}$. The $y$-axis indicates the source and the $x$-axis is the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.

is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

It is worth pointing out that the current network's topology not only nudges users to read more about the same opinion, but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have higher chances of reading one article about creationism (3 times more) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network facilitates users to explore knowledge bubbles of homogeneous view and makes the measure of mutual exposure to diverse information smaller than 1.

The findings above report the intrinsic capability of the network to spur users towards diverse content. If we want to combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users click with higher chances links at the top of the page, the probabilities are only slightly modified w.r.t. the uniform model. These modest variations are coherent with links' placement within pages (Figure 2).

For a few topics, such as guns, the links' position plays a more significant role, worsening the user exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, has a 30% drop w.r.t. the probability observed using the uniform CwP model. Therefore, we conclude that, for some topics, the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located towards the beginning of a page have less possibility to read about contrasting opinions.

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.

For all the topics, we verify a significant increment of the probability of visiting pages of the same opinion. Simply interpreting this result, we can say that real users click more on the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view", we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer this question, but we hope it will be addressed in future work.

Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing diverse views.

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new or current links have on users' exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to keep this property from becoming harmful when it prevents users from accessing diverse content.

Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. Panels, one column per topic: Abortion, Cannabis, Guns, Evolution, Racism, LGBT; the x-axis is the number of clicks. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter ($\alpha = 0$, $\alpha = 0.2$, $\alpha = 1$). The markers' shape indicates the CwP model in use (uniform, position, clicks). Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Section 4.2).

6.2 Dynamic Exposure to Diversity

In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. According to the combinations of models employed to measure the ExDIN and M-ExDIN, we provide different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.

In Figure 10, for sessions of length 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. We can notice that each of the topics shows its own trends. For this reason, we decide to highlight and provide an explanation for the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done above in the paper. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that, when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model corresponding to the highest exposure is $M_c$, followed by $M_u$ and $M_p$.

(2) If users navigate with a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because this kind of user is only subject to the exposure of their starting navigation page. So, the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session. Then the trend inverts. In one case, it reaches or exceeds the starting exposure; in the other, it grows and levels off below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., the ExDIN from $P$ to $\bar{P}$ for LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have already been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users randomly exploring the network, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.
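The navigation behaviors described in (1) to (4) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation. It assumes that the restart parameter $\alpha$ is the probability of returning to the start page before each click, that clicks are chosen uniformly among out-links (the uniform CwP model), and that the exposure at click $\ell$ is the fraction of sessions whose $\ell$-th visited page lies in the destination partition. The function name `simulate_exposure` and the dictionary-based graph format are hypothetical.

```python
import random


def simulate_exposure(links, start_pages, target, alpha, n_clicks,
                      n_sessions=2000, seed=0):
    """Estimate, for each click 1..n_clicks, the fraction of sessions
    whose page at that click belongs to `target`.

    links: dict mapping a page to the list of pages it links to.
    start_pages: list of pages of the source partition.
    alpha: assumed restart probability (1 = star-like, 0 = random walk).
    """
    rng = random.Random(seed)
    hits = [0] * n_clicks
    for _ in range(n_sessions):
        start = rng.choice(start_pages)
        current = start
        for step in range(n_clicks):
            # With probability alpha the reader goes back to the start page
            # before choosing the next link.
            if rng.random() < alpha:
                current = start
            out_links = links.get(current, [])
            if not out_links:
                break  # dead end: the session stops here
            current = rng.choice(out_links)  # uniform next-click choice
            if current in target:
                hits[step] += 1
    return [h / n_sessions for h in hits]
```

Under these assumptions, a star-like reader ($\alpha = 1$) sees the same one-click exposure at every step, whereas a random walker ($\alpha = 0$) drifts away from the start page, which matches the qualitative difference between the green and sky-blue curves described above.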

Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the gun control partition, the users with the highest exposure to gun rights pages (2% probability) are those characterized by a star-like behavior. As soon as users navigate in a random fashion, this probability drops. This is because the gun rights partition has a number of incoming edges that prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in gun rights. Indeed, for users randomly navigating the graph, the exposure to the gun control partition is higher than or comparable to that of star-like behaving users. The probability of reaching gun control after many steps becomes high because of the higher number of incoming edges of the partition.

WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal

From complementary analysis, we observe that, for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This indicates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.

Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. Recall that, to do so, we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide equal exposure across opinions. For topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are: (1) the knowledge on the encyclopedia is complete but articles are underlinked [49], that is, the content and the keywords that could become anchors exist, so we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete, that is, there are no words to attach links to, so the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.

7 CONCLUSIONS

Our work provides a first analysis to understand how the current Wikipedia network topology assists readers in exploring opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the Exposure to diverse information and the Mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable for the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we also study the users' overall topic consumption. Our main findings suggest the following:

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite ones. Furthermore, for each topic we explored, users of one of the two views had a substantially higher tendency, albeit a small one in absolute terms, to visit pages of the opposing view than users of the other view.

• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.1, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that, after a few clicks, the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who only do one click.

In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also take into account the content of the articles alongside the link information. Second, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of users' behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate it, and our hope is that this work is a step toward this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets".

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.

[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.

[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.

[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.


[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.

[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.

[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.

[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).

[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.

[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.

[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.

[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.

[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.

[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.

[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.

[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 917–926.

[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia. In Proceedings of the 26th International Conference on World Wide Web.

[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.

[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.

[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.

[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).

[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.

[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.

[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).

[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.

[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.

[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.

[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.

[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.

[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.

[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.

[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.

[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.

[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.

[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.

[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.

[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.

[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.

[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.

[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.

[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.

[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.

[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.

[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).

[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.

[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9. 41–70.

[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).

[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.

[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.

[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.

[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.

[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.

[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22. (2017).

[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.

  • Abstract
  • 1 Introduction
  • 2 Related Works
  • 3 Data Collection
    • 3.1 Topic Induced Networks
    • 3.2 General Statistics on Topics Networks
  • 4 Metrics
    • 4.1 Content Consumption
    • 4.2 Exposure Metrics
  • 5 RQ1: Readers' Topic Consumption
    • 5.1 Pageviews Distribution
    • 5.2 External or Internal Access to the Topic
    • 5.3 How Much Readers Navigate Links
  • 6 RQ2: Exposure Across Topic Viewpoints
    • 6.1 Exposure to Diversity
    • 6.2 Dynamic Exposure to Diversity
  • 7 Conclusions
  • References


[Figure 6 plot omitted. x-axis: % of pageviews from opposite partition (0.0 to 2.5). Bars: Pro-life, Pro-choice; Prohibition, Activism; Gun control, Gun rights; Creationism, Evolutionary biology; LGBT discrimination, LGBT rights; Racism, Anti-racism.]

Figure 6: Percentage of pageviews coming from the opposite side. Topics, in order from the bottom: abortion, cannabis, guns, evolution, LGBT, racism.

If either $e^{\ell}_{P \to \bar{P}}$ or $e^{\ell}_{\bar{P} \to P}$ is 0, then $\epsilon = 0$.

This measure quantifies to what extent the exposure to diverse information is balanced across $P$ and $\bar{P}$.

The closer $\epsilon$ is to 1, the more balanced the probabilities of moving from one set to the other are. In this case, the network topology does not favor connections from one set to the other. On the other hand, if $\epsilon$ is close to 0, the network structure tends to favor either the navigation $P \to \bar{P}$ or $\bar{P} \to P$. From this perspective, if we observe a tendency in the network to facilitate the exploration from one of the sets to the other, we may say that the network topology is biased toward one direction. Thus, we can use M-ExDIN to measure the bias in the network w.r.t. two sets of nodes.
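The properties stated above (zero when either exposure is zero, values near 1 for balanced exposures, and a value of about 0.1 when one exposure is ten times the other) can be reproduced with a simple ratio. The sketch below is an assumption made for illustration: it takes the mutual exposure to be the smaller of the two exposures divided by the larger, which is consistent with these properties but may differ from the exact definition given in Sect. 4.2. The function name `mutual_exposure` is hypothetical.

```python
def mutual_exposure(e_p_to_pbar, e_pbar_to_p):
    """Balance between two directional exposures, in [0, 1].

    Assumed form: min / max of the two exposures, with 0 returned when
    either direction has zero exposure.
    """
    if e_p_to_pbar == 0 or e_pbar_to_p == 0:
        return 0.0
    low, high = sorted((e_p_to_pbar, e_pbar_to_p))
    return low / high
```

For example, exposures of 0.61% and 0.18% give a ratio of about 0.3 under this assumed form, indicating a clear asymmetry between the two directions.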

Even though the mutual exposure to diverse information captures the balance between the ExDIN of $P$ and $\bar{P}$ when they are of comparable size, it may fail if one is much smaller than the other. For instance, suppose $\bar{P}$ is 10 times larger than $P$; then, if pages of both partitions have a similar out-degree distribution, one would expect $e^{\ell}_{P \to \bar{P}} \approx 10 \cdot e^{\ell}_{\bar{P} \to P}$ and, as a result, $\epsilon \approx 0.1$. The same happens if they have a similar in-degree distribution. For this reason, when we compute either ExDIN or M-ExDIN, we check whether the sizes of the communities are unbalanced and proceed as follows: if $|P| < |\bar{P}|$, we define $\bar{P}'$, obtained by sampling $|P|$ articles from $\bar{P}$, and use the new set for all computations. Because of the randomness of the sampling, we repeat the measurements multiple times.
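The size-balancing procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `metric` stands for any function of two page sets (for instance an exposure measure), sampling is uniform without replacement, and the default of 100 repetitions follows the caption of Figure 10. The name `balanced_measurements` is hypothetical.

```python
import random
from statistics import mean, stdev


def balanced_measurements(p, p_bar, metric, repeats=100, seed=0):
    """Repeat `metric` on P and a random subset of P_bar of size |P|.

    Returns the mean and the sample standard deviation over the repeats.
    Expects |P| <= |P_bar|; swap the arguments otherwise.
    """
    if len(p) > len(p_bar):
        raise ValueError("expects |P| <= |P_bar|; swap the arguments")
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        # Sample |P| articles from the larger partition without replacement.
        p_bar_prime = set(rng.sample(sorted(p_bar), len(p)))
        values.append(metric(set(p), p_bar_prime))
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread
```

Reporting the standard deviation alongside the mean makes it visible how much a result depends on which articles happened to be sampled.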

5 RQ1: READERS' TOPIC CONSUMPTION

Before looking into how readers are exposed to diverse content, we investigate how they have consumed each of the six topics that we concentrate on over the last four years. In particular, we collect monthly clickstream data from November 2017 to September 2020. We note that, when we count the click views of a page, we consider the average over the number of months the page existed.16 Accordingly, when computing the occurrences for the transition matrix based on the clickstream, we consider the average clicks of the link over the number of months it existed. In this way, we reduce the seasonality effect and weight links according to page changes in terms of hyperlinks.

16 Based on the temporal graphs extracted by [11].
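The averaging over the months in which a page or link existed can be sketched as below. The data layout is an assumption made for illustration: each page or link is represented by a dictionary from month to click count, with `None` marking months in which it did not exist. The function name `average_monthly_clicks` is hypothetical.

```python
def average_monthly_clicks(monthly_clicks):
    """Average clicks over the months in which the page or link existed.

    monthly_clicks: dict mapping a month label to a click count, or to
    None when the page or link did not exist in that month.
    """
    existing = [c for c in monthly_clicks.values() if c is not None]
    if not existing:
        return 0.0
    return sum(existing) / len(existing)
```

Dividing by the number of months of existence, rather than by the full observation window, avoids penalizing links that were added late in the period.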

5.1 Pageviews Distribution

To start our analysis, we count the average number of times a page has been visited over 34 months. In Figure 5, we plot the log-distributions of the pageviews for each topic and opinion. By running a t-test, we conclude that, for all topics except abortion, the difference between the means of the opinions' pageviews is statistically significant at $p < 0.05$. This finding indicates that users tend to visit more pages expressing or supporting one of the two viewpoints. From a network perspective, to increase the exposure to opposing opinions, it is desirable for frequently visited pages to be well connected to articles expressing opposing opinions.

In Figure 6, we break down the pageviews, showing how many of them come from pages of the opposing partition. Overall, the fraction of visits from the opposite side is low (below 0.5%). The category LGBT rights has the highest ratio of visits from LGBT discrimination pages, about 2.5%. For topics such as guns and abortion, the percentage of visits from the opposite partition shows that there are somewhat fewer visits to pages of a liberal inclination from articles expressing a more conservative opinion. In fact, 0.28% of visits to pro-choice come from pro-life, compared to 0.6% of visits to pro-life from pro-choice.
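The breakdown of pageviews by origin can be computed directly from clickstream rows. The sketch below assumes rows of the form (source, destination, count), where the source can be a page title or any label for external traffic; this layout and the function name `opposite_share` are illustrative assumptions.

```python
def opposite_share(clickstream, target, opposite):
    """Percentage of visits to pages in `target` that come from `opposite`.

    clickstream: iterable of (source, destination, count) tuples.
    target, opposite: sets of page titles for the two partitions.
    """
    total = 0
    from_opposite = 0
    for source, destination, count in clickstream:
        if destination in target:
            total += count
            if source in opposite:
                from_opposite += count
    return 100.0 * from_opposite / total if total else 0.0
```

Swapping the two partition arguments gives the share for the opposite direction, which is how an asymmetry such as the one reported for pro-choice and pro-life would show up.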

5.2 External or Internal Access to the Topic

We now investigate how readers access content about a topic. As introduced in Section 4.1.1, from the clickstream data we can compute the RSR, which indicates whether a page is accessed more from external sources or by navigating Wikipedia. In Figure 7, we provide a visualization that depicts the flows of the cumulative visits from external and internal pages toward the two partitions.

Referring to Figure 7(c), 44.8% of views come from internal pages. The clickstream from internal pages is broken down to show the proportion of flow toward gun control and gun rights. The internal views of gun control articles are 3.4 times more than those of gun rights. We observe that, also from external websites, most of the traffic is toward gun control (2.7 times more than gun rights). Overall, 26% of the total visits to gun-related content is concentrated on gun rights.

The abundance of traffic toward one of the two opinions does not characterize only the guns topic. Indeed, among all the topics, 59–74% of visits are accumulated by one partition. Moreover, readers' preferences appear consistent among external and internal accesses, that is, they both point more toward the same view of the topic. For both internal and external views, the distribution of accesses toward partitions is approximately the same (i.e., the percentage of visits from external to $P$ (resp. $\bar{P}$) is the same as from internal to $P$ (resp. $\bar{P}$)). The only exception is evolution, whose external visits to creationism are 45.3% lower than internal accesses. We note that partitions with higher views are not necessarily the biggest in the topic-induced networks.

In general, the largest amount of visits to topics' articles comes from external pages. In particular, only 23.6% and 33.5% of the traffic to evolution and racism, respectively, is generated by internal Wikipedia navigation. The same quantity for the remaining topics ranges between 44% and 47%.

[Figure 7 plots omitted. Panels: (a) Abortion (Pro-life, Pro-choice), (b) Cannabis (Prohibition, Activism), (c) Guns (Control, Rights), (d) Evolution (Creationism, Evol. Bio.), (e) Racism (Racism, Anti-racism), (f) LGBT (Discrimination, Support). Each panel shows flows from External and Internal sources to the two partitions.]

Figure 7: Cumulative pages' traffic. Each plot indicates: (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to $P$ and $\bar{P}$.

[Figure 8 plot omitted. x-axis: Avg(Click Through Rate) × 100 (0 to 30). Bars: Pro-life, Pro-choice; Prohibition, Activism; Gun control, Gun rights; Creationism, Evolutionary biology; LGBT discrimination, LGBT rights; Racism, Anti-racism.]

Figure 8: Average click-through rate. In this plot, we report the average CTR of pages belonging to the same set $P$ (resp. $\bar{P}$). The score indicates the average probability that a link within a page $p$ in $P$ (resp. $\bar{P}$) is clicked. Topics, in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.

We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the mainstream public debate.

5.3 How Much Readers Navigate Links

Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the pages' average click-through rate (Sect. 4.1.1).

We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So, most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of gun control is the highest (28%); the pages about racism follow with 26%. The articles that, over the years, have generated the least internal traffic are those about evolutionary biology and LGBT rights.

Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52). This suggests that articles having a higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.

Topic      $\bar{C}(P)$   $\bar{C}(P \to P)$   $\bar{C}(P \to \bar{P})$   $\bar{C}(\bar{P})$   $\bar{C}(\bar{P} \to \bar{P})$   $\bar{C}(\bar{P} \to P)$
Abortion   68.89          90.82                57.64                      70.26                89.81                            45.37
Cannabis   70.81          95.01                37.50                      65.78                96.52                            16.67
Guns       52.34          78.69                35.68                      59.63                75.35                            39.28
Evolution  71.15          84.49                64.47                      72.69                99.00                            55.13
Racism     56.36          88.41                34.32                      71.87                90.63                            65.44
LGBT       61.66          89.42                52.5                       72.42                92.52                            59.17

Table 3: Average % of links within pages clicked fewer than 10 times. $\bar{C}(P)$ is the average percentage of un-clicked hyperlinks within pages in $P$; $\bar{C}(P \to \bar{P})$ is the average percentage of un-clicked links within $P$ pointing to $\bar{P}$.

In addition, we count the number of links clicked fewer than 10 times over the last three years (see Table 3). As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.
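The per-partition averages of rarely clicked links can be computed as in the following sketch. The data layout is assumed for illustration: each page maps to a dictionary from link target to total click count, and the optional `targets` set restricts the count to links pointing into one partition. The function names are hypothetical, and the threshold of 10 clicks follows the text.

```python
def rarely_clicked_share(page_links, targets=None, threshold=10):
    """Percentage of a page's links clicked fewer than `threshold` times.

    page_links: dict mapping a link target to its total click count.
    targets: optional set; when given, only links to these pages count.
    Returns None when the page has no links of the requested kind.
    """
    counts = [c for t, c in page_links.items()
              if targets is None or t in targets]
    if not counts:
        return None
    rare = sum(1 for c in counts if c < threshold)
    return 100.0 * rare / len(counts)


def partition_average(pages, targets=None, threshold=10):
    """Average of the per-page shares over all pages of a partition."""
    shares = [rarely_clicked_share(links, targets, threshold)
              for links in pages.values()]
    shares = [s for s in shares if s is not None]
    return sum(shares) / len(shares) if shares else 0.0
```

Calling `partition_average` with the pages of one partition and with `targets` set to the same or the opposite partition yields the three kinds of columns reported in Table 3.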

6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS

The main contribution of this paper is to examine to what extent the current Wikipedia topology supports users in exploring diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information, and (2) how their exposure to plural opinions may change throughout a navigation session.

6.1 Exposure to Diversity

To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recall that $\ell$ indicates the users' session length; by setting it equal to 1, we study the exposure to diversity over one-click sessions.
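For one-click sessions under a uniform click model, the exposure can be sketched as an average over start pages. The code below is a minimal illustration under stated assumptions, not the paper's exact definition: the reader starts uniformly at random on a page of the source partition and clicks one of its out-links uniformly at random, and pages without out-links contribute zero. The function name `one_click_exposure` and the graph format are hypothetical.

```python
def one_click_exposure(links, source, target):
    """Probability (in %) of landing in `target` after one uniform click.

    links: dict mapping a page to the list of pages it links to.
    source: set of start pages (start page chosen uniformly at random).
    target: set of destination pages.
    """
    if not source:
        return 0.0
    total = 0.0
    for page in source:
        out_links = links.get(page, [])
        if out_links:
            # Fraction of this page's out-links that lead into `target`.
            total += sum(1 for d in out_links if d in target) / len(out_links)
    return 100.0 * total / len(source)
```

Because many out-links lead to pages outside both partitions, the exposure to the same partition and the exposure to the opposite partition need not sum to 100%.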

Plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.

For instance let evolution be the topic we analyze If readers

start uniformly at random from a page about creationism the prob-

ability of visiting an article of the same partition is 576 On the

contrary the chances of entering a page about evolutionary biology

[Figure 9: three rows of heatmaps (Uniform, Position, Clicks), one 2×2 matrix per topic: Abortion (Pro-life / Pro-choice), Cannabis (Prohibition / Activism), Guns (Control / Rights), Evolution (Creationism / Evol. Bio.), Racism (Racism / Anti-racism), LGBT (Discrimination / Support).]

Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example P → P̄ or P̄ → P. The y-axis indicates the source and the x-axis the destination. Each row corresponds to the exposure to diverse information computed for a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.

is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

It is worth pointing out that the current network's topology not only nudges users toward reading more about the same opinion, but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have higher chances of reading one article about creationism (3 times more) than users starting from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic.

Taken together, these measurements highlight that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous views, and makes the measure of mutual exposure to diverse information smaller than 1.
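The factors quoted for evolution follow directly from the one-click probabilities. As a quick check, taking 0.61% as the evolutionary biology → creationism value displayed in the uniform row of Figure 9, and writing C for creationism and E for evolutionary biology (shorthand introduced only for this check):

```latex
\begin{align*}
\frac{\Pr[C \to C]}{\Pr[C \to E]} &= \frac{5.76}{0.18} = 32, \\
\frac{\Pr[E \to E]}{\Pr[E \to C]} &= \frac{3.27}{0.61} \approx 5.4, \\
\frac{\Pr[E \to C]}{\Pr[C \to E]} &= \frac{0.61}{0.18} \approx 3.4 .
\end{align*}
```

These round to the factors of 32, 5, and 3 mentioned in the text.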

The findings above report the intrinsic capability of the network to spur users towards diverse content. To combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users click links at the top of the page with higher probability, the probabilities are only slightly modified w.r.t. the uniform model. These modest variations are coherent with the links' placement within pages (Figure 2).

For a few topics, such as guns, the links' position plays a more significant role, worsening users' exposure to diverse information. Indeed, in pages about guns control, links belonging to the guns rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting guns rights, starting uniformly at random from an article in guns control, drops by 30% w.r.t. the probability observed using the uniform CwP model. Therefore, we conclude that for some topics the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located towards the beginning of a page have fewer opportunities to read about contrasting opinions.
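The effect of link placement can be sketched on a single toy page. The snippet below is not the position CwP model defined earlier in the paper: purely for illustration, it assumes click weights inversely proportional to a link's rank on the page, and the helper names `click_probabilities` and `exposure_from_page` are hypothetical.

```python
def click_probabilities(num_links, model="uniform"):
    """Click probability for each link of a page, listed in page order.

    "uniform": every link is equally likely.
    "position": weight 1 / rank, an assumed decay used only for illustration.
    """
    if num_links == 0:
        return []
    if model == "uniform":
        weights = [1.0] * num_links
    elif model == "position":
        weights = [1.0 / (rank + 1) for rank in range(num_links)]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    return [w / total for w in weights]


def exposure_from_page(links, target, model):
    """Probability that one click from this page lands in `target`."""
    probs = click_probabilities(len(links), model)
    return sum(p for dst, p in zip(links, probs) if dst in target)


# Toy page: three same-side links first, one opposite-side link last.
links_in_page_order = ["c1", "c2", "c3", "r1"]
opposite_side = {"r1"}

uniform = exposure_from_page(links_in_page_order, opposite_side, "uniform")
position = exposure_from_page(links_in_page_order, opposite_side, "position")
print(uniform, position)
```

When the only link to the opposite side sits at the bottom of the page, any model that favors early links lowers the chance of crossing partitions relative to the uniform model, which is the qualitative effect described for guns.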

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.

For all the topics, we observe a significant increase in the probability of visiting pages of the same opinion. A simple interpretation of this result is that real users more often click the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view," we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer this question, but we hope it will be addressed in future work.

Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life doubles. This is a sign that users may be willing to explore content proposing diverse views.

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new/current links have on users' exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to avoid this property becoming harmful when it prevents users from accessing diverse content.

[Figure 10: line plots, one column per topic (Abortion, Cannabis, Guns, Evolution, Racism, LGBT), three rows; x-axis: Number of Clicks (1 to 15); legend: uniform, position, and clicks CwP models for α = 0, α = 0.2, and α = 1.]

Figure 10: Dynamic (Mutual) ExDIN for 1 ≤ ℓ ≤ 15. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of α, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).

6.2 Dynamic Exposure to Diversity

In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. Depending on the combination of models employed to measure the ExDIN and M-ExDIN, we provide different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.

In Figure 10, for sessions of length up to 15, we plot the ExDIN from P to P̄ (resp. from P̄ to P) and the respective M-ExDIN. Each topic shows its own trends. For this reason, we highlight and explain the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done earlier in the paper. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of α (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model corresponding to the highest exposure is M_c, followed by M_u and M_p.

(2) If users navigate with a star-like behavior (green, α = 1), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because these kinds of users are only subject to the exposure of their starting navigation page. So the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who randomly navigate the network (sky blue, α = 0) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session. Then the trend inverts. In one case, it reaches or exceeds the starting exposure. In the other, it grows and then becomes steady below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., in the cross-partition ExDIN of LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, α = 0.2) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.

Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the guns control partition, the users with the highest exposure to guns rights pages (2% probability) are those characterized by a star-like behavior. As soon as the users navigate in a random fashion, this probability drops. This is because the guns rights partition has a number of incoming edges that prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in guns rights. Indeed, for users randomly navigating the graph, the exposure to the guns control partition is higher than or comparable to that of star-like behaving users. The probability of reaching guns control after many steps becomes high because of the higher number of incoming edges of the partition.
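The role of the restart parameter α can be sketched as follows. This is one possible reading, not the navigation model as formally defined in Sect. 4.2: before every click the reader returns to the starting page with probability α and then follows a uniformly chosen out-link, so that α = 1 gives the star-like behavior and α = 0 a plain random walk. The helper name `exposure_over_session` and the toy graph are hypothetical, and the distribution is propagated exactly instead of being simulated.

```python
def exposure_over_session(graph, start_pages, target, alpha, length):
    """Return [e_1, ..., e_length], where e_k is the probability that the
    k-th click lands in `target`, averaged over uniformly chosen start pages."""
    start_pages = sorted(start_pages)
    totals = [0.0] * length
    for start in start_pages:
        dist = {start: 1.0}  # probability mass of the reader's current page
        for step in range(length):
            alive = sum(dist.values())  # mass not yet lost at dead-end pages
            # With probability alpha the reader goes back to the start page.
            before_click = {page: (1.0 - alpha) * mass
                            for page, mass in dist.items()}
            before_click[start] = before_click.get(start, 0.0) + alpha * alive
            # Then one out-link is clicked uniformly at random.
            after_click = {}
            for page, mass in before_click.items():
                links = graph.get(page, [])
                for dst in links:
                    after_click[dst] = after_click.get(dst, 0.0) + mass / len(links)
            dist = after_click
            totals[step] += sum(m for page, m in dist.items() if page in target)
    return [t / len(start_pages) for t in totals]


# Hypothetical toy network with two partitions.
graph = {
    "a1": ["a2", "b1"],
    "a2": ["a1"],
    "b1": ["b2"],
    "b2": ["b1", "a1"],
}
P = {"a1", "a2"}
P_bar = {"b1", "b2"}

star_like = exposure_over_session(graph, P, P_bar, alpha=1.0, length=3)
random_walk = exposure_over_session(graph, P, P_bar, alpha=0.0, length=3)
print(star_like, random_walk)
```

In this toy graph the star-like reader has the same exposure at every click, since every click starts from the same page, whereas the random walker's exposure changes as it moves away from the starting partition.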

From a complementary analysis, we observe that for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This demonstrates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.

Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. Recall that, to do so, we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide an equal exposure across opinions. For topics like abortion and guns, the longer the users' session is, the more the network topology prevents readers from symmetrically exploring different opinions.
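To make the notion of mutuality concrete, the sketch below assumes, for illustration only, that it is scored as the smaller of the two cross-partition exposures divided by the larger, expressed as a percentage, so that 100 means perfectly symmetric exposure. The actual M-ExDIN is the one defined in Sect. 4.2; the function name `mutual_exposure` is hypothetical.

```python
def mutual_exposure(exposure_a_to_b, exposure_b_to_a):
    """Assumed symmetry score in [0, 100]: min / max of the two exposures."""
    low, high = sorted((exposure_a_to_b, exposure_b_to_a))
    if high == 0:
        raise ValueError("both exposures are zero; the ratio is undefined")
    return 100.0 * low / high


# One-click values for evolution under the uniform model:
# 0.18% (creationism -> evolutionary biology) from the text, and
# 0.61% (evolutionary biology -> creationism) as displayed in Figure 9.
print(mutual_exposure(0.18, 0.61))
```

Under this assumed score, the two evolution partitions are far from the value of 100 that equal exposure in both directions would give.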

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are: (1) the knowledge on the encyclopedia is complete but articles are underlinked [49], that is, the content and the keywords that could become anchors exist, so we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete, that is, there are no words to attach links to, so the addition of complementary content may be necessary. An in-depth investigation of these conditions may be an interesting direction for future work.

7 CONCLUSIONS

Our work provides a first analysis of how the current Wikipedia network topology helps readers explore opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable for the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we also study the users' overall topic consumption. Our main findings suggest the following:

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite ones. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially greater tendency, albeit small, to visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.1, we observe that mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who make only one click.

In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also take into account the content of the articles along with the link information. Secondly, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of the users' behavior. Finally, in the light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate it, and our hope is that this work is a step towards this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and the MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets."

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.


[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE Signal Processing Magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th International Conference on World Wide Web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American Society for Information Science and Technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on Twitter. In Fifth International AAAI Conference on Weblogs and Social Media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on Wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for Wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM Conference on Hypertext and Social Media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of Wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International Conference on World Wide Web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on YouTube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature Human Behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read Wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9. 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st International Conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking
[50] Wikipedia. [n.d.]. Namespace. https://en.wikipedia.org/wiki/Wikipedia:Namespace
[51] Wikipedia. [n.d.]. Neutral Point of View. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
[52] Wikipedia. [n.d.]. Redirect. https://en.wikipedia.org/wiki/Wikipedia:Redirect
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing Wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.

  • Abstract
  • 1 Introduction
  • 2 Related Works
  • 3 Data Collection
    • 3.1 Topic Induced Networks
    • 3.2 General Statistics on Topics Networks
  • 4 Metrics
    • 4.1 Content Consumption
    • 4.2 Exposure Metrics
  • 5 RQ1: Readers Topic Consumption
    • 5.1 Pageviews Distribution
    • 5.2 External or Internal Access to the Topic
    • 5.3 How Much Readers Navigate Links
  • 6 RQ2: Exposure Across Topic Viewpoints
    • 6.1 Exposure to Diversity
    • 6.2 Dynamic Exposure to Diversity
  • 7 Conclusions
  • References

[Figure 7: flow diagrams from External and Internal accesses to the two partitions of each topic. Panels: (a) Abortion (Pro-life / Pro-choice), (b) Cannabis (Prohibition / Activism), (c) Guns (Control / Rights), (d) Evolution (Creationism / Evol. Bio.), (e) Racism (Racism / Anti-racism), (f) LGBT (Discrimination / Support).]

Figure 7: Cumulative pages' traffic. Each plot indicates (1) on the left, the cumulative amount of accesses coming from external web pages or internal Wikipedia articles; (2) the flows of visits from external and internal pages to partitions; (3) on the right, the cumulative accesses to P and P̄.

[Figure 8: horizontal bar chart; x-axis: Avg(Click Through Rate) × 100, from 0 to 30; bars: Pro-life, Pro-choice, Prohibition, Activism, Gun control, Gun rights, Creationism, Evolutionary biology, LGBT discrimination, LGBT rights, Racism, Anti-racism.]

Figure 8: Average click-through rate. In this plot we report the average CTR of pages belonging to the same set P (resp. P̄). The score indicates the average probability that a link within a page p in P (resp. P̄) is clicked. Topics in order from the bottom: abortion, cannabis, guns, evolution, racism, LGBT.

navigation. The same quantity for the remaining topics ranges between 44% and 47%.

We point out that readers consulting articles about abortion, cannabis, and guns are inclined toward pages conveying liberal views on the topic. Instead, it is more complicated to draw interpretations about the remaining topics. One explanation may be that users look for information generally less covered in the public mainstream debate.

5.3 How Much Readers Navigate Links

Once readers visit a page, they can decide to click any of its links. We want to understand how frequently they do so. For that, we compute the average pages' click-through rate (Sect. 4.1.1).

We plot this information in Figure 8. Overall, we see that the percentage of accesses turning into a visit to another page ranges between 10% and 28%. Dimitrov and Lemmerich [14] observed that the average CTR for the whole of Wikipedia is 12%. So most of the subsets of pages we consider have a CTR higher than Wikipedia's average. The CTR of guns control is the highest (28%); the pages about racism follow with 26%. The articles that over the years have generated less internal traffic are those about evolutionary biology and LGBT rights.

Examining pages' connections, we found that those with higher CTR have more links (the Pearson correlation coefficient is 0.52).

Topic      C̄(P)   C̄(P → P)  C̄(P → P̄)  C̄(P̄)   C̄(P̄ → P̄)  C̄(P̄ → P)
Abortion   68.89  90.82     57.64      70.26  89.81      45.37
Cannabis   70.81  95.01     37.50      65.78  96.52      16.67
Guns       52.34  78.69     35.68      59.63  75.35      39.28
Evolution  71.15  84.49     64.47      72.69  99.00      55.13
Racism     56.36  88.41     34.32      71.87  90.63      65.44
LGBT       61.66  89.42     5.25       72.42  92.52      59.17

Table 3: Average percentage of links within pages clicked fewer than 10 times. C̄(P) is the average percentage of un-clicked hyperlinks within pages in P. C̄(P → P̄) is the average percentage of un-clicked links within P pointing to P̄.

This suggests that articles with a higher out-degree offer more options to users. Presumably because of this, users are more likely to continue the exploration from those articles.
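The correlation between out-degree and CTR mentioned above can be checked with a few lines of code. The following is a minimal sketch, not the authors' pipeline: the function name `pearson` and the toy values for `out_degree` and `ctr` are hypothetical and only illustrate how such a coefficient is computed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factor cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: out-degree and CTR (in %) of five pages.
out_degree = [10, 25, 40, 60, 80]
ctr = [9.0, 14.0, 12.0, 21.0, 26.0]
r = pearson(out_degree, ctr)
print(f"Pearson r = {r:.2f}")
```

A positive coefficient on real data would support the reading that pages with more links see a larger share of visits continue to another page; it does not by itself establish the direction of the effect.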

In addition, we count the number of links clicked fewer than 10 times over the last three years (see Table 3). As an example, given a page in creationism, on average 71.15% of its links have been clicked fewer than 10 times. If we distinguish between references to creationism and references to evolution, readers did not click 84.49% of the links pointing to creationism and 64.47% of those pointing to evolutionary biology.
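The quantities reported in Table 3 can be sketched as follows. This is an illustrative reading, assuming the per-page percentage of rarely clicked links is computed first and then averaged over the pages of a partition; the function name `avg_rarely_clicked`, the data layout, and the toy click counts are hypothetical.

```python
from collections import defaultdict


def avg_rarely_clicked(link_clicks, partition_of, source, target=None, threshold=10):
    """Average (over pages in `source`) percentage of links clicked < threshold times.

    link_clicks maps (source_page, destination_page) to a click count.
    If `target` is given, only links pointing to that partition are counted.
    """
    per_page = defaultdict(lambda: [0, 0])  # page -> [rarely clicked, total links]
    for (src, dst), clicks in link_clicks.items():
        if partition_of.get(src) != source:
            continue
        if target is not None and partition_of.get(dst) != target:
            continue
        per_page[src][1] += 1
        if clicks < threshold:
            per_page[src][0] += 1
    if not per_page:
        return None  # no page of `source` has a matching link
    shares = [rare / total for rare, total in per_page.values()]
    return 100.0 * sum(shares) / len(shares)


# Hypothetical toy data: pages "a", "b" are in P, page "x" is in Pbar.
partition_of = {"a": "P", "b": "P", "x": "Pbar"}
link_clicks = {
    ("a", "b"): 3, ("a", "x"): 50, ("a", "other"): 0,
    ("b", "a"): 12, ("b", "x"): 2,
}

overall = avg_rarely_clicked(link_clicks, partition_of, "P")
within = avg_rarely_clicked(link_clicks, partition_of, "P", "P")
across = avg_rarely_clicked(link_clicks, partition_of, "P", "Pbar")
print(overall, within, across)
```

Passing `target=None` corresponds to the first column of each half of the table, while passing a target partition restricts the count to links within or across partitions.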

6 RQ2: EXPOSURE ACROSS TOPIC VIEWPOINTS

The main contribution of this paper is to examine to what extent Wikipedia's current topology helps users explore diverse facets of polarizing issues. In particular, we study (1) how readers are locally exposed to diverse information and (2) how their exposure to plural opinions may change throughout a navigation session.

6.1 Exposure to Diversity

To evaluate the exposure to diversity induced by the network's topology, we compute the exposure to diverse information for $\ell = 1$ using the uniform CwP model. Recall that $\ell$ indicates the users' session length; setting it equal to 1, we study the exposure to diversity over one-click sessions.

The plots in the first row of Figure 9 show the value of ExDIN for all the topics when the CwP model is $M_u$.
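The one-click setting can be illustrated with a small sketch. It assumes a simplified reading of the uniform model: the reader starts uniformly at random from a page of the source partition and clicks one of its links uniformly at random. The formal definition of ExDIN is given in Sect. 4.2 and may differ in detail; the function name `one_click_exposure` and the toy network are hypothetical.

```python
def one_click_exposure(out_links, partition_of, source, target):
    """Probability of landing in `target` after one uniform click.

    The reader starts uniformly at random from a page of `source` and
    then clicks one of that page's links uniformly at random.
    """
    start_pages = [p for p, part in partition_of.items() if part == source]
    if not start_pages:
        raise ValueError("no pages in the source partition")
    total = 0.0
    for page in start_pages:
        links = out_links.get(page, [])
        if not links:
            continue  # a page without links contributes probability 0
        hits = sum(1 for dst in links if partition_of.get(dst) == target)
        total += hits / len(links)
    return total / len(start_pages)


# Hypothetical toy network: pages "a", "b" form P; "x", "y" form Pbar;
# "o1".."o3" are pages outside both partitions.
partition_of = {"a": "P", "b": "P", "x": "Pbar", "y": "Pbar"}
out_links = {
    "a": ["b", "x", "o1", "o2"],
    "b": ["a", "o1"],
    "x": ["a", "o1", "o2", "o3"],
    "y": ["x"],
}

same_side = one_click_exposure(out_links, partition_of, "P", "P")
other_side = one_click_exposure(out_links, partition_of, "P", "Pbar")
print(f"P -> P: {same_side:.3f}, P -> Pbar: {other_side:.3f}")
```

Because most links of a page point outside both partitions, the four probabilities of a topic do not need to sum to 1, which matches the small percentages shown in Figure 9.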

For instance, let evolution be the topic we analyze. If readers start uniformly at random from a page about creationism, the probability of visiting an article of the same partition is 5.76%. In contrast, the chance of entering a page about evolutionary biology is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

WWW '21, April 19–23, 2021, Ljubljana, Slovenia. Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal

[Figure 9 heatmaps: a grid of 2×2 matrices, one column per topic (Pro-life/Pro-choice, Prohibition/Activism, Control/Rights, Creationism/Evol. Bio., Racism/Anti-racism, Discrimination/Support) and one row per CwP model (Uniform, Position, Clicks).]

Figure 9: ExDIN. Each patch of the matrix is the exposure to diverse information across partitions, for example $P \to \bar{P}$ or $\bar{P} \to P$. The $y$-axis indicates the source and the $x$-axis the destination. Each row corresponds to the exposure to diverse information computed with a different CwP model. Darker colors indicate a higher probability of being in the corresponding square in one click.

It is worth pointing out that the current network's topology not only nudges users toward reading more about the same opinion but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have a higher chance of reading an article about creationism (3 times higher) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that these observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network makes it easier for users to explore knowledge bubbles of homogeneous views and makes the measure of mutual exposure to diverse information smaller than 1.
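As a quick check of the ratios quoted for evolution, the worked arithmetic below reads the values for the uniform model as percentages (5.76%, 0.18%, 3.27%, and 0.61%). Here $P$ stands for creationism and $\bar{P}$ for evolutionary biology, and the symbol $e$ is shorthand introduced only for this illustration.

```latex
\begin{align*}
\frac{e(P \to P)}{e(P \to \bar{P})} &= \frac{5.76}{0.18} = 32,
  && \text{creationism readers: same side vs.\ opposite side} \\
\frac{e(\bar{P} \to \bar{P})}{e(\bar{P} \to P)} &= \frac{3.27}{0.61} \approx 5.4,
  && \text{evolutionary biology readers: same side vs.\ opposite side} \\
\frac{e(\bar{P} \to P)}{e(P \to \bar{P})} &= \frac{0.61}{0.18} \approx 3.4,
  && \text{asymmetry between the two crossing directions}
\end{align*}
```

The first two ratios correspond to the "32 times" and "5 times" statements, and the third to the roughly threefold asymmetry between the two directions.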

The findings above report the intrinsic capability of the network to spur users toward diverse content. To combine it with readers' next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users are more likely to click links at the top of the page, the probabilities change only slightly with respect to the uniform model. These modest variations are consistent with the placement of links within pages (Figure 2).

For a few topics, such as guns, the position of links plays a more significant role, worsening the users' exposure to diverse information. Indeed, in pages about gun control, links belonging to the gun rights partition seem to be mentioned later in the page. As a consequence, the probability of reaching an article supporting gun rights, starting uniformly at random from an article in gun control, drops by 30% with respect to the probability observed using the uniform CwP model. Therefore, we conclude that for some topics the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who are more likely to click links located toward the beginning of a page have fewer opportunities to read about contrasting opinions.
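The effect of link placement can be illustrated with a toy comparison. The sketch below is not the paper's position CwP model: the `top_heavy` weighting (click weight proportional to 1/rank) is a hypothetical stand-in chosen only to show why a cross-partition link placed late in a page receives less probability mass than under uniform clicking.

```python
def exposure_from_page(links, targets, weight):
    """Probability that the next click lands on a page in `targets`.

    `links` lists the page's links in order of appearance; `weight(rank)`
    gives the unnormalized click weight of the link at 1-based `rank`.
    """
    weights = [weight(rank) for rank in range(1, len(links) + 1)]
    total = sum(weights)
    hit = sum(w for dst, w in zip(links, weights) if dst in targets)
    return hit / total


def uniform(rank):
    """Every link is equally likely to be clicked."""
    return 1.0


def top_heavy(rank):
    """Hypothetical position bias: earlier links are more likely clicked."""
    return 1.0 / rank


# Hypothetical gun control page whose only gun rights link ("r1") comes last.
links = ["c1", "c2", "c3", "r1"]
targets = {"r1"}

p_uniform = exposure_from_page(links, targets, uniform)
p_position = exposure_from_page(links, targets, top_heavy)
print(f"uniform: {p_uniform:.2f}, position-biased: {p_position:.2f}")
```

In this toy page the exposure to the opposite partition roughly halves once clicks favor the top of the page; the size of the effect on real pages depends on the actual click model and on where the links appear.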

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.

For all the topics, we observe a significant increase in the probability of visiting pages of the same opinion. A simple interpretation of this result is that real users more often click links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the "topology of the network seems to drive users to explore knowledge bubbles of homogeneous view," we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer this question, but we hope it will be addressed in future work.

Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life ones doubles. This is a sign that users may be willing to explore content proposing diverse views.

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia's Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new or current links have on users' exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to keep this property from becoming harmful when it prevents users from accessing diverse content.


[Figure 10 plots: a 3×6 grid of line charts, one column per topic (Abortion, Cannabis, Guns, Evolution, Racism, LGBT). x-axis: Number of Clicks (1 to 15). Legend: uniform, position, and clicks CwP models, each with $\alpha = 0$, $\alpha = 0.2$, and $\alpha = 1$.]

Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers' shape indicates the CwP model in use. Higher values of the M-ExDIN reflect more symmetric exposure between opinions. We repeat the computation of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).

6.2 Dynamic Exposure to Diversity

In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. Depending on the combination of models employed to measure the ExDIN and M-ExDIN, we obtain different insights about the effect of the current network's topology on users' exposure to diverse content over a navigation session.

In Figure 10, for sessions of length up to 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. Each topic shows its own trends. For this reason, we highlight and explain the most recurrent patterns. For better understanding, we suggest cross-checking the following explanations with the analysis above. We start by describing how, given the current network's topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker's shape), they are shifted up or down. This implies that when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model corresponding to the highest exposure is $M_c$, followed by $M_u$ and $M_p$.

(2) If users navigate with a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because these users are only subject to the exposure of their starting page. So the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks the ExDIN is lower than at the beginning of the session; then the trend inverts. In one case, it reaches or exceeds the starting exposure. In the other, it grows and settles below the initial exposure. The better the destination partition is connected to the rest of the graph, the more easily users randomly navigating the network can reach it. Sometimes (e.g., for the ExDIN across the partitions of LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have already been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.
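The role of the restart parameter in the four patterns above can be sketched with a small probability propagation. This is a simplified illustration under stated assumptions, not the paper's navigation model (defined in Sect. 4): at every click the reader goes back to the start page with probability `alpha` and clicks from there, otherwise clicks from the current page, always choosing a link uniformly at random. The function names and the toy graph are hypothetical.

```python
def step_distribution(dist, start, out_links, alpha):
    """Propagate a probability distribution over pages by one click.

    With probability `alpha` the click is made from `start` (star-like
    restart), otherwise from the current page; links are chosen uniformly.
    """
    nxt = {}
    for page, mass in dist.items():
        for origin, share in ((start, alpha), (page, 1.0 - alpha)):
            links = out_links.get(origin, [])
            if share == 0.0 or not links:
                continue  # mass at a dead end is dropped (the session ends)
            for dst in links:
                nxt[dst] = nxt.get(dst, 0.0) + mass * share / len(links)
    return nxt


def exposure_curve(start, out_links, targets, alpha, n_clicks):
    """Probability of being on a page in `targets` after each click."""
    dist = {start: 1.0}
    curve = []
    for _ in range(n_clicks):
        dist = step_distribution(dist, start, out_links, alpha)
        curve.append(sum(m for p, m in dist.items() if p in targets))
    return curve


# Hypothetical toy graph: "s" is the start page, "t" the opposite-side page.
graph = {"s": ["a", "t"], "a": ["s", "a2"], "a2": ["a"], "t": ["s"]}

star_like = exposure_curve("s", graph, {"t"}, alpha=1.0, n_clicks=3)
random_walk = exposure_curve("s", graph, {"t"}, alpha=0.0, n_clicks=3)
print(star_like, random_walk)
```

In this toy graph the star-like reader keeps the exposure of the start page at every click, while the random reader's exposure first drops and then partially recovers, which mirrors patterns (2) and (3) qualitatively.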

Given this observation, we analyze the ExDIN of guns. With the current network's topology, starting from the gun control partition, the users with the highest exposure to gun rights pages (2% probability) are those characterized by a star-like behavior. As soon as users navigate in a random fashion, this probability drops. This is because the gun rights partition has a number of incoming edges that prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in gun rights. Indeed, for users randomly navigating the graph, the exposure to the gun control partition is higher than or comparable to that of star-like users. The probability of reaching gun control after many steps becomes high because of the larger number of incoming edges of the partition.

From a complementary analysis, we observe that for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they entered. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This demonstrates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures depicting these phenomena could not be displayed.

Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other's pages. Recall that for this we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide equal exposure across opinions. For topics like abortion and guns, the longer the users' session, the more the network topology prevents readers from symmetrically exploring different opinions.

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are: (1) the knowledge in the encyclopedia is complete, but articles are underlinked [49]; that is, the content and the keywords that could become anchors exist, so we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge in the encyclopedia is incomplete; that is, there are no words to attach links to, so the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.

7 CONCLUSIONS

Our work provides a first analysis of how Wikipedia's current network topology helps readers explore opposing stances on polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable in the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we study users' overall consumption of the topics. Our main findings suggest the following:

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network's topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have a higher chance of continuing to explore articles of similar views than of the opposite ones. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially greater tendency, albeit small, to visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network's topology is typically biased toward one opinion. In Section 6.1, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who make only one click.

In our study, we analyze sets of articles assigned to opinions according to editors' crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also take into account the content of the articles along with the link information. Second, a longitudinal study aimed at understanding the dynamics that led to the current state of the encyclopedia's network would provide further understanding of users' behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users' knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate it, and our hope is that this work is a step toward this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and the MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets."

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.

[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.

[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.

[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.


[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE Signal Processing Magazine 27, 6 (2010), 55–65.

[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.

[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th International Conference on World Wide Web.

[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American Society for Information Science and Technology (2011).

[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.

[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on Twitter. In Fifth International AAAI Conference on Weblogs and Social Media.

[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.

[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.

[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.

[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.

[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on Wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.

[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.

[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.

[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for Wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.

[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly 80, S1 (2016), 298–320.

[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.

[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).

[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.

[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.

[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).

[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.

[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.

[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM Conference on Hypertext and Social Media. 89–98.

[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration. 105–113.

[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.

[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.

[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of Wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.

[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.

[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International Conference on World Wide Web.

[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.

[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.

[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.

[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.

[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.

[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.

[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.

[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.

[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on YouTube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.

[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 213–222.

[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature Human Behaviour (2019).

[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read Wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.

[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.

[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).

[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st International Conference on World Wide Web.

[49] Wikipedia. [n.d.]. Manual of Style/Linking. https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking

[50] Wikipedia. [n.d.]. Namespace. https://en.wikipedia.org/wiki/Wikipedia:Namespace

[51] Wikipedia. [n.d.]. Neutral Point of View. https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view

[52] Wikipedia. [n.d.]. Redirect. https://en.wikipedia.org/wiki/Wikipedia:Redirect

[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).

[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing Wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.


WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia Cristina Menghini Aris Anagnostopoulos and Eli Upfal

Pro-life

Pro-choice

Unifo

rm

456

748133

061 Prohibiiton

Activism

338

038039

008 Control

Rights

444

478161

116 Creationism

Evol Bio

576

327061

018 Racism

Anti-racism

876

8613

142 Discrimination

Support

424

395063

054

Pro-life

Pro-choice

Posit

ion

428

781115

056 Prohibiiton

Activism

332

037037

008 Control

Rights

468

473138

089 Creationism

Evol Bio

553

329054

016 Racism

Anti-racism

892

834129

159 Discrimination

Support

394

356056

042

Pro-

life

Pro-

choi

ce

Pro-life

Pro-choice

Click

s

1274

1357129

123Pr

ohib

iiton

Activ

ism

Prohibiiton

Activism

864

065081

013

Cont

rol

Righ

ts

Control

Rights

1376

1305208

137

Crea

tioni

sm

Evol

Bio

Creationism

Evol Bio

1157

52069

03

Racis

m

Anti-

racis

m

Racism

Anti-racism

2367

1503116

129

Disc

rimin

atio

n

Supp

ort

Discrimination

Support

1107

653066

053

Figure 9 ExDIN Each patch of the matrix is the exposure to diverse information across partitions for example 119875 rarr 119875 119875 rarr 119875 The 119910-axis indicates the source and the 119909-axis is the destination To each row corresponds the exposure to diverse informationcomputed for different CwP Darker colors indicate higher probability of being in the correspondent square in one click

is 0.18% (32 times smaller). On the other hand, readers starting uniformly at random from an article about evolutionary biology have a 3.27% chance of reading pages conveying the same opinion. This probability is 5 times larger than that of visiting creationism pages.

It is worth pointing out that the current network’s topology not only nudges users to read more about the same opinion but also hinders them from exploring diverse content symmetrically. Indeed, users reading about evolutionary biology have a higher chance of reading an article about creationism (3 times more) than users from creationism have of reading about evolutionary biology. After repeating the same analysis for all the topics, we find that the aforementioned observations hold for most of them. Moreover, we note that the probability of continuing the session by reading a page of the same opinion is greater for one of the two partitions of a given topic. Taken together, these measurements highlight that the structure of the network facilitates users’ exploration of knowledge bubbles of homogeneous view and makes the measure of mutual exposure to diverse information smaller than 1.
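The ratios quoted for evolution can be re-derived from the one-click probabilities of the uniform model. The sketch below does only that arithmetic. The value 0.61% for evolutionary biology → creationism is read from the uniform row of Figure 9, and the dictionary layout and the `symmetry` score (smaller crossing probability over the larger one) are illustrative assumptions, not the paper’s definition of the mutual exposure from Sect. 4.2.

```python
# One-click probabilities (in %) for the evolution topic under the uniform
# CwP model. Keys are (source partition, destination partition).
exdin_uniform = {
    ("creationism", "creationism"): 5.76,
    ("creationism", "evol_bio"): 0.18,
    ("evol_bio", "evol_bio"): 3.27,
    ("evol_bio", "creationism"): 0.61,
}


def ratio(a, b):
    """How many times larger probability `a` is than probability `b`."""
    return exdin_uniform[a] / exdin_uniform[b]


# Staying within creationism vs. crossing to evolutionary biology.
stay_vs_cross_creationism = ratio(("creationism", "creationism"),
                                  ("creationism", "evol_bio"))
# Staying within evolutionary biology vs. crossing to creationism.
stay_vs_cross_evol = ratio(("evol_bio", "evol_bio"),
                           ("evol_bio", "creationism"))
# Asymmetry between the two crossing directions.
cross_asymmetry = ratio(("evol_bio", "creationism"),
                        ("creationism", "evol_bio"))

# Illustrative symmetry score in [0, 1]; 1 would mean both crossing
# directions are equally likely (assumed stand-in, not the paper's M-ExDIN).
crossings = (exdin_uniform[("creationism", "evol_bio")],
             exdin_uniform[("evol_bio", "creationism")])
symmetry = min(crossings) / max(crossings)

print(round(stay_vs_cross_creationism, 1),
      round(stay_vs_cross_evol, 1),
      round(cross_asymmetry, 1),
      round(symmetry, 2))
```

The three ratios come out at roughly 32, 5, and 3, in line with the figures quoted in the text, and the symmetry score is well below 1.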

The findings above report the intrinsic capability of the network to spur users toward diverse content. To combine it with readers’ next-click choice behavior, we use the position and clicks CwP models instead of the uniform one. We show the results in the second and third rows of Figure 9.

Referring back to evolution, we now consider the matrix corresponding to the ExDIN computed using the position CwP model (second row). We see that if users click links at the top of the page with higher probability, the probabilities are only slightly modified w.r.t. the uniform model. These modest variations are coherent with the links’ placement within pages (Figure 2).

For a few topics, such as guns, the links’ position plays a more significant role, worsening the user exposure to diverse information. Indeed, in pages about guns control, links belonging to the guns rights partition seem to be mentioned later in the page. The consequence is that the probability of reaching an article supporting guns rights, starting uniformly at random from an article in guns control, drops by 30% w.r.t. the probability observed using the uniform CwP model. Therefore, we conclude that, for some topics, the placement of links within pages contributes to reducing the exposure to diverse information. In other words, users who tend to click with higher probability the links located toward the beginning of a page have fewer opportunities to read about contrasting opinions.
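To make the role of link position concrete, the toy sketch below compares the probability of clicking a cross-partition link under a uniform choice and under a position-biased choice. The 1/rank weighting, the function names, and the example page are all hypothetical; the paper’s position CwP model is defined in an earlier section and may well use a different form.

```python
def click_probs(links, weight):
    """Normalize per-link weights into click probabilities.

    links:  partition labels of the links, in order of appearance on the page.
    weight: function mapping a 1-based rank to a non-negative weight.
    """
    weights = [weight(rank) for rank in range(1, len(links) + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def exposure(links, target, weight):
    """Probability that the next click lands on a link labelled `target`."""
    probs = click_probs(links, weight)
    return sum(p for label, p in zip(links, probs) if label == target)


# Hypothetical page in the "control" partition: links to the opposing
# "rights" partition appear only near the bottom of the page.
page = ["control", "control", "other", "control", "other", "rights", "rights"]

# Every link equally likely.
uniform = exposure(page, "rights", lambda rank: 1.0)
# Links near the top favoured (assumed 1/rank decay).
position = exposure(page, "rights", lambda rank: 1.0 / rank)
# Relative loss of cross-partition exposure caused by the position bias.
relative_drop = 1.0 - position / uniform

print(f"uniform={uniform:.3f} position={position:.3f} drop={relative_drop:.0%}")
```

Because both opposing links sit at the bottom of this toy page, any weighting that decays with rank lowers the cross-partition exposure relative to the uniform choice, which is the qualitative effect described for guns.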

Finally, we analyze how the phenomenon changes when we use the clicks CwP model (third row). In this case, we assume readers make the next-click choice similarly to past users. Going back to evolution, we immediately observe that the probability of starting a session in creationism and continuing to read about it after one click grows from 5.76% under the uniform model to 11.57%.

For all the topics, we observe a significant increase in the probability of visiting pages of the same opinion. A simple interpretation of this result is that real users more often click the links strictly related to the page they are reading. From another perspective, combining this finding with the previous remark that the “topology of the network seems to drive users to explore knowledge bubbles of homogeneous view,” we ask the reader the following question: Has the behavior of past users been influenced by the network topology? Unfortunately, because of a lack of information, we cannot answer this question, but we hope it will be addressed in future work. Furthermore, we observe that for some topics, like abortion, the probability of reaching pro-choice articles from pro-life ones doubles. This is a sign that users may be willing to explore content proposing a diverse view.

Before moving to the next section, we want to underline that stronger relations among pages of similar content are an intrinsic property of Wikipedia. In fact, in Wikipedia’s Linking Manual [49], editors are asked to link related content [7, 33]. Although this is a fact, we believe that it would be valuable to provide editors with metrics and tools that make them aware of the effect that new/current links have on users’ exposure to diverse information. This is not meant to alter the core and essential intrinsic property of Wikipedia, but rather to keep this property from becoming harmful when it prevents users from accessing diverse content.
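As a rough illustration of what such an editor-facing metric could look like, the sketch below measures how the one-click exposure between two partitions changes when a single candidate link is added. It assumes a uniform next-click choice and a toy graph; the graph, the labels, and the helper names `one_click_exposure` and `with_link` are hypothetical and not part of the paper.

```python
def one_click_exposure(graph, partition, src, dst):
    """Probability of landing in `dst` after one uniform click, starting
    uniformly at random from a page labelled `src` (at least one must exist)."""
    starts = [n for n in graph if partition.get(n) == src]
    total = 0.0
    for node in starts:
        out = graph[node]
        if out:  # pages without out-links contribute zero
            total += sum(partition.get(t) == dst for t in out) / len(out)
    return total / len(starts)


def with_link(graph, source, target):
    """Return a copy of the graph with the edge source -> target added."""
    new_graph = {n: list(out) for n, out in graph.items()}
    if target not in new_graph[source]:
        new_graph[source].append(target)
    return new_graph


# Toy topic-induced network: pages a1, a2 hold view "A", pages b1, b2 hold
# view "B", and x is a page that belongs to neither partition.
graph = {
    "a1": ["a2", "x"],
    "a2": ["a1"],
    "b1": ["b2", "a1"],
    "b2": ["b1"],
    "x": ["a1", "b1"],
}
partition = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

before = one_click_exposure(graph, partition, "A", "B")
after = one_click_exposure(with_link(graph, "a2", "b1"), partition, "A", "B")
reverse = one_click_exposure(graph, partition, "B", "A")

print(before, after, reverse)
```

In this toy network, readers of view A initially have no one-click path to view B, while the reverse direction already has some exposure. Adding the single link a2 → b1 brings the two directions level, which is the kind of feedback an editor tool could surface.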


[Figure 10: line plots for Abortion, Cannabis, Guns, Evolution, Racism, and LGBT. x-axis: number of clicks (1 to 15). Legend: uniform, position, and clicks CwP models, each for $\alpha = 0$, $\alpha = 0.2$, and $\alpha = 1$.]

Figure 10: Dynamic (Mutual) ExDIN for $1 \le \ell \le 15$. The first and second rows show the probabilities of moving across partitions. The third row indicates the mutual exposure to diverse information. Each color corresponds to a different level of $\alpha$, the restart parameter. The markers’ shape indicates the CwP model in use. Higher values of the M-ExDIN mirror more symmetric exposures between opinions. We repeat the computations of the metrics 100 times and report the standard deviations to account for the randomness (Sect. 4.2).

6.2 Dynamic Exposure to Diversity

In this section, we suppose users navigate the network for sessions longer than 1 and see how their (mutual) exposure to diversity may change. Depending on the combination of models employed to measure the ExDIN and M-ExDIN, we provide different insights into the effect of the current network’s topology on users’ exposure to diverse content over a navigation session.

In Figure 10, for sessions of length 15, we plot the ExDIN from $P$ to $\bar{P}$ (resp. from $\bar{P}$ to $P$) and the respective M-ExDIN. Each topic shows its own trends. For this reason, we highlight and explain the most recurrent patterns. Moreover, for better understanding, we suggest cross-checking the following explanations with the analysis done earlier in the paper. We start by describing how, given the current network’s topology, the ExDIN changes over the course of a navigation session (first and second rows).

(1) The curves corresponding to the same value of $\alpha$ (same color) show very similar trends. Depending on the respective CwP model (marker’s shape), they are shifted up or down. This implies that when users share the same navigation behavior, the way they make the next-click choice plays a crucial role in determining the magnitude of their exposure to diverse content. In general, the CwP model (markers’ shapes) corresponding to higher exposure is $M_c$, followed by $M_u$ and $M_p$.

(2) If users navigate mirroring a star-like behavior (green, $\alpha = 1$), their exposure to the opposite opinion is steady. It can slightly decrease or increase when the probability of clicking links to the opposite side becomes higher or lower, respectively, in the first iterations. This happens because these kinds of users are only subject to the exposure of their starting navigation page. So, the more links to the opposite partition they click in the first steps, the more their exposure to diversity decreases, and vice versa.

(3) The curves of users who randomly navigate the network (sky blue, $\alpha = 0$) show two trends. In both cases, after the first few clicks, the ExDIN is lower than at the beginning of the session. Then the trend inverts. In one case, it reaches or exceeds the starting exposure. In the other, it grows and levels off below the initial exposure. The more the destination partition is connected to the rest of the graph, the more users randomly navigating the network are able to reach it. Sometimes (e.g., for one of the cross-partition ExDIN curves of LGBT), the curves start to decrease after many steps. This happens when the pages within the destination partition have already been reached with high probability.

(4) Users characterized by a star-like random navigation (blue, $\alpha = 0.2$) are exposed to diverse content similarly to users exploring the network randomly, but the ExDIN magnitude is greater because of the possibility of jumping back to the starting point at each step.

Given this observation, we analyze the ExDIN of guns. With the current network’s topology, starting from the guns control partition, the users with the highest exposure to guns rights pages (2% probability) are those characterized by a star-like behavior. As soon as users navigate in a random fashion, this probability drops. This is because the guns rights partition has a number of incoming edges that prevents random users who walk away from getting back to it. We observe the opposite for users who start their sessions in guns rights. Indeed, for users randomly navigating the graph, the exposure to the guns control partition is higher than or comparable to that of star-like behaving users. The probability of reaching guns control after many steps becomes high because of the higher number of incoming edges of the partition.
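The three navigation behaviors discussed above can be sketched as a walk with restart on a toy graph. The code makes two loud assumptions that are not taken from the paper: at every click the user, with probability `alpha`, clicks from the starting page instead of the current one, and links are chosen uniformly. It also propagates exact probability distributions rather than repeating random simulations. The graph, labels, and function names are hypothetical.

```python
def step_distribution(graph, dist):
    """Push a probability distribution over pages through one uniform click."""
    nxt = {}
    for node, mass in dist.items():
        out = graph.get(node, [])
        if not out:
            continue  # mass on dead-end pages is dropped (the session ends)
        share = mass / len(out)
        for target in out:
            nxt[target] = nxt.get(target, 0.0) + share
    return nxt


def dynamic_exposure(graph, partition, src, dst, alpha, clicks):
    """Probability of being in `dst` at each click, averaged over start pages
    labelled `src` (at least one such page must exist)."""
    starts = [n for n in graph if partition.get(n) == src]
    curve = [0.0] * clicks
    for start in starts:
        dist = {start: 1.0}
        for step in range(clicks):
            # With probability alpha the click is made from the start page,
            # otherwise from wherever the user currently is.
            mixed = {n: (1.0 - alpha) * m for n, m in dist.items()}
            mixed[start] = mixed.get(start, 0.0) + alpha * sum(dist.values())
            dist = step_distribution(graph, mixed)
            curve[step] += sum(m for n, m in dist.items()
                               if partition.get(n) == dst)
    return [value / len(starts) for value in curve]


# Toy network: a1, a2 hold view "A"; b1, b2 hold view "B"; x is unlabelled.
graph = {
    "a1": ["a2", "x"],
    "a2": ["a1"],
    "b1": ["b2", "a1"],
    "b2": ["b1"],
    "x": ["a1", "b1"],
}
partition = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

star = dynamic_exposure(graph, partition, "B", "A", alpha=1.0, clicks=5)
mixed_walk = dynamic_exposure(graph, partition, "B", "A", alpha=0.2, clicks=5)
random_walk = dynamic_exposure(graph, partition, "B", "A", alpha=0.0, clicks=5)

print(star)
print(mixed_walk)
print(random_walk)
```

Under these assumptions, the star-like curve (`alpha=1.0`) stays flat at the one-click exposure of the starting pages, while the purely random walk (`alpha=0.0`) moves away from it as the structure of the rest of the graph takes over. All three curves coincide at the first click because every session starts on a page of the source partition.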

From a complementary analysis, we observe that for sessions longer than 2 page visits, the topology of the graph does not keep random users within the knowledge bubble they accessed. Indeed, for all the topics, the probability of visiting articles of the same opinion as the starting page becomes close to 0 (refer to Fig. 9 to cross-check the probabilities at the first click). For users with star-like behavior, the probabilities show slightly descending curves. This demonstrates that they tend to pick pages of the same opinion in the first iterations. Due to space constraints, figures picturing these phenomena could not be displayed.

Now we compare the ExDIN curves of each topic to understand whether users reading about different opinions have equal chances of visiting each other’s pages. Recall that, to do so, we use the mutual exposure to diverse information (see Sect. 4.2). For all the topics, the mutual exposure to diverse information is lower than 100%, meaning that for none of them does the network topology provide equal exposure across opinions. For topics like abortion and guns, the longer the users’ session is, the more the network topology prevents readers from symmetrically exploring different opinions.

In general, if we detect low mutuality, the topology of the network favors the exploration of one side of the topic more than the other. Moreover, we want to stress that all the comments regarding users navigating according to the uniform CwP model express the intrinsic topological exposure of the network. On Wikipedia, two scenarios that may determine this phenomenon are: (1) the knowledge on the encyclopedia is complete but articles are underlinked [49], that is, the content and the keywords that could become anchors exist, and thus we need a strategy to densify the network, ensuring mutual exposure to diversity; (2) the knowledge on the encyclopedia is incomplete, that is, there are no words to attach links to, and thus the addition of complementary content may be necessary. An in-depth investigation of these conditions may be interesting future work.

7 CONCLUSIONS

Our work provides a first analysis to understand how the current Wikipedia network topology assists readers in exploring opposite stances of polarizing topics spanning sets of articles. We formalize the problem by introducing two metrics: the exposure to diverse information and the mutual exposure to diverse information (see Sect. 4.2). The former quantifies the ease of jumping across articles expressing opposing viewpoints. The latter evaluates whether the relationship across diverse views is symmetric, that is, whether the flow and the opportunity to go from one side to the other are comparable in the two directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition, we study users’ overall topic consumption. Our main findings suggest the following:

• The traffic on polarizing issues is biased toward one view of the topic. In Section 5, we show that accesses coming from both external and internal pages suggest that readers are inclined to seek content about one facet of the topic. Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network’s topology hinders users from symmetrically exploring diverse content. Moreover, on average, the probability that the network nudges users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting opinions. In Section 6.1, the analysis suggests that users reading about an opinion have higher chances of continuing to explore articles of similar views than of the opposite one. Furthermore, for each of the topics that we explored, the users of one of the two views had a substantially greater tendency, albeit small, to visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network’s topology is typically biased toward one opinion. In Section 6.2, we observe that the mutual exposure to diverse information is never achieved by users navigating completely at random. The better one of the two opinions is connected to the rest of the network, the more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion decreases for users browsing according to the random navigation model. In Section 6.2, results suggest that after a few clicks, the exposure to information of the same inclination diminishes. On the other hand, if users explore the network with a star-like behavior, their level of exposure to the same opinion is similar to that of users who make only one click.

In our study, we analyze sets of articles assigned to opinions according to editors’ crafted categories [44]. Although this approach represents a solid starting point for analysis, it can cause article misclassification. As future work, we plan to investigate a more reliable classification strategy to improve the accuracy of our analysis, one that should also include the content of the articles along with the link information. Secondly, a longitudinal study with the goal of understanding the dynamics that led to the current state of the encyclopedia’s network would provide further understanding of users’ behavior. Finally, in light of our findings, we deem it crucial to design tools that help editors contextualize articles within the network, so that they are aware of the effect of link insertion on users’ knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas of our life, and filter bubbles aggravate this phenomenon. Understanding better how they manifest in Wikipedia (and other media) is a crucial first step toward finding ways to attenuate it, and our hope is that this work is a step toward this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism Design Research in Online Markets” and MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets”.

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 U.S. election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.

[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.

[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.

[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.

[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE Signal Processing Magazine 27, 6 (2010), 55–65.

[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.

[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th International Conference on World Wide Web.

[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American Society for Information Science and Technology (2011).

[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.

[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on Twitter. In Fifth International AAAI Conference on Weblogs and Social Media.

[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.

[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.

[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.

[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference: Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.

[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on Wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.

[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.

[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.

[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for Wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.

[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly 80, S1 (2016), 298–320.

[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.

[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM ’17).

[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.

[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.

[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).

[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.

[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.

[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM Conference on Hypertext and Social Media. 89–98.

[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia’s coverage of the Tohoku catastrophes. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration. 105–113.

[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.

[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.

[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of Wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.

[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.

[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International Conference on World Wide Web.

[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.

[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.

[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.

[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW ’18.

[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.

[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.

[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.

[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia’s Verifiability. In The World Wide Web Conference.

[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on YouTube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.

[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 213–222.

[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature Human Behaviour (2019).

[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read Wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.

[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.

[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).

[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st International Conference on World Wide Web.

[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking

[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace

[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view

[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect

[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22 (2017).

[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing Wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.


Auditing Wikipediarsquos Hyperlinks Network on Polarizing Topics WWW rsquo21 April 19ndash23 2021 Ljubljana Slovenia

1 3 5 7 9 11 13 15

1

2

e PP

Abortion

1 3 5 7 9 11 13 15

00

05

10Cannabis

1 3 5 7 9 11 13 150

1

2

Guns

1 3 5 7 9 11 13 15

05

10Evolution

1 3 5 7 9 11 13 15

05

10

15

Racism

1 3 5 7 9 11 13 15

05

10

Lgbt

1 3 5 7 9 11 13 1500

05

10

15

e PP

1 3 5 7 9 11 13 15

000

025

050

1 3 5 7 9 11 13 15

1

2

1 3 5 7 9 11 13 15

02

04

06

1 3 5 7 9 11 13 1505

10

15

1 3 5 7 9 11 13 15

05

10

15

1 3 5 7 9 11 13 15

40

60

80

100

1 3 5 7 9 11 13 15

25

50

75

1 3 5 7 9 11 13 1525

50

75

100

1 3 5 7 9 11 13 15

40

60

80

1 3 5 7 9 11 13 15

60

80

100

1 3 5 7 9 11 13 15

60

80

100

Number of Clicks

uniform alpha=0position alpha=0clicks alpha=0

uniform alpha=02position alpha=02clicks alpha=02

uniform alpha=1position alpha=1clicks alpha=1

Figure 10 Dynamic (Mutual)ExDIN for 1 le ℓ le 15 The first and second rows show the probabilities ofmoving across partitionsThe third row indicate the mutual exposure to diverse information Each color correspond to a different level of 120572 the restartparameter The markersrsquo shape indicates the CwP model in use Higher values of the M-ExDIN mirror more symmetric expo-sures between opinions We repeat the computations of the metrics 100 times and report the standard deviations to accountfor the randomness 42

62 Dynamic Exposure to DiversityIn this section we suppose users navigate the network for sessions

longer than 1 and see how their (mutual) exposure to diversity may

change According to the combinations of models employed to mea-

sure the ExDIN and M-ExDIN we provide different insights about

the effect of the current networkrsquos topology on usersrsquo exposure to

diverse content over a navigation session

In Figure 10 for sessions of length 15 we plot the ExDIN from

119875 to 119875 (resp from 119875 to 119875 ) and the respective M-ExDIN We can

notice that each of the topics shows its own trends For this reason

we decide to highlight and provide an explanation for the most

recurrent patterns Moreover for better understanding we suggest

to cross-check the following explanations with the analysis done

above in the paper We start describing how given the current net-

workrsquos topology the ExDIN changes over the course of a navigation

session (first and second row)

(1) The curves corresponding to the same value of 120572 (same color)

show very similar trends Depending on the respective CwP model

(markerrsquos shape) they are shifted up or down This implies that

when users share the same navigation behavior the way they make

the next-click choice plays a crucial role on determining the mag-

nitude of their exposure to diverse content In general the CwP

model (markersrsquo shapes) corresponding to higher exposure is119872119888

followed by119872119906and119872119901

(2) If users navigate mirroring a star-like behavior (green 120572 = 1)

their exposure to the opposite opinion is steady It can slightly

decrease or increase when the probability of clicking links to the

opposite side becomes higher or lower respectively in the first

iterations This happens because these kind of users are only subject

to the exposure of their starting navigation page So the more links

to the opposite partition they click in the first steps the more their

exposure to diversity decreases and vice-versa

(3) The curves of users who randomly navigate the network (sky

blue 120572 = 0) show two trends For both cases after the first few

clicks the ExDIN is lower than at the beginning of the session

Then it inverts the trend In one case it reaches or exceeds the

starting exposure On the other hand it grows getting steady below

the initial exposure The more the destination partition is connected

to the rest of the graph the more users randomly navigating the

network are able to reach it Sometimes ExDIN from 119875 to 119875 of

LGBT the curves start to decrease after many steps This happens

when the pages within the destination partition have been reached

with high probability

(4) Users characterized by a star-like random navigation (blue

120572 = 02) are exposed to diverse content similarly to users exploring

randomly the network but the ExDIN magnitude is greater because

of the possibility of jumping back to the starting point at each step

Given this observation, we analyze the ExDIN of guns. With the current
network's topology, starting from the guns control partition, the users with
higher exposure to guns rights pages (2% probability) are those characterized
by a star-like behavior. As soon as the users navigate in a random fashion,
this probability drops. This is because the number of incoming edges of the
guns rights partition prevents random users who walk away from getting back to
it. We observe the opposite for users who start their sessions in guns rights.
Indeed, for users randomly navigating the graph, the exposure to the guns
control partition is higher than or comparable to that of star-like behaving
users. The probability of reaching guns control after many steps becomes high
because of the larger number of incoming edges of the partition.
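The role of incoming edges can be checked with a simple count. The following is a minimal sketch, assuming the topic network is available as a list of directed `(source, destination)` links; the edge list and page labels are made up for illustration and do not come from the paper's data.

```python
def incoming_edges(edges, partition):
    """Count links that enter `partition` from pages outside it."""
    members = set(partition)
    return sum(1 for src, dst in edges if dst in members and src not in members)


# Hypothetical toy network: two opposing partitions plus one neutral page "n1".
EDGES = [
    ("a1", "a2"),
    ("a1", "b1"),
    ("a2", "b1"),
    ("n1", "b2"),
    ("b1", "a1"),
    ("b1", "b2"),
]
SIDE_A = {"a1", "a2"}
SIDE_B = {"b1", "b2"}

into_a = incoming_edges(EDGES, SIDE_A)
into_b = incoming_edges(EDGES, SIDE_B)
```

In this toy network, a random walker that leaves `SIDE_A` has a single link leading back, whereas `SIDE_B` can be entered through three links.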

From complementary analysis, we observe that, for sessions longer than 2 page
visits, the topology of the graph is not such as to keep random users within
the knowledge bubble they accessed. Indeed, for all the topics, the probability
of visiting articles of the same opinion as the starting page becomes close to
0 (refer to Fig. 9 to cross-check the probabilities at the first click). For
users with star-like behavior, the probabilities show slightly descending
curves. This demonstrates that they tend to pick pages of the same opinion in
the first iterations. Due to space constraints, figures picturing these
phenomena could not be displayed.

Now we compare the ExDIN curves of each topic to understand whether users
reading about different opinions have equal chances of visiting each other's
pages. Recall that, to do so, we use the mutual exposure to diverse information
(see Sect. 4.2). For all the topics, the mutual exposure to diverse information
is lower than 100%, meaning that for none of them does the network topology
provide an equal exposure across opinions. If we consider topics like abortion
and guns, the longer the users' session is, the more the network topology
prevents readers from symmetrically exploring different opinions.
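To make the "lower than 100%" reading concrete, the sketch below scores the symmetry of two directional exposures. The formula (smaller exposure divided by the larger one, as a percentage) is an assumption chosen only because it equals 100% when the two directions match; the definition actually used is the one in Sect. 4.2 and may differ. The function name and the input values are hypothetical.

```python
def mutual_exposure_pct(a_to_b, b_to_a):
    """Hypothetical symmetry score in [0, 100] for two directional exposures."""
    larger = max(a_to_b, b_to_a)
    if larger == 0:
        # No exposure in either direction: treated here as trivially symmetric.
        return 100.0
    return 100.0 * min(a_to_b, b_to_a) / larger


balanced = mutual_exposure_pct(0.10, 0.10)
skewed = mutual_exposure_pct(0.02, 0.08)
```

Under this assumed score, a value well below 100 signals that one direction of exploration is much easier than the other.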

In general, if we detect low mutuality, the topology of the network favors the
exploration of one side of the topic more than the other. Moreover, we want to
stress that all the comments regarding users navigating according to the
uniform CwP model express the intrinsic topological exposure of the network. On
Wikipedia, two scenarios determining this phenomenon may be: (1) the knowledge
on the encyclopedia is complete, but articles are underlinked [49]; that is,
the content and the keywords that could become anchors exist, thus we need a
strategy to densify the network, ensuring mutual exposure to diversity; (2) the
knowledge on the encyclopedia is incomplete; that is, there are no words to
attach links to, thus the addition of complementary content may be necessary.
An in-depth investigation of these conditions may be interesting future work.

7 CONCLUSIONS

Our work provides a first analysis to understand how the current Wikipedia's
network topology assists readers in exploring opposite stances of polarizing
topics spanning sets of articles. We formalize the problem by introducing two
metrics: the exposure to diverse information and the mutual exposure to diverse
information (see Sect. 4.2). The former quantifies the ease of jumping across
articles expressing opposing viewpoints. The latter evaluates whether the
relationship across diverse views is symmetric, that is, whether the flow and
the opportunity to go from one side to the other are comparable for the two
directions.

We investigate the phenomenon on six polarizing topics (Sect. 6). In addition,
we study the users' overall topic consumption. Our main findings suggest the
following:

• The traffic on polarizing issues is biased toward one view of the topic. In
Section 5, we show that accesses coming from both external and internal pages
suggest that readers are inclined to seek content about one facet of the topic.
Most seem to have a bias toward liberal content (see Fig. 7).

• For sessions of length = 1, the current network's topology hinders users from
symmetrically exploring diverse content. Moreover, on average, the probability
that the network nudges users to remain in a knowledge bubble is up to an order
of magnitude higher than that of exploring pages of contrasting opinions. In
Section 6.1, the analysis suggests that users reading about an opinion have
higher chances of continuing to explore articles of similar views than of the
opposite. Furthermore, for each of the topics that we explored, the users of
one of the two views had a substantially higher tendency, albeit small, to
visit pages of the opposing view than the users of the other view.

• For sessions of length > 1, the network's topology is typically biased toward
one opinion. In Section 6.1, we observe that the mutual exposure to diverse
information is never achieved by users navigating completely at random. The
better one of the two opinions is connected to the rest of the network, the
more the graph nudges users toward that opinion.

• For sessions of length > 1, the probability of reading about the same opinion
decreases for users browsing according to the random navigation model. In
Section 6.2, results suggest that, after a few clicks, the exposure to
information of the same inclination diminishes. On the other hand, if users
explore the network with a star-like behavior, their level of exposure to the
same opinion is similar to that of users who make only one click.

In our study, we analyze sets of articles assigned to opinions according to
editors' crafted categories [44]. Although this approach represents a solid
starting point for analysis, it can cause article misclassification. As future
work, we plan to investigate a more reliable classification strategy to improve
the accuracy of our analysis, an analysis that should also include the content
of the articles alongside the link information. Secondly, a longitudinal study
aimed at understanding the dynamics that led to the current state of the
encyclopedia's network would provide further understanding of the users'
behavior. Finally, in the light of our findings, we deem it crucial to design
tools that help editors contextualize articles within the network, so that they
are aware of the effect of link insertion on users' knowledge exploration.

The prevalence of bias and polarization is well established in multiple areas
of our life, and filter bubbles aggravate this phenomenon. Understanding better
how they manifest in Wikipedia (and other media) is a crucial first step for
finding ways to attenuate it, and our hope is that this work is a step towards
this goal.

Acknowledgments. Partially supported by the ERC Advanced Grant 788893 AMDROMA
"Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN
project ALGADIMAR "Algorithms, Games, and Digital Markets."

REFERENCES

[1] Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem.
[2] Leman Akoglu. 2014. Quantifying political polarity based on bipartite opinion networks. In Eighth International AAAI Conference on Weblogs and Social Media.
[3] Sumit Asthana and Aaron Halfaker. 2018. With few eyes, all hoaxes are deep. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–18.
[4] Ivan Beschastnikh, Travis Kriplean, and David W. McDonald. 2008. Wikipedian Self-Governance in Action: Motivating the Policy Lens. In ICWSM.
[5] David Blei, Lawrence Carin, and David Dunson. 2010. Probabilistic topic models. IEEE signal processing magazine 27, 6 (2010), 55–65.
[6] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan (2003), 993–1022.
[7] Ulrik Brandes, Patrick Kenis, Jürgen Lerner, and Denise Van Raaij. 2009. Network analysis of collaboration structure in Wikipedia. In Proceedings of the 18th international conference on World wide web.
[8] Ewa S. Callahan and Susan C. Herring. 2011. Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology (2011).
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter Bubbles on Social Network Polarization. In Proceedings of the 13th International Conference on Web Search and Data Mining. ACM.
[10] Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media.
[11] Cristian Consonni, David Laniado, and Alberto Montresor. 2019. WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 598–607.
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media.
[13] Alexander Dallmann, Thomas Niebler, Florian Lemmerich, and Andreas Hotho. 2016. Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods. In Wiki@ICWSM.
[14] Dimitar Dimitrov and Florian Lemmerich. 2019. Democracy and difference. Different topic, different traffic: How search and navigation interplay on Wikipedia. The Journal of Web Science 6 (2019), 67–94.
[15] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2016. Visual positions of links and clicks on wikipedia. In Proceedings of the 25th International Conference Companion on World Wide Web. 27–28.
[16] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What makes a link successful on wikipedia? In Proceedings of the 26th International Conference on World Wide Web. 917–926.
[17] Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, and Markus Strohmaier. 2017. What Makes a Link Successful on Wikipedia? In Proceedings of the 26th International Conference on World Wide Web.
[18] Besnik Fetahu, Katja Markert, Wolfgang Nejdl, and Avishek Anand. 2016. Finding news citations for wikipedia. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 337–346.
[19] Seth Flaxman, Sharad Goel, and Justin M. Rao. 2016. Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly 80, S1 (2016), 298–320.
[20] Andrea Forte, Vanesa Larco, and Amy Bruckman. 2009. Decentralization in Wikipedia governance. Journal of Management Information Systems 26, 1 (2009), 49–72.
[21] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2017. Reducing Controversy by Connecting Opposing Views. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17).
[22] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference. 913–922.
[23] Kiran Garimella, Aristides Gionis, Nikos Parotsidis, and Nikolaj Tatti. 2017. Balancing information exposure in social networks. In Advances in Neural Information Processing Systems. 4663–4671.
[24] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. 2018. Quantifying controversy on social media. ACM Transactions on Social Computing (2018).
[25] Patrick Gildersleve and Taha Yasseri. 2018. Inspiration, captivation, and misdirection: Emergent properties in networks of online navigation. In International Workshop on Complex Networks. Springer, 271–282.
[26] Eduardo Graells-Garrido, Mounia Lalmas, and Filippo Menczer. 2015. First Women, Second Sex: Gender Bias in Wikipedia. In Proceedings of the 26th ACM Conference on Hypertext & Social Media.
[27] Denis Helic, Markus Strohmaier, Michael Granitzer, and Reinhold Scherer. 2013. Models of human navigation in information networks based on decentralized search. In Proceedings of the 24th ACM conference on hypertext and social media. 89–98.
[28] Brian Keegan, Darren Gergle, and Noshir Contractor. 2011. Hot off the wiki: dynamics, practices, and structures in Wikipedia's coverage of the Tohoku catastrophes. In Proceedings of the 7th international symposium on Wikis and open collaboration. 105–113.
[29] Tobias Koopmann, Alexander Dallmann, Lena Hettinger, Thomas Niebler, and Andreas Hotho. 2019. On the right track! Analysing and predicting navigation success in Wikipedia. In Proceedings of the 30th ACM Conference on Hypertext and Social Media. 143–152.
[30] Srijan Kumar, Robert West, and Jure Leskovec. 2016. Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes. In Proceedings of the 25th International Conference on World Wide Web.
[31] Daniel Lamprecht, Kristina Lerman, Denis Helic, and Markus Strohmaier. 2017. How the structure of wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia 23, 1 (2017), 29–50.
[32] Q. Vera Liao and Wai-Tat Fu. 2014. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2745–2754.
[33] Dmitry Lizorkin, Olena Medelyan, and Maria Grineva. 2009. Analysis of community structure in Wikipedia. In International conference on World wide web.
[34] Antonis Matakos, Evimaria Terzi, and Panayiotis Tsaparas. 2017. Measuring and moderating opinion polarization in social networks. Data Mining and Knowledge Discovery 31 (2017), 1480–1505.
[35] Antonis Matakos, Sijing Tu, and Aristides Gionis. 2020. Tell me something my friends do not know: diversity maximization in social networks. Knowledge and Information Systems 9 (2020), 3697–3726.
[36] Alfredo Jose Morales, Javier Borondo, Juan Carlos Losada, and Rosa M. Benito. 2015. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science 25, 3 (2015), 033114.
[37] Cameron Musco, Christopher Musco, and Charalampos E. Tsourakakis. 2018. Minimizing Polarization and Disagreement in Social Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18.
[38] Ashwin Paranjape, Robert West, Leila Zia, and Jure Leskovec. 2016. Improving website hyperlink structure using server logs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 615–624.
[39] Tiziano Piccardi, Michele Catasta, Leila Zia, and Robert West. 2018. Structuring Wikipedia articles with section recommendations. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
[40] Alessandro Piscopo and Elena Simperl. 2019. What we talk about when we talk about Wikidata quality: a literature survey. In Proceedings of the 15th International Symposium on Open Collaboration. 1–11.
[41] Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In The World Wide Web Conference.
[42] Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, and Wagner Meira Jr. 2020. Auditing radicalization pathways on youtube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 131–141.
[43] Aju Thalappillil Scaria, Rose Marie Philip, Robert West, and Jure Leskovec. 2014. The last click: Why users give up information network navigation. In Proceedings of the 7th ACM international conference on Web search and data mining. 213–222.
[44] Feng Shi, Misha Teplitskiy, Eamon Duede, and James A. Evans. 2019. The wisdom of polarized crowds. Nature human behaviour (2019).
[45] Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, and Jure Leskovec. 2017. Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web. 1591–1600.
[46] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho. 2013. Computing semantic relatedness from human navigational paths: A case study on Wikipedia. In International Journal on Semantic Web and Information Systems 9, 41–70.
[47] Claudia Wagner, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. 2016. Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science (2016).
[48] Robert West and Jure Leskovec. 2012. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web.
[49] Wikipedia. [n.d.]. Manual of Style/Linking. In https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking.
[50] Wikipedia. [n.d.]. Namespace. In https://en.wikipedia.org/wiki/Wikipedia:Namespace.
[51] Wikipedia. [n.d.]. Neutral Point of View. In https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view.
[52] Wikipedia. [n.d.]. Redirect. In https://en.wikipedia.org/wiki/Wikipedia:Redirect.
[53] Ellery Wulczyn and Dario Taraborelli. 2017. Wikipedia Clickstream. https://doi.org/10.6084/m9.figshare.1305770.v22. (2017).
[54] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.

WikipediaManual_of_StyleLinking

[50] Wikipedia [nd] Namespace In httpsenwikipediaorgwikiWikipedia

Namespace

[51] Wikipedia [nd] Neutral Point of View In httpsenwikipediaorgwiki

WikipediaNeutral_point_of_view

[52] Wikipedia [nd] Redirect In httpsenwikipediaorgwikiWikipediaRedirect

[53] Ellery Wulczyn and Dario Taraborelli 2017 Wikipedia Clickstream https

doiorg106084m9figshare1305770v22 (2017)

[54] Ellery Wulczyn Robert West Leila Zia and Jure Leskovec 2016 Growing

wikipedia across languages via recommendation In Proceedings of the 25th Inter-national Conference on World Wide Web 975ndash985

  • Abstract
  • 1 Introduction
  • 2 Related Works
  • 3 Data Collection
    • 3.1 Topic-Induced Networks
    • 3.2 General Statistics on Topics Networks
  • 4 Metrics
    • 4.1 Content Consumption
    • 4.2 Exposure Metrics
  • 5 RQ1: Readers' Topic Consumption
    • 5.1 Pageviews Distribution
    • 5.2 External or Internal Access to the Topic
    • 5.3 How Much Readers Navigate Links
  • 6 RQ2: Exposure Across Topic Viewpoints
    • 6.1 Exposure to Diversity
    • 6.2 Dynamic Exposure to Diversity
  • 7 Conclusions
  • References

Auditing Wikipedia's Hyperlinks Network on Polarizing Topics. WWW '21, April 19–23, 2021, Ljubljana, Slovenia
