Citation analysis for e-government research
-
Upload
independent -
Category
Documents
-
view
2 -
download
0
Transcript of Citation analysis for e-government research
Citation Analysis for e-Government Research Nuša Erman
University of Ljubljana, Faculty of administration Gosarjeva ulica 5,
SI-1000 Ljubljana, Slovenia +386 1 5805 515
ABSTRACT
Various research papers question and analyze the maturity of the
e-government research area and its stance as a scientific
discipline. The common conclusion of these papers is the
exposure of the deficiency in common research methodology in
the EGR field. However, this lack of common methods
characterizes the papers that analyze the field as well. The aim of
this paper is to propose a common methodology for the research
of the EGR field in social network and citation analysis terms.
The paper introduces and describes a data set that allows for such
analysis and outlines the prospects for further research.
Categories and Subject Descriptors
E.1 [Data Structures]: graphs and networks; J.1
[Administrative data processing]: government; J.4 [Social and
behavioral sciences]: sociology;
General Terms
Measurement, Documentation, Theory
Keywords e-government, e-government research, social network analysis,
citation analysis
1. INTRODUCTION e-Government Research (EGR) refers to the study of the use of
modern Information and Communication Technology (ICT) in
government activities and public administration. Research in the
area of e-government started in late 1990’s and has significantly
grown over the past decade. Although an increasing number of
research papers, academic conferences and specialized journals
bear witness to the growth of e-government research area, it is
often claimed that e-government is yet not completely developed,
its study is young and doubts about EGR as a discipline or a field
are still present.
Scholl [11] argues that EGR fails to implement some of the
presumptions that define a discipline. In this sense, the area of
EGR lacks a unifying theory or a scientific shared vision of the
research impact. Grönlund [4, 5] divides the research papers
related to e-government topic in five categories with respect to the
nature of the paper and we can distinguish at least ten various
classes of research, regarding the method used in the paper. Such
a large range of procedures and methods clearly hinders above all
the nature and the quality of contribution. On this basis, the
presumption about the absence of disciplinary allegiance,
promotional pathways, specific terminology, and common
methodology is not completely irrelevant.
With the intention to contribute and to reduce the lack of common
methodology and with the purpose to help promoting the studies
of EGR area, this paper proposes some new aspects in social
network analysis in the EGR field. The present paper does not
deal with analysis of social networks between citizens and
government1, but is rather concerned with social network analysis
from the citation point of view, i.e., analysis of citations between
authors and papers in the e-government research. This will
provide a new insight to the stage of development of EGR area.
The aim of this paper is to introduce citation analysis as the
common research analysis tool, which can in some way substitute
or extend other methods for the research of the growth of the EGR
area. The usage of common methodology will contribute to the
comparability of similar researches, which represents an important
step towards more rigorous study of the EGR area.
The present paper first introduces the map of e-government
research and outlines the main deficiencies of the related works. It
then introduces the citation analysis in terms of statistical and
social network analysis. It goes on with the specification of the
research of EGR field with regard to the papers published in the
proceedings of the International Conference on e-Government
(EGOV) as well as the literature being referenced in those papers.
In the same section it delineates the data description and data
summary. It then goes on with outlining the prospects for further
analysis and concludes with the summary of all ascertainments.
2. MAP OF E-GOVERNMENT RESEARCH Due to the nature of the present paper, in mapping the EGR we
will concentrate primarily on existing research that focus on
analyzing the development of the field of EGR. It is not surprising
that in recent years among several authors, e.g. [4, 5, 11], the
interest in the formal analysis of EGR is growing. The EGR is
now present for several years and many authors putted an effort
into the investigation and representation [5] of the current stage of
the field. On the other hand, it was also shown that the EGR
might be most effective as a multidiscipline and some
argumentations about the reasons for its immaturity were made.
In 2003, Grönlund [4] carried out the research examining the
nature of papers published at three major e-Government
conferences. He introduced a model for measuring the maturity of
the field, which included phases through which the research fields
1 which is also a relevant topic addressed, e.g., in Cotterill and
King [2]
pass in the process of becoming mature. The data set for research
is based mainly on the context of presented papers, which were
coded according to selected categories for assessing rigor and
relevance. He concluded that descriptive papers dominate in the
EGR field and that most papers focus on the Information
Technology itself. He stressed that the field if still immature,
although there were contributions from various disciplines
present. In 2005, Grönlund [5] reiterated his research, but this
time he narrowed the scope of the study to the papers published in
the proceedings of International Conference on e-Government
(EGOV) held in 2005.. He ascertained some progress in the
examined scientific field, as the share of philosophical research
has decreased and the efforts to comply with the research
publication standards have increased. He also noted an enormous
growth in the number of references of the papers, which indicates
better involvement with previous research. The collaboration of
authors from various institutions has increased, and the number of
dubious claims has been reduced. However, some disappointing
facts remain present, since descriptive research clearly increased
in two years (2003-2005) on one hand, but theory testing and
creating has only slightly increased on the other. Finally,
Grönlund summarized that there has been a change in positive
direction and that the field has clearly matured from 2003 to 2005,
as the papers became more rigorous. He also concludes that the
change in founding principles was so enormous, that the applied
model of delineating the growth of the research field cannot be
used any more.
On the basis of Grönlund’s examination, various attempts for
improvements and further formal analysis of the EGR were made.
Scholl [11] concentrated on the examination of the EGR as a
discipline, questioning whether EGR even qualifies as a legitimate
discipline. With his precise specification of several criteria which
define a discipline, comparison of the study of EGR with two
neighboring disciplines and delineation of the challenges and
opportunities for EGR as a cross-discipline, he proposed that it is
possible that EGR wants to avoid being a traditional legitimate
discipline. Because of the fact that the scope of the field is so
wide that the development towards traditional discipline would
signify merely a restriction, EGR should be thriven as a multi-,
inter-, or as a trans-discipline. He concluded that in the field of
EGR the disciplinary allegiance has not yet been found, as the
established rules, procedures and promotional pathways are
absent.
As a respond to the already presented findings about the growth
and limitations in the EGR field, Flak et al. [3] made some further
argumentation on this subject and proposed the necessary
foundations for understanding the basic concepts in the field. As a
reaction to the reproaches that in the EGR field there has been
little theoretical progress with the lack of a cumulative tradition,
they expose the cause for the poor growth of the field. In their
opinion, the main reason is the deficiency of shared understanding
of the basic concepts and entities among scientists in the EGR
field. That is why they propose a fundamental and exact
conceptualization of basic concepts, without which the
comparison of findings and the transfer of knowledge between
different parts of EGR field, as well as a cumulative tradition are
not possible.
According to this brief review of the map of EGR, it is possible to
delineate some aspects, advantages and disadvantages from the
research point of view. Grönlund’s work [4,5] exposed the state
of maturity of the EGR field. In his latter work, he described the
area of EGR as being more mature as it was a couple of years
before that. He also ascertained that the model of phases through
which disciplines pass on their way of growing became unusable
as the changes in a field were so dramatic. Here we are confronted
with a slightly disadvantage situation, as the other works
(especially the work of Flak et al. [3]) clearly pointed out that the
research in e-Government field should have established rules,
procedures, shared understanding and in the addition to all of that
also common research methods and methodology, which will be
applicable in any situation, no matter how huge the changes in the
disciplines are. To address this issue, we propose a citation
analysis as a novel alternative method for analysis of the scientific
field research, which enables the insight into the characteristics,
patterns, growth, and other relevant features of the EGR.
3. CITATION ANALYSIS The citation analysis can be briefly defined as a research of the
frequency, patterns and graphs of citations or references in various
scientific literatures, such as journal articles, books, and papers in
conference proceedings. Citation analysis itself is strongly related
to social network analysis on one hand, and to the statistical
analysis on the other. Over the past few decades, the citation
analysis has been increasingly used to quantify and value the
significance of scientists and scientific research. If social network
analysis offers the methodology to analyze social relations,
citation analysis offers the methodology to analyze relations,
based on citations or references of scientific papers. Although
some authors, e.g. Meho [7], claim that citation analysis evolves
only the counting of times a single scientist or a scientific paper is
cited, citation analysis offers much more. When the desired scope
of the study is formed in the citation analysis sense, many
possibilities for various analyses arise.
According to Nooy et al. [8], citations are a precious and valuable
source of data for the study of scientometrics, history and
sociology of science. Such data enable to study a development of
the science and characteristics of scientific communities, reveal
the impact of papers and their authors on further scientific work
and expose specialities with sharing knowledge.
The citation analysis has been proposed and used for evaluation of
several journals, conferences and research areas, but there is still
no evidence of its use in the field of EGR. Moreover, the
examination of various literature, dealing with the citation
analysis in other research areas, e.g. [6, 1, 9], does not offer to
establish a common methodology for citation analysis in a broad
sense. Most closely related to the approach proposed in this paper
is the study of the publications in the area of inductive logic
programming presented by Sabo et al. [10], which perform
analysis of an actual social network based on bibliography
database. In the next subsections, we present the citation analysis
methodology in terms of social network analysis.
3.1 Defining the citation network The first and the most important step in citation analysis is the
determination of the proper research units (vertices) and the
relations (lines) between them. These must be defined on the
structural level, as the analysis will be primarily concerned with
exploration of the meaningful patterns in the network. Units and
relations between them define a graph, which represents the
structure of a network and can be defined as a set of vertices and a
set of lines between pairs of vertices.
3.2 Describing the citation network When building a data set for the social network analysis, it is
important to gather as much information about the units as
possible. Such a rich data set enables not only to perform the
social network analysis, but also to perform some other analysis
from the statistical point of view that describes the details of the
components of the network. Here we are confronted with the basic
descriptive statistical analysis, which helps us to describe the
network in terms of its constituent units. If the data set is collected
over a longer period of time, we can also observe the dynamic
changes of the outlined characteristics.
3.3 Exploring the citation network After the definition of the research units and relations among
them, description of the units and importation of the collected
data into a proper program for social network analysis, we can
continue with the exploration of the network.
Based on the program for social network analysis, we can gather
some general information about the network, e.g. number of
units, number of directed lines (arcs), number of undirected lines
(edges), and number of loops and density of the network.
Further analysis of the network enables to explore and describe
the patterns and the specialities of the network. We can identify
cohesive subgroups of the network, which are, according to
Wasserman and Faust [12], represented by units, among which
relative strong, direct and frequent connections exist. Cohesive
subgroups in the network can be identified with different
techniques of social network analysis. These techniques are: 1)
finding the components in the network, which represent clusters
of vertices in which any vertex can be reached from all other
vertices from the same cluster, 2) finding the cores, which are
defined as clusters of vertices where each vertex from a cluster is
connected to the fixed number of other vertices from the same
cluster, and 3) finding cliques, which represent a special case of
cores where each vertex from a cluster is connected to all other
vertices from the same cluster.
In directed networks some units receive many positive choices
and these units are considered to be more prestigious as the
others, which means that they are more important. Several
measures can be used to reveal prestigious units in the network: 1)
popularity and is measured by the number of arcs the vertex
receives, 2) input domain of a vertex which refers to the number
or percentage of all other vertices, from which, regarding the
direction of connections, we can reach the selected vertex, and 3)
proximity prestige which is calculated on the basis of the vertex
input domain size divided by the mean distance from all vertices
in the input domain of the vertex.
The unit in the network can appear in various brokerage roles,
namely as a coordinator, itinerant broker, representative,
gatekeeper or as a liaison. In the citation analysis, the brokerage
roles point out the path of exchanging the information or
knowledge between authors or between publications. The concept
of brokerage founds on the notions of centrality and betweenness.
The concept of centrality refers to positions of individual vertices
within the network. The centrality of a vertex can be measured by
a degree centrality and by a closeness centrality. The concept of
betweenness rests on the idea that a unit within a network is more
central if it is more important as an intermediary and is measured
by a betweenness centrality. On the basis of data, gathered over a
longer time period, we can also explore the diffusion. It is defined
as an important social process and represents a special case of
brokerage, which includes the time dimension.
In the case of directed networks, ranking is associated with
asymmetry or hierarchy, meaning that the arcs represent “point-
up” relations and not down. In this respect, the network can be
presented as a set of ranks where each rank contains one or more
clusters. The ranked clusters model represents a simple hierarchy.
4. DATA SET FOR CITATION ANALYSIS
OF E-GOVERNMENT RESEARCH e-Government related papers are being published in various
journals and presented at several conferences worldwide. In this
respect, we decided to begin our analysis of an EGR field with
analyzing the papers published in the seven proceedings of the
International Conference on e-Government (EGOV), which was
held annually since 2002. We collected electronic versions of the
papers published in these proceedings and applied semi-automatic
procedure to build the first data set for citation analysis in the
EGR field. In the continuation of this section, we will first outline
the data set structure and the procedure for extracting data from
papers, and then second, provide descriptive statistics that
summarizes the collected data set.
4.1 Data set description In order to build a data set for citation analysis, we first establish
its structure, presented in Figure 1. Since there is different amount
of information available for the papers published in the
proceedings (full affiliations of authors) and the referenced
literature (only authors and source), we established two separate
entities of EGOV_Paper and Rererence. They both share the
fields Authors, Title, and Year of publication. EGOV_paper entity
that collects data about published paper also includes two
additional fields: countries where the authors come from and
number of references. Reference entity includes two additional
fields: type of publication (e.g., an article, a paper in conference
proceeding, or book) and source that identifies the conference or
journal where the reference comes from.
Figure 1. Structure of the data set for citation analysis
In the following step we gathered data from the electronic
versions (PDF files) of the papers published in the seven
proceedings of the annual EGOV conference. We used a simple
parser to automatically extract data needed for the EGOV-Paper
entity from the front page of the paper (these include data about
authors, countries, title, and year) as well as data needed for the
EGOV_Paper Reference
Authors
Countries
Title
Num_of_refs
Year
Authors
Title
Type
Source
Year
Reference entity from the last section of each paper (these include
reference authors, title, type, source, and year). Once we collect
all the references from a paper, we can also determine the value of
the number of reference field in the EGOV_paper entity.
Note however, that the differences in formatting the front pages
and especially references posed a non-negligible challenges to the
parser. A lot of manual work was needed on data cleansing after
the first phase of automatic data collection. Many manual
corrections were also related to the inconsistencies in referencing
styles used by different authors. To name few of them: some
references enlist authors using full initials (including middle
names) while others do not; some references fully specify source
(journal or conference title) while other specify acronyms.
Following this semi-automatic procedure, we collected data about
the papers published in the seven proceedings of the annual
EGOV conference from 2002 to 2008. Our data set includes data
about 399 papers and about 5000 references. We have already
cleaned all the data about the 399 papers included in the
EGOV_paper entity, while the references data is clean only for the
papers published in the four proceedings from 2005 onward.
4.2 Data summary In this subsection, we present some basic statistical analysis to
introduce the collected data set. Here the focus is directed merely
to the presentation of the conference proceedings from the
statistical point of view, which in relation to the presented
methodology in the previous section means, that the description of
the network is introduced.
The EGOV conference in all seven years of its existence included
399 papers. On the basis of these papers, we outline some of the
characteristics of the conference, collaborating authors and their
presented papers, as well as the changes which took place in the
observed period of time.
In Table 1, the basic descriptive statistics for the number of
authors, the number of countries and the number of references for
the whole time period is presented.
The basic descriptive statistics show that in case of EGOV
conference as a whole, the minimum authors per paper is 1,
although there exist also paper(s) with 10 author(s). The average
number of authors per paper in EGOV conference is 2.43, which
means that in average 2 persons share the authorship of the paper.
The authors collaborating in the EGOV conference come from
different countries. It can be outlined, that per paper the
collaboration of authors from maximum 3 different countries can
be traced. Although, in average, papers are mainly written by
authors from the same country, as the average number of countries
per paper is 1.09.
Table 1. Descriptive statistics for EGOV conferences’ papers
for a 2002-2008 period
Variable N Minimum Maximum Mean
# authors 399 1 10 2.43
# countries 399 1 3 1.09
# refs 399 0 66 12.58
The range of the number of references is huge. We can find papers
with 0 (zero) references, as well as paper(s) with the maximum
number of 66 references. In average, authors in their papers
include approximate 12 references.
In Figure 2, the number of papers for each year is presented, as
well as its change through seven years. The numbers of papers are
represented by the dots in the diagram, and the trend of change is
indicated by the horizontal fractured line, connecting the dots.
Figure 2. Decrease in the number of papers through the years
As we can see in Figure 2, the number of presented papers
steadily increased during the first 3 years. In 2004, 100 papers
were presented at the EGOV conference. From 2004 to 2005, we
observe a huge decrease in the number of papers, as in year 2005
only 30 papers were included in the conference. The decrease can
be attributed to the fact, that the EGOV organizers have made an
effort to improve paper quality by improving the review process
[5] and the improvements were made between 2004 and 2005
conferences. This means that in the last 4 years we are witnessing
higher quality papers being published in the conference
proceedigns.
The following figures present the average numbers of authors per
paper, the average numbers of countries per paper and the average
number of references per paper. The average number are
represented by dots, the change in those numbers is represented
by the horizontal fractured line, connecting the dots, and the
vertical lines represent the minimum and maximum numbers of
authors, countries and references per paper.
In Figure 3, the average number of authors per paper as well as its
trend is presented.
Figure 3. Change in the average number of authors per paper
through years
During the period of seven years, we can ascertain minor increase
in average number of authors per paper (Figure 3), although in
2008, as regards to 2007, the average number of authors per paper
decreased a little. Greater differences can be perceived in the
maximum number of authors, where in 2002 the number was 10,
which means, that a paper was written by 10 authors. In later
years, the difference in maximum number of authors was not so
high anymore, as the number varies for ±1 author.
In the next figure, the average number of countries per paper and
its change is presented.
Figure 4. Change in the average number of countries per
paper through years
Similar as the average number of authors per paper, the average
number of countries per paper also increased a little (Figure 4),
although the average number of countries in all years vary from 1
to 3. In 2002, the maximum number of countries, from which
authors of one paper originated, is 3. In 2003, 2004 and 2005, the
maximum number decreased to 2, and in the last 3 years the same
paper is written by authors from at most 3 different countries.
Regarding the country of origin of the authors, let us include
another diagram, representing the geographical review of the first
author's origin for the 2002-2008 period. Due to the fact, that
many countries are represented only by one author in the whole
period, to improve the presentation of the pie chart in Figure 5,
only the portions of countries, which are represented by at least 10
authors, are included.
Figure 5. Geographical review of the first author's origin for
the 2002-2008 period
According to Figure 5, we can ascertain that authors in most cases
come from Italy, Austria, Netherlands and Germany, although the
portion of authors from Greece, Spain, Sweden, UK and USA is
not negligible.
On the other hand, first authors of papers come from 45 different
countries, from all continents. The continent distribution of first
authors is presented in Figure 6.
Figure 6. Continental distribution of the first authors for the
2002-2008 period
According to Figure 6, we can ascertain, that the main continent
represented in EGOV conference is Europe. 82% of first papers'
authors come from countries from Europe, although authors come
from all other continents too. 9% of authors come from America,
7% from Asia, and also from Australia and Africa, from each 1%
of authors. On this basis we can assert that the EGOV conference
is really international, as it reaches an increasing audience in the
geographical perspective.
In Figure 7, the average number of references per paper and its
change is presented.
In contrast to the average number of authors per paper (Figure 3),
the average number of references per paper is increasing. In the
period 2004-2006 the growth was rapid, although in the last 3
years, the average number of references is remaining almost the
same. In 2008, papers were in average refering to almost 23
references. The changes can be recognized in minimum and
maximum numbers of papers as well. As the minimum number of
references was 0 at first, which means that some papers did not
include any references, the situation began to change in year
2005, as the improvements of the paper review were made. From
2005 on, there are no papers without references. The rapid growth
can be ascertain in the maximum number of references. As in first
years of conference the papers contained around 30 references at
most, from the year 2005 the maximum number of references
increased and is now settled at 66 references per paper.
Figure 7. Change in the average number of references per
paper through years
5. PROSPECTS FOR ANALYSIS According to the presented methodology for citation analysis, in
the next section we propose application of the methodology in
case of EGR citation network, which is based on the papers
presented in EGOV conference. Here we are interested mainly in
authors and their references. Although the references include
other categories as well, the purpose of the present paper is to
outline the characteristics of authors of selected papers according
to their mutual connectedness.
5.1 Definition of citation networks In citation analysis, when studying research papers, there are
several possibilities in choosing the units of the research and in
this paper we focus only on one of them, i.e. the units are
represented by the authors of the studied papers. In this case, the
lines are represented by the relations between the authors, where
relations indicate connections on the citation or reference basis.
Regarding the focus of the present paper, in case of EGOV
conference network, we can define the citation network by the
units which are represented by the authors of the included papers
and relations among authors base on their references. This means
that a unit or author is connected to other authors according to his
or her references. According to the data set description (see Figure
1), author citation network can be presented as it is shown in
Figure 8, where colored points represent units, and relation
(references) are represented by lines. Figure 8 represents a
reduced author citation network, as the network includes 3 papers
from EGOV 2007. Vertices are labeled according to the authors’
names and arcs point from the papers’ authors towards referenced
authors.
In addition to defining the units and relations among them, some
additional information about authors are needed. Example of the
network, presented in Figure 8, indicates that, beside defining
relations among vertices, at least the names of authors are needed
to draw clear networks. Other information, such as the year of the
paper, the year of publications in references, countries and
affiliations of the authors of examined papers, etc., serve us as an
addition in describing, exploring and interpreting final results of
the analysis.
Figure 8. Author citation network
As the description of the data set was presented in the previous
section, we now skip to the very exploration of the EGOV
conference network.
5.2 Exploration of citation networks According to the subsection 3.3 of the present paper, we now
focus on the exploratory citation network analysis. Regarding the
theory of social network analysis, several methods and
possibilities for a detailed detection and interpretation of the
patterns of relations between units can be used. Here, we will lean
mainly on the theory proposed by Nooy et al [8].
5.2.1. General information about the network In specification of the general information about the network, we
will outline the meaning of these information for EGOV
conference network.
Number of units represents the number of authors in the network.
Number of directed lines (arcs) indicates the number of
connections between authors, based on the references of the
proceeding’s papers. Number of undirected lines (edges) in the
EGOV conference will not be displayed, as we are dealing with a
directed network, where all lines point from one vertex to the
others. Number of loops represents the number of times the
vertices in the network are connected to themselves. In the EGOV
network, the number of loops shows how many auto-references
are made by authors. Density of the network shows how dense the
citation network is and is defined as a number of lines in a
network as a proportion of the maximum possible number of
lines. More the network is dense larger is the difference between
lines and units, in favour of lines.
5.2.2. Cohesive subgroups in the network In the network analysis sense, the cohesive subgroups are defined
on the ways in which vertices are interconnected. Cohesive
subgroups in the network can be identified on several ways with
different techniques of social network analysis.
One of them is identification of the components, which represent
the maximal connected subnetworks, in which the vertices are
strongly or weakly connected. In this sense, we distinguish
between weak2 and strong3 components and the distinction
between them is made according to the (un)consideration of the
directions of lines between vertices.
Cohesive subgroups of the network can be identified also by
finding the cores. A k-core is defined as clusters of vertices where
each vertex from a cluster is connected to k number of other
vertices from the same cluster. It identifies relatively dense
subnetworks or cohesive subgroups. However, a k-core does not
always represent a cohesive subgroup, so the vertices of low k-
cores should be eliminated until the network breaks up into
relatively dense components.
The last known technique for identifying cohesive subgroups in
the network is represented by cliques. A clique may be described
as a special case of cores in which each vertex from a cluster is
connected to all other vertices from the same cluster and where
each cluster contains at least three vertices. In other words, these
vertices represent a subset of the network, in which each vertex is
directly connected to all other vertices. Due to this fact, cliques
represent subgroups with the highest degree of cohesion.
From the citation network point of view, the examination of the
presence or absence of cohesive subgroups is very interesting. If
the vertices are represented by authors, the cohesive subgroups
will contain authors from the same research fields, which means,
authors, who are referencing each other the most.
5.2.3. Structural prestige of units in the network In directed networks, as it is presented by citation network, some
units receive many positive choices and are considered to be more
prestigious as the others. Several measures revealing the
prestigious units in the network can be used.
The simplest measure of structural prestige is named popularity
and is measured by the number of arcs the vertex receives in a
directed network – so by indegree of a vertex, and a higher
indegree indicates higher structural prestige. The main deficiency
of this measure is that it only takes direct choices into account.
To extend the notion of prestige to indirect choices, another
measure can be introduced. This is the input domain of a vertex
which refers to the number or percentage of all other vertices,
from which, considering the direction of connection, we can reach
the selected vertex. The main deficiency of this measure is the
fact, that in a well-connected network, the input domain often
includes all or almost all other vertices, so it should be limited to
the direct neighbours or at least to neighbours at maximum
distance of two on the assumption that the indirect choices
contribute less to prestige.
In order to improve the input domain as a measure, proximity
prestige is proposed. This measure is calculated on the basis of
the input domain, which is divided by the mean distance from all
2 A weak component includes a cluster of vertices, in which any
vertices can be reached from all other vertices from the same
cluster, regardless the direction of the lines.
3 The definition of a strong component is the same, the only
difference is that in the case of a strong component, the
direction of lined is significant.
vertices in the input domain of the vertex. A great input domain
and short distances lead to high values of proximity prestige.
From the citation network point of view, structural prestige can be
investigated in citation networks of authors. In this case, the
structural prestige outlines the most prestigious and consecutively
the most influential authors in the research field.
5.2.4. Brokerage roles of units in the network As we have already outlined, the unit in the network can appear in
various brokerage roles4. According to the nature of this
approach, the direction of connections or lines is not very
important, so the direction of relations is disregarded. The concept
of brokerage founds on the notions of centrality and betweenness.
The concept of centrality is used to refer to positions of individual
vertices within the network. There are several ways of measuring
the centrality of the vertices. The simplest measure of centrality is
a degree centrality, which indicates, that the unit in a network is
central if it is active enough in making connections to the other
units. The higher the degree of a unit, the more sources of
information or knowledge it has at its disposal. This measure is an
absolute measure of the centrality, so the normalization is
required to get relative measure of the centrality. Due to the
simplicity of the measure, the degree centrality considers only the
number of direct neighbours of a vertex, which can represent a
problem. This is why another measure of centrality is introduced,
termed as closeness centrality. It is defined as the number of other
vertices divided by the sum of all distances between the vertex
and all others. Closeness centrality is a better measure as the
degree centrality, since it considers not only direct, but also
indirect neighbours of the vertex. The common feature of both
presented measures is that they both consider the reachability of a
unit within the network.
Another concept rests on the idea that a unit within a network is
more central if it is more important as an intermediary. This
concept is called betweenness and the measure of betweenness is
defined as a betweenness centrality. It represents the proportion of
all geodesics between pairs of other vertices which include this
vertex. The concept of betweenness centrality captures the
importance of a vertex to the circulation of information and
knowledge, where high value of this measure indicates that the
unit is an important intermediary in the network.
From the citation analysis point of view, we can apply brokerage
roles in terms of measures of centrality, i.e. degree centrality,
closeness centrality and betweenness centrality, when finding the
characteristics of the knowledge flow through a selected research
area. On the other hand, on the basis of centrality measures, we
can identify central authors, which in the citation network play
essential roles in terms of being central and intermediary units.
5.2.5. Diffusion of information and knowledge in
the network Diffusion is an important social process and represents a special
case of brokerage, which includes a time dimension. Something,
in our case of study the knowledge, is sent from one author to
4 as a coordinator, itinerant broker, representative, gatekeeper or
as a liaison
another in the course of time. It is generally known, that the
diffusion of the definite phenomena is represented by the typical
S-shape of the diffusion curve. This means, that at first few
authors adopt the innovation but the adoption rate accelerates,
then when 10 to 20 percent of the authors have adopted
innovation, the absolute number of new adopters is still
increasing, causing a sharp raise of the adopters and finally, the
number of new adopters decreases, which means that the diffusion
process slowly reaches its end.
In citation analysis, the examination of the diffusion processes is
important from the information or knowledge spreading point of
view. In this case, the adoption rate represents the percentage of
new adopters of the specific knowledge at a particular moment.
Contagion refers to the spreading new knowledge from the author,
who started the innovation, to his or her closest colleagues, which
are exposed to this innovation. After a time, hopefully, adoption
rate accelerates, reaches the critical mass, then still increases and
at the end of the diffusion process decreases.
5.2.6. Ranking clusters of the units in the network In the case of directed networks, as the citation network actually
is, ranking is associated with asymmetry or hierarchy, meaning
that the arcs represent “point-up” relations and not down. In this
respect, the network can be presented as a set of ranks where each
rank contains one or more clusters. The ranked clusters model
represents a simple hierarchy. In the case of ranking, we have to
deal with acyclic networks, which certainly do not hold for the
citation networks. Based on the papers’ references, citation
networks have cyclic structure. Fortunately, it is relatively easy to
detect cyclic parts of the networks, as they are represented by
strong components. If we identify strong components in the
network, we can shrink them and the network becomes acyclic.
However, the notion of strong components seems not to be
sufficiently strict to identify cluster within a rank. Furthermore,
strong components as cyclic subnetworks represent clusters of
equals, where acyclic subnetworks reflect hierarchy perfectly.
That is why acyclic decomposition is usually used to determine
hierarchy in the network.
Acyclic decomposition determines the hierarchy in the network
according to several steps of procedure. In the simplest way, it is
necessary first to find strong components in the network, i.e.
cyclic subnetworks. Then each strong component is shrunk into a
vertex, so we get a new network with so many units as the number
of strong components is. Finally, we compute the depth for each
vertex in the new network, so we get the hierarchy. Then we can
remove arcs, which are interconnecting the strong components,
and convert all bi-directed arcs into edges. On the basis of the
presented procedure, we get a diagram, which represents clusters
of vertices which are interconnected.
6. PRELIMINARY RESULTS In the previous section, the prospects for analysis were
represented. Considering the data set, presented in section 4.2, we
can apply methods and techniques of citation analysis to outline
some of the characteristics of authors’ citation network. In the
continuation, some preliminary results of citation analysis
following sections 5.2.1, 5.2.2., and 5.2.3 are represented. Here
we focus on the citation network among authors of selected papers
for the whole 2005-2008 period, for which all data are clean.
Figure 9. Author citation network for 2005-2008 period
In Figure 9, the whole author citation network for 2005-2008
period is presented. As it is shown in Table 2, the network
consists of 2870 different authors among which 10120
connections are established on the reference basis. Density of the
network is 0.0012 which means that only 0.12% of maximum
possible number of lines or connections is present, so the network
is not dense.
Table 2. General properties (number of vertices, number of
arcs, and density) of the citation network built on data for
2005-2008 period
2005-2008
# vertices 2870
# arcs 10120
Density 0.0012
In social networks, like the observed authors’ citation network can
be best described, the occurrence of cohesive subgroups can be
often found, as some of the members of networks, according to
the relations among them, have relations only with some of the
other members of the network. In SNA, identification of cohesive
subgroups is one of the major concerns, where the cohesive
subgroups are represented by authors, among which relatively
strong, intense, direct, and frequent ties exist.
As it was already described, there are several ways of identifying
cohesive subgroups in the network, i.e. finding components, k-
cores and cliques. The results of finding (weak) components in the
network show that there is a single major component, which
includes almost 97% of the authors. Another interesting result is
also the fact that the observed citation network includes a 2-core
with 20% of all authors; within this core, each author is
referenced by at least two others. The other 80% of authors
belong to cores with lower rank, which indicates the absence of
cohesive subgroups in the observed network. Similar results can
be also found regarding the cliques, where no 3- or 4-cliques can
be found. In sum, there is no evidence of cohesive subgroups in
the observed citation network. This is very likely to be the
consequence of the fact, that the authors within EGR field
represent a relatively young and compact community, in which
sub-communities have not emerged yet.
Despite the fact that there are no cohesive sub-groups in the
citation network, we can still identify central, most cited, authors.
An author is considered to be more popular or prestigious then
others, if he/she receives many positive choices. In Table 3, we
present the results of the three measures of structural prestige –
input degree, input domain, and proximity prestige. For each
measure, the minimum and maximum value is shown, as well as
its average, and in the last column the first 10 authors with the
highest values of prestige are presented.
Table 3. Structural prestige measures (input degree, input
domain, and proximity prestige) for authors in the citation
network built on data for 2005-2008 period
Min. Max. Mean 2005-2008
Input
degree 0.000 0.0195 0.0012
Wimmer,
Grönlund, van
Dijk, Ebbers, Lee
J.W., Heeks,
Layne, Hargittai,
Tambouris,
Peristeras
Input
domain 0.000 0.0537 0.0249
Orlikowski, Lee
S.M., Keen,
Holzer, Drucker,
Thompson,
Morton, Pollitt,
Delancer,
Newcomber
Proximity
Prestige 0.000 0.0250 0.0067
Grönlund, Heeks,
Hood, Wimmer,
Lee J.W., Webster,
van Dijk, Layne,
Margetts,
Hargittai
As it is shown in the table above, input degree and proximity
prestige produce similar results in a sense of the most prestigious
authors, whereas the results of input domain show deviation from
other two measures. In Table 3, the names of authors, which
emerge as the prestigious ones in at least two different measures
of prestige, are marked bold. We can ascertain that regarding the
input degree and proximity prestige Grönlund, Heeks, Wimmer,
Lee, Layne, and Hargittai are the most prestigious authors in the
whole citation network, which means that these authors were
referenced from others the most.
Regarding the results of structural prestige measures, in Table 4
the main topics as well as the number of citations for each of the
most prestigious authors are represented.
Now, in the final stage of our analysis, we can draw some content
conclusions about the most prestigious authors in the observed
network. In this way, we can also delineate the thematic topics
which influenced the work in EGR field the most.
Most of the references are made to the papers, which consider the
state-of-the-art of e-government research, written by Grönlund
and in minor extend also by Heeks. In their papers, they provide
the analysis of EGR field and offer the overview of the current
stage of the field. The second most influential thematic topic is
one of integrating e-services in public administration The papers
here are mainly written by Wimmer and her colleagues, where
they introduce and analyze different frameworks and approaches
to integration and interoperability of e-government services. The
third most influential thematic topic regards to digital divide with
Hargittai as the authors of these papers. In her papers, she goes
beyond the digital divide introducing digital inequality and
exposing the significant differences among Web users. Another
influential topic is one regarding the unification of e-government
theories and models, which was proposed by Lee and Layne.
Although one would expect that with maturation of the EGR field
this topic would become the most important issue, in the EGOV
proceedings this is not the case.
Table 4. Structural prestige measures (input degree, input
domain, and proximity prestige) for authors in the citation
network built on data for 2005-2008 period
Author Topic Published
in # Cit.
1 Grönlund,
A.
state-of-the-art e-
government research
EGOV
IJEGR 13
2 Heeks, R.
success and failure of e-
government projects book 6
state-of-the-art e-
government research GIQ 6
3 Wimmer,
M.A.
integration of e-services
in public administration
HICSS
EM
IFIP
11
roadmaps for future e-
government research EGOV 5
4 Lee, JW
Layne, K.
unifying e-government
theories and models GIQ 7
5 van Dijk, J.
user profiling and
services personalization
in public administration
EGOV
GIQ 6
6 Hargittai, E. digital divide
FM
JASIST
ITS
8
Finally, there are some other thematic topics being almost equally
influential, but are cited less frequently as the four topics
described before. These are: success and failure of e-government
projects, authored by Heeks), roadmaps for future e-government
research, authored by Wimmer, and user profiling and services
personalization in public administration, authored by van Dijk.
7. CONCLUSION In the previous sections we have outlined the stage of the EGR
field and emphasized the interests of the researchers for analysis
of its development. We proposed a methodology for scientific
field research in terms of citation analysis and described the
citation network data set for the papers published at the
International Conference on e-Government. The presented data set
represent the first necessary step in our planned analysis of the
development of EGR. Finally, we outlined the prospects for
analysis in the data set and performed some points of the analysis
which gave us some preliminary results.
We made the first step towards more elaborative analysis of EGR
by proposing the methodology for citation analysis in this paper.
On the basis of the content of the present paper we are continuing
with our research, as the actual exploratory citation network
analysis for papers presented at EGOV conference is going to be
executed. The main aim of our research in progress is to delineate
the history and the growth patterns in the EGR field, the inside
connectedness of the EGR area and the connections of the field
with other research fields. The crucial point is also to investigate
the connections between the EGOV conference and other e-
government conferences worldwide.
But the continuation of our research will include some other
prospects as well. Although the present paper focuses merely on
the citation network among authors of the selected papers, other
possibilities in choosing the units of the research are possible. On
one hand, there is a possibility that the units are represented by
papers/articles themselves, where relations are expressed by the
references made between the papers. In this sense, the relations
among papers can be evaluated, as well as the connectedness of
the EGR field with other research fields. On the other hand, the
units can also be presented by authors and papers, where we can
form a two-mode network and relations are made from papers to
authors, according to the references made. The third possibility is
to employ conferences and/or journals as units, where the
relations are made between conferences and journals regarding the
references. In this case, the exploration of relations among EGR
field and other fields is possible, since every conference or journal
is significant for specific research field.
There are also other possibilities for analysis in the authors’
citation network sense. In the present paper we focused on authors
and relations between them based on the references made. But in
the future we are also planning to form a co-authorship network
and analyze the authors in EGR field from this perspective. In all
these possibilities for analysis, the central point will stay the same,
since the methods of SNA, described in this paper, will be applied
to citation analysis of various networks.
We know that the collected data set has a narrow focus since there
exist other conferences on e-government or digital government
(e.g., this conference, i.e., the National Conference on Digital
Government Research, dg.o). Focusing on the papers from EGOV
conference, the interesting viewpoint is certainly the
connectedness of these two international conferences, which
should contain common research sights and share the knowledge.
The connection can be identified by references, where the papers
of each conference, in our case of EGOV conference, should be
referring to papers of the other conference, in our case to the
papers of dg.o conference.
In the future, the tendency towards expanding our data set and
consequently our research is present. We are planning to examine
also other conferences and, more importantly, journals in the e-
government field. Efforts can be extended to literature from
neighboring research fields that are closely related to the e-
government research. On this basis, we could outline the EGR
field as a whole which will enable the full insight into the field.
Although the realization of this idea is still relatively far away,
once it is implemented, the complete map of the EGR field will be
fully discovered and explored.
8. REFERENCES [1] Clausen, M. and Wormell, I. 2001. A bibliometric analysis of
IOLIM conferences 1977-1999. Journal of Information
Science 27 (3), 157-169.
[2] Cotterill, S. and King, S. 2007. Public Sector Partnerships to
Deliver Local E-Government: A Social Network Study. In
Wimmer, M.A., Scholl, H.J. and Grönlund, A. (Eds). EGOV
2007, LNCS 4656, 240-251. Springer-Verlag, Berlin,
Heidelberg.
[3] Flak, L.S., Sein, M.K. and Sæbo, Ø. 2007. Towards y
Cumulative Tradition in E-Government Research: Going
Beyond the Gs and Cs. In Wimmer, M.A., Scholl, H.J. and
Grönlund, A. (Eds). EGOV 2007, LNCS 4656, 13-22.
Springer-Verlag, Berlin, Heidelberg.
[4] Grönlund, A. 2004. State of the Art in e-Gov Research – A
Survey. In Traunmüller, R. (Ed.). EGOV 2004, LNCS 3183,
178-185. Springer-Verlag, Berling, Heidelberg.
[5] Grönlund, A. 2006. e-Gov Research Quality . In Wimmer,
M.A. et al. (Eds.): EGOV 2006, LNCS 4084, 1-12. Springer-
Verlag, Berling, Heidelberg.
[6] Liu, Z. and Wang, C. 2005. Mapping interdisciplinarity in
demography: a journal network analysis. Journal of
Information Science, 31 (4), 308-316.
[7] Meho, L.I. 2007. The rise and rise of citation analysis.
Physics World, 20 (1), 32-36.
[8] Nooy, W.d., Mrvar, A. and Batagelj, V. 2005. Exploratory
social network analysis with Pajek. Cambridge University
Press, New York.
[9] Rahm, E. and Thor A. 2005. Citation analysis of database
publications. SIGMOD Record, 34 (4), 48-53.
[10] Sabo, S., Grčar, M., Fabjan, D.A., Ljubič, P. and Lavrač, N.
2007. Exploratory analysis of the ILPnet2 social network.
http://kt.ijs.si/Dunja/SiKDD2007/Papers/Sabo_PajekILPNet.
[11] Scholl, H.J. 2006. Is E-Government Research a Flash in the
Pan or Here for the Long Shot? In Wimmer, M.A. et al.
(Eds.): EGOV 2006, LNCS 4084, 13-24. Springer-Verlag,
Berling, Heidelberg.
[12] Wasserman, S. and Faust, K. 1994. Social network analysis:
methods and applications. Cambridge University Press,
Cambridge, New York, Melbourne.