Applying social network analysis to the information in CVS repositories
-
Upload
independent -
Category
Documents
-
view
6 -
download
0
Transcript of Applying social network analysis to the information in CVS repositories
Applying Social Network Analysisto the Information in CVS Repositories
Luis Lopez-Fernandez, Gregorio Robles,
Jesus M. Gonzalez Barahona,GSyC, Universidad Rey Juan Carlos, Madrid, Spain
{llopez,grex,jgb}@gsyc.escet.urjc.es
MSR 2004 (Edinburgh, UK)25th May 2004
Background 1
Background
There is a lot of (too much?) information about libre software projects
out there
We’re starting to streamline the extraction of raw data (e.g., from
CVS repositories)
We have to apply data mining and data interpretation techniques to
get meaningful information
Let’s explore approaches which were productive in other fields
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Main aims of the study 2
Main aims of the study
To advance in the understanding of the social structure of libre soft-
ware projects
To characterize projects according to this structure
To relate the evolution of a project to the evolution of its social
structure
To explore self-organization in the social structure of libre software
projects
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Methodology 3
Methodology
Download CVS history information from the repository for a libre
software project
Extract the information related to who commited what
Build with it the commiter and module networks
Analyze the resulting networks using social network analysis
Extract some conclusions
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
The commiter network 4
The commiter network
One side of affiliation network
Each vertex, a commiter (usually a developer)
Edge: when there is contribution to at least one common module
Weight of edges: commits by both commiters to all common modules
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
The module network 5
The module network
Other side of the same affiliation network
Each vertex, a module (usually a top-level directory)
Edge: when there is at least one common commiter
Weight of edges: commits by common commiters to both modules
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Both are a complex mesh 6
Both are a complex mesh
Module network for the Apache project, ca. February 2004
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
But they can be characterized 7
But they can be characterized
Degree (number of connections per vertex)
Weighted degree (in our case, by commits)
Distance centrality (proximity to the rest of the network)
Betweenness centrality (shortest paths traversing a vertex)
Clustering coefficient (connectivity to the neighborhood)
Weighted clustering coefficient (in our case, by commits)
Community analysis (Girvan-Newman algorithm)
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Apache: connection degree (commiters network) 8
Apache: connection degree (commitersnetwork)
Degree0 50 100 150 200 250 300 350 400 450
0
20
40
60
80
100
120
Apache, circa February 2004
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Apache and GNOME clustering coefficient (modules network) 9
Apache and GNOME clustering coefficient(modules network)
cc (clustering coeficient)0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1
0
5
10
15
20
25
30
cc (clustering coeficient)0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.05
0
20
40
60
80
100
120
Apache (left), GNOME (right) circa February 2004
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Apache, GNOME, KDE weighted clustering coefficient (modules network) 10
Apache, GNOME, KDE weighted clusteringcoefficient (modules network)
Weighted clustering coeficient0 5000 10000 15000 20000
0
5
10
15
20
25
30
Weighted clustering coeficient0 20000 40000 60000 80000 100000 120000 140000
0
50
100
150
200
250
Weighted clustering coeficient0 20000 40000 60000 80000 100000 120000 140000
0
50
100
150
200
250
Apache (left), GNOME (center), KDE (right) circa February 2004
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Apache connection degree (modules network) 11
Apache connection degree (modules network)
Degree0 5 10 15 20 25 30 35
0
1
2
3
4
5
6
7
8
Degree0 10 20 30 40 50 60 70
0
2
4
6
8
10
12
Degree0 10 20 30 40 50 60 70 80 90
0
2
4
6
8
10
12
14
Degree0 20 40 60 80 100 120
0
2
4
6
8
10
12
14
2001 (top left) to 2004 (bottom right)
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Apache modules community analysis (1999.01) 12
Apache modules community analysis(1999.01)
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Apache modules community analysis (2000.01) 13
Apache modules community analysis(2000.01)
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Apache modules community analysis (2000.09) 14
Apache modules community analysis(2000.09)
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Apache modules community analysis (2002.01) 15
Apache modules community analysis(2002.01)
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Apache modules community analysis (2004.02) 16
Apache modules community analysis(2004.02)
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories
Conclusions 17
Conclusions
Methodology for studying the structure of libre software projects
Captures both relationships between modules and commiters
First step to community analysis
Access to traditional social network analysis tools
Further work: characterization of projects
c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories