Applying social network analysis to the information in CVS repositories

18
Applying Social Network Analysis to the Information in CVS Repositories Luis L´ opez-Fern´ andez, Gregorio Robles, Jes´ us M. Gonz´ alez Barahona, GSyC, Universidad Rey Juan Carlos, Madrid, Spain {llopez,grex,jgb}@gsyc.escet.urjc.es MSR 2004 (Edinburgh, UK) 25th May 2004

Transcript of Applying social network analysis to the information in CVS repositories

Applying Social Network Analysisto the Information in CVS Repositories

Luis Lopez-Fernandez, Gregorio Robles,

Jesus M. Gonzalez Barahona,GSyC, Universidad Rey Juan Carlos, Madrid, Spain

{llopez,grex,jgb}@gsyc.escet.urjc.es

MSR 2004 (Edinburgh, UK)25th May 2004

Background 1

Background

There is a lot of (too much?) information about libre software projects

out there

We’re starting to streamline the extraction of raw data (e.g., from

CVS repositories)

We have to apply data mining and data interpretation techniques to

get meaningful information

Let’s explore approaches which were productive in other fields

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Main aims of the study 2

Main aims of the study

To advance in the understanding of the social structure of libre soft-

ware projects

To characterize projects according to this structure

To relate the evolution of a project to the evolution of its social

structure

To explore self-organization in the social structure of libre software

projects

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Methodology 3

Methodology

Download CVS history information from the repository for a libre

software project

Extract the information related to who commited what

Build with it the commiter and module networks

Analyze the resulting networks using social network analysis

Extract some conclusions

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

The commiter network 4

The commiter network

One side of affiliation network

Each vertex, a commiter (usually a developer)

Edge: when there is contribution to at least one common module

Weight of edges: commits by both commiters to all common modules

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

The module network 5

The module network

Other side of the same affiliation network

Each vertex, a module (usually a top-level directory)

Edge: when there is at least one common commiter

Weight of edges: commits by common commiters to both modules

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Both are a complex mesh 6

Both are a complex mesh

Module network for the Apache project, ca. February 2004

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

But they can be characterized 7

But they can be characterized

Degree (number of connections per vertex)

Weighted degree (in our case, by commits)

Distance centrality (proximity to the rest of the network)

Betweenness centrality (shortest paths traversing a vertex)

Clustering coefficient (connectivity to the neighborhood)

Weighted clustering coefficient (in our case, by commits)

Community analysis (Girvan-Newman algorithm)

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Apache: connection degree (commiters network) 8

Apache: connection degree (commitersnetwork)

Degree0 50 100 150 200 250 300 350 400 450

0

20

40

60

80

100

120

Apache, circa February 2004

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Apache and GNOME clustering coefficient (modules network) 9

Apache and GNOME clustering coefficient(modules network)

cc (clustering coeficient)0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1

0

5

10

15

20

25

30

cc (clustering coeficient)0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.05

0

20

40

60

80

100

120

Apache (left), GNOME (right) circa February 2004

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Apache, GNOME, KDE weighted clustering coefficient (modules network) 10

Apache, GNOME, KDE weighted clusteringcoefficient (modules network)

Weighted clustering coeficient0 5000 10000 15000 20000

0

5

10

15

20

25

30

Weighted clustering coeficient0 20000 40000 60000 80000 100000 120000 140000

0

50

100

150

200

250

Weighted clustering coeficient0 20000 40000 60000 80000 100000 120000 140000

0

50

100

150

200

250

Apache (left), GNOME (center), KDE (right) circa February 2004

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Apache connection degree (modules network) 11

Apache connection degree (modules network)

Degree0 5 10 15 20 25 30 35

0

1

2

3

4

5

6

7

8

Degree0 10 20 30 40 50 60 70

0

2

4

6

8

10

12

Degree0 10 20 30 40 50 60 70 80 90

0

2

4

6

8

10

12

14

Degree0 20 40 60 80 100 120

0

2

4

6

8

10

12

14

2001 (top left) to 2004 (bottom right)

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Apache modules community analysis (1999.01) 12

Apache modules community analysis(1999.01)

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Apache modules community analysis (2000.01) 13

Apache modules community analysis(2000.01)

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Apache modules community analysis (2000.09) 14

Apache modules community analysis(2000.09)

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Apache modules community analysis (2002.01) 15

Apache modules community analysis(2002.01)

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Apache modules community analysis (2004.02) 16

Apache modules community analysis(2004.02)

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories

Conclusions 17

Conclusions

Methodology for studying the structure of libre software projects

Captures both relationships between modules and commiters

First step to community analysis

Access to traditional social network analysis tools

Further work: characterization of projects

c©2004 Jesus M. Gonzalez Barahona Applying Social Network Analysisto the Information in CVS Repositories