
A study on the uncertainty inherent in class cohesion measurements

Moataz A. Ahmed (a), Adam Abubakar (b), Jarallah S. AlGhamdi (a)

(a) King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
(b) Yanbu University College, Box 31387, Yanbu Al-Sinaiyah, Saudi Arabia


Keywords: Class cohesion metrics; Metrics validity; Connection types; Metrics classification criteria; Uncertainty


Abstract

Software metrics are essential for component certification and for the development of high quality software in general. Accordingly, research in the area of software metrics has been active and a wide range of metrics has been proposed. However, the variety of metrics proposed for measuring the same quality attribute suggests that there may be some sort of inconsistencies among the measurements computed using these metrics. In this paper, we report a case study considering class cohesion as a quality attribute of concern. We present the results of our empirical investigation to confirm that prominent object-oriented class cohesion metrics provide inconsistent measurements. We also present an analysis of the uncertainty that should be considered in these class cohesion measurements due to their inter-inconsistencies. Considering such uncertainty, we provide a framework for associating a probability distribution of error to the measurements computed by each metric, thus enabling the assessment of the degree of reliability of the measurements of each metric when used to rank a set of classes with regard to their cohesion quality. The error probability distribution would be useful in practice, where it is seldom feasible to use a large set of metrics and rather a single metric is used.


1. Introduction

Achieving a high level of software quality has been a major objective of the industry. It is no longer acceptable to deliver poor quality products and then repair problems and deficiencies after they have been delivered to the customer. Accordingly, quality assurance begins at an early stage in the software development process by setting out the desired product qualities (also known as key performance parameters, key performance indicators, external quality attributes, or simply external attributes). Potential software external attributes include but are not limited to safety, security, reliability, maintainability, adaptability, reusability, and robustness [24]. In general, it is not possible for any system to be optimized for all potential attributes. A quality model is built to define the critical and most significant quality attributes. For example, it may be the case that maintainability is paramount and other attributes have to be compromised to achieve this. The quality model also defines how external quality attributes are to be predicted and assessed.


Typically, external attributes, such as maintainability, cannot be measured directly during software development; rather, they are indirectly predicted using some measurable attributes. It is typical to measure internal attributes that are known to have a validated functional relationship with the external attributes of interest. Using appropriate metrics, such internal (quality) attributes are used to assess or predict the external attributes of interest. Examples of internal attributes include but are not limited to cohesion, coupling, and number of lines of code. External attributes are visible to the stakeholders (e.g., customers, users, and development project managers) of the product and are typically available only after the software has been used for a period of time; internal attributes concern the developer of the product and can be measured at different phases of development. In general, stakeholders (other than the developers) of software products care only about external quality attributes, but it is the internal attributes, which deal largely with the structure of the software, that help developers achieve the external qualities. For example, the internal quality of cohesion is necessary for achieving the external quality of maintainability. Similarly, coupling measurements can be used to predict fault density [2]. However, in many cases, the qualities are closely related, and the distinction between internal and external is not sharp.

The quality of a component-based software system depends on the quality of the components it is made of. Indeed, the idea of certifying components is meant to offer a general scheme that indicates the quality of a component with respect to certain attributes. In their survey, Alvaro et al. noted that research in the component certification area can be "divided" into two ages: from 1993 to 2001 and after 2001. From 1993 to 2001 the focus was mainly on developing formal ways to predict component qualities and on building components with fully proved correctness properties. After 2001 the focus shifted to establishing well-defined component quality models and defining the kinds of component qualities that can be certified [5]. With the latter approach, an indicator of the fulfillment of the quality of concern has to be determined, and a method and appropriate metrics have to be used to measure this indicator.

Research in the area of software metrics has been active, and a variety of metrics has been proposed for measuring the same internal quality attribute. Such variety suggests that there may be inconsistencies among the measurements computed using these metrics. In turn, such inter-inconsistencies raise the concern that relying on the measurements of one single metric might not lead to the same design decision the designer would take when considering another metric or a combination of metrics. Similarly, relying on the measurements of one single metric might not lead to the same certification outcome when considering another metric or a combination. This concern motivated our research to study the uncertainty that should be considered when relying on a single metric. In this paper, we report a case study considering class cohesion as the internal quality attribute of concern.

We consider cohesion as a case study in this paper due to its importance for component certification and due to the fact that the cohesion quality of an object-oriented class is among those object-oriented internal software attributes that have caught researchers' attention, as demonstrated in Section 2. Just like those other attributes, quite a number of metrics have been proposed for measuring the cohesion of a class. However, none of the prominent metrics considers the whole set of factors that may affect class cohesion. This raises reservations regarding the validity of these metrics, that is, the extent to which a given metric reflects the real meaning of the concept of cohesion. Moreover, these metrics tend to differ in the factors they consider when measuring class cohesion [1]. These differences bring up inconsistencies when ranking different classes with regard to their cohesion qualities, as different metrics may suggest different rankings. This type of inconsistency may confuse the software designer trying to come up with the best design possible, and it raises the issue that the ranking computed using a given metric may not be perfectly reliable. To the best of our knowledge, there is no framework that is meant to assess the reliability of measurements computed using available object-oriented cohesion metrics. In this paper, we study the validity of prominent class cohesion metrics with the objective of providing a framework for assessing the uncertainty associated with their corresponding measurements. The framework is meant to associate an error probability distribution with the measurements computed using a given metric. Accordingly, we use the error probability distribution to assess the reliability of metric-computed rankings of classes based on their cohesion measurements. Such a probability distribution of the error will help the software designer who relies on a single metric to decide which of several design alternatives is better from a cohesion perspective. Software designers will be able to compute the probability that the ranking suggested by the measurements is a reliable one.

It is worth noting, though, that we present a conceptual framework in this paper, in the sense that it outlines a possible approach to deal with the issue and does not dictate specific settings of the parameters involved; rather, the parameters are typically set to suit the data available. For example, in demonstrating the framework, we assess reliability as the deviation from the mean of a family of available cohesion metrics; we used the mean of the measurements of the family of metrics as the benchmark for calculating the mean and standard deviation of the errors of the different metrics. This interpretation of reliability answers the question of which metric is sufficiently similar to the average of its family that it could be used as a representative of the average when a single metric is to be used. As discussed in the paper, other possibilities could be considered.

The rest of the paper is organized as follows. Section 2 presents a literature review of existing cohesion metrics. We briefly introduce cohesion metrics classification criteria in Section 3. Section 4 then presents an analysis of the validity of present metrics and discusses their inter-inconsistencies. Section 5 presents a framework for assessing the uncertainty associated with the measurements computed by each metric. Finally, we conclude in Section 6.

2. Cohesion metrics

The cohesion quality of a software unit depicts how well the constituents of the software unit are related. This can be determined by knowing the extent to which the individual constituents of the unit are necessary to perform the unit's task [15]. A unit with low cohesion does many unrelated things, or does too much work. Such units are undesirable; they suffer from the following problems [18]:

• Hard to comprehend
• Hard to reuse
• Hard to maintain
• Delicate; constantly affected by changes

Software units may be considered at different levels of a hierarchy and may consist of other software units. They represent elements in the design of software; for example, major subdivisions of the software, components of a subdivision, modules, packages, classes, objects, functions, routines, procedures, databases, or data files. The object-oriented class is the type of unit of interest in this paper.

There has been quite a number of research works trying to measure class cohesion in object-oriented development. In 1991, Chidamber and Kemerer [13] used the notion of degree of similarity of methods to propose a cohesion metric, the Lack of Cohesion Measure (LCOM). Other researchers, including Chidamber and Kemerer themselves, later developed different variations of this metric [14]. Some of the variants of LCOM consider the connected components derived by drawing an undirected graph based on the connections between the methods. Hitz and Montazeri defined Co ("connectivity"), which further discriminates classes having only one connected component [16]. Bieman and Kang proposed TCC (Tight Class Cohesion) and LCC (Loose Class Cohesion) based on the direct and indirect connectivity between pairs of methods [8]. Briand et al. [10] proposed RCI (Ratio of Cohesive Interaction), which is based on the view of a class as a collection of data declarations and methods. In 1999, Bansiya et al. [7] proposed CAMC (Cohesion Among Methods of Classes), which evaluates the cohesion among the methods of a class in the analysis and design phases of the software development process. Chae et al. [12] highlighted the significance of considering special methods (such as access methods and constructors) and the patterns of interaction among the members of a class when defining a cohesion metric. To address these issues, they proposed the CBMC metric (Cohesion Based on Member Connectivity). AlGhamdi et al. [4] proposed two cohesion metrics, CCM (Class Connection Metric) and ECCM (Enhanced Class Connection Metric), in order to assess the extent to which an inheritance hierarchy follows some design principles. Aman et al. [6] proposed two cohesion metrics that not only consider the connections among the components of a class but also consider the sizes of the connected components as well as the strength of method connections. These metrics are OCC (Optimistic Class Cohesion) and PCC (Pessimistic Class Cohesion). Mitchell and Power [19] proposed two run-time cohesion metrics designed to quantify the external quality of an object-oriented application: Run-time Simple LCOM (RLCOM) and Run-time Call-Weighted LCOM (RWLCOM). Both metrics are based on the definition of the LCOM measure except that they only count instance variables that are actually accessed at run-time. While RLCOM simply considers those instance variables that are accessed at run-time, RWLCOM also considers the number of times an instance variable is accessed at run-time. Due to the problems identified in the LCOM metrics, Adam defined a metric, CBAMU (Cohesion Based on Attribute and Method Usage), based on both attribute usage and method usage (invocation) within a class [1]. Table 1 summarizes the definitions of the cohesion metrics we found in the literature.

With this large number of cohesion metrics, software researchers and practitioners need a framework that enables understanding the relationships among the different cohesion metrics and determining which among them are more reliable for capturing cohesion from a given external quality attribute's perspective. We discuss such a framework in Section 5. There is also a need to classify those cohesion metrics, because they take different approaches and use different factors. The following section presents the cohesion metrics classification criteria.

3. Classification criteria

In 1997, Briand et al. came up with a unified framework with which cohesion metrics can be compared and evaluated [9]. They proposed a standardized terminology for expressing cohesion metrics, reviewed the then-available metrics based on that terminology, and finally provided a unified framework. One of the aspects addressed in their framework is the type of connection, which is a factor used in capturing class cohesion. Their framework identified six connection types based on the cohesion metrics existing at the time. More metrics have emerged since then, which necessitates the incorporation of the newly identified connection types. In this section, we present a set of classification criteria that extends Briand et al.'s set to allow classifying, comparing, and evaluating current cohesion metrics. We identify two types of criterion: factor and characteristic.

• Factor: a criterion of this type identifies what may affect the cohesiveness of a module.
• Characteristic: a criterion of this type reflects a feature of the metric.

Table 2 presents our set of criteria, where criteria 3, 8, 10, 11, and 12 represent Briand et al.'s suggested criteria.

A more detailed discussion of these classification criteria is given in Adam [1]. Critical analysis of the cohesion metrics listed in Table 1 led to the following observations:

1. Most of the metrics are based on the work of Chidamber and Kemerer [14], which is based on the notion of degree of similarity of methods initially proposed by Bunge [11].
2. None of the metrics claims to predict one specific external attribute or another.
3. All the metrics capture cohesion at the class level.
4. All the metrics are available at the design phase, except ECCM, which is available at the implementation phase.
5. Most of the metrics are not validated, and few provide explanation on how to interpret their results.
6. Some of the metrics are not normalized.
7. More importantly, none of the metrics accurately captures the cohesion of a class without violating intuition in one example or another, as discussed in Section 4.

In this paper we present our analysis, which is meant to investigate the metrics' validity as well as to provide a framework for assessing the reliability of measurements computed using cohesion metrics. The rationale behind our analysis is that if a set of metrics claims to have the same objective, these metrics are expected to offer consistent measurements, provided that they are of the same level of granularity and available at the same phase. As pointed out above, Adam's analysis has shown that available metrics do not claim to predict one external attribute or another. Accordingly, in theory, software researchers and practitioners can use any metric regardless of the external attribute(s) sought. Moreover, all the metrics offer the same level of granularity and are available at the same phase (except for ECCM). Accordingly, one would expect these metrics to provide consistent measurements. However, our study demonstrates that the metrics provide inconsistent measurements, as discussed in Section 4. This could be due to the different factors (i.e., connection type, special methods, and inheritance) that each metric considers. We discuss the connection type factor further below. We then summarize how the factors are dealt with by each metric in Table 4.

Table 3 presents all the types of connections used by current prominent cohesion metrics. This table is an expanded version of the work presented by Briand et al. in [9]; here we identify three added connection types: IMMR, PPI, and MIBAT. In addition, we provide a name and an abbreviation for each connection type for easy reference. The two elements that are the subject of the connection and a description of the type of connection between the two elements are given. Table 3 also shows the different interaction considerations that make up the relationships considered for measuring cohesion: interactions between methods and attributes (M → A) only, between methods and methods (M → M) only, or both. Some connections consider the interactions between methods that result from attribute sharing; others consider interactions that result from method invocations. This property is reflected in the means-of-interaction entries of Table 3. Furthermore, some connections consider direct relationships while others consider both direct and indirect relationships, as reflected in the interaction-mode entries of Table 3.

The cohesion metrics of a class are captured based on some sort of interaction that exists among the components of the class. Some interactions result in write operations (i.e., they change the values of one or more of the components involved) while other interactions result in only read operations. The access-mode entries of Table 3 describe this for each connection type.

If not carefully studied, DAS and MIBAT may appear to be the same, but they are actually not. DAS is based on direct attribute sharing, which means that one method (say m) accesses an attribute and another method (say m′) accesses the same attribute, irrespective of the kind of access; that is, no consideration is given to the type of access (read or write). MIBAT, on the other hand, is based on access type: a method (say m) writes to an attribute and another method (say m′) reads the same attribute that was modified by m; in this case, the writing precedes the reading.
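To make the distinction concrete, the following sketch contrasts the two connection types. It assumes a class is summarized, per method, by the sets of attributes it reads and writes; this representation and the helper names are ours, for illustration only.

```python
from itertools import combinations

def das_pairs(cls):
    """DAS: unordered pairs of methods that reference a common attribute,
    regardless of whether the access is a read or a write."""
    return {frozenset((m1, m2))
            for (m1, a1), (m2, a2) in combinations(cls.items(), 2)
            if (a1["reads"] | a1["writes"]) & (a2["reads"] | a2["writes"])}

def mibat_pairs(cls):
    """MIBAT: ordered pairs (m, m') where m writes an attribute that m'
    reads, i.e., the write precedes the read."""
    return {(m1, m2)
            for m1, a1 in cls.items() for m2, a2 in cls.items()
            if m1 != m2 and a1["writes"] & a2["reads"]}

# Hypothetical class: setX writes x; getX only reads x.
cls = {"setX": {"reads": set(), "writes": {"x"}},
       "getX": {"reads": {"x"}, "writes": set()}}
print(das_pairs(cls))    # {frozenset({'setX', 'getX'})}
print(mibat_pairs(cls))  # {('setX', 'getX')} but not ('getX', 'setX')
```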

It can clearly be seen that all the considered metrics capture class cohesion based on attribute usage, method invocations, or both. Table 4 shows a matrix relating the class cohesion metrics to the connection types they use. Note that the last six metrics were not included in the empirical study of the inter-inconsistencies with regard to class rankings; these metrics were excluded due to some implementation issues at the time the experiments were conducted. However, we include them in Table 4 for the sake of completeness.

Table 1
Object-oriented metric definitions.

LCOM1: Let P be the set of pairs of methods without shared attributes. Then LCOM1 = |P|.

LCOM2: Let P be the set of pairs of methods without shared attributes, and Q be the set of pairs of methods with shared attributes. Then LCOM2 = |P| − |Q| if |P| > |Q|, and 0 otherwise.

LCOM3: Consider an undirected graph G whose vertices are the methods of a class, with an edge between two vertices if the corresponding methods share at least one attribute. Then LCOM3 = |connected components of G|.

LCOM4: Like LCOM3, where graph G additionally has an edge between the vertices representing methods Mi and Mj if Mi invokes Mj or vice versa. Then LCOM4 = |connected components of G|.

LCOM5: Consider a set of methods {Mi} (i = 1, ..., m) accessing a set of attributes {Aj} (j = 1, ..., a). Let μ(Aj) be the number of methods that reference Aj. Then LCOM5 = ((1/a) Σ_{j=1}^{a} μ(Aj) − m) / (1 − m).

CBAMU: CBAMU(C) = (1/2)(AU(C) + MU(C)), where AU(C) = 0 if a = 0 (or if m = 0) and (1/(am)) Σ_{i=1}^{a} μ(Ai) otherwise, and MU(C) = 0 if m = 0 (or if m = 1) and (1/(m(m − 1))) Σ_{j=1}^{m} μ(Mj) otherwise. Here a is the number of attributes in the class, m is the number of methods in the class, μ(Ai) is the number of methods that access attribute Ai, and μ(Mj) is the number of methods that invoke method Mj.

CCM: CCM(C) = NC(C) / (NMP(C) × NCC(C)). CCM is based on the connection graph GC of the class C. GC has one vertex for each method of the class, with an edge between two vertices if and only if the corresponding methods are connected according to the connection criterion defined by the metric. NC(C) is the number of actual connections within class C, NMP(C) is the maximum possible number of connections, and NCC(C) is the number of connected components of the connection graph GC.

TCC: Let NP be the maximum possible number of direct or indirect connections in a class; NP = N(N − 1)/2 for N methods. Let NDC be the number of pairs of directly connected methods in a class. Then TCC(C) = NDC(C) / NP(C). Two methods that use one or more common attributes are said to be directly connected, whereas two methods that are connected through other directly connected methods are called indirectly connected.

LCC: Let NIC be the number of pairs of indirectly connected methods in the class. Then LCC(C) = (NDC(C) + NIC(C)) / NP(C).

CAMC: CAMC = (Σ_{i=1}^{n} |Pi|) / (|T| × n), where n is the number of methods in the class, Mi is the set of parameter types of method i, T is the overall union of all object types in the parameters of the methods of the class, and Pi is the intersection of set Mi with the union set T.

Co: Co(C) = 2(|EC| − (|VC| − 1)) / ((|VC| − 1)(|VC| − 2)), where EC and VC are the edges and vertices of the graph G from LCOM4.

Table 1 (continued)

ECCM: ECCM(C) = CCM(C) × (1 − PenaltyFactor(C)), where PenaltyFactor(C) = NORM(C) / NOIM(C); NORM(C) is the number of re-implemented methods and NOIM(C) is the number of inherited methods.

OCC: OCC(C) = max_{i=1,...,n} [ |Rw(mi)| / (n − 1) ] for n > 1, and 0 for n = 1, where Rw(mi) is the set of methods reachable by mi on Gw(V, E). Gw(V, E) is a weak-connection graph, where V = M (the set of methods) and E = {{u, v} ∈ M × M | ∃ a ∈ A s.t. ac(u, a) ∧ ac(v, a)}.

PCC: PCC(C) = max_{i=1,...,n} [ |Rs(mi)| / (n − 1) ] for n > 1, and 0 for n = 1, where Rs(mi) is the set of methods reachable by mi on Gs(V, E). Gs(V, E) is a strong-connection graph, where V = M (the set of methods) and E = {{u, v} ∈ M × M | ∃ a ∈ A s.t. wr(u, a) ∧ re(v, a)}.

RCI: RCI(C) = |CI(C)| / |Max(C)|. A class is seen as a collection of data declarations and methods. CI(C) is the set of all data-data interactions and data-method interactions, and Max(C) is the set of all possible data-data interactions and data-method interactions.

CBMC: CBMC(C) = Fc(Gr(C)) × Fs(Gr(C)) = Fc(Gr(C)) × (1/n) Σ_{i=1}^{n} CBMC(Gr_i(C)), where Fc(Gr(C)) = |Mg(Gr)| / |Mn(Gr)| is the connectivity factor and Fs(Gr(C)) = (1/n) Σ_{i=1}^{n} CBMC(Gr_i) is the structure factor. Mg and Mn are the sets of glue methods and normal methods, respectively; glue methods are the minimum number of methods without which the reference graph would be divided into sub-graphs. Gr_i is one of the n children of Gr in the structure tree, and CBMC(Gr_i) denotes the cohesion of the component Gr_i.
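To make the graph-based definitions above concrete, here is a minimal sketch of the LCOM family, assuming a class is represented simply as a dict mapping each method name to the set of attribute names it accesses (plus, for LCOM4, a set of invocation pairs). This representation is our own simplification and ignores special methods and inheritance.

```python
from itertools import combinations

def lcom1(usage):
    """LCOM1: number of method pairs that share no attributes."""
    return sum(1 for (_, a1), (_, a2) in combinations(usage.items(), 2)
               if not a1 & a2)

def lcom2(usage):
    """LCOM2: max(|P| - |Q|, 0), P/Q being the non-sharing/sharing pairs."""
    pairs = list(combinations(usage.values(), 2))
    p = sum(1 for a1, a2 in pairs if not a1 & a2)
    q = len(pairs) - p
    return max(p - q, 0)

def _components(usage, invocations=frozenset()):
    """Connected components of the method graph: an edge joins two methods
    if they share an attribute or (for LCOM4) one invokes the other.
    Implemented with a simple union-find."""
    parent = {m: m for m in usage}
    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m
    for (m1, a1), (m2, a2) in combinations(usage.items(), 2):
        if a1 & a2 or (m1, m2) in invocations or (m2, m1) in invocations:
            parent[find(m1)] = find(m2)
    return len({find(m) for m in usage})

def lcom3(usage):
    return _components(usage)

def lcom4(usage, invocations):
    return _components(usage, invocations)

def lcom5(usage):
    """LCOM5: ((1/a) * sum(mu(Aj)) - m) / (1 - m); undefined when the class
    has no attributes or only one method (the 'finite range' problem
    noted in Table 2)."""
    attrs = set().union(*usage.values())
    a, m = len(attrs), len(usage)
    mu = [sum(1 for used in usage.values() if att in used) for att in attrs]
    return (sum(mu) / a - m) / (1 - m)
```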


4. Cohesion metrics validity

As discussed above, the numerous object-oriented class cohesion metrics available suggest that there may be some sort of inconsistency among their computed measurements. Consequently, this leads to reservations with regard to the validity of these metrics, that is, whether a given metric really measures what we intend to measure. This equally raises doubts as to whether the metric is reliable. For instance, a metric would not be considered reliable if it gives the same measurement for classes that are, intuitively, of different cohesion qualities, or vice versa. Similarly, a metric would not be considered reliable if it gives different measurements for classes that are, intuitively, of the same cohesion quality. Likewise, a metric would not be considered reliable if it ranks a class as better than another where intuition suggests the other way around. With this in mind, we conducted an experiment using three different classes, each with five methods and four instance variables but with different levels of interaction between the components of the class. As depicted in the first row of Table 5, we can clearly see that C1 has two islands of connected components. Intuitively, C1 is expected to be of lower cohesion than C2 and C3, both of which have a single island of connected components. We also expect C2 to be of lower cohesion than C3 because it has fewer interactions among its components. Table 5 summarizes the measurements achieved by the available class cohesion metrics for each class.

As Table 5 shows, LCOM1 could not differentiate between C1 and C2. LCOM3, LCOM4, LCC, and OCC could not differentiate between C2 and C3. These observations raise questions with regard to the validity of these metrics. Note that 'OK' in the comment column of Table 5 signifies that the results of the metric follow our intuition with regard to differentiating between the three classes. However, even those metrics that appear to agree with our intuition in the above examples may have other validity problems. For instance, LCOM2 may return zero for classes that are, intuitively, of different cohesion qualities [1].
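Continuing the sketch after Table 1, two hypothetical classes (ours, in the spirit of C2 and C3 from Table 5, not the exact examples) show how a connected-components metric can fail to separate designs of clearly different interaction density:

```python
# A sparse "chain" class and a maximally connected class: both form a
# single connected component, so LCOM3 cannot tell them apart, while
# pair-counting LCOM1 can.
chain = {"m1": {"a1"}, "m2": {"a1", "a2"}, "m3": {"a2", "a3"},
         "m4": {"a3", "a4"}, "m5": {"a4"}}
dense = {m: {"a1", "a2", "a3", "a4"} for m in ("m1", "m2", "m3", "m4", "m5")}

print(lcom3(chain), lcom3(dense))  # 1 1  -> indistinguishable
print(lcom1(chain), lcom1(dense))  # 6 0  -> the chain is far less cohesive
```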


Table 2
Classification criteria.

1. Objective (characteristic): Determines the external quality attribute the metric can be used to predict.
2. Granularity (characteristic): The types of artifact the metric considers: method, class, or package.
3. Availability (characteristic): Similar to Briand's usable criterion. It determines the software development phase in which the metric can be used. Some metrics can only be used when coding is completed; such metrics are available at the implementation level. Other metrics can be used at the end of the design phase; such metrics are available at the design level.
4. Validity (characteristic): The extent to which an empirical measure reflects the real meaning of the concept of cohesion. It equally determines the degree of reliability of the measurements computed using the metric. The degree of reliability is lower if the metric gives the same value for classes that are, intuitively, of different levels of cohesion, or, vice versa, gives different values for classes that are, intuitively, of the same level of cohesion.
5. Finite range (characteristic): Whether the metric is undefined for some classes. For example, LCOM5 computes to infinity for a class with no attribute or with only one method. Metrics with similar problems include TCC, LCC, and CAMC.
6. Sensitivity (characteristic): Describes how a change in the module (or class) affects the measurement, and whether a change has a proportional negative or positive impact on the result of the metric.
7. Normalization (characteristic): Determines whether the measurements computed using the metric are bounded, e.g., belong to [0, 1].
8. Validation (characteristic): Specifies whether the metric is validated and, if so, whether it was theoretically or empirically validated. If it is not validated, how complex would the validation process be, and have the corresponding researchers suggested approaches for validation.
9. Interpretation (characteristic): Determines whether the researchers have given suggestions on how to interpret the results obtained from the metric; it also determines the difficulties surrounding such interpretation.
10. Connection type (factor): Specifies those relationships, among the different components of a class, which the cohesion metric considers in calculating the cohesion of the module. This factor is an extension of Briand's cohesion criterion, where three more connection types were identified. Based on our research, we outline in Table 3 all the possible relationships that may exist among the components of a class. However, we have not exhausted all possible types; the types outlined in Table 3 are based on current prominent cohesion metrics.
11. Special methods (factor): Methods like constructors and access methods have an impact on the cohesion of a class. This factor captures whether the impact of such methods is considered in the definition of the cohesion measure. It is similar to Briand's "known problems" factor.
12. Inheritance (factor): Describes whether the metric considers the impact of inheritance on class cohesion.

We recognize that the objective of any given metric, that is, the external attribute(s) the metric tries to predict, may require focusing on some factors of cohesion but not all. For example, a cohesion metric that is meant to predict the maintainability of a system may focus on different factors than those considered by a metric meant to predict fault density. However, as mentioned earlier, available metrics do not claim to predict one external attribute or another. Consequently, measurements computed using these metrics are expected to be consistent. With this in mind, the observation that some metrics could not differentiate between C1, C2, and C3 suggests that there might exist some sort of inconsistency among the measurements computed using the different metrics, in the sense that two metrics would rank two different classes differently with respect to their cohesion quality; for example, given two classes C1 and C2, metric X may favor C1 over C2, while metric Y may suggest that C2 is more cohesive than C1. Clearly this would confuse designers as to which class design is more cohesive. Accordingly, we conducted experiments to study such inconsistencies when considering more than one metric to measure the cohesion of a set of classes. We conducted the experiments using seven open source projects downloaded from SourceForge (http://sourceforge.net/). Table 6 shows some details regarding these projects.

Table 7 shows the average class cohesion for each of the projects using the different metrics discussed above. It is worth noting here that we conducted the experiments using only the first 10 metrics in Table 5; the remaining six metrics were not included in our empirical investigation because the tool we used for conducting the experiments, OOMeter [3], supports only the first 10 metrics.

It is also worth noting that LCOM measurements are not bounded. However, in order to be able to assess the uncertainty as discussed in Section 5, we needed to use them in a normalized form. To the best of our knowledge, there is no prominent work on normalizing LCOM measurements. The analysis and validation of possible approaches for normalizing the measurements are beyond the scope of this paper. For the sake of conducting experiments within the context of our framework, we normalized the LCOM measurements using the 2014 classes of the projects listed in Table 6. We normalized the measurements as follows:

Normalized measurement = 1 − (actual measurement / highest measurement),

where the highest measurement is taken after excluding the 10% of classes with the highest measurements as outliers.

Ideally, we would normalize the measurements by the maximum possible measurement. However, since the measurements are not bounded, we use the highest measurement across the 2014 classes considered. The rationale for subtracting from 1 is that the LCOM metrics provide inverse measurements; they reflect the "lack of cohesion" rather than the "existence of cohesion". The formula above inverts the measurements to directly reflect cohesion. More investigation into the matter will be considered in future work.
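A sketch of this normalization step, under the stated choices (drop the top 10% of LCOM values as outliers, then divide by the highest remaining value and invert); the function name, the use of the 90th percentile as the cutoff, and the clipping of values beyond it are our reading of the procedure.

```python
import numpy as np

def normalize_lcom(values, outlier_fraction=0.10):
    """Map unbounded 'lack of cohesion' scores onto [0, 1] cohesion
    scores: 1 - value / highest, where 'highest' is the largest value
    remaining after discarding the top 10% of classes as outliers."""
    v = np.asarray(values, dtype=float)
    highest = np.quantile(v, 1.0 - outlier_fraction)
    # Values above the cutoff would go negative; clip them to 0.
    return np.clip(1.0 - v / highest, 0.0, 1.0)

print(normalize_lcom([0, 2, 5, 8, 40]))  # the outlier 40 maps to 0.0
```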

The average of each metric’s measurements across all classes ineach project is shown in Table 7. As mentioned above, we used nor-malized LCOM’s measurements. Fig. 1 shows a visual view of therelative ranking of the projects using their average class cohesion.Fig. 1 also groups the metrics into clusters where metrics with clo-sely correlated rankings are assigned to the same cluster. Forexample, by inspecting the figure, we can see that LCOM1, LCOM2,LCOM3, and LCOM4 have similar rankings and can be placed in thesame cluster. However, LCOM1 and LCOM2 have even much closermeasurements than the other two and were also put in a cluster bythemselves. Similarly, CCM, TCC, and LCC have similar rankingsand were put in the same cluster.

Table 3
Connection types.

• Method-attribute referencing (MAR): method m of class c and attribute a of class c; m references a. (M → A; direct; read or write access.)
• Direct method-method referencing (DMMR): method m of class c and method m′ of class c; m invokes m′ directly. (M → M; method invocation; direct.)
• Indirect method-method referencing (IMMR): method m of class c and method m′ of class c; m relates to m′ indirectly via other methods that directly invoke each other. (M → M; method invocation; indirect.)
• Direct attribute sharing (DAS): methods m and m′ of class c, m ≠ m′; m and m′ directly reference an attribute a of class c in common. (M → M; attribute sharing; direct.)
• Indirect attribute sharing (IAS): methods m and m′ of class c, m ≠ m′; m and m′ indirectly reference an attribute a of class c in common. (M → M; attribute sharing; indirect.)
• Data-data interaction (DDI): two data declarations in class c; the declarations interact. (Direct.)
• Data-method interaction (DMI): method m of class c and a data declaration in class c; the method and the declaration interact. (Direct.)
• Type intersection (PPI): type 1 and type 2; the number of object types in the methods with respect to the number of types in the class.
• Methods interactions based on access types (MIBAT): methods m and m′ of class c, m ≠ m′; m writes to an attribute a of class c and m′ then reads a. (M → M; attribute sharing with access types considered; direct.)

These observations from Fig. 1 inspired us to study the relationships among the cohesion metrics themselves further. To see the possible clusters mathematically, we computed the correlations among the different rankings and generated a correlation matrix for the metrics. It is worth noting here that the charts presented in Fig. 1 are based on the average cohesion measurement of each of the seven projects. We computed the correlation of the cohesion metrics based on measurements of the classes of each project (that is, based on ranking the classes of each project) and computed the average of these correlations. The correlation table is shown in Table 8.
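The rank correlations described above can be reproduced with a sketch like the following, which assumes one pandas DataFrame per project (one row per class, one column per metric). We use Spearman rank correlation here since the comparison concerns rankings; the paper does not state which correlation coefficient was used, so treat that choice as an assumption.

```python
import pandas as pd

def average_rank_correlation(projects):
    """Mean, across projects, of the per-project rank-correlation
    matrices of the class-level cohesion measurements."""
    return sum(df.corr(method="spearman") for df in projects) / len(projects)

def correlated_pairs(corr, threshold=0.6):
    """Metric pairs deemed 'significantly correlated' under the paper's
    (admittedly arbitrary) 0.6 cutoff."""
    cols = list(corr.columns)
    return [(a, b) for i, a in enumerate(cols) for b in cols[i + 1:]
            if corr.loc[a, b] >= threshold]
```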

From the correlations shown in Table 8, and arbitrarily assuming two metrics to be significantly correlated if their correlation coefficient is equal to or greater than 0.6 (i.e., their rankings of the projects agree more than 60% of the time), we observe the following.

1. LCOM1, LCOM2, and LCOM3 are significantly correlated.
2. LCOM4 is significantly correlated only with LCOM3.
3. CCM and TCC are significantly correlated.
4. TCC and LCC are significantly correlated.
5. CBAMU is not significantly correlated with any of the other metrics' rankings; the same applies to LCOM5 and CAMC.

Therefore, we can conclude from the above observations that the measurements of some of the metrics are consistent to some extent, and accordingly form clusters:

1. LCOM1, LCOM2, and LCOM3 belong to a cluster.
2. LCOM3 also shares a cluster with LCOM4.
3. CCM, TCC, and LCC belong to a cluster.

The overlap between the cohesion metrics with regard to the connection types they consider is depicted in Fig. 2; the sets in the figure represent the metrics.

The connection types that a metric considers certainly contribute to the value generated. Introducing the DMMR connection type into LCOM4 reduces the correlation between LCOM3 and LCOM4 to 0.65. The calculation method is also a contributing factor to the cohesion value of the metric. This explains the difference among the values generated by LCOM1, LCOM2, and LCOM3 and, to a greater extent, the difference between the values generated by LCOM4 and CCM.

Moreover, we can conclude from the correlations between TCC, CCM, and LCC that the DAS connection type plays a significantly greater role than the IAS and DMMR connection types in the generation of the cohesion value, and that DMMR contributes more than IAS. However, there is still some degree of uncertainty, due to incomplete knowledge, that should be investigated.

5. Uncertainty assessment framework

In this section we present a framework for assessing the uncertainty associated with the measurement we get when using a given metric to evaluate the cohesion of a class. The uncertainty arises from the inability of the metric designer to comprehensively consider all factors that would indeed contribute to the level of cohesion of a given class; factors are neglected due to the lack of a complete theory of the concept of cohesion, the impracticality of listing all the factors that affect class cohesion, and so on. In other words, the framework we present is meant to facilitate portraying the probability distribution of the error associated with measurements computed using a given metric. This, in turn, allows associating a degree of reliability with a class ranking suggested by a given metric, that is, a level of how dependable such a ranking is. It is worth noting, though, that we present here a conceptual framework in the sense that it outlines a possible approach to tackle the issue but does not dictate specific settings of the parameters involved; rather, the parameters are typically set to suit the data available. For example, in demonstrating the framework, we assess reliability as the deviation from the mean of a family of prominent cohesion metrics; we used the mean of the measurements of the family of metrics as the benchmark for calculating the mean and standard deviation of the errors of the different metrics. In other words, in this demonstration, reliability is considered to reflect how similar a metric is to the average of its family, so that it could be used as a representative of the average when a single metric is to be used. Considering a family of metrics, we refer to the reliability of a ranking of two classes with respect to their cohesion qualities, computed using a given metric from the family, as the consistency of the metric's ranking with the ranking computed using the mean measurements of all metrics in the family. Accordingly, the degree of reliability associated with the ranking is the probability that the ranking computed using the metric is consistent with the ranking computed using the mean of the family.

Table 4
Cohesion metrics vs. factors.

Metric | Connection types | Special methods | Inheritance
LCOM1 | DAS | No | No
LCOM2 | DAS | No | No
LCOM3 | DAS | No | No
LCOM4 | DAS, DMMR | Access method | No
LCOM5 | MAR | No | No
CBAMU | MAR, DMMR | No | No
CCM | DAS, DMMR | No | No
TCC | DAS | Excludes constructors | Yes
LCC | DAS, IAS | Excludes constructors | Yes
CAMC | PPI | No | No
Co | DAS, DMMR | Access method | No
ECCM | DAS, DMMR | No | Yes
OCC | DAS, IAS | Access methods | No
PCC | DAS, IAS, MIBAT | Access methods | No
RCI | MAR, DDI, DMI | No | Yes
CBMC | MAR, DMMR, IMMR | Glue methods | No

Table 5
Validity test examples. (C1, C2, and C3 each have five methods, m1 to m5, and four attributes, A1 to A4, with different interaction patterns among their components; C1 forms two islands of connected components, while C2 and C3 each form one.)

# | Metric | C1 | C2 | C3 | Comment
1 | LCOM1 | 6.00 | 6.00 | 0.00 | Does not differentiate between C1 and C2
2 | LCOM2 | 3.00 | 2.00 | 0.00 | OK
3 | LCOM3 | 2.00 | 1.00 | 1.00 | Does not differentiate between C2 and C3
4 | LCOM4 | 2.00 | 1.00 | 1.00 | Does not differentiate between C2 and C3
5 | LCOM5 | 0.81 | 0.75 | 0.44 | OK
6 | CBAMU | 0.175 | 0.2 | 0.325 | OK
7 | CCM | 0.15 | 0.40 | 1.00 | OK
8 | TCC | 0.30 | 0.40 | 1.00 | OK
9 | LCC | 0.60 | 1.00 | 1.00 | Does not differentiate between C2 and C3
10 | CAMC | N/A | N/A | N/A | Uses parameter types
11 | Co | N/A | 0.00 | 1.00 | Applicable only when the connected component is one
12 | ECCM | N/A | N/A | N/A | Cannot be computed for these examples
13 | OCC | 0.75 | 1.00 | 1.00 | Does not differentiate between C2 and C3
14 | PCC | N/A | N/A | N/A | Cannot be computed for these examples
15 | RCI | 0.23 | 0.27 | 0.43 | OK
16 | CBMC | 0.00 | 0.13 | 0.60 | OK

Table 6
Selected projects.

Project | Number of classes | Number of methods | Number of attributes
Babeldoc 1.0 | 212 | 1541 | 936
Checkstyle 2.4 | 58 | 492 | 228
JGraph 2.0 | 29 | 750 | 340
VR Juggler 1.1DR3 | 278 | 2502 | 1338
Saxon 6.5.2 | 344 | 3252 | 1678
Jext 3.2 | 553 | 3233 | 2435
Saxon 8.0 | 540 | 4881 | 3298


With this in mind, we computed the mean of the error and its standard deviation for each metric compared to the mean measurements of all metrics. It is worth noting here that the standard deviation is not expected to be 0, that is, the error is not certain; this is because the error is highly dependent on the probability that the factors neglected by the metric indeed contribute to the cohesion of the class under evaluation. For the sake of demonstrating the framework, we assume that the error probability distribution follows a Normal Distribution, also known as a Gaussian Distribution, as it is the most commonly used distribution in data analysis [17]. The rationale behind this popularity is that normality is the central assumption of the mathematical theory of errors: any deviation from normality needs to be explained, and, in that sense, normality is the only observation that need not be explained, being expected.
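A sketch of how the error statistics in Table 9 can be derived, assuming a single DataFrame of normalized measurements for the 2014 classes (rows) by the ten metrics (columns), with the unweighted per-class mean as the benchmark; the sign convention (metric minus benchmark) is our assumption.

```python
import pandas as pd

def error_distributions(measurements: pd.DataFrame) -> pd.DataFrame:
    """Per-metric mean and standard deviation of the deviation from the
    family benchmark (the per-class mean over all metrics)."""
    benchmark = measurements.mean(axis=1)          # one value per class
    errors = measurements.sub(benchmark, axis=0)   # deviation per metric
    return pd.DataFrame({"mean_error": errors.mean(),
                         "std_dev": errors.std()})
```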

Table 7
Average class cohesion of the selected projects.

Project | LCOM1 | LCOM2 | LCOM3 | LCOM4 | LCOM5 | CBAMU | CCM | TCC | LCC | CAMC
Jext 3.2 | 0.990 | 0.990 | 0.930 | 0.889 | 0.740 | 0.100 | 0.083 | 0.083 | 0.098 | 0.462
CheckStyle 2.4 | 0.969 | 0.969 | 0.918 | 0.820 | 0.649 | 0.047 | 0.154 | 0.093 | 0.103 | 0.369
Jgraph 2.0 | 0.916 | 0.918 | 0.805 | 0.809 | 0.572 | 0.073 | 0.069 | 0.059 | 0.095 | 0.318
Babeldoc 1.0 | 0.949 | 0.951 | 0.813 | 0.778 | 0.633 | 0.130 | 0.129 | 0.102 | 0.118 | 0.514
Saxon 6.5.2 | 0.985 | 0.985 | 0.920 | 0.865 | 0.700 | 0.079 | 0.091 | 0.081 | 0.078 | 0.460
VR Juggler 1.1 | 0.982 | 0.981 | 0.882 | 0.853 | 0.709 | 0.065 | 0.078 | 0.083 | 0.083 | 0.859
Saxon 8.0 | 0.981 | 0.982 | 0.928 | 0.832 | 0.648 | 0.165 | 0.172 | 0.155 | 0.178 | 0.358

Fig. 1. Projects average cohesion measurements.

Table 8
Cohesion metrics correlation using the average correlation of the seven projects.

However, as we will discuss later in this section, if the original data are not normally distributed with regard to the factors considered for cohesion, then the residuals (i.e., errors) will also not be normally distributed. For now, for the sake of simplicity, we assume the normal distribution. Accordingly, the error associated with a given metric M is a normal random variable with a probability density function (PDF) E_M = N(μ, σ), that is, a normal distribution with mean μ and standard deviation σ. This means that if a measurement m is computed using M for class C, then the actual measurement computed by M for class C, that is, M(C), should also be considered a random variable with a PDF of M(C) = m + E_M = m + N(μ, σ), that is, M(C) = N(m + μ, σ).

We conducted experiments using the 2014 classes of the projects shown in Table 6. Table 9 shows the mean and standard deviation for the 10 metrics we considered; the table also shows the PDF corresponding to a measurement m of each metric. For instance, the PDF of an LCOM1 measurement m for a class C is given by LCOM1(C) = m + E_LCOM1, that is, a normal distribution with mean μ = m + 0.047291 and standard deviation σ = 0.69208, that is, N(m + 0.047291, 0.69208).

Let us now consider two measurements m1 and m2 computed using a metric M for two classes C1 and C2, respectively, and assume that m1 < m2, which means that M ranks C2 as of higher cohesion than C1. Due to the inherent error in the metric design, there is a probability that this ranking is not credible. The confidence that this ranking is trustworthy can be calculated as the probability P{M(C1) < M(C2)} = P{M(C1) − M(C2) < 0}. If M(C1) and M(C2) follow the normal distributions N(μ1, σ1) and N(μ2, σ2), respectively, then M(C1) − M(C2) follows a normal distribution with μ = μ1 − μ2 and σ² = σ1² + σ2² [22]. Accordingly, if the measurements m1 and m2 are computed as the cohesion levels of two classes C1 and C2 using a metric M with E_M = N(μ, σ), then the degree of reliability, i.e., the probability that the ranking provided by these measurements is trustworthy, is given by R_M(C1, C2) = P{x < 0}, where x is a normal random variable with PDF N(m1 − m2, 2^(1/2)·σ).

As an example, consider the error portrait for TCC in Table 9, in this case E_TCC = N(0.273847, 0.145389), and the TCC measurements for C1 and C2 in Table 5, that is, 0.30 and 0.40, respectively. Accordingly, R_TCC(C1, C2) = P{x < 0} = 0.686642 (computed with an online normal distribution calculator: http://www.math.csusb.edu/faculty/stanton/m262/normal_distribution/normal_distribution.html), where x is a random variable with normal PDF N(−0.1, 0.205612); one could interpret this as a 68.66% confidence that C1 is indeed less cohesive than C2. As another example, using Table 7, R_LCOM1(Checkstyle, Babeldoc) = P{x < 0} = 0.65702, where x is a random variable with normal PDF N(−0.096, 0.247923); i.e., a 65.70% confidence that, on average, Checkstyle is less cohesive than Babeldoc. Similarly, R_LCOM1(Saxon 6.5.2, Babeldoc) = P{x < 0} = 0.575179, where x is a random variable with normal PDF N(−0.047, 0.247923); i.e., only a 57.52% confidence that, on average, Saxon 6.5.2 is less cohesive than Babeldoc.
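Operationally, the degree of reliability reduces to a single evaluation of the normal CDF. The sketch below reproduces the TCC example above from the Table 9 figures:

```python
from math import sqrt
from scipy.stats import norm

def ranking_reliability(m1, m2, sigma):
    """P{M(C1) < M(C2)} when both measurements carry the same error
    spread sigma: x ~ N(m1 - m2, sqrt(2) * sigma); reliability = P{x < 0}."""
    return norm.cdf(0.0, loc=m1 - m2, scale=sqrt(2) * sigma)

# TCC example: m1 = 0.30, m2 = 0.40, sigma = 0.145389 (Table 9).
print(ranking_reliability(0.30, 0.40, 0.145389))  # ~0.6866
```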


Fig. 2. Cohesion metrics meeting points. [Venn-style diagram relating the connection types (MAR, DMMR, IMMR, DAS, IAS, DDI, DMI, PPI, MIBAT) to the metrics that use them; the labeled groupings are: LCOM1, LCOM2, LCOM3 and TCC; LCOM4, CCM, Co, ECCM and OCC; LCOM5; CAMC; RCI; CBAMU and CBMC; PCC; LCC.]

Table 9
Measurements error portrait.

Metric | Mean error | Standard deviation | Measurement PDF
LCOM1 | 0.419942 | 0.175308 | N(m + 0.419942, 0.175308)
LCOM2 | 0.431665 | 0.171748 | N(m + 0.431665, 0.171748)
LCOM3 | 0.221074 | 0.126018 | N(m + 0.221074, 0.126018)
LCOM4 | 0.199889 | 0.130268 | N(m + 0.199889, 0.130268)
LCOM5 | 0.291717 | 0.204241 | N(m + 0.291717, 0.204241)
CBAMU | 0.329485 | 0.164205 | N(m + 0.329485, 0.164205)
CCM | 0.280663 | 0.138033 | N(m + 0.280663, 0.138033)
TCC | 0.273847 | 0.145389 | N(m + 0.273847, 0.145389)
LCC | 0.273455 | 0.148917 | N(m + 0.273455, 0.148917)
CAMC | 0.234144 | 0.146756 | N(m + 0.234144, 0.146756)


It is worth noting here that the higher the standard deviation of the error, the lower the confidence in the metric. Table 9 shows the mean error (i.e., the deviation from the benchmark, which is the all-measurements average) to be relatively high for all metrics; it also shows that the standard deviation is high. This is because we considered all metrics equally when calculating the all-measurements average. Measurements computed using the LCOM metrics (except for LCOM5 and, to some extent, LCOM4) have been shown to differ significantly from measurements computed using the other metrics, which skewed the calculation of the all-measurements average somewhat. Using a weighted average of such measurements would be more appropriate, in the sense that measurements of metrics that consider more factors may be given higher weights. Additionally, different factors may be given different weights based on their importance. Accordingly, measurements may be weighted in proportion to the importance of the factors considered by their corresponding metrics. These possibilities are to be investigated in future work, as sketched below.
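A minimal sketch of the weighted-benchmark idea follows. The weighting scheme and all numbers are hypothetical placeholders; the paper leaves the choice of weights (e.g., by number of factors considered) to future work.

```python
# Hypothetical weighted benchmark: average per-metric measurements for
# one class, weighting each metric by an importance score.
def weighted_benchmark(measurements, weights):
    """measurements, weights: dicts keyed by metric name."""
    total_weight = sum(weights[name] for name in measurements)
    return sum(measurements[name] * weights[name]
               for name in measurements) / total_weight

# Example: give TCC and LCC twice the weight of LCOM5 (invented numbers).
print(weighted_benchmark({"TCC": 0.30, "LCC": 0.35, "LCOM5": 0.60},
                         {"TCC": 2.0, "LCC": 2.0, "LCOM5": 1.0}))  # 0.38
```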

In summary, while the analysis presented in this section and the results reported in Table 9 need refinement, they should serve as an eye-opener for researchers and practitioners who evaluate designs and certify components using single metrics. Researchers and practitioners would be better off using the results in Table 9 than simply assuming that the measurements of the metrics they apply are noise-free and error-free. Nevertheless, there is an urgent need for more rigorous analysis to refine Table 9, as discussed below.

6. Conclusion and future work

Despite several years of active research in developing and applying object-oriented cohesion metrics, studying the differences and inter-inconsistencies among such metrics remains an open problem. In order to orient future research in this area, we have presented classification criteria that can serve as a basis for comparing object-oriented class cohesion metrics. In light of these criteria, we analyzed the measurements computed using prominent object-oriented cohesion metrics. We have shown empirically that there are inconsistencies among these metrics. Accordingly, we have presented a framework for assessing the reliability of cohesion metrics and the uncertainty associated with their measurements. The authors are engaged in studying the uncertainty surrounding the use of quality metrics within quality models, e.g., using coupling, cohesion, size, etc., as predictors of maintainability, effort, etc. Handling measurements computed using quality metrics as random variables rather than as certain values has been shown to be more effective [20,21].

In demonstrating the framework, we assessed reliability as the deviation from the mean of a family of prominent cohesion metrics; we used the mean of the measurements of this family of metrics as the benchmark for calculating the mean and standard deviation of the errors of the different metrics. Clearly, other possibilities could be considered. In our future work, we will conduct experiments using a weighted average of the measurements when developing the benchmark. Future work will also investigate the effectiveness of assigning weights to the factors and reflecting such weights in computing the weighted average of the measurements. Experts' judgment with regard to the cohesion of a given set of classes will also be investigated for incorporation in portraying the error. This would compensate for the fact that none of the metrics captures all factors contributing to cohesion, and it may be the case that even the whole set of metrics considered does not yet cover what experts would consider to be cohesion.

Also in demonstrating the framework, we assumed that the measurement error follows a normal distribution, without conducting a formal test for normality. In future work, we will carry out a more rigorous analysis of the probability distribution function of the error. Analysis using the Shapiro–Wilk test [23], among others, will be conducted to confirm normality or report otherwise; a sketch of such a check follows.
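For illustration only (not part of the study), such a normality check could be run with the Shapiro–Wilk implementation in scipy; the error sample below is invented.

```python
# Hypothetical normality check on a sample of measurement errors using
# the Shapiro-Wilk test [23]; the data here is illustrative, not from
# the study.
from scipy.stats import shapiro

error_sample = [0.12, 0.31, 0.27, 0.19, 0.40, 0.22, 0.35, 0.29]
statistic, p_value = shapiro(error_sample)
print(f"W = {statistic:.3f}, p = {p_value:.3f}")
# A small p-value (e.g., < 0.05) would argue against the normality
# assumption used in the error portrayal.
```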

Our future work will also consider assessing the reliability of rankings by directly comparing the rankings produced by each metric to a benchmark of rankings on the same data, rather than assessing the reliability of rankings through the uncertainty associated with the measurements, as demonstrated in this paper. The advantage in this case is that there would be no need to normalize the LCOMs; accordingly, this is expected to offer better accuracy with regard to the reliability of rankings. However, assessing reliability using the rankings alone will clearly not give any insight into the nature of the error associated with the measurements of each metric.
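As an illustration of such a ranking-based comparison (the paper does not prescribe a particular statistic; Kendall's tau is one conventional choice, and the data below is invented):

```python
# Hypothetical rank-agreement check between one metric's measurements
# and a benchmark on the same classes, using Kendall's tau.
from scipy.stats import kendalltau

metric_scores    = [0.30, 0.40, 0.55, 0.10]  # e.g., TCC per class (invented)
benchmark_scores = [0.28, 0.45, 0.50, 0.15]  # benchmark cohesion (invented)
tau, p_value = kendalltau(metric_scores, benchmark_scores)
print(f"rank agreement: tau = {tau:.3f} (p = {p_value:.3f})")
```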

Last but not least, the analysis and framework presented in this paper could serve as an eye-opener for researchers to conduct similar investigations considering other internal attributes of interest for component certification and component quality assessment in general. This is necessary for those internal attributes that can be measured using a range of metrics.

Acknowledgements

The authors wish to acknowledge King Fahd University of Petroleum and Minerals (KFUPM) for providing the various facilities utilized in carrying out this research.

References

[1] A. Adam, Implementation and Validation of OO Design-level Cohesion Metrics, M.S. Thesis, Information and Computer Science Department, King Fahd University of Petroleum and Minerals, 2004.

[2] A. Adam, J. AlGhamdi, M. Ahmed, Can cohesion predict fault density? in: The 4th ACS/IEEE International Conference on Computer Systems and Applications, Dubai/Sharjah, UAE, March 8–11, 2006.

[3] J. Alghamdi, R. Rufai, S. Khan, OOMeter: a software quality assurance tool, in: Proceedings of the Ninth European Conference on Software Maintenance and Reengineering (CSMR 2005), 21–23 March 2005, Manchester, UK, IEEE Computer Society, 2005, pp. 190–191, ISBN 0-7695-2304.

[4] J. AlGhamdi, M. Wasiq, M. Ahmed, Principle and Metrics for Cohesion-Based Object-Oriented Component Assessment, Technical Report, College of Computer Science and Engineering, King Fahd University of Petroleum and Minerals, Saudi Arabia, 2001.

[5] A. Alvaro, E.S. Almeida, S.R.L. Meira, A software component certification: a survey, in: 31st IEEE EUROMICRO Conference on Software Engineering and Advanced Applications (SEAA), CBSE Track, 2005.

[6] H. Aman, K. Yamasaki, H. Yamada, M. Noda, A proposal of class cohesion metrics using sizes of cohesive parts, in: T. Welzer et al. (Eds.), Knowledge-Based Software Engineering, IOS Press, 2002, pp. 102–107.

[7] J. Bansiya, L. Etzkorn, C. Davis, W. Li, A class cohesion metric for object-oriented designs, Journal of Object-Oriented Programming (January) (1999) 47–52.

[8] J.M. Bieman, B. Kang, Cohesion and reuse in an object-oriented system, in: Proceedings of the ACM Symposium on Software Reusability (SSR'94), 1995, pp. 259–262.

[9] L. Briand, J. Daly, J. Wust, A Unified Framework for Cohesion Measurement in Object-Oriented Systems, Technical Report ISERN-97-05, Fraunhofer Institute for Experimental Software Engineering, Kaiserslautern, Germany, 1997.

[10] L. Briand, S. Morasca, V. Basili, Defining and Validating High-Level Design Metrics, Computer Science Technical Report CS-TR 3301, University of Maryland at College Park, 1994.

[11] M. Bunge, Treatise on Basic Philosophy: Ontology I: The Furniture of the World, Riedel, Boston, 1977.


[12] H.S. Chae, Y.R. Kwon, D. Bae, A cohesion measure for object-oriented classes, Software – Practice and Experience 30 (12) (2000) 1405–1431.

[13] S.R. Chidamber, C.F. Kemerer, Towards a metrics suite for object oriented design, in: A. Paepcke (Ed.), Proceedings of the Conference on Object-Oriented Programming: Systems, Languages and Applications (OOPSLA'91), SIGPLAN Notices, vol. 26(11), 1991, pp. 197–211.

[14] S. Chidamber, C. Kemerer, A metrics suite for object oriented design, IEEE Transactions on Software Engineering 20 (6) (1994).

[15] N.E. Fenton, S.L. Pfleeger, Software Metrics – A Rigorous and Practical Approach, PWS Publishing Company, Boston, 1997.

[16] M. Hitz, B. Montazeri, Chidamber and Kemerer's metrics suite: a measurement theory perspective, IEEE Transactions on Software Engineering 22 (4) (1996).

[17] R. Jain, The Art of Computer Systems Performance Analysis, John Wiley and Sons, Inc., 1991.

[18] C. Larman, Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process, Prentice-Hall, Inc., US, 2002.

[19] A. Mitchell, J.F. Power, Run-time cohesion metrics: an empirical investigation, in: International Conference on Software Engineering Research and Practice (SERP'04), Las Vegas, Nevada, June 21–24, 2004, pp. 532–537.

[20] S.Z. Muzaffar, Adaptive Fuzzy Logic Based Framework for Handling Imprecision and Uncertainty in Software Development Effort Prediction Models, M.S. Thesis, Information and Computer Science Department, King Fahd University of Petroleum and Minerals, 2006.

[21] Q.A. Rahman, Handling Imprecision and Uncertainty in Software Quality Models, M.S. Thesis, Information and Computer Science Department, King Fahd University of Petroleum and Minerals, 2005.

[22] S. Ross, A First Course in Probability, fourth ed., Macmillan College Publishing Company, Inc., 1994.

[23] S.S. Shapiro, M.B. Wilk, An analysis of variance test for normality (complete samples), Biometrika 52 (3/4) (1965) 591–611. JSTOR: 2333709.

[24] I. Sommerville, Software Engineering, eighth ed., Addison Wesley, 2006.

Moataz A. Ahmed received his Ph.D. in computer science from George Mason University in 1997. He is currently the Chief Technology Officer of LEROS Technologies Corporation, Fairfax, Virginia, and an Adjunct/Guest Professor at a number of universities in the US and overseas. During his career, he has worked as a software architect in several software houses. His research interests include softcomputing-based software engineering, especially software testing, software reuse, and cost estimation, as well as software metrics and quality models. He has supervised a number of theses and published a number of technical papers in refereed journals and conferences in these areas.

Abubakar Adam is a lecturer in the computer science department of Yanbu University College, where he has been working for about four years. In the course of his teaching, he has received a number of certificates of appreciation from the college. Previously, he was a research assistant at King Fahd University of Petroleum and Minerals. His research interests are in software engineering and database systems.

Dr. Jarallah S. AlGhamdi received his B.S. degree in computer science and engineering in 1982 from the University of Petroleum and Minerals, Dhahran, Saudi Arabia, and the M.S. degree in 1986 and the Ph.D. degree in 1994, both in computer science, from Arizona State University. He chaired the Information and Computer Science Department of King Fahd University of Petroleum and Minerals (KFUPM) from 1996 to 2000 and was the Dean of the College of Computer Sciences and Engineering at KFUPM from February 2000 to January 2007. His research and consultancy work is in software metrics, software architecture, eLearning, eGovernment, and IT strategic planning. He is now the CIO of the Ministry of Education, Saudi Arabia.
