The Confounding Effect of Class Size on the Validity
of Object-oriented Metrics
Khaled El Emam, Saida Benlarbi, and
Nishith Goel
September 1999
Copyright 1999 by
National Research Council of Canada
Permission is granted to quote short excerpts and to reproduce figures and tables from this report,
provided that the source of such material is fully acknowledged.
The Confounding Effect of Class Size on
The Validity of Object-Oriented Metrics
Khaled El Emam
National Research Council, Canada
Institute for Information Technology
Building M-50, Montreal Road
Ottawa, Ontario
Canada K1A 0R6
khaled.el-emam@iit.nrc.ca

Saida Benlarbi
Nishith Goel
Cistel Technology
210 Colonnade Road, Suite 204
Nepean, Ontario
Canada K2E 7L5
{benlarbi, ngoel}@
Abstract
Much effort has been devoted to the development and empirical validation of object-oriented metrics.
The empirical validations performed thus far would suggest that a core set of validated metrics is close
to being identified. However, none of these studies control for the potentially confounding effect of class
size. In this paper we demonstrate a strong size confounding effect, and question the results of previous
object-oriented metrics validation studies. We first investigated whether there is a confounding effect of
class size in validation studies of object-oriented metrics and show that based on previous work there is
reason to believe that such an effect exists. We then describe a detailed empirical methodology for
identifying those effects. Finally, we perform a study on a large C++ telecommunications framework to
examine if size is really a confounder. This study considered the Chidamber and Kemerer metrics, and
a subset of the Lorenz and Kidd metrics. The dependent variable was the incidence of a fault
attributable to a field failure (fault-proneness of a class). Our findings indicate that before controlling for
size, the results are very similar to previous studies: the metrics that are expected to be validated are
indeed associated with fault-proneness. After controlling for size, none of the metrics we studied were
associated with fault-proneness anymore. This demonstrates a strong size confounding effect, and
casts doubt on the results of previous object-oriented metrics validation studies. It is recommended that
previous validation studies be re-examined to determine whether their conclusions would still hold after
controlling for size, and that future validation studies should always control for size.
1 Introduction
The validation of software product metrics has received much research attention by the software
engineering community. There are two types of validation that are recognized [48]: internal and external.
Internal validation is a theoretical exercise that ensures that the metric is a proper numerical
characterization of the property it claims to measure. External validation involves empirically
demonstrating that the product metric is associated with some important external metric (such as
measures of maintainability or reliability). These are also commonly referred to as theoretical and
empirical validation respectively [73], and procedures for achieving both are described in [15]. Our focus in this paper is empirical validation.
Product metrics are of little value by themselves unless there is empirical evidence that they are
associated with important external attributes [65]. The demonstration of such a relationship can serve
two important purposes: early prediction/identification of high risk software components, and the
construction of preventative design and programming guidelines.1
1 Some authors distinguish between the terms ‘metric’ and ‘measure’ [2]. We use the term “metric” here to be consistent with prevailing international standards. Specifically, ISO/IEC 9126:1991 [64] defines a “software quality metric” as a “quantitative scale and method which can be used to determine the value a feature takes for a specific software product”.
2 Theoretical validations of many of the metrics that we consider in this paper can be found in [20][21][30].
Early prediction is commonly cast as a binary classification problem.3 This is achieved through a quality
model that classifies components into either a high or low risk category. The definition of a high risk
component varies depending on the context of the study. For example, a high risk component is one that
contains any faults found during testing [14][75], one that contains any faults found during operation [72],
or one that is costly to correct after an error has been found [3][13][1]. The identification of high risk
components allows an organization to take mitigating actions, such as focus defect detection activities on
high risk components, for example optimally allocating testing resources [56], or redesign components
that are likely to cause field failures or be costly to maintain. This is motivated by evidence showing that
most faults are found in only a few of a system’s components [86][51][67][91].
A number of organizations have integrated quality models and modeling techniques into their overall
quality decision making process. For example, Lyu et al. [81] report on a prototype system to support
developers with software quality models, and the EMERALD system is reportedly routinely used for risk
assessment at Nortel [62][63]. Ebert and Liedtke describe the application of quality models to control the
quality of switching software at Alcatel [46].
The construction of design and programming guidelines can proceed by first showing that there is a
relationship between say a coupling metric and maintenance cost. Then proscriptions on the maximum
allowable value on that coupling metric are defined in order to avoid costly rework and maintenance in the future.4 Examples of cases where guidelines were empirically constructed are [1][3]. Guidelines based
on anecdotal experience have also been defined [80], and experience-based guidelines are used directly
in the context of software product acquisition by Bell Canada [34].
Concordant with the popularity of the object-oriented paradigm, there has been a concerted research
effort to develop object oriented product metrics [8][17][30][80][78][27][24][60][106], and to validate them
[4][27][17][19][22][78][32][57][89][106][8][25][10]. For example, in [8] the relationship between a set of
new polymorphism metrics and fault-proneness is investigated. A study of the relationship between
various design and source code measures using a data set from student systems was reported in
[4][17][22][18], and a validation study of a large set of object-oriented metrics on an industrial system was
described in [19]. Another industrial study is described in [27] where the authors investigate the
relationship between object-oriented design metrics and two dependent variables: the number of defects
and size in LOC. Li and Henry [78] report an analysis where they related object-oriented design and code
metrics to the extent of code change, which they use as a surrogate for maintenance effort. Chidamber
et al. [32] describe an exploratory analysis where they investigate the relationship between object-
oriented metrics and productivity, rework effort and design effort on three different financial systems
respectively. Tang et al. [106] investigate the relationship between a set of object-oriented metrics and
faults found in three systems. Nesi and Querci [89] construct regression models to predict class
development effort using a set of new metrics. Finally, Harrison et al. [57] propose a new object-oriented
coupling metric, and compare its performance with a more established coupling metric.
Despite minor inconsistencies in some of the results, a reading of the object-oriented metrics validation
literature would suggest that a number of metrics are indeed ‘validated’ in that they are strongly
associated with outcomes of interest (e.g., fault-proneness) and that they can serve as good predictors of
high-risk classes. The former is of course a precursor for the latter. For example, it has been stated that
some metrics (namely the Chidamber and Kemerer – henceforth CK – metrics of [30]) “have been proven
empirically to be useful for the prediction of fault-prone modules” [106]. A recent review of the literature
stated that “Existing data suggests that there are important relationships between structural attributes and
external quality indicators” [23].
However, almost all of the validation studies that have been performed thus far completely ignore the
potential confounding impact of class size. This is the case because the analyses employed are
univariate: they only model the relationship between the product metric and the dependent variable of
interest. For example, recent studies used the bivariate correlation between object-oriented metrics and the number of faults to investigate the validity of the metrics [57][10]. Also, univariate logistic regression models are used as the basis for demonstrating the relationship between object-oriented product metrics and fault-proneness in [22][19][106]. The importance of controlling for potential confounders in empirical studies of object-oriented products has been emphasized [23]. However, size, the most obvious potential confounder, has not been controlled in previous validation studies.

3 It is not, however, always the case that binary classifiers are used. For example, there have been studies that predict the number of faults in individual components (e.g., [69]), and that produce point estimates of maintenance effort (e.g., [78][66]).
4 It should be noted that the construction of guidelines requires the demonstration of a causal relationship rather than a mere association.
The objective of this paper is to investigate the confounding effect of class size on the validation of object-
oriented product metrics. We first demonstrate based on previous work that there is potentially a size
confounding effect in object-oriented metrics validation studies, and present a methodology for empirically
testing this. We then perform an empirical study on an object-oriented telecommunications framework written in C++ [102]. The metrics we investigate consist of the CK metrics suite [30]5, and some of the
metrics defined by Lorenz and Kidd [80]. The external metric that we validate against is the occurrence of
a fault, which we term the fault-proneness of the class. In our study a fault is detected due to a field
failure.
Briefly, our results indicate that by using the commonly employed univariate analyses our results are
consistent with previous studies. After controlling for the confounding effect of class size, none of the
metrics is associated with fault-proneness. This indicates a strong confounding effect of class size on
some common object-oriented metrics. The results cast serious doubt that many previous validation
studies demonstrate more than that size is associated with fault-proneness.
Perhaps the most important practical implication of these results is that design and programming
guidelines based on previous validation studies are questioned. Efforts to control cost and quality using
object-oriented metrics as early indicators of problems may be achieved just as well using early indicators
of size. The implications for research are that data from previous validation studies should be re-
examined to gauge the impact of the size confounding effect, and future validation studies should control
for size.
In Section 2 we provide the rationale behind the confounding effect of class size and present a framework
for its empirical investigation. Section 3 presents our research method, and Section 4 includes the results
of the study. We conclude the paper in Section 5 with a summary and directions for future work.
2 Background
This section is divided into two parts. First, we present the theoretical and empirical basis of the object-
oriented metrics that we attempt to validate. Second, we demonstrate that there is a potentially strong
size confounding effect in object-oriented metrics validation studies.
2.1 Theoretical and Empirical Basis of Object-Oriented Metrics
2.1.1 Theoretical Basis and Its Empirical Support
The primary reason why there is an interest in the development of product metrics in general is
exemplified by the following justification for a product metric validity study “There is a clear intuitive basis
for believing that complex programs have more faults in them than simple programs” [87]. However, an
intuitive belief does not make a theory. In fact, the lack of a strong theoretical basis driving the
development of traditional software product metrics has been criticized in the past [68]. Specifically,
Kearney et al. [68] state that “One of the reasons that the development of software complexity measures
is so difficult is that programming behaviors are poorly understood. A behavior must be understood before
what makes it difficult can be determined. To clearly state what is to be measured, we need a theory of
programming that includes models of the program, the programmer, the programming environment, and
the programming task.”

5 It has been stated that for historical reasons the CK metrics are the most referenced [23]. Most commercial metrics collection tools available at the time of writing also collect these metrics.
Figure 1: Theoretical basis for the development of object-oriented product metrics.
In the arena of object-oriented metrics, a slightly more detailed articulation of a theoretical basis for
developing quantitative models relating product metrics and external quality metrics has been provided in
[19], and is summarized in Figure 1. There, it is hypothesized that the structural properties of a software
component (such as its coupling) have an impact on its cognitive complexity. Cognitive complexity is
defined as the mental burden of the individuals who have to deal with the component, for example, the
developers, testers, inspectors, and maintainers. High cognitive complexity leads to a component exhibiting undesirable external qualities, such as increased fault-proneness and reduced maintainability.6
Certain structural features of the object-oriented paradigm have been implicated in reducing the
understandability of object-oriented programs, hence raising cognitive complexity. We describe these
below.
2.1.1.1 Distribution of Functionality
In traditional applications developed using functional decomposition, functionality is localized in specific
procedures, the contents of data structures are accessed directly, and data central to an application is
often globally accessible [110]. Functional decomposition makes procedural programs easier to
understand because it is based on a hierarchy in which a top-level function calls lower level functions to
carry out smaller chunks of the overall task [109]. Hence tracing through a program to understand its
global functionality is facilitated.
In one experimental study with students and professional programmers [11], the authors compared
maintenance time for three equivalent programs (implementing three different applications, therefore we
have nine programs): one consisted of a straight serial structure (i.e., one main function), one was developed following the principles of functional decomposition, and one was an object-oriented program (without
inheritance). In general, it took the students more time to change the object-oriented programs, and the
professionals exhibited the same effect, although not as strongly. Furthermore, both the students and
professionals noted that they found that it was most difficult to recognize program units in the object-
oriented programs, and the students felt that it was also most difficult to find information in the object-
oriented programs. Widenbeck et al. [109] make a distinction between program functionality at the local
level and at the global (application) level. At the local level they argue that the object-oriented paradigm’s
concept of encapsulation ensures that methods are bundled together with the data that they operate on,
making it easier to construct appropriate mental models and specifically to understand a class’ individual
functionality. At the global level, functionality is dispersed amongst many interacting classes, making it harder to understand what the program is doing. They support this in an experiment with equivalent small
C++ (with no inheritance) and Pascal programs whereby the subjects were better able to answer
questions about the functionality of the C++ program. They also performed an experiment with larger
programs. Here the subjects with the C++ program (with inheritance) were unable to answer questions
about its functionality much better than guessing. While this study was done with novices, it supports the
general notions that high cohesion makes object-oriented programs easier to understand, and high
coupling makes them more difficult to understand. Wilde et al.’s [110] conclusions based on an interview-
based study of two object-oriented systems at Bellcore implemented in C++ and an investigation of a PC
Smalltalk environment, all in different application domains, are concordant with this finding, in that
programmers have to understand a method’s context of use by tracing back through the chain of calls
that reach it, and tracing the chain of methods it uses. When there are many interactions, this
6 To reflect the likelihood that not only structural properties affect a component’s external qualities, some authors have included
additional metrics as predictor variables in their quantitative models, such as reuse [69], the history of corrected faults [70], and the
experience of developers [72][71]. However, this does not detract from the importance of the primary relationship between product
metrics and a component’s external qualities.
exacerbates the understandability problem. An investigation of a C and a C++ system, both developed by
the same staff in the same organization, concluded that “The developers found it much harder to trace
faults in the OO C++ design than in the conventional C design. Although this may simply be a feature of
C++, it appears to be more generally observed in the testing of OO systems, largely due to the distorted
and frequently nonlocal relationships between cause and effect: the manifestation of a failure may be a
‘long way away’ from the fault that led to it. […] Overall, each C++ correction took more than twice as long
to fix as each C correction.” [59].
2.1.1.2 Inheritance Complications
As noted in [43], there has been a preoccupation within the community with inheritance, and therefore
more studies have investigated that particular feature of the object-oriented paradigm.
Inheritance introduces a new level of delocalization, making programs even more difficult to understand. It
has been noted that “Inheritance gives rise to distributed class descriptions. That is, the complete
description for a class C can only be assembled by examining C as well as each of C’s superclasses.
Because different classes are described at different places in the source code of a program (often spread
across several different files), there is no single place a programmer can turn to get a complete
description of a class” [77]. While this argument is stated in terms of source code, it is not difficult to
generalize it to design documents. Wilde et al.’s study [110] indicated that to understand the behavior of
a method one has to trace inheritance dependencies, which is considerably complicated due to dynamic
binding. A similar point was made in [77] about the understandability of programs in languages that
support dynamic binding, such as C++.
In a set of interviews with 13 experienced users of object-oriented programming, Daly et al. [40] noted
that if the inheritance hierarchy is designed properly then the effect of distributing functionality over the
inheritance hierarchy would not be detrimental to understanding. However, it has been argued that there
exists increasing conceptual inconsistency as one travels down an inheritance hierarchy (i.e., deeper
levels in the hierarchy are characterized by inconsistent extensions and/or specializations of super-
classes) [45], therefore inheritance hierarchies may not be designed properly in practice. In one study
Dvorak [45] found that subjects were more inconsistent in placing classes at deeper levels of the inheritance hierarchy than at higher levels in the hierarchy.
An experimental investigation found that making changes to a C++ program with inheritance consumed
more effort than a program without inheritance, and the author attributed this to the subjects finding the
inheritance program more difficult to understand based on responses to a questionnaire [26]. A
contradictory result was found in [41], where the authors conducted a series of classroom experiments
comparing the time to perform maintenance tasks on a ‘flat’ C++ program and a program with three levels
of inheritance. This was premised on a survey of object-oriented practitioners showing 55% of
respondents agreeing that inheritance depth is a factor when attempting to understand object-oriented
software [39]. The result was a significant reduction in maintenance effort for the inheritance program.
An internal replication by the same authors found the results to be in the same direction, albeit the p-
value was larger. The second experiment in [41] found that C++ programs with 5 levels of inheritance
took more time to maintain than those with no inheritance, although the effect was not statistically
significant. The authors explain this by observing that searching/tracing through the bigger inheritance
hierarchy takes longer. Two experiments that were partial replications of the Daly et al. experiments
produced different conclusions [107]. In both experiments the subjects were given three equivalent Java
programs to make changes to, and the maintenance time was measured. One of the Java programs was
‘flat’, one had an inheritance depth of 3, and one had an inheritance depth of 5. The results for the first
experiment indicate that the programs with inheritance depth of 3 took longer to maintain than the ‘flat’
program, but the program with inheritance depth of 5 took as much time as the ‘flat’ program. The authors
attribute this to the fact that the amount of changes required to complete the maintenance task for the
deepest inheritance program was smaller. The results for a second task in the first experiment and the
results of the second experiment indicate that it took longer to maintain the programs with inheritance. To
explain this finding and its difference from the Daly et al. results, the authors showed that the “number of
methods relevant for understanding” (which is the number of methods that have to be traced in order to
perform the maintenance task) was strongly correlated with the maintenance time, and this value was
much larger in their study compared with the Daly et al. programs. The authors conclude that inheritance
depth per se is not the factor that affects understandability, but the number of methods that have to be
traced.
2.1.1.3 Summary
The current theoretical framework for explaining the effect of the structural properties of object-oriented
programs on external program attributes can be justified empirically. To be specific, studies that have
been performed indicate that the distribution of functionality across classes in object-oriented systems,
and the exacerbation of this through inheritance, potentially makes programs more difficult to understand.
This suggests that highly cohesive, sparsely coupled, and low inheritance programs are less likely to
contain a fault. Therefore, metrics that measure these three dimensions of an object-oriented program
would be expected to be good predictors of fault-proneness or the number of faults.
The empirical question is then whether contemporary object-oriented metrics measure the relevant
structural properties well enough to substantiate the above theory. Below we review the evidence on this.
2.1.2 Empirical Validation of Object-Oriented Metrics
In this section we review the empirical studies that investigate the relationship between the ten object-
oriented metrics that we study and fault-proneness (or number of faults). The product metrics cover the
following dimensions: coupling, cohesion, inheritance, and complexity. These dimensions are based on
the definition of the metrics, and may not reflect their actual behavior.
Coupling metrics characterize the static usage dependencies amongst the classes in an object-oriented
system [21]. Cohesion metrics characterize the extent to which the methods and attributes of a class
belong together [16]. Inheritance metrics characterize the structure of the inheritance hierarchy.
Complexity metrics, as used here, are adaptations of traditional procedural paradigm complexity metrics
to the object-oriented paradigm.
Current methodological approaches for the validation of object-oriented product metrics are best
exemplified by two articles by Briand et al. [19][22]. These are validation studies for an industrial
communications system and a set of student systems respectively, where a considerable number of
contemporary object-oriented product metrics were studied. We single out these studies because their
methodological reporting is detailed and because they reflect what can be considered best
methodological practice to date.
The basic approach starts with a data set of product metrics and binary fault data for a complete system
or multiple systems. The important element of the Briand et al. methodology that is of interest to us here
is the univariate analysis that they stipulate should be performed. In fact, the main association between
the product metrics and fault-proneness is established on the basis of the univariate analysis. If the
relationship is statistically significant (and in the expected direction) then a metric is considered validated.7 For instance, in [22] the authors state a series of hypotheses relating each metric with fault-
proneness. They then explain “Univariate logistic regression is performed, for each individual measure
(independent variable), against the dependent variable to determine if the measure is statistically related,
in the expected direction, to fault-proneness. This analysis is conducted to test the hypotheses..”
Subsequently, the results of the univariate analysis are used to evaluate the extent of evidence
supporting each of the hypotheses. Reliance on univariate results as the basis for drawing validity
conclusions is common practice (e.g., see [4][10][17][18][57][106]).
In this review we first present the definition of the metrics as we have operationalized them. The
operationalization of some of the metrics is programming language dependent. We then present the
magnitude of the coefficients and p values computed in the various studies. Validation coefficients were
either the change in odds ratio as a measure of the magnitude of the metric to fault-proneness
association from a logistic regression (see the appendix, Section 7) or the Spearman correlation
coefficient. Finally, this review focuses only on the fault-proneness or number of faults dependent
variable. Other studies that investigated effort, such as [32][89][78], are not covered as effort is not the
topic of the current paper.

7 Briand et al. use logistic regression, and consider the statistical significance of the regression parameters.
2.1.2.1 WMC
This is the Weighted Methods per Class metric [30], and can be classified as a traditional complexity
metric. It is a count of the methods in a class. The developers of this metric leave the weighting scheme
as an implementation decision [30]. We weight it using cyclomatic complexity as did [78]. However, other
authors did not adopt a weighting scheme [4][106]. Methods from ancestor classes are not counted and neither are “friends” in C++. This is similar to the approach taken in, for example, [4][31]. To be precise, WMC was counted after preprocessing to avoid undercounts due to macros [33].8
One study found WMC to be associated with fault-proneness on three different sub-systems written in C++ with p-values 0.054, 0.0219 and 0.0602, and change in odds ratio 1.26, 1.45, and 1.26 [106].9 A
study that evaluated WMC on a C++ application and a Java application found WMC to have a Spearman
correlation of 0.414 and 0.456 with the number of faults due to field failures respectively, and highly
significant p-values (<0.0001 and <0.0056) [10]. Another study using student systems found WMC to be associated with fault-proneness with a p-value for the logistic regression coefficient of 0.0607 [4].10
2.1.2.2 DIT
The Depth of Inheritance Tree [30] metric is defined as the length of the longest path from the class to the
root in the inheritance hierarchy. It is stated that the further down the class hierarchy a class is located, the more complex it becomes, and hence the more fault-prone.
The DIT metric was empirically evaluated in [19][22]. In [19] the authors found that this metric was related
to fault-proneness (p=0.0074) with a change in odds ratio equal to 0.572 when measured on non-library
classes. The second study [22] also found it to be associated with fault-proneness (p=0.0001) with a
change in odds ratio of 2.311. Another study using student systems found DIT to be associated with fault-
proneness with a p-value for the logistic regression coefficient <0.0001 [4].
It will be noted that in the first study a negative association was found between DIT and fault-proneness.
The authors explain this by stating that in the system studied classes located deeper in the inheritance
hierarchy provide only implementations for a few specialized methods, and are therefore less likely to
contain faults than classes closer to the root [19]. This was a deliberate strategy to place as much
functionality as close as possible to the root of the inheritance tree. Note that for the latter two
investigations, the same data set was used, and therefore the slightly different coefficients may have
been due to removal of outliers.
One study using data from an industrial system found that classes involved in an inheritance structure
were more likely to have defects (found during integration testing and within 12 months post-delivery)
[27]. Another study did not find DIT to be associated with fault-proneness on three different sub-systems
written in C++, where faults were based on three years’ worth of trouble reports [106]. One study that
evaluated DIT on a Java application found that it had a Spearman correlation of 0.523 (p<0.0015) with the
number of faults due to field failures [10].
2.1.2.3 NOC
This is the Number of Children inheritance metric [30]. This metric counts the number of classes which
inherit from a particular class (i.e., the number of classes in the inheritance tree down from a class).
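As a small worked example on a hypothetical hierarchy (ours, for illustration): if classes B and C inherit directly from a root class A, and class D inherits from B, then

\mathrm{DIT}(A) = 0, \quad \mathrm{DIT}(B) = \mathrm{DIT}(C) = 1, \quad \mathrm{DIT}(D) = 2, \quad \mathrm{NOC}(A) = 2, \quad \mathrm{NOC}(B) = 1

since NOC, under the usual reading of [30], counts only direct descendants.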
The NOC metric was empirically evaluated in [19][22]. In [19] the authors found that this metric was not
related to fault-proneness. Conversely, the second study [22] found it to be associated with fault-
proneness (p=0.0276) with a change in odds ratio of 0.322. Another study using student systems found NOC to be associated with fault-proneness with a p-value for the regression coefficient <0.0001 [4]. Note that for the latter two investigations, the same data set was used, and therefore the slightly different coefficients may have been due to removal of outliers. In both studies NOC had a negative association with fault-proneness and this was interpreted as indicating that greater attention was given to these classes (e.g., through inspections) given that many classes were dependent on them.

8 Note that macros embodied in #ifdef’s are used to customize the implementation to a particular platform. Therefore, the method is defined at design time but its implementation is conditional on environment variables. Not counting it, as suggested in [31], would undercount methods known at design time.
9 In this study faults were classified as either object-oriented type faults or traditional faults. The values presented here are for all of the faults, although the same metrics were found to be significant for both all faults and the object-oriented only faults. Furthermore, the change in odds ratio reported is based on a change of one unit of the metric rather than a change in the standard deviation.
10 This study used the same data set as in [22], except that the data was divided into subsets using different criteria. The results presented here are for all of the classes.
Another study did not find NOC to be associated with fault-proneness on three different sub-systems
written in C++, where faults were based on three years’ worth of trouble reports [106]. NOC was not
associated with the number of faults due to field failures in a study of two systems, one implemented in
C++ and the other in Java [10].
2.1.2.4 CBO
This is the Coupling Between Object Classes coupling metric [30]. A class is coupled with another if
methods of one class use methods or attributes of the other, or vice versa. In this definition, uses can
mean as a member type, parameter type, method local variable type or cast. CBO is the number of other
classes to which a class is coupled. It includes inheritance-based coupling (i.e., coupling between
classes related via inheritance).
The CBO metric was empirically evaluated in [19][22]. In [19] the authors found that this metric was
related to fault-proneness (p<0.0001) with a change in odds ratio equal to 5.493 when measured on non-
library classes. The second study [22] also found it to be associated with fault-proneness (p<0.0001) with
a change in odds ratio of 2.012 when measured on non-library classes. Another study did not find CBO to
be associated with fault-proneness on three different sub-systems written in C++, where faults were
based on three years’ worth of trouble reports [106]. This was also the case in a recent empirical analysis
on two traffic simulation systems, where no relationship between CBO and the number of known faults
was found [57], and a study of a Java application where CBO was not found to be associated with faults
due to field failures [10]. Finally, another study using student systems found CBO to be associated with
fault-proneness with a p-value for the logistic regression coefficient <0.0001 [4].
2.1.2.5 RFC
This is the Response for a Class coupling metric [30]. The response set of a class consists of the set M of
methods of the class, and the set of methods invoked directly by methods in M (i.e., the set of methods
that can potentially be executed in response to a message received by that class). RFC is the number of
methods in the response set of the class.
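Stated as a formula (the notation is ours): if M is the set of methods of the class and \mathrm{calls}(m) is the set of methods directly invoked by a method m, then

\mathrm{RS} = M \cup \bigcup_{m \in M} \mathrm{calls}(m), \qquad \mathrm{RFC} = |\mathrm{RS}|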
The RFC metric was empirically evaluated in [19][22]. In [19] the authors found that this metric was
related to fault-proneness (p=0.0019) with a change in odds ratio equal to 1.368 when measured on non-
library classes. The second study [22] also found it to be associated with fault-proneness (p<0.0001) with
a change in odds ratio of 3.208 when measured on non-library classes. Another study found RFC to be
associated with fault-proneness on two different sub-systems written in C++ with p-values 0.0401 and 0.0499, and change in odds ratio 1.0562 and 1.0654 [106].11 A study that evaluated RFC on a C++ application and a Java application found RFC to have a Spearman correlation of 0.417 and 0.775 with the number of faults due to field failures respectively, and highly significant p-values (both <0.0001) [10]. Another study using student systems found RFC to be associated with fault-proneness with a p-value for the logistic regression coefficient <0.0001 [4].

11 In this study faults were classified as either object-oriented type faults or traditional faults. The values presented here are for all of the faults, although the same metrics were found to be significant for both all faults and the object-oriented only faults. Furthermore, the change in odds ratio reported is based on a change of one unit of the metric rather than a change in the standard deviation.
2.1.2.6 LCOM
This is a cohesion metric that was defined in [30]. This measures the number of pairs of methods in the
class using no attributes in common minus the number of pairs of methods that do. If the difference is
negative it is set to zero.
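Writing P for the set of method pairs that use no attributes in common and Q for the set of pairs that do (the notation is ours), this definition can be stated as:

\mathrm{LCOM} = \max(|P| - |Q|, \; 0)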
The LCOM metric was empirically evaluated in [19][22]. In [19] the authors found it to be associated with
fault-proneness (p=0.0249) with a change in odds ratio of 1.613. Conversely, the second study [22] did
not find it to be associated with fault-proneness.
2.1.2.7 NMO
This is an inheritance metric that has been defined in [80], and measures the number of inherited methods overridden by a subclass. A large number of overridden methods indicates a design problem [80].
Since a subclass is intended to specialize its parent, it should primarily extend the parent’s services [94].
This should result in unique new method names. Numerous overrides indicate subclassing for the
convenience of reusing some code and/or instance variables when the new subclass is not purely a
specialization of its parent [80].
The NMO metric was empirically evaluated in [19][22]. In [19] the authors found that this metric was
related to fault-proneness (p=0.0082) with a change in odds ratio equal to 1.724. The second study [22]
also found it to be associated with fault-proneness (p=0.0243) with a change in odds ratio of 1.948.
Lorenz and Kidd [80] caution that in the context of frameworks, methods are often defined specifically for reuse or are meant to be overridden. Therefore, for our study there is already an a priori expectation
that this metric may not be a good predictor.
2.1.2.8 NMA
This is an inheritance metric that has been defined in [80], and measures the number of methods added
by a subclass (inherited methods are not counted). As this value becomes larger for a class, the
functionality of that class becomes increasingly distinct from that of the parent classes.
The NMA metric was empirically evaluated in [19][22]. In [19] the authors found that this metric was
related to fault-proneness (p=0.0021) with a change in odds ratio equal to 3.925, a rather substantial
effect. The second study [22] also found it to be associated with fault-proneness (p=0.0021) with a
change in odds ratio of 1.710.
2.1.2.9 SIX
This is an inheritance metric that has been defined in [80], and consists of a combination of inheritance
metrics. It is calculated as the product of the number of overridden methods and the class hierarchy nesting level, normalized by the total number of methods in the class. The higher the value of SIX, the more likely it is that a particular class does not conform to the abstraction of its superclasses [94].
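Reading the class hierarchy nesting level as DIT and writing NOM for the total number of methods in the class (the notation is ours), this definition amounts to:

\mathrm{SIX} = \frac{\mathrm{NMO} \times \mathrm{DIT}}{\mathrm{NOM}}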
The SIX metric was empirically evaluated in [19][22]. In [19] the authors found that this metric was not
related to fault-proneness. Conversely, the second study [22] found it to be associated with fault-
proneness (p=0.0089) with a change in odds ratio of 1.337.
2.1.2.10 NPAVG
This can be considered as a coupling metric and has been defined in [80], and measures the average
number of parameters per method (not including inherited methods). Methods with a high number of
parameters generally require considerable testing (as their input can be highly varied). Also, large
numbers of parameters lead to more complex, less maintainable code.
2.1.2.11 Summary
The current empirical studies do provide some evidence that object oriented metrics are associated with
fault-proneness or the incidence of faults, though the evidence is equivocal. For some of the inheritance
metrics that were studied (DIT and NOC) some studies found a positive association, some found a
negative association, and some found no association. The CBO metric was found to be positively
associated with fault-proneness in some studies, and not associated with either the number of faults
found or fault-proneness in other studies. The RFC and WMC metrics were consistently found to be
associated with fault-proneness. The NMO and NMA metrics were found to be associated with fault-
proneness, but the evidence for the SIX metric is more equivocal. The LCOM cohesion metric also has
equivocal evidence supporting its validity.
It should be noted that the differences in the results obtained across studies may be a consequence of
the measurement of different dependent variables. For instance, some treat the dependent variable as
the (continuous) number of defects found. Other studies use a binary value of incidence of a fault during
testing or in the field, or both. It is plausible that the effects of product metrics may be different for each of
these.
An optimistic observer would conclude that the evidence as to the predictive validity of most of these
metrics is good enough to recommend their practical usage.
2.2 The Confounding Effect of Size
In this section we take as a starting point the stance of an optimistic observer and assume that there is
sufficient empirical evidence demonstrating the relationship between the object-oriented metrics that we
study and fault-proneness. We already showed that previous empirical studies drew their conclusions
from univariate analyses. Below we make the argument that univariate analyses ignore the potential
confounding effects of class size. We show that if there is indeed a size confounding effect, then
previous empirical studies could have harbored a large positive bias.
For ease of presentation we take as a running example a coupling metric as the main metric that we are
trying to validate. For our purposes, a validation study is designed to determine whether there is an
association between coupling and fault-proneness. Furthermore, we assume that this coupling metric is
appropriately dichotomized: Low Coupling (LC) and High Coupling (HC). This dichotomization
assumption simplifies the presentation, but the conclusions can be directly generalized to a continuous
metric.
2.2.1 The Case Control Analogy
An object-oriented metrics validation study can be easily seen as an unmatched case-control study.
Case-control studies are frequently used in epidemiology to, for example, study the effect of exposure to carcinogens on the incidence of cancers [95][12].12 The reason for using case-control studies as opposed
to randomized experiments in certain instances is that it would not be ethically and legally defensible to
do otherwise. For example, it would not be possible to have deliberately composed ‘exposed’ and
‘unexposed’ groups in a randomized experiment when the exposure is a suspected carcinogen or toxic
substance. Randomized experiments are more appropriately used to evaluate treatments or preventative
measures [52].
In applying the conduct of a case-control study to the validation of an object-oriented product metric, one
would first proceed by identifying classes that have faults in them (the cases). Then, for the purpose of
comparison another group of classes without faults in them are identified (the controls). We determine
the proportion of cases that have, say High Coupling and the proportion with Low Coupling. Similarly, we
determine the proportion of controls with High Coupling, and the proportion with Low Coupling. If there is
an association of coupling with fault-proneness then the prevalence of High Coupling classes would be
higher in the cases than in the controls. Effectively then, a case-control study follows a paradigm that
proceeds from effect to cause, attempting to find antecedents that lead to faults [99]. In a case-control
study, the control group provides an estimate of the frequency of High Coupling that would be expected
among the classes that do not have faults in them.
In an epidemiological context, it is common to have ‘hospital-based cases’ [52][95]. For example, a
subset or all patients that have been admitted to a hospital with a particular disease can be considered as cases.13 Controls can also be selected from the same hospital or clinic. The selection of controls is not
necessarily a simple affair. For example, one can match the cases with controls on some confounding variables, for instance, on age and sex. Matching ensures that the cases and controls are similar on the matching variable and therefore this variable cannot be considered a causal factor in the analysis. Alternatively, one can have an unmatched case-control study and control for confounding effects during the analysis stage.

12 Other types of studies that are used are cohort-studies [52], but we will not consider these here.
13 This raises the issue of generalizability of the results. However, as noted by Breslow and Day [12], generalization from the sample in a case-control study depends on non-statistical arguments. The concern with the design of the study is to maximize internal validity. In general, replication of results establishes generalizability [79].
In an unmatched case-control study the determination of an association between the exposure (product
metric) and the disease (fault-proneness) proceeds by calculating a measure of association and
determining whether it is significant. For example, consider the following contingency table that is
obtained from a hypothetical validation study:
                      Coupling
Fault Proneness       HC        LC
Faulty                91        19
Not Faulty            19        91
Table 1: A contingency table showing the results of a hypothetical validation study.
For this particular data set, the odds ratio is 22.9 (see the appendix, Section 7, for a definition of the odds
ratio), which is highly significant, indicating a strong positive association between coupling and fault-
proneness.
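As a quick check of this figure, the odds ratio can be computed directly from the four cell counts of Table 1. The following minimal Python sketch (ours, not part of the original analysis) reproduces the 22.9:

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of a 2x2 table with cells a (HC, faulty), b (LC, faulty),
    c (HC, not faulty), and d (LC, not faulty): OR = (a*d) / (b*c)."""
    return (a * d) / (b * c)

# Cell counts from Table 1.
print(round(odds_ratio(91, 19, 19, 91), 1))  # 22.9
```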
2.2.2 The Potential Confounding Effect of Size
One important element that has been ignored in previous validation studies is the potential confounding
effect of class size. This is illustrated in Figure 2.
Figure 2: Path diagram illustrating the confounding effect of size.
The path diagram in Figure 2 depicts a classic text-book example of confounding in a case-control study [99][12].14 The path (a) represents the current causal beliefs about product metrics being an antecedent to fault-proneness. The path (b) depicts a positive causal relationship between size and fault-proneness. The path (c) depicts a positive association between product metrics and size.

14 We make the analogy to a case-control study because it provides us with a well tested framework for defining and evaluating confounding effects, as well as for conducting observational studies from which one can make stronger causal claims (if all known confounders are controlled). However, for the sole purposes of this paper, the characteristics of a confounding effect have been described and exemplified in [61] without resort to a case-control analogy.
If this path diagram is concordant with reality, then size distorts the relationship between product metrics
and fault-proneness. Confounding can result in considerable bias in the estimate of the magnitude of the
association. Size is a positive confounder, which means that ignoring size will always result in the
association between, say, coupling and fault-proneness appearing more positive than it really is.
The potential confounding effect of size can be demonstrated through an example (adapted from [12]).
Consider the data in Table 1, which gave an odds ratio of 22.9. As mentioned earlier, this is representative of the current univariate analyses used in the object-oriented product metrics validation literature (which neither include size as a covariate nor employ a stratification on size).

Now, let us say that when we analyze the data separately for small and large classes, we have the data in Table 2 for the large classes, and the data in Table 3 for the small classes.15
                      Coupling
Fault Proneness       HC        LC
Faulty                90        10
Not Faulty             9         1
Table 2: A contingency table showing the results for only large classes of a hypothetical validation study.
                      Coupling
Fault Proneness       HC        LC
Faulty                 1         9
Not Faulty            10        90
Table 3: A contingency table showing the results for only small classes of a hypothetical validation study.
In both of the above tables the odds ratio is one. By stratifying on size (i.e., controlling for the effect of
size), the association between coupling and fault-proneness has been reduced dramatically. This is
because size was the reason why there was an association between coupling and fault-proneness in the
first place. Once the influence of size is removed, the example shows that the impact of the coupling
metric disappears.
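The same arithmetic makes the stratification argument concrete. The sketch below (ours; note that the cells of Tables 2 and 3 sum to those of Table 1) computes the crude odds ratio, the stratum-specific odds ratios, and the Mantel-Haenszel estimate commonly used to combine stratified 2x2 tables:

```python
def odds_ratio(a, b, c, d):
    # Cells: a (HC, faulty), b (LC, faulty), c (HC, not faulty), d (LC, not faulty).
    return (a * d) / (b * c)

def mantel_haenszel(tables):
    """Common odds ratio across strata, adjusted for the stratifying
    variable (here: class size)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

large = (90, 10, 9, 1)   # Table 2: large classes
small = (1, 9, 10, 90)   # Table 3: small classes

print(odds_ratio(91, 19, 19, 91))              # crude OR: ~22.9 (Table 1)
print(odds_ratio(*large), odds_ratio(*small))  # 1.0 and 1.0 within strata
print(mantel_haenszel([large, small]))         # 1.0: no size-adjusted association
```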
Therefore, an important improvement on the conduct of validation studies of object oriented metrics is to
control for the effect of size, otherwise one may be getting the illusion that the product metric is strongly
associated with fault-proneness, when in reality the association is much weaker or non-existent.
2.2.3 Evidence of a Confounding Effect
Now we must consider whether the path diagram in Figure 2 can be supported in reality.
There is evidence that object-oriented product metrics are associated with size. For example, in [22] the
Spearman rho correlation coefficients go as high as 0.43 for associations between some coupling and
cohesion metrics with size, and 0.397 for inheritance metrics, and both are statistically significant (at an
alpha level of say 0.1). Similar patterns emerge in the study reported in [19], where relatively large
correlations are shown. In another study [27] the authors display the correlation matrix showing the
Spearman correlation between a set of object-oriented metrics that can be collected from Shlaer-Mellor
designs and C++ LOC. The correlations range from 0.563 to 0.968, all statistically significant at an alpha
level 0.05. This also indicates very strong correlations with size.
15 Note that in this example the odds ratio of the size to fault-proneness association is 100, and the size to coupling association is 81.3. Therefore, it follows the model in Figure 2.
Associations between size and defects have been reported in non-object oriented systems [58]. For
object oriented programs, the relationship between size and defects is clearly visible in the study of [27],
where the Spearman correlation was found to be 0.759 and statistically significant. Another study of
image analysis programs written in C++ found a Spearman correlation of 0.53 between size in LOC and
the number of errors found during testing [55], and was statistically significant at an alpha level of 0.05.
Briand et al. [22] find statistically significant associations between 6 different size metrics and fault-
proneness for C++ programs, with a change in odds ratio going as high as 4.952 for one of the size
metrics.
General indications of a confounding effect are seen in Figure 3, which shows the associations between a
set of coupling metrics and fault-proneness, and with size from a recent study [22]. The association
between coupling metrics and fault-proneness is given in terms of the change in the odds ratio and the p-
value of the univariate logistic regression parameter. The association with size is in terms of the
Spearman correlation. As can be seen in Figure 3, all the metrics that had a significant relationship with
fault-proneness in the univariate analysis also had a significant correlation with size. Furthermore, there
is a general trend of increasing association between the coupling metric and fault-proneness as its
association with size increases.Relationship with fault-proneness
MetricChange in odds
Ratio
2.012
2.062
3.208
8.168
5.206
7.170
1.090
9.272
1.395
1.385
1.416
1.206
1.133
0.816
1.575
1.067
4.937
1.214p-value<0.0001<0.0001<0.0001<0.0001<0.0001<0.00010.5898<0.00010.03290.03890.03070.32130.33840.2520.09220.6735<0.00010.2737Relationship with sizerho0.32170.33590.39400.43100.32320.3168-0.1240.34550.17530.19580.12960.02970.0493-0.08550.2365-0.12290.2765-0.0345p-value<0.0001<0.0001<0.0001<0.0001<0.0001<0.00010.1082<0.00010.01630.00880.07850.70100.49130.25280.00190.11150.00010.6553CBOCBO’RFC1RFCMPCICPIH-ICPNIH-ICPDACDAC’OCAICFCAECOCMICOCMECIFMMICAMMICOMMICOMMEC
Figure 3: Relationship between coupling metrics and fault-proneness, and between coupling metrics and
size from [22]. This covers only coupling to non-library classes. This also excludes the following metrics
because no results pertaining to the relationship with fault-proneness were presented: ACAIC, DCAEC,
IFCMIC, ACMIC, IFCMEC, and DCMEC. The definition of these metrics is provided in the appendix.
This leads us to conclude that, potentially, previous validation studies have overestimated the impact of
object oriented metrics on fault-proneness due to the confounding effect of size.
2.3 Summary
In this section the theoretical basis for object-oriented product metrics was presented. This states that
cognitive complexity is an intervening variable between the structural properties of classes and fault-
proneness. Furthermore, the empirical evidence supporting the validity of the object oriented metrics that
we study was presented, and this indicates that some of the metrics are strongly associated with fault-
proneness or the number of faults. We have also demonstrated that there is potentially a strong size
confounding effect in empirical studies to date that validate object oriented product metrics. This makes it
of paramount importance to determine whether such a strong confounding effect really exists.
If a size confounding effect is found, this means that previous validation studies have a positive bias and
may have exaggerated the impact of product metrics on fault-proneness. The reason is that studies to
date relied exclusively on univariate analysis to test the hypothesis that the product metrics are
associated with fault-proneness or the number of faults. The objective of the study below then is to
directly test the existence of this confounding effect and its magnitude.
3 Research Method
3.1 Data Source
Our data set comes from a telecommunications framework written in C++ [102]. The framework
implements many core design patterns for concurrent communication software. The communication
software tasks provided by this framework include event demultiplexing and event handler dispatching,
signal handling, service initialization, interprocess communication, shared memory management,
message routing, dynamic (re)configuration of distributed services, and concurrent execution and
synchronization. The framework has been used in applications such as electronic medical imaging
systems, configurable telecommunications systems, high-performance real-time CORBA, and web
servers. Examples of its application include in the Motorola Iridium global personal communications
system [101] and in network monitoring applications for telecommunications switches at Ericsson [100]. A
total of 174 classes from the framework that were being reused in the development of commercial
switching software constitute the system that we study. A total of 14 different programmers were involved in the development of this set of classes.16
3.2 Measurement
3.2.1 Product Metrics
All product metrics are defined at the class level and constitute design metrics; they have been presented in Section 2.1.2. In our study the size variable was measured as non-comment source LOC for the class.
Measurement of product metrics used a commercial metrics collection tool that is currently being used by
a number of large telecommunications software development organizations.
3.2.2 Dependent Variable
For this product, we obtained data on the faults found in the library from actual field usage.17 Each fault
was due to a unique field failure and represents a defect in the program that caused the failure. Failures
were reported by the users of the framework. The developers of the framework documented the reasons
for each delta in the version control system, and it was from this that we extracted information on whether
a class was faulty.

16 This number was obtained from the different login names of the version control system associated with each class.

17 It has been argued that considering faults causing field failures is a more important question to address than faults found during
testing [9]. In fact, it has been argued that it is the ultimate aim of quality modeling to predict post-release fault-proneness [50]. In at
least one study it was found that pre-release fault-proneness is not a good surrogate measure for post-release fault-proneness, the
reason posited being that pre-release fault-proneness is a function of testing effort [51].
A total of 192 faults were detected in the framework at the time of writing. These faults occurred in 70 out
of 174 classes. The dichotomous dependent variable that we used in our study was the detection or non-
detection of a fault. If one or more faults are detected then the class is considered to be faulty, and if not
then it is considered not faulty.
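As a concrete illustration, the following sketch (ours, not the authors' tooling; the library choice, column names, and all values are hypothetical stand-ins for the real data) builds a per-class analysis table with a size-driven fault structure and dichotomizes fault counts as just described. The sketches later in Section 3.3 reuse this df.

    # Illustrative sketch only: a synthetic stand-in for the 174-class data set.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 174                                    # same size as the studied system
    loc = rng.lognormal(mean=4.5, sigma=1.0, size=n).round().astype(int)
    wmc = np.maximum(0, (loc / 10 + rng.normal(0, 3, n)).round()).astype(int)

    # Fault counts are generated so that larger classes tend to have more
    # faults -- the confounding structure this paper investigates.
    fault_count = rng.poisson(lam=loc / np.median(loc) * 0.5)

    df = pd.DataFrame({"loc": loc, "wmc": wmc, "fault_count": fault_count})

    # Dichotomous dependent variable: faulty if one or more field faults.
    df["faulty"] = (df["fault_count"] >= 1).astype(int)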
3.3 Data Analysis Methods
3.3.1 Testing for a Confounding Effect
It is tempting to use a simple approach to test for a confounding effect of size: examine the association
between size and fault-proneness. If this association is not significant at a traditional alpha level, then
conclude that size is not different between cases and controls (and hence has no confounding effect),
and proceed with a usual univariate analysis.
However, it has been noted that this is an incorrect approach [38]. The reason is that traditional
significance testing places the burden of proof on rejecting the null hypothesis. This means that one has
to prove that the cases and controls do differ in size. In evaluating confounding potential, the burden of
proof should be in the opposite direction: before discarding the potential for confounding, the researcher
should demonstrate that cases and controls do not differ on size. This means controlling the Type II error
rather than the Type I error. Since one usually has no control over the sample size, this means setting
the alpha level to 0.25, 0.5, or even larger.
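For illustration, the naive check described above could be run as follows (a sketch under our own assumptions, reusing the hypothetical df from Section 3.2.2 and using a Mann-Whitney test as one reasonable choice; the paper does not prescribe a specific test):

    # Naive confounding check: compare size (LOC) between faulty classes
    # ("cases") and fault-free classes ("controls"). Because the burden of
    # proof is on showing NO difference, the Type II error matters, hence a
    # lenient alpha such as 0.25 rather than the usual 0.05.
    from scipy.stats import mannwhitneyu

    cases = df.loc[df["faulty"] == 1, "loc"]
    controls = df.loc[df["faulty"] == 0, "loc"]
    stat, p = mannwhitneyu(cases, controls, alternative="two-sided")

    if p > 0.25:
        print("no detectable size difference at alpha = 0.25")
    else:
        print("sizes differ; size must be treated as a potential confounder")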
A simpler and more parsimonious approach is as follows. For an unmatched case-control study, a
measured confounding variable can be controlled through a regression adjustment [12][99]. A regression
adjustment entails including the confounder as another independent variable in a regression model. If the
regression coefficient of the object-oriented metric changes dramatically (in magnitude and statistical
significance) with and without the size variable, then this is a strong indication that there was indeed a
confounding effect [61]. This is further elaborated below.
3.3.2 Logistic Regression Model
Binary logistic regression is used to construct models when the dependent variable can only take on two
values, as in our case. It is most convenient to use a logistic regression (henceforth LR) model rather
than the contingency table analysis used earlier for illustrations since the model does not require
dichotomization of our product metrics. The general form of an LR model is:

\pi = \frac{1}{1 + e^{-(\beta_0 + \sum_{i=1}^{k} \beta_i x_i)}}        (Eqn. 1)

where \pi is the probability of a class having a fault, and the x_i are the independent variables. The \beta
parameters are estimated through the (unconditional) maximization of a log-likelihood [61].
In a univariate analysis only one x_i, x_1, is included in the model, and this is the product metric that is
being validated:18

\pi = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1)}}        (Eqn. 2)
When controlling for size, a second x_i, x_2, is included that measures size:

\pi = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}        (Eqn. 3)

18 Conditional logistic regression is used when there has been matching in the case-control study and each matched set is treated
as a stratum in the analysis [12].
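A minimal sketch of this regression adjustment, assuming statsmodels and the synthetic df from Section 3.2.2 (WMC stands in for the product metric x_1 and LOC for the size variable x_2):

    # Fit Eqn. 2 (metric only) and Eqn. 3 (metric plus size) and compare the
    # metric's coefficient across the two fits; a dramatic change in magnitude
    # or significance signals a confounding effect (Section 3.3.1).
    import statsmodels.api as sm

    y = df["faulty"]

    X1 = sm.add_constant(df[["wmc"]])           # Eqn. 2: univariate model
    m1 = sm.Logit(y, X1).fit(disp=False)

    X2 = sm.add_constant(df[["wmc", "loc"]])    # Eqn. 3: size added as x2
    m2 = sm.Logit(y, X2).fit(disp=False)

    print("without size control:", m1.params["wmc"], m1.pvalues["wmc"])
    print("with size control:   ", m2.params["wmc"], m2.pvalues["wmc"])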
In constructing our models, we could have followed the previous literature and considered neither interaction
effects nor any transformations (for example, see [4][8][17][18][19][22][106]). To err on the conservative
side, however, we did test for interaction effects between the size metric and the product metric for all
product metrics evaluated. In none of the cases was a significant interaction effect identified.
Furthermore, we performed a logarithmic transformation on our variables19 and re-evaluated all the
models.20 Our conclusions would not be affected by using the transformed models. Therefore, we only
present the detailed results for the untransformed model.
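The two robustness checks just described might look as follows (again a sketch on the synthetic df; the interaction and transformation details are assumptions on our part):

    # (1) Interaction check: add a metric-by-size product term to the Eqn. 3
    # model and test its coefficient.
    import numpy as np
    import statsmodels.api as sm

    df["wmc_x_loc"] = df["wmc"] * df["loc"]
    X_int = sm.add_constant(df[["wmc", "loc", "wmc_x_loc"]])
    m_int = sm.Logit(df["faulty"], X_int).fit(disp=False)
    print("interaction p-value:", m_int.pvalues["wmc_x_loc"])

    # (2) Logarithmic transformation: log(x + 1), since the counts include
    # zeros, then refit and compare conclusions.
    X_log = sm.add_constant(np.log1p(df[["wmc", "loc"]]))
    m_log = sm.Logit(df["faulty"], X_log).fit(disp=False)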
The magnitude of an association can be expressed in terms of the change in odds ratio as the x_1 variable
changes by one standard deviation. This is explained in the appendix (Section 7), and is denoted by
\Psi. Since we construct two models, as shown in Eqn. 2 and Eqn. 3, without and with controlling for size
respectively, we will denote the change in odds ratio as \Psi_{x_1} and \Psi_{x_1+x_2} respectively. As suggested
in [74], we can evaluate the extent to which the change in odds ratio changes as an indication of the
extent of confounding. We operationalize this as follows:

\Delta\Psi = \frac{\Psi_{x_1} - \Psi_{x_1+x_2}}{\Psi_{x_1+x_2}} \times 100        (Eqn. 4)
This gives the percent change in \Psi_{x_1+x_2} obtained by removing the size confounder. If this value is large then we
can consider that class size does indeed have a confounding effect. The definition of “large” can be
problematic; however, as will be seen in the results, the changes in our study are sufficiently large that by
any reasonable threshold there is little doubt.
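Continuing the sketch, and assuming (per the appendix's definition of \Psi) that the change in odds ratio for a one-standard-deviation increase in the metric takes the usual form exp(\beta \cdot s), Eqn. 4 can be computed directly from the two fitted models:

    # Change in odds ratio per one-standard-deviation increase in the metric
    # (assumed form exp(beta * sd)), and the percent change of Eqn. 4.
    import numpy as np

    sd = df["wmc"].std()
    psi_x1 = np.exp(m1.params["wmc"] * sd)      # from the Eqn. 2 model
    psi_x1_x2 = np.exp(m2.params["wmc"] * sd)   # from the Eqn. 3 model

    delta_psi = (psi_x1 - psi_x1_x2) / psi_x1_x2 * 100
    print(f"percent change after removing the size confounder: {delta_psi:.1f}%")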
3.3.3 Diagnostics and Hypothesis Testing
The appendix of this paper presents the details of the model diagnostics that were performed, and the
approach to hypothesis testing. Here we summarize these.
The diagnostics concerned checking for collinearity and identifying influential observations. We compute
the condition number specific to logistic regression, \eta_{LR}, to determine whether dependencies amongst
the independent variables are affecting the stability of the model (collinearity). The \Delta\beta value provides
an indication of which observations are overly influential. For hypothesis testing, we use the likelihood
ratio statistic, G, to test the significance of the overall model, the Wald statistic to test the significance
of individual model parameters, and the Hosmer and Lemeshow R^2 value as a measure of goodness of
fit. Note that for the univariate model the G statistic and the Wald test are statistically equivalent, but we
present them both for completeness. All statistical tests were performed at an alpha level of 0.05.
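For completeness, here is how these summary statistics can be pulled from a fitted statsmodels model (the H-L R^2 form and the \eta_{LR} computation below are our assumptions, approximating the definitions in the appendix):

    # Likelihood ratio statistic G, an R^2 analogue, Wald p-values, and a
    # condition number of the information matrix X'WX as a collinearity check.
    import numpy as np

    G = 2 * (m2.llf - m2.llnull)       # likelihood ratio test of the model
    r2_hl = 1 - m2.llf / m2.llnull     # Hosmer-Lemeshow R^2 (assumed form)
    wald_p = m2.pvalues                # Wald test for each coefficient

    p_hat = m2.predict()               # fitted probabilities
    X = m2.model.exog
    W = np.diag(p_hat * (1 - p_hat))
    info = X.T @ W @ X                 # observed information matrix
    eigs = np.linalg.eigvalsh(info)
    eta_lr = np.sqrt(eigs.max() / eigs.min())   # large values flag collinearity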
4 Results
4.1 Descriptive Statistics
Box and whisker plots for all the product metrics that we collected are shown in Figure 4. These indicate the median and the 25th and 75th quantiles. Outliers and extreme points are also shown in the figure.21
As is typical with product metrics, their distributions are clearly heavy-tailed. Most of the variables are
counts, and therefore their minimal value is zero. Variables NOC, NMO, and SIX have less than six
observations that are non-zero. Therefore, they were excluded from further analysis. This is the
approach followed in [22].

19 Given that product metrics are counts, an appropriate transformation to stabilize the variance would be the logarithm.

20 We wish to thank an anonymous reviewer for making this suggestion.

21 As will be noted, in some cases the minimal value is zero. For metrics such as CBO, WMC, and RFC, this would be because
the class was defined in a manner similar to a C struct, with no methods associated with it.
The fact that few classes have NOC values greater than zero indicates that most classes in the system
are leaf classes. Overall, 76 of the classes had a DIT value greater than zero, indicating that they are
subclasses. The remaining 98 classes are at the root of the inheritance hierarchy. The above makes
clear that the inheritance hierarchy for this system was “flat”. Variable DIT has a small variation, but this
is primarily due to there not being a large amount of deep inheritance in this framework. Shallow
inheritance trees, indicating sparse use of inheritance, have been reported in a number of systems thus
far [27][30][32].
Figure 4: Box and whisker plots for all the object-oriented product metrics. Two charts are shown to
allow for the fact that two y-axis scales are required due to the different ranges.
The LCOM values may seem to be large. However, examination of the results from previous systems
indicates that they are not exceptional. For instance, in the C++ systems reported in [22], the maximum
LCOM value was 818, the mean 43, and the standard deviation was 106. Similarly, the system reported
in [19] had a maximum LCOM value of 4988, a mean of 99.6, and standard deviation of 547.7.
4.2 Correlation with Size
Table 4 shows the correlation of the metrics with size as measured in LOC. As can be seen all of the
associations are statistically significant except DIT. But DIT did not have much variation, and therefore a
weak association with size is not surprising. All metrics except LCOM and NPAVG have a substantial
correlation coefficient, indicating a non-trivial association with size.
OO Metric    Rho      p-value
WMC          0.88     <0.0001
DIT          0.098    0.19
CBO          0.46     <0.0001
RFC          0.88     <0.0001
LCOM         0.24     0.0011
NMA          0.86     <0.0001
NPAVG        0.27     0.0002

Table 4: Spearman correlation of the object-oriented metrics (only the ones that have more than five non-
zero values) with size in LOC.
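The computation behind Table 4 is a plain Spearman rank correlation; on the synthetic df from Section 3.2.2 (which carries only WMC) it would look like this, extended over all seven metrics in the real analysis:

    # Spearman correlation of each metric with size in LOC, as in Table 4.
    from scipy.stats import spearmanr

    for metric in ["wmc"]:             # the real study adds dit, cbo, rfc, ...
        rho, p = spearmanr(df[metric], df["loc"])
        print(f"{metric.upper():6s} rho = {rho:.2f}   p = {p:.4g}")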
4.3 Validation Results
The results of the univariate analyses and the models controlling for size for each of the remaining
metrics are presented in this section. The complete results are presented in Table 5.
[Table 5 contains two blocks of results, “Without size control” and “Controlling for size”, with one column per metric (WMC, DIT, CBO, RFC, LCOM, NMA, NPAVG). The rows in each block are: G (p-value), H-L R^2, \eta_{LR}, Coeff. (p-value), and \Psi; the size-controlled block also reports the size coefficient (p-value) and the size \Psi.]

Table 5: Overall results of the models without control of size (univariate models) and with control of size.
The G value is the likelihood ratio test for the whole model. The “Coeff.” columns give the estimated
parameters from the logistic regression model. The “p-value” is the one-sided test of the null hypothesis
for the coefficient. The R^2 values are based on the definition of R^2 provided by Hosmer and Lemeshow
[61]; hence they are referred to as the H-L R^2 values. For the second half of the table, presenting the
results of the models with size control, the coefficient for the size parameter is provided with its change in
odds ratio. For the metrics where the model without size control is not significant (DIT and NPAVG) we do
not present the models with size control since, given the hypothesized confounding effect, the results will
not be substantively different from the no-size-control model.