Issues in Customer Churning in an iTelecom Company
Tesfaye Onsho Gudeta
Department of Computing
MSc Distributed and Mobile Computing
Institute of Technology Tallaght
Dublin, Ireland, 2013
Introduction
Data mining is one of the best ways to identify patterns and
problems in large amounts of data in support of problem solving.
In this paper, the causes of a company's business problems are
identified from data collected during its day-to-day activities.
iTelecom, a provider of telephone and broadband services, is
facing customer management problems. Customer churn is high, but
the company does not know exactly what is causing it. The company
has collected a vast amount of data about its customers, the
plans they signed up to and the services it provides. iTelecom
places a high emphasis on the churn problem and is trying to find
new ways of reducing it. The directors of iTelecom would like to
answer the following questions.
- What is it that makes a customer churn?
- Are some customers more likely to churn than others?
- How can we identify these customers before they churn?
In an attempt to answer these questions, the company data was
organised and analysed using the Weka software. The data is
composed of 21 attributes with nominal and numeric data types,
and 3333 instances were used in the analysis. The data was
pre-processed and tested at different levels for missing values
and inconsistencies. Appropriate attributes were then selected,
where the appropriateness of an attribute was measured by its
ability to predict churning. Data mining was then performed using
different classifiers and tests. The data was also divided into
sub-datasets to find out which attribute or group of attributes
best predicts churning. Finally, the best group was selected and
used to explain why these attributes are linked to the risk of
churning.
Data Understanding and Pre-Processing Phase

Data Understanding
In an attempt to understand the data, different exploration
mechanisms were used. As stated, there are twenty-one attributes
and 3333 instances. The following table summarises all the
attributes and the type of data they hold. As can be seen, the
attributes hold two types of data: nominal and numeric.

No.  Attribute                      Type
1    State                          Nominal
2    Account Length                 Numeric
3    Area Code                      Nominal
4    Phone Number                   Nominal
5    Inter Plan                     Nominal
6    VoiceMail Plan                 Nominal
7    No of Vmail Mesgs              Numeric
8    Total Day Min                  Numeric
9    Total Day calls                Numeric
10   Total Day Charge               Numeric
11   Total Evening Min              Numeric
12   Total Evening Calls            Numeric
13   Total Evening Charge           Numeric
14   Total Night Minutes            Numeric
15   Total Night Calls              Numeric
16   Total Night Charge             Numeric
17   Total Int Min                  Numeric
18   Total Int Calls                Numeric
19   Total Int Charge               Numeric
20   No of Calls Customer Service   Numeric
21   Churn                          Nominal
Table 1: Attributes and data types

All the values in the attributes are complete and no value is
missing. Thus, it is possible to calculate measures of central
tendency and dispersion for the numeric data, and the frequency
distribution and mode for the nominal data, as shown in the table
below. Some attributes, such as Inter Plan, VoiceMail Plan and
Churn, contain categorical data such as Yes/No and False/True.
Attribute                      Minimum  Maximum  Average  St. dev
State                          Nominal  Nominal  Nominal  Nominal
Account Length                 1        243      101.065  39.822
Area Code                      Nominal  Nominal  Nominal  Nominal
Phone Number                   Nominal  Nominal  Nominal  Nominal
Inter Plan                     Nominal  Nominal  3010(N)  323(Y)
VoiceMail Plan                 Nominal  Nominal  922(Y)   2411(N)
No of Vmail Mesgs              0        51       8.099    13.688
Total Day Min                  0        350.8    179.775  54.467
Total Day calls                0        165      100.436  20.069
Total Day Charge               0        59.64    30.562   9.259
Total Evening Min              0        363.7    200.98   50.714
Total Evening Calls            0        170      100.114  19.923
Total Evening Charge           0        30.91    17.084   4.311
Total Night Minutes            23.2     395      200.872  50.574
Total Night Calls              33       175      100.108  19.569
Total Night Charge             1.04     17.77    9.039    2.276
Total Int Min                  0        20       10.237   2.792
Total Int Calls                0        20       4.479    2.641
Total Int Charge               0        5.4      2.765    0.754
No of Calls Customer Service   0        9        1.563    1.315
Churn                          Nominal  Nominal  2850(F)  483(T)
Table 2: Attributes and the measures of central tendency

We can also see whether a given distribution is normal, skewed or
peaked. The average and standard deviation show how the data is
spread about the mean: the smaller the standard deviation, the
less spread out the data. The dataset therefore contains both
tightly and widely spread attributes. Total Day Min, Total
Evening Min and Total Night Minutes are the most widely spread.
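
As a rough cross-check on the spread comparison above, the standard deviation can be expressed relative to the mean (the coefficient of variation). This measure is not part of the original analysis; the sketch below is only an illustration, using mean and standard deviation pairs copied from Table 2.

```python
# Mean and standard deviation pairs copied from Table 2.
stats = {
    "Total Day Min": (179.775, 54.467),
    "Total Evening Min": (200.98, 50.714),
    "Total Night Minutes": (200.872, 50.574),
    "No of Vmail Mesgs": (8.099, 13.688),
    "No of Calls Customer Service": (1.563, 1.315),
}


def coefficient_of_variation(mean, std_dev):
    """Return the standard deviation as a fraction of the mean."""
    return std_dev / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in stats.items()}

# Print the attributes from most to least spread out in relative terms.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:30s} {value:.3f}")
```

On this relative scale the minute totals are less spread out than, for example, the voicemail message counts, so the ranking above is best read in absolute units (minutes).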
Attribute                      Average  St. dev  Skewness  Kurtosis
State                          Nominal  Nominal  Nominal   Nominal
Account Length                 101.06   39.82    0.10
Area Code                      Nominal  Nominal  Nominal   Nominal
Phone Number                   Nominal  Nominal  Nominal   Nominal
Inter Plan                     3010(N)  323(Y)   Nominal   Nominal
VoiceMail Plan                 922(Y)   2411(N)  Nominal   Nominal
No of Vmail Mesgs              8.099    13.688   1.27      -0.0
Total Day Min                  179.775  54.467   -0.03     -0.0
Total Day calls                100.436  20.069   -0.11     0.24
Total Day Charge               30.562   9.259    -0.03     -0.0
Total Evening Min              200.98   50.714   -0.02     0.03
Total Evening Calls            100.114  19.923   -0.06     0.21
Total Evening Charge           17.084   4.311    -0.02     0.03
Total Night Minutes            200.872  50.574   0.01      0.09
Total Night Calls              100.108  19.569   0.03      -0.0
Total Night Charge             9.039    2.276    0.01      0.09
Total Int Min                  10.237   2.792    -0.24     0.61
Total Int Calls                4.479    2.641    1.32      3.08
Total Int Charge               2.765    0.754    -0.25     0.61
No of Calls Customer Service   1.563    1.315    1.06      1.52
Churn                          Nominal  Nominal  2850(F)   483(T)

Table 3: Attributes and the measures of distribution
The skewness scores tell us whether the data follows a normal or
a skewed distribution. As the scores show, most of the numeric
data is approximately normally distributed: a score near zero
means the distribution is close to symmetric. “The skewness for a
normal distribution is zero, and any symmetric data should have a
skewness near zero. Negative values for the skewness indicate
data that are skewed left and positive values for the skewness
indicate data that are skewed right.” [1] There are, however,
three right-skewed distributions: ‘No of Vmail Mesgs’, ‘Total Int
Calls’ and ‘No of Calls Customer Service’. ‘Total Int Calls’ and
‘No of Calls Customer Service’ are also peaked, as measured by
kurtosis. The kurtosis scores of the other attributes indicate
flat distributions, as “datasets with low kurtosis tend to have a
flat top near the mean rather than a sharp peak.” [2] How the
data is distributed can be seen from the histogram below.
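
The skewness and kurtosis scores discussed above can be computed from central moments. The sketch below is a minimal illustration using the population (biased) formulas and excess kurtosis (zero for a normal distribution); which estimator produced the scores in Table 3 is not stated in the source, so this choice is an assumption, and the two sample lists are hypothetical.

```python
def central_moment(values, order):
    """Return the central moment of the given order (population form)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical samples, not taken from the iTelecom data.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [0, 0, 0, 1, 1, 2, 9]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

A symmetric sample gives a skewness of zero, while a sample with a long tail of large values gives a positive score, which is the pattern reported for the right-skewed attributes.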
Figure 1: Distribution as represented by histogram

Though it is difficult to see at this level, these histograms
show which attributes are more likely to be linked to churning.
For example, ‘Total Day Min’ and ‘Total Day Charge’, especially
at the highest values, seem to be linked to churning. In
addition, looking at ‘VoiceMail Plan’, one can say that customers
who are not on the plan are more prone to churning than those who
are.
As shown in the histogram and the tables, most of the data is
normally distributed, but the skewed attributes have outliers.
Outliers can be noise or just normal data out of the normal
range. [3] These data groups need to be examined or removed from
the analysis.

Weka 2D scatter plots can help us understand the data further.
The following plot shows the distribution for each attribute. The
column on the right-hand side contains a plot for each attribute,
and when we click on any row we can see the details of that row
on the left side. In the example below, ‘No of Calls Customer
Service’ is plotted against the ‘Churn’ attribute. From the plot
we can see that values of ‘No of Calls Customer Service’ above
4.5 (the midpoint of the 0 to 9 range) are linked to churning.

[1] http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
[2] http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
[3] Witten, Ian H. and Frank, Eibe (2005) Data Mining: Practical
    Machine Learning Tools and Techniques. Morgan Kaufmann
    Publishers, SF, USA.
Figure 2: Attributes as seen in 2D plot

It is also possible to view an individual instance of an
attribute by clicking on an individual point in the plot, as
shown below.

Figure 3: Instances in a 2D plot window

Many other attributes also seem to be correlated, as we can see
from the plots. The following two 2D plots represent the
attributes ‘Churn’ and ‘Inter Plan’. As can be seen from the
colours, both attributes seem to correlate.

Figure 4: Plotted ‘Churn’ attribute

Figure 5: Plotted ‘Inter Plan’ attribute

Moreover, looking at the plots below, one can easily guess that
‘Inter Plan’ and ‘VoiceMail Plan’ are highly correlated.

Figure 6: Multi-plotted view of correlated attributes

A closer look at the 2D plot of the relationship between the
‘Churn’ attribute and ‘No of Calls Customer Service’ shows that
the latter is highly related to the former, especially for
instances with a high number of customer service calls.

Figure 7: Plotted ‘Churn’ versus ‘No of Calls Customer Service’

If we look at the other attributes in the figure below, they
follow a similar pattern and look more correlated than before
after this transformation.

Figure 8: Patterns of dataset of group 8
It takes a long phase of analysis to identify which attribute or
group of attributes can best predict the churn attribute. The
following six groups of attributes were made and an attempt was
made to measure their predictive ability. They are groups of
attributes for service plans, daytime services, evening services,
night-time services, international services and customer service
calls. These groups, as can be seen in the table, were measured
with the JRip, J48, Logistic, ZeroR and NaiveBayes classifiers.

No.  Attribute                Data type
1    Inter Plan               Nominal
     VoiceMail Plan           Nominal
2    Total Day Min            Numeric
     Total Day calls          Numeric
     Total Day Charge         Numeric
3    Total Evening Min        Numeric
     Total Evening Calls      Numeric
     Total Evening Charge     Numeric
4    Total Night Minutes      Numeric
     Total Night Calls        Numeric
     Total Night Charge       Numeric
5    Total Int Min            Numeric
     Total Int Calls          Numeric
     Total Int Charge         Numeric
6    No. Calls Cust Service   Numeric
     Churn                    Nominal

No.  Test              JRip     J48      Logistic  ZeroR    NaiveBayes
1    Training set      85.5086  85.5086  85.1485   85.5086  85.5086
     Cross validation  85.5252  85.5252  85.5252   85.5086  85.5252
2    Training set      86.3186  86.7987  86.7987   85.5086  86.7987
     Cross validation  85.5252  86.0547  86.0547   85.5252  86.0547
3    Training set      85.5086  85.5086  85.1485   85.5086  85.5086
     Cross validation  85.5252  85.5252  85.5252   85.5086  85.5252
4    Training set      85.5086  85.5086  85.1485   85.5086  85.5086
     Cross validation  85.5252  85.5252  85.5252   85.5086  85.5252
5    Training set      85.5086  85.5086  85.1485   85.5086  85.5086
     Cross validation  85.5252  85.5252  85.5252   85.5086  83.4951
6    Training set      85.5086  85.1485  85.5086   85.5086  85.5086
     Cross validation  85.1721  85.1721  85.1721   85.5252  85.1721

Table 4: Measuring the ability of each attribute group to predict
the churn attribute (% correctly classified)
As indicated in the table, 16 of the 21 attributes were grouped
according to the service they relate to and their general
similarity. All the prediction measures on the 3333 instances
came to only around 85%. Five attributes, including ‘State’,
‘Area Code’ and ‘Phone Number’, were manually removed from the
groupings. We will come back to these later. The above groups are
not good predictors, as their scores are low. Thus, the selection
of attributes must be changed.
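
To put the scores of around 85% in context, it helps to see what a majority-class baseline achieves on this class distribution. The sketch below is a minimal illustration of that baseline (the behaviour ZeroR has for a nominal class), using the churn counts from Table 2; the function name is hypothetical.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Class distribution from Table 2: 2850 non-churners, 483 churners.
labels = ["FALSE"] * 2850 + ["TRUE"] * 483

predicted, accuracy = majority_baseline(labels)
print(predicted, f"{accuracy:.4%}")  # FALSE 85.5086%
```

Any attribute group that scores close to 85.5086% is therefore doing no better than always predicting that the customer will not churn.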
If we test all the attributes with J48, we get the highest
prediction so far (95.5596% on the training set and 93.6994% on
cross validation), as shown below. The true positive rate for the
FALSE class is as high as 0.992. But we need to bring the number
of attributes down to find the smallest set of attributes that is
leading to customer churning.

=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances      3185      95.5596 % (93.6994 % on cross validation)
Incorrectly Classified Instances     148       4.4404 %
Kappa statistic                        0.804
Mean absolute error                    0.0797
Root mean squared error                0.1996
Relative absolute error               32.1415 %
Root relative squared error           56.7108 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.992    0.257    0.958      0.992   0.974      0.914     FALSE
               0.743    0.008    0.937      0.743   0.829      0.914     TRUE
Weighted Avg.  0.956    0.221    0.955      0.956   0.953      0.914

=== Confusion Matrix ===
    a    b   <-- classified as
 2826   24 |    a = FALSE
  124  359 |    b = TRUE
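
The Kappa statistic in the summary above can be reproduced from the confusion matrix alone. The sketch below assumes the standard definition of Cohen's kappa (observed agreement corrected for the agreement expected by chance); the function name is hypothetical and the matrix is the one reported for J48 on all attributes.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows = actual class)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected if predictions were independent of the true class.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# J48 on all attributes, training set (rows: actual FALSE, actual TRUE).
confusion = [[2826, 24], [124, 359]]

kappa = cohen_kappa(confusion)
print(f"{kappa:.3f}")  # 0.804
```

Because 85.5% of the customers are non-churners, kappa is a more demanding measure here than raw accuracy: it only rewards agreement beyond what the class imbalance would give by chance.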
After applying Weka's AttributeSelection filter, only five
attributes were selected. J48 was used to measure their combined
predictive ability as a check on the quality of the selection. As
can be seen below, the result came down to 89.0789% on the
training set and 88.5389% on cross validation. This is reflected
in all the other measures as well. Again, this grouping is not
good, but it will be used as a checkpoint against the other
group. The attributes of this group are listed in the Data
Pre-processing section.

=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances      2969      89.0789 % (88.5389 % on 10-fold cross validation)
Incorrectly Classified Instances     364      10.9211 %
Kappa statistic                        0.4839
Mean absolute error                    0.1806
Root mean squared error                0.3005
Relative absolute error               72.8226 %
Root relative squared error           85.3622 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.966    0.553    0.912      0.966   0.938      0.736     FALSE
               0.447    0.034    0.69       0.447   0.543      0.736     TRUE
Weighted Avg.  0.891    0.478    0.879      0.891   0.881      0.736

=== Confusion Matrix ===
    a    b   <-- classified as
 2753   97 |    a = FALSE
  267  216 |    b = TRUE
A new grouping of attributes was used to obtain a better
prediction. 19 attributes were used after the ‘State’, ‘Area
Code’ and ‘Phone Number’ attributes were removed manually. Using
J48, the measures show that this group of 19 predicts 96.8497%
correctly on the training set and 94.5395% correctly on 10-fold
cross validation. The true positive rate for the FALSE class goes
up to 0.997. This group predicted better than the full set of
attributes.

=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances      3228      96.8497 % (94.5395 % on cross validation)
Incorrectly Classified Instances     105       3.1503 %
Kappa statistic                        0.8626
Mean absolute error                    0.0595
Root mean squared error                0.1725
Relative absolute error               24.0043 %
Root relative squared error           49.0091 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.997    0.199    0.967      0.997   0.982      0.925     FALSE
               0.801    0.003    0.977      0.801   0.881      0.925     TRUE
Weighted Avg.  0.968    0.17     0.969      0.968   0.967      0.925

=== Confusion Matrix ===
    a    b   <-- classified as
 2841    9 |    a = FALSE
   96  387 |    b = TRUE
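
The per-class figures in the "Detailed Accuracy By Class" listing follow directly from the confusion matrix. The sketch below shows the standard definitions, applied to the matrix reported for the 19-attribute group; the function and variable names are hypothetical.

```python
def class_metrics(matrix, positive):
    """Per-class metrics; matrix[actual][predicted], positive = class index."""
    tp = matrix[positive][positive]
    fn = sum(matrix[positive]) - tp                 # missed members of the class
    fp = sum(row[positive] for row in matrix) - tp  # wrongly assigned to the class
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "tp_rate": recall,
        "fp_rate": fp / (fp + tn),
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# J48 on the 19-attribute group, training set (index 0 = FALSE, 1 = TRUE).
confusion_19 = [[2841, 9], [96, 387]]

churn = class_metrics(confusion_19, positive=1)
stay = class_metrics(confusion_19, positive=0)
print({name: round(value, 3) for name, value in churn.items()})
print({name: round(value, 3) for name, value in stay.items()})
```

For the question the directors are asking, the TRUE row matters most: a TP rate of 0.801 means that roughly one churner in five is still missed even by this model.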
After applying Weka's AttributeSelection filter to these 19
attributes, 8 attributes were automatically selected as good
predictors. Thus, this selection process involved both manual and
automatic procedures. When J48 and other measures were applied to
this group, the outcome showed that the predictive ability did
not drop much. As shown below, this group predicts churning with
94.3294% accuracy on the training set and 94.0294% on 10-fold
cross validation. This is the highest score of any reduced
sub-dataset so far. This indicates that the factors that lead to
customer churning should be found in some combination of these
attributes, or in all of them combined.

8 attributes selected and used
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances      3144      94.3294 % (94.0294 % on cross validation)
Incorrectly Classified Instances     189       5.6706 %
Kappa statistic                        0.7576
Mean absolute error                    0.0992
Root mean squared error                0.2227
Relative absolute error               39.9945 %
Root relative squared error           63.2605 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.978    0.263    0.956      0.978   0.967      0.908     FALSE
               0.737    0.022    0.852      0.737   0.79       0.908     TRUE
Weighted Avg.  0.943    0.228    0.941      0.943   0.942      0.908

=== Confusion Matrix ===
    a    b   <-- classified as
 2788   62 |    a = FALSE
  127  356 |    b = TRUE
Some attributes were eliminated manually. These are ‘State’,
‘Area Code’ and ‘Phone Number’. Unless other variables such as
service provision, service plan and service charge are added to
them, it is not reasonable to assume that these attributes can
lead to customer churning. The
weka.filters.supervised.attribute.AttributeSelection filter was
used to eliminate the other attributes. As this was done
automatically, no justification can be given for each
elimination. But it can be presumed that the eliminated
attributes have less ability to predict churn and are unlikely to
be linked to the risk of customer churning.
Data Pre-processing
As discussed above, the following dataset is considered the most
predictive one. It is referred to as the dataset group of 8. The
attributes of the dataset group of 8 are:
1. Inter Plan
2. VoiceMail Plan
3. Total Day Min
4. Total Evening Min
5. Total Int Min
6. Total Int Calls
7. No of Calls Customer Service
8. Churn
There is no missing data in this group. There is no known or
identified outlier in this dataset either, but there are some
skewed variables. This group was discretised using
weka.filters.supervised.attribute.Discretize, an instance filter
that discretizes a range of numeric attributes in the dataset
into nominal attributes.

The dataset was normalized using
weka.filters.unsupervised.attribute.Normalize, which normalizes
all numeric values in the given dataset. In addition, the dataset
was transformed using weka.attributeSelection.PrincipalComponents.
This performs a principal components analysis and transformation
of the data, and is used in conjunction with a Ranker search. The
dimensionality reduction is accomplished by choosing enough
eigenvectors to account for some percentage of the variance in
the original data, here 0.95 (95%). [4] This is the group that
best predicted the risk of churning. If any one attribute is
removed from this group, or if the attributes are used
separately, the prediction scores drop sharply, as shown in the
table below. This indicates that the attributes in this group
should be used together.

[4] Notes in Weka software
Attribute                      Test              J48
Inter Plan                     Training set      85.5086
                               Cross validation  85.5086
VoiceMail Plan                 Training set      85.5086
                               Cross validation  85.5086
Total Day Min                  Training set      86.7987
                               Cross validation  86.7987
Total Evening Min              Training set      85.5086
                               Cross validation  85.5086
Total Int Min                  Training set      85.5086
                               Cross validation  85.5086
Total Int Calls                Training set      85.5086
                               Cross validation  85.5086
No of Calls Customer Service   Training set      85.7786
                               Cross validation  85.1485

Table 5: J48 test score on individual attributes of dataset group 8
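
The normalisation and variance-based component selection described in this section can be sketched in a few lines. This is an illustration only, not Weka's implementation: it assumes min-max scaling to the range 0 to 1 and a simple cumulative-variance cutoff, and the sample values and eigenvalues are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant attribute: nothing to scale
    return [(x - lo) / (hi - lo) for x in values]


def components_for_variance(eigenvalues, threshold=0.95):
    """Count the leading components needed to cover the given share of variance."""
    total = sum(eigenvalues)
    covered = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        covered += value
        if covered / total >= threshold:
            return count
    return len(eigenvalues)


# Hypothetical customer-service call counts and PCA eigenvalues.
service_calls = [0, 1, 1, 2, 4, 9]
eigenvalues = [3.0, 2.0, 1.0, 0.5, 0.3, 0.2]

normalized = min_max_normalize(service_calls)
kept = components_for_variance(eigenvalues, threshold=0.95)
print(normalized)
print(kept)
```

With these hypothetical eigenvalues, four components cover about 93% of the variance, so a fifth is needed to reach the 95% threshold.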
The other dataset is the one selected automatically by Weka. It
is referred to as the dataset group of 5. The dataset group of 5
will be tested in all the procedures to see whether it is
suitable for modelling the whole dataset. The dataset group of 5
consists of:
1. Phone Number
2. Inter Plan
3. Total Day Min
4. No of Calls Customer Service
5. Churn
This dataset includes the phone number, which was manually
removed from the dataset group of 8.
Clustering
WEKA’s weka.clusterers.SimpleKMeans is used to segment the
customers in order to get groups of customers with similar
service usage characteristics.

kMeans
Number of iterations: 4
Within cluster sum of squared errors: 5011.0
Missing values globally replaced with mean/mode

Cluster centroids:
                                                 Cluster#
Attribute                      Full Data         0                 1
                               (3333)            (483)             (2850)
=========================================================================
Inter Plan                     no                no                no
VoiceMail Plan                 no                no                no
Total Day Min                  '(-inf-175]'      '(-inf-175]'      '(-inf-175]'
Total Evening Min              '(-inf-249.15]'   '(-inf-249.15]'   '(-inf-249.15]'
Total Int Min                  '(-inf-13.15]'    '(-inf-13.15]'    '(-inf-13.15]'
Total Int Calls                '(2.5-inf)'       '(2.5-inf)'       '(2.5-inf)'
No of Calls Customer Service   '(-inf-3.5]'      '(-inf-3.5]'      '(-inf-3.5]'
Churn                          FALSE             TRUE              FALSE

Time taken to build model (full training data) : 0.08 seconds

=== Model and evaluation on training set ===
Clustered Instances
0       483 ( 14%)
1      2850 ( 86%)

kMeans on all attributes
Number of iterations: 15
Within cluster sum of squared errors: 10265.248610085338
Missing values globally replaced with mean/mode

=== Model and evaluation on training set ===
Clustered Instances
0      1458 ( 44%)
1      1875 ( 56%)
As indicated in the output above, the group of 8 gives a clear
clustering of the instances (14% and 86%), which matches the
split between churners (483) and non-churners (2850). The
clustering on all attributes (44% and 56%) does not seem to
differentiate churners from non-churners properly. Comparing the
clustering results shows that the group of 8 attributes is better
than the whole dataset.
Mining the Data
Two datasets were used in the whole process of data mining: the
group of 8 attributes and the group of 5 attributes. Three
classifiers, ZeroR, JRip and J48, were used to classify the
datasets.

ZeroR
ZeroR is a class for building and using a 0-R classifier. It
predicts the mean for a numeric class or the mode for a nominal
class. ZeroR produced the same output on both datasets (5 and 8).
The classification accuracy is only 85.5086%, which is the same
as the earlier score.

ZeroR predicts class value: FALSE
Time taken to build model: 0 seconds

=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances      2850      85.5086 %
Incorrectly Classified Instances     483      14.4914 %
Kappa statistic                        0
Mean absolute error                    0.248
Root mean squared error                0.352
Relative absolute error              100 %
Root relative squared error          100 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        1        0.855      1       0.922      0.5       FALSE
               0        0        0          0       0          0.5       TRUE
Weighted Avg.  0.855    0.855    0.731      0.855   0.788      0.5

=== Confusion Matrix ===
    a    b   <-- classified as
 2850    0 |    a = FALSE
  483    0 |    b = TRUE

JRip
JRip was run on both datasets. It is a class implementing a
propositional rule learner, Repeated Incremental Pruning to
Produce Error Reduction (RIPPER). JRip classified the datasets
more accurately than ZeroR in both cases. Its score is higher for
the dataset group of 8 than for the dataset group of 5, as stated
earlier.

=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances      3153      94.5995 %
Incorrectly Classified Instances     180       5.4005 %
Kappa statistic                        0.7743
Mean absolute error                    0.0956
Root mean squared error                0.2187
Relative absolute error               38.5594 %
Root relative squared error           62.1151 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.975    0.228    0.962      0.975   0.969      0.878     FALSE
               0.772    0.025    0.842      0.772   0.806      0.878     TRUE
Weighted Avg.  0.946    0.198    0.945      0.946   0.945      0.878

=== Confusion Matrix ===
    a    b   <-- classified as
 2780   70 |    a = FALSE
  110  373 |    b = TRUE
JRip on dataset group of 5
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances      2947      88.4188 %
Incorrectly Classified Instances     386      11.5812 %
Kappa statistic                        0.4614
Mean absolute error                    0.1904
Root mean squared error                0.3086
Relative absolute error               76.7969 %
Root relative squared error           87.6606 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.959    0.559    0.91       0.959   0.934      0.702     FALSE
               0.441    0.041    0.647      0.441   0.525      0.702     TRUE
Weighted Avg.  0.884    0.484    0.872      0.884   0.875      0.702

=== Confusion Matrix ===
    a    b   <-- classified as
 2734  116 |    a = FALSE
  270  213 |    b = TRUE
The following is a test of JRip on the datasets of group 5 and
group 8 with 10-fold cross validation. The accuracy scores are
slightly lower in both cases. The confusion matrix barely changes
for the group of 8, while the scores for the group of 5 are
slightly lower. The following is JRip on the dataset group of 8
for cross validation. Compared to the group of 5, its scores are
much higher on both the training set and cross validation.

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      3151      94.5395 %
Incorrectly Classified Instances     182       5.4605 %
Kappa statistic                        0.7722
Mean absolute error                    0.0966
Root mean squared error                0.22
Relative absolute error               38.9564 %
Root relative squared error           62.5069 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.975    0.228    0.962      0.975   0.968      0.867     FALSE
               0.772    0.025    0.838      0.772   0.804      0.867     TRUE
Weighted Avg.  0.945    0.198    0.944      0.945   0.944      0.867

=== Confusion Matrix ===
    a    b   <-- classified as
 2778   72 |    a = FALSE
  110  373 |    b = TRUE
The following is JRip on dataset group of 5 for cross validation.

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      2939      88.1788 %
Incorrectly Classified Instances     394      11.8212 %
Kappa statistic                        0.4577
Mean absolute error                    0.1905
Root mean squared error                0.3097
Relative absolute error               76.8053 %
Root relative squared error           87.975 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.955    0.553    0.911      0.955   0.933      0.688     FALSE
               0.447    0.045    0.63       0.447   0.523      0.688     TRUE
Weighted Avg.  0.882    0.479    0.87       0.882   0.873      0.688

=== Confusion Matrix ===
    a    b   <-- classified as
 2723  127 |    a = FALSE
  267  216 |    b = TRUE
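
The "Stratified cross-validation" heading in these listings means that each of the 10 folds keeps roughly the same ratio of churners to non-churners as the full dataset. The sketch below illustrates the idea with a simple round-robin assignment per class; it is not Weka's exact procedure, and the function name and seed are hypothetical.

```python
import random


def stratified_folds(labels, k=10, seed=1):
    """Split instance indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled members of each class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Class distribution of the iTelecom data: 2850 non-churners, 483 churners.
labels = ["FALSE"] * 2850 + ["TRUE"] * 483
folds = stratified_folds(labels, k=10)

for number, fold in enumerate(folds):
    churners = sum(1 for i in fold if labels[i] == "TRUE")
    print(number, len(fold), churners)
```

Each fold is used once as the test set while the other nine are used for training, which is why the cross-validation scores are slightly lower, and more trustworthy, than the training-set scores.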
J48
The J48 scores for the dataset group of 5 are slightly higher
than the JRip scores on both the training set and cross
validation, as shown below.

=== Evaluation on training set (unpruned) ===
=== Summary ===
Correctly Classified Instances      2967      89.0189 % (89.0489 % pruned)
Incorrectly Classified Instances     366      10.9811 %
Kappa statistic                        0.4992
Mean absolute error                    0.1743
Root mean squared error                0.2952
Relative absolute error               70.3023 %
Root relative squared error           83.872 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.959    0.518    0.916      0.959   0.937      0.796     FALSE
               0.482    0.041    0.668      0.482   0.56       0.796     TRUE
Weighted Avg.  0.89     0.448    0.88       0.89    0.883      0.796

=== Confusion Matrix ===
    a    b   <-- classified as
 2734  116 |    a = FALSE
  250  233 |    b = TRUE
The following is J48 on the dataset group of 5, before and after
pruning, with cross validation. The scores show little
difference.

=== Stratified cross-validation (unpruned) ===
=== Summary ===
Correctly Classified Instances      2961      88.8389 % (88.7789 % pruned)
Incorrectly Classified Instances     372      11.1611 %
Kappa statistic                        0.4949
Mean absolute error                    0.175
Root mean squared error                0.2965
Relative absolute error               70.5623 %
Root relative squared error           84.2364 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.957    0.516    0.916      0.957   0.936      0.78      FALSE
               0.484    0.043    0.655      0.484   0.557      0.78      TRUE
Weighted Avg.  0.888    0.447    0.879      0.888   0.881      0.78

=== Confusion Matrix ===
    a    b   <-- classified as
 2727  123 |    a = FALSE
  249  234 |    b = TRUE
The following is the test score of J48 on the dataset group of 8,
both before and after pruning, on the training set. The results
show little difference between the two, but the scores are much
higher than those of the group of 5.

=== Evaluation on training set (unpruned) ===
=== Summary ===
Correctly Classified Instances      3144      94.3294 % (94.5995 % pruned)
Incorrectly Classified Instances     189       5.6706 %
Kappa statistic                        0.7576
Mean absolute error                    0.0992
Root mean squared error                0.2227
Relative absolute error               39.9945 %
Root relative squared error           63.2605 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.978    0.263    0.956      0.978   0.967      0.908     FALSE
               0.737    0.022    0.852      0.737   0.79       0.908     TRUE
Weighted Avg.  0.943    0.228    0.941      0.943   0.942      0.908

=== Confusion Matrix ===
    a    b   <-- classified as
 2788   62 |    a = FALSE
  127  356 |    b = TRUE

The following is J48 on the dataset group of 8, before and after
pruning, with cross validation.

=== Stratified cross-validation (unpruned) ===
=== Summary ===
Correctly Classified Instances      3134      94.0294 % (93.9994 % pruned)
Incorrectly Classified Instances     199       5.9706 %
Kappa statistic                        0.7471
Mean absolute error                    0.102
Root mean squared error                0.2288
Relative absolute error               41.1161 %
Root relative squared error           64.9986 %
Total Number of Instances           3333

=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.975    0.263    0.956      0.975   0.965      0.89      FALSE
               0.737    0.025    0.832      0.737   0.782      0.89      TRUE
Weighted Avg.  0.94     0.228    0.938      0.94    0.939      0.89

=== Confusion Matrix ===
    a    b   <-- classified as
 2778   72 |    a = FALSE
  127  356 |    b = TRUE
The following table summarises all three tests on the dataset
group of 8 with and without pruning. JRip and J48 both reach the
best accuracy level, on the training set and on 10-fold cross
validation, before and after pruning.

Pruning       Test              ZeroR    JRip     J48
With pruning  Training set      85.5086  94.5995  94.3294
              Cross validation  85.5086  94.5395  94.0294
No pruning    Training set               94.5695  94.5995
              Cross validation           94.2094  93.9994

Attributes: Inter Plan, VoiceMail Plan, Total Day Min, Total
Evening Min, Total Int Min, Total Int Calls, No of Calls Customer
Service, Churn

Table 6: ZeroR, JRip and J48 test score (%) for the dataset group of 8
The tests with J48 on the dataset group of 5 and on the dataset
group of 8 show that the group of 8 predicts the churn attribute
better, whether pruned or not, both on the training set and on
10-fold cross validation. Therefore, the earlier conclusion
holds: the dataset group of 8 is the best predictor of the risk
of churning.

Pruning       Test              ZeroR    JRip     J48
With pruning  Training set      85.5086  88.4188  89.0189
              Cross validation  85.5086  88.1788  88.8389
No pruning    Training set               86.7987  88.4188
              Cross validation           87.8788  89.1389

Attributes: Phone Number, Inter Plan, Total Day Min, No of Calls
Customer Service, Churn

Table 7: ZeroR, JRip and J48 test score (%) for the dataset group of 5
Figure 9: The tree representation of the dataset group 8

Weka’s Experimenter was used to see whether the models perform
well. Both JRip and J48 were the best models to use but, as can
be seen below, JRip is slightly better than J48.

Figure 10: Classifiers comparison on Experimenter

The JRip test in the Experimenter produced a new set of accuracy
scores. All of the scores were above 92%, and the maximum score
was 97.005988%. This is a nearly perfect prediction.
Summary
As can be seen from all the procedures undertaken so far, the
dataset group of 8 attributes, which includes Inter Plan,
VoiceMail Plan, Total Day Min, Total Evening Min, Total Int Min,
Total Int Calls, No of Calls Customer Service and Churn, is the
best sub-dataset for predicting churning. To arrive at this
group, the other attributes were eliminated either manually or
automatically. State, area code and phone number were removed
manually because these attributes cannot by themselves be related
to churning unless they are combined with other variables.
Conclusions
According to the data collected by the company, the risk of
customer churning is related to the service plan customers have,
the time of day at which they use the services, and the usage and
charges of international calls. The data shows that customers who
have no voicemail plan are at risk of churning. Moreover,
customers with few minutes of day and night time calls are found
to be prone to churning. International call minutes and charges
were related to customer churning. Though this needs to be
investigated, it seems that people who are charged above average
are linked to the risk of churning. The other important finding
is that clients who have called customer service more often than
average are more strongly linked to churning than those who
called less often than average.

In conclusion, the company needs to encourage its customers to
sign up for voicemail services, as having the plan may be
encouraging those who signed up to stay. The company also needs
to review its charges for international calls; international call
minutes and charges are correlated with each other. The other
important area to consider is client calls to customer service.
Calls should be recorded, and managers need to follow up on
whether customer needs and problems were resolved on time. The
company needs to look at its customer service representatives and
see whether they are professional enough to solve customers’
problems.
THANKS TO WEKA DEVELOPERS
Tesfaye Onsho.