Data Analysis Using WEKA-Issues in Customer Churning


Issues in Customer Churning in an iTelecom Company

Tesfaye Onsho Gudeta
Department of Computing
MSc Distributed and Mobile Computing
Institute of Technology Tallaght
Dublin, Ireland, 2013


Introduction

Data mining is an effective way to identify patterns and problems in large amounts of data and so support the problem-solving process. In this paper, the causes of a company’s business problem are identified from data collected during the day-to-day activities of the company. iTelecom, a provider of telephone and broadband services, is facing customer management problems. Customer churning is high, but the company does not know exactly what is causing it. The company has collected a vast amount of data about its customers, the plans customers signed up to and the services it provides. iTelecom has placed a high emphasis on the churning problem and is trying to find new ways of reducing it. The directors of iTelecom would like to answer the following questions.

- What is it that makes a customer churn?
- Are some customers more likely to churn than others?
- How can we identify these customers before they churn?

In an attempt to answer these questions, the company data was organised and analysed using the Weka software. The data is composed of 21 attributes with nominal and numeric data types, and 3333 instances were used in the analysis. The data was pre-processed and checked at different levels for missing values and inconsistencies. Appropriate attributes were then selected, the appropriateness of an attribute being measured by its ability to predict churning. Data mining was then performed using different classifiers and tests. Next, the data was divided into sub-datasets to find out which attribute or group of attributes best predicts churning. Finally, the best group was selected and used to explain why these attributes are linked to the risk of churning.


Data Understanding and Pre-Processing Phase

Data Understanding

In an attempt to understand the data, different exploration mechanisms were used. As stated, there are twenty-one attributes with 3333 instances. The following table summarises all the attributes and the type of data they hold. As can be seen, the attributes hold two types of data: nominal and numeric.

No.  Attributes                     Type
1    State                          Nominal
2    Account Length                 Numeric
3    Area Code                      Nominal
4    Phone Number                   Nominal
5    Inter Plan                     Nominal
6    VoiceMail Plan                 Nominal
7    No of Vmail Mesgs              Numeric
8    Total Day Min                  Numeric
9    Total Day calls                Numeric
10   Total Day Charge               Numeric
11   Total Evening Min              Numeric
12   Total Evening Calls            Numeric
13   Total Evening Charge           Numeric
14   Total Night Minutes            Numeric
15   Total Night Calls              Numeric
16   Total Night Charge             Numeric
17   Total Int Min                  Numeric
18   Total Int Calls                Numeric
19   Total Int Charge               Numeric
20   No of Calls Customer Service   Numeric
21   Churn                          Nominal

Table 1: Attributes and data types

All the values in the attributes are complete and no value is missing. Thus, it is straightforward to calculate the measures of central tendency and dispersion on the numeric data. It is also possible to calculate the frequency distribution and the mode of the nominal data, as shown in the table below. Some attributes, such as Inter Plan, VoiceMail Plan and Churn, contain categorical data like Yes/No and False/True.

Attributes                     Minimum   Maximum   Average   St. dev
State                          Nominal   Nominal   Nominal   Nominal
Account Length                 1         243       101.065   39.822
Area Code                      Nominal   Nominal   Nominal   Nominal
Phone Number                   Nominal   Nominal   Nominal   Nominal
Inter Plan                     Nominal   Nominal   3010(N)   323(Y)
VoiceMail Plan                 Nominal   Nominal   922(Y)    2411(N)
No of Vmail Mesgs              0         51        8.099     13.688
Total Day Min                  0         350.8     179.775   54.467
Total Day calls                0         165       100.436   20.069
Total Day Charge               0         59.64     30.562    9.259
Total Evening Min              0         363.7     200.98    50.714
Total Evening Calls            0         170       100.114   19.923
Total Evening Charge           0         30.91     17.084    4.311
Total Night Minutes            23.2      395       200.872   50.574
Total Night Calls              33        175       100.108   19.569
Total Night Charge             1.04      17.77     9.039     2.276
Total Int Min                  0         20        10.237    2.792
Total Int Calls                0         20        4.479     2.641
Total Int Charge               0         5.4       2.765     0.754
No of Calls Customer Service   0         9         1.563     1.315
Churn                          Nominal   Nominal   2850(F)   483(T)

Table 2: Attributes and the measures of central tendency

We can also see whether a given distribution is normal, skewed or peaked. The average and the standard deviation show us how the data is distributed about the mean. The smaller the standard deviation, the less spread out the data. We therefore have both narrowly and widely spread data. Total Day Min, Total Evening Min and Total Night Minutes are the most widely spread attributes.

Distribution

Attributes                     Average   St. dev   Skewness   Kurtosis
State                          Nominal   Nominal   Nominal    Nominal
Account Length                 101.06    39.82     0.10
Area Code                      Nominal   Nominal   Nominal    Nominal
Phone Number                   Nominal   Nominal   Nominal    Nominal
Inter Plan                     3010(N)   323(Y)    Nominal    Nominal
VoiceMail Plan                 922(Y)    2411(N)   Nominal    Nominal
No of Vmail Mesgs              8.099     13.688    1.27       -0.0
Total Day Min                  179.775   54.467    -0.03      -0.0
Total Day calls                100.436   20.069    -0.11      0.24
Total Day Charge               30.562    9.259     -0.03      -0.0
Total Evening Min              200.98    50.714    -0.02      0.03
Total Evening Calls            100.114   19.923    -0.06      0.21
Total Evening Charge           17.084    4.311     -0.02      0.03
Total Night Minutes            200.872   50.574    0.01       0.09
Total Night Calls              100.108   19.569    0.03       -0.0
Total Night Charge             9.039     2.276     0.01       0.09
Total Int Min                  10.237    2.792     -0.24      0.61
Total Int Calls                4.479     2.641     1.32       3.08
Total Int Charge               2.765     0.754     -0.25      0.61
No of Calls Customer Service   1.563     1.315     1.06       1.52
Churn                          2850(F)   483(T)    Nominal    Nominal

Table 3: Attributes and the measures of distribution
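The distribution measures in Table 3 can be reproduced with a few lines of code. The following is a minimal sketch only: the paper does not state which estimator was used, so the moment-based formulas here are an assumption, and the sample values are hypothetical rather than taken from the iTelecom data.

```python
import math

def distribution_measures(values):
    """Return mean, sample standard deviation, skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (population form, dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std_dev = math.sqrt(m2 * n / (n - 1))   # sample standard deviation
    skewness = m3 / m2 ** 1.5               # 0 for symmetric data
    excess_kurtosis = m4 / m2 ** 2 - 3.0    # 0 for a normal distribution
    return mean, std_dev, skewness, excess_kurtosis

# Hypothetical customer-service call counts, NOT taken from the iTelecom data.
calls = [0, 1, 1, 1, 2, 2, 3, 9]
mean, std_dev, skew, kurt = distribution_measures(calls)
print(f"mean={mean:.3f} sd={std_dev:.3f} skew={skew:.2f} kurtosis={kurt:.2f}")
```

A single large value (9 calls) is enough to pull the skewness well above zero, which is the pattern seen for the right-skewed attributes in the table.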

The skewness scores tell us whether the data follows a normal or a skewed distribution. As the scores show, most of the numeric data is approximately normally distributed; scores near zero mean the distribution is close to symmetric. “The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right.”[1] There are, however, three right-skewed distributions: ‘No of Vmail Mesgs’, ‘Total Int Calls’ and ‘No of Calls Customer Service’. ‘Total Int Calls’ and ‘No of Calls Customer Service’ are also peaked, as measured by kurtosis. For the other attributes, the kurtosis scores indicate flat distributions, as “data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak.”[2] How the data is distributed can be seen from the histogram below.

Figure 1: Distribution as represented by histogram

Though it is difficult to see at this level, these histograms show which attributes are more likely to be linked to churning. For example, ‘Total Day Min’ and ‘Total Day Charge’, especially at their highest values, seem to be linked to churning. In addition, looking at the ‘VoiceMail Plan’, one can say that customers who are not on the plan are more prone to churning than those who have the plan.


As shown in the histogram and the tables, most of the data is approximately normally distributed, but the attributes that are skewed contain outliers. Outliers can be noise or simply valid data outside the normal range.[3] Such values need to be examined or removed from the analysis.

Weka 2D scatter plots can help us understand the data further. The following plot shows the distribution for each attribute. The column on the right-hand side contains a plot for each attribute, and when we click on any row we can see the details of that row on the left side. In the example below, ‘No of Calls Customer Service’ is plotted against the ‘Churn’ attribute. From the plot we can see that values of ‘No of Calls Customer Service’ above 4.5, the midpoint of the observed range, are linked to churning.

[1] http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
[2] http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
[3] Witten, Ian H. and Frank, Eibe (2005) Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, SF, USA.

Figure 2: Attributes as seen in 2D plot

It is also possible to view an individual instance of an attribute by clicking on the individual plots, as shown below.

Figure 3: Instances in a 2D plot window

Many other attributes also seem to be correlated, as we can see from the plot. The following two 2D plots represent the attributes ‘Churn’ and ‘Inter Plan’. As can be seen from the colours, the two attributes seem to correlate.


Figure 4: Plotted ‘Churn’ attribute

Figure 5: Plotted ‘Inter Plan’ attribute

Moreover, looking at the plots below, one can easily guess that ‘Inter Plan’ and ‘VoiceMail Plan’ are highly correlated. Such similarities

Figure 6: Multi-plotted view of correlated attributes

A closer look at the 2D plot of the relationship between the ‘Churn’ attribute and ‘No of Calls Customer Service’ shows that the latter is highly related to the former, especially when the value of an instance is around or greater than the average value.


Figure 7: Plotted ‘Churn’ versus ‘No of Calls Customer Service’

If we look at the other attributes in the figure below, they follow a similar pattern and look more correlated after this transformation than before.

Figure 8: Patterns of dataset of group 8

It takes a long phase of analysis to identify which attribute or group of attributes best predicts the churn attribute. The following six groups of attributes were formed and an attempt was made to measure their predictive ability. They are the attributes relating to the service plan, day-time services, evening-time services, night-time services, international services and customer service calls. These groups, as can be seen in the table, were measured with the JRip, J48, Logistic, ZeroR and NaiveBayes classifiers.

No.  Attributes               Datatype  Test              JRip     J48      Logistic  ZeroR    NaiveBayes
1    Inter Plan               Nominal   Training set      85.5086  85.5086  85.1485   85.5086  85.5086
     VoiceMail Plan           Nominal   Cross validation  85.5252  85.5252  85.5252   85.5086  85.5252
2    Total Day Min            Numeric   Training set      86.3186  86.7987  86.7987   85.5086  86.7987
     Total Day calls          Numeric   Cross validation  85.5252  86.0547  86.0547   85.5252  86.0547
     Total Day Charge         Numeric
3    Total Evening Min        Numeric   Training set      85.5086  85.5086  85.1485   85.5086  85.5086
     Total Evening Calls      Numeric   Cross validation  85.5252  85.5252  85.5252   85.5086  85.5252
     Total Evening Charge     Numeric
4    Total Night Minutes      Numeric   Training set      85.5086  85.5086  85.1485   85.5086  85.5086
     Total Night Calls        Numeric   Cross validation  85.5252  85.5252  85.5252   85.5086  85.5252
     Total Night Charge       Numeric
5    Total Int Min            Numeric   Training set      85.5086  85.5086  85.1485   85.5086  85.5086
     Total Int Calls          Numeric   Cross validation  85.5252  85.5252  85.5252   85.5086  83.4951
     Total Int Charge         Numeric
6    No. Calls Cust Service   Numeric   Training set      85.5086  85.1485  85.5086   85.5086  85.5086
     Churn                    Nominal   Cross validation  85.1721  85.1721  85.1721   85.5252  85.1721

Table 4: Measuring the ability of the attribute groups to predict the churn attribute

As indicated in the table, 16 of the 21 attributes were grouped according to the service they relate to and their general similarity. All the prediction measures on the 3333 instances came out at only around 85%. Five attributes, including ‘state’, ‘area code’ and ‘phone number’, were manually removed from the groupings. We will come back to these later. The above groups are not good predictors, as their scores are low. Thus, the selection of the attributes must be changed.

If we test all the attributes with J48, we get the highest prediction so far (95.5596% on the training set and 93.6994% on cross validation), as shown below. The true-positive rate for the FALSE (non-churn) class is as high as 0.992. But we need to bring the number of attributes down to find the smallest set of attributes that is leading to customer churning.

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        3185       95.5596 %  (93.6994 % on cross validation)
Incorrectly Classified Instances       148        4.4404 %
Kappa statistic                          0.804
Mean absolute error                      0.0797
Root mean squared error                  0.1996
Relative absolute error                 32.1415 %
Root relative squared error             56.7108 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.992    0.257    0.958      0.992   0.974      0.914     FALSE
               0.743    0.008    0.937      0.743   0.829      0.914     TRUE
Weighted Avg.  0.956    0.221    0.955      0.956   0.953      0.914

=== Confusion Matrix ===

    a    b   <-- classified as
 2826   24 |    a = FALSE
  124  359 |    b = TRUE
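The summary figures in this output all follow from the confusion matrix. The sketch below recomputes accuracy, the kappa statistic and the per-class rates from the matrix reported above; the function and variable names are illustrative and are not part of Weka.

```python
def evaluate(confusion, labels):
    """Derive Weka-style summary figures from a square confusion matrix.

    confusion[i][j] = number of instances of actual class i classified as j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    accuracy = correct / total

    per_class = {}
    for i, label in enumerate(labels):
        actual = sum(confusion[i])                     # row total
        predicted = sum(row[i] for row in confusion)   # column total
        tp = confusion[i][i]
        fp = predicted - tp
        per_class[label] = {
            "tp_rate": tp / actual,
            "fp_rate": fp / (total - actual),
            "precision": tp / predicted if predicted else 0.0,
        }

    # Cohen's kappa: agreement beyond what chance alone would give.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(len(labels))
    ) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, kappa, per_class

# Confusion matrix reported above for J48 on all attributes (training set).
matrix = [[2826, 24],
          [124, 359]]
accuracy, kappa, per_class = evaluate(matrix, ["FALSE", "TRUE"])
print(f"accuracy = {accuracy:.4%}, kappa = {kappa:.3f}")
for label, measures in per_class.items():
    print(label, {name: round(value, 3) for name, value in measures.items()})
```

Note that the TP rate of the TRUE class (0.743) is the share of actual churners the model catches, which is the figure that matters most for the directors' third question.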

After applying Weka’s attribute selection filter, only five attributes were selected. J48 was used to measure their combined predictive ability and so check the quality of the selection. As can be seen below, the result came down to 89.0789% on the training set and 88.5389% on cross validation. This is reflected in all the other measures as well. Again, this grouping is not good, but it will be used as a checkpoint against the other group. The attributes of this group are listed in the Data Pre-processing section.

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        2969       89.0789 %  (88.5389 % on 10 fold cross validation)
Incorrectly Classified Instances       364       10.9211 %
Kappa statistic                          0.4839
Mean absolute error                      0.1806
Root mean squared error                  0.3005
Relative absolute error                 72.8226 %
Root relative squared error             85.3622 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.966    0.553    0.912      0.966   0.938      0.736     FALSE
               0.447    0.034    0.69       0.447   0.543      0.736     TRUE
Weighted Avg.  0.891    0.478    0.879      0.891   0.881      0.736

=== Confusion Matrix ===

    a    b   <-- classified as
 2753   97 |    a = FALSE
  267  216 |    b = TRUE

A new grouping of attributes was used to obtain a better prediction. 19 attributes were used after the ‘state’, ‘area code’ and ‘phone number’ attributes were removed manually. Using J48, the measures show that these 19 attributes predict 96.8497% correctly on the training set and 94.5395% correctly on 10-fold cross validation. The true-positive rate for the FALSE class goes up to 0.997. This group predicted better than the full set of attributes.

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        3228       96.8497 %  (94.5395 % on cross validation)
Incorrectly Classified Instances       105        3.1503 %
Kappa statistic                          0.8626
Mean absolute error                      0.0595
Root mean squared error                  0.1725
Relative absolute error                 24.0043 %
Root relative squared error             49.0091 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.997    0.199    0.967      0.997   0.982      0.925     FALSE
               0.801    0.003    0.977      0.801   0.881      0.925     TRUE
Weighted Avg.  0.968    0.17     0.969      0.968   0.967      0.925

=== Confusion Matrix ===

    a    b   <-- classified as
 2841    9 |    a = FALSE
   96  387 |    b = TRUE

After applying Weka’s attribute selection filter to these 19 attributes, 8 attributes were automatically selected as good predictors. Thus, this selection process involved both manual and automatic procedures. When J48 and the other measures were applied to this group, the outcome showed that its predictive ability did not drop much. As shown below, this group predicts churning with 94.3294% accuracy on the training set and 94.0294% on 10-fold cross validation. This is the highest prediction of any sub-dataset so far. This indicates that the factors that lead to customer churning should be found among some combination of these attributes, or all of them combined.

8 attributes selected and used

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        3144       94.3294 %  (94.0294 % on cross validation)
Incorrectly Classified Instances       189        5.6706 %
Kappa statistic                          0.7576
Mean absolute error                      0.0992
Root mean squared error                  0.2227
Relative absolute error                 39.9945 %
Root relative squared error             63.2605 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.978    0.263    0.956      0.978   0.967      0.908     FALSE
               0.737    0.022    0.852      0.737   0.79       0.908     TRUE
Weighted Avg.  0.943    0.228    0.941      0.943   0.942      0.908

=== Confusion Matrix ===

    a    b   <-- classified as
 2788   62 |    a = FALSE
  127  356 |    b = TRUE


Some attributes were eliminated manually. These are ‘state’, ‘area code’ and ‘phone number’. Unless other variables such as service provision, service plan and service charge are added to them, it is not reasonable to assume that these attributes can lead to customer churning. The weka.filters.supervised.attribute.AttributeSelection filter was used to eliminate the other attributes. As this was done automatically, no justification can be given for each individual elimination. But it can be guessed that the eliminated attributes have less ability to predict churn and are unlikely to be linked to the risk of customer churning.
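To give a feel for how an automatic procedure can score an attribute's ability to predict churn, the sketch below computes information gain for a nominal attribute. This is an illustrative stand-in only: the paper does not say which evaluator the Weka filter was configured with, and the toy records are hypothetical, not drawn from the iTelecom data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(attribute_values, labels):
    """Reduction in class entropy obtained by splitting on a nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [c for a, c in zip(attribute_values, labels) if a == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy records, NOT drawn from the iTelecom data.
churn      = ["T", "T", "T", "F", "F", "F", "F", "F"]
inter_plan = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
area_code  = ["408", "415", "510", "408", "415", "510", "408", "415"]

print("inter_plan gain:", round(information_gain(inter_plan, churn), 4))
print("area_code gain: ", round(information_gain(area_code, churn), 4))
```

In this toy example the plan attribute separates churners perfectly while the area code carries almost no information. An identifier-like attribute that takes a different value for every customer would also score highly on a measure of this kind, even though it cannot generalise to new customers.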

Data Pre-processing

As discussed above, the following dataset is considered the most predictive one. This dataset is referred to as the dataset of group 8. The attributes of the dataset group of 8 are:

1. Inter Plan

2. VoiceMail Plan

3. Total Day Min

4. Total Evening Min

5. Total Int Min

6. Total Int Calls

7. No of Calls Customer Service

8. Churn

There is no missing data in this group. There is no known or identified outlier in this dataset either, but some of the variables are skewed. This group was discretised using weka.filters.supervised.attribute.Discretize, an instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes.

The dataset was normalized using weka.filters.unsupervised.attribute.Normalize, which normalizes all numeric values in the given dataset. In addition, the dataset was transformed using weka.attributeSelection.PrincipalComponents, which performs a principal components analysis and transformation of the data. It is used in conjunction with a Ranker search. The dimensionality reduction is accomplished by choosing enough eigenvectors to account for some percentage of the variance in the original data, here 0.95 (95%).[4]

This is the group that best predicted the risk of churning. If any one attribute is removed from this group, or if the attributes are used separately, the prediction scores drop sharply, as shown in the table below. This indicates that the attributes in this group should be used together.

[4] Notes in Weka software

Attributes                     Test               J48
Inter Plan                     Training set       85.5086
                               Cross validation   85.5086
VoiceMail Plan                 Training set       85.5086
                               Cross validation   85.5086
Total Day Min                  Training set       86.7987
                               Cross validation   86.7987
Total Evening Min              Training set       85.5086
                               Cross validation   85.5086
Total Int Min                  Training set       85.5086
                               Cross validation   85.5086
Total Int Calls                Training set       85.5086
                               Cross validation   85.5086
No of Calls Customer Service   Training set       85.7786
                               Cross validation   85.1485

Table 5: J48 test score on individual attributes of dataset group 8
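The normalisation and the variance-based choice of principal components described in the pre-processing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the rescaling targets the interval [0, 1], the helper names are made up for this example, and the eigenvalues are hypothetical since the paper does not report them.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly onto the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:                    # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def components_to_keep(eigenvalues, variance_covered=0.95):
    """Smallest number of leading components whose share of the total
    variance reaches `variance_covered`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= variance_covered:
            return count
    return len(ordered)

# Minimum, mean and maximum of 'Total Day Min' from Table 2.
print(min_max_normalize([0.0, 179.775, 350.8]))

# Hypothetical eigenvalues, for illustration only.
print(components_to_keep([3.1, 1.9, 1.2, 0.5, 0.2, 0.1]))
```

With these made-up eigenvalues, the first three components cover roughly 89% of the variance, so a fourth is needed to pass the 95% threshold.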

The other dataset is the one that was automatically selected by Weka. This dataset is referred to as the dataset group of 5. The dataset group of 5 will be tested in all the procedures to see whether it is suitable for modelling the whole dataset. The dataset group of 5 consists of:

1. Phone Number

2. Inter Plan

3. Total Day Min

4. No of Calls Customer Service

5. Churn

This dataset includes the phone number, which was manually removed from the dataset group of 8.

Clustering

Weka’s weka.clusterers.SimpleKMeans was used to segment the customers in order to obtain groups of customers with similar service usage characteristics.

kMeans

Number of iterations: 4
Within cluster sum of squared errors: 5011.0
Missing values globally replaced with mean/mode

Cluster centroids:
                                                Cluster#
Attribute                      Full Data        0                1
                               (3333)           (483)            (2850)
==========================================================================
Inter Plan                     no               no               no
VoiceMail Plan                 no               no               no
Total Day Min                  '(-inf-175]'     '(-inf-175]'     '(-inf-175]'
Total Evening Min              '(-inf-249.15]'  '(-inf-249.15]'  '(-inf-249.15]'
Total Int Min                  '(-inf-13.15]'   '(-inf-13.15]'   '(-inf-13.15]'
Total Int Calls                '(2.5-inf)'      '(2.5-inf)'      '(2.5-inf)'
No of Calls Customer Service   '(-inf-3.5]'     '(-inf-3.5]'     '(-inf-3.5]'
Churn                          FALSE            TRUE             FALSE

Time taken to build model (full training data) : 0.08 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       483 ( 14%)
1      2850 ( 86%)

kMeans on all attributes

Number of iterations: 15
Within cluster sum of squared errors: 10265.248610085338
Missing values globally replaced with mean/mode

=== Model and evaluation on training set ===

Clustered Instances

0      1458 ( 44%)
1      1875 ( 56%)

As indicated in the output above, clustering on the group of 8 attributes gives a clear split of the instances (14% and 86%), which matches the proportions of churners (483) and non-churners (2850). Clustering on all the attributes, however, does not seem to differentiate churners from non-churners properly (44% and 56%). Comparing the clustering results shows that the group of 8 attributes is better than the whole dataset.
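The idea behind the k-means results above, including the "within cluster sum of squared errors" figure, can be illustrated with a small pure-Python version. This is a simplified sketch, not Weka's SimpleKMeans: it handles numeric attributes only, seeds the centroids with the first k points, and runs on hypothetical normalised (day minutes, service calls) pairs.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=100):
    """Plain k-means on tuples of numbers, seeded with the first k points."""
    centroids = [points[i] for i in range(k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:   # converged
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return assignment, centroids, sse

# Hypothetical normalised (day minutes, service calls) pairs, for illustration.
points = [(0.2, 0.1), (0.9, 0.8), (0.25, 0.15), (0.85, 0.9), (0.3, 0.1), (0.8, 0.85)]
assignment, centroids, sse = k_means(points, k=2)
print(assignment, centroids, round(sse, 4))
```

A lower sum of squared errors means the instances sit closer to their cluster centroids, which is one reason the two runs above cannot be compared on that figure alone: they use different numbers of attributes.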

Mining the Data

Two datasets were used in the whole process of data mining: the group of 8 attributes and the group of 5 attributes. Three classifiers, OneR, JRip and J48, were used to classify the datasets.

OneR

OneR is a class for building and using a 1R classifier. In other words, it uses the minimum-error attribute for prediction, and discretizes numeric attributes. OneR produced the same output on both datasets (the group of 5 and the group of 8). Its classification accuracy is only 85.5086%, which is the same as the earlier ZeroR score. ZeroR is the class for building and using a 0-R classifier; it predicts the mean for a numeric class or the mode for a nominal class. Its output is reproduced below.

ZeroR predicts class value: FALSE

Time taken to build model: 0 seconds

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        2850       85.5086 %
Incorrectly Classified Instances       483       14.4914 %
Kappa statistic                          0
Mean absolute error                      0.248
Root mean squared error                  0.352
Relative absolute error                100      %
Root relative squared error            100      %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        1        0.855      1       0.922      0.5       FALSE
               0        0        0          0       0          0.5       TRUE
Weighted Avg.  0.855    0.855    0.731      0.855   0.788      0.5

=== Confusion Matrix ===

    a    b   <-- classified as
 2850    0 |    a = FALSE
  483    0 |    b = TRUE

JRip

JRip was run on both datasets. JRip is Weka’s implementation of the RIPPER propositional rule learner (Repeated Incremental Pruning to Produce Error Reduction). JRip classified both datasets more accurately than OneR. Its score is higher for the dataset group of 8 than for the dataset group of 5, just as was stated earlier. The following is JRip on the dataset group of 8.

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        3153       94.5995 %
Incorrectly Classified Instances       180        5.4005 %
Kappa statistic                          0.7743
Mean absolute error                      0.0956
Root mean squared error                  0.2187
Relative absolute error                 38.5594 %
Root relative squared error             62.1151 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.975    0.228    0.962      0.975   0.969      0.878     FALSE
               0.772    0.025    0.842      0.772   0.806      0.878     TRUE
Weighted Avg.  0.946    0.198    0.945      0.946   0.945      0.878

=== Confusion Matrix ===

    a    b   <-- classified as
 2780   70 |    a = FALSE
  110  373 |    b = TRUE
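The two simplest classifiers mentioned in this section are easy to sketch, which helps put the 85.5086% figure in context. The code below is a simplified illustration, not Weka's implementation: the 1R version handles nominal attributes only (Weka's OneR also discretizes numeric ones), and the small attribute table is hypothetical.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR: always predict the most frequent class."""
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)

def one_r(attributes, labels):
    """1R on nominal attributes: for each attribute build one rule per value
    (predict that value's majority class) and keep the attribute whose rules
    make the fewest training errors."""
    best = None
    for name, values in attributes.items():
        by_value = defaultdict(Counter)
        for value, label in zip(values, labels):
            by_value[value][label] += 1
        rules = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(
            sum(counts.values()) - counts.most_common(1)[0][1]
            for counts in by_value.values()
        )
        if best is None or errors < best[2]:
            best = (name, rules, errors)
    return best

# Class counts from Table 2: 2850 non-churners and 483 churners.
majority, baseline = zero_r(["FALSE"] * 2850 + ["TRUE"] * 483)
print(majority, f"{baseline:.4%}")

# Hypothetical toy records, NOT drawn from the iTelecom data.
toy_attributes = {
    "inter_plan":     ["no", "no", "yes", "yes", "no", "no"],
    "voicemail_plan": ["yes", "no", "yes", "no", "yes", "no"],
}
toy_churn = ["F", "F", "T", "T", "F", "T"]
name, rules, errors = one_r(toy_attributes, toy_churn)
print(name, rules, errors)
```

Because 2850 of the 3333 customers did not churn, any classifier that scores about 85.5% may be doing no better than always answering FALSE, so accuracy alone should be read alongside the TP rate of the TRUE class.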

JRip on dataset group of 5

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances        2947       88.4188 %
Incorrectly Classified Instances       386       11.5812 %
Kappa statistic                          0.4614
Mean absolute error                      0.1904
Root mean squared error                  0.3086
Relative absolute error                 76.7969 %
Root relative squared error             87.6606 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.959    0.559    0.91       0.959   0.934      0.702     FALSE
               0.441    0.041    0.647      0.441   0.525      0.702     TRUE
Weighted Avg.  0.884    0.484    0.872      0.884   0.875      0.702

=== Confusion Matrix ===

    a    b   <-- classified as
 2734  116 |    a = FALSE
  270  213 |    b = TRUE

The following is a test of JRip on the group of 5 and group of 8 datasets under 10-fold cross validation. The accuracy scores are slightly lower for both. While the true-positive rates did not change for the group of 8, they differ slightly for the group of 5. The following is JRip on the dataset group of 8 under cross validation. Its scores are much higher than those of the group of 5 on both the training set and cross validation.

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        3151       94.5395 %
Incorrectly Classified Instances       182        5.4605 %
Kappa statistic                          0.7722
Mean absolute error                      0.0966
Root mean squared error                  0.22
Relative absolute error                 38.9564 %
Root relative squared error             62.5069 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.975    0.228    0.962      0.975   0.968      0.867     FALSE
               0.772    0.025    0.838      0.772   0.804      0.867     TRUE
Weighted Avg.  0.945    0.198    0.944      0.945   0.944      0.867

=== Confusion Matrix ===

    a    b   <-- classified as
 2778   72 |    a = FALSE
  110  373 |    b = TRUE

The following is JRip on the dataset group of 5 under cross validation.

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        2939       88.1788 %
Incorrectly Classified Instances       394       11.8212 %
Kappa statistic                          0.4577
Mean absolute error                      0.1905
Root mean squared error                  0.3097
Relative absolute error                 76.8053 %
Root relative squared error             87.975  %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.955    0.553    0.911      0.955   0.933      0.688     FALSE
               0.447    0.045    0.63       0.447   0.523      0.688     TRUE
Weighted Avg.  0.882    0.479    0.87       0.882   0.873      0.688

=== Confusion Matrix ===

    a    b   <-- classified as
 2723  127 |    a = FALSE
  267  216 |    b = TRUE
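The "stratified cross-validation" heading in these outputs refers to splitting the data into folds that each preserve the overall class proportions. The sketch below shows one simple way to build such folds for the 2850/483 class split; it is an illustration only, since Weka also shuffles the instances with a random seed before splitting, which is omitted here.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign each instance index to one of k folds so that every fold keeps
    roughly the class proportions of the whole dataset."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        # Deal the indices of each class round-robin across the folds.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds

# Class counts from Table 2: 2850 non-churners and 483 churners.
labels = ["FALSE"] * 2850 + ["TRUE"] * 483
folds = stratified_folds(labels, k=10)
for number, test_fold in enumerate(folds):
    churners = sum(1 for i in test_fold if labels[i] == "TRUE")
    # Each fold serves once as the test set; the other nine form the training set.
    print(number, len(test_fold), churners)
```

Every instance is used for testing exactly once, so the cross-validation scores estimate performance on unseen customers, which is why they are slightly lower than the training-set scores.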

J48

For the dataset group of 5, the J48 scores are slightly higher than the JRip scores on both the training set and cross validation, as shown below.

=== Evaluation on training set (unpruned) ===
=== Summary ===

Correctly Classified Instances        2967       89.0189 %  (89.0489 % pruned)
Incorrectly Classified Instances       366       10.9811 %
Kappa statistic                          0.4992
Mean absolute error                      0.1743
Root mean squared error                  0.2952
Relative absolute error                 70.3023 %
Root relative squared error             83.872  %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.959    0.518    0.916      0.959   0.937      0.796     FALSE
               0.482    0.041    0.668      0.482   0.56       0.796     TRUE
Weighted Avg.  0.89     0.448    0.88       0.89    0.883      0.796

=== Confusion Matrix ===

    a    b   <-- classified as
 2734  116 |    a = FALSE
  250  233 |    b = TRUE

The following is J48 on the dataset group of 5 under cross validation, before and after pruning. The scores show little difference.

=== Stratified cross-validation (unpruned) ===
=== Summary ===

Correctly Classified Instances        2961       88.8389 %  (88.7789 % pruned)
Incorrectly Classified Instances       372       11.1611 %
Kappa statistic                          0.4949
Mean absolute error                      0.175
Root mean squared error                  0.2965
Relative absolute error                 70.5623 %
Root relative squared error             84.2364 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.957    0.516    0.916      0.957   0.936      0.78      FALSE
               0.484    0.043    0.655      0.484   0.557      0.78      TRUE
Weighted Avg.  0.888    0.447    0.879      0.888   0.881      0.78

=== Confusion Matrix ===

    a    b   <-- classified as
 2727  123 |    a = FALSE
  249  234 |    b = TRUE

The following is the J48 test score on the dataset group of 8 on the training set, before and after pruning. The results show little difference in any of the figures, but they are much higher than the group of 5 scores.

=== Evaluation on training set (unpruned) ===
=== Summary ===

Correctly Classified Instances        3144       94.3294 %  (94.5995 % pruned)
Incorrectly Classified Instances       189        5.6706 %
Kappa statistic                          0.7576
Mean absolute error                      0.0992
Root mean squared error                  0.2227
Relative absolute error                 39.9945 %
Root relative squared error             63.2605 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.978    0.263    0.956      0.978   0.967      0.908     FALSE
               0.737    0.022    0.852      0.737   0.79       0.908     TRUE
Weighted Avg.  0.943    0.228    0.941      0.943   0.942      0.908

=== Confusion Matrix ===

    a    b   <-- classified as
 2788   62 |    a = FALSE
  127  356 |    b = TRUE

The following is J48 on the dataset group of 8 under cross validation, before and after pruning.

=== Stratified cross-validation (unpruned) ===
=== Summary ===

Correctly Classified Instances        3134       94.0294 %  (93.9994 % pruned)
Incorrectly Classified Instances       199        5.9706 %
Kappa statistic                          0.7471
Mean absolute error                      0.102
Root mean squared error                  0.2288
Relative absolute error                 41.1161 %
Root relative squared error             64.9986 %
Total Number of Instances             3333

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.975    0.263    0.956      0.975   0.965      0.89      FALSE
               0.737    0.025    0.832      0.737   0.782      0.89      TRUE
Weighted Avg.  0.94     0.228    0.938      0.94    0.939      0.89

=== Confusion Matrix ===

    a    b   <-- classified as
 2778   72 |    a = FALSE
  127  356 |    b = TRUE

The following table summarises all three tests on the dataset group of 8, with and without pruning. JRip and J48 both reach the best accuracy level on the training set and on 10-fold cross validation, before and after pruning.

Attributes: Inter Plan, VoiceMail Plan, Total Day Min, Total Evening Min, Total Int Min, Total Int Calls, No of Calls Customer Service, Churn

Pruning        Test               ZeroR     JRip      J48
With pruning   Training set       85.5086   94.5995   94.3294
               Cross validation   85.5086   94.5395   94.0294
No pruning     Training set                 94.5695   94.5995
               Cross validation             94.2094   93.9994

Table 6: ZeroR, JRip and J48 test score for the dataset group of 8

The J48 tests on the dataset group of 5 and on the dataset group of 8 show that the dataset group of 8 predicts the churn attribute better, whether pruned or not, both on the training set and on 10-fold cross validation. Therefore, the earlier conclusion holds: the dataset group of 8 is the best predictor of the risk of churning.

Attributes: Phone Number, Inter Plan, Total Day Min, No of Calls Customer Service, Churn

Pruning        Test               ZeroR      JRip       J48
With pruning   Training set       85.5086%   88.4188%   89.0189%
               Cross validation   85.5086%   88.1788%   88.8389%
No pruning     Cross validation              87.8788%   89.1389%
               Training set                  86.7987%   88.4188%

Table 7: ZeroR, JRip and J48 test score for the dataset group of 5

Figure 9: The tree representation of the dataset group 8

Weka’s Experimenter was used to see whether the models perform well. Both JRip and J48 were the best models to use but, as can be seen below, JRip is slightly better than J48.

Figure 10: Classifiers comparison on Experimenter

The JRip test on the Experimenter produced a new set of accuracy scores. While all of the scores were above 92%, the maximum score reached 97.005988%. This is a nearly perfect prediction.

Summary

As can be seen from all the procedures undertaken so far, the dataset group of 8 attributes, which includes Inter Plan, VoiceMail Plan, Total Day Min, Total Evening Min, Total Int Min, Total Int Calls, No of Calls Customer Service and Churn, is the best sub-dataset for predicting churning. To arrive at this group, the other attributes were eliminated either manually or automatically. State, area code and phone number were manually removed because these attributes by themselves cannot be related to churning unless they are combined with other variables.


Conclusions

According to the data collected by the company, the risk of customer churning is related to the service plan customers have, the time at which they use the services, and the usage and charges of international calls. The data shows that customers who have no voicemail plan are at risk of churning. Moreover, customers with few minutes of day and night time calls are found to be prone to churning. International call minutes and charges were related to customer churning. Though this needs to be investigated, it seems that customers who are charged above the average are linked to the risk of churning. The other important finding is that clients who have called customer service more often than average are more strongly linked to churning than those who called less often than average.

In conclusion, the company needs to encourage its customers to sign up for the voicemail service, as this service may have been encouraging those who signed up to stay. The company also needs to review its charges for international calls; international call minutes and charges are correlated with each other. The other important area to consider is customer service calls. Calls should be recorded, and managers need to check whether customers’ needs and problems were resolved on time. The company needs to look at its customer service representatives and see whether they are professional enough to solve customers’ problems.

THANKS TO WEKA DEVELOPERS

Tesfaye Onsho.