Churn Analysis Report
TABLE OF CONTENTS:
1. Abstract
2. Introduction
2.1 Problem Statement
2.2 Problem Objective
2.3 Methodology
2.4 Bayesian Networks
2.5 Naïve Bayes algorithm
Abstract
In telecommunication companies, ‘churn’ refers to a customer’s decision to move
from one service provider to another. Competition in the telecom sector pushes
companies to retain the customers who are likely to leave and to keep them
satisfied, so they need churn prediction models to address the problem of
churn.
Data mining techniques can be used to build a churn prediction model for
telecommunication companies that identifies churner and non-churner customers,
because these techniques can extract predictive information from large databases.
Introduction:
Churn is a term used in many industries for the loss of a company's customers,
which happens for many reasons; one of them is customer dissatisfaction.
Churn occurs easily in the telecommunication sector because of the strong and
growing competition among service providers. It can also happen for other
reasons, for example a customer's dissatisfaction with a service or with its
high cost when another provider offers better quality at a lower cost. Churn
has therefore become a major concern in the telecom sector, because acquiring
a new customer is costlier than retaining an existing one.
Problem Statement: Churn is a major issue for telecommunication
companies, so they need a churn prediction model to help them identify
churner and non-churner customers.
The model consists of four steps, in order: identification of the
problem, acquisition of the data, preparation of the acquired data, and
finally implementation of the model using a classification technique, the
Naïve Bayes algorithm.
Bayesian networks
Bayesian models are probability models that can be used in classification
problems to estimate the likelihood of occurrences. They are graphical models
that provide a visual representation of the attribute relationships, ensuring
transparency, and an explanation of the model’s rationale.
They can be grouped into two main models based on their goals:
guided by a specific target attribute. The goal of such models is to uncover
data patterns in the set of input fields.
Classification of Techniques
The Proposed Model
The proposed model is composed of four steps:
is a difficult problem for the researchers to acquire the actual
dataset from the telecom industries. This is because the
customer’s private details may be misused.
3. Data Preparation: After the required datasets are collected, they
must be prepared so that they can be processed by
appropriate tools. These datasets are generally large volumes of
unstructured data that cannot be handled by traditional data
processing techniques, so processing them requires special
computing frameworks that make the
data ready for analysis.
4. Data Mining Technique (Naïve Bayes algorithm): In this step the
model was implemented using the Naïve Bayes algorithm for
classification. The algorithm predicts
whether the customer will churn or not.
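The decision rule behind step 4 can be written out as follows. This is the standard Naïve Bayes formulation rather than a formula taken from the report; the symbols $x_1,\dots,x_n$ (the attribute values of one customer) and $c$ (the churn state) are notation introduced here for illustration.

```latex
\[
P(c \mid x_1,\dots,x_n) \;\propto\; P(c)\prod_{i=1}^{n} P(x_i \mid c),
\qquad c \in \{\text{yes},\text{no}\}
\]
\[
\hat{c} \;=\; \arg\max_{c \in \{\text{yes},\text{no}\}} \; P(c)\prod_{i=1}^{n} P(x_i \mid c)
\]
```

The product form relies on the "naïve" assumption that the attributes are conditionally independent given the churn state.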
1. Implementation of the mapper, custom key, and reducer classes for
calculation of the conditional probabilities.
Then the probabilities are appended to the set’s records along with the
predicted decision. Additionally, we count the number of
records where the predicted decision matches the actual decision,
and this is reported as the accuracy of the system. All of this is
done in the program ‘AppendPr.java’.
Program Codes
Code for CStype.java
package churn;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
return c;
}
@Override
public int hashCode()
{
int a=attribute.hashCode();
int c=churn.hashCode();
int hc=a*31+c; //a+c , a*c
return hc;
}
@Override
public String toString()
{
// TODO Auto-generated method stub
StringBuilder sb = new StringBuilder();
//sb.append("[");
sb.append(attribute.toString()+","+churn.toString());
//sb.append("]");
String r=sb.toString();
return r;
}
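As a rough illustration of why the custom key overrides hashCode and toString, the sketch below builds a comparable (attribute, churn) pair in plain Java, without any Hadoop types. The class name PairKey, the helper countPairs, and the sample values are hypothetical and are not part of the project code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a composite (attribute, churn) key; no Hadoop types are used.
final class PairKey implements Comparable<PairKey> {
    private final String attribute;
    private final String churn;

    PairKey(String attribute, String churn) {
        this.attribute = attribute;
        this.churn = churn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PairKey)) return false;
        PairKey other = (PairKey) o;
        return attribute.equals(other.attribute) && churn.equals(other.churn);
    }

    @Override
    public int hashCode() {
        // Same combining scheme as the listing above: attribute hash * 31 + churn hash.
        return attribute.hashCode() * 31 + churn.hashCode();
    }

    @Override
    public int compareTo(PairKey other) {
        // Order by attribute first, then by churn state.
        int c = attribute.compareTo(other.attribute);
        return c != 0 ? c : churn.compareTo(other.churn);
    }

    @Override
    public String toString() {
        return attribute + "," + churn;
    }

    // Counts how often each pair occurs; equal pairs land in the same map entry.
    static Map<PairKey, Integer> countPairs(String[][] pairs) {
        Map<PairKey, Integer> counts = new HashMap<>();
        for (String[] p : pairs) {
            counts.merge(new PairKey(p[0], p[1]), 1, Integer::sum);
        }
        return counts;
    }
}
```

Because equals and hashCode agree, two keys built from the same attribute and churn state are treated as one entry, which is what grouping by key depends on.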
package churn;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
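// Class declaration reconstructed from the driver's setMapperClass call and the map signature.
public class PAtrriMapper extends Mapper<LongWritable, Text, CStype, IntWritable>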
{
private CStype keyout=new CStype();
private IntWritable valueout=new IntWritable(1);
@Override
protected void map(LongWritable key, Text value,Context context)
throws IOException, InterruptedException
{
// TODO Auto-generated method stub
String rec = value.toString(); // convert the Hadoop Text value to a Java String
String f[] = rec.split(" ");
for(int i=0;i<f.length;i++)
{
//if(f[i].trim().length()!=0)
keyout.set(f[0],f[i]);
context.write(keyout,valueout);
}
}
}
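import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;
// Class declaration reconstructed from the driver's setReducerClass call and output types.
public class PAtrriReducer extends Reducer<CStype, IntWritable, CStype, IntWritable>
{
@Override
protected void reduce(CStype key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
// Sum the 1s emitted by the mapper for this key.
int sum = 0;
for(IntWritable v : values)
sum = sum + v.get();
context.write(key,new IntWritable(sum));
}
}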
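The counting done by the mapper and reducer can be sketched in plain Java, without Hadoop, to show what the job produces. This is a minimal sketch under assumptions: records are space-separated and the churn label is the first field (as `f[0]` in the mapper suggests); the class name PairCountSketch and the sample records are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PairCountSketch {
    // "Map" step: emit one (field, label) pair per field of a record.
    // "Reduce" step: sum the occurrences of each pair.
    static Map<String, Integer> countPairs(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String rec : records) {
            String[] f = rec.trim().split(" ");
            String label = f[0]; // assumption: the churn label is the first field
            for (String field : f) {
                counts.merge(field + "," + label, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample records, not taken from the project's data set.
        List<String> records = Arrays.asList(
            "yes low prepaid", "no high prepaid", "yes low postpaid");
        countPairs(records).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

In this sketch the loop also pairs the label with itself, so entries such as `yes,yes` hold the number of records per class.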
Code for ChurnProb.java
package churn;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;
hm1.put("yes", (yes/(yes+no)));
hm2.put("no", (no/(yes+no)));
pw.println(etr.getKey()+",Yes,"+etr.getValue()+",No,"+hm2.get(etr.getKey()));
}
pw.println("total,Yes,"+hm1.get("yes")+",No,"+hm2.get("no"));
pw.close();
br.close();
}
}
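The expressions `yes/(yes+no)` and `no/(yes+no)` give fractional priors only when the operands are floating-point. The declarations of `yes` and `no` are not shown in the listing, so the sketch below assumes integer counts and converts explicitly; the class name PriorSketch and the method priors are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PriorSketch {
    // Returns the class priors P(yes) and P(no) from raw class counts.
    static Map<String, Double> priors(int yes, int no) {
        int total = yes + no;
        if (total == 0) {
            throw new IllegalArgumentException("no records to estimate priors from");
        }
        Map<String, Double> p = new HashMap<>();
        // Cast before dividing: with int operands, yes / total would truncate to 0 or 1.
        p.put("yes", (double) yes / total);
        p.put("no", (double) no / total);
        return p;
    }
}
```

For example, `PriorSketch.priors(3, 1)` maps "yes" to 0.75 and "no" to 0.25.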
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
t_n=Double.parseDouble(ent[4]);
}
}
}
catch(Exception e){}
}
@Override
protected void setup(Context context) throws IOException,
InterruptedException {
// TODO Auto-generated method stub
@SuppressWarnings("deprecation")
Path [] paths=context.getLocalCacheFiles();
if(paths!=null && paths.length!=0)
{
for(Path p:paths)
{
readFile(p);
}
}
}
@Override
protected void map(LongWritable key, Text value,
Context context)
throws IOException, InterruptedException
{
// TODO Auto-generated method stub
l_y = pr_y/t_y;
l_n = pr_n/t_n;
if(f_y>f_n)
context.write(new Text("yes"), new DoubleWritable(f_y));
else
context.write(new Text("no"), new DoubleWritable(f_n));
}
}
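The comparison of `f_y` and `f_n` in the mapper can be illustrated with a small plain-Java sketch. How the listing computes `f_y` and `f_n` is not shown, so this sketch assumes the usual Naïve Bayes score, the class prior multiplied by the conditional probability of each attribute value. The class name ChurnScoreSketch, the method predict, and the lookup tables are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ChurnScoreSketch {
    // condYes / condNo: hypothetical lookup tables of P(value | yes) and P(value | no).
    static String predict(String[] values,
                          Map<String, Double> condYes,
                          Map<String, Double> condNo,
                          double priorYes,
                          double priorNo) {
        double scoreYes = priorYes;
        double scoreNo = priorNo;
        for (String v : values) {
            // Values missing from a table contribute probability 0; no smoothing is applied.
            scoreYes *= condYes.getOrDefault(v, 0.0);
            scoreNo *= condNo.getOrDefault(v, 0.0);
        }
        // Ties fall to "no", matching the strict f_y > f_n comparison in the listing.
        return scoreYes > scoreNo ? "yes" : "no";
    }

    public static void main(String[] args) {
        Map<String, Double> condYes = new HashMap<>();
        Map<String, Double> condNo = new HashMap<>();
        condYes.put("low", 0.8);
        condNo.put("low", 0.2);
        System.out.println(predict(new String[] {"low"}, condYes, condNo, 0.5, 0.5));
    }
}
```

With long records the product of many small probabilities can underflow; summing logarithms instead is a common alternative, though the report does not say which form it uses.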
import java.io.*;
public class AppendPr {
/**
* @param args
* @throws IOException
*/
public static void main(String[] args) throws IOException {
BufferedReader br1 = new BufferedReader(new FileReader("finalProb"));
BufferedReader br2 = new BufferedReader(new FileReader("trainingSet"));
PrintWriter pw = new PrintWriter(new FileWriter("fop_tr"));
int cnt=0; // records whose predicted decision matches the actual one
while(true)
{
String l1 = br1.readLine();
String l2 = br2.readLine();
if(l1==null || l2==null) break; // stop when either file is exhausted
// append the prediction line to the training record
StringBuilder sb = new StringBuilder(l2);
sb.append(" "+l1);
pw.write(sb.toString()+"\n");
String ar[]=sb.toString().split(" ");
// compare the text before the tab in field 7 with the record's first field
if(ar[7].split("\t")[0].endsWith(ar[0]))
cnt++;
}
pw.write("\n"+cnt+" out of 85 records agree with the formulation.\n"
+"Accuracy in % : "+((cnt/85.0)*100));
pw.close();
br2.close();
br1.close();
}
}
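The accuracy figure written by this program divides by a fixed record count of 85. A minimal sketch of the same calculation that derives the denominator from the data is shown below; the class name AccuracySketch, the method accuracyPercent, and the sample labels are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class AccuracySketch {
    // Percentage of positions where the predicted label equals the actual label.
    static double accuracyPercent(List<String> actual, List<String> predicted) {
        if (actual.size() != predicted.size()) {
            throw new IllegalArgumentException("label lists differ in length");
        }
        if (actual.isEmpty()) {
            throw new IllegalArgumentException("no records to score");
        }
        int agree = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals(predicted.get(i))) {
                agree++;
            }
        }
        // Multiply by 100.0 first so the division is done in floating point.
        return 100.0 * agree / actual.size();
    }

    public static void main(String[] args) {
        // Hypothetical labels, not taken from the project's data set.
        List<String> actual = Arrays.asList("yes", "no", "yes", "no");
        List<String> predicted = Arrays.asList("yes", "no", "no", "no");
        System.out.println(accuracyPercent(actual, predicted));
    }
}
```

Taking the denominator from the list size keeps the percentage correct if the number of records in the training set changes.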
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
j.setJarByClass(PAtrriDriver.class);
j.setMapperClass(PAtrriMapper.class);
j.setReducerClass(PAtrriReducer.class);
j.setOutputKeyClass(CStype.class);
j.setOutputValueClass(IntWritable.class);
//j.setOutputKeyClass(CStype.class);
//j.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(j, new
Path("smp_data/trainingSet"));//folder name and data set name
FileOutputFormat.setOutputPath(j, new
Path("ATrriProbeFile1"));
j.waitForCompletion(true);
}
}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class TestDriver
{
public static void main(String[] args) throws IOException,
ClassNotFoundException, InterruptedException
{
// TODO Auto-generated method stub
Configuration conf = new Configuration();
Job j = Job.getInstance(conf,"composite");
j.setJarByClass(TestDriver.class);
j.setMapperClass(TestMapper.class);
j.setOutputKeyClass(Text.class);
j.setOutputValueClass(DoubleWritable.class);
j.setNumReduceTasks(0);
//j.setOutputKeyClass(CStype.class);
//j.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(j, new
Path("smp_data/trainingSet"));//folder name and data set name
FileOutputFormat.setOutputPath(j, new
Path("ATrriProbeFile3"));
j.waitForCompletion(true);
}
}
Outputs
1. Getting the counts of (Attribute, Churn state pairs)
3. Final results : Probabilities of customer’s churning out in the given data set, and
the accuracy of the work.
Future Enhancement:
The accuracy obtained from the calculation is about 62.3%. It could be
improved with a more refined formulation and further calculations, which would
make it easier to calculate the churn probability of a customer and would
provide better stability and flexibility.
Conclusion:
Customer churn is a major issue for telecom companies, especially for prepaid
subscribers, because it happens easily under the strong competition in this
business area. These companies therefore need to build a churn prediction model
to identify churner and non-churner customers and reduce churn.
In this study a churn prediction model was built. The model consists of four
steps, in order: identifying the problem, which is churn in telecom
companies; data selection, where the data was given to us by the faculty;
data preparation; and implementation of the Naïve Bayes algorithm.
The model was built by following these four steps, using the selected data
and the Naïve Bayes algorithm. As a result, churn was predicted for 100
customers with a certain level of accuracy.
Certificate
This is to certify that Mr. DHRITIMAN SOME of B.P PODDAR INSTITUTE OF MANAGEMENT &
TECHNOLOGY, registration number: 151150120005 OF 2015-2016, has successfully
completed a project on CUSTOMER CHURN ANALYSIS using BIG DATA TECHNOLOGY under
the guidance of Mr. TITAS ROYCHOUDHURY.
--- ---------------------------------------------------
-
Titas Roychowdhury
Globsyn Finishing School
www.globsynfinishingschool.com 24
Certificate
This is to certify that Mr. ARPAN GUIN of B.P. PODDAR INSTITUTE OF MANAGEMENT &
TECHNOLOGY, registration number: 141150110028 OF 2014-2015, has successfully
completed a project on CUSTOMER CHURN ANALYSIS using BIG DATA TECHNOLOGY under
the guidance of Mr. TITAS ROYCHOUDHURY.
--- ---------------------------------------------------
-
Titas Roychowdhury
Globsyn Finishing School
www.globsynfinishingschool.com 24
Certificate
This is to certify that Mr. ARUNAVA BANERJEE of B.P PODDAR INSTITUTE OF MANAGEMENT
& TECHNOLOGY, registration number: 141150110030 OF 2014-2015, has successfully
completed a project on CUSTOMER CHURN ANALYSIS using BIG DATA TECHNOLOGY under
the guidance of Mr. TITAS ROYCHOUDHURY.
--- ---------------------------------------------------
-
Titas Roychowdhury
Globsyn Finishing School
www.globsynfinishingschool.com 24
Certificate
This is to certify that Mr. DEBANJAN DEY of B.P. PODDAR INSTITUTE OF MANAGEMENT &
TECHNOLOGY, registration number: 141150110137 OF 2014-2015, has successfully
completed a project on CUSTOMER CHURN ANALYSIS using BIG DATA TECHNOLOGY under
the guidance of Mr. TITAS ROYCHOUDHURY.
--- ---------------------------------------------------
-
Titas Roychowdhury
Globsyn Finishing School
www.globsynfinishingschool.com 24
Certificate
This is to certify that Mr. SANKAR PRASAD BISWAS of BRAINWARE GROUP OF INSTITUTIONS-
SDET, registration number: 152700120006 OF 2015-2016, has successfully completed a
project on CUSTOMER CHURN ANALYSIS using BIG DATA TECHNOLOGY under the guidance
of Mr. TITAS ROYCHOUDHURY.
--- ---------------------------------------------------
-
Titas Roychowdhury
Globsyn Finishing School
www.globsynfinishingschool.com 24