Assignment 1

Uploaded by

lavanyagowdau

# Importing necessary libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Sample Data Creation


# Creating a controlled sample dataset
np.random.seed(42) # For reproducibility
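data = {
    'customer_id': range(1, 101),             # IDs 1 to 100
    'age': np.random.randint(18, 70, 100),    # ages 18 to 69 (upper bound is exclusive)
    'tenure': np.random.randint(1, 20, 100),  # tenure 1 to 19 (upper bound is exclusive)
}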

# Label churn with a simple deterministic rule, for illustration only:
# a customer churns if age is over 40 and tenure is under 5
data['churn'] = np.where((data['age'] > 40) & (data['tenure'] < 5), 1, 0)

# Load the dataset into a pandas DataFrame


df = pd.DataFrame(data)

# Data Aggregation: Calculate average age and tenure for churned vs.
# non-churned customers
average_stats = df.groupby('churn').agg(
    average_age=('age', 'mean'),
    average_tenure=('tenure', 'mean')
).reset_index()

print("Average Age and Tenure of Churned vs Non-churned Customers:")


print(average_stats)

# Data Splitting: Define features and target variable


X = df[['age', 'tenure']] # Features: age and tenure
y = df['churn'] # Target variable: churn

# Splitting the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Model Training: Using Logistic Regression for prediction


model = LogisticRegression()

# Fitting the model on the training set


model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = model.predict(X_test)

# Evaluating model performance using accuracy and a confusion matrix
accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)

print(f"Accuracy of the model: {accuracy*100:.2f}%")


print("Confusion Matrix:")
print(confusion)

Output:

Average Age and Tenure of Churned vs Non-churned Customers:
   churn  average_age  average_tenure
0      0    40.903614       10.084337
1      1    55.294118        1.647059
Accuracy of the model: 90.00%
Confusion Matrix:
[[14  0]
 [ 2  4]]
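
Accuracy alone can hide how the model does on the smaller churn class, so it may help to read precision and recall off the confusion matrix printed above. The following is a minimal sketch in plain Python. It assumes scikit-learn's convention that rows are true labels and columns are predicted labels, and the variable names (tn, fp, fn, tp) are introduced here for illustration only.

```python
# Confusion matrix copied from the output above.
# Assumed layout (scikit-learn convention): rows = true labels, columns = predicted labels.
confusion = [[14, 0],
             [2, 4]]

tn, fp = confusion[0]  # true class 0: correctly kept, wrongly flagged as churn
fn, tp = confusion[1]  # true class 1: missed churners, correctly flagged churners

total = tn + fp + fn + tp

accuracy = (tp + tn) / total       # share of all test customers predicted correctly
precision = tp / (tp + fp)         # share of predicted churners who really churned
recall = tp / (tp + fn)            # share of real churners the model caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 score:  {f1:.2f}")
```

With these counts, every customer flagged as churned really did churn, but two of the six churned customers in the test set were missed, which the 90% accuracy figure does not show on its own.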

Explanation of Each Step


Importing Libraries: Import pandas and numpy for data handling, and the scikit-learn utilities used for splitting, modeling, and evaluation.

1. Sample Data Creation: Generate a sample dataset of 100 customers containing customer_id, age, tenure, and churn status. Churn is a binary variable: 1 marks a customer who has churned, and 0 a customer who has not. Here it is assigned by a fixed rule (age over 40 and tenure under 5), so the labels are synthetic.

2. Data Aggregation: Use groupby and agg to calculate the average age and tenure of customers for each churn status (0 or 1).

3. Data Splitting: Define the features (X: age and tenure) and the target variable (y: churn). Then split the data with train_test_split, using 80% for training and 20% for testing.

4. Model Training: Instantiate a LogisticRegression model, fit it on the training data, and then make predictions on the test set.

5. Model Evaluation: Calculate the accuracy of the model's predictions and print the confusion matrix for a more detailed view of performance. In the run shown, the model is 90% accurate on the 20 test customers, and the confusion matrix shows that both errors are churned customers predicted as non-churned.
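
To make the model training step more concrete, the sketch below shows roughly how a fitted binary logistic regression turns age and tenure into a churn prediction: a weighted sum of the features is passed through the sigmoid function, and the result is compared with 0.5. The coefficients here are hypothetical values chosen for illustration, not the ones fitted by the script, and predict_churn is a helper name invented for this example.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_churn(age, tenure, w_age, w_tenure, bias, threshold=0.5):
    """Return (churn probability, predicted label) for one customer."""
    z = w_age * age + w_tenure * tenure + bias  # linear score
    probability = sigmoid(z)
    return probability, int(probability > threshold)


# Hypothetical coefficients, NOT taken from the fitted model above.
W_AGE, W_TENURE, BIAS = 0.1, -0.5, -2.0

# Older customer with short tenure: the score is positive, so churn is predicted.
prob_a, label_a = predict_churn(60, 2, W_AGE, W_TENURE, BIAS)

# Younger customer with long tenure: the score is negative, so no churn is predicted.
prob_b, label_b = predict_churn(30, 10, W_AGE, W_TENURE, BIAS)

print(f"Customer A: probability={prob_a:.3f}, predicted churn={label_a}")
print(f"Customer B: probability={prob_b:.3f}, predicted churn={label_b}")
```

Because the score is linear in age and tenure, the model can only draw a straight boundary between the classes, while the labeling rule in this dataset is a combination of two thresholds. That mismatch is one plausible reason the model does not reach 100% accuracy on labels generated by a fixed rule.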
