Assignment 1

Uploaded by

lavanyagowdau

# Importing necessary libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Sample Data Creation


# Creating a controlled sample dataset
np.random.seed(42) # For reproducibility
data = {
    'customer_id': range(1, 101),
    'age': np.random.randint(18, 70, 100),
    'tenure': np.random.randint(1, 20, 100),
}

# Using a simple rule for churn just for example purposes


data['churn'] = np.where((data['age'] > 40) & (data['tenure'] < 5), 1, 0)

# Load the dataset into a pandas DataFrame


df = pd.DataFrame(data)

# Data Aggregation: Calculate average age and tenure for churned vs. non-churned customers
average_stats = df.groupby('churn').agg(
    average_age=('age', 'mean'),
    average_tenure=('tenure', 'mean')
).reset_index()

print("Average Age and Tenure of Churned vs Non-churned Customers:")


print(average_stats)

# Data Splitting: Define features and target variable


X = df[['age', 'tenure']] # Features: age and tenure
y = df['churn'] # Target variable: churn

# Splitting the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Model Training: Using Logistic Regression for prediction


model = LogisticRegression()

# Fitting the model on the training set


model.fit(X_train, y_train)

# Making predictions on the test set


y_pred = model.predict(X_test)
# Evaluating model performance using accuracy
accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)

print(f"Accuracy of the model: {accuracy*100:.2f}%")


print("Confusion Matrix:")
print(confusion)

Average Age and Tenure of Churned vs Non-churned Customers:
   churn  average_age  average_tenure
0      0    40.903614       10.084337
1      1    55.294118        1.647059
Accuracy of the model: 90.00%
Confusion Matrix:
[[14  0]
 [ 2  4]]
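
The accuracy figure can be checked by hand from the confusion matrix, and the same four counts give a fuller picture of how the model treats churned customers. The following is a minimal sketch in plain Python that copies the matrix from the output above; the variable names and the majority-class baseline are illustrative additions, not part of the original assignment.

```python
# Confusion matrix copied from the printed output.
# scikit-learn convention: rows are actual classes (0, 1),
# columns are predicted classes (0, 1).
confusion = [[14, 0],
             [2, 4]]

tn, fp = confusion[0]  # actual non-churned: correctly kept / wrongly flagged
fn, tp = confusion[1]  # actual churned: missed / correctly flagged
total = tn + fp + fn + tp

accuracy = (tp + tn) / total   # share of test customers classified correctly
precision = tp / (tp + fp)     # share of predicted churners who really churned
recall = tp / (tp + fn)        # share of real churners the model caught

# Illustrative baseline: accuracy of always predicting the majority class.
baseline = max(tn + fp, fn + tp) / total

print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"Baseline:  {baseline:.2%}")
```

Accuracy alone hides the fact that 2 of the 6 churned customers in the test set were missed, which is why recall is worth reading alongside it on a dataset where churners are the minority.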

Explanation of Each Step


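Importing Libraries: Import pandas and numpy for data handling, and the scikit-learn utilities for splitting the data, fitting the model, and scoring it.

1. Sample Data Creation: Generate a sample dataset of 100 customers containing customer_id, age, tenure, and churn. Churn is a binary variable (1 = churned, 0 = not churned) assigned by a simple rule for illustration: a customer is marked as churned when age > 40 and tenure < 5.

2. Data Aggregation: Use groupby and agg to calculate the average age and average tenure for each churn status (0 or 1). In the output above, churned customers are older on average (about 55 versus 41) and have much shorter tenure (about 1.6 versus 10.1).

3. Data Splitting: Define the features X (age and tenure) and the target variable y (churn), then split the data with train_test_split into 80% for training and 20% for testing. With 100 customers this gives 80 training rows and 20 test rows; random_state=42 makes the split reproducible.

4. Model Training: Instantiate a LogisticRegression model, fit it on the training data, and then predict churn for the test set.

5. Model Evaluation: Calculate the accuracy of the model's predictions and print the confusion matrix for a more detailed performance analysis. In the matrix, rows are actual classes and columns are predicted classes, so the output above shows 14 non-churned customers correctly identified, no false alarms, 2 churned customers missed, and 4 churned customers correctly identified, which gives (14 + 4) / 20 = 90% accuracy.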
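
To make step 4 more concrete, the sketch below shows roughly what a fitted logistic regression does when it predicts: it combines age and tenure linearly, passes the result through the sigmoid function, and predicts churn when the probability exceeds 0.5. The weights and intercept here are hypothetical values chosen only for illustration; in the assignment's code the fitted values would be read from model.coef_ and model.intercept_.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters for illustration only; NOT the fitted values.
W_AGE = 0.08       # positive: older customers score higher
W_TENURE = -0.45   # negative: longer tenure lowers the score
INTERCEPT = -3.0


def churn_probability(age: float, tenure: float) -> float:
    """Estimated probability of churn for one customer."""
    return sigmoid(W_AGE * age + W_TENURE * tenure + INTERCEPT)


def predict_churn(age: float, tenure: float) -> int:
    """Return 1 (churn) when the probability exceeds 0.5, else 0."""
    return 1 if churn_probability(age, tenure) > 0.5 else 0


for age, tenure in [(55, 2), (30, 12)]:
    p = churn_probability(age, tenure)
    print(f"age={age}, tenure={tenure}: p={p:.3f}, predicted={predict_churn(age, tenure)}")
```

Because the score is a single weighted sum, the model draws one straight boundary through the age and tenure plane. The rule used to create the labels (age > 40 and tenure < 5) is a corner rather than a straight line, so a linear boundary can only approximate it.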
