Task-2 Example Code

The document outlines a data analytics project using Python in Google Colab, focusing on exploratory data analysis (EDA) of a dataset. It includes steps for handling missing values, performing univariate analysis for numerical and categorical features, and visualizing distributions and correlations. The project also sets up for predictive modeling of delinquency using machine learning techniques.
5/13/25, 7:12 AM    Tata Data Analytics.ipynb - Colab

Exploratory Data Analysis

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from google.colab import drive
drive.mount('/content/drive')

Output:
Mounted at /content/drive

df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/ML Projects/Tata Data Analysis/Deli…
[the rest of this line is cut off in the source]

# Missing Value Analysis
missing_values = df.isnull().sum().sort_values(ascending=False)
missing_percent = (missing_values / len(df)) * 100

# Univariate Analysis: Numerical Features
# Get numerical columns from the dataframe
numerical_cols = df.select_dtypes(include=['number']).columns
# Calculate the summary statistics
numerical_summary = df[numerical_cols].describe()

# Univariate Analysis: Categorical Features
categorical_cols = df.select_dtypes(include=['object']).columns  # Define categorical_cols
categorical_summary = df[categorical_cols].describe()

# Identify numerical columns
numerical_cols = df.select_dtypes(include=['int64', 'float64']).columns.tolist()
numerical_cols.remove('Delinquent_Account')  # Exclude the target

# Apply median imputation
for col in numerical_cols:
    median_value = df[col].median()
    df[col].fillna(median_value, inplace=True)

Output (each line is cut off at the cell width in the source):
…:4: FutureWarning: A value is trying to be set on a copy …
The behavior will change in pandas 3.0. This inplace method will never work because the …
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col …
  df[col].fillna(median_value, inplace=True)

# Optional: Check if missing values remain
print(df[numerical_cols].isnull().sum())

Output (only the count for Age survives in the source; the other counts were lost in extraction):
Age                     0
Income
Credit_Score
Credit_Utilization
Missed_Payments
Loan_Balance
Debt_to_Income_Ratio
Account_Tenure
dtype: int64

# Plotting distributions for numerical features
fig, axes = plt.subplots(len(numerical_cols), 1, figsize=(8, len(numerical_cols)*3))
for i, col in enumerate(numerical_cols):
    sns.histplot(df[col], kde=True, ax=axes[i])
    axes[i].set_title(f"Distribution of {col}")
plt.tight_layout()
plt.show()

[Figure: eight stacked histograms with KDE curves, y-axis "Count" in each panel. Panel titles: Distribution of Age, Distribution of Income, Distribution of Credit_Score, Distribution of Credit_Utilization, Distribution of Missed_Payments, Distribution of Loan_Balance, Distribution of Debt_to_Income_Ratio, Distribution of Account_Tenure.]

# Target Variable Distribution
plt.figure(figsize=(10, 8))
sns.countplot(x='Delinquent_Account', data=df)
plt.title("Target Variable Distribution")
plt.show()

[Figure: "Target Variable Distribution", a count plot with Delinquent_Account on the x-axis and count on the y-axis.]

# Correlation Heatmap
plt.figure(figsize=(10, 8))
# numerical_cols is already a list, no need to call tolist()
sns.heatmap(df[numerical_cols + ['Delinquent_Account']].corr(), annot=True, cmap="coolwarm")  # any further arguments are cut off in the source
plt.title("Correlation Matrix")
plt.show()

[Figure: "Correlation Matrix", an annotated heatmap over the numerical features and Delinquent_Account, with a color bar.]

!pip install ace_tools

Output (the downloaded file names were lost in extraction):
Collecting ace_tools
  Downloading … (300 bytes)
  Downloading ace_tools-… (1.1 kB)
Installing collected packages: ace_tools
Successfully installed ace_tools-0.0

df.head(10)

Output (columns after Credit_Utilization, beginning with Missed_Payments, are cut off in the source):
   Customer_ID  Age    Income  Credit_Score  Credit_Utilization
0     CUST0001   56  165580.0         398.0            0.390502
1     CUST0002   69  100999.0         493.0            0.312444
2     CUST0003   46  188416.0         500.0            0.359930
3     CUST0004   32  101672.0         413.0            0.371400
4     CUST0005   60   38524.0         487.0            0.234716
5     CUST0006   25   84042.0         700.0            0.650540
6     CUST0007   38   35056.0         364.0            0.390581
7     CUST0008   56  123215.0         415.0            0.532715
8     CUST0009   36   66991.0         405.0            0.413035
9     CUST0010   40   34870.0         679.0            0.361824

Predicting Delinquency with AI

# Re-import necessary packages after kernel reset
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Define features and target
X = df.drop(columns=['Delinquent_Account', 'Customer_ID'])
y = df['Delinquent_Account']
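The preview ends right after the features and target are defined, so the modeling cells themselves are not visible. The sketch below shows one way the imported scikit-learn pieces (SimpleImputer, StandardScaler, OneHotEncoder, ColumnTransformer, Pipeline, train_test_split, LogisticRegression) could be combined. It is an illustration, not the notebook's actual code: the data is synthetic, the Segment column is hypothetical, and the split ratio and hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the notebook's dataframe (random values, not the real data).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Age": rng.integers(18, 75, n).astype(float),
    "Income": rng.normal(90_000, 30_000, n),
    "Credit_Utilization": rng.uniform(0, 1, n),
    "Segment": rng.choice(["A", "B", "C"], n),  # hypothetical categorical column
    "Delinquent_Account": rng.integers(0, 2, n),
})
# Knock out a few Income values so the imputer has something to do.
demo.loc[demo.sample(15, random_state=0).index, "Income"] = np.nan

X = demo.drop(columns=["Delinquent_Account"])
y = demo["Delinquent_Account"]

numeric_features = ["Age", "Income", "Credit_Utilization"]
categorical_features = ["Segment"]

# Numeric columns: median imputation, then standardisation.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each column group through its own preprocessing.
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print("f1:", f1_score(y_test, y_pred, zero_division=0))
```

Because the imputer and scaler live inside the pipeline, their statistics are learned from the training split only, which avoids leaking information from the test rows. Swapping LogisticRegression for DecisionTreeClassifier or MLPClassifier only changes the final step.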

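The FutureWarning printed by the median-imputation loop comes from calling fillna(..., inplace=True) on a column selected with df[col]. A minimal sketch of a warning-free variant follows, assuming the goal is simply to fill each numerical column with its own median. The small frame is made up for illustration; only the column names come from the notebook.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; values are invented.
demo = pd.DataFrame({
    "Income": [50_000.0, np.nan, 70_000.0, 90_000.0],
    "Credit_Score": [600.0, 650.0, np.nan, 700.0],
    "Delinquent_Account": [0, 1, 0, 0],
})
numerical_cols = ["Income", "Credit_Score"]

# Assign the filled column back instead of using inplace=True on a selection.
for col in numerical_cols:
    demo[col] = demo[col].fillna(demo[col].median())

# Equivalent vectorised form: fill every column with its own median in one call.
# demo[numerical_cols] = demo[numerical_cols].fillna(demo[numerical_cols].median())

print(demo[numerical_cols].isnull().sum())
```

Assigning the result back writes to the original dataframe explicitly, so the behaviour does not depend on whether the column selection is a view or a copy.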