
ML Cops

The document discusses analyzing a bank marketing dataset using various Python libraries like NumPy, Pandas, Scikit-learn, and Matplotlib. It loads and explores the dataset, performs preprocessing, analyzes correlations, trains models, and visualizes results.

Uploaded by

THARNIKANTH

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score
import scipy.cluster.hierarchy as sch
from sklearn.preprocessing import LabelEncoder
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from statsmodels.tsa.seasonal import seasonal_decompose
from dateutil.parser import parse
from statsmodels.tsa.statespace.sarimax import SARIMAX
import statsmodels.api as sm
from imblearn.over_sampling import SMOTE
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

df = pd.read_csv('/content/bank.csv',delimiter=';')
df
       age          job  marital            education  default  housing  loan    contact  mo
0       56    housemaid  married             basic.4y       no       no    no  telephone
1       57     services  married          high.school  unknown       no    no  telephone
2       37     services  married          high.school       no      yes    no  telephone
3       40       admin.  married             basic.6y       no       no    no  telephone
4       56     services  married          high.school       no       no   yes  telephone
...    ...          ...      ...                  ...      ...      ...   ...        ...
41183   73      retired  married  professional.course       no      yes    no   cellular
41184   46  blue-collar  married  professional.course       no       no    no   cellular
41185   56      retired  married    university.degree       no      yes    no   cellular
41186   44   technician  married  professional.course       no       no    no   cellular
41187   74      retired  married  professional.course       no      yes    no   cellular

41188 rows × 21 columns

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41188 entries, 0 to 41187
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 41188 non-null int64
1 job 41188 non-null object
2 marital 41188 non-null object
3 education 41188 non-null object
4 default 41188 non-null object
5 housing 41188 non-null object
6 loan 41188 non-null object
7 contact 41188 non-null object
8 month 41188 non-null object
9 day_of_week 41188 non-null object
10 duration 41188 non-null int64
11 campaign 41188 non-null int64
12 pdays 41188 non-null int64
13 previous 41188 non-null int64
14 poutcome 41188 non-null object
15 emp.var.rate 41188 non-null float64
16 cons.price.idx 41188 non-null float64
17 cons.conf.idx 41188 non-null float64
18 euribor3m 41188 non-null float64
19 nr.employed 41188 non-null float64
20 y 41188 non-null object
dtypes: float64(5), int64(5), object(11)
memory usage: 6.6+ MB

df.isnull().sum()

age 0
job 0
marital 0
education 0
default 0
housing 0
loan 0
contact 0
month 0
day_of_week 0
duration 0
campaign 0
pdays 0
previous 0
poutcome 0
emp.var.rate 0
cons.price.idx 0
cons.conf.idx 0
euribor3m 0
nr.employed 0
y 0
dtype: int64
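
Note: `isnull()` reports zero missing values, but the table above shows the string `unknown` in the `default` column, so gaps may be stored as text rather than as NaN. The following is a minimal sketch for counting such placeholders; the `toy` frame and the `count_unknown` helper are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the bank data; values are illustrative, not read from bank.csv.
toy = pd.DataFrame({
    "job": ["housemaid", "services", "unknown", "admin."],
    "default": ["no", "unknown", "no", "unknown"],
    "age": [56, 57, 37, 40],
})

def count_unknown(frame: pd.DataFrame, token: str = "unknown") -> pd.Series:
    """Count cells equal to `token` in each non-numeric column."""
    text_cols = frame.select_dtypes(exclude="number")
    return (text_cols == token).sum()

unknown_counts = count_unknown(toy)
print(unknown_counts)
```

On the real data the same call would be `count_unknown(df)`, run before the label-encoding step so the text values are still present.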

sns.scatterplot(df)
<Axes: >

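x = df
plt.figure(figsize=(50,50))
# One count plot per column, laid out on a 10 x 3 grid (21 columns fit in 30 slots)
for i, col in enumerate(x,1):
    plt.subplot(10,3,i)
    sns.countplot(data = x, x= col)

# Count plots of each categorical column, split by the target y
for column in df.select_dtypes(include='object'):
    sns.countplot(x=column,hue='y' ,data=df)
    plt.show()  # show inside the loop so each column gets its own figure

# Keep only the numeric columns for the correlation analysis
df1 =df.select_dtypes('number')
df1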
       age  duration  campaign  pdays  previous  emp.var.rate  cons.price.idx  con
0       56       261         1    999         0           1.1          93.994
1       57       149         1    999         0           1.1          93.994
2       37       226         1    999         0           1.1          93.994
3       40       151         1    999         0           1.1          93.994
4       56       307         1    999         0           1.1          93.994
...    ...       ...       ...    ...       ...           ...             ...
41183   73       334         1    999         0          -1.1          94.767
41184   46       383         1    999         0          -1.1          94.767
41185   56       189         2    999         0          -1.1          94.767
41186   44       442         1    999         0          -1.1          94.767
41187   74       239         3    999         1          -1.1          94.767

41188 rows × 10 columns

df1.corr()

                      age   duration   campaign      pdays   previous  emp.var.rate  cons.p
age              1.000000  -0.000866   0.004594  -0.034369   0.024365     -0.000371
duration        -0.000866   1.000000  -0.071699  -0.047577   0.020640     -0.027968
campaign         0.004594  -0.071699   1.000000   0.052584  -0.079141      0.150754
pdays           -0.034369  -0.047577   0.052584   1.000000  -0.587514      0.271004
previous         0.024365   0.020640  -0.079141  -0.587514   1.000000     -0.420489
emp.var.rate    -0.000371  -0.027968   0.150754   0.271004  -0.420489      1.000000
cons.price.idx   0.000857   0.005312   0.127836   0.078889  -0.203130      0.775334
cons.conf.idx    0.129372  -0.008173  -0.013733  -0.091342  -0.050936      0.196041
euribor3m        0.010767  -0.032897   0.135133   0.296899  -0.454494      0.972245
nr.employed     -0.017725  -0.044703   0.144095   0.372605  -0.501333      0.906970

plt.figure(figsize=(10,10))
sns.heatmap(df1.corr(),annot = True)
<Axes: >
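
Note: a heatmap of this size is hard to scan for near-duplicate features. The following is a minimal sketch that lists column pairs whose absolute correlation exceeds a threshold. The `strong_pairs` helper and the 0.9 cut-off are assumptions for illustration; the 4 x 4 matrix reuses values printed by `df1.corr()` above.

```python
import pandas as pd

def strong_pairs(corr: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return (col_a, col_b, r) for every column pair with |r| >= threshold."""
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skips the diagonal
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs

# Sub-matrix copied from the df1.corr() output (four of the ten columns).
names = ["pdays", "previous", "emp.var.rate", "euribor3m"]
corr_subset = pd.DataFrame(
    [
        [1.000000, -0.587514, 0.271004, 0.296899],
        [-0.587514, 1.000000, -0.420489, -0.454494],
        [0.271004, -0.420489, 1.000000, 0.972245],
        [0.296899, -0.454494, 0.972245, 1.000000],
    ],
    index=names,
    columns=names,
)

pairs = strong_pairs(corr_subset)
print(pairs)
```

On the full matrix the call would be `strong_pairs(df1.corr())`. Pairs such as `emp.var.rate` and `euribor3m` carry largely the same signal, which matters for the linear models imported at the top.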

le = LabelEncoder()
# Replace every text column, including the target y, with integer codes
for column in df.select_dtypes(include='object'):
    df[column] = le.fit_transform(df[column])
df.head()
   age  job  marital  education  default  housing  loan  contact  month  day_of_wee
0   56    3        1          0        0        0     0        1      6
1   57    7        1          3        1        0     0        1      6
2   37    7        1          3        0        2     0        1      6
3   40    0        1          1        0        0     0        1      6
4   56    7        1          3        0        0     2        1      6

5 rows × 21 columns
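
Note: the loop above refits one shared `LabelEncoder` on every column, so once it finishes the encoder only remembers the classes of the last column it saw. The following is a minimal sketch, on a hypothetical two-column `toy` frame, that keeps one encoder per column so the integer codes can be mapped back to labels later. The `encoders` dictionary is an assumption, not something the notebook defines.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in; the real notebook would loop over df's text columns instead.
toy = pd.DataFrame({
    "marital": ["married", "single", "married", "divorced"],
    "y": ["no", "yes", "no", "no"],
})

encoders = {}          # column name -> fitted LabelEncoder
encoded = toy.copy()
for column in toy.columns:
    enc = LabelEncoder()
    encoded[column] = enc.fit_transform(toy[column])
    encoders[column] = enc

# Codes follow the sorted class order: divorced=0, married=1, single=2
marital_back = list(encoders["marital"].inverse_transform(encoded["marital"]))
print(encoded)
print(marital_back)
```

`LabelEncoder` is documented for target labels; for input features scikit-learn also offers `OrdinalEncoder` and `OneHotEncoder`, which avoid implying an order between categories.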

x=df.drop(['y'],axis = 1)
y = df['y']
x_train,x_test,y_train,y_test = train_test_split(x,y,train_size=.8,random_
x_train,x_test,y_train,y_test

(       age  job  marital  education  default  housing  loan  contact  month  \
 31956   28    8        2          6        1        2     0        0      6
 8163    52    1        1          2        0        0     0        1      4
 2624    34    1        2          0        0        2     2        1      6
 16544   24    0        1          6        0        0     0        0      3
 14687   56   11        0          0        0        0     0        0      3
 ...    ...  ...      ...        ...      ...      ...   ...      ...    ...
 23346   57    5        1          0        0        0     0        0      1
 11863   33    4        1          6        0        2     0        1      4
 27063   51    6        1          6        1        0     0        0      7
 8366    35    1        1          2        1        0     0        1      4
 17530   41    9        0          5        0        2     0        0      3

        day_of_week  duration  campaign  pdays  previous  poutcome  \
 31956            2       191         2    999         0         1
 8163             3       542         1    999         0         1
 2624             3      1093         3    999         0         1
 16544            4       135         1    999         0         1
 14687            3       547        12    999         0         1
 ...            ...       ...       ...    ...       ...       ...
 23346            4        50         2    999         0         1
 11863            0        18         5    999         0         1
 27063            0        46         4    999         1         0
 8366             3        94        18    999         0         1
 17530            1       112         2    999         0         1

        emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m  nr.employed
 31956          -1.8          92.893          -46.2      1.327       5099.1
 8163            1.4          94.465          -41.8      4.864       5228.1
 2624            1.1          93.994          -36.4      4.856       5191.0
 16544           1.4          93.918          -42.7      4.963       5228.1
 14687           1.4          93.918          -42.7      4.961       5228.1
 ...             ...             ...            ...        ...          ...
 23346           1.4          93.444          -36.1      4.964       5228.1
 11863           1.4          94.465          -41.8      4.959       5228.1
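
Note: the split call above is cut off after `random_`, so its seed is unknown. The following is a minimal sketch of an 80/20 split on hypothetical toy data. `random_state=0` is an assumed value, and `stratify` is an optional extra that the original call may not use; it keeps the share of each class similar in both parts, which can help if `y` is imbalanced.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 20 rows, 4 of them positive. Not taken from bank.csv.
toy_x = pd.DataFrame({"age": range(20, 40)})
toy_y = pd.Series([0] * 16 + [1] * 4, name="y")

x_tr, x_te, y_tr, y_te = train_test_split(
    toy_x,
    toy_y,
    train_size=0.8,   # same fraction as the notebook
    random_state=0,   # assumed seed; the original value is truncated
    stratify=toy_y,   # assumption: not visible in the original call
)

print(len(x_tr), len(x_te))
print(y_tr.mean(), y_te.mean())
```

With 20 rows the split yields 16 training and 4 test rows, and fixing `random_state` makes the same rows land in each part on every run.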
