DS Task-3 - Jupyter Notebook
Collecting pandas
Downloading pandas-2.1.1-cp310-cp310-win_amd64.whl (10.7 MB)
---------------------------------------- 10.7/10.7 MB 4.5 MB/s eta 0:00:00
Requirement already satisfied: pytz>=2020.1 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from pandas) (2022.2.1)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from pandas) (2.8.2)
Collecting numpy>=1.22.4
Downloading numpy-1.26.0-cp310-cp310-win_amd64.whl (15.8 MB)
---------------------------------------- 15.8/15.8 MB 5.2 MB/s eta 0:00:00
Collecting tzdata>=2022.1
Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
-------------------------------------- 341.8/341.8 KB 4.3 MB/s eta 0:00:00
Requirement already satisfied: six>=1.5 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Installing collected packages: tzdata, numpy, pandas
Successfully installed numpy-1.26.0 pandas-2.1.1 tzdata-2023.3
Note: you may need to restart the kernel to use updated packages.
WARNING: You are using pip version 22.0.4; however, version 23.2.1 is available.
You should consider upgrading via the 'C:\Users\RAM\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip' command.
localhost:8888/notebooks/DS Task-3.ipynb# 1/16
10/11/23, 8:45 PM DS Task-3 - Jupyter Notebook
Collecting seaborn
Downloading seaborn-0.13.0-py3-none-any.whl (294 kB)
-------------------------------------- 294.6/294.6 KB 2.0 MB/s eta 0:00:00
Collecting matplotlib!=3.6.1,>=3.3
Downloading matplotlib-3.8.0-cp310-cp310-win_amd64.whl (7.6 MB)
---------------------------------------- 7.6/7.6 MB 4.7 MB/s eta 0:00:00
Requirement already satisfied: pandas>=1.2 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from seaborn) (2.1.1)
Requirement already satisfied: numpy!=1.24.0,>=1.20 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from seaborn) (1.26.0)
Requirement already satisfied: packaging>=20.0 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (21.3)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (3.0.9)
Collecting kiwisolver>=1.0.1
Downloading kiwisolver-1.4.5-cp310-cp310-win_amd64.whl (56 kB)
-------------------------------------- 56.1/56.1 KB 975.6 kB/s eta 0:00:00
Collecting fonttools>=4.22.0
Downloading fonttools-4.43.1-cp310-cp310-win_amd64.whl (2.1 MB)
---------------------------------------- 2.1/2.1 MB 2.1 MB/s eta 0:00:00
Collecting contourpy>=1.0.1
Downloading contourpy-1.1.1-cp310-cp310-win_amd64.whl (477 kB)
-------------------------------------- 478.0/478.0 KB 1.7 MB/s eta 0:00:00
Collecting cycler>=0.10
Downloading cycler-0.12.1-py3-none-any.whl (8.3 kB)
Collecting pillow>=6.2.0
Downloading Pillow-10.0.1-cp310-cp310-win_amd64.whl (2.5 MB)
---------------------------------------- 2.5/2.5 MB 2.9 MB/s eta 0:00:00
Requirement already satisfied: python-dateutil>=2.7 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from matplotlib!=3.6.1,>=3.3->seaborn) (2.8.2)
Requirement already satisfied: tzdata>=2022.1 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from pandas>=1.2->seaborn) (2023.3)
Requirement already satisfied: pytz>=2020.1 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from pandas>=1.2->seaborn) (2022.2.1)
Requirement already satisfied: six>=1.5 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.3->seaborn) (1.16.0)
Installing collected packages: pillow, kiwisolver, fonttools, cycler, contourpy, matplotlib, seaborn
Successfully installed contourpy-1.1.1 cycler-0.12.1 fonttools-4.43.1 kiwisolver-1.4.5 matplotlib-3.8.0 pillow-10.0.1 seaborn-0.13.0
Note: you may need to restart the kernel to use updated packages.
WARNING: You are using pip version 22.0.4; however, version 23.2.1 is available.
You should consider upgrading via the 'C:\Users\RAM\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip' command.
Collecting scikit-learn
Downloading scikit_learn-1.3.1-cp310-cp310-win_amd64.whl (9.3 MB)
---------------------------------------- 9.3/9.3 MB 4.0 MB/s eta 0:00:00
Collecting threadpoolctl>=2.0.0
Downloading threadpoolctl-3.2.0-py3-none-any.whl (15 kB)
Requirement already satisfied: numpy<2.0,>=1.17.3 in c:\users\ram\appdata\local\programs\python\python310\lib\site-packages (from scikit-learn) (1.26.0)
Collecting joblib>=1.1.1
Downloading joblib-1.3.2-py3-none-any.whl (302 kB)
-------------------------------------- 302.2/302.2 KB 2.7 MB/s eta 0:00:00
Collecting scipy>=1.5.0
Downloading scipy-1.11.3-cp310-cp310-win_amd64.whl (44.1 MB)
---------------------------------------- 44.1/44.1 MB 3.0 MB/s eta 0:00:00
Installing collected packages: threadpoolctl, scipy, joblib, scikit-learn
Successfully installed joblib-1.3.2 scikit-learn-1.3.1 scipy-1.11.3 threadpoolctl-3.2.0
Note: you may need to restart the kernel to use updated packages.
WARNING: You are using pip version 22.0.4; however, version 23.2.1 is available.
You should consider upgrading via the 'C:\Users\RAM\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip' command.
In [30]: df = pd.read_csv(r'G:\programming\bank-additional\bank-additional.csv',delimiter=
df.rename(columns={'y': 'deposit'}, inplace=True)
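The `read_csv` call above is cut off after `delimiter=`, and the notebook's import cell is not part of this export. As a minimal sketch, assuming `pandas` is imported as `pd` and that the file is semicolon-separated (which is how the UCI Bank Marketing `bank-additional.csv` is distributed), the load-and-rename step can be tried on an in-memory sample. The sample rows and the reduced column set are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for bank-additional.csv: a few semicolon-separated
# rows with a subset of the columns (values are illustrative only).
sample_csv = io.StringIO(
    "age;job;marital;y\n"
    "30;blue-collar;married;no\n"
    "39;services;single;no\n"
    "25;services;married;yes\n"
)

# Assumption: the truncated delimiter argument is ';'.
df = pd.read_csv(sample_csv, delimiter=";")

# Rename the target column as in cell In [30]; assigning the result
# is an alternative to inplace=True.
df = df.rename(columns={"y": "deposit"})

print(df.columns.tolist())
```

If the delimiter were left at its default comma, each row of a semicolon-separated file would land in a single column, so the 21-column shape reported later in the notebook is a useful sanity check.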
In [31]: df.head()
Out[31]:
   age          job  marital  education  default  housing  loan   contact  month  day_of_week
0   30  blue-collar  married   basic.9y       no      yes    no  cellular    may            f
5 rows × 21 columns
In [32]: df.tail()
Out[32]:
age job marital education default housing loan contact month day_of_week
5 rows × 21 columns
In [33]: df.shape
In [34]: df.columns
In [37]: df.dtypes.value_counts()
Out[37]: object 11
int64 5
float64 5
Name: count, dtype: int64
In [38]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4119 entries, 0 to 4118
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 4119 non-null int64
1 job 4119 non-null object
2 marital 4119 non-null object
3 education 4119 non-null object
4 default 4119 non-null object
5 housing 4119 non-null object
6 loan 4119 non-null object
7 contact 4119 non-null object
8 month 4119 non-null object
9 day_of_week 4119 non-null object
10 duration 4119 non-null int64
11 campaign 4119 non-null int64
12 pdays 4119 non-null int64
13 previous 4119 non-null int64
14 poutcome 4119 non-null object
15 emp.var.rate 4119 non-null float64
16 cons.price.idx 4119 non-null float64
17 cons.conf.idx 4119 non-null float64
18 euribor3m 4119 non-null float64
19 nr.employed 4119 non-null float64
20 deposit 4119 non-null object
dtypes: float64(5), int64(5), object(11)
memory usage: 675.9+ KB
In [39]: df.duplicated().sum()
Out[39]: 0
In [40]: df.isna().sum()
Out[40]: age 0
job 0
marital 0
education 0
default 0
housing 0
loan 0
contact 0
month 0
day_of_week 0
duration 0
campaign 0
pdays 0
previous 0
poutcome 0
emp.var.rate 0
cons.price.idx 0
cons.conf.idx 0
euribor3m 0
nr.employed 0
deposit 0
dtype: int64
In [42]: cat_cols = df.select_dtypes(include='object').columns
print(cat_cols)
num_cols = df.select_dtypes(exclude='object').columns
print(num_cols)
In [43]: df.describe()
Out[43]:
age duration campaign pdays previous emp.var.rate cons.price.idx
Out[44]:
         job  marital  education  default  housing  loan  contact  month  day_of_week  pou
count   4119     4119       4119     4119     4119  4119     4119   4119         4119
unique    12        4          8        3        3     3        2     10            5
freq    1012     2509       1264     3315     2175  3349     2652   1378          860
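The cell that produced `Out[44]` is missing from the export. Its `count`, `unique` and `freq` rows match what `DataFrame.describe` reports for non-numeric columns, so it was probably a call such as `df.describe(include='object')`; that is an inference, not something the export confirms. The sketch below uses a small hypothetical frame and selects the non-numeric columns with `exclude=['number']`, which picks the same columns here.

```python
import pandas as pd

# Hypothetical frame: two text columns and one numeric column.
toy = pd.DataFrame({
    "job": ["admin.", "services", "admin.", "technician"],
    "marital": ["married", "single", "married", "married"],
    "age": [30, 39, 25, 38],
})

# Summarise only the non-numeric columns: count, unique, top, freq.
summary = toy.describe(exclude=["number"])
print(summary)
```

`unique` is the number of distinct values, `top` the most frequent value, and `freq` how often that value occurs.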
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [68], in <cell line: 1>()
----> 1 corr = df.corr()
2 print(corr)
3 corr =corr[abs(corr) >= 0.90]
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py:10707, in DataFrame.corr(self, method, min_periods, numeric_only)
10705 cols = data.columns
10706 idx = cols.copy()
> 10707 mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)
10709 if method == "pearson":
10710 correl = libalgos.nancorr(mat, minp=min_periods)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py:1892, in DataFrame.to_numpy(self, dtype, copy, na_value)
1890 if dtype is not None:
1891 dtype = np.dtype(dtype)
-> 1892 result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)
1893 if result.dtype is not dtype:
1894 result = np.array(result, dtype=dtype, copy=False)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py:1656, in BlockManager.as_array(self, dtype, copy, na_value)
1654 arr.flags.writeable = False
1655 else:
-> 1656 arr = self._interleave(dtype=dtype, na_value=na_value)
1657 # The underlying data was copied within _interleave, so no need
1658 # to further copy if copy=True or setting na_value
1660 if na_value is lib.no_default:
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\managers.py:1715, in BlockManager._interleave(self, dtype, na_value)
1713 else:
1714 arr = blk.get_values(dtype)
-> 1715 result[rl.indexer] = arr
1716 itemmask[rl.indexer] = 1
1718 if not itemmask.all():
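The traceback above is cut off before its final message, but it starts at `df.corr()` on a frame that still holds 11 `object` columns. Since pandas 2.0, `DataFrame.corr` defaults to `numeric_only=False` and fails when string columns cannot be converted to float. One way around this, sketched here on a small hypothetical frame, is to pass `numeric_only=True`; the masking step mirrors line 3 of the failing cell.

```python
import pandas as pd

# Hypothetical mixed-dtype frame standing in for df.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.5],
    "c": [4.0, 1.0, 3.0, 2.0],
    "job": ["admin.", "services", "admin.", "technician"],
})

# Restrict the correlation to numeric columns so the string column is skipped.
corr = toy.corr(numeric_only=True)

# Keep only strongly correlated pairs; every other entry becomes NaN.
strong = corr[corr.abs() >= 0.90]
print(strong)
```

Selecting the numeric columns first with `df[num_cols].corr()` would be an equivalent route, since `num_cols` is already computed in cell `In [42]`.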
0 12 1 1 2 0 2 0 0 6 0 ...
1 21 7 2 3 0 0 0 1 6 0 ...
2 7 7 1 3 0 2 0 1 4 4 ...
3 20 7 1 2 0 1 1 1 4 0 ...
4 29 0 1 6 0 2 0 0 7 1 ...
... ... ... ... ... ... ... ... ... ... ... ...
4114 12 0 1 1 0 2 2 0 3 2 ...
4115 21 0 1 3 0 2 0 1 3 0 ...
4116 9 8 2 3 0 0 0 0 6 1 ...
4117 40 0 1 3 0 0 0 0 1 0 ...
4118 16 4 2 3 0 2 0 0 7 4 ...
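The table above is an integer-coded frame (presumably `df_encoded`, which the following cells use), but the cell that built it is missing from the export. The codes in row 0 follow alphabetical order of the original values (`blue-collar` is 1, `married` is 1, `basic.9y` is 2), which is how scikit-learn's `LabelEncoder` assigns them, and the first column reads 12 where `df.head()` showed age 30, which hints that every column was encoded. As an assumption rather than the notebook's actual code, a column-by-column encoding could look like this; the toy data is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical subset of the categorical columns.
toy = pd.DataFrame({
    "job": ["blue-collar", "services", "admin.", "services"],
    "marital": ["married", "single", "married", "divorced"],
    "deposit": ["no", "no", "yes", "no"],
})

# Fit a fresh encoder per column. Classes are sorted before coding,
# so 'no' becomes 0 and 'yes' becomes 1.
df_encoded = toy.apply(lambda col: LabelEncoder().fit_transform(col))
print(df_encoded)
```

Integer codes impose an arbitrary order on nominal categories. Tree-based models tolerate that reasonably well, while linear models usually work better with one-hot encoding.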
In [77]: df_encoded['deposit'].value_counts()
Out[77]: deposit
0 3668
1 451
Name: count, dtype: int64
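The counts above show a strongly imbalanced target. A quick check of the class shares, using only the two counts printed in `Out[77]`:

```python
# Class counts copied from Out[77].
counts = {0: 3668, 1: 451}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"deposit={label}: {share:.1%}")
```

About 11% of the rows are positive, so a model that always predicts 0 would already reach roughly 89% accuracy. Per-class precision and recall are more informative than accuracy alone here.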
In [78]: x = df_encoded.drop('deposit', axis=1)
y = df_encoded['deposit']
print(x.shape)
print(y.shape)
print(type(x))
print(type(y))
(4119, 20)
(4119,)
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
In [80]: print(4119*0.25)
1029.75
(3089, 20)
(1030, 20)
(3089,)
(1030,)
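The cell that produced these four shapes is not in the export. They are consistent with a 75/25 split of the 4119 rows: scikit-learn rounds the test size up, so 1029.75 becomes 1030 test rows and the remaining 3089 go to training. The sketch below assumes `train_test_split` with `test_size=0.25` was used. The `random_state` value and the random stand-in data are placeholders, and whether the notebook stratified on `y` is unknown.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Random stand-in with the same shape as x and y in cell In [78].
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.integers(0, 10, size=(4119, 20)))
y = pd.Series(rng.integers(0, 2, size=4119), name="deposit")

# random_state is a placeholder; the notebook's value is not shown.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)

print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
```

Given the class imbalance in `deposit`, passing `stratify=y` would keep the share of positives the same in both parts.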
[0 0 1 ... 1 0 0]
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [96], in <cell line: 1>()
----> 1 eval_model(y_test, ypred_dt)
In [103]: mscore(dt1)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [103], in <cell line: 1>()
----> 1 mscore(dt1)
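Both `NameError` tracebacks are cut off before the line that names the missing identifier, so the export does not say whether the helper (`eval_model`, `mscore`) or its argument (`ypred_dt`, `dt1`) was undefined. Either way, the cell that defines the name has to run before the cell that uses it, for example after a kernel restart. The helper definitions are not in the export. The sketch below is one hypothetical shape for them, assuming `eval_model` reports a confusion matrix and a classification report and `mscore` reports train and test accuracy from split variables named `x_train`, `x_test`, `y_train` and `y_test`. The decision-tree settings and the random data are placeholders.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: the target depends only on the first feature.
rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(400, 5))
y = (x[:, 0] > 6).astype(int)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)


def eval_model(y_true, y_pred):
    """Print a confusion matrix and per-class report (hypothetical helper)."""
    cm = confusion_matrix(y_true, y_pred)
    print(cm)
    print(classification_report(y_true, y_pred))
    return cm


def mscore(model):
    """Print train and test accuracy of a fitted model (hypothetical helper)."""
    train_score = model.score(x_train, y_train)
    test_score = model.score(x_test, y_test)
    print("Training score:", train_score)
    print("Testing score:", test_score)
    return train_score, test_score


# Define and fit the model before calling the helpers on it.
dt1 = DecisionTreeClassifier(max_depth=3, random_state=0)
dt1.fit(x_train, y_train)
ypred_dt1 = dt1.predict(x_test)

scores = mscore(dt1)
cm = eval_model(y_test, ypred_dt1)
```

Because `mscore` reads the split variables from the enclosing scope, it also raises `NameError` if the split cell has not been run in the current kernel session.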
In [105]: eval_model(y_test,ypred_dt1)
In [ ]: