Solution LabAssignment

This lab assignment analyzes drug consumption data using machine learning techniques. The document: 1. Imports and preprocesses a drug consumption dataset, dropping the ID and demographic columns and mapping the usage classes CL0 to CL6 to integers. 2. Fits a logistic regression model for each of the 19 substance columns (alcohol, amphetamine, amyl nitrite, and so on), using the personality scores and the remaining substance columns as predictors. 3. Evaluates each model by computing a confusion matrix on held-out test data (30% of the rows).
Lab Assignment

November 27, 2023

19EAC381 - Machine Learning Lab


Lab Assignment

0.0.1 Course Outcomes Mapped: CO3


0.0.2 Name: Hitarth Anand Rohra
0.0.3 Roll No: AM.EN.U4EAC21032
[1]: import pandas as pd
df = pd.read_csv('/Users/hitaarthh/Downloads/Lab assignment/Drug_Consumption_Quantified.csv')

df.head()

[1]: ID Age Gender Education Country Ethnicity Nscore Escore \


0 2 -0.07854 -0.48246 1.98437 0.96082 -0.31685 -0.67825 1.93886
1 3 0.49788 -0.48246 -0.05921 0.96082 -0.31685 -0.46725 0.80523
2 4 -0.95197 0.48246 1.16365 0.96082 -0.31685 -0.14882 -0.80615
3 5 0.49788 0.48246 1.98437 0.96082 -0.31685 0.73545 -1.63340
4 6 2.59171 0.48246 -1.22751 0.24923 -0.31685 -0.67825 -0.30033

Oscore AScore … Ecstasy Heroin Ketamine Legalh LSD Meth \


0 1.43533 0.76096 … CL4 CL0 CL2 CL0 CL2 CL3
1 -0.84732 -1.62090 … CL0 CL0 CL0 CL0 CL0 CL0
2 -0.01928 0.59042 … CL0 CL0 CL2 CL0 CL0 CL0
3 -0.45174 -0.30172 … CL1 CL0 CL0 CL1 CL0 CL0
4 -1.55521 2.03972 … CL0 CL0 CL0 CL0 CL0 CL0

Mushrooms Nicotine Semer VSA


0 CL0 CL4 CL0 CL0
1 CL1 CL0 CL0 CL0
2 CL0 CL2 CL0 CL0
3 CL2 CL2 CL0 CL0
4 CL0 CL6 CL0 CL0

[5 rows x 32 columns]

[2]: df = df.drop(['ID','Age','Gender','Education', 'Country','Ethnicity'],axis=1)
df.head()

[2]: Nscore Escore Oscore AScore Cscore Impulsive SS Alcohol \


0 -0.67825 1.93886 1.43533 0.76096 -0.14277 -0.71126 -0.21575 CL5
1 -0.46725 0.80523 -0.84732 -1.62090 -1.01450 -1.37983 0.40148 CL6
2 -0.14882 -0.80615 -0.01928 0.59042 0.58489 -1.37983 -1.18084 CL4
3 0.73545 -1.63340 -0.45174 -0.30172 1.30612 -0.21712 -0.21575 CL4
4 -0.67825 -0.30033 -1.55521 2.03972 1.63088 -1.37983 -1.54858 CL2

Amphet Amyl … Ecstasy Heroin Ketamine Legalh LSD Meth Mushrooms \


0 CL2 CL2 … CL4 CL0 CL2 CL0 CL2 CL3 CL0
1 CL0 CL0 … CL0 CL0 CL0 CL0 CL0 CL0 CL1
2 CL0 CL0 … CL0 CL0 CL2 CL0 CL0 CL0 CL0
3 CL1 CL1 … CL1 CL0 CL0 CL1 CL0 CL0 CL2
4 CL0 CL0 … CL0 CL0 CL0 CL0 CL0 CL0 CL0

Nicotine Semer VSA


0 CL4 CL0 CL0
1 CL0 CL0 CL0
2 CL2 CL0 CL0
3 CL2 CL0 CL0
4 CL6 CL0 CL0

[5 rows x 26 columns]

[3]: df.columns

[3]: Index(['Nscore', 'Escore', 'Oscore', 'AScore', 'Cscore', 'Impulsive', 'SS',
'Alcohol', 'Amphet', 'Amyl', 'Benzos', 'Caff', 'Cannabis', 'Choc',
'Coke', 'Crack', 'Ecstasy', 'Heroin', 'Ketamine', 'Legalh', 'LSD',
'Meth', 'Mushrooms', 'Nicotine', 'Semer', 'VSA'],
dtype='object')

[4]: frequency_mapping = {
'CL0': 0,
'CL1': 1,
'CL2': 2,
'CL3': 3,
'CL4': 4,
'CL5': 5,
'CL6': 6
}
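columns = ['Alcohol', 'Amphet', 'Amyl', 'Benzos', 'Caff', 'Cannabis', 'Choc',
           'Coke', 'Crack', 'Ecstasy', 'Heroin', 'Ketamine', 'Legalh', 'LSD', 'Meth',
           'Mushrooms', 'Nicotine', 'Semer', 'VSA']

# Replace each 'CLn' label with its integer level; unmapped values pass through unchanged.
# Note: DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map.
df[columns] = df[columns].applymap(lambda x: frequency_mapping.get(x, x))
df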

[4]: Nscore Escore Oscore AScore Cscore Impulsive SS \


0 -0.67825 1.93886 1.43533 0.76096 -0.14277 -0.71126 -0.21575
1 -0.46725 0.80523 -0.84732 -1.62090 -1.01450 -1.37983 0.40148
2 -0.14882 -0.80615 -0.01928 0.59042 0.58489 -1.37983 -1.18084
3 0.73545 -1.63340 -0.45174 -0.30172 1.30612 -0.21712 -0.21575
4 -0.67825 -0.30033 -1.55521 2.03972 1.63088 -1.37983 -1.54858
… … … … … … … …
1879 -1.19430 1.74091 1.88511 0.76096 -1.13788 0.88113 1.92173
1880 -0.24649 1.74091 0.58331 0.76096 -1.51840 0.88113 0.76540
1881 1.13281 -1.37639 -1.27553 -1.77200 -1.38502 0.52975 -0.52593
1882 0.91093 -1.92173 0.29338 -1.62090 -2.57309 1.29221 1.22470
1883 -0.46725 2.12700 1.65653 1.11406 0.41594 0.88113 1.22470

Alcohol Amphet Amyl … Ecstasy Heroin Ketamine Legalh LSD \


0 5 2 2 … 4 0 2 0 2
1 6 0 0 … 0 0 0 0 0
2 4 0 0 … 0 0 2 0 0
3 4 1 1 … 1 0 0 1 0
4 2 0 0 … 0 0 0 0 0
… … … … … … … … … …
1879 5 0 0 … 0 0 0 3 3
1880 5 0 0 … 2 0 0 3 5
1881 4 6 5 … 4 0 2 0 2
1882 5 0 0 … 3 0 0 3 3
1883 4 3 0 … 3 0 0 3 3

Meth Mushrooms Nicotine Semer VSA


0 3 0 4 0 0
1 0 1 0 0 0
2 0 0 2 0 0
3 0 2 2 0 0
4 0 0 6 0 0
… … … … … …
1879 0 0 0 0 5
1880 4 4 5 0 0
1881 0 2 6 0 0
1882 0 3 4 0 0
1883 0 3 6 0 2

[1884 rows x 26 columns]
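The label-to-integer mapping in cell [4] can also be written column by column with `Series.map`. A minimal sketch on a tiny made-up frame (the `toy` data and the `'??'` label are illustrative assumptions, not values from the dataset); unlike the `.get(x, x)` fallback, an unmapped label becomes `NaN` here, which makes unexpected values easy to spot:

```python
import pandas as pd

# Same mapping as in cell [4]: 'CL0'..'CL6' -> 0..6
frequency_mapping = {f"CL{i}": i for i in range(7)}

# Tiny illustrative frame (made-up values, including one deliberately bad label)
toy = pd.DataFrame({
    "Alcohol": ["CL5", "CL6", "CL4"],
    "Amphet": ["CL2", "??", "CL0"],
})

# Map every column; labels missing from the dictionary turn into NaN
encoded = toy.apply(lambda col: col.map(frequency_mapping))

print(encoded)
print("unmapped values per column:")
print(encoded.isna().sum())
```

Whether silently passing unknown labels through or flagging them as missing is preferable depends on how clean the input file is.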

[5]: from sklearn.linear_model import LogisticRegression


model = LogisticRegression(multi_class="auto", max_iter=1000, solver="lbfgs")

[6]: target_variable = columns[0]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Alcohol


[[ 0 0 0 0 0 12 1]
[ 0 0 0 0 1 9 0]
[ 0 0 1 2 1 10 0]
[ 1 0 0 3 3 51 2]
[ 0 0 0 2 3 81 5]
[ 2 0 2 1 14 184 9]
[ 1 0 0 5 5 140 15]]
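A confusion matrix is easier to compare across targets once it is reduced to a few numbers. The sketch below takes the Alcohol matrix printed above (rows are true classes and columns are predicted classes, following scikit-learn's convention) and derives overall accuracy and per-class recall in plain Python. The variable names are illustrative.

```python
# Confusion matrix printed above for Alcohol
# (rows = true class, columns = predicted class)
conf = [
    [0, 0, 0, 0, 0, 12, 1],
    [0, 0, 0, 0, 1, 9, 0],
    [0, 0, 1, 2, 1, 10, 0],
    [1, 0, 0, 3, 3, 51, 2],
    [0, 0, 0, 2, 3, 81, 5],
    [2, 0, 2, 1, 14, 184, 9],
    [1, 0, 0, 5, 5, 140, 15],
]

# Accuracy: correctly classified samples (the diagonal) over all test samples
total = sum(sum(row) for row in conf)
correct = sum(conf[i][i] for i in range(len(conf)))
accuracy = correct / total

# Recall per class: diagonal entry divided by the number of true samples in that row
recall = [
    row[i] / sum(row) if sum(row) else float("nan")
    for i, row in enumerate(conf)
]

print(f"accuracy = {accuracy:.3f}")
for level, r in enumerate(recall):
    print(f"class {level}: recall = {r:.3f}")
```

For this matrix, 206 of the 566 test samples lie on the diagonal, and most predictions fall into class 5, so the recall of the other classes is low.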

[7]: target_variable = columns[1]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Amphet


[[266 5 9 10 2 0 0]
[ 56 6 5 3 1 0 0]
[ 23 8 19 14 1 0 2]
[ 13 1 16 21 2 3 0]
[ 5 1 4 10 1 1 3]
[ 5 0 3 6 1 1 2]
[ 13 1 6 8 3 2 4]]

[8]: target_variable = columns[2]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Amyl


[[386 0 10 9 1 0]
[ 44 1 4 1 2 0]
[ 53 1 11 3 0 0]
[ 20 0 4 4 0 0]
[ 6 0 1 1 0 0]
[ 3 0 0 0 0 1]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
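The ConvergenceWarning above suggests two remedies: scaling the data or raising `max_iter`. A minimal sketch of both combined in a scikit-learn pipeline follows. It runs on synthetic data from `make_classification` (an assumption made so the example is self-contained; it is not the drug consumption dataset), and the value `max_iter=5000` is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the drug-consumption dataset)
X, y = make_classification(
    n_samples=600, n_features=25, n_informative=10,
    n_classes=4, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Standardise the features, then fit; max_iter is raised as the warning suggests
scaled_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=5000),
)
scaled_model.fit(X_train, y_train)
print("test accuracy:", scaled_model.score(X_test, y_test))
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no information from the test rows leaks into the scaling step.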

[9]: target_variable = columns[3]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Benzos


[[283 1 2 8 3 2 0]
[ 21 1 1 3 0 0 0]
[ 48 2 6 9 4 1 1]
[ 38 1 7 12 6 7 5]
[ 17 0 2 7 8 2 0]
[ 7 0 2 6 3 1 3]
[ 6 0 4 5 7 4 10]]

[10]: target_variable = columns[4]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Caff


[[ 0 0 0 0 0 0 12]
[ 0 0 0 0 0 1 2]
[ 0 0 0 0 0 0 8]
[ 0 0 0 0 0 0 21]
[ 0 0 0 0 0 0 36]
[ 0 0 0 0 0 0 87]
[ 0 0 0 0 1 0 398]]

[11]: target_variable = columns[5]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Cannabis


[[109 8 4 0 0 0 0]
[ 40 9 7 3 0 0 1]
[ 25 13 17 8 0 1 18]
[ 5 5 15 7 1 1 27]
[ 1 1 5 4 0 7 29]
[ 3 1 9 6 1 2 34]
[ 6 3 7 7 1 5 110]]

[12]: target_variable = columns[6]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Choc


[[ 1 0 0 0 0 3 10]
[ 0 0 0 0 0 2 0]
[ 0 0 0 0 0 1 3]
[ 0 0 0 0 0 5 14]
[ 0 0 1 0 3 45 41]
[ 2 0 0 0 7 74 90]
[ 0 0 0 0 7 94 163]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[13]: target_variable = columns[7]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Coke


[[292 4 14 16 0 0 0]
[ 26 5 3 3 0 0 0]
[ 28 7 17 20 3 0 1]
[ 18 5 14 35 6 0 2]
[ 8 0 3 16 3 0 1]
[ 0 0 3 4 5 0 1]
[ 2 0 0 0 0 0 1]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[14]: target_variable = columns[8]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Crack


[[479 0 9 2 1 3]
[ 15 1 0 0 0 0]
[ 23 0 6 0 0 0]
[ 12 0 6 3 0 0]
[ 1 0 3 0 0 0]
[ 1 0 1 0 0 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[15]: target_variable = columns[9]
X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Ecstasy


[[293 1 8 19 2 0 0]
[ 19 2 1 2 0 0 0]
[ 22 5 13 23 4 0 0]
[ 13 0 19 31 14 1 1]
[ 7 0 9 23 10 2 2]
[ 1 0 2 5 4 3 0]
[ 0 0 0 3 1 1 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[16]: target_variable = columns[10]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Heroin


[[472 1 2 6 0 2 0]
[ 16 0 0 1 1 0 0]
[ 14 1 6 1 0 4 0]
[ 7 1 1 6 1 0 1]
[ 6 0 0 3 1 1 0]
[ 2 0 1 1 1 0 1]
[ 0 0 0 4 0 0 1]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[17]: target_variable = columns[11]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Ketamine


[[427 1 4 5 0 2 1]
[ 8 0 1 0 0 1 0]
[ 36 0 3 1 0 1 0]
[ 31 1 3 10 0 2 0]
[ 6 0 2 5 0 0 0]
[ 6 0 3 2 1 1 0]
[ 1 0 0 1 0 0 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[18]: target_variable = columns[12]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Legalh


[[305 1 4 26 0 2 1]
[ 5 0 0 1 0 0 0]
[ 23 0 2 19 0 1 1]
[ 30 1 9 52 5 6 0]
[ 6 0 2 21 1 0 0]
[ 5 0 2 11 2 0 0]
[ 6 0 2 7 1 2 4]]

[19]: target_variable = columns[13]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: LSD


[[304 6 4 15 4 1 0]
[ 46 14 4 3 0 1 0]
[ 17 9 16 12 2 0 0]
[ 10 3 9 25 3 3 0]
[ 2 1 8 17 2 4 0]
[ 3 0 2 7 4 0 0]
[ 1 0 0 1 2 1 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[20]: target_variable = columns[14]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Meth


[[414 0 1 7 3 0 1]
[ 10 0 1 1 0 0 0]
[ 18 0 3 4 2 1 1]
[ 28 1 3 8 2 1 4]
[ 7 0 2 3 1 1 0]
[ 8 0 0 7 0 0 2]
[ 6 0 1 3 2 3 6]]

[21]: target_variable = columns[15]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Mushrooms


[[271 7 12 12 3 0 0]
[ 42 7 4 2 0 0 0]
[ 23 6 20 22 1 0 0]
[ 19 1 13 37 10 0 0]
[ 5 3 3 16 10 1 0]
[ 1 2 3 6 2 0 0]
[ 0 0 1 1 0 0 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[22]: target_variable = columns[16]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Nicotine


[[ 99 1 1 3 0 2 21]
[ 42 1 1 1 0 1 19]
[ 29 0 3 1 0 2 21]
[ 11 0 1 6 1 1 35]
[ 5 0 0 1 0 3 24]
[ 5 0 0 1 0 3 38]
[ 40 1 2 3 2 5 130]]

[23]: target_variable = columns[17]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: Semer


[[563 0 0]
[ 1 0 0]
[ 2 0 0]]

[24]: target_variable = columns[18]


X = df.drop(target_variable,axis=1)
y = df[target_variable]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=1)

model.fit(X_train,y_train)
y_predict = model.predict(X_test)
from sklearn.metrics import confusion_matrix
conf = confusion_matrix(y_test,y_predict)
print("Confusion matrix for:", target_variable)
print(conf)

Confusion matrix for: VSA


[[432 2 7 2 0 3 0]
[ 54 0 5 0 0 0 0]
[ 26 2 6 1 0 0 0]
[ 12 0 1 0 0 0 0]
[ 3 0 2 1 0 0 0]
[ 2 0 1 1 0 0 0]
[ 2 0 1 0 0 0 0]]
/Users/hitaarthh/anaconda3/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
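Cells [6] to [24] repeat the same split, fit, and evaluate steps for each target column. The sketch below shows one way the workflow could be wrapped in a helper and a loop. The function name `confusion_for_target` and the small random `demo` frame are hypothetical stand-ins, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def confusion_for_target(df, target, test_size=0.3, random_state=1):
    """Fit a logistic regression for one target column and return its confusion matrix."""
    X = df.drop(target, axis=1)
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # A fresh model per target, so no state is shared between fits
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return confusion_matrix(y_test, model.predict(X_test))


# Hypothetical stand-in data: two score columns and two integer-coded targets
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Nscore": rng.normal(size=200),
    "Escore": rng.normal(size=200),
    "Alcohol": rng.integers(0, 3, size=200),
    "Amphet": rng.integers(0, 3, size=200),
})

results = {}
for target in ["Alcohol", "Amphet"]:
    results[target] = confusion_for_target(demo, target)
    print("Confusion matrix for:", target)
    print(results[target])
```

With the real data, the loop would iterate over the `columns` list from cell [4]. Storing the matrices in a dictionary keeps them available for later comparison instead of only printing them.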

0.0.4 Inference:
• Learnt how one-hot encoding can be performed on categorical data in order to make it fit for a machine learning algorithm.
• Instead of directly using dummies or one-hot encoding, I preferred mapping each usage class (CL0 to CL6) to a discrete integer value. This approach works for a small data set, but if the number of columns increases drastically, one-hot encoding is the only solution to optimize the code.
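The two encodings contrasted in the inference can be put side by side. A minimal sketch on a made-up column (the `toy` values are illustrative, not taken from the dataset), using `Series.map` for the integer mapping and `pandas.get_dummies` for one-hot encoding:

```python
import pandas as pd

# Made-up sample of usage labels (not taken from the dataset)
toy = pd.DataFrame({"Alcohol": ["CL5", "CL6", "CL4", "CL5"]})

# Integer mapping, as used in this notebook: one column, order of levels preserved
frequency_mapping = {f"CL{i}": i for i in range(7)}
ordinal = toy["Alcohol"].map(frequency_mapping)

# One-hot encoding: one indicator column per label that occurs in the data
one_hot = pd.get_dummies(toy["Alcohol"], prefix="Alcohol")

print(ordinal.tolist())
print(one_hot.columns.tolist())
```

The integer mapping keeps a single column and preserves the ordering of the usage levels, while one-hot encoding produces one column per observed label and discards that ordering.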
