
TT - Ipynb - Colaboratory

The document imports a pandas DataFrame from an Excel file, cleans the data, and performs machine learning with a k-nearest neighbors (kNN) classifier. Specifically, it extracts numeric values from strings in the DataFrame, encodes the remaining categorical values, splits the data into training and test sets, trains a kNN model on the training set, and evaluates it on the test set, reaching 83% accuracy (10 of 12 test samples correct). It also prints the resulting confusion matrix.

Uploaded by hos1999moh78
Copyright © All Rights Reserved

10/24/23, 8:50 PM tt.ipynb - Colaboratory

import pandas as pd
df = pd.read_excel('Out_20.xlsx')

data = [
"2646", "2650", "2652", "2656", "2660", "2670", "2671",
"2630", "2631", "2632", "2633", "2634", "2635",
"2901", "2902A", "2903A", "2904A", "2905A", "2906A", "2907A", "2908A", "2909A",
"2910", "2911A", "2912A", "2913A", "2914A", "2915", "2916A",
"2921",
"2941", "2942", "2943", "2944",
"3101", "3102", "3174", "3103", "3104A", "3105", "3109",
"3170", "3171", "3172", "3173", "3110", "3111", "3112", "3113", "3114", "3115",
"3120", "3121", "3122", "3123", "3124", "3125", "3130", "3131", "3132", "3133", "3134", "3135",
"3140", "3141", "3147", "3148", "3149", "3150", "3140", "3141", "3143", "3147", "3148", "3149", "3150", "3151", "3140",  # remainder of this line is cut off in the source export
"3160", "3162A", "3163A", "3164", "3166", "3167", "3168", "3169", "3186A", "3187A",
"3201", "3202", "3203A", "3209A", "3210A", "3212A",
"3701", "3702", "3703", "3704", "3705", "3709A",
"3711", "3712", "3713", "3714", "3715", "3719A",
"3721", "3722", "3723", "3724", "3725", "3728A", "3729A",
"3731", "3732", "3733", "3734", "3735", "3739A",
"3741", "3742", "3743", "3744", "3745", "3749A",
"3751", "3752", "3753", "3754", "3755", "3758",
"3759A", "3761", "3762", "3763", "3764", "3765", "3768",
"3769A", "3771", "3772", "3773", "3774", "3775", "3778", "3779A"
]
# Keep only the columns whose header appears in data
new_list = [col for col in df.columns if col in data]
a = df[new_list]
Y = df['Labels']
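Whether every code in data actually matches depends on how Out_20.xlsx stores its header row. A minimal sketch, assuming (this is not confirmed by the source) that read_excel returns purely numeric headers such as 2646 as int labels: `col in data` would then skip those columns, because data holds strings. The toy DataFrame below is hypothetical; comparing on str(col) while keeping the original label is one way to match both kinds of header.

```python
import pandas as pd

# Hypothetical stand-in for Out_20.xlsx: one int header (2646) and one str header ("2902A").
df = pd.DataFrame({2646: [1.0], "2902A": ["x"], "Other": [0], "Labels": [1]})

# Station codes to keep; a set gives fast membership tests and ignores duplicates.
data = {"2646", "2902A", "3140"}

# Compare on str(col) so int and str headers both match,
# but keep the original label so df[...] can still find the column.
new_list = [col for col in df.columns if str(col) in data]
a = df[new_list]
Y = df["Labels"]

print(new_list)  # [2646, '2902A']
```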

# Placeholder for missing cells (NaN/None)
replacement_string = "Missing"
a = a.fillna(replacement_string)
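Filling with a string has a side effect worth knowing before the extraction step: a float column that receives "Missing" becomes dtype object, so later code sees a mix of floats and strings. A small sketch on a hypothetical frame (the values are invented), which also counts the gaps per column before they are overwritten:

```python
import numpy as np
import pandas as pd

# Hypothetical cells: strings with units, plain numbers, and gaps.
a = pd.DataFrame({"2646": ["12.5 mm", np.nan, "7"], "2650": [3.0, None, 4.5]})

# Count gaps per column before they are overwritten.
missing_per_column = {col: int(n) for col, n in a.isna().sum().items()}
print(missing_per_column)  # {'2646': 1, '2650': 1}

filled = a.fillna("Missing")
# The float column now also holds a string, so its dtype becomes object.
print(filled["2650"].tolist())  # [3.0, 'Missing', 4.5]
print(filled["2650"].dtype)     # object
```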

import re

def extract_numbers_from_string(s):
    # Return the first number in a cell as a float; leave other cells unchanged
    if not isinstance(s, str):
        return s
    numbers = re.findall(r'(\d+\.\d+|\d+)', s)
    if numbers:
        return float(numbers[0])
    return s

b = a.copy()
# DataFrame.map replaces the deprecated applymap (pandas >= 2.1)
res = b.map(extract_numbers_from_string)

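The extraction step can be checked on a few hand-made cells. This sketch uses a standalone copy of extract_numbers_from_string with an isinstance guard for non-string cells (the guard is an addition, not part of the source). It also exposes two limits of the regex: a minus sign is dropped, and only the first number in a cell is kept.

```python
import re

def extract_numbers_from_string(s):
    """Return the first number found in s as a float, else s unchanged."""
    if not isinstance(s, str):  # numeric cells pass straight through
        return s
    numbers = re.findall(r"\d+\.\d+|\d+", s)
    return float(numbers[0]) if numbers else s

# Hypothetical sample cells
samples = ["12.5 mm", "T3 / 40", "Missing", 7.0, "-4.2"]
print([extract_numbers_from_string(v) for v in samples])
# [12.5, 3.0, 'Missing', 7.0, 4.2]
```

If negative values can occur in the workbook, a pattern such as `-?\d+(?:\.\d+)?` would keep the sign.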

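from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()

def str_to_num(value):
    # Numeric cells are returned unchanged
    if not isinstance(value, str):
        return value
    # Caution: fitting on a one-element list always returns code 0,
    # so every remaining string (including "Missing") becomes 0
    return label_encoder.fit_transform([value])[0]

res = res.map(str_to_num)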

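Because each call to fit_transform in str_to_num sees a single value, the encoder always learns exactly one class and returns 0, so distinct strings become identical. A sketch of per-column encoding, assuming the goal is one integer code per distinct string within each column; encode_strings_per_column and the sample column are hypothetical. The codes still share a column with real measurements, so a code and a measured value can coincide; one-hot encoding avoids that at the cost of more columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The pitfall: a one-element fit always yields code 0, so "high" and "low" collide.
print(LabelEncoder().fit_transform(["high"])[0], LabelEncoder().fit_transform(["low"])[0])  # 0 0

# Hypothetical column: numbers already extracted, some strings left over.
res = pd.DataFrame({"3101": [1.5, "low", "high", "low", 2.0]})

def encode_strings_per_column(frame):
    """Give each distinct string in a column its own integer code (hypothetical helper)."""
    out = frame.copy()
    for col in out.columns:
        is_str = out[col].map(lambda v: isinstance(v, str))
        if is_str.any():
            encoder = LabelEncoder()  # fitted on this column's strings only
            out.loc[is_str, col] = encoder.fit_transform(out.loc[is_str, col])
    return out.astype(float)

encoded = encode_strings_per_column(res)
print(encoded["3101"].tolist())  # [1.5, 1.0, 0.0, 1.0, 2.0]
```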

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt

# Hold out 30% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(res, Y, test_size=0.3, random_state=42)
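With a small dataset, a plain random split can leave one class under-represented in the 12-row test set. Passing stratify to train_test_split keeps class proportions roughly equal in both splits. A sketch on hypothetical stand-in arrays of 40 rows (25 of one class, 15 of the other); the real class balance is not shown in the source.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 40 rows, 25 of class 0 and 15 of class 1.
X = np.arange(80).reshape(40, 2)
y = np.array([0] * 25 + [1] * 15)

# stratify=y keeps the class ratio (about 5:3) in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(len(y_train), len(y_test))  # 28 12
print(np.bincount(y_test))        # per-class counts in the test split
```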

https://colab.research.google.com/drive/15ii9g4Kt64khxXRvelS0PlqLUkyRxHMG?authuser=1
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

KNeighborsClassifier()
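kNN classifies by distance, so features with large numeric ranges dominate the neighbor search. The model above receives raw values; a common alternative, sketched here on synthetic make_classification data rather than the real workbook, is to standardize features inside a Pipeline so the scaler is fitted on the training split only. Whether scaling helps on Out_20.xlsx is untested here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; feature 0 is inflated so it would dominate raw distances.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X[:, 0] *= 1000

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Raw features vs. features standardized inside a pipeline
# (the scaler's mean/std are learned from X_train only).
raw_score = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).score(X_test, y_test)
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_score = knn_scaled.fit(X_train, y_train).score(X_test, y_test)
print("raw:", raw_score, "scaled:", scaled_score)
```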

y_pred = knn.predict(X_test)


accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)

Accuracy: 0.8333333333333334
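An accuracy of 0.8333 corresponds to 10 of 12 test rows, so a single misclassification moves it by about 8 percentage points. Stratified k-fold cross-validation gives a steadier estimate by averaging over several splits. A sketch on synthetic data of similar size; substituting res and Y for X and y is the assumed adaptation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in of similar size; replace X, y with res, Y for the real data.
X, y = make_classification(n_samples=40, n_features=6, random_state=0)

# 5 stratified folds: each row is used for testing exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=cv, scoring="accuracy")
print("fold accuracies:", scores.round(3))
print("mean:", round(float(scores.mean()), 3), "std:", round(float(scores.std()), 3))
```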

# Rows are true labels, columns are predicted labels (sorted label order)
conf_matrix = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:')
print(conf_matrix)

Confusion Matrix:
[[7 0]
 [2 3]]
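Per-class metrics follow directly from the printed matrix. In scikit-learn's confusion_matrix, rows are true labels and columns are predicted labels, both in sorted label order; which real label each row stands for is not shown in the source, so the classes are referred to by position. This sketch recomputes accuracy and derives per-class recall and precision:

```python
import numpy as np

# Confusion matrix reported above (rows = true, columns = predicted).
cm = np.array([[7, 0],
               [2, 3]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)     # per true class (row sums)
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class (column sums)

print(round(float(accuracy), 4))   # 0.8333
print(recall.round(3).tolist())    # [1.0, 0.6]
print(precision.round(3).tolist()) # [0.778, 1.0]
```

For the second class, 3 of 5 true samples were found (recall 0.6), while every prediction of that class was correct (precision 1.0); the two errors are both second-class samples predicted as the first class.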
