0% found this document useful (0 votes)

61 views26 pages

HIV Regression Source Code

1. The document describes a Python module for importing, cleaning, and preparing data for machine learning. It defines classes to import data from a CSV file, clean the data by dropping duplicates, removing low variance features, imputing missing values, and controlling noise. 2. The data is then split into training and test sets using train_test_split. Target variables can be binarized using a threshold. 3. Additional classes visualize the target distribution in the training and test sets to check for imbalance. The cleaned and prepared data frames are accessible as attributes for further analysis or model training.

Uploaded by

Văn Thịnh

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

61 views26 pages

HIV Regression Source Code

Uploaded by

Văn Thịnh

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 26

Module 1: Import data

In [1]:
import numpy as np
import pandas as pd
import os
class import_data():
'use class.data to get data'
def __init__(self):
while True:
self.path = input("Please input the path file (EX:...HIV Classificat
checkPath = os.path.isfile(self.path)
if checkPath == True:
break
self.data = None
self.data_csv()
def data_csv(self):
self.data = pd.read_csv(self.path)
display(self.data.head(2))
while True:
try:
col_drop = input("Please input the columns you want to drop? (En
if len(col_drop.strip()) == 0:
break
self.data = self.data.drop([col_drop], axis = 1)
except:
print("Error columns")
display(self.data.head(2))

In [4]:
# /Users/macbook/Documents/CH2020/Database Regression/HIV regression/Database fu
df = import_data()
data = df.data

pChEMBL
Name Smiles nAcid ALogP
Value

0 1 O=C(N/N=C/c1c(O)cc(O)cc1)c1n[nH]c(C2CC2)c1 4.37 0 -1.8049 3.25766

1 2 O=C(NCc1occc1)c1cc(C(=O)NCc2occc2)cc(C(=O)NCc2... 5.10 0 -2.3319 5.43775

2 rows × 2529 columns

pChEMBL
nAcid ALogP ALogp2 AMR naAromAtom nAromBond nAtom nHeavyAtom
Value

0 4.37 0 -1.8049 3.257664 35.1281 11 11 35

1 5.10 0 -2.3319 5.437758 41.3184 21 21 54

2 rows × 2527 columns

Module 2: Data_cleaning
In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split

class Data_cleaned:

def init(self, data):

self.data_0= data
self.data = self.data_0.copy()

def Duplicate_data(self):

#Duplicate rows:
dup_rows = self.data.duplicated()
print(f"Total duplicated rows: {(dup_rows == True).sum()}")

print("Data befor drop duplicates:", self.data.shape[0])

self.data.drop_duplicates(inplace = True)
print("Data after drop duplicates:", self.data.shape[0])

#Duplicate collumns:
self.data = self.data.T
dup_cols = self.data.duplicated()
print(f"Total similar columns: {(dup_cols == True).sum()}")
print("Data befor drop duplicates:", self.data.shape[0])
self.data.drop_duplicates(inplace = True)

self.data = self.data.T
print("Data after drop duplicates:", self.data.shape[1])

def Variance_Threshold(self):
X = self.data.values[:, 1:]
y = self.data.values[:, 0]
print(X.shape, y.shape)

# Define thresholds to check

thresholds = np.arange(0.0, 1, 0.05)

# Apply transform with each threshold

results = list()
for t in thresholds:
# define the transform
transform = VarianceThreshold(threshold=t)
# transform the input data
X_sel = transform.fit_transform(X)
# determine the number of input features
n_features = X_sel.shape[1]
print('>Threshold=%.2f, Features=%d' % (t, n_features)) # store the
results.append(n_features)
# plot the threshold vs the number of selected features
plt.plot(thresholds, results)
plt.show()
#Clear variance Threshold:
def Low_variance_cleaning(self):
while True:
try:
inputthreshold=float(input("Please, input threshold you want:"
break
except:
print("Error threshold!")

def VarianceThreshold_selector(data, thresh):

df1 = data.copy(deep = True)
selector = VarianceThreshold(thresh)
selector.fit(df1)
features = selector.get_support(indices = False)
df2 = data.loc[:,features]
return df2

data5 = VarianceThreshold_selector(self.data, thresh = inputthreshold

data5.shape
self.data = data5

def Missing_value_cleaning(self):
(self.data.isnull().sum()).sum()
print("Total missing value", (self.data.isnull().sum()).sum())
null_data = self.data[self.data.isnull().any(axis=1)]
display(null_data)
print("Total row with missing value", null_data.shape[0])

print("Shape of Data after cleaning missing value:", self.data.shape

self.data.dropna(inplace = True)
print("Shape of Data before cleaning missing value:", self.data.shape

def Activate_Data_Cleaned(self):
self.Duplicate_data()
self.Variance_Threshold()
self.Low_variance_cleaning()
self.Missing_value_cleaning()

class noise_control(Data_cleaned):
def __init__(self, data):
self.data_0 = data
self.data = self.data_0.copy()
def feature_noise(self):
self.cols_remove = []
cols = []
while True:
feature_doub_1 = input("Please input 1st feature duplicated")
feature_doub_2 = input("Please input 2nd feature duplicated")
if feature_doub_1 and feature_doub_2 in self.data.columns:
cols.append(feature_doub_1)
cols.append(feature_doub_2)
self.cols_remove.append(feature_doub_1)
if len(feature_doub_1.strip()) == 0:
break
self.data_noise = self.data[cols]
def check_noise(self):
self.data_dif = pd.DataFrame()
for i in range(0, self.data_noise.shape[1]-1):
self.data_dif[f"{i}"] = self.data_noise.iloc[:, i+1] - self.
self.data_dif = self.data_dif.iloc[:, [i for i in range(0,self.data_dif

def check_index_noise(self):
self.idx = []
for i in range(0, self.data_dif.shape[1]):

for key, values in enumerate(self.data_dif.iloc[:,i]):

if values != 0.0:
self.idx.append(key)
self.idx = list(set(self.idx))

self.data.drop(self.idx, axis = 0, inplace = True)

self.data.drop(self.cols_remove, axis = 1, inplace = True)

def Activate_noise_control(self):
self.feature_noise()
self.check_noise()
self.check_index_noise()

class train_test_prepare(noise_control):

def init(self, data):

self.data_0 = data
self.data = self.data_0.copy()
self.Data_train = None
self.Data_test = None

def target_bin(self, thresh):

self.df1 = self.data.copy()
self.thresh = thresh
X_name=str(input("Please, input the X column's name:"))
t1 = self.df1[ X_name] < self.thresh
self.df1.loc[t1, X_name] = 0
t2 = self.df1[ X_name] >= self.thresh
self.df1.loc[t2, X_name] = 1
return self.df1

def Data_split(self):
self.df = self.data.copy()
while True:
self.RoC = input("Do you want to make classification?(Y/N)").title
if self.RoC == 'Y' or self.RoC == 'N':
break
if self.RoC.title() == "Y":
while True:
try:
self.thresh = float(input("Please input the threshold"))
break
except:
print("Error value!")
self.df1 = self.target_bin(thresh = self.thresh)
y = self.df1.iloc[:, 0].values
self.stratify = y
y = self.df1.iloc[:, 0].values
else:
self.stratify = None
y = self.df.iloc[:, 0].values

X = self.df.iloc[:, 1:].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size

#index:
self.idx = self.df.T.index

#Train:
self.df_X_train = pd.DataFrame(X_train)
self.df_y_train = pd.DataFrame(y_train)
self.df_train = pd.concat([self.df_y_train, self.df_X_train], axis
self.df_a = self.df_train.T
self.df_a = self.df_a.reset_index(drop = True)
for i in range(0,self.idx.size):
self.df_a.rename(index ={i: self.idx[i]},inplace= True)
self.Data_train = self.df_a.T

#test
self.df_X_test = pd.DataFrame(X_test)
self.df_y_test = pd.DataFrame(y_test)
self.df_test = pd.concat([self.df_y_test, self.df_X_test], axis = 1
self.df_b = self.df_test.T
self.df_b = self.df_b.reset_index(drop = True)
for i in range(0,self.idx.size):
self.df_b.rename(index ={i: self.idx[i]},inplace= True)
self.Data_test = self.df_b.T

def Visualize_target(self):
if self.RoC.title() == "Y":
plt.figure(figsize = (16,5))
plt.subplot(1,2,1)
plt.hist(self.Data_train.iloc[:,0])
plt.title(f'Imbalanced ratio: {((self.Data_train.iloc[:,0].values
plt.subplot(1,2,2)
plt.hist(self.Data_test.iloc[:,0])
plt.title(f'Imbalanced ratio: {((self.Data_test.iloc[:,0].values
plt.show()
else:
plt.figure(figsize = (16,5))
plt.subplot(1,2,1)
plt.hist(self.Data_train.iloc[:,0])
plt.title(f'Train distribution')
plt.subplot(1,2,2)
plt.hist(self.Data_test.iloc[:,0])
plt.title(f'Test distribution')
plt.show()

def Nomial(self):
DF_1 = self.data_0.select_dtypes("int64")
DF_2 = DF_1.loc[:, (DF_1.nunique() <10).values & (DF_1.max() <10).values
idx2 = DF_2.T.index #select columns with int64
idx3 = self.Data_train.T.index #select all columns in data_train
idx4 = idx2.intersection(idx3) #idx4 are int64 cols in Data_train
self.Data_train[idx4]=self.Data_train[idx4].astype('int64') #set all id
self.Data_test[idx4]=self.Data_test[idx4].astype('int64')
if self.RoC == 'Y':
self.Data_train.iloc[:,0] = self.Data_train.iloc[:,0].astype('int64'
self.Data_test.iloc[:,0] = self.Data_test.iloc[:,0].astype('int64'
def Activate(self):
self.Activate_Data_Cleaned()
self.Activate_noise_control()
self.Data_split()
self.Visualize_target()
self.Nomial()

In [6]:
df=train_test_prepare(data)
df.Activate()

Total duplicated rows: 42

Data befor drop duplicates: 1806
Data after drop duplicates: 1764
Total similar columns: 780
Data befor drop duplicates: 2527
Data after drop duplicates: 1747
(1764, 1746) (1764,)
>Threshold=0.00, Features=1744
>Threshold=0.05, Features=1080
>Threshold=0.10, Features=946
>Threshold=0.15, Features=863
>Threshold=0.20, Features=804
>Threshold=0.25, Features=675
>Threshold=0.30, Features=660
>Threshold=0.35, Features=647
>Threshold=0.40, Features=636
>Threshold=0.45, Features=632
>Threshold=0.50, Features=627
>Threshold=0.55, Features=625
>Threshold=0.60, Features=617
>Threshold=0.65, Features=611
>Threshold=0.70, Features=601
>Threshold=0.75, Features=594
>Threshold=0.80, Features=589
>Threshold=0.85, Features=584
>Threshold=0.90, Features=579
>Threshold=0.95, Features=575

Error threshold!

Total missing value 35

pChEMBL
nAcid ALogP ALogp2 AMR naAromAtom nAromBond nAtom nHeavyAto
Value

1003 5.98 0.0 -1.7872 3.194084 55.9045 6.0 6.0 38.0

1 rows × 1081 columns

Total row with missing value 1

Shape of Data after cleaning missing value: (1764, 1081)
Shape of Data before cleaning missing value: (1763, 1081)

Các cặp thông số mang ý nghĩa giống nhau nhưng không thể loại bỏ bằng
Duplicated_row! (Cần cập nhật để loại noise)

diameter : topoDiameter
radius : topoRadius
weinerPath : WPATH
weinerPol : WPOL
zagreb : Zagreb

In [7]:
Data_train = df.Data_train
Data_test = df.Data_test

Module 3: Outlier handling

In [9]:
import warnings
warnings.filterwarnings(action='ignore')
from sklearn.preprocessing import PowerTransformer, QuantileTransformer, KBinsDi
import matplotlib.pyplot as plt
import seaborn as sns
class Check_Univariate_outlier():
def __init__(self, data_train=None, data_test=None):
self.data_train_0 = data_train
self.data_test_0 = data_test
self.data_train = self.data_train_0.copy()
self.data_test = self.data_test_0.copy()
self.data_train_clean = None
self.data_test_clean = None
self.df = None
self.good = []
self.bad = []
self.scl1 = PowerTransformer()
self.scl2 = QuantileTransformer(output_distribution = "normal")
self.scl3 = QuantileTransformer(output_distribution = "uniform")

def Check_remove_data(self): #self.

self.df_train = self.data_train.copy()
self.df_test = self.data_test.copy()
for col_name in self.df_train.select_dtypes("float64").columns:
q1 = self.df_train[col_name].quantile(0.25)
q3 = self.df_train[col_name].quantile(0.75)
iqr = q3 - q1
low = q1-1.5*iqr
high = q3+1.5*iqr
self.df_train = self.df_train[(self.df_train[col_name] <= high)
self.df_test = self.df_test[(self.df_test[col_name] <= high) &
print("Total data remove on Train", self.data_train_0.shape[0] -self
print("Total data remove on Test", self.data_test_0.shape[0] -self.
self.data_train_clean = self.df_train
self.data_test_clean = self.df_test

def Check_quantity_features(self):
self.good = []
self.bad = []
self.df_train = self.data_train.copy()
self.df_test = self.data_test.copy()
for col_name in self.df_train.select_dtypes("float64").columns:
q1 = self.df_train[col_name].quantile(0.25)
q3 = self.df_train[col_name].quantile(0.75)
iqr = q3-q1
remove = self.data_train.shape[0] - (self.df_train[(self.df_train
if remove == 0:
self.good.append(col_name)
else:
self.bad.append(col_name)
print(f"Number of good features: {len(self.good)}")
print(f"Number of bad features with data remove > 0: {len(self.bad)
print("*"*75)
def Check_remove_outlier(self):
self.Check_remove_data()
self.Check_quantity_features()

def Outlier_Winsor(self):
print("Handling with Winsorization method")
self.df_train = self.data_train_0.copy()
self.df_test = self.data_test_0.copy()
for col_name in self.df_train.select_dtypes(include="float64").columns
q1 = self.df_train[col_name].quantile(0.25)
q3 = self.df_train[col_name].quantile(0.75)
iqr = q3-q1
self.df_train.loc[(self.df_train[col_name] <= (q1-1.5*iqr)), col_nam
self.df_train.loc[(self.df_train[col_name] >= (q3+1.5*iqr)), col_nam
#for test
self.df_test.loc[(self.df_test[col_name] <= (q1-1.5*iqr)), col_name
self.df_test.loc[(self.df_test[col_name] >= (q3+1.5*iqr)), col_name
self.data_train = self.df_train
self.data_test = self.df_test
self.Check_remove_outlier()

def Transformation(self):
self.df_train = self.data_train_0.copy()
self.df_test = self.data_test_0.copy()
#Train
while True:
try:
self.transformer = int(input("Please select type of transformati
break
except:
print("Error values! Input number!")
if self.transformer == 1:
self.scl =self.scl1
print("Handling with Transformation_Powertransformer method")
elif self.transformer == 2:
self.scl =self.scl2
print("Handling with Transformation_Gaussiantransformer method"
else:
self.scl =self.scl3
print("Handling with Transformation_Uniformtransformer method")

#Train

df_train_int = self.df_train.select_dtypes("int64")
df_train_int = df_train_int.reset_index(drop = True)

y_train = self.df_train.select_dtypes("float64").iloc[:,0].values
X_train = self.df_train.select_dtypes("float64").iloc[:,1:].values

self.scl.fit(X_train)
X_train_trans = self.scl.transform(X_train)
idx = self.df_train.select_dtypes("float64").T.index

df_X_train = pd.DataFrame(X_train_trans)
df_y_train = pd.DataFrame(y_train)
df_train = pd.concat([df_y_train, df_X_train], axis = 1)
df_a = df_train.T
df_a = df_a.reset_index(drop = True)
for i in range(0,idx.size):
df_a.rename(index ={i: idx[i]},inplace= True)
Data_train_float = df_a.T
self.data_train = pd.concat([Data_train_float , df_train_int], axis

#test
df_test_int = self.df_test.select_dtypes("int64")
df_test_int = df_test_int.reset_index(drop = True)

y_test = self.df_test.select_dtypes("float64").iloc[:,0].values
X_test = self.df_test.select_dtypes("float64").iloc[:,1:].values

X_test_trans = self.scl.transform(X_test)
idx = self.df_test.select_dtypes("float64").T.index

df_X_test = pd.DataFrame(X_test_trans)
df_y_test = pd.DataFrame(y_test)
df_test = pd.concat([df_y_test, df_X_test], axis = 1)
df_b = df_test.T
df_b = df_b.reset_index(drop = True)
for i in range(0,idx.size):
df_b.rename(index ={i: idx[i]},inplace= True)
Data_test_float = df_b.T
self.data_test = pd.concat([Data_test_float , df_test_int], axis =
self.Check_remove_outlier()
input_point = input("Do you want to use KBin method for this Transformat
point = input_point.title()
if point == "Y":
self.KBin()
else:
pass

def KBin (self):

print("Handling with KBin method")
#Train
self.data_train_int = self.data_train.select_dtypes('int64')
self.data_train_good = self.data_train[self.good]
self.data_train_bad = self.data_train[self.bad]
self.n_bins = int(input("Please input number of bins"))
self.encode = input("Please input type of encode")
self.strategy = input("Please input type of strategy")
kst = KBinsDiscretizer(n_bins = 3, encode = self.encode, strategy =
kst.fit(self.data_train_bad)
self.bad_new = pd.DataFrame(kst.transform(self.data_train_bad)).astype

self.data_train_clean = pd.concat([self.data_train_good,self.bad_new
self.data_train = self.data_train_clean

#test
self.data_test_int = self.data_test.select_dtypes('int64')
self.data_test_good = self.data_test[self.good]
self.data_test_bad = self.data_test[self.bad]
self.bad_new = pd.DataFrame(kst.transform(self.data_test_bad)).astype
self.data_test_clean = pd.concat([self.data_test_good,self.bad_new,
self.data_test = self.data_test_clean

self.Check_remove_outlier()

def Activate_Check(self):
print('remove by IQR without handling')
self.Check_remove_outlier()
self.Outlier_Winsor()
self.Transformation()

In [10]:
df1 = Check_Univariate_outlier(Data_train, Data_test)
df1.Check_remove_outlier()
df1.Activate_Check()
Total data remove on Train 1399
Total data remove on Test 351
Number of good features: 73
Number of bad features with data remove > 0: 673
***************************************************************************
remove by IQR without handling
Total data remove on Train 1399
Total data remove on Test 351
Number of good features: 73
Number of bad features with data remove > 0: 673
***************************************************************************
Handling with Winsorization method
Total data remove on Train 0
Total data remove on Test 0
Number of good features: 746
Number of bad features with data remove > 0: 0
***************************************************************************

Handling with Transformation_Uniformtransformer method

Total data remove on Train 1398
Total data remove on Test 350
Number of good features: 700
Number of bad features with data remove > 0: 46
***************************************************************************

Handling with KBin method

Total data remove on Train 0

Total data remove on Test 0
Number of good features: 700
Number of bad features with data remove > 0: 0
***************************************************************************

In [11]:
Data_train = df1.data_train
Data_test = df1.data_test

In [13]:
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.covariance import EllipticEnvelope
from sklearn.neighbors import LocalOutlierFactor

LOF = LocalOutlierFactor(n_neighbors = 20)

robust_cov = EllipticEnvelope(contamination=0.1)
emp_cov = EllipticEnvelope(contamination=0.1, support_fraction =1)
o_SVM = OneClassSVM()
iso_forest = IsolationForest(n_estimators=100, contamination=0.10)
class Mutivariate():
def __init__(self, data_train, data_test):
self.data_train_0 = data_train
self.data_test_0 = data_test

def LOF(self):
self.data_train_LOF = self.data_train_0.copy()
self.data_test_LOF = self.data_test_0.copy()
while True:
try:
self.n_neighbors = int(input("Please input number of neighbors f
break
except:
print("Error values!")
LOF = LocalOutlierFactor(n_neighbors = self.n_neighbors)
LOF.fit(self.data_train_LOF)
self.Outlier_LOF = self.data_train_LOF[LOF.fit_predict(self.data_train_L
self.Data_train_LOF = self.data_train_LOF[LOF.fit_predict(self.data_trai
print(f"Total outlier remove by LOF:", self.Outlier_LOF.shape[0])
#Test
LOF = LocalOutlierFactor(n_neighbors = self.n_neighbors, novelty =
LOF.fit(self.data_train_LOF)
self.Data_test_LOF = self.data_test_LOF[LOF.predict(self.data_test_LOF

def Ist_for(self):
self.data_train_Ist_for = self.data_train_0.copy()
self.data_test_Ist_for = self.data_test_0.copy()
while True:
try:
self.n_estimators = int(input("Please input number of estimators
self.contamination = float(input("Please input number of contami
break
except:
print("Error values!")
Iso_for = IsolationForest(n_estimators=self.n_estimators, contamination
Iso_for.fit(self.data_train_Ist_for)
self.Outlier_iso = self.data_train_Ist_for[Iso_for.predict(self.data_tra
self.Data_train_iso = self.data_train_Ist_for[Iso_for.predict(self.
self.Data_test_iso = self.data_test_Ist_for[Iso_for.predict(self.data_te
print(f"Total outlier remove by Isolation forest:", self.Outlier_iso

def o_SVM(self):
self.data_train_o_SVM = self.data_train_0.copy()
self.data_test_o_SVM = self.data_test_0.copy()
o_SVM = OneClassSVM()
o_SVM.fit(self.data_train_o_SVM)
self.Outlier_osvm = self.data_train_o_SVM[o_SVM.predict(self.data_train_
self.Data_train_osvm = self.data_train_o_SVM[o_SVM.predict(self.data_tra
self.Data_test_osvm = self.data_test_o_SVM[o_SVM.predict(self.data_test_
print(f"Total outlier remove by One Class SVM:", self.Outlier_osvm.

def robust_cov(self):
self.data_train_r_cov = self.data_train_0.copy()
self.data_test_r_cov = self.data_test_0.copy()
while True:
try:
self.contamination = float(input("Please input number of contami
break
except:
print("Error values!")
robust_cov = EllipticEnvelope(contamination= self.contamination)
robust_cov.fit(self.data_train_r_cov)
self.Outlier_rcov = self.data_train_r_cov[robust_cov.predict(self.data_t
self.Data_train_rcov = self.data_train_r_cov[robust_cov.predict(self
self.Data_test_rcov = self.data_test_r_cov[robust_cov.predict(self.
print(f"Total outlier remove by Robust covariance:", self.Outlier_rcov

def emp_cov(self):
self.data_train_e_cov = self.data_train_0.copy()
self.data_test_e_cov = self.data_test_0.copy()
while True:
try:
self.contamination = float(input("Please input number of contami
self.support_fraction = float(input("Please input number of supp
break
except:
print("Error values!")
emp_cov = EllipticEnvelope(contamination= self.contamination, support_fr
emp_cov.fit(self.data_train_e_cov)
self.Outlier_ecov = self.data_train_e_cov[emp_cov.predict(self.data_trai
self.Data_train_ecov = self.data_train_e_cov[emp_cov.predict(self.data_t
self.Data_test_ecov = self.data_test_e_cov[emp_cov.predict(self.data_tes
print(f"Total outlier remove by Emperical covariance:", self.Outlier_eco
def Visualize_Outlier(self):
self.LOF()
self.Ist_for()
self.o_SVM()
self.robust_cov()
self.emp_cov()
Models = [('Local Outlier Factor', self.Outlier_LOF.shape[0]), ('Isolat
('One Class SVM', self.Outlier_osvm.shape[0]),('Robust covaria
for name, N_out in Models:
plt.rcParams["figure.figsize"] = (20,8)
plt.bar(name,N_out)
def Mutivariate_Outlier_Handling(self):
while True:
try:
algo = input("Please select algorithm for multivariate method:
break
except:
print("Wrong! Please input number from 1-5.")
if algo == 1:
self.LOF()
elif algo == 2:
self.Ist_for()
elif algo == 3:
self.o_SVM()
elif algo == 4:
self.robust_cov()
elif algo == 5:
self.emp_cov()
else:
self.Mutivariate_Outlier_Handling()

In [ ]:
df4= Mutivariate(Data_train, Data_test)
df4.Visualize_Outlier()

In [14]:
df4= Mutivariate(Data_train, Data_test)
df4.LOF()

Total outlier remove by LOF: 27

In [15]:
Data_train = df4.Data_train_LOF
Data_test = df4.Data_test_LOF

Module 4: Rescale
In [16]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
class rescale(Mutivariate):
def __init__(self, data_train, data_test):
self.data_train_0 = data_train
self.data_test_0 = data_test
self.scl1 = MinMaxScaler()
self.scl2 = StandardScaler()
self.scl3 = RobustScaler()
def rescale_fit(self):
self.data_train = self.data_train_0.copy()
self.data_test = self.data_test_0.copy()
while True:
try:
self.transformer = int(input("Please select type of transformati
break
except:
print("Error value")
if self.transformer == 1:
self.scl =self.scl1
elif self.transformer == 2:
self.scl =self.scl2
else:
self.scl =self.scl3
#Train
df_train_int = self.data_train.select_dtypes("int64")
df_train_int = df_train_int.reset_index(drop = True)

y_train = self.data_train.select_dtypes("float64").iloc[:,0].values
X_train = self.data_train.select_dtypes("float64").iloc[:,1:].values

self.scl.fit(X_train)
X_train_trans = self.scl.transform(X_train)
idx = self.data_train.select_dtypes("float64").T.index

df_X_train = pd.DataFrame(X_train_trans)
df_y_train = pd.DataFrame(y_train)
df_train = pd.concat([df_y_train, df_X_train], axis = 1)
df_a = df_train.T
df_a = df_a.reset_index(drop = True)
for i in range(0,idx.size):
df_a.rename(index ={i: idx[i]},inplace= True)
Data_train_float = df_a.T
self.Data_train = pd.concat([Data_train_float , df_train_int], axis

#Test
df_test_int = self.data_test.select_dtypes("int64")
df_test_int = df_test_int.reset_index(drop = True)

y_test = self.data_test.select_dtypes("float64").iloc[:,0].values
X_test = self.data_test.select_dtypes("float64").iloc[:,1:].values

X_test_trans = self.scl.transform(X_test)
idx = self.data_test.select_dtypes("float64").T.index

In [17]:
df5 = rescale(Data_train, Data_test)
df5.rescale_fit()

In [18]:
df5.Data_train.head(2)

Out[18]: pChEMBL
ALogP ALogp2 AMR naAromAtom nAromBond nAtom nHeavyAtom
Value

0 8.4 -1.192863 1.154339 1.061050 -0.899420 -0.114518 0.770779 0.863838

1 6.3 -1.641491 1.635178 -1.545092 -0.074438 -0.114518 -1.227154 -1.383974

2 rows × 1085 columns

In [19]:
Data_train = df5.Data_train
Data_test = df5.Data_test

In [20]:
X_train = Data_train.iloc[:,1:].values
y_train = Data_train.iloc[:,0].values
X_test = Data_test.iloc[:,1:].values
y_test = Data_test.iloc[:,0].values

Module 5: Feature selection

In [27]:
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.linear_model import LinearRegression, LassoCV, ElasticNetCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor
from sklearn.model_selection import KFold, cross_val_score, RepeatedKFold
from sklearn.svm import LinearSVR

In [28]:
import matplotlib.pyplot as plt
class feature_selection:
def __init__(self, data_train, data_test):
self.X_train = data_train.iloc[:,1:].values
self.y_train = data_train.iloc[:,0].values
self.X_test = data_test.iloc[:,1:].values
self.y_test = data_test.iloc[:,0].values
self.result = list()
self.name = list()
def random_forest(self):
forest = RandomForestRegressor(random_state=42)
forest.fit(self.X_train, self.y_train)
model_RF = SelectFromModel(forest, prefit=True)
self.X_train_new = model_RF.transform(self.X_train)
self.X_test_new = model_RF.transform(self.X_test)
self.name.append("Random Forest")
self.check_intenal_performance()

def extra_tree(self):
ext_tree = ExtraTreesRegressor(random_state=42)
ext_tree.fit(self.X_train, self.y_train)
model_ext_tree = SelectFromModel(ext_tree, prefit=True)
self.X_train_new = model_ext_tree.transform(self.X_train)
self.X_test_new = model_ext_tree.transform(self.X_test)
self.name.append("ExtraTree")

self.check_intenal_performance()

def ada(self):
ada = AdaBoostRegressor(random_state=42)
ada.fit(self.X_train, self.y_train)
model_ada = SelectFromModel(ada, prefit=True)
self.X_train_new = model_ada.transform(self.X_train)
self.X_test_new = model_ada.transform(self.X_test)
self.name.append("AdaBoost")

self.check_intenal_performance()

def grad(self):
grad = GradientBoostingRegressor(random_state=42)
grad.fit(self.X_train, self.y_train)
model_grad = SelectFromModel(grad, prefit=True)
self.X_train_new = model_grad.transform(self.X_train)
self.X_test_new = model_grad.transform(self.X_test)
self.name.append("GradientBoost")
self.check_intenal_performance()

def XGb(self):
XGb = XGBRegressor(random_state=42)
XGb.fit(self.X_train, self.y_train)
model_XGb = SelectFromModel(XGb, prefit=True)
self.X_train_new = model_XGb.transform(self.X_train)
self.X_test_new = model_XGb.transform(self.X_test)
self.name.append("XGBoost")
self.check_intenal_performance()

def Lasso(self):
lasso = LassoCV(random_state = 42)
lasso.fit(self.X_train, self.y_train)
model_lasso = SelectFromModel(lasso, prefit=True)
self.X_train_new = model_lasso.transform(self.X_train)
self.X_test_new = model_lasso.transform(self.X_test)
self.name.append("lasso")
self.check_intenal_performance()

def ELN(self):
ELN = ElasticNetCV(random_state = 42)
ELN.fit(self.X_train, self.y_train)
model_ELN = SelectFromModel(ELN, prefit=True)
self.X_train_new = model_ELN.transform(self.X_train)
self.X_test_new = model_ELN.transform(self.X_test)
self.name.append("ElasticNet")
self.check_intenal_performance()

def feature_importance(self):
model = RandomForestRegressor(random_state = 42)
model.fit(self.X_train, self.y_train)
importance = model.feature_importances_
while True:
threshold = float(input("Select features importances threshold"
print("The remain features = ", (importance > threshold).sum())
action = input("Do you want to check another threshold?(Y/N)")
if action.title() == 'N':
break
self.X_train_new=self.X_train[:,importance > threshold]
self.X_test_new= self.X_test[:,importance > threshold]
self.name.append("Feature Importance")
self.check_intenal_performance()

def check_performance(self):
forest_model = RandomForestRegressor(random_state=42)
forest_model.fit(self.X_train_new, self.y_train)

self.r2= r2_score(y_test, forest_model.predict(self.X_test_new))

self.MSE = mean_squared_error(y_test, forest_model.predict(self.X_test_n
self.RMSE = np.sqrt(self.MSE)
self.MAE = mean_absolute_error(y_test, forest_model.predict(self.X_test_

print("R2 = ", self.r2)

print("MSE = ", self.MSE)
print("RMSE = ", self.RMSE)
print("MAE = ", self.MAE)

def check_intenal_performance(self):
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=42)
in_model = RandomForestRegressor(random_state=42)
score_internal = cross_val_score(in_model, self.X_train_new, self.y_trai
print(score_internal.mean())
self.result.append(score_internal)

def model_feature_selection(self):
while True:
try:
models = int(input("Please select algorithm for feature selectio
break
except:
print("\nWrong values! Input number from 1-5!")
if models == 1:
self.random_forest()
elif models == 2:
self.extra_tree()
elif models ==3:
self.ada()
elif models == 4:
self.grad()
elif models == 5:
self.XGb()
elif models == 6:
self.feature_importance()
else:
self.model_feature_selection()
def compare_model(self):
fig =plt.figure(figsize = (20,8))
self.result = list()
self.name = list()
self.random_forest()
self.extra_tree()
self.ada()
self.grad()
self.XGb()
self.Lasso()
self.ELN()
self.feature_importance()

plt.boxplot(self.result, labels=self.name, showmeans=True)

plt.show()
fig.savefig("Compare feature selection method.png", transparent = True

In [29]:
Descriptor_select = feature_selection(Data_train, Data_test)
Descriptor_select.compare_model()

-0.7358630025248843
-0.7281728504218296
-0.745356557436394
-0.7261001611653198
-0.7343649689156617
-0.7552536490535624
-0.7507194810054509

The remain features = 180

-0.7360665483457994
In [31]:
# Use Anova test to choose feature selection method
d = pd.DataFrame(Descriptor_select.result)
idx = Descriptor_select.name
for i in range(0,len(idx)):
d.rename(index ={i: idx[i]},inplace= True)
check_result = d.T

import scipy.stats as stats

# stats f_oneway functions takes the groups as input and returns ANOVA F and p v
Ftest = stats.f_oneway(check_result['Random Forest'], check_result['ExtraTree'
print(f"FTest pvalue = {Ftest[1]}")

Ttest = stats.ttest_ind(check_result['Random Forest'], check_result['ExtraTree'

print(f"TTest pvalue = {Ttest[1]}")

FTest pvalue = 0.8552435952594457

TTest pvalue = 0.6130101128631499
Cả 2 test đều có giá trị p_value > 0.05 => khác biệt giữa các pp không có y nghĩa thống
kê

Trong đó có 3pp:

Extra Tree
XGboost
ElasticNet CV

15 fold đều cho kết quả không lệch quá nhiều Chọn KQ tốt nhất là Extra Tree

In [35]:
Descriptor_select = feature_selection(Data_train, Data_test)
Descriptor_select.extra_tree()

-0.7281728504218296
RFE METHOD # evaluate RFE for regression from numpy import mean from numpy import std from
sklearn.datasets import make_regression from sklearn.model_selection import cross_val_score from
sklearn.model_selection import RepeatedKFold from sklearn.feature_selection import RFECV from
sklearn.pipeline import Pipeline # create pipeline rfe =
RFECV(estimator=RandomForestRegressor(random_state=42)) model =
RandomForestRegressor(random_state=42) pipeline = Pipeline(steps=[('s',rfe),('m',model)]) # evaluate
model cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=42) n_scores =
cross_val_score(pipeline, X_train, y_train, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1,
error_score='raise') # report performance print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Module 6: Model Prepare

In [36]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, ElasticNetCV, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.cross_decomposition import PLSRegression

In [37]:
X_train = Descriptor_select.X_train_new
X_test = Descriptor_select.X_test_new
y_train = Descriptor_select.y_train
y_test = Descriptor_select.y_test

1. Auto Model
In [ ]:
from Auto_ML.Auto_ML_HHC import LabHHCRegressor
reg = LabHHCRegressor(verbose=0, ignore_warnings=False, custom_metric=None)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
models

In [114…
models

Out[114… Adjusted R- R- Time

RMSE MAE MAPE
Squared Squared Taken

Model

CatBoostRegressor 0.62 0.76 0.62 0.47 0.08 11.57

ExtraTreesRegressor 0.62 0.75 0.63 0.46 0.08 2.43

HistGradientBoostingRegressor 0.61 0.75 0.63 0.47 0.08 1.09

LGBMRegressor 0.61 0.75 0.64 0.48 0.08 0.57

SVR 0.59 0.73 0.65 0.49 0.08 0.25

RandomForestRegressor 0.58 0.73 0.66 0.49 0.08 4.41

KNeighborsRegressor 0.58 0.73 0.66 0.49 0.08 0.04

NuSVR 0.58 0.73 0.66 0.50 0.08 0.23

MLPRegressor 0.54 0.71 0.69 0.52 0.09 1.37

GradientBoostingRegressor 0.53 0.70 0.70 0.54 0.09 2.12

XGBRegressor 0.53 0.70 0.70 0.53 0.09 0.72

BaggingRegressor 0.49 0.67 0.73 0.54 0.09 0.48

SGDRegressor 0.36 0.59 0.81 0.64 0.11 0.03

RidgeCV 0.36 0.59 0.81 0.65 0.11 0.03

Ridge 0.36 0.59 0.82 0.65 0.11 0.02

HuberRegressor 0.35 0.58 0.82 0.63 0.11 0.12

BayesianRidge 0.35 0.58 0.82 0.66 0.11 0.04

AdaBoostRegressor 0.35 0.58 0.82 0.68 0.11 0.74

LinearRegression 0.35 0.58 0.82 0.65 0.11 0.02

TransformedTargetRegressor 0.35 0.58 0.82 0.65 0.11 0.02

PoissonRegressor 0.34 0.58 0.82 0.66 0.11 0.02

ElasticNetCV 0.34 0.57 0.83 0.66 0.11 0.43

LassoCV 0.34 0.57 0.83 0.66 0.11 0.54

LassoLarsCV 0.33 0.57 0.83 0.66 0.11 0.21

LinearSVR 0.31 0.56 0.84 0.65 0.11 0.23

LassoLarsIC 0.30 0.55 0.85 0.67 0.11 0.06

ExtraTreeRegressor 0.30 0.55 0.85 0.61 0.10 0.05

GammaRegressor 0.28 0.54 0.86 0.69 0.12 0.02

TweedieRegressor 0.28 0.54 0.86 0.70 0.12 0.02

DecisionTreeRegressor 0.26 0.53 0.87 0.61 0.10 0.11

LarsCV 0.21 0.50 0.90 0.73 0.12 0.16

OrthogonalMatchingPursuitCV 0.20 0.48 0.91 0.73 0.12 0.03

OrthogonalMatchingPursuit 0.20 0.48 0.91 0.73 0.12 0.02

PLSRegression 0.16 0.46 0.93 0.76 0.13 0.02

PassiveAggressiveRegressor -0.26 0.19 1.14 0.93 0.16 0.02

ElasticNet -0.27 0.19 1.14 0.99 0.16 0.02

LassoLars -0.57 -0.01 1.28 1.10 0.18 0.02

DummyRegressor -0.57 -0.01 1.28 1.10 0.18 0.02

Lasso -0.57 -0.01 1.28 1.10 0.18 0.02

RANSACRegressor -1.99 -0.92 1.76 1.33 0.22 0.38

Lars -6.56 -3.85 2.80 1.87 0.31 0.07

GaussianProcessRegressor -32.12 -20.25 5.85 5.38 0.86 0.58

KernelRidge -36.54 -23.09 6.23 6.18 1.02 0.10

2. Model from scratch

In [111…
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, NuSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier,AdaBoostClassifier,
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB, BernoulliNB
class Regression_report:
def __init__(self, X_train, X_test, y_train, y_test, metric = None, create_d
self.X_train = X_train
self.X_test = X_test
self.y_train = y_train
self.y_test = y_test
self.metric = metric
self.df_compare_train = pd.DataFrame(columns =["R-squared", "Adjusted R-
self.df_compare_test = self.df_compare_train.copy()
self.create_df = create_df
self.lr = LinearRegression()
self.Rg = Ridge(alpha = 1, random_state = 42)
self.eln = ElasticNetCV(cv = KFold(5), random_state = 42)
self.PLS = PLSRegression(20)
self.knn = KNeighborsRegressor()
self.svr = SVR(kernel='rbf', gamma='scale', coef0=0.0, tol=0.001, C
self.dt = DecisionTreeRegressor(random_state=42)
self.rf = RandomForestRegressor(random_state=42)
self.ada = AdaBoostRegressor(random_state=42)
self.gbr = GradientBoostingRegressor(random_state=42)
self.xgb = XGBRegressor(random_state=42)
self.cb = CatBoostRegressor(random_state=42, verbose = 0)

def model(self):
self.regressors = [('Linear Regression', self.lr),('Ridge Regression'
('Decision Tree', self.dt), ('Random Forest', self.rf), ('AdaBoos
('Gradient Boosting Regressor', self.gbr), ('XGBoost', self.xgb

for self.name, self.estimator in self.regressors:

self.estimator.fit(self.X_train, self.y_train)
self.Report_metrics()

def adjusted_rsquared(r2, n, p):

return 1 - (1-r2) * ((n-1) / (n-p-1))

def Report_metrics(self):

self.P_train =self.estimator.predict(self.X_train)
self.P_test =self.estimator.predict(self.X_test)

if self.create_df==True:
r2_train = r2_score(self.y_train,self.P_train)
r2_test = r2_score(self.y_test,self.P_test)
r_squared_train = (1 - (1-r2_train) * ((self.X_train.shape[0]-1
r_squared_test = (1 - (1-r2_test) * ((self.X_test.shape[0]-1) /

ind =["R-squared", "Adjusted R-squared","RMSE","MAE","MedAE","MAPE"

self.metrics_df =pd.DataFrame(index=ind,data=[[r2_score (self
[r_squared_train,
[mean_squared_error
[mean_absolute_error
[median_absolute_error
[mean_absolute_percenta
[len(self.y_train)

#train
self.metrics_df["Estimator Name"]= self.name
df_compared_train = self.metrics_df.drop(['Test', "Estimator Name"
df_compared_train = df_compared_train.rename(columns ={'Train':
self.df_compare_train = self.df_compare_train.append(df_compared_tra

#test
self.metrics_df["Estimator Name"]= self.name
df_compared_test = self.metrics_df.drop(['Train', "Estimator Name"
df_compared_test = df_compared_test.rename(columns ={'Test': self
self.df_compare_test = self.df_compare_test.append(df_compared_test

else:
self.metrics_df ="File not created"

def Visualize_report(self):
while True:
try:
metric = int(input("Which metric do you want to visualize?\n\t
break
except:
print("Wrong metric! Please input number!")
for self.name, self.regressor in self.regressors:

# Fit regressor to the training set

self.regressor.fit(self.X_train, self.y_train)

# Predict
y_pred = self.regressor.predict(X_test)

# Evaluate performance on the test set

if metric == 1:
name = 'R_squared'
result = round(r2_score(self.y_test,y_pred),3)*100
if metric == 2:
name = 'Root mean squared error'
result = round(mean_squared_error(self.y_test,y_pred),3)
if metric == 3:
name = 'Mean absolute error'
result = round(mean_absolute_error(self.y_test,y_pred),3)
if metric == 4:
name = 'Median absolute error'
result = round(median_absolute_error(self.y_test,y_pred),3)
if metric == 5:
name = 'Mean absolute percentage error'
result = round(mean_absolute_percentage_error(self.y_test,y_pred

plt.rcParams["figure.figsize"] = (36,15)
ax = plt.bar(self.name,result)
plt.ylabel(name)
plt.xlabel("Algorithm")
plt.title(f"{name} compare", size = 20)

for p in ax.patches:
x = p.get_x()+ (p.get_width()/3)
y = p.get_height()+0.05
plt.text(x, y, round(result,3), fontsize=15)

In [112…
auto_models = Regression_report(X_train, X_test, y_train, y_test, create_df
auto_models.model()

In [113…
auto_models.Visualize_report()

In [115…
auto_models.df_compare_test

Out[115… R- Adjusted R- No. of

RMSE MAE MedAE MAPE
squared squared obs.

Linear Regression 0.58 0.35 0.82 0.65 0.54 0.11 347.00

Ridge Regression 0.59 0.36 0.81 0.65 0.54 0.11 347.00

Elastic Net 0.58 0.34 0.83 0.66 0.53 0.11 347.00

Partial Least Squares 0.59 0.36 0.81 0.64 0.52 0.11 347.00

K Nearest Neighbours 0.68 0.50 0.72 0.52 0.36 0.09 347.00

Support vector
0.71 0.56 0.68 0.50 0.39 0.09 347.00
machine

Decision Tree 0.53 0.26 0.87 0.61 0.44 0.10 347.00

Random Forest 0.73 0.58 0.66 0.49 0.38 0.08 347.00

AdaBoost 0.58 0.35 0.82 0.68 0.61 0.11 347.00

Gradient Boosting
0.70 0.53 0.70 0.54 0.44 0.09 347.00
Regressor

XGBoost 0.70 0.53 0.70 0.53 0.42 0.09 347.00

catboost 0.75 0.62 0.63 0.47 0.37 0.08 347.00

In [116…
auto_models.df_compare_train

Out[116… R- Adjusted R- No. of

RMSE MAE MedAE MAPE
squared squared obs.

Linear Regression 0.69 0.65 0.76 0.59 0.50 0.10 1376.00

Ridge Regression 0.69 0.65 0.76 0.59 0.50 0.10 1376.00

Elastic Net 0.67 0.64 0.78 0.61 0.52 0.11 1376.00

Partial Least Squares 0.68 0.65 0.76 0.59 0.50 0.10 1376.00

K Nearest Neighbours 0.81 0.79 0.59 0.42 0.31 0.07 1376.00

Support vector
0.84 0.83 0.53 0.35 0.18 0.06 1376.00
machine

Decision Tree 0.97 0.97 0.23 0.05 0.00 0.01 1376.00

Random Forest 0.94 0.93 0.33 0.22 0.15 0.04 1376.00

AdaBoost 0.67 0.63 0.78 0.66 0.65 0.11 1376.00

Gradient Boosting
0.84 0.83 0.53 0.41 0.33 0.07 1376.00
Regressor

XGBoost 0.97 0.97 0.23 0.07 0.02 0.01 1376.00

catboost 0.96 0.95 0.28 0.17 0.11 0.03 1376.00

Tunning SVM
In [117…
models = SVR()

In [141…
from sklearn.model_selection import GridSearchCV
cv = RepeatedKFold(5,3)
param_grid = {'C': [0.1,1, 10, 100], 'gamma': [1,0.1,0.01,0.001],'kernel':
grid = GridSearchCV(models,param_grid,refit=True,verbose=1, cv = cv)
grid.fit(X_train,y_train)

Fitting 15 folds for each of 48 candidates, totalling 720 fits

Out[141… GridSearchCV(cv=RepeatedKFold(n_repeats=3, n_splits=5, random_state=None),
estimator=SVR(),
param_grid={'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.
001],
'kernel': ['rbf', 'poly', 'sigmoid']},
verbose=1)

In [143…
print(grid.best_estimator_.C)
print(grid.best_estimator_.gamma)
print(grid.best_estimator_.kernel)

1
0.01
rbf
In [144…
#Before
svr = SVR()
svr.fit(X_train, y_train)
RMSE = mean_squared_error(y_test, svr.predict(X_test), squared = False)
MAPE = mean_absolute_percentage_error(y_test, svr.predict(X_test))
MAPE*100

Out[144… 8.547101079121475

In [145…
#After tunning
model_tuning = SVR(kernel = 'rbf', gamma = 0.01, C = 1)
model_tuning.fit(X_train, y_train)
RMSE = mean_squared_error(y_test, model_tuning.predict(X_test), squared = False
MAPE = mean_absolute_percentage_error(y_test, model_tuning.predict(X_test))
MAPE*100

Out[145… 8.449803139071971

In [ ]:

Dolezel Lubomir - Truth and Authenticity in Narrative
No ratings yet
Dolezel Lubomir - Truth and Authenticity in Narrative
20 pages
Facilitators Guide For Introducing Human Centered Design
100% (1)
Facilitators Guide For Introducing Human Centered Design
76 pages
Mercedes-Benz Greener Manufacturing Ai
0% (1)
Mercedes-Benz Greener Manufacturing Ai
16 pages
Amazing Love - Lyrics & Chords
No ratings yet
Amazing Love - Lyrics & Chords
1 page
Data Warehousing and Data Mining
No ratings yet
Data Warehousing and Data Mining
24 pages
Exp-2 ML
No ratings yet
Exp-2 ML
6 pages
Exp 2
No ratings yet
Exp 2
6 pages
DA Programs
No ratings yet
DA Programs
44 pages
Machine File
No ratings yet
Machine File
27 pages
Chandigarh Group of Colleges College of Engineering Landran, Mohali
No ratings yet
Chandigarh Group of Colleges College of Engineering Landran, Mohali
47 pages
C121 Exp1
No ratings yet
C121 Exp1
32 pages
DataAnalytics Lab Manual
No ratings yet
DataAnalytics Lab Manual
35 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
9 pages
ML Manual
No ratings yet
ML Manual
18 pages
Name: Muhammad Sarfraz Seat: EP1850086 Section: A Course Code: 514 Course Name: Data Warehousing and Data Mining
No ratings yet
Name: Muhammad Sarfraz Seat: EP1850086 Section: A Course Code: 514 Course Name: Data Warehousing and Data Mining
39 pages
DA Lab
No ratings yet
DA Lab
27 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
18 pages
ML Lab Codes
No ratings yet
ML Lab Codes
14 pages
C121 Exp2
No ratings yet
C121 Exp2
23 pages
CSC - 310 Advanced Python Programming Continuous Assessment-2 Assignment:Ca2
No ratings yet
CSC - 310 Advanced Python Programming Continuous Assessment-2 Assignment:Ca2
33 pages
Ex7 HTML
No ratings yet
Ex7 HTML
3 pages
Da Program Upto 6
No ratings yet
Da Program Upto 6
20 pages
Mlalllabprgs
No ratings yet
Mlalllabprgs
17 pages
Assignment 2 Documentation
No ratings yet
Assignment 2 Documentation
15 pages
ML (Sudhanshu)
No ratings yet
ML (Sudhanshu)
24 pages
Ai Lab
No ratings yet
Ai Lab
11 pages
MACHINE LEARNING Manual
No ratings yet
MACHINE LEARNING Manual
36 pages
Data Analyzer
No ratings yet
Data Analyzer
10 pages
ML Lab File Batch 1
No ratings yet
ML Lab File Batch 1
20 pages
ASSi2 DSBDA
No ratings yet
ASSi2 DSBDA
4 pages
Featureselection
No ratings yet
Featureselection
11 pages
ML Lab
No ratings yet
ML Lab
23 pages
Roll NO 2020
No ratings yet
Roll NO 2020
8 pages
ModuleAr Merged
No ratings yet
ModuleAr Merged
42 pages
Machine Learning Laboratory Manual
No ratings yet
Machine Learning Laboratory Manual
11 pages
MLLab Manual
No ratings yet
MLLab Manual
24 pages
Data Preprocessing Example Programs1
No ratings yet
Data Preprocessing Example Programs1
9 pages
ML Spy Programs
No ratings yet
ML Spy Programs
16 pages
16BCB0126 VL2018195002535 Pe003
No ratings yet
16BCB0126 VL2018195002535 Pe003
40 pages
External
No ratings yet
External
11 pages
EDA Plots Code
No ratings yet
EDA Plots Code
13 pages
ASSESSMENT2
No ratings yet
ASSESSMENT2
22 pages
1st PGM
No ratings yet
1st PGM
10 pages
S6 - Data Mining Lab Experiments (Except 1)
No ratings yet
S6 - Data Mining Lab Experiments (Except 1)
6 pages
ML Lab
No ratings yet
ML Lab
14 pages
1
No ratings yet
1
13 pages
ML Complete Notes Hridoy
No ratings yet
ML Complete Notes Hridoy
5 pages
Tanu Raman ML Lab File
No ratings yet
Tanu Raman ML Lab File
21 pages
Machine Learning Lab Assignment 2
No ratings yet
Machine Learning Lab Assignment 2
23 pages
FDS All Practicals
No ratings yet
FDS All Practicals
10 pages
Data Preprocessing 2
No ratings yet
Data Preprocessing 2
5 pages
Final ML File
No ratings yet
Final ML File
34 pages
Wa0012.
No ratings yet
Wa0012.
30 pages
ML Short Code - Under Updating
No ratings yet
ML Short Code - Under Updating
4 pages
Machine Learning Lab File
No ratings yet
Machine Learning Lab File
45 pages
Data Analytics Lab Manual
No ratings yet
Data Analytics Lab Manual
26 pages
Data Analytics Lab Manual
No ratings yet
Data Analytics Lab Manual
47 pages
ML Manual
No ratings yet
ML Manual
30 pages
DA Lab Manual r22
No ratings yet
DA Lab Manual r22
31 pages
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
47 pages
Import As: Pandas PD DF PD - Read - CSV DF - Head
No ratings yet
Import As: Pandas PD DF PD - Read - CSV DF - Head
91 pages
AI Lab Codes.
No ratings yet
AI Lab Codes.
12 pages
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
LAW 100 - Pana v. Heirs of Juanite, Sr. (G.R. No. 164201)
No ratings yet
LAW 100 - Pana v. Heirs of Juanite, Sr. (G.R. No. 164201)
3 pages
C What Is The Meaning of "Solar"?: Reading Skill 1. Word Study 1.1 Identifying Contextual Definition
No ratings yet
C What Is The Meaning of "Solar"?: Reading Skill 1. Word Study 1.1 Identifying Contextual Definition
11 pages
St. Vincent College of Cabuyao: Bachelor of Science in Information Technology
No ratings yet
St. Vincent College of Cabuyao: Bachelor of Science in Information Technology
3 pages
Bcom Sem Vi - Principles of Business Decisions
0% (1)
Bcom Sem Vi - Principles of Business Decisions
24 pages
3.2, Machine in The Garden
No ratings yet
3.2, Machine in The Garden
6 pages
Analyzing The Decentralization of Health Systems in Developing Countries: Decision Space, Innovation and Performance
No ratings yet
Analyzing The Decentralization of Health Systems in Developing Countries: Decision Space, Innovation and Performance
15 pages
Employment Law: Termination - Involuntary and Without Cause (Eric Shames, Esq.)
No ratings yet
Employment Law: Termination - Involuntary and Without Cause (Eric Shames, Esq.)
2 pages
Reading Evaluation 0503
No ratings yet
Reading Evaluation 0503
1 page
Question Tags Exercises 7
100% (1)
Question Tags Exercises 7
1 page
What Is Leukemia?: Ramoran, Pamela Grace L
No ratings yet
What Is Leukemia?: Ramoran, Pamela Grace L
4 pages
066 - Sison V David (1961) - Subido
No ratings yet
066 - Sison V David (1961) - Subido
3 pages
Mis List
No ratings yet
Mis List
47 pages
Deflection Paths and Reference Envelopes For Diaphragm Walls in The Taipei Basin
No ratings yet
Deflection Paths and Reference Envelopes For Diaphragm Walls in The Taipei Basin
12 pages
442HW Ii Stat
No ratings yet
442HW Ii Stat
2 pages
KS-5 Vol 10 Additional List - 6pp
No ratings yet
KS-5 Vol 10 Additional List - 6pp
16 pages
Summary of "A Midsummer Night's Dream"
No ratings yet
Summary of "A Midsummer Night's Dream"
4 pages
Course-502: Pedagogic Processes in Elementary Schools
No ratings yet
Course-502: Pedagogic Processes in Elementary Schools
18 pages
Unreleased Quorum Based Computations Paper
No ratings yet
Unreleased Quorum Based Computations Paper
19 pages
IT-Technical Recruiter
No ratings yet
IT-Technical Recruiter
3 pages
Jollibee Food Corporat ION: (Case Study Sample)
100% (2)
Jollibee Food Corporat ION: (Case Study Sample)
20 pages
Instructions: Read The Following Article and Answer The Questions. Six Sigma in Industry: Some Observations After Twenty-Five Years T. N. Goh
No ratings yet
Instructions: Read The Following Article and Answer The Questions. Six Sigma in Industry: Some Observations After Twenty-Five Years T. N. Goh
7 pages
Controlling Chapter Quiz
No ratings yet
Controlling Chapter Quiz
1 page
Assimilation
No ratings yet
Assimilation
2 pages
Understanding Contemporary India Sumit Ganguly Editor Neil Devotta Editor PDF Download
No ratings yet
Understanding Contemporary India Sumit Ganguly Editor Neil Devotta Editor PDF Download
88 pages
The Bible - Its 66 Books in Brief-By L M Grant
No ratings yet
The Bible - Its 66 Books in Brief-By L M Grant
44 pages
How To Manage Your Time Like A CEO
No ratings yet
How To Manage Your Time Like A CEO
47 pages
Defenses in Criminal Law
No ratings yet
Defenses in Criminal Law
14 pages