A Step-By-Step Guide To Robust ML Classification
How to avoid common pitfalls and dig deeper into our models
Ryan Burke · Towards Data Science · Mar 3, 2023 · 17 min read
Photo by Luca Bravo on Unsplash
If you would like to see the whole notebook, please check it out → here ←
Libraries
Below, you will find a list of the libraries I used for today’s analyses. They
consist of the standard data science toolkit along with the necessary sklearn
libraries.
import sys
import os
import pandas as pd
import numpy as np
import plotly.offline as py
import plotly.graph_objs as go
import plotly.tools as tls
py.init_notebook_mode(connected=True)
import warnings
warnings.filterwarnings('ignore')
import swifter
# sklearn components used throughout the article
from sklearn import datasets
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier
SEED = 42  # fixed random seed used throughout (the exact value isn't shown in the article)
Data
Today’s dataset includes the forest cover data that is ready-to-employ with
sklearn. Here’s a description from sklearn’s site.
The samples in this dataset correspond to 30×30m patches of forest in the US,
collected for the task of predicting each patch’s cover type, i.e. the dominant
species of tree. There are seven cover types, making this a multi-class
classification problem. Each sample has 54 features, described on the dataset’s
homepage. Some of the features are boolean indicators, while others are
discrete or continuous measurements.
Target variable:
Cover_Type (7 types) / integer / 1 to 7 / Forest Cover Type designation
Load dataset
Here’s a simple function to load this data into your notebook as a dataframe.
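A minimal sketch of the sklearn_to_df helper (not reproduced in the article; assuming it simply wraps the returned Bunch in a dataframe and appends the target column):
def sklearn_to_df(sklearn_dataset):
    # build a dataframe from the feature matrix and append the target as the last column
    df = pd.DataFrame(sklearn_dataset.data, columns=sklearn_dataset.feature_names)
    df['target'] = sklearn_dataset.target
    return df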
df = sklearn_to_df(datasets.fetch_covtype())
df_name=df.columns
df.head(3)
Using df.info() and df.describe() to get to know our data better, we see that
there is no missing data and that all of the variables are quantitative. The
dataset is also rather large (> 580,000 rows). I originally tried to run this on
the entire dataset, but it took FOREVER, so I recommend using a fraction of
the data.
Regarding the target variable, which is the forest cover class, using
df.target.value_counts(), we see the following distribution (in descending
order):
Class 2 = 283,301
Class 1 = 211,840
Class 3 = 35,754
Class 7 = 20,510
Class 6 = 17,367
Class 5 = 9,493
Class 4 = 2,747
It is important to note that our classes are imbalanced and we will need to
keep this in mind when selecting a metric to evaluate our models.
Let’s say we plan on scaling our data using the whole dataset. The equations
below are taken from their respective links.
Ex1 StandardScaler()
z = (x - u) / s, where u is the mean and s is the standard deviation of the feature.
Ex2 MinMaxScaler()
X_scaled = (X - X_min) / (X_max - X_min), where X_min and X_max are the feature's minimum and maximum (for the default 0 to 1 range).
The most important thing to notice is that these transformations rely on
statistics computed from the data: the mean, standard deviation, minimum, and
maximum. If we fit these scalers before splitting, the features in our train
set will be computed using information contained in the test set. This is an
example of data leakage.
Data leakage is when information from outside the training dataset is used to
create the model. This additional information can allow the model to learn or
know something that it otherwise would not know, and in turn invalidate the
estimated performance of the model being constructed.
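To make the pitfall concrete, here is a minimal sketch (my illustration, not the article's code) contrasting the leaky approach with the leak-free one; X, X_train, and X_test refer to the frames created in the split below:
# Leaky: the scaler's mean and std are computed over ALL rows, including future test rows
X_scaled = StandardScaler().fit_transform(X)

# Leak-free: fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)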
Therefore, the first step after getting to know our dataset is to split it and
keep the test set unseen until the very end. In the code below, we split the
data into 80% (training set) and 20% (test set). You will also note that I have
only kept 50,000 total samples to reduce the time it takes to train and evaluate
our models. Trust me, you will thank me later!
It is also worth noting that we are stratifying on the target variable. This is
good practice for imbalanced datasets as it maintains the distribution of
classes in the train and test set. If we don’t do this, there’s a chance that some
of the underrepresented classes aren’t even present in our train or test sets.
# here we first separate our df into features (X) and target (y)
X = df[df_name[0:54]]
Y = df[df_name[54]]

# now we separate into training (80%) and test (20%) sets; the test set won't be seen until we test our top model!
X_train, X_test, y_train, y_test = train_test_split(X, Y,
                                                    train_size=40_000,
                                                    test_size=10_000,
                                                    random_state=SEED,
                                                    stratify=Y)  # stratify to ensure a similar class distribution in the train/test sets
Feature engineering
With our train and test sets ready, we can now work on the fun stuff. The
first step in this project is to generate some features that could add useful
information to train our models.
This step can be a little tricky. In the real world, it requires domain-specific
knowledge of the particular subject you are working on. To be
completely transparent with you, despite being a lover of nature and
everything outdoors, I am no expert in why certain trees grow in specific
areas.
For this reason, I have consulted [1] [2] [3], who have a better understanding
of this domain than I do. I have combined the knowledge from these
references to create the features you will find below.
# excerpt of the feature-engineering function (function name assumed; the full version is in the notebook)
def add_engineered_features(X):
    # straight-line (Euclidean) distance to the nearest surface water
    X['Hydro_Euclidean'] = np.sqrt(X['Horizontal_Distance_To_Hydrology']**2 +
                                   X['Vertical_Distance_To_Hydrology']**2)
    # combined horizontal + vertical distance to surface water (absolute value)
    X['Hydro_Manhattan'] = abs(X['Horizontal_Distance_To_Hydrology'] +
                               X['Vertical_Distance_To_Hydrology'])
    return X
On a side note, when you are working with large datasets, pandas can be
somewhat slow. Using swifter when applying functions to your dataframe (as is
done in the full notebook) can significantly speed things up. The article
→ here compares several methods used to speed this process up.
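As an illustration (not the article's exact code), swifter drops straight into the usual pandas .apply() call; a hypothetical row-wise version of the Euclidean distance feature could be written as:
import swifter  # importing swifter registers the .swifter accessor on pandas objects

# swifter parallelizes .apply() where possible
X['Hydro_Euclidean'] = X.swifter.apply(
    lambda row: np.sqrt(row['Horizontal_Distance_To_Hydrology']**2 +
                        row['Vertical_Distance_To_Hydrology']**2),
    axis=1)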
Feature selection
At this point we have more than 70 features. If the goal is to end up with the
best-performing model, then you could try to use all of these as inputs. With
that said, in business there is often a trade-off between performance and
complexity that needs to be considered.
Keeping that in mind, I will perform feature selection to try to reduce the
complexity right away. Sklearn provides many options worth considering. In
this example, I will use SelectKBest, which keeps a pre-specified number of
features: those with the highest scores according to a scoring function. Below,
I have requested (and listed) the best-performing 15 features. These are the
features that I will use to train the models in the following section.
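The selection code isn't reproduced here; a minimal sketch (assuming f_classif, SelectKBest's default scoring function) might look like this:
from sklearn.feature_selection import SelectKBest, f_classif

# score each feature against the target and keep the 15 highest-scoring ones
selector = SelectKBest(score_func=f_classif, k=15)
selector.fit(X_train, y_train)

# names of the selected columns, plus reduced train/test frames for later use
X_train_reduced_cols = X_train.columns[selector.get_support()]
X_train_reduced = X_train[X_train_reduced_cols]
X_test_reduced = X_test[X_train_reduced_cols]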
X_train_reduced_cols
Baseline models
In this section I will compare three different classifiers:
KNeighboursClassifier
RandomForestClassifier
ExtraTreesClassifier
I have provided links for those who wish to investigate each model further.
They will also be helpful in the section on hyperparameter tuning, where
you can find all modifiable parameters when trying to improve your models.
Below you will find two functions to define and evaluate the baseline
models.
# baseline models
def GetBaseModels():
    baseModels = []
    baseModels.append(('KNN', KNeighborsClassifier()))
    baseModels.append(('RF', RandomForestClassifier()))
    baseModels.append(('ET', ExtraTreesClassifier()))
    return baseModels
# evaluation of the baseline models (function name and signature assumed; the article uses 10-fold CV and the weighted F1 score)
def BaselineEvaluation(models, X_train, y_train, num_folds=10, scoring='f1_weighted'):
    results = []
    names = []
    for name, model in models:
        kfold = StratifiedKFold(n_splits=num_folds, random_state=SEED, shuffle=True)
        cv_results = cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring, n_jobs=-1)
        results.append(cv_results)
        names.append(name)
        msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
        print(msg)
    return results, names
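Putting the two together, the baseline run then looks roughly like this (again a sketch; X_train_reduced holds the 15 selected features from the earlier step):
models = GetBaseModels()
results, names = BaselineEvaluation(models, X_train_reduced, y_train)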
There are some key elements in the second function that are worth
discussing further. The first of these is StratifiedKFold, which cross-validates
within the training data while preserving the class distribution in each fold.
Recall, we split the original dataset into 80% training and 20% test. The test
set will be reserved for the final evaluation of our top-performing model.
The second point worth discussing is the scoring metric. There are many
metrics available to evaluate the performance of your models, and often
there are several that could suit your project. It’s important to keep in mind
what you are trying to demonstrate with the results. If you work in a
business setting, often the metric that is most easily explained to those
without a data background is preferred.
On the other hand, there are metrics that are unsuitable for your analyses.
For this project, we have imbalanced classes. If you go to the link provided
above, you will find options for this case. I opted to use the weighted F1
score. Let’s briefly discuss why I chose this metric.
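A brief note on the metric itself (my explanation, not a quote from the article): the weighted F1 score computes an F1 score for each class and then averages them, weighting each class by its support, so minority classes count in proportion to how often they occur. The 'f1_weighted' scoring string passed to cross_val_score corresponds to:
from sklearn.metrics import f1_score

# y_true and y_pred are hypothetical arrays of true and predicted class labels
score = f1_score(y_true, y_pred, average='weighted')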
After training the baseline models, I have plotted the results from each
below. The baseline models all performed relatively well. Remember, at this
point I have done nothing to the data (i.e. transform, remove outliers). The
Extra trees classifier had the highest weighted F1 score at 86.9%.
Results from the 10-fold CV. KNN had the lowest at 78.8%, the RF was second with 85.9%, and the ET had the
highest weighted F1 score at 86.9%. Image provided by author
Next, we can test whether scaling the data helps, particularly for the
distance-based KNN. The function below couples the chosen scaler with each of
our classifiers in a pipeline.
def GetScaledModel(nameOfScaler):
    if nameOfScaler == 'standard':
        scaler = StandardScaler()
    elif nameOfScaler == 'minmax':
        scaler = MinMaxScaler()
    else:
        raise ValueError("nameOfScaler must be 'standard' or 'minmax'")

    pipelines = []
    pipelines.append((nameOfScaler+'KNN', Pipeline([('Scaler', scaler), ('KNN', KNeighborsClassifier())])))
    pipelines.append((nameOfScaler+'RF', Pipeline([('Scaler', scaler), ('RF', RandomForestClassifier())])))
    pipelines.append((nameOfScaler+'ET', Pipeline([('Scaler', scaler), ('ET', ExtraTreesClassifier())])))
    return pipelines
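These pipelines can then be pushed through the same cross-validation loop as the baselines, for example (reusing the evaluation helper sketched earlier):
scaled_models = GetScaledModel('standard')
results_std, names_std = BaselineEvaluation(scaled_models, X_train_reduced, y_train)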
The results using the StandardScaler are presented below. We see that our
hypothesis regarding scaling the data appears to hold. The random forest and
extra trees classifiers performed nearly identically to their unscaled
versions, whereas the KNN improved in performance by roughly 4%. Despite this
increase, the two tree-based classifiers still outperform the scaled KNN.
Results from the 10-fold CV using the StandardScaler to transform our data. KNN still had the lowest although
the performance increased to 83.8%. The RF was second with 85.8%, and the ET once again had the highest
weighted F1 score at 86.8%. Image provided by author
Similar results can be seen when the MinMaxScaler is used. The results from
all models are almost identical to those presented using the StandardScaler.
Results from the 10-fold CV using the MinMaxScaler to transform our data. Each performed almost identically
to those using StandardScaler. KNN still had the lowest at 83.9%. The RF was second with 86.0%, and the ET
once again had the highest weighted F1 score at 87.0%. Image provided by author
It is worth noting at this point that I also checked the effect of removing
outliers. For this, I removed values that were beyond +/- 3 SD for each
feature. I am not presenting the results here because there were no values
outside this range. If you are interested in seeing how this was performed,
please feel free to check out the notebook found at the link provided at the
beginning of this article.
Hyperparameter tuning
I chose to use GridSearchCV (the CV stands for cross-validated). Below you will
find a function that runs a grid search with 10-fold cross-validation for the
models we have been using. The one additional detail here is that we need to
provide the list of hyperparameters we want to be evaluated.
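The tuning function itself isn't reproduced here; a minimal sketch of a weighted-F1, 10-fold grid search (the function name is my own) could look like:
def GridSearchModel(model, param_grid, X, y, num_folds=10):
    # evaluate every combination in param_grid with stratified k-fold CV,
    # scoring each candidate with the weighted F1 score
    kfold = StratifiedKFold(n_splits=num_folds, random_state=SEED, shuffle=True)
    grid = GridSearchCV(estimator=model, param_grid=param_grid,
                        scoring='f1_weighted', cv=kfold, n_jobs=-1)
    grid.fit(X, y)
    print("Best: %f using %s" % (grid.best_score_, grid.best_params_))
    return grid.best_params_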
Up to this point, we have not even looked at our test set. Before commencing
the grid search, we will scale our train and test data using the
StandardScaler. We are doing this here because we are going to find the best
hyperparameters for each model and use those as inputs into a
VotingClassifier (as we will discuss in the next section).
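Concretely, the scaling step, followed by a call to the grid-search sketch above with a purely illustrative parameter grid, might look like this (X_train_reduced and X_test_reduced are the 15-feature frames from the selection step):
# fit the scaler on the training features only, then apply the same transformation to both sets
scaler = StandardScaler().fit(X_train_reduced)
X_train_scaled = scaler.transform(X_train_reduced)
X_test_scaled = scaler.transform(X_test_reduced)

# illustrative grid for one model; the best parameters found here later feed the VotingClassifier
et_grid = {'n_estimators': [100, 300], 'max_depth': [None, 20]}
best_et_params = GridSearchModel(ExtraTreesClassifier(), et_grid, X_train_scaled, y_train)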