Data Science Bootcamp (Day-01)
Uploaded by Hamza Mateen

DATACRUMBS SESSION # 12

DATA SCIENCE BOOTCAMP
DAY 1

Presentation by Syed Abis
Applications of Machine Learning
ML is the Future
Since the dawn of time
Up until 2005
Humans had created
130 Exabytes of Data
2005 – 130 EXABYTES
2010 – 1,200 EXABYTES
2015 – 7,900 EXABYTES
2020 – 40,900 EXABYTES
Machine Learning Process

Data Pre-Processing
Import the data
Clean the data
Split into training & test sets
Feature Scaling

Modelling
Build the model
Train the model
Make predictions

Evaluation
Calculate performance metrics
Make a verdict
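The process above can be sketched end-to-end with scikit-learn. The dataset here is synthetic and the model choice (logistic regression) is just one illustrative option:

```python
# Minimal end-to-end sketch: pre-processing, modelling, evaluation.
# The toy data below is invented for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Import the data (here: a synthetic binary-classification set)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Split into training & test sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Feature scaling: fit on the training set only, then apply to both
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build & train the model
model = LogisticRegression().fit(X_train, y_train)

# Make predictions and calculate a performance metric
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
```

Note the scaler is fitted on the training set only, so no information from the test set leaks into pre-processing.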
Training Set & Test Set

Train: 80%
Test: 20%
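The 80/20 split above can be sketched with scikit-learn's train_test_split (the data here is invented for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y = np.arange(10)                  # 10 target values

# test_size=0.2 reserves 20% of the rows for the test set;
# random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```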
Feature Scaling

X1        X2        X3        X4
179.43    56.784    34.6181   3.55
641.87    62.054    47.7306   1.692
556.30    64.13     55.596    1.559
578.47    63.377    52.7121   1.679
591.16    61.553    46.1315   1.984
438.08    60.484    43.493    2.47
637.17    62.525    49.428    1.725
Feature Scaling

Salary      Age
70,000 $    45 yrs
60,000 $    44 yrs
52,000 $    40 yrs

The raw differences in salary (10,000 and 8,000) dwarf the differences in age (1 and 4), so without scaling the salary column would dominate any distance-based comparison.

After min-max normalization, both columns lie in [0, 1]:

Salary    Age
1         1
0.444     0.8
0         0
Libraries: NumPy

Fast array computations
Multidimensional array handling
Linear algebra & stats functions
Libraries: Matplotlib

Plotting graphs
Customizing plots
Visualizing data trends
Libraries: Pandas

Data manipulation
Handling missing data
Data filtering & grouping
Importing the Dataset

dataset = pd.read_csv("Data.csv")

dataset → variable name | pd → pandas alias | read_csv → pandas function | "Data.csv" → filename
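Because pd.read_csv accepts any file-like object, an in-memory CSV can stand in for "Data.csv" here (the column names are invented for illustration):

```python
import io
import pandas as pd

# A small CSV in a string takes the place of the "Data.csv" file
csv_text = "Country,Age,Salary\nFrance,44,72000\nSpain,27,48000\n"
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.shape)          # (2, 3)
print(list(dataset.columns))  # ['Country', 'Age', 'Salary']
```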
Importing the Dataset

df.iloc[0]: accesses the first row.
df.iloc[0:3]: accesses the first three rows.
df.iloc[0, 1]: accesses the element in the first row and second column.
df.iloc[:, 1]: accesses every row in the second column.
df.iloc[:, :]: accesses all rows and all columns (i.e., the entire DataFrame).
df.iloc[:, :-1]: accesses every row, and every column except the last.
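Each iloc pattern above can be demonstrated on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [10, 20, 30, 40],
                   "c": [100, 200, 300, 400]})

print(df.iloc[0].tolist())     # first row -> [1, 10, 100]
print(df.iloc[0:3].shape)      # first three rows -> (3, 3)
print(df.iloc[0, 1])           # first row, second column -> 10
print(df.iloc[:, 1].tolist())  # second column -> [10, 20, 30, 40]
print(df.iloc[:, :].shape)     # entire DataFrame -> (4, 3)
print(df.iloc[:, :-1].shape)   # all rows, all but last column -> (4, 2)
```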
Separating the Variables

x = dataset.iloc[:, :-1].values

y = dataset.iloc[:, -1].values

iloc is used for integer-location based indexing to select rows and columns by their position. It allows you to access data by specifying row and column indices. The .values attribute then returns the selected data as a NumPy array.
Sklearn

Machine learning algorithms
Model evaluation tools
Data preprocessing functions
Replacing the Missing Values

from sklearn.impute import SimpleImputer

This imports the SimpleImputer class from scikit-learn, which is used to handle missing data by replacing it with a value chosen by a specified strategy (e.g., mean, median, mode).

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

This creates an instance of SimpleImputer.
missing_values=np.nan: specifies that missing values are represented by np.nan (NaN values).
strategy='mean': specifies that missing values should be replaced with the mean of the column.

imputer.fit(X[:, 1:3])

fit() calculates the mean for the specified columns (here columns 1 and 2, since slicing starts at 0 and the end index is exclusive) wherever missing values are found.

X[:, 1:3] = imputer.transform(X[:, 1:3])

transform() replaces the missing values in columns 1 and 2 with the mean calculated during the fit() step. The result is assigned back to X[:, 1:3], effectively updating these columns with imputed values.

Putting it all together:

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
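The four imputation lines above can be run on a small array with NaNs in columns 1 and 2 (the country names and values are invented for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Mixed-type array, as produced by dataset.iloc[...].values;
# columns 1 and 2 hold numeric values with some NaNs
X = np.array([["France", 44.0, 72000.0],
              ["Spain", np.nan, 48000.0],
              ["Germany", 30.0, np.nan]], dtype=object)

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:3])                    # learn the column means
X[:, 1:3] = imputer.transform(X[:, 1:3])  # fill NaNs with those means

print(X)
# Missing age -> mean(44, 30) = 37; missing salary -> mean(72000, 48000) = 60000
```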
Encoding Variables

from sklearn.compose import ColumnTransformer

ColumnTransformer: this class allows you to apply different preprocessing techniques to different columns of your dataset.

from sklearn.preprocessing import OneHotEncoder

OneHotEncoder: this is used to convert categorical variables (like country names) into a format that machine learning algorithms can use to do a better job in prediction. It creates a binary (0 or 1) column for each category.
Encoding Variables

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])],
                       remainder='passthrough')

Each entry in transformers is a tuple containing:
a name for the transformer (in this case, 'encoder'),
the transformer itself (here, OneHotEncoder()),
and the indices of the columns to be transformed (in this case, [0], the first column of your dataset, presumably containing the country names).

remainder='passthrough': this parameter specifies that any columns not listed in transformers should be passed through unchanged. This means that after applying the OneHotEncoder, the other columns (if any) in your dataset will remain intact.
Encoding Variables

X = np.array(ct.fit_transform(X))

fit_transform(X): this method fits the transformer to the data X and then transforms it. In this context, it converts the first column of country names into a one-hot encoded format, while leaving the other columns as they are.

np.array(...): this converts the transformed data back into a NumPy array.
Encoding Variables

Before encoding:

Country   Feature1   Feature2
USA       1          3
Canada    2          4
USA       1          2
Mexico    3          1

After encoding:

USA   Canada   Mexico   Feature1   Feature2
1     0        0        1          3
0     1        0        2          4
1     0        0        1          2
0     0        1        3          1
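The table above can be reproduced with ColumnTransformer and OneHotEncoder. One caveat: OneHotEncoder orders its output columns by sorted category name (Canada, Mexico, USA), not in order of appearance as the table shows:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# The "before encoding" table, with the country name in column 0
X = np.array([["USA", 1, 3],
              ["Canada", 2, 4],
              ["USA", 1, 2],
              ["Mexico", 3, 1]], dtype=object)

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])],
                       remainder='passthrough')
X = np.array(ct.fit_transform(X))

print(X)  # output columns: Canada, Mexico, USA, Feature1, Feature2
```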
Encoding Variables

from sklearn.preprocessing import LabelEncoder

The LabelEncoder is used to convert categorical labels (e.g., country names, class labels) into a numerical format. This is useful for machine learning algorithms that require numerical input.

le = LabelEncoder()

This creates an instance of LabelEncoder (named le to match the code below).
Encoding Variables

y = le.fit_transform(y)

fit(y): finds the unique categories (labels) in y.

transform(y): converts each category label into a corresponding integer, based on the sorted order of the unique categories.

The result is a numeric array where each unique category in y is replaced with a unique integer.
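A short run of LabelEncoder on a made-up label vector shows both steps. Note the integers follow the sorted order of the unique labels ('No' → 0, 'Yes' → 1), not the order of appearance:

```python
from sklearn.preprocessing import LabelEncoder

y = ["Yes", "No", "No", "Yes", "Yes"]
le = LabelEncoder()
y = le.fit_transform(y)  # fit finds the labels, transform maps them to ints

print(list(le.classes_))  # ['No', 'Yes']
print(list(y))            # [1, 0, 0, 1, 1]
```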
