DATA SCIENCE
BOOTCAMP
[Slide: data-growth chart; only the labels "Since the dawn of time" and "Up until 2005" were recovered]
Machine Learning Process

Data Pre-Processing
  Import the data
  Clean the data
  Split into training & test sets
  Feature Scaling

Modelling
  Build the model
  Train the model
  Make predictions

Evaluation
  Calculate performance metrics (sketched below)
  Make a verdict
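Of these steps, evaluation is the only one not shown in code later in this deck. A minimal sketch of "calculate performance metrics", assuming a hypothetical classifier clf already trained on the training set and a held-out test split x_test / y_test (the names are illustrative, not from the slides):

from sklearn.metrics import accuracy_score, confusion_matrix

y_pred = clf.predict(x_test)               # make predictions on unseen data
print(confusion_matrix(y_test, y_pred))    # breakdown of correct vs. incorrect predictions per class
print(accuracy_score(y_test, y_pred))      # a single metric on which to base a verdict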
Training Set & Test Set

The data is split into two parts: the model is trained on one and evaluated on the other, unseen part.

Train: 80%
Test: 20%
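The slides only name the 80/20 split; a minimal sketch of how it is usually done with scikit-learn's train_test_split, assuming the feature matrix X and target y that are separated later in the deck:

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state fixes the shuffle so the split is reproducible.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)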
Feature Scaling

A row of the dataset has several features (X1, X2, X3, X4); consider two of them, salary and age:

Salary      Age
70,000 $    45 yrs
60,000 $    44 yrs
52,000 $    40 yrs

The salaries differ by thousands between rows (10,000 and 8,000) while the ages differ by only a few years (1 and 4), so without scaling the salary column would dominate any distance-based comparison.

After min-max normalization, both columns lie between 0 and 1:

Salary      Age
1           1
0.444       0.8
0           0
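The normalized numbers above follow min-max scaling, (value - min) / (max - min) applied per column. A minimal sketch reproducing them with scikit-learn's MinMaxScaler, using the salary/age values from the example:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[70000, 45],
                 [60000, 44],
                 [52000, 40]], dtype=float)

scaler = MinMaxScaler()                # rescales each column to the [0, 1] range
scaled = scaler.fit_transform(data)
print(scaled.round(3))                 # rows: [1, 1], [0.444, 0.8], [0, 0]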
Libraries

Plotting graphs
Customizing plots
Visualizing data trends
Data manipulation
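The slide lists tasks without naming the libraries; pandas (used later in the deck as pd) covers data manipulation, and matplotlib is a common, but here assumed, choice for the plotting tasks. A minimal sketch with purely illustrative column names:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("Data.csv")                   # data manipulation: load a table into a DataFrame
print(df.head())                               # inspect the first rows

df.plot(x="Age", y="Salary", kind="scatter")   # plotting graphs (column names are assumed)
plt.title("Salary vs. Age")                    # customizing plots
plt.show()                                     # visualizing data trends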
Importing the Dataset

import pandas as pd

dataset = pd.read_csv("Data.csv")

dataset is the variable name that holds the loaded table; "Data.csv" is the file being read.
Separating the variables

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
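iloc selects by integer position: ':' keeps all rows, ':-1' keeps every column except the last (the features), and '-1' keeps only the last column (the target). A minimal sketch on a small hypothetical frame:

import pandas as pd

dataset = pd.DataFrame({"Country": ["USA", "Canada", "Mexico"],   # hypothetical data,
                        "Age": [45, 44, 40],                      # not the real Data.csv
                        "Purchased": ["Yes", "No", "Yes"]})

X = dataset.iloc[:, :-1].values   # all rows, every column except the last -> feature matrix
y = dataset.iloc[:, -1].values    # all rows, last column only -> target vector
print(X)                          # [['USA' 45] ['Canada' 44] ['Mexico' 40]]
print(y)                          # ['Yes' 'No' 'Yes']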
Replacing the missing values

imputer.fit(X[:, 1:3])

fit() computes the mean of the specified columns (here columns 1 and 2, since indexing starts at 0 and the upper bound of the slice is exclusive), which is later used to replace the missing values found in them.
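The slide shows only the fit call; the usual pattern also constructs the imputer and applies the learned means with transform. A minimal sketch, assuming X is the feature matrix and columns 1 and 2 may contain NaNs:

import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')   # replace NaNs with the column mean
imputer.fit(X[:, 1:3])                     # learn the mean of columns 1 and 2
X[:, 1:3] = imputer.transform(X[:, 1:3])   # fill the missing entries with those means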
Encoding Variables

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])],
                       remainder='passthrough')
X = np.array(ct.fit_transform(X))

[0] gives the indices of the columns to be transformed: here the first column of the dataset, presumably containing the country names.

remainder='passthrough' specifies that any columns not listed in the transformers are passed through unchanged, so after applying the OneHotEncoder the other columns of the dataset remain intact.

Before encoding:
USA      1   3
Canada   2   4
USA      1   2
Mexico   3   1

After one-hot encoding the country column (one 0/1 column per country):
1 0 0   1   3
0 1 0   2   4
1 0 0   1   2
0 0 1   3   1
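A runnable version of the snippet above, using a small array that mirrors the country table (the values are the slide's illustration). Note that OneHotEncoder orders the new binary columns alphabetically by category (Canada, Mexico, USA), so the column order can differ from the illustration:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = np.array([["USA", 1, 3],
              ["Canada", 2, 4],
              ["USA", 1, 2],
              ["Mexico", 3, 1]], dtype=object)

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])],
                       remainder='passthrough')
X = np.array(ct.fit_transform(X))   # country column replaced by three 0/1 columns, others passed through
print(X)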
Encoding Variables

The dependent variable y is encoded with LabelEncoder, which maps each class label to an integer.

le = LabelEncoder()
y = le.fit_transform(y)
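A minimal sketch, assuming y holds categorical labels (the "No"/"Yes" values are purely illustrative):

from sklearn.preprocessing import LabelEncoder

y = ["No", "Yes", "No", "Yes"]   # hypothetical dependent-variable labels
le = LabelEncoder()
y = le.fit_transform(y)          # maps each class to an integer: No -> 0, Yes -> 1 (alphabetical order)
print(y)                         # [0 1 0 1]
print(le.classes_)               # ['No' 'Yes']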