Machine Learning Notes: 2. All The Commands For Eda

This document contains notes on machine learning concepts and processes. It outlines steps for exploratory data analysis, including handling missing values, outliers, and feature engineering. It also discusses preprocessing such as scaling, encoding categorical data, and splitting data into training and test sets. Model building is covered with examples of linear regression, including fitting a model to training data and making predictions on test data.

Uploaded by

naveen katta

Machine Learning Notes

1. All the import commands:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

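2. All the commands for EDA:

df.isna()            # True/False mask of missing values
df.isna().sum()      # count of missing values per column
df.info()
df.describe()
df.dropna(axis=0)    # axis=0 drops rows with NaN, axis=1 drops columns with NaN
df.fillna(value)     # value = what to put in place of NaN, e.g. a column mean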

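• To calculate the mean:-
df['column_name'].mean()

• To fill missing values with the mean:-

x = df['column_name'].mean()
# assign the result back; inplace=True on a selected column is unreliable in recent pandas
df['column_name'] = df['column_name'].fillna(x)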

• To read a CSV file and inspect the values of a column:-

df = pd.read_csv('cars.csv')
df["column_name"].unique()          # distinct values
df["column_name"].value_counts()    # how often each value occurs

• To replace a string with a NaN value:-

df['column_name'] = df['column_name'].replace("string", np.nan)
df['column_name'] = df['column_name'].astype("float")

• To create a new df with a specific data type:-

# df_cat / df_num = df with categorical / numerical data
df_cat = df.select_dtypes(include='object')
df_num = df.select_dtypes(include=['int64', 'float64'])

• Steps to handle missing values:
#step1 - use replace to turn the placeholder string into NaN
df['column_name'] = df['column_name'].replace("string", np.nan)

#step2 - change the datatype to float
df['column_name'] = df['column_name'].astype("float")

#step3 - calculate the mean for the col
x = df['column_name'].mean()

#step4 - use fillna and assign the result back
df['column_name'] = df['column_name'].fillna(x)
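The four steps above can be sketched on a tiny data frame. The column name "price" and the "?" placeholder are made up for illustration; they are not taken from cars.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "?" marks a missing price, so the column holds strings.
df = pd.DataFrame({"price": ["100", "?", "300", "200"]})

# Step 1: replace the placeholder string with NaN.
df["price"] = df["price"].replace("?", np.nan)

# Step 2: convert to float now that only numbers and NaN remain.
df["price"] = df["price"].astype("float")

# Step 3: compute the mean; NaN values are skipped by default.
mean_price = df["price"].mean()

# Step 4: fill the missing entry with that mean and assign the result back.
df["price"] = df["price"].fillna(mean_price)

print(df["price"].tolist())  # [100.0, 200.0, 300.0, 200.0]
```

The mean is taken over the three known prices only, so the filled value here is 200.0.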

• Label Encoder:
from sklearn.preprocessing import LabelEncoder

for col in df_cat:
    le = LabelEncoder()   # a fresh encoder for every column
    df_cat[col] = le.fit_transform(df_cat[col])
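A small sketch of the same loop on made-up data. The `encoders` dictionary is an addition for illustration: keeping each fitted encoder makes it possible to map the integer codes back to the original labels later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical data frame.
df_cat = pd.DataFrame({
    "fuel": ["gas", "diesel", "gas"],
    "body": ["sedan", "wagon", "sedan"],
})

encoders = {}  # hypothetical helper: column name -> fitted encoder
for col in df_cat.columns:
    le = LabelEncoder()
    df_cat[col] = le.fit_transform(df_cat[col])
    encoders[col] = le

# Classes are sorted alphabetically, so diesel -> 0 and gas -> 1.
print(df_cat["fuel"].tolist())                      # [1, 0, 1]
print(encoders["fuel"].inverse_transform([0, 1]))   # ['diesel' 'gas']
```

Note that the integer codes follow alphabetical order, not any meaningful ranking of the categories.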

• To drop columns and rows:

df.drop('column_name', axis=1)                    # a single column
df.drop(['column_name', 'column_name'], axis=1)   # multiple columns
df.drop(index_number)                             # a row, by its index label
# drop() returns a new data frame; assign the result back or pass inplace=True

• To handle outliers:
#Step1: make a boxplot with two variables
sns.boxplot(data=df, x='price', y='make')

#Step2: filter out the outliers, e.g. dodge cars priced above 10000
df[(df['make']=='dodge') & (df['price']>10000)]

#Step3: drop the outliers by index label, e.g. row 29
df.drop(29, inplace=True)
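Steps 2 and 3 can also be chained so that the index labels do not have to be typed by hand. This is a sketch on invented data; the 10000 threshold stands in for whatever the boxplot suggests.

```python
import pandas as pd

# Hypothetical data: one dodge is priced far above the others.
df = pd.DataFrame({
    "make": ["dodge", "dodge", "dodge", "audi"],
    "price": [6000, 7000, 13000, 15000],
})

# Step 2: select the rows that look like outliers for this make.
outliers = df[(df["make"] == "dodge") & (df["price"] > 10000)]

# Step 3: drop them using the index labels of the filtered rows.
df = df.drop(outliers.index)

print(df.index.tolist())  # [0, 1, 3]
```

Only the expensive dodge (row 2) is removed; the audi stays because the filter is per make.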

• Feature engineering: Here it is used to reduce the number of columns / features in the data frame by combining existing ones. E.g., if a data set has height and width columns, we can create a new column area = height * width and then remove the height and width columns.
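The height and width example can be written as follows. The data frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data with two columns that can be merged into one feature.
df = pd.DataFrame({
    "height": [2.0, 3.0],
    "width": [4.0, 5.0],
    "price": [10, 20],
})

# Create the combined feature.
df["area"] = df["height"] * df["width"]

# Remove the columns that the new feature replaces.
df = df.drop(["height", "width"], axis=1)

print(list(df.columns))     # ['price', 'area']
print(df["area"].tolist())  # [8.0, 15.0]
```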
• Skewness and handling skewness:
from scipy.stats import skew

To find the skewness of a column:

skew(df_num['column_name'])

Using a for loop and plotting a graph:

for col in df_num:
    print(col)
    print(skew(df_num[col]))

    plt.figure()
    sns.histplot(df_num[col], kde=True)  # sns.distplot is deprecated in recent seaborn
    plt.show()

#to find correlation
df_num.corr()
sns.heatmap(df_num.corr(), annot=True)

WE SHOULD NOT REMOVE THE SKEWNESS OF A COLUMN THAT HAS A VERY HIGH
CORRELATION WITH THE TARGET, BECAUSE IF WE DO THAT, ITS CORRELATION
WITH THE TARGET WILL ALSO CHANGE.
ALSO, NEVER APPLY THE SQUARE ROOT OR LOG TO A COLUMN WITH NEGATIVE
VALUES; IT WILL GIVE YOU NAN VALUES.

• To handle skewness, take either the square root or the log of that column:
df_num['column_name'] = np.sqrt(df_num['column_name'])
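A sketch of the log variant on invented data. It uses `np.log1p` (log of 1 + x) instead of a plain log so that zeros do not produce -inf; that choice is an assumption of this example, not something the notes prescribe.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

# Hypothetical right-skewed column: two very large values in the tail.
df_num = pd.DataFrame({"price": [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 50.0, 200.0]})

before = skew(df_num["price"])

# Guard: sqrt and log are not defined for negative values and would give NaN.
if (df_num["price"] < 0).any():
    raise ValueError("column contains negative values")

# log1p(x) = log(1 + x), which is safe when the column contains zeros.
df_num["price_log"] = np.log1p(df_num["price"])

after = skew(df_num["price_log"])

print(before, after)  # the skewness after the transform is smaller
```

Checking the skewness before and after shows whether the transform actually helped for a given column.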

• Scaling:-
# df_new is assumed to be a data frame that holds only the numeric columns to scale
1. MinMax Scaler
from sklearn.preprocessing import MinMaxScaler
for col in df_new:
    ms = MinMaxScaler()
    df_new[col] = ms.fit_transform(df_new[[col]]).ravel()

2. Standard Scaler
from sklearn.preprocessing import StandardScaler
for col in df_new:
    sc = StandardScaler()
    df_new[col] = sc.fit_transform(df_new[[col]]).ravel()

• Requirements for working with data in sklearn:-

• Features and response should be separate objects
• Features and response should be numeric
• Features and response should be NumPy arrays
• Features should have a specific shape: 2D, (n_samples, n_features); the response is usually 1D, (n_samples,)

x = df.iloc[:, :-1].values  # features -> independent variables
y = df.iloc[:, -1].values   # response -> dependent variable

• Taking care of missing data:-

from sklearn.impute import SimpleImputer

#step1: define the missing value & strategy
si = SimpleImputer(missing_values=np.nan, strategy='mean')

#step2: fit on the cols that have missing values
si.fit(x[:, 1:3])

#step3: fill the values using the transform method on the selected cols and save them back
x[:, 1:3] = si.transform(x[:, 1:3])
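A minimal sketch of the imputer on a made-up array, where every column has one missing value. Here `fit_transform` does the fit and transform steps in one call.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature array with one NaN per column.
x = np.array([
    [1.0, np.nan],
    [3.0, 4.0],
    [np.nan, 8.0],
])

si = SimpleImputer(missing_values=np.nan, strategy="mean")

# Learn the column means and fill the gaps in a single step.
x_filled = si.fit_transform(x)

print(si.statistics_)     # [2. 6.] -> the learned column means
print(x_filled.tolist())  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```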

• Encoding categorical data (One Hot Encoder):-

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# remainder='passthrough' keeps the other columns; the string must not contain spaces
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])],
                       remainder='passthrough')

#select the column and apply the change at the same time
x = np.array(ct.fit_transform(x))
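A sketch of the same transformer on a tiny made-up array. The `sparse_threshold=0` argument is an addition for this example: it asks `ColumnTransformer` to always return a dense array, so that wrapping the result in `np.array` behaves as expected.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical features: column 0 is categorical, column 1 is numeric.
x = np.array([
    ["france", 44.0],
    ["spain", 27.0],
    ["france", 30.0],
], dtype=object)

ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",  # keep the numeric column as it is
    sparse_threshold=0,       # assumption of this sketch: force dense output
)

x_encoded = np.array(ct.fit_transform(x))

# The one-hot columns come first (france, spain), then the untouched column.
print(x_encoded.shape)  # (3, 3)
print(x_encoded[0])     # france -> 1, 0 followed by 44.0
```

The encoded columns are placed before the passthrough columns, so column positions shift after this step.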

• Splitting the dataset into the training set and test set:-
from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=1)

• Feature scaling:-
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
xtrain[:, 3:] = sc.fit_transform(xtrain[:, 3:])
xtest[:, 3:] = sc.transform(xtest[:, 3:])  # reuse the training mean/std; do not refit on the test set
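The scaler is normally fitted on the training set only and then applied to the test set with `transform`, so that no information from the test data leaks into preprocessing. A sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature train and test sets.
xtrain = np.array([[1.0], [2.0], [3.0]])
xtest = np.array([[2.0], [4.0]])

sc = StandardScaler()

# fit_transform learns the mean and std from the training data.
xtrain_scaled = sc.fit_transform(xtrain)

# transform reuses those training statistics on the test data.
xtest_scaled = sc.transform(xtest)

print(sc.mean_)      # [2.] -> the training mean
print(xtest_scaled)  # 2.0 maps to 0 because it equals the training mean
```

If the test set were scaled with its own `fit_transform`, the value 2.0 would no longer map to 0, and train and test features would be on different scales.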

• Linear regression model:-

#step 1-: Select a model from sklearn
from sklearn.linear_model import LinearRegression

#step 2 -: Create an object of your model
linreg = LinearRegression()

#step 3 -: Train your model
linreg.fit(xtrain, ytrain)

#step 4: Predict the value
ypred = linreg.predict(xtest)
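The four steps can be tried end to end on synthetic data. Everything about the data below (the line y = 3x + 2, the noise level, the seed) is invented for illustration, and the `r2_score` check is an extra step that is not part of the notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus a little noise (made up for this sketch).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))  # features must be 2D
y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.2, random_state=1
)

# Steps 1-2: pick the model class and create an object of it.
linreg = LinearRegression()

# Step 3: train on the training set.
linreg.fit(xtrain, ytrain)

# Step 4: predict on the unseen test set.
ypred = linreg.predict(xtest)

# Extra: the fitted slope and intercept should be close to 3 and 2.
print(linreg.coef_, linreg.intercept_)
print(r2_score(ytest, ypred))
```

Comparing `ypred` with `ytest` through a metric such as R squared gives a quick indication of how well the fitted line generalizes.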
