Machine Learning Notes
2. Data Selection: Collect and prepare all of the relevant data from the dataset used in
machine learning.
4. Exploratory Data Analysis: Analysis of data. Find hidden patterns in the dataset.
10. Model Evaluation: Model evaluation aims to estimate the generalization accuracy
of a model on future (unseen/out-of-sample) data.
11. Model Deployment: The process of taking a trained ML model and making its
predictions available to users or other systems is known as deployment.
Basic Terminologies:
Feature matrix/Data Matrix:
Features/Attributes:
Columns in a dataset
Rows in a dataset
Dataset:
Dependent/Output (y-axis) variable:
Independent/Input (x-axis) variable:
Target:
Types of Data:
Continuous variables: always numeric; can take any of infinitely many values within a range, e.g. height, score
VLOOKUP() in Excel:
VLOOKUP(): merges tables by looking up a key in the first column of another table and fetching the matching value, so data from multiple tables can be combined.
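A rough Python analogue of this lookup, as a sketch only. The table contents and the helper name vlookup are made up for illustration, and only exact-match behaviour is mimicked (Excel's range_lookup set to FALSE).

```python
# Hypothetical lookup table: the first column is the key, as VLOOKUP expects.
departments = [
    ("E01", "Sales", "Pune"),
    ("E02", "Finance", "Delhi"),
]

# Hypothetical main table that lacks the department column.
employees = [("E01", "Asha"), ("E02", "Ravi"), ("E03", "Meena")]


def vlookup(key, table, col_index):
    """Return the value in column col_index (1-based) of the first row whose
    first column equals key. None stands in for Excel's #N/A."""
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    return None


# Fetch the department for each employee, merging the two tables.
merged = [(emp_id, name, vlookup(emp_id, departments, 2))
          for emp_id, name in employees]
print(merged)
```

E03 has no row in the lookup table, so its department comes back as None.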
BIVARIATE ANALYSIS:
numeric vs numeric
categoric vs categoric
numeric vs categoric
MULTIVARIATE ANALYSIS:
Upper limit = Q3 + 1.5 × IQR
Lower limit = Q1 - 1.5 × IQR
IQR = Q3 - Q1
QUARTILE DEVIATION = IQR / 2
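The formulas above can be tried on a small sample. This is a minimal sketch with made-up data. Quartile conventions differ between tools, and here the "inclusive" method of Python's statistics.quantiles is assumed.

```python
from statistics import quantiles

# Made-up sample; 30 is deliberately far from the rest.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
upper_limit = q3 + 1.5 * iqr
lower_limit = q1 - 1.5 * iqr
quartile_deviation = iqr / 2

# Values outside the two limits are flagged.
outside = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q3, iqr, lower_limit, upper_limit, quartile_deviation, outside)
```

With this sample Q1 is 4 and Q3 is 8, so the limits are -2 and 14 and only 30 falls outside them.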
Frequency table
- Divide the data into particular ranges (classes)
- FREQUENCY(data, classes)
- returns an array of counts, one per class
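A sketch of the same idea in Python, assuming the convention of Excel's FREQUENCY function: each class value is the upper limit of a range, and one extra count is returned for values above the last limit. The function name frequency and the scores are illustrative only.

```python
from bisect import bisect_left


def frequency(data, classes):
    """Count how many values fall in each range.

    classes holds the upper limit of each range; the returned list has one
    extra entry at the end for values above the largest limit.
    """
    limits = sorted(classes)
    counts = [0] * (len(limits) + 1)
    for x in data:
        # bisect_left finds the first limit that is >= x, i.e. x <= limit.
        counts[bisect_left(limits, x)] += 1
    return counts


scores = [35, 42, 58, 60, 61, 75, 88, 93]
table = frequency(scores, [40, 60, 80])
print(table)  # counts for <=40, 41-60, 61-80, >80
```

A score equal to a limit (60 here) is counted in the range that ends at that limit.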
0 = no (linear) correlation
R-square is the square of the correlation coefficient r (for linear regression with a single input variable)
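To make the link between correlation and R-square concrete, here is a small sketch that computes the Pearson correlation by hand and squares it. The data and the function name pearson_r are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]   # input (x)
marks = [2, 4, 5, 4, 5]   # output (y)

r = pearson_r(hours, marks)
r_square = r ** 2
print(round(r, 4), round(r_square, 4))
```

Here r is about 0.77, so R-square is 0.6: a straight line through these points explains 60% of the variation in marks.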
e.g. the average height and weight for each state and each gender
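The example above is a grouped average. A minimal sketch of it in plain Python follows; the records are made up, and the state names and units are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (state, gender, height_cm, weight_kg).
people = [
    ("Kerala", "F", 160, 55),
    ("Kerala", "F", 164, 59),
    ("Kerala", "M", 172, 70),
    ("Punjab", "M", 178, 80),
    ("Punjab", "M", 182, 84),
]

# Collect the rows that share the same (state, gender) pair.
groups = defaultdict(list)
for state, gender, height, weight in people:
    groups[(state, gender)].append((height, weight))

# Average height and weight within each group.
averages = {
    key: (mean([h for h, _ in rows]), mean([w for _, w in rows]))
    for key, rows in groups.items()
}

for key, (avg_height, avg_weight) in sorted(averages.items()):
    print(key, avg_height, avg_weight)
```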
Imputation: filling in missing data using the average (mean), median, or mode of the column;
if a column is 70 to 80% NA, filling it in is not reliable; don't use that column for the model
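A short sketch of these imputation choices, using a made-up column where None marks a missing value. The 0.7 cut-off is an assumption taken from the lower end of the 70 to 80% rule of thumb, not a fixed standard.

```python
from statistics import mean, median, mode

# Made-up column; None marks a missing (NA) value.
ages = [25, None, 30, 25, None, 40]

observed = [a for a in ages if a is not None]

# Three common fill values, all computed from the observed data only.
fill_mean = mean(observed)
fill_median = median(observed)
fill_mode = mode(observed)

# Share of missing values, compared against the assumed cut-off.
MAX_MISSING = 0.7
missing_share = ages.count(None) / len(ages)
use_column = missing_share < MAX_MISSING

# Fill with the median (any of the three could be used here).
imputed = [a if a is not None else fill_median for a in ages]

print(fill_mean, fill_median, fill_mode, round(missing_share, 2), imputed)
```

The median is less affected by extreme values than the mean, and the mode is the usual choice for categorical columns.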
Outliers: Anything below the lower limit or above the upper limit; UL = Q3 + 1.5 × IQR, LL = Q1 - 1.5 × IQR
Normalized X = (X - min) / (max - min)
X: value to be normalized
min: minimum of X's column
max: maximum of X's column
max - min >= X - min, so the result always lies between 0 and 1
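A minimal sketch of min-max normalization, which divides X - min by the range max - min. The function name min_max_normalize and the sample heights are illustrative.

```python
def min_max_normalize(column):
    """Rescale every value in column to the range 0..1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # All values equal: max - min is zero, so the ratio is undefined.
        raise ValueError("cannot normalize a constant column")
    return [(x - lo) / (hi - lo) for x in column]


heights = [150, 160, 170, 180, 190]
normalized = min_max_normalize(heights)
print(normalized)
```

The smallest value maps to 0, the largest to 1, and everything else falls in between.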
Standardization:
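The notes leave this heading without a definition. Assuming it refers to the usual z-score standardization, (X - mean) / standard deviation, a sketch could look like the following. The population standard deviation is used here, which is also an assumption; some tools divide by n - 1 instead.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores: how many standard deviations each value is from the mean."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (assumed choice)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - mu) / sigma for x in column]


values = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, standard deviation 2
z_scores = standardize(values)
print(z_scores)
```

After standardization the column has mean 0 and standard deviation 1, whereas min-max normalization squeezes it into 0 to 1.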
Regression, Linear regression, correlation