Machine Learning Ess - Week 1-4
Week 1-4
T stands for Theory, P stands for Pythonic, and D stands for Discussion.
Please take notes during discussions.
Note: All images are sourced from the Internet under open licenses.
Short Quiz
Define list.
Define pandas.
Define NumPy.
Define visualization.
Essential Terminology of Machine learning (Discussion) (1)
1. Data
a. Information
b. Data Science
2. AI
a. Machine learning
b. Types of learning
Essential Terminology of Machine learning (Discussion) (2)
1. Statistical testing
a. Sample
b. Population
2. Evaluation
a. Testing Data
b. Training Data
c. Accuracy
d. Precision
Essential Terminology of Machine learning (Discussion) (3)
1. Supervised learning
a. Classification
b. Regression
2. Unsupervised learning
a. Clustering
b. Associations
3. Reinforcement learning
Essential Terminology of Machine learning (Discussion) (4)
1. Application
a. Stock Market
b. Sentiment Analysis
c. Question Answering System
2. Deep learning
a. MLP
b. CNN
c. RNN
d. LSTM
Understand Your Data With Descriptive Statistics
7 Steps for Descriptive Data (Week 2-3)
● Quick glance at the data
● Review Dimensions
● Review data types
● Summarize the distribution
● Summarize data description
● Understanding the relationship
● Review the skew
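The seven steps above can be sketched with pandas as follows. This is a minimal illustration, not the course's own notebook: the toy DataFrame, its column names, and the choice of "label" as the class column are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
data = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 23, 36, 44],
    "income": [30, 42, 58, 61, 75, 28, 45, 55],
    "label": [0, 0, 1, 1, 1, 0, 0, 1],
})

peek = data.head(5)                          # 1. quick glance at the first rows
n_rows, n_cols = data.shape                  # 2. dimensions (rows, columns)
types = data.dtypes                          # 3. data type of each column
class_counts = data.groupby("label").size()  # 4. distribution of the class column
summary = data.describe()                    # 5. descriptive summary per column
correlations = data.corr(method="pearson")   # 6. pairwise relationships
skew = data.skew()                           # 7. skew of each column

print(peek, (n_rows, n_cols), types, class_counts,
      summary, correlations, skew, sep="\n\n")
```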
Time for pythonic (1)
Please see the first PDF (Descriptive Analysis)
Dimensions
Covers 8 properties
● Count
● Mean
● Standard Deviation
● Minimum Value
● Maximum Value
● 25th percentile
● 50th percentile
● 75th percentile
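The eight properties listed above are the rows that pandas returns from DataFrame.describe() for numeric columns. A small sketch, assuming a one-column DataFrame invented for illustration:

```python
import pandas as pd

# Illustrative data; the column name "x" is an assumption for this sketch.
values = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# describe() reports count, mean, std, min, 25%, 50%, 75%, and max per column.
summary = values.describe()
print(summary)
```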
Time for pythonic (4)
Class distribution
● Why?
● Data imbalance! Do you agree?
that is assumed to be
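One way to check for class imbalance is to count the rows per class. The sketch below assumes a hypothetical class column named "label" with a deliberately imbalanced 8-to-2 split.

```python
import pandas as pd

# Hypothetical imbalanced dataset: 8 rows of class 0 and 2 rows of class 1.
data = pd.DataFrame({
    "feature": [0.1, 0.4, 0.3, 0.8, 0.5, 0.9, 0.2, 0.7, 0.6, 0.35],
    "label": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
})

counts = data["label"].value_counts()                     # rows per class
proportions = data["label"].value_counts(normalize=True)  # share of each class

print(counts)
print(proportions)
```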
Histograms
data.hist()
For density plots, use data.plot(kind="density", subplots=True)
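A minimal sketch of both plot types, assuming matplotlib and SciPy are installed (pandas needs SciPy for density plots). The DataFrame and its column names are invented for illustration, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import pandas as pd
from matplotlib import pyplot

# Illustrative data; column names are assumptions for this sketch.
data = pd.DataFrame({
    "height": [150, 160, 165, 170, 172, 175, 180, 185],
    "weight": [50, 56, 61, 68, 70, 74, 82, 90],
})

# One histogram per numeric column.
hist_axes = data.hist()

# One density (kernel density estimate) plot per column.
density_axes = data.plot(kind="density", subplots=True,
                         layout=(1, 2), sharex=False)

# In an interactive session, call pyplot.show() here instead of closing.
pyplot.close("all")
```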
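Multivariate Plots (1)
● Correlation Matrix Plot
○ Correlation indicates how closely the changes in two variables are related.
■ Negative correlation: one variable decreases as the other increases (inversely related)
■ Positive correlation: both variables change in the same direction (directly related)
■ Highly correlated input variables can cause poor performance in linear and logistic regression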
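The two directions of correlation can be seen numerically with DataFrame.corr(). The columns below are made up for this sketch so that one rises with x and the other falls as x rises.

```python
import pandas as pd

# Illustrative data: "directly" rises with x, "inversely" falls as x rises.
data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "directly": [2, 4, 6, 8, 10],
    "inversely": [10, 8, 6, 4, 2],
})

# Pearson correlation ranges from -1 (perfectly negative) to +1 (perfectly positive).
correlations = data.corr(method="pearson")
print(correlations)
```

Values close to +1 or -1 between two input columns suggest that one of them carries little extra information for a linear model.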
Multivariate Plots (2)
● Scatter Plot Matrix
○ A scatter plot shows the relationship between two variables as dots in two dimensions, one axis for each attribute
○ Attributes with structured relationships may also be correlated and are good candidates for removal from your dataset
Time for pythonic (11)
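For Correlation matrix
○ import pandas as pd
○ from matplotlib import pyplot
○ correlations = data.corr()  # data is a pandas DataFrame
○ fig = pyplot.figure()
○ ax = fig.add_subplot(111)
○ cax = ax.matshow(correlations, vmin=-1, vmax=1)
○ fig.colorbar(cax)
For Scatter plot matrix
○ pd.plotting.scatter_matrix(data)
○ pyplot.show()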
Prepare Data For machine learning
Data Preparations (D)
● Rescaling Data
● Standardize Data
● Normalize Data
● Binarize Data
Why?
Time for pythonic (10)
Rescaling Data
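Rescaling, along with the standardizing and normalizing steps from the data preparation list, can be sketched with scikit-learn's preprocessing classes. The input array is invented for illustration, and the feature range (0, 1) and the L2 norm are assumptions chosen for the example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Illustrative data: two columns on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Rescale: map each column to the range [0, 1].
rescaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Standardize: give each column mean 0 and standard deviation 1.
standardized = StandardScaler().fit_transform(X)

# Normalize: scale each row to unit length (L2 norm of 1).
normalized = Normalizer(norm="l2").fit_transform(X)

print(rescaled, standardized, normalized, sep="\n\n")
```

Note that rescaling and standardizing work column by column, while normalizing works row by row.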
Binarize Data
● Converts values to 0 or 1 using a binary threshold.
● All values above the threshold become 1.
● All values at or below the threshold become 0.
● Can be implemented using the Binarizer class from scikit-learn.
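A short sketch of the Binarizer class from sklearn.preprocessing. The input values and the threshold of 1.0 are assumptions picked to show the boundary case.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Illustrative data; the threshold of 1.0 is an arbitrary choice for this sketch.
X = np.array([[0.2, 1.5],
              [3.0, 0.0],
              [1.0, 2.4]])

binarizer = Binarizer(threshold=1.0)
binary = binarizer.fit_transform(X)  # values > 1.0 become 1, all others become 0

print(binary)
```

The value 1.0 in the last row maps to 0 because only values strictly above the threshold become 1.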
Small Project on Linear Regression
Types of Probabilistic Models
Types of Regression Models
Linear Regression
Time for pythonic (13)
● Relationship between one dependent variable and
explanatory variable(s)
● Use equation to set up relationship
● Numerical Dependent (Response) Variable
● 1 or More Numerical or Categorical Independent
(Explanatory) Variables
● Used Mainly for Prediction & Estimation
Regression Modeling steps
● Hypothesize Deterministic Component
○ Estimate Unknown Parameters
● Specify Probability Distribution of Random Error Term
○ Estimate Standard Deviation of Error
● Evaluate the fitted Model
● Use Model for Prediction & Estimation
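The modeling steps above can be sketched with NumPy on synthetic data. Everything here is an assumption made for illustration: the true intercept (2.0), the true slope (3.0), the noise level, and the use of R squared as the evaluation measure.

```python
import numpy as np

# Synthetic data: y = 2 + 3x plus normally distributed random error.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=x.size)

# 1. Hypothesize the deterministic component y = b0 + b1 * x
#    and estimate the unknown parameters by least squares.
A = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

# 2. Assume a normal random error term with mean 0
#    and estimate its standard deviation from the residuals.
residuals = y - (b0 + b1 * x)
sigma_hat = np.sqrt(np.sum(residuals ** 2) / (x.size - 2))

# 3. Evaluate the fitted model with the coefficient of determination.
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# 4. Use the model for prediction at a new x value.
y_new = b0 + b1 * 12.0

print(b0, b1, sigma_hat, r_squared, y_new)
```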
Linear Equations
Linear Regression Model
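$Y = B_{\text{constant}} + B \cdot x_{\text{variable}} + e$
● $Y$: observed value
● $e$: random error
● $B_{\text{constant}} + B \cdot x_{\text{variable}}$: the fitted line (deterministic component)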
Evaluation Techniques (1)
● Mean Absolute Error
(200 Words)
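Mean Absolute Error is the average of the absolute differences between observed and predicted values: MAE = (1/n) * sum of |y_i - predicted y_i|. A minimal sketch with made-up numbers follows; the helper function is written here for illustration, and scikit-learn provides an equivalent in sklearn.metrics.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Illustrative values: absolute errors are 0.5, 0.0, 2.0, and 1.0.
mae = mean_absolute_error([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 4.0, 8.0])
print(mae)  # 0.875
```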
Time for pythonic (14)
Ten baseline variables, age, sex, body mass index, average
blood pressure, and six blood serum measurements were
obtained for each of n = 442 diabetes patients, as well as
the response of interest, a quantitative measure of disease
progression one year after baseline.
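This description matches the diabetes dataset bundled with scikit-learn, so the sketch below assumes that is the dataset intended. The 75/25 train/test split and the random seed are arbitrary choices for the example.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Load the 442 samples with 10 baseline variables and the progression target.
X, y = load_diabetes(return_X_y=True)

# Hold out a quarter of the data for testing (split size is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit ordinary least squares and score it with Mean Absolute Error.
model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))

print(X.shape, mae)
```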
Ridge Regression
Lasso Regression
Elastic Net Regression
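The three regularized variants can be compared on the same diabetes data with scikit-learn. This is a rough sketch: the alpha and l1_ratio values are untuned assumptions, and 5-fold cross-validated Mean Absolute Error is just one reasonable way to compare them.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Penalty strengths below are illustrative, not tuned.
models = {
    "ridge": Ridge(alpha=1.0),                           # L2 penalty
    "lasso": Lasso(alpha=0.1),                           # L1 penalty
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # mix of L1 and L2
}

scores = {}
for name, model in models.items():
    # scikit-learn reports negated MAE, so flip the sign back.
    cv = cross_val_score(model, X, y, cv=5,
                         scoring="neg_mean_absolute_error")
    scores[name] = -cv.mean()

for name, value in scores.items():
    print(f"{name}: MAE = {value:.2f}")
```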