Machine Learning
UNIT 2: Preparing to Model
Daxa Patel
Assistant Professor
COMPUTER SCIENCE & ENGINEERING
Outline
■ Machine Learning activities
■ Types of data in Machine Learning
■ Structures of data
■ Data quality and remediation
■ Data Pre-Processing:
◻ Dimensionality reduction
◻ Feature subset selection.
Qualitative and Quantitative Data
■ Qualitative data relates to information about the quality of an object, or information which cannot be measured.
■ Quantitative data provides information about the quantity of an object – hence it can be measured.
Types of Qualitative Data
■ Ordinal data, in addition to possessing the properties of nominal data, can
also be naturally ordered.
■ They can be arranged in a sequence of increasing or decreasing value so
that we can say whether a value is better than or greater than another value.
◻ Examples of ordinal data are
1. Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’, etc.
2. Grades: A, B, C, etc.
3. Hardness of Metal: ‘Very Hard’, ‘Hard’, ‘Soft’, etc.
Types of Quantitative Data
■ Interval data is numeric data for which not only the order is known, but the
exact difference between values is also known.
■ An ideal example of interval data is Celsius temperature.
■ Interval data do not have something called a ‘true zero’ value.
◻ For example, there is nothing called ‘0 temperature’ or ‘no temperature’.
■ However, mathematical operations such as addition and subtraction are possible on interval data. For that reason, the central tendency of interval data can be measured by mean, median, or mode. Standard deviation can also be calculated.
■ Ratio data represents numeric data for which the exact value can be measured. An absolute zero is available for ratio data.
◻ Examples of ratio data include height, weight, age, salary, etc.
Attribute types based on the number of values assigned
■ The attributes can be either discrete or continuous
■ Discrete attributes can assume a finite or countably infinite number of values.
■ Nominal attributes such as roll number, street number, pin code, etc. can have a finite number of values, whereas numeric attributes such as count, rank of students, etc. can have countably infinite values.
■ A special type of discrete attribute which can assume only two values is called a binary attribute.
■ Examples of binary attributes include male/female, positive/negative, yes/no, etc.
Attribute types based on the number of values assigned
■ Continuous attributes can assume any possible value which is a real number.
■ Examples of continuous attribute include length, height, weight, price, etc.
■ In general, nominal and ordinal attributes are discrete.
■ Interval and ratio attributes are continuous, barring a few exceptions, e.g. the ‘count’ attribute.
Exploring Structure of Data
■ In a data set, we need to understand which of the attributes are numeric and which are categorical in nature.
■ This is because the approach for exploring numeric data is different from the approach for exploring categorical data.
■ Consider the Auto MPG data set: all attributes except one are numeric and hence continuous in nature.
■ The only remaining attribute, ‘car name’, is of type categorical, or more specifically nominal.
■ This data set is about the prediction of fuel consumption in miles per gallon, i.e. the numeric attribute ‘mpg’ is the target attribute.
■ With this understanding of the data set attributes, we can start exploring the numeric and categorical attributes separately.
Exploring numerical data - Mean vs Median
■ For attributes such as ‘mpg’, ‘weight’, ‘acceleration’, and ‘model.year’, the deviation between mean and median is not significant, which means the chance of these attributes having too many outlier values is low.
■ However, the deviation is significant for the attributes ‘cylinders’, ‘displacement’, and ‘origin’. So we need to drill down further and look at some more statistics for these attributes.
■ Also, there is some problem in the values of the attribute ‘horsepower’, because of which the mean and median calculation is not possible. For that reason, the attribute ‘horsepower’ is not treated as numeric.
■ So we first have to remediate the missing values of the attribute ‘horsepower’ before being able to do any kind of exploration of it.
Understanding data spread
■ To drill down more, we need to look at the entire range of values of the attributes, though not at the level of individual data elements, as that may be too vast to review manually.
■ So we will take a granular view of the data spread in the form of
1. Dispersion of data
2. Position of the different data values
■ Consider the data values of two attributes:
◻ Attribute 1 values : 44, 46, 48, 45, and 47
◻ Attribute 2 values : 34, 46, 59, 39, and 52
Understanding data spread
■ To measure the extent of dispersion of a data, or to find out how much the different values of a data are spread out, the variance of the data is measured.
■ The variance of a data is measured using the formula given below:
◻ variance(x) = (1/n) × Σ (xᵢ − x̄)², where x̄ is the mean of the n data values (this is the population form; the standard deviation is the square root of the variance).
◻ For the two attributes above, both with mean 46: variance of attribute 1 = 10/5 = 2, while variance of attribute 2 = 398/5 = 79.6.
Understanding data spread
■ So it is quite clear from this measure that attribute 1 values are concentrated around the mean, while attribute 2 values are extremely spread out.
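As a quick check, here is a minimal Python sketch (standard library only) that reproduces these dispersion numbers:

import statistics

attr_1 = [44, 46, 48, 45, 47]
attr_2 = [34, 46, 59, 39, 52]

# Population variance: mean squared deviation from the mean (46 for both attributes)
print(statistics.pvariance(attr_1))  # 2.0  -> values tightly packed around the mean
print(statistics.pvariance(attr_2))  # 79.6 -> values widely spread around the same mean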
Measuring Data Value Position
■ Quantiles refer to specific points in a data set which divide the data set into equal parts or equally sized quantities.
■ There are specific variants of quantile, the one dividing the data set into four parts being termed as quartile.
■ Another such popular variant is percentile, which divides the data set into 100 parts.
■ However, these summary measures still cannot tell us whether there is any outlier present in the data.
■ For that, we can better adopt some means to visualize the data.
■ Box plot is an excellent visualization medium for numeric data.
■ Histogram is another plot which helps in effective visualization
Box plots (Box & whisker plot)
■ The box and whisker plot is a visual representation of the five number summary.
■ The five number summary consists of:
◻ The minimum value in the data set – Min
◻ The 1st quartile – Q1
◻ The median (2nd quartile) – Q2
◻ The 3rd quartile – Q3
◻ The maximum value in the data set – Max
Box plots (Box & whisker plot)
■ IQR (Inter-Quartile Range) = Q3 − Q1
■ Quartiles divide the data into four equal parts.
■ The lower whisker can extend at most down to (Q1 − 1.5 × IQR); its actual length depends on the lowest data value that falls within that limit.
■ Likewise, the actual length of the upper whisker depends on the highest data value that falls within (Q3 + 1.5 × IQR).
■ Data values lying beyond the lower or upper whisker are of unusually low or high value respectively. These are the outliers, which may deserve special consideration.
■ To check whether the minimum and maximum are outliers: any value in the data set outside the range [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR] is an outlier.
Box plots (Box & whisker plot)
■ Step 1 – Find the median. Remember, the median is the middle value in a data set.
18, 27, 34, 52, 54, 59, 61, (68), 78, 82, 85, 87, 91, 93, 100
■ Step 2 – Find the lower quartile. The lower quartile is the median of the data set to the left of 68.
(18, 27, 34, 52, 54, 59, 61), 68, 78, 82, 85, 87, 91, 93, 100
Box plots (Box & whisker plot)
■ Step 3 – Find the upper quartile.
■ The upper quartile is the median of the data set to the right of 68.
18, 27, 34, 52, 54, 59, 61, 68, (78, 82, 85, 87, 91, 93, 100)
Box plots (Box & whisker plot)
■ Step 4 – Note the minimum and maximum values: Min = 18, Max = 100.
■ Step 5 – Find the inter-quartile range (IQR).
■ The inter-quartile range (IQR) is the difference between the upper and lower quartiles.
◻ Upper Quartile = 87
◻ Lower Quartile = 52
◻ 87 – 52 = 35
◻ 35 = IQR
■ Organize the 5 number summary:
◻ Min – 18
◻ Q1 (lower quartile) – 52
◻ Q2 (median) – 68
◻ Q3 (upper quartile) – 87
◻ Max – 100
Box plots (Box & whisker plot)
■ Find the outlier range using [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR]:
◻ [52 − 1.5 × 35, 87 + 1.5 × 35] = [−0.5, 139.5]
◻ No element of the data set has a value outside this range, so there are no outliers.
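The whole procedure can be sketched in Python as below, using the median-of-halves convention followed on these slides (library routines such as numpy's percentile use a different interpolation by default, so they may return slightly different quartiles):

def five_number_summary(values):
    xs = sorted(values)
    n = len(xs)
    def median(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2
    q1 = median(xs[:n // 2])        # lower half (median excluded when n is odd)
    q3 = median(xs[(n + 1) // 2:])  # upper half (median excluded when n is odd)
    return xs[0], q1, median(xs), q3, xs[-1]

data = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]
mn, q1, q2, q3, mx = five_number_summary(data)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print((mn, q1, q2, q3, mx))                      # (18, 52, 68, 87, 100)
print([x for x in data if x < low or x > high])  # [] -> no outliers in [-0.5, 139.5]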
Box plots (Box & whisker plot)
■ Graphing The Data
■ Notice, the Box includes the lower quartile, median, and upper quartile.
■ The Whiskers extend from the Box to the max and min.
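A minimal sketch of how such a plot can be drawn, assuming matplotlib is available (matplotlib computes quartiles by linear interpolation, so the box edges may differ slightly from the hand-computed values):

import matplotlib.pyplot as plt

data = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]

# whis=1.5 lets each whisker run to the farthest point within 1.5 x IQR of the box;
# anything beyond that would be drawn as an individual outlier marker
plt.boxplot(data, whis=1.5)
plt.ylabel('Value')
plt.title('Box and whisker plot')
plt.show()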
Practice
■ Use the following set of data to create the 5 number summary.
3, 7, 11, 11, 15, 21, 23, 39, 41, 45, 50, 61, 87, 99, 220
Histogram
■ Histogram is another plot which helps in effective visualization of numeric
attributes.
■ It helps in understanding the distribution of numeric data over a series of intervals, also termed as ‘bins’ (a small binning sketch follows the comparison list below).
■ The important difference between histogram and box plot is
◻ The focus of histogram is to plot ranges of data values (acting as ‘bins’), the number of
data elements in each range will depend on the data distribution. Based on that, the size
of each bar corresponding to the different ranges will vary.
◻ The focus of box plot is to divide the data elements in a data set into four equal portions,
such that each portion contains an equal number of data elements.
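A small binning sketch with numpy; the ages here are hypothetical, purely to illustrate how data elements are counted into bins:

import numpy as np

ages = [1, 29, 3, 19, 27, 22, 5, 51, 63, 58, 26, 9, 25, 42, 18, 6, 16, 4, 45]

# Bins of width 10: [0, 10), [10, 20), ..., [60, 70]
counts, edges = np.histogram(ages, bins=range(0, 71, 10))
print(counts)  # [6 3 5 0 2 2 1] -> frequency of ages falling in each bin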
General Histogram Shapes
[Figure: typical histogram shapes]
Histogram Example
■ Ages grouped into buckets of width 10, with the frequency of each bucket:
◻ 0-9: 6
◻ 10-19: 3
◻ 20-29: 5
◻ 30-39: 1
◻ 40-49: 2
◻ 50-59: 2
◻ 60-69: 1
Box Plot & Histogram
[Figure: box plot and histogram compared]
Exploring Categorical Data
■ The mode of a data set is the data value which appears most often.
■ In the context of a categorical attribute, it is the category which has the highest number of data values.
■ Since mean and median cannot be applied to categorical variables, mode is the sole measure of central tendency.
■ An attribute may have one or more modes.
■ A frequency distribution having a single mode is called ‘unimodal’, one with two modes is called ‘bimodal’, and one with multiple modes is called ‘multimodal’.
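A minimal pandas sketch, using a hypothetical categorical attribute, showing how the category frequencies and the mode(s) can be inspected:

import pandas as pd

origin = pd.Series(['USA', 'Europe', 'Asia', 'USA', 'USA', 'Asia'])

print(origin.value_counts())  # frequency of each category, highest first
print(origin.mode())          # the mode(s); more than one value => multimodal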
Exploring relationship between variables
■ Till now we have been exploring single attributes in isolation.
■ One more important angle of data exploration is to explore the relationship between attributes.
■ There are multiple plots that enable us to explore the relationship between variables.
■ The basic and most commonly used plot is scatter plot.
Scatter plot
■ A scatter plot helps in visualizing bivariate relationships, i.e. relationship
between two variables.
■ It is a two-dimensional plot in which points or dots are drawn on coordinates
provided by values of the attributes.
■ For example,
◻ in a data set there are two attributes – attr_1 and attr_2.
◻ We want to understand the relationship between the two attributes, i.e. with a change in the value of one attribute, say attr_1, how does the value of the other attribute, say attr_2, change.
■ As in a two-dimensional plot, attr_1 is said to be the independent variable and attr_2 the dependent variable.
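A minimal matplotlib sketch with hypothetical values for the two attributes:

import matplotlib.pyplot as plt

attr_1 = [1, 2, 3, 4, 5, 6]                # independent variable (x-axis)
attr_2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]  # dependent variable (y-axis)

plt.scatter(attr_1, attr_2)
plt.xlabel('attr_1')
plt.ylabel('attr_2')
plt.show()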
Scatter plot
■ For example, there is one data element which has an mpg of 37 for a displacement of 250.
■ This record is completely different from other data elements having a similar displacement value but an mpg value in the range of 15 to 25.
■ This gives an indication of the presence of outlier data values.
■ As you can see, in most of the cases, there is a significant relationship between the attribute pairs. However, in some cases, e.g. between the attributes ‘weight’ and ‘acceleration’, the relationship doesn’t seem to be very strong.
Two-way cross-tabulations
■ A cross-tab, very much like a scatter plot, helps to understand how much the data values of one attribute change with a change in the data values of another attribute.
Two-way cross-tabulations
■ Moving to the second cross-tab, it gives the number of 3, 4, 5, 6, or 8 cylinder cars in every region present in the sample data set.
Two-way cross-tabulations
■ The third cross-tab presents the number of 3, 4, 5, 6, or 8 cylinder cars for every year.
Two-way cross-tabulations
■ The cross-tab showing the relationship between the attributes ‘model.year’ and ‘origin’ helps us understand the number of vehicles per year in each of the regions North America, Europe, and Asia.
■ Looking at it in another way, we can get the count of vehicles per region over
the different years.
■ We may also want to create cross-tabs with a more summarized view, e.g. a cross-tab giving the number of cars having 4 or fewer cylinders and more than 4 cylinders in each region or by year. This can be done by rolling up data values by the attribute ‘cylinders’, as in the sketch below.
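A sketch of how such cross-tabs can be produced with pandas; the data frame below is a small hypothetical stand-in for the Auto MPG records:

import pandas as pd

df = pd.DataFrame({'cylinders': [4, 4, 6, 8, 4, 6],
                   'origin':    [1, 3, 1, 1, 2, 3]})

# Count of cars for every (cylinders, origin) combination
print(pd.crosstab(df['cylinders'], df['origin']))

# Rolled-up view: 4 or fewer cylinders vs. more than 4, per region
print(pd.crosstab(df['cylinders'] <= 4, df['origin']))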
Data quality
■ Success of machine learning depends largely on the quality of data.
■ Data which has the right quality helps to achieve better prediction accuracy, in the case of supervised learning.
■ Two common data quality problems are:
◻ Certain data elements without a value, i.e. data with a missing value.
◻ Data elements having a value surprisingly different from the other elements, which we term as outliers.
Data quality
■ Incorrect sample set selection:
■ The data may not reflect normal or regular quality due to incorrect selection of the sample set.
■ For example, suppose we select a sample set of sales transactions from a festive period and try to use that data to predict sales in the future.
■ In this case, the prediction will be far from the actual scenario, simply because the sample set was selected at the wrong time.
Data quality
■ Errors in data collection: resulting in outliers and missing values
■ In many cases, a person or a group of persons is responsible for the collection of data to be used in a learning activity.
■ In this manual process, there is the possibility of wrongly recording data
either in terms of value (say 20.67 is wrongly recorded as 206.7 or 2.067) or
in terms of a unit of measurement (say cm. is wrongly recorded as m. or
mm.).
■ This may result in data elements which have abnormally high or low value
from other elements. Such records are termed as outliers.
Data quality
■ Errors in data collection: resulting in outliers and missing values
■ It may also happen that the data is not recorded at all.
■ In case of a survey conducted to collect data, it is all the more possible as
survey responders may choose not to respond to a certain question.
■ So the data value for that data element in that responder’s record is missing.
Data remediation - Handling outliers
■ Data remediation is the process of cleansing, organizing, and migrating data.
■ We will discuss how to handle outliers and missing values.
■ Remove outliers: If the number of records which are outliers is not large, a simple approach may be to remove them.
■ Imputation: Another way is to impute, i.e. replace, the value with the mean, median, or mode. The value of the most similar data element may also be used for imputation.
■ Capping: For values that lie outside the 1.5 × IQR limits, we can cap them by replacing those observations:
◻ observations that lie below the lower limit, with the value of the 5th percentile;
◻ observations that lie above the upper limit, with the value of the 95th percentile.
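A sketch of IQR-based capping with numpy; the quartile convention here is numpy's default, and the 5th/95th percentile cut-offs follow the rule above:

import numpy as np

def cap_outliers(values):
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    p5, p95 = np.percentile(x, [5, 95])
    x = np.where(x < low, p5, x)    # cap unusually low values at the 5th percentile
    x = np.where(x > high, p95, x)  # cap unusually high values at the 95th percentile
    return x

print(cap_outliers([18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 200]))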
Data remediation - Handling outliers
■ If there is a significant number of outliers, they should be treated separately in the statistical model.
■ In that case, the data should be treated as two different groups, a model should be built for each group, and the outputs can then be combined.
Data remediation - Handling missing values
■ Eliminating records having a missing value of data elements
■ In case the proportion of data elements having missing values is within a
tolerable limit, a simple but effective approach is to remove the records
having such data elements.
■ In the case of Auto MPG data set, only in 6 out of 398 records, the value of
attribute ‘horsepower’ is missing. If we get rid of those 6 records, we will still
have 392 records, which is definitely a substantial number. So, we can very
well eliminate the records and keep working with the remaining data set.
■ However, this will not be possible if the proportion of records having data elements with missing values is really high, as that will reduce the power of the model because of the reduction in training data size.
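A sketch of this removal with pandas, assuming the data set sits in a hypothetical auto-mpg.csv file (in the UCI copy of this data set, missing ‘horsepower’ values are recorded as '?'):

import pandas as pd

df = pd.read_csv('auto-mpg.csv')
df['horsepower'] = pd.to_numeric(df['horsepower'], errors='coerce')  # '?' -> NaN

df_clean = df.dropna(subset=['horsepower'])  # drop the records with missing horsepower
print(len(df), '->', len(df_clean))          # e.g. 398 -> 392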
Data remediation - Handling missing values
■ Imputing missing values
■ Imputation is a method to assign a value to the data elements having missing
values.
■ The mean/median/mode is the most frequently assigned value.
■ For quantitative attributes, all missing values are imputed with the mean,
median, or mode of the remaining values under the same attribute.
■ For qualitative attributes, all missing values are imputed by the mode of all
remaining values of the same attribute.
◻ For example, in the Auto MPG data set, ‘cylinders’ is the attribute logically most connected to ‘horsepower’, because with an increase in the number of cylinders of a car, the horsepower of the car is expected to increase. Missing ‘horsepower’ values can therefore be imputed group-wise, using records with the same ‘cylinders’ value, as sketched below.
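A sketch of such group-wise imputation with pandas, on a small hypothetical data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'cylinders':  [4, 4, 4, 6, 6, 8],
                   'horsepower': [70.0, 90.0, np.nan, 105.0, np.nan, 150.0]})

# Replace each missing 'horsepower' with the median horsepower
# of the cars having the same number of cylinders
df['horsepower'] = df['horsepower'].fillna(
    df.groupby('cylinders')['horsepower'].transform('median'))
print(df)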
Data remediation - Handling missing values
■ Estimate missing values
■ If there are data points similar to the ones with missing attribute values, then the attribute values from those similar data points can be used in place of the missing values.
■ For finding similar data points or observations, a distance function can be used.
◻ For example, let’s assume that the weight of a student having age 12 years and height 5
ft. is missing. Then the weight of any other student having age close to 12 years and
height close to 5 ft. can be assigned.
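One possible implementation of this idea, assuming scikit-learn is available, fills each gap from the single most similar record (nearest neighbour by distance):

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical student records; one weight is missing
students = pd.DataFrame({'age':    [12, 13, 12, 11],
                         'height': [5.0, 5.2, 4.9, 4.6],
                         'weight': [40.0, 46.0, np.nan, 35.0]})

imputer = KNNImputer(n_neighbors=1)     # copy the value from the nearest neighbour
print(imputer.fit_transform(students))  # the NaN is replaced by a similar student's weight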
DATA PRE-PROCESSING
■ Dimensionality reduction
■ High-dimensional data sets need a high amount of computational space and
time.
■ At the same time, not all features are useful; irrelevant or redundant features can degrade the performance of machine learning algorithms.
■ Most of the machine learning algorithms perform better if the dimensionality of the data set, i.e. the number of features in the data set, is reduced.
■ Dimensionality reduction helps in reducing irrelevance and redundancy in
features.
■ Also, it is easier to understand a model if the number of features involved in
the learning activity is less.
DATA PRE-PROCESSING
■ Dimensionality reduction
■ Dimensionality reduction refers to the techniques of reducing the
dimensionality of a data set by creating new attributes by combining the
original attributes.
■ The most common approach for dimensionality reduction is known as
Principal Component Analysis (PCA)
◻ PCA is a statistical technique to convert a set of correlated variables into a set of
transformed, uncorrelated variables called principal components.
◻ The principal components are a linear combination of the original variables. They are
orthogonal to each other.
◻ Since principal components are uncorrelated, they capture the maximum amount of
variability in the data.
◻ However, the only challenge is that the original attributes are lost due to the transformation.
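A minimal PCA sketch, assuming scikit-learn and hypothetical data:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                   # hypothetical data: 100 rows, 5 features
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)  # make two features strongly correlated

pca = PCA(n_components=2)                       # keep only the top 2 principal components
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                          # (100, 2)
print(pca.explained_variance_ratio_)            # share of variance captured per component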
DATA PRE-PROCESSING
■ Dimensionality reduction
■ Another commonly used technique for dimensionality reduction is Singular Value Decomposition (SVD).
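A minimal numpy sketch of SVD-based reduction on hypothetical data (for PCA-like use, the data is typically mean-centred first):

import numpy as np

X = np.random.default_rng(0).normal(size=(100, 5))  # hypothetical data
X = X - X.mean(axis=0)                              # mean-centre the columns

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                              # keep the top-k singular directions
X_reduced = U[:, :k] * s[:k]       # low-dimensional representation of the rows
print(X_reduced.shape)             # (100, 2)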
DATA PRE-PROCESSING
■ Feature subset selection
■ Feature subset selection, or simply feature selection, both for supervised as well as unsupervised learning, tries to find the optimal subset of the entire feature set which significantly reduces computational cost without any major impact on learning accuracy.
■ It may seem that a feature subset may lead to loss of useful information, as certain features are going to be excluded from the final set of features used for learning.
■ However, only features which are irrelevant or redundant are selected for elimination.
DATA PRE-PROCESSING
■ Feature subset selection
■ A feature is considered as irrelevant if it plays an insignificant role (or
contributes almost no information) in classifying or grouping together a set of
data instances.
■ All irrelevant features are eliminated while selecting the final feature subset.
■ A feature is potentially redundant when the information it contributes is more or less the same as that of one or more other features.
■ Among a group of potentially redundant features, a small number of features can be selected as part of the final feature subset without causing any negative impact on the accuracy of the learned model.
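As an illustration, a simple filter-style sketch assuming scikit-learn; SelectKBest with a univariate relevance score is just one of many possible feature selection approaches:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features that score highest on a univariate relevance test
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print(X.shape, '->', X_selected.shape)  # (150, 4) -> (150, 2)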
Thank You