Module 3
In this grid game, the gray tile indicates danger, the black tile is a block, and the tile with
diagonal lines is the goal. The aim is to start from, say, the bottom-left tile and use the actions
left, right, top, and bottom to reach the goal state.
There is no training data for this sort of problem. Instead, the agent interacts with the
environment to gain experience. In the above case, the agent builds a model by simulating many
paths and finding the rewarding ones; this experience is what the model is constructed from.
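The trial-and-error loop described above can be sketched as a minimal tabular Q-learning agent on a tiny grid. The layout, reward values, and hyperparameters below are illustrative assumptions, not the exact game from the figure:

```python
import random

# A 3x3 grid (an assumed layout): start bottom-left, goal top-right, one danger tile.
ROWS, COLS = 3, 3
GOAL, DANGER = (0, 2), (1, 1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Apply an action; bumping into a wall leaves the state unchanged."""
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (max(0, min(ROWS - 1, r + dr)), max(0, min(COLS - 1, c + dc)))
    if nxt == GOAL:
        return nxt, 10.0, True    # reaching the goal is rewarded
    if nxt == DANGER:
        return nxt, -10.0, True   # stepping on the danger tile ends the episode
    return nxt, -1.0, False       # a small cost per move encourages short paths

random.seed(0)
Q = {(r, c): {a: 0.0 for a in ACTIONS} for r in range(ROWS) for c in range(COLS)}
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

for episode in range(500):
    state, done = (2, 0), False    # each simulated path starts at the bottom-left tile
    while not done:
        if random.random() < eps:  # explore a random action
            action = random.choice(list(ACTIONS))
        else:                      # exploit the current estimate
            action = max(Q[state], key=Q[state].get)
        nxt, reward, done = step(state, action)
        target = reward + (0 if done else gamma * max(Q[nxt].values()))
        Q[state][action] += alpha * (target - Q[state][action])
        state = nxt

# After many simulated paths, the greedy action from the start heads toward the goal.
print(max(Q[(2, 0)], key=Q[(2, 0)].get))
```

Here the "model" the agent constructs is the table Q; the rewarding paths it discovers are exactly the ones whose entries grow largest.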
x1   x2   y
1    1    1
2    1    2
3    1    3
4    1    4
5    1    5
Can a model for this test data be multiplication, that is, y = x1 * x2? It is true! But it is
equally true that y may be y = x1 / x2 or y = x1 ^ x2. So, there are three functions that fit
the data.
This means that the problem is ill-posed. To solve it, one needs more examples to check the
model. Puzzles and games that do not have a sufficient specification may become ill-posed
problems, and scientific computation has many ill-posed problems.
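The ambiguity can be verified directly: all three candidate functions agree on every row of the table, so the data alone cannot decide between them (a quick check, using the data as given):

```python
# Each row is (x1, x2, y) from the table above.
data = [(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4), (5, 1, 5)]

candidates = {
    "multiplication": lambda x1, x2: x1 * x2,
    "division":       lambda x1, x2: x1 / x2,
    "exponentiation": lambda x1, x2: x1 ** x2,
}

# Every candidate fits every example, so the problem is ill-posed.
for name, f in candidates.items():
    assert all(f(x1, x2) == y for x1, x2, y in data)
    print(name, "fits the data")
```

Because x2 = 1 in every example, the three rules are indistinguishable; only new examples with x2 ≠ 1 would separate them.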
2. Need for Huge, Quality Data:
o Machine learning requires large amounts of high-quality data. The data must be
complete, without missing or incorrect values. Poor-quality data can lead to inaccurate
models.
3. High Computational Power:
o With the growth of Big Data, machine learning tasks require powerful computers with
specialized hardware like GPUs or TPUs to handle the high computational load. The
increasing complexity of tasks has made high-performance computing essential.
4. Complexity of Algorithms:
o Choosing the right machine learning algorithm, explaining how it works, applying it
correctly, and comparing different algorithms are now critical skills for data scientists.
This makes the selection and evaluation of algorithms a significant challenge.
5. Bias-Variance Trade-off:
o Overfitting: When a model performs well on training data but fails on test data, it’s
called overfitting. This means the model has learned the training data too well but lacks
generalization to new data.
o Underfitting: When a model fails to perform well on both training and test data, it’s
called underfitting. The model is too simple to capture the patterns in the data.
o Balancing between overfitting and underfitting is a major challenge for machine
learning algorithms.
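The trade-off can be made concrete with a toy comparison (pure Python; the data and the three models are our own illustration, not from the text): a constant predictor that underfits, a least-squares line that generalizes, and a lookup table that memorizes the training set and overfits.

```python
# Hand-made data drawn roughly from y = 2x + 1 with a little noise.
train = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8)]
test = [(5, 11.1), (6, 12.9)]

def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Underfitting: always predict the training mean, ignoring x entirely.
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# A reasonable fit: ordinary least-squares line y = a*x + b.
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
b = my - a * mx
line = lambda x: a * x + b

# Overfitting: memorize training pairs exactly; fall back to 0 on unseen inputs.
table = dict(train)
overfit = lambda x: table.get(x, 0.0)

print("underfit  train/test MSE:", mse(underfit, train), mse(underfit, test))
print("line      train/test MSE:", mse(line, train), mse(line, test))
print("overfit   train/test MSE:", mse(overfit, train), mse(overfit, test))
```

The memorizing model scores a perfect zero error on the training data but fails badly on the test data, while the constant model is poor on both; the line balances the two.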
1.6 MACHINE LEARNING PROCESS
The emerging process model for data mining solutions in business organizations is
CRISP-DM. Since machine learning is like data mining, except for the aim, this process can
be used for machine learning as well. CRISP-DM stands for Cross-Industry Standard Process
for Data Mining. It involves six steps, listed below in Figure 1.11.
1. Understanding the business – This step involves understanding the objectives and
requirements of the business organization. Generally, a single data mining algorithm is
enough to provide the solution. This step also involves formulating the problem
statement for the data mining process.
2. Understanding the data – This step involves data collection, study of the
characteristics of the data, formulation of a hypothesis, and matching of patterns to the
selected hypothesis.
3. Preparation of data – This step involves producing the final dataset by cleaning the
raw data and preparing it for the data mining process. Missing values may cause problems
during both the training and testing phases; missing data forces classifiers to produce
inaccurate results. This is a perennial problem for classification models, so suitable
strategies should be adopted to handle the missing data.
4. Modelling – This step applies the data mining algorithm to the data to obtain a model
or pattern.
5. Evaluate – This step involves evaluating the data mining results using statistical
analysis and visualization methods. The performance of the classifier is determined by
evaluating its accuracy. Classification is often a fuzzy problem; for example, classifying
emails requires extensive domain knowledge and domain experts. Hence, the performance
of the classifier is crucial.
6. Deployment – This step involves the deployment of results of the data mining
algorithm to improve the existing process or for a new situation.
1. Smoothing by Means:
o Replace all values in the bin with the mean (average) of the bin
values.
Example:
o Given data: S = {12, 14, 19, 22, 24, 26, 28, 31, 34}
o First, divide into bins of size 3:
Bin 1: {12, 14, 19}
Bin 2: {22, 24, 26}
Bin 3: {28, 31, 34}
o Now apply smoothing by means (replace all values with the bin's
mean):
Bin 1 (mean = 15): {15, 15, 15}
Bin 2 (mean = 24): {24, 24, 24}
Bin 3 (mean = 31): {31, 31, 31}
o Explanation: Each value in the bin is replaced by the mean of the
bin to smooth the data.
2. Smoothing by Medians:
o Replace all values in the bin with the median of the bin values (the
middle value when the data is sorted).
Example:
o Given the same data and bins:
Bin 1 (median = 14): {14, 14, 14}
Bin 2 (median = 24): {24, 24, 24}
Bin 3 (median = 31): {31, 31, 31}
o Explanation: Each value in the bin is replaced by the median,
which reduces the effect of outliers or extreme values.
3. Smoothing by Bin Boundaries:
o Replace each value in the bin with the closest boundary value
(minimum or maximum value in the bin).
Example:
o Given the same data and bins:
Bin 1 (boundary values: 12 and 19): {12, 12, 19}
Bin 2 (boundary values: 22 and 26): {22, 22, 26}
Bin 3 (boundary values: 28 and 34): {28, 28, 34}
o Explanation: For each bin, values are replaced by the closest
boundary value (either the minimum or maximum of that bin).
o Example: In Bin 1, the original data was {12, 14, 19}. The
boundaries are 12 and 19, so the value 14 is closer to 12, and it's
replaced by 12.
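The three smoothing schemes above can be sketched in a few lines of Python (equal-frequency bins of size 3, as in the example; when a value is equidistant from both boundaries, this sketch keeps the lower boundary, which is one possible convention):

```python
import statistics

def make_bins(values, size):
    """Split sorted values into equal-frequency bins of the given size."""
    values = sorted(values)
    return [values[i:i + size] for i in range(0, len(values), size)]

def smooth(values, size, method):
    """Smooth each bin by its mean, its median, or its nearest boundary."""
    out = []
    for b in make_bins(values, size):
        if method == "mean":
            out.append([statistics.mean(b)] * len(b))
        elif method == "median":
            out.append([statistics.median(b)] * len(b))
        elif method == "boundary":
            lo, hi = b[0], b[-1]
            # Replace each value with the nearer boundary (ties go to the lower one).
            out.append([lo if v - lo <= hi - v else hi for v in b])
    return out

S = [12, 14, 19, 22, 24, 26, 28, 31, 34]
print(smooth(S, 3, "mean"))      # [[15, 15, 15], [24, 24, 24], [31, 31, 31]]
print(smooth(S, 3, "median"))    # [[14, 14, 14], [24, 24, 24], [31, 31, 31]]
print(smooth(S, 3, "boundary"))  # [[12, 12, 19], [22, 22, 26], [28, 28, 34]]
```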
The Min-Max normalization of a value v to a new range is given by Eq. (2.1):

v' = ((v - min) / (max - min)) * (new_max - new_min) + new_min    (2.1)

Here, max - min is the range; min and max are the minimum and
maximum of the given data, and new_max and new_min are the minimum
and maximum of the target range, say 0 and 1.
Example 2.2: Consider the set: V = {88, 90, 92, 94}. Apply Min-Max
procedure and map the marks to a new range 0–1.
Solution: The minimum of the list V is 88 and the maximum is 94. The
new min and new max are 0 and 1, respectively. The mapping can be
done using Eq. (2.1) as:

88 → (88 - 88) / (94 - 88) = 0
90 → (90 - 88) / (94 - 88) = 2/6 ≈ 0.33
92 → (92 - 88) / (94 - 88) = 4/6 ≈ 0.67
94 → (94 - 88) / (94 - 88) = 1

So, it can be observed that the marks {88, 90, 92, 94} are mapped to
the new range {0, 0.33, 0.67, 1}. Thus, the Min-Max normalization
range is between 0 and 1.
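Example 2.2 can be reproduced with a short sketch (plain Python; the function name is our own):

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Min-Max normalization: map values linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

V = [88, 90, 92, 94]
print([round(v, 2) for v in min_max(V)])  # [0.0, 0.33, 0.67, 1.0]
```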
The z-score normalization of a value v is given by:

z = (v - m) / s

Here, s is the standard deviation of the list V and m is the mean of the
list V.
Example 2.3: Consider the mark list V = {10, 20, 30}, convert the
marks to z-score.
Solution: The mean and sample standard deviation (s) of the
list V are 20 and 10, respectively. So the z-scores of these marks are:

(10 - 20) / 10 = -1,  (20 - 20) / 10 = 0,  (30 - 20) / 10 = 1

Hence, the z-scores of the marks 10, 20, 30 are -1, 0 and 1, respectively.
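The same computation as a sketch, using the sample standard deviation as in the example (`statistics.stdev` is the sample version, dividing by n - 1):

```python
import statistics

def z_scores(values):
    """Convert values to z-scores using the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]

print(z_scores([10, 20, 30]))  # [-1.0, 0.0, 1.0]
```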
Data Reduction
Data reduction reduces the size of the data while producing the same
analytical results. It can be carried out in different ways, such as
data aggregation, feature selection, and dimensionality reduction.
is that by visual inspection one can find out who got more marks.
For example, the mean of the three numbers 10, 20, and 30 is 20.
The geometric mean of n values x1, x2, ..., xn is given by:

GM = (x1 × x2 × ... × xn)^(1/n)

Here, n is the number of items and the xi are the values. For example, if the
values are 6 and 8, the geometric mean is √(6 × 8) = √48 ≈ 6.93. For larger
datasets, computing the geometric mean directly is difficult. Hence, it is usually
calculated as:

GM = antilog((1/n) × Σ log xi)
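The geometric mean can be sketched in both forms (the direct product and the log-based form), which agree:

```python
import math

def geometric_mean(values):
    """Direct form: the n-th root of the product of the values."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_log(values):
    """Log form: antilog of the mean of the logs (safer for large inputs)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(round(geometric_mean([6, 8]), 2))      # 6.93
print(round(geometric_mean_log([6, 8]), 2))  # 6.93
```

The log form avoids the huge intermediate products that make the direct form impractical for large datasets.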
2. Median – For grouped data, the median class is the class in which
the (N/2)th item is present, and the median is given by:

Median = L1 + ((N/2 - cf) / f) × i

Here, i is the class interval of the median class, L1 is the lower limit of
the median class, f is the frequency of the median class, and cf is the
cumulative frequency of all classes preceding the median class.
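The grouped-data median formula can be sketched as follows (the frequency table below is made-up illustration data, not from the text):

```python
def grouped_median(classes):
    """classes: list of (lower_limit, frequency); all intervals share one width i."""
    N = sum(f for _, f in classes)
    i = classes[1][0] - classes[0][0]  # common class interval width
    cf = 0                             # cumulative frequency before the current class
    for L1, f in classes:
        if cf + f >= N / 2:            # the (N/2)th item falls in this class
            return L1 + ((N / 2 - cf) / f) * i
        cf += f

# Hypothetical frequency table: marks 0-10, 10-20, ..., 40-50 with their frequencies.
classes = [(0, 5), (10, 8), (20, 12), (30, 10), (40, 5)]
print(grouped_median(classes))
```

Here N = 40, so the median class is 20-30 (cumulative frequency first reaches 20 there), and the formula interpolates within that class.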
3. Mode – Mode is the value that occurs most frequently in the
dataset. In other words, the value that has the highest frequency is
called the mode.
2.5.3 Dispersion
The spread of a set of data around the central tendency (mean,
median, or mode) is called dispersion. Dispersion is represented in
various ways, such as range, variance, standard deviation, and
standard error. These are second-order measures. The most common
measures of dispersion are listed below:
Range – Range is the difference between the maximum and minimum
values of the given list of data.
Standard Deviation – The mean does not convey much more than a
middle point. For example, the datasets {10, 20, 30} and {10,
50, 0} both have a mean of 20; the difference between them
is the spread of the data. Standard deviation is the average distance from
the mean of the dataset to each point.
The formula for the sample standard deviation is given by:

s = √( Σ (xi - x̄)² / (n - 1) )
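A sketch of the sample standard deviation, checked against `statistics.stdev` from the standard library:

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation: sqrt of squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

data = [10, 20, 30]
print(sample_std(data))        # 10.0
print(statistics.stdev(data))  # 10.0 (the stdlib agrees)
```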