Data Science Unit-I Notes
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms,
and systems to extract knowledge and insights from structured and unstructured data.
This article delves deep into what model fitting in data science is, its importance, how it works, and its
application in various industries.
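Model fitting is the process of adjusting a model so that it describes a set of observed
data as closely as possible. Models are created using algorithms and are then ‘fitted’ to a
set of data points.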
The quality of the model fit can significantly impact the validity and
reliability of the conclusions drawn from the data.
Data collection
The first step in the model fitting process is data collection.
This involves gathering relevant data that will be used to train the
model.
Model selection
Once the data has been collected, the next step is model selection.
The choice of model depends on the nature of the data, the problem at
hand, and the specific goals of the analysis.
Parameter estimation
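After selecting a model, the next step is parameter estimation.
This involves finding the parameter values that make the model match
the training data most closely, for example by minimizing an error measure.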
Model evaluation
The final step in the model fitting process is model evaluation.
This involves assessing the quality of the model fit and determining
how well the model can predict new data.
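
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed workflow: the data is synthetic (generated from an assumed relation y = 2x + 1 plus noise), the chosen model is a straight line, the parameters are estimated by ordinary least squares, and the helper names `fit_line` and `mean_squared_error` are made up for this example.

```python
import random

# Step 1: data collection. Synthetic stand-in for real data; the underlying
# relation y = 2x + 1 and the noise level are assumptions for illustration.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.5) for x in xs]

# Hold out every fifth point so the model can be evaluated on unseen data.
pairs = list(zip(xs, ys))
train = [p for i, p in enumerate(pairs) if i % 5 != 0]
test = [p for i, p in enumerate(pairs) if i % 5 == 0]

# Step 2: model selection. Here the model is a straight line y = slope*x + intercept.

# Step 3: parameter estimation by ordinary least squares (closed form).
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Step 4: model evaluation using mean squared error.
def mean_squared_error(points, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)

slope, intercept = fit_line(train)
train_mse = mean_squared_error(train, slope, intercept)
test_mse = mean_squared_error(test, slope, intercept)

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"train MSE={train_mse:.3f} test MSE={test_mse:.3f}")
```

If the fit is good, the estimated slope and intercept land close to the values used to generate the data, and the error on the held-out points is similar to the error on the training points.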
A machine learning model gives accurate predictions only if it generalizes to all types of data
within its domain. Overfitting occurs when the model cannot generalize and fits too closely to
the training dataset instead. Overfitting happens for several reasons, such as:
• The training data size is too small and does not contain enough data samples to accurately
represent all possible input data values.
• The training data contains large amounts of irrelevant information, called noisy data.
• The model trains for too long on a single sample set of data.
• The model complexity is high, so it learns the noise within the training data.
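
The effect of high model complexity can be seen in a small experiment. The sketch below is an illustration under assumptions, not a benchmark: the data is synthetic (y = sin(x) plus noise), and a nearest-neighbour regressor stands in for "a model" in general. With k = 1 it memorizes every training point, noise included; with k = 15 it averages over neighbours and is less flexible. The helper names `make_data`, `knn_predict`, and `mse` are hypothetical.

```python
import math
import random

random.seed(1)

def make_data(n):
    """Synthetic samples of y = sin(x) plus Gaussian noise (assumed toy problem)."""
    data = []
    for _ in range(n):
        x = random.uniform(0.0, 6.0)
        data.append((x, math.sin(x) + random.gauss(0.0, 0.3)))
    return data

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, points, k):
    """Mean squared error of the k-nearest-neighbour predictions on `points`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in points) / len(points)

train = make_data(200)
test = make_data(200)

for k in (1, 15):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.3f}  "
          f"test MSE={mse(train, test, k):.3f}")
```

The k = 1 model has zero error on the training set because each training point is its own nearest neighbour, yet its error on new data is clearly higher. That gap between training error and test error is the practical signature of overfitting.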
Overfitting examples
Consider a use case where a machine learning model has to analyze photos and identify the
ones that contain dogs in them. If the machine learning model was trained on a data set that
contained mostly photos of dogs outside in parks, it may learn to use grass as a
feature for classification, and may not recognize a dog inside a room.
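
A toy version of the dog example makes the failure concrete. Everything here is invented for illustration: the eight hand-written training rows, the two binary features `grass` and `fur`, and the very simple learner, which just picks the single feature that best matches the label on the training set.

```python
# Each row is (grass_visible, fur_detected, is_dog). Hypothetical toy data:
# every dog photo was taken in a park, so grass matches the label perfectly.
FEATURES = ("grass", "fur")
train = [
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 1, 1),  # dogs in parks (fur missed once)
    (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 0),  # indoor photos without dogs (one cat)
]

def accuracy(feature_index, rows):
    """Fraction of rows where the chosen feature alone equals the label."""
    return sum(row[feature_index] == row[2] for row in rows) / len(rows)

# "Training": keep the single feature with the highest training accuracy.
best = max(range(len(FEATURES)), key=lambda i: accuracy(i, train))
print("learned feature:", FEATURES[best])
print("training accuracy:", accuracy(best, train))

# A dog photographed indoors: no grass, fur detected, label is dog.
indoor_dog = (0, 1, 1)
prediction = indoor_dog[best]
print("prediction for indoor dog:", "dog" if prediction else "not a dog")
```

The learner scores perfectly on its training photos by keying on grass, then misclassifies the indoor dog, because the feature it relied on was a property of the training set and not of dogs.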
Another overfitting example is a machine learning algorithm that predicts a university student's
academic performance and graduation outcome by analyzing several factors like family income,
past academic performance, and academic qualifications of parents. However, the training data
only includes candidates from a specific gender or ethnic group. In this case, overfitting causes
the algorithm's prediction accuracy to drop for candidates whose gender or ethnicity is not
represented in the training dataset.