Data Science Unit-I Notes

Uploaded by

krishnareddyrpt

Definition: Data science is a multidisciplinary field that uses algorithms, processes, scientific methods,
and systems to extract knowledge and insights from structured and unstructured data.

Characteristics of Data Science:

One of the key concepts in this field is model fitting.

This article delves deep into what model fitting in data science is, its importance, how it works, and its
application in various industries.

Defining model fitting in data science


Model fitting is the process of making a statistical model describe a set of
observations as closely as possible; the quality of the fit measures how well
it does so.

In data science, models are mathematical constructs that represent
real-world processes.

These models are created using algorithms and are then ‘fitted’ to a
set of data points.

The fitting process involves adjusting the model’s parameters to
optimise the match between the model’s predictions and the actual
data.
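As a rough illustration of "adjusting the model's parameters", the sketch below fits a straight line to a handful of points by gradient descent. The data values, the function names, the learning rate, and the step count are all assumptions made up for this example; gradient descent is only one of several ways to do the adjustment and is not prescribed by these notes.

```python
def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the line's predictions and the observations."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly nudge slope and intercept in the direction that lowers the error."""
    slope, intercept = 0.0, 0.0  # arbitrary starting parameters
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. each parameter.
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in zip(xs, ys)) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in zip(xs, ys)) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Made-up observations that lie roughly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

error_before = mean_squared_error(xs, ys, 0.0, 0.0)
slope, intercept = fit_line_gradient_descent(xs, ys)
error_after = mean_squared_error(xs, ys, slope, intercept)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"error before fitting={error_before:.3f}, after fitting={error_after:.3f}")
```

The error after fitting is much smaller than the error of the unfitted starting parameters, which is what "optimising the match" means in practice.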

Model fitting in data science is a crucial step in data analysis.

It allows data scientists to make accurate predictions, identify patterns,
and make informed decisions based on the data.

The quality of the model fit can significantly impact the validity and
reliability of the conclusions drawn from the data.

The process of model fitting in data science
The process of model fitting in data science involves several steps.

These include data collection, model selection, parameter estimation,
and evaluation.
Each step is crucial in ensuring that the model accurately represents
the data and can make reliable predictions.

Data collection
The first step in the model fitting process is data collection.

This involves gathering relevant data that will be used to train the
model.

The quantity and quality of the data collected can significantly
influence the accuracy of the model fit.

Collecting diverse data is essential to ensure the model can generalise
well to new, unseen data.

Model selection
Once the data has been collected, the next step is model selection.

This involves choosing the most appropriate statistical or
machine-learning model for the data.

The choice of model depends on the nature of the data, the problem at
hand, and the specific goals of the analysis.
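One common way to make this choice concrete is to fit each candidate model on part of the data and compare the errors on the held-out remainder. The sketch below does this for two hypothetical candidates, a constant model and a straight line; the candidates, the data values, and the even/odd split are assumptions chosen for illustration only.

```python
def fit_constant(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2: straight line fitted by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data with a clear upward trend.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.2, 2.8, 5.1, 7.2, 8.9, 11.1, 12.8, 15.2, 17.1, 18.9]

# Even positions are used for training, odd positions for validation.
train_x, train_y = xs[0::2], ys[0::2]
val_x, val_y = xs[1::2], ys[1::2]

candidates = {"constant": fit_constant, "line": fit_line}
scores = {name: mse(fit(train_x, train_y), val_x, val_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)

print(scores)
print("selected model:", best)
```

Validation error is only one criterion; in practice the goals of the analysis (for example, interpretability) may also influence the choice.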

Parameter estimation
After selecting a model, the next step is parameter estimation.

This involves adjusting the model’s parameters to optimise the fit to
the data.

This is often done using maximum likelihood estimation or least
squares estimation.
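As a small worked case of maximum likelihood estimation, the sketch below assumes the data follow a normal distribution and estimates its mean and variance. For this particular model the maximising values have a closed form (the sample mean and the variance with divisor n). The data values and function names are assumptions made up for the example.

```python
import math


def gaussian_log_likelihood(data, mu, sigma2):
    """Log-likelihood of the data under a normal distribution N(mu, sigma2)."""
    n = len(data)
    squared_gaps = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_gaps / (2 * sigma2)


def gaussian_mle(data):
    """Closed-form maximum likelihood estimates of the mean and variance."""
    n = len(data)
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n  # divisor n, not n - 1
    return mu, sigma2


# Made-up measurements.
data = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 4.7, 5.2]

mu_hat, sigma2_hat = gaussian_mle(data)
best_ll = gaussian_log_likelihood(data, mu_hat, sigma2_hat)

# Any other parameter values should give a lower log-likelihood.
shifted_ll = gaussian_log_likelihood(data, mu_hat + 0.1, sigma2_hat)
wider_ll = gaussian_log_likelihood(data, mu_hat, sigma2_hat * 1.5)

print(f"mean={mu_hat:.4f}, variance={sigma2_hat:.4f}")
print(f"log-likelihood at estimate={best_ll:.4f}")
print(f"with shifted mean={shifted_ll:.4f}, with wider variance={wider_ll:.4f}")
```

When the errors of a regression model are assumed to be normally distributed with constant variance, maximising the likelihood leads to the same parameter values as least squares, which is one reason the two methods are often mentioned together.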

Model evaluation
The final step in the model fitting process is model evaluation.
This involves assessing the quality of the model fit and determining
how well the model can predict new data.

This is typically done using techniques such as
cross-validation or bootstrapping.
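A minimal sketch of k-fold cross-validation is shown below: the data are split into k folds, the model is fitted on k - 1 folds and scored on the remaining one, and this is repeated so that every fold serves once as the test set. The straight-line model, the data values, and the way points are assigned to folds (by index modulo k) are assumptions for illustration; bootstrapping is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_errors(xs, ys, k):
    """Return the held-out mean squared error for each of the k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return fold_errors


# Made-up data lying close to a straight line.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.2, 2.8, 5.1, 7.2, 8.9, 11.1, 12.8, 15.2, 17.1, 18.9]

fold_errors = k_fold_errors(xs, ys, k=5)
cv_error = sum(fold_errors) / len(fold_errors)

print("error per fold:", [round(e, 4) for e in fold_errors])
print("cross-validated error:", round(cv_error, 4))
```

Because every point is predicted by a model that never saw it during fitting, the averaged error is an estimate of how the model would perform on new data.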

Why does overfitting occur?

You only get accurate predictions if the machine learning model generalizes to all types of data
within its domain. Overfitting occurs when the model cannot generalize and fits too closely to
the training dataset instead. Overfitting happens due to several reasons, such as:
• The training data size is too small and does not contain enough data samples to accurately
represent all possible input data values.
• The training data contains large amounts of irrelevant information, called noisy data.
• The model trains for too long on a single sample set of data.
• The model complexity is high, so it learns the noise within the training data.
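The last cause can be seen in a small experiment. The sketch below compares a model that simply memorises the training points (a nearest-neighbour lookup, standing in here for a high-complexity model) with a straight line, on made-up noisy data. The data, the two models, and the function names are all assumptions chosen for illustration.

```python
def fit_memoriser(xs, ys):
    """High-complexity stand-in: return the target of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


def fit_line(xs, ys):
    """Low-complexity model: straight line fitted by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data around y = 2x + 1; the training targets are deliberately noisy.
train_x = [0, 2, 4, 6, 8]
train_y = [2.5, 3.5, 10.5, 11.5, 18.5]
test_x = [1, 3, 5, 7, 9]
test_y = [2.8, 7.3, 10.9, 15.2, 18.7]

memoriser = fit_memoriser(train_x, train_y)
line = fit_line(train_x, train_y)

memoriser_train = mse(memoriser, train_x, train_y)
memoriser_test = mse(memoriser, test_x, test_y)
line_train = mse(line, train_x, train_y)
line_test = mse(line, test_x, test_y)

print(f"memoriser: train error={memoriser_train:.3f}, test error={memoriser_test:.3f}")
print(f"line:      train error={line_train:.3f}, test error={line_test:.3f}")
```

The memorising model reproduces the training data perfectly, noise included, yet does worse than the simpler line on the unseen points. A large gap between training error and test error is the usual symptom of overfitting.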

Overfitting examples
Consider a use case where a machine learning model has to analyze photos and identify the
ones that contain dogs. If the model was trained on a data set that consisted mostly of
photos showing dogs outside in parks, it may learn to use grass as a feature for
classification and may fail to recognize a dog inside a room.
Another overfitting example is a machine learning algorithm that predicts a university student's
academic performance and graduation outcome by analyzing several factors like family income,
past academic performance, and academic qualifications of parents. However, the training data
only includes candidates from a specific gender or ethnic group. In this case, overfitting causes
the algorithm's prediction accuracy to drop for candidates whose gender or ethnicity is not
represented in the training dataset.
