
Data Science Lecture No 02

Uploaded by

abdul baqi

Lecture No. 02
AI 7th, SEN 5th

Course: Data Science


Instructor: Dr. Maryum Nisar

10/22/2024
Data Science

Lecture Contents
❑Data Science
❑Understanding Data Science
❑Exploratory Data Analysis

Data Science
❑Data Science
▪ Data science is the application of computational and statistical techniques to
address or gain insight into some problem in the real world
▪ Data science = statistics +
data processing +
machine learning +
scientific inquiry +
visualization +
business analytics +
big data + …

CRISP process
CRoss-Industry Standard Process for Data Mining (CRISP-DM)

Data Science Process Step

Understanding data science
❑Data requirements:
▪ An organization can draw data from various sources. It is important to understand what type of data the
organization needs to collect, curate, and store.
▪ The data must also be categorized as numerical or categorical, and the format for storage and
dissemination must be decided.

❑ Data collection:
▪ Data collected from several sources must be stored in the correct format and transferred to the right information
technology personnel within a company. Data can be collected from several objects during several events using
different types of sensors and storage tools.

❑Data processing:
▪ Preprocessing involves curating the dataset before the actual analysis. Common tasks include
correctly exporting the dataset, placing the data in the right tables, structuring it, and exporting it
in the correct format.
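
The numerical versus categorical distinction mentioned under data requirements can be sketched in a few lines. This is only an illustration using the Python standard library; the records, column names, and the `column_kind` helper are hypothetical and not part of the lecture. In practice a library such as pandas is often used for this step.

```python
# Hypothetical records; the column names and values are made up for illustration.
records = [
    {"age": 34, "city": "Lahore", "income": 52000.0},
    {"age": 27, "city": "Karachi", "income": 48000.0},
    {"age": 45, "city": "Lahore", "income": 61000.0},
]


def is_number(value):
    """True for int or float values (bool is excluded on purpose)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def column_kind(values):
    """Return 'numerical' if every non-missing value is a number, else 'categorical'."""
    present = [v for v in values if v is not None]
    if present and all(is_number(v) for v in present):
        return "numerical"
    return "categorical"


# Classify every column found in the first record.
kinds = {name: column_kind([row[name] for row in records]) for name in records[0]}
print(kinds)
```

Knowing the kind of each column early helps decide how it should be stored and which summaries make sense for it later.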
Understanding data science
❑Data cleaning:
▪ Preprocessed data is still not ready for detailed analysis. It must first be checked for incompleteness,
duplicates, errors, and missing values. These tasks are performed in the data cleaning stage, which
involves matching the correct records, finding inaccuracies in the dataset, understanding the overall
data quality, removing duplicate items, and filling in the missing values.
▪ How can these anomalies be identified in a dataset?
▪ One example of data cleaning is applying outlier detection methods to quantitative data.

❑ EDA:
▪ Exploratory data analysis is the stage where we actually start to understand the message contained in the data.

❑Modeling and algorithm:
▪ From a data science perspective, generalized models or mathematical formulas can represent or exhibit relationships
among different variables, such as correlation or causation.
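
The data cleaning stage described on this slide (filling missing values, then detecting outliers in quantitative data) could look roughly like the sketch below. The slide does not name a specific method, so median imputation and the 1.5 × IQR rule are assumptions chosen for illustration, and the readings are made-up values.

```python
import statistics

# Hypothetical sensor readings; None marks a missing value.
readings = [10.2, 9.8, None, 10.1, 10.2, 55.0, 9.9, 10.0]

# Fill each missing value with the median of the observed values
# (an assumed strategy; other imputation choices are possible).
observed = [r for r in readings if r is not None]
median = statistics.median(observed)
filled = [median if r is None else r for r in readings]

# Flag outliers with the 1.5 * IQR rule (also an assumed choice).
q1, _, q3 = statistics.quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in filled if r < low or r > high]

print("filled:", filled)
print("outliers:", outliers)
```

Whether a flagged value is an error or a genuine extreme observation is a judgment that still has to be made with knowledge of the subject area.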
Understanding data science
❑Data Product:
▪ A data product is generally based on a model developed during data analysis, for example, a
recommendation model that inputs user purchase history and recommends a related item that
the user is highly likely to buy.

❑Communication:
▪ This stage deals with disseminating the results to end stakeholders to use the result for business
intelligence. One of the most notable steps in this stage is data visualization.
▪ Visualization deals with information relay techniques such as tables, charts, summary diagrams,
and bar charts to show the analyzed result.
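
As a rough illustration of the recommendation data product mentioned on this slide, the sketch below recommends the item most often bought together with what a user already owns. The purchase histories, the co-purchase counting approach, and the `recommend` function are all hypothetical; real recommendation models are usually far more elaborate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, one set of items per user.
histories = {
    "u1": {"laptop", "mouse", "keyboard"},
    "u2": {"laptop", "mouse"},
    "u3": {"laptop", "mouse", "monitor"},
    "u4": {"keyboard", "monitor"},
}

# Count how often each ordered pair of items appears in the same history.
co_counts = Counter()
for items in histories.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1


def recommend(purchased, k=1):
    """Return up to k items most often co-purchased with the given items."""
    scores = Counter()
    for (a, b), n in co_counts.items():
        if a in purchased and b not in purchased:
            scores[b] += n
    return [item for item, _ in scores.most_common(k)]


print(recommend({"laptop"}))
```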
Prior Knowledge
Gaining information on:
- Objective of the problem
- Subject area of the problem
- Data

Data Preparation / Data exploration

Data Exploration
Data quality
Handling missing values
Data type conversion
Transformation
Outliers
Feature selection
Sampling
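
Two of the preparation steps listed above, data type conversion and sampling, can be sketched with the standard library alone. The rows, field names, and `convert` helper below are hypothetical and stand in for whatever raw data a project actually receives.

```python
import random

# Hypothetical raw rows as they might arrive from a CSV file: every field is a string.
raw_rows = [
    {"id": "1", "height_cm": "172.5", "group": "A"},
    {"id": "2", "height_cm": "", "group": "B"},
    {"id": "3", "height_cm": "165.0", "group": "A"},
    {"id": "4", "height_cm": "180.2", "group": "B"},
]


def convert(row):
    """Convert string fields to typed values; an empty string becomes None."""
    return {
        "id": int(row["id"]),
        "height_cm": float(row["height_cm"]) if row["height_cm"] else None,
        "group": row["group"],
    }


typed_rows = [convert(r) for r in raw_rows]

# Draw a reproducible random sample of two rows without replacement.
rng = random.Random(0)
sample = rng.sample(typed_rows, k=2)
print(sample)
```

Fixing the seed keeps the sample reproducible, which is convenient when the same exploration needs to be rerun.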

Introduction to Exploratory Data Analysis (EDA)

EDA is a crucial step in data science that allows for understanding data.

It involves summarizing data, detecting anomalies, and testing assumptions.

EDA helps make data-driven decisions before modeling.
Key aspects of EDA
❑Correlation Analysis
▪ Checking the relationships between variables to understand how they might affect each other. This
includes computing correlation coefficients and creating correlation matrices.

❑ Handling Missing Values
▪ Detecting and deciding how to address missing data points, whether by imputation or removal,
depending on their impact and the amount of missing data.

❑ Summary Statistics
▪ Calculating key statistics that provide insight into data trends and nuances.

❑Testing Assumptions
▪ Many statistical tests and models assume the data meet certain conditions (like normality).
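
Two of the aspects above, summary statistics and correlation analysis, can be illustrated with a small sketch. The paired measurements are made up, and the `pearson` helper is a hand-written Pearson correlation coefficient used here only so the example needs nothing beyond the standard library.

```python
import math
import statistics

# Hypothetical paired measurements: hours studied and exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Summary statistics for one variable.
summary = {
    "mean": statistics.fmean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),
}

# Strength of the linear relationship between the two variables.
r = pearson(hours, scores)
print(summary)
print("correlation:", round(r, 3))
```

A coefficient close to 1 or -1 suggests a strong linear relationship, but it does not by itself show that one variable causes the other.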
Why Is Exploratory Data Analysis Important?
Exploratory Data Analysis (EDA) is important for several reasons, especially in the context of data
science and statistical modeling. Here are some of the key reasons why EDA is a critical step in the
data analysis process:
▪ Understanding Data Structures
▪ Identifying Patterns and Relationships
▪ Detecting Anomalies and Outliers
▪ Testing Assumptions
▪ Informing Feature Selection and Engineering
▪ Optimizing Model Design
▪ Facilitating Data Cleaning
▪ Enhancing Communication
EDA Importance
❑Understanding Data Structures
o EDA helps in getting familiar with the dataset, understanding the number of features, the type of data in each
feature, and the distribution of data points. This understanding is crucial for selecting appropriate analysis or
prediction techniques.

❑Identifying Patterns and Relationships
o Through visualizations and statistical summaries, EDA can reveal hidden patterns and intrinsic relationships
between variables. These insights can guide further analysis and enable more effective feature engineering and
model building.

❑Detecting Anomalies and Outliers
o EDA is essential for identifying errors or unusual data points that may adversely affect the results of your analysis.
Detecting these early can prevent costly mistakes in predictive modeling and analysis.
EDA Importance
❑ Testing Assumptions
o Many statistical models assume that data follow a certain distribution or that variables are independent. EDA
involves checking these assumptions.

❑ Informing Feature Selection and Engineering
o Insights gained from EDA can inform which features are most relevant to include in a model and how to
transform them (scaling, encoding) to improve model performance.

❑Optimizing Model Design
o By understanding the data’s characteristics, analysts can choose appropriate modeling techniques, decide on
the complexity of the model, and better tune model parameters.
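
The feature transformations named on this slide, scaling and encoding, might be sketched as follows. Min-max scaling and one-hot encoding are assumed here as common examples of each; the feature values and helper names are hypothetical.

```python
# Hypothetical feature values.
ages = [20, 30, 40, 50]
colors = ["red", "green", "red", "blue"]


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; assumes they are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode each value as a 0/1 vector over the sorted distinct categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


scaled = min_max_scale(ages)
encoded = one_hot(colors)
print(scaled)
print(encoded)
```

Scaling puts numerical features on a comparable range, while encoding turns categorical values into numbers that a model can consume.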
EDA Importance
❑ Facilitating Data Cleaning
o EDA helps in spotting missing values and errors in the data, which are critical to address before further analysis
to improve data quality and integrity.

❑ Enhancing Communication
o Visual and statistical summaries from EDA can make it easier to communicate findings and convince others of the
validity of your conclusions, particularly when explaining data-driven insights to stakeholders without technical
backgrounds.
Traditional Vs Machine Learning Model

https://www.ranker.com/crowdranked-list/best-jobs-in-the-world
https://www.panelplace.com/blogs/top-8-coolest-jobs-world
Data Science process

Thank You!
