Presentation Session 1 - Practical Data Science Final
Topics:
1. Introduction to Data Science
2. Data characteristics
3. Descriptive Statistics
4. Inferential Statistics
WEEK 2 - CSE 3020 – DATA VISUALIZATION
Data Science – Why it is needed?
Data Growth – IDC-Seagate Study
Data Science – Life Cycle
[Figure: life cycle diagram with the stages Data Acquisition, Data Preparations, Data Mining, Modelling Building and Testing, and Visualisation]
2. Data Preparations:
• Data Cleaning (remove bad data and null values, handle missing values)
• Data Transformation – takes raw data and turns it into the desired output by normalizing (min-max, z-score)
• Handling Outliers
• Data Reduction
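The two normalizations named above (min-max and z-score) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the `marks` values are assumptions made for the example, not part of the course material.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Centre values on 0 and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical marks, used only to exercise the two functions.
marks = [82, 78, 98, 67, 88, 90]
scaled = min_max_scale(marks)
standardised = z_score_scale(marks)

print(scaled)
print(standardised)
```

Min-max scaling keeps the result inside a fixed range but is sensitive to outliers, because a single extreme value sets `lo` or `hi`. Z-score scaling has no fixed range, so it is often preferred when outliers have not yet been handled.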
Dr Vengadeswaran S, IIIT Kottayam Practical Data Science and Advanced ML 3
3. Data Mining – Uncover patterns and relationships in the data to support better business decisions.
It is a discovery process for extracting hidden and useful knowledge, commonly known as exploratory data analysis.
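One simple way to look for a relationship between two variables during exploratory analysis is the Pearson correlation coefficient. The sketch below is only an illustration: the function name `pearson_r` and the two small columns of data are assumptions made for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical columns, invented for this example.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 65, 70, 79]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 suggests a strong linear relationship. A value near 0 only rules out a linear one, so plotting the data is still worthwhile.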
4. Model Building and Testing –
• Modelling is the heart of data analysis. It takes organized data as input and produces an output.
• Suitable ML/DL models are built for the data and the problem, using a training data set, to gain deeper insights and predict outcomes.
• The models are tested against a predetermined test data set to assess the accuracy of the results.
• The models are fine-tuned to improve the results.
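The build-then-test steps above can be sketched with a very small model. The example below fits a straight line by ordinary least squares on a training set and measures the error on a held-out test set. The data points, the split, and the function names are assumptions made for illustration; a real project would normally use an ML library and a shuffled split.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical (x, y) observations lying roughly on y = 2x + 1.
data = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 9.0),
        (5, 10.8), (6, 13.1), (7, 15.0), (8, 16.9)]

# Predetermined split: the first six points train the model, the last two test it.
train, test = data[:6], data[6:]

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
predictions = [slope * x + intercept for x, _ in test]
mae = mean_absolute_error([y for _, y in test], predictions)

print(f"slope={slope:.3f} intercept={intercept:.3f} test MAE={mae:.3f}")
```

The test points are never seen while fitting, so the test error gives a fairer estimate of how the model behaves on new data. Fine-tuning would then adjust the model (for example, its form or its inputs) and repeat the evaluation.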
5. Visualisation –
• Communicate insights from data through visual representation
• Explain the process of operationalisation
• Communicate results
• Highlight the findings, correlations, etc.
Recommendation
Systems
Visualization:
• Improves Insights
• Enables faster decision making
Session 2 - Box plots, line plots, pie charts, scatter plots, heatmaps for correlation analysis, and text visualisation. Hands-on: Matplotlib for creating multiple plots.
Session 3 - Dashboard creation using visualization tools for any one of the following use cases: finance, marketing, healthcare, etc.
Outline:
• Data types
• Measurements of Data
• Dataset types
• Semantics
• Categories or groups
• Answers to yes/no questions
• Qualitative data can be separated into different categories
that are distinguished by some nonnumeric characteristics.
• E.g.: Genders (male/female) of professional athletes.
• Expressed in terms of natural language descriptions
• Sometimes categorical data can take numerical values, but
those numbers do not have mathematical meaning.
• E.g.: Birthdate. Calculating the average of such values is not meaningful.
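For categorical data, counting how often each category occurs is meaningful, while arithmetic on numeric codes is not. The sketch below illustrates this with the standard-library `Counter`; the `positions` list and the numeric coding are assumptions invented for the example.

```python
from collections import Counter

# Hypothetical categorical observations.
positions = ["guard", "forward", "guard", "center", "guard", "forward"]

# Frequencies and the most frequent category (the mode) are meaningful summaries.
counts = Counter(positions)
most_common_label, frequency = counts.most_common(1)[0]

# Categories can be given numeric codes, but the codes are arbitrary labels:
# a different assignment would change any "average" computed from them.
codes = {label: i for i, label in enumerate(sorted(counts))}
encoded = [codes[p] for p in positions]

print(counts)
print(most_common_label, frequency)
print(codes, encoded)
```

Here `encoded` is useful for storage or as model input, but its mean depends entirely on which code each label happened to receive.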
Quantitative data
• Nominal
• Ordinal – Attributes can be ordered
• Interval – Distance is meaningful
• Ratio
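The difference between "can be ordered" and "distance is meaningful" can be sketched in a few lines. The size labels and temperatures below are assumptions chosen for illustration; temperature in degrees Celsius is a commonly cited interval-scale example.

```python
# Ordinal: the order of the labels is defined, but the gap between them is not.
SIZE_ORDER = ["small", "medium", "large"]  # hypothetical ordered categories
rank = {label: i for i, label in enumerate(SIZE_ORDER)}

sizes = ["large", "small", "medium", "small"]
sorted_sizes = sorted(sizes, key=rank.__getitem__)

# Interval: differences between values are meaningful and comparable.
temps_c = [20.0, 25.0, 35.0]  # hypothetical readings in degrees Celsius
differences = [later - earlier for earlier, later in zip(temps_c, temps_c[1:])]

print(sorted_sizes)
print(differences)
```

Sorting the sizes is valid because the order is known, but subtracting their ranks would not say how much larger one size is than another. For the temperatures, the second gap really is twice the first.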
Any guess
Thejas 3333 82 98 88
Ruhan 2222 78 67 90
Thejas 3333 82 98 88
Outline:
• Normal Distribution
• Correlation
• Covariance
• Central Limit Theorem
• Hypothesis testing