Data Visualization 21AD71

Prepared By,
Dr. Anitha DB
Associate Professor & Head
Department of CSE-Data Science
ATME College of Engineering, Mysuru

Department of CSE-DS, ATMECE
ATME College of Engineering, Mysuru
Module 1: Data Visualization and Data Exploration

• Introduction: Data Visualization, Importance of Data Visualization, Data Wrangling, Tools and Libraries
for Visualization
• Overview of Statistics: Measures of Central Tendency, Measures of Dispersion, Correlation, Types of
Data, Summary Statistics
• NumPy: NumPy operations - Indexing, Slicing, Splitting, Iterating, Filtering, Sorting, Combining, and
Reshaping
• Pandas: Advantages of pandas over NumPy, Disadvantages of pandas, pandas operations - Indexing,
Slicing, Iterating, Filtering, Sorting, and Reshaping using pandas


Topic 1: Introduction

• Data Visualization,
• Importance of Data Visualization,
• Data Wrangling,
• Tools and Libraries for Visualization


Introduction to Data Visualization

• Computers and smartphones store data such as names and numbers in digital format.
• Data representation refers to the forms in which we can store, process, and transmit data.
• Effective representations can tell a story and convey key findings to an audience.
• Creating representations helps to achieve a more precise, more concise, and more direct view of
information, making it easier for anyone to understand the data.
• Representations are useful tools for deriving insights from the data.
• Representations convert data into useful information.


The Importance of Data Visualization

• Instead of just looking at data in the columns of an Excel spreadsheet, we get a better idea of what our
data contains by using visualization.
• For instance, it is easy to see a pattern emerge from the numerical data that’s given in the following
scatter plot.
• It shows the correlation between diameter and the height of various trees.
• There is a positive correlation between diameter and height.


Visualizing data has many advantages

• Complex data can be easily understood

• A simple visual representation of outliers, target audiences, and futures markets can be created

• Storytelling can be done using dashboards and animations

• Data can be explored through interactive visualizations

Questions

Briefly explain Data Visualization.

Why is Data Visualization important/significant?


Data Wrangling
• Data wrangling is the process of transforming raw data into a suitable representation for various tasks. It
is the discipline of augmenting, cleaning, filtering, standardizing, and enriching data in a way that allows
it to be used in a downstream task, which in our case is data visualization.
• Examine the following flow diagram of the data wrangling process to understand how precise and
actionable data is prepared for business analysts to utilize.

The following steps explain the flow of the data wrangling process:
1. First, the Employee Engagement data is in its raw form.
2. Then, the data gets imported as a DataFrame and is later cleaned.
3. The cleaned data is then transformed into graphs, from which findings can be derived.
4. Finally, we analyze this data to communicate the final results.
For example, employee engagement can be measured based on raw data gathered from feedback surveys,
employee tenure, exit interviews, one-on-one meetings, and so on. This data is cleaned and made into graphs
based on parameters such as referrals, faith in leadership, and scope of promotions. The percentages, that is,
information derived from the graphs, help us reach our result, which is to determine the measure of
employee engagement.
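
The flow above can be sketched in a few lines of Python. This is only a minimal illustration: it uses the standard library instead of a pandas DataFrame, it stops at a summary percentage instead of drawing a graph, and the survey records, the field names (`tenure_years`, `would_refer`), and the helper functions are all hypothetical, invented for the example.

```python
# Hypothetical raw survey records; "" and None mark missing answers.
RAW_RESPONSES = [
    {"employee": "A", "tenure_years": "3", "would_refer": "yes"},
    {"employee": "B", "tenure_years": "2", "would_refer": "no"},
    {"employee": "C", "tenure_years": "7", "would_refer": "YES "},
    {"employee": "D", "tenure_years": "1", "would_refer": None},
    {"employee": "E", "tenure_years": "", "would_refer": "yes"},
]


def clean(records):
    """Drop incomplete records and standardize the remaining fields."""
    cleaned = []
    for rec in records:
        if not rec["tenure_years"] or not rec["would_refer"]:
            continue  # filtering: skip rows with missing values
        cleaned.append({
            "employee": rec["employee"],
            "tenure_years": int(rec["tenure_years"]),  # text -> number
            "would_refer": rec["would_refer"].strip().lower() == "yes",
        })
    return cleaned


def referral_percentage(records):
    """Percentage of respondents who would refer someone to the company."""
    if not records:
        return 0.0
    return 100.0 * sum(r["would_refer"] for r in records) / len(records)


cleaned = clean(RAW_RESPONSES)
print(len(cleaned), "usable responses")
print(f"{referral_percentage(cleaned):.1f}% would refer")
```

In a real project the cleaned records would feed a plotting library; the percentage printed here stands in for the "information derived from the graphs".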


Tools and Libraries for Visualization

• Several tools are available for creating data visualizations to suit different needs.
• Non-coding tools like Tableau provide an intuitive interface for exploring and understanding data.
• Alongside Python, MATLAB and R are also commonly used in data analytics.
• Python stands out as the industry's preferred language due to its user-friendly nature and efficiency in
data manipulation and visualization.
• Its extensive library ecosystem further enhances Python's appeal, making it the optimal choice for robust
data visualization tasks.

Questions:
1. What is Data Wrangling?
2. Explain the data wrangling process with an example of employee engagement.
3. With a neat diagram explain the steps involved in the Data Wrangling process.


Topic 2: Overview of Statistics

• Measures of Central Tendency,

• Measures of Dispersion,
• Correlation,
• Types of Data,
• Summary Statistics


Overview of Statistics
• Statistics is a combination of the analysis, collection, interpretation and representation of numerical data.
• Probability is a measure of the likelihood that an event will occur and is quantified as a number between 0 and 1
• A probability distribution is a function that provides the probability for every possible event. It is frequently used
for statistical analysis.
• There are two types of probability distributions, namely continuous and discrete.


Measures of Central Tendency

Measures of central tendency are often called averages and describe central or typical values for a probability distribution.
The three kinds of averages are the mean, the median, and the mode.
Mean: The arithmetic average, computed by summing all measurements and dividing the sum by the number of
observations N. The mean is calculated as follows:
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$

Median: The middle value in a dataset that is arranged in ascending order (from the smallest value to the largest value). If a
dataset contains an even number of values, the median of the dataset is the mean of the two middle values. The median is
less sensitive to outliers than the mean, where outliers are values that differ markedly from the rest of the data.
Mode: The most frequently occurring value in a dataset. In some cases, a dataset may contain multiple modes,
while some datasets may not have any mode at all.


Example: A die was rolled 10 times and we got the following numbers: 4, 5, 4, 3, 4, 2, 1, 1, 2, 1. Find the measures of central tendency.
Mean = (4+5+4+3+4+2+1+1+2+1)/10 = 27/10 = 2.7
Median = middle value of the ordered data (1, 1, 1, 2, 2, 3, 4, 4, 4, 5) = (2+3)/2 = 2.5
Mode = 1 and 4 (each occurs three times)
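
The die-roll example can be checked with Python's standard `statistics` module. This is a small sketch; the variable names are chosen here for illustration.

```python
import statistics

rolls = [4, 5, 4, 3, 4, 2, 1, 1, 2, 1]

mean = statistics.mean(rolls)                # sum of values / number of values
median = statistics.median(rolls)            # mean of the two middle values (even count)
modes = sorted(statistics.multimode(rolls))  # every value tied for most frequent

print(mean, median, modes)  # 2.7 2.5 [1, 4]
```

`multimode` is used rather than `mode` because this dataset has two modes and `mode` reports only one of them.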


Measures of Dispersion

Variance: Variance is a measure of how far each data point in the set is from the mean and is calculated by taking the
average of the squared differences from the mean.
Variance (σ²) = Σ(xi - μ)² / N, where μ is the mean and N is the number of data points.

Example: Consider the dataset 2, 4, 4, 4, 5.

The mean is (2+4+4+4+5)/5 = 19/5 = 3.8.
The variance would be [(2-3.8)² + (4-3.8)² + (4-3.8)² + (4-3.8)² + (5-3.8)²] / 5 = 4.8 / 5 = 0.96.

Standard Deviation: The standard deviation is the square root of the variance and provides a more interpretable
measure of dispersion because it is expressed in the same units as the data.
Standard Deviation (σ) = √Variance

Example: Using the variance example above, the standard deviation would be √0.96 ≈ 0.98.
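
As a quick check, the population variance and standard deviation for the dataset 2, 4, 4, 4, 5 can be computed with Python's standard `statistics` module. This is a sketch with illustrative variable names; `pvariance` and `pstdev` divide by N, matching the formula above.

```python
import statistics

data = [2, 4, 4, 4, 5]

mu = statistics.mean(data)             # (2+4+4+4+5) / 5
variance = statistics.pvariance(data)  # sum of squared deviations / N
std_dev = statistics.pstdev(data)      # square root of the population variance

print(mu, round(variance, 2), round(std_dev, 2))  # 3.8 0.96 0.98
```

Note that `statistics.variance` and `statistics.stdev` divide by N - 1 (the sample versions) and would give 1.2 and about 1.10 for the same data.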


Range: The range is the simplest measure of dispersion and is calculated as the difference between the maximum and
minimum values in a dataset.
Range = Maximum value - Minimum value
Example: Consider the following set of exam scores - 60, 65, 70, 75, 80.
The range would be 80 (maximum) - 60 (minimum) = 20.

Interquartile Range (IQR): Also called the midspread or middle 50%, this is the difference between the 75th and 25th
percentiles, that is, between the upper and lower quartiles (the range of the middle 50% of a dataset).
IQR = Q3 (third quartile) - Q1 (first quartile)
Example: If the dataset is 10, 15, 20, 25, 30, the first quartile (Q1) is 15, the third quartile (Q3) is 25, and the IQR
would be 25 - 15 = 10.
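
Both examples can be reproduced with the Python standard library. This is a sketch under one assumption: quartiles are computed with the `inclusive` method of `statistics.quantiles`, which is the convention that yields Q1 = 15 and Q3 = 25 for this dataset. The variable names are illustrative.

```python
import statistics

scores = [60, 65, 70, 75, 80]
value_range = max(scores) - min(scores)

data = [10, 15, 20, 25, 30]
# method="inclusive" treats the data as the full population, so the minimum
# and maximum are taken as the 0th and 100th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(value_range, q1, q3, iqr)  # 20 15.0 25.0 10.0
```

Quartile conventions differ between tools: with the default `method="exclusive"` the same data gives Q1 = 12.5 and Q3 = 27.5, so it helps to state which convention is in use.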

Correlation
The correlation describes the statistical relationship between two variables:
• In a positive correlation, both variables move in the same direction.
• In a negative correlation, the variables move in opposite directions.
• In a zero correlation, the variables are not related.
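
One common way to quantify this relationship is the Pearson correlation coefficient, which ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The slides do not name a specific coefficient, so using Pearson here is an assumption, and the function name `pearson` and the small datasets below are made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
print(pearson(x, [2, 4, 6, 8, 10]))  # y rises with x: positive correlation
print(pearson(x, [10, 8, 6, 4, 2]))  # y falls as x rises: negative correlation
print(pearson(x, [3, 1, 4, 1, 3]))   # no linear trend: (near) zero correlation
```

The function divides by zero if either sequence is constant, since a constant variable has no variance to correlate.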

Example: Suppose you want to find a decent apartment to rent that is not too expensive compared to the other apartments
you have found. The other apartments (all in the same locality) you found on a website are priced as follows:
$700, $850, $1,500, and $750 per month. Calculating some statistical measures helps us make a decision:
Mean = $950, Median = $800, Standard Deviation = $322.10, Range = $800

A simple statistical analysis helps us narrow down our choices. The $1,500 listing pulls the mean well above the
median, so the median is the better guide to a typical rent here.
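
The four figures for the apartment example can be reproduced with Python's standard `statistics` module. This is a sketch with illustrative variable names; the population standard deviation (`pstdev`) is the one that matches the $322.10 quoted in the example.

```python
import statistics

rents = [700, 850, 1500, 750]  # monthly rents in dollars, from the example

mean = statistics.mean(rents)
median = statistics.median(rents)
std_dev = statistics.pstdev(rents)  # population standard deviation
rent_range = max(rents) - min(rents)

print(f"mean={mean:.2f} median={median:.2f} std={std_dev:.2f} range={rent_range}")
# mean=950.00 median=800.00 std=322.10 range=800
```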


Types of Data
• It is important to understand what kind of data you are dealing with so that you can select both the right statistical
measure and the right visualization.
• We categorize data as categorical/qualitative and numerical/quantitative.
• Categorical data describes characteristics, for example, the color of an object or a person's gender.
• We can further divide categorical data into nominal and ordinal data.
• Numerical data can be divided into discrete and continuous data.
• Discrete data can take only certain values, whereas continuous data can take any value (sometimes limited to a range).
• Another aspect to consider is whether the data has a temporal domain (is it bound to time, or does it change over
time?) or a spatial domain (is it bound to a location?).


Summary Statistics
In real-world applications, we often encounter enormous datasets. Therefore, summary statistics are used to summarize
important aspects of data. They are necessary to communicate large amounts of information in a compact and simple way.

The following table gives an overview of which measure of central tendency is best suited to a particular type of data.

Data Type    Best measure of central tendency
Nominal      Mode
Ordinal      Median
Numerical    Mean/Median
