SMDS Unit 1

UNIT-1

Data Visualization and Distributions


Syllabus:
Data Visualization Techniques:
Introduction to Statistical Methods
Exploratory Data Analysis:
Charts: (Line, Pie, Bar)
Plots: (Bubble, Scatter)
Maps: (Heat, Dot Distribution)
Diagrams: (Trees, Matrices)
Principal Components Analysis
Introduction to Data Distributions
Probability Distributions
Discrete: [Binomial, Poisson]
Continuous: [Normal, Exponential]
Introduction to Statistical Methods:
Statistical methods are a set of techniques used to collect,
analyze, interpret & present numerical data.
They help us draw meaningful conclusions from data,
identify patterns & trends, and make informed decisions
in various fields like:
• Business
• Science
• Social research

Key Concepts:
1. Data:
The raw information collected for analysis.
This can be anything from numerical measurements to
categorical observations.
2. Variables:
The characteristics being measured or studied. Ex: age,
income, gender or test scores.
3. Descriptive statistics:
Summarize & describe the main features of a data set.
This includes measures like:
* Mean: The average of a dataset.
* Median: The middle value when data is arranged in order.
* Mode: The most frequent value.
* Standard deviation: A measure of how spread out the data
is around the mean.
4. Inferential statistics:
Make predictions or draw conclusions about a larger
population based on a sample.
This involves techniques like:
• Hypothesis testing: Formulating & testing hypotheses
about a population parameter.
• Confidence intervals: Estimating a range of values that
likely contains the true population parameter.
• Regression analysis: Examining the relationship between
2 or more variables.
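The descriptive measures above, plus a rough confidence interval, can be sketched with Python's standard library. The scores are made-up numbers, and the interval uses the normal approximation, which is only a rough guide for a sample this small.

```python
import math
import statistics

# Hypothetical sample: test scores of 10 students (made-up numbers).
scores = [62, 70, 70, 75, 78, 80, 85, 85, 85, 90]

mean = statistics.mean(scores)      # average of the data
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value
stdev = statistics.stdev(scores)    # sample standard deviation

# Rough 95% confidence interval for the population mean, using the
# normal approximation (assumption: a t-interval would be more
# appropriate for a sample of only 10 values).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * stdev / math.sqrt(len(scores))
ci = (mean - margin, mean + margin)

print(f"mean={mean}, median={median}, mode={mode}, stdev={stdev:.2f}")
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The first four values are descriptive statistics of the sample itself; the interval is an inferential statement about the larger population the sample came from.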
Data:
"Data" is a collection of facts such as numbers, words,
measurements & observations.
DATA
  Qualitative (Categorical Data)
    Ex: gender, color
  Quantitative (Numerical Data)
    Discrete (Counted)
    Continuous (Measured)
Data visualization techniques:
• Data visualization is the graphical representation of
information & data.
• By using visual elements like charts, graphs & maps,
data visualization tools provide an accessible way to see
& understand trends, outliers & patterns in data.
• Some of the most common & effective data visualization
techniques:

1. Bar chart:
Use: Comparing categories or groups.
Ex: Sales figures for different products.

[Figure: Bar chart]
2. Line chart:
Use: Showing trends over time.
Ex: Stock prices over a year.

[Figure: Line chart]
3. Pie chart:
Use: Showing the proportion of parts to a whole.
Ex: Market share of different companies.

[Figure: Pie chart]
4. Scatter plots:
Use: Showing the relationship between 2 variables.
Ex: Correlation between height & weight.

[Figure: Scatter plot]
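The relationship a scatter plot shows can also be summarized as a single number, the Pearson correlation coefficient. Below is a minimal sketch; the height and weight values are made up for illustration, and `pearson_r` is a helper written here, not a library function.

```python
import math

# Hypothetical height (cm) and weight (kg) pairs; the numbers are made up.
heights = [150, 160, 165, 170, 180, 185]
weights = [50, 56, 61, 66, 75, 82]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

A value of r close to +1 means the points of the scatter plot lie near an upward-sloping straight line; a value close to 0 means no linear relationship.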
5. Bubble plots:
A bubble plot is a type of chart that visually represents
data with 3 variables.

[Figure: Bubble plot]
Ex: Imagine a bubble plot analyzing the performance of
different car models.
Key components of a bubble plot:
1. X-axis: Represents one variable.
2. Y-axis: Represents another variable.
3. Bubble size: Represents the 3rd variable.
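As a sketch of how the three components fit together, the code below turns made-up car data into bubble positions and sizes. The car models, their numbers, and `MAX_RADIUS` are all assumptions for illustration. It scales bubble area, not radius, with the third variable, a common choice so that a value twice as large looks twice as big.

```python
import math

# Hypothetical car models:
# (name, price in $1000s, mileage in km/l, units sold).
cars = [
    ("Model A", 20, 18, 400),
    ("Model B", 35, 14, 100),
    ("Model C", 50, 10, 900),
]

MAX_RADIUS = 30.0  # assumed radius (in pixels) of the largest bubble
max_sold = max(sold for _, _, _, sold in cars)

bubbles = []
for name, price, mileage, sold in cars:
    # Scale the bubble AREA with the value, so radius grows with sqrt(value).
    radius = MAX_RADIUS * math.sqrt(sold / max_sold)
    bubbles.append({"name": name, "x": price, "y": mileage, "radius": radius})

for b in bubbles:
    print(f"{b['name']}: x={b['x']}, y={b['y']}, radius={b['radius']:.1f}")
```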

Heat Maps:
Use: Visualizing data across 2 dimensions using color.
Ex: Showing busy times in a restaurant.
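The restaurant example can be sketched as a small grid of counts, with days as rows and hours as columns. The visit log is made up, and text characters stand in for colors here; a plotting library would map the same grid to a color scale.

```python
from collections import Counter

# Hypothetical visit log for a restaurant: one (day, hour) per arrival.
visits = [
    ("Fri", 12), ("Fri", 12), ("Fri", 19), ("Fri", 19), ("Fri", 19),
    ("Sat", 12), ("Sat", 19), ("Sat", 19), ("Sat", 19), ("Sat", 19),
    ("Sun", 12), ("Sun", 12), ("Sun", 12), ("Sun", 19),
]
days = ["Fri", "Sat", "Sun"]
hours = [12, 19]

counts = Counter(visits)
# Rows are days, columns are hours; each cell holds the number of visits.
grid = [[counts[(day, hour)] for hour in hours] for day in days]

# Map each count to a shade character: a darker character means busier.
shades = " .:*#"
peak = max(max(row) for row in grid)
for day, row in zip(days, grid):
    cells = "".join(shades[round(c / peak * (len(shades) - 1))] for c in row)
    print(day, cells)
```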
Dot Distribution:
Definition:
A dot distribution map is a type of thematic map that uses dots
to represent the presence, quantity, or value of a phenomenon
in a specific area.
Each dot represents a specific number of occurrences or
instances of the mapped phenomenon.
Types of Dot Distribution:
1. One-to-One Dot Map: Each dot represents one instance
(e.g., one person, one tree).
2. One-to-Many (Representative Dot Map): Each dot represents
a specific number (e.g., 1 dot = 100 people).
3. Random Dot Map: Dots are placed randomly within an
area to represent the quantity.
4. Uniform Dot Map: Dots are placed evenly spaced for a
more systematic look.
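The one-to-many and random placement ideas can be combined in a short sketch. The region names, populations, and bounding boxes below are hypothetical, and real regions would need a point-in-polygon check instead of a simple rectangle.

```python
import random

# Hypothetical populations per region and each region's bounding box
# (x_min, y_min, x_max, y_max) in arbitrary map units.
regions = {
    "North": {"population": 1260, "bbox": (0, 10, 10, 20)},
    "South": {"population": 430, "bbox": (0, 0, 10, 10)},
}
PEOPLE_PER_DOT = 100  # one-to-many map: 1 dot = 100 people

rng = random.Random(42)  # fixed seed so the layout is repeatable
dots = {}
for name, info in regions.items():
    n_dots = round(info["population"] / PEOPLE_PER_DOT)
    x_min, y_min, x_max, y_max = info["bbox"]
    # Random dot map: place each dot at a random point inside the region.
    dots[name] = [
        (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
        for _ in range(n_dots)
    ]

for name, points in dots.items():
    print(name, len(points), "dots")
```

Changing `PEOPLE_PER_DOT` changes how crowded the map looks, which is why the value per dot has to be chosen with care.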
Advantages:
Visual Clarity: Provides an immediate visual impression of distribution patterns.
Easy Interpretation: Simple to understand, especially for non-technical audiences.
Effective Comparison: Useful for comparing density and distribution between regions.
Scalable: Can represent small or large quantities effectively.
Disadvantages:
Overlapping Dots: In high-density areas, dots may overlap, making the map hard to interpret.
Misleading Placement: Dots might be placed randomly, leading to false impressions of exact locations.
Scale Sensitivity: Choosing the right scale (value per dot) is critical; a wrong choice can distort interpretation.
Data Generalization: Often uses aggregate data, which might hide local variations.
Applications:
Population Studies: Mapping human population density and distribution.
Epidemiology: Tracking the spread of diseases (e.g., COVID-19 cases).
Agriculture: Showing distribution of crops or livestock.
Urban Planning: Visualizing distribution of facilities like schools or hospitals.
Environmental Science: Mapping occurrences of natural features like forests or water bodies.
Historical Studies: Representing historical events like battles or migrations.
Tree diagrams:
Tree diagrams are a valuable tool in various statistical
methods for data science.
They represent hierarchical structure, for example the
structure of a given website.
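A website's structure can be sketched as a tree using a nested dictionary. The page names below are hypothetical, and `render_tree` is a helper written for this example.

```python
# Hypothetical site map as a nested dict: each key is a page,
# each value holds that page's child pages.
site = {
    "Home": {
        "Products": {"Laptops": {}, "Phones": {}},
        "About": {},
        "Contact": {},
    }
}

def render_tree(tree, depth=0):
    """Return the tree as a list of indented lines, one per node."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        # Recurse into the children, indented one level deeper.
        lines.extend(render_tree(children, depth + 1))
    return lines

tree_lines = render_tree(site)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of the tree, with "Home" as the root.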
Matrix diagrams:
A matrix diagram is a powerful visualization technique
that helps you understand and analyze the
relationships between different sets of data.
Principal Components Analysis:
Introduction to data distributions:
Data distributions are a foundational concept in
statistics and data analysis.
They describe how data values are spread or
distributed across a range.
Understanding distributions helps us analyze data
patterns, make predictions, and draw meaningful
conclusions.
Types of Data Distributions:
1. Discrete
2. Continuous
1. Discrete:
1. Binomial Distribution:
o Describes the number of successes in a fixed number of
independent trials, each with two possible outcomes
(success or failure) and the same probability of success.
o Example: Number of heads when flipping a coin several
times.
2. Poisson Distribution:
o Describes the probability of a given number
of events occurring within a fixed interval, when
events occur independently at a constant average rate.
o Example: Number of customer arrivals per minute.
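Both discrete distributions have simple probability formulas, sketched below with the standard library. The number of flips and the arrival rate of 3 per minute are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent trials with success prob p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k): k events when the average number per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Probability of exactly 5 heads in 10 fair coin flips.
p_heads = binomial_pmf(5, 10, 0.5)

# Probability of exactly 2 arrivals in a minute, assuming an average
# of 3 arrivals per minute (an illustrative rate).
p_arrivals = poisson_pmf(2, 3.0)

print(f"P(5 heads in 10 flips) = {p_heads:.4f}")
print(f"P(2 arrivals | mean 3) = {p_arrivals:.4f}")
```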
2. Continuous:
1. Normal Distribution:
o Also called a "bell curve," this is one of the
most common data distributions.
o The data is symmetrically distributed around
the mean.
o Examples: Heights, weights and test scores are often
approximately normally distributed.

2. Exponential Distribution:
o Describes the time between events in a
Poisson process (events occurring independently
at a constant average rate).
o Example: Time between arrivals at a bus
stop.
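For continuous distributions, probabilities are computed over ranges of values. The sketch below assumes test scores with mean 70 and standard deviation 10, and buses arriving at an average rate of 0.2 per minute; both are illustrative numbers.

```python
import math
from statistics import NormalDist

# Normal distribution: assumed test scores with mean 70 and std dev 10.
scores = NormalDist(mu=70, sigma=10)
within_one_sd = scores.cdf(80) - scores.cdf(60)  # P(60 <= X <= 80)

def exponential_cdf(t, rate):
    """P(T <= t) for a waiting time T when events occur at `rate` per unit time."""
    return 1 - math.exp(-rate * t)

# Exponential distribution: buses assumed to arrive at 0.2 per minute
# (one every 5 minutes on average).
p_wait_under_5 = exponential_cdf(5, 0.2)

print(f"P(score within 1 sd of mean) = {within_one_sd:.4f}")
print(f"P(wait <= 5 min) = {p_wait_under_5:.4f}")
```

The first result is about 0.68, the familiar share of values that fall within one standard deviation of the mean in a normal distribution.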
Key Concepts in Data Distributions:
• Central Tendency: Measures like mean, median,
and mode indicate where the data is centered.
• Variability: Includes range, variance, and standard
deviation, showing the spread of data.
• Shape: Describes whether the data is
symmetrical, skewed, or has specific patterns.