SMDS Unit 1
Key Concepts:
1.Data:
The raw information collected for analysis.
This can be anything from numerical measurements to
categorical observations.
2.Variables:
The characteristics being measured (or) studied. Ex: age,
income, gender (or) test scores.
3.Descriptive statistics:
Summarize & describe the main features of a data set.
This includes measures like:
* Mean: The average of a data set.
* Median: The middle value when the data is arranged in order.
* Mode: The most frequent value.
* Standard deviation: A measure of how spread out the data is
around the mean.
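The four measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the test scores are invented for illustration and do not come from these notes.

```python
import statistics

# Hypothetical test scores, invented purely for illustration.
scores = [70, 85, 85, 90, 100]

mean_score = statistics.mean(scores)      # sum of values / number of values
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value
std_dev = statistics.stdev(scores)        # sample standard deviation (n - 1 divisor)

print(mean_score, median_score, mode_score, round(std_dev, 2))
```

Note that `statistics.stdev` treats the data as a sample; `statistics.pstdev` would be the choice if the list were the whole population.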
4.Inferential statistics:
Make predictions (or) draw conclusions
about a larger population based on a sample.
This involves techniques like:
* Hypothesis testing: Formulating & testing hypotheses
about a population parameter.
* Confidence intervals: Estimating a range of values that
likely contains the true population parameter.
* Regression analysis: Examining the relationship b/w
2 (or) more variables.
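As one illustration of inference from a sample, the sketch below estimates a 95% confidence interval for a population mean. The sample values are hypothetical, and the interval uses the normal approximation (z-value); for a sample this small a t-based interval would usually be somewhat wider, so treat this as a simplified example.

```python
import math
import statistics

# Hypothetical sample of 8 measurements drawn from a larger population.
sample = [52, 48, 50, 51, 49, 53, 47, 50]

n = len(sample)
sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# z-value for a 95% two-sided interval, taken from the standard normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * std_error
upper = sample_mean + z * std_error
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```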
Data:
"Data" is a collection of facts such as numbers, words,
measurements & observations.
Types of data:
1.Qualitative (Categorical data)
Ex: gender, color
2.Quantitative (Numerical data)
* Discrete (Counted)
* Continuous (Measured)
Data visualization techniques:
Data visualization is the graphical representation of
information & data.
By using visual elements like charts, graphs & maps,
data visualization tools provide an accessible way to see
& understand trends, outliers & patterns in data.
Some of the most common & effective data visualization
techniques:
1.Bar chart:
Use: Comparing categories (or) groups.
Ex: Sales figures for different products.
(Figure: Bar chart)
2.Line chart:
Use: Showing trends over time.
Ex: Stock prices over a year.
(Figure: Line chart)
3.Pie chart:
Use: Showing the proportion of parts to a whole.
Ex: Market share of different companies.
(Figure: Pie chart)
4.Scatter plots:
Use: Showing the relationship b/w 2 variables.
Ex: Correlation b/w height & weight.
(Figure: Scatter plot)
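The relationship a scatter plot shows can also be summarized by a single number, the Pearson correlation coefficient. The sketch below computes it by hand for hypothetical height and weight pairs; the data and the helper name `pearson_r` are assumptions made for this example.

```python
import math

# Hypothetical height (cm) and weight (kg) pairs, invented for illustration.
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 66, 77, 84]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(heights, weights)
print(round(r, 3))  # a value close to +1 means a strong positive linear relationship
```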
5.Bubble plots:
A bubble plot is a type of chart that visually represents
data with 3 variables.
Ex: A bubble plot analyzing the performance of
different car models.
(Figure: Bubble plot)
Key components of a bubble plot:
1.X-axis: Represents one variable.
2.Y-axis: Represents another variable.
3.Bubble size: Represents the 3rd variable.
6.Heat maps:
Use: Visualizing data across 2 dimensions using color.
Ex: Showing busy times in a restaurant.
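Before a heat map can be drawn, the data has to be arranged as a 2-dimensional grid of values. The sketch below builds such a grid for the restaurant example, counting visits per day and hour. The visit log is hypothetical, and the coloring step is left to whatever plotting tool is used.

```python
from collections import Counter

# Hypothetical visit log for a restaurant: (day, hour) of each customer visit.
visits = [
    ("Fri", 19), ("Fri", 19), ("Fri", 20), ("Sat", 13),
    ("Sat", 19), ("Sat", 19), ("Sat", 19), ("Sun", 13),
]

days = ["Fri", "Sat", "Sun"]
hours = [13, 19, 20]

counts = Counter(visits)  # maps (day, hour) -> number of visits

# Rows are days, columns are hours; each cell would be shaded by its count.
grid = [[counts[(day, hour)] for hour in hours] for day in days]

for day, row in zip(days, grid):
    print(day, row)
```

The largest cell (Saturday at 19:00 in this made-up log) would get the most intense color.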
Dot Distribution:
Definition:
A dot distribution map is a type of thematic map that uses dots
to represent the presence, quantity, (or) value of a
phenomenon in a specific area.
Each dot represents a specific number of occurrences or
instances of the mapped phenomenon.
Types of Dot Distribution:
1.One-to-One Dot Map: Each dot represents one instance
(e.g., one person, one tree).
2.One-to-Many (Representative Dot Map): Each dot
represents a specific number (e.g., 1 dot = 100 people).
3.Random Dot Map: Dots are placed randomly within an
area to represent the quantity.
4.Uniform Dot Map: Dots are placed evenly spaced for a
more systematic look.
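A one-to-many map with random placement can be sketched as follows. The regions, populations, bounding boxes, and the scale of 100 people per dot are all assumptions made for this example; a real map would place dots inside the actual region boundaries rather than a rectangle.

```python
import random

# Hypothetical regions: population plus a bounding box (x_min, y_min, x_max, y_max).
regions = {
    "North": {"population": 1200, "bbox": (0, 10, 10, 20)},
    "South": {"population": 500, "bbox": (0, 0, 10, 10)},
}
PEOPLE_PER_DOT = 100  # scale: 1 dot = 100 people (an assumed value)

def place_dots(region, people_per_dot, rng):
    """Return random (x, y) dot positions inside the region's bounding box."""
    n_dots = round(region["population"] / people_per_dot)
    x_min, y_min, x_max, y_max = region["bbox"]
    return [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
            for _ in range(n_dots)]

rng = random.Random(42)  # fixed seed so the layout is reproducible
dots = {name: place_dots(r, PEOPLE_PER_DOT, rng) for name, r in regions.items()}

for name, points in dots.items():
    print(name, len(points), "dots")
```

Changing `PEOPLE_PER_DOT` changes how many dots each region gets, which is why the choice of scale matters so much for how the map reads.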
Advantages:
Visual Clarity: Provides an immediate visual impression of
distribution patterns.
Easy Interpretation: Simple to understand, especially for
non-technical audiences.
Effective Comparison: Useful for comparing density and
distribution between regions.
Scalable: Can represent small or large quantities effectively.
Disadvantages:
Overlapping Dots: In high-density areas, dots may overlap,
making it hard to interpret.
Misleading Placement: Dots might be placed randomly,
leading to false impressions of exact locations.
Scale Sensitivity: Choosing the right scale (value per dot) is
critical; a wrong choice can distort interpretation.
Data Generalization: Often uses aggregate data, which might
hide local variations.
Applications:
Population Studies: Mapping human population density and
distribution.
Epidemiology: Tracking the spread of diseases (e.g.,
COVID-19 cases).
Agriculture: Showing distribution of crops or livestock.
Urban Planning: Visualizing distribution of facilities like
schools or hospitals.
Environmental Science: Mapping occurrences of natural
features like forests or water bodies.
Historical Studies: Representing historical events like battles
or migrations.
Tree diagrams:
Tree diagrams are a valuable tool in various statistical
methods for data science.
They represent hierarchical structures, for example the
structure of a given website.
Matrix diagram:
A matrix diagram is a powerful visualization
technique that helps you understand and analyze the
relationships between different sets of data.
Principal Component Analysis:
Introduction to data distributions:
Data distributions are a foundational concept in
statistics and data analysis.
They describe how data values are spread or
distributed across a range.
Understanding distributions helps us analyze data
patterns, make predictions, and draw meaningful
conclusions.
Types of Data Distributions:
1. Discrete
2. Continuous
1.Discrete:
1.Binomial Distribution:
* Describes the number of successes in a fixed number of
independent trials, each with two possible outcomes
(success or failure).
* Example: Flipping a coin several times and counting the heads.
2.Poisson Distribution:
* Describes the probability of a given number
of events occurring within a fixed interval.
* Example: Number of customer arrivals per minute.
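Both discrete distributions have simple probability formulas that can be evaluated directly. The sketch below uses the standard binomial and Poisson probability mass functions; the specific numbers (10 flips, an average of 3 arrivals per minute) are assumptions chosen for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval when the average count is lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Probability of exactly 5 heads in 10 fair coin flips.
print(round(binomial_pmf(5, 10, 0.5), 4))

# Probability of exactly 2 arrivals in a minute, assuming an average of 3 per minute.
print(round(poisson_pmf(2, 3.0), 4))
```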
2.Continuous:
1.Normal Distribution:
* Also called a "bell curve," this is one of the
most common data distributions.
* The data is symmetrically distributed around
the mean.
* Examples: Heights, weights, and test scores often
approximately follow a normal distribution.
2.Exponential Distribution:
* Describes the time between events in a
Poisson process (events occurring independently
at a constant average rate).
* Example: Time between arrivals at a bus
stop.
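For continuous distributions, probabilities come from areas under the curve, so the cumulative distribution function (CDF) is the usual tool. The sketch below uses `statistics.NormalDist` from the standard library and a hand-written exponential CDF; the mean, standard deviation, and arrival rate are assumed values for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical test scores: normal with mean 70 and standard deviation 10.
scores = NormalDist(mu=70, sigma=10)

# Share of scores within one standard deviation of the mean (about 68%).
within_one_sd = scores.cdf(80) - scores.cdf(60)

def exponential_cdf(t, rate):
    """P(waiting time <= t) when events occur at `rate` per unit time."""
    return 1 - math.exp(-rate * t)

# Assumed rate: 0.2 arrivals per minute, i.e. one arrival every 5 minutes on average.
p_within_5_min = exponential_cdf(5, rate=0.2)

print(round(within_one_sd, 4), round(p_within_5_min, 4))
```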
Key Concepts in Data Distributions:
Central Tendency: Measures like mean, median,
and mode indicate where the data is centered.
Variability: Includes range, variance, and standard
deviation, showing the spread of data.
Shape: Describes whether the data is
symmetrical, skewed, or has specific patterns.
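The three ideas can be checked together on one small data set. In the sketch below the incomes are hypothetical, with one large value added on purpose so that the mean is pulled above the median. The skewness formula is the simple moment-based version (average cubed deviation divided by the cubed standard deviation), which is one of several definitions in use.

```python
import statistics

# Hypothetical incomes (in thousands); one large value creates a right skew.
incomes = [30, 32, 35, 36, 38, 40, 120]

# Central tendency
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Variability
data_range = max(incomes) - min(incomes)
variance = statistics.pvariance(incomes)  # population variance
std_dev = statistics.pstdev(incomes)      # population standard deviation

# Shape: positive skewness means a longer tail on the right side.
skewness = sum((x - mean_income) ** 3 for x in incomes) / (len(incomes) * std_dev**3)

print(round(mean_income, 2), median_income, data_range, round(skewness, 2))
```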