Data Visualization & Probability Basics

- Data visualization uses visual representations such as charts, graphs, and maps to show patterns in data clearly. It supports fast decision-making and better understanding across different roles.
- Common types include charts, tables, graphs, and maps. Specific examples are area charts, bar charts, and scatter plots. Measures of central tendency and dispersion help summarize data. Probability distributions describe the possible values and likelihoods of random variables and events. Joint distributions define the probabilities of multiple variables.

Uploaded by

Manohar

Data Visualization

“Data visualization is the visual representation of your data. With the help of charts,
maps, and other graphical elements these tools provide a simple and comprehensible
way to clearly see and easily discover insights and patterns in your data.”
Why do we need data visualization?
With the help of descriptive graphics and dashboards, even difficult information can
be clear and comprehensible.
Here are some noteworthy numbers, based on research, that confirm the importance
of visualization:
• People get 90% of information about their environment from the eyes.
• 50% of brain neurons take part in visual data processing.
• Pictures increase the willingness to read a text by up to 80%.
• People remember 10% of what they hear, 20% of what they read, and 80% of
what they see.
Advantages
Relevant visualization brings lots of advantages for your business:

• Fast decision-making. Summing up data is easy and fast with graphics, which let
you quickly see that a column or touchpoint is higher than others without looking
through several pages of statistics in Google Sheets or Excel.
• More people involved. Most people are better at perceiving and remembering
information presented visually.
• Higher degree of involvement. Beautiful and bright graphics with clear messages
attract readers’ attention.
• Better understanding. Perfect reports are transparent not only for technical
specialists, analysts, and data scientists but also for CMOs and CEOs, and help
each and every worker make decisions in their area of responsibility.
Common general types of data visualization:
Charts
Tables
Graphs
Maps
Infographics
Dashboards
More specific examples of methods to visualize data:
Area Chart
Bar Chart
Box-and-whisker Plots
Scatter plot
Charts
The easiest way to show the development of one or several data sets is a chart.
Charts vary from bar and line charts that show the relationship between elements
over time to pie charts that demonstrate the components or proportions between the
elements of one whole.
Plots
Plots distribute two or more data sets over a 2D or even 3D space to show
the relationship between these sets and the parameters on the plot. Plots also vary.
Scatter and bubble plots are some of the most widely used visualizations. When it
comes to big data, analysts often use more complex box plots that help visualize the
relationships within large volumes of data.
Histogram
A histogram is a graphical representation that organizes a group of data points into
user-specified ranges. Similar in appearance to a bar graph, the histogram condenses
a data series into an easily interpreted visual by taking many data points and
grouping them into logical ranges or bins.
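The grouping step behind a histogram can be sketched in a few lines. This is a minimal illustration, not a plotting recipe: the function name histogram_counts, the bin width, and the exam scores are all invented for the example.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0.0):
    """Count how many values fall into each bin [start + k*w, start + (k+1)*w)."""
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin that contains v
        counts[k] += 1
    # Label each bin by its lower and upper edge, in ascending order.
    return {(start + k * bin_width, start + (k + 1) * bin_width): n
            for k, n in sorted(counts.items())}

# Hypothetical exam scores, invented for illustration.
scores = [52, 58, 61, 64, 67, 70, 71, 75, 78, 83, 88, 94]
bins = histogram_counts(scores, bin_width=10, start=50)

# A text-only rendering: one '#' per data point in the bin.
for (low, high), n in bins.items():
    print(f"{low:>3.0f}-{high:<3.0f} {'#' * n}")
```

The bin width is the "user-specified range": a wider bin gives a coarser picture, a narrower one shows more detail but also more noise.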
Table
A table (or data table) is an efficient format for comparative data analysis on
categorical objects. Usually, the items being compared are placed in a column,
while the categorical objects are in the rows. The quantitative value is then placed at
the intersection of the row and column, called the cell.
Frequency distribution
A frequency distribution is an overview of all distinct values in some variable and
the number of times they occur. That is, a frequency distribution tells how
frequencies are distributed over values. Frequency distributions are mostly used for
summarizing categorical variables.
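As a small sketch, a frequency distribution of a categorical variable can be built with a counter. The list of answers below is hypothetical.

```python
from collections import Counter

# Hypothetical categorical observations (favourite chart type per respondent).
answers = ["bar", "pie", "bar", "line", "bar", "pie"]

# Map each distinct value to the number of times it occurs.
freq = Counter(answers)

# Print values from most to least frequent, with relative frequencies.
for value, count in freq.most_common():
    print(f"{value:<5} {count}  ({count / len(answers):.0%})")
```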
Central tendency and dispersion
Measures of central tendency map a vector of observations onto a single number
that represents, roughly put, “the center”.
Since what counts as a “center” is ambiguous, there are several measures of central
tendency.
Different measures of central tendency can be more or less adequate for one
purpose or another.
The type of variable (nominal, ordinal or metric, for instance) will also influence the
choice of measure.
We will visit three prominent measures of central tendency here: the (arithmetic)
mean, the median and the mode.
Measures of dispersion indicate how much the observations are spread out around,
let’s say, “a center”. We will visit two prominent measures of dispersion:
the variance and the standard deviation.
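The slides do not reproduce the formulas for the two dispersion measures, so the sketch below assumes the population versions: the variance as the average squared deviation from the arithmetic mean, and the standard deviation as its square root. The data are hypothetical.

```python
import math
import statistics

# Hypothetical observations of a metric variable.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Use the arithmetic mean as "the center".
mean = sum(data) / len(data)

# Population variance: average squared distance from the mean.
pop_var = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance, in the units of the data.
pop_sd = math.sqrt(pop_var)

print(mean, pop_var, pop_sd)

# The standard library offers the same population measures directly.
print(statistics.pvariance(data), statistics.pstdev(data))
```

If the data are a sample rather than the whole population, statistics.variance and statistics.stdev divide by n - 1 instead of n.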
• The (arithmetic) mean

• The arithmetic mean can be understood intuitively as the centre of gravity.

The median
• If x = ⟨x_1, …, x_n⟩ is a vector of n data observations from an at least ordinal measure and
if x is ordered such that x_i ≤ x_{i+1} for all 1 ≤ i < n, the median is the value x_i such
that the number of data observations that are greater than or equal to x_i and the number of data
observations that are less than or equal to x_i are equal.
The mode
• The mode is the unique value that occurred most frequently in the data. If there is
no unique value with that property, there is no mode.
• While the mean is only applicable to metric variables, and the median only to
variables that are at least ordinal, the mode is only reasonable for variables that
have a finite set of different possible observations.
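The three measures can be compared on a small data set. This is a sketch on hypothetical data; the helper unique_mode is an illustrative name that follows the definition above by returning no mode when the most frequent value is not unique.

```python
import statistics

def unique_mode(values):
    """Return the single most frequent value, or None if there is a tie."""
    modes = statistics.multimode(values)  # all values with the highest count
    return modes[0] if len(modes) == 1 else None

# Hypothetical observations of a metric variable.
data = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(data)      # sum of the values divided by their number
median = statistics.median(data)  # middle value of the ordered data
mode = unique_mode(data)          # most frequent value, if unique

print(mean, median, mode)
print(unique_mode([1, 1, 2, 2]))  # tie between 1 and 2, so no mode
```

Here the mean is pulled above the median by the two large values 7 and 9, which shows why the choice of "center" matters.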
Probabilities and outcomes
Your grade on an exam and the number of times your internet connection fails
while you are writing a term paper both have an element of chance or randomness. In
each of these examples, there is something not yet known that is eventually
revealed.
The mutually exclusive potential results of a random process are called the
outcomes. For example, while writing your term paper, the internet connection
might never fail, it might fail once, it might fail twice, and so on. Only one of these
outcomes will actually occur (the outcomes are mutually exclusive).
The probability of an outcome is the proportion of the time that the outcome occurs
in the long run, that is, over many repeated trials.
If the probability of your internet connection not failing while you are writing a
term paper is 80%, then over the course of writing many term papers, you will
complete 80% without a wireless connection failure.
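The long-run interpretation can be illustrated with a simulation. The sketch below assumes that each term paper is an independent trial with an 80% chance of no failure. The function name and the fixed seed are illustrative choices.

```python
import random

def share_without_failure(n_papers, p_no_failure=0.80, seed=0):
    """Simulate n_papers term papers and return the share with no failure."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each paper is failure-free with probability p_no_failure.
    ok = sum(rng.random() < p_no_failure for _ in range(n_papers))
    return ok / n_papers

share = share_without_failure(100_000)
print(f"{share:.3f}")
```

With few papers the share can be far from 0.80; it settles close to 0.80 only as the number of trials grows.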
The sample space and events.
The set of all possible outcomes is called the sample space. An event is a subset of
the sample space; that is, an event is a set of one or more outcomes.
The event “my internet connection will fail no more than once” is the set consisting
of two outcomes: “no failures” and “one failure.”
Random variables
A random variable is a numerical summary of a random outcome. The number of
times your internet connection fails while you are writing a term paper is random
and takes on a numerical value, so it is a random variable.
Some random variables are discrete and some are continuous. As their names
suggest, a discrete random variable takes on only a discrete set of values, like 0, 1,
2, . . . , whereas a continuous random variable takes on a continuum of possible
values.
Probability Distributions
A probability distribution is a statistical function that describes all the possible
values and likelihoods that a random variable can take within a given range.
Probability Distribution of a Discrete Random Variable
The probability distribution of a discrete random variable is the list of all possible
values of the variable and the probability that each value will occur. These
probabilities sum to 1.
For example, let M be the number of times your internet connection
fails while you are writing a term paper. The probability distribution of the random
variable M is the list of probabilities of all possible outcomes: The probability that
M = 0, denoted Pr (M = 0), is the probability of no wireless connection failures;
Pr (M = 1) is the probability of a single connection failure; and so forth.
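A discrete distribution is naturally stored as a mapping from values to probabilities. In the sketch below, Pr(M = 0) = 0.80 matches the 80% used earlier; the other probabilities, and the cut-off at four failures, are assumptions made only for illustration.

```python
import math
from itertools import accumulate

# Probability distribution of M. Only Pr(M = 0) = 0.80 comes from the text;
# the remaining values are hypothetical.
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}

# The probabilities of all possible values must sum to 1.
total = sum(pmf.values())

def pr_event(event, pmf):
    """Probability of an event, given as a set of outcomes."""
    return sum(pmf[m] for m in event)

# Event "the connection fails no more than once" = {0, 1}.
at_most_one = pr_event({0, 1}, pmf)

# Cumulative distribution: Pr(M <= m) for each value m, in ascending order.
cdf = dict(zip(pmf, accumulate(pmf.values())))

print(round(total, 10), round(at_most_one, 10))
print({m: round(p, 10) for m, p in cdf.items()})
```

The event probability is the sum over the outcomes in the event, and the cumulative distribution is the running sum of the same probabilities.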
[Figure: cumulative probability distribution of a continuous random variable]
Probability Distribution of a Continuous Random Variable
Probability density function.
Because a continuous random variable can take on a continuum of possible values,
the probability distribution used for discrete variables, which lists the probability of
each possible value of the random variable, is not suitable for continuous variables.
Instead, the probability is summarized by the probability density function. The area
under the probability density function between any two points is the probability that
the random variable falls between those two points. A probability density function is
also called a p.d.f., a density function, or simply a density.
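The "area under the density" idea can be checked numerically. The sketch below assumes a standard normal density purely as an example, since the slides do not name a distribution. It approximates the area between two points with the trapezoidal rule and compares it with the difference of the cumulative distribution at those points.

```python
from statistics import NormalDist

def area_under_pdf(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b (trapezoidal rule)."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))  # end points count half
    for i in range(1, steps):
        total += pdf(a + i * h)      # interior points count fully
    return total * h

# Assumed example distribution: standard normal (mean 0, standard deviation 1).
z = NormalDist(mu=0.0, sigma=1.0)

area = area_under_pdf(z.pdf, -1.0, 1.0)  # Pr(-1 <= Z <= 1) as an area
exact = z.cdf(1.0) - z.cdf(-1.0)         # same probability from the c.d.f.

print(f"{area:.4f} {exact:.4f}")
```

The value of the density at a single point is not itself a probability; only areas under the curve are.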
[Figure: probability density function of a continuous random variable]
Joint distribution.
The joint probability distribution of two discrete random variables, say X and Y, is the probability
that the random variables simultaneously take on certain values, say x and y. The probabilities of all
possible (x, y) combinations sum to 1.
The joint probability distribution can be written as the function Pr(X = x, Y = y).
For example, weather conditions (whether or not it is raining) affect a student's
commuting time.
Let Y be a binary random variable that equals 1 if the commute is short (less than 20 minutes) and
that equals 0 otherwise, and let X be a binary random variable that equals 0 if it is raining and 1 if
not.
Between these two random variables, there are four possible outcomes: it rains and the commute is
long (X = 0, Y = 0); rain and short commute (X = 0, Y = 1); no rain and long commute (X = 1, Y =
0); and no rain and short commute (X = 1, Y = 1). The joint probability distribution is the frequency
with which each of these four outcomes occurs over many repeated commutes.
[Table 2.2: joint distribution of rain (X) and commute length (Y), with marginal probabilities in the final row and column]
For example, the probability of a long rainy commute is 15%, and the probability of
a long commute with no rain is 7%, so the probability of a long commute (rainy or
not) is 22%. The marginal distribution of commuting times is given in the final
column of Table 2.2. Similarly, the marginal probability that it will rain is 30%, as
shown in the final row.
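The marginal probabilities follow from the joint distribution by summing over the other variable. In the sketch below, the two long-commute cells (0.15 and 0.07) are taken from the text. The two short-commute cells are not stated: they are derived here from the 30% rain probability and from the requirement that all four cells sum to 1.

```python
import math

# Keys are (x, y): x = 0 rain, x = 1 no rain; y = 0 long commute, y = 1 short.
joint = {
    (0, 0): 0.15,  # rain, long commute (stated in the text)
    (1, 0): 0.07,  # no rain, long commute (stated in the text)
    (0, 1): 0.15,  # rain, short commute (derived: 0.30 - 0.15)
    (1, 1): 0.63,  # no rain, short commute (derived: the remainder to 1)
}

def marginal(joint, axis):
    """Sum the joint probabilities over the variable not selected by axis."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

pr_x = marginal(joint, axis=0)  # distribution of rain / no rain
pr_y = marginal(joint, axis=1)  # distribution of long / short commute

print({x: round(p, 2) for x, p in pr_x.items()})
print({y: round(p, 2) for y, p in pr_y.items()})
```

The result reproduces the 22% probability of a long commute and the 30% probability of rain quoted above.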
Conditional Distributions
Conditional distribution. The distribution of a random variable Y conditional on
another random variable X taking on a specific value is called the conditional
distribution of Y given X. The conditional probability that Y takes on the value y
when X takes on the value x is written Pr(Y = y | X = x).

[Table: internet failures by type of network (0 = old, 1 = new)]

In general, the conditional distribution of Y given X = x is

• Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)
• Probability of a long commute (LC) given a rainy day (R):
• Pr(Y = LC | X = R) = Pr(X = R, Y = LC) / Pr(X = R) = 0.15 / 0.30 = 0.5
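The same calculation can be written as a small function that divides each joint probability by the marginal probability of the conditioning value. The joint probabilities below are those of the commute example: 0.15 and 0.07 are stated in the text, while the two short-commute cells are derived from the 30% rain probability and the requirement that the cells sum to 1. The function name is illustrative.

```python
import math

# Keys are (x, y): x = 0 rain, x = 1 no rain; y = 0 long commute, y = 1 short.
joint = {
    (0, 0): 0.15,  # stated in the text
    (1, 0): 0.07,  # stated in the text
    (0, 1): 0.15,  # derived from Pr(rain) = 0.30
    (1, 1): 0.63,  # derived so that all cells sum to 1
}

def conditional(joint, x):
    """Return Pr(Y = y | X = x) for every y as a dictionary."""
    pr_x = sum(p for (xi, _), p in joint.items() if xi == x)  # Pr(X = x)
    if pr_x == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return {y: p / pr_x for (xi, y), p in joint.items() if xi == x}

given_rain = conditional(joint, x=0)
given_no_rain = conditional(joint, x=1)

print({y: round(p, 2) for y, p in given_rain.items()})
print({y: round(p, 2) for y, p in given_no_rain.items()})
```

Each conditional distribution sums to 1 on its own, and a long commute is far more likely on a rainy day than on a dry one.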
