Unit 03 Descriptive Analysis and Visual Exploration

Uploaded by

missphamhoaituanh

ANLC751: Managerial Analytics
UNIT 03: DESCRIPTIVE ANALYSIS AND VISUAL EXPLORATION - 1
Lesson Objectives
• Data Modifications in Excel
• Creating distributions with Data
• Measures of Location and Variation
• Analyzing Distributions
• Measures of Association between variables
Part 1: All About Structured Data: Database, Tables, Rows, Columns
Columns = Variables = Fields
Rows = Observations = Records

• The data we deal with are mainly two-dimensional tables
• Each table is made up of columns and rows
• A column is referred to as a variable; it is usually a characteristic recorded for each record
• A row refers to a specific record or instance
• There can be multiple related tables that hold different sets of information; together they can be referred to as a database
Data Exploration: Modifying Data in Excel
SOME EXPLORATORY TECHNIQUES
MODIFYING DATA IN EXCEL

• Projects often involve so much data that it is difficult to analyze all of the data at once
• We will look at methods for summarizing and manipulating data to make the data more manageable and to develop insights.
Exploratory Techniques
Top 20 Selling Automobiles in United States in March 2011 (Data: Top20Cars.xlsx)
Rank (by March 2011 Sales) | Manufacturer | Model | Sales (March 2011) | Sales (March 2010)
1 | Honda | Accord | 33616 | 29120
2 | Nissan | Altima | 32289 | 24649
3 | Toyota | Camry | 31464 | 36251
4 | Honda | Civic | 31213 | 22463
5 | Toyota | Corolla/Matrix | 30234 | 29623
6 | Ford | Fusion | 27566 | 22773
7 | Hyundai | Sonata | 22894 | 18935
8 | Hyundai | Elantra | 19255 | 8225
9 | Toyota | Prius | 18605 | 11786
10 | Chevrolet | Cruze/Cobalt | 18101 | 10316
11 | Chevrolet | Impala | 18063 | 15594
12 | Nissan | Sentra | 17851 | 8721
13 | Ford | Focus | 17178 | 19500
14 | Volkswagen | Jetta | 16969 | 9196
15 | Chevrolet | Malibu | 15551 | 17750
16 | Mazda | 3 | 12467 | 11353
17 | Nissan | Versa | 11075 | 13811
18 | Subaru | Outback | 10498 | 7619
19 | Kia | Soul | 10028 | 5106
20 | Ford | Fiesta | 9787 | 0

Top 20 Selling Automobiles Data entered into Excel with Percent Change in Sales from 2010
Modifying Data in Excel
• Sorting and Filtering Data in Excel
• Illustration - To sort the automobiles by March 2010 sales
o Step 1: Select cells A1:F21
o Step 2: Click the DATA tab in the Ribbon
o Step 3: Click Custom Sort in the Sort & Filter group
o Step 4: Select the check box for My data has headers
o Step 5: In the first Sort by dropdown menu, select Sales (March 2010)
o Step 6: In the Order dropdown menu, select Largest to Smallest
o Step 7: Click OK

Using Excel’s Sort Function
Top Selling Automobiles Data Sorted by March 2010 Sales
Modifying Data in Excel
• Sorting and Filtering Data in Excel
• Find the number of Toyota models that were among the top 20 selling automobiles in March 2011

Example - Using Excel’s Filter function to see the sales of models made by Toyota.
o Step 1: Select cells A1:F21
o Step 2: Click the DATA tab in the Ribbon
o Step 3: Click Filter in the Sort & Filter group
o Step 4: Click on the Filter Arrow in column B, next to Manufacturer
o Step 5: Select only the check box for Toyota. You can easily deselect all choices by unchecking (Select All)
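
The sort and filter illustrations can also be sketched outside Excel. The following is a minimal Python sketch, not part of the original slides; the tuple layout and the names `cars`, `sorted_by_2010`, and `toyota` are assumptions made for illustration, while the figures come from the Top 20 table.

```python
# Rows from the Top 20 table: (rank, manufacturer, model, sales_2011, sales_2010).
cars = [
    (1, "Honda", "Accord", 33616, 29120),
    (2, "Nissan", "Altima", 32289, 24649),
    (3, "Toyota", "Camry", 31464, 36251),
    (4, "Honda", "Civic", 31213, 22463),
    (5, "Toyota", "Corolla/Matrix", 30234, 29623),
    (6, "Ford", "Fusion", 27566, 22773),
    (7, "Hyundai", "Sonata", 22894, 18935),
    (8, "Hyundai", "Elantra", 19255, 8225),
    (9, "Toyota", "Prius", 18605, 11786),
    (10, "Chevrolet", "Cruze/Cobalt", 18101, 10316),
    (11, "Chevrolet", "Impala", 18063, 15594),
    (12, "Nissan", "Sentra", 17851, 8721),
    (13, "Ford", "Focus", 17178, 19500),
    (14, "Volkswagen", "Jetta", 16969, 9196),
    (15, "Chevrolet", "Malibu", 15551, 17750),
    (16, "Mazda", "3", 12467, 11353),
    (17, "Nissan", "Versa", 11075, 13811),
    (18, "Subaru", "Outback", 10498, 7619),
    (19, "Kia", "Soul", 10028, 5106),
    (20, "Ford", "Fiesta", 9787, 0),
]

# Sort: largest to smallest by March 2010 sales (index 4), like the Custom Sort steps.
sorted_by_2010 = sorted(cars, key=lambda row: row[4], reverse=True)

# Filter: keep only the Toyota rows, like ticking only "Toyota" in the Filter dropdown.
toyota = [row for row in cars if row[1] == "Toyota"]

print([row[2] for row in sorted_by_2010[:3]])  # three best sellers in March 2010
print(len(toyota))                             # number of Toyota models in the top 20
```

Sorting returns a new list and leaves `cars` untouched, which mirrors the fact that Excel's filter hides rows without deleting them.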

Top Selling Automobiles Data Filtered to Show Only Automobiles Manufactured by Toyota
Modifying Data in Excel
▪ Conditional Formatting of Data in Excel: Makes it easy to identify data that satisfy certain conditions in a data set.

Illustration - To identify the automobile models for which sales had decreased from March 2010 to March 2011.
• Step 1: Starting with the original data shown in Cars, select cells F1:F21
• Step 2: Click on the HOME tab in the Ribbon
• Step 3: Click Conditional Formatting in the Styles group
• Step 4: Select Highlight Cells Rules, and click Less Than from the dropdown menu
• Step 5: Enter 0% in the Format cells that are LESS THAN: box
• Step 6: Click OK
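
The rule behind this conditional format can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes column F holds the percent change computed as (2011 sales - 2010 sales) / 2010 sales, uses a handful of rows from the table rather than all twenty, and the helper name `percent_change` is hypothetical.

```python
# A few rows from the Top 20 table: model -> (March 2011 sales, March 2010 sales).
sales = {
    "Accord": (33616, 29120),
    "Camry": (31464, 36251),
    "Elantra": (19255, 8225),
    "Focus": (17178, 19500),
    "Malibu": (15551, 17750),
    "Versa": (11075, 13811),
    "Fiesta": (9787, 0),
}


def percent_change(new, old):
    """Return the percent change from old to new, or None when old is zero."""
    if old == 0:
        return None  # undefined: there are no 2010 sales to compare against
    return (new - old) / old * 100


changes = {model: percent_change(s11, s10) for model, (s11, s10) in sales.items()}

# Same test as the "Less Than 0%" highlight rule.
declining = [m for m, pct in changes.items() if pct is not None and pct < 0]
print(declining)
```

The explicit zero check matters for the Ford Fiesta, which had no March 2010 sales, so its percent change is undefined rather than negative.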

Using Conditional Formatting in Excel to Highlight Automobiles with Declining Sales from March 2010

Using Conditional Formatting in Excel to Generate Data Bars for the Top Selling Automobiles Data
Data Exploration: Creating Distributions from Data
Creating Distributions from Data
Frequency distributions for qualitative/categorical data

• Frequency distribution: A summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes, typically referred to as bins, when dealing with distributions.
Sample Data
Data from a Qualitative Sample of 50 Soft Drink Purchases (Data: SoftDrinks.xlsx)
Coke, Coke, Diet Coke, Sprite, Pepsi, Pepsi, Diet Coke, Coke, Coke, Pepsi
Coke, Sprite, Dr. Pepper, Dr. Pepper, Diet Coke, Pepsi, Pepsi, Pepsi, Diet Coke, Pepsi
Coke, Coke, Dr. Pepper, Coke, Sprite, Diet Coke, Coke, Pepsi, Diet Coke, Pepsi
Coke, Pepsi, Coke, Coke, Diet Coke, Dr. Pepper, Coke, Sprite, Coke, Coke
Coke, Coke, Sprite, Pepsi, Coke, Dr. Pepper, Coke, Pepsi, Diet Coke, Pepsi
Frequency Distribution of Soft Drink Purchases

• The frequency distribution summarizes information about the popularity of the five soft drinks

Creating a Frequency Distribution for Soft Drinks Data in Excel (Data: SoftDrinks.xlsx)
Creating Distributions from Data
▪ Relative frequency and Percent frequency distributions
• Relative frequency distribution: It is a tabular summary of data showing the relative frequency for each bin.
• Percent frequency distribution: Summarizes the percent frequency of the data for each bin.
• Used to provide estimates of the relative likelihoods of different values of a random variable.
Relative Frequency and Percent Frequency Distributions of Soft Drink Purchases
Bins | Frequency | Relative Frequency | Percent Frequency (%)
Coke | 19 | 0.38 | 38
Diet Coke | 8 | 0.16 | 16
Pepsi | 13 | 0.26 | 26
Dr. Pepper | 5 | 0.10 | 10
Sprite | 5 | 0.10 | 10
Total | 50 | 1.00 | 100
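
As a cross-check on the table, the three distributions can be tallied with Python's `collections.Counter`. This is a minimal sketch rather than the Excel method used in the slides; the variable names are illustrative, and the list holds the 50 purchases from the sample data.

```python
from collections import Counter

# The 50 soft drink purchases from the sample data.
purchases = [
    "Coke", "Coke", "Diet Coke", "Sprite", "Pepsi", "Pepsi", "Diet Coke", "Coke", "Coke", "Pepsi",
    "Coke", "Sprite", "Dr. Pepper", "Dr. Pepper", "Diet Coke", "Pepsi", "Pepsi", "Pepsi", "Diet Coke", "Pepsi",
    "Coke", "Coke", "Dr. Pepper", "Coke", "Sprite", "Diet Coke", "Coke", "Pepsi", "Diet Coke", "Pepsi",
    "Coke", "Pepsi", "Coke", "Coke", "Diet Coke", "Dr. Pepper", "Coke", "Sprite", "Coke", "Coke",
    "Coke", "Coke", "Sprite", "Pepsi", "Coke", "Dr. Pepper", "Coke", "Pepsi", "Diet Coke", "Pepsi",
]

freq = Counter(purchases)  # frequency of each bin
n = len(purchases)         # total number of observations

for drink, count in freq.most_common():
    rel = count / n  # relative frequency; percent frequency is rel * 100
    print(f"{drink:<11}{count:>3}{rel:>6.2f}{rel * 100:>5.0f}")
```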

Creating Distributions from Data
Frequency distributions for quantitative data

• Three steps necessary to define the classes for a frequency distribution with quantitative data:
1. Determine the number of nonoverlapping bins (groups).
2. Determine the width of each bin.
3. Determine the bin limits.
$\text{Approximate bin width} = \dfrac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of bins}}$
Creating Distributions from Data
Example: Year-End Audit Times (Days)
Find the number of times the audit duration was greater than 30 days
Year-End Audit Times (in Days) (Data: AuditData.xlsx)
12 14 19 18
15 15 18 17
20 27 22 23
22 21 33 28
14 18 16 13

Frequency, Relative Frequency, and Percent Frequency Distributions for the Audit Time Data
Class-Interval (days) | Bin | Frequency | Relative Frequency | Percent Frequency
10-14 | 14 | 4 | 0.20 | 20
15-19 | 19 | 8 | 0.40 | 40
20-24 | 24 | 5 | 0.25 | 25
25-29 | 29 | 2 | 0.10 | 10
30-34 | 34 | 1 | 0.05 | 5
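
The three steps for defining bins, applied to the audit times, might look like the sketch below. It assumes five bins, a width rounded up to a whole number of days, and a first lower limit of 10 chosen by hand to match the table; none of these choices is forced by the formula.

```python
import math

audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

num_bins = 5                                                     # step 1
approx_width = (max(audit_times) - min(audit_times)) / num_bins  # step 2: (33 - 12) / 5
width = math.ceil(approx_width)                                  # round up to 5 days
lower = 10                                                       # step 3: assumed first lower limit

# Upper limit of each bin: 14, 19, 24, 29, 34 (the Bin column in the table).
upper_limits = [lower + width * (i + 1) - 1 for i in range(num_bins)]

# Count each value in the first bin whose upper limit it does not exceed.
counts = [0] * num_bins
for t in audit_times:
    for i, upper in enumerate(upper_limits):
        if t <= upper:
            counts[i] += 1
            break

over_30_days = sum(1 for t in audit_times if t > 30)
print(upper_limits, counts, over_30_days)
```

The counting loop follows the same "less than or equal to the bin value" convention that the Bin column implies.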
Using Excel to Generate a Frequency Distribution for Audit Times Data
Year-End Audit Times (in Days)
12 14 19 18
15 15 18 17
20 27 22 23
22 21 33 28
14 18 16 13

Min | =MIN(A2:D6)
Max | =MAX(A2:D6)

Class-Interval (days) | Bin | Frequency | Relative Frequency | Percent Frequency
10-14 | 14 | =FREQUENCY(A2:D6,B12:B16) | =C12/$C$17 | =D12*100
15-19 | 19 | =FREQUENCY(A2:D6,B12:B16) | =C13/$C$17 | =D13*100
20-24 | 24 | =FREQUENCY(A2:D6,B12:B16) | =C14/$C$17 | =D14*100
25-29 | 29 | =FREQUENCY(A2:D6,B12:B16) | =C15/$C$17 | =D15*100
30-34 | 34 | =FREQUENCY(A2:D6,B12:B16) | =C16/$C$17 | =D16*100
Total | | =SUM(C12:C16) | =SUM(D12:D16) | =SUM(E12:E16)
Creating Distributions from Data
▪ Histogram: A common graphical presentation of quantitative data
• Constructed by placing the variable of interest on the horizontal axis and the selected frequency measure (absolute frequency, relative frequency, or percent frequency) on the vertical axis.
• The frequency measure of each class is shown by drawing a rectangle whose base is determined by the class limits on the horizontal axis and whose height is the corresponding frequency measure.

Histogram for the Audit Time Data

Creating a Histogram for the Audit Time Data using Data Analysis ToolPak in Excel

Completed Histogram for the Audit Time Data using Data Analysis ToolPak in Excel
Creating Distributions from Data
▪ Histogram provides information about the shape, or form, of a distribution.
▪ Skewness: Lack of symmetry
▪ Important characteristic of the shape of a distribution

Histograms Showing Distributions with Different Levels of Skewness
Question: What is the skewness of the audit data?
Creating Distributions from Data
▪ Cumulative frequency distribution: A variation of the frequency distribution that provides another tabular summary of quantitative data.
• Uses the number of classes, class widths, and class limits developed for the frequency distribution.
• Shows the number of data items with values less than or equal to the upper class limit of each class.
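
A cumulative distribution is a running total of the bin frequencies. The sketch below uses `itertools.accumulate` on the audit time frequencies (4, 8, 5, 2, 1); the variable names are illustrative.

```python
from itertools import accumulate

upper_limits = [14, 19, 24, 29, 34]  # upper class limits (days)
frequencies = [4, 8, 5, 2, 1]        # audit time frequencies per class

# Running total: number of audits taking at most each upper limit.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

cumulative_relative = [c / total for c in cumulative]
cumulative_percent = [100 * c / total for c in cumulative]

for limit, c, pct in zip(upper_limits, cumulative, cumulative_percent):
    print(f"<= {limit} days: {c:>2} audits ({pct:.0f}%)")
```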

Cumulative Frequency, Cumulative Relative Frequency, and Cumulative Percent Frequency Distributions for the Audit Time Data
Descriptive Analysis
MEASURES OF LOCATION (CENTRAL TENDENCY) AND VARIATION
All About Data: Descriptive Statistics
• For any numerical dataset we would generally calculate the central tendency (measures of location). The most commonly used measures are
o Mean – AKA Average
o Median – The middle value of your data when the numbers are listed in order from smallest to largest
o Mode – The value that occurs most often in your data
• Measure the variability
o Min (Minimum) – Smallest value
o Max (Maximum) – Largest value
o Range – Difference between the largest and smallest values (Max – Min)
o Standard Deviation
• Together these give you an indication of the overall size, typical values, and variability of the data.
• Continue to explore the shape and variability of your data
• Interquartile range – similar to the range, but instead of calculating the difference between the smallest and largest values, you calculate the difference between the 75th percentile and the 25th percentile (the values below which 75% and 25% of the observations fall, respectively)
Measures of Location
• Mean/Arithmetic mean
• Average value for a variable.
• The sample mean is denoted by $\bar{x}$.
Sample mean, $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
o $n$ = sample size
o $x_1$ = value of variable x for the first observation
o $x_2$ = value of variable x for the second observation
o $x_n$ = value of variable x for the nth observation
Data on Home Sales in Cincinnati, Ohio, Suburb
Computation of Sample Mean
Illustration: Computation of the mean home selling price for the sample of 12 home sales:
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_{12}}{12} = \dfrac{138{,}000 + 254{,}000 + \cdots + 456{,}250}{12} = \dfrac{2{,}639{,}250}{12} = 219{,}937.50$
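
The same arithmetic can be checked in Python. This is a small sketch using the 12 selling prices from the home sales data; `statistics.mean` is the standard-library equivalent of dividing the sum by the sample size.

```python
from statistics import mean

# Selling prices ($) of the 12 home sales, in the order listed in the data.
prices = [138000, 254000, 186000, 257500, 108000, 254000,
          138000, 298000, 199500, 208000, 142000, 456250]

total = sum(prices)                # numerator of the sample mean
sample_mean = total / len(prices)  # divide by the sample size n = 12

print(total, sample_mean, mean(prices))
```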

Measures of Location
• Median: Value in the middle when the data are arranged in ascending order.
• Middle value, for an odd number of observations
• Average of two middle values, for an even number of observations
Computation of Sample Median
Illustration - When the number of observations is odd
• Consider the class size data for a sample of five college classes:
46 54 42 46 32
• Arrange the class size data in ascending order
32 42 46 46 54
• Middlemost value in the data set = 46.
• Median is 46.
Computation of Sample Median
Illustration - When the number of observations is even
• Consider the data on home sales in Cincinnati, Ohio, Suburb (Data: HomeSales.xlsx):
Home Sale | Selling Price ($)
1 | 138000
2 | 254000
3 | 186000
4 | 257500
5 | 108000
6 | 254000
7 | 138000
8 | 298000
9 | 199500
10 | 208000
11 | 142000
12 | 456250

• Arrange the data in ascending order:
1. 108,000
2. 138,000
3. 138,000
4. 142,000
5. 186,000
6. 199,500 (middle value)
7. 208,000 (middle value)
8. 254,000
9. 254,000
10. 257,500
11. 298,000
12. 456,250
• Median = average of two middle values = $\dfrac{199{,}500 + 208{,}000}{2} = 203{,}750$
Measures of Location
• Mode: Value that occurs most frequently in a data set.
• Consider the class size data:
32 42 46 46 54
• Observe - 46 is the only value that occurs more than once.
• Mode is 46.
• Multimodal data - Data contain at least two modes.
• Bimodal data - Data contain exactly two modes.
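
The median and mode illustrations can be reproduced with Python's `statistics` module. This is a sketch, not the Excel approach from the slides; `multimode` (available in Python 3.8 and later) returns every value tied for the highest count, which makes bimodal data visible.

```python
from statistics import median, multimode

class_sizes = [46, 54, 42, 46, 32]
home_prices = [138000, 254000, 186000, 257500, 108000, 254000,
               138000, 298000, 199500, 208000, 142000, 456250]

# Odd number of observations: the middle value after sorting.
class_median = median(class_sizes)
# Even number of observations: the average of the two middle values.
home_median = median(home_prices)

# multimode lists all modes, in the order they first appear in the data.
class_modes = multimode(class_sizes)
home_modes = multimode(home_prices)

print(class_median, home_median, class_modes, home_modes)
```

The home sales data turn out to be bimodal, since 138,000 and 254,000 each occur twice.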

Calculating the Mean, Median, and Modes for the Home Sales Data using Excel (Data: HomeSales.xlsx)
Excel functions for the mode: MODE.SNGL vs MODE.MULTI
Measures of Location
• Geometric mean: nth root of the product of n values
• Used in analyzing growth rates in financial data.
• Sample geometric mean:
• $\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n}$
Illustration - Consider the percentage annual returns and growth factors for the mutual fund data over the past 10 years.

Percentage Annual Returns and Growth Factors for the Mutual Fund Data (Data: MutualFundReturns.xlsx)
Growth Factor = (Return%/100) + 1
Year | Return (%) | Growth Factor
1 | -22.1 | 0.779
2 | 28.7 | 1.287
3 | 10.9 | 1.109
4 | 4.9 | 1.049
5 | 15.8 | 1.158
6 | 5.5 | 1.055
7 | -37.0 | 0.630
8 | 26.5 | 1.265
9 | 15.1 | 1.151
10 | 2.1 | 1.021

• We will determine the mean rate of growth for the fund over the 10-year period.
Computation of Geometric Mean
Solution:
• Product of the growth factors:
• (.779)(1.287)(1.109)(1.049)(1.158)(1.055)(.630)(1.265)(1.151)(1.021) = 1.334
• Geometric mean of the growth factors:
$\bar{x}_g = \sqrt[10]{1.334} = 1.029$
• Conclude that annual returns grew at an average annual rate of (1.029 – 1)*100% or 2.9%.
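
The geometric mean computation can be verified with a short Python sketch. It uses `math.prod` (Python 3.8 and later) on the ten growth factors from the table; the variable names are illustrative.

```python
from math import prod

# Growth factors for years 1 to 10, i.e. (Return% / 100) + 1.
growth_factors = [0.779, 1.287, 1.109, 1.049, 1.158,
                  1.055, 0.630, 1.265, 1.151, 1.021]

product = prod(growth_factors)                   # product of the n growth factors
geo_mean = product ** (1 / len(growth_factors))  # nth root of the product
avg_annual_rate = (geo_mean - 1) * 100           # average annual growth in percent

print(round(product, 3), round(geo_mean, 3), round(avg_annual_rate, 1))
```

Note that the arithmetic mean of the ten returns would overstate the fund's growth, which is why the growth factors are multiplied rather than averaged.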

Calculating the Geometric Mean for the Mutual Fund Data Using Excel
Measures of Variability
• Range: Found by subtracting the smallest value from the largest value in a data set.
Illustration: Consider the data on home sales in Cincinnati, Ohio, Suburb:
Home Sale | Selling Price ($)
1 | 138000
2 | 254000
3 | 186000
4 | 257500
5 | 108000
6 | 254000
7 | 138000
8 | 298000
9 | 199500
10 | 208000
11 | 142000
12 | 456250
Computation of Range
Illustration (contd.):
• Largest home sales price - $456,250
• Smallest home sales price - $108,000
• Range = Largest value – Smallest value
= $456,250 – $108,000
= $348,250
• Drawback: Range is based on only two of the observations
and thus is highly influenced by extreme values.

Measures of Variability
• Variance: Measure of variability that utilizes all the data.
• It is based on the deviation about the mean, which is the difference between the value of each observation ($x_i$) and the mean.
• The deviations about the mean are squared while computing the variance.
• Sample variance, $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
• Population variance, $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
Computation of Deviations and Squared Deviations about the Mean for the Class Size Data
Number of Students in Class ($x_i$) | Mean class size ($\bar{x}$) | Deviation about the Mean ($x_i - \bar{x}$) | Squared Deviation about the Mean ($(x_i - \bar{x})^2$)
46 | 44 | 2 | 4
54 | 44 | 10 | 100
42 | 44 | -2 | 4
46 | 44 | 2 | 4
32 | 44 | -12 | 144
Total | | $\sum (x_i - \bar{x}) = 0$ | $\sum (x_i - \bar{x})^2 = 256$

• Computation of Sample Variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{256}{4} = 64$
• Computation of Sample Standard Deviation: $s = \sqrt{s^2} = \sqrt{64} = 8$
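
The deviation table can be rebuilt step by step in Python. This sketch spells out each column for the class size data instead of calling a library routine, so the n - 1 denominator of the sample variance stays visible.

```python
from math import sqrt

class_sizes = [46, 54, 42, 46, 32]
n = len(class_sizes)

x_bar = sum(class_sizes) / n                    # mean class size
deviations = [x - x_bar for x in class_sizes]   # deviations about the mean
squared = [d ** 2 for d in deviations]          # squared deviations about the mean

sample_variance = sum(squared) / (n - 1)        # divide by n - 1 for a sample
sample_std = sqrt(sample_variance)              # positive square root of the variance

print(x_bar, sum(deviations), sum(squared), sample_variance, sample_std)
```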
Measures of Variability
• Standard deviation: Positive square root of the variance
• Measured in the same units as the original data.
• For sample, $s = \sqrt{s^2}$
• For population, $\sigma = \sqrt{\sigma^2}$
• Coefficient of variation:
• $\left(\dfrac{\text{Standard deviation}}{\text{Mean}}\right) \times 100\%$
• Measures the standard deviation relative to the mean.
• Expressed as a percentage.
Computation of Coefficient of Variation
Illustration:
• Consider the class size data:
46 54 42 46 32
• Mean, $\bar{x}$ = 44
• Standard deviation, s = 8
• Coefficient of variation = $\dfrac{8}{44} \times 100\% = 18.2\%$
Measures of Variation: Comparing Coefficients of Variation
Stock A:
• Average price last year = $50
• Standard deviation = $5
• $CV_A = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$50} \cdot 100\% = 10\%$
• The scatter around the mean, relative to the size of the mean, is 10%
Stock B:
• Average price last year = $100
• Standard deviation = $5
• $CV_B = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
COPYRIGHT ©2013 PEARSON EDUCATION, INC. PUBLISHING AS PRENTICE HALL
Measures of Variation: Comparing Coefficients of Variation (continued)
Stock A:
• Average price last year = $50
• Standard deviation = $5
• $CV_A = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$50} \cdot 100\% = 10\%$
Stock C:
• Average price last year = $8
• Standard deviation = $2
• $CV_C = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$2}{\$8} \cdot 100\% = 25\%$
Stock C has a much smaller standard deviation but a much higher coefficient of variation
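
The coefficient of variation comparisons above reduce to one small function. The sketch below is illustrative (the function name is an assumption) and applies it to the class size data and to Stocks A, B, and C.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_class = coefficient_of_variation(8, 44)  # class size data
cv_a = coefficient_of_variation(5, 50)      # Stock A
cv_b = coefficient_of_variation(5, 100)     # Stock B
cv_c = coefficient_of_variation(2, 8)       # Stock C

for label, cv in [("Class sizes", cv_class), ("Stock A", cv_a),
                  ("Stock B", cv_b), ("Stock C", cv_c)]:
    print(f"{label}: {cv:.1f}%")
```

Because the result is unit-free, it allows a comparison of variability between series measured on very different price scales.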
Calculating Variability Measures for the Home Sales Data in Excel (Data: HomeSales.xlsx)
Analyzing Distributions
• Percentile: Value of a variable at which a specified (approximate) percentage of observations are below that value.
• The pth percentile tells us the point in the data where:
• Approximately p percent of the observations have values less than the pth percentile;
• Approximately (100 – p) percent of the observations have values greater than the pth percentile.
Img source: Percentiles (mathsisfun.com)
Analyzing Distributions
• Steps to calculate the pth percentile:
1. Arrange the data in ascending order (smallest to largest value).
2. Compute k = (n + 1) × p, where n is the number of observations and p is the percentile expressed as a proportion (for example, 0.85 for the 85th percentile).
3. Divide k into its integer component, i, and its decimal component, d. For example, if k = 2.75, then i = 2 and d = 0.75.
a. If d = 0, the pth percentile is the value in position k of the sorted data.
b. If d > 0, the percentile is between the values in positions i and i + 1 in the sorted data. To find this percentile, we must interpolate between these two values.
i. Calculate the difference between the values in positions i and i + 1 in the sorted data set. We define this difference between the two values as m.
ii. Multiply this difference by d: t = m × d.
iii. To find the pth percentile, add t to the value in position i of the sorted data.
Analyzing Distributions
Illustration: To determine the 85th percentile for the home sales data
1. Arrange the data in ascending order.
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
2. Compute k = (n + 1) × p = (12 + 1) × 0.85 = 11.05.
3. Dividing 11.05 into the integer and decimal components gives us i = 11 and d = 0.05.
• Because d > 0, interpolate between the values in the 11th and 12th positions in the sorted data.
• The value in the 11th position is 298,000, and the value in the 12th position is 456,250.
i. m = 456,250 – 298,000 = 158,250
ii. t = m × d = 158,250 × 0.05 = 7912.5
iii. pth percentile = 298,000 + 7912.5 = 305,912.5
• $305,912.50 represents the 85th percentile of the home sales data.
Excel Function: =PERCENTILE.EXC(array,k), where k here is the percentile as a proportion (for example, 0.85)
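
The percentile steps translate directly into code. The function below is a sketch of the k = (n + 1) × p procedure described in these slides; the name `percentile_exc` is hypothetical, chosen only because the slides pair the procedure with Excel's PERCENTILE.EXC, and the range check is an added assumption about how to treat positions outside the data.

```python
def percentile_exc(values, p):
    """pth percentile (p as a proportion, e.g. 0.85) using k = (n + 1) * p."""
    data = sorted(values)  # step 1: ascending order
    n = len(data)
    k = (n + 1) * p        # step 2: position of the percentile
    if not 1 <= k <= n:
        raise ValueError("k = (n + 1) * p falls outside positions 1..n")
    i = int(k)             # step 3: integer component
    d = k - i              #         decimal component
    if d == 0:
        return data[i - 1]             # value in position k (lists are 0-indexed)
    m = data[i] - data[i - 1]          # difference between positions i and i + 1
    return data[i - 1] + m * d         # interpolate: add t = m * d


home_prices = [138000, 254000, 186000, 257500, 108000, 254000,
               138000, 298000, 199500, 208000, 142000, 456250]

p85 = percentile_exc(home_prices, 0.85)
print(round(p85, 2))
```

The result is rounded for display because k = 13 × 0.85 is not exactly representable in binary floating point.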
Analyzing Distributions
• Quartiles:
• When the data is divided into four equal parts:
• Each part contains approximately 25% of the observations.
• Division points are referred to as quartiles.
• Q1 = first quartile, or 25th percentile: =PERCENTILE.EXC(array,0.25)
• Q2 = second quartile, or 50th percentile (also the median): =PERCENTILE.EXC(array,0.50)
• Q3 = third quartile, or 75th percentile: =PERCENTILE.EXC(array,0.75)
Application: Interquartile Range (IQR)
The IQR is the difference between the third quartile and the first quartile (Q3 – Q1). It measures the spread of the middle half of the data.
Analyzing Distributions
• z-score:
• Measures the relative location of a value in the data set.
• Helps to determine how far a particular value is from the mean relative to the data set’s standard deviation.
• Standardized value
• If $x_1, x_2, \ldots, x_n$ is a sample of n observations:
• $z_i = \dfrac{x_i - \bar{x}}{s}$
• $z_i$ = z-score for $x_i$
• $\bar{x}$ = sample mean
• $s$ = sample standard deviation
z-Scores for the Class Size Data
• For class size data, $\bar{x}$ = 44 and s = 8.
• For observations with a value > mean, z-score > 0.
• For observations with a value < mean, z-score < 0.

Calculating z-Scores for the Home Sales Data in Excel (Data: HomeSales.xlsx)
Analyzing Distributions
• Identifying outliers:
• Outliers: Extreme values in a data set.
• They can be identified using standardized values (z-scores).
• Any data value with a z-score less than –3 or greater than +3 is an outlier.
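
The z-score formula and the ±3 outlier rule can be sketched as follows. The helper name `z_scores` is an assumption; `statistics.stdev` is the sample standard deviation (n - 1 denominator), matching s in the formula.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize each value: (x - sample mean) / sample standard deviation."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]


class_sizes = [46, 54, 42, 46, 32]
class_z = z_scores(class_sizes)  # mean 44 and s = 8 for this data

home_prices = [138000, 254000, 186000, 257500, 108000, 254000,
               138000, 298000, 199500, 208000, 142000, 456250]
home_z = z_scores(home_prices)

# Outlier rule from the slides: z-score below -3 or above +3.
z_outliers = [x for x, z in zip(home_prices, home_z) if abs(z) > 3]

print(class_z)
print(round(max(home_z), 2), z_outliers)
```

Under this rule the largest home price does not qualify as an outlier, since its z-score stays below 3, so the z-score rule and other outlier rules need not agree on small samples.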

Analyzing Distributions
• Box plot: Graphical summary of the distribution of data.
• Developed from the quartiles for a data set.
• By using the interquartile range, IQR = Q3 – Q1, limits are located for the whiskers
• The limits for the box plot are 1.5(IQR) below Q1 and 1.5(IQR) above Q3.
Box Plot for the Home Sales Data (figure labels: Q1, Q3, Outlier)
Quartile 1 | $139,000.00
Quartile 3 | $256,625.00
IQR | $117,625.00
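
The quartiles, IQR, and box plot limits for the home sales data can be recomputed as a check. This sketch relies on `statistics.quantiles` with `method="exclusive"` (Python 3.8 and later), which interpolates at position (n + 1) × p, the same convention as the percentile steps in these slides; the variable names are illustrative.

```python
from statistics import quantiles

home_prices = [138000, 254000, 186000, 257500, 108000, 254000,
               138000, 298000, 199500, 208000, 142000, 456250]

# Three cut points that split the data into four parts: Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(home_prices, n=4, method="exclusive")
iqr = q3 - q1

# Limits for the whiskers: 1.5 * IQR below Q1 and above Q3.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values beyond the limits are drawn as outliers on the box plot.
box_outliers = [x for x in home_prices if x < lower_limit or x > upper_limit]

print(q1, q3, iqr, lower_limit, upper_limit, box_outliers)
```

The single value flagged here corresponds to the point labeled Outlier in the box plot figure.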
Box Plots Comparing Home Sale Prices in Different Communities (Data: HomeComparison3.xlsx)
Analyzing Distributions
• The Empirical Rule
• The normal distribution is important (the central limit theorem is one reason it arises so often). If your data follow a normal distribution, you can use the empirical rule, which states:
• Approximately 68% of the data will fall within 1 standard deviation of the mean
• Approximately 95% of the data will fall within 2 standard deviations of the mean
• Approximately 99.7% of the data will fall within 3 standard deviations of the mean
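
The three percentages in the empirical rule come from the normal distribution itself, which a short sketch can confirm. It uses `statistics.NormalDist` (Python 3.8 and later); the helper name `share_within` is an assumption.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal distribution


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

These shares hold for any normal distribution, not only the standard one, because they depend only on distance from the mean measured in standard deviations.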

Analyzing Distributions: Normal Distribution
• The Empirical Rule (figure)
Measures of Association Between Two Variables
• Scatter Charts: Useful graph for analyzing the relationship between two variables.
• Covariance: Descriptive measure of the linear association between two variables.
• Sample covariance for a sample of size n with the observations $(x_1, y_1), (x_2, y_2)$, and so on:
$s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
• Population covariance, $\sigma_{xy} = \dfrac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
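Measures of Association Between Two Variables
• Correlation coefficient: Measures the relationship between two variables.
• Not affected by the units of measurement for x and y.
• Sample correlation coefficient denoted by $r_{xy}$.
• $r_{xy} = \dfrac{s_{xy}}{s_x s_y}$
• $s_{xy}$ = sample covariance = $\dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
• $s_x$ = sample standard deviation of x = $\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
• $s_y$ = sample standard deviation of y = $\sqrt{\dfrac{\sum (y_i - \bar{y})^2}{n - 1}}$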
Interpretation of Correlation Coefficient
• –1 ≤ r ≤ +1
r value | Relationship between the x and y variables
< 0 | Negative linear
Near 0 | No linear relationship
> 0 | Positive linear
Data for Bottled Water Sales at Queensland Amusement Park for a Sample of 14 Summer Days (Data: BottledWater.xlsx)

Chart Showing the Positive Linear Relation Between Sales and High Temperatures
[Scatter chart: Bottled Water Sales (cases) on the vertical axis versus High Temperature (F) on the horizontal axis]
Sample Covariance Calculations for Daily High Temperature and Bottled Water Sales at Queensland Amusement Park

Scatter Diagrams and Associated Covariance Values for Different Variable Relationships
(a) $s_{xy}$ Positive: x and y are positively linearly related
(b) $s_{xy}$ Approximately 0: x and y are not linearly related
(c) $s_{xy}$ Negative: x and y are negatively linearly related
Computation of Correlation Coefficient
Illustration - To determine the sample correlation coefficient for bottled water sales at Queensland Amusement Park:
$r_{xy} = \dfrac{s_{xy}}{s_x s_y} = \dfrac{12.8}{(4.36)(3.15)} = 0.93$
• There is a very strong positive linear relationship between high temperature and sales.
Example of Nonlinear Relationship Producing a Correlation
Coefficient Near Zero

rxy = –0.007
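
The covariance and correlation formulas can be sketched in plain Python. The bottled water observations are not reproduced in these slides, so the two small data sets below are hypothetical and chosen only to show a perfectly linear case and a nonlinear case; the last line rechecks the slide's own summary values (12.8, 4.36, 3.15).

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sum of products of deviations about the means, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical data: an exact linear relationship, y = 2x + 1.
linear_x = [1, 2, 3, 4, 5]
linear_y = [3, 5, 7, 9, 11]

# Hypothetical data: an exact but nonlinear relationship, y = x squared.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [4, 1, 0, 1, 4]

r_linear = sample_correlation(linear_x, linear_y)
r_curve = sample_correlation(curve_x, curve_y)

# Recomputing the bottled water result from the summary values on the slide.
r_bottled_water = 12.8 / (4.36 * 3.15)

print(round(r_linear, 2), round(r_curve, 2), round(r_bottled_water, 2))
```

The second data set shows why a correlation near zero only rules out a linear relationship: y is fully determined by x, yet the coefficient is zero.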

Calculating Covariance and Correlation Coefficient for Bottled Water Sales Using Excel
All About The Data
▪ Ask Questions:
• What is this dataset about?
• What are the different rows and variables?
• What are we trying to determine?
• Is there one variable that’s the end result or being impacted by other variables?
▪ Understand the data
• Mean, Median, Mode, Min, Max, Variance, Standard Deviation
• Look at relationships between variables: covariance, correlation
• Visualize the data and its relationships (histogram, skew, scatterplot)
▪ Draw your hypothesis
Source: Khan Academy

End of Part 1