Notes (Chapter 1 - 3)

This document discusses key concepts in statistics including data, variables, scales of measurement, types of data, and descriptive statistics. It defines statistics as numerical facts used to understand business situations and explains its applications in accounting, economics, finance, marketing, production, and information systems. The scales of measurement are nominal, ordinal, interval, and ratio. Data can be categorical or quantitative. Descriptive statistics are used to summarize data in an easy to understand form using tables, graphs, or numbers.

CHAPTER 1 – DATA AND STATISTICS

What is statistics?
• can refer to numerical facts (e.g. averages, medians, percentages, and maximums) that help us understand a variety of business and economic situations
• can also refer to the art and science → collecting, analyzing, presenting, and interpreting data

Applications in Business and Economics
• ACCOUNTING
  • Public accounting firms use statistical sampling procedures when conducting audits for their clients.
• ECONOMICS
  • Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
• FINANCE
  • Financial advisors use price-earnings ratios and dividend yields to guide their investment advice.
• MARKETING
  • Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.
• PRODUCTION
  • A variety of statistical quality control charts are used to monitor the output of a production process.
• INFORMATION SYSTEMS
  • A variety of statistical information helps administrators assess the performance of computer networks.

Data and Data Sets
• Data → are the facts and figures collected, analyzed, and summarized for presentation and interpretation.
• Data Set → refers to all the data collected in a particular study

Elements, Variables, and Observations
• Elements → are the entities on which data are collected
• Variable → is a characteristic of interest for the elements
• Observation → the set of measurements obtained for a particular element
  o a data set with n elements contains n observations
  o the total no. of data values in a complete data set is the number of elements multiplied by the number of variables

Scales of Measurement (NOIR)
• the scale determines the amount of information contained in the data. It also indicates the data summarization and statistical analyses that are most appropriate.
• Scales of measurement include:
• NOMINAL
  • Data are labels or names used to identify an attribute of the element.
  • A nonnumeric label or numeric code may be used
  • Example: The WTO status category for the nations in the cross-sectional example below is classified using nonnumeric labels – “member” and “observer”. Alternatively, a numeric code could be used for the WTO status variable by letting 1 denote a member nation and 2 denote an observer nation.
• ORDINAL
  • The data have the properties of nominal data and the order or rank of the data is meaningful
  • A nonnumeric label or numeric code may be used
  • Example: The nonnumeric rating labels from AAA to F used for Fitch ratings. These can be rank ordered from best credit rating AAA to poorest credit rating F. A numeric code can also be used, e.g. the class rank of a student in school.
• INTERVAL
  • The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
  • Interval data are always numeric
  • Example: Melissa has an SAT score of 1985, while Kevin has an SAT score of 1880. Melissa scored 105 points more than Kevin.
• RATIO
  • The data have all the properties of interval data and the ratio of two values is meaningful.
  • Variables such as distance, height, weight, and time use the ratio scale.
  • This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.
  • Example: Melissa’s college record shows 36 credit hours earned, while Kevin’s record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa. → 1:2

Categorical and Quantitative Data
Data can be further classified as being categorical or quantitative.
• Categorical Data
  o Labels or names used to identify an attribute of each element
  o Often referred to as qualitative data
  o Use either the nominal or ordinal scale of measurement
  o Can be either numeric or nonnumeric
  o Appropriate statistical analysis is rather limited
• Quantitative Data
  o Indicates how many or how much
    ▪ Discrete → if measuring how many
    ▪ Continuous → if measuring how much
  o are always numeric
  o ordinary arithmetic operations are meaningful for quantitative data.

Cross-Sectional Data
• are collected at the same or approximately the same point in time.
• Example: Data detailing different variables like WTO status, per capita GDP, and Fitch rating for 60 different WTO nations at the same point in time.

Time Series Data
• are collected over several time periods
• Example: U.S. average price per gallon of conventional regular gasoline between 2010 and 2015.
• Graphs of time series help analysts understand
• what happened in the past,
• identify any trends over time, and
• project future values for the time series

Data Sources
• EXISTING SOURCES

• DATA AVAILABLE FROM INTERNAL COMPANY RECORDS

• DATA AVAILABLE FROM SELECTED GOVERNMENT AGENCIES

• STATISTICAL STUDIES – OBSERVATIONAL
  • In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest
  • Example: survey (studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke)
• STATISTICAL STUDIES – EXPERIMENTAL
  • In experimental studies the variable of interest is first identified. Then one or more variables are identified and controlled so that data can be obtained about how they influence the variable of interest.
  • Example: The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1-3) were selected.

Data Acquisition Considerations
• TIME REQUIREMENT
  • searching for information can be time consuming
  • information may no longer be useful by the time it is available
• COST OF ACQUISITION
  • Organizations often charge for information even when it is not their primary business activity.
• DATA ERRORS
  • Using any data that happen to be available or were acquired with little care can lead to misleading information.

Descriptive Statistics
• Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.
• Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.
• Example: Hudson Auto Repair. The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Numerical Descriptive Statistics
• most common NDS → mean (or average) → a measure of the central tendency, or central location, of the data for a variable.
• Example: Hudson’s mean cost of parts, based on the 50 tune-ups studied, is $79 (found by summing up the 50 cost values and then dividing by 50).

Statistical Inference
• Population → The set of all elements of interest in a particular study
• Sample → A subset of the population
• Statistical Inference → The process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population.
• Census → Collecting data for the entire population
• Sample Survey → collecting data for a sample

Statistical Analysis Using Microsoft Excel

Analytics
• Scientific process of transforming data into insight for making better decisions.
• Types:
  • DESCRIPTIVE A. → Analytical techniques that describe what happened in the past
  • PREDICTIVE A. → Analytical techniques that use models constructed from past data to predict the future. They help assess the impact of one variable on another
  • PRESCRIPTIVE A. → Analytical techniques that yield a best course of action to take

Data Warehousing
• is capturing, storing, and maintaining the data, and it is a significant undertaking.
• Organizations obtain large amounts of data on a daily basis by means of magnetic card readers, bar code scanners, point of sale terminals, and touch screen monitors.
• Example(s): Wal-Mart captures data on 20-30 million transactions per day; Visa processes 6,800 payment transactions per second.

Data Mining
• is used to identify related products that customers who have already purchased a specific product are also likely to purchase (and then pop-ups are used to draw attention to those related products).
• the most effective data mining systems use automated procedures to discover relationships in the data and predict future outcomes, … prompted by only general, even vague, queries by the user.
• the major applications of data mining have been made by companies with a strong consumer focus such as retail, financial, and communication firms.
• as another example, data mining is used to identify customers who should receive special discount offers based on their past purchasing volumes.
• Requirements:
  • Statistical methods (e.g. multiple regression, logistic regression, and correlation) are heavily used.
  • Computer science technologies involving artificial intelligence and machine learning are also needed.
  • Significant investment in time and money
• Model Reliability:
  • A statistical model that works for a particular sample may not be applicable to other data
  • The data set can be partitioned into: training set (model development) & test set (validating the model)
  • overfitting the model is a danger → misleading associations & conclusions can appear to exist
  • careful interpretation of results and extensive testing are important

Ethical Guidelines of Statistical Practice
• unethical behavior can take a variety of forms including:
  • Improper sampling
  • Inappropriate analysis of the data
  • Development of misleading graphs
  • Use of inappropriate summary statistics
  • Biased interpretation of the statistical results
• Be fair, thorough, objective, and neutral as you collect, analyze, and present data.
• “Ethical Guidelines for Statistical Practice” → developed by the American Statistical Association
• It contains 67 guidelines organized into 8 topic areas:
  ▪ Professionalism
  ▪ Responsibilities to Funders, Clients, Employers
  ▪ Responsibilities in Publications and Testimony
  ▪ Responsibilities to Research Subjects
  ▪ Responsibilities to Research Team Colleagues
  ▪ Responsibilities to Other Statisticians/Practitioners
  ▪ Responsibilities Regarding Allegations of Misconduct
  ▪ Responsibilities of Employers Including Organizations, Individuals, Attorneys, or Other Clients
CHAPTER 2A: DESCRIPTIVE STATISTICS
(TABULAR AND GRAPHICAL DISPLAYS)
Categorical Data
• FREQUENCY DISTRIBUTION
- is a tabular summary of data showing the number
(frequency) of observations in each of several non-
overlapping categories or classes.
- Objective: to provide insights about the data that cannot
be quickly obtained by looking only at the original data.

• BAR CHART
- is a graphical display for depicting qualitative data
- are used to identify the most important causes of
problems.
- horizontal axis → we specify the labels that are used for
each class
- vertical axis → frequency, relative frequency, or percent
frequency scale
- bar of fixed width → drawn above each class label, with its height extended to the appropriate value
- bars are separated → to emphasize the fact that each class is separate.
- Pareto Diagram → a bar chart in which the bars are arranged in descending order of height from left to right (with the most frequently occurring cause appearing first) → named after Vilfredo Pareto, an Italian economist.

• RELATIVE FREQUENCY DISTRIBUTION
- Relative frequency → is the fraction or proportion of the total number of data items belonging to a class.
- Relative frequency distribution → is a tabular summary of a set of data showing the relative frequency for each class.
• PERCENT FREQUENCY DISTRIBUTION
- Percent frequency → is the relative frequency multiplied
by 100.
- Percent frequency distribution → is a tabular summary of
a set of data showing the percent frequency for each
class.
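The three categorical distributions defined above can be sketched in a few lines of Python. This is a minimal illustration only: the list of soft drink purchases is hypothetical data invented for the example, and the variable names are not from the notes.

```python
from collections import Counter

# Hypothetical sample of 20 soft drink purchases (illustrative data only)
purchases = (["Pepsi"] * 9 + ["Dr. Pepper"] * 5 + ["Coca-Cola"] * 3
             + ["Diet Coke"] * 2 + ["Sprite"] * 1)

frequency = Counter(purchases)            # frequency distribution
n = sum(frequency.values())               # total number of data items
relative = {k: f / n for k, f in frequency.items()}    # relative frequency
percent = {k: 100 * r for k, r in relative.items()}    # percent frequency

# Print the three distributions side by side, most frequent class first
for category, f in frequency.most_common():
    print(f"{category:<11} {f:>3} {relative[category]:>6.2f} {percent[category]:>6.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.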
Quantitative Data
• FREQUENCY DISTRIBUTION
- Example: Sanderson and Clifford, a small public accounting firm, wants to determine the time in days required to complete year-end audits. It takes a sample of 20 clients.
Year-end Audit Time (in days)
12 14 19 18 15
15 18 17 20 27
22 23 22 21 33
28 14 18 16 13
- Three (3) steps necessary to define the classes for a
frequency distribution with quantitative data are:
o Step 1: determine the number of non-
overlapping classes.
o Step 2: Determine the width of each class.
o Step 3: Determine the class limits.
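The three steps can be tried on the 20 audit times listed above. The sketch below makes some assumptions that the notes do not state: five classes, a width taken as (largest value - smallest value) / number of classes rounded up, and a first class starting at 10 for convenience.

```python
import math

# Year-end audit times (in days) from the Sanderson and Clifford example
audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

# Step 1: number of non-overlapping classes (5 is an assumption)
num_classes = 5

# Step 2: approximate class width, rounded up: (33 - 12) / 5 = 4.2 -> 5
width = math.ceil((max(audit_days) - min(audit_days)) / num_classes)

# Step 3: class limits; starting at 10 is a convenience choice
first_lower = 10
classes = [(first_lower + i * width, first_lower + (i + 1) * width - 1)
           for i in range(num_classes)]

# Count the observations that fall inside each class
frequency = [sum(lo <= x <= hi for x in audit_days) for lo, hi in classes]

n = len(audit_days)
for (lo, hi), f in zip(classes, frequency):
    print(f"{lo}-{hi}: frequency={f}, relative={f / n:.2f}, percent={100 * f / n:.0f}%")
```

Under these assumptions the 15-19 class holds 8 of the 20 audits (40%) and the 30-34 class holds 1 (5%).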

• PIE CHART
- is a commonly used graphical display for presenting
relative frequency and percent frequency distributions
for categorical data.

- Inferences from the Pie Chart:
• Almost one-half of the customers surveyed
preferred Pepsi (looking at the left side of the pie).
• The second preference is for Dr. Pepper with 25% of
the customers opting for it.
• Only 5% of the customers opted for Sprite.
• RELATIVE FREQUENCY AND PERCENT FREQUENCY DISTRIBUTIONS
- Example: Sanderson and Clifford
- Insights obtained from Percent Frequency Distribution:
  • 40% of the audits required from 15 to 19 days
  • Another 25% of the audits required 20 to 24 days
  • Only 5% of the audits required more than 30 days
• DOT PLOT
- one of the simplest graphical summaries of data
- horizontal axis → range of data values
- then each data value is represented by a dot placed above the axis
- Example: Sanderson and Clifford
• HISTOGRAM
- Common graphical display of quantitative data
- Horizontal axis → variable of interest
- A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
- has no natural separation between rectangles of adjacent classes.
- Example: Sanderson and Clifford
- Histogram showing skewness:
  • Symmetrical → Left tail is the mirror image of the right tail (e.g. heights of people)
  • Moderately Skewed Left → A longer tail to the left (e.g. exam scores)
  • Moderately Skewed Right → A longer tail to the right (e.g. housing values)
  • Highly Skewed Right → A very long tail to the right (e.g. executive salaries)
• CUMULATIVE DISTRIBUTIONS
- Cumulative Frequency D. → shows the number of items with values less than or equal to the upper limit of each class. (Last entry = total no. of observations)
- Cumulative Relative FD. → shows the proportion of items with values less than or equal to the upper limit of each class. (Last entry = 1.00)
- Cumulative Percent FD → shows the percentage of items with values less than or equal to the upper limit of each class. (Last entry = 100)
- Example: Sanderson and Clifford
• STEM-AND-LEAF DISPLAY
- shows both the rank order and shape of the distribution of the data.
- It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
- the first digits of each data item are arranged to the left of a vertical line.
- to the right of the vertical line we record the last digit for each item in rank order.
- each line (row) in the display is referred to as a stem.
- Each digit on a stem is a leaf.
- Example: The number of questions answered correctly on an aptitude test by 50 students is analysed here with the help of a stem-and-leaf display. The relevant data are given in the following table.
No. of questions answered correctly by 50 students
112 73 126 82 92 115 95 84 68 100
72 92 128 104 108 76 141 119 98 85
69 76 118 132 96 91 81 113 115 94
97 86 127 134 100 102 80 98 106 106
107 73 124 83 92 81 106 75 95 119
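The stem-and-leaf construction described in this chapter can be sketched with the 50 aptitude test scores from the table. The function name `stem_and_leaf` and the choice of a leaf unit of 1 (last digit as leaf, leading digits as stem) are assumptions made for this illustration.

```python
from collections import defaultdict

# Number of questions answered correctly by 50 students (from the table)
scores = [112, 73, 126, 82, 92, 115, 95, 84, 68, 100,
          72, 92, 128, 104, 108, 76, 141, 119, 98, 85,
          69, 76, 118, 132, 96, 91, 81, 113, 115, 94,
          97, 86, 127, 134, 100, 102, 80, 98, 106, 106,
          107, 73, 124, 83, 92, 81, 106, 75, 95, 119]

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in rank order
        stems[v // 10].append(v % 10)
    return dict(stems)

display = stem_and_leaf(scores)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```

Each printed row is one stem, and reading the rows from top to bottom shows the shape of the distribution, much like a histogram turned on its side.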
CHAPTER 2B: DESCRIPTIVE STATISTICS
(TABULAR AND GRAPHICAL DISPLAYS)
Crosstabulation
- is a method of summarizing the data for two variables
- tabular summary of data for two variables
- can be used when:
  • one variable is categorical and the other is quantitative,
  • both variables are categorical, or
  • both variables are quantitative.
- The left and top margin labels define the classes for the two variables.
- Example: Zagat’s Restaurant Review. A crosstabulation of quality rating and meal price data for 300 Los Angeles restaurants is given here.
- Insights gained from preceding crosstabulation
  • Greatest number of restaurants in the sample (64) have a very good rating and a meal price in the $20-29 range.
  • Only 2 restaurants have an excellent rating and a meal price in the $10-19 range.
- Row or Column Percentages
  • Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.
- Simpson’s Paradox → the reversal of conclusions based on aggregate and unaggregated data.
- Scatter diagrams and trendlines → are useful in exploring the relationship between two variables.
  • Scatter Diagram → is a graphical presentation of the relationship between two quantitative variables.
    ▪ One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
    ▪ The general pattern of the plotted points suggests the overall relationship between the variables.
    ▪ Trendline → provides an approximation of the relationship over the entire range of the data

Side-by-side bar chart
- is a graphical display for depicting multiple bar charts on the same display.
- Each cluster of bars represents one value of the first variable
- Each bar within a cluster represents one value of the second variable.

Stacked Bar Chart
- It is a bar chart in which each bar is broken into rectangular segments of a different color.
- If percentage frequencies are displayed, all bars will be of the same height (or length), extending to the 100% mark.

Data Visualization: Best Practices in Creating Effective Graphical Displays
- Data Visualization → describes the use of graphical displays to summarize and present information about a data set.
- The goal is to communicate as effectively and clearly as possible the key information about the data.

Choosing the Type of Graphical Display
Displays used to show the distribution of data:
• Bar Chart → to show the frequency distribution or relative frequency distribution for categorical data
• Pie Chart → to show the relative frequency or percent frequency for categorical data
• Dot Plot → to show the distribution for quantitative data
• Histogram → to show the frequency distribution for quantitative data over a set of class intervals
• Stem-and-Leaf Display → to show both the rank order and shape of the distribution for quantitative data
Display used to make comparisons:
• Side-by-Side Bar Chart → to compare two variables
• Stacked Bar Chart → to compare the relative frequency or percent frequency of two categorical variables
Display used to show relationships:
• Scatter Diagram → to show the relationship between
two quantitative variables
• Trendline → to approximate the relationship of data in a
scatter diagram

Data Dashboard
• Data dashboard → widely used data visualization tool
• It organizes and presents key performance indicators
(KPIs) used to monitor an organization or process.
• It provides timely, summary information that is easy to
read, understand, and interpret.
• Some additional guidelines include . . .
o Minimize the need for screen scrolling
o Avoid unnecessary use of color or 3D
o Use borders between charts to improve readability
CHAPTER 3A: DESCRIPTIVE STATISTICS
(NUMERICAL MEASURES)

Numerical Measures
• Sample statistics → if the measure is computed for data
from a sample.
• Population parameters → If the measures are computed
for data from a population.
• Point estimator → a sample statistic used to estimate the corresponding population parameter.

Measures of Location
• MEAN [Excel Function → AVERAGE(data cell range)]
- most important measure of location
- provides a measure of central location
- the mean of a data set is the average of all the data values
- The sample mean x̄ is the point estimator of the population mean µ.
- Sample Mean: x̄ = Σxi / n
- Population Mean: µ = Σxi / N
- Example: Monthly Starting Salary. A placement office wants to know the average starting salary of business graduates. Monthly starting salaries for a sample of 12 business school graduates are provided here.
  Note: Data is in ascending order.
• MEDIAN [Excel Function → MEDIAN(data cell range)]
- is the value in the middle when the data items are arranged in ascending order (least to greatest).
- is the measure of location most often reported for annual income and property value data.
- Whenever a data set has extreme values, the median is the preferred measure of central location. A few extremely large incomes or property values can inflate the mean.
- Trimmed Mean
  o another measure sometimes used when extreme values are present
  o it is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
  o Example: the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
• MODE [Excel Function → MODE.SNGL(data cell range)]
- is the value that occurs with greatest frequency.
- greatest frequency can occur at two or more different values
- Bimodal → If the data have exactly two modes
- Multimodal → If the data have more than two modes
- Example: Monthly Starting Salary. The only monthly starting salary that occurs more than once is $3,880. Mode = 3,880
Using Excel to Compute the Mean, Median, and Mode
• WEIGHTED MEAN
- In some instances the mean is computed by giving each observation a weight that reflects its relative importance. The choice of weights depends on the application (e.g. no. of credit hrs. earned for each grade, GPA)
- Example: Purchase of Raw Material. Consider the following sample of five purchases of a raw material over a period of three months:
• PERCENTILES [Excel F. → PERCENTILE.EXC(data range, p/100)]
- provides information about how the data are spread over the interval from the smallest value to the largest value. (e.g. Admission test scores for colleges and universities are frequently reported in terms of percentiles.)
- pth percentile → is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
- Example: Monthly Starting Salary (80th percentile)
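As a rough illustration of the measures of location above, the sketch below uses Python's standard `statistics` module. Because the salary table is not reproduced in these notes, it reuses the 20 audit times from Chapter 2A instead. Treating `method="exclusive"` as the counterpart of Excel's PERCENTILE.EXC is an assumption of this sketch, based on both placing the pth percentile at position p(n + 1)/100 in the sorted data.

```python
import statistics

# Year-end audit times (in days) from the Sanderson and Clifford example
audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

mean = statistics.mean(audit_days)         # sum of the values divided by n
median = statistics.median(audit_days)     # middle value of the sorted data
modes = statistics.multimode(audit_days)   # list of all most frequent values

# 5% trimmed mean: drop the smallest 5% and largest 5% of the values
k = round(0.05 * len(audit_days))          # number of values cut from each end
trimmed = sorted(audit_days)[k:len(audit_days) - k]
trimmed_mean = statistics.mean(trimmed)

# 80th percentile: 99 cut points are returned, so index 79 is the 80th
p80 = statistics.quantiles(audit_days, n=100, method="exclusive")[79]

print(mean, median, modes, round(trimmed_mean, 2), p80)
```

Here the mean (19.25) sits above the median (18) because the few long audits pull the mean upward, which is the behaviour the median bullet describes for extreme values.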

• GEOMETRIC MEAN [Excel F. → GEOMEAN(data cell range)]
- is calculated by finding the nth root of the product of n
values.
- It is often used in analyzing growth rates in financial
data (where using the arithmetic mean will provide
misleading results).
- It should be applied anytime you want to determine the
mean rate of change over several successive periods
(be it years, quarters, weeks, . . .).
- Other common applications include: changes in
populations of species, crop yields, pollution levels, and
birth and death rates.

- Example: Mutual Fund
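The mutual fund figures are not reproduced in these notes, so the sketch below uses three hypothetical annual returns to show why the geometric mean, not the arithmetic mean, gives the mean rate of change over successive periods. All names and numbers are illustrative assumptions.

```python
import statistics

# Hypothetical annual returns in percent (illustrative data only)
annual_returns = [10, -5, 20]

# Convert each return to a growth factor, e.g. +10% -> 1.10
growth_factors = [1 + r / 100 for r in annual_returns]

# Geometric mean: nth root of the product of the n growth factors
geo_mean = statistics.geometric_mean(growth_factors)
arith_mean = statistics.mean(growth_factors)

# Compounding the geometric mean for 3 periods reproduces the total growth
total_growth = 1.0
for g in growth_factors:
    total_growth *= g

print(f"geometric mean rate:  {100 * (geo_mean - 1):.2f}% per period")
print(f"arithmetic mean rate: {100 * (arith_mean - 1):.2f}% per period")
print(f"total growth factor:  {total_growth:.4f}")
```

Raising the geometric mean to the number of periods gives back the actual total growth factor, while the arithmetic mean overstates it.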

• QUARTILES [Excel F. → QUARTILE.EXC(array, quart)]
- 1st quartile → 25th percentile
- 2nd quartile → 50th percentile = median
- 3rd quartile → 75th percentile
- Example: Monthly Starting Salary → 3rd Q. (75th P.)
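The quartiles can be computed the same way as any other percentile. The sketch below reuses the 20 audit times from Chapter 2A, since the salary table is not reproduced here, and assumes that the `"exclusive"` method of Python's `statistics.quantiles` plays the role of Excel's QUARTILE.EXC.

```python
import statistics

# Year-end audit times (in days) from the Sanderson and Clifford example
audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

# n=4 splits the data into four parts and returns the three cut points
q1, q2, q3 = statistics.quantiles(audit_days, n=4, method="exclusive")

print(f"Q1 (25th percentile) = {q1}")
print(f"Q2 (50th percentile) = {q2}")
print(f"Q3 (75th percentile) = {q3}")
```

The second quartile equals the median of the same data, as the list above states.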
Measures of Variability
- It is often desirable to consider measures of variability (dispersion), as well as measures of location.
- E.g. in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.
• RANGE
- is the difference between the largest and smallest data values.
- It is the simplest measure of variability
- It is very sensitive to the smallest and largest data values
- Example: Monthly Starting Salary
• INTERQUARTILE RANGE
- is the difference between the third quartile and the first quartile.
- It is the range for the middle 50% of the data
- It overcomes the sensitivity to extreme data values
- Example: Monthly Starting Salary
  Q3 = (3,950 + 4,050) / 2 = 4,000
  Q1 = (3,850 + 3,880) / 2 = 3,865
• VARIANCE [Excel F. → VAR.S(data cell range)]
- is a measure of variability that utilizes all the data
- It is based on the difference between the value of each observation (xi) and the mean (x̄ for a sample, µ for a population).
- is useful in comparing the variability of two or more variables.
- is the average of the squared differences between each data value and the mean (for a sample, the sum of squared differences is divided by n - 1).
• STANDARD DEVIATION [Excel F. → STDEV.S(data cell range)]
- is the positive square root of the variance.
- It is measured in the same units as the data, making it more easily interpreted than the variance.
• COEFFICIENT OF VARIATION
- indicates how large the standard deviation is in relation to the mean.
- Example: Monthly Starting Salary (Variance, Standard Dev., and Coefficient of Var.)
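The measures of variability in this chapter can be sketched with Python's `statistics` module. The salary data are not reproduced in these notes, so the example reuses the 20 audit times from Chapter 2A. `statistics.variance` and `statistics.stdev` are the sample versions (dividing by n - 1), which is assumed here to correspond to VAR.S and STDEV.S.

```python
import statistics

# Year-end audit times (in days) from the Sanderson and Clifford example
audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

# Range: largest value minus smallest value
data_range = max(audit_days) - min(audit_days)

# Interquartile range: third quartile minus first quartile
q1, _, q3 = statistics.quantiles(audit_days, n=4, method="exclusive")
iqr = q3 - q1

# Sample variance and sample standard deviation (divide by n - 1)
variance = statistics.variance(audit_days)
std_dev = statistics.stdev(audit_days)

# Coefficient of variation: standard deviation as a percentage of the mean
mean = statistics.mean(audit_days)
coeff_var = 100 * std_dev / mean

print(f"range={data_range}, IQR={iqr}, variance={variance:.2f}, "
      f"std dev={std_dev:.2f}, CV={coeff_var:.1f}%")
```

The range (21 days) is driven entirely by the two extreme audits, while the IQR (7 days) describes only the middle half of the data.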
CHAPTER 3B: DESCRIPTIVE STATISTICS
(NUMERICAL MEASURES)
Measures of Distribution Shape, Relative Location, and Detecting Outliers
• DISTRIBUTION SHAPE
- Skewness → An important measure of the shape of a distribution. It can be easily computed using statistical software.
  o Formula: (sample data)
  o Symmetric (not skewed)
    - skewness is zero
    - mean and median are equal
  o Moderately Skewed Left
    - skewness is negative
    - mean will usually be less than the median
  o Moderately Skewed Right
    - skewness is positive
    - mean will usually be more than the median
  o Highly Skewed Right
    - skewness is positive (often above 1.0)
    - mean will usually be more than the median
• Z-SCORES
- is often called the standardized value
- It denotes the number of standard deviations a data value xi is from the mean: zi = (xi - x̄) / s
- Excel’s STANDARDIZE function can be used to compute the z-score.
- observation’s z-score → is a measure of the relative location of the observation in a data set.
- A data value less than the sample mean will have a z-score less than zero.
- A data value greater than the sample mean will have a z-score greater than zero.
- A data value equal to the sample mean will have a z-score of zero.
- Example: Class size data
• CHEBYSHEV’S THEOREM
- At least (1 - 1/z^2) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
- Chebyshev’s theorem requires z > 1, but z need not be an integer.
- At least 75% of the data values must be within z = 2 standard deviations of the mean.
- At least 89% of the data values must be within z = 3 standard deviations of the mean.
- At least 94% of the data values must be within z = 4 standard deviations of the mean.
- Example: Marks of Students. Suppose the marks of 100 students in a course had a mean of 70 and a standard deviation of 5. We want to know the number of students having test scores between 60 and 80. 60 and 80 are 2 standard deviations below and above the mean respectively, so by Chebyshev’s theorem at least 75% of the students (at least 75 students) scored between 60 and 80.
• EMPIRICAL RULE
- Applies when the data are believed to approximate a bell-shaped distribution.
- can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
- rule is based on the normal distribution (chap. 6)
- For data having a bell-shaped distribution:
  o Approximately 68% of the data values will be within +/- 1 standard deviation of its mean.
  o Approximately 95% of the data values will be within +/- 2 standard deviations of its mean.
  o Almost all of the data values will be within +/- 3 standard deviations of its mean.
• DETECTING OUTLIERS
- Outlier → is an unusually small or unusually large value in a data set.
- A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
- It might be:
  o an incorrectly recorded data value
  o a data value that was incorrectly included in the data set
  o a correctly recorded unusual data value that belongs in the data set
- Example: Class size data
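The z-score, Chebyshev, and outlier rules above can be checked numerically. In the sketch below, the marks example (mean 70, standard deviation 5, 100 students) comes from these notes. The class size data are not reproduced here, so the 20 audit times from Chapter 2A are reused for the z-score check. The function names are assumptions made for illustration.

```python
import statistics

def z_scores(values):
    """Number of sample standard deviations each value lies from the sample mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

# Marks example: scores between 60 and 80 with mean 70 and std dev 5
z = (80 - 70) / 5                                  # 2 standard deviations
min_students = chebyshev_lower_bound(z) * 100      # out of 100 students
print(f"At least {min_students:.0f} students scored between 60 and 80")

# Outlier check on the audit times: flag values with |z| > 3
audit_days = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
              22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
scores = z_scores(audit_days)
outliers = [x for x, zi in zip(audit_days, scores) if abs(zi) > 3]
print(f"largest z-score: {max(scores):.2f}, outliers: {outliers}")
```

Chebyshev's bound holds for any data set, which is why it is weaker than the empirical rule's 95% for bell-shaped data at the same z = 2.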

Five-number Summaries and Box Plots
• Smallest value
• First (1st) quartile
• Median
• Third (3rd) quartile
• Largest Value

Box Plot
- is a graphical summary of data that is based on a five-number summary.
- A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
- Box plots provide another way to identify outliers
- Example: Monthly Starting Salary
  o A box is drawn with its ends located at the first and third quartiles.
  o A vertical line is drawn in the box at the location of the median (second quartile).
  o Limits are located (not drawn) using the interquartile range (IQR).
  o Data outside these limits are considered outliers
  o The location of each outlier is shown with the symbol → ●

Measures of Association Between Two Variables
- Two descriptive measures of the relationship between two variables are covariance and correlation coefficient.
  o COVARIANCE
    - is a measure of the linear association between two variables.
    - Positive values → positive relationship & vice versa
  o CORRELATION COEFFICIENT
    - Correlation → is a measure of linear association and not necessarily causation.
    - Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
    - The coefficient can take on values between -1 and +1
      o Strong negative linear relationship → values near -1
      o Strong positive linear relationship → values near +1
    - The closer the correlation is to zero, the weaker the relationship.
    - Example: Stereo and Sound Equipment Store. The store’s manager wants to determine the relationship between the number of weekend television commercials shown and the sales at the store during the following week
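The sample covariance and correlation coefficient from this chapter can be sketched directly from their definitions. The store's actual figures are not reproduced in these notes, so the five (commercials, sales) pairs below are hypothetical, and the function names are assumptions made for this illustration.

```python
import math

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))   # covariance of x with itself = variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical data (illustrative only)
commercials = [1, 2, 3, 4, 5]   # weekend commercials shown
sales = [2, 4, 5, 4, 5]         # following week's sales, arbitrary units

s_xy = sample_covariance(commercials, sales)
r = correlation(commercials, sales)
print(f"covariance = {s_xy:.2f}, correlation = {r:.3f}")
```

The covariance depends on the units of both variables, while the correlation coefficient is unit-free and always lies between -1 and +1, which is why it is easier to interpret.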
Data Dashboards: Adding Numerical Measures to Improve
Effectiveness
- Data dashboards are not limited to graphical displays.
- The addition of numerical measures, such as the mean
and standard deviation of KPIs, to a data dashboard is
often critical.
- Dashboards are often interactive
- Drilling Down → refers to functionality in interactive dashboards that allows the user to access information and analyses at increasingly detailed levels.
