BAFBANA - Chapter 1-4
SOAR: Analyze the Data
● The visualization displays the expected bell-
shaped distribution of earnings for publicly
traded companies from 1976-1994.
● However, the visualization shows an anomaly:
a discontinuity right around zero.
Just below zero, there seem to be missing
observations, or a lower-than-expected
frequency. Just above zero, there seems to be
a higher-than-expected frequency. Both
patterns suggest an anomaly.
● Does this finding warrant additional
investigation?
● Different Purposes Warrant Different
Visualizations to Report the Results to the
Decision Maker
○ What are We Trying to Communicate?
● Chapter 6 will discuss further.
Chapter 2 | Obtain the Data: An Introduction
to Business Data Sources
Obtain the Data
LO2.1 | Internal and External Data Sources
● Small Business Administration Data
● Publicly Available Data
○ Financial Statements of All Publicly
Traded Companies
○ Stock Price Data
○ Summarized Financial Data
File Formats
● Data usually delivered as comma-separated
files with the file extension .csv
● .csv files store data as text, but they convert
rapidly to a tabular format when they are
imported into spreadsheet applications or used
with programming languages.
● .csv files do not have the same row and column
limits that Excel has, which allows .csv files to
hold Big Data that exceed Excel’s size limit.
Ordinal Data – Categorical data that allows/implies
ranking and sorting
● Gold, Silver and Bronze
● Survey Answers: Agree, Indifferent, Disagree
● Transaction Dates
Summarize by
● Counting and grouping
● Proportion
● Ranking (Because Ordinal Data is Ranked)
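As a rough illustration of the notes above, the sketch below parses comma-separated text into tabular rows and then summarizes an ordinal column by counting, proportion, and ranking. The survey rows, the column names, and the category order are hypothetical and chosen only for this example; in practice the text would be read from a .csv file.

```python
import csv
import io
from collections import Counter

# Hypothetical survey export; in practice this text would come from a .csv file.
csv_text = """respondent,answer
1,Agree
2,Disagree
3,Agree
4,Indifferent
5,Agree
"""

# csv.DictReader converts each text row into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Assumed ranking of the ordinal categories, from lowest to highest.
order = ["Disagree", "Indifferent", "Agree"]

# Counting and grouping: how many responses fall in each category.
counts = Counter(row["answer"] for row in rows)

# Proportion: each category's share of all responses.
proportions = {category: counts[category] / len(rows) for category in order}

# Ranking: sort the rows by the position of their category in the assumed order.
ranked = sorted(rows, key=lambda row: order.index(row["answer"]))

for category in order:
    print(category, counts[category], f"{proportions[category]:.0%}")
```

Because the categories are ordinal, the sort key uses the position in `order` rather than alphabetical order.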
Level of Aggregation
● How do you want it?
○ Aggregated Data
■ Data Already Processed and
Transformed
■ Already Combined into
Subtotals, Counts, Sums or
Averages
○ Raw Data
■ Gives the Analyst the Flexibility to
Process Data as They See Fit
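To make the difference between raw and aggregated data concrete, here is a minimal sketch that rolls raw records up into counts, sums, and averages. The sales records and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: one record per sale.
raw_sales = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 200.0},
    {"region": "West", "amount": 100.0},
    {"region": "East", "amount": 40.0},
]

# Group the raw amounts by region.
amounts_by_region = defaultdict(list)
for sale in raw_sales:
    amounts_by_region[sale["region"]].append(sale["amount"])

# Aggregated data: counts, sums, and averages per region.
aggregated = {
    region: {"count": len(amounts), "sum": sum(amounts), "average": mean(amounts)}
    for region, amounts in amounts_by_region.items()
}

for region, summary in aggregated.items():
    print(region, summary)
```

The raw list can still be regrouped in other ways, while the aggregated dictionary can only answer the questions it was built for.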
Protecting Data
Questions a Company Must Address to Handle Data
Ethically
● If credit card information is taken, what
assurance do customers have that their credit
card number will be protected?
● Does the company keep the data secure and
private, and does it have safeguards in place to
protect the data?
● Has the company established effective
practices to mitigate the risks of data misuse?
Are penalties enforced for data misuse?
Chapter 3 | Analyze the Data: Basic
Statistics and Tools Required in Business
Analytics
LO1.3 | Defining Populations and Samples
○ Focus on the most critical, interesting,
or abnormal items
○ Speeds up analysis and may reduce
analysis cost
○ Common method: Filtering
Filtering in Excel (Exhibits 3.1 & 3.2)
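The notes refer to filtering in Excel. The same idea can be sketched in code as keeping only the rows that meet a condition. The invoice records and the 5,000 cutoff below are hypothetical and are not taken from the exhibits.

```python
# Hypothetical invoice records; the fields are assumptions for illustration.
invoices = [
    {"id": 101, "amount": 250.0, "approved": True},
    {"id": 102, "amount": 12500.0, "approved": False},
    {"id": 103, "amount": 9800.0, "approved": True},
    {"id": 104, "amount": 40.0, "approved": False},
]

THRESHOLD = 5000.0  # assumed cutoff for "large" invoices

# Filter 1: keep only the large invoices (the most critical items).
large = [inv for inv in invoices if inv["amount"] > THRESHOLD]

# Filter 2: within those, keep the abnormal ones (large but not approved).
unapproved_large = [inv for inv in large if not inv["approved"]]

print([inv["id"] for inv in large])
print([inv["id"] for inv in unapproved_large])
```

Each filter shrinks the data set, so later analysis touches fewer rows.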
Population versus Sample
● Population: A group with something in common.
○ Expensive/impossible to get all
○ Parameter: characteristic of a
population
○ Example: survey all restaurants in the
country
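A small simulation can show why a sample is used when surveying the whole population is too expensive. The population values, its size, and the sample size of 200 are all hypothetical. In practice the population parameter is usually unknown, which is the reason for sampling.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: annual revenue (in $ thousands) of 10,000 restaurants.
population = [random.gauss(500, 100) for _ in range(10_000)]

# Parameter: a characteristic of the whole population (normally not observable).
parameter = mean(population)

# Sample: a subset drawn at random from the population.
sample = random.sample(population, 200)

# The sample mean serves as an estimate of the population parameter.
estimate = mean(sample)

print(f"population mean: {parameter:.1f}")
print(f"sample mean:     {estimate:.1f}")
```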
Discrete Data
● Whole-number (integer only)
● Finite set of values between any two
observations
● Examples: inventory, vehicles, manufacturing
plants
Standard Normal Distribution
● Special case of the normal distribution
● Theoretical distribution
○ Used for comparisons
○ Calculate probabilities of individual
observations
● Mean = Median = Mode = 0
● Standard deviation = 1
● Z-score
○ Number of standard deviations a data
point is from the mean
○ z = (x - mean) / standard deviation
Measures of Central Tendency
Describe the center point of a data set.
● Mean: average = Sum/n
● Median: midpoint of the data distribution
● Mode: most common observation in a data set
● Kurtosis: distribution shape (i.e., data central or
in tails)
● Symmetry: Mean = Median = Mode
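The measures of central tendency and the z-score formula above can be tried on a small data set. The values below are hypothetical, and using the population standard deviation (`pstdev`) rather than the sample version is an assumption made for this sketch.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical discrete observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
center = {
    "mean": mean(data),      # Sum / n
    "median": median(data),  # midpoint of the sorted data
    "mode": mode(data),      # most common observation
}

# Standard deviation of the data, treated here as a full population.
sd = pstdev(data)

# z = (x - mean) / standard deviation, for every observation.
z_scores = [(x - center["mean"]) / sd for x in data]

print(center)
print([round(z, 2) for z in z_scores])
```

After standardizing, the z-scores have a mean of 0 and a standard deviation of 1, which matches the properties listed for the standard normal distribution.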
Skewness
Tableau Summary Statistics - Descriptive Statistics
Confidence Interval
Point Estimate vs Confidence Interval
● Point Estimate: single value calculated from
the sample used to estimate population
parameter
○ Difficult to be accurate
● Confidence Interval: a range of numbers
around the point estimate at a certain level of
confidence
○ Level of confidence: probability that
the true value of the population
parameter falls within a certain range
Confidence interval = Point estimate ± Margin of error
Lower bound < population parameter < upper bound
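The relationship "point estimate ± margin of error" can be worked through numerically. This is a sketch under stated assumptions: the sample values are hypothetical, the confidence level is set to 95%, and the critical value comes from the standard normal distribution. For a sample this small, a t-distribution critical value would often be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 10 observations.
sample = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48]
confidence = 0.95  # assumed level of confidence

# Point estimate: a single value calculated from the sample.
point_estimate = mean(sample)

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = stdev(sample) / sqrt(len(sample))

# Critical value from the standard normal distribution (normal approximation).
z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# The margin of error depends on the confidence level and the standard error.
margin_of_error = z_critical * standard_error

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

print(f"{lower:.2f} < population parameter < {upper:.2f}")
```

Raising the confidence level increases the critical value and therefore widens the interval.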
● Margin of error: a function of the desired
confidence level and the standard error
● Error reflects the inability to capture the true
population parameter
Hypothesis Testing
● Hypothesis: proposed explanation
● Hypothesis Test: used to determine if there are
statistically significant differences between
groups
○ Significant, Not Random (or by chance)
○ Two-tailed: different?
○ One-tailed: direction of difference?
LO3.5 | Interpreting and Visualizing Statistics
Frequency Distribution
● Numerical data
● Bins, classes, and intervals
○ Categories in numerical data
● Table that uses bins or categories to list the
frequency of various outcomes in a sample.
Histogram
Visual representation of frequency distribution
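The frequency distribution and histogram described above can be sketched by placing numerical values into equal-width bins and counting how many fall in each. The values and the bin width of 10 are hypothetical, and the text bars stand in for a real chart.

```python
from collections import Counter

# Hypothetical numerical observations.
values = [3, 7, 12, 15, 18, 21, 22, 27, 29, 34]
bin_width = 10  # assumed width of each bin

# Frequency distribution: map each value to the lower edge of its bin, then count.
frequency = Counter((value // bin_width) * bin_width for value in values)

# Text histogram: one row per bin, one '#' per observation in that bin.
for start in sorted(frequency):
    label = f"{start}-{start + bin_width - 1}"
    print(f"{label:>5} | {'#' * frequency[start]}")
```

Changing `bin_width` changes the shape of the histogram, so the choice of bins affects how the distribution appears.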
● Correlation Coefficient = -1
○ perfectly negatively correlated
● Correlation Coefficient = 1
○ perfectly positively correlated
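The two extremes above can be checked with a short calculation. The sketch below implements the Pearson correlation coefficient, which is assumed to be the coefficient the notes refer to. The data points and the function name `pearson_r` are chosen only for this illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviations from the means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations for each variable.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # moves exactly with x
falling = [10, 8, 6, 4, 2]  # moves exactly against x

print(pearson_r(x, rising))
print(pearson_r(x, falling))
```

Real business data rarely sits at either extreme, so values typically fall between -1 and 1.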
Chapter 4 | Analyze the Data: Exploratory
Business Analytics
(Descriptive and Diagnostic Analytics)
Diagnostic Analytics – Why did it Happen?