STA 111 Note

Statistics note

Uploaded by

chimdidaniella
Copyright
© All Rights Reserved
STA 111: DESCRIPTIVE STATISTICS

Course Outline
WEEKS   TOPIC                                    CONTENT
1       Introduction                             Concept of statistics
2       Statistical data                         Types, sources and methods of collection
3       Presentation of data                     Tables, charts and graphs; errors and approximations; frequency and cumulative distributions
4       Measures of location                     Partition, dispersion, skewness and kurtosis; rates, ratios and index numbers
5       Permutation and combination
6-7     Probability                              Concepts and principles of probability
8-9     Random variables                         Random variables
10-11   Probability and distribution functions   Basic distributions: binomial, geometric, Poisson
12      Normal and sampling distributions        Normal and sampling distributions
13      Exploratory data analysis
STATISTICS
Statistics is the branch of mathematics that deals with collecting, analyzing, interpreting,
presenting, and organizing data. It helps in making informed decisions based on data analysis.
Types of Statistics:
1. Descriptive Statistics: Summarizes and describes the characteristics of a dataset. It includes
measures like mean, median, mode, range, and standard deviation.
2. Inferential Statistics: Makes inferences and predictions about a population based on a sample
of data. This includes hypothesis testing, confidence intervals, and regression analysis.
Additional Categories:
Parametric Statistics: Assumes that the data follows a certain distribution (e.g., normal
distribution) and includes techniques like t-tests and ANOVA.
Non-parametric Statistics: Does not assume a specific distribution and includes methods like chi-
square tests and Mann-Whitney U tests.
Frequency Distribution
Definition: A frequency distribution is a summary of how often different values occur within a
dataset. It organizes data into classes or intervals, showing the number of observations
(frequency) in each class.
Purpose: It helps in understanding the distribution and spread of data, making it easier to identify
patterns, trends, and anomalies.
Components of Frequency Distribution
Data Set: A collection of observations or measurements.
Classes/Intervals: Groups into which data is categorized. Each class has a lower and upper
boundary.
Frequency: The count of data points that fall within each class interval.
Cumulative Frequency: The running total of frequencies through the classes, useful for
determining medians and percentiles.
Types of Frequency Distribution
Ungrouped Frequency Distribution: Lists each unique value and its corresponding frequency.
Grouped Frequency Distribution: Organizes data into intervals (classes), which simplifies
analysis for large datasets.
Creating a Frequency Distribution
1. Collect Data: Gather observations or measurements.
2. Decide on Classes: Determine the number of classes and the range for each class. A common
rule of thumb is Sturges' rule: $k = 1 + 3.322 \log_{10} n$, where $k$ is the number of
classes and $n$ is the number of observations.
3. Tally Frequencies: Count how many observations fall into each class.
4. Tabulate Results: Present data in a table format, typically including columns for classes,
frequencies, and cumulative frequencies.
Example of a Frequency Distribution
Data Set: 2, 3, 5, 7, 8, 8, 10, 12, 15, 18
Classes: 0-5, 6-10, 11-15, 16-20
Frequency Table:
Class Tally Frequency Cumulative Frequency
0-5 III 3 3
6-10 IIII 4 7
11-15 II 2 9
16-20 I 1 10
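
The tallying steps above can be sketched in a few lines of Python. This is a minimal illustration using the data set and classes from the example; the variable names are illustrative, and the class limits are assumed to be inclusive at both ends.

```python
import math
from itertools import accumulate

data = [2, 3, 5, 7, 8, 8, 10, 12, 15, 18]
classes = [(0, 5), (6, 10), (11, 15), (16, 20)]  # inclusive limits (assumption)

# Sturges' rule suggests a number of classes; how to round it is a convention.
k = 1 + 3.322 * math.log10(len(data))

# Tally: count the observations that fall inside each class.
frequencies = [sum(lower <= x <= upper for x in data) for lower, upper in classes]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

print(f"Sturges' k = {k:.3f}")
for (lower, upper), f, cf in zip(classes, frequencies, cumulative):
    print(f"{lower}-{upper}\t{f}\t{cf}")
```

For these ten observations the rule gives k ≈ 4.32, which is in line with the four classes used in the table.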

Visual Representation
Histograms: A graphical representation where classes are on the x-axis and frequencies on the y-
axis.
Bar Graphs: Used for ungrouped data, with bars representing the frequency of each unique value.
Importance of Frequency Distribution
Data Analysis: Helps in identifying the shape of the data distribution (normal, skewed, etc.).
Descriptive Statistics: Provides the basis for calculating measures like mean, median, mode,
variance, and standard deviation.
Data Comparison: Enables comparison between different datasets or groups.
STATISTICAL DATA
Statistical data is essential for research, decision-making, and analysis in various fields.
Understanding its types, sources, and methods of collection is crucial for obtaining valid and
reliable information.
Types of Statistical Data
A. Quantitative Data
Definition: Numerical data that can be measured and analyzed mathematically.
Characteristics:
Can be subjected to statistical analysis.
Provides information about quantities.
Examples:
Discrete Data: Countable values (e.g., number of students, cars).
Continuous Data: Measurable values (e.g., height, temperature).
B. Qualitative Data
Definition: Non-numerical data that describes qualities or characteristics.
Characteristics:
Focuses on attributes and categories.
Often used in exploratory research.
Examples:
Nominal Data: Categories without a specific order (e.g., colors, types of cuisine).
Ordinal Data: Categories with a meaningful order (e.g., education levels, satisfaction
ratings).
Sources of Statistical Data
A. Primary Sources
Definition: Data collected directly for a specific research purpose.
Examples: Surveys, Interviews, Experiments, Observations
B. Secondary Sources
Definition: Data that has been previously collected and published by others.
Examples: Academic journals, Government publications, Databases, Reports from
research organizations
Methods of Collection of Data
A. Surveys and Questionnaires
Description: Structured tools consisting of questions designed to gather information.
Types:
Closed-ended questions: Fixed responses (e.g., yes/no).
Open-ended questions: Allow for detailed responses.
B. Interviews
Description: Direct interactions with individuals to gather in-depth information.
Types:
Structured: Predefined questions.
Semi-structured: Combination of predefined and spontaneous questions.
Unstructured: Free-flowing conversation.
C. Observations
Description: Collecting data by watching subjects in their natural environment.
Types:
Participant observation: Researcher engages with subjects.
Non-participant observation: Researcher remains detached.
D. Experiments
Description: Controlled studies to test hypotheses and analyze cause-and-effect relationships.
Characteristics:
Involves manipulation of independent variables.
Randomization is often used to reduce bias.
E. Administrative Data
Description: Data collected by organizations for administrative purposes.
Examples: Health records, Census data, Educational records
F. Online Data Collection
Description: Utilizing digital platforms and tools for data collection.
Examples: Web surveys, Social media analytics, Online focus groups

TABLES
Definition
• A table is a systematic arrangement of data in rows and columns that facilitates
comparison.
Types of Tables
1. Simple Tables:
o Present a single variable, such as a frequency count.
o Example: A table listing the number of students in different grade ranges.
2. Complex Tables:
o Display multiple variables and summary statistics.
o Example: A table showing average test scores categorized by gender and class.

Best Practices
• Clear Headings: Use descriptive titles and labels for rows and columns.
• Logical Arrangement: Organize data in a way that enhances readability.
• Include Units: Always specify units of measurement to avoid ambiguity.
• Highlight Key Data: Use bold text or shading for important figures to draw attention.

Charts

Definition
• Charts visually represent data, making it easier to understand comparisons and trends.
Types of Charts
1. Bar Charts:
o Ideal for comparing quantities across categories.
o Can be horizontal or vertical.
[Figure: bar chart with legend Series 1, Series 2, Series 3]

2. Pie Charts:
o Represent parts of a whole; useful for showing percentage breakdowns.
o Best used for a small number of categories.

[Figure: pie chart with segments 1st Qtr, 2nd Qtr, 3rd Qtr, 4th Qtr]

3. Line Charts:
o Show trends over time, connecting data points with a continuous line.
o Ideal for displaying continuous data.

[Figure: line chart of Series 1, Series 2 and Series 3 across Category 1 to Category 4]

4. Histograms:
o Similar to bar charts but used for displaying frequency distributions of numerical
data.
o Bars touch each other, indicating continuous intervals.
Best Practices

• Choose the Right Chart: Select the chart type that best conveys the data story.
• Label Axes Clearly: Ensure axes are labeled with units and titles for clarity.
• Use Legends: When multiple data series are presented, include a legend for clarification.
• Maintain Visual Clarity: Avoid overcrowding the chart with too much information.

Graphs
Definition
• Graphs are visual tools that illustrate relationships between two or more variables.
Types of Graphs
1. Scatter Plots:
o Plot individual data points on two axes to visualize the relationship between
variables.
o Useful for identifying correlations and trends.

[Figure: scatter plot titled "Y-Values"]

2. Box Plots:
o Summarize data distributions through quartiles, displaying median, range, and
outliers.
o Effective for comparing distributions across multiple groups.
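
The quantities a box plot displays can be computed directly. The sketch below is one possible approach, not a fixed standard: it uses Python's `statistics.quantiles` with its default interpolation method, and it flags outliers with the 1.5 × IQR convention, which is an assumption not stated in these notes. The `scores` data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]

def outliers(values, factor=1.5):
    """Flag values more than factor * IQR beyond the quartiles (a convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [x for x in values if x < low or x > high]

scores = [52, 55, 58, 60, 61, 63, 65, 67, 70, 120]  # hypothetical data
print(five_number_summary(scores))
print(outliers(scores))
```

With these hypothetical scores, only the value 120 falls outside the fences and would be drawn as a separate point on the box plot.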
Best Practices
• Ensure Proper Scaling: Use consistent and appropriate scales on axes.
• Highlight Important Trends: Use annotations to point out significant data points or
trends.
• Avoid Misleading Representations: Present data accurately without distorting it.

Errors in Data Presentation

Errors in data presentation can significantly distort the interpretation and conclusions drawn from
data. Here are some common statistical examples:

1. Misleading Graphs:
o Truncated Y-Axis: When the y-axis starts above zero, it can exaggerate
differences between data points. For example, a bar graph that shows sales growth
might appear dramatic if the y-axis begins at 100 instead of 0.
2. Cherry-Picked Data:
o Presenting only favorable data while ignoring data that contradicts it. For
instance, a study might highlight only the years where a product's sales increased,
neglecting years of decline. Example: Reporting an average return x̄ of 15% for
2015-2019 but omitting 2020, when returns were -5%.
3. Inappropriate Aggregation:
o Combining data from different categories without considering context can lead to
misleading conclusions. For example, a single average x̄ of test scores pooled across
vastly different schools can hide the fact that one school scores significantly lower.

4. Improper Use of Averages:
o Using the mean instead of the median can misrepresent data, especially in skewed
distributions. For instance, in income data, a few very high incomes can raise the
mean significantly, while the median may better reflect the typical income.
Example: Mean income x̄ = $100,000 due to a few high earners, while a
median income of $50,000 reflects typical earnings.
5. Overgeneralization:
o Drawing broad conclusions from a small or unrepresentative sample. For
example, surveying a small group of people in one location and claiming the
results apply universally.
6. Ignoring Sample Size:
o Presenting results from a small sample as if they are representative of a larger
population can lead to misleading insights. For example, a study with only 10
participants should not be generalized to a population of thousands.
7. Improper Scale:
o Using inconsistent scales on graphs or charts can distort the viewer's
understanding. For instance, a pie chart whose percentages do not sum to 100%, or whose
segment sizes are not proportional to the values they represent, can mislead.
8. Correlation vs. Causation:
o Implying that correlation indicates causation can be misleading. For example, a
graph showing an increase in ice cream sales alongside a rise in drowning
incidents doesn’t mean ice cream consumption causes drowning.
9. Data Dredging:
o Mining data for patterns without a priori hypotheses can lead to spurious
correlations. For instance, finding a relationship between two unrelated variables
after testing many hypotheses without proper controls.
10. Lack of Context:
o Presenting statistics without context can lead to misinterpretation. For example,
stating that crime rates decreased by 20% without mentioning that they were
previously at a historically high level.
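
The mean-versus-median point in item 4 is easy to check numerically. The following is a small sketch with hypothetical incomes, chosen so that two very high earners pull the mean to $100,000 while the median stays at $50,000.

```python
import statistics

# Hypothetical annual incomes: eight typical earners and two very high earners.
incomes = [40_000, 45_000, 48_000, 50_000, 50_000,
           50_000, 55_000, 62_000, 250_000, 350_000]

mean_income = statistics.mean(incomes)      # pulled upward by the two high values
median_income = statistics.median(incomes)  # middle of the ordered data

print(f"mean   = {mean_income:,.0f}")
print(f"median = {median_income:,.0f}")
```

Eight of the ten people in this sample earn well below the mean, which is why the median is usually the better summary of a typical value in skewed data.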

Common Errors
• Misleading Scales: Distorting the visual impact by manipulating axis scales.
• Cherry-Picking Data: Selectively presenting data that supports a specific conclusion
while ignoring contrary evidence.
• Overcomplication: Using complex visuals that overwhelm rather than clarify.
Avoiding Errors
• Transparency: Disclose data sources and methodologies.
• Follow Standards: Adhere to standard practices in data presentation to ensure reliability.
Approximations
Definition
• Approximations replace exact values with simpler nearby ones (for example, rounding
1,987 to 2,000) to improve clarity and ease of understanding.
When to Use
• When precise values are less critical than demonstrating a trend or pattern.
• In contexts where data is subject to variability or measurement error.
Frequency Distribution
Definition
• A frequency distribution is a summary of how often each value occurs in a dataset. It
provides an organized way to present raw data.
Types
1. Ungrouped Frequency Distribution:
o Lists each unique value and its corresponding frequency.
o Example: Number of students scoring each grade.
2. Grouped Frequency Distribution:
o Organizes data into classes or intervals.
o Example: Age groups (0-9, 10-19, etc.) with corresponding frequencies.
Cumulative Frequency Distribution
• Represents the total number of observations that fall below or at a particular value.
• Calculation: Add the frequency of each class to the sum of frequencies of all preceding
classes.
Example
Class Interval Frequency Cumulative Frequency
0–9 5 5
10–19 10 15
20–29 7 22
30–39 3 25
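
The cumulative column of this table is a running total, and it can also be used to locate the class that contains the median. The sketch below assumes the common convention that the median class is the first class whose cumulative frequency reaches half the total; variable names are illustrative.

```python
from itertools import accumulate

classes = ["0-9", "10-19", "20-29", "30-39"]
frequencies = [5, 10, 7, 3]

# Add each class frequency to the sum of all preceding frequencies.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Median class: first class whose cumulative frequency reaches total / 2
# (a common convention for grouped data, assumed here).
median_class = next(c for c, cf in zip(classes, cumulative) if cf >= total / 2)

print(cumulative)
print(median_class)
```

Here the total is 25, half of that is 12.5, and the cumulative frequency first reaches 12.5 in the 10-19 class.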

Measures of Location

Definition

• Measures of location indicate the central tendency of a dataset, summarizing the data
with a single value.

Key Measures

1. Mean:
o Average of all values.
o Formula: $\bar{x} = \frac{\sum x}{n}$, where $n$ is the number of observations.
2. Median:
o The middle value when data is ordered.
o If $n$ is odd, it's the $((n + 1)/2)$th value; if $n$ is even, it's the average of the
$(n/2)$th and $(n/2 + 1)$th values.
3. Mode:
o The most frequently occurring value(s) in the dataset.
o A dataset can be unimodal (one mode), bimodal (two modes), or multimodal
(multiple modes).
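
As an illustration of the three measures, the sketch below applies them to the data set from the earlier frequency distribution example. The helper `median_by_position` is a hypothetical name that follows the positional rule given above; `statistics.multimode` is used so that bimodal and multimodal data return every mode.

```python
import statistics

def median_by_position(values):
    """Median via the positional rule: the ((n + 1)/2)th value for odd n,
    the average of the (n/2)th and (n/2 + 1)th values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # positions are 1-based, lists 0-based
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

data = [2, 3, 5, 7, 8, 8, 10, 12, 15, 18]

mean = sum(data) / len(data)        # sum of x divided by n
median = median_by_position(data)   # n = 10, so average the 5th and 6th values
modes = statistics.multimode(data)  # list of the most frequent value(s)

print(mean, median, modes)
```

For this data set the mean is 8.8, the median is 8 and the only mode is 8. A data set such as 1, 1, 2, 2, 3 would return two modes.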

Measures of Dispersion

Definition

• Measures of dispersion describe the spread or variability of a dataset.

Key Measures

1. Range:
o Difference between the maximum and minimum values.
o Formula: Range = Max − Min
2. Variance:
o Measures the average squared deviation from the mean.
o Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
o Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
3. Standard Deviation:
o The square root of the variance; indicates the typical distance of data points from the
mean.
o Sample standard deviation: $s = \sqrt{s^2}$
o Population standard deviation: $\sigma = \sqrt{\sigma^2}$
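
The difference between the sample and population formulas is only the divisor (n − 1 versus N). The sketch below implements both directly from the definitions; the function names and the data are hypothetical.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

data_range = max(data) - min(data)
print("range:", data_range)
print("sample variance:", sample_variance(data))
print("sample std dev:", math.sqrt(sample_variance(data)))
print("population variance:", population_variance(data))
print("population std dev:", math.sqrt(population_variance(data)))
```

For these eight values the mean is 5 and the squared deviations sum to 32, so the population variance is 4 (standard deviation 2) while the sample variance is 32/7, about 4.57.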

SKEWNESS

Definition

• Skewness measures the asymmetry of the probability distribution of a real-valued random
variable.

Interpretation

• Positive Skew: Longer tail on the right side; typically mean > median.
• Negative Skew: Longer tail on the left side; typically mean < median.
• Zero Skew: Symmetrical distribution; mean = median.

Calculation

• Formula for sample skewness: $\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$,
where $s$ is the sample standard deviation.
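
A direct implementation helps confirm the sign conventions in the interpretation above. This sketch assumes the adjusted sample form, n / ((n − 1)(n − 2)) times the sum of cubed standardized deviations ((x_i − x̄)/s)³, with s computed using the n − 1 divisor; the function name and test data are hypothetical.

```python
import math

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("sample skewness needs at least three observations")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

print(sample_skewness([1, 2, 3, 4, 5]))     # symmetric data
print(sample_skewness([1, 2, 3, 4, 20]))    # long right tail
print(sample_skewness([-20, -4, -3, -2, -1]))  # long left tail
```

The symmetric data set gives a skewness of zero (up to rounding), the right-tailed set gives a positive value, and its mirror image gives a negative value.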

Kurtosis

Definition

• Kurtosis measures the "tailedness" of the distribution, indicating the presence of outliers.

Types

1. Mesokurtic: Tails similar to a normal distribution (kurtosis ≈ 3).
2. Leptokurtic: Heavier tails and a sharper peak than normal (kurtosis > 3).
3. Platykurtic: Lighter tails and a flatter peak than normal (kurtosis < 3).

Calculation

• Excess kurtosis = Kurtosis − 3, so a normal distribution has an excess kurtosis of about 0.
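
These notes do not give a formula for kurtosis itself. One common definition, assumed here, is the moment ratio m4 / m2², where m2 and m4 are the second and fourth central moments computed with divisor n. Other texts use bias-adjusted sample versions, so treat this as a sketch rather than the only correct form; the function names and data are hypothetical.

```python
def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (about 3 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

def excess_kurtosis(values):
    """Kurtosis minus 3, so that a normal distribution scores about 0."""
    return kurtosis(values) - 3

flat = [1, 2, 3, 4, 5]                                   # evenly spread values
heavy_tail = [10, 10, 10, 10, 10, 10, 10, 10, 10, 100]   # one extreme value

print(kurtosis(flat), excess_kurtosis(flat))
print(kurtosis(heavy_tail), excess_kurtosis(heavy_tail))
```

The evenly spread data have kurtosis 1.7 (platykurtic, negative excess), while the data set with a single extreme value has kurtosis above 3 (leptokurtic, positive excess).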

Index Numbers
Definition

• Index numbers are statistical measures that represent changes in a variable or a group of
variables over time.

Types

1. Price Index:
o Measures changes in price levels, such as the Consumer Price Index (CPI).
2. Quantity Index:
o Measures changes in quantities produced or consumed.
3. Value Index:
o Measures changes in total value, combining price and quantity changes.

Calculation Example

• Price Index for year $t$: $\text{Price Index} = \frac{\text{Price in year } t}{\text{Price in base year}} \times 100$
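
A short worked example of the price index formula, with hypothetical prices (250 in the base year and 300 in year t). The function name and its parameter names are illustrative only.

```python
def price_index(price_in_year_t, price_in_base_year):
    """Price in year t as a percentage of the base-year price."""
    if price_in_base_year <= 0:
        raise ValueError("base-year price must be positive")
    return 100 * price_in_year_t / price_in_base_year

base_price = 250     # hypothetical price in the base year
current_price = 300  # hypothetical price in year t

index = price_index(current_price, base_price)
print(index)
```

An index of 120 means the price in year t is 20% higher than in the base year; the base year itself always has an index of 100.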

