Probability and Statistics Lecture 2
Dr Sumeyye BAKIM
2024
Outline
• Histograms
• Shapes of Frequency Distributions
• Misleading Graphs
• Frequency Tables and Histograms in Research Articles

Histograms
A graph is another good way to make a large group of scores easier to understand. A picture may be worth a thousand words, and it can also be worth a thousand numbers. A simple approach is to make a graph of the frequency table. A graph of the information in a frequency table is called a histogram, which is a type of bar chart. In a histogram, the height of each bar represents the frequency of one value in the frequency table. Normally, all the bars in a histogram are placed side by side with no gaps between them.

A histogram is a bar-like graph of a frequency distribution in which the values are plotted along the horizontal axis and the height of each bar represents the frequency of that value. Because the bars are usually placed side by side without gaps, a histogram looks something like a city skyline.
Stress Level Example
Histogram

Social Interaction Frequency
Histogram

Histogram for the number of social interactions during a week for 94 college students, based on grouped frequencies. (Data from McLaughlin-Volpe et al., 2001.)
How to Make a Histogram

❶ Make a frequency table (or grouped frequency table).

❷ Put the values along the bottom of the page, from left to right, from lowest to highest.

Attention! When you create a histogram from a grouped frequency table, the values you place along the bottom of the page are the midpoints of the intervals. The midpoint of an interval is halfway between the start of that interval and the start of the next higher interval (for the interval 0-4, the midpoint is halfway between 0 and 5, that is, 2.5).
❸ Make a scale of frequencies along the left edge of the page that goes from 0 at the bottom to the highest frequency for any value.
❹ Make a bar above each value with a height equal to the frequency of that value. For each bar, make sure that the middle of the bar is above its value.

If you have a nominal variable, this kind of graph is called a bar graph (bar chart) rather than a histogram.

Because the values of a nominal variable are not ordered, a gap is left between the bars.
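The four steps above can be sketched in Python for integer scores. The function names, the choice to start the first interval at a multiple of the interval width, and the scores themselves are hypothetical. Because this is plain text, the histogram is printed on its side (values top to bottom, bars growing to the right) instead of being drawn on a page.

```python
def grouped_frequency_table(scores, width):
    """Steps 1-2: group integer scores into equal-width intervals,
    ordered from lowest to highest.

    Returns a list of (start, end, midpoint, frequency) tuples.
    """
    start = (min(scores) // width) * width  # first interval begins at a multiple of width
    rows = []
    while start <= max(scores):
        end = start + width - 1  # e.g. 0-4 when width is 5
        freq = sum(1 for s in scores if start <= s <= end)
        # Midpoint rule from the text: halfway between this interval's start
        # and the start of the next interval, so 0-4 gives 2.5.
        midpoint = (start + (start + width)) / 2
        rows.append((start, end, midpoint, freq))
        start += width
    return rows


def text_histogram(rows, symbol="#"):
    """Steps 3-4: one bar per interval, labelled by its midpoint, with
    bar length equal to the frequency of that interval."""
    lines = [f"{midpoint:>5} | {symbol * freq}" for _, _, midpoint, freq in rows]
    return "\n".join(lines)


# Hypothetical scores (not taken from the lecture's data sets).
scores = [3, 7, 12, 15, 4, 22, 9, 1, 18, 6, 11, 2]
table = grouped_frequency_table(scores, width=5)
for start, end, midpoint, freq in table:
    print(f"{start}-{end}: midpoint {midpoint}, frequency {freq}")
print(text_histogram(table))
```

With an interval width of 5, the intervals are 0-4, 5-9, and so on, and the bars are labelled 2.5, 7.5, 12.5, matching the midpoint rule in the note above.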
Closest Person Example
Bar Graph

Bar graph for the closest person in life for 208 students (see Table 4). (Data from Aron et al., 1995.)

Shapes of Frequency Distributions
A frequency distribution shows the pattern of frequencies over the various values. A frequency table or histogram describes a frequency distribution because each shows how the frequencies are spread out, or 'distributed.' Psychologists also describe this shape in words. It is important to be able to describe the shape of a distribution.

Question: How many 'peaks' are there in a distribution?

Single? Double? Multiple? None?
FREQUENCY POLYGONS
Unimodal distribution: A frequency distribution in which one value has a frequency that is clearly larger than the frequency of any other value.
Bimodal distribution: A frequency distribution in which two values have approximately equal frequencies, each clearly larger than the frequency of any other value.
Multimodal distribution: Any frequency distribution with two or more high points.
Rectangular distribution: A frequency distribution in which all values have about the same frequency.

Figure: Unimodal, bimodal, and rectangular distributions.


In a frequency polygon, a line connects the points. The height of each point represents the number of scores at that value, which creates the silhouette of a mountain. Scores from most studies follow an approximately unimodal (single-peaked) distribution. Bimodal and other multimodal distributions occasionally occur.

In the stress level example, the most frequently occurring value is 7 (the value 7 has the highest frequency). This is an example of a unimodal distribution.
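Counting peaks can be made concrete with a rough Python sketch that finds the most frequent value (the mode) and counts local high points in a frequency table. The rule that a peak must exceed both neighbouring frequencies by at least `min_gap` is an assumption standing in for "clearly larger"; real data often call for judgment. The function names and frequency tables are hypothetical.

```python
def modes(freq_table):
    """Return every value that shares the highest frequency."""
    top = max(f for _, f in freq_table)
    return [v for v, f in freq_table if f == top]


def count_peaks(freq_table, min_gap=1):
    """Count local high points in a (value, frequency) table ordered by value.

    A value counts as a peak when its frequency exceeds the frequency of each
    neighbouring value by at least min_gap (a hypothetical stand-in for
    "clearly larger"). Values just outside the table count as frequency 0.
    """
    freqs = [0] + [f for _, f in freq_table] + [0]
    return sum(
        1
        for i in range(1, len(freqs) - 1)
        if freqs[i] - max(freqs[i - 1], freqs[i + 1]) >= min_gap
    )


def describe_shape(freq_table, min_gap=1):
    """Rough verbal label based on the number of peaks."""
    peaks = count_peaks(freq_table, min_gap)
    if peaks == 0:
        return "no clear peak (rectangular?)"
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal"


# Hypothetical frequency tables: (value, frequency) pairs.
single = [(1, 1), (2, 3), (3, 6), (4, 3), (5, 1)]
double = [(1, 2), (2, 6), (3, 2), (4, 1), (5, 2), (6, 5), (7, 2)]
flat = [(1, 4), (2, 4), (3, 4), (4, 4)]
for name, table in [("single", single), ("double", double), ("flat", flat)]:
    print(name, modes(table), describe_shape(table))
```

For the second table, `modes()` reports only the value 2, even though the two high points (frequencies 6 and 5) are nearly equal. This is why the shape is described by counting high points rather than by the single most frequent value. A bimodal distribution is also multimodal in the sense of the definition above.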
Bimodal
(a) A bimodal distribution showing the possible frequencies for people of different ages in a toddler’s play area.

Rectangular
(b) A rectangular distribution showing the possible frequencies of students at different grade levels in an elementary school.

Symmetric and Skewed Distributions
Take another look at the histograms of the stress rating example. The distribution is lopsided, with more scores near the high end, which is somewhat unusual. Most things we measure in science have about equal numbers of scores on both sides of the middle. This means that in science, scores often follow an approximately symmetric distribution (if you fold the graph of a symmetric distribution in half, the two halves look the same).

A distribution that is not symmetric is called a skewed distribution. The stress rating distribution is an example. A skewed distribution has one side that is long and spread out, somewhat like a tail. The side with fewer scores (the tail-like side) is the direction of the skew. Thus, the stress study example, which has very few scores at the lower end, is skewed to the left.
The social interactions example, which has very few scores at the upper end, is skewed to the right (see the figure on the right). The figure below shows examples of approximately symmetric and skewed distributions.

Figure: Approximately symmetrical, positively skewed, and negatively skewed distributions.
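The direction of skew can also be checked numerically. A common measure is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the average squared and cubed deviations from the mean. It is positive for a distribution skewed to the right (positively skewed) and negative for one skewed to the left (negatively skewed). The sketch below uses population moments with no small-sample correction; the 0.1 tolerance for "approximately symmetric" is an arbitrary assumption, and the data are hypothetical.

```python
def skewness(scores):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, using population
    moments (no small-sample correction)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5


def skew_direction(scores, tolerance=0.1):
    """Verbal label; the tolerance for 'approximately symmetric' is arbitrary."""
    g1 = skewness(scores)
    if g1 > tolerance:
        return "skewed to the right (positive)"
    if g1 < -tolerance:
        return "skewed to the left (negative)"
    return "approximately symmetric"


# Hypothetical scores: a long tail of high values, its mirror image,
# and an evenly balanced set.
right_tail = [0, 1, 1, 2, 2, 2, 3, 3, 4, 9]
left_tail = [10 - x for x in right_tail]
balanced = [1, 2, 3, 4, 5]
for scores in (right_tail, left_tail, balanced):
    print(round(skewness(scores), 2), skew_direction(scores))
```

Mirroring the scores (10 - x) flips the sign of g1 without changing its size, which matches the idea that the tail side determines the direction of skew.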
Strongly Skewed Distributions and Floor Effect
Strongly skewed distributions often arise in science when there is a lower or an upper limit on the measured variable.

For example, a mechanical component cannot have a negative tensile strength. A situation like this, in which scores cannot go below a certain value (here, zero), is called a floor effect. The right-skewed distribution caused by such a lower limit can be seen in the figure.

A distribution skewed to the right due to a floor effect: fictional
distribution of the number of children in families.
Ceiling Effect
The skewed distribution caused by an upper limit can be seen in the figure on the right. This distribution represents adults' scores on a multiplication table test and is strongly skewed to the left. This illustrates a ceiling effect. A ceiling effect is also evident in the stress level example, where ratings cannot go above the highest possible stress level of 10.
A distribution skewed to the left due to a ceiling effect: fictional distribution of adults’ scores on a multiplication table test.

Floor effect: skewed to the right
Ceiling effect: skewed to the left
Normal Distribution
Scientists also describe a distribution by whether its middle is particularly peaked or flat. The standard for this comparison is the bell-shaped curve. In research and in nature, distributions often resemble this bell curve, which is called the normal distribution (or normal curve). Note that the normal distribution has a single peak (it is unimodal) and is symmetric. The stress level and social interaction distributions, however, are skewed. In general, the results of studies often closely follow the normal distribution; these two examples are exceptions.

Figure panels: Social Interaction, Stress Level.

Kurtotic Distribution
Kurtosis measures how much the shape of a distribution differs from a normal curve: is it taller or flatter than the normal curve? The term "kurtosis" comes from the Greek word "kyrtos," meaning "curved."
Panel (b) of the figure below shows a kurtotic distribution with a sharper peak than the normal curve. Panel (c) shows an extreme example of a kurtotic distribution that is very flat. (A rectangular distribution would be an even more extreme example.)
Distributions that are taller or flatter than a normal curve also tend to have differently shaped tails. Distributions with a very tall peak typically have more scores in their tails than the normal curve (see panel b).
In contrast, flatter distributions tend to have fewer scores in their tails than the normal curve (see panel c).
Kurtosis thus indicates how much taller or flatter a distribution is than the normal curve, but the key point is the number of scores in the tails.

Figure: (a) normal, (b) heavy-tailed, and (c) light-tailed distributions. The normal distribution is shown as a dashed line in (b) and (c).
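Kurtosis is usually computed as excess kurtosis, g2 = m4 / m2^2 - 3, where m2 and m4 are the average squared and fourth-power deviations from the mean. Subtracting 3 makes the normal curve come out at 0, so heavy-tailed distributions give positive values and light-tailed (flat) ones give negative values. Because deviations are raised to the fourth power, scores far out in the tails dominate the result, in line with the point that the tails are what matter. A minimal Python sketch, using population moments with no small-sample correction; the function name and data are hypothetical.

```python
def excess_kurtosis(scores):
    """g2 = m4 / m2**2 - 3, using population moments (no small-sample
    correction). Near 0 for a normal curve, positive for heavy tails,
    negative for light tails."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return m4 / m2 ** 2 - 3


# Hypothetical scores.
flat = list(range(1, 11))                   # each value once: rectangular
heavy = [-10, -1, 0, 0, 0, 0, 0, 0, 1, 10]  # tall centre, two far-out scores
print(round(excess_kurtosis(flat), 2))   # negative
print(round(excess_kurtosis(heavy), 2))  # positive
```

For the evenly spread scores 1 to 10 (a small rectangular distribution), g2 is about -1.22; the heavy-tailed list, with most scores at 0 and two far out, gives about 1.9.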
Misleading Graphs

The most serious misuses of frequency tables and histograms occur not among scientists but in material aimed at the general public.

Of course, people can lie with statistics, and they often do. It is easy to lie with words, but lies told with numbers are not always easy to recognize.

There are two main ways in which frequency tables and graphs can be misused; each is described below, along with how to recognize it.

Failure to Use Equal Interval Sizes
A fundamental requirement for a grouped frequency table or graph is that all the intervals be the same size. If they are not, the table or graph can be very misleading. The graph next to this text gives the impression that commissions paid to travel agencies dropped dramatically in 1978.

A closer look at the graph reveals that the third bar for each airline represents only the first half of 1978. Thus, half a year is being compared with full previous years. If we assume that the second half of 1978 was similar to the first half, the information in this graph actually indicates an increase, not a decrease, in 1978. For example, on that assumption Delta Airlines' total for 1978 would be $72 million, well above its $57 million in 1977.
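The fix for unequal intervals is to compare like with like, for example the average amount per month. The sketch below does this for the Delta figures. The $57 million for 1977 comes from the text; the $36 million for the first half of 1978 is inferred from the text's $72 million total under the assumption that both halves of 1978 were equal, not read from the graph. The function names are hypothetical.

```python
def monthly_rate(amount, months):
    """Average amount per month over the period covered."""
    return amount / months


def annualize(amount, months):
    """Project a partial-year total to 12 months, assuming the missing
    months look like the covered ones (the text's assumption for 1978)."""
    return monthly_rate(amount, months) * 12


delta_1977 = 57      # $ millions, full year 1977 (from the text)
delta_1978_h1 = 36   # $ millions, first half of 1978 (inferred: 72 / 2)

print(monthly_rate(delta_1977, 12))    # 4.75 per month
print(monthly_rate(delta_1978_h1, 6))  # 6.0 per month
print(annualize(delta_1978_h1, 6))     # 72.0 for the full year
```

Put on an equal footing, the monthly rate rose from $4.75 million to $6 million, the opposite of what the unequal bars suggest.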

Histograms in Research Articles
Maggi and colleagues (2007) conducted a study on age-related
changes in smoking behavior among Canadian adolescents. As
shown in the figure, they created a histogram from a grouped
frequency table to display their results. Their histogram
represents the results from two samples (illustrated with dark
and light bars).

As the figure shows, fewer than 10% of those aged 10-11 had tried smoking, whereas more than half of those aged 16-17 had tried it. Note that the researchers drew the histogram with gaps between the bars; in a histogram, such gaps should be avoided (unless you are drawing a bar graph for a nominal variable).

Additionally, the differing sample sizes in each age group can lead
to misleading percentages.

Exaggeration of Proportions
The height of a histogram or bar graph (or
frequency polygon) typically starts at 0 or the
lowest value on the scale and extends to the
highest value on the scale.

Figure a shows a bar graph that does not follow this standard. The bar graph shows the average housing prices in a particular area over four years (2008 to 2011). Because the vertical axis starts at $150,000 rather than at 0, the graph exaggerates the changes in housing prices over time.

Figure b, whose vertical axis starts at $0, shows the same results. In Figure b, the year-to-year changes in housing prices are shown in their true proportions.
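The amount of exaggeration can be quantified: a bar's drawn height is its value minus the value where the vertical axis starts. The prices below are hypothetical, since the lecture does not list the actual housing prices; only the $150,000 axis start comes from Figure a. The function name is also hypothetical.

```python
def drawn_height_ratio(smaller, larger, axis_start=0):
    """How many times taller the larger bar looks than the smaller bar
    when the vertical axis begins at axis_start instead of 0."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below both values")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical average prices for two of the years.
price_early, price_late = 160_000, 180_000

print(drawn_height_ratio(price_early, price_late))           # 1.125 (axis from $0)
print(drawn_height_ratio(price_early, price_late, 150_000))  # 3.0 (axis from $150,000)
```

With these made-up prices, a 12.5% increase is drawn as if prices had tripled once the axis starts at $150,000.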

The overall proportions of a histogram or bar graph should be roughly 1 to 1.5 (the height about two-thirds of the width), as in Figure a for the stress rating example. Now consider what happens if we make the graph much shorter or much longer, as shown in Figures b and c. This is like using software to alter a person's photograph: the actual image is distorted. In a sense, a histogram of any shape is correct. However, a ratio of about 1 to 1.5 has been adopted as a standard so that graphs can be compared. Altering this ratio misleads the viewer.
