
Descriptive Statistics

Descriptive statistics can be used to provide basic information about variables and highlight relationships between variables. There are three main types of descriptive statistics: measures of central tendency like the mean and median; measures of dispersion like variance and standard deviation; and measures of association like chi-square and correlation. Graphical methods such as histograms and scatter plots provide visual representations of the data.

Uploaded by

api-472656698

DESCRIPTIVE STATISTICS

Descriptive statistics are useful for two purposes:

1) To provide basic information about variables in a dataset and
2) To highlight potential relationships between variables.

The most common descriptive statistics fall into four groups: graphical or pictorial methods,
and three kinds of numerical measures:

• Graphical/Pictorial Methods
• Measures of Central Tendency
• Measures of Dispersion
• Measures of Association

Graphical/Pictorial Methods

There are several graphical and pictorial methods that enhance researchers' understanding of
individual variables and the relationships between variables. Graphical and pictorial methods
provide a visual representation of the data. Some of these methods include:

• Histograms
• Scatter plots
• Geographic Information Systems (GIS)
• Sociograms

Histograms

• Visually represent the frequencies with which values of variables occur
• Each value of a variable (or, for continuous variables, each range of values) is displayed
along the bottom of a histogram, and a bar is drawn for each value or range
• The height of the bar corresponds to the frequency with which that value occurs
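
The counting behind a histogram can be sketched in a few lines of Python. This is a minimal illustration, not part of the original handout: the `children` data are hypothetical, and the bars are printed sideways as text rather than drawn vertically.

```python
from collections import Counter

# Hypothetical data: number of children reported by 12 survey respondents.
children = [0, 2, 1, 2, 3, 0, 2, 1, 2, 4, 1, 2]

# Count how often each value occurs.
frequencies = Counter(children)

# Print one bar per value; the bar length is the frequency of that value.
for value in sorted(frequencies):
    count = frequencies[value]
    print(f"{value} | {'#' * count} ({count})")
```

The longest bar marks the most frequent value, which is the same information the height of a bar conveys in a drawn histogram.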

Scatter plots

• Display the relationship between two quantitative or numeric variables by plotting the
values of one variable against the values of the other
• For example, one axis of a scatter plot could represent height and the other could
represent weight. Each person in the data would receive one data point on the scatter plot
that corresponds to his or her height and weight

Geographic Information Systems (GIS)

• A GIS is a computer system capable of capturing, storing, analyzing, and displaying
geographically referenced information; that is, data identified according to location
• Using a GIS program, a researcher can create a map to represent data relationships
visually

Sociograms

• Display networks of relationships among individuals or other units in a group, enabling
researchers to identify the nature of relationships that would otherwise be too complex to
conceptualize

Visit the following websites for more information:

• Graphical Analytic Techniques
• Geographic Information Systems

Glossary terms related to graphical and pictorial methods:

• GIS
• Histogram
• Scatter Plot
• Sociogram

Measures of Central Tendency

Measures of central tendency are the most basic and, often, the most informative description of a
population's characteristics. They describe the "average" member of the population of interest.
There are three measures of central tendency:
Mean -- the sum of a variable's values divided by the total number of values
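Median -- the middle value of a variable when its values are sorted in order (with an even
number of values, the mean of the two middle values)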
Mode -- the value that occurs most often
Example:
The incomes of five randomly selected people in the United States are $10,000, $10,000,
$45,000, $60,000, and $1,000,000.
Mean Income = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = $225,000
Median Income = $45,000
Modal Income = $10,000
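
The three figures above can be checked with Python's standard `statistics` module. This is a small sketch using the five incomes from the example; the variable names are illustrative.

```python
import statistics

# The five incomes from the example above.
incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

mean_income = statistics.mean(incomes)      # sum of values / number of values
median_income = statistics.median(incomes)  # middle value of the sorted data
modal_income = statistics.mode(incomes)     # most frequently occurring value

print(f"Mean:   ${mean_income:,.0f}")    # Mean:   $225,000
print(f"Median: ${median_income:,.0f}")  # Median: $45,000
print(f"Mode:   ${modal_income:,.0f}")   # Mode:   $10,000
```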
The mean is the most commonly used measure of central tendency. Medians are generally used
when a few values are extremely different from the rest of the values (this is called a skewed
distribution). For example, the median income is often a better measure of typical income than
the mean because, while most individuals earn between $0 and $200,000, a handful of individuals
earn millions. In the example above, the single $1,000,000 income pulls the mean up to $225,000,
more than four of the five people earn, while the median stays at $45,000.
Visit the following websites for more information:

• Basic Statistics
• Descriptive Statistics
• Measures of Position
Glossary terms related to measures of central tendency:
• Average
• Central Tendency
• Confidence Interval
• Mean
• Median
• Mode
• Moving Average
• Point Estimate
• Univariate Analysis
Measures of Dispersion

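Measures of dispersion provide information about the spread of a variable's values. Four
statistics are commonly reported here; the first three measure spread, and the fourth, skew,
describes the asymmetry of the distribution: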

• Range
• Variance
• Standard Deviation
• Skew

Range is simply the difference between the smallest and largest values in the data. The
interquartile range is the difference between the values at the 75th percentile and the
25th percentile of the data.
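Variance is the most commonly used measure of dispersion. It is calculated by taking the
average of the squared differences between each value and the mean. This is the population
variance; when the data are a sample used to estimate a population's variance, the sum of
squared differences is usually divided by the number of values minus one instead.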
Standard deviation, another commonly used statistic, is the square root of the variance.
Skew is a measure of whether some values of a variable are extremely different from the
majority of the values. For example, income is skewed because most people make between $0
and $200,000, but a handful of people earn millions. A variable is positively skewed if the
extreme values are higher than the majority of values. A variable is negatively skewed if the
extreme values are lower than the majority of values.
Example:
The incomes of five randomly selected people in the United States are $10,000, $10,000,
$45,000, $60,000, and $1,000,000:
Range = 1,000,000 - 10,000 = 990,000
Variance = [(10,000 - 225,000)² + (10,000 - 225,000)² + (45,000 - 225,000)² + (60,000 -
225,000)² + (1,000,000 - 225,000)²] / 5 = 150,540,000,000
Standard Deviation = Square Root (150,540,000,000) ≈ 387,995
Skew = Income is positively skewed
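
The dispersion figures for the income example can be reproduced with Python's standard `statistics` module. This is a sketch under a few assumptions the handout does not state: the interquartile range uses the module's "inclusive" quantile method (other methods give different quartiles for so few values), and skew is quantified with the moment coefficient of skewness, one of several possible definitions.

```python
import statistics

# The five incomes from the example above.
incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

# Range: largest value minus smallest value.
value_range = max(incomes) - min(incomes)

# Interquartile range (assumption: "inclusive" quantile method).
q1, _, q3 = statistics.quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1

# Population variance and standard deviation (divide by n, as in the example).
variance = statistics.pvariance(incomes)
std_dev = statistics.pstdev(incomes)

# Sample variance (divide by n - 1), shown only for comparison.
sample_variance = statistics.variance(incomes)

# Moment coefficient of skewness: third central moment / std_dev cubed.
mean = statistics.mean(incomes)
third_moment = sum((x - mean) ** 3 for x in incomes) / len(incomes)
skewness = third_moment / std_dev ** 3

print(f"Range:              {value_range:,}")   # 990,000
print(f"Variance:           {variance:,.0f}")   # 150,540,000,000
print(f"Standard deviation: {std_dev:,.0f}")    # 387,995
print(f"Sample variance:    {sample_variance:,.0f}")
print(f"IQR (inclusive):    {iqr:,.0f}")
print(f"Skewness > 0 (positively skewed): {skewness > 0}")
```

A positive skewness value matches the conclusion in the example: the extreme value lies above the majority of values.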
Visit the following websites for more information:
• Descriptive Statistics
• Survey Research Tools
• Variance and Standard Deviation
• Summarizing and Presenting Data
• Skewness
• Skewness Simulation
Glossary terms related to measures of dispersion:
• Confidence Interval
• Distribution
• Kurtosis
• Point Estimate
• Quartiles
• Range
• Skewness
• Standard Deviation
• Univariate Analysis
• Variance

Measures of Association

Measures of association indicate whether two variables are related. Two measures are commonly
used:

• Chi-square
• Correlation

Chi-Square

• As a measure of association between variables, chi-square tests are used on nominal data
(i.e., data that are put into classes: e.g., gender [male, female] and type of job [unskilled,
semi-skilled, skilled]) to determine whether they are associated*
• A chi-square result is called significant when the data provide statistical evidence of an
association between the two variables, and nonsignificant when they do not

To test for associations, a chi-square is calculated in the following way: Suppose a researcher
wants to know whether there is a relationship between gender and two types of jobs, construction
worker and administrative assistant. To perform a chi-square test, the researcher counts up the
number of female administrative assistants, the number of female construction workers, the
number of male administrative assistants, and the number of male construction workers in the
data. These counts are compared with the number that would be expected in each category if
there were no association between job type and gender (the expected count for a category is its
row total multiplied by its column total, divided by the overall total). If there is a large
difference between the observed values and the expected values,
the chi-square test is significant, which indicates there is an association between the two
variables.
*The chi-square test can also be used as a measure of goodness of fit, to test if data from a
sample come from a population with a specific distribution, as an alternative to the
Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. As such, the chi-square test is
not restricted to nominal data; with non-binned data, however, the results depend on how the
bins or classes are created and on the size of the sample.
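
The chi-square calculation described for the gender and job type example can be sketched as follows. The counts are hypothetical, invented purely for illustration. The statistic is the Pearson chi-square without a continuity correction, and 3.841 is the standard critical value for one degree of freedom at the 0.05 significance level.

```python
# Hypothetical observed counts (illustrative only, not from any real survey).
observed = {
    ("female", "administrative assistant"): 40,
    ("female", "construction worker"): 10,
    ("male", "administrative assistant"): 15,
    ("male", "construction worker"): 35,
}

genders = sorted({gender for gender, _ in observed})
jobs = sorted({job for _, job in observed})
total = sum(observed.values())

# Row totals (per gender) and column totals (per job type).
row_totals = {g: sum(observed[(g, j)] for j in jobs) for g in genders}
col_totals = {j: sum(observed[(g, j)] for g in genders) for j in jobs}

# Sum (observed - expected)^2 / expected over all four cells, where the
# expected count assumes no association between gender and job type.
chi_square = 0.0
for g in genders:
    for j in jobs:
        expected = row_totals[g] * col_totals[j] / total
        chi_square += (observed[(g, j)] - expected) ** 2 / expected

# Critical value for a 2x2 table (1 degree of freedom) at the 0.05 level.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
significant = chi_square > CRITICAL_VALUE_DF1_ALPHA_05

print(f"chi-square = {chi_square:.2f}, significant at 0.05: {significant}")
```

With these made-up counts the observed values differ widely from the expected ones, so the statistic far exceeds the critical value and the test is significant.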
Correlation

• A correlation coefficient is used to measure the strength and direction of the linear
relationship between two numeric variables (e.g., weight and height)
• The most common correlation coefficient is Pearson's r, which can range from -1 to +1.
A value of 0 indicates no linear relationship
• If the coefficient is between 0 and 1, as one variable increases, the other also tends to
increase. This is called a positive correlation. For example, height and weight are positively
correlated because taller people usually weigh more
• If the correlation coefficient is between -1 and 0, as one variable increases the other
tends to decrease. This is called a negative correlation. For example, age and hours slept per
night are negatively correlated because older people usually sleep fewer hours per night
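
Pearson's r can be computed directly from its definition: the sum of the products of the two variables' deviations from their means, divided by the square root of the product of their sums of squared deviations. The sketch below uses a hypothetical helper, `pearson_r`, and made-up height, weight, age, and sleep values chosen only to show one positive and one negative correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

# Hypothetical data, invented for illustration.
heights_cm = [150, 160, 165, 172, 180, 188]
weights_kg = [52, 58, 63, 70, 79, 88]
ages_years = [20, 30, 40, 50, 60, 70]
hours_slept = [8.4, 7.9, 7.6, 7.0, 6.8, 6.1]

r_height_weight = pearson_r(heights_cm, weights_kg)
r_age_sleep = pearson_r(ages_years, hours_slept)

print(f"height vs. weight: r = {r_height_weight:.2f}")  # positive
print(f"age vs. sleep:     r = {r_age_sleep:.2f}")      # negative
```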

Visit the following websites for more information:

• Chi-Square Procedures for the Analysis of Categorical Frequency Data
• Chi-square Analysis
• Correlation

Glossary terms related to measures of association:


• Association
• Chi Square
• Correlation
• Correlation Coefficient
• Measures of Association
• Pearson's Correlation Coefficient
• Product Moment Correlation Coefficient
