Assessment 1 - Getting Started With Your Data

Overview: Throughout this course, you will be working toward your final assessment—an
archival data research project. What does that mean? It means you will be using real data,
collected as part of the General Social Survey (GSS), to answer a real question.

To help you complete this final assessment, these earlier assessments will walk you through
the steps. In fact, to provide added assistance (and maybe even a sense of fun), you will
have three fictional research assistants.

Meet your research assistants:

Juanita: She is a junior in college with the intention of being a counselor. She’s more
interested in the results of the study than in the process to get the results.

Duante: Duante is a senior in college who loves algebra but is skeptical about statistics.
However, he wants a job where he can do research and is very interested in what you are
doing.

Amanda: Amanda is a first-year student in college. She’s friendly and likes to talk a lot (and
ask plenty of questions). She hasn’t been through her statistics course yet, but she is great
at staying organized and loves using Excel.

Directions: Complete all six parts of this worksheet.


PART 1: GETTING STARTED

The General Social Survey (GSS) has been studying America, specifically American society, for 50
years. All of their data are available to the public (and to researchers). This makes GSS a great source
for archival data projects. The surveys are long, providing a number of potential variables. For this
project, certain variables have been chosen for you and the data extracted from the GSS site ahead of
time. If you’re curious and want to learn more about GSS, feel free to check out the About the GSS
web page.

Directions: Complete the steps below. These steps will prepare you for later portions of this
assessment as well as for your final project.

Step 1: Choose your variables, one from each column and fill in the white blocks of the table
below.

List A (Choose ONE variable from this list)

RACLIVE. Have other race living in neighborhood. This is a Yes/No question asking respondents
if they live in a neighborhood with people of another race.

NEWS. How often does respondent read newspaper. This question asks respondents to rate how
often they read the newspaper (Likert scale).

WWWHR. Internet hours per week. This question asks respondents to share how many hours in a
week they use the Internet for non-email activities.

List B (Choose ONE variable from this list)

HAPPY. General happiness. This asks the respondents to rate how happy they are (small Likert
scale).

LIFE. If life is exciting or dull. This question asks respondents to rate their life as exciting,
routine, or dull (small Likert scale).

MNTLHLTH. Days of poor mental health past 30 days. This question asks respondents how many
days of poor mental health they’ve had in the past 30 days.

DEPRESS. Told have depression. This is a Yes/No question asking respondents if they have been
told they have depression.

My Variable (from List A): RACLIVE
My Variable (from List B): HAPPY

Note: Once you’ve selected your variables, use them throughout the course. Changing
variables partway through will require you to re-do work you’ve already done.

What’s your preliminary research question? Note: The survey method used to collect
these data was not designed to allow you to determine causation or the effect of one
variable on the other.
Step 2: Download data. Download the "GSS Data 2018" Excel file from the assessment. Make
sure you save this somewhere on your computer (where you can find it again).

Step 3: Clean the data. In this case, Amanda helped you out and, if you look at the Excel file
you downloaded, the data have been cleaned for you. She spent time going through the data
to make sure there were results for both variables you will use and removed those
participants who did not respond to both questions. She also made sure that the data were
in a form that JASP could read accurately (right labels, right file format, et cetera). She’s a bit
of an overachiever, so she cleaned everyone’s data. On the Excel spreadsheet, you will want
to make sure you are on the right tab for your project. To understand what answer the
numbers correspond to, look at the Codes tab.
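
The cleaning step Amanda describes, keeping only respondents who answered both questions, can be sketched in Python. This is a minimal illustration and not Amanda's actual procedure: the column names `raclive` and `happy`, the sample rows, and the use of an empty cell to mark a missing answer are all assumptions.

```python
import csv
import io

# Hypothetical raw extract: an empty cell means the respondent did not answer.
RAW = """id,raclive,happy
1,1,2
2,,1
3,2,
4,2,3
"""

def keep_complete_rows(rows, required):
    """Return only the rows that have a non-empty value in every required column."""
    return [row for row in rows if all(row[col].strip() for col in required)]

rows = list(csv.DictReader(io.StringIO(RAW)))
cleaned = keep_complete_rows(rows, ["raclive", "happy"])

print(len(rows), "rows before cleaning;", len(cleaned), "rows after")
```

Respondents 2 and 3 each skipped one of the two questions, so only respondents 1 and 4 remain in `cleaned`.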

Amanda’s Notes on Naming:


• Raclive and happy: Projects looking at general happiness and whether people live in
neighborhoods with people of another race.
• Raclive and life: Projects looking at if life is exciting/dull and whether people live in
neighborhoods with people of another race.
• Raclive and mntlhlth: Projects looking at days of poor mental health and whether people live
in neighborhoods with people of another race.
• Raclive and depress: Projects looking at if someone has been told they have depression and
whether people live in neighborhoods with people of another race.
• News and happy: Projects looking at amount of time reading the news and general
happiness.
• News and life: Projects looking at amount of time reading the news and if life is
exciting/dull.
• News and mntlhlth: Projects looking at amount of time reading the news and days of poor
mental health.
• News and depress: Projects looking at amount of time reading the news and if someone has
been told they have depression.
• Wwwhr and happy: Projects looking at number of hours on the Internet and general
happiness.
• Wwwhr and life: Projects looking at number of hours on the Internet and if life is
exciting/dull.
• Wwwhr and mntlhlth: Projects looking at number of hours on the Internet and days of poor
mental health.
• Wwwhr and depress: Projects looking at number of hours on the Internet and if someone has
been told they have depression.
Step 4: Save your .csv file. Duante didn’t want to be outdone by Amanda, so he created .csv
files of everyone’s data project. To find them, open the data files in the assessment and locate
the data for your project. He used the same naming convention as Amanda did above. Make
sure you save these files somewhere on your computer (where you can find them again).
PART 2: UNDERSTANDING MEASURES OF CENTRAL TENDENCY

Amanda is confused by the measures of central tendency and has come to you asking questions.

Directions: Answer all questions in the table.

Scoring Criterion: Describe key statistical concepts.

What types of variables can you compute a useful mean for? Select as many options as apply.
Nominal
Ordinal
✘ Interval
✘ Ratio

When would a mean be useful?
The mean is useful when the data are numeric (interval or ratio) and roughly symmetrically
distributed. It gives a single value that summarizes the whole set of scores, it becomes more
stable as the sample size grows, and it allows simple, direct comparisons between groups.
However, beware of skewed distributions and outliers: they pull the mean toward the extreme
values, so it no longer represents a typical score. In those cases the median or mode is the
better choice.

Now what about the mode - what types of variables can you find a mode for? Select as many
options as apply.
✘ Nominal
✘ Ordinal
Interval
Ratio

When is a mode useful?
The mode is useful because it identifies the most common category or answer. It works for
nominal and ordinal data, where the mean (and, for nominal data, the median) cannot be
computed. It is also helpful when a simple, easy-to-grasp summary is needed, since it is easy to
find and easy to show in a chart. For skewed data sets, the mode can also serve as a robust
indicator of where the scores cluster.

Then the median - what types of variables can you find the median for? Select as many
options as apply.
Nominal
✘ Ordinal
✘ Interval
✘ Ratio

And when would you use the median (and not the mean or mode)?
The median is useful when you need a measure of central tendency that does not change much
when extreme values or outliers are present. It is well suited to skewed distributions, where
extreme values would pull the mean up or down. The median is also appropriate for ordinal
data, which can be ranked but do not have equal numeric intervals. When a data set is skewed
or has extreme outliers, the median gives a more representative central value than the mean.

What does the variance tell us?
Variance describes how spread out a data set is around its mean. It is the average of the
squared deviations of each data point from the mean. High variance means the data points are
widely spread out from the mean, while low variance means they cluster closely around it.
This is important for understanding sample and population distributions, which in turn helps
in making inferences and drawing conclusions.

Why is the variance important?
Variance is important because it quantifies how far the data points are spread from the mean,
which reveals the characteristics of the distribution and supports risk assessment. It lets us
assess the variability within a data set, compare different groups, and evaluate how well a
statistical model performs. For instance, in finance, economics, and research, variance captures
the degree of uncertainty or volatility in the data, which guides decisions.

Consider the demographic data on our participants. Where might you use measures of central
tendency?
Measures of central tendency (mean, median, and mode) are useful for summarizing the
demographic information on our participants. They give a typical value for age, income,
education level, or any other demographic characteristic. We could use the mean to give an
average for continuous variables such as age or income, while the median is the better choice
if these are skewed or have outliers. The mode identifies the most common category for
categorical variables such as sex and race. Together, these measures give a general picture of
the central values and show where further interpretation and analysis are needed.
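
The effect of an outlier on each measure can be checked with Python's standard `statistics` module. This is a minimal sketch on made-up ages, not GSS data; the helper name `summarize` is hypothetical, and `statistics.variance` computes the sample variance (it divides by n - 1).

```python
import statistics

# Hypothetical ages; the second list adds one extreme value (an outlier).
ages = [22, 25, 25, 30, 33]
ages_with_outlier = ages + [95]

def summarize(values):
    """Return the mean, median, mode, and sample variance of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
    }

before = summarize(ages)
after = summarize(ages_with_outlier)

# Print each measure without and with the outlier, side by side.
for name in ("mean", "median", "mode", "variance"):
    print(f"{name:>8}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

Adding the single value 95 moves the mean from 27 to above 38 and inflates the variance, while the median shifts only from 25 to 27.5 and the mode stays at 25.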
PART 3: YOUR DATA

You’ve sent Amanda off to think more about measures of central tendency. Working with Duante,
you start to consider your variables. He wants to know what the data type for each available variable
is.

Directions: Fill in the table below.

Scoring Criterion: Determine data type.

What data type is our data (for JASP)? Note: JASP offers you three options: Nominal,
Ordinal, Continuous. So, if the data is ratio or interval, call it continuous for the purpose of
this table.

Variable    Type          Variable    Type          Variable    Type
AGE         Continuous    RACE        Nominal       SEX         Nominal
RACLIV      Nominal       NEWS        Ordinal       WWWHR       Continuous
HAPPY       Ordinal       LIFE        Ordinal       MNTLHLTH    Continuous
DEPRESS     Nominal

NOTES:

Age: Respondents entered their exact age

Race: Respondents had a choice of: White, Black, or Other (The survey uses limited options,
which is a limitation for modern studies but useful when comparing modern data with
historical data.)

Sex: Respondents had a choice of: Male, Female (Based on biological sex at birth; this is not a
variable looking at gender.)

Racliv: Respondents were given a Yes/No question on whether they lived in a neighborhood
with other races.

News: Respondents were given a version of a Likert scale.

Wwwhr: Respondents entered the number of hours they are on the Internet each week.

Happy: Respondents were given a version of a Likert scale.

Life: Respondents were given a version of a Likert scale.


Mntlhlth: Respondents entered the number of days of poor mental health.

Depress: Respondents were given a Yes/No question on whether they had been told they had
depression.
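
The rule stated in the note for this table, that interval and ratio data are both entered as Continuous in JASP, can be written as a small lookup. This is only a sketch: the measurement levels assigned to the five example variables are assumptions drawn from the notes above, and the function name `jasp_type` is made up for illustration.

```python
# Measurement level of a few variables, inferred from the notes above (assumption).
MEASUREMENT_LEVELS = {
    "AGE": "ratio",      # exact age entered as a number
    "SEX": "nominal",    # two unordered categories
    "RACE": "nominal",   # three unordered categories
    "HAPPY": "ordinal",  # Likert-style rating
    "WWWHR": "ratio",    # hours per week entered as a number
}

# JASP offers three data types; interval and ratio data are both set to Continuous.
JASP_TYPE_BY_LEVEL = {
    "nominal": "Nominal",
    "ordinal": "Ordinal",
    "interval": "Continuous",
    "ratio": "Continuous",
}

def jasp_type(variable):
    """Return the JASP data type to select for a variable."""
    return JASP_TYPE_BY_LEVEL[MEASUREMENT_LEVELS[variable]]

for name in MEASUREMENT_LEVELS:
    print(f"{name:<6} {jasp_type(name)}")
```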

In JASP, select the three blue bars, then select open, then select the location you saved the
.csv file.

JASP tries to guess each variable’s data type and will use the symbols below.

You and Duante look at JASP’s guesses.

• If JASP got it wrong, correct it.

• Click on the appropriate variable data icon in the column title to change it to the
correct format.

Paste a screenshot of your data column in JASP below.


PART 4: DATA ON YOUR PARTICIPANTS

Continuing to work with Duante, you decide to learn more about your participants.

Duante has done some research into GSS and provides you with the following information:

The General Social Survey uses random sampling to get a representative sample of adults
across the United States (NORC, 2019).

Reference:

National Opinion Research Center (NORC). (2019). Appendix A: Sampling design and
weighting. In General Social Surveys 1972–2018: Cumulative Codebook (pp. 3171–3189).
https://gss.norc.org/documents/codebook/GSS_Codebook_AppendixA.pdf

Before you start looking at your participants, Juanita has a few questions for you.

Directions: Answer the questions below.

Scoring Criterion: Explain the use of a mean with different types of variables.

Can you use a mean for the variable Sex? Why or why not?
Sex is a categorical variable with two groups (male and female), so it is nominal data.
Therefore, a mean is not appropriate for Sex. Calculating a mean requires numeric values with
a meaningful order and magnitude, which categorical variables like Sex do not have. Instead,
percentages or proportions are used to describe the distribution of categories within the
data set.

Can you use a mean for the variable Race? Why or why not?
No, you cannot use a mean for Race, because Race is a categorical variable that represents
different groups of individuals based on their racial or ethnic identity. Nominal variables
cannot be used to calculate a mean because their categories have no numerical order or
magnitude. The codes assigned to the Race categories are labels, not quantities. So categorical
variables like Race are usually analyzed using frequencies, proportions, or percentages, which
show how the data are distributed across the groups.

Can you use a mean for the variable Age? Why or why not?
Yes, you can use a mean for Age. Age takes numeric values on a continuous (ratio) scale.
Such variables have meaningful numerical properties, such as magnitude and order, which make
calculating a mean appropriate. The mean gives a measure of central tendency that shows the
average age of all individuals in the data set. An average is therefore justified for Age
because it sums up the central trend of ages within the sample.
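
The contrast between a categorical variable and a continuous one can be illustrated in Python. This is a minimal sketch on invented responses: the coding 1 = Male, 2 = Female is an assumption (the real codes are on the Codes tab), and the helper `proportions` is hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical coded responses. The codes for sex are labels, not quantities;
# the mapping 1 = Male, 2 = Female is an assumption for this sketch.
SEX_LABELS = {1: "Male", 2: "Female"}
sex_codes = [1, 2, 2, 1, 2]
ages = [19, 34, 52, 41, 29]

def proportions(codes, labels):
    """Return the share of respondents in each category, keyed by label."""
    counts = Counter(codes)
    total = len(codes)
    return {labels[code]: counts[code] / total for code in labels}

sex_share = proportions(sex_codes, SEX_LABELS)
mean_age = statistics.mean(ages)

print("Share by sex:", sex_share)
print("Mean age:", mean_age)
```

Calling `statistics.mean(sex_codes)` would run without error, but the result would depend entirely on the arbitrary codes, which is why proportions are reported for Sex and a mean for Age.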

Now that Juanita has a better understanding of the mean, the two of you start looking at your
demographic data.

• Open JASP.

• Then open your .csv data file for your project. You can do this by clicking the three
blue bars, selecting Open, then select Computer, then choose your file from
wherever you saved it on your computer.

• In JASP, in your data file, select Descriptives.

• Select age, then select the

Copy and paste the resulting table below.


PART 5: GRAPHIC DISPLAY OF YOUR PARTICIPANTS

Juanita likes to process information visually and asks if you can create graphs or charts of the data.

Directions: Create a bar graph, pie chart, and frequency table by following the directions
below.

Scoring Criterion: Communicate statistical data in graphs and tables.

In JASP, in your data file, select Descriptives.

• Select sex, then select the

• Select race then select the

• Click Basic Plots, then put a check next to Distribution plots.

Copy and paste or take a screenshot of the Bar Graph for Race. Place it below.

• Next, Click Basic Plots, then put a check next to Pie Charts.

Copy and paste or take a screenshot of the Pie Chart for Sex. Place it below.

• Lastly, click Tables, then put a check next to Frequency Tables.


Copy and paste or take a screenshot of the Frequency Table for Race. Place it below.
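
For readers who want to check a frequency table by hand, the same counts and percentages can be reproduced in Python. This is a rough sketch on invented responses, not JASP's own output or method; the three categories follow the notes on the Race variable, and the function name `frequency_table` is hypothetical.

```python
from collections import Counter

# Hypothetical responses; the three categories follow the notes on the Race variable.
race = ["White", "Black", "White", "Other", "White", "Black", "White", "White"]

def frequency_table(values):
    """Return (category, count, percent, cumulative percent) rows, most frequent first."""
    total = len(values)
    rows = []
    cumulative = 0.0
    for category, count in Counter(values).most_common():
        percent = 100 * count / total
        cumulative += percent
        rows.append((category, count, percent, cumulative))
    return rows

table = frequency_table(race)

# Print the table with aligned columns.
print(f"{'Race':<8}{'Frequency':>10}{'Percent':>10}{'Cumulative':>12}")
for category, count, percent, cumulative in table:
    print(f"{category:<8}{count:>10}{percent:>10.1f}{cumulative:>12.1f}")
```
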
PART 6: LOOKING TO THE FUTURE

As you look over the participants, Amanda, Juanita, and Duante discuss what careers they want in
the future. They ask you what your thoughts are and whether you’d ever consider a job that involves
data analysis.

Directions: Answer all of the questions below.

Scoring Criterion: Discover career contingencies based on accurate self-assessment of
abilities, achievement, motivation, and work habits.

Step 1: Statistics and data analysis are marketable job skills. Search the Internet for jobs you
could apply for with a bachelor’s degree that require the use of statistics. Some good, key
search terms: psychology research assistant or survey data analysis.

Step 2: Answer the following questions.

What is the job title?
Data Analyst

Provide a link to the job ad.
https://www.indeed.com/jobs?q=data+analyst&l=New+York%2C+NY&from=searchOnHP&vjk=afe2fba3b2bf2362

What are the educational requirements?
A bachelor’s degree in statistics, mathematics, computer science, or a related program is
commonly required. Some positions may require a master’s degree or professional
certifications in areas such as data analysis and statistics, as this is a highly competitive
field.

What other job requirements stand out to you?

Proficiency in statistical software, for example R, Python, or SPSS.

Strong analytical and troubleshooting skills.

Strong communication skills, in order to convey complex data in a way that non-technical
stakeholders can understand (for example, in a webinar).

Data visualization techniques and software integration.

Attention to detail and accuracy in data analysis.

Knowledge of database query languages (for example, SQL). Where this is required, previous
experience may be welcomed, and familiarity with these languages is likely to remain an
important criterion for job seekers.

Does the job sound interesting to you? Why or why not?
Yes, I would be an enthusiastic candidate for such a position. To me, this job is an exciting
opportunity to work with data and make discoveries that can drive business decisions or form
the foundation of research. Using statistics, data visualization, and problem solving appeals
to my analytical side. The job also matches my skills and background. Another compelling
aspect is applying my skills to support development projects and using a data-informed
approach to initiate change.

Reference

Goss-Sampson, M. (2019). Statistical analysis in JASP: A guide for students.
