MODULE 2: DATA COLLECTION AND PRESENTATION

Learning Outcomes:
1. Discuss the different methods used to collect data.
2. Choose an appropriate type of data representation to present data effectively.

Data Collection

Data collection is the process of gathering and measuring information on variables of interest, in
an established systematic fashion that enables one to answer stated research questions, test
hypotheses, and evaluate outcomes.

The most critical objective of data collection is ensuring that information-rich and reliable
data is collected for statistical analysis so that data-driven decisions can be made for research.
Inaccurate data collection can impact the results of a study and ultimately lead to invalid results.

Four important points to consider when collecting data:


1. If measurements of some characteristic (such as IQ) are being obtained from people, better
results will be achieved if the researcher does the measuring instead of asking the respondent
for the value.
2. The method of data collection used may expedite or delay the process. Avoid a medium that
would produce low response rates.
3. Ensure that the sample is large enough for the required purpose.
4. Ensure that the method used to collect data results in a sample that is representative of the
population.

Data can be collected in a variety of ways. Observation, interviews, questionnaires,
experimentation, and registration are some of the most popular methods of collecting data.

Property of and for the exclusive use of SLU. Reproduction, storing in a retrieval system, distributing, uploading or posting online, or
transmitting in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise of any part of this document,
without the prior written permission of SLU, is strictly prohibited.
Observation is a way of collecting data through observing. The observation data collection
method is classified as a participatory study because the researcher has to immerse herself in the
setting where her respondents are while taking notes and/or recording.

Observation as a data collection method can be structured or unstructured. In structured or
systematic observation, data collection is conducted using specific variables and according to a
pre-defined schedule. On the other hand, unstructured observation is conducted in an open and
free manner in the sense that there would be no pre-determined variables or objectives.
It is important to note that the observation data collection method may be associated with
certain ethical issues. Fully informed consent of research participant(s) is one of the basic ethical
considerations to be adhered to by researchers. At the same time, the behavior of sample group
members may change with negative implications on the level of research validity if they are notified
about the presence of the observer.
This delicate matter needs to be addressed by consulting with the dissertation supervisor
and commencing the primary data collection by observation only after the supervisor has
approved the ethical aspects of the issue.

Advantages of observation
1. direct access to research phenomena,
2. high levels of flexibility in terms of application, and
3. generating a permanent record of phenomena to be referred to later.

Disadvantages of observation
1. longer time requirements,
2. high levels of observer bias, and
3. impact of observer on primary data, in a way that the presence of an observer may
influence the behavior of sample group elements.

An interview is generally a qualitative research technique that involves asking open-ended
questions to converse with respondents and elicit data about a subject (QuestionPro, 2020).

The interviewer, in most cases, is the subject matter expert who intends to understand
respondent opinions through a well-planned and executed series of questions and answers.
Interviews are conducted with a sample from a population, and their key characteristic is a
conversational tone.
Interviews offer the researchers a platform to prompt their participants and obtain inputs in
the desired detail. There are three fundamental types of interviews in research: structured interviews,
semi-structured interviews, and unstructured interviews.

Structured interviews are defined as research tools that are extremely rigid in their operation
and allow very little or no scope for prompting the participants to obtain and analyze results. A
structured interview is thus also known as a standardized interview and is significantly
quantitative in its approach. Questions in this interview are pre-decided according to the required
detail of information. They can be closed-ended as well as open-ended, according to the type of
target population. Closed-ended questions can be included to understand user preferences from a
collection of answer options. In contrast, open-ended questions can be included to gain details
about a particular section in the interview.

Advantages of structured interviews


1. Structured interviews focus on the accuracy of different responses, due to which
highly organized data can be collected. Different respondents have different answers
to the same structure of questions – answers obtained can be collectively analyzed.
2. They can be used to get in touch with a large sample of the target population.
3. The interview procedure is made easy due to the standardization offered by
structured interviews.
4. Replication across multiple samples becomes easy due to the same structure of the
interview.

5. As the scope of detail is already considered while designing the interview, better
information can be obtained. The researcher can analyze the research problem
comprehensively by asking accurate research questions.
6. Since the interview structure is fixed, it often generates reliable results and is quick to
execute.
7. The relationship between the researcher and the respondent is not formal. The
researcher can clearly understand the margin of error if the respondent either
declines to be a part of the survey or is just not interested in providing the correct
information.

Disadvantages of structured interviews


1. Limited scope of assessment of obtained results.
2. The accuracy of information overpowers the detail of information.
3. Respondents are forced to select from the provided answer options.
4. The researcher is expected to always adhere to the list of decided questions
irrespective of how interesting the conversation is turning out to be with the
participants.
5. A significant amount of time is required for a structured interview.

A semi-structured interview offers considerable leeway to the researcher to probe the
respondents while maintaining a basic interview structure. Even though it is a guided conversation
between researchers and interviewees, the researchers are given appreciable flexibility. Because of
the structure in this type of research interview, a researcher can be assured that multiple
interview rounds will not be required. Keeping the structure in mind, the researcher can follow any
idea or take creative advantage of the entire interview. Additional respondent probing is always
necessary to garner information for a research study. A semi-structured interview is best applied
when the researcher has limited time to conduct research but requires detailed information about
the topic.

Advantages of semi-structured interviews:
1. Questions of semi-structured interviews are prepared before the scheduled interview,
allowing the researcher to prepare and analyze the questions.
2. It is flexible to an extent while maintaining the research guidelines.
3. Researchers can express the interview questions in the format they prefer, unlike the
structured interview.
4. Reliable qualitative data can be collected via these interviews.
5. Flexible structure of the interview.

Disadvantages of semi-structured interviews:


1. Participants may question the reliability factor of these interviews due to the
flexibility offered.
2. Comparing two different answers becomes difficult as the guideline for conducting
interviews is not entirely followed. No two questions will have the exact same
structure, and the result will be an inability to compare or infer results.

Unstructured interviews are also called in-depth interviews. These interviews have the fewest
questions, as they lean more towards a normal conversation but with an underlying subject. There
are no guidelines for the researchers to follow, so they can approach the participants in any
ethical manner to gain as much information as possible for their research topic. Since there are no
guidelines for these interviews, a researcher is expected to keep their approach in check so that
the respondents do not stray from the main research motive. For a researcher to obtain the desired
outcome, he/she must keep the following factors in mind:
• Intent of the interview.
• The interview should primarily take into consideration the participant’s interests and skills.
• All the conversations should be conducted within permissible limits of research and the
researcher should try to stick to these limits.
• The skills and knowledge of the researcher should match the purpose of the interview.
• Researchers should understand the do’s and don’ts of unstructured interviews.

Advantages of Unstructured Interviews:
1. Due to the informal nature of unstructured interviews – it becomes extremely easy
for researchers to try and develop a friendly rapport with the participants. This leads
to gaining insights in extreme detail without much conscious effort.
2. The participants can clarify all their doubts about the questions and the researcher
can take each opportunity to explain his/her intention for better answers.
3. There are no questions which the researcher has to abide by and this usually
increases the flexibility of the entire research process.

Disadvantages of Unstructured Interviews:


1. As there is no structure to the interview process, researchers take time to execute
these interviews.
2. The absence of a standardized set of questions and guidelines indicates that the
reliability of unstructured interviews is questionable.
3. In many cases, the ethics involved in these interviews are considered borderline
upsetting.

There are three methods to conduct research interviews, each of which is peculiar in its
application and can be used according to the research study requirement.

A personal interview, also called a face-to-face interview, is utilized when a specific target
population is involved. The purpose of conducting a personal interview survey is to explore the
people's responses to gather more and deeper information.

Personal interviews are one of the most used types of interviews, where the questions are
asked directly to the respondent in person. For this, a researcher can use a guided online survey
to take note of the answers. A researcher can design his/her survey so that notes can be taken of
the interviewee's comments or points of view that stand out.

Advantages of the personal interview:
1. Higher response rate.
2. When the interviewers and respondents are face-to-face, there is a way to adapt the
questions if they are not understood.
3. More complete answers can be obtained if there is doubt on either side or if particularly
remarkable information is detected.
4. The researcher has an opportunity to detect and analyze the interviewee’s body
language at the time of asking the questions and to take notes about it.

Disadvantages of the personal interview:


1. They are time-consuming and extremely expensive.
2. They can generate distrust on the interviewee's part, since they may be self-
conscious and not answer truthfully.
3. Contacting the interviewees can be a real headache, whether scheduling an
appointment in workplaces or going from house to house and not finding anyone.

For this reason, many interviews are conducted in public places, such as shopping centers
or parks. Some consumer studies take advantage of these sites to conduct interviews or surveys
and to give incentives, gifts, or coupons; shopping centers offer great opportunities for online
research.

An advantage of conducting interviews in such settings is that respondents have fresher
information when the interview is conducted in context and with the appropriate stimuli, so
researchers can obtain data about their experience at the scene, immediately and first-hand. The
interviewer can use an online survey on a mobile device to facilitate the entire process.

A telephone interview is when the interviewer communicates with the respondent by telephone
in accordance with a prepared questionnaire. Usually, standardized questionnaires with
closed-ended questions are recommended for this kind of questioning.

The telephone interview is a quantitative research tool used in public opinion, customer, or
other target group surveys.

Advantages of Telephone interview:


1. To find the interviewees, it is enough to have their telephone numbers on hand.
2. They are usually lower cost.
3. The information is collected quickly.
4. Having a personal contact can also clarify doubts or give more details of the
questions.

Disadvantages of Telephone interview:


1. Researchers often find that people do not answer phone calls because the number is
unknown to them, or that respondents have changed their place of residence and
cannot be located, which causes a bias in the interview.
2. Some people simply do not want to answer and resort to pretexts: they are busy, they
are sick, they do not have the authority to answer the questions asked, they have no
interest in answering, or they are afraid of putting their security at risk.

One aspect that requires care in these types of interviews is the kindness with which the
interviewers address the respondents, so that they cooperate more easily. Good communication
is vital for obtaining better answers.

Computer-Assisted Personal Interviewing (CAPI) is a face-to-face data collection method in
which the interviewer uses a tablet, mobile phone, or a computer to record answers given during
the interview (Beam, 2019).

The primary purpose of CAPI is to conduct large-scale continuous surveys for the
commercial sector and government. CAPI replaces traditional paper questionnaires while
retaining the face-to-face format, which has had a considerable effect on the quality of data.

Advantages of CAPI:
1. Time – Because CAPI is purely electronic, it avoids the time-consuming step of
transferring paper questionnaire responses to a computer. CAPI software systems also
provide data entry, checking, and exportation all in one place.
2. Exposure – The program can be deployed on the internet, potentially attracting
a global audience.
3. Cost – With CAPI you can store data online and offline, eliminating any printing and
data-entry costs.
4. Accurate results – CAPI software systems provide analysis of results in real-time,
which are easily exportable to Excel or CSV, reducing the chance of human error.

Disadvantages of CAPI:
1. Due to the effectiveness of this market research tool, there may be additional time
spent on preparation e.g. programming and procurement.
2. Practicalities such as technical difficulties, internet access, and accessibility could
affect the development of research.

A questionnaire is a set of standardized questions, often called items, which follow a fixed
scheme to collect individual data about one or more specific topics (Reddy, 2019).

A questionnaire provides the quickest and simplest technique for gathering data about groups
of individuals scattered over a wide area. In this method, a questionnaire form is sent,
usually by post, to the persons concerned, with a request to answer the questions and return the
questionnaire.

Paper-pencil questionnaires can be sent to a large number of people and save the
researcher time and money. People are more truthful while responding to the questionnaires
regarding controversial issues because their responses are anonymous. But they also have
drawbacks. The majority of the people who receive questionnaires do not return them, and those
who do might not be representative of the originally selected sample.

Web-based questionnaires are a newer and growing methodology for Internet-based research.
Typically, a respondent receives an e-mail with a link to a secure website where the questionnaire
is filled in. This type of research is often quicker and less detailed. Some disadvantages of this
method include the exclusion of people who do not have or cannot access a computer. Also, the
validity of such surveys is in question, as people might be in a hurry to complete them and so
might not give accurate responses.

Advantages of Questionnaire:
1. Questionnaires are inexpensive when appropriately handled. They can be cheaper
than taking surveys, which requires a lot of time and money.
2. It is an effective method to get an opinion from a large number of people.
3. Unlike face-to-face surveys where the respondent has to answer within that moment
itself, questionnaires give time to the respondents to think carefully, before giving
the answers.
4. They are easy to administer and manage.

5. Questionnaires allow people to answer questions when they feel it is convenient.
Thus, it is more applicable than face-to-face surveys where people are expected to
reply to the question immediately.
6. If anonymous, more honest answers can be expected from the people being
surveyed.
7. They can be used to get answers from a large group of people in a short space of time.

Disadvantages of Questionnaire:
1. The results for questionnaires are based only on the type of question being asked. If
the questions are poorly worded or are biased, then the result analyzed will also be
of the same nature.
2. The response rate may be poor in questionnaires if people do not have time or do
not feel any importance in answering them. This is one of the main disadvantages of
questionnaires.
3. Open-ended questions may take a long time and will produce a large amount of
data that will take time to analyze.
4. If there are any doubts about the answers, the analyst cannot trace them back to the
respondents since most questionnaires are anonymous.
5. Questionnaires can also give the respondents freedom to lie, resulting in vague
answers or opinions distant from the main issue.
6. Questionnaires do not explain the questions to the respondents, which might lead to
misinterpreted answers and facts.
7. If ambiguous language is used, respondents may find the questions confusing
to answer.

Experimentation is a controlled study in which the researcher attempts to understand cause-and-
effect relationships.

The study is "controlled" in the sense that the researcher controls (1) how subjects are
assigned to groups and (2) which treatments each group receives. It involves manipulating one
variable to determine whether changes in that variable cause changes in another variable. The
variables that are manipulated are referred to as independent variables, while the variables that
change as a result of the manipulation are dependent variables.

Examples: Medical technologists would like to know the effect of a new brand of vitamins on
toddlers' growth. The new brand will be taken by a set of toddlers, while another set will be given
the existing brand. The growth of toddlers will then be compared to determine which vitamins are
better.
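
The assignment step in an experiment like this can be sketched in R. This is only a minimal illustration under stated assumptions: the toddler IDs, the group size of 20, and the use of random assignment are hypothetical choices made here, not details of any actual study.

```r
# Hypothetical setup: 20 toddlers identified by ID (no real data).
set.seed(1)                       # fixed seed so the assignment is reproducible
toddler_id <- 1:20

# Randomly assign half of the toddlers to each brand (the independent variable).
group <- sample(rep(c("new brand", "existing brand"), each = 10))
assignment <- data.frame(toddler_id, group)

# Growth (the dependent variable) would be measured later and compared
# between the two groups.
table(assignment$group)
```

Random assignment is one common way for the researcher to control how subjects are placed in groups, since it keeps the choice of group independent of the toddlers' characteristics.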

Advantages of Experimentation:
1. The biggest advantage of the experimental method is its unique ability to isolate
causal factors since an experiment is highly controlled.
2. This method promises more accuracy in the study.
3. Reliable data can be collected.
4. This is more suitable for the problem with heterogeneous (varied) influencing factors.

Disadvantages of Experimentation:
1. The disadvantage is that this control may distort the validity of the obtained results,
especially the ecological validity.
2. This is a very costly method.
3. This is suitable for simple problems with limited scope.
4. This is a time-consuming method.

Registration refers to the continuous, permanent, compulsory recording of the occurrence of vital
events together with certain identifying or descriptive characteristics concerning them. It is the
gathering of information enforced by law.

Examples include the number of registered professionals recorded by the Professional
Regulation Commission (PRC) and the numbers of births and deaths registered with the National
Statistics Office (NSO).

Advantages of Registration:
1. This method is the most reliable since laws enforce it. This method promises more
accuracy in the study.

Disadvantages of Registration:
1. Data are limited to what is listed in the document.

Data Presentation

Data Presentation or Visualization refers to displaying data in an attractive and useful way
so that it can be easily interpreted.

Gathered data provide only a partial picture of reality. Regardless of the use they were
intended to serve, one must always consider what information the data are conveying and what
must be done to include more useful information. Since most data are available in a raw format,
they must be summarized and organized to derive such useful information from them. Furthermore,
each data set needs to be presented in a certain way depending on its use. Planning how the data
will be presented is essential before appropriately processing raw data.
Data Visualization is a term to describe the use of graphical displays to summarize and
present information about a data set. Data become more comprehensible and more useful when
they are organized and presented using graphs, frequency distribution tables, charts, diagrams, and
the like to derive logical solutions and conclusions.

Data Patterns in Graphs
The data patterns are commonly described in terms of the center, spread, shape, and other
unusual features.
Center. The point in a graphic display where about half of the observations are on either
side.
Spread. This refers to the variability of the data. If the observations cover a wide range, then
the spread is larger. On the other hand, the spread is smaller when the observations are clustered
around a single value.
Shape. It is described by the following characteristics:
• Symmetry. The graph can be divided at the center so that each half is a mirror image of
the other.
• Number of peaks. A distribution with one peak is referred to as unimodal, while a
distribution with two peaks is bimodal.
• Skewness. Some distributions have more observations on one side of the graph
than the other. A distribution with fewer observations on the right (toward higher
values) is said to be skewed to the right. On the other hand, a distribution with fewer
observations on the left (toward lower values) is said to be skewed to the left.
• Uniform. Data distribution is equally spread across the range of the distribution.
Unusual features.
• Gaps. Areas of a distribution where there are no observations.
• Outliers. The distribution of data is sometimes characterized by extreme values that
greatly differ from the other observations.
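
The center, spread, and outliers described above can be computed numerically as well as read from a graph. The following is a minimal R sketch using hypothetical values; the data and the 1.5 × IQR rule for flagging outliers are assumptions made for illustration, since the module does not prescribe a specific rule.

```r
# Hypothetical quantitative sample with one unusually large value (no real data).
x <- c(12, 15, 15, 16, 17, 18, 18, 19, 20, 45)

center <- median(x)            # about half of the observations lie on either side
spread <- range(x)             # smallest and largest observations
iqr    <- IQR(x)               # spread of the middle half of the data

# Flag possible outliers with the common 1.5 * IQR rule (an assumption here).
q <- quantile(x, c(0.25, 0.75))
outliers <- x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]

center
spread
outliers
```

For this sample the median is 17.5, and 45 is the only value flagged as a possible outlier. It also pulls the distribution toward higher values, which corresponds to the right skew described above.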

Summarizing Qualitative and Quantitative Data for a Single Variable


Data obtained from a single variable can be summarized and presented in many ways. A
frequency distribution table, a bar chart and a pie chart can be used to present qualitative data.
Quantitative data, on the other hand, can be summarized using a dot plot, a stem-and-leaf display,
a frequency distribution table, and a histogram. Let us look at each of these methods more closely.
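
As a quick orientation, each of the displays named above has a counterpart in base R. The sketch below uses hypothetical data (the category labels and scores are invented for illustration) and only base R functions.

```r
# Hypothetical data (not the module's data set).
brand  <- c("A", "B", "A", "C", "A", "B", "C", "A")   # qualitative variable
scores <- c(52, 61, 64, 67, 70, 73, 75, 78, 81, 94)   # quantitative variable

# Qualitative data: frequency table, bar chart, pie chart.
counts <- table(brand)
barplot(counts, main = "Bar chart")
pie(counts, main = "Pie chart")

# Quantitative data: stem-and-leaf display, dot plot, histogram.
stem(scores)                                           # printed to the console
stripchart(scores, method = "stack", pch = 16, main = "Dot plot")
h <- hist(scores, main = "Histogram")
```

In RStudio the plots appear in the Plots pane, while the frequency table and the stem-and-leaf display are printed in the Console.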

FREQUENCY DISTRIBUTION TABLE (FDT). A frequency distribution is a table that shows how
often each value (or set of values) of the variable in question occurs in a data set. It is used to
summarize categorical (qualitative) or numerical (quantitative) data. Simply put, it is a tabular
summary of data showing the number or frequency of observations in each of several non-
overlapping categories or classes.
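The relative frequency of a class equals the fraction or proportion of the observations
belonging to a class or category. Thus, the relative frequency can be computed using

\[ \text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n} \]

where n is the total number of observations.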

A relative frequency distribution gives a tabular summary of data showing the relative
frequency for each class. If the relative frequency is multiplied by 100, we get the percent frequency
of a class. A percent frequency distribution summarizes the percent frequency of the data for each
class.

Example 6. The raw data in the table show fifty soft drink purchases. Notice that little
information can be drawn from the data in its current form, so it is best to consider other ways
to present the data. Let us construct a frequency distribution table for the sample.

The frequency distribution table for this data set can be constructed manually or by using the
PivotTable feature of Microsoft Excel. With some editing, below are the frequency, relative
frequency, and percent frequency tables generated:

RSTUDIO

Using RStudio, the task can be completed by running the following R code in the Console window.
We will use the “purchase.csv” file in our working directory.
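
The module's own listing is not reproduced in this text, so the following is only a minimal sketch of how such a frequency table might be built with base R. The column name `Brand` and the six values are hypothetical, and a temporary file stands in for “purchase.csv” so that the sketch runs on its own.

```r
# Hypothetical stand-in for "purchase.csv": one column, one row per purchase.
# The column name "Brand" and the values are assumptions for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Brand", "Cola", "Lemon", "Cola", "Orange", "Cola", "Lemon"), csv_path)

# With the real file in the working directory, this would be
# read.csv("purchase.csv") instead.
purchase <- read.csv(csv_path)

# Count how often each category occurs and tidy the result into a table.
freq <- table(purchase$Brand)
freq_table <- as.data.frame(freq, stringsAsFactors = FALSE)
names(freq_table) <- c("Brand", "Frequency")
freq_table
```

The frequencies in the last column add up to the number of rows read from the file, which is a quick check that no purchase was dropped.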

The same R code or script can also be written in the Source window or pane if you want to
keep a copy of the scripts you write in RStudio. First, we create a new R script file by clicking on the
File menu, then click on New File and select R Script. The same result can be obtained by using the
hot keys Ctrl+Shift+N.
Write the R code in the Source window. You should have something similar to the
figure below.

R script for the frequency distribution table for the soft drink purchase data.

Save the R script file. R script files are named with an .R extension. Click on the save icon in
the Source window and browse to your set working directory. Name the file purchase.R.
After saving the file, execute the script by highlighting all the lines in the Source window and then
clicking on the ‘Run’ icon on the upper right part of the Source window. As an alternative to the
‘Run’ icon, you can press Ctrl+Enter to run the script. Take note of this.
For the relative frequency table, we can run the following R script.
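
As a hedged illustration (not the module's original script, which is not reproduced here), relative and percent frequencies can be derived from the counts with base R's `prop.table()`. The brand labels and values are hypothetical and are typed in directly so that the sketch runs on its own.

```r
# Hypothetical purchase data (made-up brand labels)
brand <- c("Brand A", "Brand B", "Brand A", "Brand C", "Brand A",
           "Brand B", "Brand C", "Brand A", "Brand B", "Brand A")

freq     <- table(brand)        # frequency of each class
rel_freq <- prop.table(freq)    # frequency divided by n
pct_freq <- 100 * rel_freq      # relative frequency times 100

# Collect the three summaries in one table
summary_table <- data.frame(
  Brand             = names(freq),
  Frequency         = as.vector(freq),
  RelativeFrequency = as.vector(rel_freq),
  PercentFrequency  = as.vector(pct_freq)
)
print(summary_table)
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check that no class was missed.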

Note that since the dataset was already imported in RStudio from the previous R script,
there is no need to import the data again. Also, since the packages were already installed and
loaded from the previous R script, there is no need to repeat these commands.

Example 7. An engineering school arranged a charity concert to raise funds for COVID-19
patients. The following data give the status of 40 randomly selected students who attended the
concert. The numbers 1, 2, 3, and 4 represent the categories freshman, sophomore, junior, and
senior, respectively:

The table below shows the frequency, relative frequency, and percent frequency for the data
in just one table. Note that in practice, it is customary only to include one such type of frequency.

In this example, the frequency table constructed is for ungrouped data, which means that
the individual values do not lose their identity in the table.

RSTUDIO

To do this in RStudio, let us consider a different approach: constructing a vector
representing the data values. Open a new R script file, then enter and run the following script.
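
The original script and the 40 recorded values are not reproduced in this text. The sketch below therefore uses a short hypothetical vector of codes to show the vector-based approach; only the coding scheme (1 = freshman through 4 = senior) comes from the example.

```r
# Hypothetical status codes: 1 = freshman, 2 = sophomore, 3 = junior, 4 = senior
status <- c(1, 2, 4, 3, 1, 2, 2, 3, 4, 1, 3, 2)

# Attach readable labels to the numeric codes
status <- factor(status, levels = 1:4,
                 labels = c("Freshman", "Sophomore", "Junior", "Senior"))

freq <- table(status)

# Frequency, relative frequency, and percent frequency in one table
result <- data.frame(
  Status    = names(freq),
  Frequency = as.vector(freq),
  Relative  = round(as.vector(prop.table(freq)), 3),
  Percent   = round(100 * as.vector(prop.table(freq)), 1)
)
print(result)
```

Typing the values into a vector is practical only for small data sets; for larger ones, importing a .csv file is less error-prone.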

Frequency distribution table for the number of cars registered in each household

BAR GRAPH. It represents the data by using vertical or horizontal bars whose heights or
lengths denote the frequencies of the data. It can be used to represent qualitative or categorical
data. For a vertical bar chart, the horizontal (x) axis represents the categories; the vertical (y)
axis represents a value (frequency, relative frequency, or percent frequency) for those categories.

Example 8. Consider again the raw data on fifty soft drink purchases. Notice that there is not
much information that we can get from the data in its current form, so it is best to consider
other ways to present the data. Let us construct a bar graph for the sample.

The figure below shows the bar chart of the data on soft drink purchases.

RSTUDIO
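
The module's listing is not reproduced here. As a minimal sketch under stated assumptions, the base R function `barplot()` can draw vertical and horizontal bar graphs from a frequency table. The brand labels and counts are hypothetical, and base graphics are a substitute for whatever plotting package the original script used. Unlike some plotting packages, `barplot()` draws immediately even when its return value is assigned.

```r
# Hypothetical purchase data (made-up brand labels)
brand <- c("Brand A", "Brand B", "Brand A", "Brand C", "Brand A",
           "Brand B", "Brand C", "Brand A", "Brand B", "Brand A")
freq <- table(brand)

# Vertical bar graph: categories on the x-axis, frequencies on the y-axis.
# barplot() returns the midpoints of the bars it has drawn.
mids <- barplot(freq,
                main = "Soft drink purchases",
                xlab = "Brand", ylab = "Frequency",
                col = "steelblue")

# Horizontal bar graph of the same frequencies
barplot(freq, horiz = TRUE,
        main = "Soft drink purchases",
        xlab = "Frequency", ylab = "Brand",
        col = "steelblue")
```

Passing `prop.table(freq)` instead of `freq` would plot relative frequencies on the value axis.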

Note that you need not assign the bar graphs to the objects bar1 and bar2. Removing
these assignments from the script would generate the bar charts right away. Also, the bar charts
will be shown in the Plots window of RStudio, where you have the options to “Save as Image”,
“Save as PDF”, or “Copy to Clipboard” once you click on the “Export” icon.

PIE CHART. A pie chart (also called a pie graph or circle graph) provides another graphical
device for presenting relative frequency and percent frequency distributions for qualitative data.
The circle is subdivided into sectors, and the numerical value shown for each sector can be a
frequency, a relative frequency, or a percent frequency.
A pie chart makes use of sectors (slices) in a circle. The angle of a sector is proportional to
the frequency of each of the categories of the variable that defines the data. The formula to
determine the angle of a sector in a circle graph is:

$$\text{Angle of a sector} = \frac{\text{frequency of the category}}{n} \times 360^\circ$$

where $n$ is the total number of observations.

Example 9. Consider again the raw data on fifty soft drink purchases. Notice that there is not
much information that we can get from the data in its current form, so it is best to consider
other ways to present the data. Let us construct a pie chart for the sample.

The figure below shows the pie chart of the data on soft drink purchases generated using
Microsoft Excel.

RSTUDIO
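
The original script is not reproduced here. The following sketch, using hypothetical counts and made-up brand labels, computes the sector angles (each category's share of the total times 360 degrees) and then draws the chart with the base R function `pie()`.

```r
# Hypothetical frequencies for three made-up brands
freq <- c("Brand A" = 5, "Brand B" = 3, "Brand C" = 2)

# Angle of each sector: share of the total times 360 degrees
angles <- freq / sum(freq) * 360
print(angles)

# Percent frequency of each category, used to label the slices
pct <- round(100 * freq / sum(freq), 1)

# pie() sizes each slice in proportion to its frequency
pie(freq,
    labels = paste0(names(freq), " (", pct, "%)"),
    main = "Soft drink purchases")
```

The angles always add up to 360 degrees, which is a convenient check when drawing a pie chart by hand.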

DOT PLOT. It is a graphical display of data using dots. It is similar to a bar graph because the
height of each “bar” of dots is equal to the number of items in a particular category. To draw a dot
plot, count the number of data points falling in each category and draw a stack of dots that number
high for each category. A dot plot can be used as a graphical display of the frequency of qualitative
and quantitative (ungrouped) data.

Example 10. The figure that follows shows the dot plot for the number of students, classified
according to year, who went to the concert:

R Script
Here we present two ways by which a dot plot is constructed. First is by importing a .csv data
file from MS Excel, which is very useful especially if we have a large data set, and the other way is by
constructing the data vector in the RStudio environment. This is applicable if we would be dealing
with a small set of data. The following are the scripts. For the first method, we use the “concert.csv”
data from our directory.
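
Neither script is reproduced in this text. As a rough sketch of the second (vector-based) method only, base R's `stripchart()` with `method = "stack"` produces a stacked dot plot. The values are hypothetical, and base graphics are an assumption here; the original scripts may rely on a plotting package instead.

```r
# Hypothetical year-level codes: 1 = freshman, ..., 4 = senior
year <- c(1, 2, 4, 3, 1, 2, 2, 3, 4, 1, 3, 2)

# The height of each stack equals the frequency of that category
counts <- table(year)
print(counts)

# method = "stack" piles repeated values on top of each other
stripchart(year, method = "stack", pch = 19, offset = 0.6,
           axes = FALSE,
           xlab = "Year level",
           main = "Students who attended the concert")

# Label the four categories on the horizontal axis
axis(1, at = 1:4,
     labels = c("Freshman", "Sophomore", "Junior", "Senior"))
```

For the first method, the vector would come from an imported file instead, for example a column of the data frame returned by `read.csv()`.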

Notice the difference in dot sizes when you use different binwidths. You can further explore
RStudio functionality by varying the values of “arguments” in the syntax.

STEM-AND-LEAF PLOT. It is a graphical display for quantitative data that shows both the
rank order and shape of a data set. It is particularly useful when data are not too numerous.
Stem-and-leaf plots are a method for showing the frequency with which certain classes of values occur.

Example 11. The following illustration and steps are taken from the website:
https://study.com/academy/lesson/how-to-make-a-stem-and-leaf-plot.html
The process will be easiest to follow with sample data, so let us pretend that a sports statistician
wants to make a stem-and-leaf plot for a recent game played by the Blues basketball team. The
total minutes played by each team member have been recorded and are shown below:

Step 1: Determine the smallest and largest number in the data.

Looking at the stats, we see the number of minutes played ranges from a low of 1 minute to
a high of 31 minutes.

Step 2: Identify the stems.

For any number, the digit/s to the left of the right-most digit is a stem. For example, the
number 31 has a stem of 3, while the number 29 has a stem of 2. A one-digit number like 4
has a stem of 0. Think ''04'' for 4. Based on the range of 1 to 31, we need stems of 0, 1, 2 and
3.

Step 3: Draw a vertical line and list the stem numbers to the left of the line.

The place value of the leaf is called the leaf unit. In the example above, the leaf unit is 1.
Other leaf units may be 100, 10, 0.1, and so on. If the leaf unit is not 1, it should be displayed in the
stem-and-leaf plot.

R Script

For the same example, the stem-and-leaf plot can be generated in RStudio by using the stem()
function. The script is very short. Try this out in RStudio.
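
The recorded minutes are not reproduced in this text, so the sketch below uses hypothetical values that merely share the stated range of 1 to 31 minutes. It shows how `stem()` is called; the exact layout of the printed plot depends on the data and on the `scale` argument.

```r
# Hypothetical minutes played (only the range 1 to 31 comes from the example)
minutes <- c(1, 4, 9, 12, 15, 18, 20, 23, 25, 27, 29, 31)

# Smallest and largest values, as in Step 1
print(range(minutes))

# stem() prints the stem-and-leaf display in the Console
stem(minutes)

# A larger scale value stretches the plot over more stem lines
stem(minutes, scale = 2)
```

R chooses the stems automatically, so its display may group the values differently from a plot drawn by hand.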

HISTOGRAM. It is used to summarize continuous or discrete data. A histogram shows the
single quantitative variable along the x-axis and the frequency of that variable on the y-axis. A
histogram shows no gaps between the bars.

A histogram offers a visual representation of the distribution of the data. It can display a large
amount of data and the frequency of the data values. A histogram can be used to estimate the
median and to see the shape of the distribution. In addition, it can show any outliers or gaps in
the data.

Example 12. Consider the following data set on the diameter (in mm) for a sample of 70
machined hex bolts:

425 430 430 435 435 435 435 435 440 440 440 440 440 445 445

445 445 445 450 450 450 450 450 450 450 460 460 460 465 465

465 470 470 472 475 475 475 480 480 480 480 485 490 490 490

500 500 500 500 510 510 515 525 525 525 535 549 550 570 570

575 575 580 590 600 600 600 600 615 615

A frequency table with 8 class intervals for this sample is shown below. In this case, the values are
grouped together in each class, and the individual values are no longer visible. Construct the
histogram corresponding to the frequency distribution table for the data on diameter (in mm) for
a sample of 70 machined hex bolts as shown below:

RSTUDIO
R Script
To plot the histogram for the same example, again we use the “diameter.csv” file.
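
The original script is not reproduced here. The sketch below types in the 70 diameters listed in Example 12 instead of reading "diameter.csv", and draws the histogram with base R's `hist()`. The class boundaries (8 classes of width 25 starting at 425) are an assumption and may differ from those in the module's frequency table.

```r
# The 70 bolt diameters (in mm) from Example 12
diameter <- c(
  425, 430, 430, 435, 435, 435, 435, 435, 440, 440, 440, 440, 440, 445, 445,
  445, 445, 445, 450, 450, 450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
  465, 470, 470, 472, 475, 475, 475, 480, 480, 480, 480, 485, 490, 490, 490,
  500, 500, 500, 500, 510, 510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
  575, 575, 580, 590, 600, 600, 600, 600, 615, 615
)

# Assumed class boundaries: 8 classes of width 25
breaks <- seq(425, 625, by = 25)

# right = FALSE puts each value in the class whose lower limit it reaches
h <- hist(diameter, breaks = breaks, right = FALSE,
          main = "Diameter of machined hex bolts",
          xlab = "Diameter (mm)", ylab = "Frequency",
          col = "lightgray")

# The grouped frequency table behind the histogram
print(data.frame(Lower = head(breaks, -1),
                 Upper = breaks[-1],
                 Frequency = h$counts))
```

Changing `breaks` changes the number and width of the classes, and with them the apparent shape of the distribution.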

Summarizing Qualitative and Quantitative Data for Two Variables

Tabular and graphical displays for data obtained from two variables help us understand the
relationship between them, if any.
A cross-tabulation, or contingency table, is a tabular summary of data for two variables. The
variables can both be qualitative, both be quantitative, or be a combination of one qualitative
and one quantitative variable. If either variable is quantitative, classes must be created for the values
of the quantitative variable. The labels shown in the margins of the table define the categories
(classes) for the two variables.

Example 13. As an example, we consider the “salaries.csv” file. We construct a crosstabulation of
the rank and sex of the teachers using RStudio.

The following is the RStudio Script.


R Script
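
The original listing is not reproduced here. As a minimal sketch, a crosstabulation of two qualitative variables can be produced with `table()`. The small data frame is hypothetical and only mimics the structure assumed for "salaries.csv"; the column names `rank` and `sex` and the rank labels other than ‘Professor’ are assumptions.

```r
# Hypothetical stand-in for the salaries data (column names are assumptions)
salaries <- data.frame(
  rank = c("Professor", "Professor", "Associate Professor",
           "Assistant Professor", "Professor", "Associate Professor",
           "Assistant Professor", "Professor"),
  sex  = c("Male", "Male", "Female", "Male",
           "Female", "Male", "Female", "Male")
)

# table() with two variables counts every rank-by-sex combination
xtab <- table(salaries$rank, salaries$sex, dnn = c("Rank", "Sex"))
print(xtab)

# addmargins() appends the row and column totals
print(addmargins(xtab))
```

The row and column totals are the one-variable frequency distributions of rank and sex, respectively.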

From the crosstabulation, we can see that the majority of the teachers have a rank of ‘Professor’.
There are relatively more males than females across the ranks, and male professors make up the
largest group. This could not have been easily observed by just looking at the raw data.

A scatter diagram, or scatter plot, is a graphical display of the relationship between two
quantitative variables. One variable (the independent variable) is shown on the horizontal axis and
the other variable (the dependent variable) is shown on the vertical axis. The general pattern of the
plotted points suggests the overall relationship between the variables. This relationship will be
discussed further in Correlation Analysis and Regression Analysis.

Example 14. Consider a hypothetical study on the age of trees, where the simplest way of
determining the age of a tree is to use the relationship between the tree’s diameter at breast height
(in feet) and its age. Available data on the age and diameter at breast height of 10 trees on record are
given below. We construct a scatter diagram using RStudio.

R Script
Here we present two scripts in generating the scatterplot for the same problem. The example data is
contained in the “advertising.csv” data file.
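
Neither script nor the recorded values are reproduced in this text. The sketch below uses ten hypothetical pairs of diameter and age to show one way a scatter diagram might be drawn with base R's `plot()`, placing the diameter (the independent variable) on the horizontal axis.

```r
# Hypothetical measurements for 10 trees (not the module's data)
diameter <- c(0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0)  # in feet
age      <- c(4, 5, 8, 12, 16, 18, 23, 27, 30, 34)               # in years

# Independent variable on the x-axis, dependent variable on the y-axis
plot(diameter, age,
     pch = 19,
     xlab = "Diameter at breast height (feet)",
     ylab = "Age (years)",
     main = "Age versus diameter of trees")
```

An upward-sloping cloud of points, as in this made-up data, would suggest that older trees tend to have larger diameters.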
