I. Data Collection What Is Data?

Data can be qualitative or quantitative and is collected through various methods. Qualitative data includes categorical information that can be divided into nominal and ordinal types. Quantitative data measures numerical values that can be discrete or continuous. Primary data is collected directly by researchers while secondary data is previously collected information. Common data collection methods include surveys, interviews, observations, and reviewing existing documents and records. The choice of collection method depends on the research goals and type of study being conducted.

Uploaded by

Matthew Mirador
Copyright
© © All Rights Reserved

I. Data Collection
What is Data?
In science and computing, data is information of various kinds that is usually formatted in a
particular manner. Software is divided into two major categories: programs and data.
Programs are collections of instructions that are used to manipulate data. With this
definition in place, the sections below describe the main types of data.

Types of Data
Data science is all about experimenting with raw or structured data. Data is the fuel that can
drive a business down the right path, or at least provide actionable insights that help it
strategize current campaigns, organize the launch of new products, or try out different
experiments.
All these activities have one common driving component: data. We are entering a
digital era in which we produce a lot of data. For instance, a company like Flipkart produces more
than 2 TB of data on a daily basis.
Because data is this important, it must be stored and processed properly and without error.
When dealing with datasets, the category of data plays an important role in determining which
preprocessing strategy will work for a particular set and which type of statistical analysis
should be applied. Let’s look at some of the commonly used categories of data.

Qualitative Data Type


Qualitative or categorical data describes the object under consideration using a finite set of
discrete classes. This type of data can’t easily be counted or measured using
numbers, and is therefore divided into categories. The gender of a person (male, female, or
others) is a good example of this data type.
Such data is usually extracted from audio, images, or text. Another example is a
smartphone listing that gives the current rating, the color of the phone,
the category of the phone, and so on. All this information can be categorized as qualitative data.
There are two subcategories of qualitative data:
Nominal
These are values that don’t possess a natural ordering. Let’s understand this with
some examples. The color of a smartphone is a nominal data type because we
can’t rank one color against another.
It is not possible to state that ‘Red’ is greater than ‘Blue’. The gender of a person is another
example, since male, female, and others cannot be ranked. Mobile phone categories
(midrange, budget segment, or premium) are often treated as nominal as well, although they carry
an implied price ordering and can also be handled as ordinal.
Ordinal
These values have a natural ordering while still belonging to distinct classes. If we
consider clothing sizes, we can easily sort them according to their name tag
in the order small < medium < large. The grading system used to mark candidates in a test
is also an ordinal data type, where an A+ is definitely better than a B.
These categories help us decide which encoding strategy to apply to which type of
data. Encoding qualitative data is important because machine learning models are
mathematical in nature: they can’t handle these values directly, so the values need to be
converted to numerical types.
For nominal data, where the categories cannot be compared, one-hot encoding
can be applied. It is similar to binary coding and works best when there are only a few
categories. For ordinal data, label encoding can be applied, which is a form of integer encoding
in which the integers follow the order of the categories.
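The two encoding strategies can be sketched in a few lines of plain Python. This is only an illustration: the helper names and the category lists are assumptions made for the example, and real projects would more likely use a library encoder.

```python
# Illustrative sketch only: helper names and category lists are assumptions.

def one_hot_encode(value, categories):
    """Return a list with a 1 in the position of `value` and 0 everywhere else."""
    return [1 if value == category else 0 for category in categories]


def ordinal_encode(value, ordered_categories):
    """Return the rank of `value` in a list sorted from lowest to highest."""
    return ordered_categories.index(value)


COLORS = ["Red", "Blue", "Black"]      # nominal: the order carries no meaning
SIZES = ["small", "medium", "large"]   # ordinal: small < medium < large

blue_vector = one_hot_encode("Blue", COLORS)
medium_rank = ordinal_encode("medium", SIZES)
large_rank = ordinal_encode("large", SIZES)

print(blue_vector)              # [0, 1, 0]
print(medium_rank, large_rank)  # 1 2
```

One-hot vectors keep the colors unordered, while the integer ranks preserve the fact that medium comes before large.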
Quantitative Data Type
This data type quantifies things using numerical values that can be counted or measured.
The price of a smartphone, the discount offered, the number of ratings on a
product, the frequency of a smartphone’s processor, and the RAM of the phone all
fall under the category of quantitative data.
The key point is that a feature can take a very large, possibly infinite, number of values. For
instance, the price of a smartphone can range from some amount x to any other value, and it can be
further broken down into fractional values. The two subcategories are:
Discrete
Numerical values that are integers or whole numbers fall under this
category. The number of speakers in the phone, the number of cameras, the number of cores in the
processor, and the number of SIMs supported are all examples of discrete data.
Continuous
Values that can be fractional are considered continuous. Examples include the
operating frequency of the processor, the Wi-Fi frequency, the
temperature of the cores, and so on. The Android version of a phone is sometimes listed here too,
but a version number is a label rather than a measurement, so it is better treated as ordinal.
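As a rough illustration, the four data types can be recorded alongside a dataset so that later preprocessing steps know how to treat each feature. The feature names and the helper functions below are assumptions chosen to mirror the smartphone and clothing examples above.

```python
# Hypothetical feature catalogue; names are assumptions based on the examples above.
FEATURE_TYPES = {
    "color": "nominal",
    "clothing_size": "ordinal",
    "camera_count": "discrete",
    "core_count": "discrete",
    "processor_frequency_ghz": "continuous",
    "core_temperature_c": "continuous",
}


def features_of_type(data_type):
    """List the feature names recorded with the given data type, in sorted order."""
    return sorted(name for name, kind in FEATURE_TYPES.items() if kind == data_type)


def is_quantitative(feature):
    """Discrete and continuous features are quantitative; the rest are qualitative."""
    return FEATURE_TYPES[feature] in {"discrete", "continuous"}


print(features_of_type("discrete"))   # ['camera_count', 'core_count']
print(is_quantitative("color"))       # False
```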

What is Data Collection?


Data collection is a methodical process of gathering and analyzing specific information in order
to answer relevant questions and evaluate the results. It focuses on finding out all
there is to know about a particular subject matter. The collected data is then subjected to
hypothesis testing, which seeks to explain a phenomenon.
Hypothesis testing replaces unsupported assumptions with propositions that are checked against
evidence. Data can be collected for a range of outcomes, but the key
purpose is to put a researcher in a vantage position to make
predictions about future probabilities and trends.
The core forms in which data can be collected are primary and secondary. The
former is collected by a researcher through first-hand sources, while the latter is collected by an
individual other than the user.

Types of Data Collection


Before broaching the various types of data collection, it is pertinent to note that
data collection itself falls under two broad categories: primary data collection and secondary
data collection.
Primary Data Collection
Primary data collection is the gathering of raw data at the source: original data collected by a
researcher for a specific research purpose.
It can be further divided into two segments: qualitative and quantitative data
collection methods.

Qualitative Research Method


Qualitative research methods of data collection do not involve numbers or anything that has to
be deduced through mathematical calculation. Rather, they are
based on non-quantifiable elements such as the feelings or emotions of the people being studied.
An example of such a method is an open-ended questionnaire.
Quantitative Method
Quantitative methods present data as numbers and require mathematical calculation to
interpret. An example is a questionnaire with closed-ended questions, whose answers yield
figures that can be analyzed mathematically. Other examples include correlation and regression,
and the mean, mode and median.
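To make the calculations concrete, here is a minimal sketch using Python's standard `statistics` module. The two lists are made-up, hypothetical responses (not data from any real survey), and `pearson_r` is a helper written for this example; regression is left out to keep the sketch short.

```python
import math
import statistics

# Hypothetical closed-ended answers on a 1-5 satisfaction scale (made-up numbers).
satisfaction = [4, 5, 3, 4, 2, 5, 4]
# Hypothetical number of purchases made by the same seven respondents.
purchases = [3, 6, 2, 4, 1, 5, 3]

mean_score = statistics.mean(satisfaction)
median_score = statistics.median(satisfaction)
mode_score = statistics.mode(satisfaction)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(spread_x * spread_y)


r = pearson_r(satisfaction, purchases)

print(median_score, mode_score)  # 4 4
print(round(mean_score, 2), round(r, 2))
```

A value of `r` close to 1 would suggest that, in this made-up sample, more satisfied respondents also tend to buy more.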
Secondary Data Collection
Secondary data collection, on the other hand, is the gathering of second-hand
data that was collected by someone other than the current user. It is the process of collecting
data that already exists, whether in published books, journals or online portals. It is
much less expensive and easier to collect than primary data.
Your choice between primary and secondary data collection depends on the
nature, scope and area of your research as well as its aims and objectives.

Methods of data collection


The system of data collection is based on the type of study being conducted. Depending on the
researcher’s research plan and design, there are several ways data can be collected.
The most commonly used methods are: published literature sources, surveys (email and mail),
interviews (telephone, face-to-face or focus group), observations, documents and records, and
experiments.
1. Literature sources
This involves the collection of data from already published text available in the public domain.
Literature sources can include: textbooks, government or private companies’ reports,
newspapers, magazines, online published papers and articles.
This method of data collection is referred to as secondary data collection. In comparison to
primary data collection, it is inexpensive and not time-consuming.
2. Surveys
A survey is another method of gathering information for research purposes. Information is
gathered through a questionnaire, mostly based on individual or group experiences regarding a
particular phenomenon.
There are several ways in which this information can be collected. The most notable are the
web-based questionnaire and the paper-based questionnaire (printed form). The results of this
method of data collection are generally easy to analyse.
3. Interviews
An interview is a qualitative method of data collection whose results are based on intensive
engagement with respondents about a particular study. Usually, interviews are used to
collect in-depth responses from the professionals being interviewed.
An interview can be structured (formal), semi-structured or unstructured (informal). It can be
conducted through a face-to-face meeting with the
interviewee(s) or by telephone.
4. Observations
Observation gathers information by monitoring participants in a specific
situation or environment at a given time and day. Basically, researchers observe the behaviour
of the people or surrounding environments being studied. This type of study can be
controlled, natural or participant.
Controlled observation is when the researcher uses a standardised procedure for observing
participants or the environment. Natural observation is when participants are observed in
their natural conditions. Participant observation is where the researcher becomes part of the
group being studied.
5. Documents and records
This is the process of examining existing documents and records of an organisation for tracking
changes over a period of time. Records can be tracked by examining call logs, email logs,
databases, minutes of meetings, staff reports, information logs, etc.
For instance, an organisation may want to understand why there are lots of negative reviews
and complaints from customers about its products or services. In this case, the organisation will
look into records of its products or services and recorded interactions between employees and
customers.
6. Experiments
Experimental research is a research method in which the causal relationship between two
variables is examined. One of the variables, the independent variable, is manipulated, and the
other, the dependent variable, is measured.
In experimental research, data are mostly collected based on the cause and effect of the two
variables being studied. This type of research is common among medical researchers, and it
uses a quantitative research approach.
II. Planning and Conducting Surveys
6 steps in planning a survey that will help you avoid the most common mistakes
1. What is the purpose of the survey? The basis of planning a survey
Start with the WHY when planning your survey, because everything works out best with a good
plan. You already know that you need to conduct a survey, which is a great starting point.
There is a reason and there is a purpose. Is it to understand your employees better? Or to learn
what your customers think about the support you provide? Or are you planning to launch your
business in a new market and want to know more about purchasing behaviour and demand
there?
Write down the purpose and have it in front of you throughout the work on the survey.
2. Decide on the target group
Once you have decided on the purpose, the target group is usually fairly obvious, but it may
need to be specified. Do you want to know more about how your product is used? The target
group is presumably your customers in that case, but perhaps you want to know the different
ways in which customers of different ages and genders and in different locations use it? If you
specify the segments that are of interest, it becomes easier to work with background data later.
3. How do you reach your target group?
You have specified the purpose, the target group, and the segments, but how do you reach
them best? The most common way of distributing a survey is via a mailshot, but there are also a
number of other ways of doing it. For example:
-via a link on the intranet, social media or advertisements
-via a printed QR code or short link on flyers, business cards or menus
-via iPads that you take to your event or that are located in the lobby
-via text message
4. Break down the purpose and limit the scope
When you send out a survey, it may be tempting to ask questions that cover as many areas as
possible. However, remember that the larger the survey, the greater the risk of a lower
response rate. Consequently, it is frequently better to try and limit the questions as much as
possible. Limiting the scope of the survey is usually easier if you break the purpose down into a
number of blocks of questions, as in the example below:

The purpose is to find out more about what our customers think.
- Purchase experience
  - Accessibility of stores, etc.
  - Accessibility and knowledge of webshop, etc.
- Customer contact
  - Support: What was the experience of the support department’s knowledge/service/time
    before help, etc.?
  - Service: What was the experience of service at the time of purchase? Was the customer
    given enough help, etc.?
5. Which questions should the survey contain?
Start with the result when you write the survey questions. Don’t think: what should I ask? Think
instead: what type of result would be valuable to us? Think about what type of result each
question you write may achieve and how different types of results can be managed.
For example:
- a response that confirms something (for example, previous criticism of your website)
- a response that shows something unexpected
- a response that can give you practical help to improve your business
- a response that can help you in your marketing
6. Write a draft and trim it
Write down all the questions you can think of based on the breakdown of the purpose, bearing
in mind what types of results are valuable. Divide the questions up based on the blocks you
arrived at and remove all questions that will not add any specific value to the result. Do all
responses feel equally important? If not, make a list of priorities and delete questions starting
from the bottom.
Internal surveys can be longer (even though the trend is towards shorter ones) while external
surveys benefit from being as short as possible.

Surveys
Types of Question

- Closed – the questionnaire supplies the possible answers and the respondent chooses among them
- Open – the respondent provides the answer in their own words
Bias on survey question

• Questionnaire Bias – the wording of a question leads the respondent, or the question specifies a
single event and does not include similar events. Ex. “Have you ever stolen from your past company?”
• Sampling Bias – questioning the wrong people. Ex. Asking flood survivors if they are OK.
• Interpretation Bias – the question may be misinterpreted. Ex. “If you could fix a problem in our
community, what would it be?”

Sampling Methods – sampling is the process of selecting a representative group from the population under study.

1. Simple Random Sampling – everyone in the population has an equal chance of being
picked for the study.
2. Stratified Random Sampling – split the population into “strata”, smaller groups whose members
share a characteristic, then sample randomly from each stratum. Ex. Group according to gender, or section.
3. Cluster Sampling – the population is sorted into groups or clusters whose members are within a
certain proximity of each other, and whole clusters are then selected at random.
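The three sampling methods can be sketched with Python's standard `random` module. The student roster, the room clusters, and the function names are hypothetical, and a fixed seed is used only so the example is repeatable.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member has the same chance of being picked."""
    return rng.sample(population, k)


def stratified_sample(population, stratum_of, k_per_stratum, rng):
    """Split the population into strata, then sample randomly inside each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(k_per_stratum, len(members))))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


rng = random.Random(42)  # fixed seed, only for repeatability

# Hypothetical roster: 20 students split evenly between sections "A" and "B".
students = [(f"student{i}", "A" if i % 2 == 0 else "B") for i in range(20)]

srs = simple_random_sample(students, 5, rng)
strat = stratified_sample(students, lambda student: student[1], 3, rng)

# Hypothetical clusters: students grouped by the room they sit in.
rooms = {
    "room1": students[:5],
    "room2": students[5:10],
    "room3": students[10:15],
    "room4": students[15:],
}
clus = cluster_sample(rooms, 2, rng)

print(len(srs), len(strat), len(clus))  # 5 6 10
```

Note the difference in what is randomized: stratified sampling draws individuals from every group, whereas cluster sampling draws entire groups.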

Selection Bias

• Voluntary Sample
• Convenience Sample
• Undercoverage
• Overcoverage
• Nonresponse
• Experiment
• Untrue Response

III. Planning and Conducting Experiments


The Basis of Conducting an Experiment
With an experiment, the researcher is trying to learn something new about the world, an explanation of
'why' something happens.

The experiment must maintain internal and external validity, or the results will be useless.

When designing an experiment, a researcher must follow all of the steps of the scientific method, from
making sure that the hypothesis is valid and testable, to using controls and statistical tests.

Whilst all scientists use reasoning, operationalization and the steps of the scientific process, it is not
always a conscious process.

Experience and practice mean that many scientists follow an instinctive process of conducting an
experiment, the 'streamlined' scientific process. Following the basic steps will usually generate valid
results, but where experiments are complex and expensive, it is always advisable to follow the rigorous
scientific protocols. Conducting an experiment has a number of stages, where the parameters and
structure of the experiment are made clear.

Whilst it is rarely practical to follow each step strictly, any aberrations must be justified, whether they
arise because of budget, impracticality or ethics.
Stage One
After deciding upon a hypothesis, and making predictions, the first stage of conducting an experiment is
to specify the sample groups. These should be large enough to give a statistically viable study, but small
enough to be practical.

Ideally, groups should be selected at random, from a wide selection of the sample population. This
allows results to be generalized to the population as a whole.

In the physical sciences, this is fairly easy, but the biological and behavioral sciences are often limited by
other factors.

For example, medical trials often cannot find random groups. Such research often relies upon
volunteers, so it is difficult to apply any realistic randomization. This is not a problem, as long as the
process is justified, and the results are not applied to the population as a whole.

If a psychological researcher used volunteers who were male students, aged between 18 and 24, the
findings can only be generalized to that specific demographic group within society.

Stage Two
The sample groups should be divided, into a control group and a test group, to reduce the possibility of
confounding variables.

This, again, should be random, and the assigning of subjects to groups should be blind or double blind.
This will reduce the chances of experimental error, or bias, when conducting an experiment.

Ethics are often a barrier to this process, because deliberately withholding treatment, as with the
Tuskegee study, is not permitted.

Again, any deviations from this process must be explained in the conclusion. There is nothing wrong
with compromising upon randomness, where necessary, as long as other scientists are aware of how,
and why, the researcher selected groups on that basis.
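A minimal sketch of random assignment into a control group and a test group is shown below. The volunteer labels and the function name are hypothetical; in a blind or double-blind study the resulting group lists would additionally be hidden from the subjects, or from both subjects and researchers.

```python
import random


def randomly_assign(subjects, rng):
    """Shuffle a copy of the subjects and split it into two equal-sized groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(7)  # fixed seed, only so the example is repeatable

# Hypothetical volunteer labels.
volunteers = [f"subject{i:02d}" for i in range(1, 13)]
control_group, test_group = randomly_assign(volunteers, rng)

print(len(control_group), len(test_group))  # 6 6
```

Because chance alone decides each subject's group, any hidden characteristic is equally likely to land in either group.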

Stage Three
This stage of conducting an experiment involves determining the time scale and frequency of sampling,
to fit the type of experiment.

For example, researchers studying the effectiveness of a cure for colds would take frequent samples,
over a period of days. Researchers testing a cure for Parkinson's disease would use less frequent tests,
over a period of months or years.

Stage Four
The penultimate stage of the experiment involves performing the experiment according to the methods
stipulated during the design phase.

The independent variable is manipulated, generating a usable data set for the dependent variable.
Stage Five
The raw data from the results should be gathered, and analyzed, by statistical means. This allows the
researcher to establish if there is any relationship between the variables and accept, or reject, the null
hypothesis.
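As one possible illustration of this stage, the sketch below uses a permutation test, a technique chosen for this example rather than one named in the text above. The measurements are made-up numbers, and the 0.05 significance level is a conventional assumption.

```python
import random
import statistics


def permutation_test(control, treatment, n_permutations, rng):
    """Estimate how often a random regrouping gives a mean difference at least
    as large (in absolute value) as the one actually observed."""
    observed = statistics.mean(treatment) - statistics.mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n_control:]) - statistics.mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


# Hypothetical measurements of the dependent variable (made-up numbers).
control = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0]
treatment = [5.9, 6.1, 6.0, 5.8, 6.2, 6.0]

observed_diff, p_value = permutation_test(control, treatment, 2000, random.Random(0))

ALPHA = 0.05  # assumed significance level
reject_null = p_value < ALPHA
print(round(observed_diff, 2), reject_null)
```

The null hypothesis here is that the treatment makes no difference, so the group labels are interchangeable. A small p-value means a difference this large rarely arises from relabeling alone, and the null hypothesis is rejected.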

These steps are essential to providing excellent results. Whilst many researchers do not want to become
involved in the exact processes of inductive reasoning, deductive reasoning and operationalization, they
all follow the basic steps of conducting an experiment. This ensures that their results are valid.

Planning and Conducting Experiments: Introduction to Design of Experiments


Design of Experiments – a powerful statistical technique for improving product/process designs and
solving process/production problems.

Two Types of Study

Observational Study – a type of data collection/study in which you don’t intervene but just observe.

Experimental Study – a type of data collection/study in which you control the cause of the
response.

Grouping in Experiments

• Control Group – the group where no treatment is applied
• Treatment Group – also called the experimental group; the group where the treatment is applied

Random Assignment – using chance to decide whether each unit belongs to the experimental group or
the control group

Replication – applying each treatment to enough experimental units that chance differences between
the treatment groups even out.

- Experimental Units – the objects to which treatments are applied.
- Subjects – experimental units that are persons.

Some Terms Needed for Design of Experiments

• Confounded Variable – a variable whose effect on the response cannot be separated from the effect
of another variable, leaving the cause uncertain.
• Common Response – the variable and the response both change because of some other variable.
• Variable (x) – the cause of change in a designed experiment.
• Response (y) – the data that is the effect in the experiment.
• Placebo – a neutral treatment that has no “real” effect on the dependent variable. In an
experiment, subjects may respond differently after they receive a treatment, even if
the treatment is neutral.
• Placebo Effect – a subject’s positive response to a placebo.
PROBSET

Answer the following questions on clean white paper of any size. Copy and answer. The writing
must be legible and free from erasures. After answering, scan or take a photo that is not distorted
or blurred. Passing of the Probset will depend on the Instructor. (Solutions have more weight
than the answers.)

Determine whether each of the following is discrete or continuous data.


1. Official Weight of a ping pong ball
2. Number of leaves in a tree
3. Height of a man
4. Number of students
5. Atoms in a Hydrogen molecule
6. Concert participants
7. Wind speed
8. Pressure
9. Physical Money
10. Votes in an Election
11. Hair on one’s head
12. Amount of water
13. Length of leaves
14. Number of trees in a forest
15. Number of Friends
