I. Data Collection What Is Data?
Data Collection
What is Data?
In the field of science, the answer to “what is data” is that data is information of
different types, usually formatted in a particular manner.
All software is divided into two major categories: programs and data.
Programs are collections of instructions that are used to manipulate data. Now that
we understand what data is, let us look at some facts about it.
Types of Data
Data science is all about experimenting with raw or structured data. Data is the fuel that can
drive a business down the right path, or at least provide actionable insights that help it strategize
current campaigns, organize the launch of new products, or try out different
experiments.
All these things have one common driving component: data. We are entering a
digital era in which we produce a lot of data. For instance, a company like Flipkart produces more
than 2 TB of data on a daily basis.
Because data has so much importance in our lives, it is important to store
and process it properly and without error. When dealing with datasets, the category of data plays an
important role in determining which preprocessing strategy will work for a particular set to get
the right results, and which type of statistical analysis should be applied for the best results. Let’s
dive into some of the commonly used categories of data.
The purpose is to find out more about what our customers think.
Purchase experience
Accessibility of stores, etc.
Accessibility and knowledge of webshop, etc.
Customer contact
Support: How did the customer experience the support department’s knowledge, service,
waiting time before help, etc.?
Service: How did the customer experience the service at the time of purchase? Was the customer
given enough help, etc.?
5. Which questions should the survey contain?
Start with the result when you write the survey questions. Don’t think: what should I ask? Think
instead: what type of result would be valuable to us? Think about what type of result each
question you write may achieve and how different types of results can be managed.
For example:
a response that confirms something (for example, previous criticism of your website)
a response that shows something unexpected
a response that can give you practical help to improve your business
a response that can help you in your marketing
6. Write a draft and trim it
Write down all the questions you can think of based on the breakdown of the purpose, bearing
in mind what types of results are valuable. Divide the questions up based on the blocks you
arrived at and remove all questions that will not supply any specific value to the result. Do all
responses feel equally important? Make a list of priorities and delete from the bottom.
Internal surveys can be longer (even though the trend is towards shorter ones) while external
surveys benefit from being as short as possible.
Surveys
Types of Question
• Questionnaire Bias – using questions that specify a single event and do not include similar events,
or whose wording influences the answer. Ex. Have you ever stolen from your previous company?
• Sampling Bias – questioning the wrong people. Ex. Asking flood survivors if they are OK.
• Interpretation Bias – the question may be misinterpreted. Ex. If you could fix one problem in our
community, what would it be?
Sampling Methods – ways of selecting a representative group from the population under study
1. Simple Random Sampling – “random.” Everyone in the population has an equal chance of being
picked for the study.
2. Stratified Random Sampling – split the population into “strata,” or small groups whose members are
similar, then sample at random from each stratum. Ex. Group according to gender, or section.
3. Cluster Sampling – the population is sorted into groups, or clusters, whose members are within a
certain proximity of each other; whole clusters are then picked at random.
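The three sampling methods above can be sketched in Python as follows. This is only a minimal illustration using the standard library; the `students` population, its `gender` and `section` fields, and all function names are assumptions made up for this example.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, seed=None):
    """Pick k units so that every unit has an equal chance of being chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Split the population into strata, then sample the same fraction of each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """Pick whole clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for unit in population:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


# Hypothetical population: 40 students, 10 per section, alternating gender labels.
students = [
    {"id": i, "gender": "F" if i % 2 == 0 else "M", "section": "ABCD"[i // 10]}
    for i in range(40)
]

srs = simple_random_sample(students, 8, seed=1)
by_gender = stratified_sample(students, lambda s: s["gender"], 0.25, seed=1)
by_section = cluster_sample(students, lambda s: s["section"], 2, seed=1)
```

Here `by_gender` always holds the same share of each gender, while `by_section` holds every student from two randomly chosen sections. A fixed `seed` is passed only so the example can be repeated; a real study would record how the random draw was made.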
Selection Bias
• Voluntary Sample
• Convenience Sample
• Undercoverage
• Overcoverage
• Nonresponse
• Experiment
• Untrue Response
The experiment must maintain internal and external validity, or the results will be useless.
When designing an experiment, a researcher must follow all of the steps of the scientific method, from
making sure that the hypothesis is valid and testable, to using controls and statistical tests.
Whilst all scientists use reasoning, operationalization and the steps of the scientific process, it is not
always a conscious process.
Experience and practice mean that many scientists follow an instinctive process of conducting an
experiment, the 'streamlined' scientific process. Following the basic steps will usually generate valid
results, but where experiments are complex and expensive, it is always advisable to follow the rigorous
scientific protocols. Conducting an experiment has a number of stages, where the parameters and
structure of the experiment are made clear.
Whilst it is rarely practical to follow each step strictly, any aberrations must be justified, whether they
arise because of budget, impracticality or ethics.
Stage One
After deciding upon a hypothesis, and making predictions, the first stage of conducting an experiment is
to specify the sample groups. These should be large enough to give a statistically viable study, but small
enough to be practical.
Ideally, groups should be selected at random, from a wide selection of the sample population. This
allows results to be generalized to the population as a whole.
In the physical sciences, this is fairly easy, but the biological and behavioral sciences are often limited by
other factors.
For example, medical trials often cannot find random groups. Such research often relies upon
volunteers, so it is difficult to apply any realistic randomization. This is not a problem, as long as the
process is justified, and the results are not applied to the population as a whole.
If a psychological researcher used volunteers who were male students, aged between 18 and 24, the
findings can only be generalized to that specific demographic group within society.
Stage Two
The sample groups should be divided into a control group and a test group, to reduce the possibility of
confounding variables.
This, again, should be random, and the assigning of subjects to groups should be blind or double blind.
This will reduce the chances of experimental error, or bias, when conducting an experiment.
Ethics are often a barrier to this process, because deliberately withholding treatment, as with the
Tuskegee study, is not permitted.
Again, any deviations from this process must be explained in the conclusion. There is nothing wrong
with compromising upon randomness, where necessary, as long as other scientists are aware of how,
and why, the researcher selected groups on that basis.
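The random division into a control group and a test group described in this stage might look like the sketch below. It is a hedged illustration, not a prescribed procedure: the function name `assign_groups`, the subject IDs, and the equal split are all assumptions, and blinding (hiding the assignment from subjects and assessors) is not modeled here.

```python
import random


def assign_groups(subject_ids, seed=None):
    """Shuffle the subjects, then split them into control and test groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # With an odd number of subjects, the test group gets the extra one.
    return {"control": shuffled[:half], "test": shuffled[half:]}


# Hypothetical study with 20 subjects, identified by number.
groups = assign_groups(range(1, 21), seed=42)
```

Because the shuffle decides group membership, the researcher's own preferences cannot influence who ends up in which group. Recording the seed lets other scientists see how the groups were formed.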
Stage Three
This stage of conducting an experiment involves determining the time scale and frequency of sampling,
to fit the type of experiment.
For example, researchers studying the effectiveness of a cure for colds would take frequent samples,
over a period of days. Researchers testing a cure for Parkinson's disease would use less frequent tests,
over a period of months or years.
Stage Four
The penultimate stage of the experiment involves performing the experiment according to the methods
stipulated during the design phase.
The independent variable is manipulated, generating a usable data set for the dependent variable.
Stage Five
The raw data from the results should be gathered and analyzed by statistical means. This allows the
researcher to establish whether there is any relationship between the variables and to reject, or fail to
reject, the null hypothesis.
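As one possible illustration of this stage, the sketch below runs a permutation test on the difference between group means. The choice of a permutation test, the measurements, the function name, and the 0.05 significance level are all assumptions made for the example; the appropriate statistical test depends on the experiment's design.

```python
import random
from statistics import mean


def permutation_test(control, test, n_permutations=10_000, seed=None):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (test minus control) and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(test) - mean(control)
    pooled = list(control) + list(test)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up measurements of the dependent variable, for illustration only.
control = [5.1, 4.9, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7]
test = [5.9, 6.1, 5.8, 6.3, 6.0, 5.7, 6.2, 6.4]

observed, p_value = permutation_test(control, test, seed=0)
alpha = 0.05  # significance level, chosen before looking at the data
decision = "reject" if p_value < alpha else "fail to reject"
```

If `p_value` falls below `alpha`, the null hypothesis of no difference between the groups is rejected; otherwise the data give no grounds to reject it.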
These steps are essential to providing excellent results. Whilst many researchers do not want to become
involved in the exact processes of inductive reasoning, deductive reasoning and operationalization, they
all follow the basic steps of conducting an experiment. This ensures that their results are valid.
Observational Study – a type of data collection/study in which you do not intervene but only observe.
Experimental Study – a type of data collection/study in which you control the cause of the
response.
Grouping in Experiments
Random Assignment – assigning each subject at random to either the experimental group or the control group
Replication – using enough experimental units to make sure that the different treatment groups are
similar.
PROBSET
Answer the following questions and write your answers on clean white paper of any size. Copy each
question, then answer it. The writing must be legible and free from erasures. After answering, scan or
take a photo that is neither distorted nor blurred. Passing of the Probset will depend on the
Instructor. (Solutions carry more weight than the final answers.)