Final BRM
PSO 1: To design and develop systems for real-time problems in the areas related to management and human values using the latest techniques.
PSO 2: To develop innovative, eco-friendly solutions and ideas in the field of Management using various software tools with analytical skills.
PEO     PO 1   PO 2   PO 3   PO 4   PO 5   PO 6   PSO1   PSO2
I.       3      3      2      2      2      2      3      3
II.      3      3      3      3      2      1      3      2
III.     3      2      3      3      3      2      3      2
IV.      3      3      2      3      3      2      3      3
COURSE OUTCOMES (with Bloom’s Taxonomy Level):
CO1: Students would know how to write research proposals. (K3)
CO2: Students would be able to analyze data and find solutions to the problems. (K2)
CO3: Ability to understand the role of Business Analytics in decision making and generating reports. (K2)
CO4: Ability to identify the appropriate tool for the analytics scenario. (K4)
CO5: Ability to apply the different analytics tools and generate solutions. (K2)
SYLLABUS
UNIT I INTRODUCTION 12
Research: Meaning, Purpose, Scientific method, types of research; scope of business research. Selection
and formulation of a research problem, formulation of hypothesis, Types of hypothesis, operational
definition of concepts- Review of literature-Data: Types of data- Primary and secondary data sources -
Relevance & Scope of Research in Management and steps involved in the Research Process
Learning Objectives
• To learn the fundamentals of business research, its significance and its need in the present business scenario
• To understand how to formulate a research problem and hypothesis
• To learn about the operationalisation of variables and the research process in business
Learning Outcomes
At the end of the unit, students will be able to:
• Apply different sampling techniques and research methods
• Apply qualitative and quantitative techniques
• Apply the research process in future business
Mode of Assessment
S.No | Title of Topic | Teaching (PPT/Seminar/Chalk & Board etc.) | Textbook/Reference Book | Link (if applicable; link should be on Springboard/Coursera/NPTEL) | Tool (Quiz/Puzzle/Assignment/Seminar etc.)
1. | Meaning and scope of research method | Chalk and board | Uma Sekaran and Roger Bougie, Research Methods for Business, 5th Edition, Wiley India, New Delhi, 2012. | NPTEL: https://onlinecourses.nptel.ac.in/noc23_mg54/unit?unit=17&lesson=18 ; https://www.youtube.com/watch?v=Bqef3sycmZY | Quiz
2. | Methods and types of research | PPT | Panneerselvam, R., Research Methodology, 2nd Edition, PHI Learning, 2014. | https://www.youtube.com/watch?v=1vf8ZvADxfY |
1. INTRODUCTION
In the present fast track business environment marked by cut-throat competition, many organizations
rely on business research to gain a competitive advantage and greater market share. A good research
study helps organizations to understand processes, products, customers, markets and competition, to
develop policies, strategies and tactics that are most likely to succeed.
The word research is composed of two syllables, re and search. The word re is a prefix meaning
again, anew or over again and the word search is a verb meaning to examine closely and carefully,
to test and try, or to probe. Together they form a noun describing a careful, systematic, patient study
and investigation in some field of knowledge, undertaken to establish facts or principles.
Research is a structured enquiry that utilizes acceptable scientific methodology to solve
problems and create new knowledge that is generally applicable. Scientific methods consist of
systematic observation, classification and interpretation of data.
According to Robert Ross, “Research is essentially an investigation, a recording and an analysis of evidence for the purpose of gaining knowledge.” It can generally be defined as a systematic method of finding solutions to problems. It aims at discovering the truth. It is the search for knowledge through an objective and systematic method of finding solutions to problems. It is carried on both for discovering new facts and for verification of old ones. Therefore, research is a process of systematic and in-depth study or search of any particular topic, subject or area of investigation, backed by collection, computation, presentation and interpretation of relevant data.
Research need not lead to an ideal solution; it may instead give rise to new problems which require further research. In other words, research is not an end to a problem, since every research gives birth to a new question.
The purpose of all research is progress and a good life. Progress results when the space of ignorance is occupied by knowledge and wisdom, which are the results of good research. Knowledge and wisdom drive mankind to live an orderly, good life.
One of the purposes of research is to develop a scientific attitude. A scientific attitude is one that asks ‘Why’ and ‘How’ and seeks answers. This ‘know-why’ and ‘know-how’ attitude nurtures talents, and such intellectual talents are great assets of society.
One of the purposes of research is encouragement to creativity and innovation. New products, new
processes and new uses are the means through which the world goes dynamic. A dynamic world
is not possible without newness introduced every now and then in every walk of life. And this is
possible only through creativity and innovation. Research kindles the creativity and innovative
instincts of people and thus experiments on the possibility of new things instead of waiting for the
accidental and slow experience path to creativity and innovation.
A very important purpose of research is the testing of hypotheses and establishing of theories. As was already pointed out, knowledge is power. That knowledge comes from testing hypotheses and establishing new theories. Proven hypotheses become theories.
Applied research has a great say in prediction and control in almost all walks of human endeavor. Prediction is jumping into the future, and theories constitute the launch pad. Control looks for deviation between the actual happening and the predicted happening. In the process, the theories get re-evaluated and redefined.
The purpose of any research is problem solving. What is a problem? Problem is deprivation or
depreciation of something. Knowledge deprivation, efficiency deprivation, productivity
depreciation, etc., exist. How can these be solved? Research into the forces that cause deprivation
and measures to contain them from causing deprivation is needed. Thus, problem solving is a great
purpose of research.
Schematic Evaluation
Research is undertaken to assess the impact of certain measures or change introduced on relevant
variables. Impact studies are useful for biological, social, business, economic and other areas of
decision making.
Another purpose of research is improving research methodology itself. Developments in the field of measurement and scaling are immense. Can these be appropriately used in particular research areas? To answer this question, research needs to be done. Validation, revalidation and de-validation of methodological aspects thus constitute a good piece of research, and this is one of the purposes of research. In fact, any research has a responsibility towards contributing to methodological enrichment.
The main importance of research is to produce knowledge that can be applied outside a research setting. Research also forms the foundation of program development and policies everywhere around the world. It also solves particular existing problems of concern. Research is important because we are able to learn more about things, people and events; in doing research, we are able to make smart decisions.
Marketing research is important because it allows consumers and producers to become more familiar with the products, goods and services around them. Research is important to society because it allows us to discover more and more that might make our lives easier, more comfortable and safer. It presents more information for investigation, which allows for improvements based on greater information and study. Research encourages interdisciplinary approaches to find solutions to problems and to make new discoveries. Research is a basic ingredient for development and therefore serves as a means for rapid economic development.
The main importance or uses may be listed as under:
• Helps in solving various operational and planning problems of business and industry.
• Finds out the truth which is hidden and which has not been discovered so far.
• Aims at advancing systematic knowledge and formulating basic theories about the forces influencing the relations between groups, as well as those acting on personality development and its adjustment with individuals.
• Tries to improve tools of analysis or to test these against complex human behaviour and institutions.
• Helps to understand social life and thereby to gain a greater measure of control over social behaviour.
• Provides an educational program in the accumulated knowledge of group dynamics, in skills of research, in techniques of training leaders and in social action.
1.7. Importance of Business research
• Business research is one of the most effective ways to understand customers, the market and
competitors. Such research helps companies to understand the demand and supply of the market.
Using such research will help businesses reduce costs, and create solutions or products that are
targeted to the demand in the market and the correct audience.
• In-house business research can enable senior management to build an effective team or train or
mentor when needed. Business research enables the company to track its competitors and hence
can give you the upper hand to stay ahead of them. Failures can be avoided by conducting such
research as it can give the researcher an idea if the time is right to launch its product/solution and
also if the audience is right. It will help understand the brand value and measure customer
satisfaction which is essential to continuously innovate and meet customer demands. This will help
the company grow its revenue and market share.
• Business research also helps recruit ideal candidates for various roles in the company. By
conducting such research a company can carry out a SWOT analysis, i.e. understand the strengths,
weaknesses, opportunities, and threats. With the help of this information, wise decisions can be
made to ensure business success.
• Business research is the first step that any business owner needs to set up his business, to survive
or to excel in the market. The main reason why such research is of utmost importance is that it
helps businesses to grow in terms of revenue, market share and brand value.
(i) Empirical
Scientific method is concerned with the realities that are observable through “sensory
experiences.” It generates knowledge which is verifiable by experience or observation. Some of the
realities could be directly observed, like the number of students present in the class and how many
of them are male and how many female. The same students have attitudes, values, motivations,
aspirations, and commitments. These are also realities which cannot be observed directly, but the
researchers have designed ways to observe these indirectly. Any reality that cannot be put to
“sensory experience” directly or indirectly (existence of heaven, the Day of Judgment, life hereafter,
God’s rewards for good deeds) does not fall within the domain of scientific method.
(ii) Verifiable
Observations made through scientific method are to be verified again by using the senses to
confirm or refute the previous findings. Such confirmations may have to be made by the same
researcher or others. We will place more faith and credence in those findings and conclusions if
similar findings emerge on the basis of data collected by other researchers using the same methods.
To the extent that it does happen (i.e. the results are replicated or repeated) we will gain confidence
in the scientific nature of our research. Replicability, in this way, is an important characteristic of
scientific method. Hence revelations and intuitions are out of the domain of scientific method.
(iii) Cumulative
Prior to the start of any study, the researchers try to scan through the literature and see that their study is not a repetition in ignorance. Instead of reinventing the wheel, the researchers take stock of the existing body of knowledge and try to build on it. Also, the researchers do not leave their research findings as scattered bits and pieces. Facts and figures are to be provided with language and thereby inferences drawn. The results are to be organized and systematized. Moreover, we do not want to leave our studies as standalone pieces. A linkage between the present and the previous body of knowledge has to be established, and that is how knowledge accumulates. Every new crop of babies does not have to start from scratch; the existing body of knowledge provides a huge foundation on which the researchers build, and hence knowledge keeps on growing.
(iv)Deterministic
Science is based on the assumption that all events have antecedent causes that are subject to
identification and logical understanding. For the scientist, nothing “just happens” – it happens for a
reason. The scientific researchers try to explain the emerging phenomenon by identifying its causes.
Of the identified causes which ones can be the most important? For example, in the 2006 BA/BSC
examination of the Mumbai University 67 per cent of the students failed. What could be the
determinants of such a mass failure of students? The researcher may try to explain this phenomenon
and come up with a variety of reasons which may pertain to students, teachers, administration, curriculum, books, examination system, and so on. Looking into such a large number of reasons may yield a highly cumbersome model for problem solution. It might be more appropriate to determine which of all these factors is the most important, the second most important and the third most important, and which two in combination are the most important. The researcher tries to narrow down the number of reasons in such a way that some action could be taken. Therefore, the achievement of a meaningful, rather than an elaborate and cumbersome, model for problem solution becomes a critical issue in research. That is parsimony, which implies explanation with the minimum number of variables that are responsible for an undesirable situation.
The conclusion follows logically from the two premises; therefore it is valid. The deduction is the logical conclusion obtained by deducing it from the statements, called the premises of the argument. The argument is so constructed that if the premises are true, the conclusion must also be true. Logical deduction derives only conclusions from given premises and cannot affirm the truth of the given statements. It serves in connecting different truths, and thus logical derivation is not a means to find ultimate truth.
Induction: It is the process of reasoning from a part to the whole, from particular to
general or from the individual to the universal. It gives rise to empirical generalizations.
It is a passage from observed to unobserved. It involves two processes namely observation
and generalization. Induction may be regarded as a method by means of which material
truth of the premises is established. Generating ideas from empirical observation is the
process of induction. As a matter of fact, concepts can be generated from experience which
justifies the description of particular situations towards theory- building. It is generally
observed that experience is regarded as a sum of individual observations held together by
the loose tie of association and constantly extended by the idea of inductive inferences.
for a better study of an unknown phenomenon. For this purpose, exploratory research is undertaken
to achieve new insights into such phenomenon.
Helps to Predict Events: Research may be undertaken to predict future course of events. For
instance, research may be undertaken to find out the impact of growing unemployment of educated
youth on the social life of the society in future. The findings of such research would not only
indicate the possible impact, but would also make the concerned authorities to take appropriate
measures to reduce unemployment, to reduce the growth of population and to overcome the
negative consequences, as and when they take place.
Extends Knowledge: Researchers undertake research to extend the existing knowledge in physical sciences (such as physics, chemistry, mathematics, etc.) as well as in social sciences (such as sociology, management, psychology, etc.). The knowledge can be enhanced by undertaking research in general and by fundamental research in particular.
Business research is a process of acquiring detailed information of all the areas of business and
using such information in maximizing the sales and profit of the business. Such a study helps companies
determine which product/service is most profitable or in demand. In simple words, it can be stated as the
acquisition of information or knowledge for professional or commercial purpose to determine
opportunities and goals for a business.
Business research can be done for anything and everything. In general, when people speak about
business research it means asking research questions to know where the money can be spent to increase
sales, profits or market share. Such research is critical to make wise and informed decisions.
For example: A mobile company wants to launch a new model in the market, but they are not aware of the dimensions of a mobile that are most in demand. Hence, the company conducts business research using various methods to gather information, which is then evaluated and conclusions are drawn as to what dimensions are most in demand. This will enable the researcher to make wise decisions to position his phone at the right price in the market and hence acquire a larger market share.
(i) Interviews
Interviews are somewhat similar to surveys; sometimes they may even use the same questions. The difference is that the respondent can answer these open-ended questions at length, and the direction of the conversation or the questions being asked can be changed depending on the response of the subject. Such a method usually gives the researcher detailed information about the perspective or opinions of the subject. Carrying out interviews with subject matter experts can also give important information critical to some businesses.
For example: An interview was conducted by a telecom manufacturer with a group of women to understand why it has fewer female customers. After interviewing them, the researcher understood that there were fewer feminine colors in some of the models, hence women preferred not to purchase them. Such information can be critical to a business such as a telecom manufacturer, and it can be used to increase market share by targeting women customers and launching some feminine colors in the market.
Another example would be to interview a subject matter expert in social media marketing. Such an interview can enable a researcher to understand why certain types of social media advertising strategies work for a company and why some of them don’t.
1.3.1.3 Some Other Types of Research: All other types of research are variations of one or more
of the above stated approaches, based on either the purpose of research, or the time required to
accomplish research, on the environment in which research is done, or on the basis of some other
similar factor.
• One Time Research: From the point of view of time, we can think of research either as one-
time research or longitudinal research. In the former case the research is confined to a single
time-period, whereas in the latter case the research is carried on over several time-periods.
• Laboratory Research: Research can be field-setting research or laboratory research or
simulation research, depending upon the environment in which it is to be carried out. Research
can as well be understood as clinical or diagnostic research. Such research follows case-study
methods or in-depth approaches to reach the basic causal relations. Such studies usually go deep
into the causes of things or events that interest us, using very small samples and very deep
probing data gathering devices.
• Exploratory Research: The research may be exploratory or it may be formalized. The objective
of exploratory research is the development of hypotheses rather than their testing, whereas
formalized research studies are those with substantial structure and with specific hypotheses to
be tested.
• Historical Research: Historical research is that which utilizes historical sources like documents,
remains, etc., to study events or ideas of the past, including the philosophy of persons and groups
at any remote point of time.
• Conclusion-oriented Research: Research can also be classified as conclusion-oriented and
decision-oriented. While doing conclusion-oriented research, a researcher is free to pick up a
problem, redesign the enquiry as he proceeds and is prepared to conceptualize as he wishes.
Decision-oriented research is always for the need of a decision maker and the researcher in this
case is not free to embark upon research according to his own inclination. Operations research is an example of decision-oriented research, since it is a scientific method of providing executive departments with a quantitative basis for decisions regarding operations under their control.
For an effective formulation of the problem, the following aspects are to be considered by the researcher.
(i) Definition of the problem: Before one takes up a problem for study, one needs to define it properly. The issues for inquiry are to be identified clearly and specified in detail. If any existing theoretical framework is tested, the particular theorem or theories must be identified. Similarly, if there are any assumptions made and terms used, their meaning must be made clear. As far as possible, the statement of the problem should not give any scope for ambiguity.
(ii) Scope of the problem: The research scholar has to fix up the four walls of the study. The researcher must identify which of the aspects he is trying to prove. Taking the example of sickness, he should specify (1) whether his study extends to all types of small-scale industries, or is limited to only a few of them, and (2) whether the study is limited to finding the causes of sickness or also to prescribing remedies.
(iii) Justification of the problem: Many a time research studies are put to the test of justification or relevance. In the scientific curiosity of the problems, the problem that needs urgent solution must be given preference.
(iv) Feasibility of the problem: Although a problem needs urgent attention and is justifiable in several respects, one has to consider the feasibility of the same. Feasibility means the possibility of conducting the study successfully. The elements of time, data and cost are to be taken into consideration before a topic is selected for study.
(v) Originality of the problem: In social sciences, particularly in commerce and management, there is no systematic compilation of the works already done or on hand. Two people may be doing work on more or less similar topics. In such situations it is not advisable to continue work in the same manner. What is advisable is that each of them should try to focus on different aspects, so that they could enrich the field of knowledge with their studies. Another problem faced by a researcher is that the problem which he intends to study has already been worked out. Should he repeat the same or not? This depends upon the situation or circumstances which engage his attention.
Ordinarily, when one talks about a hypothesis, one simply means a mere assumption or some supposition to be proved or disproved. But for a researcher, a hypothesis is a formal question that he/she intends to resolve. Thus a hypothesis may be defined as a proposition, or a set of propositions, set forth as an explanation for the occurrence of some specified group of phenomena. Where prior knowledge is scarce, which happens in the case of a new science, hypotheses are generated from a formal conceptual framework. This leads to the growth of theory. The growth of the statistical theory of sampling, as also the development of theories of economic growth, illustrates this point. In either case, the hypotheses are related to the conceptual theoretical level.
(vii) Continuity of research: Continuous research in a field is itself an important source of
hypotheses. The rejection of hypotheses leads to the formulation of new ones. These new
hypotheses explain the relationships between variables in the subsequent studies on the
same subject. In short, an ideal source of fruitful and relevant hypotheses is a fusion of two elements: past experience and imagination in the disciplined mind of the scientist.
It is critical to operationally define a variable to lend credibility to the methodology and ensure the
reproducibility of the study’s results. Another study may identify the same variable differently,
making it difficult to compare the results of these two studies.
• It establishes the rules and procedures the researcher uses to measure the variable.
• It provides unambiguous and consistent meaning to terms/variables that can be interpreted
differently.
• It makes the collection of data and analysis more focused and efficient.
• It guides what type of data and information we are looking for.
By operationally defining a variable, a researcher can communicate a common methodology to
another researcher. Operational definitions lay down the ground rules and procedures that the
investigator will use to observe and record behavior and write down facts without bias. The sole
purpose of defining the variables operationally is to keep them unambiguous, thereby reducing
errors.
Concepts
A concept is a mental construct or a tool used to understand the world around us. Examples of concepts are intelligence, humor, motivation and desire. These terms have meaning, but they cannot be seen or observed directly. You cannot pick up intelligence, buy humor, or weigh either of these. However, you can tell when someone is intelligent or has a sense of humor.
This is because constructs are observed indirectly through behaviors, which provide evidence of the
construct. For example, someone demonstrates intelligence through their academic success, how
they speak, etc. A person can demonstrate humor by making others laugh through what they say.
Concepts represent things around us that we want to study as researchers.
Defining Concepts
To define a concept for the purpose of research requires the following three things
• A manner in which to measure the concept indirectly
• A unit of analysis
• Some variation among the unit of analysis
The criteria listed above essentially constitute a conceptual definition. Below is an example of a conceptual definition of academic dishonesty:
Academic dishonesty is the extent to which individuals exhibit a disregard towards educational norms of scholarly integrity.
Below is a breakdown of this definition:
• Measurement: exhibit a disregard towards educational norms of scholarly integrity
• Unit of analysis: individuals
• Variation: the extent to which
It becomes much easier to shape a research study with these three components.
Measurement Models
A concept is not measured directly, as has already been mentioned. This means that when it is time to analyze our data, our construct is a latent or unobserved variable. The items on the survey are observed because people gave us this information directly. This means that the survey items are observed variables.
The measurement model links the latent variables with the observed variables statistically. A strong measurement model indicates that the observed variables correlate with the underlying latent variable or construct.
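A measurement model of this kind can be sketched with standard analytics tools. The following is a minimal, illustrative Python example (not from the text): it assumes three hypothetical survey items driven by one underlying construct and uses exploratory factor analysis from scikit-learn to check whether the observed items load on a single latent variable.

```python
# Illustrative sketch only: hypothetical survey items loading on one latent construct.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)
latent = rng.normal(size=200)          # unobserved construct (e.g., a latent attitude)
items = pd.DataFrame({                 # observed survey items influenced by the latent variable
    f"item_{i}": latent + rng.normal(scale=0.5, size=200) for i in range(1, 4)
})

fa = FactorAnalysis(n_components=1, random_state=0)
fa.fit(items)

# High loadings of the same sign suggest the observed items reflect one underlying construct.
loadings = pd.Series(fa.components_[0], index=items.columns, name="loading")
print(loadings.round(2))
```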
evaluation of the sources in terms of how each one relates to other sources and to the major
debates on the topic
• The primary purpose of a traditional or narrative literature review is to analyse and summarise a body of literature. This is achieved by presenting a comprehensive background
of the literature within the interested topic to highlight new research streams, identify gaps or
recognize inconsistencies. This type of literature review can help in refining, focusing and
shaping research questions as well as in developing theoretical and conceptual frameworks
(Coughlan et al., 2007).
• The systematic literature review in contrast undertakes a more rigorous approach to
reviewing the literature, perhaps because this type of review is often used to answer highly
structured and specific research questions.
• The meta-analysis literature review involves taking the findings from the chosen literature
and analyzing these findings by using standardized statistical procedures (Coughlan et al.,
2007). Polit and Beck (2006) argue that meta-analysis methods help in drawing conclusions
and detecting patterns and relationships between findings.
• They also discuss meta-synthesis, which is a non-statistical procedure; instead it evaluates
and analyses findings from qualitative studies and aims to build on previous
conceptualizations and interpretations.
The U.S. Office of Management & Budget explains that “Research data, unlike other types of information, is collected, observed, or created for purposes of analysis to produce original research results.”
According to the University of Edinburgh, "Research data is a recorded factual material commonly accepted in the scientific community as necessary to validate research findings..."
The National Endowment for the Humanities defines research data as "materials generated or collected during the course of conducting research."
• External sources
When data is collected from sources outside the organisation, they are known as external sources. For example, if a tour and travel company obtains information on Karnataka tourism from the Karnataka Transport Corporation, it would be known as an external source of data.
Basis for Comparison | Primary Data | Secondary Data
Meaning | Primary data refers to the first-hand data gathered by the researcher himself. | Secondary data means data collected by someone else earlier.
Analysis of data:
After the data have been collected, the researcher turns to the task of analysing them. The
analysis of data requires a number of closely related operations such as establishment of
categories, the application of these categories to raw data through coding, tabulation and then
drawing statistical inferences. The unwieldy data should necessarily be condensed into a few
manageable groups and tables for further analysis. Thus, researcher should classify the raw data
into some purposeful and usable categories. Coding operation is usually done at this stage
through which the categories of data are transformed into symbols that may be tabulated and
counted. Editing is the procedure that improves the quality of the data for coding. With coding
the stage is ready for tabulation. Tabulation is a part of the technical procedure wherein the
classified data are put in the form of tables. The mechanical devices can be made use of at this
juncture. A great deal of data, especially in large inquiries, is tabulated by computers.
Computers not only save time but also make it possible to study a large number of variables affecting a problem simultaneously. Analysis work after tabulation is generally based on the
computation of various percentages, coefficients, etc., by applying various well defined
statistical formulae.
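As a small illustration of the coding, tabulation and percentage analysis described above, the following Python sketch uses pandas on a hypothetical set of survey responses (the column names and values are illustrative, not from the text).

```python
# Illustrative sketch: coding and tabulation of hypothetical survey responses with pandas.
import pandas as pd

raw = pd.DataFrame({
    "gender":       ["M", "F", "F", "M", "F", "M", "F", "M"],
    "satisfaction": ["High", "Low", "High", "Medium", "High", "Low", "Medium", "High"],
})

# Coding: transform category labels into symbols (codes) that can be tabulated and counted.
codes = {"Low": 1, "Medium": 2, "High": 3}
raw["satisfaction_code"] = raw["satisfaction"].map(codes)

# Tabulation: put the classified data in the form of a table.
counts = pd.crosstab(raw["gender"], raw["satisfaction"])
print(counts)

# Analysis after tabulation: computation of percentages within each row.
percentages = pd.crosstab(raw["gender"], raw["satisfaction"], normalize="index") * 100
print(percentages.round(1))
```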
Hypothesis-testing:
After analysing the data as stated above, the researcher is in a position to test the
hypotheses, if any, he had formulated earlier. Do the facts support the hypotheses, or do they happen to be contrary? This is the usual question which should be answered while testing hypotheses.
Various tests, such as Chi square test, t-test, F-test, have been developed by statisticians for the
purpose. The hypotheses may be tested through the use of one or more of such tests, depending
upon the nature and object of research inquiry.
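By way of illustration, the chi-square test and t-test mentioned above can be run with SciPy. The sketch below uses hypothetical data (the counts and scores are made up for illustration).

```python
# Illustrative sketch of hypothesis testing with SciPy on hypothetical data.
import numpy as np
from scipy import stats

# Chi-square test of independence on a 2x2 contingency table of observed counts.
observed = np.array([[30, 20],
                     [25, 35]])
chi2, p_chi, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p_chi:.3f}")

# Independent-samples t-test comparing the mean scores of two groups.
group_a = np.array([72, 75, 78, 71, 74, 77])
group_b = np.array([68, 70, 65, 72, 66, 69])
t_stat, p_t = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_t:.3f}")

# A small p-value (commonly below 0.05) indicates that the facts do not support
# the null hypothesis at that level of significance.
```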
i. Finally, the researcher has to prepare the report of what has been done by him. Writing of the report must be done with great care, keeping in view the following:
ii. The layout of the report should be as follows: (i) the preliminary pages; (ii) the main text, and (iii)
the end matter.
iii. In its preliminary pages the report should carry title and date followed by acknowledgements and
foreword. Then there should be a table of contents followed by a list of tables and list of graphs
and charts, if any, given in the report.
iv. Report should be written in a concise and objective style in simple language avoiding vague
expressions such as ‘it seems,’ ‘there may be’, and the like.
v. Charts and illustrations in the main report should be used only if they present the information more
clearly and forcibly.
vi. Calculated ‘confidence limits’ must be mentioned and the various constraints experienced in
conducting research operations may as well be stated.
Finally the main text of the report should have the following parts:
a) Introduction: It should contain a clear statement of the objective of the research and an
explanation of the methodology adopted in accomplishing the research. The scope of the study
along with various limitations should as well be stated in this part.
b) Summary of findings: After introduction there would appear a statement of findings and
recommendations in non-technical language. If the findings are extensive, they should be
summarized.
c) Main report: The main body of the report should be presented in logical sequence and broken down into readily identifiable sections.
d) Conclusion: Towards the end of the main text, researcher should again put down the results of his
research clearly and precisely. In fact, it is the final summing up.
At the end of the report, appendices should be listed in respect of all technical data. The bibliography, i.e., the list of books, journals, reports, etc., consulted, should also be given at the end. An index should also be given, especially in a published research report.
Learning Objectives
• To learn about reliability and validity during research tool construction
• To understand Data processing
• To learn presentation and data tabulation
Learning Outcomes
At the end of the unit, students will be able to:
• Apply different data collection methods in business research
• Apply measuring and scaling techniques
• Present data visually for reporting
Mode of Assessment
S.No | Title of Topic | Teaching (PPT/Seminar/Chalk & Board etc.) | Textbook/Reference Book | Link (if applicable; link should be on Springboard/Coursera/NPTEL) | Tool (Quiz/Puzzle/Assignment/Seminar etc.)
1. | Types of research, Reliability & Validity | Chalk and board/PPT | William G. Zikmund, Barry J. Babin, Jon C. Carr, Atanu Adhikari, Mitch Griffin, Business Research Methods: A South Asian Perspective, 8th Edition, Cengage Learning, New Delhi, 2012. | NPTEL: https://onlinecourses.nptel.ac.in/noc23_mg54/unit?unit=17&lesson=20 ; https://www.youtube.com/watch?v=ur-pIS0CxOg | Quiz
2. | Sampling Methods | Chalk and board/PPT | Donald R. Cooper & Pamela S. Schindler, Business Research Methods, 9th Edition, Tata McGraw Hill Publishing, New Delhi. | https://www.youtube.com/watch?v=pTuj57uXWlk |
Design, at a basic level, means planning. Generally, some decisions are to be taken before the actual
action. The research design is the conceptual structure within which research is conducted; it constitutes
the blueprint for the collection, measurement and analysis of data. As such the design includes an outline
of what the researcher will do from writing the hypothesis and its operational implications to the final
analysis of data. Decisions regarding what, where, when, how much, by what means concerning an inquiry
or a research study constitute a research design. It is a process of deliberate anticipation directed towards
bringing an expected situation under control. Thus, research design is the plan and structure of investigation so conceived as to obtain answers to research questions. The plan is the overall scheme or program of the research. It includes an outline of what the investigator will do, from the selection of the research problem to the conclusion of the research study.
A Research Design is simply a structural framework of various research methods as well as techniques
that are utilized by a researcher.
A researcher usually chooses the research methodologies and techniques at the start of the research. The
document that contains information about the technique, methods and essential details of a project is called
a research design. Experts define research design as the glue that holds the research project together. It
(research design) helps provide a structure and direction to the research, yielding favorable results.
According to Kerlinger :
"Research design is the plan, structure, and strategy of investigation conceived so as to obtain answers to
research questions and to control variance".
A proper design sets your study up for success. Successful research studies provide insights that are
accurate and unbiased. You’ll need to create a survey that meets all of the main characteristics of a design.
There are four key characteristics:
• Neutrality: When you set up your study, you may have to make assumptions about the data you expect
to collect. The results projected in the research should be free from bias and neutral. Understand
opinions about the final evaluated scores and conclusions from multiple individuals and consider those
who agree with the results.
• Reliability: With regularly conducted research, the researcher expects similar results every time.
You’ll only be able to reach the desired results if your design is reliable. Your plan should indicate
how to form research questions to ensure the standard of results.
• Validity: There are multiple measuring tools available. However, the only correct measuring tools are
those which help a researcher in gauging results according to the objective of the research.
The questionnaire developed from this design will then be valid.
• Generalization: The outcome of your design should apply to a population and not just a
restricted sample. A generalized method implies that your survey can be conducted on any part of a
population with similar accuracy.
The above factors affect how respondents answer the research questions, so they should balance all the
above characteristics in a good design.
1) Reduces Cost:
Research design is needed to reduce the excessive costs in terms of time, money and effort by planning
the research work in advance.
2) Facilitate the Smooth Scaling:
In order to perform the process of scaling smoothly, an efficient research design is of utmost importance.
It makes the research process effective enough to give maximum relevant outcome in an easy way.
3) Helps in Relevant Data Collection and Analysis:
Research design helps the researchers in planning the methods of data collection and analysis as per the
objective of the research. It is also responsible for reliable research work, as it is the foundation of the entire research. Lack of proper attention in the preparation of the research design can harm the entire research work.
4) Assists in Smooth Flow of Research Operations:
Research design is necessary to give better and effective structure to the research. Since all the decisions
are made in advance, therefore, research design facilitates the smooth flow of research operations and
reduces the possible problems of researchers.
5) Helps in Getting Reviews from Experts :
Research design helps in developing an overview about the whole research process and thus assists in
getting responses and reviews from different experts in that field.
6) Provides a Direction to Executives:
Research design directs the researcher as well as the executives involved in the research for giving their
relevant assistance.
Research design, when done right, can generate similar results every time it is performed. However, yielding similar results is only possible if your research design is reliable. Here are some of the elements/components of a good research design:
• Purpose statement- Statement of research objectives, i.e., why the research project is to be
conducted
• Data collection methods- Methods and procedures used for collection of data, Constitution of
sample size and its procedure out of total population
• Techniques of data analysis- tools and techniques used to analyse data
• Types of research methodologies-
• Challenges of the research
2.1.5.1 Research design can be split into four phases: In order to understand the research design concept, we can go through the following four phases:
1. The sampling design: It deals with the method of selecting items to be observed for the given study.
2. The observational design: It relates to the conditions under which the observations are to be made.
3. The statistical design: It deals with the question of how many subjects are to be observed and how the
observations are to be analysed.
4. The operational design: It deals with the specific techniques by which the procedures specified in the
sampling, statistical and observational designs can be carried out.
research design, you can simply observe behaviors or phenomena and record them rather than conducting
an experiment.
3. Descriptive research design
Descriptive design is another type of research design. The goal of using a descriptive research design is
to describe a research topic, so this type of research is useful when you need more information about your
topic. Descriptive research design can also help you understand the "what," "where," "when" and "how"
of your research topic. The one question that a descriptive research design does not answer is "why."
2. Case study
Another type of observational research design is the case study format. Case studies are analyses of real-
world situations to understand and evaluate past problems and solutions. Therefore, case studies are useful
when you want to test how an idea applies to real life, and this research design is especially popular in
marketing, advertising and social science. The five-part case study format includes:
• Title
• Overview
• Problem
• Solution
• Results
5. Action research design
Another type of research design is the action research design. The action research design format involves
initial exploratory analysis and the development of an action strategy. This design format is collaborative,
and it focuses on finding solutions, making it practical for many research topics. You can use the action
research design when you want to solve real problems.
6. Experimental research design
Experimental research design is also common. The experimental research design is especially useful when
you want to test how different factors affect a situation, making this design type very versatile. The
experimental research design uses the scientific method, which includes elements like:
Hypothesis: A research hypothesis is a statement that describes what you predict your research to reveal.
Independent variable: An independent variable is a variable that does not depend on other variables.
Dependent variable: A dependent variable is a variable that depends on another variable.
Control variable: A control variable is a variable that remains constant throughout a research experiment.
7. Causal research design
The causal research design is another type of research design that researchers commonly choose. The
causal research design format attempts to identify and understand relationships between variables, which
can be valuable across many industries. Causal research designs typically involve at least two variables and explore many possible reasons for a relationship between variables.
8. Correlational research design
Along with the causal research design, the correlational research design is also commonly used. The
correlational research design format, like the causal format, identifies relationships between variables.
When you use a correlational research design, you measure variables but do not alter them.
9. Diagnostic research design
Another type of research design is the diagnostic research design. The diagnostic research design attempts
to find the underlying factors that cause events or phenomena to occur. This research type is useful to help
you understand what's causing problems so you can find solutions.
10. Cross-sectional research design
Cross-sectional design is another type of observational research design. The cross-sectional research
design involves observing multiple individuals at the same point in time. This research type does not alter
variables.
11. Sequential research design
Sequential research design is another useful type of research design. The sequential research design format
divides research into stages, and each stage builds on the last. Therefore, you can complete sequential
research at multiple points in time, allowing you to study phenomena that occur over periods of time.
12. Cohort research design
Cohort research design, a type of observational research, is another research design type. This type of
research design is commonly used in medicine, but it can also have applications in other industries. Cohort
design involves examining research subjects who have already been exposed to a research topic, making
it especially effective for conducting ethical research on medical topics or risk factors. This design type is
very flexible, and it applies to both primary and secondary data.
13. Historical research design
Researchers can also use historical research design. Using the historical research design allows you to use
past data to test your hypothesis. Historical research relies on historical data like archives, maps, diaries
and logs. Using this research design can be especially useful for completing trend analysis or gathering
context for a research problem.
14. Field research design
Another type of research design is the field research design. The field research design, which is a
qualitative research method, allows you to observe subjects in natural environments. This can allow you
to collect data directly from real-world situations.
15. Systematic review
Systematic review is another type of research design. Completing a systematic review involves reviewing
existing evidence and analyzing data from existing studies. This can allow you to use previous research
to come up with new conclusions.
16. Survey
Researchers also use the survey research design frequently. You can use surveys to gather information
directly from your sample population. Some types of surveys include:
Interviews: Interviews are one popular type of survey. Interviews allow you to ask questions to a research
subject one-on-one, which can give you the opportunity to ask follow-up questions and gain additional
insights.
Online forms: You can also use online forms to conduct surveys. You can use many websites or software
programs to create intuitive online forms with a variety of question types, including short-answer and
multiple-choice.
Focus groups: Focus groups are another key survey method. By using focus groups, you can facilitate
discussions with a group of research subjects to gain valuable research insights from your sample
population.
Questionnaires: Another type of survey is a questionnaire. In a questionnaire, you can simply list questions
for a research subject to answer, making this an effective data collection method.
17. Meta-analysis research design
Meta-analysis is a type of quantitative research design. The meta-analysis research design format uses a
variety of populations from different existing studies. This means that this method allows you to use
previous research to form new conclusions.
18. Mixed-method research design
Researchers can also use a mixed-method research design. Mixed-method research designs combine
multiple research methods to create the best path for a specific research project. This type of research can
include both qualitative and quantitative research methods.
1) Research Questions :
Research questions perform an important role in selecting the method to carry-out research. There are
various forms of research designs which include their own methods for collecting data.
For example, a survey can be conducted for the respondents to ask them descriptive or interconnected
questions while a case study or a field survey can be used to identify the firm's decision-making process.
3) Research Objective :
Every research is carried out to obtain results which help to achieve some objectives. This research objective influences the selection of research design. The researcher should adopt the research design which is suitable for the research objective and also provides the best solution to the problem along with valuable results.
4) Research Problem :
Selection of the research design is greatly affected by the type of research problems. For example, the
researcher selects an experimental research design to find out the cause-and-effect relationship of the research problem. Similarly, if the research problem involves an in-depth study, then the researcher generally adopts the experimental research design method.
5) Personal Experiences :
Selection of research design also depends upon the personal experience of researchers.
For example, a researcher who has expertise in statistical analysis would be likely to select a quantitative research design, while researchers who are specialists in the theoretical facets of research will be inclined to select a qualitative research design.
6) Target Audience :
The type of target audience plays a very important role in the selection of research design. The researcher must consider the target audience for which the research is carried out. Audiences may be the general public, business professionals or the government.
For example, if the research is proposed for the general public, then the researcher should select a qualitative research design. Similarly, a quantitative research design would be appropriate for the researcher presenting the report in front of business experts.
Typically, a good and well-planned research design consists of the following components, or tasks:
• Selection of appropriate type of design: Exploratory, descriptive and/or causal design.
• Identification of specific information needed, based on the problem in hand and the selected design.
• Specification of measurement and scaling procedures for measuring the selected information.
• Mode of collection of information and specification of appropriate form for data collection.
• Designing of appropriate sampling process and sample size.
• Specification of appropriate data analysis method.
Research design is a plan to answer your research question. A research method is a strategy used to
implement that plan. Research design and methods are different but closely related, because good research
design ensures that the data you obtain will help you answer your research question more effectively. It
depends on your research goal. It depends on what subjects (and who) you want to study. Let's say you
are interested in studying what makes people happy, or why some students are more conscious about
recycling on campus.
Business research is a part of the business intelligence process. It is usually conducted to determine
whether a company can succeed in a new region, to understand their competitors, or to simply select a
marketing approach for a product. This research can be carried out using qualitative research methods or
quantitative research methods.
Quantitative research methods are research methods that deal with numbers. It is a systematic empirical
investigation using statistical, mathematical or computational techniques. Such methods usually start
with data collection and then proceed to statistical analysis using various methods. The following are some
of the research methods used to carry out business research.
Survey research
Survey research is one of the most widely used methods to gather data especially for conducting business
research. Surveys involve asking various survey questions to a set of audiences through various types
like online polls, online surveys, questionnaires, etc. Nowadays, most of the major corporations use this
method to gather data and use it to understand the market and make appropriate business decisions.
Various types of surveys are used to conduct survey research, such as cross-sectional surveys, which collect data from a set of audience at a given point of time, and longitudinal surveys, which collect data from a set of audience across various time durations in order to understand changes in the respondents’ behavior. With the advancement in technology, surveys can now be sent online through email or social media.
For example: A company wants to know the NPS score for its website, i.e., how satisfied the people visiting the website are. An increase in traffic to the website, or the audience spending more time on it, can result in higher rankings on search engines, which will enable the company to get more leads as well as increase its visibility. Hence, the company can ask people who visit its website a few questions through an online survey to understand their opinions or gain feedback, and then make appropriate changes to the website to increase satisfaction.
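For reference, the NPS mentioned above is usually computed from 0-10 ratings as the percentage of promoters (ratings 9-10) minus the percentage of detractors (ratings 0-6). A minimal Python sketch with hypothetical ratings:

```python
# Illustrative sketch: Net Promoter Score from hypothetical 0-10 survey ratings.
ratings = [10, 9, 9, 8, 7, 6, 10, 3, 9, 8, 5, 10]

promoters  = sum(1 for r in ratings if r >= 9)   # ratings of 9 or 10
detractors = sum(1 for r in ratings if r <= 6)   # ratings of 0 to 6

# NPS = % promoters - % detractors, ranging from -100 to +100.
nps = 100 * (promoters - detractors) / len(ratings)
print(f"NPS = {nps:.0f}")
```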
Correlational research
Correlational research is conducted to understand the relationship between two entities and what impact
each one of them has on the other. Using mathematical analysis methods, correlational research enables
the researcher to correlate two or more variables. Such research can help understand patterns,
relationships, trends, etc. Manipulation of one variable is possible to get the desired results as well.
Generally, a conclusion cannot be drawn only on the basis of correlational research.
For example: A research can be conducted to understand the relationship between colors and gender-based
audiences. Using such research and identifying the target audience, a company can choose the production
of particular color products to be released in the market. This can enable the company to understand the
supply and demand requirements of its products.
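To make the idea of correlational analysis concrete, the sketch below computes a Pearson correlation coefficient between two hypothetical measured variables using SciPy (the figures are illustrative, not real data).

```python
# Illustrative sketch: correlational research measures variables and computes their association.
import numpy as np
from scipy import stats

ad_spend   = np.array([10, 12, 15, 18, 20, 22, 25, 30])         # hypothetical monthly ad spend
units_sold = np.array([110, 118, 130, 142, 150, 158, 170, 190])  # hypothetical units sold

r, p_value = stats.pearsonr(ad_spend, units_sold)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")

# A value of r close to +1 or -1 indicates a strong linear relationship;
# correlation alone does not establish cause and effect.
```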
Causal-Comparative research
Causal-Comparative research is a method based on the comparison. It is used to deduce the cause-effect
relationship between variables. Sometimes also known as quasi-experimental research, it involves
establishing an independent variable and analyzing the effects on the dependent variable. In such research,
manipulation is not done; however, changes are observed on the variables or groups under the influence
of the same changes. Drawing conclusions through such research is a little tricky as independent and
dependent variables will always exist in a group, hence all other parameters have to be taken into
consideration before drawing any inferences from the research.
For example: Research can be conducted to analyze the effect of good educational facilities in rural areas.
Such a study compares a group of people from rural areas after they are provided with good educational
facilities against the same group before the facilities were provided.
Another example would be to analyze the effect of building dams on the farmers and on crop production in
that area.
Experimental research
Experimental research is based on trying to prove a theory. Such research can be useful in business because
it lets a company learn about behavioral traits of its consumers, which can lead to more revenue. In this
method, an experiment is carried out on a set of participants so that their behavior can be observed and
later analyzed when they are exposed to certain stimuli or parameters.
For example: Experimental research was conducted to understand whether particular colors have an effect on
consumers' hunger. A set of participants was exposed to those colors while they were eating, and the
subjects were observed. It was seen that certain colors like red or yellow increase hunger. Hence, such
research was a boon to the hospitality industry; many food chains like McDonald's and KFC use such colors in
their interiors, branding and packaging.
Another inference drawn from experimental research, and used widely by bars and pubs across the world, is
that loud music makes a person drink more in less time. This finding was established through experimental
research and proved valuable to many business owners across the globe.
Literature research
Literature research is one of the oldest methods available. It is very economical, and a lot of information
can be gathered using such research. Online research or literature research involves gathering information
from existing documents and studies, which can be available in libraries, annual reports, etc. Nowadays,
with the advancement in technology, such research has become even simpler and accessible to everyone. An
individual can directly search online for any information that is needed, which will give him in-depth
information about the topic or the organization. Such research is used mostly by marketing and salespeople
in the business sector to understand the market or their customers. It is carried out using existing
information available from various sources, although care has to be taken to validate the sources from
which the information is collected.
For example: A salesperson has heard that a particular firm is looking for a solution which his company
provides. The salesperson will first search for a decision maker in that company, investigate which
department he is from, and understand what the target company is looking for and what they do. Using this
research, he can tailor his pitch so that it is spot on when he presents it to this client. He can also
reach out to the customer directly by finding a means to communicate with him through online research.
Qualitative research
Qualitative research is a method that has high importance in business research. Qualitative research
involves obtaining data through open-ended, conversational means of communication. Such research enables
the researcher to understand not only what the audience thinks but also why they think it. In such research,
in-depth information can be gathered from the subjects depending on their responses. There are various types
of qualitative research methods, such as interviews, focus groups, ethnographic research, content analysis
and case study research, that are widely used. Such methods are of very high importance in business research
as they enable the researcher to understand the consumer. Understanding what motivates the consumer to buy
and what does not is what leads to higher sales, and that is the prime objective for any business. Following
are a few methods that are widely used in today's world by most businesses.
Interviews
Interviews are somewhat similar to surveys, and sometimes they may even use the same questions. The
difference is that the respondent can answer the open-ended questions at length, and the direction of the
conversation or the questions being asked can be changed depending on the subject's responses. Such a method
usually gives the researcher detailed information about the perspectives or opinions of its subjects.
Carrying out interviews with subject matter experts can also give important information critical to some
businesses.
For example: An interview was conducted by a telecom manufacturer with a group of women to understand why it
has fewer female customers. After interviewing them, the researcher understood that some models were not
offered in colors that appealed to women, so women preferred not to purchase them. Such information can be
critical to a business like a telecom manufacturer, and it can be used to increase market share by targeting
women customers and launching models in new colors.
Another example would be to interview a subject matter expert in social media marketing. Such an
interview can enable a researcher to understand why certain types of social media advertising strategies
work for a company and why some of them don’t.
Focus groups
Focus groups are a set of individuals selected specifically to understand their opinions and behaviors. It
is usually a small group, selected with the parameters of the target market audience in mind, that is
brought together to discuss a particular product or service. Such a method gives the researcher a larger
sample than an interview or a case study while still taking advantage of conversational communication.
Nowadays, focus groups can also be sent online surveys to collect data and answer the why, what and how
questions. Such a method is very useful for testing new concepts or products before they are launched in the
market.
For example: Research is conducted with a focus group to understand which screen size is preferred most by
the current target market. Such a method enables the researcher to dig deeper into whether the target market
focuses more on the screen size, features or colors of the phone. Using this data, a company can make wise
decisions about its product line and secure a higher market share.
Ethnographic research
Ethnographic research is one of the most challenging research methods but can give extremely precise
results. Such research is used quite rarely, as it is time-consuming and can be expensive as well. It
requires the researcher to adapt to the natural environment of the target audience and observe it in order
to collect data. Such a method is generally used to understand cultures, challenges or other things that
occur in that particular setting.
For example: The world-renowned show “Undercover Boss” is an apt example of how ethnographic research can be
used in business. In this show, a senior manager of a large organization works in his own company as a
regular employee to understand what improvements can be made, what the culture in the organization is, and
to identify hard-working employees and reward them. It can be seen that the researcher has to spend a good
amount of time in the natural setting of the employees and adapt to their ways and processes. While
observing in this setting, the researcher can find the information he needs first-hand, without loss of
information or bias, and improve things that would impact his business.
Case study research
Case study research is one of the most important methods in business research. It is also used as marketing
collateral by most businesses to win more clients. Case study research is conducted to assess customer
satisfaction and to document the challenges that were faced and the solutions that the firm provided.
Inferences are then made to point out the benefits that the customer enjoyed by choosing that specific firm.
Such research is also widely used in other fields like education and the social sciences. Case studies are
provided by businesses to new clients to showcase their capabilities, and hence such research plays a
crucial role in the business sector.
For example: A services company has provided a testing solution to one of its clients. Case study research
is conducted to find out what challenges were faced during the project, what the scope of the work was, what
objective was to be achieved and what solutions were given to tackle the challenges. The study can end with
the benefits that the company provided through its solutions, such as reduced time to test batches, easy
implementation or integration of the system, or even cost reduction. Such a study showcases the capability
of the company and can be presented as empirical evidence to new prospects.
Website intercept surveys
Website intercept surveys, or website visitor profiling/research, is a newer approach that is quite helpful
in the business sector. It is an innovative way to collect direct feedback from website visitors using
surveys. In recent times a lot of business generation happens online, so it is important to understand the
visitors of your website, as they are your potential customers. Collecting feedback is critical to any
business, as no business can be successful without understanding its customers. A company has to keep its
customers satisfied and try to turn them into loyal customers in order to stay on top.
A website intercept survey is an online survey that allows you to target visitors to understand their intent
and collect feedback to evaluate the customers’ online experience. Information like visitor intention,
behavior path, satisfaction of overall website, can be collected using this.
Depending on what information a company is looking for, multiple forms of website intercept surveys can be
used to gather responses. Some of the popular ones are pop-ups (also called modal boxes) and on-page
surveys.
For example: A prospective customer is looking for a particular product that a company is selling. Once he
is directed to the website, an intercept survey starts noting his intent and path. Once the transaction has
been made, a pop-up or an on-page survey asks the customer to rate the website. Such research enables the
researcher to put this data to good use: to understand the customer's intent and path and to improve any
part of the website depending on the responses, which in turn leads to satisfied customers and hence higher
revenues and market share.
Here is a comparison that highlights the major differences between qualitative and quantitative research:
Qualitative research:
• Focuses on explaining and understanding experiences and perspectives.
• Uses non-numerical data, such as words, images, and observations.
• Usually uses small sample sizes.
• Typically emphasizes in-depth exploration and interpretation.
• Data analysis involves interpretation and narrative analysis.
Quantitative research:
• Focuses on quantifying and measuring phenomena.
• Uses numerical data, such as statistics and surveys.
• Usually uses larger sample sizes.
• Typically emphasizes precision and objectivity.
• Data analysis involves statistical analysis and hypothesis testing.
Business research is one of the most effective ways to understand customers, the market and competitors.
Such research helps companies to understand the demand and supply of the market. Using such research
will help businesses reduce costs, and create solutions or products that are targeted to the demand in the
market and the correct audience.
In-house business research can enable senior management to build an effective team and to train or mentor
employees when needed. Business research enables a company to track its competitors and hence can give it
the upper hand to stay ahead of them. Failures can be avoided by conducting such research, as it can give
the researcher an idea of whether the time is right to launch a product or solution and whether the audience
is right. It also helps a company understand its brand value and measure customer satisfaction, which is
essential to continuously innovate and meet customer demands.
This will help the company grow its revenue and market share. Business research also helps recruit ideal
candidates for various roles in the company. By conducting such research a company can carry out a
SWOT analysis, i.e. understand the strengths, weaknesses, opportunities, and threats. With the help of this
information, wise decisions can be made to ensure business success.
Business research is the first step that any business owner needs to take to set up a business, to survive
or to excel in the market. The main reason why such research is of utmost importance is that it helps
businesses to grow in terms of revenue, market share and brand value.
The term reliability in business research refers to the consistency of a research study or measuring test
(whether the results can be reproduced under the same conditions). It shows how consistently a method
measures something. If the same result can be consistently achieved by using the same methods under the
same circumstances, the measurement is considered reliable.
For example, if a person weighs themselves during the course of a day they would expect to see a similar
reading. Scales which measured weight differently each time would be of little use. The same analogy could
be applied to a tape measure which measured inches differently each time it was used; such a tape measure
would not be considered reliable.
Consider another example: You measure the temperature of a liquid sample several times under identical
conditions. The thermometer displays the same temperature every time, so the results are reliable.
If findings from research are replicated consistently they are reliable. A correlation coefficient can be used
to assess the degree of reliability. If a test is reliable it should show a high positive correlation.
Of course, it is unlikely the exact same results will be obtained each time as participants and situations
vary, but a strong positive correlation between the results of the same test indicates reliability.
2.3.1 Methods to Assess Reliability
To determine whether your research methods are producing reliable results, you must perform the same task
multiple times or in multiple ways. Typically, this involves changing one aspect of the research assessment,
such as when the test is given or who conducts it, while keeping everything else the same. Both approaches
maintain control by keeping one element exactly the same and changing other elements, to ensure that other
factors don't influence the research results. Here are some careers that often test for reliability in
data:
• Media sociologist
• Food scientists
• Forensic science technicians
• Marketing analysts
• Medical scientists
• Economists
• Policy analysts
• Behavioral scientists
• Business analysts
1. Test-retest reliability
The test-retest reliability method in research involves giving a group of people the same test more than
once. If the results of the test are similar each time you give it to the sample group, that shows your
research method is likely reliable and not influenced by external factors, like the sample group's mood or
the day of the week. Guidelines for improving test-retest reliability are given after the examples below.
Example: Give a group of college students a survey about their satisfaction with their school's parking
lots on Monday and again on Friday, then compare the results to check the test-retest reliability. Consider
another example: a test of color blindness for trainee pilot applicants should have high test-retest
reliability, because color blindness is a trait that does not change over time.
To measure test-retest reliability, you conduct the same test on the same group of people at two different
points in time. Then you calculate the correlation between the two sets of results.
• When designing tests or questionnaires, try to formulate questions, statements, and tasks in a way that
won’t be influenced by the mood or concentration of participants.
• When planning your methods of data collection, try to minimize the influence of external factors, and
make sure all samples are tested under the same conditions.
• Remember that changes or recall bias can be expected to occur in the participants over time, and take
these into account.
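As a rough illustration (my own sketch, not part of the source), the test-retest calculation described above amounts to correlating two administrations of the same test; the scores below are hypothetical:

import numpy as np

# Hypothetical scores of the same 8 students on Monday and again on Friday
monday = np.array([72, 65, 88, 54, 90, 77, 61, 83])
friday = np.array([70, 68, 85, 55, 92, 75, 63, 80])

# Test-retest reliability: correlation between the two administrations
r = np.corrcoef(monday, friday)[0, 1]
print(f"Test-retest reliability estimate: r = {r:.2f}")  # values near +1 indicate high reliability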
2. Parallel forms reliability
This strategy involves giving the same group of people multiple versions of a test to determine whether the
results stay the same when different research instruments are used. If they do, the methods are likely
reliable, because otherwise the participants in the sample group might behave differently and change the
results. For this strategy to succeed, it is important to follow the guideline given after the examples
below.
Example: In marketing, you may interview customers about a new product, observe them using the
product and give them a survey about how easy the product is to use and compare these results as a parallel
forms reliability test. In educational assessment, it is often necessary to create different versions of tests
to ensure that students don’t have access to the questions in advance. Parallel forms reliability means that,
if the same students take two different versions of a reading comprehension test, they should get similar
results in both tests.
The most common way to measure parallel forms reliability is to produce a large set of questions that
evaluate the same thing, and then divide these randomly into two question sets. The same group of
respondents answers both sets, and you calculate the correlation between the results. A high correlation
between the two sets indicates high parallel forms reliability.
• Ensure that all questions or test items are based on the same theory and formulated to measure the
same thing.
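A rough sketch of the procedure just described (my own illustration with randomly generated, hypothetical item scores): the item pool is split randomly into two forms, both forms are scored for the same respondents, and the two sets of totals are correlated.

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical item scores: 20 respondents x 10 items intended to measure the same thing
item_scores = rng.integers(1, 6, size=(20, 10))

# Randomly divide the 10 items into two parallel forms of 5 items each
order = rng.permutation(10)
form_a = item_scores[:, order[:5]].sum(axis=1)   # total score on form A
form_b = item_scores[:, order[5:]].sum(axis=1)   # total score on form B

# Parallel forms reliability: correlation between the two forms
r = np.corrcoef(form_a, form_b)[0, 1]
print(f"Parallel forms reliability estimate: r = {r:.2f}")  # real item data should give a high r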
3. Inter-rater reliability
Inter-rater reliability testing involves multiple researchers assessing the same sample group and comparing
their results. This can help them avoid factors related to the assessor that might influence the results,
including:
• Personal bias
• Mood
• Human error
If most of the results from different assessors are similar, it's likely the research method is reliable and can
produce usable research because the assessors gathered the same data from the group. This is useful for
research methods where each assessor may have different criteria but can still end up with similar research
results, like:
• Observations
• Interviews
• Surveys
Example: Multiple behavioral specialists may observe a group of children playing to determine their social
and emotional development and then compare notes to check for inter-rater reliability. In an observational
study where a team of researchers collects data on classroom behavior, inter-rater reliability is important:
all the researchers should agree on how to categorize or rate different types of behavior.
To measure inter-rater reliability, different researchers conduct the same measurement or observation on the
same sample. Then you calculate the correlation between their different sets of results. If all the
researchers give similar ratings, the test has high inter-rater reliability.
• Clearly define your variables and the methods that will be used to measure them.
• Develop detailed, objective criteria for how the variables will be rated, counted or categorized.
• If multiple researchers are involved, ensure that they all have exactly the same information and
training.
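As a minimal illustration (not from the source), inter-rater reliability for numerical ratings can be estimated by correlating the scores that two assessors give to the same subjects; the ratings below are hypothetical:

import numpy as np

# Hypothetical development ratings (1-10) given by two observers to the same 8 children
rater_1 = np.array([7, 5, 9, 4, 8, 6, 7, 5])
rater_2 = np.array([8, 5, 9, 3, 7, 6, 8, 4])

# Inter-rater reliability: correlation between the two sets of ratings
r = np.corrcoef(rater_1, rater_2)[0, 1]
print(f"Inter-rater reliability estimate: r = {r:.2f}")

# For categorical ratings, agreement statistics such as Cohen's kappa are often used instead.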
4. Internal consistency
Checking for internal consistency in research involves making sure that your research methods, or parts of
them, deliver the same results. There are two typical ways to make this determination:
Split-half reliability test: You can perform this test by splitting a research method, like a survey or test,
in half, delivering both halves separately to a sample group, then comparing the results to ensure the
method can produce consistent results. If the results are consistent, then the results of the research method
are likely reliable.
Inter-item reliability test: With this assessment, you administer multiple test items that are intended to
measure the same thing, as in parallel forms reliability testing, and calculate the correlation between the
results of each pair of items. You then average these correlations and use that number to determine whether
the results are reliable.
Example: You may give a company's cleaning department a questionnaire about which cleaning products work
best, split it in half, give each half to the department separately, and calculate the correlation between
the halves to test for split-half reliability. Later, you interview the members of the cleaning department,
bring them into small focus groups and observe them at work to determine which cleaning products get the
most use and which ones people like best. You calculate the correlations between these answers and average
the results to find the average inter-item reliability. Likewise, to measure customer satisfaction with an
online store, you could create a questionnaire with a set of statements that respondents must agree or
disagree with. Internal consistency tells you whether the statements are all reliable indicators of customer
satisfaction.
Ways to measure it
Two common methods are used to measure internal consistency.
• Average inter-item correlation: For a set of measures designed to assess the same construct, you
calculate the correlation between the results of all possible pairs of items and then calculate the
average.
• Split-half reliability: You randomly split a set of measures into two sets. After testing the entire set
on the respondents, you calculate the correlation between the two sets of responses.
• Take care when devising questions or measures: those intended to reflect the same concept should be
based on the same theory and carefully formulated.
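The two measures described above can be sketched in a few lines of Python (my own illustration with randomly generated, hypothetical questionnaire responses):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical responses: 30 respondents x 6 satisfaction statements on a 1-5 scale
responses = rng.integers(1, 6, size=(30, 6))

# Average inter-item correlation: mean of the correlations of all distinct item pairs
item_corr = np.corrcoef(responses, rowvar=False)          # 6 x 6 item correlation matrix
pair_values = item_corr[np.triu_indices_from(item_corr, k=1)]
print(f"Average inter-item correlation: {pair_values.mean():.2f}")

# Split-half reliability: correlate the totals of two randomly chosen halves of the items
order = rng.permutation(6)
half_1 = responses[:, order[:3]].sum(axis=1)
half_2 = responses[:, order[3:]].sum(axis=1)
print(f"Split-half reliability: {np.corrcoef(half_1, half_2)[0, 1]:.2f}")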
Ensuring Reliability
• To enhance the reliability of your research, you need to apply your measurement method consistently.
The chances of reproducing the same results for a test are higher when you maintain the method you’re
using to experiment.
• For example, you want to determine the reliability of the weight of a bag of chips using a scale. You
have to consistently use this scale to measure the bag of chips each time you experiment.
• You must also keep the conditions of your research consistent. For instance, if you’re experimenting
to see how quickly water dries on sand, you need to consider all of the weather elements that day.
• So, if you experimented on a sunny day, the next experiment should also be conducted on a sunny day
to obtain a reliable result.
Type of reliability: what it compares, and when to use it
• Test-retest: compares the same test over time. Use it when measuring a property that you expect to stay
the same over time.
• Parallel forms: compares different versions of a test which are designed to be equivalent. Use it when
using two different tests to measure the same thing.
• Internal consistency: compares the individual items of a test. Use it with a multi-item test where all the
items are intended to measure the same variable.
Validity refers to how accurately a method measures what it is intended to measure (whether the results
really do represent what they are supposed to measure).
If research has high validity, that means it produces results that correspond to real properties,
characteristics, and variations in the physical or social world.
Consider the thermometer example given above: if the thermometer shows different
temperatures each time, even though you have carefully controlled conditions to ensure the sample’s
temperature stays the same, the thermometer is probably malfunctioning, and therefore its measurements
are not valid.
1) Content Validity
Content validity is the process of matching the test items with the instructional objectives. Content
validity is the most important criterion for the usefulness of a test, especially of an achievement test. It
is also called Rational Validity, Logical Validity, Curricular Validity, Internal Validity or Intrinsic
Validity.
Content validity refers to the degree or extent to which a test consists of items representing the
behaviors that the test maker wants to measure. The extent to which the items of a test are truly
representative of the whole content and the objectives of the teaching is called the content validity of the
test.
Content validity is estimated by evaluating the relevance of the test items; i.e. the test items must duly
cover all the content and behavioural areas of the trait to be measured. It gives an idea of the subject
matter or of the change in behaviour expected. In this way, content validity refers to the extent to which a
test contains items representing the behaviour that we are going to measure. The items of the test should
include every relevant characteristic of the whole content area and the objectives, in the right proportion.
Before constructing the test, the test maker prepares a two-way table of content and objectives, popularly
known as “Specification Table”.
For example, if I were to measure what causes hair loss in women, I would have to consider factors like
postpartum hair loss, alopecia, hair manipulation, dryness, and so on. By omitting any of these critical
factors, you risk significantly reducing the validity of your research, because you will not be covering
everything necessary to make an accurate deduction.
For example, suppose a certain woman is losing her hair due to postpartum hair loss, excessive manipulation
and dryness, but in my research I only look at postpartum hair loss. My research will show that she has
postpartum hair loss, which is incomplete: the conclusion is correct as far as it goes, but it does not
fully account for the reasons why this woman is losing her hair.
Some general points for ensuring content validity are given below:
1. Test should serve the required level of students, neither above nor below their standard.
2. Language should be up to the level of the students.
3. Anything which is not in the curriculum should not be included in test items.
4. Each part of the curriculum should be given necessary weightage. More items should be selected from
more important parts of the curriculum.
Limitations:
1. The weightage to be given to different parts of content is subjective.
2. It is difficult to construct the perfect objective test.
3. Content validity is not sufficient or adequate for tests of Intelligence, Achievement, Attitude and to
some extent tests of Personality.
4. The weightage given to different behavioural changes is not objective.
2) Criterion Validity
This measures how well your measurement correlates with the variables you want to compare it with to
get your result. The two main classes of criterion validity are predictive and concurrent.
3) Predictive validity
It helps predict future outcomes based on the data you have. For example, if a large number of students
performed exceptionally well in a test, you can use this to predict that they understood the concept on
which the test was based and will perform well in their exams. Predictive validity is the extent to which a
test predicts the future performance of employees.
Predictive validity is concerned with the predictive capacity of a test. It indicates the effectiveness of a
test in forecasting or predicting future outcomes in a specific area. The test user wishes to forecast an
individual’s future performance. Test scores can be used to predict future behaviour or performance and
hence called as predictive validity.
In order to find predictive validity, the tester correlates the test scores with the testee's subsequent
performance, technically known as “Criterion”. Criterion is an independent, external and direct measure
of that which the test is designed to predict or measure. Hence, it is also known as “Criterion related
Validity”.
The predictive or empirical validity has been defined by Cureton (1965) as an estimate of the correlation
coefficient between the test scores and the true criterion.
Example:
A medical entrance test is constructed and administered to select candidates for admission into M.B.B.S.
courses. Based on the scores made by the candidates on this test, we admit the candidates.
After completion of the course, they appear for the final M.B.B.S. examination. The scores of the final
M.B.B.S. examination are the criterion. The scores of the entrance test and the final examination
(criterion) are correlated; a high correlation implies high predictive validity.
Similar examples, such as other recruitment tests or entrance tests in Agriculture, Engineering, Banking,
Railways, etc., could be cited here; these must have high predictive validity. That is, tests used for
recruitment, classification and entrance examinations must have high predictive validity. This type of
validity is sometimes referred to as 'Empirical validity' or 'Statistical validity', as our evaluation is
primarily empirical and statistical.
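To make the entrance-test example concrete (a hedged sketch with made-up scores, not actual data), predictive validity is estimated by correlating the test scores with the criterion obtained later, here the final examination marks:

import numpy as np

# Hypothetical scores: entrance test marks and final M.B.B.S. examination marks
entrance = np.array([640, 580, 700, 520, 660, 610, 690, 550])
final_exam = np.array([72, 65, 81, 58, 75, 69, 79, 60])

# Predictive validity: correlation between the test and the later criterion
r = np.corrcoef(entrance, final_exam)[0, 1]
print(f"Predictive validity estimate: r = {r:.2f}")  # a high positive r suggests good prediction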
Limitation:
If we get a suitable criterion-measure with which our test results are to be correlated, we can determine
the predictive validity of a test. But it is very difficult to get a good criterion. Moreover, we may not get
criterion-measures for all types of psychological tests.
4) Concurrent validity
Concurrent validity involves correlating the test scores with another set of criterion scores.
Concurrent validity refers to the extent to which the test scores correspond to already established or
accepted performance, known as the criterion. To know the validity of a newly constructed test, it is
correlated with, or compared to, some already available information.
Thus a test is validated against some concurrently available information. The scores obtained from a newly
constructed test are correlated with pre-established test performance. Suppose we have prepared a test of
intelligence and administer it to a group of pupils. The Stanford-Binet test is also administered to the
same group. The scores made on our newly constructed test and the scores made by the pupils on the
Stanford-Binet Intelligence Test are then correlated. If the coefficient of correlation is high, our
intelligence test is said to have high concurrent validity.
The dictionary meaning of the term ‘concurrent’ is ‘existing’ or ‘done at the same time’. Thus the term
‘concurrent validity’ is used to indicate the process of validating a new test by correlating its scores with
some existing or available source of information (criterion) which might have been obtained shortly before
or shortly after the new test is given.
For example, setting up a literature test for your students on two different books and assessing them at the
same time. You’re measuring your students’ literature proficiency with these two books. If your students
truly understood the subject, they should be able to correctly answer questions about both books.
To get a criterion measure, we are not required to wait for a long time. Predictive validity differs from
concurrent validity in the sense that for the former we wait for the future to obtain the criterion measure,
whereas in the case of concurrent validity there is no such waiting period.
5) Face Validity
Face validity is the extent to which the test appears to measure what is to be measured.
Face validity refers to whether a test appears to be valid or not, i.e. whether, from its external
appearance, the items appear to measure the required aspect. If a test appears to measure what the test
author desires to measure, we say that the test has face validity. Thus, face validity refers not to what
the test measures, but to what the test 'appears to measure'. The content of the test should not obviously
appear to be inappropriate or irrelevant.
For example, a test to measure “Skill in addition” should contain only items on addition. When
one goes through the items and feels that all the items appear to measure the skill in addition, then it can
be said that the test is validated by face.
Although it is not an efficient method of assessing the validity of a test, and as such is not usually
relied upon, face validity can be used as a first step in validating the test. Once the test is validated at
face level, we may proceed further to compute a validity coefficient.
Moreover, this method helps a test maker to revise the test items to suit the purpose. When a test has to be
constructed quickly, or when there is an urgent need for a test and there is no time or scope to determine
the validity by other, more rigorous methods, face validity can be used. This type of validity is not
adequate on its own, as it operates only at the facial level, and hence should be treated as a last resort.
6) Construct-Related Validity
Construct validity is the extent to which the test may be said to measure a theoretical construct or
psychological variable.
A construct is mainly psychological. Usually it refers to a trait or mental process. Construct validation is
the process of determining the extent to which a particular test measures the psychological constructs that
the test maker intends to measure.
It indicates the extent to which a test measures the abstract attributes or qualities which are not
operationally defined.
Gronlund and Linn state, “Construct validation may be defined as the process of determining the extent to
which the test performance can be interpreted in terms of one or more psychological constructs.”
Ebel and Frisbie describe it as “the process of gathering evidence to support the contention that a given
test indeed measures the psychological construct that the test makers intended for it to measure.”
Construct validity is also known as “Psychological Validity” or ‘Trait Validity’ or ‘Logical
Validity’. Construct validity means that the test scores are examined in terms of a construct. It studies the
construct or psychological attributes that a test measures.
The extent to which the test measures the personality traits or mental processes as defined by the test-
maker is known as the construct validity of the test.
While constructing tests on intelligence, attitude, mathematical aptitude, critical thinking, study skills,
anxiety, logical reasoning, reading comprehension etc. we have to go for construct validity. Take for
example, ‘a test of sincerity’.
Before constructing such types of test the test maker is confronted with the questions:
1. What should be the definition of the term sincerity?
2. What types of behaviour are to be expected from a person who is sincere?
3. What type of behaviour distinguishes between sincerity and insincerity?
Each construct has an underlying theory that can be brought to bear in describing and predicting a pupil’s
behaviour.
Gronlund (1981) suggests the following three steps for determining construct validity:
(i) Identify the constructs presumed to account for test performance.
(ii) Derive hypotheses regarding test performance from the theory underlying each construct.
(iii) Verify the hypotheses by logical and empirical means.
It must be noted that construct validity is inferential. It is used primarily when other types of validity
are insufficient to indicate the validity of the test. Construct validity is usually involved in tests of
traits such as study habits, appreciation, honesty, emotional stability, sympathy, etc.
7) Convergent validity
Convergent validity is a subtype of construct validity. Construct validity is an indication of how well a
test measures the concept it was designed to measure. Convergent validity refers to how closely a test is
related to other tests that measure the same (or similar) constructs. Here, a construct is a behavior,
attitude, or concept, particularly one that is not directly observable.
Ideally, two tests measuring the same construct, such as stress, should have a moderate to high correlation.
High correlation is evidence of convergent validity, which, in turn, is an indication of construct validity.
Example: Suppose you use two different methods to collect data about anger: observation and a self-
report questionnaire. If the scores of the two methods are similar, this suggests that they indeed measure
the same construct. A high correlation between the two test scores suggests convergent validity. Consider
another example, the scores of two tests, one measuring self-esteem and the other measuring extroversion,
are likely to be correlated—individuals scoring high in self-esteem are more likely to score high in
extroversion. These two tests would then have high convergent validity.
8) Discriminant validity
Divergent Validity is used to determine if a test is too similar to another test. If a test is found to correlate
too strongly (or be too similar) with another test then it suggests that the tests are measuring the same
thing and are too alike to be considered different. An example would be a test used by a company for
hiring purposes that measures how proficient someone is at a particular skill. If the test correlates too
strongly with an IQ test then it essentially is just another IQ test instead of measuring something different.
Discriminant validity is a way of validating research that involves demonstrating that one scale is
unrelated to other scales. It helps researchers to discriminate between two scales.
Example: Self-Esteem vs Musical Preferences Scales: Measuring the self-esteem of teenagers
and musical preferences.
Social Skills vs Computer Skills: Assessing the degree of relationship between a measure of social
skills and a measure of computer skills.
Discriminant validity thus examines the validity of your research by determining what not to base it on: you
remove elements that are not a strong factor in order to help validate your research. Being a vegan, for
example, does not imply that you are allergic to meat.
9) Factorial Validity:
Factorial validity is the extent of correlation of the different factors with the whole test.
Factorial validity is determined by a statistical technique known as factor analysis. It uses the analysis
of inter-correlations to identify factors (which may be verbalised as abilities) constituting the test.
In other words, methods of inter-correlation and other statistical methods are used to estimate factorial
validity. The correlation of the test with each factor is calculated to determine the weight contributed by
each such factor to the total performance of the test.
This tells us about the factor loadings. This relationship of the different factors with the whole test is called
the factorial validity. Guilford (1950) suggested that factorial validity is the clearest description of what a
test measures and by all means should be given preference over other types of validity.
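As an illustrative sketch only (the source does not prescribe any software; scikit-learn's FactorAnalysis is used here purely for demonstration, and the item scores are randomly generated):

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Hypothetical test data: 100 examinees x 8 items
scores = rng.normal(size=(100, 8))

# Extract two factors and inspect how strongly each item loads on them
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(scores)
loadings = fa.components_.T   # rows = items, columns = factors
print(np.round(loadings, 2))  # high loadings show which factor an item belongs to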
A. To Ensure Validity and Reliability in Your Research
You need a bulletproof research design to ensure that your research is both valid and reliable. This means
that your methods, sample, and even you, the researcher, shouldn’t be biased.
Ensuring Validity
There are several ways to determine the validity of your research, and the majority of them require the use
of highly specific and high-quality measurement methods.
Before you begin your test, choose the best method for producing the desired results. This method should
be pre-existing and proven.
Also, your sample should be very specific. If you’re collecting data on how dogs respond to fear, your
results are more likely to be valid if you base them on a specific breed of dog rather than dogs in general.
2.5 Variables
A variable is any kind of attribute or characteristic that you are trying to measure, manipulate and control
in statistics and research. All studies analyze a variable, which can describe a person, place, thing or idea.
A variable's value can change between groups or over time.
For example, if the variable in an experiment is a person's eye color, its value can change from brown to
blue to green from person to person.
Researchers and statisticians use variables to describe and measure the items, places, people or ideas
they're studying. Many types of variables exist, and you must choose the right variable to measure when
designing studies, selecting tests and interpreting results. A strong understanding of variables can lead to
more accurate statistical analyses and results.
Researchers often try to find out whether an independent variable causes other variables to change and in
what way. When analyzing relationships between study objects, researchers often try to determine what
makes the dependent variable change and how. Independent variables can influence dependent variables,
but dependent variables cannot influence independent variables.
2.5.3 Quantitative vs. qualitative variables
Quantitative variables: data sets that involve numbers or amounts. Examples: height, distance or number of
items. Types: discrete and continuous.
Qualitative variables: non-numerical values or groupings. Examples: eye color or dog breed. Types: binary,
nominal and ordinal.
An intervening variable, also known as a mediator or mediating variable, explains the process through
which two variables are related, while a moderating, or moderator, variable affects the strength and
direction of that relationship.
A confounding variable is a type of extraneous variable that is associated with both the independent and
dependent variables. An extraneous variable is anything that could influence the dependent variable.
These unwanted variables can unintentionally change a study's results or how a researcher interprets those
results. A confounding variable influences the dependent variable, and also correlates with or causally
affects the independent variable. Confounding variables can invalidate your experiment results by making
them biased or suggesting a relationship between variables exists when it does not. Some of the ways to
control confounding variables so they do not affect the results of your experiment include:
• Adjustment: Adjust study parameters to account for the confounding variable and minimize its effects.
• Matching: Compare study groups with the same degree of confounding variables.
• Multivariate analysis: Use when analyzing multiple variables at once.
• Randomization: Spread confounding variables evenly between study groups.
• Restriction: Remove subjects or samples that have confounding factors.
• Stratification: Create study subgroups in which the confounding variable does not vary or vary much.
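As a small illustration of the randomization strategy listed above (my own sketch, not from the source), subjects can be shuffled and split into groups so that confounding characteristics are spread evenly by chance:

import random

# Hypothetical subject IDs
subjects = [f"S{i:02d}" for i in range(1, 21)]

# Randomly assign subjects to a treatment group and a control group
random.shuffle(subjects)
treatment_group = subjects[:10]
control_group = subjects[10:]

print("Treatment:", treatment_group)
print("Control:  ", control_group)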
A control variable is a factor that is held constant throughout an experiment to prevent bias. Composite
variables are often made up of two or more variables that are highly related to one another conceptually or
statistically.
A sample is a subset of individuals from a larger population. Sampling means selecting the group that
you will actually collect data from in your research. For example, if you are researching the opinions of
students in your university, you could survey a sample of 100 students.
In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
When you conduct research about a group of people, it’s rarely possible to collect data from every person
in that group. Instead, you select a sample. The sample is the group of individuals who will actually
participate in the research.
To draw valid conclusions from your results, you have to carefully decide how you will select a sample
that is representative of the group as a whole. This is called a sampling method. There are two primary
types of sampling methods that you can use in your research:
• Probability sampling involves random selection, allowing you to make strong statistical inferences
about the whole group.
• Non-probability sampling involves non-random selection based on convenience or other criteria,
allowing you to easily collect data.
First, you need to understand the difference between a population and a sample, and identify the target
population of your research.
• The population is the entire group that you want to draw conclusions about.
• The sample is the specific group of individuals that you will collect data from.
The population can be defined in terms of geographical location, age, income, or many other
characteristics.
It can be very broad or quite narrow: maybe you want to make inferences about the whole adult population
of your country; maybe your research focuses on customers of a certain company, patients with a specific
health condition, or students in a single school.
It is important to carefully define your target population according to the purpose and practicalities of your
project.
If the population is very large, demographically mixed, and geographically dispersed, it might be difficult
to gain access to a representative sample. A lack of a representative sample affects the validity of your
results, and can lead to several research biases, particularly sampling bias.
Example: Sampling frame. You are doing research on working conditions at a social media marketing company.
Your population is all 1000 employees of the company. Your sampling frame, the actual list of individuals
from which the sample will be drawn, is the company's HR database, which lists the names and contact details
of every employee.
Probability sampling means that every member of the population has a chance of being selected. It is
mainly used in quantitative research. If you want to produce results that are representative of the whole
population, probability sampling techniques are the most valid choice.
To conduct this type of sampling, you can use tools like random number generators or other techniques
that are based entirely on chance.
1. Simple random sampling
In a simple random sample, every member of the population has an equal chance of being selected.
Example: Simple random sampling. You want to select a simple random sample of 100 employees of a social
media marketing company that has 1000 employees. You assign a number to every employee in the company
database from 1 to 1000, and use a random number generator to select 100 numbers.
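A minimal sketch of this example (illustrative only): assign the numbers 1 to 1000 to the employees and let a random number generator pick 100 of them.

import random

population = list(range(1, 1001))        # employee numbers 1-1000
sample = random.sample(population, 100)  # 100 distinct numbers chosen purely at random
print(sorted(sample)[:10])               # first few selected employee numbers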
2. Systematic sampling
Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct.
Every member of the population is listed with a number, but instead of randomly generating numbers,
individuals are chosen at regular intervals.
Example: Systematic sampling. All employees of the company are listed in alphabetical order. From the
first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards, every 10th
person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.
If you use this technique, it is important to make sure that there is no hidden pattern in the list that might
skew the sample. For example, if the HR database groups employees by team, and team members are
listed in order of seniority, there is a risk that your interval might skip over people in junior roles, resulting
in a sample that is skewed towards senior employees.
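A short sketch of the interval logic described above (illustrative, not from the source): pick a random starting point within the first 10 positions and then take every 10th person on the list.

import random

employees = list(range(1, 1001))     # 1000 employees listed in alphabetical order
interval = 10                        # 1000 employees / sample of 100 = every 10th person
start = random.randint(1, interval)  # random starting point, e.g. 6
sample = employees[start - 1::interval]
print(len(sample), sample[:5])       # 100 people, e.g. [6, 16, 26, 36, 46]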
3. Stratified sampling
Stratified sampling involves dividing the population into subpopulations that may differ in important ways.
It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in
the sample.
To use this sampling method, you divide the population into subgroups (called strata) based on the relevant
characteristic (e.g., gender identity, age range, income bracket, job role).
Based on the overall proportions of the population, you calculate how many people should be sampled
from each subgroup. Then you use random or systematic sampling to select a sample from each subgroup.
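A hedged sketch of proportional stratified sampling (the strata and group sizes below are hypothetical): the number drawn from each stratum is proportional to its share of the population, and members are then selected at random within each stratum.

import random

# Hypothetical strata: job role -> list of employee IDs
strata = {
    "junior": [f"J{i}" for i in range(1, 601)],   # 600 employees
    "mid":    [f"M{i}" for i in range(1, 301)],   # 300 employees
    "senior": [f"S{i}" for i in range(1, 101)],   # 100 employees
}

population_size = sum(len(members) for members in strata.values())
sample_size = 100
sample = []
for role, members in strata.items():
    n = round(sample_size * len(members) / population_size)  # proportional allocation
    sample.extend(random.sample(members, n))

print(len(sample))  # 100 in total: 60 junior, 30 mid, 10 senior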
4. Cluster sampling
Cluster sampling also involves dividing the population into subgroups, but each subgroup should have
similar characteristics to the whole sample. Instead of sampling individuals from each subgroup, you
randomly select entire subgroups.
If it is practically possible, you might include every individual from each sampled cluster. If the clusters
themselves are large, you can also sample individuals from within each cluster using one of the techniques
above. This is called multistage sampling.
This method is good for dealing with large and dispersed populations, but there is more risk of error in the
sample, as there could be substantial differences between clusters. It’s difficult to guarantee that the
sampled clusters are really representative of the whole population.
Example: Cluster sampling. The company has offices in 10 cities across the country (all with roughly the
same number of employees in similar roles). You don’t have the capacity to travel to every office to collect
your data, so you use random sampling to select 3 offices – these are your clusters.
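A minimal sketch of this cluster example (illustrative only): randomly select 3 of the 10 offices and include the employees of the selected offices in the sample.

import random

# Hypothetical clusters: 10 offices, each with 100 employee IDs
offices = {f"Office_{c}": [f"{c}{i:03d}" for i in range(1, 101)] for c in "ABCDEFGHIJ"}

selected_offices = random.sample(list(offices), 3)  # the 3 sampled clusters
sample = [emp for office in selected_offices for emp in offices[office]]

print(selected_offices, len(sample))  # e.g. ['Office_C', 'Office_F', 'Office_A'] 300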
In a non-probability sample, individuals are selected based on non-random criteria, and not every
individual has a chance of being included.
This type of sample is easier and cheaper to access, but it has a higher risk of sampling bias. That means
the inferences you can make about the population are weaker than with probability samples, and your
conclusions may be more limited. If you use a non-probability sample, you should still aim to make it as
representative of the population as possible.
Non-probability sampling techniques are often used in exploratory and qualitative research. In these types
of research, the aim is not to test a hypothesis about a broad population, but to develop an initial
understanding of a small or under-researched population.
1. Convenience sampling
A convenience sample simply includes the individuals who happen to be most accessible to the researcher.
This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is
representative of the population, so it can’t produce generalizable results. Convenience samples are at risk
for both sampling bias and selection bias.
Example: Convenience sampling. You are researching opinions about student support services in your
university, so after each of your classes, you ask your fellow students to complete a survey on the topic.
This is a convenient way to gather data, but as you only surveyed students taking the same classes as you
at the same level, the sample is not representative of all the students at your university.
2. Voluntary response sampling
Similar to a convenience sample, a voluntary response sample relies on people who volunteer themselves (for
example, by responding to a public online survey) rather than on the researcher selecting participants.
Voluntary response samples are always at least somewhat biased, as some people will inherently be more
likely to volunteer than others, leading to self-selection bias.
Example: Voluntary response sampling. You send out the survey to all students at your university and a lot
of students decide to complete it. This can certainly give you some insight into the topic, but the people
who responded are more likely to be those who have strong opinions about the student support services,
so you can’t be sure that their opinions are representative of all students.
3. Purposive sampling
This type of sampling, also known as judgement sampling, involves the researcher using their expertise
to select a sample that is most useful to the purposes of the research.
It is often used in qualitative research, where the researcher wants to gain detailed knowledge about a
specific phenomenon rather than make statistical inferences, or where the population is very small and
specific. An effective purposive sample must have clear criteria and rationale for inclusion. Always make
sure to describe your inclusion and exclusion criteria and beware of observer bias affecting your
arguments.
Example: Purposive sampling You want to know more about the opinions and experiences of disabled
students at your university, so you purposefully select a number of students with different support needs
in order to gather a varied range of data on their experiences with student services.
4. Snowball sampling
If the population is hard to access, snowball sampling can be used to recruit participants via other
participants. The number of people you have access to “snowballs” as you get in contact with more people.
The downside here is also representativeness, as you have no way of knowing how representative your
sample is due to the reliance on participants recruiting others. This can lead to sampling bias.
Example: Snowball sampling You are researching experiences of homelessness in your city. Since there
is no list of all homeless people in the city, probability sampling isn’t possible. You meet one person who
agrees to participate in the research, and she puts you in contact with other homeless people that she knows
in the area.
Data collection tools are the devices or instruments used for gathering data, such as a paper questionnaire
or a computer-assisted interviewing system. In addition, here are some of the data collection techniques
used with these tools:
• Interviews
• Questionnaires
• Case Studies
• Usage Data
• Checklists
• Surveys
• Observations
• Documents and records
• Focus groups
• Oral histories
Different data collection tools use different techniques as their working principles, and not all tools are
capable of working with all of these techniques. These tools are developed especially for gathering specific
types of information by applying individual data collection methods. Consider the following attributes
before utilizing a data collection tool:
Variable Type: Consider the type of information you want to collect, your research niche, and the ultimate
objectives of the research.
Study design: Select the approach you’ll follow to collect this information.
Data collection technique: Decide which techniques and tools you prefer for collecting the data.
Sample data: Decide the place you want to collect data. This actually refers to the population to be
sampled. Also, figure out which part of the population will be included in your investigation.
Sample size: Consider how many subjects you want to include in your study
Sample design: Also, think about the way you’ll select the sample
Now, depending on the problem statement, data collection methods are broadly classified into two categories:
• Primary data collection
• Secondary data collection
Primary data collection is the process of gathering raw data by researchers directly from main sources
through surveys, interviews or experiments. It can be further classified into two categories:
Quantitative Data Collection Methods: Quantitative methods use mathematical calculations to produce data
expressed in numbers. For instance, a questionnaire with closed-ended questions produces figures after
mathematical calculation. Other quantitative techniques include correlation and regression analysis and
summary measures such as the mean, mode and median (a small computational sketch follows the next
paragraph).
Qualitative Data Collection Methods: Qualitative research methods work with non-quantifiable elements such
as feelings, opinions and emotions, and do not require any mathematical calculation or numerical data. For
instance, an example of this method is an open-ended questionnaire.
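As a small illustration of the quantitative measures mentioned above (my own sketch with made-up numbers): the mean, median and mode can be computed with Python's statistics module, and a simple regression line fitted with numpy.

import statistics
import numpy as np

# Hypothetical closed-ended ratings from a questionnaire (1-5 scale)
ratings = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]
print(statistics.mean(ratings), statistics.median(ratings), statistics.mode(ratings))

# Simple regression: fit a straight line relating ad spend to sales (hypothetical data)
ad_spend = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
sales = np.array([110, 140, 165, 200, 220])
slope, intercept = np.polyfit(ad_spend, sales, 1)
print(f"sales = {slope:.1f} * ad_spend + {intercept:.1f} (fitted line)")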
Interviews
The researcher asks questions of a large sample of people, either through direct interviews or by means of
mass communication such as phone or mail. This method is by far the most common means of data gathering.
Projective Data Gathering
Projective data gathering is an indirect interview technique, used when potential respondents know why
they're being asked questions and hesitate to answer. For instance, someone may be reluctant to answer
questions about their phone service if a cell phone carrier representative poses the questions. With
projective data gathering, the interviewees get an incomplete question, and they must fill in the rest using
their own opinions, feelings and attitudes.
Delphi Technique
The Oracle at Delphi, according to Greek mythology, was the high priestess of Apollo’s temple, who gave
advice, prophecies, and counsel. In the realm of data collection, researchers use the Delphi technique by
gathering information from a panel of experts. Each expert answers questions in their field of specialty,
and the replies are consolidated into a single opinion.
Focus Groups
Focus groups, like interviews, are a commonly used technique. The group consists of anywhere from a
half-dozen to a dozen people, led by a moderator, brought together to discuss the issue.
Questionnaires
Questionnaires are a simple, straightforward data collection method. Respondents get a series of questions,
either open or close-ended, related to the matter at hand.
Secondary Data Collection Methods
Secondary data is data that has already been collected by another person or organization for a different
purpose, e.g. reporting or research. You can collect such data from magazines, newspapers, books, blogs,
journals, etc.
Unlike primary data collection, there are no specific collection methods. Instead, since the information
has already been collected, the researcher consults various data sources, such as:
• Financial Statements
• Sales Reports
• Retailer/Distributor/Deal Feedback
• Customer Personal Information (e.g., name, address, age, contact info)
• Business Journals
• Government Records (e.g., census, tax records, Social Security info)
• Trade/Business Magazines
• The internet
In the secondary data collection process, the data has already been collected, and often analysed, by
someone else. Compared to primary data collection, it is much less expensive and easier to collect. It may
be either published or unpublished data. Common sources include:
• Government publications
• Websites
• Public records
• Historical and statistical documents
• Business documents
• Technical and trade journals
• Podcast
• Diaries
• Letters
• Unpublished biographies
However, depending on your area of research, opportunity, niche type, and ultimate project goal you can
pick any of these data collection methods to make some productive decisions.
Interview
An interview is a qualitative research method that relies on asking questions in order to collect data.
Interviews involve two or more people, one of whom is the interviewer asking the questions.
There are several types of interviews, often differentiated by their level of structure.
Interviews are commonly used in market research, social science, and ethnographic research.
Structured interviews have predetermined questions in a set order. They are often closed-ended,
featuring dichotomous (yes/no) or multiple-choice questions. While open-ended structured interviews
exist, they are much less common. The types of questions asked make structured interviews a
predominantly quantitative tool.
Asking set questions in a set order can help you see patterns among responses, and it allows you to easily
compare responses between participants while keeping other factors constant. This can mitigate research
biases and lead to higher reliability and validity. However, structured interviews can be overly formal, as
well as limited in scope and flexibility. Structured interviews may be a good fit for your research if:
• You feel very comfortable with your topic. This will help you formulate your questions most
effectively.
• You have limited time or resources. Structured interviews are a bit more straightforward to analyze
because of their closed-ended nature, and can be a doable undertaking for an individual.
• Your research question depends on holding environmental conditions between participants constant.
Semi-structured interviews are a blend of structured and unstructured interviews. While the interviewer
has a general plan for what they want to ask, the questions do not have to follow a particular phrasing or
order.
Semi-structured interviews are often open-ended, allowing for flexibility, but follow a predetermined
thematic framework, giving a sense of order. For this reason, they are often considered “the best of both
worlds.”
However, if the questions differ substantially between participants, it can be challenging to look for
patterns, lessening the generalizability and validity of your results.
Semi-structured interviews may be a good fit for your research if:
• You have prior interview experience. It’s easier than you think to accidentally ask a leading question
when coming up with questions on the fly. Overall, spontaneous questions are much more difficult
than they may seem.
• Your research question is exploratory in nature. The answers you receive can help guide your future
research.
An unstructured interview is the most flexible type of interview. The questions and the order in which
they are asked are not set. Instead, the interview can proceed more spontaneously, based on the
participant’s previous answers.
Unstructured interviews are by definition open-ended. This flexibility can help you gather detailed
information on your topic, while still allowing you to observe patterns between participants.
However, so much flexibility means that they can be very challenging to conduct properly. You must be
very careful not to ask leading questions, as biased responses can lead to lower reliability or even
invalidate your research.
Unstructured interviews may be a good fit for your research if:
• You have a solid background in your research topic and have conducted interviews before.
• Your research question is exploratory in nature, and you are seeking descriptive data that will deepen
and contextualize your initial hypotheses.
• Your research necessitates forming a deeper connection with your participants, encouraging them to
feel comfortable revealing their true opinions and emotions.
A focus group brings together a group of participants to answer questions on a topic of interest in a
moderated setting. Focus groups are qualitative in nature and often study the group’s dynamic and body
language in addition to their answers. Responses can guide future research on consumer products and
services, human behavior, or controversial topics.
Focus groups can provide more nuanced and unfiltered feedback than individual interviews and are easier
to organize than experiments or large surveys. However, their small size leads to low external validity and
the temptation as a researcher to “cherry-pick” responses that fit your hypotheses.
A focus group may be a good fit for your research if:
• Your research focuses on the dynamics of group discussion or real-time responses to your topic.
• Your questions are complex and rooted in feelings, opinions, and perceptions that cannot be answered
with a “yes” or “no.”
• Your topic is exploratory in nature, and you are seeking information that will help you uncover new
questions or future research ideas.
The advantages and disadvantages of each interview type can be summarised as follows:
Structured interview
Advantages:
• Can be used for quantitative research
• Data can be compared
• High reliability and validity
• Time-effective for the interviewer and the respondent
Disadvantages:
• Researcher can’t ask additional questions for more clarification or nuance
• Limited scope: you might miss out on interesting data
• At risk of response bias
• Due to the restricted answer options, people might have to choose the “best fit”
Semi-structured interview
Advantages:
• Can be used in quantitative research
• Relatively high validity
• You can ask additional questions if needed
Disadvantages:
• Lower validity than the structured interview
• At risk of the Hawthorne effect, observer bias, recall bias, and social desirability bias
• You need to have good conversational skills to get the most out of the interview
• Preparation is time-consuming
Unstructured interview
Advantages:
• You can ask additional questions if needed
• Respondents might feel more at ease
• You can collect rich, qualitative data
• Can be used if little is known about the topic
Disadvantages:
• Low reliability and validity
• You need to have excellent conversational skills to keep the interview going
• At risk of the Hawthorne effect, observer bias, recall bias, and social desirability bias
• Easy to get sidetracked
• Hard to compare data
• Preparation is very time-consuming
Focus group
Advantages:
• Efficient method, since you interview multiple people at once
• Respondents are often more at ease
• Relatively cost-efficient
• Easier to discuss difficult topics
Disadvantages:
• You can ask a limited number of questions due to time constraints
• You need good conversational and leadership skills
• There is a higher risk of observer bias, recall bias, and social desirability bias
• You can’t guarantee confidentiality or other ethical considerations, since there are multiple people present
There’s a reason why companies like Facebook and Google collect data to study their customers: it helps them learn what their customers are looking for. These two companies aren’t alone in their quest for data. Any business that wants to grow wants information, which means there is now a gold rush of sorts, with everyone trying to gather more data than their competitors through social media and other avenues.
Data collection and gathering tools can do far more than simply bring in more customers. They can help you understand and study your customers, which may prove invaluable down the line as you move your business forward.
If a campaign does not meet your expectations and ends up performing poorly in the market, even with all the effort put into it, you can deploy scraping or data collection tools to help you analyse why that happened. This matters because, if a particular area of the product can be improved or modified before it is re-launched or replicated, you avoid wasting time and money by repeating something that does not get the desired result.
Word Association
The researcher gives the respondent a set of words and asks them what comes to mind when they hear
each word.
Sentence Completion
Researchers use sentence completion to understand what kind of ideas the respondent has. This tool
involves giving an incomplete sentence and seeing how the interviewee finishes it.
Role-Playing
Respondents are presented with an imaginary situation and asked how they would act or react if it was
real.
In-Person Surveys
The researcher asks respondents questions face to face.
Online/Web Surveys
These surveys are easy to accomplish, but some users may be unwilling to answer truthfully, if at all.
Mobile Surveys
These surveys take advantage of the increasing proliferation of mobile technology. Mobile collection
surveys rely on mobile devices like tablets or smartphones to conduct surveys via SMS or mobile apps.
Phone Surveys
No researcher can call thousands of people at once, so they need a third party to handle the chore.
However, many people have call screening and won’t answer.
Observation
Sometimes, the simplest method is the best. Researchers who make direct observations collect data
quickly and easily, with little intrusion or third-party bias. Naturally, it’s only effective in small-scale
situations.
1. Decide What Information You Want to Collect
The first thing that we need to do is decide what information we want to gather. We must choose the
subjects the data will cover, the sources we will use to gather it, and the quantity of information that we
would require. For instance, we may choose to gather information on the categories of products that an
average e-commerce website visitor between the ages of 30 and 45 most frequently searches for.
2. Set a Timeframe for Data Collection
The process of creating a strategy for data collection can now begin. We should set a deadline for our data
collection at the outset of our planning phase. Some forms of data we might want to continuously collect.
We might want to build up a technique for tracking transactional data and website visitor statistics over
the long term, for instance. However, we will track the data throughout a certain time frame if we are
tracking it for a particular campaign. In these situations, we will have a schedule for when we will begin
and finish gathering data.
3. Choose a Data Collection Method
We will select the data collection technique that will serve as the foundation of our data gathering plan at
this stage. We must take into account the type of information that we wish to gather, the time period during
which we will receive it, and the other factors we decide on to choose the best gathering strategy.
4. Gather Information
Once our plan is complete, we can put our data collection plan into action and begin gathering data. In our
DMP, we can store and arrange our data. We need to be careful to follow our plan and keep an eye on
how it's doing. Especially if we are collecting data regularly, setting up a timetable for when we will be
checking in on how our data gathering is going may be helpful. As circumstances alter and we learn new
details, we might need to amend our plan.
5. Analyze the Data and Implement Your Findings
It's time to examine our data and arrange our findings after we have gathered all of our information. The
analysis stage is essential because it transforms unprocessed data into insightful knowledge that can be
applied to better our marketing plans, goods, and business judgments. The analytics tools included in our
DMP can be used to assist with this phase. We can put the discoveries to use to enhance our business once
we have discovered the patterns and insights in our data.
There are some prevalent challenges faced while collecting data; let us explore a few of them so we can understand and avoid them.
Poor data quality is the main threat to the broad and successful application of machine learning. If you want technologies like machine learning to work for you, data quality must be your top priority. The following are some of the most prevalent data quality problems and how to fix them.
Inconsistent Data
When working with various data sources, it's conceivable that the same information will have
discrepancies between sources. The differences could be in formats, units, or occasionally spellings. The
introduction of inconsistent data might also occur during firm mergers or relocations. Inconsistencies in
data have a tendency to accumulate and reduce the value of data if they are not continually resolved.
Organizations that have heavily focused on data consistency do so because they only want reliable data to
support their analytics.
Data Downtime
Data is the driving force behind the decisions and operations of data-driven businesses. However, there
may be brief periods when their data is unreliable or not prepared. Customer complaints and subpar
analytical outcomes are only two ways that this data unavailability can have a significant impact on
businesses. A data engineer spends about 80% of their time updating, maintaining, and guaranteeing the
integrity of the data pipeline. In order to ask the next business question, there is a high marginal cost due
to the lengthy operational lead time from data capture to insight.
Schema modifications and migration problems are just two examples of the causes of data downtime. Data
pipelines can be difficult due to their size and complexity. Data downtime must be continuously
monitored, and it must be reduced through automation.
Ambiguous Data
Even with thorough oversight, some errors can still occur in massive databases or data lakes. For data
streaming at a fast speed, the issue becomes more overwhelming. Spelling mistakes can go unnoticed,
formatting difficulties can occur, and column heads might be deceptive. This unclear data might cause a
number of problems for reporting and analytics.
Duplicate Data
Streaming data, local databases, and cloud data lakes are just a few of the sources of data that modern
enterprises must contend with. They might also have application and system silos. These sources are likely
to duplicate and overlap each other quite a bit. For instance, duplicate contact information has a substantial
impact on customer experience. If certain prospects are ignored while others are engaged repeatedly,
marketing campaigns suffer. The likelihood of biased analytical outcomes increases when duplicate data
are present. It can also result in ML models with biased training data.
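As an illustration of how duplicate records can be detected and removed in practice, here is a minimal sketch using pandas; the contact records and column names are hypothetical.

```python
# Minimal sketch: removing duplicate contact records with pandas (hypothetical data).
import pandas as pd

contacts = pd.DataFrame({
    "name":  ["Asha Rao", "Asha Rao", "Vikram Shah", "Priya Nair"],
    "email": ["[email protected]", "ASHA@EXAMPLE.COM", "[email protected]", "[email protected]"],
    "city":  ["Chennai", "Chennai", "Mumbai", "Kochi"],
})

# Normalise the matching key first, otherwise trivially different spellings
# (e.g. upper/lower case emails) slip past duplicate detection.
contacts["email"] = contacts["email"].str.strip().str.lower()

duplicates = contacts[contacts.duplicated(subset=["email"], keep="first")]
print("Duplicate records:\n", duplicates)

deduplicated = contacts.drop_duplicates(subset=["email"], keep="first")
print("Clean records:\n", deduplicated)
```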
Too Much Data
While we emphasize data-driven analytics and its advantages, having too much data is itself a data quality problem. There is a risk of getting lost in an abundance of data when searching for information pertinent to your analytical efforts. Data scientists, data analysts, and business users devote about 80% of their time to finding and organizing the appropriate data. As data volume increases, other data quality problems become more serious, particularly when dealing with streaming data and large files or databases.
Inaccurate Data
For highly regulated industries like healthcare, data accuracy is crucial. Given recent experience, it is more important than ever to improve data quality for COVID-19 and future pandemics. Inaccurate information does not give you a true picture of the situation and cannot be used to plan the best course of action. Personalized customer experiences and marketing strategies underperform if your customer data is inaccurate.
Data inaccuracies can be attributed to a number of causes, including data degradation, human error, and data drift. Worldwide data decay occurs at a rate of about 3% per month, which is quite concerning. Data integrity can be compromised as data is transferred between systems, and data quality can deteriorate over time.
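Taking the roughly 3% monthly decay figure quoted above at face value, a quick sketch shows how it compounds over a year:

```python
# Quick arithmetic sketch: compounding a 3% monthly data-decay rate over a year.
monthly_decay = 0.03
months = 12

remaining = (1 - monthly_decay) ** months
print(f"Share of records still accurate after a year: {remaining:.1%}")  # roughly 69%
print(f"Share that has decayed: {1 - remaining:.1%}")                    # roughly 31%
```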
Hidden Data
The majority of businesses only utilize a portion of their data, with the remainder sometimes being lost in
data silos or discarded in data graveyards. For instance, the customer service team might not receive client
data from sales, missing an opportunity to build more precise and comprehensive customer profiles.
Hidden data also causes organizations to miss out on opportunities to develop novel products, enhance services, and streamline processes.
2.10 Questionnaire
A questionnaire is a list of questions or items used to gather data from respondents about their attitudes,
experiences, or opinions. Questionnaires can be used to
collect quantitative and/or qualitative information.
Questionnaires are commonly used in market research as well as in the social and health sciences. For
example, a company may ask for feedback about a recent customer service experience, or psychology
researchers may investigate health risk perceptions using questionnaires.
Designing a questionnaire means creating valid and reliable questions that address your research
objectives, placing them in a useful order, and selecting an appropriate method for administration.
But designing a questionnaire is only one component of survey research. Survey research also involves
defining the population you’re interested in, choosing an appropriate sampling method, administering
questionnaires, data cleansing and analysis, and interpretation.
Sampling is important in survey research because you’ll often aim to generalize your results to the
population. Gather data from a sample that represents the range of views in the population for externally
valid results. There will always be some differences between the population and the sample, but
minimizing these will help you avoid several types of research bias, including sampling
bias, ascertainment bias, and undercoverage bias.
Self-administered questionnaires
Self-administered questionnaires can be delivered online or in paper-and-pen formats, in person or through
mail. All questions are standardized so that all respondents receive the same questions with identical
wording.
• Cost-effective
• Easy to administer for small and large groups
• Anonymous and suitable for sensitive topics
• Self-paced
Researcher-administered questionnaires
Researcher-administered questionnaires are interviews that take place by phone, in-person, or online
between researchers and respondents.
• Help you ensure the respondents are representative of your target audience
• Allow clarifications of ambiguous or unclear questions and answers
• Have high response rates because it’s harder to refuse an interview when personal attention is given
to respondents
Using closed-ended questions limits your responses, while open-ended questions enable a broad range of
answers. You’ll need to balance these considerations with your available time and resources.
Closed-ended questions
Closed-ended, or restricted-choice, questions offer respondents a fixed set of choices to select from.
Closed-ended questions are best for collecting data on categorical or quantitative variables.
Categorical variables can be nominal or ordinal. Quantitative variables can be interval or ratio.
Understanding the type of variable and level of measurement means you can perform
appropriate statistical analyses for generalizable results.
It’s best to include categories that cover all possible answers and are mutually exclusive. There should be
no overlap between response items.
In binary or dichotomous questions, you’ll give respondents only two options to choose from (for example, yes/no). With categorical multiple-choice questions, respondents pick one answer from a longer list of options, for example:
What is your race?
• White
• Black or African American
• American Indian or Alaska Native
• Asian
• Native Hawaiian or Other Pacific Islander
What is your age group?
• 15 or younger
• 16–35
• 36–60
• 61–75
• 76 or older
Likert scale questions collect ordinal data using rating scales with 5 or 7 points.
Example (Likert-type question): How satisfied or dissatisfied are you with your online shopping experience today?
Very dissatisfied
Somewhat dissatisfied
Neither satisfied nor dissatisfied
Somewhat satisfied
Very satisfied
When you have four or more Likert-type questions, you can treat the composite data as quantitative data
on an interval scale. Intelligence tests, psychological scales, and personality inventories use multiple
Likert-type questions to collect interval data.
With interval or ratio scales, you can apply strong statistical hypothesis tests to address your research
aims.
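As a small illustration of treating several Likert-type items as a composite interval-scale score, here is a minimal pandas sketch; the item names and responses are hypothetical.

```python
# Minimal sketch: building a composite score from four hypothetical Likert items
# (1 = strongly disagree ... 5 = strongly agree) using pandas.
import pandas as pd

responses = pd.DataFrame({
    "item1": [4, 2, 5, 3],
    "item2": [5, 1, 4, 3],
    "item3": [4, 2, 5, 2],
    "item4": [3, 2, 4, 3],
})

# The composite (mean of the items) is commonly treated as interval-scale data.
responses["composite"] = responses[["item1", "item2", "item3", "item4"]].mean(axis=1)
print(responses)

# A reliability check (e.g. Cronbach's alpha) would normally be run on the items
# before the composite is used in hypothesis tests.
```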
To address the limitations of closed-ended questions, such as forcing respondents to choose an option that does not quite fit, you can make questions partially closed-ended and include an open-ended option where respondents can fill in their own answer.
Open-ended questions
Open-ended, or long-form, questions allow respondents to give answers in their own words. Because there
are no restrictions on their choices, respondents can answer in ways that researchers may not have
otherwise considered. For example, respondents may want to answer “multiracial” for the question on
race rather than selecting from a restricted list.
They require more time and effort from respondents, which may deter them from completing the
questionnaire.
For researchers, understanding and summarizing responses to these questions can take a lot of time and
resources. You’ll need to develop a systematic coding scheme to categorize answers, and you may also
need to involve other researchers in data analysis for high reliability.
Question wording
Question wording can influence your respondents’ answers, especially if the language is unclear,
ambiguous, or biased. Good questions need to be understood by all respondents in the same way (reliable)
and measure exactly what you’re interested in (valid).
For readability and clarity, avoid jargon or overly complex language. Don’t use double negatives because
they can be harder to understand.
Use a mix of both positive and negative frames to avoid research bias, and ensure that your question
wording is balanced wherever possible.
Unbalanced questions focus on only one side of an argument. Respondents may be less likely to oppose
the question if it is framed in a particular direction. It’s best practice to provide a counter argument within
the question as well.
Unbalanced: “Do you favor…?” → Balanced: “Do you favor or oppose…?”
Unbalanced: “Do you agree that…?” → Balanced: “Do you agree or disagree that…?”
It’s best to keep your questions short and specific to your topic of interest. The examples below pack extra background information into the question, which can prime respondents and bias their answers:
Example: “The average daily work commute in the US takes 54.2 minutes and costs $29 per day. Since 2020, working from home has saved many employees time and money. Do you favor flexible work-from-home policies even after it’s safe to return to offices?”
Example: “Experts agree that a well-balanced diet provides sufficient vitamins and minerals, and multivitamins and supplements are not necessary or effective. Do you agree or disagree that multivitamins are helpful for balanced nutrition?”
Rating scales have been popularly used by brands to collect customer feedback on product or service
reviews. These questions are easy to recognize and understand and sometimes respondents don’t even
need to read the question. We see smiley or star ratings and know what to do. This type of scale is one of
the most commonly used questionnaire types for online and offline surveys. It consists of close-ended
questions along with a set of categories as options for respondents. It helps gain information on the
qualitative and quantitative attributes.
The most common example is the Likert scale, star rating, and slider. For instance, when you visit an
online shopping site, it asks you to rate your shopping experience.
It is a popular choice for conducting market research, as it can gather comparative information about a product or about specific aspects of the product. Look for market research software that offers various types of rating scale questions.
The scale is commonly used to gain feedback or to evaluate. It can be used to gain insight into the
performance of a product, employee satisfaction or skill, customer service performance, etc.
It is divided into two categories: ordinal scale and interval scale. Some data are measured at the ordinal
level, and some at the interval level.
Ordinal Scale: An ordinal scale gathers data by putting them in a rank without a degree of difference.
Interval Scale: An interval scale measures data with equal distance between two adjacent attributes.
Robust online survey tools should allow you to create interactive surveys with rating questions to keep
the respondents engaged.
Now that we have learned what it is and the two categories of the collected data, let’s look into the different
types.
These six scales gather data based on the categories mentioned above.
1. Numeric scale.
2. Verbal scale.
3. Slider scale.
4. Likert scale.
5. Graphic scale.
6. Descriptive scale.
Researchers should ensure that the survey software they use enables them to create surveys with
various question types. We have explained these six types in detail to help you determine the right time to
use the right question.
Numeric rating scale or NRS
A numeric scale uses numbers to identify the items in a scale. However, not all numbers need to have an
attribute attached to them.
For example, you can ask your target audience to rate your product from 1 to 5 on a scale. You can put 1
as totally dissatisfied and 5 as totally satisfied.
Verbal rating scale or VRS:
Verbal rating scales are commonly used for pain assessment. Also known as verbal pain scores or verbal descriptor scales, they compile a number of statements describing pain intensity and duration.
For instance, when you go to a dentist, you are asked to rate the intensity of your tooth pain. At that
time, you receive a scale with items like “none,” “mild,” “moderate,” “severe,” and “very severe.”
Visual analog scale(VAS) or Slider scale:
The idea behind VAS is to let the audience select any value from the scale between two endpoints. In the
scale, only the endpoints have attributes allotted to numbers, and the rest of the scale is empty.
Often just called a slider scale, it lets the audience rate anywhere along the continuum without being restricted to particular characteristics or ranks.
For example, a scale rating ranges from extremely easy to extremely difficult, with no other value
allotted.
Likert scale:
A Likert scale is a useful tool for effective market research to receive feedback on a wide range of
psychometric attributes. The agree-disagree scale is particularly useful when your intention is to gather
information on frequency, experience, quality, likelihood, etc.
For example, to evaluate employee satisfaction with company policies, a Likert scale is a good tool to
use.
Graphic rating scale:
Instead of numbers, imagine using pictures such as stars or smiley faces to ask your customers and audience to rate. The stars and smiley faces convey the same value as a number.
Descriptive scale:
In certain surveys or research, a numeric scale may not help much. A descriptive scale explains each
option for the respondent.
It contains a thorough explanation for the purpose of gathering information with deep insights.
You can use these six types in your surveys to make it an engaging and fun experience for the survey
takers. Robust online survey tools offer a diverse range of question types, such as rating scales, ranking
scales, MCQs, etc.
You can include a text box question after your rating scale question so that respondents can get the
opportunity to expand on their previous answers. This will help you better understand why someone gave
you the rating they did.
Share on the Best Channels
Once you are happy with the look and feel of your rating scale questionnaire, share it with your target audience. Survey tools such as ProProfs Survey Maker let you share the survey as a link via email or social media, or embed it directly on your website, and let you monitor responses in real time.
• List the characteristics that define who or what you will be collecting data about (eg age groups,
roles, activities involved in, education level).
• List the characteristics that define where this data will be collected (eg specific departments or
divisions, geographical locations, specific offices or places of work).
• List the characteristics that define when this data will be collected (eg during November, all the
time, for the next 3 years, until an improvement is achieved).
• Use these lists to define the scope of your data collection: your ‘target population’.
• Check and refine your scope definition by testing it with examples of people, things, places or
times that are out of scope.
• Define how reliable you want the data to be (eg how small a change in your measures do you want to
be able to reliably detect?). This may already be recorded in your Performance Measure Definitions.
• Nominate any demographic or classification (or drilling) variables that you want to use in analysis of
your data (eg do you want to have averages or percentages by geographic location or age group or
department or gender?). This may already be recorded in your Performance Measure Definitions.
• Discuss what kind of results you are expecting, in terms of the range of data values you think you are
likely to get (eg are customers likely to rate their satisfaction mostly at 3 or 4 on your 5 point
satisfaction scale, or are they likely to be more spread out on the scale?).
• Explore logistical constraints of collecting data from your target population e.g. accessibility, cost and
data integrity.
• Use the above four decisions (and a survey statistician or other assistance) to decide whether or not a
sample will be more cost effective than a census. And if you have chosen to go with a sample, get
professional help so you don’t inadvertently make it completely useless:
• Identify a survey statistician or other assistance in survey sampling. It’s a science, not an art.
• Decide whether or not it will be stratified (ie your total sample is really a collection of smaller samples
based on your demographic or classification variable, which may be geographic location, age group,
department or gender). Stratifying a sample can sometimes be a way to reduce the overall sample size
or improve the overall reliability of the results.
• Select a sample size (or sample sizes, if stratifying) that will deliver the reliability you require.
• Select your sample using a random method – not a convenient method like quotas or volunteers – or else you run the risk of bias, where the data you get is not representative of your target population (a short sampling sketch follows this list).
• Decide the basic method of data collection you want (or can afford), such as self-completion, telephone
interview, face to face interview, focus group, or automated (if possible).
• Formulate questions or constructs around the set of data items you listed at Step 1. Give consideration
to the type of construct that will give you the data you need, such as open-ended questions, yes/no
questions, multiple choice, rating scales, option lists, etc.
• Sequence the questions or constructs in a logical order.
• Check the language and wording of your questions or constructs to remove ambiguity and “fluff”.
Give consideration to providing concise instructions for how to respond to each construct.
• Design a layout for arranging your questions in a readable and usable way. Give consideration to the
medium you will use (such as web page, computer data entry screen, paper, etc.), how you align things
on the “page”, how you use white space to stop it looking like a huge blob of text, how you use contrast
to make questions stand apart from instructions and the response area (eg the option list, the rating
scale, etc.).
• Test your questionnaire or form on a handful of people, ideally those who will collect the data or provide the data. The obvious problems won’t be obvious to you. Absorb their feedback for ideas on making the questionnaire more relevant, understandable and usable.
• Identify the trigger that will let people know that data has to be collected. It might be a customer phone
call, a specific event occurring or finishing, an activity starting.
• Identify how the data will be captured, such as which database will it be entered into.
• List the steps that you think will be involved in the data collection procedure, from the trigger to the
capture of the data. Note down who will take a role in which step.
• Draw a flowchart (or cross-functional process map) that shows the flow of the steps through time,
against who performs them. Give consideration to the expected time frames within which each step
should be performed.
• For each step, identify the resources required to perform it successfully.
• Choose a part of your data scope, based on location and time, in which you will conduct your pilot
test.
• List the outcomes that define success for this data collection process. Explore what success might look
like from each stakeholder’s point of view (eg people collecting the data, providing the data, capturing
the data, using the data, etc.). These might include impact on people’s time, data integrity, data
usability, costs and timeliness.
• Develop a Pilot Test plan for testing the data collection process, and a way to observe “evidence of
success”.
• Implement the Pilot Test plan.
• Reflect on the “evidence of success” and summarise what you learned. List changes or improvements
that you need to make to the data collection process and/or resources.
• Make the improvements to your data collection design.
• Deploy the data collection process. Continue to monitor it over time to ensure the “success outcomes”
are tracking well.
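As a companion to the sampling points above, here is a minimal pandas sketch of drawing a simple random sample and a stratified random sample; the sampling frame, strata, and sample fractions are hypothetical.

```python
# Minimal sketch: simple random and stratified random sampling with pandas.
# The sampling frame, strata, and sample sizes below are hypothetical.
import pandas as pd

frame = pd.DataFrame({
    "employee_id": range(1, 1001),
    "department": (["Sales"] * 500) + (["Operations"] * 300) + (["Finance"] * 200),
})

# Simple random sample of 100 units (random_state fixed for reproducibility).
simple_sample = frame.sample(n=100, random_state=42)

# Stratified sample: 10% from each department, so each stratum is represented
# in proportion to its size.
stratified_sample = frame.groupby("department").sample(frac=0.10, random_state=42)

print(simple_sample["department"].value_counts())
print(stratified_sample["department"].value_counts())
```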
2.14 Pilot study
Pilot studies can play a very important role prior to conducting a full-scale research project.
A pilot study, also called a 'feasibility' study, is a small scale preliminary study conducted before any
large-scale quantitative research in order to evaluate the potential for a future, full-scale project.
Pilot studies are a fundamental stage of the research process. They can help identify design issues and
evaluate feasibility, practicality, resources, time, and cost of a study before the main research is conducted.
This enables researchers to predict an appropriate sample size, budget accordingly, and improve upon the
study design prior to performing a full-scale project.
Pilot studies also provide researchers with preliminary data so they can gain insight into the potential
results of their proposed experiment.
However, pilot studies should not be used to test hypotheses since the appropriate power and sample size
are not calculated. Rather, pilot studies should be used to assess the feasibility of participant recruitment
or study design.
By conducting a pilot study, researchers will be better prepared to face the challenges that might arise in
the larger study, and they will be more confident with the instruments they will use for data collection.
In some studies, multiple pilot studies may be needed and qualitative and/or quantitative methods may be
used.
In order to avoid bias, pilot studies are usually carried out on individuals who are as similar as possible to
the target population, but not on those who will be a part of the final sample.
• Sample size and selection. Your data needs to be representative of the target study population. You should use statistical methods to estimate the feasibility of your sample size (see the power-analysis sketch after this list).
• Determine the criteria for a successful pilot study based on the objectives of your study. How will
your pilot study address these criteria?
• When recruiting subjects or collecting samples ensure that the process is practical and manageable.
• Always test the measurement instrument. This could be a questionnaire, equipment, or methods
used. Is it realistic and workable? How can it be improved?
• Data entry and analysis. Run the trial data through your proposed statistical analysis to see whether
your proposed analysis is appropriate for your data set.
• Create a flow chart of the process.
• Provide preliminary data that you can use to improve your chances of funding and convince stakeholders that you have the necessary skills and expertise to successfully carry out the research.
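As a sketch of the kind of sample-size estimate a pilot study can inform, the following uses the power-analysis helpers in statsmodels for a two-sample t-test; the effect size, significance level, and power are assumed values, not figures from the text.

```python
# Minimal sketch: estimating the sample size a full study would need, using a
# power analysis for a two-sample t-test. Effect size, alpha, and power are
# assumed values that a pilot study might help you refine.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,          # medium standardized effect (Cohen's d), e.g. suggested by pilot data
    alpha=0.05,               # significance level
    power=0.80,               # desired statistical power
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64 per group
```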
Data processing occurs when data is collected and translated into usable information. Usually performed
by a data scientist or team of data scientists, it is important for data processing to be done correctly as not
to negatively affect the end product, or data output.
Data processing starts with data in its raw form and converts it into a more readable format (graphs,
documents, etc.), giving it the form and context necessary to be interpreted by computers and utilized by
employees throughout an organization. Data processing is essentially the manipulation of data: the conversion of raw data into meaningful, machine-readable information. “It can refer to the use of automated methods to process commercial data.”
Typically, this uses relatively simple, repetitive activities to process large volumes of similar information.
Raw data is the input that goes into some sort of processing to generate meaningful output.
You can choose from three primary methods of data processing based on your needs:
Manual data processing: Through this method, users process data manually, meaning they carry out
every step without using electronics or automation software. Though this method is the least expensive
and requires minimal resources, it can be time-consuming and has a higher risk of producing errors.
Mechanical data processing: Mechanical processing involves the use of machines and devices to filter
data, such as calculators, printing presses or typewriters. This method is suitable for simple data processing
endeavors and produces fewer errors but is more complex than other techniques.
Electronic data processing: Researchers process data using modern data processing software and
technologies, where they feed an instruction set to the program to analyze the data and create a yield
output. Though this method is the most expensive, it is also the fastest and most reliable for generating
accurate output.
When you use data processing in quantitative research, your company can realize a range of benefits.
Editing
The first step in analysis is to edit the raw data. Editing detects errors and omissions and corrects them wherever possible. The editor’s responsibility is to guarantee that data are accurate; consistent with the intent of the questionnaire; uniformly entered; complete; and arranged to simplify coding and tabulation.
Editing of data may be accomplished in two ways: (i) field editing and (ii) in-house, also called central, editing. Field editing is preliminary editing of data by a field supervisor on the same day as the interview. Its purpose is to identify technical omissions, check legibility, and clarify responses that are logically and conceptually inconsistent. When gaps are present from interviews, a call-back should be made rather than guessing what the respondent would probably have said. The supervisor should also re-interview a few respondents, at least on some pre-selected questions, as a validity check. In central or in-house editing, all the questionnaires undergo thorough editing. It is a rigorous job performed by central office staff.
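A minimal sketch of what central editing checks might look like when records are held in pandas; the fields, valid ranges, and records are hypothetical.

```python
# Minimal sketch: central (in-house) editing checks on hypothetical survey records,
# flagging omissions and out-of-range responses with pandas.
import pandas as pd

records = pd.DataFrame({
    "respondent_id": [101, 102, 103, 104],
    "age": [34, None, 29, 250],        # None = omission, 250 = implausible value
    "satisfaction": [4, 5, 7, 3],      # valid codes are 1-5, so 7 is an error
})

omissions = records[records["age"].isna()]
out_of_range = records[
    (records["age"] > 100) | (~records["satisfaction"].between(1, 5))
]

print("Records with omissions:\n", omissions)
print("Records needing call-backs or correction:\n", out_of_range)
```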
Coding:
Coding is the process of assigning symbols, either alphabetical or numerical (or both), to the answers so that the responses can be grouped into a limited number of classes or categories. The classes should be appropriate to the research problem being studied. They must be exhaustive and mutually exclusive so that each answer can be placed in one and only one cell of a given category. Further, every class must be defined in terms of only one concept. Coding is necessary for the efficient analysis of data. Coding decisions should usually be taken at the questionnaire design stage itself so that the likely responses to questions are pre-coded. This simplifies computer tabulation of the data for further analysis. Any errors in coding should be eliminated altogether or at least reduced to the minimum possible level. Coding an open-ended question is more tedious than coding a closed-ended question. For a closed-ended or structured question, the coding scheme is simple and designed prior to the field work. For example, consider the following question:
What is your gender? 1.Male 2.Female
We may assign a code of '0' to a male respondent and '1' to a female respondent. These codes may be specified prior to the field work, and if codes are written on all questions of a questionnaire, it is said to be wholly precoded.
The same approach can also be used for coding numeric data that either are not to be grouped into categories or have had their relevant categories specified. For example: What is your monthly income? Here the respondent would indicate his or her monthly income, which may be entered in the relevant column.
The same question may also be asked with response classes: What is your monthly income? (a) Less than Rs. 5000 (b) Rs. 5000 – 8999 (c) Rs. 9000 – 12999 (d) Rs. 13000 or above.
We may code the class 'less than Rs. 5000' as '1', 'Rs. 5000 – 8999' as '2', 'Rs. 9000 – 12999' as '3' and 'Rs. 13000 or above' as '4'.
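A minimal sketch of applying the coding scheme described above with pandas; the respondent data are hypothetical.

```python
# Minimal sketch: applying the pre-coded scheme described above with pandas.
# Respondent data are hypothetical.
import pandas as pd

answers = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "monthly_income": [4200, 7500, 11000, 15500],
})

# Pre-coded scheme: Male -> 0, Female -> 1.
answers["gender_code"] = answers["gender"].map({"Male": 0, "Female": 1})

# Income classes coded 1-4: <5000, 5000-8999, 9000-12999, 13000 or above.
bins = [0, 4999, 8999, 12999, float("inf")]
answers["income_code"] = pd.cut(answers["monthly_income"], bins=bins, labels=[1, 2, 3, 4])

print(answers)
```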
Transcription
Qualitative research is more about exploring an idea or a topic instead of finding specific, concrete,
objective answers. Since qualitative research focuses on individuals, groups, and cultures, its data
can’t be measured with tools like thermometers and scales. Instead, qualitative data is measured with
questionnaires, observations, or interviews. All this can make qualitative data more difficult to record
and copy compared with quantitative data.
Qualitative researchers are focused on understanding a person’s opinion or why people behave in
certain ways. This means that researchers may conduct and record focus groups, group discussions,
individual interviews, or observations of a person or group of people. They may capture and preserve
the resulting data with video or audio recordings.
These interviews and other events create important data. However, that data is usually unstructured
and needs to be sorted through and organized before researchers can make sense of it.
This is where qualitative data transcription is incredibly important. Transcription creates a text-based
version of any original audio or video recording. Qualitative data transcription provides a good first
step in arranging your data systematically and analyzing it.
Once data is transcribed in a text format, it can be put into a spreadsheet or similar type of document,
or entered into a qualitative data analysis tool. After data transcription, a qualitative researcher can
read through and annotate the transcriptions, then conceptualize and organize the data to
conduct inductive or deductive analysis. From there, it is a lot easier to make connections between
different observations or findings, and then write them up in the form of a study, report, or article.
Transcripts that are used mainly to select quotes and sound bites may not need the same level of details as
transcripts which will be systematically reviewed, grouped into themes (often through a process of
coding), and analyzed for content.
The sections below provide guidance on:
1) whether to transcribe, 2) budgeting time and resources required for transcription, 3) hiring transcribers,
4) tips and best practices in transcription, and 5) ethics and confidentiality.
Tabulation
Tabulation is the systematic and logical representation of figures in rows and columns to ease comparison
and statistical analysis. It eases comparison by bringing related information closer to each other and helps
further in statistical research and interpretation. In other words, tabulation is a method of arranging or
organizing data in a tabular form. The tabulation process may be simple or complex depending upon the
type of categorization.
Tabulation is defined as the process of placing classified data in tabular form. A table is a systematic arrangement of statistical information in rows and columns. The rows of a table are the horizontal arrangement of data, whereas the columns are the vertical arrangement of data.
Components of Tabulation
Table Number –
This is the first part of a table and is given on top of any table to facilitate easy identification and for
further reference.
Types of Tabulation
Simple Tabulation or One-way Tabulation
When the data in the table are tabulated according to one characteristic, it is termed simple or one-way tabulation.
For example, Data tabulation of all the people of the World is classified according to one single
characteristic like religion.
Complex Tabulation or Manifold Tabulation
When the data in the table are tabulated according to two or more characteristics, it is termed complex tabulation. For example, data tabulation of all the people of the world classified by three or more characteristics like religion, sex, and literacy.
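As an illustration, here is a minimal pandas sketch of a one-way tabulation and a two-way (cross) tabulation; the categories and records are hypothetical.

```python
# Minimal sketch: one-way and two-way tabulation of hypothetical respondent data.
import pandas as pd

data = pd.DataFrame({
    "religion": ["A", "B", "A", "C", "B", "A"],
    "sex":      ["M", "F", "F", "M", "M", "F"],
    "literate": ["Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Simple (one-way) tabulation: counts by a single characteristic.
one_way = data["religion"].value_counts()
print(one_way)

# Complex (two-way) tabulation: rows = religion, columns = sex, with totals.
two_way = pd.crosstab(data["religion"], data["sex"], margins=True)
print(two_way)
```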
There are a few general rules that have to be followed while constructing tables. These are
• The tables should be self-explanatory, simple and attractive, with no need for further explanation. If the volume of information is substantial, it is best to spread it across multiple tables instead of a single one; this reduces the chance of mistakes, whereas cramming everything into one unwieldy table defeats the purpose of forming a table. However, each table formed should also be complete in itself and serve the purpose of analysis.
• The number of rows and columns should be kept minimal to present information in a crisp and
concise manner.
• Before tabulating, data should be approximated, wherever necessary.
• Stubs and captions should be self-explanatory and should not require the help of footnotes to be
comprehended.
• If certain portions of the data collected cannot be tabulated under any stub or caption, they should be put down in a separate table under the heading “miscellaneous”.
• Quantity and quality of data should not be compromised under any scenario while forming a table.
Tables and illustrations are important tools for efficiently communicating information and data contained
in your research paper to the readers. They present complex results in a comprehensible and organized
manner.
However, it is advisable to use tables and illustrations wisely so as to maximize the impact of your
research. They should be organized in an easy-to-understand format to convey the information and
findings collected in your research. The tabular information helps the reader identify the theme of the
study more readily. Although data tables should be complete, they should not be too complex. Instead of
including a large volume of data in a single unwieldy table, it is prudent to use small tables to help readers
identify the important information easily.
Here are some points you should consider before drafting the tables in your research paper:
For the reader, a research paper that is dense and text-heavy can be tiresome. Conversely, tables not
only encapsulate your data lucidly, but also welcome a visual relief for the reader. They add value to the
layout of your paper. Besides, and more importantly, reviewers often glance at your tabulated data and
illustrations first before delving into the text. Therefore, tables can be the initial draw for a reviewer and
deliver a positive impact about your research paper. If you can achieve an optimum balance among your
text, tables, and illustrations, it can go a long way toward being published.
Graphical Representation
Graphical Representation is a way of analysing numerical data. It exhibits the relation between data,
ideas, information and concepts in a diagram. It is easy to understand and it is one of the most important
learning strategies. It always depends on the type of information in a particular domain. There are different
types of graphical representation. Some of them are as follows:
• Line Graphs – Line graph or the linear graph is used to display the continuous data and it is useful
for predicting future events over time.
• Bar Graphs – Bar Graph is used to display the category of data and it compares the data using
solid bars to represent the quantities.
• Histograms – The graph that uses bars to represent the frequency of numerical data that are
organised into intervals. Since all the intervals are equal and continuous, all the bars have the same
width.
• Line Plot – It shows the frequency of data on a given number line. An ‘x’ is placed above the number line each time that value occurs.
• Frequency Table – The table shows the number of pieces of data that fall within a given interval.
• Circle Graph – Also known as the pie chart, it shows the relationship of the parts to the whole. The whole circle represents 100%, and each category occupies a slice equal to its specific percentage, such as 15% or 56%.
• Stem and Leaf Plot – In the stem and leaf plot, the data are organised from the least value to the greatest value. The digits of the least place value form the leaves, and the next place value digits form the stems.
• Box and Whisker Plot – The plot summarises the data by dividing it into four parts. The box and whiskers show the range (spread) and the middle (median) of the data.
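As a small illustration of two of the chart types listed above, here is a minimal matplotlib sketch of a bar graph and a histogram; the data are hypothetical.

```python
# Minimal sketch: a bar graph and a histogram of hypothetical survey data using matplotlib.
import matplotlib.pyplot as plt

# Bar graph data: counts per response category.
categories = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
counts = [4, 9, 15, 22, 10]

# Histogram data: continuous values grouped into intervals of equal width.
ages = [23, 27, 31, 35, 36, 38, 41, 44, 47, 52, 55, 58, 61, 64]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.bar(categories, counts)
ax1.set_title("Responses by category (bar graph)")
ax1.tick_params(axis="x", rotation=45)

ax2.hist(ages, bins=range(20, 71, 10))
ax2.set_title("Respondent age (histogram)")
ax2.set_xlabel("Age (years)")

plt.tight_layout()
plt.show()
```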
The following general rules help in constructing a good graph:
• Suitable Title: Make sure that an appropriate title is given to the graph, which indicates the subject
of the presentation.
• Measurement Unit: Mention the measurement unit in the graph.
• Proper Scale: To represent the data in an accurate manner, choose a proper scale.
• Index: Index the appropriate colours, shades, lines, design in the graphs for better understanding.
• Data Sources: Include the source of information wherever it is necessary at the bottom of the
graph.
• Keep it Simple: Construct a graph in an easy way that everyone can understand.
• Neat: Choose the correct size, fonts, colours etc in such a way that the graph should be a visual
aid for the presentation of information.
Exercise
I. Write down short answers for the following:
Learning Objectives
• To learn about Sampling measurement and techniques
• To learn qualitative and quantitative analysis techniques
• To understand different statistical testing methods
Learning Outcomes
At the end of the unit they will be able to:
• To apply different database application methods
• To apply data analytic techniques
• To choose relevant research tools and methods for analysis.
Mode of Assessment
S.No: 1
Title of Topic: Sampling & Measurement
Teaching (PPT/Seminar/Chalk & Board etc.): Chalk and board/PPT
Textbook/Reference Book: William G. Zikmund, Barry J. Babin, Jon C. Carr, Atanu Adhikari, Mitch Griffin, Business Research Methods: A South Asian Perspective, 8th Edition, Cengage Learning, New Delhi, 2012.
Link (if applicable: Springboard/Coursera/NPTEL): NPTEL – https://onlinecourses.nptel.ac.in/noc23_mg54/unit?unit=26&lesson=31 ; https://onlinecourses.nptel.ac.in/noc23_mg54/unit?unit=44&lesson=46
Tool (Quiz/Puzzle/Assignment/Seminar etc.): Quiz

S.No: 2
Title of Topic: Analysis techniques
Teaching (PPT/Seminar/Chalk & Board etc.): Chalk and board/PPT
Textbook/Reference Book: James R. Evans, "Business Analytics – Methods, Models and Decisions", Pearson Education, 2012.
Link (if applicable: Springboard/Coursera/NPTEL): https://www.youtube.com/watch?v=aEgIzibOKTs
A data management plan (DMP) is a written document that describes the data you expect to acquire or
generate during the course of a research project, how you will manage, describe, analyze, and store those
data, and what mechanisms you will use at the end of your project to share and preserve your data.
You may have already considered some or all of these issues with regard to your research project, but
writing them down helps you formalize the process, identify weaknesses in your plan, and provide you
with a record of what you intend(ed) to do.
Data management is best addressed in the early stages of a research project, but it is never too late to
develop a data management plan.
Research Data can occur in a variety of formats that include, but are not limited to:
• Notebooks
• survey responses
• software and code
• measurements from laboratory or field equipment (such as IR spectra or hygrothermograph charts)
• images (such as photographs, films, scans, or autoradiograms)
• audio recordings
• physical samples
A proper DMP (Data Management Plan) is a formal plan that outlines how a business researcher intends
to manage research data during and after a research project. It includes a wide range of tasks and
procedures, such as:
• Collecting, processing, validating, and storing data
• Integrating different types of data from disparate sources, including structured and unstructured data
• Ensuring high data availability and disaster recovery
• Governing how data is used and accessed by people and apps
• Protecting and securing data and ensuring data privacy
A DMP is a living document. Business research is all about discovery, and the process of doing research sometimes requires you to shift gears and revise your intended path. You may therefore need to alter your DMP as the course of your research changes in the study area. Remember, any time your research plans change, you should review your DMP to ensure that it still meets your needs.
The typical elements of a data management plan, and what each element should describe, are:
• Data description: A description of the information to be gathered; the nature and scale of the data that will be generated or collected.
• Existing data: A survey of existing data relevant to the project and a discussion of whether and how these data will be integrated.
• Format: Formats in which the data will be generated, maintained, and made available, including a justification for the procedural and archival appropriateness of those formats.
• Metadata: A description of the metadata to be provided along with the generated data, and a discussion of the metadata standards used.
• Storage and backup: Storage methods and backup procedures for the data, including the physical and cyber resources and facilities that will be used for the effective preservation and storage of the research data.
• Security: A description of technical and procedural protections for information, including confidential information, and how permissions, restrictions, and embargoes will be enforced.
• Responsibility: Names of the individuals responsible for data management in the research project.
• Intellectual property rights: Entities or persons who will hold the intellectual property rights to the data, and how IP will be protected if necessary. Any copyright constraints (e.g., copyrighted data collection instruments) should be noted.
• Access and sharing: A description of how data will be shared, including access procedures, embargo periods, technical mechanisms for dissemination, and whether access will be open or granted only to specific user groups. A timeframe for data sharing and publishing should also be provided.
• Audience: The potential secondary users of the data.
• Selection and retention periods: A description of how data will be selected for archiving, how long the data will be held, and plans for eventual transition or termination of the data collection in the future.
• Archiving and preservation: The procedures in place or envisioned for long-term archiving and preservation of the data, including succession plans for the data should the expected archiving entity go out of existence.
• Ethics and privacy: A discussion of how informed consent will be handled and how privacy will be protected, including any exceptional arrangements that might be needed to protect participant confidentiality, and other ethical issues that may arise.
• Budget: The costs of preparing data and documentation for archiving and how these costs will be paid. Requests for funding may be included.
• Data organization: How the data will be managed during the project, with information about version control, naming conventions, etc.
• Quality assurance: Procedures for ensuring data quality during the project.
• Legal requirements: A listing of all relevant federal or funder requirements for data management and data sharing.
Data management is the practice of collecting, organizing, protecting, and storing an organization’s
data so it can be analyzed for business decisions. As organizations create and consume data at unprecedented
rates, data management solutions become essential for making sense of the vast quantities of data. Today’s
leading data management software ensures that reliable, up-to-date data is always used to drive decisions.
The software helps with everything from data preparation to cataloging, search, and governance, allowing
people to quickly find the information they need for analysis.
• Data pipelines enable the automated transfer of data from one system to another.
• ETLs (Extract, Transform, Load) are built to take the data from one system, transform it, and load
it into the organization’s data warehouse.
• Data catalogs help manage metadata to create a complete picture of the data, providing a summary
of its changes, locations, and quality while also making the data easy to find.
• Data warehouses are places to consolidate various data sources, contend with the many data types
businesses store, and provide a clear route for data analysis.
• Data governance defines standards, processes, and policies to maintain data security and integrity.
• Data architecture provides a formal approach for creating and managing data flow.
• Data security protects data from unauthorized access and corruption.
• Data modeling documents the flow of data through an application or organization.
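To make the ETL idea above concrete, here is a minimal sketch that extracts hypothetical sales records, transforms them, and loads them into an in-memory SQLite table standing in for a warehouse; the file, table, and column names are assumptions for illustration.

```python
# Minimal ETL sketch: extract raw sales data, transform it, and load it into a
# (here, in-memory SQLite) warehouse table. Names and values are hypothetical.
import sqlite3
import pandas as pd

# Extract: read raw records (in practice this might be pd.read_csv("sales.csv")).
raw = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount":   ["1,200", "950", "1,430"],
    "region":   ["north", "South", "NORTH"],
})

# Transform: clean data types and standardize categories.
raw["amount"] = raw["amount"].str.replace(",", "").astype(float)
raw["region"] = raw["region"].str.title()

# Load: write the cleaned table into the warehouse.
warehouse = sqlite3.connect(":memory:")
raw.to_sql("sales", warehouse, if_exists="replace", index=False)

print(pd.read_sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region", warehouse))
```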
As your organization increasingly relies on data-driven decision-making, more of your people are asked
to access and analyze data. When analytics falls outside a person’s skill set, understanding naming
conventions, complex data structures, and databases can be a challenge. If it takes too much time or
effort to convert the data, analysis won’t happen and the potential value of that data is diminished or
lost.
3. Compliance requirements
Constantly changing compliance requirements make it a challenge to ensure people are using the right
data. An organization needs its people to quickly understand what data they should or should not be
using—including how and what personally identifiable information (PII) is ingested, tracked, and
monitored for compliance and privacy regulations.
Documentation levels:
• Project-level
• File-level
• Software used (include the version of the software so if future users are using a different version,
they can work through the differences and software issues that might occur)
• Context (it is essential to give any context to the project, why it was created, if hypotheses were
trying to be proved or disproved, etc.)
5. Commitment to data culture
A commitment to data culture includes making sure that your department or company’s leadership
prioritizes data experimentation and analytics. This matters when leadership and strategy are needed and
if budget or time is required to make sure that the proper training is conducted and received.
Additionally, having executive sponsorship as well as lateral buy-in will enable stronger data
collaboration across teams in your organization.
6. Data quality: trust in security and privacy
Building a culture committed to data quality means a commitment to making a secure environment with
strong privacy standards. Security matters when you are working to provide secure data for internal
communications and strategy, or working to build a relationship of trust with a client by showing that you are protecting the privacy of their data and information. Your management processes must be in place to
prove that your networks are secure and that your employees understand the critical nature of data
privacy. In today’s digital market, data security has been identified as one of the most significant
decision-making factors when companies and consumers are making their buying decisions. One data
privacy breach is one too many. Plan accordingly.
7. Invest in quality data-management software
When considering these best practices together, it is recommended, if not required, that you invest in
quality data-management software. Putting all the data you are creating into a manageable working
business tool will help you find the information you need. Then you can create the right data sets and
data-extract scheduling that works for your business needs. Data management software will work with
both internal and external data assets and help configure your best governance plan. Tools such as R and Tableau (the latter offers a Data Management Add-On) can help you create a robust analytics environment that leverages these best practices. Using reliable software that helps you build, catalog, and
govern your data will build trust in the quality of your data and can lead to the adoption of self-service
analytics. Use these tools and best practices to bring your data management to the next level and build
your analytics culture on managed, trusted, and secure data.
Measurement
Measurement is the process of observing and recording the observations that are collected as part
of a research effort. There are two major issues that will be considered here.
First, you have to understand the fundamental ideas involved in measuring. Here we consider two major measurement concepts. In Levels of Measurement, I explain the meaning of the four major
levels of measurement: nominal, ordinal, interval and ratio. Then we move on to the reliability of
measurement, including consideration of true score theory and a variety of reliability estimators.
Second, you have to understand the different types of measures that you might use in social
research. We consider four broad categories of measurements. Survey research includes the design and
implementation of interviews and questionnaires. Scaling involves consideration of the major methods
of developing and implementing a scale. Qualitative research provides an overview of the broad range
of non-numerical measurement approaches. And unobtrusive measures present a variety of measurement methods that don’t intrude on or interfere with the context of the research.
o To test hypotheses: unproven theories or suppositions which are the basis for further
investigation
• Advantages of sampling
o The only means of obtaining data about an infinite population (e.g. Air temperatures)
o Cost and time effective means of obtaining data about a large finite population; better data than hastily collected data for the entire population
o Desirable when measurement is destructive or stressful (e.g. Plant sampling, some
measurements on people)
According to Kerlinger :
"Measurement is the assignment of numerals to objects or events according to rules".
The major application of such data is in the area of marketing where measurements are taken
regarding predispositions or attitudes of current and potential customers of a company. By knowing about
the attitudes of the customers, the marketing managers may take important decisions which are effective
and beneficial to the company. Various areas of marketing where measurement techniques are used are product positioning, market segmentation, advertising message effectiveness, etc.
1) Direct Observables :
The things which can directly be observed are called direct observables. For example, by meeting an
individual the brand of his/her wrist watch can be directly observed.
2) Indirect Observables :
The things which cannot be directly observed are called 'indirect observables'. More complex and refined observation effort is required for observing such things. For example, minutes of earlier board
meetings of corporations can be used to observe past business decisions.
3) Constructs :
The things which cannot be observed directly or indirectly are called 'constructs'. These are the
theoretical concepts, which are developed by observing different aspects of an operation. For example,
IQ is known as a construct. It cannot be directly or indirectly observed. It is determined only by
mathematically observing answers of different test questions asked in an IQ test.
b) Duration Method :
In the duration method, the duration of an individual's involvement in a particular behavior or activity within a fixed time limit is calculated.
c) Interval Method :
In this interval method, the whole observation time limit is divided into different intervals and these
intervals are checked for a specific behavior or activity.
3) Using Multiple Observers :
The final step in the measurement process is using multiple observers so as to measure inter-rater reliability. Observations used in the measurement process are as follows :
a) Naturalistic Observation :
The naturalistic observation involves observing in a natural or real environment. Here, the actual behavior of the respondents is observed and recorded, which is free from manipulations.
b) Participant Observation :
In this type of observation, the researcher joins the group of participants as an individual participant and
therefore, observes their behavior.
c) Contrived Observation :
When a simulated environment is created to observe the natural behavior of the respondents, it is called contrived or structured observation. This type of observation eliminates the need of observing respondents in a natural setting.
1) Questionnaires :
The questionnaire is an inventory of questions used to seek information from respondents on different topics like behavior, demographic and psychographic details, opinions, attitudes, beliefs, feelings, etc. The questions are designed for a particular study and are validated before being used.
2) Attitude Scales :
Attitude scales seek responses on the feelings of respondents towards a particular object. Attitude scales can be of different types, such as the following :
i) Rating scales require a respondent to place an object on a numerically labelled scale.
ii) Ranking scales require the respondents to compare a set of objects and rank them from '1' to '10', where '1' stands for the highest position and '10' stands for the lowest position.
3) Depth Interviews :
In depth interviews, the respondents have complete freedom to express their feelings without any fear of
rejection or meeting opposition from others. The responses received from the respondents are recorded in specially designed formats. This technique is used when the researcher wants to gather in-depth information about the feelings and opinions of respondents or when the researcher wants to examine some new issue or aspect of the study.
Many times, depth interviews are also used to provide clarity or perspective on other gathered data. They help to provide a more comprehensive picture of the data that has been gathered. Depth interviews should be used in lieu of focus-group interviews where it is felt that the respondents will not be comfortable talking about the topic in a group atmosphere, or where the researcher wants to differentiate between individual opinions and group opinions on a topic of discussion. Depth interviews are also used where the researcher wants to refine questions for a future study or survey.
4) Observation :
Observation is a direct technique of examining the behavior or the results of the behavior. This requires
the researcher to observe the behavior of an individual or a group of people. This observation must be
done in a natural setting and over an interval of time. The biggest advantage of this method is that it increases the credibility of the research process. It utilizes trained researchers who are unbiased regarding the research topic. By observing the behavior formally, the observers are often able to identify attitudes and predispositions which are often overlooked by researchers. The disadvantage is that observation is a time-consuming process, and the observers often find that their presence influences the behavior of the people being observed and thus affects the reliability of the observation process.
1) Reliability :
Reliability is an important criterion for testing a measurement. When the results offered by the measuring instrument are consistent, it is called reliable. Although a reliable instrument is not necessarily a valid one, reliability is a necessary condition for the validity of the measurement.
2) Validity :
The next criterion used for evaluating a measurement is its validity. The extent to which a measuring instrument measures what it is intended to measure is called its validity. It can also be denoted as utility. It also expresses the extent to which the differences described by a measuring instrument between two behaviors are true.
3) Practicality :
Practicality is also a criterion for testing the measuring instrument. The extent, to
which a particular measuring instrument is suitable, cost-effective and interpret-able, denotes the
practicality of the instrument.
4) Sensitivity :
The next criterion for evaluating the measurement instrument is its sensitivity. A particular measuring
instrument is said to be sensitive if all the variations in responses are effectively measured by it. Measuring
instruments dealing with 'Agree' or 'Disagree' types of responses are not very sensitive. A little modification is required in such instruments so as to record more sensitive responses.
5) Generalisability :
Generalisability is also an important criterion for testing the measuring instrument. The ability of an instrument to collect data from a wide range of respondents, along with flexibility in its interpretation, is called generalisability.
6) Economy :
The choice of data collection method is also often dictated by economic factors. The rising cost of personal
interviewing first led to an increased use of telephone surveys and subsequently to the current rise in
Internet surveys. In standardized tests, the cost of test materials alone can be such a significant expense
that it encourages multiple reuses.
7) Convenience :
A measuring device passes the convenience test if it is easy to administer. A questionnaire or a
measurement scale with a set of detailed but clear instructions, with examples, is easier to complete
correctly than one that lacks these features. In a well-prepared study, it is not uncommon for the
interviewer instructions to be several times longer than the interview questions. Naturally, the more
complex the concepts and constructs, the greater is the need for clear and complete instructions.
1) Irrelevant Data :
Measurement leads to the generation of enormous data. However, it is not necessary that the data is always relevant; the data may lack purpose at times. Sometimes, measurement forces the marketers to manipulate the real data for their own purposes.
2) Inaccurate Response :
Respondents have a tendency of giving inaccurate responses in face-to-face interviews. It is very important that the research activity elicits the correct response from the respondents. Nowadays, web-based surveys have made it possible to reach a large target segment quickly and economically.
4) Training in Measurement is Rare :
Measurement requires that people have the necessary skills and knowledge in a particular field. However, very few organisations invest in knowledge and skill building.
5) Delegating Measurement Strategy :
Deciding the right metrics often requires that the incumbents not only have a big picture perspective but
also the power to challenge the dominant marketing mind-sets of the organisation. This is often not
possible for middle managers but requires involvement of top management. Measurement should not be
delegated, as the quest for truth will then take a backseat in the organisation. It needs leadership and focus
in the organisation so that a congenial environment is created in the organisation.
Issues of preciseness and the practical use of research work are the main concerns for several researchers. They are curious about the contribution of their research work to the concerned field.
By knowing the different levels of data measurement, researchers are able to choose the best method for
statistical analysis. The different levels of data measurement are: nominal, ordinal, interval, and ratio
scales
Nominal Scale
The nominal scale is a scale of measurement that is used for identification purposes. It is the lowest and weakest level of data measurement among the four.
Sometimes known as a categorical scale, it assigns numbers to attributes for easy identification. These numbers are, however, not quantitative in nature and only act as labels.
The only statistical analysis that can be performed on a nominal scale is the percentage or frequency count.
It can be analyzed graphically using a bar chart and pie chart.
Nominal Scale Example
In the example below, the measurement of the popularity of a political party is measured on a nominal
scale.
Which political party are you affiliated with?
• Independent
• Republican
• Democrat
Labeling Independent as “1”, Republican as “2” and Democrat as “3” does not in any way mean any of
the attributes are better than the other. They are just used as an identity for easy data analysis.
Ordinal Scale
Ordinal Scale involves the ranking or ordering of the attributes depending on the variable being scaled.
The items in this scale are classified according to the degree of occurrence of the variable in question.
The attributes on an ordinal scale are usually arranged in ascending or descending order. It measures the
degree of occurrence of the variable.
Ordinal scale can be used in market research, advertising, and customer satisfaction surveys. It uses
qualifiers like very, highly, more, less, etc. to depict a degree.
We can perform statistical analysis like median and mode using the ordinal scale, but not mean. However,
there are other statistical alternatives to mean that can be measured using the ordinal scale.
Ordinal Scale Example
For example: A software company may need to ask its users:
How would you rate our app?
• Excellent
• Very Good
• Good
• Bad
• Poor
The attributes in this example are listed in descending order.
Interval Scale
The interval scale of data measurement is a scale in which the levels are ordered and numerically equal distances on the scale represent equal differences in the attribute measured. It is an extension of the ordinal scale, with the main difference being the existence of equal intervals.
With an interval scale, you not only know that a given attribute A is bigger than another attribute B, but also the extent to which A is larger than B. Also, unlike the ordinal and nominal scales, arithmetic operations
can be performed on an interval scale.
Ratio Scale
Ratio Scale is the peak level of data measurement. It is an extension of the interval scale, therefore
satisfying the four characteristics of the measurement scale; identity, magnitude, equal interval, and the
absolute zero property.
This level of data measurement allows the researcher to compare both the differences and the relative
magnitude of numbers. Some examples of ratio scales include length, weight, time, etc.
With respect to market research, the common ratio scale examples are price, number of customers,
competitors, etc. It is extensively used in marketing, advertising, and business sales.
The ratio scale of data measurement is compatible with all statistical analysis methods like the measures
of central tendency (mean, median, mode, etc.) and measures of dispersion (range, standard deviation,
etc.).
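As a hedged illustration of which summary statistics suit which level of measurement, the sketch below uses pandas on invented survey data: party affiliation is nominal, app rating is ordinal, and price paid is ratio.

```python
# Illustrative data only: which summary statistics fit which measurement level.
import pandas as pd

df = pd.DataFrame({
    "party":  ["Independent", "Republican", "Democrat", "Democrat", "Republican"],  # nominal
    "rating": pd.Categorical(["Good", "Excellent", "Bad", "Good", "Very Good"],
                             categories=["Poor", "Bad", "Good", "Very Good", "Excellent"],
                             ordered=True),                                          # ordinal
    "price":  [12.5, 30.0, 18.0, 18.0, 25.0],                                        # ratio
})

print(df["party"].value_counts(normalize=True))  # nominal: frequencies / percentages only
print(df["rating"].mode().iloc[0])               # ordinal: mode (median of the ordered codes is also valid)
print(df["price"].mean(), df["price"].std())     # ratio: mean, standard deviation, etc.
```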
Comparative Scales
In comparative scaling, respondents are asked to make a comparison between one object and the other.
When used in market research, customers are asked to evaluate one product in direct comparison to the
others. Comparative scales can be further divided into pair comparison, rank order, constant sum, and q-
sort scales.
Paired Comparison Scale
Paired Comparison scale is a scaling technique that presents the respondents with two objects at a time
and asks them to choose one according to a predefined criterion. Product researchers use it in comparative product research by asking customers to choose the one they prefer between two closely related products.
For example, there are 3 new features in the last release of a software product. But the company is
planning to remove 1 of these features in the new release. Therefore, the product researchers are
performing a comparative analysis of the most and least preferred feature.
• Which feature do you prefer in each of the following pairs?
• Filter – Voice recorder
• Filter – Video recorder
• Voice recorder – Video recorder
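One simple way to summarise paired-comparison responses is to count how often each feature "wins" across all pairs. A minimal sketch with invented votes:

```python
# Tally hypothetical paired-comparison responses: each tuple is (winner, loser).
from collections import Counter

responses = [
    ("Filter", "Voice recorder"),
    ("Filter", "Video recorder"),
    ("Video recorder", "Voice recorder"),
    ("Filter", "Voice recorder"),
    ("Video recorder", "Filter"),
]

wins = Counter(winner for winner, _ in responses)
for feature, count in wins.most_common():
    print(f"{feature}: preferred {count} time(s)")
```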
Q-Sort Scale
Q-Sort scale is a type of measurement scale that uses a rank order scaling technique to sort similar objects with respect to some criterion. The respondents sort a number of statements or attitude items into piles, usually 11 of them.
The Q-Sort Scaling helps in assigning ranks to different objects within the same group, and the differences
among the groups (piles) are clearly visible. It is a fast way of facilitating discrimination among a relatively
large set of attributes.
For example, a new restaurant that is just preparing its menu may want to collect some information about
what potential customers like:
The document provided contains a list of 50 meals. Please choose 10 meals you like, 30 meals you are
neutral about (neither like nor dislike) and 10 meals you dislike.
Non-Comparative Scales
In non-comparative scaling, customers are asked to only evaluate a single object. This evaluation is totally
independent of the other objects under investigation. Sometimes called a monadic or metric scale, a non-comparative scale can be further divided into continuous and itemized rating scales.
Likert Scale
A Likert scale is an ordinal scale with five response categories, which is used to order responses from the most favourable to the least favourable. This scale uses adverbs of degree like very strongly, highly, etc. to indicate the different levels.
Stapel Scale:
This is a scale with 10 categories, usually ranging from −5 to +5 with no zero point. It is a vertical scale with 3 columns, where the attributes are placed in the middle and the lowest (−5) and highest (+5) values are in the 1st and 3rd columns respectively.
Semantic Differential Scale
This is a seven-point rating scale with endpoints associated with bipolar labels (e.g., good vs. bad, happy vs. unhappy). It can be used for marketing, advertising and in different stages of product development.
If there is more than one item being investigated, the items can be visualized in a table with more than 3 columns.
Frequency counts
One way data scientists can describe statistics is using frequency counts, or frequency statistics, which
describe the number of times a variable exists in a data set. For example, the number of people with blue
eyes or the number of people with a driver’s license in the sample can be counted by frequency. Other
examples include qualifications of education, such as high school diploma, a university degree or
doctorate, and categories of marital status, such as single, married or divorced.
Frequency data is a form of discrete data, as the values cannot be broken down into parts. To calculate continuous data points, such as age, data scientists can use central tendency statistics instead. To do this,
they find the mean or average of the data point. Using the age example, this can tell them the average age
of participants in the sample.
While data scientists can draw summaries from the use of descriptive statistics and present them in an
understandable form, they can’t necessarily draw conclusions. That’s where inferential statistics come in.
Inferential statistics
Inferential statistics are used to develop a hypothesis from the data set. It would be impossible to get data
from an entire population, so data scientists can use inferential statistics to extrapolate their results. Using
these statistics, they can make generalisations and predictions about a wider sample group, even if they
haven’t surveyed them all.
An example of using inferential statistics is in an election. Even before the entire country has voted, data
scientists can use these kinds of statistics to make assumptions regarding who might win based on a
smaller sample size.
Internet vs. database search limiters:
• Internet: can limit by document type (pdf, doc) and source (gov, org, com).
• Database: can limit by date, document type, language, format, peer-reviewed status, full-text availability, and more.
When it comes to the most popular databases to use in 2022, starting with NoSQL databases, there are a few things to consider. MongoDB is a document database management software that was first released in 2009. It is challenging to load and access data in an RDBMS from object-oriented programming languages, which also require additional application-level mapping. MongoDB was developed to overcome this problem and handle document data.
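As a hedged illustration of storing document data, the sketch below uses the PyMongo driver and assumes a MongoDB server running locally on the default port; the database, collection, and fields are made up:

```python
# Illustrative only: insert and query a JSON-like document in MongoDB.
# Assumes MongoDB is running locally on the default port 27017.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["research_demo"]            # hypothetical database name
responses = db["survey_responses"]      # hypothetical collection name

responses.insert_one({
    "respondent_id": 101,
    "party": "Independent",
    "rating": "Good",
})

print(responses.find_one({"respondent_id": 101}))
```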
IBM DB2
IBM also offered DB2 LUW for Windows, Linux, and Unix. DB2 11.5 is the most recent release, and it
speeds up query execution.
DB2 has long supported the relational model, but it has grown significantly in recent years. It now supports object-relational features and non-relational formats such as JSON and XML.
Redis
It is a popular open-source database project. According to Stack Overflow’s Annual Developer Survey,
Redis is ranked as the Most Loved Database platform. It can be used as a distributed, in-memory key-value database, and it can also serve as a distributed cache and message broker, with durability as an option.
Elasticsearch
Elasticsearch is an open core full-text search engine based on Lucene that was first released in 2010 by
Shay Banon. It’s a full-text search engine with a distributed, multi-tenant capability and a REST API.
It provides horizontal scaling via automatic sharding and a REST API. It also supports structured and schema-less data (JSON), which makes it especially well suited to analyzing logging or monitoring data.
Cassandra
It is an open-source, distributed, wide-column store, first developed in 2008, that is commonly used as an application database. This is highly scalable database management software that is widely used by industry to handle massive data.
One of its main features is its decentralized (leaderless) architecture, with automatic replication and multi-data-center replication, which makes it fault-tolerant with no single point of failure. Cassandra has its own operational model and infrastructure; Cassandra and HBase are often compared, but they have different use cases according to their designs.
MariaDB
It is a Relational Database Management System which is compatible with MySQL Protocol and Clients.
The MySQL server can be easily replaced with MariaDB without requiring any code changes.
This management system provides columnar storage with massively parallel distributed data architecture.
In comparison to MySQL, MariaDB is more community-driven.
OrientDB
OrientDB is an open-source NoSQL multi-model database that enables businesses to leverage the capabilities of graph database management software without having to build several systems to handle different data types.
It is a management solution with support for graph, document, key value and object-oriented database
models that improves performance and security while also allowing for scalability.
SQLite
SQLite is a popular open-source SQL database with an embedded relational database engine. It was created in the year 2000. It requires no configuration and does not even need a server or installation. Despite its simplicity, it contains many commonly used database management system features and is widely used in mobile development, including frameworks like React Native.
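Because SQLite is embedded and needs no server, a complete working example fits in a few lines with Python's built-in sqlite3 module (the table and rows are illustrative):

```python
# SQLite needs no server: the database is just a local file (or in-memory).
import sqlite3

con = sqlite3.connect(":memory:")          # use a filename instead to persist the data
con.execute("CREATE TABLE respondents (id INTEGER PRIMARY KEY, age INTEGER)")
con.executemany("INSERT INTO respondents (id, age) VALUES (?, ?)",
                [(1, 23), (2, 31), (3, 27)])
for row in con.execute("SELECT id, age FROM respondents WHERE age > 25"):
    print(row)
con.close()
```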
DynamoDB
DynamoDB is a nonrelational database from Amazon. It is a serverless database, well suited to mobile apps, that scales up and down automatically while also backing up your data.
This database program features built-in security and in-memory caching, as well as consistent latency.
Neo4j
Neo4j is an open-source, Java-based NoSQL graph database that was launched in 2007. It uses a query
language known as Cypher, labeled on its site as the most efficient and expressive way to describe
relationship queries.
In this database management system, your data is saved as graphs rather than tables. Neo4j's relationships are fast to traverse, and you can create additional relationships later to "shortcut" and speed up queries over domain data as the need arises.
Firebirdsql
Firebird is a free SQL relational database management system that operates on macOS, Linux, Microsoft Windows, and a variety of Unix platforms.
This free database for web applications continues to upgrade its multi-platform RDBMS. From Firebird memberships to sponsorship commitments, it offers a variety of financing choices.
Quantitative analysis
Quantitative analysis is often associated with numerical analysis where data is collected, classified, and
then computed for certain findings using a set of statistical methods. Data is chosen randomly in large
samples and then analyzed. The advantage of quantitative analysis is that the findings can be applied to a general population using research patterns developed in the sample. This is a shortcoming of qualitative data analysis because of the limited generalization of its findings.
Quantitative analysis is more objective in nature. It seeks to understand the occurrence of events and then
describe those using statistical methods. However, more clarity can be obtained by concurrently using
qualitative and quantitative methods. Quantitative analysis normally leaves out random and rare events in research results, whereas qualitative analysis considers them.
Quantitative analysis is generally concerned with measurable quantities such as weight, length,
temperature, speed, width, and many more. The data can be expressed in a tabular form or any
diagrammatic representation using graphs or charts. Quantitative data can be classified as continuous or
discrete, and it is often obtained using surveys, observations, experiments or interviews.
There are, however, limitations in quantitative analysis. For instance, it can be challenging to uncover
relatively new concepts using quantitative analysis and that is where qualitative analysis comes into the
equation to find out “why” a certain phenomenon occurs. That is why the methods are often used
simultaneously.
Qualitative analysis
Qualitative analysis is concerned with the analysis of data that cannot be quantified. This type of data is
about the understanding and insights into the properties and attributes of objects (participants). Qualitative
analysis can get a deeper understanding of “why” a certain phenomenon occurs. The analysis can be used
in conjunction with quantitative analysis or precede it.
Unlike quantitative analysis, which is restricted by certain classification rules or numbers, qualitative data analysis can be wide-ranging and multi-faceted. And it is subjective, descriptive, non-statistical and
exploratory in nature.
Because qualitative analysis seeks to get a deeper understanding, the researcher must be well versed in whichever physical properties or attributes the study is based on. Oftentimes, the researcher may
have a relationship with the participants where their characteristics are disclosed. In a quantitative analysis
the characteristics of objects are often undisclosed. The typical data analyzed qualitatively include color,
gender, nationality, taste, appearance, and many more as long as the data cannot be computed. Such data
is obtained using interviews or observations.
There are limitations in qualitative analysis. For instance, it cannot be used to generalize the population.
Small samples are used in an unstructured approach and they are non-representative of the general
population; hence the method cannot be used to generalize to the entire population. That is where quantitative analysis comes into the picture.
To prepare data for quantitative data analysis simply means to convert it to meaningful and readable
formats, below are the steps to achieve this:
Data Validation: This is to evaluate if the data was collected correctly through the required channels
and to ascertain if it meets the set-out standards stated from the onset. This can be done by checking
if the procedure was followed, making sure that the respondents were chosen based on the research
criteria, and checking for completeness in the data.
Data Editing: Large datasets may include errors where fields may be filled incorrectly or left empty
accidentally. To avoid having a faulty analysis, data checks should be done to identify and clear out
anything that may lead to an inaccurate result.
Data Coding: This involves grouping and assigning values to data. It might mean forming tables and
structures to represent the data accurately.
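The three preparation steps can be sketched with pandas; the column names, validity rule, and coding scheme below are assumptions made purely for illustration:

```python
# Illustrative data preparation: validation, editing, and coding with pandas.
import pandas as pd

raw = pd.DataFrame({
    "respondent": [1, 2, 3, 4],
    "age": [25, -3, 41, None],          # -3 and None are entry errors
    "satisfaction": ["High", "Low", "Medium", "High"],
})

# Data validation: keep only records that meet the study criteria (age 18-99).
valid = raw[(raw["age"] >= 18) & (raw["age"] <= 99)]

# Data editing: drop incomplete records that would distort the analysis.
edited = valid.dropna(subset=["age", "satisfaction"])

# Data coding: assign numeric values to the categorical responses.
codes = {"Low": 1, "Medium": 2, "High": 3}
edited = edited.assign(satisfaction_code=edited["satisfaction"].map(codes))

print(edited)
```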
Now that you are familiar with what quantitative data analysis is and how to prepare your data for analysis,
the focus will shift to the purpose of this article which is the methods and techniques of quantitative data
analysis.
• Skewness: It indicates how symmetrical a range of numbers is, showing if they cluster into a
smooth bell curve shape in the middle of the graph or if they skew towards the left or right.
2) Inferential Statistics
In quantitative analysis, the expectation is to turn raw numbers into meaningful insight using numerical values. Descriptive statistics explain the details of a specific dataset using numbers, but they do not explain the motives behind the numbers; hence the need for further analysis using inferential statistics.
Inferential statistics aim to make predictions or highlight possible outcomes from the analyzed data
obtained from descriptive statistics. They are used to generalize results and make predictions between
groups, show relationships that exist between multiple variables, and are used for hypothesis testing that
predicts changes or differences.
There are various statistical analysis methods used within inferential statistics; a few are discussed below.
• Cross Tabulations: Cross tabulation or crosstab is used to show the relationship that exists
between two variables and is often used to compare results by demographic groups. It uses a basic
tabular form to draw inferences between different data sets and contains data that is mutually
exclusive or has some connection with each other. Crosstabs are helpful in understanding the nuances of a dataset and factors that may influence a data point. (A short sketch of cross tabulation and regression appears after this list.)
• Regression Analysis: Regression analysis is used to estimate the relationship between a set of
variables. It is used to show the correlation between a dependent variable (the variable or outcome
you want to measure or predict) and any number of independent variables (factors that may have
an impact on the dependent variable). Therefore, the purpose of the regression analysis is to
estimate how one or more variables might have an effect on a dependent variable to identify trends
and patterns to make predictions and forecast possible future trends. There are many types of
regression analysis and the model you choose will be determined by the type of data you have for
the dependent variable. The types of regression analysis include linear regression, non-linear
regression, binary logistic regression, etc.
• Monte Carlo Simulation: Monte Carlo simulation also known as the Monte Carlo method is a
computerized technique of generating models of possible outcomes and showing their probability
distributions. It considers a range of possible outcomes and then tries to calculate how likely each
outcome will occur. It is used by data analysts to perform an advanced risk analysis to help in
forecasting future events and taking decisions accordingly.
• Analysis of Variance (ANOVA): This is used to test the extent to which two or more groups
differ from each other. It compares the mean of various groups and allows the analysis of multiple
groups.
• Factor Analysis: A large number of variables can be reduced into a smaller number of factors
using the factor analysis technique. It works on the principle that multiple separate observable
variables correlate with each other because they are all associated with an underlying construct. It
helps in reducing large datasets into smaller, more manageable samples.
• Cohort Analysis: Cohort analysis can be defined as a subset of behavioral analytics that operates
from data taken from a given dataset. Rather than looking at all users as one unit, cohort analysis
breaks down data into related groups for analysis where these groups or cohorts usually have
common characteristics or similarities within a defined period.
• MaxDiff Analysis: This is a quantitative data analysis method that is used to gauge customers’ preferences for purchase and which parameters rank higher than the others in the process.
• Cluster Analysis: Cluster analysis is a technique used to identify structures within a dataset.
Cluster analysis aims to be able to sort different data points into groups that are internally similar
and externally different, that is, data points within a cluster will look like each other and different
from data points in other clusters.
• Time Series Analysis: This is a statistical analytic technique used to identify trends and cycles
over time. It is simply the measurement of the same variables at different points in time like
weekly, and monthly email sign-ups to uncover trends, seasonality, and cyclic patterns. By doing
this, the data analyst can forecast how variables of interest may fluctuate in the future.
• SWOT analysis: This is a quantitative data analysis method that assigns numerical values to
indicate strengths, weaknesses, opportunities, and threats of an organization, product, or service to
show a clearer picture of competition to foster better business strategies
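To ground two of the methods above, here is a hedged sketch of a cross tabulation and a simple linear regression using pandas and SciPy; the data are invented for illustration:

```python
# Illustrative cross tabulation and simple linear regression.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "gender":   ["F", "F", "M", "M", "F", "M", "F", "M"],
    "party":    ["Dem", "Rep", "Dem", "Rep", "Dem", "Dem", "Rep", "Rep"],
    "ad_spend": [10, 12, 8, 15, 11, 9, 14, 13],          # hypothetical independent variable
    "sales":    [100, 115, 90, 140, 108, 95, 130, 125],  # hypothetical dependent variable
})

# Cross tabulation: counts of respondents by gender and party.
print(pd.crosstab(df["gender"], df["party"]))

# Simple linear regression: how ad spend relates to sales.
result = stats.linregress(df["ad_spend"], df["sales"])
print(f"slope={result.slope:.2f}, intercept={result.intercept:.2f}, r={result.rvalue:.2f}")
```

The same cross tabulation feeds naturally into the chi-square test for association discussed later in this unit.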
Qualitative data can be observed and recorded. This data type is non-numerical in nature. This type
of data is collected through methods of observations, one-to-one interviews, conducting focus groups, and
similar methods. Qualitative data in statistics is also known as categorical data – data that can be arranged
categorically based on the attributes and properties of a thing or a phenomenon.
1. One-to-one interviews: These are conducted with a single respondent at a time and are conversational in nature. Mostly, the open-ended questions are asked spontaneously, with the interviewer letting the flow of the interview dictate the questions to be asked.
2. Focus groups: This is done in a group discussion setting. The group is limited to 6-10 people, and a
moderator is assigned to moderate the ongoing discussion.
Depending on the data which is sorted, the members of a group may have something in common. For
example, a researcher conducting a study on track runners will choose athletes who are track runners or
were track runners and have sufficient knowledge of the subject matter.
3. Record keeping: This method makes use of the already existing reliable documents and similar sources
of information as the data source. This data can be used in the new research. It is similar to going to a
library. There, one can go over books and other reference material to collect relevant data that can be used
in the research.
4. Process of observation: In this data collection method, the researcher immerses himself/ herself in the
setting where his respondents are, and keeps a keen eye on the participants and takes down notes. This is
known as the process of observation.
Besides taking notes, other documentation methods, such as video and audio recording, photography, and
similar methods, can be used.
5. Longitudinal studies: This data collection method is performed on the same data source repeatedly over
an extended period. It is an observational research method that goes on for a few years and, in some cases,
can go on for even decades. This data collection method aims to find correlations through an empirical
study of subjects with common traits.
6. Case studies: In this method, data is gathered by an in-depth analysis of case studies. The versatility of
this method is demonstrated in how this method can be used to analyze both simple and complex subjects.
The strength of this method is how judiciously it uses a combination of one or more qualitative data
collection methods to draw inferences.
Advantages
1. It helps in-depth analysis: The data collected provide the researchers with a detailed analysis, like
a thematic analysis of subject matters. While collecting it, the researchers tend to probe the participants
and can gather ample information by asking the right kind of questions. The data collected is used to
conclude a series of questions and answers.
2. Understand what customers think: The data helps market researchers understand their customers’
mindsets. The use of qualitative data gives businesses an insight into why a customer purchased a product.
Understanding customer language helps market research infer the data collected more systematically.
3. Rich data: Collected data can also be used to conduct future research. Since the questions asked to
collect qualitative data are open-ended questions, respondents are free to express their opinions, leading
to more information.
Disadvantages
1. Time-consuming: As collecting this data is more time-consuming, fewer people are studied compared with quantitative data collection. Unless time and budget allow, a smaller sample size is included.
2. Not easy to generalize: Since fewer people are studied, it is difficult to generalize the results of that
population.
3. Dependent on the researcher’s skills: This type of data is collected through one-to-one interviews, observations, focus groups, etc., so it relies on the researcher’s skills and experience to collect information from the sample.
It is typically descriptive data and is more difficult to analyze than quantitative data. Now, you have to
decide which is the best option for your research project; remember that to obtain and analyze the
qualitative data, we need a little more time, so you should consider it in your planning.
In data science and statistics, hypothesis testing is an important step as it involves the verification of an
assumption that could help develop a statistical parameter. For instance, a researcher establishes a
hypothesis assuming that the average of all odd numbers is an even number.
In order to find the plausibility of this hypothesis, the researcher will have to test the hypothesis using
hypothesis testing methods. Unlike a hypothesis that is ‘supposed’ to stand true on the basis of little or no
evidence, hypothesis testing is required to have plausible evidence in order to establish that a statistical
hypothesis is true.
Perhaps this is where statistics play an important role. A number of components are involved in this
process. But before understanding the process involved in hypothesis testing in research methodology, we
shall first understand the types of hypotheses that are involved in the process. Let us get started!
In data sampling, different types of hypotheses are involved in finding whether the tested samples test
positive for a hypothesis or not. In this segment, we shall discover the different types of hypotheses and
understand the role they play in hypothesis testing.
Alternative Hypothesis
Alternative Hypothesis (H1) or the research hypothesis states that there is a relationship between two
variables (where one variable affects the other). The alternative hypothesis is the main driving force for
hypothesis testing.
It implies that the two variables are related to each other and the relationship that exists between them is
not due to chance or coincidence.
When the process of hypothesis testing is carried out, the alternative hypothesis is the main subject of the
testing process. The analyst intends to test the alternative hypothesis and verifies its plausibility.
Null Hypothesis
The Null Hypothesis (H0) aims to nullify the alternative hypothesis by implying that there exists no
relation between two variables in statistics. It states that the effect of one variable on the other is solely
due to chance and no empirical cause lies behind it.
The null hypothesis is established alongside the alternative hypothesis and is recognized as important as
the latter. In hypothesis testing, the null hypothesis has a major role to play as it influences the testing
against the alternative hypothesis.
Non-Directional Hypothesis
The Non-directional hypothesis states that the relation between two variables has no direction.
Simply put, it asserts that there exists a relation between two variables, but does not recognize the direction
of effect, whether variable A affects variable B or vice versa. Directional Hypothesis
The Directional hypothesis, on the other hand, asserts the direction of effect of the relationship that exists
between two variables.
Herein, the hypothesis clearly states that variable A affects variable B, or vice versa.
Statistical Hypothesis
A statistical hypothesis is a hypothesis that can be verified to be plausible on the basis of statistics.
By using data sampling and statistical knowledge, one can determine the plausibility of a statistical
hypothesis and find out if it stands true or not.
Now that we have understood the types of hypotheses and the role they play in hypothesis testing, let us
now move on to understand the process in a better manner.
In hypothesis testing, a researcher is first required to establish two hypotheses - alternative hypothesis and
null hypothesis in order to begin with the procedure.
To establish these two hypotheses, one is required to study data samples, find a plausible pattern among
the samples, and pen down a statistical hypothesis that they wish to test.
A random population of samples can be drawn, to begin with hypothesis testing. Among the two
hypotheses, alternative and null, only one can be verified to be true. Perhaps the presence of both
hypotheses is required to make the process successful.
At the end of the hypothesis testing procedure, either of the hypotheses will be rejected and the other one
will be supported. Even though one of the two hypotheses turns out to be true, no hypothesis can ever be
verified 100%.
Therefore, a hypothesis can only be supported based on the statistical samples and verified data. Here is a
step-by-step guide for hypothesis testing.
First things first, one is required to establish two hypotheses - alternative and null, that will set the
foundation for hypothesis testing.
These hypotheses initiate the testing process that involves the researcher working on data samples in order
to either support the alternative hypothesis or the null hypothesis.
Once the hypotheses have been formulated, it is now time to generate a testing plan. A testing plan or an
analysis plan involves the accumulation of data samples, determining which statistic is to be considered
and laying out the sample size. All these factors are very important while one is working on hypothesis
testing.
As soon as a testing plan is ready, it is time to move on to the analysis part. Analysis of data samples
involves configuring statistical values of samples, drawing them together, and deriving a pattern out of
these samples.
While analyzing the data samples, a researcher needs to determine a set of things
• Significance Level - The level of significance in hypothesis testing indicates if a statistical result
could have significance if the null hypothesis stands to be true.
• Testing Method - The testing method involves a type of sampling-distribution and a test statistic
that leads to hypothesis testing. There are a number of testing methods that can assist in the analysis
of data samples.
• Test statistic - Test statistic is a numerical summary of a data set that can be used to perform
hypothesis testing.
• P-value - The P-value interpretation is the probability of finding a sample statistic to be as extreme
as the test statistic, indicating the plausibility of the null hypothesis.
The analysis of data samples leads to the inference of results that establishes whether the alternative
hypothesis stands true or not. When the P-value is less than the significance level, the null hypothesis
is rejected and the alternative hypothesis turns out to be plausible.
As we have already looked into different aspects of hypothesis testing, we shall now look into the different
methods of hypothesis testing. All in all, there are 2 most common types of hypothesis testing methods.
They are as follows -
The frequentist approach, or the traditional approach to hypothesis testing, is a hypothesis testing method that makes its assumptions by considering only the current data.
The supposed truths and assumptions are based on the current data and a set of 2 hypotheses are
formulated. A very popular subtype of the frequentist approach is the Null Hypothesis Significance
Testing (NHST).
The NHST approach (involving the null and alternative hypothesis) has been one of the most sought-
after methods of hypothesis testing in the field of statistics ever since its inception in the mid-1950s.
A much unconventional and modern method of hypothesis testing, the Bayesian Hypothesis Testing
claims to test a particular hypothesis in accordance with the past data samples, known as prior
probability, and current data that lead to the plausibility of a hypothesis.
The result obtained indicates the posterior probability of the hypothesis. In this method, the researcher
relies on ‘prior probability and posterior probability’ to conduct hypothesis testing on hand.
On the basis of this prior probability, the Bayesian approach tests a hypothesis to be true or false. The
Bayes factor, a major component of this method, indicates the likelihood ratio between the null hypothesis and the alternative hypothesis.
The Bayes factor is the indicator of the plausibility of either of the two hypotheses that are established
for hypothesis testing.
𝐻𝑜 uses ≥, ≤, or =, while 𝐻𝑎 uses <, >, or ≠. Make sure you match the signs so they are opposite of each other, unless your professor wants 𝐻𝑜 to always have a “=”.
Example:
𝐻𝑜: The mean number of GVSU students enrolled in STA215 during WINTER 2018 who
speak English as a second language is 15
𝐻𝑎: The mean number of GVSU students enrolled in STA215 during WINTER 2018 who
speak English as a second language is not equal to 15
3. State Assumptions and Check Conditions
These are the conditions that need to be met in order for the hypothesis test to be performed. If the
conditions are not met, then the results of the test are not valid.
4. Calculate the Test Statistic
The test statistic varies depending on the test performed, see statistical tests handouts for details.
5. Calculate the P-value
P-value = the probability of getting the observed test statistic or something more extreme when
𝐻𝑜 is true. P-values can be found using a calculator or a table from the 215 textbook Introductory Applied Statistics: A Variable Approach.
6. State the Conclusion
If P-value > α, then fail to reject the null hypothesis: “There is insufficient evidence to conclude [𝐻𝑎 in words].”
If P-value < α, then reject the null hypothesis: “There is sufficient evidence to conclude [𝐻𝑎 in words].”
Remember: never accept 𝐻𝑎; we only reject (or fail to reject) 𝐻𝑜.
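Putting the steps together, here is a hedged sketch of the worked example above (𝐻𝑜: mean = 15 versus 𝐻𝑎: mean ≠ 15) using SciPy; the sample values are invented purely for illustration:

```python
# One-sample t test for Ho: mu = 15 versus Ha: mu != 15 (illustrative data).
from scipy import stats

sample = [14, 17, 15, 16, 13, 18, 15, 14, 16, 17]   # hypothetical counts from several sections
alpha = 0.05

t_stat, p_value = stats.ttest_1samp(sample, popmean=15)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

if p_value < alpha:
    print("Reject Ho: there is sufficient evidence that the mean differs from 15.")
else:
    print("Fail to reject Ho: there is insufficient evidence that the mean differs from 15.")
```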
In hypothesis testing, an analyst tests a statistical sample, with the goal of providing evidence on
the plausibility of the null hypothesis.
Statistical analysts test a hypothesis by measuring and examining a random sample of the population
being analyzed. All analysts use a random population sample to test two different hypotheses: the null
hypothesis and the alternative hypothesis.
The null hypothesis is usually a hypothesis of equality between population parameters; e.g., a null
hypothesis may state that the population mean return is equal to zero. The alternative hypothesis is
effectively the opposite of a null hypothesis (e.g., the population mean return is not equal to zero). Thus,
they are mutually exclusive, and only one can be true. However, one of the two hypotheses will always
be true.
A Type II error happens when you get false negative results: you conclude that the drug intervention didn’t
improve symptoms when it actually did. Your study may have missed key indicators of improvements or
attributed any improvements to other factors instead.
20% or greater. An effect size of 20% means that the drug intervention reduces symptoms by 20% more
than the control treatment.
However, a Type II error may occur if the true effect is smaller than this size. A smaller effect size is unlikely to be detected in your study due to inadequate statistical power.
Statistical power is determined by:
• Size of the effect: Larger effects are more easily detected.
• Measurement error: Systematic and random errors in recorded data reduce power.
• Sample size: Larger samples reduce sampling error and increase power.
• Significance level: Increasing the significance level increases power.
To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the significance level.
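To see how effect size, sample size, and significance level drive power, here is a hedged sketch using statsmodels' power calculator; the effect size, alpha, and power targets are illustrative:

```python
# Illustrative power analysis: how many subjects per group are needed to detect
# a small effect (Cohen's d = 0.2) with 80% power at alpha = 0.05?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(f"Required sample size per group: {n_per_group:.0f}")

# Raising alpha (or the sample size) increases power for the same effect size.
power = analysis.power(effect_size=0.2, nobs1=100, alpha=0.10)
print(f"Power with n=100 per group and alpha=0.10: {power:.2f}")
```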
For statisticians, a Type I error is usually considered worse. In practical terms, however, either type of error could be worse depending on your research context.
A Type I error means mistakenly going against the main statistical assumption of a null hypothesis. This
may lead to new policies, practices or treatments that are inadequate or a waste of resources. In contrast,
a Type II error means failing to reject a null hypothesis. It may only result in missed opportunities to
innovate, but these can also have important practical consequences.
Chi-square test
The chi-square test for association (contingency) is a standard measure for association between two
categorical variables. The chi-square test, unlike Pearson’s correlation coefficient or Spearman rho, is a
measure of the significance of the association rather than a measure of the strength of the association.
A simple and generic example follows. If scientists were studying the relationship between gender
and political party, then they could count people from a random sample belonging to the various
combinations: female-Democrat, female-Republican, male-Democrat, and male-Republican. The
scientists could then perform a chi-square test to determine whether there was a significant
disproportionate membership among those groups, indicating an association between gender and political
party.
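A hedged sketch of that gender-by-party example with SciPy's chi-square test of independence; the counts are invented for illustration:

```python
# Chi-square test for association between gender and party (invented counts).
from scipy.stats import chi2_contingency

#                 Democrat  Republican
observed = [[45, 30],   # female
            [25, 50]]   # male

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The association between gender and party is statistically significant.")
```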
Similarly, an odds ratio is an appropriate measure of strength of association for categorical data
derived from a case-control study. The odds ratio is often interpreted the same way that relative risk is
interpreted when measuring the strength of the association, although this is somewhat controversial when
the risk factor being studied is common.
Additional methods
There are a number of other measures of association for a variety of circumstances. For example, if one
variable is measured on an interval/ratio scale and the second variable is dichotomous (has two outcomes),
then the point-biserial correlation coefficient is appropriate. Other combinations of data types (or
transformed data types) may require the use of more specialized methods to measure the association in
strength and significance.
Other types of association describe the way data are related but are usually not investigated for their own
interest. Serial correlation (also known as autocorrelation), for instance, describes how in a series of events
occurring over a period of time, events that occur closely spaced in time tend to be more similar than those
more widely spaced. The Durbin-Watson test is a procedure to test the significance of such correlations.
If the correlations are evident, then it may be concluded that the data violate the assumptions of
independence, rendering many modeling procedures invalid. A classical example of this problem occurs
when data are collected over time for one particular characteristic. For example, if an epidemiologist
wanted to develop a simple linear regression for the number of infections by month, there would
undoubtedly be serial correlation: each month’s observation would depend on the prior month’s
observation. This serial effect (serial correlation) would violate the assumption of independent
observations for simple linear regression and accordingly render the parameter estimates for simple linear
regression as not credible.
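A hedged sketch of checking regression residuals for serial correlation with the Durbin-Watson statistic; the monthly infection counts are simulated for illustration:

```python
# Fit a simple time-trend regression and check residuals for serial correlation.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

np.random.seed(0)
months = np.arange(1, 25)                                  # 24 months
infections = 50 + 2 * months + np.random.normal(0, 5, 24)  # illustrative data

model = sm.OLS(infections, sm.add_constant(months)).fit()
dw = durbin_watson(model.resid)
print(f"Durbin-Watson statistic: {dw:.2f}")  # values near 2 suggest little serial correlation
```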
Inferring causality
Perhaps the greatest danger with all measures of association is the temptation to infer causality. Whenever
one variable causes changes in another variable, an association will exist. But whenever an association
exists, it does not always follow that causation exists. In epidemiology, the ability to infer causation from
an association is often weak because many studies are observational and subject to
various alternative explanations for their results. Even when randomization has been applied, as in clinical
trials, inference of causation is often limited.
Testing Association
Measure of association, in statistics, any of various factors or coefficients used to quantify a
relationship between two or more variables. Measures of association are used in various fields of research
but are especially common in the areas of epidemiology and psychology, where they frequently are used
to quantify relationships between exposures and diseases or behaviours.
A measure of association may be determined by any of several different analyses,
including correlation analysis and regression analysis. (Although the
terms correlation and association are often used interchangeably, correlation in a stricter sense refers to
linear correlation, and association refers to any relationship between variables.) The method used to
determine the strength of an association depends on the characteristics of the data for each variable.
Data may be measured on an interval/ratio scale, an ordinal/rank scale, or a nominal/categorical
scale. These three characteristics can be thought of as continuous, integer, and qualitative categories,
respectively.
• Enter Data Range with Labels: enter the range containing the data and the labels; default is the
range selected on the worksheet.
• Alpha: this is the significance level; 1 − alpha is the confidence level. The default is 0.05 for 95% confidence.
• Row Title: Enter the title of the rows; default is value in first row of selected data.
• Column Title: enter the title of the columns; default is value in row above second column.
• Select OK to generate the results.
• Select Cancel to end the program.
Chi Square Test for Association Output
The output from the Chi-Square Test for Association is shown below. An explanation of the output
follows.
The top part of the output contains the data with the observed and expected values as well as the
contribution of each to χ2. The row and column totals are also given.
The middle portion of the output contains the following:
• Alpha (entered)
• The calculated χ2
• The degrees of freedom
• The critical χ2 value based on alpha and the degrees of freedom
• The calculated p value (will be in red if ≤ alpha)
The bottom portion of the output contains the residuals. The residuals are the difference between the
observed and the expected values. The conclusion is then given based on the values of alpha and the p
value. The null hypothesis (that the variables are not associated) is rejected if the p value < alpha.
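For readers who prefer a programming environment to the spreadsheet add-in described above, a minimal sketch of the same Chi-Square Test for Association in Python is shown below; the 2x2 table of observed counts is hypothetical.

```python
# A minimal sketch (hypothetical 2x2 table) of the Chi-Square Test for Association
# using scipy instead of a spreadsheet add-in.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: gender; columns: preference for product A vs product B (observed counts)
observed = np.array([[30, 20],
                     [25, 45]])

chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.4f}")
print("Expected counts:\n", expected)
# Reject the null hypothesis of no association if the p value is below alpha
print("Conclusion:", "associated" if p_value < alpha else "no evidence of association")
```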
3.5.2 T-test
The t test tells you how significant the differences between group means are. It lets you know if those
differences in means could have happened by chance. The t test is usually used when data sets follow
a normal distribution but you don’t know the population variance.
For example, you might flip a coin 1,000 times and find the number of heads follows a normal distribution
for all trials. So you can calculate the sample variance from this data, but the population variance is
unknown. Or, a drug company may want to test a new cancer drug to find out if it improves life expectancy.
In an experiment, there’s always a control group (a group who are given a placebo, or “sugar pill”). So
while the control group may show an average life expectancy of +5 years, the group taking the new drug
might have a life expectancy of +6 years. It would seem that the drug might work. But it could be due to
a fluke. To test this, researchers would use a Student’s t-test to find out if the results are repeatable for an
entire population.
In addition, a t test uses a t-statistic and compares this to t-distribution values to determine if the results
are statistically significant.
However, note that you can only use a t test to compare two means. If you want to compare three or more
means, use an ANOVA instead.
The T Score.
The t score is a ratio between the difference between two groups and the difference within the groups.
• Larger t scores = more difference between groups.
• Smaller t score = more similarity between groups.
A t score of 3 tells you that the groups are three times as different from each other as they are within each
other. So when you run a t test, bigger t-values equal a greater probability that the results are repeatable.
Every t-value has a p-value to go with it. A p-value from a t test is the probability that the results from
your sample data occurred by chance. P-values range from 0% to 100% and are usually written as a decimal
(for example, a p value of 5% is 0.05). Low p-values indicate that your results are unlikely to have occurred
by chance. For example, a p-value of .01 means there is only a 1% probability that the results from an
experiment happened by chance.
Calculating the Statistic / Test Types
• Equal Variance is conducted when the sample size in each group or population is the same, or
the variance of the two data sets is similar.
• Unequal Variance is used when the variance and the number of samples in each group are
different.
• An Independent Samples t-test compares the means for two groups.
• A Paired sample t-test compares means from the same group at different times (say, one
year apart).
• A One sample t-test tests the mean of a single group against a known mean.
You probably don't want to calculate an independent samples t test by hand (the math can get very messy);
in practice the test is run in software such as Excel or SPSS, or on a graphing calculator such as the TI-83
or TI-89.
The null hypothesis for the independent samples t-test is μ1 = μ2; that is, it assumes the means are equal.
With the paired t test, the null hypothesis is that the mean pairwise difference between the two tests is zero
(H0: µd = 0).
Paired Samples T Test by Hand
Example question: Calculate a paired t test by hand for eleven pairs of scores.
Step 1: Subtract the second score from the first score for each pair to get the difference, D.
Step 2: Add up all of the values from Step 1, then set this number (ΣD) aside for a moment.
Steps 3 to 5: Square each difference from Step 1, add up the squares to get ΣD², and substitute ΣD, ΣD² and
n into the paired-t formula t = ΣD / √[(n·ΣD² − (ΣD)²)/(n − 1)]. For this data set the calculated t-value works
out to −2.74.
Step 6: Subtract 1 from the sample size to get the degrees of freedom. We have 11 items, so 11 − 1 = 10.
Step 7: Find the critical t-value in the t-table, using the degrees of freedom from Step 6 and your alpha level;
if no alpha level is specified, use 0.05 (5%). For this example, with df = 10 and alpha = .05 (two-tailed), the
critical t-value is 2.228.
Step 8: In conclusion, compare your critical t-value from Step 7 (2.228) to your calculated t-value (−2.74).
The absolute value of the calculated t-value is greater than the table value at an alpha level of .05. In
addition, note that the p-value is less than the alpha level: p < .05. So we can reject the null hypothesis that
there is no difference between the means.
However, note that you can ignore the minus sign when comparing the two t-values as ± indicates the
direction; the p-value remains the same for both directions.
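The same paired test can be run in software. Below is a minimal sketch in Python using scipy; the eleven pairs of scores are hypothetical and are not the textbook's data, so the resulting t-value will differ from the hand-worked example.

```python
# A minimal sketch of a paired samples t-test in scipy. The eleven pairs of
# scores below are hypothetical (not the textbook's data).
from scipy import stats

before = [120, 122, 143, 100, 109, 112, 92, 80, 98, 100, 130]
after  = [122, 120, 141, 109, 109, 121, 99, 84, 104, 102, 135]

t_stat, p_value = stats.ttest_rel(before, after)

alpha = 0.05
df = len(before) - 1  # 11 pairs, so df = 10
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
# As in the hand calculation, compare |t| with the critical value (2.228 for
# df = 10 at alpha = .05, two-tailed), or simply compare p with alpha.
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```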
3.5.3 Regression
Regression is a statistical method used in finance, investing, and other disciplines that attempts
to determine the strength and character of the relationship between one dependent variable (usually
denoted by Y) and a series of other variables (known as independent variables).
Also called simple regression or ordinary least squares (OLS), linear regression is the most common form
of this technique. Linear regression establishes the linear relationship between two variables based on
a line of best fit. Linear regression is thus graphically depicted using a straight line with the slope defining
how the change in one variable impacts a change in the other. The y-intercept of a linear regression
relationship represents the value of one variable when the value of the other is zero. Non-linear
regression models also exist, but are far more complex.
Regression analysis is a powerful tool for uncovering the associations between variables observed
in data, but cannot easily indicate causation. It is used in several contexts in business, finance, and
economics. For instance, it is used to help investment managers value assets and understand the
relationships between factors such as commodity prices and the stocks of businesses dealing in those
commodities.
Regression as a statistical technique should not be confused with the concept of regression to the mean
(mean reversion).
The ultimate benefit of regression analysis is to determine which independent variables have the most
effect on a dependent variable. It also helps to determine which factors can be ignored and those that
should be emphasized.
For example, suppose vegetable prices have increased in a certain area. The reason behind the increase could
be anything from natural calamities to transport and supply-chain problems. When an analyst plots the data,
he or she will start with the most obvious explanatory variable, say heavy rainfall in the agricultural regions.
Once the model is built, the remaining input variables can be added based on their occurrence and
significance.
The simple linear regression equation takes the form Y = a + bX + u, where:
Y = the dependent variable you are trying to predict or explain
X = the explanatory (independent) variable(s) you are using to predict or associate with Y
a = the y-intercept
b = the beta coefficient, i.e., the slope of the explanatory variable(s)
u = the regression residual or error term
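A minimal sketch of estimating this equation by ordinary least squares in Python follows; the rainfall and price figures are hypothetical and the variable names are illustrative only.

```python
# A minimal sketch (hypothetical data) of fitting Y = a + bX + u by ordinary
# least squares with scipy, in the spirit of the vegetable-price example above.
from scipy import stats

rainfall_mm = [50, 80, 120, 150, 200, 260, 310]        # X: explanatory variable
price_index = [100, 104, 109, 115, 124, 133, 141]      # Y: dependent variable

result = stats.linregress(rainfall_mm, price_index)

print(f"intercept a = {result.intercept:.2f}")
print(f"slope b     = {result.slope:.4f}")
print(f"R-squared   = {result.rvalue ** 2:.3f}")
# Predicted Y for a new X; the residual u is the observed Y minus this prediction
print("predicted price index at 180 mm:", round(result.intercept + result.slope * 180, 1))
```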
3.5.4 Analysis of Variance (ANOVA) is a statistical formula used to compare variances across the
means (or average) of different groups. A range of scenarios use it to determine if there is any difference
between the means of different groups. The t- and z-test methods developed in the 20th century were
used for statistical analysis until 1918, when Ronald Fisher created the analysis of variance method.
ANOVA is also called the Fisher analysis of variance, and it is the extension of the t- and z-tests. The
term became well-known in 1925, after appearing in Fisher's book, "Statistical Methods for Research
Workers." It was employed in experimental psychology and later expanded to subjects that were more
complex.
ANOVA is a statistical method that splits the observed aggregate variability found inside a data set into two
parts: systematic factors and random factors. The systematic factors have a statistical influence on the
given data set, while the random factors do not. Analysts use the ANOVA test to determine the influence
that independent variables have on the dependent variable in a regression study.
For example, to study the effectiveness of different diabetes medications, scientists design an experiment
to explore the relationship between the type of medicine and the resulting blood sugar level. The sample
population is a set of people. We divide the sample population into multiple groups, and each group
receives a particular medicine for a trial period. At the end of the trial period, blood sugar levels are
measured for each of the individual participants. Then for each group, the mean blood sugar level is
calculated. ANOVA helps to compare these group means to find out if they are statistically different or if
they are similar.
The outcome of ANOVA is the ‘F statistic’. This is the ratio of the between-group variance to the within-
group variance, which ultimately produces a figure that allows a conclusion that the null hypothesis is
supported or rejected. If there is a significant difference between the groups, the null hypothesis is not
supported, and the F-ratio will be larger.
ANOVA Terminology
Dependent variable: This is the item being measured that is theorized to be affected by the independent
variables.
Independent variable/s: These are the items being measured that may have an effect on the dependent
variable.
A null hypothesis (H0): This is when there is no difference between the groups or means. Depending on
the result of the ANOVA test, the null hypothesis will either be accepted or rejected.
An alternative hypothesis (H1): When it is theorized that there is a difference between groups and means.
Factors and levels: In ANOVA terminology, an independent variable is called a factor which affects the
dependent variable. Level denotes the different values of the independent variable that are used in an
experiment.
Fixed-factor model: Some experiments use only a discrete set of levels for factors. For example, a fixed-
factor test would be testing three different dosages of a drug and not looking at any other dosages.
Random-factor model: This model draws a random value of level from all the possible values of the
independent variable.
The total variation partitions as SST = SSB + SSE, where SST is the total sum of squares, SSB the between-
group (treatment) sum of squares, and SSE the within-group (error) sum of squares. The total degrees of
freedom are N − 1.
The ANOVA test is the initial step in analyzing factors that affect a given data set. Once the test is
finished, an analyst performs additional testing on the methodical factors that measurably contribute to
the data set's inconsistency. The analyst utilizes the ANOVA test results in an f-test to generate additional
data that aligns with the proposed regression models.
The ANOVA test allows a comparison of more than two groups at the same time to determine whether a
relationship exists between them. The result of the ANOVA formula, the F statistic (also called the F-
ratio), allows for the analysis of multiple groups of data to determine the variability between samples and
within samples.
If no real difference exists between the tested groups, which is called the null hypothesis, the result of
the ANOVA's F-ratio statistic will be close to 1. The distribution of all possible values of the F statistic
is the F-distribution. This is actually a group of distribution functions, with two characteristic numbers,
called the numerator degrees of freedom and the denominator degrees of freedom.
Example:
A researcher might, for example, test students from multiple colleges to see if students from one
of the colleges consistently outperform students from the other colleges. In a business application, an
R&D researcher might test two different processes of creating a product to see if one process is better
than the other in terms of cost efficiency.
The type of ANOVA test used depends on a number of factors. It is applied when the data are experimental.
ANOVA can also be computed by hand when statistical software is not available; it is simple to use and best
suited for small samples. With many experimental designs, the sample sizes have to be the same for the
various factor level combinations.
ANOVA is helpful for testing three or more variables. It is similar to multiple two-sample t-tests.
However, it results in fewer type I errors and is appropriate for a range of issues. ANOVA groups
differences by comparing the means of each group and includes spreading out the variance into diverse
sources. It is employed with subjects, test groups, between groups and within groups.
One-way analysis of variance is commonly called a one-factor test, with one dependent variable and one
independent variable. Statisticians use it to compare the means of groups that are independent of each
other, using the analysis of variance F formula. It requires a single independent variable with at least two
levels. The one-way analysis of variance is quite similar to the t-test.
A one-way ANOVA (analysis of variance) has one categorical independent variable (also known as a
factor) and a normally distributed continuous (i.e., interval or ratio level) dependent variable.
The independent variable divides cases into two or more mutually exclusive levels, categories, or groups.
The one-way ANOVA test for differences in the means of the dependent variable is broken down by the
levels of the independent variable.
An example of a one-way ANOVA includes testing a therapeutic intervention (CBT, medication, placebo)
on the incidence of depression in a clinical sample.
Both the One-Way ANOVA and the Independent Samples t-Test can compare the means for two groups.
However, only the One-Way ANOVA can compare the means across three or more groups.
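A minimal sketch of a one-way ANOVA in Python is shown below, echoing the medication example above; the blood sugar readings are hypothetical.

```python
# A minimal sketch of a one-way ANOVA with scipy; the blood sugar readings for
# the three treatment groups are hypothetical.
from scipy.stats import f_oneway

drug_a  = [140, 135, 150, 145, 138]
drug_b  = [130, 128, 133, 127, 131]
placebo = [155, 160, 149, 158, 152]

f_stat, p_value = f_oneway(drug_a, drug_b, placebo)

alpha = 0.05
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# p < alpha means at least one group mean differs; a post hoc test (e.g. Tukey's
# HSD) is then needed to find out which groups differ.
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```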
The prerequisite for conducting a two-way ANOVA test is the presence of two independent variables; one
can perform it in two ways:
Two-way ANOVA with replication (repeated measures analysis of variance) – done when the two
independent groups with dependent variables perform more than one task.
Two-way ANOVA without replication – done when there is a single group that is tested twice, for example
testing a player before and after a football game.
Moreover, one must meet the following conditions for its applications:
• The population should be near normal distribution.
• All samples should be independent.
• Variances of the population have to be equal.
• There should be an equal-sized sample in the group.
• ANOVA is used to test for differences among the means of populations by examining the amount of
variation within each sample, relative to the amount of variation between the samples. Analysing
variance tests the hypothesis that the means of two or more populations are equal.
• In a regression study, analysts use the ANOVA test to determine the impact of independent variables
on the dependent variable.
Uses of ANOVA
• To test correlation and regression.
• To study the homogeneity in the case of two-way classification.
• To test the significance of the multiple correlation coefficient.
• To test the linearity of regression.
Advantages of ANOVA:
• Whereas the Z test can only be used to compare the means of two populations, the ANOVA test
can be used to compare the means of three or more populations.
• If there are two different treatments/factors affecting the dependent variable, then we can use the
two way ANOVA test to analyse the effect due to each treatment. The test will tell us whether the
difference due to each of the treatments is significant or not.
• We can check equality of three or more populations means by repeatedly applying Z test pairwise.
But this increases the Type 1 error. On the other hand, the same comparison done by the ANOVA
technique, has low Type 1 error. This means that ANOVA test is a statistically powerful test.
• The ANOVA method is used in clinical testing to check for the effectiveness of experimental
medicines.
• The calculations involved in calculating the F statistics are easy and involve elementary operations
such as squaring, summing up and dividing. The decision criteria for rejecting or accepting the
null hypothesis are easy to understand.
Disadvantages of ANOVA:
• It often happens that the parent populations do not follow the normal distribution. For example,
the lifetimes of products generally follow the Weibull distribution. In such cases the ANOVA
method cannot be used. For instance, we may not be able to use the ANOVA technique to compare
the mean life of bulbs produced by three companies.
• If there are two or more dependent variables then the ANOVA technique cannot be applied. The
MANOVA test must be used in such cases.
• It rarely happens that all the population variances are equal. If the assumption of homoscedasticity
is violated then the use of ANOVA cannot be justified.
• If the null hypothesis is rejected we can only conclude that some population means are unequal.
The ANOVA test does not tell us anything about which of them are unequal. Some post hoc
tests must be carried out in order to know about that.
• Checking all the background assumptions such as independence, normality, homoscedasticity, etc.
is in and of itself a difficult task.
• Although the calculations involved are elementary, they are still tedious to perform by hand. But
ANOVA tests are usually carried out using statistical software, so this is not a huge barrier.
When engaged in brainstorming ideas, how can you avoid information overload? Affinity diagrams help
leaders and teams organise numerous ideas and data points in a simplified visual form.
3. Conjoint analysis
Market researchers will be familiar with this stats-oriented technique. Conjoint analysis is often used to
help forecast how accepting consumers will be of proposed changes. It’s also used to help determine a
brand’s positioning in the market. Conjoint analysis is a survey-based technique that helps reveal how
consumers might value the attributes (such as the function, features or benefits) of a product or service.
4. Cost/benefit analysis
This technique is solely for making decisions of a financial nature. It can also be used to acquire any
financial data you might wish to use as part of another decision-making technique. Key use: financial
decision making
6. Game theory
Game theory can help business leaders make decisions by putting themselves in the shoes of a third party
– e.g. a client, competitor or consumer – and anticipating what their actions, reactions and motives might
be. Playing out these scenarios in a safe hypothetical space can help a leader make decisions based on the
outcomes of the game.
Game theory can be a useful decision-making technique if you need to take into account exterior third
parties like competitors, clients or legislative authorities. It was invented in 1944 by John von Neumann
and Oskar Morgenstern. Since then, around 20 leading scientists and economists have been awarded the
Nobel Prize in Economic Sciences for their evolution of game theory, so it’s clearly an important aspect
of modern decision-making and analysis.
Game theory models the strategic interaction between two or more players in a situation that involves set
rules. Games are typically co-operative or non co-operative. There are various Players, Actions, Payoffs
and Information (known as PAPI). Players formulate strategies and try to gain as much benefit as they
can. Key use: negotiating with third parties or making strategic decisions that involve third parties
7. Heuristic methods
Heuristic methods are used to refine a product or service over time, using trial and error. They’re not
accurate, but they can get the job done. Heuristic methods often have the benefit of saving time and
resource and reducing initial expenditure.
For example, decisions relating to a website launch could be resolved using heuristic methods, if it’s
determined the website doesn’t need to be perfect on launch. It can meet 80% of desired requirements,
and be improved in terms of content and function over time. Key use: save time on making decisions
where a perfect result isn’t required first time round
Linear programming uses maths to represent requirements as linear equations. It is, for example, useful
when making decisions relating to problems cropping up in operations research. Key use: making the most
of limited resources
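A minimal sketch of linear programming in Python follows; the product-mix problem, profit figures, and resource limits are hypothetical.

```python
# A minimal sketch (hypothetical product-mix problem) of linear programming with
# scipy: maximise profit 40*x1 + 30*x2 subject to limited machine and labour
# hours. linprog minimises, so the profit coefficients are negated.
from scipy.optimize import linprog

c = [-40, -30]                   # negated profit per unit of products 1 and 2
A_ub = [[2, 1],                  # machine hours used per unit of each product
        [1, 2]]                  # labour hours used per unit of each product
b_ub = [100, 80]                 # available machine hours and labour hours

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")

print("optimal production plan:", res.x)   # units of each product to make
print("maximum profit:", -res.fun)
```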
When we buy a latte, we consider everything from cost and quality to the environmental friendliness of the
packaging.
Multiple criteria decision analysis enables leaders to weigh up different criteria. How does one measure
apples against cheese, or cost against comfort? The following MCDA steps can help. Key use: making
business decisions that reach a compromise between logical analysis and intuition
11. Multi-voting
When making decisions as a group, use multi-voting to weed out lower priority options. You can then use
other, more exacting techniques to make key decisions on a smaller (and therefore more manageable)
group of options.
Multi-voting can be as simple as giving each member of the group a list of ideas and telling them they can
only vote for the three ideas they consider most important or beneficial. Tally up the votes to determine
which options are deemed most important by the group. Key use: making fair and balanced group
decisions
Descriptive Statistics: This type of analysis is used to summarize and describe the main features of a
dataset, such as the mean, median, mode, and standard deviation.
Inferential Statistics: This type of analysis involves making generalizations about a population based on
a sample of data. Common techniques include hypothesis testing, regression analysis, and analysis of
variance (ANOVA).
Exploratory Data Analysis (EDA): This type of analysis is used to summarize and visualize the main
features of a dataset, identify patterns, and detect outliers.
Machine Learning: This type of analysis involves training a model on a dataset to make predictions or
classify data points. Common techniques include decision trees, random forests, support vector machines,
and neural networks.
Network Analysis: This type of analysis is used to analyze relationships between nodes in a network,
such as social networks, transportation networks, and biological networks.
Time Series Analysis: This type of analysis is used to analyze data that is collected over time, such as
stock prices, weather patterns, and economic indicators.
It's important to choose the right analysis technique for your data and research questions, as using the
wrong technique can lead to incorrect results and invalid conclusions.
Draw conclusions and make recommendations based on the results. Market surveys can provide
valuable information for businesses and organizations, such as insights into consumer preferences and
behaviors, market size and growth, and competitive landscape. It is important to ensure that the survey is
designed and conducted in a rigorous and scientific manner to ensure the validity and reliability of the
results.
Put simply, a market survey is the research and analysis of the market for a particular
product/service, which includes an investigation into customer inclinations and a study of customer
capabilities such as investment attributes and buying potential. Market surveys are tools to directly collect
feedback from the target audience to understand their characteristics, expectations, and requirements.
Marketers develop new and exciting strategies for upcoming products/services but there can be no
assurance about the success of these strategies. For these to be successful, marketers should determine the
category and features of products/services that the target audiences will readily accept. By doing so, the
success of a new avenue can be assured.
Most marketing managers depend on market surveys to collect information that would catalyze the market
research process. Also, the feedback received from these surveys can be contributory in product marketing
and feature enhancement.
Market surveys collect data about a target market such as pricing trends, customer requirements,
competitor analysis, and other such details.
2. Market Surveys for exploring various aspects of the target market: Get information about
factors such as market size, demographic information such as age, gender, family income etc. to lay
out a roadmap by considering growth rate of the market, positioning, and average market share.
3. Market Surveys to probe into the purchase procedure: How does a customer decide on making a
purchase? What are the factors that convert product awareness into sales? This type of market survey
will unveil the stages of awareness, information, free trial, purchase, and repeat purchase.
4. Market Surveys to establish buyer persona: These surveys are to build a buyer persona by
knowing about customer preferences, inclination, and capabilities of purchasing a product.
5. Market Surveys to measure customer loyalty: What is the degree of loyalty that customers
have towards an organization? The answer to this question can be obtained by conducting a market
survey.
6. Market Surveys to analyze a new feature or concept: It is essential for an organization to include
market-compliant features and concepts. Carrying out a market survey to understand which features
to launch helps all the teams involved in the feature development process to do so with proper
research.
7. Market Surveys for competitor analysis: Healthy competition is always good for an
organization’s progress. Market surveys done with the motive of competitor analysis will produce
results about how the target market weighs the organization’s products/services in comparison to
the others in the market.
8. Market Surveys to understand the impact of sales activities: Sales activities are the backbone
of an organization, and it becomes crucial to keep track of them. Market surveys for sales
activities will produce a report on the impact of sales activities, on whether their frequency needs to
increase, and on any changes the audience thinks should be incorporated into the sales process.
9. Market Surveys to assess prices for new products/services: Affordability of products is also an
aspect that drives the market for organizations. Such surveys examine price ranges, product variants to
cater to multiple price ranges, target customers for each of the products, and so on.
10. Market Surveys for evaluation of customer service: Good customer service can lead to
enhanced satisfaction levels among customers. These surveys evaluate factors such as the time taken to
resolve issues, the scope for improvement, and best practices of customer service.
Using market surveys and segmentation can be a source of concrete and long-term marketing
plans.
3. Figure out customer expectations and needs: All marketing activities revolve around customer
acquisition. All small and large organizations require market surveys to gather feedback from their
target audience regularly, using customer satisfaction tools such as Net Promoter Score, Customer
Effort Score, Customer Satisfaction Score (CSAT) etc. Organizations can analyze customer feedback
to measure customer experience, satisfaction, expectations etc.
4. Accurate launch of new products: Market surveys are influential in understanding where to test
new products or services. Market surveys provide marketers a platform to analyze the scope of success
of upcoming products and make changes in strategizing the product according to the feedback they
receive.
5. Obtain information about customer demographics: Customer demographics form the core of any
business and market surveys can be used to obtain intricate and sensitive details about customer
demographics such as race, ethnicity or family income.
Exercise
I. Write down short answers for the following:
12. What is a data management plan? (pg. 95)
13. List out the challenges in data management. (pg. 97 & 98)
14. List out measurement and its functions. (pg. 103)
15. Describe Likert's scale. (pg. 109)
16. Define descriptive statistics. (pg.)
17. What are database applications? (pg. 110)
18. What is cross tabulation? (pg. 115)
19. What are quantitative and qualitative analysis? (pg. 113)
20. Define hypothesis. (pg. 119)
21. Write down the types of hypothesis. (pg. 119)
Introduction to Data analytics- Types of Data analytics- Data visualization for decision making- Graphical techniques,
skewness, kurtosis, formatting data- different operations using chart, pivot chart and formatting plot area-Data wrangling -
Business Problem Solving across different domains- Dash boarding Fundamentals
Learning Objectives
• To learn about Types of data analytics
• To understand data visualization
• To understand data wrangling
Learning Outcomes
At the end of the unit they will be able to:
• To apply different operations of data formatting
• To apply dash boarding fundamentals
• To solve business problems in various field with data analytics
Mode of Assessment
S.No | Title of Topic | Teaching (PPT/Seminar/Chalk & Board etc.) | Textbook/Reference Book | Link (if applicable; Springboard/Coursera/NPTEL) | Tool (Quiz/Puzzle/Assignment/Seminar etc.)
1 | Introduction to data analytics and visualization | Chalk and board/PPT | William G. Zikmund, Barry J. Babin, Jon C. Carr, Atanu Adhikari, Mitch Griffin, Business Research Methods: A South Asian Perspective, 8th Edition, Cengage Learning, New Delhi, 2012. | NPTEL: https://onlinecourses.nptel.ac.in/noc23_mg54/unit?unit=26&lesson=31 and https://onlinecourses.nptel.ac.in/noc23_mg54/unit?unit=26&lesson=30 | Quiz
2 | Data formatting | Chalk and board/PPT | James R. Evans, "Business Analytics - Methods, Models and Decisions", Pearson Ed, 2012. | https://www.youtube.com/watch?v=1LgkR1R1ACU | Quiz
4. DATA ANALYTICS
4.1 Introduction to Data analytics
Data analytics is the process of inspecting, examining, cleaning, transforming, and modeling data with the
goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves
the use of statistical and computational methods to extract insights and knowledge from data. The field of
data analytics has grown significantly in recent years with the explosion of data being generated in various
industries, such as healthcare, finance, and e-commerce. It enables organizations to make data-driven
decisions by transforming data into actionable insights. There are many software tools available for data
analytics, such as R, Python, SQL, SAS, and Tableau. The choice of tool depends on the type of data being
analyzed and the specific requirements of the project. Data analytics plays a critical role in making
informed decisions and solving complex problems by converting data into meaningful insights.
Data Analytics can be used to improve various aspects of business, including marketing, operations,
finance, and human resources. For example, a company can use data analytics to optimize its pricing
strategy, understand consumer behavior, improve supply chain management, or identify areas for cost
savings.
• Data Collection: The first step is to gather the data needed for analysis. This can come from various
sources, such as databases, spreadsheets, or external sources like social media or surveys.
• Data Cleaning: The next step is to clean and preprocess the data to ensure that it is accurate,
consistent, and in a format that can be easily analyzed. This involves removing missing values,
dealing with outliers, and correcting errors.
• Data Exploration: In this step, the data is explored and visualized to get a better understanding of
its structure and characteristics. This can involve creating histograms, scatter plots, and other types
of visualizations to identify patterns and relationships.
• Data Modeling: In this step, statistical and machine learning models are applied to the data to
identify patterns and make predictions. These models can be used to perform regression analysis,
clustering, classification, and more.
• Data Interpretation: The final step is to interpret the results of the analysis and draw meaningful
conclusions. This can involve presenting the findings in a clear and concise manner, making
recommendations, and communicating the results to relevant stakeholders.
Overall, data analytics plays a crucial role in today's data-driven world, helping organizations to make
better decisions and improve their operations.
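A minimal sketch of this workflow (collection, cleaning, exploration, and a simple model) in Python is given below; the advertising and sales figures are hypothetical.

```python
# A minimal sketch (hypothetical advertising and sales figures) of the workflow
# above: collect, clean, explore, and model the data with pandas and scipy.
import pandas as pd
from scipy import stats

# Collection: in practice this would come from a database, spreadsheet or survey
df = pd.DataFrame({
    "ad_spend": [10, 15, None, 20, 25, 30, 35],
    "sales":    [110, 135, 150, 160, None, 210, 240],
})

# Cleaning: drop rows with missing values
df = df.dropna()

# Exploration: summary statistics of each column
print(df.describe())

# Modelling: a simple linear relationship between ad spend and sales
slope, intercept, r, p, se = stats.linregress(df["ad_spend"], df["sales"])
print(f"sales = {intercept:.1f} + {slope:.1f} * ad_spend  (R-squared = {r ** 2:.2f})")
```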
• Descriptive Analytics: This type of analytics focuses on summarizing and describing the
characteristics of data, such as central tendencies and dispersion. Descriptive analytics can be used
to answer questions such as "What happened?" and "What is happening?"
• Diagnostic Analytics: This type of analytics goes beyond descriptive analytics and is used to
understand the reasons behind events. For example, it can be used to identify the root cause of a
problem or to determine why a particular trend is occurring.
• Predictive Analytics: This type of analytics uses historical data and statistical models to make
predictions about future events. Predictive analytics can be used to answer questions such as "What
will happen?" and "What is likely to happen?"
• Prescriptive Analytics: This type of analytics goes beyond predictive analytics and provides
guidance and recommendations for decision-making. It can be used to answer questions such as
"What should we do?" and "What actions should we take?"
• Big Data Analytics: This type of analytics focuses on analyzing large and complex datasets, often
using distributed processing systems such as Hadoop. Big data analytics can be used to uncover
insights that might not be apparent from smaller datasets.
• Text Analytics: This type of analytics focuses on analyzing unstructured text data, such as
customer reviews or social media posts. Text analytics can be used to extract sentiment, topics,
and key terms from text data.
• Visual Analytics: This type of analytics focuses on the visual representation of data and the use of
interactive visualizations to facilitate data exploration and discovery. Visual analytics can be used
to uncover patterns and relationships that might not be immediately apparent from tabular data.
These are just a few examples of the types of data analytics that exist. The specific approach and
techniques used will depend on the type of data being analyzed, the questions being asked, and the goals
of the analysis.
While descriptive analytics may describe trends or patterns, it won’t dig deeper. For this, we need tools
like diagnostic and predictive analytics. Nevertheless, descriptive analytics is exceptionally useful for
introducing yourself to unknown data.
The following kinds of data can all be summarized using descriptive analytics:
• Financial statements
• Surveys
• Social media engagement
• Website traffic
• Scientific findings
• Weather reports
• Traffic data
Essentially, any data set can be summarized in one way or another, meaning descriptive analytics has
an almost endless number of applications. We’ll explore these in more depth in section five. First, let’s
look at some of the benefits and drawbacks of descriptive analytics.
There are several types of descriptive statistics, including:
• Measures of Central Tendency: These are statistics that summarize the "typical" or "average" value of
a dataset. The most common measures of central tendency are the mean (average), median (middle
value), and mode (most frequently occurring value).
• Measures of Dispersion: These are statistics that describe how spread out the values in a dataset are.
The most common measures of dispersion are the range (difference between the largest and smallest
values), variance (a measure of how far the values in a dataset are from the mean), and standard
deviation (a measure of the average deviation of the values from the mean).
• Measures of Shape: These are statistics that describe the shape of the distribution of the data. The most
common measures of shape are skewness (a measure of the asymmetry of a distribution) and kurtosis
(a measure of the peakedness of a distribution).
• Percentiles and Quartiles: These are statistics that divide a dataset into equal parts and describe the
values that correspond to specific portions of the data. For example, the median is the 50th percentile,
and quartiles divide the data into four equal parts.
• Frequency Distributions: These are tables or graphs that summarize the number of occurrences of each
value in a dataset. Frequency distributions can be used to identify patterns and relationships in the
data, and are often used in conjunction with other descriptive statistics.
• Box Plots: These are graphical representations of the distribution of a dataset that provide a quick
summary of the distribution's shape, central tendency, and dispersion. Box plots are particularly useful
for comparing multiple datasets or for identifying outliers in a dataset.
• Histograms: These are graphs that represent the distribution of a dataset by dividing the data into
intervals and counting the number of occurrences in each interval. Histograms are useful for
visualizing the distribution of a dataset and identifying patterns and relationships in the data.
These are some of the most common types of descriptive statistics. The specific type of descriptive statistic
used will depend on the type of data being analyzed, the goals of the analysis, and the questions being
asked.
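A minimal sketch of computing these descriptive statistics (central tendency, dispersion, skewness, kurtosis, and quartiles) in Python follows; the exam scores are hypothetical.

```python
# A minimal sketch computing the descriptive statistics above with pandas;
# the exam scores are hypothetical.
import pandas as pd

scores = pd.Series([45, 52, 58, 60, 61, 63, 65, 68, 70, 74, 78, 85, 95])

print("mean:", scores.mean())
print("median:", scores.median())
print("mode:", list(scores.mode()))
print("range:", scores.max() - scores.min())
print("variance:", scores.var())            # sample variance
print("standard deviation:", scores.std())
print("skewness:", scores.skew())           # asymmetry of the distribution
print("kurtosis:", scores.kurt())           # peakedness relative to the normal
print("quartiles:")
print(scores.quantile([0.25, 0.5, 0.75]))
```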
Advantages of descriptive analytics
• It can look at a complete population (rather than a data sample), making it considerably more accurate
than inferential statistics.
But, of course, being so straightforward means descriptive analytics also has its limitations. Let’s explore
some of these next.
Disadvantages of descriptive analytics
Okay, we’ve looked at the strengths of descriptive analytics—but where does it fall short? Some
disadvantages of descriptive analytics include:
• You can summarize data sets you have access to, but these may not tell a complete story.
• You cannot use descriptive analytics to test a hypothesis or understand why data present the way
they do.
• You cannot use descriptive analytics to predict what may happen in the future.
• You cannot generalize your findings to a broader population.
• Descriptive analytics tells you nothing about the data collection methodology, meaning the data
set may include errors.
As you may suspect, although descriptive analytics are useful, it’s important not to overstretch their
capabilities. Fortunately, we have diagnostic and predictive analytics to help fill in the gaps where
descriptive analytics falls short.
Descriptive analytics use cases
Using these data, teachers and training providers can track both individual and organization-level targets.
They can analyze grade curves, or see which teaching resources are most popular. And while they won’t
necessarily know why, it may be possible to infer from the data that videos, for example, are more popular
than, say, written documents. Presenting this information is the first step towards improving course design
and creating better learner outcomes.
Health care:
Diagnostic analytics can support many areas of health care, including the core function of diagnosing
medical problems. For example, descriptive analytics can answer questions like, how many patients were
admitted to the hospital last month? And how many returned within 30 days? After all, in some cases
reimbursement may be dependent in part on readmittance rates. Descriptive analytics can quantify events
and highlight things like what hospital resources are being used and even model the rate of disease
diagnosis. By comparing that data with historical trends, anomalies can be detected, and then the discovery
work trying to find causal relationships can begin. For example, did high readmittance rates coincide with
a change in rounding policy, or what time of day patients are sent home? Regardless, the first step to
prescriptive analytics to solve issues is to uncover the anomalies.
Retail:
A store that sells eco-friendly products noticed a recent surge in revenue from one state. During discovery,
the company learned that the surge was driven by a leap in sales of a single product — a canvas tote bag.
Research revealed the causal relationship: the state’s governor had signed a law making plastic shopping
bags illegal, causing sales of reusable bags to soar.
Manufacturing:
A contract manufacturer found that a valuable type of machine started experiencing intermittent failures.
By using diagnostic analytics to examine the machines’ logs, the company discovered that routine
software updates had been installed the previous day. It identified the update as a likely cause of failure.
It verified the cause by uninstalling the software, which eliminated the problem.
Human resources:
A company’s annual hiring report showed that one department hired more people than any other
department — but there was no net increase in the department’s staff because it was losing people as fast
as it hired them. Drilling down into the data revealed that many of the positions were for a specific team,
which paid its staff less than the industry average. The company used the information to examine pay
scales, interview employees and take other measures to improve retention.
Increased Understanding: Diagnostic analytics helps organizations gain a deeper understanding of their
data and the factors that are driving outcomes, which can lead to improved performance and increased
efficiency.
Cost Savings: By identifying and addressing problems early on, diagnostic analytics can help
organizations avoid costly mistakes and improve overall performance.
There are numerous benefits to using predictive analysis. As mentioned above, this type of analysis helps
entities make predictions about outcomes when no other obvious answers are available.
Investors, financial professionals, and business leaders are able to use models to help reduce risk. For
instance, an investor and their advisor can use certain models to help craft an investment portfolio with
minimal risk to the investor by taking certain factors into consideration, such as age, capital, and goals.
There is a significant impact to cost reduction when models are used. Businesses can determine the
likelihood of success or failure of a product before it launches. Or they can set aside capital for production
improvements by using predictive techniques before the manufacturing process begins.
preferences to predict their career progression and help with career development planning in addition to
forecasting diversity or inclusion initiatives.
Shippers produce massive amounts of data. Rather than employing armies of analysts and dispatchers to
decide how to best operate, these businesses can automate and build prescriptive models to provide
recommendations.
Financial markets
Quantitative researchers and traders use statistical modeling to try to maximize returns. Financial firms
can use similar techniques to manage risk and profitability.
For example, financial firms can build algorithms to churn through historical trading data to measure risks
of trades. The resulting analytics can help them decide how to size positions, how to hedge them, or
whether to place trades at all.
Additionally, these firms can use models to reduce transaction costs by figuring out how and when to best
place their trades.
Prescriptive Analytics for Hospitals and Clinics
Prescriptive analytics can be used by hospitals and clinics to improve the outcomes for patients. It puts
health care data in context to evaluate the cost-effectiveness of various procedures and treatments and to
evaluate official clinical methods.
It can also be used to analyze which hospital patients have the highest risk of re-admission so that health
care providers can do more, via patient education and doctor follow-up to stave off constant returns to
the hospital or emergency room.
Prescriptive Analytics for Airlines
Suppose you are the chief executive officer (CEO) of an airline and you want to maximize your
company’s profits. Prescriptive analytics can help you do this by automatically adjusting ticket prices
and availability based on numerous factors, including customer demand, weather, and gasoline prices.
When the algorithm identifies that this year’s pre-Christmas ticket sales from Los Angeles to New York
are lagging last year’s, for example, it can automatically lower prices, while making sure not to drop
them too low in light of this year’s higher oil prices.
At the same time, when the algorithm evaluates the higher-than-usual demand for tickets from St. Louis
to Chicago because of icy road conditions, it can raise ticket prices automatically. The CEO doesn’t have
to stare at a computer all day looking at what’s happening with ticket sales and market conditions and
then instruct workers to log into the system and change the prices manually. Instead, a computer program
can do all of this and more—and at a faster pace, too.
Prescriptive Analytics in Banking
Banking is one of the industries that can benefit from prescriptive analytics the most. That's because
companies in this sector are always trying to find ways to better serve their customers while ensuring
they remain profitable. Applying prescriptive analytical tools can help the banking sector to:
• Create models for customer relationship management
• Improve ways to cross-sell and upsell products and services
• Recognize weaknesses that may result in losses, such as anti-money laundering (AML)
• Develop key security and regulatory initiatives like compliance reporting
Prescriptive Analytics in Marketing
Just like banking, data analytics is very critical in the marketing sector. Marketers can use prescriptive
analytics to stay ahead of consumer trends. Using past trends and past performance can give internal and
external marketing departments a competitive edge.
By employing prescriptive analytics, marketers can come up with effective campaigns that target specific
customers at specific times like, say, advertising for a certain demographic during the Superbowl.
Corporations can also identify how to engage different customers and how to effectively price
and discount their products and services.
Big data analytics refers to the process of examining large and complex datasets, also known as big data,
to uncover hidden patterns, correlations, and other useful information. The goal of big data analytics is to
transform raw data into actionable insights that can inform decision-making and drive business value.
Big data is generated by a variety of sources, including social media, e-commerce transactions, machine-
generated data from IoT devices, and more. The sheer volume and variety of big data presents significant
challenges for traditional data processing and storage methods. As a result, big data analytics relies on
new technologies and approaches, such as distributed computing frameworks, NoSQL databases, and machine learning
algorithms.
Some common use cases for big data analytics include customer behavior analysis, fraud detection,
predictive maintenance, and market trend analysis. By leveraging the insights provided by big data
analytics, organizations can gain a competitive advantage, improve operations, and make better-informed
decisions.
Overall, big data analytics is a rapidly evolving field that has the potential to revolutionize the way
organizations operate and compete.
products and services since we are living in a knowledge-intensive economy, and the companies in the
technology sector are reaping the benefits of big data analytics.
Healthcare
Healthcare is another industry that can benefit from big data analytics tools, techniques, and processes.
Healthcare personnel can diagnose the health of their patients through various tests, run them through the
computers, and look for telltale signs of anomalies, maladies, etc. It also helps in healthcare to improve
patient care and increase the efficiency of the treatment and medication processes. Some diseases can be
diagnosed before their onset so that measures can be taken in a preventive manner rather than a remedial
manner.
Manufacturing
Manufacturing is an industrial sector that is involved with developing physical goods. The life cycle of a
manufacturing process can vary from product to product. Manufacturing systems are involved within the
industry setup and across the manufacturing floor.
There are a lot of technologies that are involved in manufacturing such as the Internet of Things (IoT),
robotics, etc., but the backbone of all of these is firmly based on big data analytics. By using this,
manufacturers can improve their yield, reduce the time to market, enhance the quality, optimize the supply
chain and logistics processes, and build prototypes before the launch of products. It can help manufacturers
through all these steps.
Energy
Most oil and gas companies, which come under the energy sector, are extensive users of big data analytics.
It is deployed when it comes to discovering oil and other natural resources. Tremendous amounts of big
data go into finding out what the price of a barrel of oil will be, what the output should be, and if an oil
well will be profitable or not.
It is also deployed in finding out equipment failures, deploying predictive maintenance, and optimally
using resources in order to reduce capital expenditure.
• Banking: Data analytics can help track and monitor illegal money laundering.
feedback, and other sources, text analytics is becoming an increasingly important component of data
analytics.
Text mining, text analysis, and text analytics are often used interchangeably, with the end goal of
analyzing unstructured text to obtain insights. However, while text mining (or text analysis) provides
insights of a qualitative nature, text analytics aggregates these results and turns them into something that
can be quantified and visualized through charts and reports.
Text analysis and text analytics often work together to provide a complete understanding of all kinds of
text, like emails, social media posts, surveys, customer support tickets, and more. For example, you can
use text analysis tools to find out how people feel toward a brand on social media (sentiment analysis), or
understand the main topics in product reviews (topic detection).
But look again at the second sentence above. Did it end with the period at the end of “Dr.?”
Now check out the punctuation in that last sentence. There’s a period and a question mark right at the end
of it!
Point is, before you can run deeper text analytics functions (such as syntax parsing, #6 below), you must
be able to tell where the boundaries are in a sentence. Sometimes it’s a simple process, and other times…
not so much.
Certain communication channels <cough> Twitter <cough> are particularly complicated to break down.
We have ways of sentence breaking for social media, but we’ll leave that aside for now.
D. 4. Part of Speech Tagging
Once we’ve identified the language of a text document, tokenized it, and broken down the sentences, it’s
time to tag it.
Part of Speech tagging (or PoS tagging) is the process of determining the part of speech of every token
in a document, and then tagging it as such.
For example, we use PoS tagging to figure out whether a given token represents a proper noun or a
common noun, or if it’s a verb, an adjective, or something else entirely.
Part of Speech tagging may sound simple, but much like an onion, you’d be surprised at the layers involved
– and they just might make you cry. At Lexalytics, due to our breadth of language coverage, we’ve had to
train our systems to understand 93 unique Part of Speech tags.
E. 5. Chunking
Let’s move on to the text analytics function known as Chunking (a few people call it light parsing, but
we don’t). Chunking refers to a range of sentence-breaking systems that splinter a sentence into its
component phrases (noun phrases, verb phrases, and so on).
Before we move forward, I want to draw a quick distinction between Chunking and Part of Speech tagging
in text analytics.
• PoS tagging means assigning parts of speech to tokens
• Chunking means assigning PoS-tagged tokens to phrases
Here’s what it looks like in practice. Take the sentence:
The tall man is going to quickly walk under the ladder.
PoS tagging will identify man and ladder as nouns and walk as a verb.
Chunking will return: [the tall man]_np [is going to quickly walk]_vp [under the ladder]_pp
(np stands for “noun phrase,” vp stands for “verb phrase,” and pp stands for “prepositional phrase.”)
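A minimal sketch of tokenization and Part of Speech tagging for the example sentence is shown below, using the open-source NLTK library rather than Lexalytics' own pipeline.

```python
# A minimal sketch of tokenization and Part of Speech tagging with NLTK for the
# example sentence. NLTK's tokenizer and tagger models must be downloaded once,
# e.g. via nltk.download().
import nltk

sentence = "The tall man is going to quickly walk under the ladder."
tokens = nltk.word_tokenize(sentence)
tags = nltk.pos_tag(tokens)

print(tags)
# e.g. [('The', 'DT'), ('tall', 'JJ'), ('man', 'NN'), ('is', 'VBZ'), ...]
# where NN marks a common noun, JJ an adjective, VBZ a verb, and so on.
```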
Got that? Let’s move on.
6. Syntax Parsing
The syntax parsing sub-function is a way to determine the structure of a sentence. In truth, syntax
parsing is really just fancy talk for sentence diagramming. But it’s a critical preparatory step in sentiment
analysis and other natural language processing features.
This becomes clear in the following example:
• Apple was doing poorly until Steve Jobs …
• Because Apple was doing poorly, Steve Jobs …
• Apple was doing poorly because Steve Jobs …
In the first sentence, Apple is negative, whereas Steve Jobs is positive.
In the second, Apple is still negative, but Steve Jobs is now neutral.
In the final example, both Apple and Steve Jobs are negative.
Syntax parsing is one of the most computationally-intensive steps in text analytics. At Lexalytics, we use
special unsupervised machine learning models, based on billions of input words and complex matrix
factorization, to help us understand syntax just like a human would.
7. Sentence Chaining
The final step in preparing unstructured text for deeper analysis is sentence chaining, sometimes known
as sentence relation.
Lexalytics utilizes a technique called “lexical chaining” to connect related sentences. Lexical chaining
links individual sentences by each sentence’s strength of association to an overall topic.
Even if sentences appear many paragraphs apart in a document, the lexical chain will flow through the
document and help a machine detect overarching topics and quantify the overall “feel.”
In fact, once you’ve drawn associations between sentences, you can run complex analyses, such as
comparing and contrasting sentiment scores and quickly generating accurate summaries of long
documents.
1. Sports Trading
One of the most popular sports to bet on, particularly in Europe, is football (soccer). The top sports traders
gather data from the mainstream media and have a deep understanding of the game and its politics at a
local level.
If you live in England and you bet on English football, irrespective of the division, it’s relatively easy to
understand your market. You can successfully bet on a local second division English team because you
speak the language, read the local newspapers and may even follow some of the team members on Twitter.
But what if you’d like to do the same for a similar team in Spain and you don’t speak a word of Spanish?
A Text Analysis API capable of understanding Spanish would allow you to extract meaning from local
Twitter feeds, giving you insights into what the local fans are saying about their team. These people
understand the squad dynamics at a local level. If, for example, the star striker of Real Club Deportivo
Mallorca has an argument with his wife the night before his cup game, is he as likely to be the top scorer
on match day?
2. Financial Trading
As with sports trading, having an insight into what is happening at a local level can be very valuable to a
financial trader.
Domain-specific sentiment analysis/classification can add real value here. The same way in which fans
have their own distinct vocab based on the sport, so too do traders in particular markets.
Intent recognition and Spoken Language Understanding services for detecting user intents (e.g. “buy”,
“sell”, etc) from short utterances can help to guide traders in deciding what to trade, how much and how
quickly.
3. Voice of the Customer (VOC)
VOC applications are primarily used by companies to determine what a customer is saying about a product
or service.
Sources of such data include emails, surveys, call center logs and social media streams like blogs, tweets,
forum posts, newsfeeds, and so on. For example, a telecom company could use voice of customer text
analysis to scan Twitter for customer gripes about their broadband internet services.
This would give them an early warning when customers were annoyed with the performance of the service
and allow them to intercept the issue before it involved the customer calling to officially complain or
request contract cancellation.
4. Fraud
Whether it's workers claiming false compensation or a motorist disclosing a false home address, fraudulent activity can be discovered much more quickly when investigators can join the dots together.
In the latter case, for example, the guilty party may give an address that has many claims associated with
it or the driven vehicle may have been involved in other claims.
Having the ability to capture this information saves the insurer time and gives them greater insight into
the case.
5. Manufacturing or Warranty Analysis
In this use case, companies examine the text that comes from warranty claims, dealer technician lines,
report orders, customer relations text, and other potential information using text analytics to extract certain
entities or concepts (like the engine or a certain part).
They can then analyze this information, looking at how the entities cluster and to see if the clusters are
increasing in size and whether they are a cause for concern, for example.
6. Customer Service Routing
In this use case, companies can use text analytics to route requests to customer service representatives.
For example, say you’ve sent an email to a company while on hold to one of their reps. You might have
a question or a complaint about one of their products. The company can use text analytics for intelligent
routing of that email to the appropriate person at the company.
This could also be possible in a call center situation, provided you have sufficiently accurate speech-to-
text software.
7. Lead Generation
As was the case with the VOC application, taking timely action on a piece of Social Media information
can be used to both retain and gain new customers.
For example, if a person tweets that they are interested in a certain product or service, text analytics can discover this and feed the information to a sales representative, who can then pursue the prospect and convert them into a customer.
8. TV Advertising & Audience Analysis
TV shows or live televised events are some of the most talked-about topics on Twitter. Marketers and TV
producers can benefit from using Text Analytics in two distinct ways. If producers can get an
understanding of how their audience ‘feels’ about certain characters, settings, storylines, featured music
etc, they can make adjustments in a bid to appease their viewers and therefore increase audience size and viewer ratings.
Marketers can dig into social media streams to analyse the effectiveness of product placement and
commercials aired during the breaks.
For example, the TV character ‘Cersei’ from Game of Thrones is becoming a fashion icon amongst fans,
who regularly Tweet about her latest frock. High street retailers that want to take advantage of this trend
could release a line of ‘Queen of Westeros’ style clothing and align their commercials with shows like
Game of Thrones.
Text Analytics could also be used by TV Executives looking to sell to advertisers. For example, a TV
company could mine viewers' tweets and forum activity to profile their audience more accurately.
So instead of merely pitching the size of their audience to advertisers, they could wow them by identifying
their gender, location, age etc and their feelings towards certain products.
9. Recruitment
Text Analysis could be used in both the search and selection phases of recruitment.
The most basic application would be identifying the skills of a potential hire. In the recruitment industry,
the real value comes from identifying prospects before they become active in the job market.
For example, it would be very powerful to know if somebody tweets about disliking their job or expresses
an interest in working in a different field, larger/smaller company, different location. Once you have
identified such a prospect, you could use Text Analytics to analyse the suitability of this person based on
what others say about them.
Mining news and blog articles, forum postings and other sources could help to evaluate potential hires.
10. Review Sites
Companies like Expedia have millions of reviews on their website, from travellers all over the world.
Given the nature of the site and the fact that their users are looking for a stress-free experience, having to
sift through hundreds of reviews to find a place to stay can be a real turn off.
Text Analysis can be used here to build tools that can summarize multiple properties in 2-3 word phrases.
Instead of scrolling through a list of hotel features like heated pool, massage therapy, buffet breakfast etc,
you could simply say “Luxurious Hotel and Spa”.
Visual analytics is a branch of data analytics that emphasizes the use of visual representations to
explore, analyze, and understand complex data. It combines traditional data analysis methods with
interactive visualizations, enabling users to quickly identify patterns, relationships, and trends in large and
complex datasets.
Visual analytics leverages the human brain's ability to process visual information, allowing users
to quickly identify patterns and insights that might be difficult to discern from raw data or traditional
statistical reports. This makes it an effective tool for exploring and communicating data insights to both
technical and non-technical stakeholders.
Applications of visual analytics include exploratory data analysis, data visualization, business intelligence,
and scientific visualization. By using interactive visualizations, users can quickly test hypotheses, compare
multiple datasets, and gain a deeper understanding of the data.
Overall, visual analytics provides a powerful way to explore and understand complex data, helping
organizations to make informed decisions, improve operations, and gain a competitive advantage.
While Data Analytics is more concerned with bringing some form of structure to unorganized data, Data Visualization deals with picturing the information to reveal trends and conclusions. In Data Visualization, information is organized into charts, graphs, and other forms of visual representation. This simplifies otherwise complicated information and makes it accessible to all the involved stakeholders so they can make critical business decisions.
Data scientists can find patterns or errors without visualization. However, it is crucial to communicate data findings and identify the critical information in them, and for this, interactive data visualization tools make all the difference. A relevant and recent example is the ongoing pandemic: data scientists can look into the data and gain insights, but data visualization is what helps experts stay informed and calm amid such an abundance of data.
• Data visualization strengthens the impact of messaging for your audiences and presents the data
analysis results in the most persuasive manner. It unifies the messaging systems across all the
groups and fields within the organization.
• Visualization lets you comprehend vast amounts of data at a glance and in a better way. It helps to
understand the data better to measure its impact on the business and communicates the insight
visually to internal and external audiences.
• Decisions can't be made in a vacuum. Available data and insights support decision-makers in their analysis. Unbiased, accurate data provides access to the right kind of information, and visualization represents that information and keeps it relevant.
Data visualization has the potential to solve many business issues. All businesses must incorporate data
visualization tools and reap transformative benefits in their critical areas of operations.
Advantages of Visualization
Data Analytics and Visualization are crucial elements of the business decision-making process. They help stakeholders recognize patterns in the data and devise profitable business strategies. Below are some
of the benefits of Data Analytics and Visualization:
• Better Decision Making: By using skilled Data Analysts and the right software, companies can
identify market trends and make better business decisions to Boost Sales and Profits.
• Better Insights: Companies can get better insights into their Customer Base- using Data Analytics
and Visualization, companies can break large customer data down into smaller sets that can be
used to understand the Client Base better.
• Improving Productivity and Revenue Growth: By looking at the results from Data Analytics
and Visualization, companies get to know which areas they need to invest in and what processes
need to be automated for better efficiency.
• Noting Changes in Market Behaviour: With a real-time Data Analytics and Visualization
Dashboard, company stakeholders can quickly identify changes in market behavior and make
appropriate business decisions.
• Analyzing Different Markets: Using Data Analytics and Visualization techniques, companies
can analyze different markets and decide which ones to place attention on and which ones to avoid.
• Business Trends: This is one of the most valuable applications of Data Analytics and
Visualization. It allows businesses to examine the present and past trends to make predictions that
determine the way forward for the business.
• Data Relationships: This is one of the most obvious benefits of Data Analytics and Visualization.
It helps companies note the relationships between independent data sets and make business
decisions based on these findings.
Disadvantages of Data Visualization
• It gives estimation, not exactness: While the underlying data can describe a situation precisely, its visualization only conveys an estimate. It is certainly easy to convert robust, lengthy data into a simple pictorial format, but such a representation may sometimes lead to speculative conclusions.
• It can be biased: A visualization is prepared through a human interface, which means the data that becomes the basis of the visualization can be biased. The person preparing it may consider only the portion of the data they regard as significant, or only the data that needs focus, and exclude the rest, which may lead to biased results.
• Lack of assistance: One drawback of data visualization is that it cannot explain itself, so different groups in the audience may interpret the same visualization in different ways.
• Improper design issues: If data visualization is viewed as a form of communication, it must be genuine in conveying its purpose. If the design is not appropriate, it can lead to confusion in communication.
• A wrongly focused audience can miss the core message: However logical a visualization may be, the clarity of its explanation depends entirely on the focus of its audience.
Skewness and kurtosis are two statistical measures that describe the shape of a probability
distribution. They provide important information about the distribution of a set of data and can help to
identify patterns or anomalies in the data.
4.2.1 Skewness
Skewness is a measure of the asymmetry of a probability distribution. It tells us whether the distribution
is symmetric or skewed to one side or the other. If the data is symmetrical, the skewness is zero. If the
data is skewed to the right (positive skewness), the mean is greater than the median, and if the data is
skewed to the left (negative skewness), the mean is less than the median.
Kurtosis is a measure of the "peakedness" of a probability distribution. It tells us whether the distribution
has a flat or peaked shape. A distribution with positive kurtosis has a higher peak and fatter tails than a
normal distribution. A distribution with negative kurtosis has a flatter shape and thinner tails than a normal
distribution.
Skewness and kurtosis are important measures for identifying outliers in a data set, as well as for
understanding the overall shape of the data. They can also be used to compare different data sets and to
identify patterns or anomalies in the data.
To Interpret Skewness
The value for skewness can range from negative infinity to positive infinity.
Here’s how to interpret skewness values:
• A negative value for skewness indicates that the tail is on the left side of the distribution, which
extends towards more negative values.
• A positive value for skewness indicates that the tail is on the right side of the distribution, which
extends towards more positive values.
• A value of zero indicates that there is no skewness in the distribution at all, meaning the
distribution is perfectly symmetrical.
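As a quick illustration of these interpretations (assuming SciPy is installed; the data values below are made up), skewness can be computed and read off as follows:

# Illustrative skewness calculation using SciPy (the income values are made up).
from scipy.stats import skew

incomes = [22, 25, 27, 28, 30, 31, 33, 35, 60, 120]  # a long right tail
s = skew(incomes)
print(round(s, 2))  # positive value -> right-skewed
if s > 0:
    print("Right-skewed: Mean > Median > Mode")
elif s < 0:
    print("Left-skewed: Mode > Median > Mean")
else:
    print("Approximately symmetrical")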
Types of Skewness
Positive Skewness
A positively skewed distribution (often referred to as Right-Skewed) is a distribution type where most
values are concentrated to the left tail of the distribution whereas the right tail of the distribution is longer.
A positively skewed distribution is the complete opposite of a negatively skewed distribution.
A Positively Skewed Curve
In contrast to normally distributed data, where all measures of central tendency (mean, median, and mode) are equal to each other, in positively skewed data the observations are dispersed. The general relationship between the central tendency measures in a positively skewed distribution can be expressed using the following inequality: Mean > Median > Mode
Negative Skewness
A negatively skewed distribution (often referred to as Left-Skewed) is a kind of distribution where more
values are on the right side of the distribution graph whereas the left tail of its distribution graph is longer.
A Negatively Skewed Curve
In contrast to normally distributed data, where all measures of central tendency (mean, median, and mode) are equal to each other, in negatively skewed data the observations are dispersed. The general relationship between the central tendency measures in a negatively skewed distribution can be expressed using the following inequality: Mode > Median > Mean
Zero Skewness
When a distribution has zero skew, it is symmetrical. Its left and right sides are mirror images.
Normal distributions have zero skew, but they’re not the only distributions with zero skew. Any
symmetrical distribution, such as a uniform distribution or some bimodal (two-peak) distributions, will
also have zero skew.
The easiest way to check if a variable has a skewed distribution is to plot it in a histogram. For example,
the weights of six-week-old chicks are shown in the histogram below.
The distribution is approximately symmetrical, with the observations distributed similarly on the left and
right sides of its peak. Therefore, the distribution has approximately zero skew.
Zero skew: mean = median = mode
1. Cricket Score
Cricket score is one of the best examples of skewed distribution. Let us say that during a match, most of
the players of a particular team scored runs above 50, and only a few of them scored below 10. In such a
case, the data is generally represented with the help of a negatively skewed distribution. Similarly, a
positively skewed distribution can be used if most of the players of a particular team score badly during a
match, and only a few of them tend to perform well.
2. Exam Results
The representation of exam results forms a classic example of skewed distribution in real life. The
distribution of scores obtained by the students of a class on any particularly difficult exam is generally
positively skewed in nature. This is because due to the increased difficulty level of the exam, a majority
of students tend to score low, and only a few of them manage to score high. Similarly, the distribution of
scores obtained on an easy test is negatively skewed in nature because the reduced difficulty level of the
exam helps more students score high, and only a few of them tend to score low.
3. Average Income Distribution
Income distribution is a prominent example of positively skewed distribution. This is because a large
percentage of the total people residing in a particular state tends to fall under the category of a low-income
earning group, while only a few people fall under the high-income earning group. The mean of such data
is generally greater than the other measures of central tendency of data such as median or mode.
4. Distribution of Stock Market Returns
The representation of stock market returns is usually done with the help of negatively skewed distribution.
This is because the stock market mostly provides slightly positive returns on most days, and the negative
returns are only observed occasionally. Hence, the graphical representation of data definitely has more
points on the right side as compared to the left side.
4.2.2 Kurtosis
Kurtosis is a statistical measure used to describe a characteristic of a dataset. When normally distributed data is plotted on a graph, it generally takes the shape of a bell; this is called the bell curve.
The plotted data that are furthest from the mean of the data usually form the tails on each side of the
curve. Kurtosis indicates how much data resides in the tails.
Distributions with large kurtosis have more data in their tails than normally distributed data, so the tails appear heavier (fatter). Distributions with low kurtosis have less data in their tails, so the tails appear lighter (thinner).
For investors, high kurtosis of the return distribution curve implies that there have been many price
fluctuations in the past (positive or negative) away from the average returns for the investment. So, an
investor might experience extreme price fluctuations with an investment with high kurtosis. This
phenomenon is known as kurtosis risk.
Kurtosis is all about the tails of the distribution — not the peakedness or flatness. It is used to describe the
extreme values in one versus the other tail. It is actually the measure of outliers present in the distribution.
High kurtosis in a data set is an indicator that the data has heavy tails or outliers. If there is high kurtosis, we need to investigate why there are so many outliers; it may indicate, for example, incorrect data entry. Low kurtosis in a data set is an indicator that the data has light tails or a lack of outliers. If we get suspiciously low kurtosis (too good to be true), we should also investigate and trim the dataset of unwanted results.
Mesokurtic: This distribution has kurtosis statistic similar to that of the normal distribution. It means that
the extreme values of the distribution are similar to that of a normal distribution characteristic. When
kurtosis is equal to 3, the distribution is mesokurtic.
The kurtosis of a mesokurtic distribution is neither high nor low, rather it is considered to be a baseline
for the two other classifications.
Leptokurtic (Kurtosis > 3): The distribution is longer and its tails are fatter. The peak is higher and sharper than in a mesokurtic distribution, which means that the data are heavy-tailed or have a profusion of outliers.
Outliers stretch the horizontal axis of the histogram graph, which makes the bulk of the data appear in a
narrow (“skinny”) vertical range, thereby giving the “skinniness” of a leptokurtic distribution.
Positive excess kurtosis (kurtosis > 3) indicates that a distribution is peaked and possesses thick tails. Leptokurtic distributions have positive excess kurtosis values.
A leptokurtic distribution has a higher, narrower peak and heavier, fatter tails than a normal distribution.
An extreme positive kurtosis indicates a distribution where more of the values are located in the tails of
the distribution rather than around the mean.
Platykurtic (Kurtosis < 3): When kurtosis is less than 3 (negative excess kurtosis), the distribution is platykurtic. A platykurtic distribution is flatter (less peaked) than the normal distribution, with fewer values in its shorter (i.e. lighter and thinner) tails.
The peak is lower and broader than in a mesokurtic distribution, which means that the data are light-tailed or lack outliers. Negative excess kurtosis (kurtosis < 3) indicates that a distribution is flat and has thin tails. Platykurtic distributions have negative excess kurtosis values.
This is because its extreme values are less extreme than those of a normal distribution.
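A small illustration of the three cases, assuming SciPy and NumPy are installed: scipy.stats.kurtosis returns excess kurtosis by default (0 for a normal distribution), so 3 is added back to compare against the cut-offs described above. The three synthetic samples are only for demonstration.

# Illustrative comparison of kurtosis for three synthetic samples.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
normal_like = rng.normal(size=10_000)              # mesokurtic (kurtosis near 3)
heavy_tailed = rng.standard_t(df=3, size=10_000)   # leptokurtic (kurtosis > 3)
uniform_like = rng.uniform(-1, 1, size=10_000)     # platykurtic (kurtosis < 3)

for name, sample in [("normal", normal_like), ("t(3)", heavy_tailed), ("uniform", uniform_like)]:
    k = kurtosis(sample) + 3   # convert excess kurtosis back to "plain" kurtosis
    print(name, round(k, 2))   # roughly 3, greater than 3, and less than 3 respectively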
In data analytics, formatting data using charts is an important step in visualizing data and uncovering
insights. Different types of charts are used to represent data in different ways and highlight specific
patterns and relationships in the data. Many chart types are used for visual representation, including column, line, pie, doughnut, bar, area, stock, surface, radar, bubble, tree map, waterfall, map, and pivot charts. MS Excel is a popular choice for this kind of data formatting. Some common types of charts used in data formatting include:
• Bar Charts: These are used to compare the values of different categories. They can be used to
represent data vertically (column chart) or horizontally (bar chart).
• Line Charts: These are used to represent continuous data over time, such as stock prices, sales, or
temperatures.
• Pie Charts: These are used to represent the proportion of different categories in a whole. They are
best used for small data sets with few categories.
• Scatter Plots: These are used to represent the relationship between two variables. They are used to
visualize the distribution of data points and identify trends and patterns in the data.
• Histograms: These are used to represent the distribution of a single variable. They provide a visual
representation of the frequency of data points within specific ranges or bins.
• Area Charts: These are used to represent the changes in data over time. They are similar to line
charts, but the area under the line is filled with color to represent the magnitude of the data.
• Box Plots: These are used to represent the distribution of a single variable. They show the
minimum and maximum values, the median, and the interquartile range of the data.
• A pivot chart is already a dynamic chart, but you have to make changes in data to convert
a standard chart into a dynamic chart.
• To change the shape style of a chart element, select the element, and then on the Format tab, in the Shape Styles group, choose the style you want. To see all available shape styles, click the More button.
• To change the format of chart text, select the text, and then choose an option on the mini toolbar
that appears. Or, on the Home tab, in the Font group, select the formatting that you want to use.
• To use WordArt styles to format text, select the text, and then on the Format tab in the WordArt
Styles group, choose a WordArt style to apply. To see all available styles, click the More button
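Although this unit formats charts in MS Excel, the same common chart types can also be produced programmatically. The sketch below is a minimal illustration using Matplotlib (assumed to be installed), with made-up sales figures.

# Illustrative chart types with Matplotlib (the figures are made up).
import matplotlib.pyplot as plt

categories = ["North", "South", "East", "West"]
sales = [120, 90, 150, 80]
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [200, 220, 210, 260]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].bar(categories, sales)                             # column chart: compare categories
axes[0].set_title("Sales by region")
axes[1].plot(months, revenue, marker="o")                  # line chart: continuous data over time
axes[1].set_title("Monthly revenue")
axes[2].pie(sales, labels=categories, autopct="%1.0f%%")   # pie chart: share of the whole
axes[2].set_title("Regional share")
plt.tight_layout()
plt.show()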
With the enormous volume of data available today, storing and organizing large quantities of data for analysis is becoming increasingly necessary.
A data wrangling process, also known as a data munging process, consists of reorganizing, transforming
and mapping data from one "raw" form into another in order to make it more usable and valuable for a
variety of downstream uses including analytics.
Data wrangling can be defined as the process of cleaning, organizing, and transforming raw data into the
desired format for analysts to use for prompt decision-making. Also known as data cleaning or data
munging, data wrangling enables businesses to tackle more complex data in less time, produce more
accurate results, and make better decisions. The exact methods vary from project to project depending
upon your data and the goal you are trying to achieve. More and more organizations are increasingly
relying on data wrangling tools to make data ready for downstream analytics.
• Users – Analysts, statisticians, business users, executives, and managers use data wrangling. In
comparison, DW/ETL developers use ETL as an intermediate process linking source systems and
reporting layers.
• Data Structure – Data wrangling involves varied and complex data sets, while ETL involves structured
or semi-structured relational data sets.
• Use Case – Data wrangling is normally used for exploratory data analysis, but ETL is used for
gathering, transforming, and loading data for reporting.
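As a minimal sketch of the wrangling steps described above (cleaning, converting and reshaping raw records into an analysis-ready form), assuming pandas is installed and using invented records:

# Minimal data wrangling sketch with pandas (the records below are made up).
import pandas as pd

raw = pd.DataFrame({
    "customer": ["A", "A", "B", "B", None],
    "metric":   ["q1_sales", "q2_sales", "q1_sales", "q2_sales", "q1_sales"],
    "value":    ["100", "120", "90", None, "50"],
})

clean = (raw.dropna(subset=["customer"])                              # drop rows with no customer id
            .assign(value=lambda d: pd.to_numeric(d["value"]))        # fix the data type
            .pivot(index="customer", columns="metric", values="value"))  # reshape long -> wide
print(clean)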
8. Expand and Iterate. Once your initial dashboards begin to take shape, it’s common for new questions
to arise. Now that you have this new knowledge regarding your finance data, what new insight would
a marketing dashboard allow you to learn? At Graphable we have found this to be a highly iterative
process: as people see their data visualized, they see and understand more and better ways to gain insight from that same data. In Domo, for example, many of the customization capabilities are designed for users to apply on their own, which often decreases the load on IT significantly.
Exercise
I. Write down short answers for the following:
Learning Objectives
• To learn about Types of data analytics
• To understand data visualization
• To understand data wrangling
Learning Outcomes
At the end of the unit they will be able to:
• To apply different operations of data formatting
• To apply dash boarding fundamentals
• To solve business problems in various field with data analytics
Mode of Assessment
S.No | Title of Topic | Teaching Method (PPT/Seminar/Chalk & Board etc.) | Textbook/Reference Book | Link (if applicable; on Springboard/Coursera/NPTEL) | Tool (Quiz/Puzzle/Assignment/Seminar etc.)
1 | Introduction to data analytics and visualization | Chalk and board/PPT | William G. Zikmund, Barry J. Babin, Jon C. Carr, Atanu Adhikari, Mitch Griffin, Business Research Methods: A South Asian Perspective, 8th Edition, Cengage Learning, New Delhi, 2012. | NPTEL: https://onlinecourses.nptel.ac.in/noc23_mg54/unit?unit=26&lesson=31 and https://onlinecourses.nptel.ac.in/noc23_mg54/unit?unit=26&lesson=30 | Quiz
2 | Data formatting | Chalk and board/PPT | James R. Evans, Business Analytics: Methods, Models and Decisions, Pearson Education, 2012. | https://www.youtube.com/watch?v=1LgkR1R1ACU |
An enterprise dashboard is typically a feature of a software solution that, for the purposes of a hospital,
aggregates data from different departments or the same department from different hospitals and presents
the data on a single screen. Viewing any form of information using graphics makes data easier to
understand, analyze, and process into actionable insights. Adding the functionality of dashboard
reporting provides a comprehensive and clear view of all business insights at a glance. A dashboard can
be created by linking it to Excel databases or other software that data and reporting is produced on. It can
be automated so that as data is uploaded to the database or reporting system, the charts and graphs are
automatically updated. Dashboard reporting becomes critical in a dynamic business environment,
providing real-time insights that can be acted upon to steer organizations toward their goals.
4. Real-time information: In traditional static reports, data had to be inserted once and updated manually. With modern reporting tools, there is no need to do so. Real-time dashboards enable real-time data, and that is the beauty and power of BI at its core.
5. Productivity: While static reports have been a useful tool for increasing productivity, in today's modern
economies this is simply not enough. The amount of data that is collected, and needs to be analyzed
is continuously growing, and numerous static or paper sheets or millions of rows and columns cannot
help as much as they used to. The rise of self-service BI tools has enabled users to tinker with the data
on their own, and use modern technologies that will increase their productivity levels.
5.1.3 Dashboards for the Decision Making Process and Company Performance
The better the decisions you make for your company, the more it is bound to grow and become profitable.
Dashboards provide decisive individuals with the best tools to support their jobs, especially through the
following tasks:
• Identifying Negative Trends: In addition to activating and stimulating positive trends, effective management should detect and reduce negative trends; localizing, analyzing and correcting these trends is essential for productivity and company morale.
• Inventing Strategies According to Goals: Dashboards support the decision-making procedure by
providing timely and accurate information. By basing decisions on this information, better
strategies will be developed and an improvement in the company’s performance will be noted.
• Improving Analysis through Visualization Abilities: Pure data will not necessarily identify and
trace most irregularities. Luckily, what may not be visible in spreadsheets may be prominently
displayed in graphic visualizations.
• Measuring Company’s Parameters: Measuring a company’s performance or levels of efficiency
can be difficult, especially since the outside may not reflect what is going on within four walls. As
dashboards support deep analysis, executives and managers alike will be able to detect
inefficiencies and take action against them.
Dashboard visualization tools allow you to see how you’re progressing towards your goals via dashboards,
scorecards, charts, and graphs.
The most common way businesses visualize data is via dashboards. In fact, 90% of our survey respondents
have been actively creating and using dashboards for some time. The rest have just started using
dashboards.
And more than half of them are using dashboards in marketing, sales, and web analytics.
Tableau
One of the most widely used data visualization tools, Tableau, offers interactive visualization solutions to
more than 57,000 companies.
Providing integration for advanced databases, including Teradata, SAP, MySQL, Amazon AWS,
and Hadoop, Tableau efficiently creates visualizations and graphics from large, constantly-evolving
datasets used for artificial intelligence, machine learning, and Big Data applications.
Dundas BI
Dundas BI offers highly-customizable data visualizations with interactive scorecards, maps, gauges, and
charts, optimizing the creation of ad-hoc, multi-page reports. By providing users full control over visual
elements, Dundas BI simplifies the complex operation of cleansing, inspecting, transforming, and
modeling big datasets.
The Pros of Dundas BI:
• Exceptional flexibility
• A large variety of data sources and charts
• Wide range of in-built features for extracting, displaying, and modifying data
The Cons of Dundas BI:
• No option for predictive analytics
• 3D charts not supported
JupyteR
A web-based application, JupyteR, is one of the top-rated data visualization tools that enable users to
create and share documents containing visualizations, equations, narrative text, and live code. JupyteR is
ideal for data cleansing and transformation, statistical modeling, numerical simulation, interactive
computing, and machine learning.
1. The Pros of JupyteR:
• Rapid prototyping
• Visually appealing results
• Facilitates easy sharing of data insights
2. The Cons of JupyteR:
• Tough to collaborate
• At times code reviewing becomes complicated
Zoho Reports
Zoho Reports, also known as Zoho Analytics, is a comprehensive data visualization tool that
integrates Business Intelligence and online reporting services, which allow quick creation and sharing of
extensive reports in minutes. The high-grade visualization tool also supports the import of Big Data from
major databases and applications.
The Pros of Zoho Reports:
• Effortless report creation and modification
• Includes useful functionalities such as email scheduling and report sharing
• Plenty of room for data
• Prompt customer support.
The Cons of Zoho Reports:
• User training needs to be improved
• The dashboard becomes confusing when there are large volumes of data
Google Charts
One of the major players in the data visualization market space, Google Charts, coded with SVG
and HTML5, is famed for its capability to produce graphical and pictorial data visualizations. Google
Charts offers zoom functionality, and it provides users with unmatched cross-platform compatibility with
iOS, Android, and even the earlier versions of the Internet Explorer browser.
The Pros of Google Charts:
• User-friendly platform
• Easy to integrate data
• Visually attractive data graphs
• Compatibility with Google products.
The Cons of Google Charts:
• The export feature needs fine-tuning
• Inadequate demos on tools
• Lacks customization abilities
• Network connectivity required for visualization
Visual.ly
Visual.ly is one of the best-known data visualization tools on the market, renowned for its impressive distribution network that illustrates project outcomes. Employing a dedicated creative team for data visualization services, Visual.ly streamlines the process of importing and outsourcing data, even to third parties.
The Pros of Visual.ly:
• Top-class output quality
• Easy to produce superb graphics
• Several link opportunities
The Cons of Visual.ly:
• Few embedding options
• Showcases one point, not multiple points
• Limited scope
RAW
RAW, better known as RawGraphs, works with delimited data such as TSV or CSV files. It serves as
a link between data visualization and spreadsheets. Featuring a range of non-conventional and
conventional layouts, RawGraphs provides robust data security even though it is a web-based application.
The Pros of RAW:
• Simple interface
• Super-fast visual feedback
• Offers a high-level platform for arranging, keeping, and reading user data
• Easy-to-use mapping feature
• Superb readability for visual graphics
• Excellent scalability option
The Cons of RAW:
• Non-availability of log scales
• Not user intuitive
IBM Watson
Named after IBM founder Thomas J. Watson, this high-caliber data visualization tool uses analytical
components and artificial intelligence to detect insights and patterns from both unstructured and structured
data. Leveraging NLP (Natural Language Processing), IBM Watson's intelligent, self-service visualization
tool guides users through the entire insight discovery operation.
The Pros of IBM Watson:
• NLP capabilities
• Offers accessibility from multiple devices
• Predictive analytics
• Self-service dashboards
The Cons of IBM Watson:
• Customer support needs improvement
• High-cost maintenance
Sisense
Regarded as one of the most agile data visualization tools, Sisense gives users access to instant data
analytics anywhere, at any time. The best-in-class visualization tool can identify key data patterns and
summarize statistics to help decision-makers make data-driven decisions.
The Pros of Sisense:
• Ideal for mission-critical projects involving massive datasets
• Reliable interface
• High-class customer support
• Quick upgrades
• Flexibility of seamless customization
The Cons of Sisense:
• Developing and maintaining analytic cubes can be challenging
• Does not support time formats
• Limited visualization versions
Plotly
An open-source data visualization tool, Plotly offers full integration with analytics-centric programming
languages like Matlab, Python, and R, which enables complex visualizations. Widely used for
collaborative work, disseminating, modifying, creating, and sharing interactive, graphical data, Plotly
supports both on-premise installation and cloud deployment.
The Pros of Plotly:
• Allows online editing of charts
• High-quality image export
• Highly interactive interface
• Server hosting facilitates easy sharing
The Cons of Plotly:
• Speed is a concern at times
• Free version has multiple limitations
• Various screen-flashings create confusion and distraction
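For instance, a minimal Plotly Express sketch in Python (the plotly package is assumed to be installed, and the example uses Plotly's bundled gapminder sample dataset) produces an interactive chart that can be explored and shared:

# Minimal interactive chart with Plotly Express (uses Plotly's bundled sample data).
import plotly.express as px

df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp",
                 size="pop", color="continent", log_x=True,
                 title="Life expectancy vs GDP per capita (2007)")
fig.show()   # opens an interactive figure in the browser or notebook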
Data Wrapper
Data Wrapper is one of the very few data visualization tools on the market that is available for free. It is
popular among media enterprises because of its inherent ability to quickly create charts and present
graphical statistics on Big Data. Featuring a simple and intuitive interface, Data Wrapper allows users to
create maps and charts that they can easily embed into reports.
The Pros of Data Wrapper:
• Does not require installation for chart creation
• Ideal for beginners
• Free to use
The Cons of Data Wrapper:
• Building complex charts like Sankey is a problem
• Security is an issue as it is an open-source tool
Highcharts
Deployed by seventy-two of the world's top hundred companies, the Highcharts tool is perfect for
visualization of streaming big data analytics. Running on Javascript API and offering integration with
jQuery, Highcharts provides support for cross-browser functionalities that facilitates easy access to
interactive visualizations.
The Pros of Highcharts:
• State-of-the-art customization options
• Visually appealing graphics
• Multiple chart layouts
• Simple and flexible
The Cons of Highcharts:
• Not ideal for small organizations
Fusioncharts
Fusioncharts is one of the most popular and widely-adopted data visualization tools. The Javascript-based,
top-of-the-line visualization tool offers ninety different chart building packages that integrate with major
frameworks and platforms, offering users significant flexibility.
The Pros of Fusioncharts:
• Customized for specific implementations
• Outstanding helpdesk support
• Active community
The Cons of Fusioncharts:
• An expensive data visualization solution
• Complex set-up
• Old-fashioned interface
Infogram
Infogram is one of the most popular software programmes on the internet today. It is a web-based
tool for creating infographics and visualising data. It is primarily intended to assist all users in quickly and
simply creating interesting and interactive reports, infographics, and dashboards with data-driven
information and captivating images. This particular solution provides customers with over 550 maps and
35 charts, 20 ready-made design templates, numerous pictures and icons, a drag-and-drop editor, and other
features. Even someone who is new to the sector may quickly learn how to utilise this programme.
It has a simple editor that allows users to modify the colours and styles of their visualisations, add
corporate logos, and adjust the display choices. In addition, the users will be granted the right to use over
a million icons, GIFs, and photos in their visualisations. Users may add connections to generate traffic to
their website using interactive charts, which allow audiences to examine data using Infogram tabs. Reports
that are interactive and shareable may also be developed and incorporated, with metrics to measure
audience interaction.
Sigma.js
Sigma is a JavaScript library for drawing graphs. It enables developers to incorporate network exploration
into rich online applications and makes it simple to publish networks on websites.
• The Sigma.js layout is fantastic.
Linear Regression and Logistic Regression are two well-known machine learning algorithms that come under the supervised learning technique. Since both algorithms are supervised in nature, they use labeled datasets to make predictions. The main difference between them is how they are used: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems. A description of both algorithms is given below, along with a comparison table.
Linear Regression:
• Linear Regression is one of the simplest machine learning algorithms; it comes under the supervised learning technique and is used for solving regression problems.
• It is used for predicting the continuous dependent variable with the help of independent variables.
• The goal of the Linear regression is to find the best fit line that can accurately predict the output for
the continuous dependent variable.
• If a single independent variable is used for prediction, it is called Simple Linear Regression, and if there is more than one independent variable, the regression is called Multiple Linear Regression.
• By finding the best fit line, the algorithm establishes the relationship between the dependent variable and the independent variables, and this relationship should be linear in nature.
• The output for Linear Regression should only be continuous values such as price, age, salary, etc. The relationship between the dependent variable and the independent variable can be shown graphically, with the dependent variable (e.g. salary) on the Y-axis and the independent variable (e.g. experience) on the X-axis. The regression line can be written as:
y = a0 + a1x + ε
where a0 and a1 are the coefficients and ε is the error term.
Linear regression determines the straight line, called the least-squares regression line or LSRL,
that best expresses observations in a bivariate analysis of data set. Suppose Y is a dependent variable, and
X is an independent variable, then the population regression line is given by;
Y = B0+B1X
Where
B0 is a constant
B1 is the regression coefficient
If a random sample of observations is given, then the regression line is expressed by;
ŷ = b0 + b1x
where b0 is a constant, b1 is the regression coefficient, x is the independent variable, and ŷ is the predicted
value of the dependent variable.
• Normality of error terms: The error terms should be normally distributed. This can be checked using a Q-Q plot; if the plot shows an approximately straight line without large deviations, the errors can be taken as normally distributed.
• No autocorrelation: The linear regression model assumes no autocorrelation in the error terms. If there is any correlation in the error terms, it will drastically reduce the accuracy of the model. Autocorrelation usually occurs when there is a dependency between residual errors.
1. Businesses often use linear regression to understand the relationship between advertising spending and
revenue.
For example, they might fit a simple linear regression model using advertising spending as the predictor
variable and revenue as the response variable. The regression model would take the following form:
Revenue = β0 + β1(ad spending)
The coefficient β0 would represent total expected revenue when ad spending is zero.
The coefficient β1 would represent the average change in total revenue when ad spending is increased by
one unit (e.g. one dollar).
If β1 is negative, it would mean that more ad spending is associated with less revenue.
If β1 is close to zero, it would mean that ad spending has little effect on revenue.
And if β1 is positive, it would mean more ad spending is associated with more revenue.
Depending on the value of β1, a company may decide to either decrease or increase their ad spending.
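A minimal sketch of fitting such a model, assuming scikit-learn is installed and using hypothetical ad-spending and revenue figures (in arbitrary units):

# Hypothetical ad-spending example fitted with scikit-learn (numbers are made up).
import numpy as np
from sklearn.linear_model import LinearRegression

ad_spending = np.array([[10], [20], [30], [40], [50]])   # predictor
revenue = np.array([120, 180, 260, 310, 400])            # response

model = LinearRegression().fit(ad_spending, revenue)
print("beta0 (intercept):", round(model.intercept_, 2))
print("beta1 (slope):", round(model.coef_[0], 2))
print("predicted revenue at spend = 60:", round(model.predict([[60]])[0], 2))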
2. Medical researchers often use linear regression to understand the relationship between drug dosage and
blood pressure of patients.
For example, researchers might administer various dosages of a certain drug to patients and observe how
their blood pressure responds. They might fit a simple linear regression model using dosage as the
predictor variable and blood pressure as the response variable. The regression model would take the
following form:
Blood pressure = β0 + β1(dosage)
The coefficient β0 would represent the expected blood pressure when dosage is zero.
The coefficient β1 would represent the average change in blood pressure when dosage is increased by one
unit.
If β1 is negative, it would mean that an increase in dosage is associated with a decrease in blood pressure.
If β1 is close to zero, it would mean that an increase in dosage is associated with no change in blood
pressure.
If β1 is positive, it would mean that an increase in dosage is associated with an increase in blood pressure.
Depending on the value of β1, researchers may decide to change the dosage given to a patient.
3. Agricultural scientists often use linear regression to measure the effect of fertilizer and water on crop
yields.
For example, scientists might use different amounts of fertilizer and water on different fields and see how
it affects crop yield. They might fit a multiple linear regression model using fertilizer and water as the
predictor variables and crop yield as the response variable. The regression model would take the following
form:
crop yield = β0 + β1(amount of fertilizer) + β2(amount of water)
The coefficient β0 would represent the expected crop yield with no fertilizer or water.
The coefficient β1 would represent the average change in crop yield when fertilizer is increased by one
unit, assuming the amount of water remains unchanged.
The coefficient β2 would represent the average change in crop yield when water is increased by one
unit, assuming the amount of fertilizer remains unchanged.
Depending on the values of β1 and β2, the scientists may change the amount of fertilizer and water used to
maximize the crop yield.
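A comparable sketch for multiple linear regression with two predictors, again with scikit-learn assumed and entirely hypothetical fertilizer, water and yield values:

# Hypothetical crop-yield example with two predictors (all numbers are made up).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 10], [2, 10], [2, 20], [3, 20], [4, 30], [5, 30]])  # [fertilizer, water]
y = np.array([20, 24, 28, 32, 40, 44])                                # crop yield

model = LinearRegression().fit(X, y)
b0, (b1, b2) = model.intercept_, model.coef_
print(f"yield = {b0:.2f} + {b1:.2f}*fertilizer + {b2:.2f}*water")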
4. Data scientists for professional sports teams often use linear regression to measure the effect that
different training regimens have on player performance.
For example, data scientists in the NBA might analyze how different amounts of weekly yoga sessions
and weightlifting sessions affect the number of points a player scores. They might fit a multiple linear
regression model using yoga sessions and weightlifting sessions as the predictor variables and total points
scored as the response variable. The regression model would take the following form:
points scored = β0 + β1(yoga sessions) + β2(weightlifting sessions)
The coefficient β0 would represent the expected points scored for a player who participates in zero yoga
sessions and zero weightlifting sessions.
The coefficient β1 would represent the average change in points scored when weekly yoga sessions is
increased by one, assuming the number of weekly weightlifting sessions remains unchanged.
The coefficient β2 would represent the average change in points scored when weekly weightlifting
sessions is increased by one, assuming the number of weekly yoga sessions remains unchanged.
Depending on the values of β1 and β2, the data scientists may recommend that a player participates in more
or less weekly yoga and weightlifting sessions in order to maximize their points scored.
1. The dependent variable is binary: Logistic regression assumes that the dependent variable takes only two possible outcomes. This assumption can be checked by simply counting the unique outcomes of the dependent variable; if more than two possible outcomes surface, the assumption is violated.
2. Little or no multicollinearity between the predictor/explanatory variables
This assumption implies that the predictor variables (or the independent variables) should be
independent of each other. Multicollinearity relates to two or more highly correlated independent
variables. Such variables do not provide unique information in the regression model and lead to wrongful
interpretation.
The assumption can be verified with the variance inflation factor (VIF), which determines the correlation
strength between the independent variables in a regression model.
3. Linear relationship of independent variables to log odds
Log odds refer to the ways of expressing probabilities. Log odds are different from probabilities. Odds
refer to the ratio of success to failure, while probability refers to the ratio of success to everything that can
occur.
For example, suppose you play twelve tennis games with your friend and win five of them. Here, the odds of you winning are 5 to 7 (or 5/7), while the probability of you winning is 5 out of 12 (or 5/12, since the total number of games played is 12).
4. Prefers large sample size
Logistic regression analysis yields reliable, robust, and valid results when a larger sample size of the
dataset is considered.
This assumption can be validated by requiring a minimum of 10 cases of the least frequent outcome for each estimator variable. Consider a case where you have three predictor variables and the probability of the least frequent outcome is 0.30. Here, the minimum sample size would be (10 × 3) / 0.30 = 100.
Moreover, if the output of the sigmoid function (estimated probability) is greater than a predefined
threshold on the graph, the model predicts that the instance belongs to that class. If the estimated
probability is less than the predefined threshold, the model predicts that the instance does not belong to
the class.
Assume that if the output of the sigmoid function is above 0.5, the output is considered as 1; on the other hand, if the output is less than 0.5, the output is classified as 0. Also, as the input moves toward the negative end of the axis, the predicted value of y approaches 0, and vice versa. In other words, if the output of the sigmoid function is 0.65, it implies that there is a 65% chance of the event occurring.
The sigmoid function is referred to as the activation function for logistic regression and is defined as:
f(value) = 1 / (1 + e^(-value))
where,
• e = base of natural logarithms
• value = numerical value one wishes to transform
The following equation represents logistic regression, combining the linear model with the sigmoid function:
p = 1 / (1 + e^(-(b0 + b1x))), or equivalently log(p / (1 - p)) = b0 + b1x
where p is the predicted probability of the outcome, b0 is the intercept and b1 is the regression coefficient.
2. Possibility of enrolling into a university: Application aggregators can determine the probability of a
student getting accepted to a particular university or a degree course in a college by studying the
relationship between the estimator variables, such as GRE, GMAT, or TOEFL scores.
3. Identifying spam emails: Email inboxes are filtered to determine if the email communication is
promotional/spam by understanding the predictor variables and applying a logistic regression algorithm
to check its authenticity.
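As a hedged illustration of such a classifier (the features and labels below are invented and far simpler than a real spam filter; scikit-learn is assumed to be installed):

# Invented spam-classification example using logistic regression (scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per email: [number of links, count of the word "free"] (made up)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 3], [7, 2], [6, 4]])
y = np.array([0, 0, 0, 1, 1, 1])   # 0 = legitimate, 1 = spam

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba([[4, 2]])[0, 1]     # sigmoid output: estimated probability of spam
print(round(proba, 2), clf.predict([[4, 2]])[0])  # probability and 0/1 class (0.5 threshold)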
Linear Regression | Logistic Regression
Linear regression is used to predict the continuous dependent variable using a given set of independent variables. | Logistic regression is used to predict the categorical dependent variable using a given set of independent variables.
Linear regression is used for solving regression problems. | Logistic regression is used for solving classification problems.
In linear regression, we predict the value of continuous variables. | In logistic regression, we predict the values of categorical variables.
In linear regression, we find the best fit line, by which we can easily predict the output. | In logistic regression, we find the S-curve, by which we can classify the samples.
The least squares estimation method is used for estimation of accuracy. | The maximum likelihood estimation method is used for estimation of accuracy.
The output of linear regression must be a continuous value, such as price, age, etc. | The output of logistic regression must be a categorical value, such as 0 or 1, Yes or No, etc.
In linear regression, the relationship between the dependent variable and the independent variables must be linear. | In logistic regression, a linear relationship between the dependent and independent variables is not required.
In linear regression, there may be collinearity between the independent variables. | In logistic regression, there should not be collinearity between the independent variables.
Time series forecasting is a technique for the prediction of events through a sequence of time. It predicts
future events by analyzing the trends of the past, on the assumption that future trends will hold similar to
historical trends. It is used across many fields of study in various applications including:
• Astronomy
• Business planning
• Control engineering
• Earthquake prediction
• Econometrics
• Mathematical finance
• Pattern recognition
• Resources allocation
• Signal processing
• Statistics
• Weather forecasting
Time series forecasting starts with a historical time series. Analysts examine the historical data and check
for patterns of time decomposition, such as trends, seasonal patterns, cyclic patterns and regularity. Many
areas within organizations including marketing, finance and sales use some form of time series forecasting
to evaluate probable technical costs and consumer demand. Models for time series data can have many
forms and represent different stochastic processes.
Among the factors that make time series forecasting challenging are:
• Time dependence of a time series - The basic assumption of a linear regression model that the
observations are independent doesn’t hold in this case. Due to the temporal dependencies in time series
data, time series forecasting cannot rely on usual validation techniques. To avoid biased evaluations,
training data sets should contain observations that occurred prior to the ones in validation sets. Once
we have chosen the best model, we can fit it on the entire training set and evaluate its performance on
a separate test set subsequent in time.
• Seasonality in a time series - Along with an increasing or decreasing trend, most time series have
some form of seasonal trends, i.e. variations specific to a particular time frame. Time series models
can outperform others on a particular dataset — one model which performs best on one type of dataset
may not perform the same for all others.
Data classification
Further, time series data can be classified into two main categories:
• Stock time series data means measuring attributes at a certain point in time, like a static snapshot
of the information as it was.
• Flow time series data means measuring the activity of the attributes over a certain period, which
is generally part of the total whole and makes up a portion of the results.
Data variations
In time series data, variations can occur sporadically throughout the data:
• Functional analysis can pick out the patterns and relationships within the data to identify notable
events.
• Trend analysis means determining consistent movement in a certain direction. There are two types
of trends: deterministic, where we can find the underlying cause, and stochastic, which is random
and unexplainable.
• Seasonal variation describes events that occur at specific and regular intervals during the course
of a year. Serial dependence occurs when data points close together in time tend to be related.
Time series analysis and forecasting models must define the types of data relevant to answering the
business question. Once analysts have chosen the relevant data they want to analyze, they choose what
types of analysis and techniques are the best fit.
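As one illustrative sketch of these steps, the code below decomposes a synthetic monthly series into trend and seasonal components and produces a naive seasonal forecast; it assumes pandas and statsmodels are installed, and the series itself is made up.

# Illustrative trend/seasonality inspection and naive forecast (the series is synthetic).
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2020-01-01", periods=36, freq="MS")
sales = pd.Series(100 + np.arange(36) * 2 + 10 * np.sin(np.arange(36) * 2 * np.pi / 12),
                  index=idx)

result = seasonal_decompose(sales, model="additive", period=12)
print(result.trend.dropna().tail(3))   # estimated trend component
print(result.seasonal[:12])            # repeating seasonal pattern

# Naive seasonal forecast: repeat the last observed year
forecast = sales[-12:].values
print(forecast[:3])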
Big Data is a massive amount of data sets that cannot be stored, processed, or analyzed using
traditional tools. The history of Big Data analytics can be traced back to the early days of computing,
when organizations first began using computers to store and analyze large amounts of data. However, it
was not until the late 1990s and early 2000s that Big Data analytics really began to take off, as
organizations increasingly turned to computers to help them make sense of the rapidly growing volumes
of data being generated by their businesses.
Today, Big Data analytics has become an essential tool for organizations of all sizes across a wide
range of industries. By harnessing the power of Big Data, organizations are able to gain insights into their
customers, their businesses, and the world around them that were simply not possible before. As the field of Big Data analytics continues to evolve, we can expect to see even more amazing and transformative applications of this technology in the years to come.
There are millions of data sources that generate data at a very rapid rate. These data sources are
present across the world. Some of the largest sources of data are social media platforms and networks.
Let’s use Facebook as an example—it generates more than 500 terabytes of data every day. This data
includes pictures, videos, messages, and more.
Data also exists in different formats, like structured data, semi-structured data, and unstructured data. For
example, in a regular Excel sheet, data is classified as structured data with a definite format. In contrast,
emails fall under semi-structured, and your pictures and videos fall under unstructured data. All this data
combined makes up Big Data.
Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown
correlations, market trends, and customer preferences. Big Data analytics provides various advantages: it can be used for better decision making and for preventing fraudulent activities, among other things.
2. Process Data
Once data is collected and stored, it must be organized properly to get accurate results on analytical
queries, especially when it’s large and unstructured. Available data is growing exponentially, making data
processing a challenge for organizations. One processing option is batch processing, which looks at large
data blocks over time. Batch processing is useful when there is a longer turnaround time between
collecting and analyzing data. Stream processing looks at small batches of data at once, shortening the
delay time between collection and analysis for quicker decision-making. Stream processing is more
complex and often more expensive.
3. Clean Data
Data big or small requires scrubbing to improve data quality and get stronger results; all data must be
formatted correctly, and any duplicative or irrelevant data must be eliminated or accounted for. Dirty data
can obscure and mislead, creating flawed insights.
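A small pandas sketch of this scrubbing step (the order records below are invented): duplicates are removed and inconsistent formats are corrected before analysis.

# Illustrative data-scrubbing step with pandas (records are invented).
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "city":     ["Chennai ", "Chennai ", "delhi", "Mumbai"],
    "amount":   ["1,200", "1,200", "850", "990"],
})

cleaned = (orders.drop_duplicates(subset="order_id")   # remove duplicate rows
                 .assign(city=lambda d: d["city"].str.strip().str.title(),            # consistent text format
                         amount=lambda d: d["amount"].str.replace(",", "").astype(int)))  # numeric type
print(cleaned)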
4. Analyze Data
Getting big data into a usable state takes time. Once it’s ready, advanced analytics processes can turn big
data into big insights. Some of these big data analysis methods include:
• Data mining sorts through large datasets to identify patterns and relationships by identifying
anomalies and creating data clusters.
• Predictive analytics uses an organization’s historical data to make predictions about the future,
identifying upcoming risks and opportunities.
• Deep learning imitates human learning patterns by using artificial intelligence and machine
learning to layer algorithms and find patterns in the most complex and abstract data.
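The sketch below illustrates the data mining bullet above in a hedged way: clustering a handful of hypothetical customer records with scikit-learn to surface groups and an unusual record worth investigating; all numbers are invented.

from sklearn.cluster import KMeans
import numpy as np

# Hypothetical customer records: [annual spend, number of orders]
customers = np.array([
    [500, 5], [520, 6], [480, 4],        # low spenders
    [2500, 25], [2600, 27], [2450, 24],  # high spenders
    [9000, 2],                           # unusual pattern worth a closer look
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("Cluster labels:", kmeans.labels_)
print("Cluster centres:\n", kmeans.cluster_centers_)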
2. Diagnostic Analytics
Diagnostic analytics is done to understand what caused a problem in the first place. Techniques like
drill-down, data mining, and data recovery are all examples. Organizations use diagnostic analytics
because it provides in-depth insight into a particular problem.
Use Case: An e-commerce company’s report shows that their sales have gone down, although customers
are adding products to their carts. This can be due to various reasons like the form didn’t load correctly,
the shipping fee is too high, or there are not enough payment options available. This is where you can use
diagnostic analytics to find the reason.
3. Predictive Analytics
This type of analytics looks at historical and present data to make predictions about the future. Predictive
analytics uses data mining, AI, and machine learning to analyze current data and forecast what is likely to
happen next. It works on predicting customer trends, market trends, and so on.
Use Case: PayPal determines what kind of precautions they have to take to protect their clients against
fraudulent transactions. Using predictive analytics, the company uses all the historical payment data and
user behavior data and builds an algorithm that predicts fraudulent activities.
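As a hedged illustration of this idea (not PayPal's actual system), the sketch below fits a simple logistic regression on a few invented transaction features and scores a new payment for fraud risk.

from sklearn.linear_model import LogisticRegression
import numpy as np

# Hypothetical historical transactions: [amount, transactions in last hour, new device flag]
X = np.array([
    [25, 1, 0], [40, 2, 0], [15, 1, 0], [60, 1, 0],        # legitimate
    [900, 8, 1], [1200, 10, 1], [750, 7, 1], [980, 9, 1],  # fraudulent
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = fraud, 0 = legitimate

model = LogisticRegression(max_iter=1000).fit(X, y)

# Score a new incoming transaction
new_txn = np.array([[850, 6, 1]])
print("Estimated probability of fraud:", model.predict_proba(new_txn)[0, 1])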
sooner, enabling them to initiate conversations to promote retention. Marketing teams can leverage
predictive data analysis for cross-sell strategies, and this commonly manifests itself through a
recommendation engine on a brand’s website.
• Supply chain: Businesses commonly use predictive analytics to manage product inventory and
set pricing strategies. This type of predictive analysis helps companies meet customer demand
without overstocking warehouses. It also enables companies to assess the cost and return on their
products over time. If one part of a given product becomes more expensive to import, companies
can project the long-term impact on revenue if they do or do not pass on additional costs to their
customer base. For a deeper look at a case study, you can read more about how FleetPride used
this type of data analytics to inform their decision making on their inventory of parts for excavators
and tractor trailers. Past shipping orders enabled them to plan more precisely to set appropriate
supply thresholds based on demand.
• Forecasting: Forecasting is essential in manufacturing because it ensures the optimal utilization
of resources in a supply chain. Critical spokes of the supply chain wheel, whether it is inventory
management or the shop floor, require accurate forecasts for functioning. Predictive modeling is
often used to clean and optimize the quality of data used for such forecasts. Modeling ensures
that more data can be ingested by the system, including from customer-facing operations, to
ensure a more accurate forecast.
• Credit: Credit scoring makes extensive use of predictive analytics. When a consumer or business
applies for credit, data on the applicant's credit history and the credit record of borrowers with
similar characteristics are used to predict the risk that the applicant might fail to perform on any
credit extended.
• Underwriting: Data and predictive analytics play an important role in underwriting. Insurance
companies examine policy applicants to determine the likelihood of having to pay out for a
future claim based on the current risk pool of similar policyholders, as well as past events that
have resulted in payouts. Predictive models that consider characteristics in comparison to data
about past policyholders and claims are routinely used by actuaries.
• Fraud Detection: Financial services can use predictive analytics to examine transactions, trends,
and patterns. If any of this activity appears irregular, an institution can investigate it for fraudulent
activity. This may be done by analyzing activity between bank accounts or analyzing when certain
transactions occur.
with the potential outcome. Predictive analytics can provide insight to inform the decision-making
process and offer a competitive advantage.
Use Case: Prescriptive analytics can be used to maximize an airline’s profit. This type of analytics is used
to build an algorithm that will automatically adjust the flight fares based on numerous factors, including
customer demand, weather, destination, holiday seasons, and oil prices.
• Sales Analytics: Sales analytics platforms help you manage customers and leads, evaluate sales across
geographies, and monitor the performance of your sales team. This information can reveal important
trends or signals that help leaders develop more effective sales strategies.
• Financial Analytics: Financial analytics go beyond financial statements to draw out revenue and
expense trends and details in your financial results that would be impossible to find without a large
team of financial analysts.
• Performance Analytics: Performance analytics look at sales, production or other data to find
bottlenecks, sources of expenses and improvement opportunities.
• Among any of these categories of tools, you may find more specialized software and varying features.
Basic tools may simply summarize and help you understand data, while more advanced tools leverage
technology like machine learning and artificial intelligence (AI) to analyze large volumes of data and
make predictions based on all of that information.
• Data models: A data model structure retrieves data and standardizes how data points relate to each
other for analysis. Models can be simple (using data from a single column of a spreadsheet, for
example) or complex, involving several triggers and parameters across multiple dimensions.
• Processing applications: Cloud analytics uses special applications to process huge volumes of
information stored in a data warehouse and reduce time to insight (more on this below).
• Computing power: Cloud analytics requires sufficient computing power to intake, clean,
structure and analyze large volumes of data.
• Analytic models: These are mathematical models that can be used to analyze complex data sets
and predict outcomes.
• Data sharing and storage: Cloud analytics solutions offer data warehousing as a service so that
the business can scale quickly and easily.
In addition to these features, AI is becoming a more integral part of cloud analytics. Machine
learning algorithms, in particular, enable cloud analytics systems to learn on their own and more
accurately predict future outcomes.
Many SaaS analytics vendors require you to move your data to their cloud. But moving your data can be
expensive – and by distancing your data from your users, you can introduce latency and performance
issues, too. Search for a solution that lets you keep your data wherever it’s most productive. You’ll want
to avoid getting locked into a single vendor, where your options will dwindle.
4. Single point of entry.
As with any SaaS solution, adoption is key. Make it easy for users by opting for a platform with a single
point of entry for login. Administrators and IT leaders also need a simple way to manage data analytics
across different clouds, regions, and users. Make sure they’ll be able to manage the entire deployment
from one management console – and easily change the deployment model at any time.
5. Self-service and readily available data, at scale, for all.
You shouldn’t have to be a coding pro to get in-depth insights about your data. The best cloud-based
analytics solutions give business users easy access to data through a catalog, a simple user interface where
they can “shop for” and select datasets, easily viewing lineage. The solutions also provide intuitive ways
to get insights, allowing users to explore and analyze in all possible contexts, without limitations.
6. Performance and scalability.
Most analytics solutions struggle with performance. That’s because they’re query-based, restricting users
to predetermined paths in the data and requiring them to reformulate queries whenever they want to pivot.
Look for a high-performing solution that can calculate analytics quickly even when used simultaneously
by a great number of users. And make sure that scaling capacity in any direction will be straightforward
and fast.
7. Augmented analytics.
AI capabilities are becoming increasingly integral to analytics, and different platforms employ them
differently. Instead of black-box AI that operates independently, look for a solution that uses AI to
augment the user experience with things like insight suggestions and natural language interactions. That
gives you the best of both worlds: machine intelligence that augments human intuition and understanding.
8. Orchestration across your cloud ecosystem.
Automation is another tool that’s vastly accelerating analytics delivery and augmenting insight discovery.
AI can speed time-to-insight by automating a wide variety of tasks for the user, including combining data
sets, preparing and transforming data, and creating visualizations.
9. Fully interactive mobile analytics.
From laptops to smartphones, the best cloud analytics solutions provide users with a consistent,
comprehensive experience. This includes the ability to analyze and share data and apps from anywhere.
10. A secure, enterprise-class experience with governed collaboration.
Your cloud analytics platform should allow you to easily assign and change permissions, so your data
stays secure and the right people have access. And when you’re evaluating moving workloads to a SaaS
platform, it’s vital to know that the service provider is following open and audited processes for security
controls.
While costs may vary from company to company, cloud solutions can be more cost-effective compared
to other IT infrastructures. This is because, with cloud computing, you only pay for what you use.
In addition, providers supply tools that help businesses save on IT infrastructure costs.
Cost savings are usually achieved through the following methods:
• Access to more versatile IT services
• Replacement of local equipment and server rooms
• Provision of additional services only on request
• Improved operational efficiency
Migrating to cloud infrastructure requires an initial investment. But with the right spending management
strategy, good ROI can be achieved.
Improving Flexibility
At a fundamental level, cloud analytics is more flexible than local computing. There is no need to integrate
additional physical resources if you need to scale up. In fact, in many cloud computing schemes, you can
access more resources in real-time when the need arises and then cut back to save.
For example, if you run an e-shop, you know there will be more activity on Black Friday. Instead of
maintaining extra servers all year round, you can scale up cloud resources only when they are needed.
Security Improvement
Many people think that data available in the cloud is insecure. Business owners are interested in how
secure files, programs, and data are in the cloud—what will prevent hackers from taking and accessing
this data?
Service providers place a strong emphasis on security. A combination of the best security tools, from
encryption of data at rest and in transit to layered security settings, can provide a very reliable
infrastructure. Even if a data leak does occur, it typically takes only minutes to detect.
But users of cloud services should also be extremely careful and follow all security rules, especially when
controlling access to data and managing security keys. Using consistent security across both your internal
and cloud infrastructure is the best way to ensure security.
Improving Mobility
Remote work is now the norm. This has become especially popular during the pandemic. Businesses that
want to stay competitive are hiring skilled workers from all over the world, whether in the same city as
the company or on the other side of the planet.
The cloud is an internet service. This means that all resources in the cloud are available over the internet.
The user needs to sign in to the account and access the resources in the cloud. Employees don’t need to
work onsite to access the resources they need to do a job.
Sharing
Gone are the days when colleagues needed to email files back and forth to exchange information.
Everyone now uses cloud storage, which updates files in real-time as they are edited. The simplest example
of cloud collaboration software is Google Docs, which can be edited by multiple people.
Cloud analytics combines all company data sources to give a more complete picture of business processes.
All employees, regardless of their physical location or the time of day, can easily access and share data
with colleagues around the world.
Improved Disaster Recovery
Today, everyone understands how important it is to protect yourself from data loss. Therefore, many
people backup files, photos, etc. in the cloud. The same goes for business.
All organizations need to be aware that data loss will occur at some point. If all their important documents
and files are stored in only one place, one disaster could destroy the entire organization.
Cloud storage allows companies to keep huge amounts of data away from the office and across multiple
locations. If an accident happens, they can simply activate their recovery protocol and get back up and
running. And to prevent data loss and application downtime, cloud providers ensure that data is replicated
across multiple sites. This protects against losses from disasters or other unexpected errors.
Automatic Update
Another advantage is that you do not have to sit and wait for updates to be installed. Cloud applications
do this automatically and without the intervention of IT staff. This saves time and money. In general, the
tools are very easy to use; workers can use cloud analytics tools without prior training.
Reducing Harm to the Environment
This benefit of the cloud is usually overlooked. Yet at a time when species are dying out and future
generations are threatened by global warming, cloud technologies cause much less harm to the environment
than machines that must be installed, maintained, and then disposed of. You simply rent or buy space in
the cloud instead of maintaining a server room, and thus save significantly on electricity.
Cloud infrastructure is the new normal. Technology has come a long way, and new options and solutions
have emerged that cannot be ignored. Depending on the characteristics and needs of the business, you have
to choose one of the cloud deployment models for the analytics platform. This is a good starting point for
making important strategic business decisions.
Automated analytics are a form of advanced analytics that use emerging technology, such as
machine learning, artificial intelligence, and natural language processing, to assist human data scientists
and analysts with tedious tasks like data gathering and preparation.
Relying on data-driven decision-making can be an effective way to expand a business. Data
collection and analysis allow companies to improve the efficiency of their operations. Analyzing data can
be a long-term process, so relying on automated systems can provide employees with more time to work
on other important projects.
Automated analytics is the process of using advanced computer programs and simulations to examine
digital information. Depending on a business' industry, its staff might collect statistical data on customer
information, production processes, profitability or performance metrics. Using this data to inform
important business decisions can help keep a business profitable, but analyzing these data points manually
can be time-consuming and costly.
Automated analytics systems save time and funds, as you can input data directly into software that
generates reports and makes recommendations based on user preferences. This type of automation is
particularly useful for companies that handle big data, as there could be many individual data points to
analyze on a day-to-day basis. By working with automation software, business owners can produce more
reliable results while prioritizing funds for other projects.
Repeatability factors
Analyses you only conduct once every few months are generally not suitable for automation. Automation
is often more beneficial for data analysis tasks that your team completes regularly, such as researching
new social media trends. As repetitive tasks can consume a lot of company resources, you can increase
efficiency by reducing the amount of human input required for these activities.
Consistency factors
Automation can be helpful if completing a task consistently is an important factor for accurate results. For
example, if a data task occurs at a specific time of the day, automating this process can ensure a team
completes it on time, allowing them to move forward to the next step of a project. Relying on computerized
systems to complete these tasks consistently may also reduce the number of data errors that occur.
Error mitigation
When collecting data, even small errors can lead to misrepresentations of important information.
Automated systems are often more reliable than manual options, as they rely on complex, pre-set
algorithms. If minor errors can reduce the reliability of your data sets, then using an automated analytics
process can be beneficial. For example, a pharmacologist developing a new medication might use an
automated process to ensure their study results are correct before submitting data for a formal evaluation.
Decision-making
Many businesses rely on dashboards and other types of data visualization to inform decision-making
processes. These data models can be an efficient way to convey important performance metrics to your
team. Updating the data in your dashboard can be a time-consuming process, so automating these types
of data models can save time while providing more reliable information.
• Identify candidate analytical tasks: Use the checks presented above: the task has business value,
is repetitive, saves time, reduces errors, and can be iteratively improved.
• Set up expectations by formalizing criteria for success: In the early stages, automation should
serve as a way to optimize processes and save time. But be clear about what you expect. Start small
by automating one data pipeline (a minimal sketch follows this list).
• Use devoted platforms and tools to speed up automation: Your engineers can write SQL
procedures and Python scripts to automate these steps, but relying on specialized tools and platforms
will save you time and money.
• Repeat and evaluate: As you automate parts of the data analytics processes and products, evaluate
them against the success criteria set up earlier. If successful, automate more.
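The following is a minimal sketch of automating one small reporting pipeline in Python, as suggested in the list above; the file names and column names are hypothetical assumptions, and a scheduler such as cron or Airflow would invoke the function on a fixed cadence.

import pandas as pd

def run_monthly_sales_report(input_csv="sales_raw.csv", output_csv="sales_summary.csv"):
    # Hypothetical automated pipeline: load, clean, summarize, and export a report
    df = pd.read_csv(input_csv, parse_dates=["order_date"])
    df = df.drop_duplicates()

    summary = (
        df.groupby(df["order_date"].dt.to_period("M"))["amount"]
          .agg(["count", "sum", "mean"])
          .rename(columns={"count": "orders", "sum": "revenue", "mean": "avg_order"})
    )

    summary.to_csv(output_csv)  # the dashboard or report template picks this file up
    return summary

if __name__ == "__main__":
    print(run_monthly_sales_report())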
The details of a research report may change with the purpose of research but the main components of a
report will remain constant. The research approach of the market researcher also influences the style of
writing reports. Here are seven main components of a productive research report:
• Research Report Summary: The entire objective along with the overview of research are to be
included in a summary which is a couple of paragraphs in length. All the multiple components of the
research are explained in brief under the report summary. It should be interesting enough to capture
all the key elements of the report.
• Research Introduction: There is always a primary goal that the researcher is trying to achieve
through a report. In the introduction section, he/she can cover answers related to this goal and establish
a thesis that the report will strive to answer in detail. This section should answer an integral
question: “What is the current situation of the goal?”. After the research was conducted, did the
organization achieve the goal, or is it still a work in progress? Provide such details in the introduction
part of the research report.
• Research Methodology: This is the most important section of the report, where all the essential
information lies. Readers can obtain data on the topic and assess the quality of the content provided, and
other market researchers can validate the research. This section therefore needs to be highly informative,
with each aspect of the research discussed in detail. Information needs to be presented in order of its
priority and importance. Researchers should include references wherever they drew on existing
techniques.
• Research Results: A short description of the results along with calculations conducted to achieve the
goal will form this section of results. Usually, the exposition after data analysis is carried out in the
discussion part of the report.
• Research Discussion: The results are discussed in extreme detail in this section along with a
comparative analysis of reports that could probably exist in the same domain. Any abnormality
uncovered during research will be deliberated in the discussion section. While writing research reports,
the researcher will have to connect the dots on how the results will be applicable in the real world.
• Research References and Conclusion: Conclude all the research findings along with mentioning
each and every author, article or any content piece from where references were taken.
For example, for a research study on the impact of the pandemic on young students, target
students between the ages of 18 and 24, both male and female, from counties with a population of more than
25,000.
Audience based on purchase intentions: E-commerce businesses use a lot of purchase intentions data.
This is a crucial piece of information that they must possess to understand the buying intentions and
interests of potential customers.
For example, researchers group individuals based on the products they specifically look at or show interest
in. This helps them target individuals to capture their feedback on the expectations of the products and
services to enhance them further.
Audience based on personal interests: Interests make up an individual’s hobbies, passions, behaviors,
and the things they read about and look for. They can be anything from movie types to music genres, cars,
books, and dance, to name a few.
For example, you can offer a new action movie to action movie enthusiasts and get their feedback on
different parameters you set to collect genuine feedback.
4. Response rate: Ascertain the response rate you’re likely to receive. If your population is huge
and you expect a low response rate, increase your sample size. A small sample, especially in a
diverse population, will simply not give you the accuracy you’re looking for.
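A short, hedged calculation of how a low expected response rate inflates the number of people you need to invite, using the common proportion-based sample size formula with a finite-population correction; all figures are illustrative.

import math

def required_sample_size(population, margin_of_error=0.05, z=1.96,
                         proportion=0.5, expected_response_rate=0.4):
    # Cochran-style sample size for a proportion, then a finite-population correction
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    # Inflate for the expected response rate to get the number of invitations
    invitations = n / expected_response_rate
    return math.ceil(n), math.ceil(invitations)

completed, invited = required_sample_size(population=25000, expected_response_rate=0.4)
print(f"Completed responses needed: {completed}; invitations to send: {invited}")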
Technical Report
A technical report is needed when a complete written record of a research study is required for the
purpose of public dissemination or record-keeping. In these reports, data is presented in a simple manner
and key results are clearly defined. A technical report emphasizes the tools used in the study, the
assumptions made, and the presentation of findings along with their limitations.
The outline of a technical report is:
1. Results Summary- A description of the key findings of the study conducted.
2. Nature of Study- Denotes the objectives of the study, the operational formulation of the problem,
the working hypothesis, the type of data needed, and the kinds of analysis.
3. Methods Used- The tools and techniques used for carrying out the study, along with their
limitations, are explained.
4. Data- A description of how the data was collected, its sources, its characteristics, and its
limitations.
5. Data Analysis and Presenting Findings- This is the main body of the report, where data is analyzed
and findings are presented along with supporting data. Different types of tables and charts are used
for better explanation.
6. Conclusions- Findings are narrated in detail and the policy implications drawn from the
results are explained.
7. Bibliography- It provides details of the various sources consulted while performing the
research.
Popular Report
A popular report is one that focuses on attractiveness and simplification of the data. It is used when the
findings will have policy implications. The focus is on writing in a clear manner, minimizing technical
detail, and using charts and diagrams liberally and in detail. Other key characteristics of a popular report
are the use of many subheadings, large print, and the occasional cartoon. Practical emphasis is given more
importance in this type of report.
The general outline of a popular report is given below:
1. Findings and Their Implications- Focus is given to the practical aspects of the findings of the
study conducted and how these findings can be applied.
2. Recommendations for Action- This section of the report provides recommendations for action on
the basis of the findings.
3. Objectives of Study- A description of the nature of the problem and the key objectives of the
study is given here.
4. Techniques Used- A review of all the tools and techniques employed, along with the data used for
concluding the study, is given in this portion. All descriptions are given in a non-technical
manner.
5. Results- This is the main portion of the report, where all findings are presented in simplified, non-
technical terms. All sorts of illustrations, like diagrams and charts, are used liberally.
6. Technical Appendices- The technical appendices provide detailed information on the different
methods used, forms, etc. If the report is meant for the general public, the technical appendices are
kept concise.
Informational reports. These reports present facts about a given activity in detail, without any
comments or suggestions. Whatever is gathered is reported without offering anything by way of
explanation or suggestion. A vice-chancellor asking about the number of candidates appearing at
a particular examination naturally seeks only information on that fact (the candidates taking up the
examination), without any comment. Generally, such reports are of a routine nature. Sometimes
they may fall under the statutory routine category. A company registrar asking for an allotment return
within the stipulated period is requesting an informational routine report, falling under the statutory but
routine category.
Analytical reports. These reports contain facts along with analytical explanations offered by the
reporter himself or asked for by the one who is seeking the report. Such reports contain a
narration of facts, collected data and information, classified and tabulated data, and explanatory
notes, followed by the conclusions arrived at or the interpretations. A company chairman may ask for a
report on falling sales trends in a particular area. In this case he will naturally be interested in
knowing all the details, including the opinions of the investigator.
Common Business Research reports. These reports are based on research work conducted by
either an individual or a group of individuals on a given problem. An Indian oil company might
ask its research division to find a substitute for petrol; if such a study is conducted, a
report is submitted by the research division detailing its findings and offering its own
suggestions, including the conclusions the division has arrived at as to whether such a
substitute exists and, if it does, whether it can be put to use effectively and with advantage. All
details will naturally be asked for and have to be given. Such a report is, in fact, the result of research.
Statutory reports. These reports are presented according to the requirements of a particular law,
a rule, or a custom that has now become a rule. The auditor's report to the company registrar has to be
submitted as per the country's legal requirements. A return on compensation paid to factory workers
during a period has to be submitted by the factory to the competent authorities periodically.
These reports are generally prepared in the form prescribed by the rules.
Non-statutory reports. These reports are not required by law or by rules; nevertheless, they are
prepared and submitted. These reports are required to be prepared and
submitted: (i) for administrative and other conveniences, (ii) for taking decisions in a matter, (iii) for
policy formulation, (iv) for projecting the future, or (v) for similar purposes, so that efficient and smooth
functioning may be assured and proper and necessary decisions may be taken, with a view to ensuring that
everything goes well and the objectives of the organization are achieved with assured success.
Routine reports. These reports are required to be prepared and submitted periodically on matters
required by the organization so as to help the management of the organization to take decisions in the
matters relating to day-to-day affairs. The main objectives of routine reports are to let the management
know what is happening in the organization, what its progress is, where the deviations are, what
measures have been taken to solve the problems, and what to do so that the organization may run
smoothly and efficiently. Routine reports are generally brief. They only give the facts. No comments
or explanations are usually offered in such reports. Generally forms are prescribed for preparation and
submission of such reports.
Special reports. Such reports are specially required to be prepared and submitted on matters
of a special nature. Suppose that, due to an accident, the death of a foreman has occurred in a factory. The
factory manager may ask for a detailed report from the head foreman. Such a report is classified as a
special report. These reports contain not only facts and details but may also contain suggestions,
comments, and explanations.
Journal Articles
It is helpful to acquaint yourself with the diverse types of articles published by journals.
Although it may appear that there are a great number of article types, owing to the broad
assortment of names they are published under, most articles published are one of the following
types: original research, review articles, short reports or letters, case studies, and methodologies.
Monographs or Books
Research monographs can be reformatted editions of dissertations, theses, or other noteworthy
research reports. Monographs are published by academic presses and commercial scholarly publishers.
A point of distinction is that authors may receive a royalty payment for monographs, whereas,
for most other forms of research dissemination, such as journal articles and conference papers,
authors do not receive direct payment.
As a commercial work, a monograph will typically be edited to be readable by a more
general or a more specific audience, depending on to whom the publisher will be marketing the book.
The readership of a research monograph will likely consist of individuals with varying levels of
proficiency in the field, ranging from students to academics and from practitioners to lay people. When
writing, you can presume the reader will have some interest in the topic, but he or she may not
have much background in the field.
The required complexity or quality of research of a monograph can vary by country,
university, or program, as can the required minimum study period. The word “monograph” can at times be
used to describe a treatise without relation to obtaining an academic degree. The term “monograph”
is also used to refer to an essay or a similar work in general.
Professional Meetings report
• A meeting needs a clear statement of purpose. The exact goal for the specific meeting will evidently
relate to the overall goal of the group or committee. Determining your purpose is central to a
successful meeting.
• A meeting should not be scheduled just because it was held at the same time last month or because
it is a standing committee. Members will resent the intrusion into their schedules and quickly
perceive the lack of purpose.
• Similarly, if the need for a meeting crops up, one should not rush into it without planning. An
inadequately planned meeting announced at the last minute is almost certain to be less than useful.
• People may be unable to change their schedules, may fail to attend, or may hinder
the progress and discussion of the group because of their absence. Those who do attend
may feel stalled because they needed more time to organize and present comprehensive results to the
group or committee.
Business Seminar Reports
• A seminar may be defined as an assembly of employees for the purpose of discussing a stated topic.
Such gatherings are typically interactive sessions where the participants engage in discussions
about the defined topic. The sessions are frequently headed or led by one or two presenters who
serve to steer the discussion along the preferred path.
• A seminar may have numerous purposes or just one purpose. For instance, a seminar may be for
the purpose of education, such as a lecture, where the participants engage in the discussion of an
academic subject with the intention of gaining a better understanding of the subject. Other forms of
educational seminars might be held to impart particular skills or knowledge to the participants.
1. Project synopsis
A project synopsis is often used in science and engineering fields and summarizes a project’s goals,
processes, and conclusions. It often starts with a statement summarizing the problem that the project aims
to solve. It delves into methods used and other details that are important to the project, such as relevant
details about the project’s participants.
2. Research synopsis
Of the three main types of synopses, research and project synopses are most often used by research and
scientific institutions. Like a project synopsis, a research synopsis summarizes the problem or question
the research is attempting to solve and then describes how the research was conducted.
Research synopses also give details on the researchers themselves, such as any relevant academic degrees
they hold.
3. Literary synopsis
A literary synopsis is a synopsis of a work of fiction. It summarizes all the critical elements of a book so
that an agent or publisher understands, to a high level of detail, what a book is about without having read
it.
• Make a list of your book’s key elements. These include the most critical story and plot points, conflict,
characters, settings, themes, and tone. For the plot, go through each chapter, and write down one to
three of the most important plot developments from each. Then flesh out each item on your list with
any other important details.
• Write a good opening sentence. This should summarize your character, setting, and the immediate
conflict, ensuring you make it clear what’s at stake. Then link together your detailed list from step 1
to form a first draft of your synopsis.
• Read through the synopsis. Then add any details you may have forgotten. Also, look for details you
included that are not critical and cut them.
• Read through it again. Ensure that the plot and character arcs are clearly defined.
• Give it a final edit and proofread. A one-page synopsis is often ideal, but publishers may request a
synopsis of three to five pages or specify some other length.
popular option is an online dashboard. An online dashboard allows you to create and display charts in a
way that's easy to understand. Try to select a format that can present all of your data.
4. Add charts and other elements
A common component of an analytical report is its charts and other elements. Charts and graphs are how
you display your data, so it's valuable to include several of these visuals. Try to add charts that accurately
represent your findings. Some common graphs for an analytical report include line charts, bar charts,
maps and plots. Along with charts, you can add other elements, such as images and icons. These can make
your charts easier to read and more visually appealing.
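As one hedged example, the matplotlib sketch below produces a simple bar chart of invented monthly figures that could be embedded in an analytical report or dashboard.

import matplotlib.pyplot as plt

# Hypothetical findings: units produced per month
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
units = [1200, 1350, 1280, 1500, 1620, 1580]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(months, units, color="steelblue")
ax.set_title("Units Produced per Month")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
fig.tight_layout()
fig.savefig("units_per_month.png")  # the image can then be placed in the report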
5. Use design practices
Once you've added your charts and other elements, start designing your report. While you can make your
analytical report as simple or complex as you'd like, it's important to use design practices. These guidelines
help make your report visually appealing and easy to read. For example, a common design practice is to
use a layout that's clear, with a mix of visuals and text. Another practice is to use plenty of white space to
improve the readability of your report.
6. Make recommendations
The last component of your report is the recommendations. Since you're trying to solve a problem or
answer a question, it's important to provide a few solutions based on your research. For example, if you
made an analytical report about operational performance, you could make recommendations for how the
company can improve its productivity.
and direction for future research. Findings are statements of factual information based upon the data
analysis.
Conclusions must clearly explain whether the hypotheses have been established or rejected. This part
requires great expertise and preciseness. A report should also refer to the limitations of the applicability
of the research inferences. It is essential to suggest the theoretical, practical and policy implications of the
research. The suggestions should be supported by scientific and logical arguments. The future direction
of research based on the work completed should also be outlined.
8) Bibliography
The bibliography is an alphabetic list of books, journal articles, reports, etc, published or unpublished,
read, referred to, examined by the researcher in preparing the report. The bibliography should follow
standard formats for books, journal articles, research reports.
The end of the research report may consist of appendices, listed in respect of all technical data. Appendices
are for the purpose of providing detailed data or information that would be too cumbersome within the
main body of the research report.
Exercise
I. Write down short answers for the following:
32. What is enterprise reporting?
33. What are linear and logistic regression?
34. Define time series analysis.
35. Describe the features of big data.
36. List out the types of big data analytics.
37. What is predictive analysis?
38. What is prescriptive analysis?
39. What is cloud analytics?
40. Define report writing.
41. Describe automated analytics