Unit 1
OMBA-235
R PROGRAMMING
FOR BUSINESS
PROGRAMME DESIGN COMMITTEE MBA (CBCS)
PRINT PRODUCTION
Copyright Reserved 2022
All rights reserved. No part of this publication, which is material protected by this copyright notice, may be reproduced, transmitted, utilized or stored in any form or by any means, now known or hereinafter invented, electronic, digital or mechanical, including photocopying, scanning, recording, or any information storage or retrieval system, without prior permission from the publisher.
Information contained in this book has been obtained by its authors from sources believed to be reliable and is correct to the best of their knowledge. However, the publisher and its authors shall in no event be liable for any errors, omissions or damages arising out of the use of this information, and specifically disclaim any implied warranties of merchantability or fitness for any particular use.
R PROGRAMMING FOR BUSINESS
Unit - 1  Basic Statistics with R
Unit - 2  Classification and Tabulation of Data
Unit - 3  Descriptive Statistics
Unit - 4  Moments
Unit - 5  Measures of Skewness
UNIT - 1 BASIC STATISTICS WITH R
STRUCTURE
1.0 Objectives
1.1 Introduction
1.2 Significance of Statistics
1.3 Primary and Secondary Data
1.3.1 Primary Data
1.3.2 Secondary Data
1.4 Data Collection Methods
1.4.1 Surveys
1.4.2 Interviews
1.4.3 Observations
1.4.4 Experiments
1.5 Presentation of Numerical and Categorical Data
1.5.1 Numerical Data
1.5.2 Categorical Data
1.6 Let Us Sum Up
1.7 Keywords
1.8 Some Useful Books
1.9 Answers to Check Your Progress
1.10 Terminal Questions
1.0 OBJECTIVES
1.1 INTRODUCTION
Secondary data, although not directly collected for the current research,
serves as a valuable resource. It is obtained from already published or
recorded sources, making it convenient and cost-effective. However,
researchers must critically assess its relevance and reliability to ensure
accurate and meaningful results.
Various methods are employed to collect data, each suited to different
types of studies. These include surveys, interviews, observations, and
experiments. Surveys involve systematically collecting information from
a sample of individuals. This method is effective for gathering opinions,
preferences, and quantitative data on a large scale. Surveys often employ
questionnaires or interviews to elicit responses. Interviews provide a more
in-depth understanding of individual perspectives. Researchers engage
directly with participants, asking open-ended questions to explore nuances
and gather detailed qualitative data. Observational studies involve
systematically watching and recording events or behaviors. This method
is particularly useful when studying natural settings and behaviors without
interference. Experiments are designed to establish cause-and-effect
relationships. Researchers manipulate variables and observe the resulting
outcomes to draw conclusions about the factors influencing a particular
phenomenon.
Once data is collected, it needs to be presented in a meaningful manner.
Numerical and categorical data are two fundamental types of data
presentation. Numerical data involves measurable quantities and is often
presented using statistical measures such as mean, median, and standard
deviation. Graphs, charts, and tables are also commonly used to visually
represent numerical data. Categorical data consists of distinct categories
or groups. Bar graphs, pie charts, and frequency tables are commonly used
to represent categorical data, providing a clear visual depiction of
distribution and proportions within different categories.
Statistics plays a vital role in data analysis by providing methods and tools
to describe, organize, interpret, and make inferences from data.
R has built-in functions for each of these - mean(), median(), sd(), var(), IQR() etc. The summary() function also reports several of them together. Graphical approaches such as histograms, boxplots and scatterplots visualize the distributional shape and highlight outliers. Quantiles such as the 0.25, 0.50 and 0.75 quantiles aid comparison of centrality. tapply() applies a function over subsets of the data conditioned on factors, aggregate() summarizes by groups, and table() produces frequency tables for reporting. The pastecs package offers additional descriptive functions, while dplyr and data.table provide faster, cleaner data manipulation.
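As an illustrative sketch (using R's built-in mtcars dataset purely for demonstration), the following code computes these summary measures, grouped summaries and quick diagnostic plots:

    # Descriptive statistics on the built-in mtcars dataset
    data(mtcars)
    mean(mtcars$mpg)          # arithmetic mean of miles per gallon
    median(mtcars$mpg)        # middle value
    sd(mtcars$mpg)            # standard deviation
    var(mtcars$mpg)           # variance
    IQR(mtcars$mpg)           # interquartile range
    quantile(mtcars$mpg, probs = c(0.25, 0.50, 0.75))  # quartiles
    summary(mtcars$mpg)       # min, quartiles, mean, max together

    # Grouped summaries: mean mpg by number of cylinders
    tapply(mtcars$mpg, mtcars$cyl, mean)
    aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

    # Quick visual checks of shape and outliers
    hist(mtcars$mpg)
    boxplot(mpg ~ cyl, data = mtcars)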
Inferential Statistics: Inferential statistics consists of methods that allow
researchers to draw conclusions about a population from a sample.
It leverages probability theory and distributions to make estimates, test
hypotheses, and identify statistical relationships. R provides an extensive
range of built-in facilities and packages for inferential analysis.
Unlike descriptive statistics that summarize data directly, inferential
statistics employs mathematical models to infer population parameters
from samples and quantify the certainty or likelihood in those inferences.
This requires basic concepts from probability theory - random variables,
probability distributions, central limit theorem, standard errors that
estimate sampling distribution reliability.
In R, probability distributions can be simulated to build intuition - rnorm(), rpois(), rbeta() etc. generate random draws whose distributional forms can then be visualized.
Statistical inference procedures test hypotheses and derive estimates for
unknown population quantities given sample statistics. Confidence
intervals provide ranges for unknown parameters. Hypothesis testing
checks claims about population means, proportions, ANOVA effects etc.
R's built-in procedures connect probability theory to the data, enabling valid statistical inferences from samples. t.test() checks differences in means, chisq.test() tests associations in count data, prop.test() compares proportions, and lm() builds linear regression models. Sample statistics plug into these procedures to produce
p-values testing null hypotheses and confidence intervals estimating
effect sizes. Users need not manually calculate sampling distributions.
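The brief sketch below illustrates this workflow with simulated data (the seed, sample sizes and hypothesized values are arbitrary choices for demonstration): draw samples from known distributions, then let R's inference functions return test statistics, p-values and confidence intervals.

    set.seed(42)                         # make the simulation reproducible
    x <- rnorm(100, mean = 5, sd = 2)    # sample from a normal distribution
    y <- rpois(100, lambda = 3)          # sample from a Poisson distribution
    hist(x)                              # visualize the distributional form

    t.test(x, mu = 4.5)                  # one-sample t-test: is the true mean 4.5?
    prop.test(x = 54, n = 100, p = 0.5)  # test an observed proportion against 0.5

    # Confidence interval and p-value are components of each result object
    res <- t.test(x, mu = 4.5)
    res$conf.int                         # 95% confidence interval for the mean
    res$p.value                          # probability of data this extreme under the null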
Importance of Statistics in R:
Hypothesis Testing: Researchers formulate null and alternative
hypotheses, and a test statistic is chosen that measures
compatibility between the sample and the null claim. R has a suite
of hypothesis tests built in and via packages - t.test() and
wilcox.test() for comparing means and locations, prop.test() for
comparing proportions, chisq.test() for count data, and cor.test()
for correlations. These tests output the sample statistics, the test statistic
value, degrees of freedom, the p-value measuring probability of
observed (or more extreme) data under the null, and confidence
interval for the effect's magnitude. If the p-value falls below the
chosen significance level (often 0.05), the null hypothesis is
rejected - indicating insufficient compatibility between the
sample data distribution and the null claim. Then we conclude the
alternative is statistically supported. Failing to reject indicates
inadequate evidence against the null. Hypothesis testing
formalizes making data-based statistical inferences about effect
presence and generalizability.
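A compact illustration with simulated data (group sizes and effects invented for demonstration) shows how these outputs are read in practice:

    set.seed(7)
    group_a <- rnorm(40, mean = 100, sd = 15)   # e.g., scores under condition A
    group_b <- rnorm(40, mean = 108, sd = 15)   # e.g., scores under condition B

    tt <- t.test(group_a, group_b)   # two-sample t-test of equal means
    tt$statistic                     # test statistic value
    tt$parameter                     # degrees of freedom
    tt$p.value                       # reject the null if below the chosen level, e.g. 0.05
    tt$conf.int                      # confidence interval for the difference in means

    # Count data: association between two categorical variables
    counts <- matrix(c(30, 10, 20, 20), nrow = 2)
    chisq.test(counts)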
Regression Analysis: Regression analysis refers to a family of
statistical techniques investigating the relationships between a
dependent outcome variable and one or more independent
explanatory variables. Amongst the most widely-used statistical
methods, regression facilitates modelling and predicting
continuous, discrete, and categorical outcome variables from a set
of predictors. R contains versatile in-built regression modelling
functions. The basic linear regression model estimates the linear
relationship between a quantitative response variable like income,
height etc. and quantitative predictor variables like age, weight
through the model - outcome = b0 + b1x1 + b2x2 + ... This
estimates the intercept (b0) and slope coefficients (b1, b2)
mapping predictors to outcome. R's lm function fits this model,
estimating coefficients and quantifying uncertainty.
abline() visualizes a simple fitted line. For binary categorical
outcomes taking 0/1 values, logistic regression models the
probability or odds of "success" as explained by predictors through
a logit transformation that handles the range constraint; R's glm()
function, with family = binomial, fits such models.
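The sketch below, using the built-in mtcars data with purely illustrative variable choices, shows a linear fit with lm() and abline() and a logistic fit with glm():

    # Linear regression: fuel efficiency explained by weight and horsepower
    fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
    summary(fit_lm)                      # coefficients, standard errors, R-squared
    plot(mpg ~ wt, data = mtcars)
    abline(lm(mpg ~ wt, data = mtcars))  # overlay the simple fitted line

    # Logistic regression: probability of a manual transmission (am = 1)
    fit_glm <- glm(am ~ wt + hp, data = mtcars, family = binomial)
    summary(fit_glm)
    predict(fit_glm, type = "response")[1:5]  # fitted probabilities for first 5 cars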
Real-world Applications:
Business and Finance: Statistics plays an integral role in financial
modelling, analysis, and informed decision-making. R offers a rich set of
tools and packages tailored to business and finance applications -
portfolio optimization, risk modelling, time series forecasting,
algorithmic trading, insurance analytics, and more.
Descriptive functions in R such as mean(), sd() and quantile() produce basic summary profiles of financial metrics like historical returns, price movements and trading indicators. Correlation and regression analysis quantify relationships between assets, indicators, and macro factors. plot() visualizes trends over time, and ggplot2 enables publication-grade graphics of market dynamics.
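As a small illustration using simulated daily returns (so no market-data download is required), the following code profiles two hypothetical assets and quantifies their relationship:

    set.seed(1)
    ret_a <- rnorm(250, mean = 0.0005, sd = 0.01)   # simulated daily returns, asset A
    ret_b <- 0.6 * ret_a + rnorm(250, sd = 0.008)   # asset B, correlated with A

    mean(ret_a); sd(ret_a)                  # average daily return and volatility
    quantile(ret_a, probs = c(0.05, 0.95))  # tails of the return distribution
    cor(ret_a, ret_b)                       # co-movement between the two assets

    # Relationship between the assets via simple regression
    summary(lm(ret_b ~ ret_a))

    # Cumulative growth of 1 unit invested in asset A
    plot(cumprod(1 + ret_a), type = "l",
         xlab = "Trading day", ylab = "Growth of 1 unit")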
Specialized R packages implement financial statistics and modelling techniques. PerformanceAnalytics provides portfolio risk and return analysis. Time series packages such as tseries and forecast support autocorrelation tests, stationarity tests and ARIMA models for temporal data. quantmod downloads market data and computes returns and indicators. rugarch builds GARCH models for analyzing volatility clustering. tidyquant manipulates financial data in tidy form. caret trains machine learning models for prediction.
Beyond statistical inference, R has computational finance tools for trading strategy development, backtesting, and algorithmic trading - TTR for technical analysis and quantstrat for strategy backtesting. The Rmetrics suite includes the fPortfolio package for portfolio optimization based on Markowitz allocation and risk budgeting.
Healthcare: Statistics is integral to evidence-based medicine and
healthcare research. From descriptive summaries of symptoms to
complex multivariate models, R furnishes state-of-the-art data analysis
tools tailored for medical sciences.
A first step in analyzing healthcare data is understanding distributions -
patient demographics, disease incidence, lab measurements etc.
R produces descriptive statistics through summary() and quantile(), and visualizes data with hist(), barplot(), scatterplots and more. This profiles the population distribution.
Statistical modelling quantifies relationships in health data. Logistic
regression predicts clinical binary outcomes from risk factors and
symptoms. Survival analysis in R handles time-to-event data through packages such as survival, rms, and coxme. lme4 fits mixed-effects regression models incorporating random effects along with fixed predictors.
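A minimal survival-analysis sketch using the lung dataset bundled with the survival package (chosen purely for illustration):

    library(survival)

    # Kaplan-Meier curves of time-to-event data, stratified by sex
    fit <- survfit(Surv(time, status) ~ sex, data = lung)
    summary(fit, times = c(90, 180, 365))   # survival estimates at selected days
    plot(fit, xlab = "Days", ylab = "Survival probability")

    # Cox proportional hazards model with age and sex as predictors
    cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
    summary(cox)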
Critically, much medical statistics concerns drawing inferences about
populations from samples. Hypothesis testing frameworks in R such as t.test(), wilcox.test(), and prop.test() formally assess evidence for effects in the data. Meta-analysis combines results across studies. Sample size computations (e.g., with power.t.test()) assist study planning.
Besides analysis, R has tools to simulate patient populations and
interventions. Probability distributions and random number generation
provide flexibility for modelling, and dedicated simulation packages can be used to model entire healthcare systems. R thus extends from statistics to a decision-
modelling environment.
Social Sciences: Statistics plays a foundational role across the quantitative
social sciences - psychology, sociology, political science, communications
etc. R furnishes these fields state-of-the-art capabilities for data analysis
and modelling tailored to social research contexts.
A common application is questionnaire and survey data summarization.
R produces descriptive statistics on response patterns and trends via summary(), table() and cor(), and powerful visualizations through ggplot2, lattice and base graphics.
This supports understanding data distributions and relationships.
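For instance, a single categorical questionnaire item can be summarized in base R as follows (the responses here are simulated for illustration):

    # Simulated responses to one survey question
    set.seed(11)
    responses <- factor(sample(c("Agree", "Neutral", "Disagree"),
                               size = 200, replace = TRUE,
                               prob = c(0.5, 0.3, 0.2)))

    table(responses)               # frequency counts per category
    prop.table(table(responses))   # proportion of each response
    summary(responses)             # quick categorical summary
    barplot(table(responses),
            main = "Distribution of responses",
            ylab = "Count")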
Primary data affords maximum control, flexibility and currency to
researchers in assessing their research questions. This can leverage
approaches like field surveys, experiments, interviews, focus group
studies etc. tailored for the precise phenomena of interest. However,
designing, sampling, executing, recording and processing primary data from scratch demands considerable resource commitments of time, access, personnel and funding.
In comparison, secondary pre-existing data sources represent a valuable
opportunity to conduct rigorous studies feasibly by analyzing patterns
across prior large-scale or wide-ranging data assets instead of pursuing
elaborate primary collections. For instance, census datasets, electoral
results, disease registries, social media trends, student records, open
access repositories etc. offer treasure troves of real-world data facets
amenable to scientific investigation. This can answer novel questions
economically by applying thoughtful analysis methods without further
data engineering.
However, secondary data analysis presents limitations as well. The
available variables and cohorts may not fully capture the desired target
behaviors. Metadata, provenance and data collection methods need
careful review to assess fit and analysis suitability relative to the
contextual research aims. Appropriate interpretation warrants factoring
data generating processes. Overall, a combination of primary and
secondary analyses often yields optimal and multifaceted insights.
question formats while automatic logging maintains data hygiene.
Streaming digital trace data opens new behavioral vistas. GPS, image
recognition and in-situ sensors enable direct environmental monitoring.
Telemedicine consults and wearables gather patient physiological data
unseen previously.
Such technical measurement channels promise multifaceted fine-grained
coverage of target phenomena once unfeasible or relying on coarse
secondary proxies earlier. Automated collection minimizes reporting
biases while boosting scale and compliance. Modern systems also embed
validation checks and audit trails improving transparency.
Advanced analytics draw holistic insights intersecting diverse data types.
However, digital tools carry challenges as well regarding usability,
access, and representation biases which must be addressed upfront, not as
an afterthought. Technology adoption metrics should ensure target
populations are not excluded disproportionately. Sensor monitoring
requires knowing meaningful thresholds and baselines beforehand
through primary calibration measurements. Logging complete contextual
metadata remains key for responsible interpretation.
Online Surveys and Data Collection Platforms: Online survey tools and
digital data collection platforms have streamlined primary data gathering
through convenient template interfaces, cloud storage, automated analysis,
and participant access at scale. However, researchers must address
challenges of data quality and representation biases before generalizing
insights derived.
Services like Qualtrics, SurveyMonkey, Google Forms, Amazon
Mechanical Turk etc. facilitate question creation, distribution, response
logging and analytics dashboards to summarize results. Embedded
quality checks like enforced validation, multiple choice formats,
randomization modules etc. aid reliability. Remote asynchronous
collection grants convenient access for subjects while enabling large,
diverse samples unconstrained by geography. Automatic recording
minimizes human error during data entry and collation. Export options
allow use with external tools for further modelling and inference tasks.
being captured and ensure resulting datasets can support credible
analysis and insights. This necessitates deliberate protocols and
validation checks before, periodically during, and post data compilation
to uphold analytical utility.
In instrument-based measures like surveys and sensor readings,
standardized calibrations assess accuracy against authoritative
references, specifying minimum precision and error thresholds
acceptable for use cases. Certifying respondent understanding via test
questions and previews limits noise from unclear items. Authentication
of identities and credentials fights fraudulent entries. Random sampling
orders combat biasing. Timestamping enables consistency audits by
factors like location, fatigue etc.
For human-generated data like experiments, treatment integrity demands
adherence confirmation to designated protocols. Reliability metrics
quantify marker proficiency and gaps prompting retraining. Replication
measurements affirm reproducibility across comparable subgroups.
Blinding investigators to intervention conditions limits perception biases.
Negative and positive controls check for false signals. Randomization
enables unbiased group assignments.
Post gathering, analytics examine completeness of expected records, data
distributions, outlier detection etc. Assessing subgroup variability gauges
distortions. Metadata captures provenance descriptions, question formats,
scoring logics etc., aiding reuse. Formal data validation rulesets catch bad input. Versioning enables records to be rolled back to earlier states.
1.3.2 Secondary Data
Secondary data refers to data originally collected and compiled for
purposes other than the current research questions at hand. In contrast
with primary data gathered first-hand tailoring measurement directly to
analytical needs, secondary sources represent pre-existing records
amenable to retrospective analysis towards new ends. By tapping prior
efforts, secondary data unlocks immense analytical potential efficiently.
As data gathering, storage and dissemination technologies progress,
accumulations of past endeavors to track societies, markets, ecosystems
etc. create vast repositories holding clues for discovery in historical
on the reputation of the institution and their adherence to standards
regarding research methodology and transparency. However, potential
conflicts of interest or biases should also be considered.
Government and Institutional Data: Government agencies such as
federal and state departments alongside respected institutions like
universities and research organizations generate and aggregate immense
amounts of secondary data across disciplines. Tapping into the public
databases and data repositories made available by official sources
provides researchers with access to high-quality, credible information
that typically adheres to rigorous standards and processes in its collection
and reporting.
For example, census data published by national statistics agencies and
epidemiological reports from health departments contain validated
quantitative and qualitative insights about populations based on
methodical analysis by subject experts. These databases can yield
representative, unbiased snapshots of various societal aspects.
Researchers can avoid the difficulty of recruiting sample groups and
conducting lengthy surveys by simply accessing related public data
resources.
Likewise, universities and associations publish studies in their own
repositories that follow field-specific protocols around sampling
approaches, measurement tools, analytical assumptions, error tolerances,
and peer-review oversight before public release. The methodical nature
of the data collection and reporting process followed by institutional
research bodies promotes a high degree of accuracy and trust in the data
integrity.
Digital and Online Sources: The advent of digital platforms and the
internet has exponentially increased the availability of online data that
researchers can potentially draw insights from. Websites, forums, social
media channels, e-commerce portals, and digital databases now produce
vast amounts of user-generated data daily alongside datasets created
explicitly for public consumption.
Tapping into big data that captures consumer search trends, purchasing
behavior, feedback patterns, and content engagement on online platforms
collinear relationships with each other - collecting similar information.
Analyzing highly correlated variables can degrade model stability and interpretability, necessitating removal of duplicated attributes.
Data Analysis: Secondary datasets, whether from public repositories or
internal organizational records, require dedicated analytical approaches to
derive contextual insights. Established statistical techniques provide the
primary toolkit to examine patterns, model relationships, and interpret
signals within existing datasets.
Descriptive analytics - starting from data visualization, univariate analysis of central tendency and spread, and bivariate identification of correlations and associations - constitutes the initial examination to characterize dataset features.
composite measures and application of sampling weights allows
population-level projections. Statistical testing procedures further assist
in making probabilistic inferences around phenomenon observations in
the data.
With reliable, good quality datasets, researchers can leverage multivariate
methods like regression analysis, factor analysis etc. for modeling causal
connections between phenomena; clustering algorithms to detect segments
and personas; and classification approaches to categorize entities or
predict outcomes based on historical patterns. Time series analysis
specifically tracks trends and trajectories in temporal data.
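As an illustrative sketch of the segmentation idea, the code below applies k-means clustering in base R to the built-in iris measurements, used here only as stand-in data:

    # Scale the numeric columns so no variable dominates the distance metric
    dat <- scale(iris[, 1:4])

    set.seed(123)                      # results depend on random starting centers
    km <- kmeans(dat, centers = 3, nstart = 25)

    km$size                            # how many observations fall in each segment
    km$centers                         # profile of each segment (standardized means)
    table(km$cluster, iris$Species)    # compare discovered segments to known labels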
The analytical workflow is facilitated through dedicated statistical
software and programming platforms like SAS, Stata, SPSS, R, Python
etc. These tools automate the supply of cleaned, integrated data into
analytical models and generate reports, projections and visualizations to
assist interpretation. Big data capabilities using Hadoop, Spark and cloud-
based warehousing facilitate information consolidation from dispersed
secondary sources while GUI-based solutions expedite analysis.
Check Your Progress-2
1. What are the advantages of using primary data in research?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
…………………………………………………………………………….
2. What challenges are associated with secondary data analysis?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
…………………………………………………………………………..
Anonymity and informed consent statements promote participation rates
and safeguard ethical compliance. Questionnaire structure, length and
medium suit target groups and research objectives while pretesting
iterations refine instrument quality.
Response Scales and Formats: Survey answer types constitute a crucial
questionnaire design choice that impacts administration, analytics
pathways and insight potential. Categorical response options include dichotomous (binary) items capturing yes/no perspectives, nominal multiple-choice selections, and ordinal graded scales signalling rank-ordered opinions. Numeric options encompass discrete counts, continuous ratio quantities, and monetary values or temporal durations.
Among ordinal types, the Likert scale offers gradient answer options
spanning disagree/agree attitudes around single declarative statements.
An odd number of choices allows a neutral midpoint. Variants specify
granularity - "strongly agree" to "strongly disagree" ranges or simply
"always" to "never" frequencies. Semantic differences help match
question styles - value, frequency etc. For multi-dimensional concepts, a
matrix table template assesses the same choices across factors facilitating
comparison.
Rank order and constant sum scale questions have respondents
numerically prioritize or allocate quantitative values across items - useful
for trait preferences or budget allocation exercises. Open-ended box
responses elicit qualitative explanations while comment boxes gather
verbatim feedback requiring coding before analysis. Choice set clicking
optimizes online self-administration convenience but demands
exhaustive, mutually exclusive options with a possible "none" catchall.
Pilot Testing: The execution of a small-scale pilot survey constitutes a
vital quality assurance checkpoint within the phased process of nurturing
survey instruments before field deployment. Trial testing questionnaires
on groups with characteristics aligned to target samples unearths vital
refinements around question wording ambiguities, difficult skip patterns,
inadequate choices, length issues and general flow concerns.
Face-to-face surveys using home visits or public intercepts secure higher-quality data, with the ability to collect collateral observations. But such exercises become expensive, and data volume is limited by access constraints.
Ethical Considerations: Survey research activities directly interfacing
with human subjects demand prudent constructs to uphold participant
rights and welfare. Voluntary participation necessitates proper
disclosures on the study's purpose, sponsor, potential data usage,
anonymity mechanisms and withdrawal policies as part of informed
consent procedures. This enables respondents to rationally assess
involvement risks around roles possibly resulting in psychological,
economic, legal or social harm before agreeing to contribute views.
Further, collected personal information requires responsible data
stewardship. Confidentiality undergirds survey legitimacy - limiting
access to identifiable data, aggregating public reporting, securely storing
records with encryption protocols and regulating data sharing
conventions via agreements. Contact information gets maintained
separately from survey responses, only conjoined using reference codes
during active analysis by the core team before complete dataset
anonymization.
Relatedly, transparent policies must cover secondary usage norms for
archived survey data that respect original consent conditions. Ethical
standards also inform sound sampling protocols ensuring fair selection,
administering uniform collection instruments across strata, applying
consistent quality checks and avoiding leading communications that
prompt certain responses. Such mechanisms uphold credibility of the
resulting dataset and ensure representative voices get reflected in survey
findings without biases or coercion pressuring participation.
Data Analysis and Interpretation of Surveys:
Quantitative Analysis: Survey analysis leverages a repertoire of
statistical tools and testing approaches to derive descriptive summaries,
compare group responses and model variable relationships based on
collected response datasets.
like age, locale and gender. Content analysis allows numerical
conversion of textual data into quantifiable manifest content categories
historically compared across periodic surveys. Sentiment analysis through dictionary methods automates the identification of emotional expressions, criticism, and praise.
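To make the quantitative side concrete, the following sketch (with invented example data) cross-tabulates responses against a demographic factor and tests for association:

    # Hypothetical survey data: satisfaction response by age group
    set.seed(2024)
    survey <- data.frame(
      age_group = sample(c("18-34", "35-54", "55+"), 300, replace = TRUE),
      satisfied = sample(c("Yes", "No"), 300, replace = TRUE, prob = c(0.7, 0.3))
    )

    tab <- table(survey$age_group, survey$satisfied)
    tab                                   # cross-tabulation of the two variables
    prop.table(tab, margin = 1)           # row proportions within each age group
    chisq.test(tab)                       # test for association between the variables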
1.4.2 Interviews
Interviews constitute an intensive process of directed conversation
oriented to systematically gather first-hand descriptive insights around
lived experiences, attitudes, behaviors, expert opinions or eyewitness
accounts associated with a research phenomenon from selected individual
participants.
As data collection instruments, interviews excel at capturing explanatory detail - the contextual, relative, personal or situated facets that are challenging to quantify concerning the target themes under investigation. Through guided discussion,
question probes and narrative framing, investigators can motivate subjects
to intimately share nuanced reflections on beliefs, emotions, insider
knowledge, or explanatory rationales associated with their relationship to
the research focus area.
Capturing thick descriptive data around subjective vignettes, personal
histories or quotes facilitates documenting informal realities occurring
locally across incidents which elude questionnaire constraints. Analytical
frameworks like grounded theory then help collate common experiential
denominators into shared conceptual maps. Comparatively, interviews
allow customized targeting of niche experts or outliers holding specialized
experiences beyond behaviourally measured norms.
Purpose and Types of Interviews:
Purpose of Interviews:
Interviews serve as a vital data collection method in research, offering a
structured and interactive platform for obtaining in-depth information
from participants. The primary objectives of conducting interviews are
manifold, encompassing the exploration of complex phenomena, the
elucidation of participants' perspectives, and the generation of rich,
context-specific data.
Structured Interviews:
Advantages: Structured interviews are characterized by a
predetermined set of questions administered in a standardized
manner. This method offers several advantages, including
enhanced reliability and ease of analysis.
The standardized format ensures consistency across interviews,
facilitating the comparison of responses. Moreover, structured
interviews are efficient in terms of time and resource utilization,
making them suitable for large-scale studies. Researchers can
employ statistical techniques with greater confidence due to the
standardized nature of the data.
Limitations: However, structured interviews may lack flexibility
in addressing unanticipated insights or probing deeper into
responses. The predetermined questions might not capture the
complexity of participants' experiences or viewpoints, limiting the
depth of qualitative data. Additionally, the rigid structure may
hinder the establishment of rapport between the interviewer and
the participant, potentially affecting the candor and richness of
responses.
Semi-Structured Interviews:
Advantages: Semi-structured interviews strike a balance between
structure and flexibility. This approach allows researchers to use
a predefined set of core questions while also permitting the
exploration of emergent themes. The semi-structured format
encourages in-depth responses, fostering a more comprehensive
understanding of the research topic. This method is particularly
valuable when investigating complex phenomena, providing the
researcher with the flexibility to adapt the interview to the
participant's context and responses.
Limitations: Despite their flexibility, semi-structured interviews
require skilled interviewers who can navigate the balance
between adherence to the core questions and exploration of
additional topics. The variability in interviewer style and probing
techniques may introduce some degree of inconsistency in data
collection. Additionally, the analysis of semi-structured
interviews can be more time-consuming than that of structured
interviews due to the diverse and open-ended nature of the
responses.
Unstructured Interviews:
Advantages: Unstructured interviews are characterized by their
open-ended, exploratory nature. This method allows for
maximum flexibility, enabling the researcher to delve deeply into
participants' perspectives without predefined constraints.
Unstructured interviews are well-suited for exploring novel or
poorly understood topics, as they provide the freedom to follow
unexpected leads and capture rich, context-specific data.
Limitations: While unstructured interviews offer unparalleled
flexibility, they present challenges in terms of standardization and
comparability. The absence of a predefined set of questions
makes it challenging to ensure consistency across interviews.
Moreover, the open-ended format may result in data that is more
difficult to analyze, as the richness and diversity of responses can
be overwhelming. Additionally, the rapport-building process may
be more critical in unstructured interviews, as the absence of a
predetermined structure requires a higher level of participant
comfort and engagement.
Open-ended questions encourage participants to provide detailed
and nuanced responses, offering valuable qualitative data.
Closed-ended questions, on the other hand, can yield quantifiable
data and aid in the standardization of responses.
Avoid Ambiguity and Jargon: Craft questions with precision,
avoiding ambiguous language or disciplinary jargon that may
confuse participants. Clarity in language ensures that respondents
interpret questions consistently, reducing the likelihood of
miscommunication and enhancing the reliability of data.
Pilot Test the Protocol: Conduct a pilot test with a small sample
to refine the interview protocol. Assess participant
comprehension, identify potential ambiguities, and gauge the
overall effectiveness of the questions. Iterative refinement based
on pilot testing enhances the quality of the final interview
protocol.
Include Probing Techniques: Integrate probing techniques to
elicit deeper responses. Probing involves follow-up questions that
encourage participants to expand on their initial answers,
providing richer insights. Examples of probing techniques include
asking for clarification, requesting examples, or exploring
alternative perspectives.
Consider Cultural Sensitivity: Ensure that questions are
culturally sensitive and applicable to the diverse backgrounds of
participants. Avoid assumptions based on cultural stereotypes and
strive for inclusivity in language and content to enhance the
relevance of the interview protocol across different populations.
Maintain Neutrality and Avoid Leading Questions: Formulate
questions in a neutral manner to prevent bias and leading
participants toward specific responses. Neutral phrasing fosters
an environment in which participants feel comfortable expressing
their genuine perspectives without feeling guided or influenced.
Prioritize Conciseness: Craft questions with brevity and clarity.
Concise questions are easier for participants to comprehend and
answer accurately.
1.4.3 Observations
Observational research, a methodological approach employed across
diverse disciplines, involves the systematic and unobtrusive observation
of phenomena in their natural settings. This method serves as a powerful
means of collecting data, capturing the complexity and richness of
behaviors, interactions, and contexts within real-world environments.
Observational research holds particular significance in fields such as
psychology, sociology, anthropology, education, and environmental
science, offering researchers a unique lens through which to explore and
understand the intricacies of human behavior, social dynamics, and
ecological systems. Unlike self-report measures or structured interviews,
direct observation allows for the examination of behaviors as they unfold
naturally, affording researchers unparalleled insights into the nuances,
patterns, and contextual factors that shape the phenomena under
investigation. This method's inherent capacity to provide a holistic and
contextually embedded understanding makes observational research an
invaluable tool for advancing knowledge and informing evidence-based
practices across various academic domains.
Purpose and Types of Observations:
Purpose of Observational Research:
Observational research serves as a pivotal methodology within the realm
of scientific inquiry, offering a nuanced and context-rich approach to data
collection. Its primary purpose lies in the systematic and unobtrusive
observation of phenomena in their natural settings, enabling researchers to
glean valuable insights into human behavior, social interactions, and
environmental dynamics. By immersing oneself in the authentic context
of the subject under investigation, observational research seeks to provide
a genuine portrayal of occurrences, devoid of the potential biases
introduced by controlled environments or self-reported data.
One of the paramount objectives of employing observational research is
the pursuit of a comprehensive understanding of behavior and events as
they naturally unfold. Through careful and objective observation,
researchers can capture the intricacies of human conduct, discern
patterns, and uncover underlying dynamics that may elude detection in
more artificial or contrived settings. This method allows for the
exploration of the intricate interplay between variables, facilitating a
holistic comprehension of complex phenomena within their ecological
niches.
Observational research proves particularly valuable in scenarios where
participants may be unable or unwilling to articulate their experiences
accurately, or where the phenomenon of interest manifests spontaneously
and unpredictably. In fields such as psychology, sociology, and
anthropology, where human behavior and social dynamics constitute
focal points of investigation, observational research offers an
unparalleled avenue for uncovering the subtle nuances that shape
individuals' actions and interactions. Additionally, in ecological studies,
naturalistic observation enables the examination of wildlife behavior and
environmental dynamics with minimal interference, preserving the
authenticity of the observed behaviors.
Furthermore, observational research contributes to the validation and
refinement of theoretical frameworks by grounding abstract concepts in
real-world contexts. It allows researchers to bridge the gap between
theoretical constructs and empirical realities, fostering a more robust
foundation for subsequent analyses and interpretations. This method also
facilitates the identification of novel research questions, paving the way
for further exploration and hypothesis generation.
Types of Observations:
Observational research manifests itself in various forms, each tailored to
specific research goals and the nature of the phenomena under
investigation. Three prominent types of observations—participant
observation, non-participant observation, and naturalistic observation—
serve as distinct methodological approaches, each offering unique
advantages based on the researcher's objectives.
It is particularly advantageous when the aim is to observe behaviors
that may be influenced by the context, and when the researcher seeks
to avoid the potential biases introduced by laboratory settings.
1.4.4 Experiments
Experimental research, as a distinguished and methodologically rigorous
approach, occupies a paramount position in the arsenal of data collection
methods within the scientific domain. Defined by its systematic
manipulation of independent variables to observe their effects on
dependent variables, experimental research stands as an invaluable tool for
investigating causal relationships and discerning patterns in the intricate
fabric of phenomena. This methodological framework, characterized by
its structured design and controlled conditions, facilitates the isolation of
specific factors for meticulous examination, offering unparalleled insights
into the dynamics underlying diverse phenomena.
Purpose and Types of Experiments:
Purpose of Experimental Research:
Experimental research serves as a crucial methodological approach in the
realm of scientific inquiry, primarily aimed at achieving distinct
objectives. The fundamental purpose of conducting experiments is to
systematically investigate and understand phenomena by manipulating
variables in a controlled environment. This method allows researchers to
explore causal relationships between variables, shedding light on the
cause-and-effect dynamics inherent in the phenomena under
investigation.
The primary objective of experimental research is to contribute to the
advancement of knowledge by providing empirical evidence and testing
hypotheses. Through a carefully designed experimental setup,
researchers can manipulate independent variables while controlling for
potential confounding factors. This meticulous control enables them to
observe changes in the dependent variable and, consequently, discern
any causal relationships that may exist.
Crucially, experiments possess a unique ability to establish cause-and-
effect relationships, a feat not easily attainable through other research
designs. By systematically manipulating one or more variables and
observing their impact on the outcome, researchers can draw more
definitive conclusions about the factors influencing a particular
phenomenon. This cause-and-effect clarity is instrumental in building a
solid foundation for scientific theories and contributing to the cumulative
body of knowledge within a given field.
Types of Experiments:
Experimental designs play a pivotal role in shaping the structure and
implementation of research studies. Three prominent types of
experimental designs include between-subjects, within-subjects, and
factorial designs. The selection of a specific design depends on the
research goals, questions posed, and the nature of the phenomena under
investigation.
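As an illustrative sketch of analyzing a factorial, between-subjects design in R (the factors, effect sizes and sample sizes below are invented for demonstration):

    set.seed(99)
    # Two crossed factors: training method (A/B) and feedback (yes/no)
    design <- expand.grid(method = c("A", "B"),
                          feedback = c("yes", "no"),
                          replicate = 1:20)
    # Simulated outcome with a main effect of method and of feedback
    design$score <- 60 +
      ifelse(design$method == "B", 5, 0) +
      ifelse(design$feedback == "yes", 3, 0) +
      rnorm(nrow(design), sd = 4)

    # Factorial ANOVA with an interaction term
    fit <- aov(score ~ method * feedback, data = design)
    summary(fit)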
1.5 PRESENTATION OF NUMERICAL AND CATEGORICAL DATA
Research findings involve two fundamental types of data: numerical and categorical. Numerical data, characterized by
quantitative values, encapsulates measurable quantities, while categorical
data, defined by distinct categories or labels, represents qualitative
information. Effectively presenting these two types of data is essential for
conveying research outcomes with accuracy and coherence.
The zero point in ratio data is meaningful and represents a complete
absence of the characteristic being measured. Examples of ratio data
include height, weight, and income. For instance, a height of 0 cm implies
the absence of height, making the zero point meaningful. Similarly, a
weight of 0 kg signifies the absence of weight. Ratio data allows for
meaningful ratios and comparisons, as values can be compared in terms of
multiplication or division.
Descriptive Statistics for Numerical Data:
Measures of Central Tendency: Measures of central tendency are
statistical tools used to summarize and describe the central or average
value of a dataset. Three commonly employed measures in this regard
are the mean, median, and mode. Each of these measures provides a
different perspective on the central tendency of a dataset and is suitable
under specific circumstances.
Mean: The mean, often referred to as the average, is calculated by
summing up all values in a dataset and dividing the total by the number
of observations. This measure is appropriate when dealing with a dataset
that is normally distributed or follows a symmetrical pattern. For
instance, when examining the average income of a population or the
average test scores of students, the mean is a reliable measure. However,
the mean can be sensitive to extreme values or outliers, making it less
suitable for skewed distributions.
Median: The median is the middle value of a dataset when it is arranged
in ascending or descending order. If there is an even number of
observations, the median is the average of the two middle values. The
median is particularly useful in scenarios where the data is skewed or
contains outliers. For example, when analyzing income data, the median
provides a more robust measure than the mean because it is less
influenced by extreme values. It accurately represents the center of the
distribution without being skewed by outliers.
Mode: The mode represents the most frequently occurring value in a
dataset. Unlike the mean and median, the mode is not affected by
extreme values or the shape of the distribution. The mode is appropriate
for categorical data or datasets with distinct peaks. For instance, in a survey of preferred brands, the mode identifies the single most popular choice.
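A short sketch in R follows; note that the mode helper below is a user-defined function, because base R's mode() reports a variable's storage type rather than the statistical mode:

    values <- c(12, 15, 15, 18, 22, 22, 22, 35, 110)   # note the outlier 110

    mean(values)     # pulled upward by the extreme value
    median(values)   # robust middle value

    # Statistical mode: the most frequently occurring value
    stat_mode <- function(x) {
      tab <- table(x)
      as.numeric(names(tab)[which.max(tab)])
    }
    stat_mode(values)   # returns 22, the most common observation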
1.5.2 Categorical Data
Unlike numerical data, which involves measurable quantities, categorical
data encompasses non-numeric information that falls into distinct classes
or labels. The prevalence of categorical data is ubiquitous across various
fields, playing a pivotal role in capturing and interpreting information
that is not inherently numerical.
In fields such as sociology, psychology, and market research, categorical
data is commonly employed to classify individuals based on
characteristics such as gender, marital status, or consumer preferences.
Medical studies utilize categorical data to categorize patients into
different diagnostic groups, while educational research may classify
students based on academic performance or learning styles. Categorical
variables are also integral in areas like linguistics, where language
elements are categorized into phonetic, syntactic, or semantic classes.
The nature of categories in categorical data is inherently discrete and
qualitative. These categories may represent nominal variables, where
there is no inherent order or ranking, such as colors or types of fruits.
Alternatively, they can represent ordinal variables, where there is a
meaningful order but the differences between categories are not uniform,
as seen in survey responses like "strongly agree," "agree," "neutral,"
"disagree," and "strongly disagree."
Types of Categorical Data:
Nominal vs. Ordinal Data: Researchers often encounter two
fundamental types: nominal and ordinal. These designations are pivotal
for understanding and interpreting data, as they convey distinct levels of
information hierarchy.
Nominal data is characterized by categories that are devoid of any
inherent order or ranking. In other words, the classification is merely
nominal or in name. These categories serve as labels for different groups
without implying any particular sequence or significance. For instance,
when analyzing the favorite colors of a group of individuals, the resulting
data – such as "red," "blue," or "green" – falls under the umbrella of
nominal categorization. In this context, the colors are merely labels and
do not possess a prescribed order.
Exhaustiveness: The two categories together encompass all
possible outcomes. Every observation must belong to either
category, leaving no room for unclassified or undefined instances.
Colour    Frequency
Red       15
Blue      20
Green     12
Yellow     8
Orange     5
Bar Charts:
Bar charts are effective graphical representations used to visually convey
the distribution of categorical data. They are particularly suitable for
illustrating the frequency or proportion of different categories within a
dataset. Bar charts provide a clear and accessible way to interpret
categorical information, making them a popular choice in data
visualization.
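A minimal sketch that draws a bar chart (and, for comparison, a pie chart) of the colour frequencies tabulated above:

    # Frequencies from the table above
    colour_counts <- c(Red = 15, Blue = 20, Green = 12, Yellow = 8, Orange = 5)

    barplot(colour_counts,
            main = "Frequency of favourite colours",
            xlab = "Colour",
            ylab = "Frequency",
            col = c("red", "blue", "green", "yellow", "orange"))

    # Pie chart of the same distribution, showing proportions
    pie(colour_counts, main = "Share of each colour")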
Bar Charts for Categorical Data:
1.6 LET US SUM UP
Surveys, employing questionnaires or interviews, systematically
collect quantitative data, making them effective for large-scale
information gathering on preferences and opinions.
Interviews allow researchers to explore nuances and gather
detailed qualitative data by engaging directly with participants
and asking open-ended questions.
Observational studies systematically observe and record events or
behaviors, particularly useful when studying natural settings
without interference.
Experiments manipulate variables to observe outcomes,
facilitating the establishment of cause-and-effect relationships in
understanding influencing factors.
1.7 KEYWORDS
Refer 1.2 for Answer to check your progress- 1 Q. 2
Measures of central tendency, including means, medians, and modes, help
summarize the central or typical values in a dataset. They provide a clear
reference point for understanding the distribution of data, aiding
researchers in describing and comparing different sets of information with
a single, standardized value.
Refer 1.3 for Answer to check your progress- 2 Q. 1
Primary data provides researchers with maximum control over data
collection methods, allowing for tailored approaches like field surveys and
interviews. It offers flexibility to address specific research questions and
ensures currency, as data is collected firsthand for the current analysis
objectives.
Refer 1.3 for Answer to check your progress- 2 Q. 2
Secondary data analysis faces limitations such as incomplete capturing of
target behaviors and the need for careful review of metadata and data
collection methods. Researchers must assess the fit of available variables
and cohorts with their research aims and consider data-generating
processes for appropriate interpretation.
Refer 1.4 for Answer to check your progress- 3 Q. 1
Quantitative techniques, such as surveys, focus on numerical data gathered
from representative samples to characterize distributions, frequencies, and
correlations. In contrast, qualitative approaches, like focus groups and in-
depth interviews, collect non-numerical data, exploring subjective
narratives and contextual meanings not easily captured by quantifiable
metrics.
Refer 1.4 for Answer to check your progress- 3 Q. 2
Alignment with research questions, disciplinary conventions, participant
accessibility, investigator skills, and resource availability is crucial for
effective data collection. It ensures that the chosen methodology is well-
suited to extract relevant information and insights, enhancing the validity
and reliability of the research findings.
Refer 1.5 for Answer to check your progress- 4 Q. 1
Numerical data, being measurable and precise, enables researchers to
conduct rigorous analyses and draw statistical inferences, providing a