
UNIT 2

RESEARCH DESIGN AND DATA COLLECTION
Statistical Design of Experiments (DOE)

Statistical Design of Experiments (DOE) is a structured and systematic approach used in research and industrial settings to plan, conduct, and analyze experiments in a way that allows for the efficient and effective exploration of variables and their interactions.

DOE is particularly valuable in fields such as engineering, manufacturing, quality control, and scientific research, where it is essential to optimize processes, products, or systems while minimizing the number of experimental runs and resources required.
DOE in AI

● The application of Statistical Design of Experiments (DOE) in the field of artificial intelligence (AI) involves using experimental methodologies to systematically and efficiently explore the various factors and parameters that affect AI model performance and behavior.
● This approach helps researchers and practitioners optimize AI models, algorithms, and training processes while minimizing resource consumption and time.
● DOE is valuable for fine-tuning, architecture selection, data augmentation, and model interpretability in AI.
BASIC PRINCIPLES OF DOE

● The basic principles of experimental design are:

1) Randomization
2) Replication
3) Local Control

● These principles make a valid test of significance possible.


Randomization
Randomization in statistical experimental design is a critical methodological technique used to minimize bias and control for potential sources of variability that could affect the results of an experiment. It involves the random assignment of experimental units to different treatment groups or conditions.

Random Assignment: In a controlled experiment, researchers typically have one or more groups of experimental units (e.g., individuals, subjects, objects) to which they want to apply different treatments or conditions. Randomization ensures that each experimental unit has an equal chance of being assigned to any of the treatment groups. This helps eliminate any systematic bias that might arise if subjects were not assigned randomly.
Balancing Effects: Randomization helps balance the effects of potential confounding
variables or sources of variation that are not controlled in the experiment. By randomly
distributing these factors across treatment groups, researchers can assume that, on
average, these factors will have an equal impact on each group.
Statistical Validity: Randomization is fundamental to the validity of statistical tests and
analyses used to draw conclusions from the experiment. When treatments are
randomly assigned, statistical methods can make valid inferences about the
population from which the experimental units were drawn.
Minimizing Bias: It minimizes selection bias and ensures that the groups being
compared are as similar as possible at the outset of the experiment. This is important
because it allows researchers to attribute any differences in outcomes between
groups to the treatments themselves, rather than to pre-existing differences among
subjects.
Enhancing Generalizability: Randomization increases the generalizability of study
findings. If subjects are randomly assigned, the results are more likely to be applicable
to a broader population.
Ethical Considerations: Randomization can also be a fair way to assign individuals to
different treatment groups in situations where it may not be ethically justifiable to do
otherwise.
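As a minimal illustration (not from the original slides), the Python sketch below randomly assigns twenty hypothetical experimental units to a treatment group and a control group. The unit names and the random seed are made up for the example:

```python
import random

units = [f"subject_{i}" for i in range(1, 21)]  # 20 hypothetical experimental units
random.seed(42)        # fixed seed so the assignment is reproducible
random.shuffle(units)  # a random order removes any systematic pattern in assignment

# Split the shuffled list into two equal-sized groups
treatment_group = units[:10]
control_group = units[10:]

print("Treatment:", treatment_group)
print("Control:  ", control_group)
```

Because the split happens after shuffling, each unit had the same chance of landing in either group, which is exactly the property random assignment requires.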
REPLICATION
Replication in statistical experimental design refers to the practice of conducting multiple,
independent repetitions of an experiment under the same or similar conditions. Replication
is a fundamental concept in experimental design and plays a crucial role in the validity and
reliability of experimental results. Here's why replication is important:

Enhancing Reliability: Replication increases the reliability of experimental results. By repeating the experiment multiple times, researchers can assess the consistency of their findings. If the results are consistently reproduced across replications, it provides greater confidence in the validity of the conclusions.
Reducing Random Variation: In any experimental study, there is inherent variability or
"noise" in the data due to random factors. Replication helps to reduce the impact of
this random variation. When results are consistent across multiple replications, it
becomes more likely that the observed effects are real and not merely the result of
random chance.
Generalizability: Replication allows researchers to assess the generalizability of their
findings. If the same results are obtained in multiple replications, it suggests that the
observed effects are not specific to a particular set of conditions but may apply
more broadly.
Detection of Outliers or Anomalies: Replication can help identify outliers or
anomalies in the data. If a single replication produces unusual results, it may be an
indication of errors or unexpected factors. Repetition allows researchers to
distinguish between consistent patterns and aberrations.
Statistical Analysis: Replication is essential for conducting statistical analyses. Many
statistical tests require multiple data points to calculate measures of central
tendency, variance, and significance. Replication provides the necessary data for
robust statistical testing.
Control of Extraneous Variables: Replication helps control for the influence of
extraneous variables or sources of variability that may affect the results. By
conducting replications, researchers can assess whether the effects of interest
persist even when other factors vary.
Validation of Hypotheses: Replication is a critical step in the scientific process of
hypothesis testing. To establish a hypothesis as valid and reliable, it must be
tested and confirmed through repeated experimentation.
Peer Review and Scientific Credibility: Replication is a cornerstone of the
scientific method. Scientific findings are often considered more credible and
trustworthy when they have been independently replicated by different
researchers or research groups.
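To see why replication reduces random variation numerically, here is a small illustrative Python sketch using simulated data (the effect size and noise level are assumed for the example). It shows how the standard error of the mean shrinks as the number of replicates grows:

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect, noise_sd = 5.0, 2.0  # hypothetical true response and measurement noise

# The standard error of the mean shrinks roughly as 1/sqrt(n) with more replicates
for n_replicates in (3, 10, 30):
    measurements = true_effect + rng.normal(0, noise_sd, size=n_replicates)
    se = measurements.std(ddof=1) / np.sqrt(n_replicates)
    print(f"n={n_replicates:2d}  mean={measurements.mean():.2f}  SE={se:.2f}")
```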
LOCAL CONTROL
Local control is a principle in the design of experiments that focuses on controlling or
accounting for variability at a specific, localized level within an experiment. It is one of
the fundamental principles used to ensure the validity and reliability of experimental
results. Here's an explanation of local control in the context of the design of
experiments:

Identification of Sources of Variability: In any experimental setting, there are various sources of variability that can affect the outcome or response being measured. These sources can include natural variations, uncontrollable factors, or external influences.
Local Control at the Source: Local control involves identifying and controlling
these sources of variability at their origin or source, as close as possible to where
they occur within the experiment. The goal is to minimize or eliminate their impact
on the response variable.
Experimental Design: To implement local control, experimental designers may
use specific strategies and techniques. These can include:
● Randomization: Randomly assigning experimental units to treatment
groups to ensure that any uncontrolled variables are equally distributed
among the groups.
● Blocking: Grouping experimental units with similar characteristics or
potential sources of variability (e.g., age, location, time) to create
homogeneous blocks. Treatments are then randomly assigned within each
block.
● Control Variables: Holding specific variables constant to prevent their
influence on the response variable.
Advantages of Local Control:
● Increased Precision: Controlling variability at the source can lead to more precise
and accurate results because it minimizes the interference of uncontrolled factors.
● Improved Validity: By addressing potential sources of bias or confounding at the
local level, researchers can draw more valid conclusions from the experiment.
● Enhanced Reproducibility: Experiments with local control are often more
reproducible, as the effects of extraneous variables are minimized or accounted
for.
Examples:
● In a medical study evaluating the effectiveness of a new drug, local control might
involve randomizing patients to treatment groups and ensuring that patients with
similar baseline characteristics (e.g., age, gender, disease severity) are evenly
distributed across groups.
● In an agricultural experiment testing the effects of a new fertilizer, local control
might involve blocking the experiment based on soil types and ensuring that each
type receives all treatments.
Types of DOE

Design of Experiments (DOE) encompasses various types of experimental designs, each tailored to specific research objectives and situations. The key types of experimental designs in DOE include:

Completely Randomized Design (CRD):


● In CRD, subjects or experimental units are
randomly assigned to different treatment
groups.
● It is often used when there is no natural
grouping or blocking of experimental units.
● Suitable for experiments where the primary
goal is to assess the effect of a single
treatment factor.
Types of DOE
Randomized Complete Block Design (RCBD):

● RCBD divides experimental units into homogeneous groups or blocks based on a known source of variability (e.g., age groups, geographic locations).
● Within each block, random assignment of treatments is performed.
● This design helps control the effects of the blocking variable and increases the precision of estimates.
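A minimal Python sketch of blocked randomization, assuming three hypothetical age-group blocks and three treatments labeled A, B, C. Within each block, every treatment is assigned exactly once, in random order:

```python
import random

random.seed(1)
blocks = {
    "young":  ["u1", "u2", "u3"],
    "middle": ["u4", "u5", "u6"],
    "old":    ["u7", "u8", "u9"],
}
treatments = ["A", "B", "C"]

# Randomize the treatment order independently within each block
for block, units in blocks.items():
    shuffled = random.sample(treatments, k=len(treatments))
    for unit, trt in zip(units, shuffled):
        print(f"block={block:6s} unit={unit} -> treatment {trt}")
```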
Types of DOE

Factorial Design:
● Factorial designs involve the
simultaneous study of two or more
factors (independent variables)
and their interactions.
● It helps assess how multiple
factors influence the response
variable.
● Factorial designs can be 2x2 (two
factors with two levels each), 3x2,
2x3, and so on, depending on the
number of factors and levels.
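As an illustration, the run list for a 2x2 full factorial design can be enumerated with itertools.product. The factor names and levels below (temperature, pressure) are hypothetical:

```python
from itertools import product

# Two hypothetical factors, two levels each -> a 2x2 full factorial (4 runs)
temperature = [150, 200]  # levels of factor 1
pressure = [10, 20]       # levels of factor 2

runs = list(product(temperature, pressure))
for i, (temp, pres) in enumerate(runs, start=1):
    print(f"run {i}: temperature={temp}, pressure={pres}")
```

With k factors at 2 levels each, the same enumeration yields 2^k runs, which is why fractional designs (next slide) become attractive as k grows.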
Types of DOE

Fractional Factorial Design:


● Fractional factorial designs are a
subset of full factorial designs
where only a fraction of all
possible factor combinations is
tested.
● These designs are used when
there are too many factors to test
exhaustively, allowing
researchers to reduce the number
of experimental runs.
Types of DOE

Response Surface Design:


● Response surface designs
are used to optimize and
model complex relationships
between multiple factors and
the response variable.
● These designs involve
creating a mathematical
model that describes the
response surface (e.g.,
quadratic or cubic
relationships).
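A minimal single-factor sketch of the idea, using made-up data and numpy's polyfit to fit a quadratic response surface model and locate its stationary point:

```python
import numpy as np

# Hypothetical single-factor data: the response peaks somewhere near x = 3
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([4.1, 7.9, 9.2, 8.1, 4.3])

# Fit a second-order (quadratic) response surface model: y = b2*x^2 + b1*x + b0
b2, b1, b0 = np.polyfit(x, y, deg=2)

# The stationary point of a quadratic is where the derivative is zero: x* = -b1 / (2*b2)
x_opt = -b1 / (2 * b2)
print(f"model: y = {b2:.2f}x^2 + {b1:.2f}x + {b0:.2f}, optimum near x = {x_opt:.2f}")
```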
Types of DOE

Latin Square Design:


● Latin square designs are useful
when there are two sources of
variability to control, often seen in
food science and agriculture.
● They involve arranging treatments
or factors in a square grid,
ensuring that each treatment
occurs once in each row and
column.
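A short sketch that builds a 4x4 Latin square by cyclic shifts, so each of the hypothetical treatments A-D appears exactly once in every row and every column:

```python
# A 4x4 Latin square built by cyclic shifts of the treatment list
treatments = ["A", "B", "C", "D"]
n = len(treatments)

square = [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]
for row in square:
    print(" ".join(row))
```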
Types of DOE
Split-Plot Design:
● When some factors (independent variables) are difficult or impossible to change in an experiment, a completely randomized design isn't possible.
● The result is a split-plot design, which has a mixture of hard-to-randomize (or hard-to-change) and easy-to-randomize (or easy-to-change) factors. The hard-to-change factors are implemented first, followed by the easier-to-change factors.
● Example: study the effects of two irrigation methods (factor 1) and two different fertilizer types (factor 2) on four different fields ("whole plots").
● Fixed or hard-to-change factor: in this example, the irrigation method.
● Non-fixed or easy-to-change factor within each plot: in this example, the fertilizer.
Types of DOE

Nested Design:
● Nested designs are appropriate
when the experimental units are
hierarchically structured or
nested within larger groups.
● For example, in educational
research, students may be
nested within classrooms, which
are in turn nested within schools.
Types of DOE
Repeated Measures Design:
● Repeated measures designs involve multiple measurements on the same
subjects over time or under different conditions.
● Commonly used in longitudinal studies or when assessing changes in response
over time
Types of DOE

Sequential Design:
● Sequential designs
involve collecting data in
stages, with the option to
stop the experiment
early if statistically
significant results are
observed.
● They are often used in
clinical trials and quality
control processes.
Types of DOE
Taguchi Design:
Taguchi designs, developed by Japanese engineer Genichi
Taguchi, are aimed at optimizing processes and products by
identifying factors that influence variability and quality.
Factors and Levels: In a Taguchi experiment, factors are variables
that can be adjusted or controlled, and levels are the different
settings or values that each factor can take. Factors can be
classified as controllable (those you can change) and
uncontrollable (those beyond your control).
Orthogonal Arrays: Taguchi designs often use orthogonal arrays,
which are structured tables that specify the combinations of factor
levels to be tested in an experiment. Orthogonal arrays ensure
efficient and systematic experimentation by reducing the number of
experimental runs required while still providing valuable
information.
Control Factors and Noise Factors: Taguchi experiments
distinguish between control factors (those factors you want to
optimize) and noise factors (factors that introduce variability but are
not the focus of optimization). The goal is to find settings of control
factors that are robust to variations caused by noise factors.
Signal-to-Noise (S/N) Ratios: Taguchi introduced the concept of
signal-to-noise ratios as objective functions for optimization. S/N
ratios measure the sensitivity of the system's performance to
variations in factors. The higher the S/N ratio, the more robust the
performance against variation caused by noise factors.
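The standard Taguchi S/N formulas are straightforward to compute. The sketch below implements the common "larger-the-better" and "smaller-the-better" ratios; the replicate measurements are made up for illustration:

```python
import numpy as np

def sn_larger_is_better(y):
    """Taguchi 'larger-the-better' S/N ratio: -10 * log10(mean(1 / y^2))."""
    y = np.asarray(y, dtype=float)
    return -10 * np.log10(np.mean(1.0 / y**2))

def sn_smaller_is_better(y):
    """Taguchi 'smaller-the-better' S/N ratio: -10 * log10(mean(y^2))."""
    y = np.asarray(y, dtype=float)
    return -10 * np.log10(np.mean(y**2))

# Hypothetical replicate measurements for one factor-level combination
print(f"larger-is-better S/N:  {sn_larger_is_better([8.2, 7.9, 8.5]):.2f} dB")
print(f"smaller-is-better S/N: {sn_smaller_is_better([0.12, 0.15, 0.11]):.2f} dB")
```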
Data Types in DOE
In Design of Experiments (DOE), data types refer to the nature of the data that are collected
during an experimental study. Understanding the data types is crucial for designing the
experiment, choosing the appropriate statistical analysis, and drawing valid conclusions.

Continuous Data:
● Continuous data, also known as quantitative or numerical data, represent measurements that
can take any value within a range.
● These data are typically obtained through instruments or measurements that provide numeric
results.
● Examples of continuous data include:
● Length measurements (e.g., height, width, distance).
● Temperature readings.
● Weight or mass measurements.
● Time intervals.
● Concentrations (e.g., chemical concentrations).
● Continuous data are often analyzed using statistical techniques such as t-tests, analysis of
variance (ANOVA), regression analysis, and correlation analysis.
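For example, a two-sample t-test on continuous measurements can be run with scipy; the group values below (weights in grams) are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical continuous measurements (weights in grams) from two treatment groups
group_a = np.array([50.1, 51.3, 49.8, 52.0, 50.6])
group_b = np.array([53.2, 54.0, 52.8, 53.5, 54.1])

# Two-sample t-test: do the group means differ significantly?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```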
Data Types in DOE

Categorical Data:

● Categorical data, also known as qualitative or nominal data, represent categories or groups into which data can be classified.
● These data are not numeric, and they describe qualities, characteristics, or group memberships.
● Examples of categorical data include:
➢ Gender (categories: male, female).
➢ Types of materials (e.g., wood, plastic, metal).
➢ Colors (e.g., red, blue, green).
➢ Yes/no responses or binary outcomes (e.g., pass/fail).
➢ Customer satisfaction ratings (e.g., low, medium, high).
● Categorical data are often analyzed using techniques such as chi-square tests,
contingency tables, and logistic regression.
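As an illustration, here is a chi-square test of independence on a hypothetical contingency table of material type versus pass/fail outcome:

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table: material type (rows) vs pass/fail outcome (columns)
observed = np.array([
    [30, 10],   # wood:    30 pass, 10 fail
    [25, 15],   # plastic: 25 pass, 15 fail
    [35,  5],   # metal:   35 pass,  5 fail
])

# Test whether outcome is independent of material type
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
```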
Data Types in DOE

Ordinal Data:

Ordered Categories: Ordinal data consist of categories or levels that have a natural order or ranking. This ranking implies that one category is "higher" or "lower" than another, but the intervals between categories may not be equal.

Examples of Ordinal Data:

● Likert scale responses (e.g., strongly agree, agree, neutral, disagree, strongly
disagree).
● Educational attainment (e.g., high school diploma, bachelor's degree, master's
degree).
● Socioeconomic status (e.g., low income, middle income, high income).
● Customer satisfaction ratings (e.g., very satisfied, somewhat satisfied, neutral,
somewhat dissatisfied, very dissatisfied).
Data Types in DOE
Ordinal data are non-numeric and are often represented using labels or descriptive terms.
These labels indicate the position or ranking of each category.

Arithmetic operations (e.g., addition, subtraction, multiplication) are not meaningful for
ordinal data because the intervals between categories may not be equal or well-defined.
Therefore, calculating means or medians may not provide accurate information about the
data.

Rank-Based Analysis: When analyzing ordinal data, rank-based statistical methods are
often used. These methods focus on the order or ranking of the categories rather than their
numeric values.

Non-parametric tests like the Wilcoxon signed-rank test or the Mann-Whitney U test are
commonly used for ordinal data analysis.

Spearman's rank correlation coefficient is used to assess the strength and direction of
relationships between ordinal variables.
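A short sketch of both rank-based approaches, using invented Likert-style codes (the integers are rank labels only, not true numeric values):

```python
from scipy import stats

# Hypothetical Likert responses (1 = strongly disagree ... 5 = strongly agree)
group_a = [4, 5, 3, 4, 5, 4]
group_b = [2, 3, 2, 1, 3, 2]

# Mann-Whitney U test compares the two groups using ranks, not raw values
u_stat, p_value = stats.mannwhitneyu(group_a, group_b)
print(f"U = {u_stat}, p = {p_value:.4f}")

# Spearman's rank correlation between two hypothetical ordinal variables
satisfaction = [1, 2, 3, 4, 5]
loyalty = [1, 3, 2, 4, 5]
rho, p = stats.spearmanr(satisfaction, loyalty)
print(f"rho = {rho:.2f}, p = {p:.4f}")
```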
Classification problems in design of experiments

● In Design of Experiments (DOE), classification problems refer to situations where the primary goal is to categorize or classify observations or experimental units into distinct groups or categories based on certain characteristics or factors.
● These classification problems are common in various experimental and research settings.
Classification problems in design of experiments

Computer Vision:

● Object Detection: Identifying and locating objects within images or video frames.
● Image Classification: Categorizing images into predefined classes or
categories.
● Facial Recognition: Recognizing and verifying individuals' faces.
● Gesture Recognition: Classifying hand gestures or movements in real-time.
● Scene Classification: Categorizing scenes or environments based on visual
information.
● Anomaly Detection: Identifying unusual or unexpected patterns in images.
Classification problems in design of experiments

Natural Language Processing (NLP):

● Sentiment Analysis: Determining the sentiment (positive, negative, neutral) in text data.
● Text Classification: Categorizing text documents into topics or classes.
● Named Entity Recognition (NER): Identifying and classifying entities (e.g.,
names of people, places, organizations) in text.
● Machine Translation: Automatically translating text from one language to
another.
● Speech Recognition: Converting spoken language into written text.
● Text Summarization: Generating concise summaries of long text documents.
Classification problems in design of experiments
Machine Learning:
● Binary Classification: Classifying data points into one of two categories (e.g., spam or not spam); a minimal sketch follows after this list.
● Multiclass Classification: Categorizing data into multiple distinct classes.
● Anomaly Detection: Identifying unusual or anomalous data points in a dataset.
● Recommender Systems: Predicting user preferences and recommending items
(e.g., movies, products).
● Customer Churn Prediction: Predicting which customers are likely to leave a
service or product.
● Fraud Detection: Identifying fraudulent transactions or activities.
● Disease Diagnosis: Classifying medical data to diagnose diseases.
● Handwriting Recognition: Recognizing handwritten characters or symbols.
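As a minimal sketch of binary classification (synthetic data standing in for a real task such as spam detection; scikit-learn is assumed available):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class dataset standing in for, e.g., spam vs. not-spam features
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit a simple classifier and evaluate on the held-out test split
model = LogisticRegression().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```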
Classification problems in design of experiments
Time Series Data:

● Time Series Classification: Categorizing time series data into different classes or states.
● Forecasting: Predicting future values or trends in time series data.

Computer Security:

● Intrusion Detection: Identifying unauthorized access or malicious activity in computer networks.
● Malware Classification: Detecting and classifying different types of malware.
● Phishing Detection: Identifying phishing emails or websites.
● Access Control: Determining access rights for users or systems.

Autonomous Systems:

● Autonomous Vehicle Perception: Classifying objects and obstacles for self-driving cars.
● Robot Navigation: Navigating robots in complex environments and avoiding obstacles.
● Human Activity Recognition: Identifying and classifying human activities from sensor data.
Classification problems in design of experiments
Biometrics:

● Fingerprint Recognition: Matching and classifying fingerprints for identification.


● Iris Recognition: Identifying individuals based on their iris patterns.

Healthcare and Life Sciences:

● Disease Classification: Diagnosing diseases based on medical data (e.g., X-rays, MRIs).
● Drug Discovery: Classifying compounds for drug development.

Environmental Science:

● Species Classification: Identifying plant or animal species based on images or sounds.

These classification problems represent just a subset of the many applications of AI and
computer science in various domains. Solving these problems often involves the use of
machine learning algorithms and deep learning models to make accurate predictions and
classifications.
Data collection

Data collection is a crucial step in the research and analysis process. It involves
gathering relevant and accurate information to answer research questions or
achieve specific objectives.
There are various methods and tools available for collecting data, each with its
own advantages and limitations.
Here are some common data collection methods and tools:
Data collection – Methods and Tools
1. Surveys and Questionnaires:
- Method: Surveys involve asking individuals a set of standardized questions to gather
information about their opinions, preferences, behaviors, or characteristics.
- Tools: Online survey platforms (e.g., SurveyMonkey, Google Forms), paper-based
questionnaires, interviews (structured, semi-structured, or unstructured), telephone
surveys.
2. Observations:
- Method: Researchers directly observe and record behaviors, events, or activities to
gather data. This can be done in controlled settings (laboratories) or natural environments
(field observations).
- Tools: Notebooks, cameras, video recording equipment, mobile devices for real-time
data entry.
Data collection – Methods and Tools

3. Experiments:
- Method: Controlled experiments involve manipulating variables and observing their effects on the outcome. This method is often used to establish cause-and-effect relationships.
- Tools: Laboratory equipment, experimental setups, sensors, data loggers.

4. Case Studies:
- Method: In-depth examination of a single individual, group, or situation to gain
a comprehensive understanding of a specific context.
- Tools: Interviews, observations, archival data, documents, photographs.
Data collection – Methods and Tools
5. Secondary Data Analysis:
- Method: Analyzing existing data that was collected for other purposes. This can
include data from sources such as government agencies, research organizations, or
previous studies.
- Tools: Statistical software (e.g., R, Python), database management systems, data
visualization tools.

6. Content Analysis:
- Method: Systematically analyzing textual, visual, or audio content to identify
patterns, themes, and meanings.
- Tools: Text analysis software (e.g., NVivo, ATLAS.ti), image recognition tools,
sentiment analysis tools.
Data collection – Methods and Tools

7. Ethnography:
- Method: Immersing researchers in a cultural or social context to gain insights
into the experiences and perspectives of individuals within that context.
- Tools: Field notebooks, audio or video recording equipment, participant
observation techniques.

8. Social Media Data Collection:


- Method: Collecting and analyzing data from social media platforms to
understand trends, sentiments, and behaviors.
- Tools: Social media scraping tools, APIs, sentiment analysis tools.
Data collection – Methods and Tools
9. Sensor Data Collection:
- Method: Collecting data from sensors, IoT devices, and other instruments to monitor
physical phenomena and environmental conditions.
- Tools: Sensor networks, data loggers, IoT platforms.
10. Web Analytics:
- Method: Collecting and analyzing data about user interactions with websites or online
platforms to optimize user experience and marketing efforts.
- Tools: Google Analytics, web tracking tools, heatmaps.
When selecting a data collection method and tools, it's important to consider factors such as
the research objectives, the nature of the data, the target audience, ethical
considerations, and available resources. Each method has its own strengths and
limitations, and the choice should align with the research goals and the type of data needed
for analysis.
