R Lang-Unit-04

The document covers statistical modeling and testing, emphasizing their importance in data analysis across various fields. It explains key concepts such as hypothesis testing, sampling distributions, and different types of statistical tests, including t-tests and ANOVA. Additionally, it discusses the relationship between statistical modeling and machine learning, highlighting their complementary roles in data interpretation and prediction.

Uploaded by

km587522


Nrupathunga University

Department of Computer Science

V Sem BCA (NEP)
Statistical Computing and R Programming Language
Unit -04
Statistical testing and modelling, sampling distributions, hypothesis testing, components of a
hypothesis test, testing means, testing proportions, testing categorical variables, errors and
power, Analysis of variance.
1.Statistical modelling

Statistical modelling uses mathematics and statistics to make assumptions about data and to
reach conclusions from it. Statistical models are used throughout scientific fields, including
engineering, business, and the life sciences. They are a valuable tool for drawing inferences
and making quantitative predictions about data.

These notes define statistical modelling and show where and why it is used. They then look at
the most common forms of modelling, with a published example of regression in action.

2.Definition of Statistical Modeling (2 marks)

In simple terms, statistical modeling is a way to learn and reach meaningful
conclusions from data. A statistical model is defined by a mathematical equation, but defining
its very meaning is a good place to start:

• Statistics: the science of displaying, collecting, and analysing data

• Model: a mathematical representation of a phenomenon
The first step in any statistical analysis is to gather relevant data about the population. The
population is a set of similar items or events that you want to study and acquire information
about.

An example: U.S. voters

For instance, your population could be all U.S. voters. The population’s quantities you’re
interested in are called population parameters. In this case, it could be the approval
percentage for a presidential candidate. It would be impractical (and essentially impossible)
to collect this data on all U.S. voters.
In general, measuring a parameter directly on an entire population is difficult and
often impossible.

With statistical inference, you can estimate the population parameter by measuring it from
part of the population. Choosing that part to collect information on is known as statistical
sampling. Election polls do precisely that to predict winners (or losers) from a sample of U.S.
voters.
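
The idea of estimating a population parameter from a sample can be sketched in R. This is a minimal simulation under assumed values: the population size (100,000), the true approval rate (0.52), and the poll size (1,000) are made up for illustration and do not come from any real poll.

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical population: 1 = approves, 0 = does not approve
population_size <- 100000
true_approval <- 0.52          # assumed population parameter
population <- rbinom(population_size, size = 1, prob = true_approval)

# Statistical sampling: poll a random subset of the population
poll_size <- 1000
poll <- sample(population, poll_size)

# Statistical inference: the sample proportion estimates the parameter
estimate <- mean(poll)
parameter <- mean(population)

cat("Population approval:", round(parameter, 3), "\n")
cat("Poll estimate:      ", round(estimate, 3), "\n")
```

The poll estimate will usually be close to, but not exactly equal to, the population value. That gap is the sampling error that statistical inference has to account for.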

3.Why use statistical modelling?

Predicting outcomes and extracting information are the main goals of data analysis.

The “cultures” of modeling

The late Leo Breiman, a noted statistician, stated there are two ways to approach these goals:
1. Data modeling culture
2. Algorithmic modeling culture
This notion of two cultures translates into a common conflict. How do you decide the best
approach to analyze a given dataset?

Statistical modeling is often referred to as data modeling. Many machine learning
models fall into the category of algorithmic modeling. The two share similar mathematical
underpinnings but differ in their purposes.
Machine learning models suit large datasets and model automation, and they are very
good at identifying hidden patterns in data. They’re an appropriate and necessary tool in
several data science applications, given their strong predictive power. However, the
predictive power of machine learning doesn’t remove the need for statistical modeling.

Statistical models are better at explaining the relationship between variables. They
seek some understanding of the structure underlying the observed data.

Statistics extracts population inferences from a sample, while machine learning
identifies generalizable patterns. More importantly, the approaches complement each other.
If possible, aim for accurate predictions as well as good interpretations.
4.Types of statistical models (2 marks)

Maybe the “simplest” statistical model is the arithmetic mean, or average, of a population
parameter. With this measure, you’re attempting to guess what the expected value is if you
take a random sample from the population.

Regression analysis is an important set of statistical models. It allows you to estimate a
dependent variable using one or more independent variables. Those independent variables are also
known as explanatory variables. A regression model is specified by selecting a functional form
for the statistical model. Following are some of the most common regression models.
1. Linear Regression
2. Multiple Regression
3. Logistic Regression
4. Ridge Regression
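
The first three models in the list can be fitted with base R. The sketch below uses the built-in mtcars data set purely for illustration; the choice of variables (mpg, wt, hp, am) is an assumption made for this example and is not part of the notes.

```r
# Simple linear regression: one explanatory variable
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple regression: several explanatory variables
multiple_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Logistic regression: binary response (am: 0 = automatic, 1 = manual)
logistic_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Estimated coefficients of each model
coef(simple_fit)
coef(multiple_fit)
coef(logistic_fit)
```

Ridge regression is not part of the base stats package. One option is lm.ridge() from the MASS package, which is left out here to keep the sketch free of add-on packages.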

An example: COVID-19 mortality rate (not important)

In a published study, the factors associated with the COVID-19 mortality rate (the dependent
variable) in 169 countries were identified using linear regression.
The researchers performed a simple linear regression analysis to test the correlation
between COVID-19 mortality and the number of tests performed (the independent variable). They
used multiple regression analysis to predict mortality rates from other explanatory
factors (e.g., case numbers, critical cases, hospital bed numbers).

The results suggested that an increase in testing was effective at attenuating mortality when
other means of control were insufficient. Higher mortality was found to be associated with:

• lower test number

• lower government effectiveness
• elderly population
• fewer hospital beds
• better transport infrastructure
5.Statistical modeling can help with: (2 marks)

• Analyzing data
• Making predictions

• Understanding relationships between variables

• Decision-making in various fields, such as finance and healthcare.

6.Define Statistical testing

Statistical testing is a procedure used to determine the likelihood of observing certain
patterns, relationships, or differences in a dataset. It helps researchers draw conclusions
about the population based on sample data.

Statistical testing can also refer to a testing method in software engineering that aims
to identify unreliable software package products.

Some statistical tests include:

• Hypothesis testing: A statistical method that determines if there is enough evidence
in sample data to draw conclusions about a population

• t-test: Compares the means of two samples

• ANOVA tests: Compares the means of more than two groups

Some basic statistical tests include:

• t-Test

• chi-square Test

• Kolmogorov-Smirnov Test (more commonly called the K-S Test)

Statistical testing in R involves using statistical methods to analyze data and make
inferences about population parameters based on sample data. R is a programming language and
environment for statistical computing and graphics, and it provides a wide range of
functions and packages for conducting various statistical tests. Here are some
common statistical tests in R:
• t-test:
Used to compare the means of two groups to determine if they are significantly
different.
Example:
t.test(x, y)
• ANOVA (Analysis of Variance):
Used to compare means across multiple groups.
Example:
anova(lm(response_variable ~ factor_variable, data = your_data))
• Chi-square test:
Used for categorical data analysis to test the association between two categorical
variables.
Example:
chisq.test(table(variable1, variable2))
• Correlation test:
Used to measure the strength and direction of a linear relationship between two
continuous variables.
Example:
cor.test(x, y)
• Regression analysis:
What is Regression Analysis (2 marks)
Used to model the relationship between a dependent variable and one or more
independent variables.
Example:
lm(y ~ x, data = your_data)
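
The calls above use placeholder names such as x, y and your_data. A runnable sketch on R's built-in data sets follows; the data sets (sleep, PlantGrowth, HairEyeColor, mtcars) and the variables compared are chosen here only for illustration.

```r
# t-test: compare mean extra sleep between the two drug groups
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
t_res <- t.test(x, y)

# ANOVA: compare mean plant weight across three treatment groups
anova_res <- anova(lm(weight ~ group, data = PlantGrowth))

# Chi-square test: association between hair colour and eye colour
hair_eye <- margin.table(HairEyeColor, c(1, 2))  # collapse over Sex
chi_res <- chisq.test(hair_eye)

# Correlation test: car weight versus fuel economy
cor_res <- cor.test(mtcars$wt, mtcars$mpg)

# Each result carries a p-value that can be compared with the chosen alpha
c(t = t_res$p.value, anova = anova_res[["Pr(>F)"]][1],
  chisq = chi_res$p.value, cor = cor_res$p.value)
```

Printing any of the result objects (for example t_res) shows the full test report, including the test statistic and, where applicable, the confidence interval.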

6.Sample and Sampling Distribution:

What is a Sample? (2 marks)
A sample is a smaller set of data that a researcher chooses or selects from a
larger population using a pre-defined selection method. Its elements are
known as sample points, sampling units, or observations.
What is Sampling Distribution? (2 marks)
A sampling distribution is a probability distribution of a statistic that is obtained
through repeated sampling of a specific population. It describes a range of possible
outcomes for a statistic, such as the mean or mode of some variable, of a population.
How Sampling Distribution is done for iris dataset? ( 5 marks)
Reference YouTube link: https://youtu.be/Xfdg0xqFjts?si=jAoLfbU3M8RotyZr
(watch this video, which I have explained in class, and learn the code)

Demonstrate Sampling and Sampling Distribution using the Iris Data set

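str(iris)                      # inspect the structure of the built-in iris data
iris_df <- data.frame(iris)
View(iris_df)                  # open the data viewer (interactive sessions only)

iter <- 100                    # number of samples to draw
n <- 5                         # size of each sample
means <- rep(NA, iter)         # storage for the sample means

for (i in 1:iter) {
  each_sample <- sample(iris$Petal.Length, n)   # draw n petal lengths at random
  means[i] <- mean(each_sample)                 # store the mean of this sample
}

hist(means)                    # histogram of the sample means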
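
The effect of the sample size on the sampling distribution can be checked with a small extension of the same idea. This is a sketch under assumptions of its own: the helper name sample_means, the 1000 repetitions, and the two sample sizes (5 and 30) are chosen here for illustration and are not part of the class code.

```r
set.seed(1)
iter <- 1000                               # number of repeated samples

# Draw `iter` samples of size n and return their means
sample_means <- function(n) {
  replicate(iter, mean(sample(iris$Petal.Length, n)))
}

means_small <- sample_means(5)
means_large <- sample_means(30)

# Spread of the sampling distribution shrinks as n grows
sd(means_small)
sd(means_large)

# Both are centred near the mean of all 150 petal lengths
mean(iris$Petal.Length)
mean(means_small)
mean(means_large)
```

A larger sample size gives sample means that cluster more tightly around the population mean, which is why larger samples lead to more precise estimates.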

7.Hypothesis Testing
What is Hypothesis Testing? (2 marks)
Hypothesis testing is a form of statistical inference that uses data from a sample to draw
conclusions about a population parameter or a population probability distribution.
What are Inferential Statistics? (2 marks)
[Figure: samples 1, 2, 3, ... n drawn from a population; together they form the sampling distribution, from which a conclusion is drawn]

If we draw many samples (Sample 1, 2, 3, 4, ... n) from a population, the values of a statistic
computed on those samples form a sampling distribution. Drawing a conclusion about the
population from such samples is called inferential statistics.

[Figure: sample data leading to a conclusion about the population data]
The conclusion that is drawn about the population data, or the hypothesis
made, is tested with various types of hypothesis testing.
8.Components of Hypothesis testing
List the Components of Hypothesis Testing. (2 marks)
The components of Hypothesis testing are
1. Null Hypothesis
2. Alternate Hypothesis
3. Confidence Interval (CI)
4. Significance Level (α)
5. Decision boundary
Define Null Hypothesis. (2 marks)
The null hypothesis is a general statement or default position that says there is no
relationship between two measured phenomena, or no association among groups.
A null hypothesis is denoted by the symbol H0 in statistics. It is usually pronounced
“h-nought” or “H-null”. The subscript of H is the digit 0.
Define Alternate Hypothesis. (2 marks)

The alternative hypothesis is a statement used in statistical inference experiments. It
contradicts the null hypothesis and is denoted by Ha or H1. It is simply an alternative to
the null.
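
The components listed above map onto the inputs and outputs of R's test functions. The sketch below uses a one-sample t-test on the built-in iris data; the hypothesised mean of 5.8 is an arbitrary value assumed for illustration.

```r
alpha <- 0.05                                  # significance level
res <- t.test(iris$Sepal.Length, mu = 5.8,     # H0: mean sepal length is 5.8
              alternative = "two.sided",       # H1: mean is not 5.8
              conf.level = 1 - alpha)          # 95% confidence interval

res$null.value      # value stated by the null hypothesis
res$alternative     # form of the alternative hypothesis
res$conf.int        # confidence interval for the mean
res$p.value         # evidence against H0

# Decision: reject H0 only when the p-value falls below alpha
reject_null <- res$p.value < alpha
reject_null
```

The same fields (null.value, alternative, conf.int, p.value) are available on the objects returned by most of R's classical test functions, so the decision step looks the same across tests.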

Explain the Null and Alternate Hypothesis with an Example (8 marks) (Very Important)

Consider an Example of Tossing a coin

When we toss a coin there may be chances of getting the following

1. 50% may be head and 50% may be tail
2. 60% may be head and 40% may be tail
3. 70% may be head and 30% may be tail

The following graph represents the above chances, which are approximately normally distributed

Figure 1: Representing the Normal Distribution Curve for Hypothesis testing

So if we get 50% head and 50% tail then we get a straight line exactly at the centre

If we get 60% head and 40% tail then we get a straight line between 40 and 60

If we get 70% head and 30% tail then we get a straight line between 30 and 70

So now we can explain the components of the Hypothesis for this problem
Figure 2: Representing the Normal Distribution Curve for Hypothesis testing and its
Components

1. Null Hypothesis : H0 : Coin is Fair

This means that if we toss the coin, the percentage of heads we expect from a fair coin
lies between 30 and 70. If the result falls in this range, we accept (do not reject) the
Null Hypothesis, which is then supported by the experiment.

2. Alternate Hypothesis: H1 or Ha: Coin is not fair

This means that if we toss the coin and the percentage of heads falls outside the range of
30 to 70, we reject the Null Hypothesis and accept the Alternate Hypothesis.

3. Confidence Interval

The Confidence Interval is taken as 95 percent. This means that if we repeat the coin-flip
experiment 100 times with a fair coin, then in about 95 of those repetitions the percentage
of heads will fall within that interval.

That is, in about 95 of the 100 repetitions the percentage of heads lies between 30 and
70.

4. Significance level (significant value)

It is defined as one minus the confidence level of 95%:

Significance level α = 1 - 0.95

= 0.05
This means that, for a fair coin, about 5 of the 100 repetitions may fall outside 30
and 70. The chance of getting a result outside that range is therefore 0.05 (or 5%).

5. Decision Boundary

The limits of the Confidence Interval are the Decision Boundary. Repeating the
experiment 100 times shows that in about 95 repetitions the percentage of heads
lies between 30 and 70, so the decision boundaries are 30 and 70, and they enclose
95% of the results.

Conclusion

Suppose that in the 5th repetition of the experiment we get 65% head and 35% tail. We then
accept (do not reject) the Null Hypothesis, as the result lies within the decision boundary
shown in figure 2.
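
The coin experiment can be sketched in R with the binomial distribution. The notes do not say how many tosses make up one experiment, so 20 tosses per experiment is an assumption made here; with that choice the central 95% range for a fair coin works out to 6 to 14 heads, that is 30% to 70%.

```r
n_tosses <- 20        # assumed number of tosses per experiment
alpha <- 0.05         # significance level

# Range of head counts covering the central 95% if the coin is fair
bounds <- qbinom(c(alpha / 2, 1 - alpha / 2), size = n_tosses, prob = 0.5)
bounds / n_tosses * 100      # the same range as a percentage of heads

# Observed result: 13 heads out of 20, i.e. 65% heads
heads <- 13
res <- binom.test(heads, n_tosses, p = 0.5)
res$p.value

# Inside the range (p-value above alpha): do not reject H0 "coin is fair"
inside <- heads >= bounds[1] && heads <= bounds[2]
inside
```

The range narrows as the number of tosses per experiment grows, so the boundaries of 30 and 70 should be read as belonging to a small experiment rather than as fixed values.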

(Reference YouTube link: https://youtu.be/pZ1d32ar_iY?si=a4dMtaGYUcOIegZJ )

Please watch the above video to understand the experiment, if required.

Testing Means or Types of Testing

Mention the different types of Hypothesis testing (2 marks)

1. Z Test
2. T Test
3. ANOVA Test
4. Chi-square Test
What is Z-test? (2 marks)
A Z-test is a statistical test that determines if two population means are
different when the variances are known and the sample size is large. It is a type of statistical
hypothesis test where the test statistic follows a normal distribution.

What is T-Test ? (2 marks)

A t-test, also known as a Student's t-test, is a statistical test that compares the means
of two groups. It's used to determine if there's a significant difference between the means of
the two groups and how they're related.

What is ANOVA Test? (2 marks)

Analysis of Variance (ANOVA) is a statistical test that determines differences between
research results from three or more unrelated samples or groups. It tests the hypothesis that
the means of these populations are equal.

What is chi-square Test ? (2 marks)

The chi-square test is a statistical tool that determines if two categorical variables are
related or independent.

When do we use Z-test? (2 marks) (just write the below diagram for this question)

Explain z-test with an example. (8 marks) (very important)

Consider an example

The average height of all residents in a city is 168 cm, with a population standard deviation σ = 3.9 cm.
A doctor believes the mean to be different. He measured the heights of 36 individuals and
found the average to be 169.5 cm.

a. State the Null and Alternate Hypothesis

b. At a 95% CI, is there enough evidence to reject the Null Hypothesis?

Given:

Population standard deviation σ = 3.9 cm

Population mean μ = 168 cm

Sample size n = 36

Sample mean x̄ = 169.5 cm

a. 1. State the Null Hypothesis

H0: μ = 168 cm

2. State the Alternate Hypothesis

H1 or Ha: μ ≠ 168 cm

b. Confidence Interval is 95% (given)

Significance level α = 1 – CI

= 1 - 0.95 = 0.05

Therefore, with the above data, the Normal distribution curve is as follows

[Figure: normal curve centred on the population mean μ = 168 cm, with the central 95% region and a tail of 0.025 (2.5%) on each side]

The graph shows that, since the CI is 95%, the significance level is 0.05. As this is a
two-tailed test, we divide 0.05 by 2 to get 0.025 in each tail. The cumulative probability
at the upper boundary is therefore

1 – 0.025 = 0.9750

Looking up 0.9750 in the standard normal (Z) table gives Z = 1.96. So, the Decision boundary
is +1.96 and -1.96.

Now Apply Z-test

Z-test Formula is as follows

Z = (x̄ − μ) / (σ / √n) = (169.5 − 168) / (3.9 / √36) = 1.5 / 0.65 ≈ 2.31

Statistical Inference:

A sample of 36 individuals with sample mean 169.5 is drawn from a population with mean 168
and standard deviation 3.9. Applying the Z-test formula gives 2.31, which is compared
with the decision boundary value of 1.96.

2.31 > 1.96, so the result lies outside the Decision boundary. In this case we Reject the Null
hypothesis and Accept the Alternate Hypothesis.
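
The same calculation can be reproduced in R. Base R has no dedicated z-test function, so this sketch computes the statistic directly from the given values; the variable names are chosen here for readability.

```r
mu0   <- 168      # population mean under H0 (cm)
sigma <- 3.9      # known population standard deviation (cm)
n     <- 36       # sample size
xbar  <- 169.5    # sample mean (cm)
alpha <- 0.05     # significance level for a 95% confidence level

z <- (xbar - mu0) / (sigma / sqrt(n))     # test statistic
z_crit <- qnorm(1 - alpha / 2)            # two-tailed critical value
p_value <- 2 * pnorm(-abs(z))             # two-tailed p-value

round(c(z = z, z_crit = z_crit, p_value = p_value), 3)

# Reject H0 when |z| exceeds the critical value
abs(z) > z_crit
```

Comparing the p-value with alpha leads to the same decision as comparing the test statistic with the critical value.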
