R Programming Unit 3 QB Solved
Unit-1
2 marks question
1. What are the two ways of assignment operators in R? Give an example.
The operators <- and = assign into the environment in which they are
evaluated. The operator <- can be used anywhere, whereas the
operator = is only allowed at the top level (e.g., in the complete expression
typed at the command prompt) or as one of the subexpressions in a
braced list of expressions.
> x <- 5
> x
[1] 5
> x = x + 1 # this overwrites the previous value of x
> x
[1] 6
8. If baz <- c(1,-1,0.5,-0.5) and qux <- 3, find the value of baz+qux.
:-R recycles the single value in `qux` across every element of `baz`, so
`qux` does not need to be rewritten as a vector of length four. The
following code finds the value of `baz + qux` in R:
```R
baz <- c(1, -1, 0.5, -0.5)
qux <- 3
baz + qux
# [1] 4.0 2.0 3.5 2.5
```
9. What is the use of cbind and rbind functions in Matrix? Give an
example
:-cbind() and rbind() both create matrices by combining several
vectors of the same length. cbind() combines vectors as columns,
while rbind() combines them as rows.
Let’s use these functions to create a matrix with the numbers 1
through 15. First, we’ll create three vectors of length 5, then we’ll
combine them into one matrix. As you will see, the cbind() function
will combine the vectors as columns in the final matrix, while the
rbind() function will combine them as rows.
x <- 1:5
y <- 6:10
z <- 11:15
# Create a matrix where x, y and z are columns
cbind(x, y, z)
##      x  y  z
## [1,] 1  6 11
## [2,] 2  7 12
## [3,] 3  8 13
## [4,] 4  9 14
## [5,] 5 10 15
# Create a matrix where x, y and z are rows
rbind(x, y, z)
##   [,1] [,2] [,3] [,4] [,5]
## x    1    2    3    4    5
## y    6    7    8    9   10
## z   11   12   13   14   15
10. How do you find the dimension of the matrix? Give an example
:-The dim function returns two numbers. The first number is the number of
rows and the second number is the number of columns. For example, for a
data frame that consists of 500 rows and 5 columns:
dim(data) # Apply dim function to data.frame
# 500 5
Example:
# Creating a matrix
matrix_example <- matrix(1:6, nrow = 2, ncol = 3)
# Finding its dimensions: 2 rows and 3 columns
dim(matrix_example)
# [1] 2 3
Output:
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3
13. Write proper code to replace the third column of matrix B with
the values in the third row of B.
:-To replace the third column of a matrix `B` with the values from
the third row of `B` in R, you can use the following code:
```R
# Assuming 'B' is your matrix
# Create an example matrix 'B' for illustration
B <- matrix(1:9, nrow = 3, ncol = 3)
# Replace the third column with values from the third row
B[, 3] <- B[3, ]
# Now 'B' has the third column replaced with values from the third row
```
This code assigns the values from the third row of `B` to the third column
of `B`, replacing the original values in the third column.
14. Write an example to find transpose and inverse of a matrix
using R command?
:- The t() function in R is used to find the transpose of a matrix. The
solve() function is used to find the inverse of a square, non-singular matrix.
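As a minimal sketch of both operations, assuming a small 2 x 2 matrix made up for illustration (its determinant is 1, so an inverse exists), `t()` gives the transpose and `solve()` gives the inverse:

```R
# Illustrative 2 x 2 matrix, filled column by column
A <- matrix(c(2, 1, 5, 3), nrow = 2)

# Transpose: rows become columns
A_transposed <- t(A)
print(A_transposed)

# Inverse: solve() with a single matrix argument returns its inverse
A_inverse <- solve(A)
print(A_inverse)

# Multiplying a matrix by its inverse gives the identity matrix (up to rounding)
print(A %*% A_inverse)
```

`solve()` stops with an error when the matrix is singular, so it is worth checking that the determinant is non-zero first.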
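15. What is the difference between & and && in R? Give an example.
:-The “&” operator performs an element-wise comparison and
returns a logical vector of the same length as its inputs. The “&&”
operator compares single logical values only and returns a single TRUE
or FALSE; it stops evaluating as soon as the result is known.
Example:
# Using &
condition1 <- c(TRUE, TRUE, FALSE)
condition2 <- c(TRUE, FALSE, FALSE)
result <- condition1 & condition2
print(result)
Output:
[1]  TRUE FALSE FALSE
# Using && (the operands here are chosen for illustration)
result <- condition1[1] && condition2[2]
print(result)
Output:
[1] FALSE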
Example:
# Creating a list
my_list <- list(
  name = "John",
  age = 25,
  grades = c(90, 85, 92),
  is_student = TRUE
)
# Display the whole list
print(my_list)
Output:
$name
[1] "John"
$age
[1] 25
$grades
[1] 90 85 92
$is_student
[1] TRUE
# Access a single component by name
print("Grades:")
print(my_list$grades)
22. What is the purpose of attributes and class functions? Give an example.
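:-The attributes() function lists every attribute attached to an object
(such as names, dim and class), while class() reads or sets the class
attribute alone. The "class" attribute is what determines generic method
dispatch. A data frame has the "class" attribute set to the string
"data.frame", which is what allows generic functions like format,
print and even mathematical operators to treat it differently from,
say, a numeric vector.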
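A short sketch of both functions follows. The data frame, the vector and the class name "celsius" are all made up for illustration:

```R
# A small data frame: its class attribute is "data.frame"
df <- data.frame(id = 1:3, score = c(90, 85, 92))
print(class(df))
print(names(attributes(df)))  # includes names, class and row.names

# class() can also set the attribute on a plain vector
temps <- c(21.5, 23.0, 19.8)
class(temps) <- "celsius"     # hypothetical class name
print(attributes(temps))
```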
ggplot2 is a more modern and powerful plotting system in R. It is based on the Grammar of
Graphics, which allows you to create complex plots by combining simple components. ggplot2
produces aesthetically pleasing and customizable visualizations and is widely used for data
visualization in R.
2. Repeat the vector c(-1, 3, -5, 7, -9) twice, with each element
repeated 10 times, and store the result. Display the result sorted
from largest to smallest.
In R, you can repeat the vector `c(-1, 3, -5, 7, -9)` twice with each element
repeated 10 times, and then sort the result from largest to smallest as
follows:
```R
# Create the original vector
original_vector <- c(-1, 3, -5, 7, -9)
# Repeat each element 10 times
repeated_vector <- rep(original_vector, each = 10)
# Repeat the resulting vector twice
repeated_twice_vector <- rep(repeated_vector, times = 2)
# Sort the vector from largest to smallest
sorted_vector <- sort(repeated_twice_vector, decreasing = TRUE)
# Display the sorted vector
sorted_vector
```
[1] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 3 3 3 3 3 3
[27] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
[53] -1 -1 -1 -1 -1 -1 -1 -1 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5
[79] -5 -5 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9
This code first creates the original vector `original_vector`, then repeats
each element 10 times using the `rep` function, repeats the resulting
vector twice, and finally sorts the vector in descending order to get the
desired result.
3. How do you extract elements from vectors? Explain it using individual
and vector of indexes with example?
Compiled by K Praveen Kumar, SMC SHIRVA
In R and many other programming languages, you can extract elements
from vectors using individual indices or vectors of indices. Let's explore
both methods with examples:
Individual Indexing:
You can extract a single element from a vector by specifying its index.
Example:
# Create a vector
my_vector <- c(10, 20, 30, 40, 50)
# Extract the third element
my_vector[3] # 30
Vector of Indexing:
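As a minimal sketch, a vector of indexes inside the square brackets returns several elements at once. The particular positions picked below are only for illustration:

```R
my_vector <- c(10, 20, 30, 40, 50)

# Elements at positions 1, 3 and 5
picked <- my_vector[c(1, 3, 5)]
print(picked)

# A range of positions
middle <- my_vector[2:4]
print(middle)

# Indexes may repeat or appear in any order
reordered <- my_vector[c(5, 1, 1)]
print(reordered)

# Negative indexes drop the named positions instead of selecting them
without_first <- my_vector[-1]
print(without_first)
```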
data: The input data that forms the matrix. This can be a vector or a
combination of vectors.
nrow: The number of rows in the matrix.
ncol: The number of columns in the matrix.
byrow: A logical value indicating whether the matrix should be filled by rows
(TRUE) or by columns (FALSE). The default is FALSE.
In this example:
c(1, 2, 3, ...) is the data vector that populates the matrix.
nrow = 3 specifies that the matrix should have 3 rows.
ncol = 4 specifies that the matrix should have 4 columns.
The matrix is filled column-wise by default (byrow = FALSE).
The resulting matrix has values from the data vector arranged in a 3x4 grid.
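A sketch of the call described above, assuming the data vector holds the values 1 to 12 (the notes only show `c(1, 2, 3, ...)`), with a second call that sets `byrow = TRUE` for comparison:

```R
# Filled column by column (the default, byrow = FALSE)
m_by_col <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 3, ncol = 4)
print(m_by_col)

# Same data filled row by row
m_by_row <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 3, ncol = 4,
                   byrow = TRUE)
print(m_by_row)
```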
5. Do the following operations on a square matrix a. Retrieve third
and first rows of A, in that order, and from those rows, second and
third column elements. b. Retrieve diagonal elements c. Delete
second column of the matrix
Let's assume we have a square matrix A in R, and we'll perform the
specified operations:
# Create a square matrix A
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)
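The three operations can be sketched as follows, using the same matrix `A`. The result names `part_a`, `part_b` and `part_c` are introduced here only for illustration:

```R
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)

# a. Third and first rows, in that order, then second and third columns
part_a <- A[c(3, 1), 2:3]
print(part_a)

# b. Diagonal elements
part_b <- diag(A)
print(part_b)

# c. Delete the second column (a negative index drops it)
part_c <- A[, -2]
print(part_c)
```

Note that `A[, -2]` returns a new matrix; assign it back to `A` if the deletion should be permanent.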
Output:
[1] "Extracted Second Row:"
[,1] [,2] [,3]
[1,] 4 5 6
2. Column Extraction:
You can extract specific columns from a matrix by specifying the column
indices.
Example:
# Create a matrix
matrix_example <- matrix(1:9, nrow = 3)
# Extract the second column
matrix_example[, 2]
3. Diagonal Extraction:
You can extract the diagonal elements of a matrix using the diag()
function.
Example:
# Create a matrix
matrix_example <- matrix(1:9, nrow = 3)
# Extract the diagonal elements
diag(matrix_example)
```
[1,] 1 3
[2,] 2 4
[3,] 5 6
```
These are three fundamental matrix operations in R. You can perform
various other matrix operations using R, but these examples should help
you understand the basics.
2. Logical Operators:
- AND Operator &:
The AND operator combines two conditions and returns TRUE only if
both of the conditions are true.
- OR Operator |:
The OR operator combines two conditions and returns TRUE if at least
one of the conditions is true.
Example:
# Check if either 5 is greater than 10 or 8 is less than 15
result_or <- (5 > 10) | (8 < 15)
- NOT Operator !:
The NOT operator negates a logical condition.
Example:
# Check if NOT (5 is greater than 10)
result_not <- !(5 > 10)
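The three operators can be seen side by side in a short sketch. The AND comparison and the `result_and` name are added here for illustration, while the OR and NOT lines repeat the examples above:

```R
# AND: TRUE only when both conditions hold
result_and <- (5 > 3) & (8 < 15)

# OR: TRUE when at least one condition holds
result_or <- (5 > 10) | (8 < 15)

# NOT: flips the logical value
result_not <- !(5 > 10)

print(c(and = result_and, or = result_or, not = result_not))
```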
11. Explain any, all and which functions with example, on logical vector
In R, the `any()`, `all()`, and `which()` functions are commonly used to work with
logical vectors. These functions help you determine properties of logical vectors,
identify elements that meet specific criteria, and extract their indices. Here's an
explanation of each function with examples:
1. `any()` function:
- The `any()` function checks whether at least one element in a logical vector is
`TRUE`. It returns `TRUE` if any element is `TRUE`, otherwise, it returns
`FALSE`.
Example:
```R
logical_vector <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
result <- any(logical_vector)
# The 'result' variable will be TRUE because there are TRUE values in the vector.
```
2. `all()` function:
- The `all()` function checks whether all elements in a logical vector are
`TRUE`. It returns `TRUE` only if all elements are `TRUE`, otherwise, it returns
`FALSE`.
Example:
```R
logical_vector <- c(TRUE, TRUE, TRUE, TRUE, TRUE)
result <- all(logical_vector)
# The 'result' variable will be TRUE because all elements are TRUE.
```
```R
another_vector <- c(TRUE, FALSE, TRUE, TRUE, TRUE)
result <- all(another_vector)
# The 'result' variable will be FALSE because not all elements are TRUE.
```
3. `which()` function:
- The `which()` function is used to extract the indices of elements in a logical
vector that are `TRUE`.
Example:
```R
logical_vector <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)
indices <- which(logical_vector)
# The 'indices' variable will contain the indices of TRUE elements: 2, 3, and 5.
```
You can use the `which()` function to locate specific elements in a vector or filter
data based on certain conditions.
These functions are valuable for making logical assessments, checking
conditions, and performing operations on logical vectors in R.
paste Function:
The paste function is used to concatenate strings. It takes one or more vectors,
converts them to character vectors if necessary, and concatenates them term-
by-term.
Syntax:
paste(..., sep = " ", collapse = NULL)
...: Objects to be concatenated.
sep: Separator between the objects (default is a space).
collapse: If specified, collapses the result into a single string.
Example:
paste("Hello", "world!")
# Output: "Hello world!"
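A few more calls sketch how `sep` and `collapse` behave and how `paste` works term by term on vectors. The strings used are made up for illustration:

```R
# Default separator is a single space
first <- paste("Hello", "world!")

# A custom separator
dashed <- paste("a", "b", "c", sep = "-")

# Term-by-term pasting: the shorter argument is recycled
labels <- paste("item", 1:3)

# collapse joins the elements of the result into one string
joined <- paste(c("x", "y", "z"), collapse = ", ")

print(first)
print(dashed)
print(labels)
print(joined)
```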
sub Function:
Purpose: The sub function is used to substitute the first occurrence of a
pattern with a replacement in a string.
Syntax:
sub(pattern, replacement, x)
pattern: The regular expression pattern to search for.
replacement: The string to replace the matched pattern.
x: The original string (a character vector).
Example:
string <- "apple orange apple banana"
# Replace the first occurrence of 'apple' with 'pear'
sub("apple", "pear", string)
# Output: "pear orange apple banana"
gsub Function:
Purpose: The gsub function is similar to sub, but it substitutes all
occurrences of a pattern with a replacement in a string.
Syntax:
gsub(pattern, replacement, x)
pattern: The regular expression pattern to search for.
replacement: The string to replace all occurrences of the matched
pattern.
x: The original string (a character vector).
Example:
string <- "apple orange apple banana"
# Replace every occurrence of 'apple' with 'pear'
gsub("apple", "pear", string)
# Output: "pear orange pear banana"
These functions are quite powerful for manipulating strings. Both return
a new string and leave the original unchanged, so assign the result if
you want to keep it.
If you want to explicitly specify the order of levels, you can use the levels
18. What is data frame? Create a data frame as shown in the given
table and write R commands a) To extract the third, fourth, and fifth
elements of the third column b) To extract the elements of age column
using dollar operator
19. How do you add data columns and combine data frames? Explain
with example.
In R, you can add data columns to an existing data frame and combine
data frames using various functions and techniques. Adding columns to
a data frame is a common operation when you have new data to
include, and combining data frames can be useful for merging, joining,
or stacking data from different sources. Here are explanations and
examples for both operations:
1. Adding Data Columns to a Data Frame:
You can add data columns to a data frame using the `$` operator or the
`[[ ]]` operator. Here's an example of how to add a new column to an
existing data frame:
```R
# Create an example data frame
original_df <- data.frame(Name = c("Alice", "Bob", "Carol"), Age = c(25, 30, 28))
# Add a new column "Score" to the data frame
original_df$Score <- c(92, 85, 78)
# Display the modified data frame
print(original_df)
```
In this example, we created an original data frame and then added a new
"Score" column to it with the `$` operator.
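For the combining half of the question, one option is `merge()`, which joins two data frames on a shared key column. The sketch below assumes two small made-up data frames that share a `Name` column:

```R
# Two data frames that describe the same people
people <- data.frame(Name = c("Alice", "Bob", "Carol"), Age = c(25, 30, 28))
scores <- data.frame(Name = c("Alice", "Bob", "Carol"), Score = c(92, 85, 78))

# Join them on the shared "Name" column
combined <- merge(people, scores, by = "Name")
print(combined)
```

`merge()` matches rows by key value, so the two inputs do not need to be in the same row order.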
22. List and explain graphical parameters used in plot function in R with example (any 4)
The `plot()` function in R allows you to create a wide variety of plots and
graphics. To customize the appearance of your plots, you can use graphical
parameters that control aspects like colors, axis labels, titles, and more. Here are
four common graphical parameters used in the `plot()` function, along with
examples:
1. `main` - Main Title:
- The `main` parameter is used to set the main title of the plot.
Example:
```R
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)
plot(x, y, main = "Scatter Plot of X vs Y")
```
2. `xlab` and `ylab` - X and Y Axis Labels:
- The `xlab` and `ylab` parameters are used to set the labels for the x and y axes,
respectively.
Example:
```R
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)
plot(x, y, xlab = "X-Axis Label", ylab = "Y-Axis Label")
```
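Two further parameters that are often passed to `plot()` are `col` (point or line colour) and `pch` (plotting symbol). The following is only a sketch with the same illustrative data; the colour and symbol values are arbitrary choices:

```R
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)

# col sets the colour; pch = 19 draws solid circles
plot(x, y, col = "red", pch = 19,
     main = "Scatter Plot of X vs Y",
     xlab = "X-Axis Label", ylab = "Y-Axis Label")
```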
23. How do you add Points, Lines, and Text to an Existing Plot? Explain with example.
In R, you can add points, lines, and text to an existing plot using various
functions and graphical parameters. This allows you to enhance your plots with
additional information, annotations, and visual elements. Here's how to add
points, lines, and text to an existing plot with examples:
1. Adding Points to an Existing Plot:
You can add points to a plot using the `points()` function. This is useful for
overlaying new data points on an existing plot: first create a scatter plot with
`plot()`, then call `points()` with the new coordinates.
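As a minimal sketch, `points()`, `lines()` and `text()` can each be called after `plot()` to draw on the same figure. The coordinates and the label below are made up for illustration:

```R
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)

# Create the base scatter plot
plot(x, y, main = "Adding to an Existing Plot")

# Add two extra points as red triangles
points(c(1.5, 3.5), c(4, 7), col = "red", pch = 17)

# Add a line through the original points
lines(x, y, col = "blue")

# Add a text label to the left of the highest point
text(4, 8, labels = "Peak", pos = 2)
```

Each of these functions draws on the plot that is currently open, so they must be called after `plot()` and before the graphics device is closed.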
Unit-II
2-mark question
First, make sure to install the readr package if you haven't already:
# Install the readr package if not already installed
install.packages("readr")
Now, you can use the read_csv() function from the readr package to
read a CSV file from a web URL:
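# Open a PDF graphics device (the file name is only an illustration)
pdf("sine_wave.pdf")
# Data for the plot
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x)
# Create a plot
plot(x, y, type = "l", col = "blue", lwd = 2, xlab = "X-axis", ylab = "Y-axis", main = "Sine Wave")
# Close the PDF graphics device and save the plot to the file
dev.off()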
dget() Function:
Purpose: The dget() function is used to deserialize (parse) the output of
dput() and reconstruct the original R object from the textual
representation.
Usage:
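A minimal round trip might look like the sketch below. The list contents are made up, and a temporary file is used so nothing is left behind:

```R
# An object to save
original <- list(name = "John", scores = c(90, 85, 92))

# dput() writes a text representation of the object to a file
path <- tempfile(fileext = ".R")
dput(original, file = path)

# dget() reads that text back and rebuilds the object
restored <- dget(path)
print(identical(original, restored))
```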
Local Environment:
Definition: A local environment is a specific environment created
when a function is called. Each function call has its own local
environment.
Characteristics:
Objects created within a function are usually stored in the local
environment of that function.
Local environments are temporary and are created when the function is called.
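The idea can be sketched with a variable that exists under the same name both globally and inside a function. The function name `show_scope` is hypothetical:

```R
x <- "global"

show_scope <- function() {
  # This x is created in the local environment of this call
  x <- "local"
  x
}

inside <- show_scope()  # value seen inside the function
outside <- x            # the global x is unchanged
print(inside)
print(outside)
```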
8. What does the "search path" in R refer to? How can you view the current
search path in R?
In R, the "search path" refers to the sequence of environments that R
searches when looking for a variable or function. When you reference a
variable or function, R searches through different environments in a
specific order to find the corresponding object. The search path is
essential for determining the scope and visibility of objects in your R
environment.
Syntax:
search()
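A short sketch of inspecting the search path follows. The exact packages listed depend on what is attached in the session, so only the first and last entries are relied on here:

```R
# search() returns a character vector of attached environments, in lookup order
current_path <- search()
print(current_path)

# The global environment always comes first and the base package comes last
print(current_path[1])
print(current_path[length(current_path)])
```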
In R, break and next are control flow statements used within loops to
modify the flow of execution.
break Statement:
Purpose: The break statement is used to terminate the execution of a
loop prematurely. When break is encountered, the loop is immediately
exited, and the program control moves to the next statement after the
loop.
Example:
for (i in 1:10) {
  if (i == 6) {
    break
  }
  print(i)
}
next Statement:
Purpose: The next statement is used to skip the rest of the current
iteration of a loop and move to the next iteration. When next is
encountered, the loop jumps to the next iteration, skipping any
remaining code within the loop body.
Example:
for (i in 1:5) {
if (i == 3) {
next
}
print(i)
}
In this example, the loop prints the values of i from 1 to 5, but when i
equals 3, the next statement is encountered, and the loop skips the
print(i) statement for that iteration.
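# Repeat loop
repeat {
  # Generate a random number between 1 and 10
  random_number <- sample(1:10, 1)
  # Stop once the number exceeds 8 (threshold chosen for illustration)
  if (random_number > 8) break
}
cat("Loop finished.\n")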
15. Write an R code snippet that demonstrates the use of a tryCatch
block to handle an exception.
In R, you can check for missing arguments in a function using the
missing() function.
Here's an example:
# Function with missing argument check
example_function <- function(x, y = 10, z) {
  # Check if 'x' is missing
  if (missing(x)) {
    cat("Argument 'x' is missing or not explicitly provided.\n")
  } else {
    cat("Argument 'x' is present and its value is:", x, "\n")
  }
}
16. How can you measure the execution time of a specific piece of code
in R?
You can measure the execution time of a specific piece of code in R
using the system.time() function or the microbenchmark package.
system.time() reports the user CPU time, system CPU time, and elapsed
time for a given expression, while microbenchmark runs the expression
many times and summarises the distribution of timings.
Using system.time():
Here's an example using system.time():
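A minimal sketch, assuming a simple summing loop as the code to be timed (the loop itself is only an illustration):

```R
# Time a block of code by wrapping it in system.time()
timing <- system.time({
  total <- 0
  for (i in 1:100000) {
    total <- total + i
  }
})

# The result holds user, system and elapsed times in seconds
print(timing)
print(timing["elapsed"])
```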
# Function that may throw an exception
divide_numbers <- function(x, y) {
result <- tryCatch({
# Attempt to divide x by y
result <- x / y
result
}, error = function(e) {
# Handle the exception
cat("An error occurred:", conditionMessage(e), "\n")
return(NA)
})
return(result)
}
# Now you can install or load a different version of the 'dplyr' package
if needed
19. How does the :: operator help to prevent masking when calling functions
from specific packages?
The :: operator in R is used to access functions or variables from a
specific package, helping to prevent function masking and namespace
conflicts. When you use the :: operator, you explicitly specify the
package that contains the function or variable you want to use, ensuring
that there is no ambiguity regarding which function is being called.
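The effect can be sketched without installing anything by masking a function from the `stats` package, which ships with R. The replacement `sd` below is a deliberately silly stand-in:

```R
# A user-defined function that masks stats::sd
sd <- function(x) "my own sd()"

# The plain name now finds the masking function first
masked <- sd(c(1, 2, 3))

# The :: operator always reaches the version inside the stats package
original <- stats::sd(c(1, 2, 3))

print(masked)
print(original)

# Remove the masking function again
rm(sd)
```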
# In packageB
my_function <- function() {
print("Function in packageB")
}
4 or 6 marks questions
1. How do you read external data files into R? Explain any three types of
files with necessary commands to read their characters into R, with
example.
Table-format files are best thought of as plain-text files with three key
features that fully define how R should read the data.
Header If a header is present, it’s always the first line of the file. This
optional feature is used to provide names for each column of data. When
importing a file into R, you need to tell the software whether a header is
present so that it knows whether to treat the first line as variable names or,
alternatively, observed data values.
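The role of the header can be sketched with `read.table()` and a small temporary file. The file contents are made up for illustration:

```R
# Write a tiny table-format file whose first line is a header
path <- tempfile(fileext = ".txt")
writeLines(c("name age", "Alice 25", "Bob 30"), path)

# header = TRUE: the first line supplies the column names
with_header <- read.table(path, header = TRUE)
print(with_header)

# header = FALSE: the first line is treated as an ordinary row of data
without_header <- read.table(path, header = FALSE)
print(without_header)
```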
web-based files
To read web-based files in R, you can use functions from packages
such as readr, httr, or RCurl. The readr package is particularly useful
for reading various types of delimited files, including CSV files. Below
is an example of how to read a CSV file from a URL using the readr
package:
First, make sure to install the readr package if you haven't already:
# Install the readr package if not already installed
install.packages("readr")
Now, you can use the read_csv() function from the readr package to
read a CSV file from a web URL:
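A minimal sketch follows. The wrapper name `read_remote_csv` and the URL are placeholders, and the call itself is left commented out because it needs the readr package and network access:

```R
# Hypothetical helper that reads a CSV file from a URL with readr
read_remote_csv <- function(url) {
  # read_csv() accepts a URL as well as a local file path
  readr::read_csv(url)
}

# Usage (placeholder URL):
# data <- read_remote_csv("https://example.com/data.csv")
# head(data)
```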
Exact Matching:
my_function(b = 2, a = 1) # Output: a = 1 , b = 2
Partial Matching:
• Partial matching allows you to specify only a part of the parameter name,
and R will match the argument based on the provided partial name as long
as it is unambiguous.
• This is achieved by using a unique prefix of the parameter name.
• For example, if a function has a parameter named verbose, you can use
ver as a partial match for it.
my_function <- function(verbose = FALSE) {
if (verbose) {
print("Verbose mode is on.")
} else {
print("Verbose mode is off.")
}
}
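Calling the function with the prefix `ver` can be sketched as follows. The function is repeated so the example runs on its own:

```R
my_function <- function(verbose = FALSE) {
  if (verbose) {
    print("Verbose mode is on.")
  } else {
    print("Verbose mode is off.")
  }
}

# "ver" is an unambiguous prefix of "verbose", so R matches it
partial <- my_function(ver = TRUE)

# The full name gives the same result
full <- my_function(verbose = TRUE)
```

Partial matching saves typing at the prompt, but full argument names are usually safer in scripts because they cannot become ambiguous when a function gains new parameters.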
Positional Matching:
• Positional matching occurs when arguments are matched to parameters
based on their order of appearance in the function definition.
• This is the simplest form of matching, where the first argument is matched
to the first parameter, the second argument to the second parameter, and so
on.
• It is important to pass the arguments in the correct order when using
positional matching.
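A short sketch of why order matters with positional matching, using a hypothetical two-parameter function named `describe`:

```R
# Hypothetical function that reports which value each parameter received
describe <- function(a, b) {
  paste("a =", a, ", b =", b)
}

# Positional matching: the first value goes to a, the second to b
in_order <- describe(1, 2)
swapped <- describe(2, 1)

# Exact matching by name is unaffected by order
by_name <- describe(b = 2, a = 1)

print(in_order)
print(swapped)
print(by_name)
```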
1. if...else statement:
Syntax:
if (condition) {
# Code to be executed if the condition is TRUE
} else {
# Code to be executed if the condition is FALSE
}
Example:
# Example using if...else
x <- 10
if (x > 5) {
print("x is greater than 5.")
} else {
print("x is not greater than 5.")
}
2. ifelse() function:
Syntax:
ifelse(condition, true_value, false_value)
Example:
# Example using ifelse()
y <- 8
result <- ifelse(y > 5, "y is greater than 5", "y is not greater than 5")
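Unlike `if...else`, `ifelse()` is vectorized, so it can test every element of a vector in one call. A minimal sketch, with made-up scores and labels:

```R
scores <- c(3, 8, 5, 10)

# One result per element of the condition
labels <- ifelse(scores > 5, "high", "low")
print(labels)
```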
# Nesting lists
nested_list <- list(
name = "John",
age = 25,
contact = list(
email = "[email protected]",
phone = "123-456-7890"
)
)
2. Stacking in R:
Stacking Data Frames:
print("Stacked by columns:")
print(stacked_df_col)
In this example, rbind() is used to stack data frames df1 and df2 by rows,
and cbind() is used to stack them by columns.
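A sketch of the stacking described above. The contents of `df1` and `df2` and the name `stacked_df_row` are assumptions, since only `stacked_df_col` appears in the notes:

```R
# Two data frames with the same columns
df1 <- data.frame(id = 1:2, value = c("a", "b"))
df2 <- data.frame(id = 3:4, value = c("c", "d"))

# Stack by rows: the column names must match
stacked_df_row <- rbind(df1, df2)
print("Stacked by rows:")
print(stacked_df_row)

# Stack by columns: the number of rows must match
stacked_df_col <- cbind(df1, df2)
print("Stacked by columns:")
print(stacked_df_col)
```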
5. Explain for loop with its varieties in R with syntax and example.
for Loops
The R for loop always takes the following general form:
for(loopindex in loopvector)
{
do any code in here
}
Here, the loopindex is a placeholder that represents an element in the
loopvector—it starts off as the first element in the vector and moves to the
next element with each loop repetition. When the for loop begins, it runs
the code in the braced area, replacing any occurrence of the loopindex with
the first element of the loopvector. When the loop reaches the closing
brace, the loopindex is incremented, taking on the second element of the
loopvector, and the braced area is repeated. This continues until the loop
reaches the final element of the loopvector, at which point the braced code
is executed for the final time, and the loop exits. Here’s a simple example :
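A minimal sketch of such a loop, with a made-up `loopvector` and a running total added to show that the body runs once per element:

```R
loopvector <- c(2, 4, 6)
total <- 0

for (loopindex in loopvector) {
  # loopindex takes the values 2, 4 and 6 in turn
  cat("The current item is", loopindex, "\n")
  total <- total + loopindex
}

print(total)
```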
The following nested loop fills foo with the result of multiplying each
integer in loopvec1 by each integer in loopvec2:
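The nested loop can be sketched as follows. The values of `loopvec1` and `loopvec2` are assumptions, since the notes do not show them:

```R
loopvec1 <- 5:7
loopvec2 <- 9:6

# One row per element of loopvec1, one column per element of loopvec2
foo <- matrix(NA, nrow = length(loopvec1), ncol = length(loopvec2))

for (i in seq_along(loopvec1)) {
  for (j in seq_along(loopvec2)) {
    foo[i, j] <- loopvec1[i] * loopvec2[j]
  }
}

print(foo)
```

The outer loop moves down the rows and the inner loop runs across all columns for each row.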
```R
while (condition) {
# Code to be executed as long as the condition is true
}
```
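```R
# Example of a while loop in R
count <- 1 # Initialize a counter variable
while (count <= 3) {
  print(paste("Current count is:", count))
  count <- count + 1 # Move to the next count
}
```
In this example, the loop produces the output:
```
[1] "Current count is: 1"
[1] "Current count is: 2"
[1] "Current count is: 3"
```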
There are several functions in the *apply family*, and they include:
```R
# Example of apply function
matrix_data <- matrix(1:9, nrow = 3)
result <- apply(matrix_data, MARGIN = 1, FUN = sum)
print(result)
```
In this example, `apply` is used to calculate the sum of each row in the
`matrix_data` matrix.
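```R
# Example of lapply function
list_data <- list(a = 1:3, b = 4:6, c = 7:9)
result <- lapply(list_data, FUN = sum)
print(result)
```
Here, `lapply` is used to calculate the sum of each element in the list.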
```R
# Example of sapply function
list_data <- list(a = 1:3, b = 4:6, c = 7:9)
result <- sapply(list_data, FUN = sum)
print(result)
```
```R
# Example of tapply function
values <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("A", "B", "A", "B", "A", "B"))
result <- tapply(values, groups, FUN = sum)
print(result)
```
```R
# Example of mapply function
vector1 <- 1:3
vector2 <- 4:6
result <- mapply(sum, vector1, vector2)
print(result)
```
8. How do you define and call, user defined functions in R? Explain with an
example.
In R, you can define your own functions using the `function` keyword. A
user-defined function typically consists of a name, a list of parameters, and
a body containing the code to be executed. Here's the basic syntax for
defining a function in R:
```R
my_function <- function(parameter1, parameter2, ...) {
# Code to be executed
# Return statement (if needed)
}
```
After defining the function, you can call it by using its name and providing
values for the parameters. Here's an example of a simple user-defined
function:
```R
# Define a function to calculate the square of a number
square <- function(x) {
  result <- x^2
  return(result)
}
# Call the function and print the result
print(square(5))
```
In this example, the call `square(5)` produces the output:
```R
[1] 25
```
```R
# Define a function to calculate the factorial of a number
factorial <- function(n) {
  result <- 1
  # seq_len(n) is empty when n is 0, so factorial(0) correctly returns 1
  for (i in seq_len(n)) {
    result <- result * i
  }
  return(result)
}
# Call the function and print the result
print(factorial(5))
```
In this example, the `factorial` function uses a `for` loop to calculate the
factorial of a number, and the result is printed, producing the output:
```R
[1] 120
```
These examples illustrate the basic process of defining and calling user-
defined functions in R. Functions provide a way to modularize code,
making it more readable, reusable, and easier to maintain.
9. How do you set default arguments to a user defined function? Explain with
an example.
In R, you can set default values for function arguments by assigning a
default value in the function definition. This allows users to omit those
arguments when calling the function.
```R
my_function <- function(arg1, arg2 = default_value2, arg3 =
default_value3, ...) {
# Code to be executed
# Use arg1, arg2, arg3, etc.
}
```
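```R
# Define a function with default arguments
power_calculation <- function(x, exponent = 2) {
  result <- x^exponent
  return(result)
}
# Use the default exponent, then override it
result_default <- power_calculation(3)
result_custom <- power_calculation(3, exponent = 3)
print(result_default)
print(result_custom)
```
In this example, the two calls produce:
```R
[1] 9 # result_default (3^2)
[1] 27 # result_custom (3^3)
```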
This allows for flexibility when using the function. Users can choose to
provide a specific value for the `exponent` argument if they have a
different requirement, or they can rely on the default value if they are
comfortable with the default behavior.
1. **Helper Function:**
- **Definition:** A helper function is designed to perform a specific
subtask within a larger task. It is often used to encapsulate a particular
operation, making the main code more readable and modular.
- **Example:** a helper function to calculate the square of a number.
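As a sketch of the idea, a small helper can be called from a larger function. The names `square` and `sum_of_squares` are illustrative only:

```R
# Helper function to calculate the square of a number
square <- function(x) {
  x^2
}

# Main function that relies on the helper for its subtask
sum_of_squares <- function(a, b) {
  square(a) + square(b)
}

print(sum_of_squares(3, 4))
```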
2. **Vectorized Function:**
- **Definition:** A vectorized function is designed to operate efficiently
on entire vectors or matrices, taking advantage of R's ability to perform
element-wise operations without explicit loops.
- **Example:**
```R
# Vectorized function to calculate the element-wise product of two vectors
elementwise_product <- function(vector1, vector2) {
  return(vector1 * vector2)
}
elementwise_product(c(1, 2, 3), c(4, 5, 6)) # returns 4 10 18
```
3. **Recursive Function:**
- **Definition:** A recursive function is one that calls itself, allowing it
to break a complex problem into simpler subproblems. Recursive functions
often have a base case that defines the simplest scenario and termination
condition.
- **Example:**
```R
# Recursive function to calculate the factorial of a non-negative integer
factorial <- function(n) {
  if (n == 0 || n == 1) {
    return(1) # Base case
  } else {
    return(n * factorial(n - 1)) # Recursive call
  }
}
factorial(5) # returns 120
```
11. What is exception handling? How do you catch errors with try
Statements? Explain with example
Exception handling is a programming construct that allows developers to
manage and respond to errors or exceptional situations that may occur
during the execution of a program. In R, you can catch and handle errors
using the `try` function and associated constructs.
The `try` function is used to evaluate an expression and handle any errors
that may occur during its execution. The basic syntax of the `try` function
is as follows:
```R
result <- try({
# Code that may raise an error
})
```
If an error occurs during the evaluation of the code block inside `try`, the
error is caught, and the result is an object of class `try-error`. If there is no
error, the result contains the value of the expression.
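```R
# Example of try statement for error handling
divide_numbers <- function(a, b) {
  result <- try({
    if (b == 0) {
      stop("Error: Division by zero.")
    }
    return(a / b)
  })
  # Reached only if try() caught an error; result then has class "try-error"
  invisible(result)
}
# Calls chosen for illustration
print(divide_numbers(10, 2))
divide_numbers(5, 0)
```
In this example, the two calls produce:
```R
[1] 5 # Result of successful division
Error in divide_numbers(5, 0) : Error: Division by zero. # Error message for division by zero
```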
1. **Function Masking:**
- **Description:** When a function with the same name is defined
locally, it takes precedence over a function with the same name in the
global environment or a package namespace.
- **Example:**
```R
# Define a function in the global environment
my_function <- function() {
  print("Global function")
}
# Create a local environment and define a function with the same name
local_environment <- new.env()
assign("my_function", function() {
  print("Local function")
}, envir = local_environment)
# Call each version to see which one runs
my_function()
local_environment$my_function()
```
2. **Variable Masking:**
- **Description:** When a variable with the same name is assigned a
value in a local scope, it masks the variable with the same name in the
broader scope.
- **Example:**
```R
# Assign a variable in the global environment
my_variable <- "Global variable"
# Create a local environment and assign a variable with the same name
local_environment <- new.env()
assign("my_variable", "Local variable", envir = local_environment)
# Print the values of the variable in the global and local scopes
print(my_variable)
print(local_environment$my_variable)
```
To avoid masking issues, it's good practice to use unique and descriptive
names for variables and functions. Additionally, understanding the scoping
rules in R helps you anticipate which object a name will resolve to.
13. How do you draw barplot and pie chart in R? Explain with example.
In R, you can create bar plots and pie charts using the base plotting system
or popular plotting packages like `ggplot2`. Here, I'll provide examples for
both the base plotting system and `ggplot2` for creating a bar plot and a pie
chart.
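The sketch below covers the base plotting system only, using `barplot()` and `pie()` with made-up fruit counts. A `ggplot2` version is not shown here:

```R
# Illustrative data: a named vector of counts
counts <- c(Apples = 10, Bananas = 15, Cherries = 7)

# Bar plot: one bar per named element
barplot(counts, col = "skyblue", main = "Bar Plot Example",
        xlab = "Fruit", ylab = "Count")

# Pie chart: slice sizes are proportional to the counts
pie(counts, col = c("tomato", "gold", "plum"), main = "Pie Chart Example")
```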
These examples showcase how to create a basic bar plot and pie chart
using both the base plotting system and `ggplot2`. Depending on your
preferences and requirements, you can choose the approach that best fits
your needs. Note that the `ggplot2` package provides a more flexible and
customizable approach for creating a wide range of plots.
```R
# Example of a histogram using the base plotting system
# Generate random data for illustration
set.seed(123)
data <- rnorm(1000, mean = 10, sd = 2)
# Create a histogram
hist(data, col = "skyblue", main = "Histogram Example", xlab = "Values",
ylab = "Frequency")
```
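```R
# Example of a histogram using ggplot2
library(ggplot2)
# Generate random data for illustration
set.seed(123)
data <- rnorm(1000, mean = 10, sd = 2)
# Create a histogram (ggplot2 expects a data frame)
ggplot(data.frame(Values = data), aes(x = Values)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  labs(title = "Histogram Example", x = "Values", y = "Frequency")
```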
```R
# Example of a boxplot using the base plotting system
# Generate random data for illustration
set.seed(123)
data <- list(A = rnorm(100, mean = 10, sd = 2),
B = rnorm(100, mean = 15, sd = 3),
C = rnorm(100, mean = 12, sd = 2))
# Create a boxplot
boxplot(data, col = c("skyblue", "lightgreen", "pink"), names = c("A", "B",
"C"),
main = "Boxplot Example", xlab = "Categories", ylab = "Values")
```
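```R
# Example of a boxplot using ggplot2
library(ggplot2)
# Generate random data for illustration
set.seed(123)
data <- data.frame(Category = rep(c("A", "B", "C"), each = 100),
                   Values = c(rnorm(100, mean = 10, sd = 2),
                              rnorm(100, mean = 15, sd = 3),
                              rnorm(100, mean = 12, sd = 2)))
# Create a boxplot with one box per category
ggplot(data, aes(x = Category, y = Values, fill = Category)) +
  geom_boxplot() +
  labs(title = "Boxplot Example", x = "Categories", y = "Values")
```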
16. What is scatterplot? Explain single plot and matrix of plots, with an
example.
A scatterplot is a graphical representation of the relationship between two
continuous variables. It displays individual data points as dots on a two-
dimensional plane, with one variable on the x-axis and the other on the y-
axis. Scatterplots are useful for visualizing patterns, trends, and
relationships between variables.
In R, you can create a single scatterplot using the base plotting system or
the `ggplot2` package. Here's an example using both approaches:
```R
# Example of a single scatterplot using the base plotting system
# Generate random data for illustration
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
# Create a scatterplot
plot(x, y, col = "blue", main = "Single Scatterplot Example", xlab = "X-axis", ylab = "Y-axis")
```
In this example:
- We generate two sets of random data (`x` and `y`) using `rnorm`.
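```R
# Example of a single scatterplot using ggplot2
library(ggplot2)
# Generate random data for illustration (Y depends on X, as in the base example)
set.seed(123)
x <- rnorm(100)
data <- data.frame(X = x, Y = 2 * x + rnorm(100))
# Create a scatterplot
ggplot(data, aes(x = X, y = Y)) + geom_point(color = "blue")
```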
```R
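# Example of a matrix of scatterplots using ggplot2
# Assumption: the GGally extension package is installed; it provides ggpairs()
library(GGally)
# Generate random data for illustration
set.seed(123)
data <- data.frame(
  X1 = rnorm(100),
  X2 = 2 * rnorm(100) + rnorm(100),
  X3 = 0.5 * rnorm(100) + rnorm(100),
  X4 = -1.5 * rnorm(100) + rnorm(100)
)
# Create the matrix of scatterplots for all pairs of variables
ggpairs(data)
```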
In this example, we have four variables (`X1`, `X2`, `X3`, `X4`), and a
matrix of scatterplots is created to visualize relationships between all
possible pairs of variables. Each scatterplot is represented by a different
color.
14. Determine the mode and median for the following numbers.
73, 167, 199, 213, 243, 345, 444, 524, 609, 682
Arrange the numbers in ascending order: 73, 167, 199, 213, 243, 345, 444, 524, 609, 682
Mode: The mode is the number that appears most frequently in the dataset.
In this case, there are no repeated values, so there is no mode.
Median: The median is the middle value in a dataset when it is arranged in
ascending order. In this case, there are 10 numbers, so the median is the
average of the 5th and 6th numbers (the middle two numbers):
Median = (243 + 345)/2 = 588/2 = 294
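The same result can be checked in R. This is a minimal sketch; base R has no built-in function for the statistical mode, so the tabulation step below is just one possible approach.

```R
# Data from the question (ten values, already in ascending order)
x <- c(73, 167, 199, 213, 243, 345, 444, 524, 609, 682)

# Median: with an even number of values, R averages the two middle ones
med <- median(x)

# Mode: count how often each value occurs
freq <- table(x)
# If no value occurs more than once, the data have no mode
has_mode <- max(freq) > 1

print(med)       # median of the ten values
print(has_mode)  # FALSE means there is no mode
```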
15.Compute the mean for the following numbers.
26 Write the formulae for sample Variance and sample Standard Deviation.
67. Write the formulae for Mean, Variance, and Standard Deviation of Discrete
distribution
73. What are uniform distributions? Write the probability density function of
uniform distribution.
The uniform distribution, sometimes referred to as the rectangular
distribution, is a relatively simple continuous distribution in which the same
height, or f(x), is obtained over a range of values. The following probability
density function defines a uniform distribution.
f(x) = 1/(b - a) for a ≤ x ≤ b, and f(x) = 0 for all other values.
LONG ANSWERS
1. Explain four Types of Data & Measurement Scales with example.
In statistics and data analysis, data can be classified into different types and
measurement scales, each with its own characteristics and appropriate
2. Leptokurtic:
- A leptokurtic distribution has kurtosis greater than 3 (positive excess
kurtosis), indicating heavy tails. This means that the distribution has more
extreme values than a normal distribution, resulting in a higher peak and
thicker tails.
Diagram for a leptokurtic distribution:
3. Platykurtic:
- A platykurtic distribution has kurtosis less than 3 (negative excess
kurtosis), indicating light tails. In this type of distribution, data points
are less concentrated around the mean and have thinner tails compared to a
normal distribution.
In R, you can calculate kurtosis using the `kurtosis()` function from the
"e1071" package. To create a histogram and visually inspect the kurtosis of a
dataset, you can use the `hist()` function and examine the shape of the
distribution. Here's an example:
```R
# Example dataset
data <- c(1, 2, 3, 4, 5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9)
# Calculate kurtosis
library(e1071)
kurt <- kurtosis(data)
# Create a histogram
hist(data, main = "Histogram of Data", xlab = "Value", ylab = "Frequency")
# Add kurtosis information to the plot
text(6, 3, paste("Kurtosis =", round(kurt, 2)), col = "red")
```
In this example, we calculate the kurtosis of `data` and create a
histogram to visualize the data distribution. The kurtosis value is displayed
on the histogram plot. Note that `kurtosis()` from "e1071" reports excess
kurtosis (kurtosis minus 3), so a value near 0 indicates a mesokurtic
distribution, a positive value a leptokurtic one, and a negative value a
platykurtic one.
as:
- A positive skewness value indicates that the data are skewed to the right
(long tail on the right side of the distribution), while a negative skewness
value indicates that the data are skewed to the left (long tail on the left side
of the distribution). A skewness value of 0 suggests that the data are symmetric.
In R, the `skewness()` function from the "e1071" package calculates a
moment-based coefficient of skewness, which is read the same way (positive for
right skew, negative for left skew).
```R
# Example dataset
data <- c(5, 6, 6, 7, 8, 8, 8, 9, 10)
# Calculate skewness
library(e1071)
skew <- skewness(data)
```
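Pearson's coefficient itself can also be computed directly in base R. The sketch below assumes the commonly used form Sk = 3(mean - median)/standard deviation, since the formula is not shown above; `pearson_skew` is a hypothetical helper name, not a function from any package.

```R
# Same example dataset as above
data <- c(5, 6, 6, 7, 8, 8, 8, 9, 10)

# Hypothetical helper: Pearson's coefficient of skewness,
# assuming the form 3 * (mean - median) / sd
pearson_skew <- function(x) {
  3 * (mean(x) - median(x)) / sd(x)
}

sk <- pearson_skew(data)
print(sk)  # negative here because the mean lies below the median
```

A negative value means the mean is pulled below the median, which points to a left-skewed dataset.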
2. Bowley’s coefficient of skewness:
Both types of skewness can provide insights into the shape of the data
distribution and can help you identify deviations from normality. Depending
4. The number of U.S. cars in service by top car rental companies in a recent year
according to Auto Rental News follows. Compute the mode, the median, and
the mean.
Mode: 9,000
Median: With 13 different companies in this group, the median is located at the
7th position. Because the data are already ordered, the 7th term is 20,000,
which is the median.
Mean: The total number of cars in service is Σx = 1,791,000, so the mean is
Σx/n = 1,791,000/13 ≈ 137,769.
5.Compute the 35th percentile, the 55th percentile, Q1, Q2, and Q3 for
the following data
7.
a. Range:
The range is the difference between the maximum and minimum values in the
dataset.
Maximum value: 9
Minimum value: 1
Range = Maximum - Minimum
= 9 - 1
Range = 8
11.
12.
In this scenario, we have the following statistics for the distribution of ages:
1. Mean age = 51
2. Median age = 54
3. Modal age = 59
To discuss the skewness of the age distribution, we can consider the
relationships between the mean, median, and mode.
1. If the mean, median, and mode are approximately equal, the
distribution is approximately symmetrical and has little or no
skewness.
2. If the mean is less than the median, and the mode is greater than the
median, the distribution is negatively or left-skewed. In this case, the
tail of the distribution is stretched out to the left.
3. If the mean is greater than the median, and the mode is less than the
median, the distribution is positively or right-skewed. In this case, the
tail of the distribution is stretched out to the right.
In this case:
- The mean age (51) is less than the median age (54), indicating a slight
negative skewness.
- The modal age (59) is greater than the median age (54), which is also
consistent with a negatively (left) skewed distribution.
13.
14.
15)
Shown here is a list of the top five industrial and farm equipment companies in
the United States, along with their annual sales ($ millions). Construct
a pie chart and a bar graph to represent these data, and label the slices
with the appropriate percentages.
| Stem | Leaf |
|------|------|
| 1 | 83 93 97 |
| 2 | 09 75 78 84 |
| 3 | 67 32 34 55 53 89 21 |
| 4 | 10 15 84 95 |
| 5 | 10 11 42 47 |
| 6 | 45 72 |
| 7 | 20 80 |
| 8 | 64 |
| 9 | 15 |
| 10 | 94 |
18.
20.
22. A supplier shipped a lot of six parts to a company. The lot contained
three defective parts. Suppose the customer decided to randomly select
two parts and test them for defects. How large a sample space is the
customer potentially working with? List the sample space. Using the
sample space list, determine the probability that the customer will select
a sample with exactly one defect.
To determine the sample space and calculate the probability of
selecting a sample with exactly one defect, we can use a combination
of counting techniques. In this problem, we have a lot of six parts, with
three of them being defective.
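One way to list the sample space is to enumerate every pair of parts in R. This is a sketch; the part labels `D1` to `D3` (defective) and `G1` to `G3` (good) are hypothetical names introduced only for illustration.

```R
# Hypothetical labels: three defective (D) and three good (G) parts
defective <- c("D1", "D2", "D3")
good <- c("G1", "G2", "G3")
parts <- c(defective, good)

# Every unordered sample of two parts; each column is one sample
samples <- combn(parts, 2)
n_samples <- ncol(samples)

# Count the defective parts in each sample
n_def <- apply(samples, 2, function(s) sum(s %in% defective))

# Probability of exactly one defective part in the sample
p_one_defect <- mean(n_def == 1)

print(t(samples))     # the sample space, one row per sample
print(n_samples)      # size of the sample space
print(p_one_defect)   # share of samples with exactly one defect
```

The enumeration gives 15 equally likely samples, of which 9 contain exactly one defective part, so the probability is 9/15 = 0.60.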
23.
X ∪ Z = {1, 2, 3, 4, 5, 7, 8, 9}
X ∩ Z = {1, 3, 7}
X ∩ Y = {7, 9}
X ∪ Y ∪ Z = {1, 2, 3, 4, 5, 7, 8, 9}
X ∩ Y ∩ Z = {7}
(X ∪ Y) ∩ Z = {1, 2, 3, 4, 7}
(Y ∩ Z) ∪ (X ∩ Y) = {2, 4, 7} ∪ {7, 9} = {2, 4, 7, 9}
X or Y = {1, 2, 3, 4, 5, 7, 8, 9}
Y and X = {7, 9}
24.
A company's customer service 800 telephone system is set up so that the caller has
six options. Each of these six options leads to a menu with four options.
For each of these four options, three more options are available. For each
of these three options, another three options are presented. If a person calls
the 800 number for assistance, how many total options are possible?
To find the total number of possible options, multiply the number of choices at
each level. In this scenario, there are four levels of choices:
1. First level: 6 options
2. Second level: 4 options
3. Third level: 3 options
4. Fourth level: 3 options
Total options = 6 × 4 × 3 × 3 = 216
25. A bin contains six parts. Two of the parts are defective and four
are acceptable. If three of the six parts are selected from the bin, how
large is the sample space? Which counting rule did you use, and why?
For this sample space, what is the probability that exactly one of the
three sampled parts is defective?
To find the size of the sample space for selecting three parts from
the bin, you can use the combination formula. The combination
formula, also known as "n choose k," is used to determine the
number of ways to choose k items from a set of n items without
regard to the order. It's expressed as:
C(n,k) = n!/(k!(n-k)!)
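As a quick check of the combination formula for this bin, base R's `choose()` evaluates C(n, k) directly. This is a sketch using only the numbers stated in the question (six parts, two defective, three sampled).

```R
# Size of the sample space: choose 3 parts out of 6
n_samples <- choose(6, 3)

# Samples with exactly one defective part:
# 1 of the 2 defective parts and 2 of the 4 acceptable parts
n_favourable <- choose(2, 1) * choose(4, 2)

# Probability of exactly one defective part
p_one_defective <- n_favourable / n_samples

print(n_samples)        # 20
print(n_favourable)     # 12
print(p_one_defective)  # 0.6
```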
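In this case, you have six parts in the bin, and you want to select three, so
n = 6 and k = 3: C(6,3) = 6!/(3!3!) = 20 possible samples. Exactly one
defective part means choosing 1 of the 2 defective parts and 2 of the 4
acceptable parts: C(2,1) × C(4,2) = 2 × 6 = 12 samples, so the probability
is 12/20 = 0.60.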
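```R
# Rows are B = 0, 1 and columns are A = 0, 1 (values filled row by row)
joint_probs <- matrix(c(0.2, 0.1, 0.3, 0.4), nrow = 2, byrow = TRUE)
# Summing over B (down each column) gives P(A = 0) and P(A = 1)
marginal_prob_A <- colSums(joint_probs)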
```
2. Union Probability:
- Union probability, denoted as P(A ∪ B), represents the probability
that at least one of two events A and B occurs. It can be calculated using
the marginal probabilities and the joint probability of A and B:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
Example:
Using the same joint probability distribution as above, to find the union
probability of A and B:
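A minimal sketch of that calculation follows. It assumes that "event A" and "event B" mean A = 1 and B = 1 in the joint table, with rows indexing B and columns indexing A; the row and column names are added here only for readability.

```R
# Joint distribution: rows are B = 0, 1 and columns are A = 0, 1
joint_probs <- matrix(c(0.2, 0.1, 0.3, 0.4), nrow = 2, byrow = TRUE,
                      dimnames = list(B = c("0", "1"), A = c("0", "1")))

# Marginal probabilities of the two events (assumed to be A = 1 and B = 1)
p_A <- sum(joint_probs[, "1"])   # sum over B within the A = 1 column
p_B <- sum(joint_probs["1", ])   # sum over A within the B = 1 row

# Joint probability that both events occur
p_A_and_B <- joint_probs["1", "1"]

# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
p_A_or_B <- p_A + p_B - p_A_and_B
print(p_A_or_B)
```

With these values the union probability is 0.5 + 0.7 - 0.4 = 0.8, which also equals 1 minus the probability that neither event occurs (0.2).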
Let F denote the event of female and P denote the event of professional
worker. The question is P(F ∪ P) = ?
29.
\( M \) = woman is married,
\( L \) = woman participates in the labor force.
\[ P(L|M) = 78\% \]
\[ P(L|M') = 61\% \]
These calculations require the values of \( P(M) \) and \( P(L) \), which are
not provided in the question. If you have those values, you can substitute
them into the formulas to get the numerical answers.
30. A company has 140 employees, of which 30 are supervisors. Eighty of
the employees are married, and 20% of the married employees are
supervisors. If a company employee is randomly selected, what is the
probability that the employee is married and is a supervisor?
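One way to work this out is the multiplication law, P(M ∩ S) = P(M) · P(S|M). The sketch below uses only the figures given in the question; the variable names are illustrative.

```R
n_employees <- 140
n_married <- 80
p_sup_given_married <- 0.20   # 20% of married employees are supervisors

# P(married)
p_married <- n_married / n_employees

# Multiplication law: P(married and supervisor)
p_married_and_sup <- p_married * p_sup_given_married

print(p_married_and_sup)  # 16 of the 140 employees, about 0.114
```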
31 .
P(A ∩ E) = 16/(5+11+16+18)
P(A ∩ B) = 2/(2+3+5+7)
P(D ∩ E) = 0/(5+11+16+18) = 0
(Since there is no common element between D and E)
32.
Given information:
P(S) = 0.43 (probability of owning stock)
P(E|S) = 0.75 (probability of having some college education given they own
stock)
P(E) = 0.37 (probability of having some college education)
b. Probability that the adult owns stock and has some college education:
P(S ∩ E) = P(S) · P(E|S) = (0.43)(0.75) = 0.3225
c. Probability that the adult owns stock or has some college education:
P(S ∪ E) = P(S) + P(E) - P(S ∩ E) = 0.43 + 0.37 - 0.3225 = 0.4775
d. Probability that the adult has neither some college education nor owns stock:
P(¬S ∩ ¬E) = 1 - P(S ∪ E) = 1 - 0.4775 = 0.5225
e. Probability that the adult does not own stock or has no college education:
P(¬S ∪ ¬E) = 1 - P(S ∩ E) = 1 - 0.3225 = 0.6775
f. Probability that the adult has some college education and owns no stock:
P(E ∩ ¬S) = P(E) - P(S ∩ E) = 0.37 - 0.3225 = 0.0475
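These parts can be evaluated in R from the three given probabilities. This is a sketch; the variable names are illustrative. Note that "neither" is the complement of the union, while "not S or not E" is the complement of the intersection.

```R
p_S <- 0.43          # P(S): owns stock
p_E_given_S <- 0.75  # P(E|S): some college education given stock ownership
p_E <- 0.37          # P(E): some college education

p_S_and_E <- p_S * p_E_given_S       # b. multiplication law
p_S_or_E <- p_S + p_E - p_S_and_E    # c. addition law
p_neither <- 1 - p_S_or_E            # d. complement of the union
p_notS_or_notE <- 1 - p_S_and_E      # e. complement of the intersection
p_E_and_notS <- p_E - p_S_and_E      # f. education but no stock

print(c(b = p_S_and_E, c = p_S_or_E, d = p_neither,
        e = p_notS_or_notE, f = p_E_and_notS))
```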
34. Determine the mean, the variance and the standard deviation of the
following discrete distribution
39. A bank has an average random arrival rate of 3.2 customers every
4 minutes. What is the probability of getting exactly 10 customers during an
8-minute interval? (Using Poisson Formula)
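A sketch of the Poisson calculation in R: because the interval doubles from 4 to 8 minutes, the rate is rescaled to λ = 6.4 before applying P(x) = λ^x e^(-λ) / x!. The built-in `dpois()` is used as a cross-check.

```R
rate_per_4_min <- 3.2
lambda <- rate_per_4_min * (8 / 4)   # expected arrivals in 8 minutes = 6.4
x <- 10

# Poisson formula written out
p_formula <- lambda^x * exp(-lambda) / factorial(x)

# Same probability from the built-in Poisson mass function
p_builtin <- dpois(x, lambda = lambda)

print(round(p_formula, 4))
```

Both expressions give roughly 0.053.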
24. A company's customer service 800 telephone system is set up so that the caller
has six options. Each of these six options leads to a menu with four options.
For each of these four options, three more options are available. For each of
these three options, another three options are presented. If a person calls the
800 number for assistance, how many total options are possible?
Total options = 6 (1st level) × 4 (2nd level) × 3 (3rd level) × 3 (4th level)
Total options = 6 × 4 × 3 × 3 = 216
So, there are 216 total possible options for a person calling the 800 number for assistance.
26. Explain Marginal, Union, Joint and Conditional Probabilities with example.
In probability theory, various types of probabilities are used to describe different
aspects of random events and their relationships. Four fundamental types of
probabilities are marginal probability, union probability, joint probability, and
conditional probability. Here, I'll explain each type with examples in R:
1. Marginal Probability:
- Marginal probability refers to the probability of a single event occurring,
ignoring other events. It is calculated by summing or integrating the joint
probabilities over all possible values of the events being ignored.
Example:
Suppose you have the following joint probability distribution for two events A and B:
|       | A = 0 | A = 1 |
|-------|-------|-------|
| B = 0 | 0.2   | 0.1   |
| B = 1 | 0.3   | 0.4   |
To find the marginal probability of event A:
```R
# Rows are B = 0, 1 and columns are A = 0, 1 (values filled row by row)
joint_probs <- matrix(c(0.2, 0.1, 0.3, 0.4), nrow = 2, byrow = TRUE)
# Summing over B (down each column) gives P(A = 0) and P(A = 1)
marginal_prob_A <- colSums(joint_probs)
```
28.
32.
33.
Ans:
38. Bank customers arrive randomly on weekday afternoons at an average of
3.2 customers every 4 minutes. What is the probability of having more than
7 customers in a 4-minute interval on a weekday afternoon? (Using
Poisson Formula)
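A sketch of this calculation in R: P(x > 7) is obtained as 1 minus the sum of the Poisson probabilities for x = 0 to 7 with λ = 3.2, and `ppois()` with `lower.tail = FALSE` gives the same upper-tail probability.

```R
lambda <- 3.2   # average arrivals per 4-minute interval

# Poisson formula applied to x = 0, 1, ..., 7
x <- 0:7
p_at_most_7 <- sum(lambda^x * exp(-lambda) / factorial(x))
p_more_than_7 <- 1 - p_at_most_7

# Built-in upper-tail probability P(X > 7)
p_builtin <- ppois(7, lambda = lambda, lower.tail = FALSE)

print(round(p_more_than_7, 4))
```

The result is about 0.017, so more than seven arrivals in four minutes is a rare event.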
40.
11) Write the t test statistics formula for Paired Samples t Test.
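16) Write the test statistic formula for Chi-Square Test of Independence.
Test statistic for the Chi-Square Test of Independence:
χ² = ΣΣ (f_o - f_e)² / f_e, with df = (r - 1)(c - 1),
where f_o is an observed frequency, f_e is the corresponding expected frequency,
r is the number of rows, and c is the number of columns.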
18) Write the formulae of MSC, MSE and F statistics for ONE WAY ANOVA.
Ans:
MSC:
MSC = SSC / df_C
MSE:
MSE = SSE / df_E
F:
F = MSC / MSE
19) Write the formulae of SSC, SSE and SST for ONE WAY ANOVA.
33) What do you mean by Perfect Positive Correlation? Write its condition and draw the
scatter diagram.
Perfect Positive Correlation: In this case, the points will form on a straight line rising from
the lower left hand corner to the upper right hand corner.
Conditions for perfect positive correlation:
1.Linear Relationship: The relationship between the two variables should be perfectly linear.
2.Constant Ratio: The ratio between the changes in the two variables should be constant.
3.No Variability: There should be no variability in the relationship. Every change in one
variable should be associated with a corresponding and proportional change in the other.
35) What do you mean by High Degree of Positive Correlation? Write its condition and draw
the scatter diagram.
High Degree of Positive Correlation: In this case, the plotted points fall in a narrow band,
wherein points show a rising tendency from the lower left hand corner to the upper right hand
corner.
Conditions for a high degree of positive correlation:
Correlation Coefficient (r): The correlation coefficient (r) should be close to +1. The range of
the correlation coefficient is from +1 (perfect positive correlation) to -1 (perfect negative
correlation), with 0 indicating no correlation.
Scatter Diagram: When plotting the data points on a scatter diagram, they should generally
form a clear upward-sloping pattern. This means that as one variable increases, the other
variable tends to increase as well.
36) What do you mean by High Degree of Negative Correlation? Write its condition and draw the scatter
diagram.
High Degree of Negative Correlation: In this case, the plotted points fall in a narrow
band, wherein points show a declining tendency from upper left hand corner to the lower right
hand corner.
37) What do you mean by Low Degree of Positive Correlation? Write its condition and draw
the scatter diagram.
Low Degree of Positive Correlation: In this case, the points are widely scattered over the diagram,
but still show a rising tendency from the lower left hand corner to the upper right hand corner.
39) What do you mean by Zero (No) Correlation? Write its condition and draw the scatter
diagram.
Zero (No) Correlation: When the plotted points are scattered over the graph haphazardly,
it indicates that there is no correlation, or zero correlation, between the two variables.
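The scatter diagrams for these cases can be sketched in R with simulated data. The coefficients and noise levels below are arbitrary choices made only to produce each pattern; they are not taken from any dataset in these notes.

```R
set.seed(123)
x <- rnorm(200)
noise <- rnorm(200)

# One simulated Y variable per correlation pattern (illustrative settings)
patterns <- list(
  "Perfect positive" = 2 * x + 1,
  "High positive"    = x + 0.3 * noise,
  "High negative"    = -x + 0.3 * noise,
  "Low positive"     = 0.3 * x + noise,
  "Zero"             = noise
)

# Correlation coefficient r between x and each simulated Y
r <- sapply(patterns, function(y) cor(x, y))
print(round(r, 2))

# Draw one scatter diagram per pattern
op <- par(mfrow = c(2, 3))
for (name in names(patterns)) {
  plot(x, patterns[[name]], xlab = "X", ylab = "Y",
       main = sprintf("%s (r = %.2f)", name, r[[name]]))
}
par(op)
```

The printed values of r move from +1 for the straight line, through values near ±1 for the narrow bands, down to values near 0 for the haphazard scatter.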
40) Write the formula to calculate Karl Pearson’s coefficient of correlation method using
direct method.
41) Write the formula to calculate Karl Pearson’s product moment coefficient of correlation.
Regression line of X on Y: This line gives the probable value of X (Dependent variable)
for any given value of Y (Independent variable).
Long Answers
Research Hypotheses
Research hypotheses are most nearly like hypotheses defined earlier. A research hypothesis
is a statement of what the researcher believes will be the outcome of an experiment or a
study. Before studies are undertaken, business researchers often have some idea or theory
based on experience or previous work as to how the study will turn out. These ideas, theories,
or notions established before an experiment or study is conducted are research hypotheses.
Some examples of research hypotheses in business might include:
■ Older workers are more loyal to a company.
■ Companies with more than $1 billion in assets spend a higher percentage of their annual
budget on advertising than do companies with less than $1 billion in assets.
■ The price of scrap metal is a good indicator of the industrial production index six months
later.
Statistical Hypotheses
In order to scientifically test research hypotheses, a more formal hypothesis structure needs
to be set up using statistical hypotheses. Suppose business researchers want to “prove” the
Note that the “new idea” or “new theory” that company officials want to
“prove” is stated in the alternative hypothesis. The null hypothesis states that the old market
share of 18% is still true.
Substantive Hypotheses
In testing a statistical hypothesis, a business researcher reaches a conclusion based on the data
obtained in the study. If the null hypothesis is rejected and therefore the alternative hypothesis
is accepted, it is common to say that a statistically significant result has been obtained.
Ans-
Type I and Type II Errors
Because the hypothesis testing process uses sample statistics calculated from random data to
reach conclusions about population parameters, it is possible to make an incorrect decision
about the null hypothesis. In particular, two types of errors can be made in testing hypotheses:
Type I error and Type II error.
A Type I error is committed by rejecting a true null hypothesis. With a Type I error, the null
hypothesis is true, but the business researcher decides that it is not.
As an example, suppose the flour-packaging process
actually is “in control” and is averaging 40 ounces of flour per package. Suppose also that a
business researcher randomly selects 100 packages, weighs the contents of each, and
computes a sample mean. It is possible, by chance, to randomly select 100 of the more
extreme packages (mostly heavy weighted or mostly light weighted) resulting in a mean that
falls in the rejection region. The decision is to reject the null hypothesis even though the
population mean is actually 40 ounces. In this case, the business researcher has committed a
Type I error.
The notion of a Type I error can be used outside the realm of statistical hypothesis
testing in the business world. For example, if a manager fires an employee because some
evidence indicates that she is stealing from the company and if she really is not stealing from
the company, then the manager has committed a Type I error.
As another example, suppose a worker on the assembly line of a large manufacturer hears an
unusual sound and decides to shut the line down (reject the null hypothesis). If the sound
turns out not to be related to the assembly line and no problems are occurring with the
assembly line, then the worker has committed a Type I error.
The probability of committing a Type I error is called alpha ( α) or level of significance.
Alpha equals the area under the curve that is in the rejection region beyond the critical
value(s). The value of alpha is always set before the experiment or study is undertaken. As
A Type II error is committed when a business researcher fails to reject a false null hypothesis.
In this case, the null hypothesis is false, but a decision is made to not reject it.
Suppose in the case of the flour problem that the packaging process is actually producing a
population mean of 41 ounces even though the null hypothesis is 40 ounces. A sample of 100
packages yields a sample mean of 40.2 ounces, which falls in the nonrejection region. The
business decision maker decides not to reject the null hypothesis. A Type II error has been
committed. The packaging procedure is out of control and the hypothesis testing process does
not identify it.
Suppose in the business world an employee is stealing from the company. A manager sees
some evidence that the stealing is occurring but lacks enough evidence to conclude that the
employee is stealing from the company. The manager decides not to fire the employee based
on theft. The manager has committed a Type II error.
Consider the manufacturing line with the noise. Suppose the worker decides not enough noise
is heard to shut the line down, but in actuality, one of the cords on the line is unraveling,
creating a dangerous situation. The worker is committing a Type II error.
The probability of committing a Type II error is beta ( β).
Unlike alpha, beta is not usually stated at the beginning of the hypothesis testing procedure.
Actually, because beta occurs only when the null hypothesis is not true, the computation of
beta varies with the many possible alternative parameters that might occur.
Power, which is equal to 1 -β , is the probability of a statistical test rejecting the null
hypothesis when the null hypothesis is false. Figure 9.5 shows the relationship between α, β,
and power.
3.Imagine a company wants to test the claim that their batteries last more than 40 hours. Using
a simple random sample of 15 batteries yielded a mean of 44.9 hours, with a standard
deviation of 8.9 hours. Test this claim using a significance level of 0.05.
H0: μ = 40, Ha: μ > 40
t = (44.9 - 40)/(8.9/√15) = 2.13
t.05,14 = 1.761
Because 2.13 > 1.761, reject the null hypothesis.
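The calculation can be reproduced in R from the summary statistics. This is a sketch of a one-tailed (upper-tail) one-sample t test, since the claim is that the batteries last more than 40 hours; `qt()` and `pt()` replace the t table.

```R
n <- 15        # sample size
xbar <- 44.9   # sample mean (hours)
s <- 8.9       # sample standard deviation (hours)
mu0 <- 40      # hypothesized mean
alpha <- 0.05

# Observed t statistic
t_obs <- (xbar - mu0) / (s / sqrt(n))

# Upper-tail critical value and p-value with n - 1 degrees of freedom
t_crit <- qt(1 - alpha, df = n - 1)
p_value <- pt(t_obs, df = n - 1, lower.tail = FALSE)

reject_h0 <- t_obs > t_crit
print(round(c(t_obs = t_obs, t_crit = t_crit, p_value = p_value), 3))
print(reject_h0)
```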
4. A random sample of size 20 is taken, resulting in a sample mean of 25.51 and a sample
standard deviation of 2.1933. Assume the data are normally distributed and use this information and α
The test is to determine whether the machine is out of control, and the shop supervisor has
not specified whether he believes the machine is producing plates that are too heavy or too
light. Thus a two-tailed test is appropriate. The following hypotheses are tested.
H0: μ = 25 pounds
Ha: μ ≠ 25 pounds
An α of .05 is used. Figure 9.11 shows the rejection regions. Because n = 20, the degrees of
freedom for this test are 19 (20 - 1). The t distribution table is a one-tailed table but the test for
this problem is two tailed, so alpha must be split, which yields α/2 = .025, the value in each
tail. (To obtain the table t value when conducting a two-tailed test, always split alpha and use
α/2.) The table t value for this example is 2.093. Table values such as this one are often written
in the following form:
t.025,19 = 2.093
Figure 9.12 depicts the t distribution for this example, along with the critical values, the
observed t value, and the rejection regions. In this case, the decision rule is to reject the
null hypothesis if the observed value of t is less than -2.093 or greater than +2.093 (in the
tails of the distribution). Computation of the test statistic yields
t = (25.51 - 25)/(2.1933/√20) = 1.04
Because the observed t value is +1.04, the null hypothesis is not rejected. Not enough
evidence is found in this sample to reject the hypothesis that the population mean is 25 pounds.
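The two-tailed decision can be checked in R from the summary statistics. This is a sketch; the hypothesized mean of 25 is taken from the conclusion stated in the answer.

```R
n <- 20
xbar <- 25.51
s <- 2.1933
mu0 <- 25
alpha <- 0.05

# Observed t statistic
t_obs <- (xbar - mu0) / (s / sqrt(n))

# Two-tailed test: alpha is split, so use the 1 - alpha/2 quantile
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Reject H0 only if t falls in either tail
reject_h0 <- abs(t_obs) > t_crit
print(round(c(t_obs = t_obs, t_crit = t_crit), 3))
print(reject_h0)
```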
5. A random sample of size 20 is taken, resulting in a sample mean of 16.45 and a sample
standard deviation of 3.59. Assume x is normally distributed and use this information and α
= .05 to test the following hypotheses.
H0: μ = 16    Ha: μ ≠ 16
Ans-
6. To test the difference in the two methods, the managers randomly select one group of 15
newly hired employees to take the three-day seminar (method A) and a second group of 12
new employees for the two-day DVD method (method B). Table shows required data Using
α= .05, the managers want to determine whether there is a significant difference in the mean
scores of the two groups.
ACTION: STEP 7. Because the observed value, t = -5.20, is less than the lower critical table
value, t = -2.06, the observed value of t is in the rejection region. The null hypothesis is
rejected. There is a significant difference in the mean scores of the two tests.
BUSINESS IMPLICATIONS: STEP 8. Figure 10.6 shows the critical areas and the observed
t value. Note that the computed t value is -5.20, which is enough to cause the managers of the
Hernandez Manufacturing Company to reject the null hypothesis.
7. Use the data given and the eight step process to test the following hypotheses. Use 1% level
of significance.
\[ t = \frac{(24.56 - 26.42) - 0}{\sqrt{\dfrac{12.4(7) + 15.8(10)}{8 + 11 - 2}}\,\sqrt{\dfrac{1}{8} + \dfrac{1}{11}}} = -1.0548639 \]
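The pooled-variance t statistic can be verified in R. This sketch reads the inputs off the formula: it assumes sample 1 has n = 8, mean 24.56 and variance 12.4, and sample 2 has n = 11, mean 26.42 and variance 15.8.

```R
# Summary statistics as read from the formula (assumed assignment to samples)
n1 <- 8;  xbar1 <- 24.56; var1 <- 12.4
n2 <- 11; xbar2 <- 26.42; var2 <- 15.8

# Pooled variance and degrees of freedom
df <- n1 + n2 - 2
pooled_var <- ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Observed t statistic for H0: mu1 - mu2 = 0
t_obs <- ((xbar1 - xbar2) - 0) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))

print(df)
print(t_obs)
```

The statistic is then compared with the table t value for 17 degrees of freedom at the 1% level of significance.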
8. To test this, we may recruit a simple random sample of 20 college basketball players and
measure each of their max vertical jumps. Then, we may have each player use the training
program for one month and then measure their max vertical jump again at the end of the
month. To determine whether the training program increases max vertical jump, we will
perform a paired samples t-test at significance level α = 0.05. The sample mean of the differences
is -0.95 and the sample standard deviation of the differences is 1.317.
The hypotheses for this test are:
Given:
- Sample mean of differences (\(\bar{x_d}\)): -0.95
- Sample standard deviation of differences (sd_d): 1.317
- Sample size (\(n\)): 20
- Significance level (\(\alpha\)): 0.05
1. **Set up hypotheses:**
- \(H0: \mu_d = 0\) (Mean difference is zero)
- \(H1: \mu_d < 0\) (Mean difference is less than zero)
4. **Make a decision:**
- This is a left-tailed test, so reject \(H0\) if \(t < -t_{\alpha, n-1}\); otherwise fail to reject \(H0\).
- With \(n = 20\), \(df = 19\) and the critical value is \(-t_{0.05,19} = -1.729\).
- The test statistic is \(t = -0.95 / (1.317/\sqrt{20}) \approx -3.23\).
7. **Conclusion:**
- Because \(-3.23 < -1.729\), \(H0\) is rejected. There is enough evidence that the mean
difference is less than zero, which supports the claim that the training program
increases max vertical jump.
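The paired-samples test statistic and critical value can be computed in R from the summary statistics given above. This is a sketch of the left-tailed test stated in the hypotheses; `qt()` supplies the critical value in place of a t table.

```R
n <- 20         # number of paired observations
d_bar <- -0.95  # sample mean of the differences
s_d <- 1.317    # sample standard deviation of the differences
alpha <- 0.05

# Test statistic for H0: mu_d = 0
t_obs <- (d_bar - 0) / (s_d / sqrt(n))

# Left-tailed critical value and p-value with n - 1 degrees of freedom
t_crit <- qt(alpha, df = n - 1)
p_value <- pt(t_obs, df = n - 1)

reject_h0 <- t_obs < t_crit
print(round(c(t_obs = t_obs, t_crit = t_crit, p_value = p_value), 4))
print(reject_h0)
```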
These data are related data because each P/E value for year 1 has a corresponding year 2
measurement on the same company. Because no prior information indicates whether P/E
ratios have gone up or down, the hypothesis tested is two tailed. Assume α=.01 Assume that
differences in P/E ratios are normally distributed in the population.
HYPOTHESIZE: STEP 1.
STEP 3. α = .01
STEP 4. Because α = .01 and this test is two tailed, α/2 = .005 is used to obtain the
table t value. With nine pairs of data, n = 9, df = n - 1 = 8. The table t value is
t.005,8 = ±3.355.
11. Use the data given to test the following hypotheses Assume the differences are normally
distributed in the population.
H0: D = 0 Ha: D ≠ 0
Individual Before After
1 107 102
2 99 98
3 110 100
4 113 108
5 96 89
12.Suppose a store manager wants to find out whether the results of this consumer survey
apply to customers of supermarkets in her city. To do so, she interviews 207 randomly
selected consumers as they leave supermarkets in various parts of the city. Now the manager
can use a chi-square test to determine whether the observed frequencies of responses from this
Hypothesis:
Step 1: H0 : The observed value is same as the expected distribution
Ha : The observed value is not same as the expected distribution
Step 2: The statistical test being used is
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
Step 3: Let α = .05
Step 4: Chi-square goodness-of-fit tests are one tailed because a chi-square of zero indicates
perfect agreement between distributions. Any deviation from zero difference occurs in the
positive direction only because chi-square is determined by a sum of squared values and can
never be negative.
Here k=4
Degree of freedom, ie, df = k – 1 = 4 – 1 = 3
\( \chi^2_{.05,3} = 7.8147 \)
After the data are analyzed, an observed chi-square greater than 7.8147 must be computed in
order to reject the null hypothesis.
Step 5 : The observed values gathered in the sample data from Table sum to 207. Thus n =
207. The expected proportions are given, but the expected frequencies must be calculated by
multiplying the expected proportions by the sample total of the observed frequencies.
ACTION: Step 7: Because the observed value of chi-square of 6.25 is not greater than the
critical table value of 7.8147, the store manager will not reject the null hypothesis.
BUSINESS IMPLICATIONS: Step 8: Thus, the data gathered in the sample of 207
supermarket shoppers indicate that the distribution of responses of supermarket shoppers in
the manager’s city is not significantly different from the distribution of responses to the
national survey. The store manager may conclude that her customers do not appear to have
attitudes different from those people who took the survey.
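The critical value and the decision for this goodness-of-fit test can be reproduced in R. The sketch uses only the figures quoted above (k = 4 categories, α = .05, observed chi-square of 6.25); the individual observed and expected frequencies are not shown here, so the statistic itself is taken as given.

```R
k <- 4            # number of categories
alpha <- 0.05
chi_sq_obs <- 6.25  # observed chi-square reported in Step 7

# Degrees of freedom and upper-tail critical value
df <- k - 1
chi_sq_crit <- qchisq(1 - alpha, df = df)

# p-value of the observed statistic (upper tail only)
p_value <- pchisq(chi_sq_obs, df = df, lower.tail = FALSE)

reject_h0 <- chi_sq_obs > chi_sq_crit
print(round(c(critical = chi_sq_crit, p_value = p_value), 4))
print(reject_h0)
```

When the observed and expected frequencies are available as vectors, `chisq.test(x = observed, p = expected_proportions)` performs the whole test in one call.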
13.Use chi-square test to determine whether the observed frequencies are distributed the same
as the expected frequencies. (α= .05).
Hypothesis:
Step 1: H0 : The observed value is same as the expected distribution
Ha : The observed value is not same as the expected distribution
Step 2: The statistical test being used is
ACTION: Step 7: Because the observed value of chi-square of 12.4802 is greater than the
critical table value of 11.0705, the store manager will reject the null hypothesis.
14.Use chi-square test to determine whether the observed frequencies represent a uniform
distribution. (α= .01)
15. Dairies would like to know whether the sales of milk are distributed uniformly over a year
so they can plan for milk production and storage. A uniform distribution means that the
frequencies are the same in all categories. In this situation, the producers are attempting to
determine whether the amounts of milk sold are the same for each month of the year. They
ascertain the number of gallons of milk sold by sampling one large supermarket each month
during a year, obtaining the following data. Use α=.01 to test whether the data fit a uniform
distribution. (Using Chi-square Test)
Answer:
H0: The monthly figures for milk sales are uniformly distributed.
An observed chi-square value of more than 24.725 must be obtained to reject null hypothesis.
The expected monthly figure is,
18,447/12=1537.25 gallons
The following table shows the observed frequencies, the expected frequencies and chi-square
calculations.
The observed \( \chi^2 \) value of 74.37 is greater than the critical table value of \( \chi^2_{.01,11} = 24.725 \).
So, the decision is to reject null hypothesis. This problem provides enough evidence that the
distribution of milk sales is not uniform.
Answer:
Tj: T1 = 12, T2 = 23, T3 = 25, T = 60
nj: n1 = 6, n2 = 5, n3 = 6, N = 17
x̄j: x̄1 = 2, x̄2 = 4.6, x̄3 = 4.17, x̄ = 3.59
SSC = [6(2-3.59)^2 + 5(4.6-3.59)^2 + 6(4.17-3.59)^2]
= [6(-1.59)^2 + 5(1.01)^2 + 6(0.58)^2]
= [6(2.53) + 5(1.02) + 6(0.34)]
= 22.32
SSE = [(2-2)^2 + (1-2)^2 + (3-2)^2 + (3-2)^2 + (2-2)^2 + (1-2)^2 + (5-4.6)^2 + (3-4.6)^2 + (6-4.6)^2 + (4-4.6)^2
+ (5-4.6)^2 + (3-4.17)^2 + (4-4.17)^2 + (5-4.17)^2 + (5-4.17)^2 + (3-4.17)^2 + (5-4.17)^2]
= [(0)^2 + (-1)^2 + (1)^2 + (1)^2 + (0)^2 + (-1)^2 + (0.4)^2 + (-1.6)^2 + (1.4)^2 + (-0.6)^2 + (0.4)^2 + (-1.17)^2
+ (-0.17)^2 + (0.83)^2 + (0.83)^2 + (-1.17)^2 + (0.83)^2]
= [0 + 1 + 1 + 1 + 0 + 1 + 0.16 + 2.56 + 1.96 + 0.36 + 0.16 + 1.3689 +
0.0289 + 0.6889 + 0.6889 + 1.3689 + 0.6889]
= 14.0334
SST = [(2-3.59)^2 + (1-3.59)^2 + (3-3.59)^2 + (3-3.59)^2 + (2-3.59)^2 + (1-3.59)^2 + (5-3.59)^2 + (3-3.59)^2
+ (6-3.59)^2 + (4-3.59)^2 + (5-3.59)^2 + (3-3.59)^2 + (4-3.59)^2 + (5-3.59)^2 + (5-3.59)^2 + (3-
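The sums of squares can be cross-checked in R. This sketch reads the three groups of observations off the SSE expression above; the group names `g1` to `g3` are labels chosen here for illustration.

```R
# Observations per group, as they appear in the SSE expression
groups <- list(
  g1 = c(2, 1, 3, 3, 2, 1),
  g2 = c(5, 3, 6, 4, 5),
  g3 = c(3, 4, 5, 5, 3, 5)
)
values <- unlist(groups)
grand_mean <- mean(values)

n_j <- sapply(groups, length)   # group sizes
mean_j <- sapply(groups, mean)  # group means

# Between-group, within-group and total sums of squares
SSC <- sum(n_j * (mean_j - grand_mean)^2)
SSE <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))
SST <- sum((values - grand_mean)^2)

print(round(c(SSC = SSC, SSE = SSE, SST = SST), 4))
```

Because R keeps the means at full precision, SSC comes out near 22.20 rather than the 22.32 obtained by hand with means rounded to two decimals. SSE agrees with the hand calculation, and SSC + SSE equals SST.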
17. A company has three manufacturing plants, and company officials want to determine
whether there is a difference in the average age of workers at the three locations. The
following data are the ages of five randomly selected workers at each plant. Perform a one-
way ANOVA to determine whether there is a significant difference in the mean ages of the
workers at the three plants. Use α = .01 and note that the sample sizes are equal.
Answer:
H0: µ1 =µ2 =µ3
Ha: At least one of the means is different from the others.
The appropriate test statistic is the F test calculated from ANOVA.
The value of α is .01.
The degrees of freedom for this problem are 3-1=2 for the numerator and 15-3=12 for the
denominator.
SSC = [5(28.2-28.33)^2 + 5(32-28.33)^2 + 5(24.8-28.33)^2]
= 129.73
SSE = [(29-28.2)^2 + (27-28.2)^2 + (30-28.2)^2 + (27-28.2)^2 + (28-28.2)^2 + (32-32)^2 + (33-32)^2
+ (31-32)^2 + (34-32)^2 + (30-32)^2 + (25-24.8)^2 + (24-24.8)^2 + (24-24.8)^2 + (25-24.8)^2
+ (26-24.8)^2]
=19.60
SST = [(29-28.33)^2 + (27-28.33)^2 + (30-28.33)^2 + (27-28.33)^2 + (28-28.33)^2 + (32-28.33)^2
+ (33-28.33)^2 + (31-28.33)^2 + (34-28.33)^2 + (30-28.33)^2 + (25-28.33)^2 + (24-28.33)^2
+ (24-28.33)^2 + (25-28.33)^2 + (26-28.33)^2]
=149.33
dfc=C-1=3-1=2
dfe=N-C=15-3=12
dft=N-1=15-1=14
MSC=SSC/dfc=129.73/2=64.87
MSE=SSE/dfe=19.60/12=1.63
F= MSC/MSE=64.87/1.63=39.80
The decision is to reject the null hypothesis because the observed F value of 39.80 is greater
than the critical table F value of 6.93.
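The same test can be run with R's built-in `aov()`. This is a minimal sketch, assuming the ages are the values that appear in the SSE expansion above; the plant labels are made-up names. Because R does not round MSC and MSE before dividing, its F value is slightly below the hand-computed 39.80, but the decision is the same.

```r
# Ages read off the SSE expansion; plant labels are assumed names
ages <- c(29, 27, 30, 27, 28,
          32, 33, 31, 34, 30,
          25, 24, 24, 25, 26)
plant <- factor(rep(c("plant1", "plant2", "plant3"), each = 5))

fit <- aov(ages ~ plant)
tab <- summary(fit)[[1]]                # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)

f_obs  <- tab[["F value"]][1]           # observed F for the plant effect
f_crit <- qf(0.99, df1 = 2, df2 = 12)   # upper 1% point of F(2, 12)

print(tab)
cat("observed F =", round(f_obs, 2), " critical F =", round(f_crit, 2), "\n")
print(f_obs > f_crit)                   # TRUE means reject H0 at alpha = .01
```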
Answer:
Source of Variance   SS        df   MS         F
Between              0.23658    3   0.078860   10.18
Error                0.15492   20   0.007746
Total                0.39150   23
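Each mean square in the table is SS/df, and F is the ratio of the two mean squares. As a rough sketch (the variable names are illustrative, and the p-value is an extra that the table itself does not report), the arithmetic can be reproduced as follows:

```r
# Sums of squares and degrees of freedom taken from the ANOVA table
ss <- c(between = 0.23658, error = 0.15492)
df <- c(between = 3, error = 20)

ms    <- ss / df                                  # mean squares
f_obs <- unname(ms["between"] / ms["error"])      # F = MS between / MS error
p_value <- pf(f_obs, df["between"], df["error"], lower.tail = FALSE)

print(round(ms, 6))
print(round(f_obs, 2))
print(unname(p_value))
```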
HYPOTHESIZE:
STEP 1. The milk producer’s theory is that the proportion of Wisconsin residents who drink
milk for breakfast is higher than the national proportion, which is the alternative hypothesis.
The null hypothesis is that the proportion in Wisconsin does not differ from the national
average. The hypotheses for this problem are
ACTION:
STEP 7. Because z = 2.44 is beyond z.05 = 1.645 and lies in the rejection region, the milk producer
rejects the null hypothesis. The probability of obtaining z ≥ 2.44 by chance is .0073.
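The critical value and the tail probability quoted in Step 7 come from the standard normal distribution. A small sketch using R's `qnorm()` and `pnorm()`; the observed z is taken as given from the text:

```r
z_obs   <- 2.44                                  # observed statistic quoted in Step 7
z_crit  <- qnorm(0.95)                           # upper 5% point of the standard normal
p_value <- pnorm(z_obs, lower.tail = FALSE)      # P(Z >= 2.44)

print(round(c(z_crit = z_crit, p_value = p_value), 4))
print(z_obs > z_crit)                            # TRUE means reject H0 (one-tailed, alpha = .05)
```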
20. A manufacturer believes exactly 8% of its products contain at least one minor flaw.
Suppose a company researcher wants to test this belief. The null and alternative hypotheses
are
H0: p = .08
Ha: p ≠ .08
The business researcher randomly selects a sample of 200 products, inspects each item for
flaws, and determines that 33 items have at least one minor flaw. Calculate the sample
proportion and test the hypotheses at α = .10.
This test is two-tailed because the hypothesis being tested is whether the proportion of
products with at least one minor flaw is .08. Alpha is selected to be .10. Figure 9.15 shows
the distribution, with the rejection regions and z.05. Because α is divided for a two-tailed test,
the table value for an area of (1/2)(.10) = .05 is z.05 = ±1.645.
For the business researcher to reject the null hypothesis, the observed z value must be greater
than 1.645 or less than -1.645. The business researcher randomly selects a sample of 200
products, inspects each item for flaws, and determines that 33 items have at least one minor
flaw. Calculating the sample proportion gives
p̂ = x/n = 33/200 = .165
z = (p̂ - p)/√(p·q/n) = (.165 - .08)/√((.08)(.92)/200) = .085/.0192 = 4.43
The observed value of z is in the rejection region (observed z = 4.43 > table z.05 = +1.645),
so the business researcher rejects the null hypothesis. He concludes that the proportion of
items with at least one minor flaw in the population from which the sample of 200 was drawn
is not .08. With α = .10, the risk of committing a Type I error in this example is .10.
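The z statistic for this test can be reproduced in a few lines of R. This is a sketch of the usual one-proportion z test, with the standard error computed under H0 (p = .08); the variable names are illustrative.

```r
x <- 33; n <- 200      # flawed items and sample size
p0 <- 0.08             # hypothesized proportion
alpha <- 0.10          # two-tailed significance level

p_hat  <- x / n                                   # sample proportion
z_obs  <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0
z_crit <- qnorm(1 - alpha / 2)                    # two-tailed critical value

print(round(c(p_hat = p_hat, z_obs = z_obs, z_crit = z_crit), 3))
print(abs(z_obs) > z_crit)                        # TRUE means reject H0
```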
21. Using the given sample information, test the following hypotheses. Note that x is the number
in the sample having the characteristic of interest.
22. Using the given sample information, test the following hypotheses. Note that x is the number
in the sample having the characteristic of interest.
23. Suppose you decide to test this result by taking a survey of your own and identify female
entrepreneurs by gross sales. You interview 100 female entrepreneurs with gross sales of less
than $100,000, and 24 of them define sales/profit as success. You then interview 95 female
entrepreneurs with gross sales of $100,000 to $500,000, and 39 cite sales/profit as a definition
of success. Use this information to test whether there is a significant difference
in the proportions of the two groups that define success as sales/profit. Use α = .01.
Step 1: Since we are testing to determine whether there is a difference between the two groups
of entrepreneurs, a two-tailed test is required. The hypotheses follow.
H0: p1 − p2 = 0
Ha: p1 − p2 ≠ 0
Step 2: The appropriate statistical test and sampling distribution are determined.
Because we are testing the difference in two population proportions, the z test in Formula
10.10 is the appropriate test statistic.
Step 3: The Type I error rate, or alpha, is specified as .01 for this problem.
Step 4: With α = .01, the critical z value can be obtained from Table A.5 for α/2 = .005, z.005
= ±2.575. The decision rule is that if the data produce a z value greater than 2.575 or less than
−2.575, the test statistic is in the rejection region and the decision is to reject the null
hypothesis. If the observed z value is less than 2.575 but greater than −2.575, the decision is
to not reject the null hypothesis because the observed z value is in the nonrejection region.
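Step 5: The sample information follows:
Gross sales under $100,000: n1 = 100, x1 = 24, p̂1 = 24/100 = .24
Gross sales $100,000 to $500,000: n2 = 95, x2 = 39, p̂2 = 39/95 = .41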
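The remaining arithmetic can be sketched in R from the counts given in the question. One assumption to flag: the standard error below uses the pooled sample proportion, which is the usual form of the z test for the difference of two proportions under H0: p1 − p2 = 0; the variable names are illustrative.

```r
x1 <- 24; n1 <- 100    # gross sales under $100,000
x2 <- 39; n2 <- 95     # gross sales $100,000 to $500,000
alpha <- 0.01

p1_hat <- x1 / n1
p2_hat <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)                        # pooled proportion under H0

se     <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_obs  <- (p1_hat - p2_hat) / se
z_crit <- qnorm(1 - alpha / 2)                         # two-tailed critical value

print(round(c(p1_hat = p1_hat, p2_hat = p2_hat, z_obs = z_obs, z_crit = z_crit), 3))
print(abs(z_obs) > z_crit)                             # TRUE would mean reject H0
```

With these inputs the observed z falls just inside the non-rejection region, so the comparison in the last line is worth checking against the decision rule in Step 4.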
24. A group of researchers attempted to determine whether there was a difference in the
ACTION: STEP 7. Because z = 2.59 is greater than the critical table z value of 1.28 and is in
the rejection region, the null hypothesis is rejected.
BUSINESS IMPLICATIONS: STEP 8. A significantly higher proportion of consumers than
of CEOs believe fear of getting caught or losing one’s job is a strong influence on ethical
behaviour.
In these examples, `x` and `y` are vectors representing two variables. You can replace these
vectors with your own data. The `cor()` function is used to calculate the correlation, and the
`method` parameter specifies the type of correlation coefficient to be computed.
26. Explain the types of correlation with respect to the correlation coefficient, with conditions and
suitable scatter diagrams.
1. Positive Correlation:
Positive correlation occurs when two variables tend to increase or decrease together. In other
4. Non-Linear Correlation:
Non-linear correlation occurs when there is a relationship between two variables that is not
well described by a straight line. In such cases, a rank-based coefficient such as Spearman's
rho or Kendall's tau, which measures monotonic rather than strictly linear association,
might be more appropriate than Pearson's r.
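# Example data for non-linear correlation: area grows with the square of the diameter
diameter <- c(1, 2, 3, 4, 5)
area <- pi * (0.5 * diameter)^2
# Spearman's rank correlation captures the monotonic (but non-linear) relationship
spearman_corr <- cor(diameter, area, method = "spearman")
print(spearman_corr)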
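Solution:
Formula:
$r = \dfrac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$
Firm   Interest X   Y      X²        Y²       XY
1      7.43         221    55.205    48841    1642.03
2      7.48         222    55.950    49284    1660.56
3      8.00         226    64.000    51076    1808.00
4      7.75         225    60.063    50625    1743.75
5      7.60         224    57.760    50176    1702.40
∑X = 38.26   ∑Y = 1118   ∑X² = 292.978   ∑Y² = 250002   ∑XY = 8556.74
r = (8556.74 - (38.26)(1118)/5) / √([292.978 - (38.26)²/5][250002 - (1118)²/5])
= (8556.74 - 8554.936) / √([292.978 - 292.766][250002 - 249984.8])
= 1.804 / √((0.212)(17.2))
= 1.804 / 1.910
= 0.94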
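A correlation coefficient must lie between -1 and 1, so it is worth cross-checking the hand arithmetic. The sketch below recomputes r from the X and Y columns of the table, once with deviation sums and once with R's built-in `cor()`; the variable names are illustrative.

```r
interest <- c(7.43, 7.48, 8.00, 7.75, 7.60)   # X column of the table
y        <- c(221, 222, 226, 225, 224)        # Y column of the table

# Deviation form: SSxy / sqrt(SSxx * SSyy)
ss_xy <- sum((interest - mean(interest)) * (y - mean(y)))
ss_xx <- sum((interest - mean(interest))^2)
ss_yy <- sum((y - mean(y))^2)
r_manual <- ss_xy / sqrt(ss_xx * ss_yy)

r_builtin <- cor(interest, y)                 # Pearson correlation by default

print(round(c(manual = r_manual, builtin = r_builtin), 4))
```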
Solution:
Solution:
r = [229 - (11.49)(9.857)] / √([1148 - (11.42)(11.42)][815 - (9.857)(9.857)])
= 501.433 / √((1017.58)(717.83))
= 501.43 / 854.663
= 0.58670
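X Y X² Y² XY
$r_{XY} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$
= (7*624 - (80)(69)) / √([(7*1148) - 6400][(7*879) - 4761])
= (-1152) / √((1636)(1392))
= (-1152) / 1509.08
= -0.76338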
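When only the column totals are available, the same formula can be wrapped in a small helper. `pearson_from_sums` is a hypothetical name introduced here for illustration; the inputs are the sums substituted in the calculation above.

```r
# Pearson's r from summary totals (helper name is illustrative, not a base R function)
pearson_from_sums <- function(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy) {
  num <- n * sum_xy - sum_x * sum_y
  den <- sqrt((n * sum_x2 - sum_x^2) * (n * sum_y2 - sum_y^2))
  num / den
}

r <- pearson_from_sums(n = 7, sum_x = 80, sum_y = 69,
                       sum_x2 = 1148, sum_y2 = 879, sum_xy = 624)
print(round(r, 4))
```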
X      Y      X²        Y²        XY
158    349    24964     121801    55142
296    510    87616     260100    150960
87     301    7569      90601     26187
110    322    12100     103684    35420
436    550    190096    302500    239800
∑X = 1087   ∑Y = 2032   ∑X² = 322345   ∑Y² = 878686   ∑XY = 507509
$r_{XY} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$
= ((5*507509) - (1087*2032)) / √([(5*322345) - 1181569][(5*878686) - 4129024])
= 328761 / √(430156*264406)
= 328761 / 337247.5
= 0.97484
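Since the raw X and Y columns are available here, the coefficient can also be obtained directly with `cor()`. A brief sketch that rebuilds the column totals and then calls the built-in function:

```r
x <- c(158, 296, 87, 110, 436)
y <- c(349, 510, 301, 322, 550)

# Column totals, to compare against the bottom row of the table
totals <- c(sum_x = sum(x), sum_y = sum(y),
            sum_x2 = sum(x^2), sum_y2 = sum(y^2), sum_xy = sum(x * y))
print(totals)

r <- cor(x, y, method = "pearson")
print(round(r, 5))
```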
32. Find the two regression equations, of X on Y and of Y on X, from the following data:
Solution:
34. Find the regression equation of X on Y and predict the value of X when Y is 9.
X: 3 6 5 4 4 6 7 5
Y: 3 2 3 5 3 6 6 4
Solution:
X    Y    Y²    XY
3    3    9     9
6    2    4     12
5    3    9     15
4    5    25    20
4    3    9     12
6    6    36    36
7    6    36    42
5    4    16    20
∑X = 40   ∑Y = 32   ∑Y² = 144   ∑XY = 166
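The totals in the table feed the regression coefficient of X on Y, b_xy = (n∑XY − ∑X∑Y) / (n∑Y² − (∑Y)²), and the line X = x̄ + b_xy(Y − ȳ). The sketch below is one way to check the hand working in R: it computes the slope from the sums and compares it with `lm(x ~ y)`. Variable names are illustrative.

```r
x <- c(3, 6, 5, 4, 4, 6, 7, 5)
y <- c(3, 2, 3, 5, 3, 6, 6, 4)
n <- length(x)

# Slope and intercept of the regression of X on Y, from the column totals
b_xy <- (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(y^2) - sum(y)^2)
a    <- mean(x) - b_xy * mean(y)
x_at_9 <- a + b_xy * 9                      # predicted X when Y = 9

# Same fit with lm(): x is the response, y the predictor
fit <- lm(x ~ y)
print(coef(fit))
print(predict(fit, newdata = data.frame(y = 9)))
print(c(intercept = a, slope = b_xy, x_at_9 = x_at_9))
```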