
Statistical R programming

Unit-1
2-mark questions
1. What are the two assignment operators in R? Give an example.
The operators <- and = assign into the environment in which they are
evaluated. The operator <- can be used anywhere, whereas the operator =
is only allowed at the top level (e.g., in the complete expression typed at
the command prompt) or as one of the subexpressions in a braced list of
expressions.
> x <- 5
> x
[1] 5
> x = x + 1   # this overwrites the previous value of x
> x
[1] 6

2. What is a vector? Give an example of creating a vector.

A vector is a basic data structure that is used to store ordered
collections of elements of the same data type. R supports different
types of vectors, such as numeric, character, logical, integer, and
complex vectors. The function for creating a vector is the single
letter c, with the desired entries in parentheses separated by
commas. Here's an example of creating a vector in R:
> myvec <- c(1, 3, 1, 4, 2)
> myvec
[1] 1 3 1 4 2

3. How do you find the length of a vector? Give an example.


In R, the length() function is used to get the length of the vector. In
simpler terms, it is used to find out how many items are present in
a vector.
Ex:



# Creating a numeric vector
numeric_vector <- c(1.5, 2.3, 4.0, 5.1)

# Finding the length of the vector


vector_length <- length(numeric_vector)

# Printing the length of the vector


print(vector_length)

4. How do you sort a vector in descending order? Give an example


Sorting of vectors can be done using the sort() function. By default, it
sorts in ascending order. To sort in descending order we can pass
decreasing=TRUE. Example:
> sort(x = c(2.5, -1, -10, 3.44), decreasing = FALSE)
[1] -10.00  -1.00   2.50   3.44
> sort(x = c(2.5, -1, -10, 3.44), decreasing = TRUE)
[1]  3.44  2.50 -1.00 -10.00

5. Create and store a sequence of values from 5 to -11 that progresses in
steps of 0.3.
> x <- seq(from = 5, to = -11, by = -0.3)
> x
 [1]  5.0  4.7  4.4  4.1  3.8  3.5  3.2  2.9  2.6  2.3  2.0  1.7  1.4  1.1  0.8  0.5  0.2 -0.1 -0.4 -0.7 -1.0 -1.3
[23] -1.6 -1.9 -2.2 -2.5 -2.8 -3.1 -3.4 -3.7 -4.0 -4.3 -4.6 -4.9 -5.2 -5.5 -5.8 -6.1 -6.4 -6.7 -7.0 -7.3 -7.6 -7.9
[45] -8.2 -8.5 -8.8 -9.1 -9.4 -9.7 -10.0 -10.3 -10.6 -10.9
> length(x)
[1] 54
6. Let vector, myvect with elements 5,-3,4,4,4,8,10,40221,-8,1.Write
code to Delete last element from it.



:-You can delete the last element from a vector in R using the
`length()` function to determine the current length of the vector
and then subsetting the vector accordingly. Here's the code to
delete the last element from your vector `myvect`:
```R
myvect<-c(5,-3,4,4,4,8,10,40221,-8,1)
myvect<-myvect[1:(length(myvect)-1)]
```
This code will remove the last element from `myvect`, and you'll be left
with a vector containing the first 9 elements.

7. Write the purpose of negative indexing in vectors. Give an example.

:- A negative index drops the element at the specified position. This can
be used to return all vector values except those we do not want.
Syntax: x[-i] drops the element at index position i.
Example:

numeric_vector <- c(10, 20, 30, 40, 50)

# Drop the first element
without_first <- numeric_vector[-1]
# Drop the second element
without_second <- numeric_vector[-2]
print(without_first)     # [1] 20 30 40 50
print(without_second)    # [1] 10 30 40 50

8. If baz <- c(1,-1,0.5,-0.5) and qux <- 3, find the value of baz + qux.
:- Because `qux` is a single value, R recycles it across every element of
`baz` when the two are added. Here's the code to find the value of
`baz + qux` in R:
```R
baz <- c(1, -1, 0.5, -0.5)
qux <- 3
result <- baz + qux
```
Now the `result` vector contains the sum of `qux` with each corresponding
element of `baz`: [1] 4.0 2.0 3.5 2.5

9. What is the use of the cbind and rbind functions for matrices? Give an
example.
:- cbind() and rbind() both create matrices by combining several vectors
of the same length. cbind() combines vectors as columns, while rbind()
combines them as rows.
Let's use these functions to create a matrix with the numbers 1 through
15. First, we'll create three vectors of length 5, then we'll combine them
into one matrix. As you will see, the cbind() function will combine the
vectors as columns in the final matrix, while the rbind() function will
combine them as rows.
x <- 1:5
y <- 6:10
z <- 11:15
# Create a matrix where x, y and z are columns
cbind(x, y, z)
##      x  y  z
## [1,] 1  6 11
## [2,] 2  7 12
## [3,] 3  8 13
## [4,] 4  9 14
## [5,] 5 10 15

# Create a matrix where x, y and z are rows
rbind(x, y, z)
##   [,1] [,2] [,3] [,4] [,5]
## x    1    2    3    4    5
## y    6    7    8    9   10
## z   11   12   13   14   15

10. How do you find the dimension of the matrix? Give an example
:- Applying the dim function returns two numbers: the first reflects the
number of rows and the second reflects the number of columns. For
example, for a data frame with 500 rows and 5 columns:
dim(data)   # Apply dim function to the data frame
# 500 5
Example:
# Creating a matrix
matrix_example <- matrix(1:6, nrow = 2, ncol = 3)

# Finding the dimensions of the matrix


matrix_dimensions <- dim(matrix_example)

# Printing the dimensions


print(matrix_dimensions)

11. Construct a 4×2 matrix that is filled row-wise with the values 4.3, 3.1,
8.2, 8.2, 3.2, 0.9, 1.6, and 6.5, in that order, using an R command.
:- You can create a 4x2 matrix filled row-wise with the specified values
in R using the `matrix()` function. Here's the R command to do that:
```R
my_matrix <- matrix(c(4.3, 3.1, 8.2, 8.2, 3.2, 0.9, 1.6, 6.5),
                    nrow = 4, ncol = 2, byrow = TRUE)
```
This will create a matrix `my_matrix` with the values arranged row-wise
as you specified.
Output:
     [,1] [,2]
[1,]  4.3  3.1
[2,]  8.2  8.2
[3,]  3.2  0.9
[4,]  1.6  6.5

12. What is the use of diag command in R? Give an example.


:-diag() function in R Language is used to construct a diagonal
matrix.
Parameters: x: value present as the diagonal elements.
nrow, ncol: number of rows and columns in which elements are
represented
Example:
# Creating a diagonal matrix with values 1, 2, 3
diagonal_matrix <- diag(c(1, 2, 3))

# Printing the diagonal matrix


print(diagonal_matrix)

Output:
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3


13. Write proper code to replace the third column of matrix B with
the values in the third row of B.
:- To replace the third column of a matrix `B` with the values from
the third row of `B` in R, you can use the following code:
```R
# Create an example matrix 'B' for illustration
B <- matrix(1:9, nrow = 3, ncol = 3)

# Replace the third column with values from the third row
B[, 3] <- B[3, ]

# Now 'B' has the third column replaced with values from the third row
```
This code assigns the values from the third row of `B` to the third
column of `B`, replacing the original values in the third column.
14. Write an example to find the transpose and inverse of a matrix
using R commands.
:- The t() function in R is used to find the transpose of a matrix. The



transpose of a matrix is obtained by swapping its rows with its
columns. If a matrix has dimensions m×n, the transpose will have
dimensions n×m.
Example:
# Creating a sample matrix
original_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Displaying the original matrix


print("Original Matrix:")
print(original_matrix)

# Finding the transpose of the matrix


transpose_matrix <- t(original_matrix)

# Displaying the transpose


print("Transpose Matrix:")
print(transpose_matrix)

The solve() function in R is used to compute the inverse of a square


matrix or to solve a system of linear equations
Example:
# Creating a square matrix for the inverse operation
square_matrix <- matrix(c(1, 2, 3, 4), nrow = 2)

# Displaying the original square matrix


print("Original Square Matrix:")
print(square_matrix)

# Finding the inverse of the square matrix


inverse_matrix <- solve(square_matrix)

# Displaying the inverse matrix



print("Inverse Matrix:")
print(inverse_matrix)

15. What is the difference between & and && in R? Give an example.
:-The “&” operator performs the element-wise comparison and
returns a logical vector of the same length as its input.
Example:
# Using &
condition1 <- c(TRUE, TRUE, FALSE)
condition2 <- c(TRUE, FALSE, FALSE)

result <- condition1 & condition2

print(result)

Output:
[1] TRUE FALSE FALSE

The "&&" operator evaluates only the first element of the input and
returns a single logical value.
Example:
# Using &&
condition1 <- TRUE
condition2 <- FALSE

result <- condition1 && condition2

print(result)
Output:

[1] FALSE



16. Write an R command to store the vector c(8,8,4,4,5,1,5,6,6,8) as bar.
Identify the elements less than or equal to 6 AND not equal to 4.
:- You can store the vector and identify the elements that are less than
or equal to 6 and not equal to 4 in R using the following commands:
```R
bar <- c(8, 8, 4, 4, 5, 1, 5, 6, 6, 8)
result <- bar[bar <= 6 & bar != 4]
# 'result' will contain the elements less than or equal to 6 and not equal to 4
```
The variable `result` will store the elements that meet the specified
condition: [1] 5 1 5 6 6

17. How do you count the number of individual characters in a string?
Give an example.

:- Use the nchar() function to get the length of a character string, i.e.
the number of characters it contains. Note that blanks are also counted
as characters by the nchar function.
Example:
# Example vector of strings
my_vector <- c("apple", "banana", "orange")

# Using nchar() to get the number of characters in each string


vector_lengths <- nchar(my_vector)

# Displaying the result


print(vector_lengths)
Output: [1] 5 6 6

18. What is the levels function in R? Give an example.



:-Levels in R are a way of defining the possible values of a factor
variable. A factor variable is a categorical variable that can have one
of a fixed set of values.
Example:
# Create a factor variable
gender <- factor(c("Male", "Female", "Male", "Female", "Male"))

# Display the original levels


print("Original Levels:")
print(levels(gender))

# Change the levels of the factor


levels(gender) <- c("M", "F")

# Display the modified levels


print("Modified Levels:")
print(levels(gender))

19. What is list in R? Give an example.


:-A list in R is a generic object consisting of an ordered collection
of objects. Lists are one-dimensional, heterogeneous data
structures. The list can be a list of vectors, a list of matrices, a list
of characters and a list of functions, and so on. A list is a vector but
with heterogeneous data elements.
Example:
# Creating a list with different types of elements
my_list <- list(
name = "John",
age = 25,
grades = c(90, 85, 92),
is_student = TRUE
)



# Displaying the list
print(my_list)
20. What is list slicing? Give an example.
:- In R, list slicing refers to extracting a subset of elements from a list.
Unlike vectors or matrices, list slicing in R involves using double square
brackets [[ ]] to access individual elements, and single square brackets
[ ] to extract sublists or specific elements.

Example:
# Creating a list
my_list <- list(
name = "John",
age = 25,
grades = c(90, 85, 92),
is_student = TRUE
)

# List slicing to extract specific elements


name_age <- my_list[c("name", "age")]
grades <- my_list[["grades"]]

# Displaying the sliced elements


print("Name and Age:")
print(name_age)

print("Grades:")
print(grades)

21. How do you name list contents? Give an example.


:-The list can be created using list() function in R. Named list is also



created with the same function by specifying the names of the
elements to access them. Named list can also be created using
names() function to specify the names of elements after defining
the list.
# Creating a list without names
my_list <- list("John", 25, c(90, 85, 92), TRUE)

# Assigning names to the list


names(my_list) <- c("name", "age", "grades", "is_student")

# Displaying the list with names


print(my_list)

Output:
$name
[1] "John"

$age
[1] 25

$grades
[1] 90 85 92

$is_student
[1] TRUE

22. What is the purpose of attributes and class functions? Give an example.
:-The "class" attribute is what determines generic method
dispatch. A data frame has the "class" attribute set to the string
"data. frame", which is what allows generic functions like format,
print and even mathematical operators to treat it differently from,
say, a numeric vector.



Example:
# Creating a data frame
my_dataframe <- data.frame( Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22)
)

# Displaying the data frame


print("Original Data Frame:")
print(my_dataframe)

# Assigning a class attribute to the data frame


class(my_dataframe) <- "PersonData"

# Displaying the updated class attribute


print("Class Attribute:")
print(class(my_dataframe))

23. What is the difference between ggplot2 and base R graphics for
creating plots? Give an example.
:- Base R graphics are part of the core R language and are typically
created using functions like plot(), hist(), barplot(), etc. These functions
are generally straightforward, and you can customize the plots using
various parameters. Base R graphics are often suitable for quick and
simple visualizations.
Here's a simple example using base R graphics to create a scatter plot:
# Create a data frame
my_data <- data.frame(
x = c(1, 2, 3, 4, 5),
y = c(3, 5, 2, 8, 6)
)

# Create a scatter plot using base R graphics



plot(my_data$x, my_data$y, main = "Scatter Plot", xlab = "X-axis",
ylab = "Y-axis")

ggplot2 is a more modern and powerful plotting system in R. It is based on the Grammar of
Graphics, which allows you to create complex plots by combining simple components. ggplot2
produces aesthetically pleasing and customizable visualizations and is widely used for data
visualization in R.

Here's the same example using ggplot2 to create a scatter plot:

# Load the ggplot2 library


library(ggplot2)

# Create a ggplot scatter plot


ggplot(my_data, aes(x = x, y = y)) +
geom_point() +
labs(title = "Scatter Plot", x = "X-axis", y = "Y-axis")



4 or 6 mark questions

1. Explain seq, rep and length functions on vectors with example


A vector is simply a list of items that are of the same type. To combine
the list of items to a vector, use the c() function and separate the items by
a comma. Vectors are the most basic R data objects and there are six types
of atomic vectors: logical, integer, double, complex, character, and raw.
seq Function:
The seq function is used to generate sequences of numbers.
Syntax:
seq(from, to, by)
from: Starting value of the sequence.
to: Ending value of the sequence.
by: Step size or increment.
Example:
# Generate a sequence from 1 to 10 with a step of 2
sequence <- seq(1, 10, by = 2)
print(sequence)
# Output: [1] 1 3 5 7 9
rep Function:
The rep function is used to replicate elements in a vector.
Syntax:
rep(x, times)
x: The vector or element to be replicated.
times: The number of times to replicate the elements.
Example:
# Replicate the whole sequence 1 to 3 two times
replicated_vector <- rep(seq(1, 3), times = 2)
print(replicated_vector)
# Output: [1] 1 2 3 1 2 3
length Function:
The length function returns the number of elements in a vector.
Syntax:
length(x)



x: The vector for which you want to determine the length.
Example:
# Create a vector and get its length
vector_example <- c(5, 8, 2, 10)
vector_length <- length(vector_example)
print(vector_length)
# Output: [1] 4

2. Repeat the vector c (-1,3, -5,7, -9) twice, with each element
repeated 10 times, and store the result. Display the result sorted
from largest to smallest.
In R, you can repeat the vector `c(-1, 3, -5, 7, -9)` twice with each element
repeated 10 times, and then sort the result from largest to smallest as
follows:
```R
# Create the original vector
original_vector <- c(-1, 3, -5, 7, -9)
# Repeat each element 10 times
repeated_vector <- rep(original_vector, each = 10)
# Repeat the resulting vector twice
repeated_twice_vector <- rep(repeated_vector, times = 2)
# Sort the vector from largest to smallest
sorted_vector <- sort(repeated_twice_vector, decreasing = TRUE)
# Display the sorted vector
sorted_vector
```
  [1]  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  7  3  3  3  3  3  3
 [27]  3  3  3  3  3  3  3  3  3  3  3  3  3  3 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 [53] -1 -1 -1 -1 -1 -1 -1 -1 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5 -5
 [79] -5 -5 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9
This code first creates the original vector `original_vector`, then repeats
each element 10 times using the `rep` function, repeats the resulting
vector twice, and finally sorts the vector in descending order to get the
desired result.
3. How do you extract elements from vectors? Explain it using individual
and vector of indexes with example?
In R and many other programming languages, you can extract elements
from vectors using individual indices or vectors of indices. Let's explore
both methods with examples:

Individual Indexing:

You can extract a single element from a vector by specifying its index.
Example:
# Create a vector
my_vector <- c(10, 20, 30, 40, 50)

# Extract the element at index 3


element_at_index_3 <- my_vector[3]
print(element_at_index_3)
# Output: [1] 30

Vector Indexing:

You can extract multiple elements from a vector by providing a vector of


indices.
Example:
# Create a vector
my_vector <- c(10, 20, 30, 40, 50)

# Extract elements at indices 1, 3, and 5


selected_elements <- my_vector[c(1, 3, 5)]
print(selected_elements)
# Output: [1] 10 30 50
You can also use logical vectors for indexing. For example, extract
elements greater than 30:
# Create a vector
my_vector <- c(10, 20, 30, 40, 50)

# Create a logical vector for indexing



logical_vector <- my_vector > 30

# Extract elements where the logical vector is TRUE


selected_elements <- my_vector[logical_vector]
print(selected_elements)
# Output: [1] 40 50
Combining both individual and vector indexing:
# Create a vector
my_vector <- c(10, 20, 30, 40, 50)

# Extract elements at index 2 and those greater than 30


selected_elements<-my_vector[c(2, which(my_vector > 30))]
print(selected_elements)
# Output: [1] 20 40 50

In summary, you can use square brackets [] to extract elements from


vectors. Individual indexing allows you to access a single element, while
vector indexing allows you to extract multiple elements based on a vector
of indices or logical conditions.

4. How do you create matrix in R? Explain with Its necessary attributes?


Give an example.
In R, you can create a matrix using the matrix() function. A matrix is a two-
dimensional data structure that can have rows and columns. Here are the
necessary attributes for creating a matrix:
Syntax:
matrix(data, nrow, ncol, byrow = FALSE, dimnames = NULL)

data: The input data that forms the matrix. This can be a vector or a
combination of vectors.
nrow: The number of rows in the matrix.
ncol: The number of columns in the matrix.
byrow: A logical value indicating whether the matrix should be filled by rows
(TRUE) or by columns (FALSE). The default is FALSE.



dimnames: An optional list with names for the rows and columns.
Example:
# Create a 3x4 matrix filled column-wise
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 3, ncol = 4)

# Print the matrix


print(my_matrix)
Output:
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12

In this example:
c(1, 2, 3, ...) is the data vector that populates the matrix.
nrow = 3 specifies that the matrix should have 3 rows.
ncol = 4 specifies that the matrix should have 4 columns.
The matrix is filled column-wise by default (byrow = FALSE).
The resulting matrix has values from the data vector arranged in a 3x4 grid.
5. Do the following operations on a square matrix a. Retrieve third
and first rows of A, in that order, and from those rows, second and
third column elements. b. Retrieve diagonal elements c. Delete
second column of the matrix
Let's assume we have a square matrix A in R, and we'll perform the
specified operations:
# Create a square matrix A
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)

# Print the original matrix


print("Original Matrix:")
print(A)



This will create the following matrix A:
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

Now, let's perform the specified operations:

a. Retrieve third and first rows of A, in that order, and from


those rows, second and third column elements:
# Retrieve third and first rows, and then second and third
column elements
result_a <- A[c(3, 1), c(2, 3)]

# Print the result


print("Result of operation a:")
print(result_a)
Output:
[1] "Result of operation a:"
[,1] [,2]
[1,] 6 9
[2,] 4 7
b. Retrieve diagonal elements:
# Retrieve diagonal elements
result_b <- diag(A)
# Print the result
print("Result of operation b:")
print(result_b)



Output:
[1] "Result of operation b:"
[1] 1 5 9

c. Delete second column of the matrix:


# Delete second column
A_without_second_col <- A[, -2]
# Print the result
print("Result of operation c:")
print(A_without_second_col)
Output:
[1] "Result of operation c:"
[,1] [,2]
[1,] 1 7
[2,] 2 8
[3,] 3 9
Now, A_without_second_col is the original matrix A with
the second column removed.
6. Explain row, column, and diagonal extractions of matrix elements
with example.
You can access the matrix element by using [ ] brackets.
1. Row Extraction:
You can extract specific rows from a matrix by specifying the row
indices.
Example:
# Create a matrix
matrix_example <- matrix(1:9, nrow = 3)

# Extract the second row


second_row <- matrix_example[2, ]
# Print the result
print("Extracted Second Row:")
print(second_row)

Output:
[1] "Extracted Second Row:"
[1] 2 5 8

2. Column Extraction:
You can extract specific columns from a matrix by specifying the column
indices.
Example:
# Create a matrix
matrix_example <- matrix(1:9, nrow = 3)

# Extract the third column


third_column <- matrix_example[, 3]

# Print the result


print("Extracted Third Column:")
print(third_column)
Output:
[1] "Extracted Third Column:"
[1] 7 8 9

3. Diagonal Extraction:
You can extract the diagonal elements of a matrix using the diag()
function.
Example:
# Create a matrix
matrix_example <- matrix(1:9, nrow = 3)

# Extract the diagonal elements


diagonal_elements <- diag(matrix_example)

# Print the result


print("Diagonal Elements:")
print(diagonal_elements)
Output:
[1] "Diagonal Elements:"
[1] 1 5 9

A row extraction involves selecting specific rows, column extraction


involves selecting specific columns, and diagonal extraction involves
selecting the diagonal elements of a matrix. These operations are useful
for manipulating and analyzing data in matrices.

7. How do you omit and overwrite elements of a matrix? Explain with an
example.
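:- A brief sketch of one way to answer this (the matrix `M` and the values
used below are made up for illustration): elements are omitted with
negative indexes inside `[ ]`, and overwritten by assigning new values to
an indexed position.
```R
# Create an example 3x3 matrix
M <- matrix(1:9, nrow = 3, ncol = 3)

# Omit elements: negative indexes drop rows/columns
M_no_row2  <- M[-2, ]        # drop the second row
M_no_col13 <- M[, -c(1, 3)]  # drop the first and third columns

# Overwrite elements: assign into an indexed position
M[1, 2] <- 100               # overwrite a single element
M[, 3]  <- c(0, 0, 0)        # overwrite an entire column
print(M)
```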

8. Explain any 3 matrix operations using R commands with example.
Here are three matrix operations in R along with examples:
1. Matrix Addition:
Matrix addition involves adding corresponding elements of two
matrices to create a new matrix. Both matrices should have the
same dimensions for this operation.
Example:
```R
# Create two matrices
matrix1 <- matrix(1:4, nrow = 2)
matrix2 <- matrix(5:8, nrow = 2)
# Add the matrices
result_matrix <- matrix1 + matrix2
print(result_matrix)
```
Output:
```
     [,1] [,2]
[1,]    6   10
[2,]    8   12
```
2. Matrix Multiplication:
Matrix multiplication involves multiplying two matrices together. For matrix
multiplication, the number of columns in the first matrix should be equal to
the number of rows in the second matrix. You can use the `%*%` operator in R
for matrix multiplication.
Example:
```R
# Create two matrices
matrix1 <- matrix(1:4, nrow = 2)
matrix2 <- matrix(5:8, ncol = 2)
# Multiply the matrices
result_matrix <- matrix1 %*% matrix2
print(result_matrix)
```
Output:
```
     [,1] [,2]
[1,]   23   31
[2,]   34   46
```
3. Matrix Transposition:
Matrix transposition involves swapping the rows and columns of a matrix,
resulting in a new matrix. In R, you can use the `t()` function to transpose a
matrix.
Example:
```R
# Create a matrix
original_matrix <- matrix(1:6, nrow = 2)
# Transpose the matrix
transposed_matrix <- t(original_matrix)
print(transposed_matrix)
```
Output:
```
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
```
These are three fundamental matrix operations in R. You can perform
various other matrix operations using R, but these examples should help
you understand the basics.



9. How do you create arrays in R? Explain with suitable example
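A brief sketch of an answer (the dimensions and values below are chosen
only for illustration): arrays are created with the array() function, which
takes a data vector and a dim vector giving the size of each dimension;
indexing then uses one position per dimension.
```R
# Create a 2 x 3 x 4 array filled with the numbers 1 to 24
my_array <- array(1:24, dim = c(2, 3, 4))

# Check its dimensions
dim(my_array)        # 2 3 4

# Index with one position per dimension: row 1, column 2, layer 3
my_array[1, 2, 3]    # 15

# Extract the whole second layer as a 2 x 3 matrix
my_array[, , 2]
```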

10. Explain any 3 relational and logical operators used in R, with an


example.
1. Relational Operators:
- Equality Operator ==:
The equality operator checks if two values are equal.
Example:
# Check if 5 is equal to 7
result_equal <- 5 == 7
# Print the result
print(result_equal)

Output:



[1] FALSE

In this example, the result is FALSE because 5 is not equal to 7.

- Not Equal Operator !=:


The not equality operator checks if two values are not equal.
Example:
# Check if 5 is not equal to 7
result_not_equal <- 5 != 7

# Print the result


print(result_not_equal)
Output:
[1] TRUE
Here, the result is TRUE because 5 is indeed not equal to 7.

- Greater Than Operator >:


The greater than operator checks if the value on the left is greater than
the value on the right.
Example:
# Check if 8 is greater than 3
result_greater_than <- 8 > 3

# Print the result


print(result_greater_than)
Output:
[1] TRUE
The result is TRUE because 8 is greater than 3.

2. Logical Operators:
- AND Operator &:
The AND operator combines two conditions and returns TRUE only if



both conditions are true.
Example:
# Check if both 5 is greater than 3 and 10 is less than 15
result_and <- (5 > 3) & (10 < 15)

# Print the result


print(result_and)
Output:
[1] TRUE
The result is TRUE because both conditions are true.

- OR Operator |:
The OR operator combines two conditions and returns TRUE if at least
one of the conditions is true.
Example:
# Check if either 5 is greater than 10 or 8 is less than 15
result_or <- (5 > 10) | (8 < 15)

# Print the result


print(result_or)
Output:
[1] TRUE
The result is TRUE because the second condition (8 < 15) is true.

- NOT Operator !:
The NOT operator negates a logical condition.
Example:
# Check if NOT (5 is greater than 10)
result_not <- !(5 > 10)

# Print the result


print(result_not)



Output:
[1] TRUE
The result is TRUE because the negation of (5 > 10) is true.

These operators are fundamental for creating logical conditions and


making decisions in R programming.

11.Explain any, all and which functions with example, on logical vector
In R, the `any()`,`all()`, and `which()` functions are commonly used to work with
logical vectors. These functions help you determine properties of logical vectors,
identify elements that meet specific criteria, and extract their indices. Here's an
explanation of each function with examples:
1. `any()` function:
- The `any()` function checks whether at least one element in a logical vector
is `TRUE`. It returns `TRUE` if any element is `TRUE`; otherwise, it returns
`FALSE`.
Example:
```R
logical_vector <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
result <- any(logical_vector)
# The 'result' variable will be TRUE because there are TRUE values in the vector.
```
2. `all()` function:
- The `all()` function checks whether all elements in a logical vector are
`TRUE`. It returns `TRUE` only if all elements are `TRUE`; otherwise, it returns
`FALSE`.
Example:
```R
logical_vector <- c(TRUE, TRUE, TRUE, TRUE, TRUE)
result <- all(logical_vector)
# The 'result' variable will be TRUE because all elements are TRUE.
```
```R
another_vector <- c(TRUE, FALSE, TRUE, TRUE, TRUE)
result <- all(another_vector)
# The 'result' variable will be FALSE because not all elements are TRUE.
```
3. `which()` function:
- The `which()` function is used to extract the indices of elements in a logical
vector that are `TRUE`.
Example:
```R
logical_vector <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)
indices <- which(logical_vector)
# The 'indices' variable will contain the indices of the TRUE elements: 2, 3, and 5.
```
You can use the `which()` function to locate specific elements in a vector or
filter data based on certain conditions.
These functions are valuable for making logical assessments, checking
conditions, and performing operations on logical vectors in R.

12.Explain cat and paste functions with necessary arguments in R. Give an


example.
In R, the cat and paste functions are used for combining and displaying strings.
Here's an explanation of each function along with examples:
cat Function:
The cat function is used to concatenate and print multiple expressions or
strings.
It is often used to display text without enclosing quotes.
Syntax:
cat(..., sep = " ", fill = FALSE, labels = NULL, append = FALSE)
...: Objects to be concatenated and printed.
sep: Separator between the objects (default is a space).
fill: If TRUE, concatenation is done by columns.
labels: Labels for the arguments.



append: If TRUE, the output is appended to the file specified by file.
Example:
cat("Hello", "world!", "\n")
# Output: Hello world!

paste Function:
The paste function is used to concatenate strings. It takes one or more vectors,
converts them to character vectors if necessary, and concatenates them term-
by-term.
Syntax:
paste(..., sep = " ", collapse = NULL)
...: Objects to be concatenated.
sep: Separator between the objects (default is a space).
collapse: If specified, collapses the result into a single string.
Example:
paste("Hello", "world!")
# Output: "Hello world!"

paste(1:3, c("a", "b", "c"), sep = "_")


# Output: "1_a" "2_b" "3_c"
You can also use paste with the collapse argument to concatenate elements into a
single string:
paste(1:3, collapse = ", ")
# Output: "1, 2, 3"

13.Write a note on escape sequences with example.


Escape sequences in R are special character combinations used to represent
characters that are difficult to input directly in a string or to encode special
meanings within strings. An escape sequence starts with a backslash (`\`)
followed by a character or a combination of characters. Here are some common
escape sequences in R with examples:
1. `\n`- Newline:



- The`\n` escape sequence is used to insert a new line character in a
string. It is often used to start a new line within a character vector.
Example:
```R
message<-"Hello,\n World!"
cat(message)
#Output:
# Hello,
#World!
```
2.`\t`-Tab:
The`\t` escape sequence is used to insert a horizontal tab character in a
string.
It is often used for aligning text in columns.
Example:
```R
data<-"Name:\t John\n Age:\t30"cat(data)
#Output:
# Name: John
# Age: 30
```
3.`\\`-Backslash:
- To include a literal backslash in a string, you need to escape it with
another backslash.
Example:
```R
path<-"C:\\Users\\Username\\Documents"
cat(path)
#Output: C:\Users\Username\Documents
```
4. `\"`-Double Quote:
- To include double quotes within a double-quoted string, you need to



escape them with a backslash.
Example:
```R
sentence<-"She said,\"Hello!\""
cat(sentence)
#Output: She said,"Hello!"
```
5. `\'`-Single Quote:
- To include single quotes within a single-quoted string, you need to
escape them with a backslash.
Example:
```R
phrase<-'He said,\'Hi!\''
cat(phrase)
#Output: He said, 'Hi!'
```
Escape sequences are essential for representing special characters and
control characters within strings, ensuring that they are interpreted
correctly by R. They help you work with a wide range of textual data and
formatting needs.

14.Explain substr, sub and gsub functions on strings with an example


In R, the `substr()`, `sub()`, and `gsub()` functions are used to manipulate
and modify strings. These functions are particularly useful for extracting
or replacing substrings within text. Here's an explanation of each
function along with examples:
substr Function:
Purpose: The substr function is used to extract a substring from a given
string.
Syntax:
substr(string, start, stop)
string: The original string.
start: The position in the string where extraction begins (1-based index).
stop: The position in the string where extraction ends (inclusive).
Example:
my_string <- "Hello, World!"
my_substring <- substr(my_string, 8, 12)  # Extract characters 8 through 12
print(my_substring)  # Output: "World"

sub Function:
Purpose: The sub function is used to substitute the first occurrence of a
pattern with a replacement in a string.
Syntax:
sub(pattern, replacement, string)
pattern: The regular expression pattern to search for.
replacement: The string to replace the matched pattern.
string: The original string.
Example:
my_string <- "apple orange apple banana"
sub("apple", "pear", my_string)  # Replace the first occurrence of 'apple' with 'pear'
# Output: "pear orange apple banana"

gsub Function:
Purpose: The gsub function is similar to sub, but it substitutes all
occurrences of a pattern with a replacement in a string.
Syntax:
gsub(pattern, replacement, string)
pattern: The regular expression pattern to search for.
replacement: The string to replace all occurrences of the matched
pattern.
string: The original string.
Example:
my_string <- "apple orange apple banana"
gsub("apple", "pear", my_string)  # Replace all occurrences of 'apple' with 'pear'
# Output: "pear orange pear banana"

These functions are quite powerful for manipulating strings. Note that the
pattern arguments of sub() and gsub() are interpreted as regular
expressions, which allows much more complex matching and replacement
than fixed strings.

15.What is factor? How do you define and order levels in a factor?


In R, a "factor" is a data structure used to represent categorical data.
Categorical data consists of distinct categories or levels, such as "yes" or
"no," "low," "medium, " or "high," or any other non-numeric data that
falls into specific groups. Factors are important for statistical analysis, as
they help in modeling and analyzing categorical data.
To create a factor in R, you can use the` factor()` function. Here's how
you define and order levels in a factor:
Define a Factor:
To create a factor in R, you can use the factor() function. Here's the basic
syntax:
Syntax: factor(vector, levels = c("level1", "level2", ...), ordered = FALSE)
Example:
gender <- factor(c("Male", "Female", "Male", "Female"))
Specify Levels:
When you create a factor without specifying levels, R assigns levels
based on the unique values in the data in the order of appearance.
# Implicit levels
print(levels(gender))
# Output: "Female" "Male"

If you want to explicitly specify the order of levels, you can use the levels



parameter:
# Specify levels explicitly
gender <- factor(c("Male", "Female", "Male", "Female"), levels =
c("Male", "Female"))
print(levels(gender))
# Output: "Male" "Female"
Order Levels:
You can also change the order of levels after creating a factor using the levels() function:

# Change the order of levels


gender <- factor(gender, levels = c("Female", "Male"))
print(levels(gender))
# Output: "Female" "Male"
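Since the question also asks about ordering, here is a small sketch (the
"low"/"medium"/"high" ratings are made-up data) using the ordered
argument of factor(), which makes the levels comparable with < and >:
```R
ratings <- c("low", "high", "medium", "low", "high")

# Create an ordered factor with an explicit level order
ratings_factor <- factor(ratings,
                         levels = c("low", "medium", "high"),
                         ordered = TRUE)

print(ratings_factor)                   # Levels: low < medium < high
ratings_factor[1] < ratings_factor[2]   # TRUE, because "low" < "high"
```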

16.Explain cut function on factors with an example


The cut() function in R is used to divide a numeric vector into discrete
intervals, which are then converted into a factor. This is particularly
useful when you want to categorize continuous data into bins or
intervals. The resulting factor can represent different groups or levels
based on the specified breaks.
Here's an explanation of the cut() function with an example:
Syntax:
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE)
x: The numeric vector to be cut into intervals.
breaks: Either the number of breaks desired or a vector of break points.
labels: Labels for the resulting factor levels. If not provided, the intervals
themselves are used as labels.
include.lowest: Logical, indicating whether the intervals should include
the leftmost edge.
right: Logical, indicating whether the intervals should be right-closed or
left-closed.
Example:



# Create a numeric vector
ages <- c(21, 35, 42, 18, 28, 55, 40, 30, 22, 48)

# Use the cut function to categorize ages into bins


age_categories <- cut(ages, breaks = c(18, 25, 35, 45, 60),
                      labels = c("18-25", "26-35", "36-45", "46-60"))

# Display the resulting factor


print(age_categories)
In this example:
• The ages vector represents a set of ages.
• The cut() function is used to create a factor (age_categories) by
categorizing ages into bins defined by the breaks c(18, 25, 35, 45,
60).
• The labels for the resulting factor levels are specified as "18-25,"
"26-35," "36-45," and "46-60."

17.What do you mean by member reference in lists? Explain with an


example.
In R, the concept of "member reference" is typically referred to as
accessing elements or components within a list. Lists in R can contain a
variety of data types, and you use member references to access specific
elements within the list. The $ operator is commonly used for this
purpose.
Let's look at an example:
# Create a list representing a person's information
person <- list(name = "John", age = 30, city = "New York")

# Accessing individual elements using member references


name <- person$name
age <- person$age
city <- person$city



# Display the information
print(paste("Name:", name))
print(paste("Age:", age))
print(paste("City:", city))
In this example:
• We create a list person that contains elements such as "name,"
"age," and "city," each associated with a specific value.
• To access individual elements within the list, we use the $ operator.
For example, person$name refers to the "name" element of the
person list.
• The values obtained through these member references (name, age,
and city) can be used elsewhere in the program or for display
purposes.
Alternatively, you can use double square brackets to achieve the same
result:
# Accessing individual elements using double square brackets
name <- person[["name"]]
age <- person[["age"]]
city <- person[["city"]]
Both the $ operator and double square brackets can be used for member
referencing in lists, but the $ operator is more concise and often
preferred when working with named elements.
This concept of member referencing in lists is fundamental when dealing
with complex data structures, such as lists of lists or lists of data frames,
in R. It allows you to access and manipulate specific elements within the
nested structure.

18. What is data frame? Create a data frame as shown in the given
table and write R commands a) To extract the third, fourth, and fifth
elements of the third column b) To extract the elements of age column
using dollar operator



A data frame is a fundamental data structure in R that is used to store
and manipulate tabular data. It is similar to a spreadsheet or a database
table and is made up of rows and columns, where each column can
contain different datatypes (e.g., numbers, strings, factors) and is
typically labeled with column names. Data frames are commonly used
for data analysis and statistics in R.
To create a data frame in R, you can use the `data.frame()` function.
Here's an example of how to create a data frame and perform the tasks
you mentioned:
To create a data frame with this data and perform the tasks you
mentioned:
```R
# Create a data frame
df <- data.frame(Name = c("PETER", "LOIS", "MEG", "CHRIS", "STEWIE"),
                 Age = c(42, 40, 17, 14, 1),
                 Sex = c("M", "F", "F", "M", "M"))
# a) Extract the third, fourth, and fifth elements of the third column
third_to_fifth_elements <- df$Sex[3:5]
third_to_fifth_elements
# Output: "F" "M" "M"
# b) Extract the elements of the "Age" column using the dollar operator
ages <- df$Age
ages
# Output: 42 40 17 14 1
```
In the above code:
- We first create a data frame `df` with the given data, where the
column names are "Name," "Age," and "Sex."
- To extract the third, fourth, and fifth elements of the third column
(Sex), we use the `$` operator to access the "Sex" column and then use
indexing `[3:5]` to select the specific elements.
- To extract the elements of the "Age" column using the dollar
operator, we use
`df$Age`, which retrieves all values in the "Age" column.
Now you have a data frame and have performed the tasks of extracting
specific elements from it using both the dollar operator and indexing.

19.How do you add data columns and combine data frames? Explain
with example.
In R, you can add data columns to an existing data frame and combine
data frames using various functions and techniques. Adding columns to
a data frame is a common operation when you have new data to
include, and combining data frames
can be useful for merging, joining, or stacking data from different
sources. Here are explanations and examples for both operations:
1. Adding Data Columns to a Data Frame:
You can add data columns to a data frame using the `$` operator or the
`[[]]` operator. Here's an example of how to add a new column to an
existing data frame:
```R
#Create an example data frame
original_df <- data.frame(Name= c("Alice","Bob","Carol"), Age = c(25,
30, 28))
# Add a new column "Score" to the data frame
original_df$Score <- c(92, 85, 78)
#Display the modified data frame
print(original_df)
```
In this example, we created an original data frame and then added a new



column "Score" to it by assigning a vector of values to
`original_df$Score`.
2. Combining Data Frames:
You can combine data frames using functions like` rbind()`, `cbind()`, and
`merge()` depending on your specific needs. Here's an example of how
to combine two data frames using `rbind()` to stack them vertically:
```R
#Create two example data frames
df1 <-data.frame(Name=c("Alice","Bob"), Age = c(25, 30))
df2 <-data.frame(Name=c("Carol","Dave"), Age = c(28, 35))
#Combine the data frames vertically using rbind
combined_df<-rbind(df1,df2)
#Display the combined data frame
print(combined_df)
```
In this example, we created two data frames (`df1` and `df2`) with the
same structure, and then combined them vertically using `rbind()` to
create a single data frame with all the rows.
Combining data frames can also be more complex when dealing with
datasets that have different structures or when merging based on
specific keys or variables. The
`merge()` function, for instance, allows you to merge data frames by one
or more common columns.
Adding columns and combining data frames are essential operations in
data manipulation and analysis in R, and there are various functions and
techniques to achieve your specific data transformation needs.
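As a small illustration of `merge()` (the data frames below are made up for
this sketch), merging joins the rows of two data frames that share values
in a common key column:
```R
# Two data frames sharing a "Name" column
info   <- data.frame(Name = c("Alice", "Bob", "Carol"), Age = c(25, 30, 28))
scores <- data.frame(Name = c("Alice", "Carol"), Score = c(92, 78))

# Merge by the common "Name" column (an inner join by default)
merged_df <- merge(info, scores, by = "Name")
print(merged_df)
# Only Alice and Carol appear, each with both an Age and a Score
```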

20.Write a note on special values used in R with an example for each.


In R, there are several special values that are used to represent specific
situations or concepts in data analysis and programming. These special
values play an essential role in handling missing data, infinity, and
undefined values. Here are some of the commonly used special values in



R with examples:
1. `NA`(Not Available):
- `NA` represents missing or undefined data. It is used to indicate that
a value is not available or is missing from a dataset.
Example:
```R
x <- c(10, 15, NA, 25, NA)
mean(x, na.rm = TRUE)
# Calculates the mean of 'x' while ignoring the NA values, resulting in 16.67
```
2. `NaN`(Not-a-Number):
- `NaN` represents the result of undefined mathematical operations,
such as 0/0. It is used to indicate that a calculation does not have a numeric
result.
Example:
```R
x <- 0/0
is.nan(x)  # Returns TRUE because 'x' is NaN
```
3. `Inf` (Infinity):
- `Inf` represents positive infinity, while `-Inf` represents negative infinity.
These values are used to indicate unbounded or extremely large numerical values.
Example:
```R
x <- 1 / 0
is.infinite(x)  # Returns TRUE because 'x' is Inf (infinity)
```
4. `-Inf` (Negative Infinity):
- `-Inf` represents negative infinity, similar to `Inf`, but with a negative sign.
Example:
```R
x <- -1 / 0
is.infinite(x)  # Returns TRUE because 'x' is -Inf (negative infinity)
```
5. `NULL`:
- `NULL` represents the absence of an object or a placeholder for an empty or
non-existent value. It is used to indicate that a variable does not have a value
assigned.
Example:
```R
x <- NULL
length(x)  # Returns 0 because 'x' is empty
```
These special values are important for handling missing or undefined data,
numerical calculations, and managing variables with no assigned value.
Properly handling these values is crucial for accurate data analysis and
avoiding errors in R.

21.Explain Is-Dot Object-Checking Functions and As-Dot Coercion


Functions with an example
In R, the "Is-dot" object-checking functions and "As-dot" coercion functions
are a set of built-in functions that help you check the class or type of an
object and convert it into another class, respectively. These functions are
useful for data type validation and conversion in R. Here's an explanation of
each category with examples:
1. "Is-dot" Object-Checking Functions:
These functions allow you to check the class or type of an R object. They
return a logical value (TRUE or FALSE) indicating whether the object
matches the specified class. Some commonly used "Is-dot" functions
include:
- `is.vector()`: Checks if an object is a vector.
- `is.data.frame()`: Checks if an object is a data frame.
- `is.list()`: Checks if an object is a list.
- `is.numeric()`: Checks if an object is numeric.
- `is.character()`: Checks if an object is a character.
Example:
```R
# Check if an object is a vector
x <- c(1, 2, 3)
is.vector(x)  # Returns TRUE
# Check if an object is a data frame
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
is.data.frame(df)  # Returns TRUE
# Check if an object is numeric
y <- 42
is.numeric(y)  # Returns TRUE
```
2. "As-dot" Coercion Functions:
These functions allow you to convert an object from one class to another.
They are used for data type conversion. Some commonly used "As-dot"
functions include:
- `as.numeric()`: Coerces an object to a numeric data type.
- `as.character()`: Coerces an object to a character data type.
- `as.factor()`: Coerces an object to a factor data type.
- `as.list()`: Coerces an object to a list data type.
Example:



```R
# Coerce an object to numeric
z <- "123"
numeric_z <- as.numeric(z)
class(numeric_z) # Returns "numeric"
# Coerce an object to character
num <- 42
char_num <- as.character(num)
class(char_num) # Returns "character"
``` These "Is-dot" and "As-dot" functions are valuable for type checking
and data type conversion when working with R objects. They help ensure
that data is correctly formatted and can be manipulated or analyzed as
needed.

22.List and explain graphical parameters used in plot function in R with example (any
4)
The `plot()` function in R allows you to create a wide variety of plots and
graphics. To customize the appearance of your plots, you can use graphical
parameters that control aspects like colors, axis labels, titles, and more. Here are
four common graphical parameters used in the `plot()` function, along with
examples:
1. `main` - Main Title:
- The `main` parameter is used to set the main title of the plot.
Example:
```R
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)
plot(x, y, main = "Scatter Plot of X vs Y")
```
2. `xlab` and `ylab` - X and Y Axis Labels:
- The `xlab` and `ylab` parameters are used to set the labels for the x and y axes,
respectively.
Example:
```R
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)
plot(x, y, xlab = "X-Axis Label", ylab = "Y-Axis Label")
```



3. `col` - Point or Line Color:
- The `col` parameter specifies the color of points or lines in the plot. It accepts
various color specifications, such as names, hexadecimal codes, or numeric
indices.
Example:
```R
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)
plot(x, y, col = "blue", pch = 16) # Blue points with filled circles
```
4. `pch` - Point Character:
- The `pch` parameter is used to specify the type of point character to be used in
the plot. It accepts numeric values representing different symbols.
Example:
```R
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)
plot(x, y, pch = 19) # Points displayed as solid circles
```
These graphical parameters are just a few examples of the many options
available in the `plot()` function to customize your plots in R. You can further
customize plots by adjusting parameters like `cex` for text size, `xlim` and
`ylim` for axis limits, and `lwd` for line width, among others.
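A short sketch combining some of these extra parameters (the data values
here are arbitrary):
```R
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)

# cex scales the point size, xlim/ylim set the axis ranges,
# and lwd controls the width of the connecting line
plot(x, y, type = "o", cex = 1.5, lwd = 2,
     xlim = c(0, 6), ylim = c(0, 10),
     main = "Customised Plot", xlab = "X-axis", ylab = "Y-axis")
```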

23.How do you add Points, Lines, and Text to an Existing Plot? Explain with example.
In R, you can add points, lines, and text to an existing plot using various
functions and graphical parameters. This allows you to enhance your plots with
additional information, annotations, and visual elements. Here's how to add
points, lines, and text to an existing plot with examples:
1. Adding Points to an Existing Plot:
You can add points to a plot using the `points()` function. This is useful for
overlaying new data points on an existing plot. Example:
```R
# Create a scatter plot



x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)
plot(x, y, main = "Scatter Plot") # Add new points to the plot new_x <-
c(2.5, 3.5)
new_y <- c(5, 3)
points(new_x, new_y, col = "red", pch = 19)
```
2. Adding Lines to an Existing Plot:
You can add lines to a plot using the `lines()` function. This is useful for
adding additional curves, lines, or segments to an existing plot. Example:
```R
# Create a scatter plot
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)
plot(x, y, type = "b", main = "Scatter Plot with Line") # Add a line to the plot
new_x <- c(1, 5)
new_y <- c(2, 5)
lines(new_x, new_y, col = "blue", lwd = 2)
```
3. Adding Text to an Existing Plot:
You can add text labels to a plot using the `text()` function. This is useful
for annotating data points or providing additional information. Example:
```R
# Create a scatter plot
x <- c(1, 2, 3, 4, 5)
y <- c(3, 6, 2, 8, 5)
plot(x, y, main = "Scatter Plot")
# Add text labels to the plot
text(3, 5.5, labels = "Max Value", col = "green", pos = 3)
```
In the examples above, we first create a base plot using the `plot()` function
and then use the `points()`, `lines()`, and `text()` functions to add points,
lines, and text to the existing plot, respectively. You can customize the
appearance of these added elements by specifying parameters like color



(`col`), point type (`pch`), line width (`lwd`), and text position (`pos`),
among others.

Adding points, lines, and text to plots is a powerful way to visually


communicate additional information and insights within your data
visualization.

24.How do you set appearance constants and aesthetic mapping with


geoms? Explain with example.
In R, when creating data visualizations using the ggplot2 package, you can
set appearance constants and aesthetic mappings to control the visual aspects
of your plots. Appearance constants allow you to define fixed visual
properties, while aesthetic mappings link visual properties to variables in
your data. This approach provides flexibility and enables dynamic
visualizations. Here's an explanation with an example:
1. Setting Appearance Constants:
Appearance constants allow you to define fixed visual properties, such as
line color, shape, and size. You set these properties using fixed values or
constants.
Example:
```R
library(ggplot2)
# Create a scatter plot with fixed appearance constants
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue", shape = 16, size = 3) +
  labs(title = "Scatter Plot of MPG vs. Weight", x = "Weight", y = "Miles per Gallon")
```
In this example, we use the `geom_point()` function to create a scatter
plot of `mpg` (miles per gallon) against `wt` (weight) from the `mtcars`
dataset. We set appearance constants like `color`, `shape`, and `size` to
specify the point appearance.
2. Using Aesthetic Mapping:
Aesthetic mappings allow you to link visual properties to variables in your
data. This makes it easy to create dynamic visualizations where the visual
properties are determined by the data.
Example:



```R
library(ggplot2)
# Create a scatter plot with aesthetic mapping
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl), shape = factor(gear), size = hp)) +
  labs(title = "Scatter Plot of MPG vs. Weight", x = "Weight", y = "Miles per Gallon")
```
In this example, we use aesthetic mappings to specify that the point
`color` should be mapped to the `cyl` variable, the `shape` should be
mapped to the `gear` variable, and the point `size` should be mapped to
the `hp` variable. This allows the visual properties of the points to vary
based on the values of these variables in the dataset. By using appearance
constants and aesthetic mappings in ggplot2, you can create visually
appealing and informative data visualizations that are flexible and
adaptable to different datasets and data-driven scenarios.

Unit-II
2-mark questions

1. What are Table format Files? List its features


Table format files are files that store data in a tabular structure,
typically organized into rows and columns. These files are
commonly used to store and exchange structured data, and they
come in various formats
Features are (illustrated in the sketch after this list):
• Header
• Delimiter
• Missing Value
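These features correspond to arguments of the read.table() family of
functions. A minimal sketch (the file name "scores.txt" is hypothetical):
```R
# header     : whether the first line holds column names
# sep        : the delimiter separating values (",", "\t", etc.)
# na.strings : which strings should be treated as missing values
my_data <- read.table("scores.txt", header = TRUE, sep = ",",
                      na.strings = c("NA", ""))
```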

2. How can you read a CSV file in R? Give Example
In R, you can read a CSV (Comma-Separated Values) file using the
read.csv() function or its equivalent, read.csv2(). These functions are
part of the base R installation and are used to read data from a CSV file
into a data frame, which is a common data structure in R for tabular



data. Here's an example:
# Example: Reading a CSV file in R

# Set the path to your CSV file


csv_file_path <- "path/to/your/file.csv"

# Use read.csv() to read the CSV file into a data frame


my_data <- read.csv(csv_file_path)

# Display the structure of the data frame


str(my_data)

# Display the first few rows of the data frame


head(my_data)
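The answer above also mentions read.csv2(); a brief sketch of when you
might prefer it (the file path is hypothetical): read.csv2() is intended for
files that use ";" as the field separator and "," as the decimal mark, as is
common in some European locales.
```R
# Read a semicolon-separated file with comma decimal marks
my_data2 <- read.csv2("path/to/your/file2.csv")
str(my_data2)
```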

3. What is the purpose of write.table command? Give an example


In R, the write.table() function is used to write data from a data frame
or matrix to a text file. This function is useful when you want to export
your data to a file that can be easily shared or imported into other
applications. It is a versatile function that allows you to control various
aspects of the output, such as the delimiter, whether to include row and
column names, and more.

Here's an example of how to use the write.table() function:


# Example: Using write.table() to write data to a text file

# Create a sample data frame


my_data <- data.frame(
Name = c("John", "Alice", "Bob"),
Age = c(25, 30, 22),
Score = c(95, 89, 75)
)

# Display the data frame



print(my_data)

# Set the path for the output text file


output_file_path <- "path/to/your/outputfile.txt"

# Use write.table() to write the data frame to a text file


write.table(
my_data,
file = output_file_path,
sep = "\t", # Set the delimiter to tab ("\t"), you can use "," for CSV
col.names = TRUE, # Include column names in the output
row.names = FALSE # Do not include row names in the output
)

# Print a message indicating that the data has been written


cat("Data has been written to", output_file_path, "\n")

4. How to read Web based files? Give example


To read web-based files in R, you can use functions from packages
such as readr, httr, or RCurl. The readr package is particularly useful
for reading various types of delimited files, including CSV files. Below
is an example of how to read a CSV file from a URL using the readr
package:

First, make sure to install the readr package if you haven't already:
# Install the readr package if not already installed
install.packages("readr")
Now, you can use the read_csv() function from the readr package to
read a CSV file from a web URL:

# Load the readr package


library(readr)

# URL of the CSV file



url <- "https://example.com/path/to/your/file.csv"

# Read the CSV file from the web


my_data <- read_csv(url)

# Display the structure of the data frame


str(my_data)

# Display the first few rows of the data frame


head(my_data)

5. How to save plots and graphics directly to file? Give example


In R, you can save plots and graphics directly to a file using the pdf(),
png(), jpeg(), bmp(), or other similar functions, depending on the
desired file format. After creating your plot, you call one of these
functions to open a graphics device, and then you use the dev.off()
function to close the device and save the plot to a file. Here's an
example using the pdf() function to save a plot as a PDF file:
# Example: Saving a plot as a PDF file

# Generate some sample data


x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x)

# Create a plot
plot(x, y, type = "l", col = "blue", lwd = 2, xlab = "X-axis", ylab = "Y-
axis", main = "Sine Wave")

# Set the file path for the output PDF file


pdf_file_path <- "path/to/your/output_plot.pdf"

# Open a PDF graphics device


pdf(file = pdf_file_path)



# Re-create the plot so that it is drawn on (and captured by) the open PDF
# device
plot(x, y, type = "l", col = "blue", lwd = 2, xlab = "X-axis", ylab = "Y-
axis", main = "Sine Wave")

# Close the PDF graphics device and save the plot to the file
dev.off()

# Print a message indicating that the plot has been saved


cat("Plot has been saved to", pdf_file_path, "\n")
6. Differentiate dput() and dget() functions in R
dput() Function:
Purpose: The dput() function is used to serialize an R object (such as a
data frame, list, or vector) into a textual representation that can be
printed or saved to a file.
Usage:
# Serialize and print an R object
dput(object, file = "filename.R")
Example:
# Create a sample data frame
my_data <- data.frame(
Name = c("John", "Alice", "Bob"),
Age = c(25, 30, 22),
Score = c(95, 89, 75)
)

# Serialize and print the data frame


dput(my_data)

dget() Function:
Purpose: The dget() function is used to deserialize (parse) the output of
dput() and reconstruct the original R object from the textual
representation.
Usage:



# Deserialize and reconstruct an R object
dget(file = "filename.R")
Example:
# Assuming you have previously used dput to save the data frame
# Deserialize and reconstruct the data frame
reconstructed_data <- dget("filename.R")

# Display the reconstructed data frame


print(reconstructed_data)

7. Differentiate global and local environment in R


Global Environment:

Definition: The global environment, often referred to as the "global


workspace" or "global scope," is the highest level of the R environment
hierarchy.
Characteristics:
Objects created in the global environment are accessible from any part
of the R script or session.
It is the workspace that you see when you start R or RStudio.
Variables created outside of any function are typically placed in the
global environment.
Example:
# Creating a variable in the global environment
global_variable <- 10

Local Environment:
Definition: A local environment is a specific environment created
when a function is called. Each function call has its own local
environment.
Characteristics:
Objects created within a function are usually stored in the local
environment of that function.
Local environments are temporary and are created when the function is



executed. They are destroyed when the function completes its
execution.
Variables created within a function are generally not accessible from
outside the function (unless explicitly returned).
Example:
# Creating a function with a local variable
my_function <- function() {
local_variable <- 20
print(local_variable)
}

# Calling the function


my_function()

8. What the "search path" in R refers to.? How can you view the current
search path in R?
In R, the "search path" refers to the sequence of environments that R
searches when looking for a variable or function. When you reference a
variable or function, R searches through different environments in a
specific order to find the corresponding object. The search path is
essential for determining the scope and visibility of objects in your R
environment.
You can view the current search path by calling the search() function:
search()
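
For example, in a fresh R session the search path looks something like this
(the exact entries depend on which packages are attached in your session):

> search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics"
[4] "package:grDevices" "package:utils"     "package:datasets"
[7] "package:methods"   "Autoloads"         "package:base"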

9. What is purpose of switch () function in R? Give example


The switch() function in R is used to select one of several alternative
expressions or actions based on the value of a condition. It is a way to
implement a simple form of a switch or case statement, allowing you to
choose a specific action or value depending on the given condition.

The general syntax of the switch() function is as follows:


switch(EXPR, ...)

When EXPR evaluates to a character string, the matching named argument is
returned; if no name matches, the last unnamed argument (if any) is returned
as the default.

Example:
get_season <- function(month) {



switch(month,
"December" = "Winter",
"January" = "Winter",
"February" = "Winter",
"March" = "Spring",
"April" = "Spring",
"May" = "Spring",
"June" = "Summer",
"July" = "Summer",
"August" = "Summer",
"September" = "Fall",
"October" = "Fall",
"November" = "Fall",
"Invalid month" # unnamed last argument: returned when no month matches
)
}

# Test the function


cat("December is in the", get_season("December"), "season.\n")
cat("June is in the", get_season("June"), "season.\n")
cat("September is in the", get_season("September"), "season.\n")
cat("InvalidMonth is", get_season("InvalidMonth"), "\n")

10. What is use of apply function in R? Give example


The apply() function in R is used to apply a function to the rows or
columns of a matrix or, more generally, to the margins of an array. It
provides a concise and flexible way to perform operations on the
elements of a matrix or array without using explicit loops.

The basic syntax of the apply() function is as follows


apply(X, MARGIN, FUN, ...)

X: The matrix or array.


MARGIN: A vector indicating which margins should be "applied



over." Use 1 for rows, 2 for columns, and c(1, 2) for both.
FUN: The function to be applied. This can be a custom function or one
of the built-in functions.
...: Additional arguments passed to the function.

Here's a simple example using apply() to calculate the row sums of a


matrix:
# Example: Using apply() to calculate row sums of a matrix

# Create a sample matrix


my_matrix <- matrix(1:12, nrow = 3)

# Display the original matrix


print("Original Matrix:")
print(my_matrix)

# Use apply() to calculate row sums


row_sums <- apply(my_matrix, MARGIN = 1, FUN = sum)

# Display the row sums


print("Row Sums:")
print(row_sums)

11. Differentiate break and next statements in R

In R, break and next are control flow statements used within loops to
modify the flow of execution.
break Statement:
Purpose: The break statement is used to terminate the execution of a
loop prematurely. When break is encountered, the loop is immediately
exited, and the program control moves to the next statement after the
loop.
Example:
for (i in 1:10) {



print(i)
if (i == 5) {
break
}
}
In this example, the loop prints the values of i from 1 to 10. When i
equals 5, the break statement is encountered, and the loop is exited.

next Statement:
Purpose: The next statement is used to skip the rest of the current
iteration of a loop and move to the next iteration. When next is
encountered, the loop jumps to the next iteration, skipping any
remaining code within the loop body.
Example:
for (i in 1:5) {
if (i == 3) {
next
}
print(i)
}
In this example, the loop prints the values of i from 1 to 5, but when i
equals 3, the next statement is encountered, and the loop skips the
print(i) statement for that iteration.

12. What is repeat statement in R? Give an example.


In R, the repeat statement is a loop construct that creates an infinite
loop. The loop continues to execute until explicitly terminated using
the break statement. The repeat loop is useful when you need to
repeatedly execute a block of code until a certain condition is met.

Here's an example of using the repeat loop in R:


# Example: Using repeat loop to generate random numbers until a
condition is met



# Set a target value
target_value <- 8

# Initialize a variable to store the sum


sum_values <- 0

# Repeat loop
repeat {
# Generate a random number between 1 and 10
random_number <- sample(1:10, 1)

# Add the random number to the sum


sum_values <- sum_values + random_number

# Print the current sum


cat("Current Sum:", sum_values, "\n")

# Check if the sum exceeds the target value


if (sum_values > target_value) {
cat("Target value exceeded. Breaking out of the loop.\n")
break
}
}

cat("Loop finished.\n")

13. What do you mean by lazy evaluation? Give an example.


Lazy evaluation is a programming language feature in which
expressions are not evaluated until their values are actually needed. In
other words, the computation is delayed until the result is required for
some operation or output. This can lead to more efficient use of
resources, as not all computations are performed upfront.

In R, lazy evaluation is commonly associated with the concept of



promises. A promise is a delayed computation or expression that is not
immediately evaluated. The actual computation is deferred until the
result is explicitly needed.

Here's a simple example to illustrate lazy evaluation in R:


# Function that uses lazy evaluation: the argument 'y' is never used in the
# body, so the expression supplied for it is never evaluated
lazy_function <- function(x, y) {
cat("Inside lazy_function\n")
return(x * 2)
}

# Call the function: the second argument would raise an error if it were
# evaluated, but thanks to lazy evaluation it never is

result_value <- lazy_function(5, stop("this expression is never evaluated"))

# Print the final result

cat("Final Result:", result_value, "\n")

14. How to check for Missing Arguments of function in R? Give


example
In R, you can check for missing arguments in a function using the
missing() function. The missing() function returns a logical value
indicating whether an argument was explicitly passed to a function or
whether its default value is being used. If the argument is missing,
missing() returns TRUE; otherwise, it returns FALSE.



Here's an example:
# Function with missing argument check
example_function <- function(x, y = 10, z) {
# Check if 'x' is missing
if (missing(x)) {
cat("Argument 'x' is missing or not explicitly provided.\n")
} else {
cat("Argument 'x' is present and its value is:", x, "\n")
}

# Check if 'y' is missing


if (missing(y)) {
cat("Argument 'y' is missing or not explicitly provided, using default
value (10).\n")
} else {
cat("Argument 'y' is present and its value is:", y, "\n")
}

# Check if 'z' is missing


if (missing(z)) {
cat("Argument 'z' is missing or not explicitly provided.\n")
} else {
cat("Argument 'z' is present and its value is:", z, "\n")
}
}

# Call the function with different argument combinations


example_function(x = 5, z = "Hello")
example_function(y = 15, z = "World")
example_function(x = 3, y = 12, z = "Missing Argument Example")

15. Write an R code snippet that demonstrates the use of a tryCatch
block to handle an exception.
In R, the tryCatch() function evaluates an expression and lets you supply
handler functions that are called if an error (or warning) occurs, so the
program can recover gracefully instead of stopping.

Here's an example:
# Function that may throw an exception
divide_numbers <- function(x, y) {
result <- tryCatch({
# Signal an error for division by zero, otherwise do the division
if (y == 0) {
stop("Division by zero.")
}
x / y
}, error = function(e) {
# Handle the exception
cat("An error occurred:", conditionMessage(e), "\n")
NA
})
return(result)
}

# Example usage with tryCatch

result1 <- divide_numbers(10, 2) # No exception
result2 <- divide_numbers(5, 0) # Exception (division by zero) is caught

# Print the results

cat("Result 1:", result1, "\n")
cat("Result 2:", result2, "\n")

16. How can you measure the execution time of a specific piece of code
in R?
You can measure the execution time of a specific piece of code in R
using the system.time() function or the microbenchmark package.
system.time() reports the user CPU time, system CPU time, and elapsed
(wall-clock) time for a single evaluation of an expression, while
microbenchmark runs an expression many times and summarizes the
distribution of its execution times.
Using system.time():
Here's an example using system.time():
# Time a simple vector computation
timing <- system.time({
x <- rnorm(1e6)
y <- cumsum(x)
})

# Print the timing (user, system, and elapsed seconds)
print(timing)

17. Give an example of timing a code block using the system.time


function.



Here's an example of using the system.time() function to measure the
execution time of a code block in R:
# Place the code to be measured directly inside system.time() so that it is
# actually evaluated (and therefore timed) by the call
timing_result <- system.time({
# Simulate a time-consuming operation
result <- sum(1:1000000)
for (i in 1:50000) {
result <- result * 2
}
})

# Print the result


cat("Elapsed time:", timing_result[3], "seconds\n")

18. How do you unmount packages in R? Give an example


In R, you can unload or detach a package using the detach() function.
Detaching a package removes it from the search path, making its
functions and objects no longer directly accessible. This can be useful
when you want to unload a package temporarily or switch to a different
version of a package.

Here's an example of how to detach a package:


# Check if the package is currently attached
if ("dplyr" %in% search()) {
# Detach the 'dplyr' package
detach("package:dplyr", unload = TRUE)
cat("Package 'dplyr' detached.\n")
} else {
cat("Package 'dplyr' is not currently attached.\n")



}

# Now you can install or load a different version of the 'dplyr' package
if needed

19. How the :: operator help to prevent masking when calling functions
from specific packages?
The :: operator in R is used to access functions or variables from a
specific package, helping to prevent function masking and namespace
conflicts. When you use the :: operator, you explicitly specify the
package that contains the function or variable you want to use, ensuring
that there is no ambiguity regarding which function is being called.

Here's an example to illustrate the use of the :: operator to prevent


masking:

Suppose you have two packages, packageA and packageB, both of


which have a function named my_function:
# In packageA
my_function <- function() {
print("Function in packageA")
}

# In packageB
my_function <- function() {
print("Function in packageB")
}
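
Assuming both of these (hypothetical) packages were installed and exported
my_function, the :: operator makes it explicit which version is being called,
so neither definition masks the other:

packageA::my_function() # prints "Function in packageA"
packageB::my_function() # prints "Function in packageB"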

20. Why is data visualization important in data analysis with R?


Visualization helps in understanding the underlying patterns, trends,
and distributions in the data. Exploring the data graphically allows
analysts to identify outliers, missing values, and potential relationships
between variables.



21. Differentiate boxplot and scatter plot
Purpose:
a. Boxplot: Summarizes the distribution of a dataset, especially focusing on
quartiles, median, and outliers.
b. Scatter Plot: Visualizes the relationship between two continuous variables.
Representation:
c. Boxplot: Represents the distribution of a single variable.
d. Scatter Plot: Represents the relationship between two variables.
Variables:
e. Boxplot: Typically used for a single variable.
f. Scatter Plot: Involves two variables (x, y).

4 or 6 marks questions

1. How do you read external data files into R? Explain any three types of
files with necessary commands to read their characters into R, with
example.
Table-format files are best thought of as plain-text files with three key
features that fully define how R should read the data.
Header If a header is present, it’s always the first line of the file. This
optional feature is used to provide names for each column of data. When
importing a file into R, you need to tell the software whether a header is
present so that it knows whether to treat the first line as variable names or,
alternatively, observed data values.

Delimiter The all-important delimiter is a character used to separate the


entries in each line. The delimiter character cannot be used for anything
else in the file. This tells R when a specific entry begins and ends (in other
words, its exact position in the table).

Missing value This is another unique character string used exclusively to


denote a missing value. When reading the file, R will turn these entries
into the form it recognizes: NA
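
These three features map directly onto arguments of read.table(); a minimal
sketch (the file name, the space delimiter, and the "*" missing-value marker
are assumptions for illustration) is:

mydata <- read.table(file="mydatafile.txt", # path to the table-format file
                     header=TRUE,           # the first line holds column names
                     sep=" ",               # the delimiter separating entries
                     na.strings="*")        # the string that marks missing values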



1. Spreadsheet Workbooks
The standard file format for Microsoft Office Excel is .xls or .xlsx. In
general, these files are not directly compatible with R. There are some
contributed package functions that attempt to bridge this gap—see, for
example, gdata by Warnes et al. (2014) or XLConnect by Mirai
Solutions GmbH (2014)—but it’s generally preferable to first export
the spreadsheet file to a table format, such as CSV.

To read this spreadsheet with R, you should first convert it to a table


format. In Excel, File → Save As... provides a wealth of options. Save
the spreadsheet as a comma-separated file, called spreadsheet.csv. R
has a shortcut version of read.table, read.csv, for these files.

R> spread <- read.csv(file="/Users/tdavies/spreadsheetfile.csv",


header=FALSE,stringsAsFactors=TRUE)
R> spread
V1 V2 V3
1 55 161 female

2. Web-Based Files
To read web-based files in R, you can use functions from packages
such as readr, httr, or RCurl. The readr package is particularly useful
for reading various types of delimited files, including CSV files. Below
is an example of how to read a CSV file from a URL using the readr
package:

First, make sure to install the readr package if you haven't already:
# Install the readr package if not already installed
install.packages("readr")
Now, you can use the read_csv() function from the readr package to
read a CSV file from a web URL:

# Load the readr package


library(readr)

# URL of the CSV file


url <- "https://example.com/path/to/your/file.csv"

# Read the CSV file from the web


my_data <- read_csv(url)

# Display the structure of the data frame


str(my_data)

# Display the first few rows of the data frame


head(my_data)

2. What do you mean by argument matching to function in R programming?


Explain any three of them.
In R programming, argument matching refers to the process by which
function arguments are matched to the parameters defined in a function.
There are several ways in which arguments can be matched to function
parameters, and I'll explain three of them:

Exact Matching:

In exact matching, arguments are matched to parameters based on their


names, and the order in which they are passed to the function doesn't
matter.
This is the default matching method in R.
For example, consider a function my_function(a, b). If you call the
function as my_function(b = 2, a = 1), the arguments are matched based on
their names, and the order doesn't affect the matching.
my_function <- function(a, b) {



print(paste("a =", a, ", b =", b))
}

my_function(b = 2, a = 1) # Output: a = 1 , b = 2

Partial Matching:
• Partial matching allows you to specify only a part of the parameter name,
and R will match the argument based on the provided partial name as long
as it is unambiguous.
• This is achieved by using a unique prefix of the parameter name.
• For example, if a function has a parameter named verbose, you can use
ver as a partial match for it.
my_function <- function(verbose = FALSE) {
if (verbose) {
print("Verbose mode is on.")
} else {
print("Verbose mode is off.")
}
}

my_function(ver = TRUE) # Output: Verbose mode is on.

Positional Matching:
• Positional matching occurs when arguments are matched to parameters
based on their order of appearance in the function definition.
• This is the simplest form of matching, where the first argument is matched
to the first parameter, the second argument to the second parameter, and so
on.
• It is important to pass the arguments in the correct order when using
positional matching.

my_function <- function(a, b) {


print(paste("a =", a, ", b =", b))
}



my_function(1, 2) # Output: a = 1 , b = 2

3. Explain if …. else and ifelse statements with syntax and example.


In R, the if...else statement and the ifelse() function are used for
conditional execution of code. They allow you to execute different blocks
of code based on whether a specified condition is TRUE or FALSE.
Below, I'll explain both the if...else statement and the ifelse() function with
their syntax and examples.

1. if...else statement:
Syntax:
if (condition) {
# Code to be executed if the condition is TRUE
} else {
# Code to be executed if the condition is FALSE
}
Example:
# Example using if...else
x <- 10

if (x > 5) {
print("x is greater than 5.")
} else {
print("x is not greater than 5.")
}
2. ifelse() function:
Syntax:
ifelse(condition, true_value, false_value)
Example:
# Example using ifelse()
y <- 8

result <- ifelse(y > 5, "y is greater than 5", "y is not greater than 5")



print(result)

4. What do you mean by nesting and stacking in R? Explain with an example.


1. Nesting in R:
Nesting in Data Structures:

Nesting in the context of data structures usually refers to the inclusion of


one data structure within another. For example, lists can be nested within
other lists, creating a hierarchical or nested structure.
Example:

# Nesting lists
nested_list <- list(
name = "John",
age = 25,
contact = list(
email = "[email protected]",
phone = "123-456-7890"
)
)

# Accessing nested elements


print(nested_list$name) # Output: John
print(nested_list$contact$email) # Output: [email protected]
In this example, the contact element in the nested_list is itself a list,
creating a nested structure.

2. Stacking in R:
Stacking Data Frames:

Stacking in the context of data frames refers to combining multiple data


frames vertically, either by rows or columns.
Example:
# Creating two data frames



df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = c(4, 5, 6), Name = c("David", "Eve", "Frank"))

# Stacking data frames by rows (binding rows)


stacked_df <- rbind(df1, df2)

# Stacking data frames by columns (binding columns)


stacked_df_col <- cbind(df1, df2)

# Displaying the stacked data frames


print("Stacked by rows:")
print(stacked_df)

print("Stacked by columns:")
print(stacked_df_col)
In this example, rbind() is used to stack data frames df1 and df2 by rows,
and cbind() is used to stack them by columns.

5. Explain for loop with its varieties in R with syntax and example.
for Loops
The R for loop always takes the following general form:
for(loopindex in loopvector)
{
do any code in here
}
Here, the loopindex is a placeholder that represents an element in the
loopvector—it starts off as the first element in the vector and moves to the
next element with each loop repetition. When the for loop begins, it runs
the code in the braced area, replacing any occurrence of the loopindex with
the first element of the loopvector. When the loop reaches the closing
brace, the loopindex is incremented, taking on the second element of the
loopvector, and the braced area is repeated. This continues until the loop
reaches the final element of the loopvector, at which point the braced code
is executed for the final time, and the loop exits. Here’s a simple example :
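
A minimal version of the loop being described (it prints the loopindex,
named myitem, as it runs from 5 to 7; the exact wording of the printed text
is an assumption) is:

for(myitem in 5:7){
  cat("The current item is", myitem, "\n")
}

Sending this to the console produces:

The current item is 5
The current item is 6
The current item is 7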



This loop prints the current value of the loopindex (which I've named
myitem here) as it increments from 5 to 7, producing the output shown
above.

Looping via Index or Value


The difference between using the loopindex to directly represent elements
in the loopvector and using it to represent indexes of a vector. The
following two loops use these two different approaches to print double
each number in myvec:
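
A minimal version of the two loops (the values stored in myvec are an
assumption for illustration) is:

myvec <- c(0.4, 1.1, 0.34, 0.55)

# Loop directly over the elements of myvec
for(i in myvec){
  print(i*2)
}

# Loop over the index positions of myvec
for(i in 1:length(myvec)){
  print(myvec[i]*2)
}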



The first loop uses the loopindex i to directly represent the elements in
myvec, printing the value of each element times 2. In the second loop, on
the other hand, you use i to represent integers in the sequence
1:length(myvec). These integers form all the possible index positions of
myvec, and you use these indexes to extract myvec’s elements (once again
multiplying each element by 2 and printing the result). Though it takes a
slightly longer form, using vector index positions provides more flexibility
in terms of how you can use the loopindex.

Nesting for Loops


You can also nest for loops, just like if statements. When a for loop is
nested in another for loop, the inner loop is executed in full before the
outer loop loopindex is incremented, at which point the inner loop is
executed all over again. Create the following objects in your R console:
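
A set of objects consistent with the description that follows (the particular
values and dimensions are assumptions) would be:

loopvec1 <- 5:7
loopvec2 <- 9:6
foo <- matrix(NA, nrow=length(loopvec1), ncol=length(loopvec2))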

The following nested loop fills foo with the result of multiplying each
integer in loopvec1 by each integer in loopvec2:
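
A sketch of that nested loop, using the objects defined above, is:

for(i in 1:length(loopvec1)){
  for(j in 1:length(loopvec2)){
    foo[i,j] <- loopvec1[i]*loopvec2[j]
  }
}
foo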



Nested loops require a unique loopindex for each use of for. In this case,
the loopindex is i for the outer loop and j for the inner loop. When the code
is executed, i is first assigned 1, the inner loop begins, and then j is also
assigned 1. The only command in the inner loop is to take the product of
the ith element of loopvec1 and the jth element of loopvec2 and assign it to
row i, column j of foo. The inner loop repeats until j reaches
length(loopvec2) and fills the first row of foo; then i increments, and the
inner loop is started up again. The entire procedure is complete after i
reaches length(loopvec1) and the matrix is filled.

6. Explain while loop in R with syntax and example.

In R, a `while` loop is a control flow statement that repeatedly executes a


block of code as long as a specified condition is true. The basic syntax of a
`while` loop in R is as follows:

```R
while (condition) {
# Code to be executed as long as the condition is true
}
```

Here's a breakdown of the syntax:

- `while`: Keyword indicating the start of the `while` loop.



- `condition`: A logical expression that is evaluated before each iteration.
If the condition is true, the code inside the loop is executed; otherwise, the
loop is terminated.

Now, let's look at an example to illustrate the use of a `while` loop in R.


Suppose you want to print numbers from 1 to 5 using a `while` loop:

```R
# Example of a while loop in R
count <- 1 # Initialize a counter variable

while (count <= 5) {


print(paste("Current count is:", count))
count <- count + 1 # Increment the counter
}
```

In this example:

- We initialize a counter variable `count` to 1.


- The `while` loop checks whether the condition `(count <= 5)` is true.
- If true, it executes the code inside the loop, which includes printing the
current value of the counter using the `print` function and then
incrementing the counter by 1.
- The loop continues to execute as long as the condition is true, printing
the current count in each iteration.
- Once the counter exceeds 5, the loop terminates.

The output of this code will be:

```
[1] "Current count is: 1"
[1] "Current count is: 2"
[1] "Current count is: 3"



[1] "Current count is: 4"
[1] "Current count is: 5"
```

This example demonstrates a simple `while` loop that iterates until a


specified condition is no longer true. It's important to ensure that the
condition eventually becomes false to avoid infinite loops.

7. What is apply function? Explain variety of apply functions with an


example.
In R, the `apply` function is part of the *apply family*, which includes
several functions designed to apply a specified function over the margins
of an array or matrix. The primary advantage of using these functions is
concise and efficient code, especially when dealing with multi-
dimensional data structures like matrices and arrays.

There are several functions in the *apply family*, and they include:

1. **`apply()`:** Applies a function to the rows or columns of a matrix.

```R
# Example of apply function
matrix_data <- matrix(1:9, nrow = 3)
result <- apply(matrix_data, MARGIN = 1, FUN = sum)
print(result)
```

In this example, `apply` is used to calculate the sum of each row in the
`matrix_data` matrix.

2. **`lapply()`:** Applies a function to each element of a list and returns a


list.

```R



# Example of lapply function
list_data <- list(a = 1:3, b = 4:6, c = 7:9)
result <- lapply(list_data, FUN = sum)
print(result)
```

Here, `lapply` is used to calculate the sum of each element in the list.

3. **`sapply()`:** Similar to `lapply`, but it tries to simplify the result to a


vector or matrix.

```R
# Example of sapply function
list_data <- list(a = 1:3, b = 4:6, c = 7:9)
result <- sapply(list_data, FUN = sum)
print(result)
```

The result of `sapply` is a simplified vector in this case.

4. **`tapply()`:** Applies a function over subsets of a vector or array,


grouping by a factor.

```R
# Example of tapply function
values <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("A", "B", "A", "B", "A", "B"))
result <- tapply(values, groups, FUN = sum)
print(result)
```

`tapply` is used to calculate the sum of values grouped by the levels of


the factor variable.



5. **`mapply()`:** Applies a function to multiple lists or vectors.

```R
# Example of mapply function
vector1 <- 1:3
vector2 <- 4:6
result <- mapply(sum, vector1, vector2)
print(result)
```

In this example, `mapply` is used to calculate the sum of corresponding


elements in `vector1` and `vector2`.

These functions provide a convenient way to perform operations on data


structures in a concise manner, avoiding the need for explicit loops in
many cases. The specific function and syntax used depend on the structure
of the data and the desired operation.

8. How do you define and call, user defined functions in R? Explain with an
example.
In R, you can define your own functions using the `function` keyword. A
user-defined function typically consists of a name, a list of parameters, and
a body containing the code to be executed. Here's the basic syntax for
defining a function in R:

```R
my_function <- function(parameter1, parameter2, ...) {
# Code to be executed
# Return statement (if needed)
}
```

After defining the function, you can call it by using its name and providing
values for the parameters. Here's an example of a simple user-defined



function that calculates the square of a number:

```R
# Define a function to calculate the square of a number
square <- function(x) {
result <- x^2
return(result)
}

# Call the function with an argument


result <- square(5)

# Print the result


print(result)
```

In this example:

- The function `square` takes one parameter (`x`).


- Inside the function, it calculates the square of the input parameter using
the exponentiation operator (`^`) and assigns it to the variable `result`.
- The `return(result)` statement is optional; if omitted, the last evaluated
expression in the function is automatically returned.
- The function is then called with the argument `5`, and the result is stored
in the variable `result`.
- Finally, the result is printed, and the output will be:

```R
[1] 25
```

You can also define functions with multiple parameters, include


conditional statements, loops, and perform more complex operations
within the function body. Here's an example of a function that calculates



the factorial of a number using a `for` loop:

```R
# Define a function to calculate the factorial of a number
factorial <- function(n) {
result <- 1
for (i in 1:n) {
result <- result * i
}
return(result)
}

# Call the function with an argument


result <- factorial(5)

# Print the result


print(result)
```

In this example, the `factorial` function uses a `for` loop to calculate the
factorial of a number, and the result is printed, producing the output:

```R
[1] 120
```

These examples illustrate the basic process of defining and calling user-
defined functions in R. Functions provide a way to modularize code,
making it more readable, reusable, and easier to maintain.

9. How do you set default arguments to a user defined function? Explain with
an example.
In R, you can set default values for function arguments by assigning a
default value in the function definition. This allows users to omit those



arguments when calling the function, and the default values will be used if
a specific value is not provided. Here's the general syntax for defining a
function with default arguments:

```R
my_function <- function(arg1, arg2 = default_value2, arg3 =
default_value3, ...) {
# Code to be executed
# Use arg1, arg2, arg3, etc.
}
```

Here's an example of a user-defined function with default arguments:

```R
# Define a function with default arguments
power_calculation <- function(x, exponent = 2) {
result <- x^exponent
return(result)
}

# Call the function without providing the 'exponent' argument


result_default <- power_calculation(3)

# Call the function with a specific 'exponent' argument


result_custom <- power_calculation(3, 3)

# Print the results


print(result_default) # Uses the default exponent (2)
print(result_custom) # Uses the provided exponent (3)
```

In this example:



- The `power_calculation` function takes two parameters: `x` and
`exponent`.
- The `exponent` parameter has a default value of `2`.
- When the function is called without explicitly providing the `exponent`
argument (`power_calculation(3)`), the default value of `2` is used.
- When the function is called with a specific `exponent` argument
(`power_calculation(3, 3)`), the provided value of `3` is used.

The output of the code will be:

```R
[1] 9 # result_default (3^2)
[1] 27 # result_custom (3^3)
```

This allows for flexibility when using the function. Users can choose to
provide a specific value for the `exponent` argument if they have a
different requirement, or they can rely on the default value if they are
comfortable with the default behavior.

10. Explain three kinds of specialized user defined functions in R, with


example.
In R, specialized user-defined functions can be created to address specific
tasks or requirements. Here are explanations of three types of specialized
user-defined functions with examples:

1. **Helper Function:**
- **Definition:** A helper function is designed to perform a specific
subtask within a larger task. It is often used to encapsulate a particular
operation, making the main code more readable and modular.
- **Example:**

```R
# Helper function to calculate the square of a number



square <- function(x) {
return(x^2)
}

# Main function using the helper function


calculate_sum_of_squares <- function(a, b) {
sum_of_squares <- square(a) + square(b)
return(sum_of_squares)
}

# Call the main function


result <- calculate_sum_of_squares(3, 4)
print(result)
```

In this example, `square` is a helper function that calculates the square


of a number. The `calculate_sum_of_squares` function then uses this
helper function to find the sum of squares for two numbers.

2. **Vectorized Function:**
- **Definition:** A vectorized function is designed to operate efficiently
on entire vectors or matrices, taking advantage of R's ability to perform
element-wise operations without explicit loops.
- **Example:**

```R
# Vectorized function to calculate the element-wise product of two
vectors
elementwise_product <- function(vector1, vector2) {
return(vector1 * vector2)
}

# Call the vectorized function


result <- elementwise_product(c(1, 2, 3), c(4, 5, 6))



print(result)
```

Here, `elementwise_product` is a vectorized function that calculates the


element-wise product of two vectors. The multiplication is automatically
applied to each corresponding pair of elements in the input vectors.

3. **Recursive Function:**
- **Definition:** A recursive function is one that calls itself, allowing it
to break a complex problem into simpler subproblems. Recursive functions
often have a base case that defines the simplest scenario and termination
condition.
- **Example:**

```R
# Recursive function to calculate the factorial of a non-negative integer
factorial <- function(n) {
if (n == 0 || n == 1) {
return(1) # Base case
} else {
return(n * factorial(n - 1)) # Recursive call
}
}

# Call the recursive function


result <- factorial(5)
print(result)
```

In this example, `factorial` is a recursive function that calculates the


factorial of a non-negative integer. The base case is when `n` is 0 or 1, and
in other cases, the function calls itself with a reduced value of `n`.

These are just a few examples of specialized user-defined functions in R.



Depending on the task at hand, you can design functions that fit the
specific needs of your analysis or programming task.

11. What is exception handling? How do you catch errors with try
Statements? Explain with example
Exception handling is a programming construct that allows developers to
manage and respond to errors or exceptional situations that may occur
during the execution of a program. In R, you can catch and handle errors
using the `try` function and associated constructs.

The `try` function is used to evaluate an expression and handle any errors
that may occur during its execution. The basic syntax of the `try` function
is as follows:

```R
result <- try({
# Code that may raise an error
})
```

If an error occurs during the evaluation of the code block inside `try`, the
error is caught, and the result is an object of class `try-error`. If there is no
error, the result contains the value of the expression.

Here's an example to illustrate the use of `try` in catching errors:

```R
# Example of try statement for error handling
divide_numbers <- function(a, b) {
result <- try({
if (b == 0) {
stop("Error: Division by zero.")
}
return(a / b)



})
return(result)
}

# Test the function with different inputs


result1 <- divide_numbers(10, 2)
result2 <- divide_numbers(5, 0)

# Print the results


print(result1) # Successful division
print(result2) # Error caught and handled
```

In this example:

- The `divide_numbers` function attempts to perform division but checks


for the possibility of division by zero.
- If the denominator `b` is zero, the function uses the `stop` function to
raise an error.
- The division operation is wrapped inside a `try` block to catch any errors
that may occur.
- When calling the function with `divide_numbers(10, 2)`, the division is
successful, and the result is printed.
- When calling the function with `divide_numbers(5, 0)`, an error is raised,
caught by the `try` statement, and the result contains an object of class
`try-error`.

When this is run, the successful call prints its result, while the failing call
prints the error message captured by `try`, and `result2` is returned as an
object of class `try-error` rather than a number. The output looks similar to:

```R
[1] 5 # Result of successful division
Error : Error: Division by zero. # Captured error message for division by zero
```



Using `try` allows you to gracefully handle errors and continue with the
execution of your program, preventing it from terminating abruptly. You
can further customize error handling by using constructs like `tryCatch` for
more complex scenarios.

12. What is “masking" in R? Explain two most common masking


situations in R, with example
In R, "masking" refers to a situation where a variable or function in a local
scope takes precedence over a variable or function with the same name in
a broader scope, such as the global environment or a package namespace.
This can lead to unexpected behavior, as the local definition "masks" or
hides the broader definition.

Here are two common masking situations in R:

1. **Function Masking:**
- **Description:** When a function with the same name is defined
locally, it takes precedence over a function with the same name in the
global environment or a package namespace.
- **Example:**

```R
# Define a function in the global environment
my_function <- function() {
print("Global function")
}

# Create a local environment and define a function with the same name
local_environment <- new.env()
assign("my_function", function() {
print("Local function")
}, envir = local_environment)



# Call the function in the local environment
with(local_environment, my_function())
```

In this example, the local function defined in the `local_environment`


takes precedence over the global function, and calling `my_function()`
within that environment prints "Local function."

2. **Variable Masking:**
- **Description:** When a variable with the same name is assigned a
value in a local scope, it masks the variable with the same name in the
broader scope.
- **Example:**

```R
# Assign a variable in the global environment
my_variable <- "Global variable"

# Create a local environment and assign a variable with the same name
local_environment <- new.env()
assign("my_variable", "Local variable", envir = local_environment)

# Print the values of the variable in the global and local scopes
print(my_variable)
print(local_environment$my_variable)
```

In this example, the local variable defined in the `local_environment`


masks the global variable, and printing `my_variable` in the global
environment prints "Global variable," while printing
`local_environment$my_variable` prints "Local variable."

To avoid masking issues, it's good practice to use unique and descriptive
names for variables and functions. Additionally, understanding the scoping



rules in R and using functions like `assign` and `get` with explicit
environments can help manage scope-related challenges.

13. How do you draw barplot and pie chart in R? Explain with example.
In R, you can create bar plots and pie charts using the base plotting system
or popular plotting packages like `ggplot2`. Here, I'll provide examples for
both the base plotting system and `ggplot2` for creating a bar plot and a pie
chart.

### Bar Plot:

#### Using Base Plotting System:


```R
# Example of a bar plot using the base plotting system
data <- data.frame(
Category = c("A", "B", "C", "D"),
Frequency = c(15, 25, 10, 30)
)

# Create a bar plot


barplot(data$Frequency, names.arg = data$Category, col = "skyblue",
main = "Bar Plot Example", ylab = "Frequency")
```

#### Using ggplot2:


```R
# Example of a bar plot using ggplot2
library(ggplot2)

# Create a data frame


data <- data.frame(
Category = c("A", "B", "C", "D"),
Frequency = c(15, 25, 10, 30)
)



# Create a bar plot using ggplot2
ggplot(data, aes(x = Category, y = Frequency, fill = Category)) +
geom_bar(stat = "identity", color = "black") +
labs(title = "Bar Plot Example", y = "Frequency") +
theme_minimal()
```

### Pie Chart:

#### Using Base Plotting System:


```R
# Example of a pie chart using the base plotting system
data <- c(15, 25, 10, 30)
labels <- c("A", "B", "C", "D")

# Create a pie chart


pie(data, labels = labels, col = rainbow(length(data)), main = "Pie Chart
Example")
```

#### Using ggplot2:


```R
# Example of a pie chart using ggplot2
library(ggplot2)

# Create a data frame


data <- data.frame(
Category = c("A", "B", "C", "D"),
Frequency = c(15, 25, 10, 30)
)

# Create a pie chart using ggplot2


ggplot(data, aes(x = "", y = Frequency, fill = Category)) +



geom_bar(stat = "identity", width = 1) +
coord_polar("y") +
labs(title = "Pie Chart Example") +
theme_minimal()
```

These examples showcase how to create a basic bar plot and pie chart
using both the base plotting system and `ggplot2`. Depending on your
preferences and requirements, you can choose the approach that best fits
your needs. Note that the `ggplot2` package provides a more flexible and
customizable approach for creating a wide range of plots.

14. Explain histograms in R with an example.


In R, a histogram is a graphical representation of the distribution of a
dataset. It displays the frequencies or relative frequencies of different
values in the dataset, typically organized into bins or intervals. Histograms
are useful for visualizing the shape of the distribution and identifying
patterns or characteristics of the data.

Here's an example of creating a histogram in R using both the base plotting


system and the `ggplot2` package:

### Using Base Plotting System:

```R
# Example of a histogram using the base plotting system
# Generate random data for illustration
set.seed(123)
data <- rnorm(1000, mean = 10, sd = 2)

# Create a histogram
hist(data, col = "skyblue", main = "Histogram Example", xlab = "Values",
ylab = "Frequency")
```



In this example:
- We generate a random dataset of 1000 values from a normal distribution
with a mean of 10 and a standard deviation of 2 using `rnorm`.
- The `hist` function is then used to create a histogram of the data.
- The `col` argument sets the color of the bars, and the `main`, `xlab`, and
`ylab` arguments provide a title and axis labels.

### Using ggplot2:

```R
# Example of a histogram using ggplot2
# Generate random data for illustration
set.seed(123)
data <- rnorm(1000, mean = 10, sd = 2)

# Load ggplot2 library


library(ggplot2)

# Create a histogram using ggplot2


ggplot(data.frame(x = data), aes(x = x)) +
geom_histogram(binwidth = 1, fill = "skyblue", color = "black", alpha =
0.7) +
labs(title = "Histogram Example", x = "Values", y = "Frequency") +
theme_minimal()
```

In this ggplot2 example:


- We use the `ggplot` function to initialize the plot and the
`geom_histogram` function to create the histogram.
- The `aes` function is used to specify the aesthetics, and `binwidth`
controls the width of the bins.
- The `fill`, `color`, and `alpha` arguments set the appearance of the bars.
- The `labs` function is used to set the title and axis labels.



Both examples generate a histogram that visualizes the distribution of the
randomly generated dataset. You can customize the appearance of the
histogram further based on your preferences and the characteristics of your
data.

15. Explain boxplot in R with an example.


In R, a boxplot (also known as a box-and-whisker plot) is a graphical
representation of the distribution of a dataset. It provides a visual summary
of the central tendency, spread, and skewness of the data. The boxplot
consists of a rectangular "box" that represents the interquartile range
(IQR), with a line inside the box indicating the median. Whiskers extend
from the box to the minimum and maximum values within a specified
range, and individual points beyond the whiskers are considered outliers.

Here's an example of creating a boxplot in R using both the base plotting


system and the `ggplot2` package:

### Using Base Plotting System:

```R
# Example of a boxplot using the base plotting system
# Generate random data for illustration
set.seed(123)
data <- list(A = rnorm(100, mean = 10, sd = 2),
B = rnorm(100, mean = 15, sd = 3),
C = rnorm(100, mean = 12, sd = 2))

# Create a boxplot
boxplot(data, col = c("skyblue", "lightgreen", "pink"), names = c("A", "B",
"C"),
main = "Boxplot Example", xlab = "Categories", ylab = "Values")
```



In this example:
- We generate three sets of random data (`A`, `B`, and `C`) using `rnorm`.
- The `boxplot` function is then used to create a boxplot of the data.
- The `col` argument sets the colors of the boxes, and the `names`
argument provides labels for the categories.
- The `main`, `xlab`, and `ylab` arguments set the title and axis labels.

### Using ggplot2:

```R
# Example of a boxplot using ggplot2
# Generate random data for illustration
set.seed(123)
data <- data.frame(Category = rep(c("A", "B", "C"), each = 100),
Values = c(rnorm(100, mean = 10, sd = 2),
rnorm(100, mean = 15, sd = 3),
rnorm(100, mean = 12, sd = 2)))

# Load ggplot2 library


library(ggplot2)

# Create a boxplot using ggplot2


ggplot(data, aes(x = Category, y = Values, fill = Category)) +
geom_boxplot() +
labs(title = "Boxplot Example", x = "Categories", y = "Values") +
theme_minimal()
```

In this ggplot2 example:


- We use the `ggplot` function to initialize the plot and the `geom_boxplot`
function to create the boxplot.
- The `aes` function is used to specify the aesthetics, and `fill` sets the fill
color of the boxes.
- The `labs` function is used to set the title and axis labels.



Both examples generate a boxplot that visualizes the distribution of the
randomly generated dataset across different categories. Boxplots are useful
for comparing the central tendency and spread of multiple groups of data.

16. What is scatterplot? Explain single plot and matrix of plots, with an
example.
A scatterplot is a graphical representation of the relationship between two
continuous variables. It displays individual data points as dots on a two-
dimensional plane, with one variable on the x-axis and the other on the y-
axis. Scatterplots are useful for visualizing patterns, trends, and
relationships between variables.

### Single Scatterplot:

In R, you can create a single scatterplot using the base plotting system or
the `ggplot2` package. Here's an example using both approaches:

#### Using Base Plotting System:

```R
# Example of a single scatterplot using the base plotting system
# Generate random data for illustration
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Create a scatterplot
plot(x, y, col = "blue", main = "Single Scatterplot Example", xlab = "X-
axis", ylab = "Y-axis")
```

In this example:
- We generate two sets of random data (`x` and `y`) using `rnorm`.



- The `plot` function is then used to create a scatterplot of `x` against `y`.
- The `col` argument sets the color of the dots, and the `main`, `xlab`, and
`ylab` arguments provide title and axis labels.

#### Using ggplot2:

```R
# Example of a single scatterplot using ggplot2
# Generate random data for illustration
set.seed(123)
data <- data.frame(X = rnorm(100), Y = 2 * rnorm(100) + rnorm(100))

# Load ggplot2 library


library(ggplot2)

# Create a scatterplot using ggplot2


ggplot(data, aes(x = X, y = Y)) +
geom_point(color = "blue") +
labs(title = "Single Scatterplot Example", x = "X-axis", y = "Y-axis") +
theme_minimal()
```

In this ggplot2 example:


- We use the `ggplot` function to initialize the plot and the `geom_point`
function to create the scatterplot.
- The `aes` function is used to specify the aesthetics, and `color` sets the
color of the points.
- The `labs` function is used to set the title and axis labels.

### Matrix of Scatterplots:

You can also create a matrix of scatterplots to visualize relationships


between multiple pairs of variables. Here's an example:



#### Using the Base Plotting System (pairs()):

```R
# Example of a matrix of scatterplots using pairs()
# Generate random data for illustration
set.seed(123)
data <- data.frame(
  X1 = rnorm(100),
  X2 = 2 * rnorm(100) + rnorm(100),
  X3 = 0.5 * rnorm(100) + rnorm(100),
  X4 = -1.5 * rnorm(100) + rnorm(100)
)

# Create a matrix of scatterplots: one panel for every pair of variables
pairs(data, col = "blue", main = "Matrix of Scatterplots Example")
```

In this example, we have four variables (`X1`, `X2`, `X3`, `X4`), and `pairs()`
draws a scatterplot for every possible pair of variables, arranged in a grid.
For a ggplot2-style scatterplot matrix, the `ggpairs()` function from the
`GGally` package produces an equivalent display, e.g. `GGally::ggpairs(data)`.

These examples demonstrate how to create both single scatterplots and
matrices of scatterplots in R.



Unit-3
1. What is Statistics? Mention its types.
Statistics is the science, or a branch of mathematics, that involves
collecting, classifying, analyzing, interpreting, and presenting numerical
facts and data. Types of data and measurement scales: Nominal, Ordinal,
Interval, and Ratio.

2. What are Descriptive Statistics and Inferential Statistics?


Descriptive statistics describe, show, and summarize the basic features of
a dataset found in a given study, presented in a summary that describes
the data sample and its measurements. It helps analysts to understand the
data better.
Inferential statistics allows you to make predictions (“inferences”) from that
data.
3. Compare Descriptive Statistics and Inferential Statistics.
Descriptive statistics summarize and describe the data that have actually been
collected (for example, means, medians, and charts), whereas inferential
statistics use a sample to draw conclusions or make predictions about the
larger population from which the sample was taken (for example, hypothesis
tests and confidence intervals).

4. What are the four Types of Data & Measurement Scales?


Types of Data & Measurement Scales:
• Nominal
• Ordinal
• Interval
• Ratio



5. What are nominal data? Give an example.
Nominal scales are used for labeling variables, without any quantitative value.
These are qualitative data that can be used for classification purposes.
Example: gender, blood type, etc.
6. What are ordinal data? Give an example.
Ordinal Level: Ordinal-level data measurement is higher than the nominal
level. In addition to the nominal-level capabilities, ordinal-level measurement
can be used to rank or order objects.
Example: education level, income, etc.
7. What are interval data? Give an example.
Interval scales are numeric scales in which we know both the order and the
exact differences between the values.
Example: Celsius temperature, because the difference between each value is the same.
8. What is ratio data? Give an example.
Ratio Level: Ratio-level data measurement is the highest level of data measurement.
Ratio data have the same properties as interval data, but ratio data have an absolute
zero, and the ratio of two numbers is meaningful. The zero value in the data
represents the absence of the characteristic being studied.
Example: sales, crime rate, etc.
9 Define Measure of Central Tendency. List its types.
Measures of central tendency yield information about the center, or
middle part, of a group of numbers.
Types: Mean, median, mode
10.Define Mode. Determine the mode for the following numbers.



The mode is the most frequently occurring value in a set of data. For the given numbers, the mode is 4.

11.Define Median. Write the steps to calculate Median.


The median is the middle value in an ordered array of numbers .For an array
with an odd number of terms, the median is the middle number .For an array
with an even number of terms, the median is the average of the two middle
numbers. The following steps are used to determine the median:
STEP1.Arrange the observations in an ordered data array.
STEP2.For an odd number of terms, find the middle term of the ordered array. It Is
the median.
STEP3.For an even number of terms, find the average of the middle two terms. This
average is the median.
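In R, the median can be obtained directly with the built-in `median()` function; a minimal sketch using small made-up vectors for illustration:
```R
# Odd number of terms: the middle value of the ordered data
median(c(7, 3, 9, 1, 5))        # returns 5

# Even number of terms: the average of the two middle values
median(c(7, 3, 9, 1, 5, 11))    # returns 6
```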

12 Define Median. Determine the median for the numbers


Data (ordered): 2 2 3 3 4 4 4 4 5 6 7 8 8 8 9 (n = 15)
Median position = (n + 1)/2 = (15 + 1)/2 = 8, so the median is the 8th value: Median = 4
13.Define Population and Sample mean.Write its formulae
Sample mean is the arithmetic mean of random sample values drawn from the population: x̄ = Σx / n.
Population mean represents the actual mean of the whole population: μ = Σx / N.
14. Determine the mode and median for the following numbers.

Data arranged in ascending order: 73, 167, 199, 213, 243, 345, 444, 524, 609, 682
Mode: The mode is the number that appears most frequently in the dataset. In this case, there are no repeated values, so there is no mode.
Median: The median is the middle value of a data set arranged in ascending order. Here there are 10 numbers, so the median is the average of the 5th and 6th values (the middle two numbers):
Median = (243 + 345) / 2 = 588 / 2 = 294
15.Compute the mean for the following numbers.



Mean = (17.3 + 44.5 + 31.6 + 40.0 + 52.8 + 38.8 + 30.1 + 78.5) / 8
= 333.6 / 8
Mean = 41.7
16 Define Percentiles. Write the steps to calculate location of Percentiles.
Percentiles are measures of central tendency that divide a group of data into
100 parts.



17. Compute the mean for the following numbers:
16 28 29 13 17 20 11 34 32 27 25 30 19 18 33
Mean = (16 + 28 + 29 + 13 + 17 + 20 + 11 + 34 + 32 + 27 + 25 + 30 + 19 + 18 + 33) / 15
= 352 / 15
Mean ≈ 23.47
18 What is Quartiles? Determine Q3 for 14, 12, 19, 23, 5, 13, 28, 17.
Quartiles are values that divide a data set into four equal parts. Quartiles are
often used to understand the spread and distribution of data.
To determine Q3(the third quartile) for the dataset 14,12,19,23, 5,13,28, 17,
follow these steps
1. First, arrange the data in ascending
order: 5, 12, 13, 14, 17, 19, 23, 28
2. Calculate the location of Q3 (the 75th percentile) using the formula:
i = (75/100) * n = 0.75 * 8 = 6
3. Because the location i is a whole number, Q3 is the average of the values in the 6th and 7th positions of the ordered list:
Q3 = (19 + 23) / 2
= 42 / 2 = 21
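R's `quantile()` function supports several quartile conventions via its `type` argument; a minimal sketch (type = 2 averages the two neighbouring values at a whole-number location, matching the calculation above, while R's default type = 7 interpolates differently):
```R
x <- c(14, 12, 19, 23, 5, 13, 28, 17)

# Third quartile using the averaging convention used above
quantile(x, probs = 0.75, type = 2)   # 21

# R's default convention gives a slightly different value
quantile(x, probs = 0.75)             # 20 (type = 7, linear interpolation)
```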
19) Determine the 30 th percentile of the following eight numbers: 14,
12,19,23,5,13, 28, 17.
To determine the 30th percentile of the dataset {14,12,19,23,5,13,28,17},
follow these steps:
1. First,arrange the data in ascending
order: 5, 12, 13, 14, 17, 19, 23, 28
2. Calculate the position:
P30 = (P/100)*(n+1) =(30/100)*(8+1) = (0.30) * 9 = 2.7



3. Since the position of the 30th percentile is not an integer, you will need to interpolate to find the value at this position. To interpolate, take a weighted average of the data values at positions 2 and 3. These correspond to the 2nd and 3rd data points in the ordered list:

P30= Value at position 2+ (0.7*Difference between values at positions 3


and 2)
P30=12+(0.7 *(13-12))
P30=12+(0.7 *1)
P30=12+ 0.7
P30 =12.7
20 What are Measures of Variability? List.
Measures of central tendency yield information about the center or middle
part of a data set. However, business researchers can use another group of
analytic tools, measures of variability, to describe the spread or the
dispersion of a set of data.
Measures of variability include: range, interquartile range, mean absolute deviation, variance, standard deviation, and z-scores.

21 Define Range. Write the range of following numbers.


The range is the difference between the largest value of a data set and the
smallest value of a set.
11 13 16 17 18 19 20 25 27 28 29 30 32 33 34 [ascending]
Range= Highest-Lowest=34-11=23
22 Define Interquartile Range.Write its formulae.
The interquartile range is the range of values between the first and third quartiles. It is determined by computing IQR = Q3 - Q1.



23. Define Mean Absolute Deviation. Write its formula.
The mean absolute deviation (MAD) is the average of the absolute values of the deviations around the mean for a set of numbers: MAD = Σ|x - μ| / N.

24. Define Variance. Write its formula.
The variance is the average of the squared deviations about the mean for a set of numbers. Population variance: σ² = Σ(x - μ)² / N.

25. Define Standard Deviation. Write its formula.
The standard deviation is a popular measure of variability. It is used both as a separate entity and as a part of other analyses, such as computing confidence intervals and in hypothesis testing. Population standard deviation: σ = √(Σ(x - μ)² / N).

26. Write the formulae for sample variance and sample standard deviation.
Sample variance: s² = Σ(x - x̄)² / (n - 1); Sample standard deviation: s = √s².

27. Define Z-score. Write its formula.
A z-score represents the number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed. Using z-scores allows translation of a value's raw distance from the mean into units of standard deviations: z = (x - μ) / σ.
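A quick R sketch of the z-score formula, using a small made-up vector (note that `sd()` in R uses the sample formula with n - 1 in the denominator):
```R
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# z-score of each value relative to the mean of x
z <- (x - mean(x)) / sd(x)
z

# scale() does the same centering and scaling in one call
as.numeric(scale(x))
```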
28 State EmpiricalRule. List the condition



The empirical rule is an important rule of thumb that is used to state the approximate percentage of values that lie within a given number of standard deviations from the mean of a set of data, if the data are normally distributed:
- approximately 68% of values lie within μ ± 1σ
- approximately 95% of values lie within μ ± 2σ
- approximately 99.7% of values lie within μ ± 3σ

29 Define Coefficient of Variation.Write its formulae.


The coefficient of variation is a statistic that is the ratio of the standard deviation to the mean, expressed as a percentage, and is denoted CV: CV = (σ / μ) · 100.

30 Define Measures of Shape.Mention its types.


Measures of shape are tools that can be used to describe the shape of a
distribution of data. In this section, we examine two measures of shape
Skewness And kurtosis. We also look at box-and-whisker plots.

31 What is Skewness? Draw its types.


A distribution of data in which the right half is a mirror image of the left half is said to be symmetrical. Skewness is present when a distribution is asymmetrical, i.e., lacks symmetry.



32. Write the formula to calculate the coefficient of skewness using Karl Pearson's method.
Sk = 3(Mean - Median) / Standard Deviation

33. Write the formula to calculate the coefficient of skewness using Bowley's method.
SkB = (Q3 + Q1 - 2·Median) / (Q3 - Q1)

34. Write the relationship between mean, median, and mode for various types of skewness.
- Symmetrical distribution: Mean = Median = Mode
- Positively (right) skewed: Mean > Median > Mode
- Negatively (left) skewed: Mean < Median < Mode
35 What is Kurtosis? Mention its types.


Kurtosis describes the amount of peakedness of a distribution. Types:
1. Leptokurtic
2. Mesokurtic
3. Platykurtic

36 What are the components of Box-and-Whisker Plots?


A box-and-whisker plot, sometimes called a boxplot, is a diagram that utilizes the upper and lower quartiles along with the median and the two most extreme values to depict a distribution graphically. Its components are the median, the lower quartile (Q1), the upper quartile (Q3), and the whiskers extending to the most extreme non-outlier values.
37 What are Pie charts and Barcharts?
A pie chart is a circular depiction of data where the area of the whole pie
represents 100% of the data and slices of the pie represent a percentage
breakdown of the sublevels.
A bar graph generally is constructed from the same type of data that is used
to produce a pie chart.

38 What are Histogram and frequency polygons?


A histogram is a series of contiguous bars or rectangles that represent the
frequency of data in given class intervals.
A frequency polygon, like the histogram, is a graphical display of class frequencies. However, instead of using bars or rectangles like a histogram, in a frequency polygon each class frequency is plotted as a dot at the class midpoint, and the dots are connected by a series of line segments.
39 What are Stem and Leaf plot?
A stem-and-leaf plot is constructed by separating the digits for each number of the data into two groups, a stem and a leaf. The leftmost digits are the stem and consist of the higher-value digits. The rightmost digits are the leaves and contain the lower values. If a set of data has only two digits, the stem is the value on the left and the leaf is the value on the right.

40. What is Probability? Mention its types.
Probability is a numerical measure of the chance of occurrence of an event. The three general methods of assigning probabilities are (1) the classical method, (2) the relative frequency of occurrence method, and (3) subjective probabilities.

41 What is an Experiment?Give Example.


A test, trial, or tentative procedure; an act or operation for the purpose of
discovering something unknown or of testing a principle, supposition ,etc.: a
chemical experiment; a teaching experiment; an experiment in living.
42. What is an Event? Give an example.
An event is an outcome of an experiment; because an event is an outcome of an experiment, the experiment defines the possibilities of the event.
Example: getting an even number when a die is thrown.

43. What is the classical method of assigning a probability? Give an example.


When probabilities are assigned based on laws and rules, the method is referred
to as the classical method of assigning probabilities. This method involves an



experiment, which is a process that produces outcomes, and an event, which is an
outcome of an experiment.
When we assign probabilities using the classical method, the probability of an individual
event occurring is determined as the ratio of the number of items in a population containing
the event(ne)to the total number of items in the population (N). That is, P(E) = ne /N.

44 What is the relative frequency of occurrence method assigning of a


Probability?Give Example.
The relative frequency of occurrence method of assigning probabilities is
based on cumulated historical data. With this method, the probability of an
event occurring is equal to the number of times the event has occurred in the
past divided by the total number of opportunities for the event to have
occurred.

45 What are the subjective probabilities?GiveExample.


The subjective method of assigning probability is based on the feelings
or insights of the person determining the probability. Subjective
probability comes from the person’s intuition or reasoning. Although not
a scientific approach to probability, the subjective method often is based
on the accumulation of knowledge, understanding, and experience stored
and processed in the human mind.

46 Define Elementary Events.Give an example.


Events that cannot be decomposed or broken down into other events are called elementary events. Elementary events are denoted by lowercase letters (e.g., e1, e2, e3, ...).
For example, if we toss a coin, the sample space is S = {Head, Tail}. The event of a Head appearing is elementary and is given by E = {Head}.

47 Define Sample Space.Give an example.


A sample space is a complete roster or listing of all elementary events for an experiment. Example: the sample space for the roll of a pair of dice; the sample space for the roll of a single die is {1, 2, 3, 4, 5, 6}.

48 Give an example for Unions and Inter sections.


A union of sets produces a new set containing each value the two originals
sets contain.
An intersection is denoted X∩Y. To qualify for intersection, an element must
be in both X and Y. The intersection contains the elements common to both

sets.
Example: if X = {1, 2, 3} and Y = {2, 3, 4}, then X ∪ Y = {1, 2, 3, 4} and X ∩ Y = {2, 3}.

49 Define Mutually Exclusive Events.Give an example.


Two or more events are mutually exclusive events if the occurrence of one
event precludes the occurrence of the other event(s). This characteristic
means that mutually exclusive events cannot occur simultaneously and
therefore can have no intersection.
For example ,when a coin is tossed then the result will be either head or tail,
but we cannot get both the results.

50 Define Independent Events.Give an example.


Two or more events are independent events if the occurrence or non
occurrence of one of the events does not affect the occurrence or non
occurrence of the other event(s).
Example: Riding a bike and watching your favorite movie on a laptop.

51 Define Collectively Exhaustive Events.Give an example.


A list of collectively exhaustive events contains all possible elementary
events for an experiment. Thus, all sample spaces are collectively exhaustive
lists.
The list of possible outcomes for the tossing of a pair of dice contained in
Table 4.1is a collectively exhaustive list. The sample space for an experiment
can be described as a list of events that are mutually exclusive and
collectively exhaustive. Sample space events do not overlap or intersect, and
the list is complete.
For example, when rolling a six-sided die, the outcomes 1, 2, 3, 4, 5, and 6 are collectively exhaustive, because they encompass the entire range of possible outcomes.

52 Define Complementary Events.Give an example.


The complement of event A is denoted A’, pronounced“ not A.” All the
elementary events of an experiment not in A comprise its complement.
example, clearing an exam or not clearing an exam.

53. If a population consists of the positive even numbers through 30 and if A = {2, 6, 12, 24}, what is A'?
The population contains the elements:2,4,6,8,10,12,14,16,18,20,22,24,
26,28,30.
The elements of A are even numbers:2,6, 12 and 24.
Since A contains elements taken from the population,and is a subset of it,then
A is a sample.
A’={4,8,10,14,16,18,20,22,26,28,30}

54 What are the three types of Counting the Possibilities


The mn Counting Rule
Sampling from a Population with Replacement
Combinations : Sampling from a Population without Replacement
55 What are mn counting rule? Give an example.
If one thing can be done in m ways and another thing can be done in n ways,
the two things can be done in mn ways. Example: Are stauranth as 5
appetizers, 8 beverages, 9 entrees, and 6 desserts on the menu. If you have a
beverage and a dessert,the reare 8*6= 48 different meals consisting of a
beverage and dessert.

56 What are sampling from a Population with Replacement? Give an example.


If you sample with replacement, you would choose one person's name, put that person's name back in the hat, and then choose another name. The possibilities for your two-name sample include: John, John; John, Jack; and so on.

57 What are sampling from a Population without Replacement? Give an


example.
sampling without replacement, in which a subset of the observations are
selected randomly, and once an observation is selected it cannot be selected
again. sampling with replacement, in which a subset of observations are
selected randomly, and an observation may be selected more than once.
58. Write the special law of addition.
For mutually exclusive events X and Y: P(X ∪ Y) = P(X) + P(Y).

59. Write the general law of multiplication.
P(X ∩ Y) = P(X) · P(Y|X) = P(Y) · P(X|Y).



60. Write the special law of multiplication.
If X and Y are independent events: P(X ∩ Y) = P(X) · P(Y).

61. Write the formula for conditional probability.
P(X|Y) = P(X ∩ Y) / P(Y).

62. What are random variables? Give an example.


A random variable is a variable that contains the outcomes of a chance
experiment. Suppose experiment is to measure the time between the
completions of two tasks in a production line. The values will range from 0
seconds to n seconds. These time measurements are the values of another
random variable
A typical example of a random variable is the outcome of a coin toss.

63 What are discrete random variables?Give an example.


A random variable is a discrete random variable if the set of all possible values is at most a finite or countably infinite number of possible values. In most statistical situations, discrete random variables produce values that are nonnegative whole numbers.
Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, and the number of defective light bulbs in a box of ten.
64. Write the general law of addition.
P(X ∪ Y) = P(X) + P(Y) - P(X ∩ Y).



65 What are Continuous random variables?Give an example.
Continuous random variables take on values at every point over a given interval. Thus continuous random variables have no gaps or unassumed values. It could be said that continuous random variables are generated from experiments in which things are "measured" rather than "counted."
In general, quantities such as pressure, height, mass, weight, density, volume,
temperature, and distance are examples of continuous random variables.
66. List three types of discrete distributions.
In this text, three discrete distributions are presented:
1. Binomial distribution
2. Poisson distribution
3. Hypergeometric distribution

67. Write the formulae for the mean, variance, and standard deviation of a discrete distribution.
Mean: μ = E(x) = Σ[x · P(x)]; Variance: σ² = Σ[(x - μ)² · P(x)]; Standard deviation: σ = √σ².



68. List the assumptions of the binomial distribution.
- The experiment involves n identical trials.
- Each trial has only two possible outcomes, success or failure.
- The probability of success, p, is the same on every trial (and q = 1 - p is the probability of failure).
- The trials are independent of each other.

69. Write the formula of the binomial distribution.
P(x) = nCx · p^x · q^(n-x), for x = 0, 1, 2, ..., n, where q = 1 - p.

70 What are Poisson distribution? Give an example.


The Poisson distribution focuses only on the number of discrete occurrences over some interval or continuum. A Poisson experiment does not have a given number of trials (n) as a binomial experiment does.
Example: the number of telephone calls per minute at a small business.

71 List the characteristics of Poisson distribution.


The Poisson distribution has the following characteristics:
- It is a discrete distribution.
- It describes rare events.
- Each occurrence is independent of the other occurrences.
- It describes discrete occurrences over a continuum or interval.
- The occurrences in each interval can range from zero to infinity.
- The expected number of occurrences must hold constant throughout the experiment.



72. Write the formula of the Poisson distribution.
P(x) = (λ^x · e^(-λ)) / x!, for x = 0, 1, 2, ..., where λ is the long-run average number of occurrences per interval.

73. What are uniform distributions? Write the probability density function of the uniform distribution.
The uniform distribution, sometimes referred to as the rectangular distribution, is a relatively simple continuous distribution in which the same height, f(x), is obtained over a range of values. The probability density function is f(x) = 1/(b - a) for a ≤ x ≤ b, and f(x) = 0 otherwise.

74. Write the formulae for the mean and standard deviation of a uniform distribution.
Mean: μ = (a + b) / 2; Standard deviation: σ = (b - a) / √12.

75. Write the formula for probabilities in a uniform distribution.
P(x1 ≤ X ≤ x2) = (x2 - x1) / (b - a), for a ≤ x1 ≤ x2 ≤ b.

76 List the characteristics of normal distribution.


The normal distribution exhibits the following characteristics:
i. It is a continuous distribution.
ii. It is a symmetrical distribution about its mean.
iii. It is asymptotic to the horizontal axis.
iv. It is unimodal.
v. It is a family of curves.
vi. The area under the curve is 1.



77. Write the probability density function of the normal distribution.
f(x) = (1 / (σ√(2π))) · e^(-(x - μ)² / (2σ²)), for -∞ < x < ∞.

78. What is the t distribution? Write the formula for the t statistic.
Gosset developed the t distribution, which is used instead of the z distribution for doing inferential statistics on the population mean when the population standard deviation is unknown and the population is normally distributed: t = (x̄ - μ) / (s / √n), with df = n - 1.

79. Write the confidence interval formula using the t statistic.
x̄ ± t(α/2, n-1) · s / √n.

80. Write the z formula for the sample mean.
z = (x̄ - μ) / (σ / √n), where x̄ is the sample mean, μ the population mean, σ the population standard deviation, and n the sample size.

LONG ANSWERS
1. Explain four Types of Data & Measurement Scales with example.
In statistics and data analysis, data can be classified in to different types and
measurement scales, each with its own characteristics and appropriate

Compiled by K Praveen Kumar, SMC SHIRVA


methods of analysis. There are four main types of data and measurement
scales:

1. Nominal Data (Categorical Data):


- Nominal data consists of categories or labels that cannot be ordered or ranked. These categories represent distinct groups with no inherent order or numeric meaning.
- Examples: Gender (male, female), Color (red, blue, green), Country (USA, Canada, France).
In R, you can represent nominal data as factors or character vectors. For
example:
```R
gender<-factor(c("male","female","male","female","male"))
```
2.Ordinal Data:
- Ordinal data consists of categories with a natural order or ranking. While
the intervals between categories are not equal, we can establish a clear
order.
- Examples: Education level (high school, bachelor's, master's, Ph.D.),
Likert scale (strongly disagree, disagree, neutral, agree, strongly agree).
In R, you can represent ordinal data as ordered factors; specifying the levels makes the ordering explicit. For example:
```R
education <- ordered(
  c("bachelor's", "high school", "master's", "Ph.D.", "high school"),
  levels = c("high school", "bachelor's", "master's", "Ph.D.")
)
```
3.Interval Data:
- Interval data has a numerical scale where the intervals between values are
equal and meaningful. However, it lacks a true zero point, meaning that a
value of 0 does not indicate the absence of the attribute being measured.
- Examples: Temperature in Celsius or Fahrenheit, IQ scores.
In R, interval data is typically represented as numeric vectors. For
example:
```R
temperature_celsius<-c(22.5,30.0,18.3,25.7,15.2)
```
4.Ratio Data:
- Ratio data is similar to interval data but has a true zero point, where 0



means the complete absence of the attribute being measured. In ratio
data, both the intervals and ratios between values are meaningful and
interpretable.
- Examples: Height in centimeters, Weight in kilograms, Age in years, Income in dollars.
In R,ratio data is also represented as numeric vectors. For example:
```R
height_cm<-c(165,180,155,170,150)
```
Understanding the type and measurement scale of your data is crucial because it determines the appropriate statistical methods and analyses that can be applied to the dataset. Different types of data require different treatment and statistical tests for accurate analysis and interpretation.

2. Explain Kurtosis types with diagram.


Kurtosis is a statistical measure that describes the distribution of data points
in a dataset in relation to the shape of its tails (the tails being the extreme
values, both high and low). There are three main types of kurtosis:
1. Mesokurtic:
- A mesokurtic distribution has kurtosis equal to 3, which is the kurtosis of a normal distribution. In a mesokurtic distribution, the tails are neither too heavy (leptokurtic) nor too light (platykurtic). It has a bell-shaped curve similar to the normal distribution.
Diagram for a mesokurtic distribution:

2. Leptokurtic:
- A leptokurtic distribution has positive kurtosis greater than 3,indicating
heavy tails. This means that the distribution has more extreme values than
a normal distribution, resulting in higher peaks and thicker tails.
Diagram for a leptokurtic distribution:

3. Platykurtic:
- A platykurtic distribution has negative kurtosis less than 3, indicating
light tails. In this type of distribution, data points are less concentrated
around the mean and have thinner tails compared to a normal distribution.



Diagram for a platykurtic distribution:

In R, you can calculate kurtosis using the `kurtosis()` function from the "e1071" package (note that `kurtosis()` reports excess kurtosis, so a normal distribution gives a value near 0 rather than 3). To create a histogram and visually inspect the kurtosis of a dataset, you can use the `hist()` function and examine the shape of the distribution. Here's an example:
```R
# Example dataset
data <- c(1, 2, 3, 4, 5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9)

# Calculate (excess) kurtosis
library(e1071)
kurt <- kurtosis(data)

# Create a histogram
hist(data, main = "Histogram of Data", xlab = "Value", ylab = "Frequency")

# Add kurtosis information to the plot
text(6, 3, paste("Kurtosis =", round(kurt, 2)), col = "red")
```
In this example, we calculate the kurtosis of `data` and create a histogram to visualize the data distribution. The kurtosis value is displayed on the histogram plot. Depending on the kurtosis value, you can determine whether the distribution is mesokurtic, leptokurtic, or platykurtic.

3. Explain Measure of Skewness with its types.


Skewness is a statistical measure that quantifies the asymmetry of the
probability distribution of a dataset.It tells you whether the data is skewed
to the left (negatively skewed), centered (symmetrical), or skewed to the
right (positively skewed). Skewness is important because it can affect the
assumptions of many statistical tests and models.
There are different types of skewness measures:
1. Pearson's Coefficient of Skewness(First Type of Skewness):



- Pearson's coefficient of skewness, also known as the first type of skewness, is a measure of the asymmetry of a dataset. It is defined as:
Sk = 3(Mean - Median) / Standard Deviation
- A positive skewness value indicates that the data is skewed to the right (long tail on the right side of the distribution), while a negative skewness value indicates that the data is skewed to the left (long tail on the left side of the distribution). A skewness value of 0 suggests that the data is symmetric.
In R, you can calculate a moment-based sample skewness using the `skewness()` function from the "e1071" package (Pearson's coefficient itself can be computed directly from `mean()`, `median()`, and `sd()`).
```R
# Example dataset
data <- c(5, 6, 6, 7, 8, 8, 8, 9, 10)

# Calculate skewness
library(e1071)
skew <- skewness(data)
```
2. Bowley's coefficient of skewness:
Bowley's coefficient is based on the quartiles: SkB = (Q3 + Q1 - 2·Median) / (Q3 - Q1).

Both types of skewness measures can provide insights into the shape of the data distribution and can help you identify deviations from normality. Depending on your data and analysis goals, you can choose the appropriate skewness measure to use.

4. The number of U.S. cars in service by top car rental companies in a recent year, according to Auto Rental News, follows. Compute the mode, the median, and the mean.
Mode: 9,000
Median: With 13 different companies in this group, the median is located at the (13 + 1)/2 = 7th position. Because the data are already ordered, the 7th term is 20,000, which is the median.
Mean: The total number of cars in service is Σx = 1,791,000, so the mean is Σx / N = 1,791,000 / 13 ≈ 137,769.2.

5.Compute the 35th percentile, the 55th percentile, Q1, Q2, and Q3 for
the following data



6. The following shows the top 16 global marketing categories for advertising spending for a recent year according to Advertising Age. Spending is given in millions of U.S. dollars. Determine the first, the second, and the third quartiles for these data.

7.

g. Calculate the coefficient of variation.

Let's go through each of the calculations step by step:

a. Range:
The range is the difference between the maximum and minimum values in the
dataset.

Maximumvalue:9
Minimumvalue:1
Range=Maximum-Minimum Range
=9-1
Range=8



b. Mean Absolute Deviation(MAD):
MAD is the average of the absolute differences between each data point and the
mean.
First,calculate the mean(average):
Mean =(6+2+5+3+1 +9+4)/7
Mean =30 / 7
Mean≈4.29(rounded to two decimal places)
Now, calculate the absolute differences from the mean for each data point and find their average:
MAD = (|6 - 4.29| + |2 - 4.29| + |5 - 4.29| + |3 - 4.29| + |1 - 4.29| + |9 - 4.29| + |4 - 4.29|) / 7
MAD ≈ (1.71 + 2.29 + 0.71 + 1.29 + 3.29 + 4.71 + 0.29) / 7
MAD ≈ 2.04 (rounded to two decimal places)
c. Population Variance:
To calculate the population variance, you'll first find the squared differences from the mean, and then take the average of those squared differences.
Population Variance = ((6 - 4.29)^2 + (2 - 4.29)^2 + (5 - 4.29)^2 + (3 - 4.29)^2 + (1 - 4.29)^2 + (9 - 4.29)^2 + (4 - 4.29)^2) / 7
Population Variance ≈ (2.92 + 5.24 + 0.50 + 1.66 + 10.82 + 22.18 + 0.08) / 7
Population Variance ≈ 6.20 (rounded to two decimal places)
d. Population Standard Deviation:
The population standard deviation is the square root of the population variance.
Population Standard Deviation = √6.20 ≈ 2.49 (rounded to two decimal places)
e. Interquartile Range(IQR):
To find the interquartile range, you first need to determine the first quartile(Q1)
and the third quartile (Q3). Then, subtract Q1 from Q3.
1. Arrange the data in ascending order:
1,2,3,4,5,6,9

2. Find the median(Q2),which is them idle value: Q2


=4
3. Calculate Q1 (the median of the lower half of the data, with the overall median included in that half): lower half = 1, 2, 3, 4
Q1 = (2 + 3) / 2
Q1 = 2.5
4. Calculate Q3 (the median of the upper half of the data, with the overall median included in that half): upper half = 4, 5, 6, 9
Q3 = (5 + 6) / 2
Q3 = 5.5
5. Calculate the IQR:
IQR = Q3 - Q1
IQR=5.5-2.5
IQR=3
f. Z-scores:
The z-score for each value is a measure of how many standard deviations it is from the mean.
To find the z-score for a value x, you can use the formula: Z = (x - Mean) / Standard Deviation.
Mean ≈ 4.29 (calculated in part b)
Standard Deviation ≈ 2.49 (calculated in part d)
Now, calculate the z-scores for each value:
Z(6) = (6 - 4.29) / 2.49 ≈ 0.69
Z(2) = (2 - 4.29) / 2.49 ≈ -0.92
Z(5) = (5 - 4.29) / 2.49 ≈ 0.29
Z(3) = (3 - 4.29) / 2.49 ≈ -0.52
Z(1) = (1 - 4.29) / 2.49 ≈ -1.32
Z(9) = (9 - 4.29) / 2.49 ≈ 1.89
Z(4) = (4 - 4.29) / 2.49 ≈ -0.12
g. Coefficient of Variation (CV):
The coefficient of variation is calculated as the ratio of the standard deviation to the mean, expressed as a percentage.
CV = (Standard Deviation / Mean) * 100
CV = (2.49 / 4.29) * 100 ≈ 58%
So, the coefficient of variation is approximately 58%.
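These hand calculations can be verified in R; a minimal sketch (R's built-in `var()` and `sd()` use the sample formulas with n - 1, so the population versions are computed explicitly, and `fivenum()` gives the hinge-based quartiles used above):
```R
x <- c(6, 2, 5, 3, 1, 9, 4)

rng     <- max(x) - min(x)             # a. range = 8
mad_pop <- mean(abs(x - mean(x)))      # b. mean absolute deviation, about 2.04
var_pop <- mean((x - mean(x))^2)       # c. population variance, about 6.20
sd_pop  <- sqrt(var_pop)               # d. population standard deviation, about 2.49
h       <- fivenum(x)                  # min, lower hinge, median, upper hinge, max
iqr     <- h[4] - h[2]                 # e. interquartile range = 5.5 - 2.5 = 3
z       <- (x - mean(x)) / sd_pop      # f. z-scores
cv      <- sd_pop / mean(x) * 100      # g. coefficient of variation, about 58%
```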

8. Calculate Coefficient of Variation



Let's go through each of the calculations step by step for the given dataset:



9. Shown here is a sample of six of the largest accounting firms in the United States and the number of partners associated with each firm, as reported by the Public Accounting Report. Calculate the sample variance and sample standard deviation.



10.

To find the population variance and population standard deviation for the given data, you can follow these steps:
Dataset: 378, 601, 75, 646, 546, 179, 749, 531, 90, 280, 953, 468, 123, 392, 572, 303
1. Calculate the mean(average)of the data:



Mean = (378 + 601 + 75 + 646 + 546 + 179 + 749 + 531 + 90 + 280 + 953 + 468 + 123 + 392 + 572 + 303) / 16
Mean = 6886 / 16
Mean = 430.375
2. Calculate the squared differences from the mean for each data point:
(378 - 430.375)^2 = 2743.14
(601 - 430.375)^2 = 29112.89
(75 - 430.375)^2 = 126291.39
(646 - 430.375)^2 = 46494.14
(546 - 430.375)^2 = 13369.14
(179 - 430.375)^2 = 63189.39
(749 - 430.375)^2 = 101521.89
(531 - 430.375)^2 = 10125.39
(90 - 430.375)^2 = 115855.14
(280 - 430.375)^2 = 22612.64
(953 - 430.375)^2 = 273136.89
(468 - 430.375)^2 = 1415.64
(123 - 430.375)^2 = 94479.39
(392 - 430.375)^2 = 1472.64
(572 - 430.375)^2 = 20057.64
(303 - 430.375)^2 = 16224.39
3. Calculate the population variance by finding the average of the squared differences:
Population Variance = (2743.14 + 29112.89 + 126291.39 + 46494.14 + 13369.14 + 63189.39 + 101521.89 + 10125.39 + 115855.14 + 22612.64 + 273136.89 + 1415.64 + 94479.39 + 1472.64 + 20057.64 + 16224.39) / 16
Population Variance = 938,101.75 / 16
Population Variance ≈ 58,631.36
4. Calculate the population standard deviation, which is the square root of the population variance:
Population Standard Deviation = √58,631.36 ≈ 242.14 (rounded to two decimal places)
So, the population variance is approximately 58,631.36, and the population standard deviation is approximately 242.14.
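The same result can be checked in R; a minimal sketch (R's built-in `var()` and `sd()` divide by n - 1, so the population versions are formed explicitly):
```R
x <- c(378, 601, 75, 646, 546, 179, 749, 531, 90, 280,
       953, 468, 123, 392, 572, 303)

pop_var <- mean((x - mean(x))^2)   # population variance
pop_sd  <- sqrt(pop_var)           # population standard deviation
pop_var
pop_sd
```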


11.

In statistics, the mean, median, and mode are measures of central


tendency that describe the center of a dataset. When there is a
difference between these measures, it can indicate the presence of
skewness in the distribution of the data. Skewness refers to the
asymmetry of the data distribution.
In your case, the average closing price(mean)is $35, the median value is
$33, and the mode is $21.To determine if the distribution is skewed and
the direction of the skew, you can compare the mean, median, and
mode:
1. If the mean is greater than the median,it suggests that the data is
positively (right) skewed. In a positively skewed distribution, the tail
on the right side is longer, and the majority of data points are
concentrated on the left side of the center.
2. If the mean is less than the median, it suggests that the data is
negatively (left) skewed. In a negatively skewed distribution, the tail on
the left side is longer, and the majority of data points are concentrated
on the right side of the center.
In your case, the mean ($35) is greater than the median ($33), which suggests a positive skew. This means that there are some high-priced stocks that are pulling the mean upwards, creating a longer tail on the right side of the distribution. Most of the data points are clustered to the left of the mean.
So, the distribution of these stock prices is positively skewed. This skewness implies that there are relatively few very high-priced stocks in the dataset compared to the majority of stocks with lower prices.

12.

In this scenario, we have the following statistics for the distribution of ages:
1. Mean age = 51
2. Medianage=54
3. Modal age=59
To discuss the skewness of the age distribution, we can consider the relationships between the mean, median, and mode.
1. If the mean,median, and mode are approximately equal, the
distribution is approximately symmetrical and has little or no
skewness.
2. If the mean is less than the median,and the mode is greater than the
median, the distribution is negatively or left-skewed. In this case, the
tail of the distribution is stretched out to the left.
3. If the mean is greater than the median,and the mode is less than the
median, the distribution is positively or right-skewed. In this case, the
tail of the distribution is stretched out to the right.
In your case:
- The mean age (51) is less than the median age (54), indicating a slight negative skewness.
- The modal age (59) is greater than the median age (54), which supports the idea of a left-skewed distribution.
So, based on the given statistics, it appears that the distribution of ages is slightly left-skewed. This means that most individuals are relatively older, while a few younger individuals stretch the left tail and pull the mean below the median.

13.

To compute the Pearsonian coefficient of skewness for the given


data,follow these steps:
1. Calculate the mean(average)and standard deviation of the data.
2. Use the formula for Pearsonian coefficient of skewness:
Skewness=3*(Mean - Median) / Standard Deviation.
Given data:41,15,31,25,24,23,21,22,22,18,30,20,19,19,16,23,27,38,34,
24,19,20,29,17,23
Step 1: Calculate the mean, median, and standard deviation.
- Mean (μ): (41 + 15 + 31 + 25 + 24 + 23 + 21 + 22 + 22 + 18 + 30 + 20 + 19 + 19 + 16 + 23 + 27 + 38 + 34 + 24 + 19 + 20 + 29 + 17 + 23) / 25 = 600 / 25 = 24
To calculate the median, first arrange the data in ascending order:
15, 16, 17, 18, 19, 19, 19, 20, 20, 21, 22, 22, 23, 23, 23, 24, 24, 25, 27, 29, 30, 31, 34, 38, 41
Since there are 25 data points, the median is the value at the (25 + 1) / 2 = 13th position, which is 23.
- Median (M): 23
To calculate the standard deviation, you can use a calculator or software. The sample standard deviation is approximately 6.65 (rounded to two decimal places).
Step 2: Calculate the Pearsonian coefficient of skewness.
Skewness = 3 * (Mean - Median) / Standard Deviation
Skewness = 3 * (24 - 23) / 6.65
Skewness = 3 / 6.65
Skewness ≈ 0.45 (rounded to two decimal places)
Interpretation:
The Pearsonian coefficient of skewness, in this case, is approximately 0.45. A positive value of skewness suggests that the distribution is right-skewed or positively skewed: the tail of the distribution is stretched out to the right, and a few larger values (such as 38 and 41) pull the mean slightly above the median. A value of 0.45 indicates only a mild rightward skew.
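A quick R check of these figures; a minimal sketch with Pearson's coefficient computed directly from `mean()`, `median()`, and `sd()`:
```R
x <- c(41, 15, 31, 25, 24, 23, 21, 22, 22, 18, 30, 20, 19,
       19, 16, 23, 27, 38, 34, 24, 19, 20, 29, 17, 23)

m   <- mean(x)    # 24
med <- median(x)  # 23
s   <- sd(x)      # about 6.65 (sample standard deviation)

pearson_sk <- 3 * (m - med) / s
pearson_sk        # about 0.45
```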

14.

To construct a box-and-whisker plot and determine if there are any


outliers and whether the distribution is skewed, follow these steps:
1. First,arrange the data in ascending order:
379,477,490,495,495,497,503, 510,527,540,541,558,559,562,570,574,
580,588,590,601,602,609,623,690



2. Find the median(the middle value).Since there are 24 data points,the
mediani s the average of the 12th and 13th values:
Median=(558+559)/2=558.5
3. Divide the data into two halves: the lower half(1st quartile)and the
upper half (3rd quartile).
Lowerhalf:379,477,490,495,495,497,503,510,527,540,541,558
Upperhalf:559,562,570,574,580,588,590,601,602,609,623,690
4. Find the median of the lower and upper halves:
Lowerquartile(Q1)=(503+510)/2=506.5
Upperquartile (Q3)=(590+601)/2= 595.5
5. Calculate the interquartile range(IQR):
IQR=Q3-Q1 =595.5-506.5=89
6. Determine the lower and upper bounds (inner fences) for potential outliers:
Lower Bound = Q1 - 1.5 * IQR = 506.5 - 1.5 * 89 = 506.5 - 133.5 = 373
Upper Bound = Q3 + 1.5 * IQR = 595.5 + 1.5 * 89 = 595.5 + 133.5 = 729
7. Identify any data points that fall below the lower bound or above the upper bound.
In this case, there are no data points below 373 (the minimum is 379) and none above 729 (the maximum is 690), so there are no outliers in the data.
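A box-and-whisker plot of these data can be drawn in R; a minimal sketch (`fivenum()` reports the Tukey five-number summary the plot is built from):
```R
x <- c(379, 477, 490, 495, 495, 497, 503, 510, 527, 540, 541, 558,
       559, 562, 570, 574, 580, 588, 590, 601, 602, 609, 623, 690)

fivenum(x)                              # min, lower hinge, median, upper hinge, max
boxplot(x, horizontal = TRUE,
        main = "Box-and-Whisker Plot")  # whiskers stop at the most extreme non-outlier values
```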

15)
Shown here is a list of the top five industrial and farm equipment companies in the United States, along with their annual sales ($ millions). Construct a pie chart and a bar graph to represent these data, and label the slices with the appropriate percentages.

16. The following list shows the top six pharmaceutical companies in the United States and their sales figures ($ millions) for a recent year. Use this information to construct a pie chart and a bar graph to represent these six companies and their sales.

17. The following data represent the costs (in dollars) of a sample of 30 postal mailings by a company. Using dollars as a stem and cents as a leaf, construct a stem-and-leaf plot of the data.

Stem  Leaf
1 83 93 97
2 09 75 78 84
3 67 32 34 55 53 89 21
4 10 15 84 95
5 10 11 42 47
6 45 72
7 20 80
8 64
9 15
10 94

18.



19.

20.



21. Explain general Methods of assigning probabilities with example.
Assigning probabilities to events is a fundamental concept in probability
theory and statistics. There are several general methods for assigning
probabilities, depending on the context and the nature of the problem.
Here are some common methods with examples in R:
1. Classical(or Theoretical)Probability:
- This method assigns probabilities based on the assumption that all
outcomes in the sample space are equally likely. It is often used for
simple, well-defined experiments.
Example:
Suppose you roll a fair six-sided die. Each face of the die has an
equal probability of 1/6.
In R, you can represent this using a probability distribution:
```R
outcomes <- 1:6
probabilities<-
rep(1/6,6)
```
2. Empirical(or Experimental)Probability:
- This method assigns probabilities based on observed data from
past experiments or events. It is useful when you have data to
estimate probabilities.



Example:
You want to estimate the probability of winning a game based on 100 recorded game outcomes. You find that you won 30 out of the 100 games, so the empirical probability of winning is 30/100.
In R, you can calculate empirical probabilities from data:
```R
total_games<-100
wins<-30
empirical_probability<-wins/total_games
```
3. Subjective Probability:
- This method assigns probabilities based on personal judgment or subjective beliefs. It is used when there is no available data or theoretical basis for assigning probabilities.
Example:
You estimate the probability of it raining tomorrow based on your
personal judgment. You might assign a probability of 0.3 if you
believe there's a 30% chance of rain.
In R, you can simply assign the subjective probability:
```R
subjective_probability<-0.3
```
4. Probability Density Functions (PDF) and Cumulative Distribution Functions (CDF):



- In some cases, you may define probabilities by specifying a probability density function (PDF) or cumulative distribution function (CDF). These functions describe how probabilities are distributed across a continuous range of values.
Example:
You have a continuous random variable X with a normal distribution.
You can specify the PDF and CDF using R functions like `dnorm()` and
`pnorm()`.
```R
x <- seq(-3, 3, by = 0.1)
pdf_values <- dnorm(x, mean = 0, sd = 1)  # PDF of a standard normal distribution
cdf_values <- pnorm(x, mean = 0, sd = 1)  # CDF of a standard normal distribution
```
These are general methods for assigning probabilities to events, and the
choice of method depends on the nature of the problem and the
information available. In practice, probabilities are often assigned using
a combination of these methods, depending on the specific context and
available data.

22. A supplier shipped a lot of six parts to a company. The lot contained three defective parts. Suppose the customer decided to randomly select two parts and test them for defects. How large a sample space is the customer potentially working with? List the sample space. Using the sample space list, determine the probability that the customer will select a sample with exactly one defect.
To determine the sample space and calculate the probability of
selecting a sample with exactly one defect, we can use a combination
of counting techniques. In this problem, we have a lot of six parts, with
three of them being defective.



The sample space represents all possible combinations of selecting two parts from the lot. To find the number of such combinations, you can use the binomial coefficient formula, also known as "n choose k," which is calculated as:
C(n,k) =n!/ (k!(n-k)!)
In this case, n is the total number of parts in the lot (6),and k is the
number of parts you want to select (2).
C(6,2) =6!/(2!(6-2)!)
C(6,2) =6!/(2!*4!)
C(6,2) =(6*5*4!)/(2!*4!)
C(6,2) =(6*5)/ (2*1)
C(6,2) =15
So, there are 15 possible combinations in the sample space for selecting two parts from the lot.
Next, let's determine the probability of selecting a sample with exactly one
defect.
To do this, we need to calculate the number of ways to select one
defective part and one non-defective part, and then divide it by the
total number of possible combinations.
Number of ways to select one defective and one non-defective part:
- 3 ways to select one defective part from the 3 defective parts available.
- 3 ways to select one non-defective part from the 3 non-defective parts available.
Total number of ways to select a sample of two parts from the lot: 15 (as calculated above).
Now ,calculate the probability:
Probability=(Number off avorable outcomes)/(Total number of possible
outcomes)



Probability=(3 *3)/15
Probability=9/15
Probability=3/5
So, the probability that the customer will select a sample with exactly
one defect is 3/5.
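This count-based reasoning can be reproduced in R; a minimal sketch using `choose()` and the hypergeometric probability function `dhyper()`:
```R
# Size of the sample space: ways to pick 2 parts out of 6
choose(6, 2)                                  # 15

# P(exactly 1 defective) with 3 defective and 3 good parts, sample of 2
choose(3, 1) * choose(3, 1) / choose(6, 2)    # 9/15 = 0.6

# Equivalent built-in calculation
dhyper(1, m = 3, n = 3, k = 2)                # 0.6
```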

23.

X ∪ Z = {1, 2, 3, 4, 5, 7, 8, 9}
X ∩ Z = {1, 3, 7}
X ∩ Y = {7, 9}
X ∪ Y ∪ Z = {1, 2, 3, 4, 5, 7, 8, 9}
X ∩ Y ∩ Z = {7}
(X ∪ Y) ∩ Z = {1, 2, 3, 4, 7}
(Y ∩ Z) ∪ (X ∩ Y) = {7}
X or Y = {1, 2, 3, 4, 5, 7, 8, 9}
Y and X = {2, 4, 7}
24.
A company's customer service 800 telephone system is set up so that the caller has six options. Each of these six options leads to a menu with four options. For each of these four options, three more options are available. For each of these three options, another three options are presented. If a person calls the 800 number for assistance, how many total options are possible?
To find the total number of possible options, use the mn counting rule: multiply the number of choices available at each level. In this scenario, there are four levels of choices:
1. First level: 6 options
2. Second level: 4 options for each first-level choice
3. Third level: 3 options for each second-level choice
4. Fourth level: 3 options for each third-level choice
Total options = 6 x 4 x 3 x 3 = 216
So, there are 216 total possible options for a person calling the 800 number for assistance.

25. A bin contains six parts. Two of the parts are defective and four are acceptable. If three of the six parts are selected from the bin, how large is the sample space? Which counting rule did you use, and why? For this sample space, what is the probability that exactly one of the three sampled parts is defective?
To find the size of the sample space for selecting three parts from
the bin, you can use the combination formula. The combination
formula, also known as "n choose k,"isused to determine the
number of ways to choose k items from a set of n items without
regard to the order. It's expressed as:
C(n,k) =n!/(k!(n-k)!)
In this case, you have six parts in the bin, and you want to select three of them (combinations are used because the parts are sampled without replacement and the order of selection does not matter):
C(6, 3) = 6! / (3!(6 - 3)!)
C(6,3) =6!/(3!*3!)
C(6,3) =(6* 5*4)/(3 *2*1)
C(6,3) =20
So,the sample space contains 20 different ways to select three parts
from the bin.
Now, to find the probability that exactly one of the three sampled parts is defective, count the samples that contain exactly one defective part and two acceptable parts:
Number of ways to choose one defective part from the two defective parts: C(2, 1) = 2
Number of ways to choose two acceptable parts from the four acceptable parts: C(4, 2) = 6
Number of favorable outcomes = 2 * 6 = 12
Finally, calculate the probability:
Probability = (Number of favorable outcomes) / (Total number of possible outcomes)
Probability = 12 / 20
Probability = 3/5
So, the probability that exactly one of the three sampled parts is defective is 3/5.



26. Explain Marginal,Union,Joint and Conditional Probabilities with example.
In probability theory, various types of probabilities are used to describe different aspects of random events and their relationships. Four fundamental types of probabilities are marginal probability, union probability, joint probability, and conditional probability. Here, I'll explain each type with examples in R:
1. Marginal Probability:
- Marginal probability refers to the probability of a single event occurring, ignoring other events. It is calculated by summing or integrating the joint probabilities over all possible values of the events being ignored.
Example:
Suppose you have the following joint probability distribution for two events A and B (rows are values of B, columns are values of A):

|       | A = 0 | A = 1 |
|-------|-------|-------|
| B = 0 | 0.2   | 0.1   |
| B = 1 | 0.3   | 0.4   |

To find the marginal probabilities of A and B:
```R
joint_probs <- matrix(c(0.2, 0.1, 0.3, 0.4), nrow = 2, byrow = TRUE)  # rows = B, columns = A
marginal_prob_A <- colSums(joint_probs)  # P(A = 0), P(A = 1)
marginal_prob_B <- rowSums(joint_probs)  # P(B = 0), P(B = 1)
```
2. UnionProbability:
- Union probability, denoted as P(A ∪ B), represents the probability that at least one of two events A and B occurs. It can be calculated from the marginal probabilities and the joint probability of A and B.
Example:
Using the same joint probability distribution as above, to find the union probability of the events A = 1 and B = 1:
```R
union_prob_A_or_B <- marginal_prob_A[2] + marginal_prob_B[2] - joint_probs[2, 2]
union_prob_A_or_B  # 0.5 + 0.7 - 0.4 = 0.8
```
3. Joint Probability:
- Joint probability, denoted as P(A ∩ B), represents the probability of both events A and B occurring simultaneously.
Example:
Continuing with the previous example, to find the joint probability of A = 1 and B = 1:
```R
joint_prob_A_and_B<-joint_probs[2,2]
```
4. Conditional Probability:

- Conditional probability, denoted as P(A|B), represents the


probability of event A occurring given that event B has already occurred.
It is calculated by dividing the joint probability of A and B by the
probability of B.
Example:
To find the conditional probability of A given B:
```R
conditional_prob_A_given_B<-joint_prob_A_and_B/marginal_prob_B[2]
```
These four types of probabilities are fundamental in probability theory and play a crucial role in various fields, including statistics, machine learning, and decision-making. They provide a way to quantify and analyze the relationships between events and their likelihood of occurrence.



27. The client company data from the Decision Dilemma reveal that 155 employees worked one of four types of positions. Shown here again is the raw values matrix (also called a contingency table) with the frequency counts for each category and for subtotals and totals, containing a breakdown of these employees by type of position and by sex. If an employee of the company is selected randomly, what is the probability that the employee is female or a professional worker?

Let F denote the event of female and P denote the event of professional worker. The question is P(F ∪ P) = ?
By the general law of addition, P(F ∪ P) = P(F) + P(P) - P(F ∩ P)
Of the 155 employees, 55 are women. Therefore, P(F) = 55/155 = .355.
The 155 employees include 44 professionals. Therefore, P(P) = 44/155 = .284.
Because 13 employees are both female and professional, P(F ∩ P) = 13/155 = .084.
The union probability is solved as P(F ∪ P) = .355 + .284 - .084 = .555, i.e. P(F ∪ P) = 86/155 = .555.
A second way to produce the answer from the raw values matrix is to add, one time each, all the cells that are in either the Female column or the Professional row: 3 + 13 + 17 + 22 + 31 = 86, and then divide by the total number of employees, N = 155, which gives P(F ∪ P) = 86/155 = .555.



28.

29.



Let's denote the following events:

\( M \) = woman is married,
\( L \) = woman participates in the labor force.

According to the information provided:

\[ P(L|M) = 78\% \]
\[ P(L|M') = 61\% \]

Where \( M' \) is the complement of \( M \), i.e., not married.

Now, we can use the probability rules to answer the questions:

a. The probability that a randomly selected woman in that age group is


married or is participating in the labor force is given by:

\[ P(M \cup L) = P(M) + P(L) - P(M \cap L) \]

\[ P(M \cup L) = P(M) + P(L|M')P(M') \]

\[ P(M \cup L) = P(M) + P(L|M') \cdot (1 - P(M)) \]

b. The probability that a randomly selected woman in that age group is


married or is participating in the labor force but not both is given by:

\[ P((M \cap \neg L) \cup (\neg M \cap L)) \]

\[ = P(M \cap \neg L) + P(\neg M \cap L) \]

\[ = P(L|M') \cdot P(M') + P(M|L') \cdot P(L') \]

\[ = P(L|M') \cdot (1 - P(M)) + P(M|L') \cdot (1 - P(L)) \]



c. The probability that a randomly selected woman in that age group is
neither married nor participating in the labor force is given by:

\[ P(\neg M \cap \neg L) = 1 - P(M \cup L) \]

These calculations require the values of \( P(M) \) and \( P(L) \), which are
not provided in the question. If you have those values, you can substitute
them into the formulas to get the numerical answers.
30. A company has 140 employees, of which 30 are supervisors. Eighty of the employees are married, and 20% of the married employees are supervisors. If a company employee is randomly selected, what is the probability that the employee is married and is a supervisor?

To find the probability that a randomly selected employee is both married and a supervisor, you can use the information provided.
1. There are 140 employees in total, and 30 of them are supervisors.



2. 80 of the employees are married, and 20% of the married
employees are supervisors.
Let's calculate the probability step by step:
1. Calculate the number of married employees who are supervisors:
Number of married employees who are supervisors = 20% of 80 = 0.20 * 80 = 16 employees
2. Calculate the probability that a randomly selected employee is married and is a supervisor:
Probability = (Number of married employees who are supervisors) / (Total number of employees)
Probability = 16 / 140
Now, simplify the fraction (if possible):
Probability = 8 / 70
To further simplify, you can divide both the numerator and the denominator by their greatest common divisor (GCD), which is 2:
Probability = (8 / 2) / (70 / 2)
Probability = 4 / 35
Probability=4/35
So,the probability that a randomly selected employee is married and
is a supervisor is 4/35.

31 .

To find the probabilities, you can use the formula:

P(X ∩ Y) = (Number of elements in X ∩ Y) / (Total number of elements)

Let's calculate each probability:

P(A ∩ E) = 16 / (5 + 11 + 16 + 18)

P(A ∩ B) = 2 / (2 + 3 + 5 + 7)

P(D ∩ E) = 0 / (5 + 11 + 16 + 18) (since there is no common element between D and E)

P(D ∩ B) = 0 / (2 + 3 + 5 + 7) (since there is no common element between D and B)

32.

Let's denote the events:


- S : the adult owns stock
- E : the adult has some college education

Given information:
P(S) = 0.43 (probability of owning stock)
P(E|S) = 0.75 (probability of having some college education given they own
stock)
P(E) = 0.37 (probability of having some college education)

Now, we can answer each part:



a. Probability that the adult does not own stock:
P(¬S) = 1 - P(S) = 1 - 0.43 = 0.57

b. Probability that the adult owns stock and has some college education:
P(S ∩ E) = P(S) · P(E|S) = 0.43 · 0.75 = 0.3225

c. Probability that the adult owns stock or has some college education:
P(S ∪ E) = P(S) + P(E) - P(S ∩ E) = 0.43 + 0.37 - 0.3225 = 0.4775

d. Probability that the adult has neither some college education nor owns stock:
P(¬S ∩ ¬E) = 1 - P(S ∪ E) = 1 - 0.4775 = 0.5225

e. Probability that the adult does not own stock or has no college education:
P(¬S ∪ ¬E) = 1 - P(S ∩ E) = 1 - 0.3225 = 0.6775

f. Probability that the adult has some college education and owns no stock:
P(E ∩ ¬S) = P(E) - P(S ∩ E) = 0.37 - 0.3225 = 0.0475



33.

34. Determine the mean, the variance, and the standard deviation of the following discrete distribution.



35. A Gallup survey found that 65% of all financial consumers were very satisfied with their primary financial institution. Suppose that 25 financial consumers are sampled and, if the Gallup survey result still holds true today, what is the probability that exactly 19 are very satisfied with their primary financial institution? (Using the binomial distribution formula)
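This probability can be computed directly in R with the binomial probability mass function; a minimal sketch:
```R
# P(exactly 19 very satisfied out of 25, with p = 0.65)
dbinom(19, size = 25, prob = 0.65)   # about 0.0908
```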



36. According to the U.S. Census Bureau, approximately 6% of all workers in Jackson, Mississippi, are unemployed. In conducting a random telephone survey in Jackson, what is the probability of getting two or fewer unemployed workers in a sample of 20? (Using the binomial distribution formula)
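In R, the cumulative binomial probability gives this directly; a minimal sketch:
```R
# P(X <= 2) for n = 20 trials with p = 0.06
pbinom(2, size = 20, prob = 0.06)    # about 0.885
```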

37. Suppose bank customers arrive randomly on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability of exactly 5 customers arriving in a 4-minute interval on a weekday afternoon? The lambda for this problem is 3.2 customers per 4 minutes, and the value of x is 5 customers per 4 minutes. (Using the Poisson formula)
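A minimal R sketch of this Poisson probability:
```R
# P(X = 5) when lambda = 3.2 customers per 4-minute interval
dpois(5, lambda = 3.2)    # about 0.114
```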



38. Bank customers arrive randomly on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability of having more than 7 customers in a 4-minute interval on a weekday afternoon? (Using the Poisson formula)
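In R, "more than 7" is one minus the cumulative probability up to 7; a minimal sketch:
```R
# P(X > 7) when lambda = 3.2
1 - ppois(7, lambda = 3.2)    # about 0.017
```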

39.A bank has an average random arrival rate of 3.2 customer severy
4minutes.What is the probability of getting exactly 10 customers during an
8-minuteinterval? (Using Poisson Formula)
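R sketches for questions 38 and 39; for the 8-minute interval the rate rescales to lambda = 2 x 3.2 = 6.4:
```R
# Q38: P(X > 7) in a 4-minute interval, X ~ Poisson(3.2)
1 - ppois(7, lambda = 3.2)   # approximately 0.017

# Q39: P(X = 10) in an 8-minute interval, X ~ Poisson(6.4)
dpois(10, lambda = 6.4)      # approximately 0.053
```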



40.

The Poisson probability mass function is given by:

\[ P(X = x) = \frac{e^{-\lambda} \cdot \lambda^x}{x!} \]

where \( \lambda \) is the average rate of success per unit, and \( x \)


is the number of successes you are interested in.

a. \( P(x=5 | \lambda = 2.3) \)

\[ P(X=5) = \frac{e^{-2.3} \cdot 2.3^5}{5!} \]



b. \( P(x=2 | \lambda = 3.9) \)

\[ P(X=2) = \frac{e^{-3.9} \cdot 3.9^2}{2!} \]

c. \( P(x\leq3 | \lambda = 4.1) \)

\[ P(X\leq3) = \sum_{i=0}^{3} \frac{e^{-4.1} \cdot 4.1^i}{i!} \]

d. \( P(x=0 | \lambda = 2.7) \)

\[ P(X=0) = \frac{e^{-2.7} \cdot 2.7^0}{0!} \]

e. \( P(x=1 | \lambda = 5.4) \)

\[ P(X=1) = \frac{e^{-5.4} \cdot 5.4^1}{1!} \]

f. \( P(4 < X < 8 | \lambda = 4.4) \)

\[ P(4 < X < 8) = \sum_{i=5}^{7} \frac{e^{-4.4} \cdot 4.4^i}{i!} \]

You can use a calculator or a statistical software package to compute


these values. Note that \(e\) is the mathematical constant
approximately equal to 2.71828.
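These probabilities can also be evaluated directly in R with dpois() and ppois():
```R
dpois(5, 2.3)          # a. P(x = 5 | lambda = 2.3)
dpois(2, 3.9)          # b. P(x = 2 | lambda = 3.9)
ppois(3, 4.1)          # c. P(x <= 3 | lambda = 4.1)
dpois(0, 2.7)          # d. P(x = 0 | lambda = 2.7)
dpois(1, 5.4)          # e. P(x = 1 | lambda = 5.4)
sum(dpois(5:7, 4.4))   # f. P(4 < x < 8 | lambda = 4.4)
```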
41. Suppose the amount of time it takes to assemble a plastic module ranges from 27 to 39 seconds and that assembly times are uniformly distributed. Describe the distribution. What is the probability that a given assembly will take between 30 and 35 seconds? Fewer than 30 seconds?
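A short R sketch of the uniform distribution on [27, 39] seconds:
```R
a <- 27; b <- 39
(a + b) / 2                         # mean assembly time: 33 seconds
1 / (b - a)                         # height of the distribution: 1/12
punif(35, a, b) - punif(30, a, b)   # P(30 < X < 35) = 5/12, about 0.4167
punif(30, a, b)                     # P(X < 30) = 3/12 = 0.25
```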



42. According to the National Association of Insurance Commissioners, the average annual cost for automobile insurance in the United States in a recent year was $691. Suppose automobile insurance costs are uniformly distributed in the United States with a range of from $200 to $1,182. What is the standard deviation of this uniform distribution? What is the height of the distribution? What is the probability that a person's annual cost for automobile insurance in the United States is between $410 and $825?
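The same quantities in R, using the uniform distribution on [200, 1182]:
```R
a <- 200; b <- 1182
(b - a) / sqrt(12)                    # standard deviation, approximately 283.5
1 / (b - a)                           # height of the distribution, approximately 0.001018
punif(825, a, b) - punif(410, a, b)   # P(410 < X < 825), approximately 0.4226
```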



43.a. What is the probability of obtaining a score greater than 700 on a
GMAT test that has a mean of 494 and a standard deviation of 100? Assume
GMAT scores are normally distributed.
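In R, this normal probability is one call to pnorm():
```R
# P(X > 700) for X ~ Normal(mean = 494, sd = 100)
1 - pnorm(700, mean = 494, sd = 100)   # approximately 0.0197
```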

44. GMAT test that has a mean of 494 and a standard deviation of
100? Assume GMAT scores are normally distributed. What is the
probability of randomly obtaining a score between 300 and 600 on
the GMAT exam?



45. GMAT test that has a mean of 494 and a standard deviation of 100? Assume GMAT scores are normally distributed. What is the probability of randomly drawing a score that is 550 or less?

46. GMAT test that has a mean of 494 and a standard deviation of 100? Assume GMAT scores are normally distributed. What is the probability of getting a score between 350 and 450 on the same GMAT exam?
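The GMAT probabilities in questions 44 to 46 follow the same pattern with pnorm():
```R
pnorm(600, 494, 100) - pnorm(300, 494, 100)   # Q44: P(300 < X < 600), about 0.829
pnorm(550, 494, 100)                          # Q45: P(X <= 550), about 0.712
pnorm(450, 494, 100) - pnorm(350, 494, 100)   # Q46: P(350 < X < 450), about 0.255
```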



47. Suppose the following data are selected randomly from a population of normally distributed values. Construct a confidence interval to estimate the population mean, using a 90% confidence level (using the t statistic). The sample mean is 13.56 and the sample standard deviation is 7.8.



48. Assuming x is normally distributed, use the following information to compute a 99% confidence interval to estimate the population mean (using the t statistic). The sample mean is 2.14 and the sample standard deviation is 1.29.
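The raw data behind questions 47 and 48 are not reproduced here, so the R sketch below uses a placeholder sample size n (a hypothetical value) only to show how a t-based interval is built with qt():
```R
# Question 47 style: 90% CI (n = 15 is a placeholder; substitute the actual sample size)
xbar <- 13.56; s <- 7.8; n <- 15; conf <- 0.90
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)
c(xbar - t_crit * s / sqrt(n), xbar + t_crit * s / sqrt(n))

# Question 48 style: 99% CI, same placeholder n
xbar <- 2.14; s <- 1.29; conf <- 0.99
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)
c(xbar - t_crit * s / sqrt(n), xbar + t_crit * s / sqrt(n))
```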



8 The following list shows the top six pharmaceutical companies in the United States and their sales figures ($ millions) for a recent year. Use this information to construct a pie chart and a bar graph to represent these six companies and their sales.

9 The following data represent the costs (in dollars) of a sample of 30 postal mailings by a company. Using dollars as a stem and cents as a leaf, construct a stem-and-leaf plot of the data.



Stem Leaf
1 83 93 97

2 09 75 78 84

3 67 32 34 55 53 89 21

4 10 15 84 95

5 10 11 42 47

6 45 72

7 20 80

8 64

9 15

10 94

10



11

12



13 Explain general methods of assigning probabilities with example.
Assigning probabilities to events is a fundamental concept in probability theory and statistics. There are several general methods for assigning probabilities, depending on the context and the nature of the problem. Here are some common methods with examples in R:
1. Classical (or Theoretical) Probability:
- This method assigns probabilities based on the assumption that all outcomes in the sample space are equally likely. It is often used for simple, well-defined experiments.
Example:
Suppose you roll a fair six-sided die. Each face of the die has an equal probability of 1/6.
In R, you can represent this using a probability distribution:
```R
outcomes <- 1:6
probabilities <- rep(1/6, 6)
```
2. Empirical (or Experimental) Probability:
- This method assigns probabilities based on observed data from past experiments or events. It is useful when you have data to estimate probabilities.



Example:
You want to estimate the probability of winning a game based on 100 recorded game outcomes. You find that you won 30 out of the 100 games, so the empirical probability of winning is 30/100.
In R, you can calculate empirical probabilities from data:
```R
total_games <- 100
wins <- 30
empirical_probability <- wins / total_games
```
3. Subjective Probability:
- This method assigns probabilities based on personal judgment or subjective beliefs. It is used when there is no available data or theoretical basis for assigning probabilities.
Example:
You estimate the probability of it raining tomorrow based on your personal judgment. You might assign a probability of 0.3 if you believe there's a 30% chance of rain.
In R, you can simply assign the subjective probability:
```R
subjective_probability <- 0.3
```
4. Probability Density Functions (PDF) and Cumulative Distribution Functions (CDF):



- In some cases, you may define probabilities by specifying a probability density function (PDF) or cumulative distribution function (CDF). These functions describe how probabilities are distributed across a continuous range of values.
Example:
You have a continuous random variable X with a normal distribution. You can specify the PDF and CDF using R functions like `dnorm()` and `pnorm()`.
```R
x <- seq(-3, 3, by = 0.1)
pdf_values <- dnorm(x, mean = 0, sd = 1)  # PDF of a standard normal distribution
cdf_values <- pnorm(x, mean = 0, sd = 1)  # CDF of a standard normal distribution
```
These are general methods for assigning probabilities to events, and the choice of method depends on the nature of the problem and the information available. In practice, probabilities are often assigned using a combination of these methods, depending on the specific context and available data.

14 A supplier shipped a lot of six parts to a company. The lot contained three defective parts. Suppose the customer decided to randomly select two parts and test them for defects. How large a sample space is the customer potentially working with? List the sample space. Using the sample space list, determine the probability that the customer will select a sample with exactly one defect.
To determine the sample space and calculate the probability of selecting a sample with exactly one defect, we can use a combination of counting techniques. In this problem, we have a lot of six parts, with three of them being defective.



The sample space represents all possible combinations of selecting two parts from the lot. To find the number of such combinations, you can use the binomial coefficient formula, also known as "n choose k," which is calculated as:
C(n, k) = n! / (k!(n - k)!)
In this case, n is the total number of parts in the lot (6), and k is the number of parts you want to select (2).
C(6, 2) = 6! / (2!(6 - 2)!)
C(6, 2) = 6! / (2! * 4!)
C(6, 2) = (6 * 5 * 4!) / (2! * 4!)
C(6, 2) = (6 * 5) / (2 * 1)
C(6, 2) = 15
So, there are 15 possible combinations in the sample space for selecting two parts from the lot.
Next, let's determine the probability of selecting a sample with exactly one defect. To do this, we need to calculate the number of ways to select one defective part and one non-defective part, and then divide it by the total number of possible combinations.
Number of ways to select one defective and one non-defective part:
3 ways to select one defective part from the 3 available.
3 ways to select one non-defective part from the 3 available.
Total number of ways to select a sample of two parts from the lot: 15 (as calculated above).
Now, calculate the probability:
Probability = (Number of favorable outcomes) / (Total number of possible outcomes)



Probability = (3 * 3) / 15
Probability = 9/15
Probability = 3/5
So, the probability that the customer will select a sample with exactly one defect is 3/5.
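The counts can be checked in R with choose(), and combn() can enumerate the sample space (the labels d1-d3 and g1-g3 below are illustrative):
```R
choose(6, 2)                                     # size of the sample space: 15
parts <- c("d1", "d2", "d3", "g1", "g2", "g3")   # 3 defective, 3 good parts
combn(parts, 2)                                  # lists all 15 possible pairs
choose(3, 1) * choose(3, 1) / choose(6, 2)       # P(exactly one defect) = 9/15 = 3/5
```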

23.
X ∪ Z = {1, 2, 3, 4, 5, 7, 8, 9}
X ∩ Z = {1, 3, 7}
X ∩ Y = {7, 9}
X ∪ Y ∪ Z = {1, 2, 3, 4, 5, 7, 8, 9}
X ∩ Y ∩ Z = {7}
(X ∪ Y) ∩ Z = {1, 2, 3, 4, 7}
(Y ∩ Z) ∪ (X ∩ Y) = {7}
X OR Y = {1, 2, 3, 4, 5, 7, 8, 9}
Y AND X = {2, 4, 7}

24. A company's customer service 800 telephone system is set up so that the caller has six options. Each of these six options leads to a menu with four options. For each of these four options, three more options are available. For each of these three options, another three options are presented. If a person calls the 800 number for assistance, how many total options are possible?



To find the total number of possible options, multiply the number of choices at each level. In this scenario, there are four levels of choices:
1. First level: 6 options
2. Second level: 4 options for each first-level choice
3. Third level: 3 options for each second-level choice
4. Fourth level: 3 options for each third-level choice
By the multiplication counting rule, the total number of distinct paths through the menu is the product of the number of choices at each level:
Total options = 6 (1st level) x 4 (2nd level) x 3 (3rd level) x 3 (4th level)
Total options = 6 x 4 x 3 x 3 = 216
So, there are 216 total possible options for a person calling the 800 number for assistance.
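A one-line R check of the multiplication rule:
```R
6 * 4 * 3 * 3   # total number of menu paths: 216
```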

25. A bin contains six parts. Two of the parts are defective and four are acceptable. If three of the six parts are selected from the bin, how large is the sample space? Which counting rule did you use, and why? For this sample space, what is the probability that exactly one of the three sampled parts is defective?



To find the size of the sample space for selecting three parts from the bin, you can use the combination formula. The combination formula, also known as "n choose k," is used to determine the number of ways to choose k items from a set of n items without regard to the order. It's expressed as:
C(n, k) = n! / (k!(n - k)!)
In this case, you have six parts in the bin, and you want to select three of them:
C(6, 3) = 6! / (3!(6 - 3)!)
C(6, 3) = 6! / (3! * 3!)
C(6, 3) = (6 * 5 * 4) / (3 * 2 * 1)
C(6, 3) = 20
So, the sample space contains 20 different ways to select three parts from the bin.
Now, to find the probability that exactly one of the three sampled parts is defective, count the samples that contain exactly one defective part and two acceptable parts:
Number of ways to choose one defective part from the two defective parts: C(2, 1) = 2
Number of ways to choose two acceptable parts from the four acceptable parts: C(4, 2) = 6
Total number of favorable outcomes = 2 * 6 = 12
Finally, calculate the probability:
Probability = (Number of favorable outcomes) / (Total number of possible outcomes)
Probability = 12/20
Probability = 3/5
So, the probability that exactly one of the three sampled parts is defective is 3/5 (that is, .60).
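In R, the same result with choose():
```R
choose(6, 3)                                 # sample space size: 20
choose(2, 1) * choose(4, 2) / choose(6, 3)   # P(exactly one defective) = 12/20 = 0.6
```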

26. Explain Marginal, Union, Joint and Conditional Probabilities with example.
In probability theory, various types of probabilities are used to describe different aspects of random events and their relationships. Four fundamental types of probabilities are marginal probability, union probability, joint probability, and conditional probability. Here, I'll explain each type with examples in R:
1. Marginal Probability:
- Marginal probability refers to the probability of a single event occurring, ignoring other events. It is calculated by summing or integrating the joint probabilities over all possible values of the events being ignored.
Example:
Suppose you have the following joint probability distribution for two events A and B:
|       | A = 0 | A = 1 |
| B = 0 |  0.2  |  0.1  |
| B = 1 |  0.3  |  0.4  |
To find the marginal probability of event A:
```R
joint_probs <- matrix(c(0.2, 0.1, 0.3, 0.4), nrow = 2, byrow = TRUE)  # rows: B = 0 and B = 1; columns: A = 0 and A = 1
marginal_prob_A <- colSums(joint_probs)  # P(A = 0), P(A = 1): sum over B
marginal_prob_B <- rowSums(joint_probs)  # P(B = 0), P(B = 1): sum over A
```
2. Union Probability:
- Union probability, denoted as P(A ∪ B), represents the probability that at least one of two events A and B occurs. It can be calculated using the marginal probabilities and the joint probability of A and B.
Example:
Using the same joint probability distribution as above, to find the union probability of the events A = 1 and B = 1:
```R
union_prob_A_or_B <- marginal_prob_A[2] + marginal_prob_B[2] - joint_probs[2, 2]  # P(A = 1) + P(B = 1) - P(A = 1, B = 1)
```
3. Joint Probability:
- Joint probability, denoted as P(A ∩ B), represents the probability of both events A and B occurring simultaneously.
Example:
Continuing with the previous example, to find the joint probability of A = 1 and B = 1:
```R
joint_prob_A_and_B <- joint_probs[2, 2]
```
4. Conditional Probability:



- Conditional probability, denoted as P(A|B), represents the probability of event A occurring given that event B has already occurred. It is calculated by dividing the joint probability of A and B by the probability of B.
Example:
To find the conditional probability of A = 1 given B = 1:
```R
conditional_prob_A_given_B <- joint_prob_A_and_B / marginal_prob_B[2]
```
These four types of probabilities are fundamental in probability theory and play a crucial role in various fields, including statistics, machine learning, and decision-making. They provide a way to quantify and analyze the relationships between events and their likelihood of occurrence.

27. The client company data from the Decision Dilemma reveal that 155 employees worked one of four types of positions. Shown here again is the raw values matrix (also called a contingency table) with the frequency counts for each category and for subtotals and totals containing a breakdown of these employees by type of position and by sex. If an employee of the company is selected randomly, what is the probability that the employee is female or a professional worker?

Let F denote the event of female and P denote the event of professional worker. The question is P(F ∪ P) = ?



By the general law of addition, P(F ∪ P) = P(F) + P(P) - P(F ∩ P).
Of the 155 employees, 55 are women.
Therefore, P(F) = 55/155 = .355.
The 155 employees include 44 professionals.
Therefore, P(P) = 44/155 = .284.
Because 13 employees are both female and professional, P(F ∩ P) = 13/155 = .084.
The union probability is solved as P(F ∪ P) = .355 + .284 - .084 = .555.
P(F ∪ P) = 86/155 = .555
A second way to produce the answer from the raw value matrix is to add all the cells one time that are in either the Female column or the Professional row, 3 + 13 + 17 + 22 + 31 = 86, and then divide by the total number of employees, N = 155, which gives P(F ∪ P) = 86/155 = .555.
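A small R sketch of the same addition-law calculation from the counts quoted in the text:
```R
n_total        <- 155
n_female       <- 55
n_professional <- 44
n_female_prof  <- 13

p_f_or_p <- n_female / n_total + n_professional / n_total - n_female_prof / n_total
p_f_or_p   # 86/155, approximately 0.555
```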

28.



29.
30. A company has 140 employees, of which 30 are supervisors. Eighty of the employees are married, and 20% of the married employees are supervisors. If a company employee is randomly selected, what is the probability that the employee is married and is a supervisor?
To find the probability that a randomly selected employee is both married and a supervisor, you can use the information provided.
1. There are 140 employees in total, and 30 of them are supervisors.



2. 80 of the employees are married, and 20% of the married employees are supervisors.
Let's calculate the probability step by step:
1. Calculate the number of married employees who are supervisors:
Number of married employees who are supervisors = 20% of 80 = 0.20 * 80 = 16 employees
2. Calculate the probability that a randomly selected employee is married and is a supervisor:
Probability = (Number of married employees who are supervisors) / (Total number of employees)
Probability = 16/140
Now, simplify the fraction (if possible):
Probability = 8/70
To further simplify, you can divide both the numerator and the denominator by their greatest common divisor (GCD), which is 2:
Probability = (8 / 2) / (70 / 2)
Probability = 4/35
So, the probability that a randomly selected employee is married and is a supervisor is 4/35.



31.

32.

33.



34 Determine the mean, the variance and the standard deviation of the following discrete distribution.

Ans:



35 A Gallup survey found that 65% of all financial consumers were very satisfied with their primary financial institution. Suppose that 25 financial consumers are sampled and if the Gallup survey result still holds true today, what is the probability that exactly 19 are very satisfied with their primary financial institution? (Using Binomial Distribution formulae)

36 According to the U.S. Census Bureau, approximately 6% of all workers in Jackson, Mississippi, are unemployed. In conducting a random telephone survey in Jackson, what is the probability of getting two or fewer unemployed workers in a sample of 20? (Using Binomial Distribution formulae)

37 Suppose bank customers arrive randomly on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability of exactly 5 customers arriving in a 4-minute interval on a weekday afternoon? The lambda for this problem is 3.2 customers per 4 minutes. The value of x is 5 customers per 4 minutes. (Using Poisson Formula)

38 Bank customers arrive randomly on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability of having more than 7 customers in a 4-minute interval on a weekday afternoon? (Using Poisson Formula)



39 A bank has an average random arrival rate of 3.2 customers every 4 minutes. What is the probability of getting exactly 10 customers during an 8-minute interval? (Using Poisson Formula)

40.

41. Suppose the amount of time it takes to assemble a plastic module ranges from 27 to 39 seconds and that assembly times are uniformly distributed. Describe the distribution. What is the probability that a given assembly will take between 30 and 35 seconds? Fewer than 30 seconds?



40 According to the National Association of Insurance Commissioners, the average annual cost for automobile insurance in the United States in a recent year was $691. Suppose automobile insurance costs are uniformly distributed in the United States with a range of from $200 to $1,182. What is the standard deviation of this uniform distribution? What is the height of the distribution? What is the probability that a person's annual cost for automobile insurance in the United States is between $410 and $825?



41 a. What is the probability of obtaining a score greater than 700 on a GMAT test that has a mean of 494 and a standard deviation of 100? Assume GMAT scores are normally distributed.

42 GMAT test that has a mean of 494 and a standard deviation of 100? Assume GMAT scores are normally distributed. What is the probability of randomly obtaining a score between 300 and 600 on the GMAT exam?



43 GMAT test that has a mean of 494 and a standard deviation of 100? Assume GMAT scores are normally distributed. What is the probability of randomly drawing a score that is 550 or less?



44 GMAT test that has a mean of 494 and a standard deviation of 100? Assume GMAT scores are normally distributed. What is the probability of getting a score between 350 and 450 on the same GMAT exam?

45 Suppose the following data are selected randomly from a population of normally distributed values. Construct a confidence interval to estimate the population mean, using a 90% confidence level (using the t statistic). The sample mean is 13.56 and the sample standard deviation is 7.8.



46 Assuming x is normally distributed, use the following information to compute a 99% confidence interval to estimate the population mean (using the t statistic). The sample mean is 2.14 and the sample standard deviation is 1.29.



UNIT-4
2 Marks:
1) What is Hypothesis? Give an Example.
A hypothesis is an assumption that is made based on some evidence. This is the initial
point of any investigation that translates the research questions into predictions. It includes
components like variables, population and the relation between the variables.
Example
The field of business, decision makers are continually attempting to find answers to questions
such as the following:
■ What container shape is most economical and reliable for shipping a product?
■ Which management approach best motivates employees in the retail industry?
■ What is the best way to link client databases for fast retrieval of useful information?
■ Which indicator best predicts the general state of the economy in the next six months?
■ What is the most effective means of advertising in a business-to-business setting?

2)What are the Types of Hypotheses?


Types of Hypotheses:
1. Research hypotheses
2. Statistical hypotheses
3. Substantive hypotheses

3)What are Type-I Error? Give an Example.


A Type I error is committed by rejecting a true null hypothesis. With a Type I error, the
null hypothesis is true, but the business researcher decides that it is not.
As an example, suppose a worker on the assembly line of a large manufacturer hears an
unusual sound and decides to shut the line down (reject the null hypothesis). If the sound
turns out not to be related to the assembly line and no problems are occurring with the
assembly line, then the worker has committed a Type I error.

4)What are null and alternate hypotheses? Give Example.


The null hypothesis states that the “null” condition exists; that is, there is nothing new
happening, the old theory is still true, the old standard is correct, and the system is in control.
The alternative hypothesis, on the other hand, states that the new theory is true, there
are new standards, the system is out of control, and/or something is happening.
As an example, suppose flour packaged by a manufacturer is sold by weight; and a
particular size of package is supposed to average 40 ounces. Suppose the manufacturer wants
to test to determine whether their packaging process is out of control as determined by the
weight of the flour packages. The null hypothesis for this experiment is that the average



weight of the flour packages is 40 ounces (no problem). The alternative hypothesis is that the
average is not 40 ounces (process is out of control).

5) What are Type-II Error? Give an Example.


A Type II error is committed when a business researcher fails to reject a false null
hypothesis. In this case, the null hypothesis is false, but a decision is made to not reject it.
Eg: Suppose in the business world an employee is stealing from the company. A manager
sees some evidence that the stealing is occurring but lacks enough evidence to conclude that
the employee is stealing from the company. The manager decides not to fire the employee
based on theft. The manager has committed a Type II error.

6) Show the relationship between α, β, and power.


The relationships between α (alpha), β (beta), and power in statistical hypothesis testing can
be understood through the following equations:

1. Type I Error (α):


- This is the probability of rejecting a true null hypothesis.
- Symbolically: α = P(Reject H0 | H0 is true)

2. Type II Error (β):


- This is the probability of failing to reject a false null hypothesis.
- Symbolically: β = P (Fail to Reject H0 | H0 is false)

3. Power of the Test (1 - β):


- Power is the probability of correctly rejecting a false null hypothesis.
- Symbolically: Power= 1 - β = P (Reject H0 | H0 is false)

7) What are One Sample t Test? List its uses.


The One Sample t Test examines whether the mean of a population is statistically different
from a known or hypothesized value. The One Sample t Test is a parametric test.
Common Uses:
Statistical difference between a mean and a known or hypothesized value of the mean
in the population.
Statistical difference between a change score and zero.
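In R, a one sample t test is typically run with t.test(); a minimal sketch on made-up data:
```R
# Hypothetical sample; test whether the population mean differs from 40
x <- c(41.2, 39.5, 40.8, 42.1, 38.9, 40.3, 41.7, 39.8)
t.test(x, mu = 40)                            # two-sided by default
t.test(x, mu = 40, alternative = "greater")   # one-sided version
```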

8) Write the t test statistic formula for one sample t Test.
t = (x̄ - μ) / (s / √n), with df = n - 1,
where x̄ is the sample mean, μ is the hypothesized population mean, s is the sample standard deviation and n is the sample size.


9) State the hypotheses One sample t Test?
All statistical hypotheses consist of two parts, a null hypothesis and an alternative
hypothesis.
Generally, the null hypothesis states that the “null” condition exists; that is, there is nothing
new happening, the old theory is still true, the old standard is correct, and the system is in
control. The alternative hypothesis, on the other hand, states that the new theory is true,
there are new standards, the system is out of control, and/or something is happening. For the one sample t test specifically, the null hypothesis is H0: μ = μ0 (the population mean equals the hypothesized value) and the alternative is Ha: μ ≠ μ0 (or the corresponding one-tailed form).

10) What are Paired Samples t Test? List its uses.


The paired samples t Test compares the means of two measurements taken from the same
individual, object, or related units. These “paired” measurements can represent things like:
Common Uses:
Statistical difference between two time points
Statistical difference between two conditions
Statistical difference between two measurements
Statistical difference between a matched pair

11) Write the t test statistics formula for Paired Samples t Test.
t = (d̄ - D) / (s_d / √n), with df = n - 1,
where d̄ is the mean of the paired differences, D is the hypothesized mean difference (usually 0), s_d is the standard deviation of the differences and n is the number of pairs.
12) State the hypotheses Paired Samples t Test?
H0: D = 0 (the mean of the paired differences is zero) and Ha: D ≠ 0 (or the corresponding one-tailed form, D > 0 or D < 0).

13) What are Independent Samples t Test? List its uses.


The Independent Samples t Test compares the means of two independent groups in
order to determine whether there is statistical evidence that the associated population means
are significantly different. The Independent Samples t Test is a parametric test.
Common Uses
The Independent Samples t Test is commonly used to test the following:
• Statistical differences between the means of two groups
• Statistical differences between the means of two interventions
• Statistical differences between the means of two change score

14) State the hypotheses Independent Samples t Test.
H0: μ1 - μ2 = 0 (the two population means are equal) and Ha: μ1 - μ2 ≠ 0 (or the corresponding one-tailed form).

15) What are Chi-Square Test of Independence? List its uses.


The Chi-Square Test of Independence determines whether there is an association between



categorical variables (i.e., whether the variables are independent or related). It is a
nonparametric test.
Common Uses
The Chi-Square Test of Independence is commonly used to test the following:
• Statistical independence or association between two or more categorical variables.
The Chi-Square Test of Independence can only compare categorical variables.
It cannot make comparisons between continuous variables or between categorical and
continuous variables.
Additionally, the Chi-Square Test of Independence only assesses associations between
categorical variables, and cannot provide any inferences about causation.
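A minimal R sketch of the test on a hypothetical 2 x 3 table of counts:
```R
# Hypothetical contingency table: rows are two groups, columns are three response categories
counts <- matrix(c(20, 30, 25,
                   35, 15, 30), nrow = 2, byrow = TRUE)
chisq.test(counts)   # chi-square statistic, df = (r - 1)(c - 1), and p-value
```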

16) Write the t test statistics formulae for Chi-Square Test of Independence.
The test statistic for the Chi-Square Test of Independence is χ² = Σ (f_o - f_e)² / f_e, with df = (r - 1)(c - 1), where f_o is an observed frequency, f_e = (row total × column total) / n is the corresponding expected frequency, r is the number of rows and c is the number of columns.

17) What are ONE WAY ANOVA? List its uses.


One-Way ANOVA:
One-Way ANOVA ("analysis of variance") compares the means of two or more
independent groups in order to determine whether there is statistical evidence that the
associated population means are significantly different. One-Way ANOVA is a parametric
test.
The One-Way ANOVA is commonly used to test the following:
• Statistical differences among the means of two or more groups
• Statistical differences among the means of two or more interventions
• Statistical differences among the means of two or more change scores

18) Write the formulae of MSC, MSE and F statistics for ONE WAY ANOVA.
Ans:
MSC = SSC / df_C

MSE = SSE / df_E

F = MSC / MSE

19) Write the formulae of SSC, SSE and SST for ONE WAY ANOVA.



SSC = Σj nj (x̄j - x̄)²
SSE = Σj Σi (xij - x̄j)²
SST = Σj Σi (xij - x̄)² = SSC + SSE
where
i = a particular member of a treatment level
j = a treatment level
C = number of treatment levels
nj = number of observations in a given treatment level
x̄ = grand mean
x̄j = column (treatment level) mean
xij = individual value
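A minimal R sketch of a one-way ANOVA on hypothetical data; summary(aov(...)) reports the same sums of squares, mean squares and F value:
```R
# Hypothetical data: three treatment levels with four observations each
scores <- c(23, 25, 22, 27, 30, 29, 31, 28, 18, 20, 17, 21)
group  <- factor(rep(c("A", "B", "C"), each = 4))
summary(aov(scores ~ group))   # Sum Sq column: SSC (group) and SSE (residuals); F value = MSC / MSE
```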

20) Write the Z formulae for One Sample Proportion Test.
z = (p̂ - p) / √(p·q / n), where p̂ is the sample proportion, p is the hypothesized population proportion, q = 1 - p and n is the sample size.

21) Write the Z formulae for Two Sample Proportion Test.
z = ((p̂1 - p̂2) - (p1 - p2)) / √( p̄·q̄ (1/n1 + 1/n2) ), where p̄ = (x1 + x2) / (n1 + n2) is the pooled sample proportion and q̄ = 1 - p̄.

22) Define Correlation Analysis and Regression Analysis.


• Correlation is a statistical technique to ascertain the association or relationship between
two or more variables.
• Correlation analysis is a statistical technique to study the degree and direction of relationship
between two or more variables.
Regression:
Regression analysis is a statistical tool to study the nature and extent
of functional relationship between two or more variables and to estimate
(or predict) the unknown values of dependent variable from the known
values of independent variable

23) What are the Uses of correlations?


1. Correlation analysis helps in deriving precisely the degree and the direction
of such relationship.
2. The effect of correlation is to reduce the range of uncertainty of our
prediction. The prediction based on correlation analysis will be more reliable
and near to reality.
3. The measure of coefficient of correlation is a relative measure of change.

24) List the Types of Correlation.



Correlation is described or classified in several different ways. Three of the
most important are:
I. Positive and Negative
II. Simple, Partial and Multiple
III. Linear and non-linear

25) What are Positive Correlation? Give an example.


If both the variables vary in the same direction, correlation is said to be positive. It means
if one variable is increasing, the other on an average is also increasing or if one variable is
decreasing, the other on an average is also deceasing, then the correlation is said to be positive
correlation.
For example, the correlation between heights and weights of a group of persons is a positive
correlation.

26) What are Negative Correlation? Give an example.


Negative Correlation: If both the variables vary in opposite direction, the correlation is
said to be negative. It means if one variable increases, but the other variable decreases or if
one variable decreases, but the other variable increases, then the correlation is said to be
negative correlation. For example, the correlation between the price of a product and its
demand is a negative correlation.

27) What are Simple Correlation? Give an example.


When only two variables are studied, it is a case of simple correlation. For example, when
one studies relationship between the marks secured by student and the attendance of student
in class, it is a problem of simple correlation.

28) What are Partial Correlation? Give an example.


Partial Correlation: In case of partial correlation one studies three or more variables but
considers only two variables to be influencing each other and the effect of other influencing
variables being held constant. For example, in above example of relationship between student
marks and attendance, the other variable influencing such as effective teaching of teacher,
use of teaching aid like computer, smart board etc are assumed to be constant.

29) What are Multiple Correlation? Give an example.


Multiple Correlations: When three or more variables are studied, it is a case of multiple
correlation. For example, in above example if study covers the relationship between student
marks, attendance of students, effectiveness of teacher, use of teaching aids etc, it is a case of



multiple correlation.

30) What are Zero Correlation? Give an example.


Zero Correlation: Actually it is not a type of correlation but still it is called as zero or no
correlation. When we don’t find any relationship between the variables then, it is said to be
zero correlation. It means a change in value of one variable doesn’t influence or change the
value of other variable. For example, the correlation between weight of person and
intelligence is a zero or no correlation

31) What are Linear Correlation? Give an example.


Linear Correlation: If the amount of change in one variable bears a constant ratio to the
amount of change in the other variable, then correlation is said to be linear. If such variables
are plotted on a graph paper all the plotted points would fall on a straight line.
For example: If it is assumed that, to produce one unit of finished product we need 10 units
of raw materials, then subsequently to produce 2 units of finished product we need double of
the one unit.

32) What are Non-linear Correlation? Give an example.


Non-linear Correlation: If the amount of change in one variable does not bear a constant
ratio to the amount of change to the other variable, then correlation is said to be non-linear.
If such variables are plotted on a graph, the points would fall on a curve and not on a straight
line.
For example, if we double the amount of advertisement expenditure, then sales volume
would not necessarily be double.

33) What do you mean by Perfect Positive Correlation? Write its condition and draw the
scatter diagram.
Perfect Positive Correlation: In this case, the points will form on a straight line rising from
the lower left hand corner to the upper right hand corner.
Conditions for perfect positive correlation:
1.Linear Relationship: The relationship between the two variables should be perfectly linear.
2.Constant Ratio: The ratio between the changes in the two variables should be constant.
3.No Variability: There should be no variability in the relationship. Every change in one
variable should be associated with a corresponding and proportional change in the other.



34) What do you mean by Perfect Negative Correlation? Write its condition and draw the
scatter diagram.
Perfect Negative Correlation: In this case, the points will form on a straight line declining
from the upper left hand corner to the lower right hand corner.
Conditions for perfect negative correlation:
Linear Relationship: The relationship between the two variables should be perfectly linear.
Constant Ratio: The ratio between the changes in the two variables should be constant.
No Variability: There should be no variability in the relationship. Every change in one
variable should be associated with a corresponding and proportional change in the other, but
in the opposite direction.

35) What do you mean by High Degree of Positive Correlation? Write its condition and draw
the scatter diagram.
High Degree of Positive Correlation: In this case, the plotted points fall in a narrow band,
wherein points show a rising tendency from the lower left hand corner to the upper right hand
corner.
Conditions for a high degree of positive correlation:
Correlation Coefficient (r): The correlation coefficient (r) should be close to +1. The range of
the correlation coefficient is from +1 (perfect positive correlation) to -1 (perfect negative
correlation), with 0 indicating no correlation.
Scatter Diagram: When plotting the data points on a scatter diagram, they should generally
form a clear upward-sloping pattern. This means that as one variable increases, the other
variable tends to increase as well.

36) What do you mean by High Degree of Negative Correlation? Write its condition and draw the scatter
diagram.
High Degree of Negative Correlation: In this case, the plotted points fall in a narrow
band, wherein points show a declining tendency from upper left hand corner to the lower right
hand corner.

37) What do you mean by Low Degree of Positive Correlation? Write its condition and draw
the scatter diagram.
Low Degree of Positive Correlation: If the points are widely scattered over the diagrams,
wherein points are rising from the left hand corner to the upper right hand corner.



38) What do you mean by Low Degree of Negative Correlation? Write its condition and draw
the scatter diagram.
Low Degree of Negative Correlation: If the points are widely scattered over the diagrams,
wherein points are declining from the upper left hand corner to the lower right hand corner.

39) What do you mean by Zero (No) Correlation? Write its condition and draw the scatter
diagram.
Zero (No) Correlation: When plotted points are scattered over the graph haphazardly,
then it indicate that there is no correlation or zero correlation between two variables.

40) Write the formula to calculate Karl Pearson’s coefficient of correlation method using
direct method.

41) Write the formula to calculate Karl Pearson’s product moment coefficient of correlation.

42) What is a Correlation matrix? Give an example.


A correlation matrix is a table showing correlation coefficients between variables. Each
cell in the table shows the correlation between two variables. A correlation matrix is used to
summarize data, as an input into a more advanced analysis, and as a diagnostic for advanced
analyses.
Example:
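A minimal R sketch, using the built-in mtcars data set as a stand-in for the original example table:
```R
# Correlation matrix for three variables from the built-in mtcars data
round(cor(mtcars[, c("mpg", "hp", "wt")]), 2)
```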

43) What are the Uses of Regression Analysis?


Uses of Regression Analysis:
1. It provides estimates of values of the dependent variables from values of independent
variables.
2. It is used to obtain a measure of the error involved in using the regression line as a basis
for estimation.
3. With the help of regression analysis, we can obtain a measure of degree of association or
correlation that exists between the two variables.



4. It is a highly valuable tool in economics and business research, since most of the problems of economic analysis are based on cause and effect relationship.

44) Write the formulae for regression equation of X on Y and Y on X.


Regression line of Y on X: This line gives the probable value of Y (Dependent variable) for any given value of X (Independent variable):
Y - Ȳ = byx (X - X̄), where byx = r · (σy / σx)

Regression line of X on Y: This line gives the probable value of X (Dependent variable) for any given value of Y (Independent variable):
X - X̄ = bxy (Y - Ȳ), where bxy = r · (σx / σy)
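In R, the regression line of Y on X is usually fitted with lm(); a minimal sketch on made-up data:
```R
# Hypothetical data: fit the regression line of Y on X
x <- c(2, 4, 5, 7, 8, 10)
y <- c(5, 9, 11, 14, 17, 21)
fit <- lm(y ~ x)
coef(fit)                                   # intercept a and slope b in Y = a + bX
predict(fit, newdata = data.frame(x = 6))   # estimated Y for X = 6
# the regression of X on Y would be fitted with lm(x ~ y)
```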

Long Answers

1. Explain three Types of Hypotheses with example.


Types of Hypotheses
Three types of hypotheses that will be explored here:
Research hypotheses
Statistical hypotheses
Substantive hypotheses.

Research Hypotheses
Research hypotheses are most nearly like hypotheses defined earlier. A research hypothesis
is a statement of what the researcher believes will be the outcome of an experiment or a
study.Before studies are undertaken, business researchers often have some idea or theory
based on experience or previous work as to how the study will turn out. These ideas, theories,
or notions established before an experiment or study is conducted are research hypotheses.
Some examples of research hypotheses in business might include:
■ Older workers are more loyal to a company.
■ Companies with more than $1 billion in assets spend a higher percentage of their annual
budget on advertising than do companies with less than $1 billion in assets.
■ The price of scrap metal is a good indicator of the industrial production index six months
later.

Statistical Hypotheses
In order to scientifically test research hypotheses, a more formal hypothesis structure needs
to be set up using statistical hypotheses. Suppose business researchers want to “prove” the



research hypothesis that older workers are more loyal to a company. A “loyalty” survey
instrument is either developed or obtained. If this instrument is administered to both older
and younger workers, how much higher do older workers have to score on the “loyalty”
instrument (assuming higher scores indicate more loyal) than younger workers to prove the
research hypothesis? What is the “proof threshold”? Instead of attempting to prove or
disprove research hypotheses directly in this manner, business researchers convert their
research hypotheses to statistical hypotheses and then test the statistical hypotheses using
standard procedures.
All statistical hypotheses consist of two parts, a null hypothesis and an alternative
hypothesis. These two parts are constructed to contain all possible outcomes of the
experiment or study. Generally, the null hypothesis states that the “null” condition exists; that
is, there is nothing new happening, the old theory is still true, the old standard is correct, and
the system is in control.
The alternative hypothesis, on the other hand, states that the new theory is true, there are new
standards, the system is out of control, and/or something is happening.
As an example, suppose flour packaged by a manufacturer is sold by weight; and a particular
size of package is supposed to average 40 ounces. Suppose the manufacturer wants to test to
determine whether their packaging process is out of control as determined by the weight of
the flour packages. The null hypothesis for this experiment is that the average weight of the
flour packages is 40 ounces (no problem). The alternative hypothesis is that the average is
not 40 ounces (process is out of control).

Note that the “new idea” or “new theory” that company officials want to
“prove” is stated in the alternative hypothesis. The null hypothesis states that the old market
share of 18% is still true.

In business research, the conservative approach is to conduct a two-tailed test


because sometimes study results can be obtained that are in opposition to the direction that
researchers thought would occur. For example, in the market share problem, it might turn out
that the company had actually lost market share; and even though company officials were not
interested in
“proving” such a case, they may need to know that it is true. It is recommended that, if in
doubt, business researchers should use a two-tailed test.

Substantive Hypotheses
In testing a statistical hypothesis, a business researcher reaches a conclusion based on the data
obtained in the study. If the null hypothesis is rejected and therefore the alternative hypothesis
is accepted, it is common to say that a statistically significant result has been obtained.



For example, in the market share problem, if the null hypothesis is rejected, the result is that
the market share is “significantly greater” than 18%. The word significant to statisticians and
business researchers merely means that the result of the experiment is unlikely due to chance
and a decision has been made to reject the null hypothesis. However, in everyday business
life, the word significant is more likely to connote “important” or “a large amount.” One
problem that can arise in testing statistical hypotheses is that particular characteristics of the
data can result in a statistically significant outcome that is not a significant business outcome.

2. Explain Type I and Type II Errors with example

Ans-
Type I and Type II Errors
Because the hypothesis testing process uses sample statistics calculated from random data to
reach conclusions about population parameters, it is possible to make an incorrect decision
about the null hypothesis. In particular, two types of errors can be made in testing hypotheses:
Type I error and Type II error.
A Type I error is committed by rejecting a true null hypothesis. With a Type I error, the null
hypothesis is true, but the business researcher decides that it is not.
As an example, suppose the flour-packaging process
actually is “in control” and is averaging 40 ounces of flour per package. Suppose also that a
business researcher randomly selects 100 packages, weighs the contents of each, and
computes a sample mean. It is possible, by chance, to randomly select 100 of the more
extreme packages (mostly heavy weighted or mostly light weighted) resulting in a mean that
falls in the rejection region. The decision is to reject the null hypothesis even though the
population mean is actually 40 ounces. In this case, the business researcher has committed a
Type I error.
The notion of a Type I error can be used outside the realm of statistical hypothesis
testing in the business world. For example, if a manager fires an employee because some
evidence indicates that she is stealing from the company and if she really is not stealing from
the company, then the manager has committed a Type I error.
As another example, suppose a worker on the assembly line of a large manufacturer hears an
unusual sound and decides to shut the line down (reject the null hypothesis). If the sound
turns out not to be related to the assembly line and no problems are occurring with the
assembly line, then the worker has committed a Type I error.
The probability of committing a Type I error is called alpha ( α) or level of significance.
Alpha equals the area under the curve that is in the rejection region beyond the critical
value(s). The value of alpha is always set before the experiment or study is undertaken. As



mentioned previously, common values of alpha are .05, .01, .10, and .001.

A Type II error is committed when a business researcher fails to reject a false null hypothesis.
In this case, the null hypothesis is false, but a decision is made to not reject it.
Suppose in the case of the flour problem that the packaging process is actually producing a
population mean of 41 ounces even though the null hypothesis is 40 ounces. A sample of 100
packages yields a sample mean of 40.2 ounces, which falls in the nonrejection region. The
business decision maker decides not to reject the null hypothesis. A Type II error has been
committed. The packaging procedure is out of control and the hypothesis testing process does
not identify it.
Suppose in the business world an employee is stealing from the company. A manager sees
some evidence that the stealing is occurring but lacks enough evidence to conclude that the
employee is stealing from the company. The manager decides not to fire the employee based
on theft. The manager has committed a Type II error.
Consider the manufacturing line with the noise. Suppose the worker decides not enough noise
is heard to shut the line down, but in actuality, one of the cords on the line is unraveling,
creating a dangerous situation. The worker is committing a Type II error
The probability of committing a Type II error is beta ( β).
Unlike alpha, beta is not usually stated at the beginning of the hypothesis testing procedure.
Actually, because beta occurs only when the null hypothesis is not true, the computation of
beta varies with the many possible alternative parameters that might occur.

Power, which is equal to 1 -β , is the probability of a statistical test rejecting the null
hypothesis when the null hypothesis is false. Figure 9.5 shows the relationship between α, β,
and power.

3.Imagine a company wants to test the claim that their batteries last more than 40 hours. Using
a simple random sample of 15 batteries yielded a mean of 44.9 hours, with a standard
deviation of 8.9 hours. Test this claim using a significance level of 0.05.

The observed t value is t = (44.9 - 40) / (8.9 / √15) = 4.9 / 2.298 ≈ 2.13.
The critical value is t0.05,14 = 1.761.
Since 2.13 > 1.761, reject the null hypothesis: there is enough evidence to support the claim that the batteries last more than 40 hours.
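The same test can be reproduced in R from the summary statistics (only qt() is needed, since the raw data are not given):
```R
xbar <- 44.9; mu0 <- 40; s <- 8.9; n <- 15
t_obs  <- (xbar - mu0) / (s / sqrt(n))   # approximately 2.13
t_crit <- qt(0.95, df = n - 1)           # one-tailed critical value, 1.761
t_obs > t_crit                           # TRUE, so the null hypothesis is rejected
```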

4. A random sample of size 20 is taken, resulting in a sample mean of 25.51 and a sample standard deviation of 2.1933. Assume the data are normally distributed; use this information and α = 0.05 to test the following hypothesis.

The test is to determine whether the machine is out of control, and the shop supervisor has
not specified whether he believes the machine is producing plates that are too heavy or too
light. Thus a two-tailed test is appropriate. The following hypotheses are tested.

An α of .05 is used. Figure 9.11 shows the rejection regions. Because n = 20, the degrees of freedom for this test are 19 (20 - 1). The t distribution table is a one-tailed table but the test for this problem is two tailed, so alpha must be split, which yields α/2 = .025, the value in each
tail. (To obtain the table t value when conducting a two-tailed test, always split alpha and use
α/2.) The table t value for this example is 2.093. Table values such as this one are often written
in the following form:
t.025,19 = 2.093
Figure 9.12 depicts the t distribution for this example, along with the critical values, the
observed t value, and the rejection regions. In this case, the decision rule is to reject the

null hypothesis if the observed value of t is less than -2.093 or greater than +2.093 (in the
tails of the distribution). Computation of the test statistic yields

Because the observed t value is +1.04, the null hypothesis is not rejected. Not enough
evidence is found in this sample to reject the hypothesis that the population mean is 25 pounds.

5. A random sample of size 20 is taken, resulting in a sample mean of 16.45 and a sample
standard deviation of 3.59. Assume x is normally distributed and use this information and α
= .05 to test the following hypotheses.
H0: μ = 16    Ha: μ ≠ 16
Ans: The observed t value is t = (16.45 - 16) / (3.59 / √20) = 0.45 / 0.803 ≈ 0.56. With df = 19 and α = .05 (two tailed), the critical table value is t.025,19 = ±2.093. Because the observed t = 0.56 falls between -2.093 and +2.093, the null hypothesis is not rejected; the sample does not provide enough evidence that the population mean differs from 16.

6. To test the difference in the two methods, the managers randomly select one group of 15
newly hired employees to take the three-day seminar (method A) and a second group of 12
new employees for the two-day DVD method (method B). Table shows required data Using
α= .05, the managers want to determine whether there is a significant difference in the mean
scores of the two groups.

HYPOTHESIZE: STEP 1. The hypotheses for this test follow.


TEST: STEP 2. The statistical test to be used is formula 10.3.



STEP 3. The value of alpha is .05.
STEP 4. Because the hypotheses are = and ≠, this test is two tailed. The degrees of freedom
are 25 (15 + 12 - 2 = 25) and alpha is .05. The t table requires an alpha value for one tail only,
and, because it is a two-tailed test, alpha is split from .05 to .025 to obtain the table t value:
t.025,25 = ±2.060
The null hypothesis will be rejected if the observed t value is less than -2.060 or greater than
+2.060.
STEP 5. The sample data are given in Table 10.2. From these data, we can calculate the
sample statistics. The sample means and variances follow.

STEP 6. The observed value of t is

ACTION: STEP 7. Because the observed value, t = -5.20, is less than the lower critical table
value, t = -2.06, the observed value of t is in the rejection region. The null hypothesis is
rejected. There is a significant difference in the mean scores of the two tests.
BUSINESS IMPLICATIONS: STEP 8. Figure 10.6 shows the critical areas and the observed
t value. Note that the computed t value is -5.20, which is enough to cause the managers of the
Hernandez Manufacturing Company to reject the null hypothesis.

7. Use the data given and the eight step process to test the following hypotheses. Use 1% level
of significance.

HYPOTHESIZE: STEP 1. The hypotheses for this test follow.


H0: μ1 - μ2 = 0
H1: μ1 - μ2 < 0
TEST: STEP 2. The statistical test to be used is formula 10.3.
STEP 3. The value of alpha is .01.
STEP 4. Because the hypotheses are = and <, this test is one tailed. The degrees of freedom are 11 + 8 - 2 = 17 and alpha is .01. The t table requires an alpha value for one tail only, and because this is a one-tailed test the tabled value is used directly:
t.01,17 = 2.567
Because the alternative hypothesis is in the lower tail, the null hypothesis will be rejected if the observed t value is less than -2.567.
STEP 5. The sample data are given in Table. From these data, we can calculate the sample
statistics.

STEP 6. The observed value of t is

t = ((24.56 - 26.42) - 0) / ( √( (12.4(7) + 15.8(10)) / (11 + 8 - 2) ) · √(1/11 + 1/8) ) = -1.05



ACTION: STEP 7. Because the observed value, t = -1.05, is not less than the critical value of -2.567, it does not fall in the rejection region, and the null hypothesis is not rejected.
The shaded area represents the rejection zone, the range of t values (below -2.567) that would lead to rejection of the null hypothesis. In this analysis, the magnitude of the critical value for a significance level of 0.01 with 17 degrees of freedom is 2.567.
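A short R check of the pooled-variance calculation, with each variance paired with the same sample size as in the computation above:
```R
x1bar <- 24.56; x2bar <- 26.42
n1 <- 11; n2 <- 8
s1sq <- 15.8; s2sq <- 12.4   # so that s1sq*(n1-1) + s2sq*(n2-1) matches 15.8(10) + 12.4(7)
sp2   <- (s1sq * (n1 - 1) + s2sq * (n2 - 1)) / (n1 + n2 - 2)   # pooled variance, 14.4
t_obs <- (x1bar - x2bar) / sqrt(sp2 * (1 / n1 + 1 / n2))       # approximately -1.05
qt(0.01, df = n1 + n2 - 2)                                     # lower-tail critical value, -2.567
```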

8. To test this, we may recruit a simple random sample of 20 college basketball players and measure each of their max vertical jumps. Then, we may have each player use the training program for one month and then measure their max vertical jump again at the end of the month. To determine whether the training program increases max vertical jump, we will perform a paired samples t test at significance level α = 0.05. The sample mean of the differences is -0.95 and the sample standard deviation of the differences is 1.317.
The hypotheses for this test are:
Null Hypothesis (H0): μd = 0 (the mean difference is zero)
Alternative Hypothesis (Ha): μd < 0 (the mean difference is less than zero)

Given:
Sample mean of the differences (d̄) = -0.95
Sample standard deviation of the differences (sd) = 1.317
Sample size (n) = 20
Significance level (α) = 0.05

Steps to perform the test:
1. Set up the hypotheses: H0: μd = 0 versus Ha: μd < 0.
2. Calculate the test statistic:
t = (d̄ - D) / (sd / √n) = (-0.95 - 0) / (1.317 / √20) = -0.95 / 0.2945 ≈ -3.23
3. Determine the critical region: this is a one-tailed (lower-tail) test with df = n - 1 = 19 and α = 0.05, so the critical value is t.05,19 = -1.729. The null hypothesis will be rejected if the observed t value is less than -1.729.
4. Make a decision: the observed t = -3.23 is less than -1.729, so it falls in the rejection region and the null hypothesis is rejected. (Equivalently, the p-value associated with t = -3.23 is smaller than α = 0.05.)
5. Conclusion: there is sufficient evidence at the .05 level that the mean difference is less than zero, that is, that the training program increases the max vertical jump.
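A short R check of the computation from the summary statistics (the before/after vectors mentioned in the comment are hypothetical names for the raw data):
```R
dbar <- -0.95; sd_d <- 1.317; n <- 20
t_obs  <- dbar / (sd_d / sqrt(n))   # approximately -3.23
t_crit <- qt(0.05, df = n - 1)      # lower-tail critical value, approximately -1.729
t_obs < t_crit                      # TRUE, so the null hypothesis is rejected
# with the raw measurements, t.test(before, after, paired = TRUE, alternative = "less")
# would give the same conclusion
```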

9. Suppose a stock market investor is interested in determining whether there is a significant


difference in the P/E (price to earnings) ratio for companies from one year to the next. Assume α = .01. Assume that differences in P/E ratios are normally distributed in the population. n = 9

These data are related data because each P/E value for year 1 has a corresponding year 2
measurement on the same company. Because no prior information indicates whether P/E
ratios have gone up or down, the hypothesis tested is two tailed. Assume α=.01 Assume that
differences in P/E ratios are normally distributed in the population.
HYPOTHESIZE: STEP 1.

TEST: STEP 2. The appropriate statistical test is

STEP 3. α = .01. STEP 4. Because α = .01 and this test is two tailed, α/2 = .005 is used to obtain the table t value. With nine pairs of data, n = 9, df = n - 1 = 8. The table t value is



t.005,8 = ±3.355. If the observed test statistic is greater than 3.355 or less than -3.355, the null hypothesis will be rejected.
STEP 5. The sample data are given in Table 10.5.
STEP 6. Table 10.6 shows the calculations to obtain the observed value of the test statistic,
which is t=-0.70
ACTION: STEP 7. Because the observed t value is greater than the critical table t value in the lower tail (t = -0.70 > t = -3.355), it is in the nonrejection region.
BUSINESS IMPLICATIONS: STEP 8. There is not enough evidence from the data to declare
a significant difference in the average P/E ratio between year 1 and year 2. The graph in
Figure 10.9 depicts the rejection regions, the critical values of t, and the observed value of t
for this example.
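In R, a paired-samples t-test like this one is normally run with t.test(..., paired = TRUE). Because Table 10.5 is not reproduced here, the two vectors below are only placeholders for the year 1 and year 2 P/E values, not the textbook data:
# Hypothetical paired data; replace with the actual year 1 and year 2 P/E ratios
year1 <- c(10, 12, 9, 11, 13, 8, 10, 12, 11)    # placeholder values
year2 <- c(11, 11, 10, 12, 12, 9, 10, 13, 10)   # placeholder values
t.test(year1, year2, paired = TRUE, conf.level = 0.99)   # two-tailed paired t-test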
10.Use the data given and a 1% level of significance to test the following hypotheses.
Assume the differences are normally distributed in the population.

11. Use the data given to test the following hypotheses Assume the differences are normally
distributed in the population.
H0: D = 0 Ha: D ≠ 0
Individual Before After
1 107 102
2 99 98
3 110 100
4 113 108
5 96 89

12.Suppose a store manager wants to find out whether the results of this consumer survey
apply to customers of supermarkets in her city. To do so, she interviews 207 randomly
selected consumers as they leave supermarkets in various parts of the city. Now the manager
can use a chisquare test to determine whether the observed frequencies of responses from this

survey are the same as the frequencies that would be expected on the basis of the national
survey. (α= .05).

Hypothesis:
Step 1: H0: The observed distribution is the same as the expected distribution
Ha: The observed distribution is not the same as the expected distribution
Step 2: The statistical test being used is
χ² = Σ (f_o - f_e)² / f_e
Step 3: Let α = .05
Step 4: Chi-square goodness-of-fit tests are one tailed because a chi-square of zero indicates
perfect agreement between distributions. Any deviation from zero difference occurs in the
positive direction only because chi-square is determined by a sum of squared values and can
never be negative.
Here k=4
Degree of freedom, ie, df = k – 1 = 4 – 1 = 3
χ².05,3 = 7.8147
After the data are analyzed, an observed chi-square greater than 7.8147 must be computed in
order to reject the null hypothesis.
Step 5 : The observed values gathered in the sample data from Table sum to 207. Thus n =
207. The expected proportions are given, but the expected frequencies must be calculated by
multiplying the expected proportions by the sample total of the observed frequencies.

Step 6: Calculating Chi-square

ACTION: Step 7: Because the observed value of chi-square of 6.25 is not greater than the
critical table value of 7.8147, the store manager will not reject the null hypothesis.
BUSINESS IMPLICATIONS: Step 8: Thus, the data gathered in the sample of 207
supermarket shoppers indicate that the distribution of responses of supermarket shoppers in
the manager’s city is not significantly different from the distribution of responses to the
national survey. The store manager may conclude that her customers do not appear to have
attitudes different from those people who took the survey.
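As a check, the critical value and decision can be reproduced in R; the observed chi-square of 6.25 is taken from the worked solution above (a sketch, since the raw response counts are not reproduced here):
# Critical value for a chi-square goodness-of-fit test with k - 1 = 3 df at alpha = .05
qchisq(0.95, df = 3)          # 7.8147
6.25 > qchisq(0.95, df = 3)   # FALSE, so the null hypothesis is not rejected
# With the raw counts, the full test would be chisq.test(observed, p = expected_proportions)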

13.Use chi-square test to determine whether the observed frequencies are distributed the same
as the expected frequencies. (α= .05).

Hypothesis:
Step 1: H0: The observed distribution is the same as the expected distribution
Ha: The observed distribution is not the same as the expected distribution
Step 2: The statistical test being used is

χ² = Σ (f_o - f_e)² / f_e
Step 3: Let α = .05
Step 4: Chi-square goodness-of-fit tests are one tailed because a chi-square of zero indicates
perfect agreement between distributions. Any deviation from zero difference occurs in the
positive direction only because chi-square is determined by a sum of squared values and can
never be negative.
Here k = 6
Degrees of freedom, i.e., df = k - 1 = 6 - 1 = 5
χ².05,5 = 11.0705
After the data are analyzed, an observed chi-square greater than 11.0705 must be computed
in order to reject the null hypothesis.
Step 5:

ACTION: Step 7: Because the observed chi-square value of 12.4802 is greater than the critical table value of 11.0705, the decision is to reject the null hypothesis.

14.Use chi-square test to determine whether the observed frequencies represent a uniform
distribution. (α= .01)

15. Dairies would like to know whether the sales of milk are distributed uniformly over a year
so they can plan for milk production and storage. A uniform distribution means that the
frequencies are the same in all categories. In this situation, the producers are attempting to
determine whether the amounts of milk sold are the same for each month of the year. They
ascertain the number of gallons of milk sold by sampling one large supermarket each month
during a year, obtaining the following data. Use α=.01 to test whether the data fit a uniform
distribution. (Using Chi-square Test)

Answer:
H0: The monthly figures for milk sales are uniformly distributed.

Ha: The monthly figures for milk sales are not uniformly distributed.
The statistical test used is χ² = Σ (f_o - f_e)² / f_e
Alpha is .01.
There are 12 categories and a uniform distribution is the expected distribution, so the degrees of freedom are k - 1 = 12 - 1 = 11. For α = .01, the critical value is χ².01,11 = 24.725.

An observed chi-square value of more than 24.725 must be obtained to reject null hypothesis.
The expected monthly figure is,
18,447/12=1537.25 gallons
The following table shows the observed frequencies, the expected frequencies and chi-square
calculations.

The observed χ² value of 74.37 is greater than the critical table value of χ².01,11 = 24.725.
So, the decision is to reject null hypothesis. This problem provides enough evidence that the
distribution of milk sales is not uniform.
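The pieces of this calculation are easy to verify in R (a sketch using only the figures quoted above):
# Expected monthly sales under a uniform distribution and the critical value at alpha = .01
18447 / 12                      # 1537.25 gallons expected per month
qchisq(0.99, df = 12 - 1)       # critical value, 24.725
74.37 > qchisq(0.99, df = 11)   # TRUE, so the null hypothesis is rejected
# With the 12 observed monthly totals in a vector, chisq.test(monthly_sales) tests uniformity directly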

16.Construct one way ANOVA table for following data.

Answer:
Tj: T1 = 12, T2 = 23, T3 = 25, T = 60
nj: n1 = 6, n2 = 5, n3 = 6, N = 17
x̄j: x̄1 = 2, x̄2 = 4.6, x̄3 = 4.17, grand mean x̄ = 3.59
SSC= [6(2-3.59)2+5(4.6-3.59)2+6(4.17-3.59)2]
= [6(-1.59)2+5(1.01)2+6(0.58)2]
= [6(2.53) +5(1.02) +6(0.34)]
= 22.32
SSE= [(2-2)2+(1-2)2+(3-2)2+(3-2)2+(2-2)2+(1-2)2+(5-4.6)2+(3-4.6)2+(6-4.6 )2 +(4-
4.6)2+(5-4.6)2+(3-4.17)2+(4-4.17)2+(5-4.17)2+(5-4.17)2+(3-4.17 )2+ (5-4.17)2]
= [ (0)2 +(-1)2 +(1)2 +(1)2 +(0)2 +(-1)2 +(0.4)2 +(-1.6)2 +(1.4)2 +(0.6)2 +(0.4)2 +(-1.17)2
+(-0.17)2 +(0.83)2 +(0.83)2 +(-1.17)2 +(0.83)2]
= [0+1+1+1+0+1+0.16+2.56+1.96+0.36+0.16+1.3689+
0.0289+0.6889+0.6889+1.3689+0.6889]
= 14.0334
SST= [(2-3.59)2+(1-3.59)2+(3-3.59)2+(3-3.59)2+(2-3.59)2+(1-3.59)2+(5-3.59)2+(3-
3.59)2+(6-3.59)2 +(4-3.59)2+(5-3.59)2+(3-3.59)2+(4-3.59)2+(5-3.59)2+(5-3.59)2+(3-

3.59)2+ (5-3.59)2]
= [ (-1.59)2 +(-2.59)2 +(-0.59)2 +(-0.59)2 +(-1.59)2 +(-2.59)2 +(1.41)2 +(-0.59)2 +(2.41)2
+(0.41)2 +(1.41)2 +(-0.59)2 +(0.41)2 +(1.41)2 +(1.41)2 +(0.59)2 +(1.41)2]
= [2.5281+6.7081 +0.3481 +0.3481 +2.5281 +6.7081+1.9881+0.3481 +5.808
+0.1681+1.9881+0.3481+0.1681+1.9881 +1.9881+0.3481+1.9881]
= 36.2977
dfc=C-1=3-1=2
dfe=N-C=17-3=14
dft=N-1=17-1=16
MSC=SSC/dfc=22.32/2=11.16
MSE=SSE/dfe=14.0334/14=1.0023
F = MSC/MSE = 11.16/1.0023 = 11.13
Source of Variance   SS        df   MS       F
Between              22.32     2    11.16    11.13
Error                14.0334   14   1.0023
Total                36.2977   16
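The same ANOVA can be run in R with aov(). The observations below are read off the squared terms in the SSE calculation above; note that R keeps the grand mean exact, so its sums of squares differ slightly from the hand calculation, which rounds the grand mean to 3.59:
# One-way ANOVA on the three groups used in the calculation above
scores <- c(2, 1, 3, 3, 2, 1,     # group 1 (n = 6)
            5, 3, 6, 4, 5,        # group 2 (n = 5)
            3, 4, 5, 5, 3, 5)     # group 3 (n = 6)
group <- factor(rep(c("g1", "g2", "g3"), times = c(6, 5, 6)))
summary(aov(scores ~ group))      # F comes out around 11 on 2 and 14 df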

17. A company has three manufacturing plants, and company officials want to determine
whether there is a difference in the average age of workers at the three locations. The
following data are the ages of five randomly selected workers at each plant. Perform a one-
way ANOVA to determine whether there is a significant difference in the mean ages of the
workers at the three plants. Use α = .01 and note that the sample sizes are equal.

Answer:
H0: µ1 =µ2 =µ3
Ha: At least one of the means is different from the others.
The appropriate test statistic is the F test calculated from ANOVA.
The value of α is .01.
The degree of freedom for this problem are 3-1=2 for the numerator and 15-3=12 for the

denominator. The critical F value is F.01,2,12=6.93.
Because ANOVAs are always one tailed with the rejection region in the upper tail, the
decision rule is to reject the null hypothesis if the observed value of F is greater than 6.93.

SSC= [5(28.2-28.33)2+5(32-28.33)2+5(24.8-28.33)2]
= 129.73
SSE= [(29-28.2)2+(27-28.2)2+(30-28.2)2+(27-28.2)2+(28-28.2)2+(32-32)2 +(33-32)2
+(31-32)2 +(34-32)2 +(30-32)2 +(25-24.8) 2+(24-24.8)2+(24-24.8)2+(25-24.8)2+ (26-
24.8)2]
=19.60
SST= [(29-28.33)2+(27-28.33)2+(30-28.33)2 +(27-28.33)2+(28-28.33)2+(32-28.33)2+(33-
28.33)2+(31-28.33)2+(34-28.33)2+(30-28.33)2+(25-28.33)2+(24-28.33)2+(24-
28.33)2+(25-28.33)2+(26-28.33)2]
=149.33
dfc=C-1=3-1=2
dfe=N-C=15-3=12
dft=N-1=15-1=14
MSC=SSC/dfc=129.73/2=64.87
MSE=SSE/dfe=19.60/12=1.63
F= MSC/MSE=64.87/1.63=39.80

The decision is to reject the null hypothesis because the observed F value of 39.80 is greater
than the critical table F value of 6.93.
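The critical value and p-value for this F statistic can be confirmed in R:
# Critical F and p-value for the one-way ANOVA above (df = 2 and 12, alpha = .01)
qf(0.99, df1 = 2, df2 = 12)                        # critical value, about 6.93
pf(39.80, df1 = 2, df2 = 12, lower.tail = FALSE)   # p-value, far below .01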

18. Construct one way ANOVA table for following data.

Answer:

Source of Variance SS df MS F
Between 0.23658 3 0.078860 10.18
Error 0.15492 20 0.007746
Total 0.39150 23

19.A survey of the morning beverage market shows that the primary breakfast beverage for
17% of Americans is milk. A milk producer in Wisconsin, where milk is plentiful, believes
the figure is higher for Wisconsin. To test this idea, she contacts a random sample of 550
Wisconsin residents and asks which primary beverage they consumed for breakfast that day.
Suppose 115 replied that milk was the primary beverage. Using a level of significance of .05,
test the idea that the milk figure is higher for Wisconsin.

HYPOTHESIZE:
STEP 1. The milk producer’s theory is that the proportion of Wisconsin residents who drink
milk for breakfast is higher than the national proportion, which is the alternative hypothesis.
The null hypothesis is that the proportion in Wisconsin does not differ from the national
average. The hypotheses for this problem are

STEP 2. The test statistic is

STEP 3. The Type I error rate is .05.


STEP 4. This test is a one-tailed test, and the table value is z.05 = +1.645. The sample results must yield an observed z value greater than 1.645 for the milk producer to reject the null hypothesis. The following diagram shows z.05 and the rejection region for this problem.

ACTION:
STEP 7. Because z = 2.44 is beyond z.05 = 1.645 in the rejection region, the milk producer rejects the null hypothesis. The probability of obtaining a z ≥ 2.44 by chance is .0073.

BUSINESS IMPLICATIONS:
STEP 8. The proportion of residents who drink milk for breakfast appears to be higher in Wisconsin than in other parts of the United States.
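A minimal R sketch of this one-sample proportion test, using the figures given in the question:
# z-test for a single proportion: H0 p = .17 vs Ha p > .17
phat <- 115 / 550                              # sample proportion, about 0.209
z <- (phat - 0.17) / sqrt(0.17 * 0.83 / 550)   # observed z, about 2.44
qnorm(0.95)                                    # critical value, 1.645
pnorm(z, lower.tail = FALSE)                   # p-value, about .0073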

20. A manufacturer believes exactly 8% of its products contain at least one minor flaw.
Suppose a company researcher wants to test this belief. The null and alternative hypotheses
are
The business researcher randomly selects a sample of 200 products, inspects each item for
flaws, and determines that 33 items have at least one minor flaw. Calculating the sample
proportion. (α=.10)

This test is two-tailed because the hypothesis being tested is whether the proportion of products with at least one minor flaw is .08. Alpha is selected to be .10. Figure 9.15 shows the distribution, with the rejection regions and z.05. Because α is divided between the two tails, the table value for an area of (1/2)(.10) = .05 is z.05 = ±1.645.
For the business researcher to reject the null hypothesis, the observed z value must be greater
than 1.645 or less than -1.645. The business researcher randomly selects a sample of 200
products, inspects each item for flaws, and determines that 33 items have at least one minor
flaw. Calculating the sample proportion gives

The observed value of z is in the rejection region (observed z = 4.43 > table z.05 = +1.645), so the business researcher rejects the null hypothesis. He concludes that the proportion of items with at least one minor flaw in the population from which the sample of 200 was drawn is not .08. With α = .10, the risk of committing a Type I error in this example is .10.
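The equivalent test in R can be run with prop.test(); without the continuity correction its X-squared statistic is the square of the z value computed above:
# Two-tailed test of H0: p = .08 with x = 33 flawed items out of n = 200
prop.test(33, 200, p = 0.08, correct = FALSE)
sqrt(prop.test(33, 200, p = 0.08, correct = FALSE)$statistic)   # about 4.43, matching z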

21. Using the given sample information, test following hypotheses. Note that x is the number
in the sample having the characteristics of interest.

=

22.Using the given sample information, test following hypotheses. Note that x is the number
in the sample having the characteristics of interest.

23.Suppose you decide to test this result by taking a survey of your own and identify female
entrepreneurs by gross sales. You interview 100 female entrepreneurs with gross sales of less
than $100,000, and 24 of them define sales/profit as success. You then interview 95 female
entrepreneurs with gross sales of $100,000 to $500,000, and 39 cite sales/profit as a definition
of success. Use this information to test to determine whether there is a significant difference
in the proportions of the two groups that define success as sales/profit. Use α=.01
Step 1: Since we are testing to determine whether there is a difference between the two groups
of entrepreneurs, a two-tailed test is required. The hypotheses follow.
H0: p1 − p2 = 0
Ha: p1 − p2 ≠ 0

Step 2: At step 2, the appropriate statistical test and sampling distribution are determined.
Because we are testing the difference in two population proportions, the z test in Formula
10.10 is the appropriate test statistic.

Step 3: At step 3, the Type I error rate, or alpha, which is .01, is specified for this problem
Step 4: With α = .01, the critical z value can be obtained from Table A.5 for α/2 = .005, z.005
= ±2.575. The decision rule is that if the data produce a z value greater than 2.575 or less than
−2.575, the test statistic is in the rejection region and the decision is to reject the null
hypothesis. If the observed z value is less than 2.575 but greater than −2.575, the decision is
to not reject the null hypothesis because the observed z value is in the nonrejection region.
Step 5: The sample information follows:
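Since the raw counts are given in the question (24 of 100 and 39 of 95), the remaining steps can be sketched in R with prop.test(); without the continuity correction its statistic is z squared:
# Two-tailed test of H0: p1 - p2 = 0 at alpha = .01
res <- prop.test(c(24, 39), c(100, 95), correct = FALSE)
res$statistic    # X-squared is about 6.48, i.e. |z| is about 2.55, inside (-2.575, 2.575)
res$p.value      # about .011, which is greater than .01, so H0 is not rejected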

24.A group of researchers attempted to determine whether there was a difference in the

proportion of consumers and the proportion of CEOs who believe that fear of getting caught
or losing one’s job is a strong influence of ethical behavior. In their study, they found that
57% of consumers said that fear of getting caught or losing one’s job was a strong influence
on ethical behavior, but only 50% of CEOs felt the same way. Suppose these data were
determined from a sample of 755 consumers and 616 CEOs. Does this result provide enough
evidence to declare that a significantly higher proportion of consumers than of CEOs believe
fear of getting caught or losing one’s job is a strong influence on ethical behavior?( α=0.10).
HYPOTHESIZE: STEP 1. Suppose sample 1 is the consumer sample and sample 2 is the
CEO sample. Because we are trying to prove that a higher proportion of consumers than of
CEOs believe fear of getting caught or losing one’s job is a strong influence on ethical
behaviour, the alternative hypothesis should be p1 - p2 > 0. The following hypotheses are being tested:
H0: p1 − p2 = 0
Ha: p1 − p2 > 0
where p1 is the proportion of consumers who select the factor and
p2 is the proportion of CEOs who select the factor.
TEST: STEP 2. The appropriate statistical test is formula 10.10.

STEP 3. Let α=0.10


STEP 4. Because this test is a one-tailed test, the critical table z value is z 0.10 =1.28. If an
observed value of z of more than 1.28 is obtained, the null hypothesis will be rejected. Figure
10.12 shows the rejection region and the critical value for this problem.
STEP 5. The sample information follows

ACTION: STEP 7. Because z =2.59 is greater than the critical table z value of 1.28 and is in
the rejection region, the null hypothesis is rejected.
BUSINESS IMPLICATIONS: STEP 8. A significantly higher proportion of consumers than
of CEOs believe fear of getting caught or losing one’s job is a strong influence on ethical
behaviour
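The observed z of 2.59 can be reproduced in R from the summary figures, using the pooled-proportion formula (a minimal sketch):
# One-tailed test of H0: p1 - p2 = 0 vs Ha: p1 - p2 > 0
p1 <- 0.57; n1 <- 755       # consumers
p2 <- 0.50; n2 <- 616       # CEOs
pbar <- (p1 * n1 + p2 * n2) / (n1 + n2)                    # pooled proportion
z <- (p1 - p2) / sqrt(pbar * (1 - pbar) * (1/n1 + 1/n2))   # about 2.59
qnorm(0.90)                                                # critical value, 1.28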

25.Explain Types of Correlation with example.


Correlation is a statistical measure that describes the extent to which two variables change
together. There are different types of correlation coefficients, and each indicates the strength
and direction of the relationship between two variables. In R programming, you can calculate
correlation using functions like `cor()`.
Here are three types of correlation coefficients commonly used:

1. Pearson Correlation Coefficient:
- Measures the linear relationship between two variables.
- Values range from -1 to 1.
- A value of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative
linear relationship, and 0 indicates no linear relationship.
# Example data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
pearson_corr <- cor(x, y, method = "pearson")
print(pearson_corr)
2. Spearman Rank Correlation Coefficient:
- Measures the monotonic relationship between two variables.
- It is based on the ranks of the data rather than the actual values.
- Values range from -1 to 1.
# Example data
x <- c(10, 20, 30, 40, 50)
y <- c(5, 15, 25, 35, 45)
spearman_corr <- cor(x, y, method = "spearman")
print(spearman_corr)
3.Kendall Rank Correlation Coefficient:
- Measures the strength and direction of the monotonic relationship between two variables.
- Like Spearman, it is based on the ranks of the data.
- Values also range from -1 to 1.
# Example data
x <- c(3, 1, 4, 2, 5)
y <- c(15, 10, 25, 5, 30)
kendall_corr <- cor(x, y, method = "kendall")
print(kendall_corr)

In these examples, `x` and `y` are vectors representing two variables. You can replace these
vectors with your own data. The `cor()` function is used to calculate the correlation, and the
`method` parameter specifies the type of correlation coefficient to be computed.

26. Explain types of correlation with respect to the correlation coefficient, with conditions and suitable scatter diagrams.

1. Positive Correlation:
Positive correlation occurs when two variables tend to increase or decrease together. In other

words, as the value of one variable increases, the value of the other variable also tends to
increase, and vice versa. The correlation coefficient for positive correlation is close to 1.
# Example data for positive correlation
hours_studied <- c(2, 4, 6, 8, 10, 12)
exam_scores <- c(60, 70, 80, 85, 90, 95)
pearson_corr <- cor(hours_studied, exam_scores, method = "pearson")
plot(hours_studied, exam_scores, main = paste("Positive Correlation (r =",
round(pearson_corr, 2), ")"),
xlab = "Hours Studied", ylab = "Exam Scores")
2. Negative Correlation:
Negative correlation occurs when two variables move in opposite directions. In other words,
as the value of one variable increases, the value of the other variable tends to decrease, and
vice versa. The correlation coefficient for negative correlation is close to -1.
# Example data for negative correlation
temperature <- c(30, 25, 20, 15, 10, 5)
jacket_sales <- c(10, 15, 20, 25, 30, 35)
pearson_corr <- cor(temperature, jacket_sales, method = "pearson")
plot(temperature, jacket_sales, main = paste("Negative Correlation (r =",
round(pearson_corr, 2), ")"),
xlab = "Outdoor Temperature (°C)", ylab = "Jacket Sales")
3. No/Weak Correlation:
No or weak correlation occurs when there is little to no linear relationship between two
variables. The correlation coefficient in such cases is close to 0.
# Example data for weak or no correlation
tv_hours <- c(1, 2, 3, 4, 5, 6)
water_consumption <- c(2, 3, 3, 2, 4, 3)
pearson_corr <- cor(tv_hours, water_consumption, method = "pearson")
plot(tv_hours, water_consumption, main = paste("Weak/No Correlation (r =",
round(pearson_corr, 2), ")"),
xlab = "TV Hours", ylab = "Water Consumption (liters)")

4.Non-Linear Correlation:
Non-linear correlation occurs when there is a relationship between two variables that is not
well described by a straight line. In such cases, a non-linear correlation coefficient, such as
Spearman or Kendall, might be more appropriate.
# Example data for non-linear correlation
diameter <- c(1, 2, 3, 4, 5)
area <- c(pi * (0.5 * diameter)^2)
spearman_corr <- cor(diameter, area, method = "spearman")

plot(diameter, area, main = paste("Non-linear Correlation (rho =", round(spearman_corr, 2),
")"),
xlab = "Diameter", ylab = "Area")

27.From following information find the correlation coefficient between advertisement


expenses and sales volume using Karl Pearson’s coefficient of correlation method (Direct
Method).
Firm:                               1   2   3   4   5   6   7   8   9   10
Advertisement Exp. (Rs. in lakhs): 11  13  14  16  16  15  15  14  13   13
Sales Volume (Rs. in lakhs):       50  50  55  60  65  65  65  60  60   50

Solution:
Formula:
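The worked calculation does not appear here, but the coefficient can be obtained directly in R from the data in the question; it should come out to roughly 0.79:
# Karl Pearson's coefficient of correlation for advertisement expenses vs sales volume
adv <- c(11, 13, 14, 16, 16, 15, 15, 14, 13, 13)
sales <- c(50, 50, 55, 60, 65, 65, 65, 60, 60, 50)
cor(adv, sales, method = "pearson")   # about 0.79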

28.Calculate the Karl Pearson’s product moment of coefficient of correlation.

Solution:
Formula:

Firm   X (Interest)   Y      X²        Y²       XY
1      7.43           221    55.205    48841    1642.03
2      7.48           222    55.950    49284    1660.56
3      8.00           226    64.000    51076    1808.00
4      7.75           225    60.063    50625    1743.75
5      7.60           224    57.760    50176    1702.40
       38.26          1118   292.978   250002   8556.74
       ∑X             ∑Y     ∑X²       ∑Y²      ∑XY

r = (n∑XY - ∑X∑Y) / √{[n∑X² - (∑X)²][n∑Y² - (∑Y)²]}

= (5 × 8556.74 - 38.26 × 1118) / √{[5 × 292.978 - (38.26)²][5 × 250002 - (1118)²]}

= (42783.70 - 42774.68) / √[(1464.89 - 1463.83)(1250010 - 1249924)]

= 9.02 / √(1.06 × 86)

= 9.02 / 9.55

≈ 0.94

29. Calculate the Karl Pearson’s product moment of coefficient of correlation.

Solution:


= [229 - (11.49)(9.857)] / √{[1148 - (11.42)²][815 - (9.857)²]}
= 501.433 / √(1017.58 × 717.83)
= 501.43 / 854.663
= 0.58670

30. Determine the Karl Pearson’s product moment of coefficient of correlation.

X Y X2 Y2 XY

4 18 16 324 72
6 12 36 144 72
7 13 49 169 91
11 8 121 64 88
14 7 196 49 98
17 7 289 49 119
21 4 441 16 84
∑X = 80   ∑Y = 69   ∑X² = 1148   ∑Y² = 815   ∑XY = 624

r_xy = (n∑XY - ∑X∑Y) / √{[n∑X² - (∑X)²][n∑Y² - (∑Y)²]}
= (7 × 624 - 80 × 69) / √{[(7 × 1148) - 6400][(7 × 815) - 4761]}
= -1152 / √(1636 × 944)
= -1152 / 1242.73
≈ -0.93

31. Determine the Karl Pearson’s product moment of coefficient of correlation.

X Y X2 Y2 XY
158 349 24964 121801 55142
296 510 87616 260100 150960
87 301 7569 90601 26187
110 322 12100 103684 35420
436 550 190096 302500 239800
∑X = 1087   ∑Y = 2032   ∑X² = 322345   ∑Y² = 878686   ∑XY = 507509

r_xy = (n∑XY - ∑X∑Y) / √{[n∑X² - (∑X)²][n∑Y² - (∑Y)²]}
= ((5 × 507509) - (1087 × 2032)) / √{[(5 × 322345) - 1181569][(5 × 878686) - 4129024]}
= 328761 / √(430156 × 264406)
≈ 0.9748

32. Find the two regression equation of X on Y and Y on X from the following data:

Solution:

33.Compute the regression equation of y on x from the following data

X 2 4 5 6 8 11
Y 18 12 10 8 7 5
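The worked solution is not shown above; in R the regression of y on x can be fitted with lm() (a sketch using the data from the question):
# Regression equation of y on x
x <- c(2, 4, 5, 6, 8, 11)
y <- c(18, 12, 10, 8, 7, 5)
fit <- lm(y ~ x)
coef(fit)   # intercept about 18.04 and slope about -1.34, i.e. y = 18.04 - 1.34x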

34.
Find the regression equation of x on y and predict the value of x when y is 9.
X 3 6 5 4 4 6 7 5
Y 3 2 3 5 3 6 6 4

Solution:
X Y Y2 XY
3 3 9 9
6 2 4 12
5 3 9 15
4 5 25 20
4 3 9 12
6 6 36 36
7 6 36 42
5 4 16 20
∑X=40 ∑Y=32 ∑ Y2=144 ∑XY=166
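The remaining steps can be sketched in R: fitting x on y works out to x = 3.5 + 0.375y, so the predicted x at y = 9 is 6.875.
# Regression of x on y and prediction at y = 9
d <- data.frame(x = c(3, 6, 5, 4, 4, 6, 7, 5),
                y = c(3, 2, 3, 5, 3, 6, 6, 4))
fit <- lm(x ~ y, data = d)
coef(fit)                                   # intercept 3.5, slope 0.375
predict(fit, newdata = data.frame(y = 9))   # 6.875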
