
Research Methodology Lab File

The document provides information about various functions in Microsoft Excel including COUNT, COUNTA, COUNTBLANK, SUM, MAX, MIN, AVERAGE, COUNTIF, SUMIF, AVERAGEIF, CONCATENATE, VLOOKUP, HLOOKUP and other tools like transposing tables, text to column, conditional formatting, formatting as tables, formatting cells, cell styles, data validation, customizing the ribbon and quick access toolbar, and the backstage view. Examples of how to use each function are also provided.

Uploaded by

Alex T Binoy

Research Methodology Lab

(Using MS Excel and R)

PRACTICAL FILE

Submitted for partial fulfillment for the award of the


Degree of

BACHELOR OF BUSINESS ADMINISTRATION

(BBA 2020 - 2023)

Under the supervision of


Dr. Rubeena Bano

Submitted by
NAME : ALEX T BINOY
ENROLMENT NO. : 00516701720

SIRIFORT INSTITUTE OF MANAGEMENT STUDIES


(Affiliated to Guru Gobind Singh Indraprastha University)

INDEX
Topic
Functions in Excel
Count
CountA
CountBlank
Sum
Max
Min
Average
CountIf
SumIf
AverageIf
Concatenate
VLookup
VLookup + Dropdown
HLookup

Other tools
Transpose table
Text to Columns
Conditional Formatting – Highlight Cell rules (greater than, less than, between, equal to, text that contains, a date occurring, duplicate values)
Conditional Formatting – Top/Bottom rules
Conditional Formatting – Data Bars
Conditional Formatting – Color Scales
Format as Tables
Format Cells – Number, Alignment, Font, Border, Fill
Cell Styles
Data validation – settings (any value, number, custom)
Data validation – input message
Data validation – error alert
Customization – ribbon
Customization – quick access toolbar
Backstage view
Save as Adobe PDF

Data Visualization and Analysis
Frequency
Relative frequency
Percentage frequency
Bar Graph
Histogram using Graph tab
Pivot Table and its tools
Pivot Chart and its tools

Histogram frequency distribution
Histogram – Chart output
Histogram – Pareto (sorted diagram)
Histogram – Cumulative percentage
Descriptive statistics
Descriptive statistics for various scales
Correlation

Hypothesis Testing
One sample t test using dummy (one-tailed)
One sample t test using dummy (two-tailed)
One sample t test using test average (one-tailed)
One sample t test using test average (two-tailed)
t test using function (all combinations)
Two sample - Independent sample t test
Two sample - Paired Sample t test
One sample z test
Two sample z test
ANOVA – Single Factor
ANOVA – Two Factor without replication
ANOVA – Two Factor with replication
F test
Chi square test

Introduction to R
Four Panes in R
Import of Data Sheet in Excel
Descriptive statistics
Correlation
Hypothesis Testing
One sample t test
Two sample - Independent sample t test
Two sample - Paired Sample t test
One way ANOVA
F test
Chi square test

FUNCTIONS IN EXCEL

Data - Enter the data for 26 random people.

S.NO. NAME DESIGNATION SALARY LEAVES


1 A Vice President 100000 4
2 B President 200000 3
3 C Manager 90000 2
4 D President 200000 3
5 E Manager 80000 6
6 F Manager 90000
7 G Vice President 100000 4
8 H President 200000 5
9 I Manager 65000 2
10 J President 200000 NA
11 K President 200000 3
12 L President 700000 11
13 M President 200000 3
14 N President 350000 15
15 O Manager 90000 2
16 P Manager 50000
17 Q Manager 90000 2
18 R Manager 90000 6
19 S Vice President 71000 4
20 T Vice President 100000 4
21 U Manager 90000 5
22 V Vice President 100000 4
23 W Manager 90000 2
24 X President 200000 3
25 Y President 200000 3
26 Z Manager 90000 2

1. COUNT FUNCTION
(A) MEANING: The COUNT function counts the number of cells that contain numbers, and
counts numbers within the list of arguments. Use the COUNT function to get the number of
entries in a number field that is in a range or array of numbers.

(B) SYNTAX: =COUNT(value1, value2,…)

(C) EXAMPLE:

2. COUNTA FUNCTION
(A) MEANING: The COUNTA function counts cells containing any type of information,
including error values and empty text (""). If you do not need to count logical values, text,
or error values (in other words, if you want to count only cells that contain numbers), use the
COUNT function.

(B) SYNTAX: =COUNTA(value1, value2,…)

(C) EXAMPLE:

3. COUNTBLANK FUNCTION
(A) MEANING: The Microsoft Excel COUNTBLANK function counts the number of empty
cells in a range. It can be used as a worksheet function (WS) in Excel. As a worksheet
function, the COUNTBLANK function can be entered as part of a formula in a cell of a
worksheet.

(B) SYNTAX: =COUNTBLANK(range)

(C) EXAMPLE:
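
The three counting functions above can be illustrated outside Excel. The following is a minimal Python sketch, not the worksheet example itself; it assumes the two missing LEAVES entries in the data table are empty cells (represented here as None), and the function names are illustrative only.

```python
# LEAVES column from the 26-row data table.
# Assumption: the two missing entries are empty cells (None); "NA" is text.
leaves = [4, 3, 2, 3, 6, None, 4, 5, 2, "NA", 3, 11, 3, 15, 2, None,
          2, 6, 4, 4, 5, 4, 2, 3, 3, 2]


def count(cells):
    """Like COUNT: number of cells that hold numbers."""
    return sum(isinstance(c, (int, float)) and not isinstance(c, bool)
               for c in cells)


def counta(cells):
    """Like COUNTA: number of cells that are not empty."""
    return sum(c is not None for c in cells)


def countblank(cells):
    """Like COUNTBLANK: number of empty cells."""
    return sum(c is None for c in cells)


print(count(leaves), counta(leaves), countblank(leaves))
```

Under that assumption the text entry "NA" is counted by counta but not by count, which is the difference between the two functions described above.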

4. SUM FUNCTION
(A) MEANING: The Microsoft Excel SUM function adds all numbers in a range of cells and
returns the result. The SUM function is a built-in function in Excel that is categorized as a
Math/Trig Function. As a worksheet function, the SUM function can be entered as part of a
formula in a cell of a worksheet.

(B) SYNTAX: =SUM(number1, number2,…)

(C) EXAMPLE:

5. MAX FUNCTION
(A) MEANING: The Microsoft Excel MAX function returns the largest value from the
numbers provided. It can be used as a worksheet function (WS) in Excel. As a worksheet
function, the MAX function can be entered as part of a formula in a cell of a worksheet.

(B) SYNTAX: =MAX(number1, number2,…)

(C) EXAMPLE:

6. MIN FUNCTION
(A) MEANING: The Microsoft Excel MIN function returns the smallest value from the
numbers provided. The MIN function is a built-in function in Excel that is categorized as a
Statistical Function. It can be used as a worksheet function (WS) in Excel.

(B) SYNTAX: =MIN(number1, number2,…)

(C) EXAMPLE:

7. AVERAGE FUNCTION
(A) MEANING: The Microsoft Excel AVERAGE function returns the average (arithmetic
mean) of the numbers provided. The AVERAGE function is a built-in function in Excel that
is categorized as a Statistical Function. It can be used as a worksheet function (WS) in Excel.

(B) SYNTAX: =AVERAGE(number1, number2,…)

(C) EXAMPLE:
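
As a cross-check on SUM, MAX, MIN and AVERAGE, the same aggregates can be computed on the SALARY column of the data table. This is a small Python sketch using built-ins, offered as an illustration rather than the worksheet example; the variable names are hypothetical.

```python
from statistics import mean

# SALARY column from the 26-row data table.
salary = [100000, 200000, 90000, 200000, 80000, 90000, 100000, 200000,
          65000, 200000, 200000, 700000, 200000, 350000, 90000, 50000,
          90000, 90000, 71000, 100000, 90000, 100000, 90000, 200000,
          200000, 90000]

total = sum(salary)      # like =SUM(range)
highest = max(salary)    # like =MAX(range)
lowest = min(salary)     # like =MIN(range)
average = mean(salary)   # like =AVERAGE(range)

print(total, highest, lowest, round(average, 2))
```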

8. COUNTIF FUNCTION

(A) MEANING: The Microsoft Excel COUNTIF function counts the number of cells in a
range that meet a given criterion. It can be used as a worksheet function (WS) in Excel. As
a worksheet function, the COUNTIF function can be entered as part of a formula in a cell of a
worksheet.

(B) SYNTAX: =COUNTIF(range, criteria)

(C) EXAMPLE:

9. COUNTIFS FUNCTION
(A) MEANING: The Excel COUNTIFS function returns the count of cells that meet one or
more criteria. COUNTIFS can be used with criteria based on dates, numbers, text, and other
conditions. COUNTIFS supports logical operators (>,<,<>,=).

(B) SYNTAX: =COUNTIFS(range1, criteria1, range2, criteria2…)

(C) EXAMPLE:

10. SUMIF FUNCTION
(A) MEANING: The SUMIF function is a worksheet function that adds all numbers in a
range of cells that meet one criterion (for example, is equal to 2000). It can be used as a
worksheet function (WS) in Excel. As a worksheet function, the SUMIF function can be
entered as part of a formula in a cell of a worksheet.

(B) SYNTAX: =SUMIF(range, criteria, [sum_range])

(C) EXAMPLE:

11. AVERAGEIF FUNCTION

(A) MEANING: The Microsoft Excel AVERAGEIF function returns the average (arithmetic
mean) of all numbers in a range of cells that meet a given criterion. The AVERAGEIF
function is a built-in function in Excel that is categorized as a Statistical Function. It can be
used as a worksheet function (WS) in Excel.

(B) SYNTAX: =AVERAGEIF(range, criteria, [average_range])

(C) EXAMPLE:
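
The conditional functions (COUNTIF, SUMIF, AVERAGEIF) all follow the same pattern: test one range against a criterion, then aggregate a second range. A minimal Python sketch on the DESIGNATION and SALARY columns follows; the helper names and the one-letter codes are illustrative assumptions, and only equality criteria are handled.

```python
# DESIGNATION (coded V/P/M to keep the listing short) and SALARY columns.
codes = "VPMPMMVPMPPPPPMMMMVVMVMPPM"
names = {"V": "Vice President", "P": "President", "M": "Manager"}
designation = [names[c] for c in codes]
salary = [100000, 200000, 90000, 200000, 80000, 90000, 100000, 200000,
          65000, 200000, 200000, 700000, 200000, 350000, 90000, 50000,
          90000, 90000, 71000, 100000, 90000, 100000, 90000, 200000,
          200000, 90000]


def countif(criteria_range, criterion):
    """Like COUNTIF with an equality criterion."""
    return sum(value == criterion for value in criteria_range)


def sumif(criteria_range, criterion, sum_range):
    """Like SUMIF: add sum_range entries whose criteria cell matches."""
    return sum(s for value, s in zip(criteria_range, sum_range)
               if value == criterion)


def averageif(criteria_range, criterion, average_range):
    """Like AVERAGEIF: mean of the matching entries."""
    return (sumif(criteria_range, criterion, average_range)
            / countif(criteria_range, criterion))


print(countif(designation, "Manager"),
      sumif(designation, "Manager", salary),
      round(averageif(designation, "Manager", salary), 2))
```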

12. CONCATENATE FUNCTION
(A) MEANING: The concatenate function is one of Excel's text functions. It is used to join
two or more words or text strings together. For example, sometimes data distributed over
multiple columns in an excel spreadsheet is more efficient to use when combined into one
column.

(B) SYNTAX: =CONCATENATE(text1, [text2], …)

(C) EXAMPLE:

13. VLOOKUP FUNCTION
(A) MEANING: When the VLOOKUP function is called, Excel searches for a lookup value
in the leftmost column of a section of your spreadsheet called the table array. The function
returns another value in the same row, defined by the column index number.

(B) SYNTAX: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]), where range_lookup is FALSE for an exact match

(C) EXAMPLE:
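
To make the lookup logic concrete, here is a minimal Python sketch of an exact-match lookup (the behaviour of VLOOKUP with range_lookup set to FALSE). It uses the first three rows of the data table; the function name vlookup is illustrative and the sketch does not cover approximate matching.

```python
# First three rows of the data table: NAME, DESIGNATION, SALARY.
table = [
    ("A", "Vice President", 100000),
    ("B", "President", 200000),
    ("C", "Manager", 90000),
]


def vlookup(lookup_value, table_array, col_index_num):
    """Search the leftmost column and return the value from the
    requested column of the first matching row (columns are 1-based)."""
    for row in table_array:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return "#N/A"  # Excel's marker when no match is found


print(vlookup("C", table, 3))
print(vlookup("Z", table, 2))
```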

14. VLOOKUP + DROPDOWN LIST

(A) MEANING: In Excel, VLOOKUP and the drop down list are two useful features. For
example, you have a drop down list in a range, and when you select one of the fruits
from the drop down list, its price is shown in the adjacent cell.

(B) STEPS:

1. Select a cell where you want to create the drop down list.
2. Go to Data –> Data Tools –> Data Validation.
3. In the Data Validation dialogue box, within the Settings tab, select List as the
Validation criteria.
4. In the source field, enter source range, or simply click in the Source field and select
the cells using the mouse and click OK. This will insert a drop down list in the
required cell.
5. Now, in the corresponding cell where you want the lookup value, apply
‘=VLOOKUP(lookup_value, table_array, col_index_num, false)’, where lookup
value should be the cell where you applied drop down list.

(C) EXAMPLE:

15. HLOOKUP FUNCTION
(A) MEANING: The Microsoft Excel HLOOKUP function performs a horizontal lookup by
searching for a value in the top row of the table and returning the value in the same column
based on the index_number. The HLOOKUP function is a built-in function in Excel that is
categorized as a Lookup/Reference Function.

(B) SYNTAX: =HLOOKUP(lookup_value, table_array, row_index_num, range_lookup)

(C) EXAMPLE:

16. TRANSPOSE OF A TABLE
(A) MEANING: The TRANSPOSE function returns a vertical range of cells as a horizontal
range, or vice versa. The TRANSPOSE function must be entered as an array formula in a
range that has the same number of rows and columns, respectively, as the source range has
columns and rows.

(B) STEPS:

1. Select the range of data you want to rearrange, including any row or column labels,
and either select Copy on the Home tab, or press CONTROL+C.

Note: Make sure you copy the data to do this. Using the Cut command or
CONTROL+X won’t work.

2. Select the first cell where you want to paste the data, and on the Home tab, click the
arrow next to Paste, and then click Transpose.

Pick a spot in the worksheet that has enough room to paste your data. The data you
copied will overwrite any data that’s already there.

(C) EXAMPLE:
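
Transposing simply swaps rows and columns. A short Python sketch of the idea, using a small extract of the data table (the variable names are illustrative):

```python
# Two rows (a label followed by three values) from the data table.
rows = [
    ("NAME", "A", "B", "C"),
    ("SALARY", 100000, 200000, 90000),
]

# zip(*rows) pairs up the i-th entries of every row, i.e. the columns.
transposed = [list(column) for column in zip(*rows)]

for line in transposed:
    print(line)
```

The 2 x 4 block becomes a 4 x 2 block, which is why the paste area must have as many rows as the source has columns.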

17. TEXT TO COLUMNS

(A) MEANING: To separate the contents of one Excel cell into separate columns, you can
use the 'Convert Text to Columns Wizard'.

(B) STEPS:

1. Highlight the range of text to be separated.


2. Go to Data, Data Tools, Text to Columns. The Convert Text to Columns Wizard opens.
3. Select Delimited from step 1 of the wizard and select next.
4. Now select the appropriate delimiter to separate the different text into different columns.

(C) EXAMPLE:

STEPS (Fixed width):

1. Highlight the range of text to be separated.


2. Go to Data, Data Tools, Text to Columns. The Convert Text to Columns Wizard opens.
3. Select Fixed width from step 1 of the wizard and select next.
4. Now select the appropriate width to separate the different text into different columns.
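
The two wizard modes (Delimited and Fixed width) correspond to two ways of splitting a string. The sketch below is a hedged Python illustration; the sample strings and the break positions are hypothetical and are not taken from the worksheet.

```python
def split_delimited(text, delimiter=","):
    """Delimited mode: cut the text wherever the delimiter occurs."""
    return [part.strip() for part in text.split(delimiter)]


def split_fixed_width(text, breaks):
    """Fixed width mode: cut the text at the given character positions."""
    edges = [0, *breaks, len(text)]
    return [text[start:end].strip() for start, end in zip(edges, edges[1:])]


print(split_delimited("A, Vice President, 100000"))
print(split_fixed_width("A   Manager   90000", [4, 14]))
```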

18. CONDITIONAL FORMATTING – HIGHLIGHT CELL RULES

(A) MEANING: Conditional formatting is a feature of Excel which allows you to apply a
format to a cell or a range of cells based on certain criteria.

(B) STEPS:

1. Select the cells you want to format.


2. Click the Conditional Formatting button under the Home menu, Styles section.
3. Select your rule- Highlight Cells Rules

• Greater Than

• Less Than

• Duplicate Values

19. CONDITIONAL FORMATTING – TOP/BOTTOM RULE:
Conditional formatting in Excel enables you to highlight cells with a certain colour,
depending on the cell’s value.

Step 1:

Select the range of cells, the table, or the whole sheet that you want to apply Conditional
Formatting to.

Step 2:

On the Home tab, Click on the Conditional Formatting. Go to Top/Bottom Rules and
select the required formatting to be applied.

20. CONDITIONAL FORMATTING – DATA BARS:
Conditional formatting in Excel enables you to highlight cells with a certain colour,
depending on the cell’s value.

Step 1:

Select the range of cells, the table, or the whole sheet that you want to apply Conditional
Formatting to.

Step 2:

On the Home tab, Click on the Conditional Formatting. Go to Data Bars and select the
required formatting to be applied.

21. CONDITIONAL FORMATTING – COLOR SCALE:


Conditional formatting in Excel enables you to highlight cells with a certain colour,
depending on the cell’s value.

Step 1:

Select the range of cells, the table, or the whole sheet that you want to apply Conditional
Formatting to.

Step 2:

On the Home tab, Click on the Conditional Formatting. Go to Color Scales and select the
required formatting to be applied.

22. FORMAT AS TABLES:


Create a table, then convert it back into a Range. On the worksheet, select a range of cells
that you want to format by applying a predefined table style.

Step 1:

Select the cells you want to format as a table. From the Home Tab, click the Format as Table
command.

Step 2:

Select a Table style from the drop-down menu. A dialog box will appear, confirming the
selected cell range for the table.

Step 3:

Click OK.

23. FORMAT CELLS:
FORMAT cells in Excel change the appearance of a number without changing the number
itself.

Step 1:

Select the cells you want to format. On the Format menu, click Cells. In the Format cells
dialog box, make the required customisations.

Step 2:

RESULT.

24. CELL STYLES:
Excel has CELL styles which make it more efficient to style your Excel worksheet.

Step 1:

Select the cells which you want to style. On the Home tab, click on Cell Styles.

Step 2:

Click on the desired style.

25. DATA VALIDATION – SETTINGS:
DATA VALIDATION is a feature in Excel used to control what a user can enter.

Step 1:

Select the cells you want to create a validation rule for. Select Data Validation under the
Data tab, then select the List option under Allow.

Step 2:

RESULT.

For Number – select whole Number

26. DATA VALIDATION – INPUT MESSAGE:
DATA VALIDATION is a feature in Excel used to control what a user can enter.

Step 1:

Select the cells you want to create a validation rule for. Select Data Validation under the
Data tab. Enter the Input Message that tells the user what data to enter.

Step 2:

RESULT.

For - Number

Result

27. DATA VALIDATION – ERROR ALERT:
DATA VALIDATION is a feature in Excel used to control what a user can enter.

Step 1:

Select the cells you want to create a validation rule for. Select Data Validation under the
Data tab. Enter the Error Alert message that is shown when a wrong input is detected.

Step 2:

RESULT.

For Numbers

Result

28. CUSTOMIZATION – RIBBON:

Step 1:

Right-Click the Ribbon and select Customize the Ribbon from the drop-down menu.

Step 2:

The Excel Options dialog box will appear. Locate and select New Tab. Make sure the New
Group is selected, select a command, then click Add. You can also drag commands directly
into a group.

Step 3:

When you are done adding commands, click OK. The commands will be added to the
Ribbon.

29. CUSTOMIZATION – QUICK ACCESS TOOLBAR:
The Quick Access Toolbar is a customizable toolbar that contains a set of commands that are
independent of the tab on the ribbon that is currently displayed.

Step 1:

Right-Click the Ribbon and select Customize Quick Access Toolbar from the drop-down
menu.

Step 2:

In the choose Commands from list, click Commands Not in the Ribbon.

Step 3:

Find the command in the list, and then click Add.

30. BACKSTAGE VIEW:
Backstage view is an option that allows you to manipulate aspects of a file. The backstage
view gives access to saving, opening, info about the open file, creating a new file, printing,
and recently opened files.

The first column of the backstage view has the following options:

S.No. Option & Description

1 Save
If an existing sheet is opened, it would be saved as is, otherwise it will display a dialogue box asking for the sheet name.

2 Save As
A dialogue box will be displayed asking for sheet name and sheet type. By default, it will save in sheet 2010 format with extension .xlsx.

3 Open
This option is used to open an existing Excel sheet.

4 Close
This option is used to close an opened sheet.

5 Info
This option displays the information about the opened sheet.

6 Recent
This option lists all the recently opened sheets.

7 New
This option is used to open a new sheet.

8 Print
This option is used to print an opened sheet.

9 Save & Send
This option saves an opened sheet and displays options to send the sheet using email etc.

10 Help
You can use this option to get the required help about Excel 2010.

11 Options
Use this option to set various options related to Excel 2010.

12 Exit
Use this option to close the sheet and exit.

31. SAVE AS ADOBE PDF:
Step 1:

Open your Excel workbook and select ranges or tables you want to convert to a PDF file.

Step 2:

In Excel click File>Save As. In the Save As dialog window, select PDF from the “Save As
type” drop-down list.

DATA VISUALIZATION AND ANALYSIS

✓ Qualitative data
✓ Quantitative data

Qualitative data

The qualitative data includes the following aspects which are as follows:-

• Frequency distribution
• Relative frequency distribution
• Percent frequency distribution
• Graphs

Frequency distribution
First, we illustrate a frequency distribution using an example with ungrouped data.
Twenty guests rated the quality of accommodation on a 5-point scale:
❖ Excellent
❖ Above average
❖ Average
❖ Below average
❖ Poor

The responses were

Now, we have to tally the number of responses in each category given by the 20 guests,
using the measures listed above

FREQUENCY:

The FREQUENCY function in Excel calculates how often values occur within the ranges you
specify in a bin table.

RELATIVE FREQUENCY:
Relative Frequency of a particular observation or class interval is found by dividing the
frequency (f) by the number of observations (n).

PERCENTAGE FREQUENCY:
The Percentage Frequency is found by multiplying each relative frequency value by 100.
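
The three measures are linked by simple arithmetic: relative frequency = f / n and percentage frequency = 100 × f / n. The Python sketch below shows this for a 5-point rating scale. The 20 ratings are hypothetical placeholders, because the guests' actual responses are not reproduced in this file.

```python
from collections import Counter

# Hypothetical ratings from 20 guests (illustrative data only).
ratings = (["Excellent"] * 4 + ["Above average"] * 6 + ["Average"] * 5
           + ["Below average"] * 3 + ["Poor"] * 2)

n = len(ratings)
frequency = Counter(ratings)                              # f per category
relative = {k: f / n for k, f in frequency.items()}       # f / n
percent = {k: 100 * r for k, r in relative.items()}       # 100 * f / n

for category, f in frequency.items():
    print(category, f, relative[category], round(percent[category], 1))
```

The relative frequencies always add up to 1 and the percentage frequencies to 100, which is a quick check on the tally.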

• BAR GRAPH:
A Bar Chart is the horizontal version of column chart.

Step 1:

Select the range of cells which are to be represented under bar graph.

Step 2:

Go to Insert>Bar Chart.

• HISTOGRAM USING GRAPH TAB:
Histogram is a graphical representation of the distribution of numerical data.

Step 1:

Select the range of cells which are to be represented in the histogram.

Step 2:

Go to Data > Data Analysis.

Step 3:

Select Histogram and click OK.

Discrete series

The marks (from 0 to 100) obtained by students of a class in a particular subject are
distributed into different bins. The following are the data recorded:-

marks bin
86 10
76 20
84 30
96 40
1 50
31 60
12 70
19 80
35 90
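
The binning rule used by the Histogram tool can be sketched in Python: a mark is counted in the first bin whose upper limit is greater than or equal to it, and anything above the last bin goes to "More". This is an illustrative sketch with hypothetical names, not the tool's own implementation.

```python
marks = [86, 76, 84, 96, 1, 31, 12, 19, 35]
bins = [10, 20, 30, 40, 50, 60, 70, 80, 90]


def histogram(values, upper_limits):
    """Count values per bin; the extra last slot is the 'More' bin."""
    counts = [0] * (len(upper_limits) + 1)
    for value in values:
        for i, upper in enumerate(upper_limits):
            if value <= upper:
                counts[i] += 1
                break
        else:  # no bin was large enough
            counts[-1] += 1
    return counts


frequency = histogram(marks, bins)

# Cumulative percentage: running total divided by the number of marks.
cumulative = []
running = 0
for f in frequency:
    running += f
    cumulative.append(round(100 * running / len(marks), 2))

for label, f, c in zip(bins + ["More"], frequency, cumulative):
    print(label, f, f"{c:.2f}%")
```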

Bin Frequency Cumulative %
10 1 11.11%
20 2 33.33%
30 0 33.33%
40 2 55.56%
50 0 55.56%
60 0 55.56%
70 0 55.56%
80 1 66.67%
90 2 88.89%
More 1 100.00%

Sorted by frequency (Pareto):

Bin Frequency Cumulative %
20 2 22.22%
40 2 44.44%
90 2 66.67%
10 1 77.78%
80 1 88.89%
More 1 100.00%
30 0 100.00%
50 0 100.00%
60 0 100.00%
70 0 100.00%

[Chart: Histogram. Horizontal axis: bin (sorted order); left axis: Frequency; right axis: Cumulative %.]


• PIVOT TABLES AND ITS TOOLS:
A Pivot Table allows you to extract the significance from a large, detailed data set.

A pivot table is a program tool that allows you to reorganize and summarize selected columns
and rows of data in a spreadsheet or database table to obtain a desired report. A pivot table
doesn't actually change the spreadsheet or database itself. In database lingo, to pivot is to turn
the data to view it from different perspectives. A pivot table is especially useful with large
amounts of data.

Step 1:

Select the range of cells for the extraction.

Step 2:

Go to Insert > Pivot Table.

Step 3:

Choose the required fields to be added for the report.

• PIVOT CHARTS AND ITS TOOLS:
A Pivot Chart is the graphical representation of a Pivot Table in Excel.

Step 1:

Click any cell inside the Pivot Table. On the Insert tab, in the Charts group, click on Pivot
Chart.

Step 2:

Select the desired format of Pivot Chart and click OK.

• HISTOGRAM FREQUENCY DISTRIBUTION
HISTOGRAM – CHART OUTPUT

Bin Frequency
10 1
20 2
30 0
40 2
50 0
60 0
70 0
80 1
90 2
More 1

[Chart: Histogram. Horizontal axis: bin; vertical axis: Frequency.]

HISTOGRAM – PARETO (SORTED DIAGRAM)

Bin Frequency bin Frequency


10 1 20 2
20 2 40 2
30 0 90 2
40 2 10 1
50 0 80 1
60 0 More 1
70 0 30 0
80 1 50 0
90 2 60 0
More 1 70 0

HISTOGRAM – CUMULATIVE PERCENTAGE

Bin Frequency Cumulative %
10 1 11.11%
20 2 33.33%
30 0 33.33%
40 2 55.56%
50 0 55.56%
60 0 55.56%
70 0 55.56%
80 1 66.67%
90 2 88.89%
More 1 100.00%

• DESCRIPTIVE STATISTICS:
Descriptive statistics are one of the fundamental “must knows” with any set of data.

Step 1:

Enter the data for which descriptive statistics data is to be obtained.

Group A Group B
76 95
87 97
98 87
45 89
66 87
78 45
76 76
88 56
78 76
87 87
54 45
65 76
76 45
89 88
65 76
78 66
54 78
87 56
45 77

Step 2:

Go to Data > Data Analysis > Descriptive Statistics.

Step 3:

RESULT.
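
Since the result screenshot is not reproduced here, the main figures of the Descriptive Statistics output can be recomputed as a check. The following Python sketch does this for Group A with the standard library; it covers only some of the measures the tool reports, and the dictionary layout is an illustrative choice.

```python
import statistics as st

group_a = [76, 87, 98, 45, 66, 78, 76, 88, 78, 87,
           54, 65, 76, 89, 65, 78, 54, 87, 45]

summary = {
    "Mean": st.mean(group_a),
    "Median": st.median(group_a),
    "Standard Deviation": st.stdev(group_a),   # sample standard deviation
    "Sample Variance": st.variance(group_a),
    "Range": max(group_a) - min(group_a),
    "Minimum": min(group_a),
    "Maximum": max(group_a),
    "Sum": sum(group_a),
    "Count": len(group_a),
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

The mean and sample variance agree with the Group A column of the independent sample t-test output later in this file (73.26316 and 236.7602).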

• DESCRIPTIVE STATISTICS FOR VARIOUS SCALES:
Descriptive statistics are one of the fundamental “must knows” with any set of data.

Step 1:

Enter the data for which descriptive statistics data is to be obtained.

Group A Group B
76 95
87 97
98 87
45 89
66 87
78 45
76 76
88 56
78 76
87 87
54 45
65 76
76 45
89 88
65 76
78 66
54 78
87 56
45 77

Step 2:

Go to Data > Data Analysis > Descriptive Statistics.

Step 3:

RESULT.

• CORRELATION:
A Correlation coefficient (a value between -1 and +1) tells how strongly two variables are
related to each other.

Step 1:

Enter the data for which the correlation is to be obtained.

Group A Group B
76 95
87 97
98 87
45 89
66 87
78 45
76 76
88 56
78 76
87 87
54 45
65 76
76 45
89 88
65 76
78 66
54 78
87 56
45 77

Step 2:

Go to Data > Data Analysis > Correlation.

Step 3:

RESULT.
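
The coefficient reported by the Correlation tool is Pearson's r, which is the sum of the products of the deviations divided by the square root of the product of the two sums of squared deviations. A minimal Python sketch for the Group A and Group B data (the function name pearson is illustrative):

```python
from math import sqrt

group_a = [76, 87, 98, 45, 66, 78, 76, 88, 78, 87,
           54, 65, 76, 89, 65, 78, 54, 87, 45]
group_b = [95, 97, 87, 89, 87, 45, 76, 56, 76, 87,
           45, 76, 45, 88, 76, 66, 78, 56, 77]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson(group_a, group_b)
print(round(r, 4))
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 a weak one.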

HYPOTHESIS TESTING

One sample t test using dummy (one-tailed)
t-Test: one sample against a dummy column, assuming equal variances (one-tailed)

Problem Statement-
To determine whether the population mean age is greater than 40 at α = 0.05.

Hypothesis

Null hypothesis- H0 : µ ≤ 40

Alternate hypothesis- Ha : µ > 40

Age Dummy
18 0
24 0
56
78
67
24
65
89
76
23
45
65
78
55
32
33
44

26
76

Steps- Go to Data > Data Analysis > t-Test: Two-Sample Assuming Equal Variances

t-Test: Two-Sample Assuming Equal Variances

Age Dummy
Mean 51.26315789 0
Variance 531.4269006 0
Observations 19 2
Pooled Variance 503.4570637
Hypothesized Mean Difference 40
df 19
t Stat 0.675244576
P(T<=t) one-tail 0.253827763
t Critical one-tail 1.729132812
P(T<=t) two-tail 0.507655526
t Critical two-tail 2.093024054

Decision rule-

1. If t stat is greater than t critical, reject null hypothesis

2. If p value is less than alpha (5%), reject null hypothesis

Here,

t stat = 0.675, which is less than t critical one-tail = 1.729, therefore the null hypothesis is not rejected

Inference:
The null hypothesis is not rejected, so there is not enough evidence that the mean age of the
population is greater than 40
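
As a cross-check, the one-sample t statistic can also be computed directly from its textbook formula, t = (x̄ - µ0) / (s / √n) with n - 1 degrees of freedom. The Python sketch below does this for the 19 ages; it is an independent illustration and not the dummy-column procedure used above.

```python
from math import sqrt
from statistics import mean, stdev

ages = [18, 24, 56, 78, 67, 24, 65, 89, 76, 23,
        45, 65, 78, 55, 32, 33, 44, 26, 76]
mu0 = 40  # hypothesized population mean

n = len(ages)
standard_error = stdev(ages) / sqrt(n)      # s / sqrt(n)
t_stat = (mean(ages) - mu0) / standard_error
df = n - 1

print(round(t_stat, 3), df)
```

With these ages the direct statistic is about 2.13 on 18 degrees of freedom, which is not the same as the t Stat of 0.675 in the output above. The equal-variance tool pools the sample variance with the zero-variance dummy column and uses the dummy's two observations in the standard error, so the two approaches can lead to different decisions.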

• t-Test: Two-Sample Assuming Equal Variances

Problem Statement-
To analyse whether the time spent by full time students in studying statistics is more than the
time spent by part time students at α=0.05

Hypothesis
Null Hypothesis- H0 : µf - µp ≤ 0
Alternate hypothesis- H1 : µf - µp > 0

Full time   Part time
3.2         3.1
1.5         3.4
6.5         4.6
0.2         2.8
3.7         2.3
3.3         1.5
1.7         2.8
3.6         9.5
3.8         4.3
5.3         2.7
6.9         1.6
3.6         1.6
1.7         3.2
1.2         4.2
7.2         3.9
3.9         1.2
1.9
5.3

Steps- Go to Data > Data Analysis > t-Test: Two-Sample Assuming Equal Variances

t-Test: Two-Sample Assuming Equal Variances

Variable 1 Variable 2
Mean 3.583333333 3.29375
Variance 4.133235294 3.843291667
Observations 18 16
Pooled Variance 3.997324219
Hypothesized Mean Difference 0
df 32
t Stat 0.421546668
P(T<=t) one-tail 0.338087152
t Critical one-tail 1.693888748
P(T<=t) two-tail 0.676174305
t Critical two-tail 2.036933343

Decision rule-

1. If t stat is greater than t critical, reject null hypothesis

2. If p value is less than alpha (5%), reject null hypothesis

Here, t stat is 0.421 and t critical one-tail is 1.69, which is greater than t stat, so the null
hypothesis is not rejected

Here, p value one-tail is 0.338 and alpha is 0.05, which is less than p value, so the null
hypothesis is not rejected

Inference:

There is not enough evidence that the time spent by full time students in studying statistics is
more than the time spent by part time students
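
The t Stat, pooled variance and df in the output above can be reproduced from the pooled-variance formula: sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) and t = (x̄1 - x̄2) / √(sp²(1/n1 + 1/n2)). A minimal Python sketch follows; the function name pooled_t is illustrative.

```python
from math import sqrt
from statistics import mean, variance

full_time = [3.2, 1.5, 6.5, 0.2, 3.7, 3.3, 1.7, 3.6, 3.8,
             5.3, 6.9, 3.6, 1.7, 1.2, 7.2, 3.9, 1.9, 5.3]
part_time = [3.1, 3.4, 4.6, 2.8, 2.3, 1.5, 2.8, 9.5,
             4.3, 2.7, 1.6, 1.6, 3.2, 4.2, 3.9, 1.2]


def pooled_t(x, y, hypothesized_diff=0.0):
    """Two-sample t statistic assuming equal variances.

    Returns (t, pooled variance, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y) - hypothesized_diff) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2, df


t_stat, pooled_var, df = pooled_t(full_time, part_time)
print(round(t_stat, 4), round(pooled_var, 4), df)
```

The p values and critical values still come from the t distribution with df degrees of freedom, which the Data Analysis tool looks up for you.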

• T-Test: Two-Sample Assuming Unequal Variances

Problem Statement- Two types of drugs were used on 5 and 7 patients for reducing their
weight. Drug A was imported and drug B was indigenous. The decrease in weight after
using the drugs for six months was as follows

Drug A   Drug B
10       8
12       9
13       12
11       14
14       13
         10
         9

Is there a significant difference in the efficiency of the two drugs?

Hypothesis-

Null hypothesis- There is no difference in efficiency of 2 drugs

H0: µa - µb = 0

Alternate hypothesis- There is a difference in efficiency of 2 drugs

H1: µa - µb ≠ 0

Steps- Go to Data > Data Analysis > t-Test: Two-Sample Assuming Unequal Variances

t-Test: Two-Sample Assuming Unequal Variances

Drug A Drug B
Mean 12 10.71428571
Variance 2.5 5.238095238
Observations 5 7
Hypothesized Mean Difference 0
df 10
t Stat 1.150760914
P(T<=t) one-tail 0.138301959
t Critical one-tail 1.812461123
P(T<=t) two-tail 0.276603918
t Critical two-tail 2.228138852

Decision rule:

If value of t stat is > t critical, reject the null hypothesis

If p value is ≤ α, reject null hypothesis

Here, t stat is 1.15 and t critical two-tail is 2.228, which is > t stat, therefore the null hypothesis is not rejected
Here, p value two-tail is 0.277 and alpha is 0.05, which is < p value, therefore the null hypothesis is not rejected

Inference:- There is no significant difference in efficiency of 2 drugs.
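
When variances are not assumed equal, each sample keeps its own variance and the degrees of freedom come from the Welch-Satterthwaite approximation. The Python sketch below recomputes the t Stat and df of the output above; the function name welch_t is illustrative.

```python
from math import sqrt
from statistics import mean, variance

drug_a = [10, 12, 13, 11, 14]
drug_b = [8, 9, 12, 14, 13, 10, 9]


def welch_t(x, y):
    """Two-sample t statistic assuming unequal variances.

    Returns (t, approximate degrees of freedom)."""
    v1 = variance(x) / len(x)   # s1^2 / n1
    v2 = variance(y) / len(y)   # s2^2 / n2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(drug_a, drug_b)
print(round(t_stat, 4), round(df))
```

Excel reports the degrees of freedom rounded to a whole number, which is why the output shows df as 10.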

• T – Test paired two samples for means


Research problem: The following are the weights of 8 persons, recorded before and after
following a diet, to test the effectiveness of the diet.
Before After
162 168
170 136
184 147
164 159
172 143
176 161
159 143
170 145
You are required to determine whether the diet was effective or not.

Hypothesis-

Null hypothesis- H0 : The diet was not effective; µb - µa ≤ 0

Alternate hypothesis- Ha : The diet was effective; µb - µa > 0

(µb and µa are the mean weights before and after the diet.)

Steps- Go to Data > Data Analysis > t-Test: Paired Two Sample for Means

t-Test: Paired Two Sample for Means

before after
Mean 169.625 150.25
Variance 65.125 121.9286
Observations 8 8
Pearson Correlation -0.17675
Hypothesized Mean Difference 0
df 7
t Stat 3.706873
P(T<=t) one-tail 0.003793
t Critical one-tail 1.894579
P(T<=t) two-tail 0.007586
t Critical two-tail 2.364624

Decision rule-

1 if t stat is greater than t critical, Reject Null hypothesis

2. If p value is less than alpha (5%), Reject Null hypothesis

Here, t stat is 3.707 and t critical one-tail is 1.895, so null hypothesis is rejected

Here, p value one-tail (0.0038) is less than alpha (0.05), so null hypothesis is rejected

Inference

There is enough evidence that the diet was effective
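
A paired t-test is a one-sample t-test on the differences: t = d̄ / (s_d / √n) with n - 1 degrees of freedom. The following Python sketch recomputes the t Stat of the diet example above; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [162, 170, 184, 164, 172, 176, 159, 170]
after = [168, 136, 147, 159, 143, 161, 143, 145]

# Difference for each person (before minus after).
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
df = n - 1

print(round(mean(diffs), 3), round(t_stat, 4), df)
```

Because the alternate hypothesis is one-sided (µb - µa > 0), the statistic is compared with the one-tail critical value.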

1. Coaching in statistical software was given to students after their results were
evaluated in January, in order to improve their performance in the May exams. Determine
whether the coaching was successful.

t-Test: Paired Two Sample for Means

JAN MAY
45 56
54 57
44 45
56 67
34 44
45 44
34 34
67 76
45 56
54 45
67 76
56 87
56 66
56 65
76 45
76 76

H0 : the coaching is not successful i.e. µm - µj ≤ 0

H1 : the coaching is successful i.e. µm - µj > 0

(µj and µm are the mean scores in January and May.)

t-Test: Paired Two Sample for Means

Jan May
Mean 54.0625 58.0625
Variance 164.3291667 258.0625
Observations 16 16
Pearson Correlation 0.591118937
Hypothesized Mean Difference 0
df 15
t Stat -1.19611891
P(T<=t) one-tail 0.125107938
t Critical one-tail 1.753050356
P(T<=t) two-tail 0.250215876
t Critical two-tail 2.131449546

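HYPOTHESIS: H0: µm - µj ≤ 0

H1: µm - µj > 0

DECISION:

The output reports t stat for Jan minus May (-1.196). For May minus Jan the t stat is 1.196,
which is < t critical one-tail (1.753), so the null hypothesis is not rejected

P (one tail) = 0.125 > α, so the null hypothesis is not rejected

INFERENCE: There is not enough evidence of an improvement, so the coaching for the
students cannot be called successful. The null hypothesis is not rejected.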

• TWO SAMPLE – INDEPENDENT SAMPLE T-TEST


RESEARCH PROBLEM:

To analyse whether there is a significant difference between the marks scored by Class
Groups A and B in maths at alpha=0.05.

Group A Group B
76 95
87 97
98 87
45 89
66 87
78 45
76 76
88 56
78 76
87 87
54 45
65 76
76 45
89 88
65 76
78 66
54 78
87 56
45 77

Let mean marks scored by Group A be µA.

Let mean marks scored by Group B be µB.

Therefore, H0: µA - µB = 0 (NULL)

H1: µA - µB ≠ 0 (ALTERNATE)

Step 1:

Go to Data > Data Analysis > t-Test: Two-Sample Assuming Equal Variances.

Step 2:

Click OK.

OUTPUT:

t-Test: Two-Sample Assuming Equal Variances

Group A Group B
Mean 73.26316 73.78947
Variance 236.7602 287.3977
Observations 19 19
Pooled Variance 262.0789
Hypothesized Mean Difference 0
df 36
t Stat -0.10021
P(T<=t) one-tail 0.460369
t Critical one-tail 1.688298
P(T<=t) two-tail 0.920737
t Critical two-tail 2.028094

DECISION RULE:

• If the absolute value of t stat > t critical, reject null and accept alternate.

• If p value < α, reject null and accept alternate.

Here, |t stat| < t critical two-tail i.e. 0.10 < 2.03, so we do not reject the null hypothesis.

And p value two-tail > α i.e. 0.92 > 0.05, so we do not reject the null hypothesis.

INFERENCE:

Therefore, there is no significant difference between Group A and Group B at alpha=0.05.

• TWO SAMPLE – PAIRED SAMPLE T-TEST
RESEARCH PROBLEM:

Is there sufficient evidence to suggest that the mean time to exhaustion is greater after chocolate
milk than after carbohydrate replacement drink? Use a significance level of 0.05.

Cyclist   Chocolate milk   Carbohydrate replacement drink
1         50.46            42.9
2         47.08            50.1
3         57.51            41.67
4         46.6             32.69
5         29.1             46.33
6         57.5             31.63
7         23.87            20.61
8         28.65            14.99
9         35.37            20.11
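Let the mean time to exhaustion after chocolate milk be µCM.

And the mean time to exhaustion after the carbohydrate replacement drink be µCD.

Null hypothesis- The mean time to exhaustion after chocolate milk is not greater

H0: µCM - µCD ≤ 0

Alternate hypothesis- The mean time to exhaustion after chocolate milk is greater

H1: µCM - µCD > 0

α = 0.05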

Step 1:

Go to Data > Data Analysis > t-Test: Paired Two Sample for Means.

Step 2:

Click OK.

OUTPUT:

t-Test: Paired Two Sample for Means

chocolate milk carbohydrate replacement drink
Mean 41.79333333 33.44777778
Variance 164.53125 160.9338194
Observations 9 9
Pearson Correlation 0.508406248
Hypothesized Mean Difference 0
Df 8
t Stat 1.979280834
P(T<=t) one-tail 0.0415706
t Critical one-tail 1.859548038
P(T<=t) two-tail 0.083141199
t Critical two-tail 2.306004135

DECISION RULE:

• If t stat > t critical (one-tail), reject null and accept alternate.

• If p value (one-tail) < α, reject null and accept alternate.

Here, t stat > t critical one-tail i.e. 1.98 > 1.86, so we reject the null and accept the alternate hypothesis.

And p value (one-tail) < α i.e. 0.04 < 0.05, so we reject the null and accept the alternate.

INFERENCE:

Therefore, there is sufficient evidence that the mean time to exhaustion after chocolate milk is greater than
after the carbohydrate replacement drink (µCM > µCD).
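
The same paired test can be sketched in R with the built-in `t.test` function. The vector names `choc` and `carb` are illustrative; the values are the nine cyclists' times from the data table.

```r
# Times to exhaustion for the nine cyclists, in table order
choc <- c(50.46, 47.08, 57.51, 46.6, 29.1, 57.5, 23.87, 28.65, 35.37)
carb <- c(42.9, 50.1, 41.67, 32.69, 46.33, 31.63, 20.61, 14.99, 20.11)

# Paired, one-tailed test of H1: mean(choc - carb) > 0
res <- t.test(choc, carb, paired = TRUE, alternative = "greater")
print(res)

# One-tailed critical value at alpha = 0.05 with n - 1 degrees of freedom
t_crit <- qt(0.95, df = length(choc) - 1)
cat("t critical one-tail =", round(t_crit, 4), "\n")
```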

• Z-Test: One Sample


Problem statement- Given are the ages in years of 35 employees. You are
required to determine whether or not the population mean age differs significantly
from 23 years. Assume a population standard deviation of 5 and an alpha of 10%.

Hypothesis

Null hypothesis- the population mean age is equal to 23

H0 : μ=23
Alternate hypothesis- the population mean age differs from 23

H1 : μ≠23
Age DUMMY
25 0
21
21
20
30
22
20
20
23
18
21
23
21
20
21
22
24
24
19
23
22
24
21

19
24
22
19
22
25
23
24
19
22
19
25

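Steps- Go to Data > Data Analysis > z-Test: Two Sample for Means. The Data Analysis list has no one-sample z-test, so the Age column is compared with a DUMMY column holding the single value 0 (known variance 0.0001), and the Hypothesized Mean Difference is set to 23.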

z-Test: Two Sample for Means

age DUMMY
Mean 21.94285714 0
Known Variance 25 0.0001
Observations 35 1
Hypothesized Mean Difference 23
z -1.250740748
P(Z<=z) one-tail 0.105514539
z Critical one-tail 1.644853627
P(Z<=z) two-tail 0.211029079
z Critical two-tail 1.959963985

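Decision rule-

1. If |z stat| is greater than z critical (two-tail), reject null hypothesis

2. If p value (two-tail) is less than alpha, reject null hypothesis

Here, |z stat| is 1.25 and z critical two-tail is 1.96, so we fail to reject the null hypothesis. The critical values in the output are for Excel's default alpha of 0.05; at alpha 0.10 the two-tail critical value is 1.645, and 1.25 is still smaller.

Here, p value is 0.211 and alpha is 0.10, so we fail to reject the null hypothesis

Inference-

There is not sufficient evidence that the population mean age differs significantly from 23 years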
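
A one-sample z statistic can also be computed directly, without a dummy column. The sketch below is in R and assumes the 35 ages listed in the data and the stated population standard deviation of 5; the variable names are illustrative. Because no dummy variance of 0.0001 enters the formula, the z value differs from the Excel figure in the fourth decimal place.

```r
# The 35 employee ages, in the order listed
age <- c(25, 21, 21, 20, 30, 22, 20, 20, 23, 18, 21, 23, 21, 20, 21, 22, 24, 24,
         19, 23, 22, 24, 21, 19, 24, 22, 19, 22, 25, 23, 24, 19, 22, 19, 25)

mu0 <- 23      # hypothesized population mean
sigma <- 5     # known population standard deviation
alpha <- 0.10  # significance level from the problem statement

# z statistic, two-tailed p-value and two-tailed critical value
z_stat <- (mean(age) - mu0) / (sigma / sqrt(length(age)))
p_two_tail <- 2 * pnorm(-abs(z_stat))
z_crit <- qnorm(1 - alpha / 2)

cat("z =", round(z_stat, 4), " p =", round(p_two_tail, 4),
    " z critical =", round(z_crit, 4), "\n")
```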

• TWO SAMPLE Z-TEST
RESEARCH PROBLEM:

Can investors do better by buying mutual funds directly from banks or other financial institutions than
by purchasing mutual funds through brokers? The data below are the returns on investment after deducting
all relevant fees. Can we conclude at the 5% significance level that directly
purchased mutual funds outperformed mutual funds purchased through brokers?

Direct Broker
9.33 3.24
6.94 -6.76
16.17 12.8
16.97 11.1
5.94 2.73
12.61 -0.13
3.33 18.22
16.13 -0.8
11.2 -5.75
1.14 2.59
4.68 3.71
3.09 13.15
7.26 11.05
2.05 -3.12
13.07 8.94
0.59 2.74
13.57 4.07
0.35 5.6
2.69 -0.85
18.45 -0.28
4.23 16.4
10.28 6.39
7.1 -1.9
-3.09 9.49
5.6 6.7
5.27 0.19
8.09 12.39
15.05 6.54
13.21 10.92
1.72 -2.15
14.69 4.36
-2.97 -11.07

10.37 9.24
-0.63 -2.67
-0.15 8.97
0.27 1.87
4.59 -1.53
6.38 5.23
-0.24 6.87
10.32 -1.69
10.29 9.43
4.39 8.31
-2.06 -3.99
7.66 -4.44
10.83 8.63
14.48 7.06
4.8 1.57
13.12 -8.44
-6.54 -5.72
-1.06 6.95

Let the mean return on direct investment be µD and the mean return on broker investment be µB.

Therefore, H1: µD - µB > 0 (ALTERNATE)

H0: µD - µB ≤ 0 (NULL)

The z-test needs the variance of each of the two variables:

1. In case of Direct – 37.48818

2. In case of Broker – 43.33928

Step 1:

Go to Data > Data Analysis > z-Test: Two Sample for Means.

Step 2:

Click OK.

OUTPUT:

z-Test: Two Sample for Means

Direct Broker
Mean 6.6312 3.7232
Known Variance 37.48818 43.33928
Observations 50 50
Hypothesized Mean Difference 0
z 2.287177862
P(Z<=z) one-tail 0.011092722
z Critical one-tail 1.644853627
P(Z<=z) two-tail 0.022185444
z Critical two-tail 1.959963985

DECISION RULE:

• If z > z critical (one-tail), reject null and accept alternate hypothesis.

• If p value (one-tail) < alpha, reject null and accept alternate hypothesis.

Here z > z critical i.e. 2.29 > 1.64, reject Null hypothesis.
Also p value < alpha i.e. 0.01 < 0.05, reject Null hypothesis.

INFERENCE:

There is sufficient evidence that directly purchased mutual funds outperformed mutual funds purchased
through brokers.
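
The z statistic follows from the means, known variances and sample sizes alone. This is a minimal R sketch under that assumption, using the figures printed in the Excel output; the variable names are illustrative.

```r
# Summary figures copied from the Excel output (assumed inputs)
mean_d <- 6.6312; var_d <- 37.48818; n_d <- 50   # direct purchase
mean_b <- 3.7232; var_b <- 43.33928; n_b <- 50   # through brokers

# z statistic for H1: mean_d - mean_b > 0
z_stat <- (mean_d - mean_b) / sqrt(var_d / n_d + var_b / n_b)

# One-tailed p-value and critical value at alpha = 0.05
p_one_tail <- pnorm(z_stat, lower.tail = FALSE)
z_crit <- qnorm(0.95)

cat("z =", round(z_stat, 4), " p =", round(p_one_tail, 4),
    " z critical =", round(z_crit, 4), "\n")
```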

• ANOVA - One way or single factor

A single factor or one way ANOVA is used to test the null hypothesis that the means of
several populations are all equal.

Problem statement- The salaries of people who have a degree in economics, medicine and
history

The required data are as follows:-

economics medicine history


42 69 35
53 54 40
49 58 53
53 64 42

43 64 50
44 55 39
45 56 55
52 0 39
54 0 40

Hypothesis:-

Null hypothesis- there is no significant difference in mean salaries for economics,
medicine and history

H0:- µ1 = µ2 = µ3

Alternate hypothesis- H1:- At least one of the means is different

Step 1:

Go to Data > Data Analysis > Anova: Single Factor.

Step 2:

Click OK.

OUTPUT:

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
economics 9 435 48.33333 23.5
medicine 9 420 46.66667 724.25
history 9 393 43.66667 50.5

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 100.6667 2 50.33333 0.189164 0.828873 3.402826
Within Groups 6386 24 266.0833

Total 6486.667 26

Decision rule-

1. If F stat is greater than F critical, reject null hypothesis

2. If p value is less than alpha (0.05), reject null hypothesis

Here, F stat (0.189) is less than F critical (3.403) and p value (0.829) is greater than alpha i.e. 0.05. Therefore, we
fail to reject the null hypothesis

Inference- there is not sufficient evidence of a significant difference in mean salaries
for economics, medicine and history

i.e., H0:- µ1 = µ2 = µ3 is retained
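
The single-factor ANOVA can be sketched in R with `oneway.test` from the base `stats` package. The data frame and column names below are illustrative. The two 0 entries under medicine are kept exactly as they appear in the worksheet so that the figures can be compared with the Excel output; if they stand for missing values rather than real observations, that is an assumption worth revisiting, since they would then be entered as `NA`.

```r
# Data entered column by column, exactly as in the worksheet
salary <- data.frame(
  value = c(42, 53, 49, 53, 43, 44, 45, 52, 54,   # economics
            69, 54, 58, 64, 64, 55, 56, 0, 0,     # medicine (zeros as in the sheet)
            35, 40, 53, 42, 50, 39, 55, 39, 40),  # history
  degree = factor(rep(c("economics", "medicine", "history"), each = 9))
)

# Classical one-way ANOVA (equal variances assumed, as in Excel's tool)
res <- oneway.test(value ~ degree, data = salary, var.equal = TRUE)
print(res)

# Critical F value at alpha = 0.05 with 2 and 24 degrees of freedom
f_crit <- qf(0.95, df1 = 2, df2 = 24)
cat("F critical =", round(f_crit, 4), "\n")
```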

• Two factor ANOVA without replication
Problem statement- To test whether or not the marks of students differ with respect to both student
and subject.

Hypothesis

Null hypothesis-

row wise – there is no significant difference in marks between students

column wise- there is no significant difference in marks for the three subjects, that is economics,
science and history

Alternate hypothesis-

row wise – there is a significant difference in marks between students

column wise- there is a significant difference in marks for the three subjects, that is economics,
science and history

Student economics science history


A 42 69 35
B 53 54 40
C 49 58 51
D 53 64 42
E 43 64 50

Steps-

Go to Data > Data Analysis > Anova: Two-Factor Without Replication

Anova: Two-Factor Without
Replication

SUMMARY Count Sum Average Variance


A 3 146 48.66667 322.3333
B 3 147 49 61
C 3 158 52.66667 22.33333
D 3 159 53 121

E 3 157 52.33333 114.3333

Economics 5 240 48 28
Science 5 309 61.8 34.2
History 5 218 43.6 46.3

ANOVA
Source of Variation SS df MS F P-value F crit
Rows 53.73333 4 13.43333 0.282609 0.881261 3.837853
Columns 901.7333 2 450.8667 9.485273 0.007741 4.45897
Error 380.2667 8 47.53333

Total 1335.733 14

Decision rule-

1. If F stat is greater than F critical, reject null hypothesis

2. If p value is less than alpha (5%), reject null hypothesis

Row wise:

Here F stat is 0.28 and F critical is 3.84, so we fail to reject the null hypothesis.

Here p value is 0.88 which is greater than 5%, so we fail to reject the null hypothesis.

Column wise:

Here F stat is 9.49 and F critical is 4.46, so null hypothesis is rejected

Here p value is 0.008 which is less than 5%, so null hypothesis is rejected

Inference:

Row wise

There is not enough evidence that marks differ significantly between students.

Column wise

There is enough evidence that marks differ significantly across the three subjects.
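
The row and column F values can be cross-checked in R with `aov`, treating student and subject as two additive factors. This is a sketch only; the data frame and column names are illustrative, and the marks are those of students A to E from the data table.

```r
# Marks entered subject by subject for students A to E
marks <- data.frame(
  student = factor(rep(c("A", "B", "C", "D", "E"), times = 3)),
  subject = factor(rep(c("economics", "science", "history"), each = 5)),
  score = c(42, 53, 49, 53, 43,   # economics
            69, 54, 58, 64, 64,   # science
            35, 40, 51, 42, 50)   # history
)

# Two-factor ANOVA without replication: main effects only, no interaction term
fit <- aov(score ~ student + subject, data = marks)
tab <- summary(fit)[[1]]
print(tab)

# Row (student) and column (subject) F statistics
f_rows <- tab[["F value"]][1]
f_cols <- tab[["F value"]][2]
```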

• ANOVA – Two Factor with replication

economics medicine history


School A 42 69 35
53 54 40
49 58 53
53 64 42
43 64 50
School B 44 55 39
45 56 55
52 0 39
54 0 40
0 0 0

Anova: Two-Factor With
Replication

SUMMARY economics medicine history Total


School A
Count 5 5 5 15
Sum 240 309 220 769
Average 48 61.8 44 51.26666667
Variance 28 34.2 54.5 95.63809524

School B

Count 5 5 5 15
Sum 195 111 173 479
Average 39 22.2 34.6 31.93333333
Variance 494 924.2 420.3 579.4952381
Total
Count 10 10 10
Sum 435 420 393
Average 43.5 42 39.3
Variance 254.5 861.5555556 235.5666667

ANOVA
Source of Variation SS df MS F P-value F crit
Sample 2803.333 1 2803.333333 8.602700491 0.007272 4.259677
Columns 90.6 2 45.3 0.139013912 0.870912 3.402826
Interaction 1540.467 2 770.2333333 2.363645663 0.115611 3.402826
Within 7820.8 24 325.8666667

Total 12255.2 29

Decision rule-

1. If F stat is greater than F critical, reject null hypothesis

2. If p value is less than alpha (5%), reject null hypothesis

Sample (school wise)

Here, F stat is 8.602 and F critical is 4.259, so null hypothesis is rejected

Here, p value is 0.007 which is less than 5%, so null hypothesis is rejected

Column wise (subject wise)

Here, F stat is 0.139 and F critical is 3.402, so we fail to reject the null hypothesis

Here, p value is 0.871 which is greater than alpha 5%, so we fail to reject the null hypothesis

Interaction wise

Here, F stat is 2.363 and F critical is 3.402, so we fail to reject the null hypothesis

Here, p value is 0.115 which is greater than alpha 5%, so we fail to reject the null hypothesis
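
The three F values (sample, columns, interaction) can be cross-checked in R by fitting school, subject and their interaction with `aov`. This is a sketch under the assumption that the worksheet is used exactly as entered, including the 0 entries in the School B block; the data frame and column names are illustrative.

```r
# Five rows per school, three subjects, entered subject by subject
scores <- data.frame(
  school = factor(rep(c("A", "B"), each = 5, times = 3)),
  subject = factor(rep(c("economics", "medicine", "history"), each = 10)),
  score = c(42, 53, 49, 53, 43, 44, 45, 52, 54, 0,   # economics: School A, then B
            69, 54, 58, 64, 64, 55, 56, 0, 0, 0,     # medicine
            35, 40, 53, 42, 50, 39, 55, 39, 40, 0)   # history
)

# Two-factor ANOVA with replication: main effects plus interaction
fit <- aov(score ~ school * subject, data = scores)
tab <- summary(fit)[[1]]
print(tab)

# F statistics in order: school (Sample), subject (Columns), interaction
f_vals <- tab[["F value"]][1:3]
p_vals <- tab[["Pr(>F)"]][1:3]
```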

• F-TEST
Determine whether or not there is a significant difference between variances of two data sets.

Group 1 Group 2
150 125
175 165
160 130
130 155
160 170
145 150

NULL HYPOTHESIS

There is no significant difference between variances of two data sets

ALTERNATE HYPOTHESIS

There is a significant difference between variances of two data sets

Alpha= 0.05

Since the variance of Group 1 is less than the variance of Group 2, we swap the ranges so that the larger variance is entered first (as the numerator of F).

Decision rule-

1. If F value is greater than F critical, reject null hypothesis

2. If p value is less than alpha (5%), reject null hypothesis

Here, p value (one-tail) is 0.357 which is greater than alpha, so we fail to reject the null hypothesis

Inference

There is no significant difference between the variances of the two data sets

• Chi square test


Discrete series
Problem Statement - A company is concerned about the increase in violent altercations
between its employees. The number of violent incidents recorded by management during six
randomly selected months is given below. Determine whether or not the crime rate in the company is
associated with month.

Hypothesis

Null hypothesis : crime rate in the company is not associated with month

Alternate hypothesis: crime rate in the company is associated with month

Month Observed Expected O-E (O-E)^2/E
Jan 55 70 -15 3.214286
Feb 65 70 -5 0.357143
Mar 68 70 -2 0.057143
Apr 72 70 2 0.057143
May 78 70 8 0.914286
Jun 82 70 12 2.057143
cal value = 6.657143

5% level of significance
Degree of freedom = (r-1)(c-1)
Degree of freedom = (6-1)(2-1) = 5

Table Value = 11.07

CHISQ.TEST 0.247413

Decision Rule –
If Cal value is greater than tab value, reject Null hypothesis
If P value is less than Alpha, reject Null hypothesis

Here, Cal value is 6.657 and tab value is 11.07, so we fail to reject the null hypothesis

Here, P value is 0.247 which is greater than Alpha (0.05), so we fail to reject the null hypothesis

Inference

There is not enough evidence that crime rate in the company is associated with month

• Continuous Series

Problem statement- To determine whether brand preference is independent of age group

Hypothesis

Null: there is no association between brand preference and age group


Alternate: there is association between brand preference and age group

Age/Brand Brand1 Brand2 Brand3 Row Total


15-25 65 76 72 213
26-35 60 40 64 164
36-45 45 52 50 147
46-55 55 65 60 180
Column Total 225 233 246 704

Degree of Freedom= (r-1)(c-1) = (4-1)(3-1) = 6

alpha=0.05

table value= 12.5916

Observed Expected O-E (O-E)^2/E


65 68.0752841 -3.07528409 0.138925197
60 52.4147727 7.58522727 1.097699557
45 46.9815341 -1.98153409 0.083574907
55 57.5284091 -2.52840909 0.11112514
76 70.4957386 5.50426136 0.429769143
40 54.2784091 -14.2784091 3.756060091
52 48.6519886 3.34801136 0.230395106
65 59.5738636 5.42613636 0.494226059
72 74.4289773 -2.42897727 0.079269269
64 57.3068182 6.69318182 0.781733907
50 51.3664773 -1.36647727 0.036351727
60 62.8977273 -2.89772727 0.13349963
Cal value= 7.372629732

p value (6 degrees of freedom) = 0.2878

Decision Rule –
If Cal value is greater than tab value, reject Null hypothesis
If P value is less than Alpha, reject Null hypothesis

Here, Cal value is 7.372 and tab value is 12.59, so we fail to reject the null hypothesis

Here, P value is 0.288 which is greater than Alpha (0.05), so we fail to reject the null hypothesis

Inference

There is not enough evidence of an association between brand preference and age
group
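
The expected counts, the chi-square statistic and its p-value for the age by brand table can be sketched in R with `chisq.test`. The matrix name and dimension labels are illustrative; the counts are those of the contingency table. For a 4 x 3 table the function takes the p-value with (4-1)(3-1) = 6 degrees of freedom and applies no continuity correction.

```r
# Observed counts: rows are age groups, columns are brands
observed <- matrix(c(65, 76, 72,
                     60, 40, 64,
                     45, 52, 50,
                     55, 65, 60),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(age = c("15-25", "26-35", "36-45", "46-55"),
                                   brand = c("Brand1", "Brand2", "Brand3")))

# Pearson chi-square test of independence
res <- chisq.test(observed)
print(res$expected)   # expected counts = row total * column total / grand total
print(res)

# Table value at alpha = 0.05 with 6 degrees of freedom
chi_crit <- qchisq(0.95, df = 6)
cat("table value =", round(chi_crit, 4), "\n")
```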

INTRODUCTION TO R

• FOUR PANES IN R STUDIO:
The R Studio interface consists of four main panes, or windows:

1. TOP LEFT:
Text editor or script window. This is where you can save and edit collections of commands.

2. TOP RIGHT:
Environment and history window. The environment window contains objects (data, values,
functions) R has currently stored in its memory. The history window shows all commands
that were executed in the Console.

3. BOTTOM LEFT:
Console or command window. Here you can type any valid R command after the prompt,
followed by Enter, and R will execute that command.

4. BOTTOM RIGHT:
Files, plots, packages, help, and viewer pane. Here you can open files, view plots, install and
load packages, read man pages, and view markdown and other documents in the viewer tab.

• IMPORT OF DATA SHEET FROM EXCEL:

Importing data into R is a necessary step that, at times, can become time intensive. To ease
this task, RStudio includes features to import data from csv, xls, xlsx, sav, dta, por, sas
and stata files.

Step 1:

Go to File > Import Dataset > From Excel.

Step 2:

Go to Browse, select the file to be imported.

Step 3:

Select the sheet to be imported from Default.

• F TEST:
The F-test is used to assess whether the variances of two populations (A and B) are equal.

Question:

To determine whether or not there is a significant difference between variances of two data
sets.

Group 1 Group 2
150 125
175 165
160 130
130 155
160 170
145 150

Let the variance of Group 1 be σ1² and the variance of Group 2 be σ2².

NULL HYPOTHESIS

There is no significant difference between variances of two data sets

ALTERNATE HYPOTHESIS

There is a significant difference between variances of two data sets

Alpha= 0.05

> var.test(boook1rm$`group 1`, boook1rm$`group 2`, alternative = "two.sided")

F test to compare two variances

data: boook1rm$`group 1` and boook1rm$`group 2`
F = 0.70823, num df = 5, denom df = 5, p-value = 0.7142
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.09910322 5.06127790
sample estimates:
ratio of variances
0.7082294

INTERPRETATION:

If p value is greater than alpha, we fail to reject the null hypothesis.

Here, p value i.e., 0.7142 > alpha i.e., 0.05, thus we fail to reject the null hypothesis.

INFERENCE:

There is no significant difference between the variances of Group 1 and Group 2 at alpha = 0.05.

• CHI SQUARE TEST:

Problem Statement - A company is concerned about the increase in violent altercations
between its employees. The number of violent incidents recorded by management during six
randomly selected months is given below. Determine whether or not the crime rate in the company is
associated with month.

Hypothesis

Null hypothesis : crime rate in the company is not associated with month

Alternate hypothesis: crime rate in the company is associated with month

Month Observed
Jan 55
Feb 65
Mar 68
Apr 72
May 78
Jun 82

The test is applied to the observed counts directly, so that each month is compared with the equal expected count of 70 (a goodness-of-fit test). Cross-tabulating Month against Observed would instead treat every count as a separate category and would not test the hypothesis above.

> chisq.test(Book1$Observed)

Chi-squared test for given probabilities

data: Book1$Observed
X-squared = 6.6571, df = 5, p-value = 0.2474

Decision Rule –
If P value is less than Alpha, reject Null hypothesis

Here, P value is 0.2474 which is greater than Alpha (0.05), so we fail to reject the null hypothesis

Inference

There is not enough evidence that crime rate in the company is associated with month
