Research Methodology Lab File
PRACTICAL FILE
BACHELOR OF BUSINESS ADMINISTRATION
Submitted by
NAME : ALEX T BINOY
ENROLMENT NO. : 00516701720
INDEX
Topic
Functions in Excel
Count
CountA
CountBlank
Sum
Max
Min
Average
CountIf
SumIf
AverageIf
Concatenate
VLookup
VLookup + Dropdown
HLookup
Other tools
Transpose table
Text to Columns
Conditional Formatting – Highlight Cell rules (greater than, less than, between, equal to, text that contains, a date occurring, duplicate values)
Conditional Formatting – Top/Bottom rules
Conditional Formatting – Data Bars
Conditional Formatting – Color Scales
Format as Tables
Format Cells – Number, Alignment, Font, Border, Fill
Cell Styles
Data validation – settings (any value, number, custom)
Data validation – input message
Data validation – error alert
Customization – ribbon
Customization – quick access toolbar
Backstage view
Save as Adobe PDF
Data Visualization and Analysis
Frequency
Relative frequency
Percentage frequency
Bar Graph
Histogram using Graph tab
Pivot Table and its tools
Pivot Chart and its tools
Introduction to R
FUNCTIONS IN EXCEL
Data - Enter the data for 26 random people.
1. COUNT FUNCTION
(A) MEANING: The COUNT function counts the number of cells that contain numbers, and
counts numbers within the list of arguments. Use the COUNT function to get the number of
entries in a number field that is in a range or array of numbers.
(C) EXAMPLE:
2. COUNTA FUNCTION
(A) MEANING: The COUNTA function counts cells containing any type of information,
including error values and empty text (""). If you do not need to count logical values, text,
or error values (in other words, if you want to count only cells that contain numbers), use the
COUNT function.
(C) EXAMPLE:
3. COUNT-BLANK FUNCTION
(A) MEANING: The Microsoft Excel COUNTBLANK function counts the number of empty
cells in a range. It can be used as a worksheet function (WS) in Excel. As a worksheet
function, the COUNTBLANK function can be entered as part of a formula in a cell of a
worksheet.
(C) EXAMPLE:
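The difference between COUNT, COUNTA and COUNTBLANK can also be sketched outside Excel. The Python snippet below is only an illustration: the `cells` list is hypothetical data, `None` stands in for an empty cell, and the three helper functions are made-up names that imitate the worksheet functions.

```python
# Hypothetical column of cells; None represents an empty cell.
cells = [12, "Delhi", None, 7.5, "", None, 30]

def count(values):
    """Like COUNT: cells that hold numbers."""
    return sum(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)

def counta(values):
    """Like COUNTA: cells that are not empty (empty text "" still counts)."""
    return sum(v is not None for v in values)

def countblank(values):
    """Like COUNTBLANK: empty cells, plus cells holding empty text."""
    return sum(v is None or v == "" for v in values)

print(count(cells), counta(cells), countblank(cells))
```

In a worksheet the same three results would come from =COUNT(range), =COUNTA(range) and =COUNTBLANK(range).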
4. SUM FUNCTION
(A) MEANING: The Microsoft Excel SUM function adds all numbers in a range of cells and
returns the result. The SUM function is a built-in function in Excel that is categorized as a
Math/Trig Function. As a worksheet function, the SUM function can be entered as part of a
formula in a cell of a worksheet.
(C) EXAMPLE:
5. MAX FUNCTION
(A) MEANING: The Microsoft Excel MAX function returns the largest value from the
numbers provided. It can be used as a worksheet function (WS) in Excel. As a worksheet
function, the MAX function can be entered as part of a formula in a cell of a worksheet.
(C) EXAMPLE:
6. MIN FUNCTION
(A) MEANING: The Microsoft Excel MIN function returns the smallest value from the
numbers provided. The MIN function is a built-in function in Excel that is categorized as a
Statistical Function. It can be used as a worksheet function (WS) in Excel.
(C) EXAMPLE:
7. AVERAGE FUNCTION
(A) MEANING: The Microsoft Excel AVERAGE function returns the average (arithmetic
mean) of the numbers provided. The AVERAGE function is a built-in function in Excel that
is categorized as a Statistical Function. It can be used as a worksheet function (WS) in Excel.
(C) EXAMPLE:
8. COUNT-IF FUNCTION
(A) MEANING: The Microsoft Excel COUNTIF function counts the number of cells in a
range that meet a given criterion. It can be used as a worksheet function (WS) in Excel. As
a worksheet function, the COUNTIF function can be entered as part of a formula in a cell of a
worksheet.
(C) EXAMPLE:
9. COUNT-IFS FUNCTION
(A) MEANING: The Excel COUNTIFS function returns the count of cells that meet one or
more criteria. COUNTIFS can be used with criteria based on dates, numbers, text, and other
conditions. COUNTIFS supports logical operators (>,<,<>,=).
(C) EXAMPLE:
10. SUM-IF FUNCTION
(A) MEANING: The SUMIF function is a worksheet function that adds all numbers in a
range of cells that meet one criterion (for example, is equal to 2000). It can be used as a
worksheet function (WS) in Excel. As a worksheet function, the SUMIF function can be
entered as part of a formula in a cell of a worksheet.
(C) EXAMPLE:
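As a rough illustration of what COUNTIF and SUMIF do, the Python sketch below applies one criterion to a small table. The `departments` and `sales` lists, the helper names and the cell ranges quoted in the comments are all hypothetical.

```python
# Hypothetical data: a department column and the sales recorded in each row.
departments = ["HR", "Sales", "Sales", "IT", "Sales"]
sales = [2000, 3500, 1500, 2000, 4000]

def countif(values, predicate):
    """Like COUNTIF: how many values satisfy the criterion."""
    return sum(1 for v in values if predicate(v))

def sumif(criteria_range, predicate, sum_range):
    """Like SUMIF: add the sum_range entries whose criteria cell qualifies."""
    return sum(s for c, s in zip(criteria_range, sum_range) if predicate(c))

n_sales = countif(departments, lambda d: d == "Sales")            # =COUNTIF(A2:A6,"Sales")
total_sales = sumif(departments, lambda d: d == "Sales", sales)   # =SUMIF(A2:A6,"Sales",B2:B6)
n_equal_2000 = countif(sales, lambda s: s == 2000)                # =COUNTIF(B2:B6,2000)
print(n_sales, total_sales, n_equal_2000)
```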
(C) EXAMPLE:
12. CONCATENATE FUNCTION
(A) MEANING: The CONCATENATE function is one of Excel's text functions. It is used to join
two or more words or text strings together. For example, data distributed over
multiple columns in an Excel spreadsheet is sometimes more efficient to use when combined into one
column.
(C) EXAMPLE:
13. V-LOOKUP FUNCTION
(A) MEANING: When the VLOOKUP function is called, Excel searches for a lookup value
in the leftmost column of a section of your spreadsheet called the table array. The function
returns another value in the same row, defined by the column index number.
(C) EXAMPLE:
14. V-LOOKUP + DROPDOWN LIST
(A) MEANING: In Excel, VLOOKUP and the drop-down list are two useful features that can be
combined. For example, suppose you have a drop-down list in a cell: when you select one kind of
fruit from the drop-down list, its price is shown in the adjacent cell.
(B) STEPS:
1. Select a cell where you want to create the drop down list.
2. Go to Data –> Data Tools –> Data Validation.
3. In the Data Validation dialogue box, within the Settings tab, select List as the
Validation criteria.
4. In the source field, enter source range, or simply click in the Source field and select
the cells using the mouse and click OK. This will insert a drop down list in the
required cell.
5. Now, in the corresponding cell where you want the lookup value, apply
‘=VLOOKUP(lookup_value, table_array, col_index_num, false)’, where lookup
value should be the cell where you applied drop down list.
(C) EXAMPLE:
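The exact-match lookup in step 5 can be sketched in Python as follows. The fruit table, the prices and the `vlookup` helper are hypothetical and only imitate =VLOOKUP(lookup_value, table_array, col_index_num, FALSE); `selected` plays the role of the drop-down cell.

```python
# Hypothetical lookup table; the key sits in the first column, as VLOOKUP requires.
table = [
    ("Apple", 120),
    ("Banana", 40),
    ("Mango", 90),
]

def vlookup(lookup_value, table_array, col_index_num):
    """Exact-match lookup (range_lookup = FALSE); col_index_num is 1-based."""
    for row in table_array:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return "#N/A"  # Excel shows #N/A when nothing matches

selected = "Mango"  # value picked from the drop-down list
print(vlookup(selected, table, 2))
```

Changing `selected` corresponds to picking another item in the drop-down list; the looked-up price changes with it.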
15. H-LOOKUP FUNCTION
(A) MEANING: The Microsoft Excel HLOOKUP function performs a horizontal lookup by
searching for a value in the top row of the table and returning the value in the same column
based on the index_number. The HLOOKUP function is a built-in function in Excel that is
categorized as a Lookup/Reference Function.
(B) SYNTAX: =HLOOKUP(lookup_value, table_array, row_index_num, range_lookup)
(C) EXAMPLE:
16. TRANSPOSE OF A TABLE
(A) MEANING: The TRANSPOSE function returns a vertical range of cells as a horizontal
range, or vice versa. The TRANSPOSE function must be entered as an array formula in a
range that has the same number of rows and columns, respectively, as the source range has
columns and rows.
(B) STEPS:
1. Select the range of data you want to rearrange, including any row or column labels,
and either select Copy on the Home tab, or press CONTROL+C.
Note: Make sure you copy the data to do this. Using the Cut command or
CONTROL+X won’t work.
2. Select the first cell where you want to paste the data, and on the Home tab, click the
arrow next to Paste, and then click Transpose.
Pick a spot in the worksheet that has enough room to paste your data. The data you
copied will overwrite any data that’s already there.
(C) EXAMPLE:
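What Paste > Transpose does to a block of cells can be illustrated with a short Python sketch. The `rows` table below is hypothetical; `zip(*rows)` turns each column of the source into a row of the result.

```python
# Hypothetical source range: 3 rows x 3 columns, including labels.
rows = [
    ["Name", "Maths", "English"],
    ["Asha", 78, 85],
    ["Ravi", 64, 91],
]

# Each column of the source becomes a row of the transposed table.
transposed = [list(column) for column in zip(*rows)]

for line in transposed:
    print(line)
```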
17. TEXT TO COLUMNS
(A) MEANING: To separate the contents of one Excel cell into separate columns, you can
use the 'Convert Text to Columns Wizard'.
(B) STEPS:
(C) EXAMPLE:
Next
STEPS:
18. CONDITIONAL FORMATTING – HIGHLIGHT
CELL RULES
(A) MEANING: Conditional formatting is a feature of Excel which allows you to apply a
format to a cell or a range of cells based on certain criteria.
(B) STEPS:
• Greater than
• Less than
• Duplicate values
19. CONDITIONAL FORMATTING – TOP/BOTTOM RULE:
Conditional formatting in Excel enables you to highlight cells with a certain colour,
depending on the cell’s value.
Step 1:
Select the range of cells, the table, or the whole sheet that you want to apply Conditional
Formatting to.
Step 2:
On the Home tab, Click on the Conditional Formatting. Go to Top/Bottom Rules and
select the required formatting to be applied.
20. CONDITIONAL FORMATTING – DATA BARS:
Conditional formatting in Excel enables you to highlight cells with a certain colour,
depending on the cell’s value.
Step 1:
Select the range of cells, the table, or the whole sheet that you want to apply Conditional
Formatting to.
Step 2:
On the Home tab, Click on the Conditional Formatting. Go to Data Bars and select the
required formatting to be applied.
21. CONDITIONAL FORMATTING – COLOR SCALES:
Step 1:
Select the range of cells, the table, or the whole sheet that you want to apply Conditional
Formatting to.
Step 2:
On the Home tab, Click on the Conditional Formatting. Go to Color Scales and select the
required formatting to be applied.
22. FORMAT AS TABLES:
Step 1:
Select the cells you want to format as a table. From the Home Tab, click the Format as Table
command.
Step 2:
Select a Table style from the drop-down menu. A dialog box will appear, confirming the
selected cell range for the table.
Step 3:
Click OK.
23. FORMAT CELLS:
FORMAT cells in Excel change the appearance of a number without changing the number
itself.
Step 1:
Select the cells you want to format. On the Format menu, click Cells. In the Format cells
dialog box, make the required customisations.
Step 2:
RESULT.
24. CELL STYLES:
Excel has CELL styles which make it more efficient to style your Excel worksheet.
Step 1:
Select the cells which you want to style. On the Home tab, click on Cell Styles.
Step 2:
25. DATA VALIDATION – SETTINGS:
DATA VALIDATION is a feature in Excel used to control what a user can enter.
Step 1:
Select the cells you want to create a rule for. Select Data Validation under the Data tab. Select the
List option under Allow.
Step 2:
RESULT.
26. DATA VALIDATION – INPUT MESSAGE:
DATA VALIDATION is a feature in Excel used to control what a user can enter.
Step 1:
Select the cells you want to create a rule for. Select Data Validation under the Data tab. Enter the
Input Message that tells the user what data to enter.
Step 2:
RESULT.
For - Number
Result
27. DATA VALIDATION – ERROR ALERT:
DATA VALIDATION is a feature in Excel used to control what a user can enter.
Step 1:
Select the cells you want to create a rule for. Select Data Validation under the Data tab. Enter the
Error Alert message that will be shown when invalid data is entered.
Step 2:
RESULT.
For Numbers
Result
28. CUSTOMIZATION – RIBBON:
Step 1:
Right-Click the Ribbon and select Customize the Ribbon from the drop-down menu.
Step 2:
The Excel Options dialog box will appear. Locate and select New Tab. Make sure the New
Group is selected, select a command, then click Add. You can also drag commands directly
into a group.
Step 3:
When you are done adding commands, click OK. The commands will be added to the
Ribbon.
29. CUSTOMIZATION – QUICK ACCESS TOOLBAR:
The Quick Access Toolbar is a customizable toolbar that contains a set of commands that are
independent of the tab on the ribbon that is currently displayed.
Step 1:
Right-Click the Ribbon and select Customize Quick Access Toolbar from the drop-down
menu.
Step 2:
In the Choose commands from list, click Commands Not in the Ribbon.
Step 3:
30. BACKSTAGE VIEW:
Backstage view is an option that allows you to manipulate aspects of a file. The backstage
view gives access to saving, opening, info about the open file, creating a new file, printing,
and recently opened files.
First column of the backstage view will have the following options −
1 Save
2 Save As
A dialogue box will be displayed asking for the sheet name and sheet type. By
default, it will save in Excel 2010 format with the extension .xlsx.
3 Open
4 Close
5 Info
6 Recent
7 New
8 Print
9 Save & Send
This option saves an open sheet and displays options to send the sheet by email, etc.
10 Help
You can use this option to get the required help about Excel 2010.
11 Options
12 Exit
31. SAVE AS ADOBE PDF:
Step 1:
Open your Excel workbook and select ranges or tables you want to convert to a PDF file.
Step 2:
In Excel click File>Save As. In the Save As dialog window, select PDF from the “Save As
type” drop-down list.
DATA VISUALIZATION AND ANALYSIS
✓ Qualitative data
✓ Quantitative data
Qualitative data
Qualitative data can be summarised using the following:
• Frequency distribution
• Relative frequency distribution
• Percent frequency distribution
• Graphs
Frequency distribution
First, we illustrate the frequency distribution with an example that uses ungrouped data.
Twenty guests rated the quality of accommodation on a 5-point scale:
❖ Excellent
❖ Above average
❖ Average
❖ Below average
❖ Poor
Now we count how many of the 20 guests gave each rating, using the measures below.
FREQUENCY:
The FREQUENCY function in Excel calculates how often values occur within the ranges you
specify in a bin table.
RELATIVE FREQUENCY:
Relative Frequency of a particular observation or class interval is found by dividing the
frequency (f) by the number of observations (n).
PERCENTAGE FREQUENCY:
The Percentage Frequency is found by multiplying each relative frequency value by 100.
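The three measures defined above can be computed together. The Python sketch below is only an illustration: the individual guest responses are not reproduced in this file, so the `ratings` list is hypothetical data for 20 guests.

```python
from collections import Counter

# Hypothetical ratings from 20 guests (not the original responses).
ratings = (["Excellent"] * 4 + ["Above average"] * 6 + ["Average"] * 5
           + ["Below average"] * 3 + ["Poor"] * 2)

frequency = Counter(ratings)   # frequency f of each category
n = len(ratings)               # number of observations

for category, f in frequency.items():
    relative = f / n           # relative frequency = f / n
    percent = relative * 100   # percentage frequency = relative frequency x 100
    print(f"{category:<15}{f:>3}{relative:>7.2f}{percent:>7.1f}%")
```

The relative frequencies always add up to 1 and the percentage frequencies to 100, which is a quick check on the table.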
• BAR GRAPH:
A Bar Chart is the horizontal version of a column chart.
Step 1:
Select the range of cells which are to be represented under bar graph.
Step 2:
Go to Insert>Bar Chart.
• HISTOGRAM USING GRAPH TAB:
Histogram is a graphical representation of the distribution of numerical data.
Step 1:
Select the range of cells which are to be represented in the histogram.
Step 2:
Step 3:
Discrete series
The marks (from 0 to 100) obtained by students of a class in a particular subject are
distributed into different bins. The following data were recorded:
marks   bin
86      10
76      20
84      30
96      40
1       50
31      60
12      70
19      80
35      90

Histogram output (bins in order):
bin     Frequency   Cumulative %
10      1           11.11%
20      2           33.33%
30      0           33.33%
40      2           55.56%
50      0           55.56%
60      0           55.56%
70      0           55.56%
80      1           66.67%
90      2           88.89%
More    1           100.00%

Histogram output (Pareto, sorted by frequency):
bin     Frequency   Cumulative %
20      2           22.22%
40      2           44.44%
90      2           66.67%
10      1           77.78%
80      1           88.89%
More    1           100.00%
30      0           100.00%
50      0           100.00%
60      0           100.00%
70      0           100.00%

[Chart: Histogram (Pareto order); x-axis: bin; left axis: Frequency; right axis: Cumulative %]
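The Frequency and Cumulative % columns of the histogram output can be reproduced from the marks and bins listed above. The Python sketch below assumes the usual Data Analysis convention that a value is counted in the first bin whose upper limit it does not exceed, with a final "More" bin for everything larger; the helper name `bin_frequencies` is hypothetical.

```python
marks = [86, 76, 84, 96, 1, 31, 12, 19, 35]
bins = [10, 20, 30, 40, 50, 60, 70, 80, 90]

def bin_frequencies(values, upper_limits):
    """Count each value in the first bin whose upper limit is >= the value."""
    counts = [0] * (len(upper_limits) + 1)   # the last slot is the "More" bin
    for v in values:
        for i, limit in enumerate(upper_limits):
            if v <= limit:
                counts[i] += 1
                break
        else:
            counts[-1] += 1                  # larger than every bin limit
    return counts

counts = bin_frequencies(marks, bins)

# Running total expressed as a percentage of all observations.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(round(100 * running / len(marks), 2))

for label, c, pct in zip(bins + ["More"], counts, cumulative):
    print(f"{label!s:<6}{c:<4}{pct:.2f}%")
```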
• PIVOT TABLES AND ITS TOOLS:
A Pivot Table allows you to extract the significance from a large, detailed data set.
A pivot table is a program tool that allows you to reorganize and summarize selected columns
and rows of data in a spreadsheet or database table to obtain a desired report. A pivot table
doesn't actually change the spreadsheet or database itself. In database lingo, to pivot is to turn
the data to view it from different perspectives. A pivot table is especially useful with large
amounts of data.
Step 1:
Step 2:
Step 3:
• PIVOT CHARTS AND ITS TOOLS:
A Pivot Chart is the graphical representation of a Pivot Table in Excel.
Step 1:
Click any cell inside the Pivot Table. On the Charts tab, in the Insert group, click on Pivot
Chart.
Step 2:
• HISTOGRAM FREQUENCY DISTRIBUTION
HISTOGRAM – CHART OUTPUT
Bin Frequency
10 1
20 2
30 0
40 2
50 0
60 0
70 0
80 1
90 2
More 1
[Chart: Histogram; x-axis: bin; y-axis: Frequency]
HISTOGRAM – PARETO (SORTED DIAGRAM)
70 0 55.56%
80 1 66.67%
90 2 88.89%
More 1 100.00%
• DESCRIPTIVE STATISTICS:
Descriptive statistics are one of the fundamental “must knows” with any set of data.
Step 1:
Group A Group B
76 95
87 97
98 87
45 89
66 87
78 45
76 76
88 56
78 76
87 87
54 45
65 76
76 45
89 88
65 76
78 66
54 78
87 56
45 77
Step 2:
Step 3:
RESULT.
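Several of the measures reported by the Descriptive Statistics tool can be checked independently. The Python sketch below uses only the standard `statistics` module on the Group A marks listed above; the dictionary labels are informal and not the exact row names of the Excel output.

```python
import statistics

# Group A marks as listed in the table above.
group_a = [76, 87, 98, 45, 66, 78, 76, 88, 78, 87,
           54, 65, 76, 89, 65, 78, 54, 87, 45]

summary = {
    "count": len(group_a),
    "mean": statistics.mean(group_a),
    "median": statistics.median(group_a),
    "sample variance": statistics.variance(group_a),   # n - 1 denominator
    "standard deviation": statistics.stdev(group_a),
    "minimum": min(group_a),
    "maximum": max(group_a),
    "range": max(group_a) - min(group_a),
}

for name, value in summary.items():
    print(f"{name:<20}{value:.4f}")
```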
• DESCRIPTIVE STATISTICS FOR VARIOUS SCALES:
Descriptive statistics are one of the fundamental “must knows” with any set of data.
Step 1:
Group A Group B
76 95
87 97
98 87
45 89
66 87
78 45
76 76
88 56
78 76
87 87
54 45
65 76
76 45
89 88
65 76
78 66
54 78
87 56
45 77
Step 2:
Step 3:
RESULT.
• CORRELATION:
A Correlation coefficient (a value between -1 and +1) tells how strongly two variables are
related to each other.
Step 1:
Group A Group B
76 95
87 97
98 87
45 89
66 87
78 45
76 76
88 56
78 76
87 87
54 45
65 76
76 45
89 88
65 76
78 66
54 78
87 56
45 77
Step 2:
Step 3:
RESULT.
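The correlation coefficient can also be computed by hand from its definition, r = Sxy / sqrt(Sxx × Syy). The Python sketch below applies that formula to the Group A and Group B marks listed above; `pearson_r` is a hypothetical helper name, and the result is printed rather than quoted here.

```python
import math

group_a = [76, 87, 98, 45, 66, 78, 76, 88, 78, 87,
           54, 65, 76, 89, 65, 78, 54, 87, 45]
group_b = [95, 97, 87, 89, 87, 45, 76, 56, 76, 87,
           45, 76, 45, 88, 76, 66, 78, 56, 77]

def pearson_r(x, y):
    """Pearson correlation: co-deviation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(group_a, group_b)
print(round(r, 4))
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.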
HYPOTHESIS TESTING
One sample t test using dummy (one – tailed)
T-test one sample assuming equal variances (one-tailed)
Problem Statement-
To determine whether the population mean age is greater than 40 at α = 0.05.
Hypothesis
Age Dummy
18 0
24 0
56
78
67
24
65
89
76
23
45
65
78
55
32
33
44
26
76
Steps- Go to data- data analysis- t-test two sample assuming equal variance
t-Test: Two-Sample Assuming Equal Variances
Age Dummy
Mean 51.26315789 0
Variance 531.4269006 0
Observations 19 2
Pooled Variance 503.4570637
Hypothesized Mean Difference 40
df 19
t Stat 0.675244576
P(T<=t) one-tail 0.253827763
t Critical one-tail 1.729132812
P(T<=t) two-tail 0.507655526
t Critical two-tail 2.093024054
Decision rule-
Here,
t stat = 0.675, which is less than t critical (one-tail) = 1.729; therefore the null hypothesis is not rejected.
Inference:
The null hypothesis is not rejected, so there is not enough evidence that the mean age of the
population is greater than 40.
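For comparison with the Data Analysis output above, the one-sample statistic t = (x̄ − µ0) / (s / √n) can also be computed directly from the ages. The Python sketch below does only that. It is not the dummy-column procedure used in the worksheet, so its t value and degrees of freedom (n − 1) need not match the Excel table.

```python
import math
import statistics

# Ages as listed in the table above.
ages = [18, 24, 56, 78, 67, 24, 65, 89, 76, 23,
        45, 65, 78, 55, 32, 33, 44, 26, 76]
mu0 = 40  # hypothesised population mean from the problem statement

n = len(ages)
mean = statistics.mean(ages)
s = statistics.stdev(ages)                     # sample standard deviation
t_stat = (mean - mu0) / (s / math.sqrt(n))     # one-sample t statistic
df = n - 1

print(f"n={n}, mean={mean:.3f}, s={s:.3f}, t={t_stat:.3f}, df={df}")
```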
Hypothesis
Null Hypothesis- H0 : µf - µp <= 0
Alternate hypothesis- H1 : µf - µp > 0
full time part time
3.2 3.1
1.5 3.4
6.5 4.6
0.2 2.8
3.7 2.3
3.3 1.5
1.7 2.8
3.6 9.5
3.8 4.3
5.3 2.7
6.9 1.6
3.6 1.6
1.7 3.2
1.2 4.2
7.2 3.9
3.9 1.2
1.9
5.3
Steps- Go to data- data analysis- t-test two sample assuming equal variance
t-Test: Two-Sample Assuming Equal Variances
Variable 1 Variable 2
Mean 3.583333333 3.29375
Variance 4.133235294 3.843291667
Observations 18 16
Pooled Variance 3.997324219
Hypothesized Mean Difference 0
df 32
t Stat 0.421546668
P(T<=t) one-tail 0.338087152
t Critical one-tail 1.693888748
P(T<=t) two-tail 0.676174305
t Critical two-tail 2.036933343
Decision rule-
Here,t-stat is 0.421 and t critical is 1.69 which is greater than t-stat, so null hypothesis is
accepted
Here, p value is 0.338 and alpha is 0.05 which is less than p value, so null hypothesis is
accepted
Inference:
There is not enough evidence that the time spent by full time students in studying statistics is
more than the time spent by part time students
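The t Stat and df rows of the equal-variance output above can be reproduced from the two columns of data. The Python sketch below pools the two sample variances and divides the difference in means by its standard error; `pooled_t` is a hypothetical helper name.

```python
import math
import statistics

full_time = [3.2, 1.5, 6.5, 0.2, 3.7, 3.3, 1.7, 3.6, 3.8,
             5.3, 6.9, 3.6, 1.7, 1.2, 7.2, 3.9, 1.9, 5.3]
part_time = [3.1, 3.4, 4.6, 2.8, 2.3, 1.5, 2.8, 9.5,
             4.3, 2.7, 1.6, 1.6, 3.2, 4.2, 3.9, 1.2]

def pooled_t(x, y, hypothesized_diff=0.0):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y) - hypothesized_diff) / se
    return t, n1 + n2 - 2

t_stat, df = pooled_t(full_time, part_time)
print(f"t = {t_stat:.4f}, df = {df}")
```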
Problem Statement- Two types of drugs were used on 5 and 7 patients for reducing their
weight. Drug A was imported and drug B was indigenous. The decrease in weight after
using the drugs for six months is as follows
Drug A Drug B
10 8
12 9
13 12
11 14
14 13
10
9
Hypothesis-
H0: µa - µb = 0
H1: µa - µb ≠ 0
Steps- Go to data- data analysis- t-test two sample assuming unequal variances
t-Test: Two-Sample Assuming Unequal Variances
Drug A Drug B
Mean 12 10.71428571
Variance 2.5 5.238095238
Observations 5 7
Hypothesized Mean Difference 0
df 10
t Stat 1.150760914
P(T<=t) one-tail 0.138301959
t Critical one-tail 1.812461123
P(T<=t) two-tail 0.276603918
t Critical two-tail 2.228138852
Decision rule:
Here, t stat is 1.15, t critical is 2.228 which is > t stat, therefore accept null hypothesis
Here, p value is 0.27, alpha is 0.05 which is < p value, therefore accept null hypothesis
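When equal variances are not assumed, the standard error uses each sample's own variance and the degrees of freedom come from the Welch-Satterthwaite formula. The Python sketch below applies this to the drug data above; `welch_t` is a hypothetical helper name, and the fractional df it returns is rounded by Excel to a whole number.

```python
import math
import statistics

drug_a = [10, 12, 13, 11, 14]
drug_b = [8, 9, 12, 14, 13, 10, 9]

def welch_t(x, y):
    """Two-sample t statistic and approximate df without pooling variances."""
    n1, n2 = len(x), len(y)
    a = statistics.variance(x) / n1
    b = statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

t_stat, df = welch_t(drug_a, drug_b)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```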
Hypothesis-
Alternate hypothesis- Ha : The diet was effective; µb - µa > 0
Steps- Go to data- data analysis- t-Test: Paired Two Sample for Means
t-Test: Paired Two Sample for Means
before after
Mean 169.625 150.25
Variance 65.125 121.9286
Observations 8 8
Pearson Correlation -0.17675
Hypothesized Mean Difference 0
df 7
t Stat 3.706873
P(T<=t) one-tail 0.003793
t Critical one-tail 1.894579
P(T<=t) two-tail 0.007586
t Critical two-tail 2.364624
Decision rule-
Inference
1. Coaching in statistical software was given to students after their results were
evaluated in January, in order to improve their performance in the April exams. Determine
whether the coaching was successful.
JAN MAY
45 56
54 57
44 45
56 67
34 44
45 44
34 34
67 76
45 56
54 45
67 76
56 87
56 66
56 65
76 45
76 76
t-Test: Paired Two Sample for Means
Jan May
Mean 54.0625 58.0625
Variance 164.3291667 258.0625
Observations 16 16
Pearson Correlation 0.591118937
Hypothesized Mean Difference 0
df 15
t Stat -1.19611891
P(T<=t) one-tail 0.125107938
t Critical one-tail 1.753050356
P(T<=t) two-tail 0.250215876
t Critical two-tail 2.131449546
H0: µm - µj ≤ 0
DIRECTION:
INFERENCE: Therefore, the coaching for the students is unsuccessful because there is no
significant improvement. We accept the null hypothesis.
To analyse whether there is a significant difference between the marks scored by Class Groups A
and B in maths at alpha=0.05.
Group A Group B
76 95
87 97
98 87
45 89
66 87
78 45
76 76
88 56
78 76
87 87
54 45
65 76
76 45
89 88
65 76
78 66
54 78
87 56
45 77
H0: µA - µB = 0 (NULL)
Step 1:
Go to Data > Data Analysis > t-Test: Two-Sample Assuming Equal Variances.
Step 2:
Click OK.
OUTPUT:
Group A Group B
Mean 73.26316 73.78947
Variance 236.7602 287.3977
Observations 19 19
Pooled Variance 262.0789
Hypothesized Mean Difference 0
df 36
t Stat -0.10021
P(T<=t) one-tail 0.460369
t Critical one-tail 1.688298
P(T<=t) two-tail 0.920737
t Critical two-tail 2.028094
DECISION RULE:
Since p value > α, i.e. 0.92 > 0.05, we will accept the null hypothesis and reject the alternate.
INFERENCE:
• TWO SAMPLE – PAIRED SAMPLE T-TEST
RESEARCH PROBLEM:
Is there sufficient evidence to suggest that the mean time to exhaustion is greater after chocolate
milk than after carbohydrate replacement drink? Use a significance level of 0.05.
8 28.65 14.99
9 35.37 20.11
H0: µcm - µcd ≤ 0
H1: µcm - µcd > 0
α = 0.05
Step 1:
Go to Data > Data Analysis > t-Test: Paired Two Sample for Means.
Step 2:
Click OK.
OUTPUT:
chocolate milk    carbohydrate replacement drink
Mean 41.79333333 33.44777778
Variance 164.53125 160.9338194
Observations 9 9
Pearson Correlation 0.508406248
Hypothesized Mean Difference 0
Df 8
t Stat 1.979280834
P(T<=t) one-tail 0.0415706
t Critical one-tail 1.859548038
P(T<=t) two-tail 0.083141199
t Critical two-tail 2.306004135
DECISION RULE:
And p value < α i.e. 0.04 < 0.05, we will reject null and accept alternate.
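Only part of the raw data survives in this file, but the paired t statistic can still be checked from the summary rows of the output above. The Python sketch below relies on the identity Var(d) = Var(x) + Var(y) − 2·r·sd(x)·sd(y) for the paired differences d = x − y; `paired_t_from_summary` is a hypothetical helper name, and the inputs are copied from the Excel table.

```python
import math

def paired_t_from_summary(mean1, mean2, var1, var2, r, n):
    """Paired t statistic rebuilt from means, variances and Pearson correlation."""
    var_d = var1 + var2 - 2 * r * math.sqrt(var1 * var2)   # variance of differences
    se = math.sqrt(var_d / n)                              # standard error of mean difference
    return (mean1 - mean2) / se, n - 1

# Values taken from the "chocolate milk" vs "carbohydrate replacement drink" output.
t_stat, df = paired_t_from_summary(
    mean1=41.79333333, mean2=33.44777778,
    var1=164.53125, var2=160.9338194,
    r=0.508406248, n=9,
)
print(f"t = {t_stat:.4f}, df = {df}")
```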
INFERENCE:
Therefore we can suggest that the mean time to exhaustion after chocolate milk is greater than
after the carbohydrate replacement drink.
Hypothesis
H0 : μ=23
Alternate hypothesis- the population mean age differs from 23
H1 : μ≠23
Age DUMMY
25 0
21
21
20
30
22
20
20
23
18
21
23
21
20
21
22
24
24
19
23
22
24
21
19
24
22
19
22
25
23
24
19
22
19
25
z-Test: Two Sample for Means
age DUMMY
Mean 21.94285714 0
Known Variance 25 0.0001
Observations 35 1
Hypothesized Mean Difference 23
z -1.250740748
P(Z<=z) one-tail 0.105514539
z Critical one-tail 1.644853627
P(Z<=z) two-tail 0.211029079
z Critical two-tail 1.959963985
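Decision rule-
Here, |z| = 1.25 is less than z critical two-tail = 1.96 and the two-tail p value 0.211 is greater
than alpha 0.05, so the null hypothesis is not rejected
Inference-
There is not sufficient evidence that the population mean age differs from 23
• TWO SAMPLE Z-TEST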
RESEARCH PROBLEM:
Can investors do better by buying mutual funds directly from banks or other financial institutions than
by purchasing mutual funds through brokers? Can we conclude at 5% significance level that directly
purchased mutual funds outperformed mutual funds purchased through brokers?
Direct Broker
9.33 3.24
6.94 -6.76
16.17 12.8
16.97 11.1
5.94 2.73
12.61 -0.13
3.33 18.22
16.13 -0.8
11.2 -5.75
1.14 2.59
4.68 3.71
3.09 13.15
7.26 11.05
2.05 -3.12
13.07 8.94
0.59 2.74
13.57 4.07
0.35 5.6
2.69 -0.85
18.45 -0.28
4.23 16.4
10.28 6.39
7.1 -1.9
-3.09 9.49
5.6 6.7
5.27 0.19
8.09 12.39
15.05 6.54
13.21 10.92
1.72 -2.15
14.69 4.36
-2.97 -11.07
10.37 9.24
-0.63 -2.67
-0.15 8.97
0.27 1.87
4.59 -1.53
6.38 5.23
-0.24 6.87
10.32 -1.69
10.29 9.43
4.39 8.31
-2.06 -3.99
7.66 -4.44
10.83 8.63
14.48 7.06
4.8 1.57
13.12 -8.44
-6.54 -5.72
-1.06 6.95
H0: µD - µB ≤ 0 (NULL)
Step 1:
Go to Data > Data Analysis > z-Test: Two Sample for Means.
Step 2:
Click OK.
OUTPUT:
z-Test: Two Sample for Means
Direct Broker
Mean 6.6312 3.7232
Known Variance 37.48818 43.33928
Observations 50 50
Hypothesized Mean Difference 0
z 2.287177862
P(Z<=z) one-tail 0.011092722
z Critical one-tail 1.644853627
P(Z<=z) two-tail 0.022185444
z Critical two-tail 1.959963985
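DECISION RULE:
Here, z = 2.287 is greater than z critical one-tail = 1.645 and the one-tail p value 0.011 is less
than alpha 0.05, so the null hypothesis is rejected
INFERENCE:
There is enough evidence that directly purchased mutual funds outperformed mutual funds
purchased through brokers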
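The z statistic and one-tail p value in the output above follow from the means, known variances and sample sizes alone. The Python sketch below recomputes them with the standard library's `NormalDist`; `two_sample_z` is a hypothetical helper name, and the inputs are copied from the Excel table.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Two-sample z statistic with known variances, plus its one-tail p value."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2 - hypothesized_diff) / se
    p_one_tail = 1 - NormalDist().cdf(abs(z))
    return z, p_one_tail

# Direct vs Broker, values from the z-Test output.
z, p = two_sample_z(6.6312, 3.7232, 37.48818, 43.33928, 50, 50)
print(f"z = {z:.4f}, one-tail p = {p:.4f}")
```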
Problem statement- The salaries of people who have a degree in economics, medicine and
history
43 64 50
44 55 39
45 56 55
52 0 39
54 0 40
Hypothesis:-
Null hypothesis- that there is no significant difference in mean marks for economics,
medicine and history
H0:- µ1 = µ2 = µ3
Step 1:
Step 2:
Click OK.
OUTPUT:
SUMMARY
Groups Count Sum Average Variance
economics 9 435 48.33333 23.5
medicine 9 420 46.66667 724.25
history 9 393 43.66667 50.5
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 100.6667 2 50.33333 0.189164 0.828873 3.402826
Within Groups 6386 24 266.0833
Total 6486.667 26
Decision rule-
Here, the F statistic is less than F critical and the P-value is greater than alpha i.e. 0.05. Therefore, we
accept the null hypothesis
i.e., H0:- µ1 = µ2 = µ3
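The ANOVA table above can be rebuilt from the SUMMARY block alone: the between-groups sum of squares comes from the group means and counts, and the within-groups sum of squares from the group variances. The Python sketch below does this; `one_way_anova` is a hypothetical helper name and the inputs are the rounded values printed in the summary.

```python
# (count, average, variance) for each group, copied from the SUMMARY block.
groups = {
    "economics": (9, 48.33333, 23.5),
    "medicine": (9, 46.66667, 724.25),
    "history": (9, 43.66667, 50.5),
}

def one_way_anova(summary):
    """Return (SS between, SS within, F) from per-group count, mean and variance."""
    total_n = sum(n for n, _, _ in summary.values())
    grand_mean = sum(n * m for n, m, _ in summary.values()) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in summary.values())
    ss_within = sum((n - 1) * v for n, _, v in summary.values())
    df_between = len(summary) - 1
    df_within = total_n - len(summary)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat

ss_between, ss_within, f_stat = one_way_anova(groups)
print(f"SS between = {ss_between:.4f}, SS within = {ss_within:.1f}, F = {f_stat:.4f}")
```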
• Two-factor ANOVA without replication
Problem statement- To test whether or not marks of students differ with respect to student
and subject both.
Hypothesis
Null hypothesis-
column wise- there is no significant difference in marks for the three subjects, that is economics,
science and history
Alternate hypothesis-
column wise- there is a significant difference in marks for the three subjects, that is economics,
science and history
Steps-
Anova: Two-Factor Without Replication
E 3 157 52.33333 114.3333
Economics 5 240 48 28
Science 5 309 61.8 34.2
History 5 218 43.6 46.3
ANOVA
Source of Variation SS df MS F P-value F crit
Rows 53.73333 4 13.43333 0.282609 0.881261 3.837853
Columns 901.7333 2 450.8667 9.485273 0.007741 4.45897
Error 380.2667 8 47.53333
Total 1335.733 14
Decision rule-
Row wise:
Here p value is 0.881 which is greater than 5%, so null hypothesis is accepted.
Column wise:
Here p value is 0.008 which is less than 5%, so null hypothesis is rejected
Inference:
Row wise
Column wise
There is enough evidence that marks of students differ for three subjects significantly.
ANOVA – Two Factor with replication
Anova: Two-Factor With Replication
School B
Count 5 5 5 15
Sum 195 111 173 479
Average 39 22.2 34.6 31.93333333
Variance 494 924.2 420.3 579.4952381
Total
Count 10 10 10
Sum 435 420 393
Average 43.5 42 39.3
Variance 254.5 861.5555556 235.5666667
ANOVA
Source of Variation SS df MS F P-value F crit
Sample 2803.333 1 2803.333333 8.602700491 0.007272 4.259677
Columns 90.6 2 45.3 0.139013912 0.870912 3.402826
Interaction 1540.467 2 770.2333333 2.363645663 0.115611 3.402826
Within 7820.8 24 325.8666667
Total 12255.2 29
Decision rule-
Sample
Here, p value is 0.007 which is less than 5%, so null hypothesis is rejected
Column wise
Here, p value is 0.871 which is greater than alpha 5%, so null hypothesis is accepted
Interaction wise
Here, p value is 0.115 which is greater than alpha 5%, so null hypothesis is accepted
• F-TEST
Determine whether or not there is a significant difference between variances of two data sets.
Group 1 Group 2
150 125
175 165
160 130
130 155
160 170
145 150
NULL HYPOTHESIS
ALTERNATE HYPOTHESIS
Alpha= 0.05
Since the variance for group 1 is less than variance of group 2 we will swap the ranges.
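The variance comparison behind that swap can be checked directly. The Python sketch below computes both sample variances from the data above and forms the F ratio with the larger variance in the numerator, which is what swapping the ranges achieves; variable names are illustrative only.

```python
import statistics

group_1 = [150, 175, 160, 130, 160, 145]
group_2 = [125, 165, 130, 155, 170, 150]

var_1 = statistics.variance(group_1)   # sample variance, n - 1 denominator
var_2 = statistics.variance(group_2)

# Put the larger variance on top so that F >= 1.
if var_1 >= var_2:
    f_stat, df = var_1 / var_2, (len(group_1) - 1, len(group_2) - 1)
else:
    f_stat, df = var_2 / var_1, (len(group_2) - 1, len(group_1) - 1)

print(f"var 1 = {var_1:.3f}, var 2 = {var_2:.3f}, F = {f_stat:.4f}, df = {df}")
```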
Decision rule-
Here, p value is 0.357 which is greater than alpha, so null hypothesis is accepted
Inference
Hypothesis
Null hypothesis : crime rate in the company is not associated with month
Month Observed Expected O-E (O-E)^2/E
Jan 55 70 -15 3.214286
Feb 65 70 -5 0.357143
Mar 68 70 -2 0.057143
Apr 72 70 2 0.057143
May 78 70 8 0.914286
Jun 82 70 12 2.057143
cal value = 6.657143
5% level of significance
Degree of freedom= (r-1)(c-1)
Degree of freedom = (6-1)(2-1) = 5
Table Value=11.07
CHISQ.TEST 0.247413
Decision Rule –
If Cal value is greater than tab value reject Null hypothesis
If P value is less than Alpha Reject Null hypothesis
Here, Cal value is 6.657 and tab value is 11.07, so null hypothesis is accepted
Here, P value is 0.247 which is greater than Alpha, so null hypothesis is accepted
Inference,
There is not enough evidence that crime rate in the company is associated with month
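The calculated chi-square value in the table above is the sum of (O − E)² / E over the six months, with every expected count equal to the overall average. The Python sketch below repeats that arithmetic and compares the result with the table value quoted above; variable names are illustrative only.

```python
observed = {"Jan": 55, "Feb": 65, "Mar": 68, "Apr": 72, "May": 78, "Jun": 82}

# Under the null hypothesis every month has the same expected count.
expected = sum(observed.values()) / len(observed)

chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1
table_value = 11.07   # critical value at alpha = 0.05 and df = 5, as quoted above

reject_null = chi_square > table_value
print(f"expected = {expected}, chi-square = {chi_square:.6f}, df = {df}, reject = {reject_null}")
```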
• Continuous Series
Problem statement- To determine whether brand preference is independent of age group
Hypothesis
alpha=0.05
p value= 0.768154456
Decision Rule –
If Cal value is greater than tab value reject Null hypothesis
If P value is less than Alpha Reject Null hypothesis
Here, Cal value is 7.372 and tab value is 12.59, so null hypothesis is accepted
Here, P value is 0.768 which is greater than Alpha(0.05), so null hypothesis is accepted
Inference
There is not enough evidence of an association between brand preference and age
group
INTRODUCTION TO R
• FOUR PANES IN R:
The RStudio interface consists of four main panes, or windows:
1. TOP LEFT:
Text editor or script window. This is where you can save and edit collections of commands.
2. TOP RIGHT:
Environment and history window. The environment window contains objects (data, values,
functions) R has currently stored in its memory. The history window shows all commands
that were executed in the Console.
3. BOTTOM LEFT:
Console or command window. Here you can type any valid R command after the prompt
followed by Enter and R will execute that command.
4. BOTTOM RIGHT:
Files, plots, packages, help, and viewer pane. Here you can open files, view plots, install and
load packages, read man pages, and view markdown and other documents in the viewer tab.
• IMPORT OF DATA SHEET IN EXCEL:
Importing data into R is a necessary step that, at times, can become time intensive. To ease
this task, RStudio includes new features to import data from: csv, xls, xlsx, sav, dta, por, sas
and stata files.
Step 1:
Step 2:
Step 3:
• F-TEST:
F-Test is used to assess whether the variances of two populations (A and B) are equal.
Question:
To determine whether or not there is a significant difference between variances of two data
sets.
Group 1 Group 2
150 125
175 165
160 130
130 155
160 170
145 150
NULL HYPOTHESIS
ALTERNATE HYPOTHESIS
Alpha= 0.05
INTERPRETATION:
Here, p value i.e., 0.7142 > alpha i.e., 0.05, thus accept null hypothesis.
INFERENCE:
Hypothesis
Null hypothesis : crime rate in the company is not associated with month
Month Observed
Jan 55
Feb 65
Mar 68
Apr 72
May 78
Jun 82
> table(Book1)
Observed
Month 55 65 68 72 78 82
Apr 0 0 0 1 0 0
Feb 0 1 0 0 0 0
Jan 1 0 0 0 0 0
Jun 0 0 0 0 0 1
Mar 0 0 1 0 0 0
May 0 0 0 0 1 0
> chisq.test(Book1$Observed)
Decision Rule –
If Cal value is greater than tab value reject Null hypothesis
If P value is less than Alpha Reject Null hypothesis
Here, P value is 0.2474 which is greater than Alpha, so null hypothesis is accepted
Inference,
There is not enough evidence that crime rate in the company is associated with month