ANUSHKA

Uploaded by

yoovbansal2002
Copyright
© © All Rights Reserved

RESEARCH METHODS

FOR COMMERCE LAB


LAB FILE

SUBMITTED BY
ANUSHKA
ENROLMENT NO.- 35517788821
BCOM(H)- 3A
(FIRST SHIFT)
1. How to install RStudio? What is the latest version of R? Give details.

RStudio is an integrated development environment (IDE) for the R programming language. It
brings coding tools together in one application for easier advanced data processing. Using
in-memory processing, it can work with large datasets through integrations and connections.
It is available in both open-source and commercial editions; the paid edition adds more
sophisticated collaboration and security features. The free version is capable of end-to-end
analytics, from API connectivity and data ingestion to creating and distributing visualizations.
It can run as a standalone desktop application or in a web browser through a connection to
RStudio Server.

To install R on Windows OS:

Step 1: Go to the CRAN website.


Step 2: Click on "Download R for Windows".

Step 3: Click the "install R for the first time" link to download the R installer (.exe) file.

Step 4: Run the R executable file to start installation, and allow the app to make changes to your
device.

Step 5: Select the installation language.


Step 6: Follow the installation instructions.

Step 7: Click on "Finish" to exit the installation setup.

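R must be installed before RStudio. Once R is in place, download the RStudio Desktop installer
from the RStudio website and run it in the same way.

At the time of writing, the latest RStudio IDE release was 2022.07.0, code-named “Spotted
Wakerobin”. This is the version of the RStudio IDE, not of the R language itself.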
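The version of R itself can be checked from the console. A minimal sketch, assuming only base R (the variable name r_version is chosen for illustration):

```r
# Version string of the R installation that is currently running.
r_version <- R.version.string
print(r_version)

# The same information assembled from the major and minor components.
print(paste(R.version$major, R.version$minor, sep = "."))
```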

2. RStudio Layout with snapshot. Explain the purpose of all panes.

RStudio has four main panes, one in each quadrant of the screen: Source Editor, Console,
Environment (with History), and Files/Plots/Packages/Help.
1. The Console is where you type code that executes immediately. It is also known as the
command line. It is at the bottom left of the screen.

2. The Environment pane shows the objects (i.e., data frames, arrays, values and functions)
currently in your environment (workspace). For objects holding a single value it shows the
value; for longer objects it shows their class. It is at the top right of the screen.

3. The last pane, at the bottom right of the screen, has a number of different tabs.

3.1. The Files tab is a navigable file manager, just like the file system of your operating
system.
3.2. The Packages tab lists the packages that are installed and lets you install new ones.
3.3. The Plots tab is where the graphics you create appear.
3.4. The Help tab lets you search the R documentation and is where help pages appear when you
request them from the Console.

4. The Source Editor is where you open, edit and execute R scripts. It is the pane at the
top left of the screen.
3. Who designed and developed the R language?

R was started by professors Ross Ihaka and Robert Gentleman as a programming language to
teach introductory statistics at the University of Auckland. The language took heavy inspiration
from the S programming language, with most S programs able to run unaltered in R, as well as
from Scheme's lexical scoping, which allows for local variables. The name of the language comes
from its being a successor to S and from the shared first letter of the authors' names, Ross and Robert.
Ihaka and Gentleman first shared binaries of R on the data archive StatLib and the s-news
mailing list in August 1993. In June 1995, statistician Martin Mächler convinced Ihaka and
Gentleman to make R free and open-source under the GNU General Public License. The first
official 1.0 version was released on 29 February 2000.

4. Are variables ‘H’ and ‘h’ the same in R language?

R is a case-sensitive language. This means that the variables ‘H’ and ‘h’ are different objects in R.
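A short sketch of case sensitivity in base R; the values 10 and 20 are arbitrary examples:

```r
# H and h are two separate objects because R distinguishes upper and lower case.
H <- 10
h <- 20
print(H)
print(h)

# identical() confirms that the two names hold different values.
same_value <- identical(H, h)
print(same_value)
```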

5. What is a package? What are two major parts of R language?

R packages are collections of R functions, compiled code and sample data. They are stored
under a directory called "library" in the R environment. R installs a set of packages by
default during installation; more can be added later when they are needed for a specific
purpose. When the R console starts, only the default packages are loaded. Other installed
packages must be loaded explicitly before an R program can use them.
The R system is divided into 2 conceptual parts:
1. The “base” R system that you download from CRAN
2. Everything else.
R functionality is divided into a number of packages.
The “base” R system contains, among other things, the base package, which is required to run R
and contains the most fundamental functions.
The other packages contained in the “base” system include utils, stats, datasets, graphics,
grDevices, grid, methods, tools, parallel, compiler, splines, tcltk, stats4.
6. How is a package installed and accessed?

To install a package, pass its name as a quoted string to install.packages(), for example
install.packages("dplyr").
Once installed, a package is loaded for use with the library() command, for example
library(dplyr).
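A minimal sketch of the two steps. The install call is left commented out because it needs an internet connection, and "dplyr" is only an example name; the tools package is used for loading because it ships with R:

```r
# Step 1 (run once per machine): install from CRAN.
# install.packages("dplyr")

# Step 2 (run in every session): load an installed package.
library(tools)

# search() lists the attached packages; the loaded one should now appear.
is_attached <- "package:tools" %in% search()
print(is_attached)
```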

7. What is CRAN?

The Comprehensive R Archive Network (CRAN) is R's central software repository, supported by
the R Foundation. It contains an archive of the latest and previous versions of the R distribution,
documentation, and contributed R packages. It includes both source packages and pre-compiled
binaries for Windows and macOS. As of November 2020, more than 16,000 packages were
available. CRAN was created by Kurt Hornik and Friedrich Leisch in 1997, with the name
paralleling other early packaging systems such as TeX's CTAN (released 1992) and Perl's CPAN
(released 1995). As of 2021, it was still maintained by Hornik and a team of volunteers. The master
site is located at the Vienna University of Economics and Business and is mirrored on servers
around the world.

8. What do you mean by Object Assignment? Elucidate the difference between Left-side
and Right-side Assignment with output. What is the Assignment operator in RStudio?

Object assignment means giving a value a name through the assignment operator, which
creates a new named object. Once created, the named object can be reused in subsequent
calculations without retyping the value.
An assignment has three components. From left to right:
1. The first component (for example, x_numeric) is the name of the new object.
2. The second component is the assignment operator <-, which is the less-than sign <
immediately followed by the minus sign -.
3. The final component is the value(s) to be assigned to the name.
<- assigns the value on its right to the name on its left.
-> assigns the value on its left to the name on its right.
For instance, we can assign the value 3 to the variable x using the <- assignment operator.
We can then evaluate the variable by simply typing x at the command line, which returns the
value of x.
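Both directions of assignment can be sketched as follows; the names x and y and the values 3 and 5 are illustrative:

```r
# Leftward assignment: the value on the right is stored under the name on the left.
x <- 3

# Rightward assignment: the value on the left is stored under the name on the right.
5 -> y

print(x)  # [1] 3
print(y)  # [1] 5
```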
9. Explain the c( ) function. Write a sample command for the same.

The c() function combines multiple values into a vector or list.


Sample Command:
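The original screenshot is not available, so the following is a stand-in sketch with made-up values:

```r
# Combine three numbers into a numeric vector.
marks <- c(72, 85, 90)

# c() can also extend an existing vector with further values.
more_marks <- c(marks, 100)

print(more_marks)  # [1]  72  85  90 100
```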

10. What is paste( ) function used for? Write a sample command for the same.

The paste() function converts its arguments to character strings and concatenates them
element by element. With the collapse argument, the results are joined into a single string.
Sample Command:
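The original screenshot is not available, so the following is a stand-in sketch with made-up strings:

```r
# Join separate strings with the default separator (a space).
title <- paste("Research", "Methods", "Lab")

# Element-wise pasting with a custom (empty) separator.
question_ids <- paste("Q", 1:3, sep = "")

# collapse joins the elements of one vector into a single string.
one_string <- paste(c("a", "b", "c"), collapse = "-")

print(title)         # [1] "Research Methods Lab"
print(question_ids)  # [1] "Q1" "Q2" "Q3"
print(one_string)    # [1] "a-b-c"
```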

11. What is %>% operator used for?

%>% is called the forward pipe operator. It comes from the magrittr package and is also made
available when dplyr is loaded. It provides a mechanism for chaining commands: the operator
forwards a value, or the result of an expression, into the next function call/expression.
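A small sketch of chaining with the pipe, assuming the magrittr package is installed (the check below skips the example if it is not):

```r
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)

  # Read left to right: take the vector, take square roots, then sum them.
  piped <- c(1, 4, 9) %>% sqrt() %>% sum()

  # The same computation written as nested calls, read inside out.
  nested <- sum(sqrt(c(1, 4, 9)))

  print(piped)  # [1] 6
}
```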
12. What is meant by “>”, “+” and [1] in R console?
The prompt symbol (“>”) tells you that R is ready for new code.
The plus sign (“+”) tells you that R is waiting for the rest of an incomplete command, usually
a closing bracket or quote.
[1] is not a line number: it is the index of the first element printed on that line of output.

13. Write the code to identify odd or even numbers using IF statement.
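One possible answer, sketched with an example input of 7:

```r
number <- 7  # example input; change as needed

# A number is even when the remainder after dividing by 2 is zero.
if (number %% 2 == 0) {
  result <- "even"
} else {
  result <- "odd"
}

print(paste(number, "is", result))  # [1] "7 is odd"
```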

14. Write the code to identify minimum number among three numbers using Nested IF
statement.
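One possible answer, sketched with three example numbers (the names a, b and c_val are illustrative):

```r
a <- 12
b <- 5
c_val <- 9

# The outer if compares a with b; the inner if compares the winner with c_val.
if (a <= b) {
  if (a <= c_val) {
    smallest <- a
  } else {
    smallest <- c_val
  }
} else {
  if (b <= c_val) {
    smallest <- b
  } else {
    smallest <- c_val
  }
}

print(smallest)  # [1] 5
```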
15. Display grade of student using nested if command for following criterion (customize
student name). Output example: Kritika has scored “A” Grade
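The grading criterion itself is not reproduced in this file, so the cut-offs below (90, 75 and 60) and the marks value are assumptions made purely for illustration:

```r
student <- "Kritika"
marks <- 92  # example marks

# Hypothetical cut-offs: 90+ is A, 75+ is B, 60+ is C, anything lower is D.
if (marks >= 90) {
  grade <- "A"
} else {
  if (marks >= 75) {
    grade <- "B"
  } else {
    if (marks >= 60) {
      grade <- "C"
    } else {
      grade <- "D"
    }
  }
}

result_line <- paste0(student, " has scored \"", grade, "\" Grade")
cat(result_line, "\n")
```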

16. How to import a data sheet from Excel?

In the Environment pane, click Import Dataset and choose From Excel.

Browse to your file and click Import.

17. What are Data Frames, Matrices, Vectors?

Data frames hold data in a table format. A data frame can contain different types of data:
the first column can be character while the second and third are numeric or logical.
However, all values within a single column must be of the same type.
A matrix is a two-dimensional data set with columns and rows, all of the same type. A column
is a vertical representation of data, while a row is a horizontal representation of data.
A matrix can be created with the matrix() function.
A vector is an ordered collection of items of the same type. To combine items into a
vector, use the c() function and separate the items with commas.
18. Write a command to create Data Frames, Matrices, Vectors
Data Frame:

Matrix:

Vector:
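The original screenshots are not available; a stand-in sketch of all three commands, using made-up names and values, could look like this:

```r
# Data frame: columns may differ in type, but each column has a single type.
students <- data.frame(
  name   = c("Asha", "Ravi", "Meena"),
  marks  = c(78, 64, 91),
  passed = c(TRUE, TRUE, TRUE)
)

# Matrix: 2 rows and 3 columns, filled column by column with 1 to 6.
m <- matrix(1:6, nrow = 2, ncol = 3)

# Vector: items of one type combined with c().
v <- c(10, 20, 30)

print(students)
print(m)
print(v)
```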
19. Name some built-in functions with their description.

Basic statistic functions

Function      Description
mean(x)       Mean of x
median(x)     Median of x
var(x)        Variance of x
sd(x)         Standard deviation of x
scale(x)      Standard scores (z-scores) of x
quantile(x)   Quantiles of x (by default the minimum, the three quartiles and the maximum)
summary(x)    Summary of x: minimum, quartiles, mean and maximum

20. Create a function for multiplication but no return value.
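Every R function returns something, so "no return value" is read here as "prints the result and returns NULL invisibly". That reading and the function name multiply are assumptions of this sketch:

```r
multiply <- function(a, b) {
  # Show the product on screen instead of handing it back to the caller.
  cat("Product:", a * b, "\n")
  invisible(NULL)
}

out <- multiply(4, 5)  # prints "Product: 20"
print(is.null(out))    # [1] TRUE
```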

21. Write a command for Accessing Rows and Columns.
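A sketch on a small made-up data frame; rows go before the comma and columns after it:

```r
students <- data.frame(
  name    = c("Asha", "Ravi", "Meena"),
  marks   = c(78, 64, 91),
  section = c("A", "B", "A")
)

second_row <- students[2, ]         # whole second row
marks_col  <- students[, "marks"]   # whole column, same as students$marks
one_cell   <- students[3, "name"]   # single cell: row 3, column "name"

print(second_row)
print(marks_col)
print(one_cell)
```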


22. Create a data frame by your surname of 12 rows and 8 columns.
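A sketch with random integers as filler data. The name surname_df is a placeholder to be replaced by the student's own surname, and the column names are arbitrary:

```r
set.seed(1)  # makes the random filler values reproducible

# 12 * 8 = 96 random integers arranged as 12 rows and 8 columns.
surname_df <- as.data.frame(
  matrix(sample(1:100, 12 * 8, replace = TRUE), nrow = 12, ncol = 8)
)
names(surname_df) <- paste0("col", 1:8)

print(dim(surname_df))  # [1] 12  8
```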

23. Write a command to access non-consecutive rows or columns, use ‘c() ‘. For
example, to obtain rows 1 to 5, 7 and 11 and columns 3 to 4 and 7.
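A sketch on a 12 x 8 data frame of filler values (the name df is illustrative):

```r
df <- as.data.frame(matrix(1:96, nrow = 12, ncol = 8))

# c() combines ranges and single positions into one index vector.
picked <- df[c(1:5, 7, 11), c(3:4, 7)]

print(picked)
```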

24. Add one new column and drop two existing columns 4 and 5.
25. Drop rows 1, 3 and 4.
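Questions 24 and 25 can be sketched together on a 12 x 8 data frame of filler values. The new column name total and its definition are assumptions for illustration:

```r
df <- as.data.frame(matrix(1:96, nrow = 12, ncol = 8))

# Add one new column computed from two existing ones.
df$total <- df$V1 + df$V2

# Drop columns 4 and 5 by negative position.
df <- df[, -c(4, 5)]

# Drop rows 1, 3 and 4 by negative position.
df <- df[-c(1, 3, 4), ]

print(names(df))
print(dim(df))
```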

26. Write a command to calculate the number of columns and number of rows.
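A sketch using the built-in mtcars dataset as the example data frame:

```r
n_rows <- nrow(mtcars)  # number of rows
n_cols <- ncol(mtcars)  # number of columns

print(n_rows)       # [1] 32
print(n_cols)       # [1] 11
print(dim(mtcars))  # both at once: rows, then columns
```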
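27. What is the command to access built-in datasets? What is the command to get the
description of a built-in dataset?

We use the data() function without any arguments to get the list of built-in datasets.
To get the description of a particular dataset, open its help page with ? or help(), for
example ?mtcars or help(mtcars).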
28. Access Titanic dataset and Execute commands to evaluate whether the evacuation
strategy was fair or not. If biased, state which gender, age group and class was most
favored. Analyze using cross tabulations.

From the output, it can be clearly seen that women, children and first-class passengers were
favoured the most.
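The original output is not reproduced here. One way to obtain such cross tabulations, sketched with base R only, is to collapse the four-way Titanic table (Class, Sex, Age, Survived) onto two dimensions at a time and compare survival rates:

```r
data(Titanic)

# Counts of survivors and non-survivors by sex, age group and class.
by_sex   <- margin.table(Titanic, c(2, 4))
by_age   <- margin.table(Titanic, c(3, 4))
by_class <- margin.table(Titanic, c(1, 4))

# Row proportions: share of each group that survived.
rate_sex   <- prop.table(by_sex, 1)[, "Yes"]
rate_age   <- prop.table(by_age, 1)[, "Yes"]
rate_class <- prop.table(by_class, 1)[, "Yes"]

print(by_sex)
print(round(rate_sex, 2))
print(round(rate_age, 2))
print(round(rate_class, 2))
```

A higher survival share for one group than another indicates that the evacuation favoured that group.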

29. Calculate correlation by importing data from excel. Determine whether there is a
positive or a negative correlation in Advertisement in month and Sales in crores.

Hence, there is a positive correlation between advertisements run and sales.


Packages in R Programming

The tidyr Package

30. Apply Important Functions (gather, separate, unite, spread, fill, full_seq, drop_na,
and replace_na) in “tidyr Package” for following dataset

Gather
Separate

Unite

Spread

Full_seq
Drop_na

Since there are no missing (NA) values in the dataset, there is no change.

Replace_na

Since there are no missing (NA) values, no value is replaced.

Fill
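The dataset and screenshots for this question are not available, so the sketch below runs each function on a small made-up table (names, subjects, marks and dates are all hypothetical). It assumes the tidyr package is installed and is skipped otherwise:

```r
if (requireNamespace("tidyr", quietly = TRUE)) {
  library(tidyr)

  wide <- data.frame(name  = c("Asha", "Ravi"),
                     maths = c(80, NA),
                     stats = c(75, 68))

  # gather: wide to long, one row per name/subject pair.
  long <- gather(wide, key = "subject", value = "marks", maths, stats)
  # spread: long back to wide.
  back <- spread(long, key = "subject", value = "marks")

  # separate: split one column into several; unite: join them again.
  dates  <- data.frame(date = c("2022-01-15", "2022-02-20"))
  parts  <- separate(dates, date, into = c("year", "month", "day"), sep = "-")
  joined <- unite(parts, "date", year, month, day, sep = "-")

  # drop_na removes rows with NA; replace_na swaps NA for a given value.
  no_na       <- drop_na(long)
  zero_filled <- replace_na(long, list(marks = 0))

  # fill carries the last non-missing value downwards.
  filled <- fill(long, marks)

  # full_seq completes a numeric sequence with the given step.
  seq_full <- full_seq(c(1, 2, 4, 7), period = 1)

  print(long)
  print(seq_full)
}
```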
The dplyr Package
31. Apply Important Functions (filter, arrange, select, rename, mutate and transmute,
sample_n and sample_frac) for following column heads with 5 data rows:

Dataset

Filter

Arrange

Select

Rename
Mutate

Transmute

Sample
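The column heads and screenshots for this question are not available, so the sketch below uses a hypothetical 5-row table (name, dept, salary). It assumes the dplyr package is installed and is skipped otherwise:

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  emp <- data.frame(name   = c("A", "B", "C", "D", "E"),
                    dept   = c("HR", "IT", "IT", "HR", "Ops"),
                    salary = c(30, 45, 50, 28, 60))

  high       <- filter(emp, salary > 40)             # keep matching rows
  sorted     <- arrange(emp, desc(salary))           # sort, highest first
  picked     <- select(emp, name, salary)            # keep chosen columns
  renamed    <- rename(emp, department = dept)       # new_name = old_name
  with_bonus <- mutate(emp, bonus = salary * 0.1)    # add a column
  only_bonus <- transmute(emp, name, bonus = salary * 0.1)  # keep only listed columns

  set.seed(42)
  three_rows <- sample_n(emp, 3)       # random sample of 3 rows
  forty_pct  <- sample_frac(emp, 0.4)  # random 40% of rows

  print(sorted)
}
```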
Data Visualization in R Studio
Quick plot with ggplot2

32. Generate BCOM marks data containing the sections and overall percentage (5
sections ranging from A to E ), with 60 students in each section

Histogram plot

· Histogram fill color by group (Section)


· Basic density plot

· Density plot line color by group (Section) and change line type
· Draw a plot using data from numeric vectors where X contains values ranging from 10 to
20 and Y is square of X

· Add to the dot plot for X & Y
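The original plots are not available. The sketch below generates random marks (the 35 to 100 range is an assumption) and builds the plots with ggplot() rather than the qplot() shortcut, which is deprecated in recent ggplot2 releases. The plotting part assumes ggplot2 is installed:

```r
set.seed(123)

# 5 sections (A to E) with 60 students each and a random overall percentage.
marks <- data.frame(
  section    = rep(LETTERS[1:5], each = 60),
  percentage = round(runif(300, min = 35, max = 100), 1)
)

# X from 10 to 20 and Y as the square of X.
xy <- data.frame(x = 10:20, y = (10:20)^2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Histogram with fill colour by section.
  p_hist <- ggplot(marks, aes(x = percentage, fill = section)) +
    geom_histogram(bins = 20)

  # Density plot with line colour and line type by section.
  p_dens <- ggplot(marks, aes(x = percentage, colour = section, linetype = section)) +
    geom_density()

  # Points for X and Y with a connecting line added on top.
  p_xy <- ggplot(xy, aes(x = x, y = y)) + geom_point() + geom_line()

  # Typing a plot name at the console (e.g. p_hist) draws it.
}
```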


33. Activate Motor Trend Car Road Tests dataset. Using the given data set prepare
following quick plots:

· Scatter plots with smoothed line for Miles/(US) gallon on y axis and Weight (lb/1000) on
x axis
· Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line by groups (Number of cylinders)
· Scatter plots with colors for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis
· Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with colors
by groups (Number of gears)
· Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line and colors by groups (Number of gears)
· Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line and the point shape by groups (Number of gears)
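The original plots are not available. A sketch of several of these plots with ggplot() on the built-in mtcars data follows; the choice of smoothing methods is an assumption, and the block is skipped if ggplot2 is not installed:

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Treat cylinders and gears as groups rather than numbers.
  cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

  # Scatter plot with one smoothed line.
  p_smooth <- ggplot(cars, aes(x = wt, y = mpg)) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x)

  # Smoothed line by number of cylinders.
  p_smooth_cyl <- ggplot(cars, aes(x = wt, y = mpg, colour = cyl)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x)

  # Point colours by number of gears.
  p_colour_gear <- ggplot(cars, aes(x = wt, y = mpg, colour = gear)) +
    geom_point()

  # Point shapes and smoothed lines by number of gears.
  p_shape_gear <- ggplot(cars, aes(x = wt, y = mpg, shape = gear)) +
    geom_point() +
    geom_smooth(aes(group = gear), method = "lm", formula = y ~ x)

  # Typing a plot name at the console (e.g. p_smooth) draws it.
}
```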
35. Provide 5 commands for Descriptive statistics

Function names are written in lowercase because R is case sensitive.

Function     Description
max()        Returns the largest value in the entire data frame.
min()        Returns the smallest value in the entire data frame.
sum()        Returns the sum of the entire data frame.
fivenum()    Returns Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum).
length()     Returns the number of columns in the data frame.
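A sketch of the five commands on a small made-up numeric data frame. For fivenum() the data frame is first flattened with unlist(), since the function is documented for numeric vectors:

```r
scores <- data.frame(test1 = c(1, 2, 3), test2 = c(4, 5, 6))

largest  <- max(scores)               # largest value in the whole data frame
smallest <- min(scores)               # smallest value in the whole data frame
total    <- sum(scores)               # sum of all values
five     <- fivenum(unlist(scores))   # Tukey five-number summary of all values
n_cols   <- length(scores)            # number of columns

print(largest)   # [1] 6
print(smallest)  # [1] 1
print(total)     # [1] 21
print(five)      # [1] 1.0 2.0 3.5 5.0 6.0
print(n_cols)    # [1] 2
```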

36. Provide summary statistics for the MTCARS dataset while displaying a count summary
of categorical variables
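One way to get counts for the categorical variables is to convert them to factors before calling summary(). Which columns count as categorical (cyl, gear and am here) is a choice made for this sketch:

```r
cars <- mtcars

# summary() reports counts for factors and quartiles/mean for numeric columns.
cars$cyl  <- factor(cars$cyl)
cars$gear <- factor(cars$gear)
cars$am   <- factor(cars$am, labels = c("automatic", "manual"))

print(summary(cars))

# Counts for a single categorical variable.
cyl_counts <- table(cars$cyl)
print(cyl_counts)
```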

HYPOTHESIS TESTING using RStudio

For every test, import the given data from an Excel file saved as yourname_test.

T-TEST
A. One Sample t-Test using dummy (One-Tailed)

File name example: kritika_ttest

Problem 1:

To determine whether the population mean of age is equal to 40 at α=0.05.

Age (27 observations):

18 89 76 25
24 44 33 23
56 34 44 45
78  3 26 65
67  4 56 78
24 56 32 55
65 56 89

Decision Rule: If p<α, reject null hypothesis; otherwise do not reject it.

Inference: p(0.1571)> α(0.05), so the null hypothesis is not rejected.

Conclusion: There is no significant evidence at α=0.05 that the population mean of age differs from 40.
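The test can be sketched in R with the 27 ages typed in directly instead of imported from Excel. The reported p-value of 0.1571 is consistent with a two-sided test, which is what the call below runs:

```r
age <- c(18, 89, 76, 25, 24, 44, 33, 23, 56, 34, 44, 45, 78, 3,
         26, 65, 67, 4, 56, 78, 24, 56, 32, 55, 65, 56, 89)

# H0: population mean = 40; H1: population mean is not 40 (two-sided).
res <- t.test(age, mu = 40)
print(res)

alpha <- 0.05
reject_null <- res$p.value < alpha
print(reject_null)  # [1] FALSE
```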


B. Two Sample t-Test

File name example: kritika_ttest2

Problem 1:

To analyze whether the time spent by full time students in studying statistics is different from
the time spent by part time students.

Full time Part time


3.2 3.6 3.1 2.7
1.5 3.8 3.4 3.4
6.5 5.3 4.6 1.6
0.2 6.9 2.8 3.2
3.7 3.6 2.3 4.2
3.3 1.7 1.5 3.9
1.7 2.2 3.8 1.2
1.9 7.2 9.5 4.3
5.3 3.9

Decision Rule: If p<α, reject null hypothesis.

Inference: p(0.8)> α(0.05), so the null hypothesis is not rejected.

Conclusion: At α=0.05, the time spent by full time students in studying is not significantly
different from the time spent by part time students.
C. Two Sample t-Test

Problem 1:

Is there sufficient evidence to suggest that the mean time to exhaustion is greater after chocolate
milk than after carbohydrate replacement drink? Use a significance level of 0.05. (Use µCM-µCD
in hypothesis statements)

Cyclist Chocolate Milk Carbohydrate Replacement Drink


1 50.46 32.9
2 47.08 20.1
3 57.51 41.67
4 46.6 32.69
5 49.1 46.33
6 27.5 31.63
7 23.87 50.61
8 28.65 14.99
9 35.37 20.11

Decision Rule: If p<α, reject null hypothesis.

Inference: p(0.076)> α(0.05), so the null hypothesis is not rejected.

Conclusion: At α=0.05, there is not sufficient evidence that the mean time to exhaustion is
greater after chocolate milk than after the carbohydrate replacement drink.
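The test can be sketched in R with the values typed in directly. Each cyclist is measured under both drinks, and the reported p-value of 0.076 is consistent with a paired, one-sided test, so that is what the call below runs (H0: µCM-µCD = 0, H1: µCM-µCD > 0):

```r
chocolate_milk <- c(50.46, 47.08, 57.51, 46.60, 49.10, 27.50, 23.87, 28.65, 35.37)
carb_drink     <- c(32.90, 20.10, 41.67, 32.69, 46.33, 31.63, 50.61, 14.99, 20.11)

# Paired test on the per-cyclist differences, upper-tailed alternative.
res <- t.test(chocolate_milk, carb_drink, paired = TRUE, alternative = "greater")
print(res)

alpha <- 0.05
reject_null <- res$p.value < alpha
print(reject_null)  # [1] FALSE
```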

D. Paired t-Test

Problem 1:

Coaching was given to students for Statistical software after their result was evaluated in January
in order to improve their performance in April exams. Determine if the coaching was successful.
(α = 0.05)

Jan May
45 45 56 56
54 54 57 45
44 67 32 55
56 56 67 87
34 56 44 66
45 56 34 65
34 76 34 45
56 45 76 76

Decision Rule: If p<α, reject null hypothesis.

Inference: p(0.85)> α(0.05), so the null hypothesis is not rejected.

Conclusion: There is no significant evidence that the coaching was successful.


E. Two Sample t-Test

Problem 1:

To analyse whether there is a significant difference between the marks scored by class groups
A & B in mathematics at α=10%.

Group A Group B
76 87 95 56
55 87 97 87
76 65 87 76
76 76 89 87
89 89 56 88
65 65 98 76
76 78 76 66
88 69 56 45
78 65 76 76
89 77

Decision Rule: If p<α, reject null hypothesis.

Inference: p(0.85)> α(0.10), so the null hypothesis is not rejected.

Conclusion: There is no significant difference between the marks scored by A and B at α=0.10.


F. F Test

Problem 1:

Determine whether or not there is a significant difference between variances of two data sets.

Group 1 Group 2
150 170
125 165
160 130
130 155
160 125
125 150

Decision Rule: If p<α, reject null hypothesis.

Inference: p(0.87)> α(0.05), so the null hypothesis is not rejected.

Conclusion: There is no significant difference between the variances of the two datasets.
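The F test can be sketched in R with the two groups typed in directly; var.test() compares the two sample variances:

```r
group1 <- c(150, 125, 160, 130, 160, 125)
group2 <- c(170, 165, 130, 155, 125, 150)

# H0: the two population variances are equal (ratio = 1), two-sided.
res <- var.test(group1, group2)
print(res)

alpha <- 0.05
reject_null <- res$p.value < alpha
print(reject_null)  # [1] FALSE
```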
G. One Way ANOVA

Problem 1:

The marks for 3 different groups in Economics, Science, History are given.

Determine whether there is a significant difference between the means of population.

Economics Science History


45 69 75
53 54 20
54 58 45
53 64 42
43 64 50
44 55 39
56 45 55
52 39

20 20

Decision Rule: If p<α, reject null hypothesis.

Inference: p(0.06)> α(0.05), so the null hypothesis is not rejected.

Conclusion: There is no significant difference between the means of the three groups.


H. Chi Square Test

Problem 1:

Determine whether brand preference is independent of age group.

Age/ Brand Brand 1 Brand 2 Brand 3

15-25 75 56 72

26-35 60 40 64

36-45 45 52 50

46-55 55 35 45

Decision Rule: If p<α, reject null hypothesis.

Inference: p(0.35)> α(0.05), so the null hypothesis is not rejected.

Conclusion: There is no significant association between brand preference and age group.
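The test of independence can be sketched in R with the contingency table typed in directly:

```r
# Rows are age groups, columns are brands; counts are entered row by row.
preference <- matrix(
  c(75, 56, 72,
    60, 40, 64,
    45, 52, 50,
    55, 35, 45),
  nrow = 4, byrow = TRUE,
  dimnames = list(Age   = c("15-25", "26-35", "36-45", "46-55"),
                  Brand = c("Brand 1", "Brand 2", "Brand 3"))
)

# H0: brand preference is independent of age group.
res <- chisq.test(preference)
print(res)

alpha <- 0.05
reject_null <- res$p.value < alpha
print(reject_null)  # [1] FALSE
```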
