ANUSHKA
SUBMITTED BY
ANUSHKA
ENROLMENT NO.- 35517788821
BCOM(H)- 3A
(FIRST SHIFT)
1. How to install RStudio? What is the latest version of R? Give details.
Step 3: Click on the "install R for the first time" link to download the R executable (.exe) file.
Step 4: Run the R executable file to start installation, and allow the app to make changes to your
device.
RStudio has four main panes each in a quadrant of your screen: Source Editor, Console,
Workspace Browser (and History), and Plots (and Files, Packages, Help).
1. The Console is where you can type code that executes immediately. This is also known as the
command line. It is at the bottom left of your screen.
2. The Environment pane is very useful as it shows you what objects (i.e., dataframes, arrays,
values and functions) you have in your environment (workspace). You can see the values for
objects with a single value and for those that are longer, R will tell you their class.
4. The Source Editor can help you open, edit and execute these programs. It is the pane on the
top left of your screen.
3. Who designed and developed the R language?
R was started by professors Ross Ihaka and Robert Gentleman as a programming language to
teach introductory statistics at the University of Auckland. The language took heavy inspiration
from the S programming language, with most S programs able to run unaltered in R, as well as
from Scheme's lexical scoping, which allows for local variables. The name of the language comes
from its being a successor to S and from the shared first letter of the authors' names, Ross and Robert.
Ihaka and Gentleman first shared binaries of R on the data archive StatLib and the s-news
mailing list in August 1993. In June 1995, statistician Martin Mächler convinced Ihaka and
Gentleman to make R free and open-source under the GNU General Public License. The first
official 1.0 version was released on 29 February 2000.
R is a case-sensitive language. This means that the variables ‘H’ and ‘h’ are different in R.
R packages are collections of R functions, compiled code and sample data. They are stored
under a directory called "library" in the R environment. R installs a default set of packages
during installation. More packages can be added later, when they are needed for some specific
purpose. When we start the R console, only the default packages are loaded. Other
packages that are already installed have to be loaded explicitly before the R program
that needs them can use them.
The R system is divided into 2 conceptual parts:
1. The “base” R system that you download from CRAN
2. Everything else.
R functionality is divided into a number of packages.
The “base” R system contains, among other things, the base package which is required to run R
and contains the most fundamental functions.
The other packages contained in the “base” system include utils, stats, datasets, graphics,
grDevices, grid, methods, tools, parallel, compiler, splines, tcltk, stats4.
6. How is a package installed and accessed?
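A minimal sketch of the usual workflow: install.packages() downloads a package from CRAN once, and library() loads an installed package into the current session. The package name ggplot2 is only an illustration (it is used later in this file), and the install line is commented out because it needs an internet connection.

```r
# Install once from CRAN (needs internet); "ggplot2" is just an example name.
# install.packages("ggplot2")

# Load an installed package for the current session.
# 'stats' ships with R, so this line works on any installation.
library(stats)

# Check whether a package is installed without attaching it.
has_stats <- requireNamespace("stats", quietly = TRUE)
print(has_stats)

# A function can also be called as package::function without library().
med <- stats::median(c(3, 1, 2))
print(med)
```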
7. What is CRAN?
The Comprehensive R Archive Network (CRAN) is R's central software repository, supported by
the R Foundation. It contains an archive of the latest and previous versions of the R distribution,
documentation, and contributed R packages. It includes both source packages and pre-compiled
binaries for Windows and macOS. As of November 2020, more than 16,000 packages are
available. CRAN was created by Kurt Hornik and Friedrich Leisch in 1997, with the name
paralleling other early packaging systems such as TeX's CTAN (released 1992) and Perl's CPAN
(released 1995). As of 2021, it is still maintained by Hornik and a team of volunteers. The master
site is located at the Vienna University of Economics and Business and is mirrored on servers
around the world.
Object assignment means giving a name to one or more values via the assignment operator, which
creates a new object with that name. Once created, the named object can be reused in
subsequent calculations without retyping its value.
The assignment operation has three components. From left to right:
1. The first component, x_numeric, is the name of the new object.
2. The second component is the assignment operator <-, which is a combination of the less than
sign < immediately followed by the minus sign -.
3. The final component is the value(s) to be assigned to the name.
<- assigns a value to a variable from right to left.
-> assigns a value to a variable left to right.
The assignment operator is used to assign a value. For instance we can assign the value 3 to the
variable x using the <- assignment operator. We can then evaluate the variable by simply typing
x at the command line which will return the value of x .
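The two assignment directions described above can be sketched as follows. The names y and z are illustrative additions; x and the value 3 come from the text.

```r
# Leftward assignment: the value 3 is stored under the name x.
x <- 3
x            # typing the name prints its value

# Rightward assignment: the value on the left is stored under the name y.
5 -> y

# Named objects can be reused in later calculations.
z <- x + y
print(z)
```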
9. Explain The c( ) function. Write a sample command for the same.
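A short sketch of c(), which combines its arguments into a single vector. The variable names and values below are made up for illustration.

```r
# c() combines its arguments into one vector.
marks <- c(45, 67, 82)          # numeric vector
sections <- c("A", "B", "C")    # character vector

# Existing vectors can be extended by passing them to c() again.
more_marks <- c(marks, 90, 55)
print(more_marks)
length(more_marks)
```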
10. What is paste( ) function used for? Write a sample command for the same.
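The paste() function converts its arguments to character strings and concatenates them
element by element, separated by a space unless sep says otherwise. With the collapse
argument, the results are joined into a single string.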
Sample Command:
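One possible sample, using made-up strings for illustration. It shows the default separator, the sep argument and the collapse argument.

```r
# The default separator is a single space.
greeting <- paste("Hello", "World")
print(greeting)

# 'sep' sets the separator; vectors are pasted element by element.
ids <- paste("Student", 1:3, sep = "_")
print(ids)

# 'collapse' joins all resulting strings into one string.
one_string <- paste(c("a", "b", "c"), collapse = "-")
print(one_string)
```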
%>% is called the forward pipe operator in R. It comes from the magrittr package (and is also
made available by dplyr) and provides a mechanism for chaining commands. The operator forwards
a value, or the result of an expression, into the next function call as its first argument.
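A small sketch of the pipe, assuming the magrittr package is installed. The vector values are made up for illustration.

```r
library(magrittr)  # provides %>%

values <- c(1, 4, 9)

# Without the pipe: nested calls are read from the inside out.
nested <- sum(sqrt(values))

# With the pipe: the left-hand value becomes the first argument of the next call.
piped <- values %>% sqrt() %>% sum()

print(piped)
```

Both forms compute the same result; the piped form reads left to right in the order the steps are applied.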
12. What is meant by “>”, “+” and [1] in the R console?
The prompt symbol (“>”) tells you that R is ready for new code.
The plus sign (+) tells you that R is waiting for you to finish the current command, usually
because a bracket or quote has not yet been closed.
[1] is the index of the first value printed on that line of output; it is not a line number.
13. Write the code to identify odd or even numbers using IF statement.
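A minimal sketch using if ... else with the remainder operator %%. The value of num is an example only.

```r
num <- 7  # example value; change as needed

# %% gives the remainder after division; remainder 0 on division by 2 means even.
if (num %% 2 == 0) {
  result <- paste(num, "is even")
} else {
  result <- paste(num, "is odd")
}
print(result)
```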
14. Write the code to identify minimum number among three numbers using Nested IF
statement.
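One way to write this with nested if statements. The three values are examples, and c_val is used as a name to avoid masking the c() function.

```r
a <- 12; b <- 5; c_val <- 9  # example values

if (a <= b) {
  # a is no larger than b, so compare a with the third number.
  if (a <= c_val) {
    smallest <- a
  } else {
    smallest <- c_val
  }
} else {
  # b is smaller than a, so compare b with the third number.
  if (b <= c_val) {
    smallest <- b
  } else {
    smallest <- c_val
  }
}
print(paste("Minimum number is", smallest))
```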
15. Display grade of student using nested if command for following criterion (customize
student name). Output example: Kritika has scored “A” Grade
A data frame holds data in a table format. Different columns can hold different types of
data: for example, the first column can be character while the second and third are numeric or
logical. However, all values within a single column must have the same type.
A matrix is a two dimensional data set with columns and rows. A column is a vertical
representation of data, while a row is a horizontal representation of data. A matrix can be created
with the matrix() function.
A vector is simply a list of items that are of the same type. To combine the list of items to a
vector, use the c() function and separate the items by a comma.
18. Write a command to create Data Frames, Matrices, Vectors
Data Frame:
Matrix:
Vector:
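A sketch covering all three structures. The student names, marks and pass flags are made-up illustration values.

```r
# Vector: items of the same type combined with c().
marks <- c(78, 85, 62)

# Matrix: values 1 to 6 arranged in 2 rows and 3 columns (filled column by column).
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: columns may differ in type, but each column holds one type.
students <- data.frame(
  name   = c("Asha", "Ravi", "Meena"),  # character
  marks  = marks,                       # numeric
  passed = c(TRUE, TRUE, FALSE)         # logical
)

print(marks)
print(m)
print(students)
```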
19. Name some built-in functions with their description.
Basic statistic functions
Function Description
mean(x) Mean of x
median(x) Median of x
var(x) Variance of x
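The three functions in the table can be tried on a small vector. The data values are made up for illustration.

```r
x <- c(2, 4, 4, 6, 9)  # example data

x_mean   <- mean(x)     # arithmetic mean
x_median <- median(x)   # middle value of the sorted data
x_var    <- var(x)      # sample variance (divides by n - 1)

print(c(mean = x_mean, median = x_median, variance = x_var))
```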
23. Write a command to access non-consecutive rows or columns using c(). For
example, obtain rows 1 to 5, 7 and 11 and columns 3 to 4 and 7.
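A sketch of this selection, assuming the built-in mtcars dataset as the data frame (the question does not name one).

```r
# mtcars is a built-in data frame with 32 rows and 11 columns.
# Rows and columns are chosen by combining ranges and single indices with c().
subset_df <- mtcars[c(1:5, 7, 11), c(3:4, 7)]
print(subset_df)
dim(subset_df)
```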
24. Add one new column and drop two existing columns 4 and 5.
25. Drop rows 1, 3 and 4.
26. Write a command to calculate the number of columns and number of rows.
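Questions 24 to 26 can be sketched together, again assuming the built-in mtcars dataset. The new column kmpl is a hypothetical example (miles per gallon converted to kilometres per litre).

```r
df <- mtcars  # work on a copy of the built-in dataset

# Add a new column: 1 mile per US gallon is about 0.425 km per litre.
df$kmpl <- df$mpg * 0.425

# Drop existing columns 4 and 5 with negative indices.
df <- df[, -c(4, 5)]

# Drop rows 1, 3 and 4.
df <- df[-c(1, 3, 4), ]

# Number of rows and number of columns.
nrow(df)
ncol(df)
dim(df)
```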
27. What is the command to access built-in datasets? What is the command to get the
description of a built-in dataset?
We use the data() function without any arguments to get the list of built-in datasets.
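A short sketch of listing datasets and reading the description of one. The choice of mtcars is an example, and the help call is commented out because it opens a help page.

```r
# List the datasets in the 'datasets' package
# (data() with no arguments lists datasets from all attached packages).
available <- data(package = "datasets")
head(available$results[, c("Item", "Title")])

# Load a dataset by name and look at the first rows.
data("mtcars")
head(mtcars)

# Description of a built-in dataset:
# ?mtcars          # same as help("mtcars")
str(mtcars)        # compact view of the structure
```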
28. Access Titanic dataset and Execute commands to evaluate whether the evacuation
strategy was fair or not. If biased, state which gender, age group and class was most
favored. Analyze using cross tabulations.
From this plot, it can be clearly seen that women, children and first-class passengers were favoured the most.
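The cross tabulations behind this conclusion can be sketched with the built-in Titanic table. This is one possible approach using margin.table() and prop.table(), not necessarily the commands used for the plot.

```r
# Titanic is a built-in 4-way table: Class x Sex x Age x Survived.
data("Titanic")

# Cross-tabulate one factor against Survived, then convert each row to proportions.
by_sex   <- prop.table(margin.table(Titanic, c(2, 4)), 1)
by_age   <- prop.table(margin.table(Titanic, c(3, 4)), 1)
by_class <- prop.table(margin.table(Titanic, c(1, 4)), 1)

round(by_sex, 3)
round(by_age, 3)
round(by_class, 3)
```

In each table, the "Yes" column is the share of that group that survived, so the groups can be compared directly.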
29. Calculate correlation by importing data from excel. Determine whether there is a
positive or a negative correlation in Advertisement in month and Sales in crores.
30. Apply Important Functions (gather, separate, unite, spread, fill, full_seq, drop_na,
and replace_na) in “tidyr Package” for following dataset
Gather
Separate
Unite
Spread
Full_seq
Drop_na
Replace_na
Fill
The dplyr Package
31. Apply Important Functions (filter, arrange, select, rename, mutate and transmute,
sample_n and sample_frac) for following column heads with 5 data rows:
Dataset
Filter
Arrange
Select
Rename
Mutate
Transmute
Sample
Data Visualization in R Studio
Quick plot with ggplot2
32. Generate BCOM marks data containing the sections and overall percentage (5
sections ranging from A to E ), with 60 students in each section
· Histogram plot
· Density plot line color by group (Section) and change line type
· Draw a plot using data from numeric vectors where X contains values ranging from 10 to
20 and Y is square of X
· Scatter plots with smoothed line for Miles/(US) gallon on y axis and Weight (lb/1000) on
x axis
· Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line by groups (Number of cylinders)
· Scatter plots with colors for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis
· Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with colors
by groups (Number of gears)
· Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line and colors by groups (Number of gears)
· Scatter plots (for Miles/(US) gallon on y axis and Weight (lb/1000) on x axis) with
Smoothed line and the point shape by groups (Number of gears)
35. Provide 5 commands for Descriptive statistics
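A few commonly used commands, shown on the mpg column of the built-in mtcars dataset (the choice of column is an example).

```r
mpg <- mtcars$mpg  # miles per gallon from the built-in mtcars dataset

mean(mpg)       # average
median(mpg)     # middle value
sd(mpg)         # standard deviation
range(mpg)      # minimum and maximum
quantile(mpg)   # quartiles
summary(mpg)    # minimum, quartiles, mean and maximum in one call
```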
36. Provide summary statistics for the MTCARS dataset while displaying a count summary
of categorical variables
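One way to get counts for categorical variables is to convert them to factors before calling summary(). Which mtcars columns count as categorical is an assumption made here (cyl, vs, am, gear and carb).

```r
cars <- mtcars  # copy of the built-in dataset

# These columns are stored as numbers; treating them as categorical
# (factor) is an assumption made for this illustration.
cat_cols <- c("cyl", "vs", "am", "gear", "carb")
cars[cat_cols] <- lapply(cars[cat_cols], factor)

# summary() reports quartiles for numeric columns and counts for factors.
summary(cars)

# Count table for a single categorical variable.
cyl_counts <- table(cars$cyl)
print(cyl_counts)
```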
HYPOTHESIS TESTING using RStudio
For all tests, import the Excel file containing the given data, saved as yourname_test
T-TEST
A. One Sample t-Test using dummy (One-Tailed)
Problem 1:
65 56 89
Problem 1:
To test whether the time spent by full-time students in studying statistics is different from the
time spent by part-time students.
Conclusion: The time spent by full-time students in studying is not different from the time spent
by part-time students at α=0.05
C. Two Sample t-Test
Problem 1:
Is there sufficient evidence to suggest that the mean time to exhaustion is greater after chocolate
milk than after carbohydrate replacement drink? Use a significance level of 0.05. (Use µCM - µCD
in hypothesis statements)
Conclusion: The mean time to exhaustion is greater after carbohydrate drink than after chocolate
milk at α=0.05.
D. Paired t-Test
Problem 1:
Coaching was given to students for Statistical software after their result was evaluated in January
in order to improve their performance in April exams. Determine if the coaching was successful.
(α = 0.05%)
Jan May
45 45 56 56
54 54 57 45
44 67 32 55
56 56 67 87
34 56 44 66
45 56 34 65
34 76 34 45
56 45 76 76
Problem 1:
To test whether there is a significant difference between the marks scored by class groups A & B
in mathematics at α=10%.
Group A Group B
76 87 95 56
55 87 97 87
76 65 87 76
76 76 89 87
89 89 56 88
65 65 98 76
76 78 76 66
88 69 56 45
78 65 76 76
89 77
Problem 1:
Determine whether or not there is a significant difference between variances of two data sets.
Group 1 Group 2
150 170
125 165
160 130
130 155
160 125
125 150
Conclusion: There is no significant difference between the variances of the two datasets.
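The test behind this conclusion can be sketched with var.test() on the values from the table above. The 5% significance level is an assumption, since the problem does not state one.

```r
# Data from the table above.
group1 <- c(150, 125, 160, 130, 160, 125)
group2 <- c(170, 165, 130, 155, 125, 150)

# F test: the null hypothesis is that the ratio of the two variances equals 1.
f_result <- var.test(group1, group2)
print(f_result)

# Decision at the 5% level (assumed alpha).
alpha <- 0.05
f_result$p.value > alpha   # TRUE means the null hypothesis is not rejected
```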
G. One Way ANOVA
Problem 1:
The marks for 3 different groups in Economics, Science, History are given.
20 20
Problem 1:
15-25 75 56 72
26-35 60 40 64
36-45 45 52 50
46-55 55 35 45