R Introduction

The document provides an introduction to the statistical software R. It discusses the benefits of R such as being free, having an excellent help system, and allowing for additional functionality through packages. Some disadvantages are a limited graphical interface and requiring programming knowledge. The document then demonstrates basic use of R, including using it as a calculator, importing and analyzing sample data, and descriptive statistics. Exercises are provided to familiarize users with R's functionality.

Uploaded by

Yatra Shah

Introduction to R

Welcome to the statistical world of the R language. This chapter gives a basic introduction to R with the
help of a few case studies. The R software provides an environment for data management and statistical
analysis. Although this environment is often perceived as less pleasant than more user-friendly software,
demand for it is growing in both academia and industry, and it is likely to play a major role in the
future of statistical analysis in both.

1.1 Benefits of R

The benefits of R for a research analyst include:

• It is free software that provides statistical analysis techniques and graphics facilities to an
analyst.
• R has an excellent built-in help system.
• R is open source, which means that the people who developed R allow everyone to access its source
code. This allows anyone to contribute to the software.
• R comes as a base package with many built-in statistical functions. It can be extended by
downloading additional packages that add specific functionality to the base software.
• R is a programming language, so programmers tend to be comfortable using it.
• The Comprehensive R Archive Network (CRAN) is central to using R. It is the place from which you
download the software and the packages you want to install.

Disadvantages of R compared to other software

• It has a limited graphical interface.
• There is no commercial support.
• R requires programming, so students must learn to deal with syntax issues.

The R project was started by Robert Gentleman and Ross Ihaka of the Statistics Department of the
University of Auckland in 1995. It quickly gained a widespread audience. It is currently maintained by
the R core development team, a hard-working, international team of volunteer developers. The R project web
page http://www.r-project.org is the main site for information on R. This site gives directions for obtaining
the software, accompanying packages and other sources of documentation.

2. Starting R

There are three windows in R. These are:


1. Console
2. Editor
3. Graphics

The Console is the main window, where you run commands and see the results of executing them. The Editor is
where you write and save scripts. Graphs produced by commands appear in the Graphics window.

>, + and #
The symbol “>” is called the prompt; lines that follow it are typed by the user. Lines beginning with
anything else are produced by R. You type commands after the “>” symbol to instruct R to execute them.
If a command is too long to fit on one line, R shows “+” as the continuation prompt. The symbol # starts
a comment: anything after it on the same line is ignored by R.

Objects and Functions: Commands in R are generally made up of two parts:

• Objects

• Functions

An object is anything created in R. It may be a variable or a collection of variables. Functions are
operations, many of them built into the software, that act on objects. The two are joined by the assignment
operator <-, which stores the result of a function in an object. That is: object <- function(arguments)
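
As a minimal sketch of the object <- function pattern (the object names x, scores and avg are illustrative and do not come from the original text):

```r
# Store the result of a function call in an object with the assignment operator <-
x <- sqrt(16)                # x now holds 4
scores <- c(57, 78, 57, 46)  # c() combines values into a vector
avg <- mean(scores)          # mean() is a built-in function

# Typing an object's name prints its value; everything after # is ignored by R
avg
```

Typing avg at the prompt prints the stored value, so the object can be reused in later commands without recomputing it.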
Typing less: You can save a lot of typing in R. Arrow keys can be used to retrieve your previous commands
in R. In particular, each command is stored in a history and the up arrow will traverse backwards along this
history and the down arrow forwards. Left and right arrow keys will work as expected.

Exercise 1: Using R as a calculator: Evaluate the following mathematical expressions in R.

5+10+20-15+25
15+10/2-15/3*5
(15+20/2-50/10)*5
pi*2^5-sqrt(16)+ log10(15) -log(5)
15-17*2/3-20
abs(15-17*2/3-20)
factorial(5)
log10(2)
log(2)
exp(0.6931472)
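
To illustrate how R reads such expressions, here is a short worked sketch for a few of them. The object names are illustrative only. It assumes nothing beyond base R: division and multiplication are evaluated before addition and subtraction, log() is the natural logarithm and log10() is the base-10 logarithm.

```r
# Division and multiplication are evaluated before addition and subtraction
e2 <- 15 + 10 / 2 - 15 / 3 * 5      # 15 + 5 - 25
e3 <- (15 + 20 / 2 - 50 / 10) * 5   # (15 + 10 - 5) * 5

# log() is the natural logarithm, so exp() undoes it
e_roundtrip <- exp(log(2))

# factorial(5) is 5 * 4 * 3 * 2 * 1
f5 <- factorial(5)

e2
e3
e_roundtrip
f5
```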

Exercise 2: The following data set gives, for a few companies, the stock price, earnings per share (EPS),
book value per share, the average PE ratio of the industry each company belongs to, and the type of
industry.
Company     Sector       Buy Price  Current Price  EPS     Book Value  Industry PE  Industry Type
DLF         Real estate  110        90             2.52    85          29           Manufacturing
SBI         Banking      180        160            147     1584        11           Service
HDFC        Banking      990        1000           36      178         25           Service
Bharti      Telecom      310        326            19      152         14           Service
Reliance    Petroleum    990        1020           69      609         18           Manufacturing
Infosys     IT           1100       1120           185     733         23           Service
BHEL        Infra        140        110            13      138         17           Manufacturing
Ranbaxy     Pharma       577        450            9.78    26          22.57        Manufacturing
Tata Steel  Metal        283        250            75.41   629         10.33        Manufacturing
L&T         Infra        1470       1200           62.86   362         17.39        Manufacturing

You are required to perform the following activities in R:

1. Make a folder “R data” on the desktop and set it as your default (working) folder.
2. Install the package “foreign”.
3. Enter the given data for all the variables using the c() and scan() functions.
4. Combine all the variables into a dataset using the data.frame() function.
5. Calculate the PE ratio of each company and include it in the dataset.
6. Check the nature of the variables (numeric or character).
7. Add the data for the following five companies:

Company       Sector    Buy Price  Current Price  EPS   Book Value  Industry PE  Industry Type
Gati          Logistic  120        110            1.96  70          97           Service
Yes Bank      Banking   670        710            57    278         17           Service
Vedanta       Metal     65         75             8     114         11           Manufacturing
Amtek auto    Auto      40         29             0.1   276         7            Manufacturing
Apollo Tyres  Auto      163        161            16    64          10           Manufacturing

8. Calculate the descriptive statistics (mean, median, mode, standard deviation, variance, minimum and
maximum) of the variables.
9. Export the new dataset to the default folder in .csv format.
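
One possible way to work through these steps is sketched below. It is not the only approach and rests on several assumptions: the variable names are hypothetical, the PE ratio is taken to be current price divided by EPS, the folder path in step 1 is a placeholder, and the file is written to a temporary folder so the sketch runs anywhere. Base R has no function for the statistical mode, so a small helper (stat_mode, hypothetical) is defined.

```r
# Steps 1-2 are machine-specific, so they are left commented out (path is a placeholder)
# setwd("~/Desktop/R data")
# install.packages("foreign")

# Step 3: enter the variables with c(); scan(text = ...) is shown for buy_price
company <- c("DLF", "SBI", "HDFC", "Bharti", "Reliance", "Infosys", "BHEL",
             "Ranbaxy", "Tata Steel", "L&T")
sector <- c("Real estate", "Banking", "Banking", "Telecom", "Petroleum", "IT",
            "Infra", "Pharma", "Metal", "Infra")
buy_price <- scan(text = "110 180 990 310 990 1100 140 577 283 1470", quiet = TRUE)
current_price <- c(90, 160, 1000, 326, 1020, 1120, 110, 450, 250, 1200)
eps <- c(2.52, 147, 36, 19, 69, 185, 13, 9.78, 75.41, 62.86)
book_value <- c(85, 1584, 178, 152, 609, 733, 138, 26, 629, 362)
industry_pe <- c(29, 11, 25, 14, 18, 23, 17, 22.57, 10.33, 17.39)
industry_type <- c("Manufacturing", "Service", "Service", "Service", "Manufacturing",
                   "Service", "Manufacturing", "Manufacturing", "Manufacturing",
                   "Manufacturing")

# Step 4: combine the vectors into a data frame
stocks <- data.frame(company, sector, buy_price, current_price, eps, book_value,
                     industry_pe, industry_type, stringsAsFactors = FALSE)

# Step 5: PE ratio, assumed here to be current price / EPS
stocks$pe_ratio <- stocks$current_price / stocks$eps

# Step 6: str() lists each variable with its type (num or chr)
str(stocks)

# Step 7: add the five new companies as extra rows
new_rows <- data.frame(
  company = c("Gati", "Yes Bank", "Vedanta", "Amtek auto", "Apollo Tyres"),
  sector = c("Logistic", "Banking", "Metal", "Auto", "Auto"),
  buy_price = c(120, 670, 65, 40, 163),
  current_price = c(110, 710, 75, 29, 161),
  eps = c(1.96, 57, 8, 0.1, 16),
  book_value = c(70, 278, 114, 276, 64),
  industry_pe = c(97, 17, 11, 7, 10),
  industry_type = c("Service", "Service", "Manufacturing", "Manufacturing",
                    "Manufacturing"),
  stringsAsFactors = FALSE
)
new_rows$pe_ratio <- new_rows$current_price / new_rows$eps
stocks_all <- rbind(stocks, new_rows)

# Step 8: descriptive statistics; stat_mode() returns every most frequent value
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
describe <- function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v), var = var(v),
    min = min(v), max = max(v))
}
numeric_cols <- c("buy_price", "current_price", "eps", "book_value",
                  "industry_pe", "pe_ratio")
sapply(stocks_all[, numeric_cols], describe)
stat_mode(stocks_all$industry_pe)

# Step 9: export as .csv (to a temporary folder in this sketch)
out_file <- file.path(tempdir(), "stocks_all.csv")
write.csv(stocks_all, out_file, row.names = FALSE)
```

With the working folder set as in step 1, write.csv(stocks_all, "stocks_all.csv", row.names = FALSE) would place the file in that folder instead.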
Scalar: a single number, e.g. a = 16

Vector: a = c(2,5,6,7,8,9,10)

Data frame: Data = data.frame(a,b,c), where a, b and c are vectors. Creates a table with variables as
columns and observations as rows.

Write functions (e.g. write.csv): used for export

Read functions (e.g. read.csv): used for import
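
A short sketch of these building blocks together. The vector a is the one from the text. The second and third vectors are hypothetical, and the third is named d rather than c so that the name of the c() function is not reused. The file is written to a temporary folder, which is an assumption made so the sketch runs anywhere.

```r
s <- 16                         # scalar: a single number
a <- c(2, 5, 6, 7, 8, 9, 10)    # vector from the text
b <- a * 2                      # hypothetical second vector
d <- a > 6                      # hypothetical third vector (TRUE/FALSE values)

# Data frame: each vector becomes a column, each position a row
Data <- data.frame(a, b, d)

# Export with a write function, then import again with a read function
csv_path <- file.path(tempdir(), "Data.csv")
write.csv(Data, csv_path, row.names = FALSE)
Data_back <- read.csv(csv_path)
dim(Data_back)                  # number of rows and columns
```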

Exercise 3 (Descriptive statistics in R): The HR manager of ABC Ltd is interested in analysing the performance score
as well as the retention level of the employees in the company. She collected data on 40 employees of the
company for six variables (gender, age, performance score, education background, monthly income and the time
spent by the employee in the company). The data set is given below:

Employee Code  Gender  Age (in years)  Performance Score  Education Background  Monthly Income  Time spent in the company
1 M 25 57 BSc 29000 2
2 M 27 78 Bcom 30000 2
3 M 36 57 Btech 43000 7
4 F 43 46 BE 56000 6
5 F 36 59 BA 67000 5
6 F 28 65 BSc 76000 3
7 F 23 67 BA 15000 4
8 M 35 73 Bcom 72000 5
9 F 34 49 BE 52000 4
10 M 45 63 Btech 65000 7
11 F 34 68 Btech 61000 6
12 F 43 62 BA 89000 5
13 M 42 75 BSc 87000 6
14 M 25 56 Bcom 39000 2
15 F 43 64 Bcom 73000 4
16 F 42 69 BSc 76000 3
17 M 33 75 BA 30000 3
18 M 26 65 BE 28000 2
19 M 27 52 Btech 39000 3
20 F 24 99 BA 19000 3
21 F 25 56 BA 20000 4
22 M 26 87 BE 32000 3
23 F 35 67 BA 48000 9
24 F 36 52 BSc 52000 5
25 M 25 91 BSc 26000 2
26 M 36 78 BE 54000 7
27 F 38 50 BA 71000 6
28 F 39 72 Btech 39000 6
29 M 35 69 Btech 41000 5
30 M 36 61 Btech 43000 5
31 F 31 66 Bcom 20000 6
32 F 34 89 Bcom 45000 4
33 F 35 59 BE 51000 3
34 M 32 60 BA 48000 2
35 M 71 65 Btech 120000 15
36 M 29 59 Btech 49000 4
37 F 32 72 BSc 50000 7
38 M 36 63 BE 48000 5
39 F 48 56 BE 51000 8
40 M 39 73 BSc 63000 3

Perform the following analysis in R for the given data set.

1. Import the data (.csv format) into R using the read.csv function.
2. Study the dataset using head, names and View. Analyse the summary of each variable.
3. Calculate the descriptive statistics (mean, median, mode, standard deviation, variance, minimum and
maximum) of the variables age, performance score, retention and monthly income.
4. It is later found that the performance score of the 2nd employee was recorded incorrectly. The actual
performance score is 78. Correct the data.
5. (a) What is the monthly income of the 19th employee?
(b) Show the performance scores of all employees except the 10th.
(c) Show the ages of only the 7th, 9th and 15th employees.
6. (a) Which employees have a performance score equal to 80?
(b) Which employees have a performance score greater than 70?
(c) Which employees have a performance score less than 70?
7. Make the frequency distribution tables of gender and education background of the employees.
8. Find the univariate outliers in the variables performance score and monthly income using a box plot
diagram.
9. Use the barplot(), hist() and pie() functions to plot graphs of the performance score of the employees.
10. Test whether the variables ps, age, mi and rt (performance score, age, monthly income and retention)
are normally distributed.
11. Draw a bivariate plot of the variables age and monthly income (using the plot() function).
12. (a) Test the null hypothesis that the average monthly income of the employees of the company is Rs 50000.
(b) Test the null hypothesis that the average performance score is the same for employees of all education
backgrounds.
13. Analyse the correlation between age, performance score, monthly income and retention of the employees.
14. Run and analyse the following regression model:
Performance score = α + β1*Age + β2*Retention
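
The sketch below shows one possible route through several of these tasks. It makes a number of assumptions: the data are typed in from the table instead of being read with read.csv, the column names gender, age, ps, edu, mi and rt are hypothetical short names, the Shapiro-Wilk test is only one of several normality tests, and the plotting commands are left as comments because they open a graphics window.

```r
# Data typed in from the table; with a file, emp <- read.csv("<your file>.csv") would be used
gender <- strsplit(paste0("MMMFFFFMFM", "FFMMFFMMMF", "FMFFMMFFMM", "FFFMMMFMFM"), "")[[1]]
age <- c(25, 27, 36, 43, 36, 28, 23, 35, 34, 45, 34, 43, 42, 25, 43, 42, 33, 26, 27, 24,
         25, 26, 35, 36, 25, 36, 38, 39, 35, 36, 31, 34, 35, 32, 71, 29, 32, 36, 48, 39)
ps <- c(57, 78, 57, 46, 59, 65, 67, 73, 49, 63, 68, 62, 75, 56, 64, 69, 75, 65, 52, 99,
        56, 87, 67, 52, 91, 78, 50, 72, 69, 61, 66, 89, 59, 60, 65, 59, 72, 63, 56, 73)
edu <- scan(text = paste(
  "BSc Bcom Btech BE BA BSc BA Bcom BE Btech Btech BA BSc Bcom Bcom BSc BA BE Btech BA",
  "BA BE BA BSc BSc BE BA Btech Btech Btech Bcom Bcom BE BA Btech Btech BSc BE BE BSc"),
  what = "", quiet = TRUE)
mi <- c(29, 30, 43, 56, 67, 76, 15, 72, 52, 65, 61, 89, 87, 39, 73, 76, 30, 28, 39, 19,
        20, 32, 48, 52, 26, 54, 71, 39, 41, 43, 20, 45, 51, 48, 120, 49, 50, 48, 51, 63) * 1000
rt <- c(2, 2, 7, 6, 5, 3, 4, 5, 4, 7, 6, 5, 6, 2, 4, 3, 3, 2, 3, 3,
        4, 3, 9, 5, 2, 7, 6, 6, 5, 5, 6, 4, 3, 2, 15, 4, 7, 5, 8, 3)
emp <- data.frame(gender, age, ps, edu, mi, rt, stringsAsFactors = FALSE)

# Task 2: first rows, variable names, summary (View(emp) opens a viewer interactively)
head(emp)
names(emp)
summary(emp)

# Task 4: overwrite the 2nd employee's performance score
emp$ps[2] <- 78

# Task 5: indexing by position; a negative index drops that element
emp$mi[19]
emp$ps[-10]
emp$age[c(7, 9, 15)]

# Task 6: which() returns the positions where the condition is TRUE
which(emp$ps == 80)
high_ps <- which(emp$ps > 70)
low_ps <- which(emp$ps < 70)

# Task 7: frequency tables
table(emp$gender)
table(emp$edu)

# Task 8: values beyond the box plot whiskers; boxplot(emp$mi) draws the plot itself
ps_out <- boxplot.stats(emp$ps)$out
mi_out <- boxplot.stats(emp$mi)$out

# Tasks 9 and 11 (graphics, left as comments): hist(emp$ps); plot(emp$age, emp$mi)

# Task 10: Shapiro-Wilk normality test for each numeric variable
lapply(emp[, c("ps", "age", "mi", "rt")], shapiro.test)

# Task 12: one-sample t test and one-way ANOVA
t.test(emp$mi, mu = 50000)
summary(aov(ps ~ edu, data = emp))

# Task 13: correlation matrix
cor(emp[, c("age", "ps", "mi", "rt")])

# Task 14: linear regression of performance score on age and retention
fit <- lm(ps ~ age + rt, data = emp)
summary(fit)
```

The descriptive statistics in task 3 can be obtained with mean(), median(), sd(), var(), min() and max() applied to each column. The mode needs a small helper built on table(), since base R has no function for it.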
