Introduction to Analytics and R file
University (GGSIPU)
LAB FILE
OF
INTRODUCTION TO ANALYTICS & R
Sno. Particular
1 R Programming
2 Exercise - Vector
3 Exercise 2 - Matrix
4 Exercise - Data Frame
5 Exercise - How to Delete Commands in R
6 Exercise - GGPLOT
7 Exercise - Hypothesis Testing
8 Exercise - Linear Regression
9 Exercise - Logistic Regression
10 Exercise - Time Series
11 Exercise - Neural Networking
12 Exercise - Factor Analysis
13 Exercise - Cluster Analysis
14 Exercise - Market Analysis
15 Exercise - Monte Carlo Method
R PROGRAMMING
Ques What is R?
Ans. R is a programming language and software environment primarily used for
statistical computing, data analysis, and visualization. It is popular among statisticians,
data scientists, and researchers for its flexibility, ease of use, and a vast ecosystem of
packages.
Features of R-programming:
R programming has several features that make it a popular choice for data analysis, statistical
computing, and visualization. The following are the important features of R:
1. Statistical Analysis
R is great for performing statistical calculations. Whether you need basic statistics
like averages and medians, or complex ones like regression or hypothesis testing, R
has built-in functions for it.
2. Data Visualization
R has tools that allow you to create beautiful graphs and charts. You can easily
make line plots, bar charts, histograms, and more. Libraries like ggplot2 help you
create professional-quality visualizations.
3. Easy Data Manipulation
R provides simple ways to organize and clean data. With packages like dplyr, you
can easily filter, sort, and transform your data to get it ready for analysis.
4. Open Source & Free
R is completely free to use. It is open-source, meaning anyone can contribute to
improving it. You don’t have to buy any software to use it.
5. Large Number of Packages
R has a huge library of packages (special add-ons) for specific tasks, like data
manipulation, machine learning, or bioinformatics. You can download and use these
packages to extend R’s functionality.
6. Supports Multiple Data Formats
R can handle different types of data like spreadsheets (CSV), databases, or even big
data. It works well with text files, Excel files, and web data.
Before you can ask your computer to save some numbers, you’ll need to know how to talk to
it. That’s where R and RStudio come in. RStudio gives you a way to talk to your computer. R
gives you a language to speak in.
To get started, open RStudio just as you would open any other application on your computer.
When you do, the RStudio window should appear on your screen.
The RStudio interface is simple. You type R code into the bottom line of the RStudio console
pane and then press Enter to run it.
The code you type is called a command, because it will command your computer to do
something for you. The line you type it into is called the command line.
When you type a command at the prompt and hit Enter, your computer executes the
command and shows you the results. Then RStudio displays a fresh prompt for your next
command. For example, if you type 2 + 3 and hit Enter, RStudio will display:
[1] 5
Everything in R is an object.
R has 6 basic data types. (In addition to the five listed below, there is also raw which will not
be discussed in this workshop.)
character
numeric (real or decimal)
integer
logical
complex
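As a small illustration of the five types listed above, the sketch below creates one value of each type and checks it with class(). The variable names are only illustrative.

```r
# One value of each basic type, checked with class()
x_chr <- "hello"   # character
x_num <- 3.14      # numeric (stored as a double)
x_int <- 42L       # integer: the L suffix requests integer storage
x_lgl <- TRUE      # logical
x_cpl <- 2 + 3i    # complex

class(x_chr)  # "character"
class(x_num)  # "numeric"
class(x_int)  # "integer"
class(x_lgl)  # "logical"
class(x_cpl)  # "complex"
```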
Data Structures in R
R has many data structures. These include
vector
list
matrix
data frame
factors
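The following sketch builds one small object of each structure listed above so they can be compared side by side. All names and values are made up for illustration.

```r
v  <- c(10, 20, 30)                                # vector: one type, one dimension
l  <- list(name = "Amy", age = 26, passed = TRUE)  # list: elements may differ in type
m  <- matrix(1:6, nrow = 2)                        # matrix: one type, rows and columns
df <- data.frame(id = 1:3, score = c(80, 95, 70))  # data frame: columns may differ in type
f  <- factor(c("low", "high", "low"))              # factor: categorical values with levels

str(l)      # shows the type of each list element
dim(m)      # 2 3
levels(f)   # "high" "low" (levels are sorted alphabetically by default)
```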
EXERCISE - Vector
In R programming, a vector is a basic data structure that contains elements of the same type.
Vectors are used to store and manipulate collections of data in a linear, one-dimensional
format.
Key Characteristics of Vectors in R:
Homogeneous Elements: All elements in a vector must be of the same type, such as
numeric, character, or logical.
Indexing: Elements in a vector can be accessed using their indices, starting from 1.
Creation: Vectors can be created using the c() function, among other functions.
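The characteristics above can be seen in a short sketch: indexing starts at 1, and mixing types in c() makes R convert every element to one common type. The vector names are illustrative.

```r
v <- c(10, 20, 30, 40)
v[1]      # first element: 10
v[2:3]    # second and third elements: 20 30
v[-1]     # everything except the first element: 20 30 40

# Mixing types forces coercion to a single common type
mixed <- c(1, "a", TRUE)
class(mixed)  # "character"
```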
Types of Vectors:
Numeric Vector:
• What is it?
A numeric vector stores numbers (either whole numbers or decimals).
• Example:
# Numeric vector (whole numbers, stored as doubles; use 1L, 2L, ... for true integers)
num_vec <- c(1, 2, 3, 4)
print(num_vec)
Character Vector:
• What is it?
A character vector stores text (strings of characters).
• Example:
# Character vector
char_vec <- c("Alice", "Bob", "Charlie")
print(char_vec)
Logical Vector:
• What is it?
A logical vector stores Boolean values: either TRUE or FALSE.
• Example:
# Logical vector
logical_vec <- c(TRUE, FALSE, TRUE, FALSE)
print(logical_vec)
Ques2 Create a vector called stock.price with the following data points: 23, 27, 23, 21, 34
Ques3 Assign names to the price data points relating to the day of the week, starting with Mon,
Tues, Wed, etc.
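One possible answer to Ques2 and Ques3 is sketched below. The vector name stock.price and the day labels Thurs and Fri are assumptions that continue the pattern given in the questions.

```r
# Ques2: the five daily prices
stock.price <- c(23, 27, 23, 21, 34)

# Ques3: attach a day name to each price
names(stock.price) <- c("Mon", "Tues", "Wed", "Thurs", "Fri")
stock.price
```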
Ques4 What was the average (mean) stock price for the week?
Ques5 Create a vector called over.23 consisting of logical values that correspond to the days where
the stock price was more than $23.
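A sketch for Ques4 and Ques5 follows. It rebuilds the price vector so that it runs on its own; the name stock.price and the labels Thurs and Fri are assumptions.

```r
stock.price <- c(Mon = 23, Tues = 27, Wed = 23, Thurs = 21, Fri = 34)

# Ques4: average price for the week
mean(stock.price)   # 25.6

# Ques5: TRUE on the days where the price was above 23
over.23 <- stock.price > 23
over.23
```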
Ques6 Use the over.23 vector to filter the stock price vector and return only the days and
prices where the price was over $23.
Ques7 Use a built-in function to find the day the price was the highest.
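Ques6 and Ques7 could be answered along these lines. The sketch uses which.max() as the built-in function, which is one reasonable choice rather than the only one; the name stock.price and the labels Thurs and Fri are again assumptions.

```r
stock.price <- c(Mon = 23, Tues = 27, Wed = 23, Thurs = 21, Fri = 34)
over.23 <- stock.price > 23

# Ques6: logical indexing keeps only the days flagged TRUE
stock.price[over.23]   # Tues 27, Fri 34

# Ques7: which.max() returns the position of the largest value, with its name
highest.day <- names(which.max(stock.price))
highest.day            # "Fri"
```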
Exercise 2 - Matrix
In R programming, a matrix is a two-dimensional data structure that can store elements of the
same type arranged in rows and columns. It's essentially a vector with dimensions. Matrices
are useful for various mathematical computations, such as linear algebra and statistics.
RBIND & CBIND
In R, rbind() and cbind() are functions used to combine objects by rows or columns,
respectively. They are very handy for data manipulation and combining data frames, matrices,
and vectors.
rbind()
Function: Stands for "row bind." It combines objects by rows, i.e., it appends one or
more sets of rows to an existing object.
cbind()
Function: Stands for "column bind." It combines objects by columns, i.e., it appends
one or more sets of columns to an existing object.
Practical Uses:
rbind() is typically used when you have multiple datasets with the same columns and
you want to combine them into a single dataset by stacking them on top of each other.
cbind() is used when you have multiple datasets with the same number of rows and
you want to combine them side-by-side.
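The two practical uses above can be sketched with small data frames. The data frames q1, q2 and region are hypothetical and exist only for this illustration.

```r
# Two datasets with the same columns
q1 <- data.frame(id = 1:2, sales = c(100, 150))
q2 <- data.frame(id = 3:4, sales = c(120, 90))

# rbind() stacks them on top of each other
stacked <- rbind(q1, q2)
dim(stacked)   # 4 2

# cbind() places a dataset with the same number of rows alongside
region <- data.frame(region = c("N", "S", "E", "W"))
wide <- cbind(stacked, region)
dim(wide)      # 4 3
```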
Ques1 Create 2 vectors A and B where A is (1,2,3) and B is (4,5,6). With these vectors use the
cbind() or rbind() function to create a 2 by 3 matrix from the vectors.
Input: A <- c(1, 2, 3)
B <- c(4, 5, 6)
rbind(A, B)   # 2 by 3 matrix: A and B become the rows
cbind(A, B)   # 3 by 2 matrix: A and B become the columns
Ques2 Create a 3 by 3 matrix consisting of the numbers 1-9. Create this matrix using the
shortcut 1:9 and by specifying the nrow argument in the matrix() function call. Assign this
matrix to the variable mat.
Input: mat <- matrix(1:9, nrow = 3)
mat
Output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
Ques4 Create a 5 by 5 matrix consisting of the numbers 1-25 and assign it to the variable
mat1. The top row should be the numbers 1-5.
Input: mat1 <- matrix(1:25, nrow = 5, ncol = 5, byrow = TRUE)
mat1
Output:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25
Ques5 Using indexing notation, grab a sub-section of mat1 from the previous exercise that
looks like this:
[7,8]
[12,13]
Input: sub_section <- mat1[2:3,2:3]
sub_section
Output:
     [,1] [,2]
[1,]    7    8
[2,]   12   13
Ques7 Find out how to use runif() to create a 4 by 5 matrix consisting of 20 random numbers
(4*5=20).
Input: random_numbers <- runif(20)
random_matrix <- matrix(random_numbers, nrow = 4, ncol = 5)
random_matrix
Output: (the values differ on every run because runif() draws random numbers between 0 and 1)
Exercise - Data Frame
In R programming, a data frame is one of the most commonly used data structures. It’s a table
or a 2-dimensional array-like structure that stores data in rows and columns. Each column can
contain different types of data (numeric, character, factor, etc.), making data frames
incredibly versatile for data analysis and manipulation.
Rows and Columns: Data frames are organized into rows and columns. Each column
represents a variable, and each row represents an observation.
Row and Column Names: Data frames can have names for rows and columns, which
makes them easier to understand and manipulate.
You can access data frame elements using the $ operator, [ ] indexing, or [[ ]] indexing.
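The three access styles can be compared in a short sketch. The small data frame here reuses the Age and Weight values from the exercise that follows.

```r
df <- data.frame(Age = c(22, 25, 26), Weight = c(150, 165, 120),
                 row.names = c("Sam", "Frank", "Amy"))

df$Age             # column as a vector: 22 25 26
df[["Weight"]]     # the same idea with [[ ]]: 150 165 120
df["Amy", "Age"]   # one cell, by row name and column name: 26
df[df$Age > 22, ]  # rows where Age exceeds 22 (Frank and Amy)
```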
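Ques1 Recreate the following data frame by creating vectors and using the data.frame()
function:
# Vectors holding one column each (values taken from the table below)
age <- c(22, 25, 26)
weight <- c(150, 165, 120)
sex <- c("M", "M", "F")
df <- data.frame(Age = age, Weight = weight, Sex = sex, row.names = c("Sam", "Frank",
"Amy"))
df
      Age Weight Sex
Sam    22    150   M
Frank  25    165   M
Amy    26    120   F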
Input: is.data.frame(mtcars)
Output: [1] TRUE
mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
Input: df <- mtcars
df
Output:
EXERCISE- HOW TO DELETE COMMANDS IN R?
1. RM FUNCTION
rm() stands for "remove" and is used to delete objects from the R environment. Here is an
example of how to use the rm() function:
INPUT:
OUTPUT:
After deletion, trying to use the removed object will result in an error
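A minimal sketch of rm() on a single object is shown below. The object name lab_score is hypothetical.

```r
lab_score <- 10        # create an object
exists("lab_score")    # TRUE: the object is in the environment
rm(lab_score)          # delete it
exists("lab_score")    # FALSE: the object is gone
# Using lab_score from here on fails with an "object 'lab_score' not found" error
```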
DELETING MULTIPLE VARIABLES USING RM FUNCTION
After deletion, trying to use the removed object will result in an error
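Several objects can be removed in one call, either by listing them or by passing their names as a character vector. The object names in this sketch are hypothetical.

```r
price_a <- 1
price_b <- 2
price_c <- 3

rm(price_a, price_b)    # remove several objects by listing their names
rm(list = "price_c")    # or pass the names as a character vector
ls()                    # none of the three names appear any more
# rm(list = ls()) would remove every object in the current environment
```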
Exercise - GGPLOT
ggplot2 is one of the most popular and powerful packages in R for creating advanced and
customizable graphics. It's based on the Grammar of Graphics, which provides a systematic
way to describe and build graphs.
Codes: install.packages("ggplot2")
library(ggplot2)
Aesthetic Mapping (aes): Defines how variables in the data are mapped to visual
properties (like x and y positions, colors, etc.).
Geometric Object (geom): The type of plot (e.g., points, lines, bars).
ggplot2 is extremely flexible and allows for a high degree of customization, making it an
essential tool for data visualization in R.
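To see how data, aesthetic mapping and geometric object fit together, the sketch below plots the mpg data set that ships with ggplot2. The choice of the displ, hwy and class columns is only an illustration, and the code assumes ggplot2 is already installed.

```r
library(ggplot2)

# data: mpg; aes: engine size on x, highway mileage on y, colour by class;
# geom: points, which gives a scatter plot
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, colour = class)) +
  geom_point()
p
```

Swapping geom_point() for another geom changes the type of plot while the data and mapping stay the same.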
Ques How to create histogram?
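One way to draw a histogram is geom_histogram(). The sketch below assumes the mpg data set from ggplot2 and its hwy column; the bin width and colours are arbitrary choices.

```r
library(ggplot2)

# Histogram of highway mileage, counting cars in bins that are 2 mpg wide
p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, fill = "steelblue", colour = "white") +
  labs(x = "Highway mileage (mpg)", y = "Number of cars")
p
```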
Output:
Input: View(mpg)
Output:
Ques Classify the cars on basis of type and show stacked chart for fuel type of each car.
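A possible answer is sketched below. It assumes that "type" refers to the class column of mpg and "fuel type" to the fl column.

```r
library(ggplot2)

# One bar per car class; each bar is split (stacked) by fuel type
p <- ggplot(mpg, aes(x = class, fill = fl)) +
  geom_bar() +
  labs(x = "Car class", y = "Number of cars", fill = "Fuel type")
p
```

geom_bar() stacks the fill groups by default, so no position argument is needed here.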
Output:
Ques Percent chart: cluster cars on the basis of fuel type
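A percent chart can be sketched by setting position = "fill", which rescales every bar to the same height so the segments show shares rather than counts. The column choices (class on the x axis, fl as the fill) are assumptions about what the question intends.

```r
library(ggplot2)

# Each bar has height 1; segments show the share of each fuel type within a class
p <- ggplot(mpg, aes(x = class, fill = fl)) +
  geom_bar(position = "fill") +
  labs(x = "Car class", y = "Proportion of cars", fill = "Fuel type")
p
```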
Output:
library(dplyr)
library(ggplot2)
data(mpg)
# Average city mileage (cty) for each car class
dataset <- mpg %>%
group_by(class) %>%
summarise(mileage = mean(cty))
View(dataset)
Output:
What is the average mileage classwise and on the basis of drive mode?
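A sketch of one way to answer this with dplyr follows. It assumes that "mileage" means the cty column and that "drive mode" means the drv column of mpg; the result name by_class_drv is illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

# Mean city mileage for every combination of class and drive type
by_class_drv <- mpg %>%
  group_by(class, drv) %>%
  summarise(mileage = mean(cty), .groups = "drop")
by_class_drv
```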
Input:
data
Output:
Ques Plot wt vs mpg
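Assuming wt and mpg refer to the columns of the built-in mtcars data set, a scatter plot of the two could look like this:

```r
library(ggplot2)

# Car weight (in 1000 lbs) against miles per gallon, one point per car
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
p
```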
Output:
Output:
Exercise - Hypothesis Testing
Key Concepts
1. Null Hypothesis (H₀): The statement that there is no effect or no difference. It is the
hypothesis that researchers aim to test against.
2. Alternative Hypothesis (H₁): The statement that there is an effect or a difference. It is
accepted only when the data give enough evidence against the null hypothesis.
3. Significance Level (α): The probability threshold for rejecting the null hypothesis.
Common significance levels are 0.05, 0.01, and 0.10.
4. Test Statistic: A standardized value that is calculated from sample data during a
hypothesis test. It can follow different distributions (e.g., t-distribution, z-
distribution).
5. P-Value: The probability of obtaining a test statistic at least as extreme as the one
observed, assuming the null hypothesis is true.
6. Decision Rule: Based on the comparison between the p-value and the significance
level, you either reject or fail to reject the null hypothesis.
o Choose a statistical test based on the type of data and sample size (e.g., t-test,
chi-square test, ANOVA).
o Use statistical software or formulas to compute the test statistic and p-value.
o If p-value ≤ α: Reject the null hypothesis (the data provide evidence for H₁).
o If p-value > α: Fail to reject the null hypothesis (not enough evidence
to support H₁).
t.test(df$Post_usage_1month, mu = 50)
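The lab's data set is not included in this file, so the sketch below simulates a stand-in data frame to show how the one-sample test and the decision rule fit together. The simulated values and the significance level of 0.05 are assumptions, not the lab's actual data.

```r
set.seed(1)  # simulated data, for illustration only
df <- data.frame(Post_usage_1month = rnorm(40, mean = 55, sd = 10))

# H0: the mean usage equals 50; H1: it differs from 50
res <- t.test(df$Post_usage_1month, mu = 50)
res$statistic   # the t statistic
res$p.value     # the p-value

# Decision rule: compare the p-value with the significance level
alpha <- 0.05
decision <- if (res$p.value <= alpha) "Reject H0" else "Fail to reject H0"
decision
```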
Q2: Dependent (paired) t test: Is there any impact of the marketing campaign on usage?
mean(df$pre_usage)
mean(df$Post_usage_1month)
mean(df$post_usage_2ndmonth)
mean(df$Latest_mon_usage)
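Comparing the means is only the first step; the paired test itself can be run with t.test(..., paired = TRUE). The sketch below uses simulated stand-in data with the same column names, since the lab's data set is not included here.

```r
set.seed(2)  # simulated data, for illustration only
pre <- rnorm(30, mean = 50, sd = 8)
df <- data.frame(pre_usage = pre,
                 Post_usage_1month = pre + rnorm(30, mean = 3, sd = 5))

# Paired test: each customer is measured before and after the campaign
# H0: the mean difference (post minus pre) is zero
res <- t.test(df$Post_usage_1month, df$pre_usage, paired = TRUE)
res
```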
Q3: Independent t test
Ans mean(df[df$sex==0,'Post_usage_1month'])
mean(df[df$sex==1,'Post_usage_1month'])
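After comparing the two group means, the independent test can be run with the formula interface of t.test(). The data below are simulated stand-ins that reuse the column names above; the 0/1 coding of sex follows the code in this section.

```r
set.seed(3)  # simulated data, for illustration only
df <- data.frame(sex = rep(c(0, 1), each = 25),
                 Post_usage_1month = c(rnorm(25, mean = 50, sd = 10),
                                       rnorm(25, mean = 56, sd = 10)))

# Compare mean usage between the two groups
# (t.test() runs Welch's test by default, which does not assume equal variances)
res <- t.test(Post_usage_1month ~ sex, data = df)
res
```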
Ans # Method 1
summary(AOV_obj)
# Method 2
anova(AOV_obj2)
summary(AOV_obj2)
mean(df[df$segment==1,'Post_usage_1month'])
mean(df[df$segment==2,'Post_usage_1month'])
mean(df[df$segment==3,'Post_usage_1month'])
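The definitions of AOV_obj and AOV_obj2 are not shown in this file. The sketch below is one plausible reconstruction on simulated stand-in data, assuming Method 1 uses aov() and Method 2 uses lm() followed by anova(); the lab's actual calls may differ.

```r
set.seed(4)  # simulated data, for illustration only
df <- data.frame(segment = rep(1:3, each = 20),
                 Post_usage_1month = c(rnorm(20, mean = 48, sd = 9),
                                       rnorm(20, mean = 52, sd = 9),
                                       rnorm(20, mean = 57, sd = 9)))

# Method 1 (assumed): aov() with segment treated as a categorical factor
AOV_obj <- aov(Post_usage_1month ~ factor(segment), data = df)
summary(AOV_obj)

# Method 2 (assumed): fit a linear model, then request its ANOVA table
AOV_obj2 <- lm(Post_usage_1month ~ factor(segment), data = df)
anova(AOV_obj2)
```

Both methods test the same null hypothesis: the mean usage is equal across the three segments.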
chisq.test(tab)
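The object tab is not defined in this file. A chi-square test of independence needs a contingency table of counts, which table() can build from two categorical columns. The sketch below assumes the table crosses sex and segment, using simulated stand-in data.

```r
set.seed(5)  # simulated data, for illustration only
df <- data.frame(sex = sample(c(0, 1), 100, replace = TRUE),
                 segment = sample(1:3, 100, replace = TRUE))

# Contingency table: rows are sex, columns are segment
tab <- table(df$sex, df$segment)
tab

# H0: sex and segment are independent
res <- chisq.test(tab)
res$p.value
```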