Introduction to Analytics and R file

Uploaded by

Hanish verma
Guru Gobind Singh Indraprastha University (GGSIPU)

Submitted in partial fulfilment of the requirements for the
Masters in Business Administration (MBA)

LAB FILE
OF
INTRODUCTION TO ANALYTICS & R

Submitted to: Dr. Shitika
Submitted by: DIVYA (00416619824)

Sno. Particular
1 R Programming
2 Exercise - Vector
3 Exercise 2 - Matrix
4 Exercise – Data Frame
5 Exercise – How to Delete Commands In R
6 Exercise - GGPLOT
7 Exercise – Hypothesis Testing
8 Exercise – Linear Regression
9 Exercise – Logistic Regression
10 Exercise – Time Series
11 Exercise – Neural Networking
12 Exercise – Factor Analysis
13 Exercise – Cluster Analysis
14 Exercise – Market Analysis
15 Exercise – Monte Carlo Method

R PROGRAMMING

Ques What is R?
Ans. R is a programming language and software environment primarily used for
statistical computing, data analysis, and visualization. It is popular among statisticians,
data scientists, and researchers for its flexibility, ease of use, and a vast ecosystem of
packages.

Features of R-programming:
R programming has several features that make it a popular choice for data analysis, statistical
computing, and visualization. The following are the important features of R:
1. Statistical Analysis
▪ R is great for performing statistical calculations. Whether you need basic statistics
like averages and medians, or complex ones like regression or hypothesis testing, R
has built-in functions for it.
2. Data Visualization
▪ R has tools that allow you to create beautiful graphs and charts. You can easily
make line plots, bar charts, histograms, and more. Libraries like ggplot2 help you
create professional-quality visualizations.
3. Easy Data Manipulation
▪ R provides simple ways to organize and clean data. With packages like dplyr, you
can easily filter, sort, and transform your data to get it ready for analysis.
4. Open Source & Free
▪ R is completely free to use. It is open-source, meaning anyone can contribute to
improving it. You don’t have to buy any software to use it.
5. Large Number of Packages
▪ R has a huge library of packages (special add-ons) for specific tasks, like data
manipulation, machine learning, or bioinformatics. You can download and use these
packages to extend R’s functionality.
6. Supports Multiple Data Formats
▪ R can handle different types of data like spreadsheets (CSV), databases, or even big
data. It works well with text files, Excel files, and web data.

Before you can ask your computer to save some numbers, you’ll need to know how to talk to
it. That’s where R and RStudio come in. RStudio gives you a way to talk to your computer. R
gives you a language to speak in.
To get started, open RStudio just as you would open any other application on your computer.
When you do, the RStudio window should appear on your screen.

The RStudio interface is simple. You type R code at the prompt (>) on the bottom line of the
RStudio console pane and then press Enter to run it.
The code you type is called a command, because it will command your computer to do
something for you. The line you type it into is called the command line.
When you type a command at the prompt and hit Enter, your computer executes the
command and shows you the results. Then RStudio displays a fresh prompt for your next
command. For example, if you type 2 + 3 and hit Enter, RStudio will display:
[1] 5

Everything in R is an object.
R has 6 basic data types. (In addition to the five listed below, there is also raw which will not
be discussed in this workshop.)

• character
• numeric (real or decimal)
• integer
• logical
• complex
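
The five types listed above can be checked with class(). The following is a minimal sketch; the example values and the names values and types are illustrative assumptions, not part of the original file.

```r
# One example value for each of the five basic data types
values <- list(
  character = "hello",
  numeric   = 3.14,
  integer   = 42L,      # the L suffix makes the value an integer
  logical   = TRUE,
  complex   = 1 + 2i
)

# class() reports the type of each value
types <- sapply(values, class)
print(types)
```

Note that a whole number typed without the L suffix, such as 42, is stored as numeric rather than integer.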

Data Structures in R
R has many data structures. These include
• vector
• list
• matrix
• data frame
• factor
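
As a quick illustration of the structures listed above, the sketch below builds one small object of each kind. All object names and values are assumptions chosen for the example.

```r
# One small object for each data structure
v <- c(10, 20, 30)                         # vector: same-type elements
l <- list(name = "Alice", scores = v)      # list: elements of mixed types
m <- matrix(1:6, nrow = 2)                 # matrix: 2 rows, 3 columns
d <- data.frame(id = 1:3, score = v)       # data frame: table of columns
f <- factor(c("low", "high", "low"))       # factor: categorical values

# Check what each object is
print(c(is.vector(v), is.list(l), is.matrix(m), is.data.frame(d), is.factor(f)))
print(dim(m))
print(levels(f))   # factor levels are sorted alphabetically by default
```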

EXERCISE -Vector

In R programming, a vector is a basic data structure that contains elements of the same type.
Vectors are used to store and manipulate collections of data in a linear, one-dimensional
format.
Key Characteristics of Vectors in R:
▪ Homogeneous Elements: All elements in a vector must be of the same type, such as
numeric, character, or logical.
▪ Indexing: Elements in a vector can be accessed using their indices, starting from 1.
▪ Creation: Vectors can be created using the c() function, among other functions.
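
The three characteristics above can be seen in a short sketch. The vector names and values here are illustrative assumptions.

```r
# Creation with c() and with seq()
temps   <- c(21.5, 23.0, 19.8)
seq_vec <- seq(2, 10, by = 2)     # 2 4 6 8 10

# Indexing starts at 1
first    <- temps[1]
last_two <- temps[2:3]

# Homogeneous elements: mixed inputs are converted to a single type
mixed <- c(1, "two", TRUE)
print(class(mixed))               # all elements become character
print(mixed)
```

Because a vector can hold only one type, R silently converts the number and the logical value to text in the last example.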
Types of Vectors:

▪ Numeric Vector:
• What is it?
A numeric vector stores numbers (either whole numbers or decimals).

• Example:
# Numeric vector (whole numbers; R stores these as doubles unless written as 1L, 2L, ...)
num_vec <- c(1, 2, 3, 4)
print(num_vec)

# Numeric vector (decimals)
decimal_vec <- c(1.5, 2.7, 3.6)
print(decimal_vec)

• Why use it?
You use a numeric vector when you need to store a series of numbers, such as
ages, prices, or measurements.

▪ Character Vector:
• What is it?
A character vector stores text (strings of characters).

• Example:
# Character vector
char_vec <- c("Alice", "Bob", "Charlie")
print(char_vec)

• Why use it?
Use a character vector to store names, cities, or any other type of textual
information.

▪ Logical Vector:
• What is it?
A logical vector stores Boolean values: either TRUE or FALSE.

• Example:
# Logical vector
logical_vec <- c(TRUE, FALSE, TRUE, FALSE)
print(logical_vec)

• Why use it?
Logical vectors are useful when you need to track conditions or binary data,
like whether someone is eligible (TRUE) or not (FALSE).

Solve the following problems using R


Ques1 What is two to the power of five?
Input: 2^5
Output: [1] 32

Ques2 Create a vector called stock.price with the following data points: 23, 27, 23, 21, 34.

Ques3 Assign names to the price data points relating to the day of the week, starting with Mon,
Tues, Wed, etc.

Ques4 What was the average (mean) stock price for the week?

Ques5 Create a vector called over.23 consisting of logicals that correspond to the days where
the stock price was more than $23.

Ques6 Use the over.23 vector to filter the stock.price vector and return only the days and
prices where the price was over $23.

Ques7 Use a built-in function to find the day on which the price was the highest.
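
One possible way to work through Ques2 to Ques7 is sketched below. The day labels "Thu" and "Fri" are assumptions (the question only spells out Mon, Tues, Wed), and the name highest_day is introduced for the example.

```r
# Ques2: create the vector
stock.price <- c(23, 27, 23, 21, 34)

# Ques3: name each price by day (Thu and Fri are assumed labels)
names(stock.price) <- c("Mon", "Tues", "Wed", "Thu", "Fri")

# Ques4: average price for the week (128 / 5 = 25.6)
avg_price <- mean(stock.price)
print(avg_price)

# Ques5: logical vector, TRUE where the price is above 23
over.23 <- stock.price > 23
print(over.23)

# Ques6: keep only the days and prices above 23
print(stock.price[over.23])

# Ques7: which.max() gives the position of the highest price
highest_day <- names(stock.price)[which.max(stock.price)]
print(highest_day)
```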

Exercise 2 - Matrix

In R programming, a matrix is a two-dimensional data structure that can store elements of the
same type arranged in rows and columns. It's essentially a vector with dimensions. Matrices
are useful for various mathematical computations, such as linear algebra and statistics.
rbind() & cbind()
In R, rbind() and cbind() are functions used to combine objects by rows or columns,
respectively. They are very handy for data manipulation and for combining data frames,
matrices, and vectors.
rbind()
▪ Function: Stands for "row bind." It combines objects by rows, i.e., it appends one or
more sets of rows to an existing object.
cbind()
▪ Function: Stands for "column bind." It combines objects by columns, i.e., it appends
one or more sets of columns to an existing object.

Practical Uses:
▪ rbind() is typically used when you have multiple datasets with the same columns and
you want to combine them into a single dataset by stacking them on top of each other.
▪ cbind() is used when you have multiple datasets with the same number of rows and
you want to combine them side-by-side.
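
The two practical uses above can be sketched with small data frames. The data frames jan and feb and the region column are made-up examples, not data from this file.

```r
# Two datasets with the same columns
jan <- data.frame(product = c("A", "B"), sales = c(10, 20))
feb <- data.frame(product = c("C", "D"), sales = c(30, 40))

# rbind(): stack the rows of feb under jan
stacked <- rbind(jan, feb)
print(dim(stacked))    # rows, columns

# cbind(): add a new column with the same number of rows as jan
region  <- c("North", "South")
widened <- cbind(jan, region = region)
print(dim(widened))
print(widened)
```

rbind() expects matching column names, and cbind() expects the same number of rows in every piece.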

Ques1 Create 2 vectors A and B where A is (1,2,3) and B is (4,5,6). With these vectors, use the
cbind() or rbind() function to create a 2 by 3 matrix.
Input: A <- c(1,2,3)
B <- c(4,5,6)
rbind(A,B)   # 2 by 3 matrix: A and B become the rows
cbind(A,B)   # 3 by 2 matrix: A and B become the columns

Ques2 Create a 3 by 3 matrix consisting of the numbers 1-9. Create this matrix using the
shortcut 1:9 and by specifying the nrow argument in the matrix() function call. Assign this
matrix to the variable mat.
Input: mat <- matrix(1:9, nrow = 3)
mat
Output:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Ques3 Confirm that mat is a matrix using is.matrix()
Input: is.matrix(mat)
Output: [1] TRUE

Ques4 Create a 5 by 5 matrix consisting of the numbers 1-25 and assign it to the variable
mat1. The top row should be the numbers 1-5.
Input: mat1 <- matrix(1:25, nrow = 5, ncol = 5, byrow = TRUE)
mat1
Output:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25

Ques5 Using indexing notation, grab a sub-section of mat1 from the previous exercise that
looks like this:
[7,8]
[12,13]
Input: sub_section <- mat1[2:3, 2:3]
sub_section
Output:
     [,1] [,2]
[1,]    7    8
[2,]   12   13

Ques6 What is the sum of all the elements in mat1?
Input: sum_of_elements <- sum(mat1)
sum_of_elements
Output: [1] 325

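Ques7 Find out how to use runif() to create a 4 by 5 matrix consisting of 20 random numbers
(4*5=20).
Input: random_numbers <- runif(20)   # 20 random values between 0 and 1
random_matrix <- matrix(random_numbers, nrow = 4, ncol = 5)
random_matrix
Output: [a 4 by 5 matrix of random values between 0 and 1; the values change on every run
unless a seed is set with set.seed()]
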
Exercise -Data frame

In R programming, a data frame is one of the most commonly used data structures. It’s a table
or a 2-dimensional array-like structure that stores data in rows and columns. Each column can
contain different types of data (numeric, character, factor, etc.), making data frames
incredibly versatile for data analysis and manipulation.

Key Characteristics of Data Frames:

▪ Rows and Columns: Data frames are organized into rows and columns. Each column
represents a variable, and each row represents an observation.

▪ Heterogeneous Data: Different columns can contain different types of data, but all
values within one column share the same type.

▪ Row and Column Names: Data frames can have names for rows and columns, which
makes them easier to understand and manipulate.

Creating a Data Frame:

You can create a data frame using the data.frame() function.

Accessing Elements in a Data Frame:

You can access data frame elements using the $ operator, [ ] indexing, or [[ ]] indexing.
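
The three access methods can be compared on a small example. The students data frame and its values are assumptions made for this sketch.

```r
# A small data frame with one text column and one numeric column
students <- data.frame(
  name  = c("Asha", "Ravi", "Meena"),
  marks = c(82, 75, 91),
  stringsAsFactors = FALSE
)

# $ operator: returns a whole column as a vector
all_marks <- students$marks

# [ ] indexing: row 2, column "name"
second_name <- students[2, "name"]

# [[ ]] indexing: extract a column, then take its third element
third_mark <- students[["marks"]][3]

# [ ] with a condition: keep only rows where marks are above 80
high <- students[students$marks > 80, ]
print(high)
```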

Ques1 Recreate the data frame shown in the output below by creating vectors and using the
data.frame() function:

Input: age <- c(22, 25, 26)
weight <- c(150, 165, 120)
sex <- c("M", "M", "F")
df <- data.frame(Age = age, Weight = weight, Sex = sex, row.names = c("Sam", "Frank", "Amy"))
df

Output:
      Age Weight Sex
Sam    22    150   M
Frank  25    165   M
Amy    26    120   F

Ques2 Check if mtcars is a data frame using is.data.frame()

Input: is.data.frame(mtcars)

Output: [1] TRUE

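Ques3 Use as.data.frame() to convert a matrix into a data frame:

Input: mat <- matrix(1:25, nrow = 5)
mat
as.data.frame(mat)   # converts the matrix; the columns are named V1 to V5

Output:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
  V1 V2 V3 V4 V5
1  1  6 11 16 21
2  2  7 12 17 22
3  3  8 13 18 23
4  4  9 14 19 24
5  5 10 15 20 25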

Ques4 Set the built-in data frame mtcars as a variable df.

Input: df <- mtcars
df

Output: [the full mtcars data set is printed: 32 rows (cars) and 11 columns]

EXERCISE- HOW TO DELETE COMMANDS IN R?

Objects (variables) created by earlier commands can be deleted from the R environment using
the rm() function.

1. RM FUNCTION

DELETING ONLY ONE VARIABLE

rm() stands for "remove" and is used to delete objects from the R environment. For example,
rm(x) deletes an object named x.

After deletion, trying to use the removed object will result in an error such as
Error: object 'x' not found.
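
A minimal sketch of deleting a single variable follows. The variable name temp_value is a hypothetical example, and exists() is used here only to confirm that the object is gone.

```r
# Create a variable and confirm it exists
temp_value <- 10
before <- exists("temp_value", inherits = FALSE)
print(before)

# Delete it with rm()
rm(temp_value)
after <- exists("temp_value", inherits = FALSE)
print(after)

# Using the removed object now raises an error; tryCatch() captures the message
msg <- tryCatch(temp_value, error = function(e) conditionMessage(e))
print(msg)
```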

DELETING MULTIPLE VARIABLES USING RM FUNCTION

STEP-1: Create the variables.

STEP-2: Delete them with a single rm() call by listing their names separated by commas,
for example rm(a, b).

After deletion, trying to use the removed objects will result in an error.
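
The two steps can be sketched as follows. The variable names price, qty, total and to_drop are illustrative assumptions.

```r
# STEP-1: create the variables
price <- 100
qty   <- 3
total <- price * qty

# STEP-2: delete several variables in one call
rm(price, qty)

# rm() also accepts a character vector of names through the list argument
to_drop <- c("total")
rm(list = to_drop)

# Check that all three objects are gone
still_there <- sapply(c("price", "qty", "total"), exists, inherits = FALSE)
print(still_there)
```

Running rm(list = ls()) removes every object in the current environment, so it should be used with care.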

Exercise -GG Plot

ggplot2 is one of the most popular and powerful packages in R for creating advanced and
customizable graphics. It's based on the Grammar of Graphics, which provides a systematic
way to describe and build graphs.

Getting Started with ggplot2

Installation and Loading the Package:

First, you need to install and load the ggplot2 package:

Codes: install.packages("ggplot2")

library(ggplot2)

Basic Components of ggplot2

▪ Data: The dataset you want to visualize.

▪ Aesthetic Mapping (aes): Defines how variables in the data are mapped to visual
properties (like x and y positions, colors, etc.).

▪ Geometric Object (geom): The type of plot (e.g., points, lines, bars).

▪ Scales: Control mapping of data values to visual properties.

▪ Facets: Create multiple plots based on the values of a variable.

ggplot2 is extremely flexible and allows for a high degree of customization, making it an
essential tool for data visualization in R.
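
The sketch below puts the five components together in one plot, assuming ggplot2 is installed. It uses the built-in mtcars data; the colour values and the object name p are arbitrary choices made for the example.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                                   # Data
            aes(x = wt, y = mpg, color = factor(cyl))) +     # Aesthetic mapping
  geom_point() +                                             # Geometric object
  scale_color_manual(values = c("4" = "darkgreen",           # Scale
                                "6" = "orange",
                                "8" = "red")) +
  facet_wrap(~ gear) +                                       # Facets: one panel per gear
  labs(x = "Weight", y = "Miles per gallon", color = "Cylinders")

print(p)
```

Each component is added with +, so a plot can be built up or changed one layer at a time.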

Ques How to install ggplot2?

Ans install.packages("ggplot2")

library(ggplot2)

Ques How to create a histogram?

Input: ggplot(mtcars, aes(mpg)) + geom_histogram(binwidth = 5, color = "red")

Output: [histogram of mpg from mtcars, with bins 5 units wide and red bar outlines]

Ques Barplot: by default, geom_bar() gives you the count

How many cars are SUVs?

Input: View(mpg)

Output: [the mpg data set opens in the RStudio data viewer]
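
View(mpg) only opens the data for inspection. One possible way to answer the question with a bar plot is sketched below, assuming the mpg data set that ships with ggplot2; the names bar_plot and suv_count are introduced for the example.

```r
library(ggplot2)

# geom_bar() counts the rows in each class by default
bar_plot <- ggplot(mpg, aes(x = class)) + geom_bar()
print(bar_plot)

# The same counts as a table, and the SUV count on its own
print(table(mpg$class))
suv_count <- sum(mpg$class == "suv")
print(suv_count)
```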

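Ques Classify the cars on the basis of type (class) and show a stacked chart of fuel type for
each class.

Input: ggplot(mpg, aes(class)) + geom_bar(aes(fill = fl))

# Equivalent form, with fill mapped inside ggplot()
ggplot(mpg, aes(class, fill = fl)) + geom_bar()

Output: [stacked bar chart: one bar per class, divided by fuel type]

Ques Dodge Chart: cluster cars on the basis of fuel type.

Input: ggplot(mpg, aes(class)) + geom_bar(aes(fill = fl), position = 'dodge')

Output: [bar chart with the fuel-type bars placed side by side within each class]
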
Ques Percent chart: cluster cars on the basis of fuel type

Input: ggplot(mpg, aes(class)) + geom_bar(aes(fill = fl), position = 'fill')

Output: [stacked bar chart in which every bar has height 1 and shows the share of each fuel type]

Ques What is the average mileage classwise?

Input: # Load necessary libraries
library(dplyr)
library(ggplot2)
data(mpg)

# Group by class and summarise the average city mileage
dataset <- mpg %>%
  group_by(class) %>%
  summarise(mileage = mean(cty))

View(dataset)

Output: [table with one row per class and its average city mileage]

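Ques Stacked bar chart with identity

What is the average mileage classwise and on the basis of drive mode?

Input:
data <- mpg %>% group_by(class, drv) %>% summarise(mileage = mean(cty), .groups = "drop")
data

# stat = 'identity' uses the mileage values as bar heights;
# position = 'dodge' places the drv bars side by side (leave it out for a stacked chart)
ggplot(data, aes(x = class, y = mileage, fill = drv)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  scale_fill_manual(values = c('red', 'yellow', 'green'))

Output: [bar chart of average city mileage per class, with one coloured bar per drive mode]
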
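Ques wt vs mpg

Input: ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(color = 'red', shape = 5)

Output: [scatter plot of mpg against wt, drawn with red diamond-shaped points]

Ques Plot the cyl on the chart

Input: ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(aes(color = factor(cyl)))

Output: [scatter plot of mpg against wt, with points coloured by number of cylinders]
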
Exercise -Hypothesis testing

Hypothesis testing is a statistical method used to make decisions or inferences about a
population based on sample data. It helps determine whether there is enough evidence to
reject a null hypothesis in favor of an alternative hypothesis. Here’s a step-by-step overview
of hypothesis testing:

Key Concepts

1. Null Hypothesis (H₀): The statement that there is no effect or no difference. It is the
hypothesis that researchers aim to test against.

2. Alternative Hypothesis (H₁ or Ha): The statement that there is an effect or a
difference. It represents the outcome that researchers want to support.

3. Significance Level (α): The probability threshold for rejecting the null hypothesis.
Common significance levels are 0.05, 0.01, and 0.10.

4. Test Statistic: A standardized value that is calculated from sample data during a
hypothesis test. It can follow different distributions (e.g., t-distribution,
z-distribution).

5. P-Value: The probability of obtaining a test statistic at least as extreme as the one
observed, assuming the null hypothesis is true.

6. Decision Rule: Based on the comparison between the p-value and the significance
level, you either reject or fail to reject the null hypothesis.
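
To make the test statistic and p-value concrete, the sketch below computes both by hand for a one-sample t test and compares them with the result of t.test(). The sample is simulated with rnorm(), so the data, the seed and the hypothesised mean mu0 are all assumptions made for illustration.

```r
set.seed(42)                                   # fixed seed so the sample is reproducible
sample_data <- rnorm(30, mean = 52, sd = 5)    # simulated sample of 30 values
mu0 <- 50                                      # mean stated by the null hypothesis

# Test statistic: t = (sample mean - mu0) / (sample sd / sqrt(n))
n      <- length(sample_data)
t_stat <- (mean(sample_data) - mu0) / (sd(sample_data) / sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# The built-in test gives the same two numbers
builtin <- t.test(sample_data, mu = mu0)
print(c(manual_t = t_stat, builtin_t = unname(builtin$statistic)))
print(c(manual_p = p_value, builtin_p = builtin$p.value))
```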

Steps in Hypothesis Testing

1. State the Hypotheses:

o Null Hypothesis (H₀): There is no difference in means.

o Alternative Hypothesis (H₁): There is a difference in means.

2. Choose the Significance Level (α):

o Commonly chosen values are 0.05, 0.01, or 0.10.

3. Select the Appropriate Test:

o Choose a statistical test based on the type of data and sample size (e.g., t-test,
chi-square test, ANOVA).

4. Calculate the Test Statistic and P-Value:

o Use statistical software or formulas to compute the test statistic and p-value.

5. Make the Decision:

o Compare the p-value with the significance level:

▪ If p-value ≤ α: Reject the null hypothesis (the data provide evidence in
favor of H₁).

▪ If p-value > α: Fail to reject the null hypothesis (not enough evidence
to support H₁).

6. Interpret the Results:

o Provide context and implications of the decision in practical terms.
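
The six steps can be followed in a short sketch that compares the means of two groups. The two groups are simulated, so the group names, sizes, means and seed are assumptions and not results from this file.

```r
# Step 1: H0 = the two group means are equal; H1 = the means differ
set.seed(1)
group_a <- rnorm(40, mean = 100, sd = 10)   # simulated data for group A
group_b <- rnorm(40, mean = 110, sd = 10)   # simulated data for group B

# Step 2: choose the significance level
alpha <- 0.05

# Steps 3 and 4: a two-sample t test gives the test statistic and p-value
test <- t.test(group_a, group_b)
print(test$statistic)
print(test$p.value)

# Step 5: apply the decision rule
decision <- if (test$p.value <= alpha) "Reject H0" else "Fail to reject H0"

# Step 6: report the decision together with the group means
cat(decision, "| mean A =", round(mean(group_a), 2),
    "| mean B =", round(mean(group_b), 2), "\n")
```

By default t.test() runs the Welch version of the test, which does not assume equal variances in the two groups.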

Q1: One Sample t test : 50 v/s post Usage 1 Month.

Ans t.test(df$Post_usage_1month, mu = 50)

Q2: Dependent (paired) t test --Is there any impact of marketing campaign on usage?

Ans t.test(df$pre_usage, df$Post_usage_1month, paired = TRUE)
t.test(df$pre_usage, df$post_usage_2ndmonth, paired = TRUE)
t.test(df$pre_usage, df$Latest_mon_usage, paired = TRUE)

# Compare the average usage in each period
mean(df$pre_usage)
mean(df$Post_usage_1month)
mean(df$post_usage_2ndmonth)
mean(df$Latest_mon_usage)

Q3: Independent t test

Ans mean(df[df$sex==0,'Post_usage_1month'])
mean(df[df$sex==1,'Post_usage_1month'])
sd(df[df$sex==0,'Post_usage_1month']) # Male = 10.30
sd(df[df$sex==1,'Post_usage_1month']) # Female = 8.13
t.test(Post_usage_1month~sex, data = df, var.equal = FALSE)

Q4: ANOVA # segments v/s Post usage 1 month usage

Ans # Method 1
# segment is coded 1, 2, 3, so factor() is used to treat it as a group label, not a number
AOV_obj <- aov(Post_usage_1month~factor(segment), data = df)
summary(AOV_obj)

# Method 2
AOV_obj2 <- lm(Post_usage_1month~factor(segment), data = df)
anova(AOV_obj2)
summary(AOV_obj2)

# Average usage in each segment
mean(df[df$segment==1,'Post_usage_1month'])
mean(df[df$segment==2,'Post_usage_1month'])
mean(df[df$segment==3,'Post_usage_1month'])

Q5: Chi Square

Ans tab <- xtabs(~region+segment, data = df)
chisq.test(tab)

Q6: Correlation between pre usage and post usage

Ans cor.test(df$pre_usage, df$Post_usage_1month)
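
The data set df used in Q1 to Q6 is not included in this file. To try the commands, a stand-in data frame can be simulated as sketched below. Only the column names come from the answers above; every value, the sample size, the region labels and the seed are assumptions, so the results will not match the figures noted in the comments (for example the 10.30 and 8.13 standard deviations).

```r
set.seed(123)
n <- 90   # assumed number of customers

# Simulated stand-in for the missing data set (all values are made up)
df <- data.frame(
  pre_usage = rnorm(n, mean = 48, sd = 9),
  sex       = rep(c(0, 1), length.out = n),                  # 0/1 coding as in Q3
  segment   = rep(1:3, each = n / 3),                        # three segments as in Q4
  region    = sample(c("North", "South", "East"), n, replace = TRUE)
)
df$Post_usage_1month   <- df$pre_usage + rnorm(n, mean = 3, sd = 5)
df$post_usage_2ndmonth <- df$pre_usage + rnorm(n, mean = 4, sd = 5)
df$Latest_mon_usage    <- df$pre_usage + rnorm(n, mean = 5, sd = 5)

# The tests from Q1 to Q6 now run on the simulated data
one_sample <- t.test(df$Post_usage_1month, mu = 50)
paired     <- t.test(df$pre_usage, df$Post_usage_1month, paired = TRUE)
welch      <- t.test(Post_usage_1month ~ sex, data = df, var.equal = FALSE)
anova_fit  <- aov(Post_usage_1month ~ factor(segment), data = df)
chi_sq     <- chisq.test(xtabs(~ region + segment, data = df))
corr       <- cor.test(df$pre_usage, df$Post_usage_1month)

print(summary(anova_fit))
print(c(one_sample = one_sample$p.value, paired = paired$p.value,
        welch = welch$p.value, chi_sq = chi_sq$p.value, corr = corr$p.value))
```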
