Week 1

The document outlines the first week of a Marketing Analytics course taught by Dr. Swagato Chatterjee, focusing on using R for data analysis. It covers software installation, RStudio interface, essential functions, data manipulation techniques, and basic statistical concepts relevant to hotel review data analysis. Key topics include creating and manipulating matrices and data frames, summarizing data with dplyr, and introducing visualization and regression analysis methods.

Uploaded by

dushyant1209garg

WEEK 1

🟦 Course & Software Overview

Course Title: Marketing Analytics
Instructor: Dr. Swagato Chatterjee, VGSoM, IIT Kharagpur
Software Used: Excel (small problems), R (large problems)

🟩 Why Use R in Marketing Analytics?

Excel is limited to about 1 million rows per worksheet (1,048,576) → not suitable for large datasets.
R:
Open-source and free.
Strong community support.
Well suited to research-oriented work.
Python:
More suitable for deployment/automation, less for research.
Not preferred in this course.

🟨 Installing R & RStudio

Install R from: https://cran.r-project.org
Current mentioned version: R 3.6.1 (may vary)
Install RStudio from: https://posit.co/download/rstudio-desktop
Current version mentioned: RStudio 1.2.5001
Choose RStudio Desktop (Free Version)
For 32-bit systems, use older compatible versions.

🟧 Why RStudio over Base R?

RStudio has a more user-friendly UI.
Easier for click-and-run, drag-and-drop operations.
Also open-source and free.

🟥 RStudio Interface (4 Quadrants)

1. Top-Left → Script Editor:
Write and save R code ( .R files)
2. Bottom-Left → Console:
Runs the code, shows output
3. Top-Right → Environment/History:
Stores variables, datasets
4. Bottom-Right → Files/Plots/Packages/Help:
File explorer, visualizations, install/manage packages

🔷 Getting Started in RStudio

Open RStudio → File → New File → R Script
Save the script using the floppy disk icon or Ctrl + S
File extension: .R
Run code from editor to console for output

🔶 Best Practices
Save script before running code
Type code manually → Helps you learn from mistakes
Avoid copy-paste; typing reinforces learning

✅ Important Functions in R
1. seq() Function
Used to generate sequences.
Syntax: seq(from, to, by)
Example: seq(1, 30, 2) gives 1, 3, 5, ..., 29
2. rep() Function
Used to repeat elements.
Syntax: rep(value, times)
Example: rep(2, 20) repeats 2 twenty times.
3. Help Options in R
help(function_name) or ?function_name shows syntax, arguments, and examples.
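
A minimal sketch of the two functions above. The variable names ( odds , twos , evens ) are chosen for illustration only and are not from the lecture.

```r
# Odd numbers from 1 up to (at most) 30, in steps of 2
odds <- seq(1, 30, 2)
print(odds)

# The value 2 repeated twenty times
twos <- rep(2, 20)
print(twos)

# Arguments can also be passed by name, which is easier to read
evens <- seq(from = 2, to = 10, by = 2)
print(evens)

# Help pages (run these interactively in the console):
# help(seq)
# ?rep
```

Naming the arguments ( from , to , by ) makes the call self-documenting, which helps when a function has many optional arguments.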

✅ Subsetting in R
4. Indexing
Access elements using square brackets [ ] .
Example: a[5] returns the 5th element of vector a .
5. Multiple Indexing
Use c() to pass multiple indices.
Example: a[c(5, 7, 9)] gives elements at positions 5, 7, and 9.
6. Conditional Subsetting
Use logical conditions inside [ ] .
Example: a[a > 7] returns elements greater than 7.
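
The three kinds of subsetting can be tried on a small vector. The vector a below (multiples of 3) is a hypothetical example, not the one used in the lecture.

```r
# Hypothetical example vector: 3, 6, 9, ..., 30
a <- seq(3, 30, 3)

# Single index: the 5th element
fifth <- a[5]
print(fifth)

# Multiple indices: positions 5, 7 and 9
picked <- a[c(5, 7, 9)]
print(picked)

# Conditional subsetting: the condition is evaluated to a logical
# vector first, and only the TRUE positions are kept
is_big <- a > 7
print(is_big)
big <- a[is_big]
print(big)
```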

✅ Logical Operators in R
7. Logical Conditions
> : greater than
< : less than
>= : greater than or equal to
<= : less than or equal to
== : equal to
!= : not equal to
8. Combining Conditions
| : OR operator
& : AND operator
Example: a[a > 15 | a < 8] returns values satisfying either condition.
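
A short sketch of combining conditions, assuming a hypothetical vector a holding the integers 1 to 20:

```r
# Hypothetical example vector
a <- 1:20

# OR: keep values that are greater than 15 or less than 8
either <- a[a > 15 | a < 8]
print(either)

# AND: keep values that are greater than 5 and at most 10
both <- a[a > 5 & a <= 10]
print(both)

# Not equal: drop the value 10
not_ten <- a[a != 10]
print(not_ten)
```

Note that | and & work element by element on vectors, which is what subsetting needs.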

✅ Best Practices
9. Online Help
Search for help using keywords like "how to repeat a number in R".
Useful websites include: StackOverflow, RDocumentation, etc.
10. Visualization Analogy
Vector indexing is like finding people on specific floors of a building.
Index = floor number; Vector name = building name.

✅ Session 3 Overview: Working with Matrices and Data Frames in R

🔧 Before Starting
Open the file W1S3.R in RStudio.
Clear the console using Ctrl + L
Clear the Global Environment by clicking the brush icon.
Close any other open files—only W1S3.R should be open.

🧮 Vectors Recap
a: Integer vector 1:10
b: Numeric sequence from 2 to 10 with 10 values using seq(2, 10, length.out = 10)
c: Character vector: first five values "Sachin" , next five "Saurav"
(R is case-sensitive; the lowercase names match the code used below.)
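
The three vectors can be built as follows. This is a sketch based on the descriptions above; the exact code in W1S3.R may differ.

```r
# Integer vector 1, 2, ..., 10
a <- 1:10

# Ten evenly spaced values from 2 to 10
b <- seq(2, 10, length.out = 10)

# Five copies of "Sachin" followed by five copies of "Saurav"
c <- c(rep("Sachin", 5), rep("Saurav", 5))

# Check the type and length of each vector
print(class(a))
print(class(b))
print(class(c))
print(length(b))
```

Naming a vector c is allowed even though c() is also a function; R still finds the function when it is called with parentheses.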

🧱 Matrix in R
A matrix is a tabular structure with homogeneous data types.
All cells of a matrix must share one type (for example, all numeric or all character), not a mix.
Matrix cells are accessed via [row, column] format (e.g., [2, 3] is row 2, column 3).

🛠️ Creating Matrices
1. Column Bind ( cbind )

matrix1 <- cbind(a, b, c)

Joins vectors side-by-side (columns).
Converts all data to character if any one is character (due to coercion).
2. Row Bind ( rbind )

matrix2 <- rbind(a, b, c)

Joins vectors one below another (rows).
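Different shape compared to cbind : with three vectors of length 10, rbind gives 3 rows × 10 columns, while cbind gives 10 rows × 3 columns.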
3. Direct Matrix Creation

matrix3 <- matrix(1:9, nrow = 3, byrow = TRUE)

Creates a 3x3 matrix with values from 1 to 9.
byrow = TRUE fills row-wise, FALSE (default) fills column-wise.

🧪 Matrix Functions
Use View(matrix_name) to open spreadsheet-like view.
Use t(matrix) for transpose (swap rows/columns).
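
The matrix operations above can be checked with a short script. The vectors a , b , c are rebuilt here so the sketch runs on its own; matrix4 is an extra name introduced only to show the default column-wise fill.

```r
a <- 1:10
b <- seq(2, 10, length.out = 10)
c <- c(rep("Sachin", 5), rep("Saurav", 5))

# Column bind: one column per vector
matrix1 <- cbind(a, b, c)
print(dim(matrix1))      # rows, columns

# Row bind: one row per vector
matrix2 <- rbind(a, b, c)
print(dim(matrix2))

# Because c is character, every cell is coerced to character
print(typeof(matrix1))

# Row-wise fill versus the default column-wise fill
matrix3 <- matrix(1:9, nrow = 3, byrow = TRUE)
matrix4 <- matrix(1:9, nrow = 3)
print(matrix3)
print(matrix4)

# Cell access: row 2, column 3
print(matrix3[2, 3])
print(matrix4[2, 3])

# Transposing the row-wise matrix gives the column-wise one
print(t(matrix3))
```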

🗃️ Data Frame in R
A data frame allows different types of data in different columns (unlike matrices).
Created using: data1 <- data.frame(gh = a, ij = b, kl = c)
gh , ij , kl become column names.
a , b , c are the actual vectors with data.
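
A runnable sketch of the data frame above. stringsAsFactors = FALSE is added here as an assumption so that the text column stays character in older R versions (such as the 3.6.1 mentioned earlier), where the default was to convert strings to factors.

```r
a <- 1:10
b <- seq(2, 10, length.out = 10)
c <- c(rep("Sachin", 5), rep("Saurav", 5))

# Each column keeps its own type
data1 <- data.frame(gh = a, ij = b, kl = c, stringsAsFactors = FALSE)
str(data1)

# Access a column by name with $
print(data1$ij)

# Access a single cell with [row, column]
print(data1[2, 3])

# Keep only the rows where kl is "Saurav"
saurav_rows <- data1[data1$kl == "Saurav", ]
print(saurav_rows)
```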

🔍 Key Differences

Feature | Matrix | Data Frame
Data Types | Homogeneous (all same) | Heterogeneous (mixed allowed)
Use Case | Numeric/character tables | Tabular data (like a spreadsheet)
Access | [row, column] | $column or [row, column]

This session focuses on working with a basic dataset in R, covering how to:

1. Create and view a dataset using variables like company , fy , revenue , and margin .
2. Add a new variable profit calculated from revenue * margin / 100 .
3. Use the dplyr library to:
Group data using group_by()
Modify data using mutate() to add new columns (e.g., highest and lowest margin)
Use summarise() to condense grouped data into summary statistics
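
The three steps can be sketched end to end. All figures below are invented for illustration, and the company name "XYZ" is a placeholder; only the column names company , fy , revenue , margin and profit come from the notes. The sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical dataset (values are made up)
data <- data.frame(
  company = c("PNG", "PNG", "PNG", "XYZ", "XYZ", "XYZ"),
  fy      = c(2017, 2018, 2019, 2017, 2018, 2019),
  revenue = c(100, 120, 150, 200, 210, 230),
  margin  = c(12, 9, 15, 8, 11, 10),
  stringsAsFactors = FALSE
)

# New variable: margin is a percentage, so divide by 100
data$profit <- data$revenue * data$margin / 100

# mutate() keeps every row and adds group-level columns
with_range <- data %>%
  group_by(company) %>%
  mutate(highest_margin = max(margin),
         lowest_margin  = min(margin)) %>%
  ungroup()
print(with_range)

# summarise() returns one row per group
by_company <- data %>%
  group_by(company) %>%
  summarise(mean_margin  = mean(margin),
            total_profit = sum(profit))
print(by_company)
```

The difference to notice is the row count: with_range still has six rows, while by_company has one row per company.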

💡 Key R Concepts Covered

Concept | Explanation
data.frame() | Combines vectors into a tabular structure.
$ operator | Used to access or create columns within a dataframe.
mutate() | Adds or modifies columns without reducing the dataset size.
summarise() | Condenses multiple rows into one per group.
group_by() | Used with dplyr to perform group-wise operations.
install.packages("dplyr") | Installs the dplyr package.
library(dplyr) | Loads the dplyr package into memory for use.

This session walks through common data operations in R, covering:
1. Summarization using group_by() and summarise()
2. Conditional logic using ifelse()
3. Looping with for loops
4. Subsetting data
5. Creating custom functions

Each point is summarized below with the corresponding R code.

🧮 1. Group-wise Minimum Cost

Use group_by() and summarise() to calculate the lowest cost for each year ( fy ).

library(dplyr)
new_data <- data %>%
  group_by(as.factor(fy)) %>%
  summarise(lowest_cost = min(cost))

Make sure:

cost is spelled with a lowercase c (R is case-sensitive)
fy is the correct year column
fy is wrapped in as.factor() so that it is treated as a categorical variable

🔁 2. Conditional Columns Using ifelse()

Create a new column that labels each margin as high or low:

data$margin_high_low <- ifelse(data$margin > 10, "High", "Low")

For more than two labels, ifelse() calls can be nested, but case_when() from dplyr is easier to read (preferred for multiple conditions). Conditions are checked in order and the first match wins:

data$margin_level <- case_when(
  data$margin > 15 ~ "Very High",
  data$margin > 10 ~ "High",
  TRUE ~ "Low"
)

🔄 3. Filtering Data Using Subset

Keep only the rows for the company PNG:

data_png <- data[data$company == "PNG", ]

Make sure to use == (not = ) for logical equality.
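
For comparison, base R also has a subset() function that gives the same rows as the bracket form. The sketch below uses a small made-up data frame; "XYZ" is a placeholder company name.

```r
# Hypothetical data (values are made up)
data <- data.frame(
  company = c("PNG", "XYZ", "PNG", "XYZ"),
  fy      = c(2018, 2018, 2019, 2019),
  revenue = c(120, 210, 150, 230),
  stringsAsFactors = FALSE
)

# Bracket form: logical condition in the row position, all columns kept
data_png <- data[data$company == "PNG", ]

# subset() form: column names can be used directly
data_png2 <- subset(data, company == "PNG")

print(data_png)
print(isTRUE(all.equal(data_png, data_png2)))
```

The comma after the condition in the bracket form matters: it says "these rows, all columns".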

📉 4. Calculating Growth Using for Loop

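Add a growth column gr by looping over the rows and computing the change in revenue relative to the previous row (a proportion; multiply by 100 for a percentage):

data_png$gr <- 0  # initialize the column; the first row has no previous value
for (i in 2:nrow(data_png)) {
  data_png$gr[i] <- (data_png$revenue[i] - data_png$revenue[i - 1]) /
    data_png$revenue[i - 1]
}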
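
The same growth figures can also be computed without a loop. The sketch below is an alternative to the lecture's loop, not the lecture's own method: it uses diff() on made-up revenue figures and checks that both approaches agree.

```r
# Hypothetical revenue series (values are made up)
data_png <- data.frame(fy = 2017:2020, revenue = c(100, 120, 150, 135))
n <- nrow(data_png)

# Loop version, as in the notes
data_png$gr <- 0
for (i in 2:n) {
  data_png$gr[i] <- (data_png$revenue[i] - data_png$revenue[i - 1]) /
    data_png$revenue[i - 1]
}

# Vectorised version: diff() gives the change from the previous row,
# and revenue[-n] is every value except the last (the "previous" values)
gr_vec <- c(0, diff(data_png$revenue) / data_png$revenue[-n])

print(data_png)
print(isTRUE(all.equal(data_png$gr, gr_vec)))
```

One caveat of the loop form: 2:nrow(data_png) counts backwards (2, 1) when the data frame has only one row, so the loop assumes at least two rows.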

🧰 5. Defining Your Own Function

A custom function works like f(x) in mathematics: it takes inputs and returns a value. Here is a simple example:

growth_calc <- function(current, previous) {
  return((current - previous) / previous)
}

# Usage
growth_calc(15698, 14567)

Week 1, Session 5: Handling Hotel Review Data in R

Key Concepts:

Data: sample hotel data.csv - contains hotel reviews (overall rating, date, reviewer type,
and 6 attribute ratings: value, location, sleep quality, rooms, cleanliness, service).
Objective: Basic data analytics for marketing insights (performance, areas for
improvement).
R Functions:
read.csv() : Reads the CSV data.
str() : Shows the structure of the data frame (rows, columns, data types).
names() : Gets column names.
head() : Shows the first few rows.
View() : Opens data in a spreadsheet view.
library(dplyr) : Loads the dplyr package for data manipulation.
group_by() : Groups data by a specific column (e.g., hotel_name_city ).
summarize() : Creates new columns by applying summary functions (e.g., mean() ).
na.rm = TRUE : Argument in mean() to handle missing values.
as.data.frame() : Converts to a data frame.
[rows, columns] : Used for subsetting data frames.

Core Steps in Analysis:

1. Read Data: Load the sample hotel data.csv into R.

2. Explore Data: Use str() , names() , head() , View() to understand the data.
3. Summarize by Hotel: Use dplyr 's group_by() and summarize() to calculate mean
overall rating and mean attribute ratings for each hotel.
4. Compare Performance: The summarized data allows for comparison of overall and
attribute ratings between hotels.
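
The four steps can be sketched as below. Because the course file is not available here, the sketch writes a tiny made-up CSV to a temporary location and reads it back. The hotel names, the rating values and the column rating_service are assumptions; hotel_name_city , review_overall_rating and rating_value are names that appear in these notes. The dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical stand-in for the course CSV (values are made up)
reviews <- data.frame(
  hotel_name_city       = c("Hotel A", "Hotel A", "Hotel A", "Hotel B", "Hotel B"),
  review_overall_rating = c(5, 4, 3, 2, 4),
  rating_value          = c(4, NA, 3, 2, 5),
  rating_service        = c(5, 4, NA, 3, 4),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(reviews, csv_path, row.names = FALSE)

# 1. Read data
hotel <- read.csv(csv_path, stringsAsFactors = FALSE)

# 2. Explore data
str(hotel)
print(names(hotel))
print(head(hotel, 3))

# 3. Summarize by hotel; na.rm = TRUE skips missing ratings
summary_hotel <- hotel %>%
  group_by(hotel_name_city) %>%
  summarise(
    mean_overall = mean(review_overall_rating, na.rm = TRUE),
    mean_value   = mean(rating_value, na.rm = TRUE),
    mean_service = mean(rating_service, na.rm = TRUE)
  )
summary_hotel <- as.data.frame(summary_hotel)

# 4. Compare performance
print(summary_hotel)
print(summary_hotel[, c("hotel_name_city", "mean_overall")])
```

Without na.rm = TRUE , any hotel with a single missing rating would get NA as its mean.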

Potential MCQ Topics:

Purpose of different R functions ( read.csv , str , head , summarize , group_by ).

Understanding the structure of the sample hotel data.csv dataset (columns and their
meaning).
How to calculate basic summary statistics (like mean) in R, including handling missing data
( na.rm = TRUE ).
The role of dplyr in data manipulation.
How to group data and perform calculations within groups.
Basic steps in analyzing customer review data for marketing insights.
The meaning of overall rating and attribute ratings in the context of hotel reviews.
How to subset a data frame in R.

Not Covered in Detail (Less Likely for Basic MCQ):

Advanced R programming concepts beyond basic data frames and functions.

Specific details of regression analysis (mentioned as a future step).
In-depth text mining of review content.
Detailed strategies for resource allocation or service improvement.
The as.Date() function and date format conversions.

Focus on understanding the basic R commands used for data loading, exploration, and
summarization, and how these steps can provide initial marketing insights from
customer review data.

Week 1, Session 6: Key notes from Professor Chatterjee's lecture

Topic: Analyzing Hotel Review Data in R - Visualization and Regression

Key Objectives:

Visualization: Create a bar plot to compare overall and attribute ratings of two hotels.
Regression Introduction: Outline the steps involved in regression analysis to determine
the importance of different hotel aspects on overall rating.
Ordered Logistic Regression: Introduce an alternative method for analyzing ordered
categorical data (like the 1-5 star ratings).
Coding Familiarity: Get more comfortable with basic R coding for data analysis.

R Code and Concepts Covered:

1. Bar Plot Creation:

barplot() function.
Input needs to be a matrix.
names.arg : Specifies labels for the bars (using column names).
xlab , ylab : Sets axis labels.
beside = TRUE : Displays bars for different groups side-by-side.
col : Sets the colors of the bars.
legend() : Adds a legend to the plot, specifying location ( x , y ), labels
( summary_two[, 1] ), and colors.
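
A minimal bar plot along these lines is sketched below. The name summary_two comes from the notes, but its contents here (two made-up hotels with three mean ratings each) and the colours are assumptions. The legend position is given with the keyword "topright" instead of x and y coordinates, which legend() also accepts.

```r
# Hypothetical summary of two hotels (values are made up)
summary_two <- data.frame(
  hotel   = c("Hotel A", "Hotel B"),
  overall = c(4.0, 3.0),
  value   = c(3.5, 3.5),
  service = c(4.5, 3.5),
  stringsAsFactors = FALSE
)

# barplot() needs a matrix: drop the name column, keep the numbers
rating_matrix <- as.matrix(summary_two[, -1])

bar_cols <- c("steelblue", "orange")

# One group of bars per rating type, one bar per hotel within each group
mids <- barplot(
  rating_matrix,
  beside    = TRUE,
  names.arg = colnames(rating_matrix),
  xlab      = "Rating type",
  ylab      = "Mean rating",
  ylim      = c(0, 5),
  col       = bar_cols
)

# Legend labels come from the first column of the summary
legend("topright", legend = summary_two[, 1], fill = bar_cols)
```

With beside = TRUE each column of the matrix becomes a group of bars; without it the rows would be stacked in a single bar per column.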
2. Steps for Regression Analysis (to find importance of aspects):
Missing Value Imputation: Replace missing values (using median imputation as the
method applied).
Loop through relevant columns (aspect ratings).
Use ifelse() and is.na() to identify missing values.
Replace NA with the median() of that column.
Outlier Removal: Identify and remove extreme values (using the Z-score method with a
threshold of +/- 3).
Calculate Z-scores ( scale() ).
Keep data points where the absolute Z-score is less than 3.
Correlation Check: Examine the correlation matrix of the independent variables
(aspect ratings) to avoid multicollinearity (using cor() ).
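
The three preparation steps can be sketched on made-up data. The ratings below are invented (with one deliberately extreme value and a few missing ones), and rating_service is an assumed column name; the lecture's own code may be organised differently.

```r
# Hypothetical aspect ratings (values are made up)
ratings <- data.frame(
  rating_value   = rep(c(4, 5, 4, 5, 4), 6),
  rating_service = rep(c(5, 4, 4, 3, 5), 6)
)
ratings$rating_value[1] <- 1              # an unusually low rating
ratings$rating_value[c(7, 20)] <- NA      # missing values
ratings$rating_service[12] <- NA

aspect_cols <- c("rating_value", "rating_service")

# 1. Median imputation: replace each NA with the column median
for (col in aspect_cols) {
  col_median <- median(ratings[[col]], na.rm = TRUE)
  ratings[[col]] <- ifelse(is.na(ratings[[col]]), col_median, ratings[[col]])
}

# 2. Outlier removal: keep rows where every |Z-score| is below 3
z <- scale(ratings[, aspect_cols])
keep <- apply(abs(z) < 3, 1, all)
clean <- ratings[keep, ]
print(nrow(ratings))
print(nrow(clean))

# 3. Correlation check between the predictors
print(cor(clean[, aspect_cols]))
```

Imputation is done before the Z-scores are computed here so that scale() works on complete columns; a high correlation between two aspects in the last step would be a warning sign for multicollinearity.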
3. Normality Check (and its caveat):
Visual inspection using hist() for each variable.
Shapiro-Wilk test ( shapiro.test() ) for formal normality testing.
Important Note: The lecture acknowledges that the rating data is likely not truly normal
(being categorical), but linear regression might still be used for a general idea in some
marketing research.
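
A short sketch of both checks on a made-up set of 1 to 5 star ratings (the counts are invented). Note that shapiro.test() only accepts between 3 and 5000 observations, so a larger dataset would need to be sampled first.

```r
# Hypothetical ratings: 2 ones, 3 twos, 8 threes, 15 fours, 12 fives
x <- rep(1:5, times = c(2, 3, 8, 15, 12))

# Visual check: one histogram bar per star level
h <- hist(x, breaks = seq(0.5, 5.5, by = 1),
          main = "Distribution of ratings", xlab = "Rating")
print(h$counts)

# Formal check: a small p-value suggests the data are not normal
result <- shapiro.test(x)
print(result)
print(result$p.value)
```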
4. Linear Regression:
lm() function: fit <- lm(review_overall_rating ~ . , data = DATA) (where .
represents all other columns as predictors).
summary(fit) : Displays the results of the linear regression (F-statistic, R-squared,
coefficients, p-values).
Interpretation: Coefficients indicate the impact of each aspect on the overall rating.
Service had the highest positive coefficient in this example.
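
The lm() call can be tried on simulated data. Everything below is invented for illustration: the data are generated so that service has the larger effect, and the outcome is continuous rather than a 1 to 5 rating. rating_service is an assumed column name.

```r
set.seed(1)
n <- 200

# Hypothetical predictors: ratings from 1 to 5
DATA <- data.frame(
  rating_value   = sample(1:5, n, replace = TRUE),
  rating_service = sample(1:5, n, replace = TRUE)
)

# Simulated outcome: service matters twice as much as value, plus noise
DATA$review_overall_rating <- 0.5 + 0.3 * DATA$rating_value +
  0.6 * DATA$rating_service + rnorm(n, sd = 0.5)

# The dot means "all other columns of DATA are predictors"
fit <- lm(review_overall_rating ~ ., data = DATA)

# F-statistic, R-squared, coefficients and p-values
print(summary(fit))

# Coefficients only: the estimated effect of each aspect
print(coef(fit))
```

Because the dot pulls in every other column, any identifier or text column should be dropped from the data frame before fitting.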
5. Ordered Logistic Regression (for ordered categorical Y variable):
Requires the MASS library ( library(MASS) ).
Convert both dependent ( review_overall_rating ) and independent (aspect) variables
to factors using as.factor() .
polr() function: fit1 <- polr(factor(review_overall_rating) ~
factor(rating_value) + ..., data = data1, method = "logistic") .
summary(fit1) : Displays the results of the ordered logistic regression.
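Interpretation: Each coefficient is the change in the log-odds of being in a higher rating category associated with that predictor, keeping other variables constant. Because the predictors are entered as factors, each coefficient compares one rating level with the reference (lowest) level rather than measuring a one-unit increase. The example showed the impact of higher aspect ratings (e.g., value for money) on the likelihood of higher overall ratings.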
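
A sketch of polr() on simulated data, assuming the MASS package (shipped with standard R installations) is available. The data are invented: a latent score is cut into five ordered categories so that higher aspect ratings lead to higher overall ratings. rating_service is an assumed column name, and Hess = TRUE is added so that summary() can compute standard errors without refitting.

```r
library(MASS)

set.seed(42)
n <- 300

# Hypothetical aspect ratings from 1 to 5
value   <- sample(1:5, n, replace = TRUE)
service <- sample(1:5, n, replace = TRUE)

# Simulated ordered outcome: latent score cut into 5 rating categories
latent <- 0.8 * value + 1.0 * service + rlogis(n)
overall <- cut(latent, breaks = c(-Inf, 3, 5, 7, 9, Inf),
               labels = 1:5, ordered_result = TRUE)

# Predictors converted to factors, as in the notes
data1 <- data.frame(
  review_overall_rating = overall,
  rating_value          = as.factor(value),
  rating_service        = as.factor(service)
)

fit1 <- polr(review_overall_rating ~ rating_value + rating_service,
             data = data1, method = "logistic", Hess = TRUE)
print(summary(fit1))

# Exponentiated coefficients are odds ratios versus the reference level
print(exp(coef(fit1)))
```

Each coefficient name ends in the factor level it refers to (for example rating_service5 ), and level 1 is the reference that the others are compared with.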

Key Takeaways for MCQs:

How to create and interpret a basic bar plot in R for comparing groups.
The fundamental steps involved in preparing data for regression analysis (missing value
handling, outlier detection, correlation check).
The purpose and interpretation of linear regression results (coefficients, significance).
The concept of ordered logistic regression and why it might be suitable for ordered
categorical dependent variables.
Basic R functions used for these analyses ( barplot , ifelse , is.na , median , scale ,
cor , hist , shapiro.test , lm , polr , as.factor ).
The overall goal of using these analytical techniques in a marketing context (understanding
drivers of customer satisfaction).

Important Note for Future Sessions: The professor emphasizes the need to revise basic
statistics, marketing management, and introductory business analytics (especially regression
and basic machine learning concepts) as these will be heavily used in future weeks.
