Week 1
🔶 Best Practices
Save script before running code
Type code manually → Helps you learn from mistakes
Avoid copy-paste; typing reinforces learning
✅ Important Functions in R
1. seq() Function
Used to generate sequences.
Syntax: seq(from, to, by)
Example: seq(1, 30, 2) gives 1, 3, 5, ..., 29
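A short sketch of seq() in use. The second call uses the length.out argument, which asks for a fixed number of evenly spaced values instead of a step size; the variable name odds is just for illustration.

```r
# seq(from, to, by): odd numbers starting at 1, stepping by 2, not exceeding 30
odds <- seq(1, 30, 2)
print(odds)          # 1 3 5 ... 29 (15 values)

# Named arguments give the same result and read more clearly
seq(from = 1, to = 30, by = 2)

# length.out: five evenly spaced values between 0 and 1
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```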
2. rep() Function
Used to repeat elements.
Syntax: rep(value, times)
Example: rep(2, 20) repeats 2 twenty times.
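A minimal sketch of rep(). Besides times, rep() also has an each argument, which repeats every element in place rather than repeating the whole vector; the variable name twos is illustrative.

```r
twos <- rep(2, 20)           # the value 2 repeated 20 times
length(twos)                 # 20

rep(c(1, 2), times = 3)      # whole vector repeated: 1 2 1 2 1 2
rep(c(1, 2), each = 3)       # each element repeated: 1 1 1 2 2 2
```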
3. Help Options in R
help(function_name) or ?function_name shows syntax, arguments, and examples.
✅ Subsetting in R
4. Indexing
Access elements using square brackets [ ].
Example: a[5] returns the 5th element of vector a.
5. Multiple Indexing
Use c() to pass multiple indices.
Example: a[c(5, 7, 9)] gives elements at positions 5, 7, and 9.
6. Conditional Subsetting
Use logical conditions inside [ ].
Example: a[a > 7] returns elements greater than 7.
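The three subsetting styles above, tried on a hypothetical vector a (the even numbers 2 to 20). Printing a > 7 on its own shows the TRUE/FALSE vector that conditional subsetting uses behind the scenes.

```r
a <- seq(2, 20, 2)    # hypothetical example vector: 2 4 6 ... 20

a[5]                  # single index: 10
a[c(5, 7, 9)]         # multiple indices via c(): 10 14 18
a > 7                 # logical vector, one TRUE/FALSE per element
a[a > 7]              # keep only the TRUE positions: 8 10 12 14 16 18 20
```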
✅ Logical Operators in R
7. Logical Conditions
> : greater than
< : less than
>= : greater than or equal to
<= : less than or equal to
== : equal to
!= : not equal to
8. Combining Conditions
| : OR operator
& : AND operator
Example: a[a > 15 | a < 8] returns values satisfying either condition.
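A small sketch combining the comparison operators with | and &, using a hypothetical vector. Note that elements keep their original order in the result.

```r
a <- c(3, 9, 12, 18, 5, 21)   # hypothetical values

a[a > 15 | a < 8]    # OR: either condition true -> 3 18 5 21
a[a > 5 & a < 15]    # AND: both conditions true -> 9 12
a[a != 12]           # not equal to 12 -> 3 9 18 5 21
a[a == 12]           # == tests equality (a single = is assignment) -> 12
```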
✅ Best Practices
9. Online Help
Search for help using keywords like "how to repeat a number in R".
Useful websites include Stack Overflow and RDocumentation.
10. Visualization Analogy
Vector indexing is like finding people on specific floors of a building.
Index = floor number; Vector name = building name.
🔧 Before Starting
Open the file W1S3.R in RStudio.
Clear the console using Ctrl + L.
Clear the Global Environment by clicking the brush icon.
Close any other open files; only W1S3.R should be open.
🧮 Vectors Recap
A: Integer vector 1:10
B: Numeric sequence from 2 to 10 with 10 values using seq(2, 10, length.out = 10)
C: Character vector: first five values "Sachin", next five "Saurav"
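One way to build the three recap vectors; the exact code used in W1S3.R may differ. class() confirms the type of each vector.

```r
A <- 1:10                                   # integer vector 1 to 10
B <- seq(2, 10, length.out = 10)            # 10 evenly spaced numbers from 2 to 10
C <- c(rep("Sachin", 5), rep("Saurav", 5))  # 5 x "Sachin", then 5 x "Saurav"
# equivalent: C <- rep(c("Sachin", "Saurav"), each = 5)

class(A)   # "integer"
class(B)   # "numeric"
class(C)   # "character"
```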
🧱 Matrix in R
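A matrix is a two-dimensional (rows and columns) structure in which every element has the same data type.
Matrix data must be all numeric or all character, not mixed; if you combine types, R silently converts everything to a common type (numbers become text).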
Matrix cells are accessed via [row, column] format (e.g., [2, 3] is row 2, column 3).
🛠️ Creating Matrices
1. Column Bind (cbind)
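A sketch of cbind() using the recap vectors: each vector becomes one column. The second call shows the type rule from above, since binding the integer vector A with the character vector C turns every cell into text. The names m and mixed are illustrative.

```r
A <- 1:10
B <- seq(2, 10, length.out = 10)
C <- c(rep("Sachin", 5), rep("Saurav", 5))

m <- cbind(A, B)      # 10 x 2 numeric matrix; columns named "A" and "B"
dim(m)                # 10 2
m[2, 2]               # row 2, column 2 (the 2nd value of B)

mixed <- cbind(A, C)  # integer + character columns
mixed[1, 1]           # "1": the number was coerced to character
```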
🧪 Matrix Functions
Use View(matrix_name) to open a spreadsheet-like view.
Use t(matrix) for transpose (swap rows/columns).
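A quick check of what t() does, on a small hypothetical matrix. View() is left commented out because it needs an interactive session such as RStudio.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3, filled column by column: rows are 1 3 5 and 2 4 6
tm <- t(m)                   # 3 x 2: rows become columns
dim(m)                       # 2 3
dim(tm)                      # 3 2
tm[3, 1] == m[1, 3]          # TRUE: cell [i, j] moves to [j, i]
# View(m)                    # spreadsheet-like viewer (interactive sessions only)
```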
🗃️ Data Frame in R
A data frame allows different types of data in different columns (unlike matrices).
Created using: data1 <- data.frame(gh = a, ij = b, kl = c)
gh, ij, kl become column names.
a, b, c are the actual vectors with data.
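A sketch of the data.frame() call above, using the recap vectors A, B, C in place of a, b, c. Unlike a matrix, each column keeps its own type. This assumes R 4.0 or later, where text columns stay character instead of becoming factors by default.

```r
A <- 1:10                                   # integer
B <- seq(2, 10, length.out = 10)            # numeric
C <- c(rep("Sachin", 5), rep("Saurav", 5))  # character

data1 <- data.frame(gh = A, ij = B, kl = C)
str(data1)             # 10 obs. of 3 variables
sapply(data1, class)   # gh "integer", ij "numeric", kl "character"
data1$kl[6]            # "Saurav"
```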
🔍 Key Differences
This session focuses on working with a basic dataset in R, covering how to:
1. Create and view a dataset using variables like company, fy, revenue, and margin.
2. Add a new variable profit calculated as revenue * margin / 100.
3. Use the dplyr library to:
Group data using group_by()
Modify data using mutate() to add new columns (e.g., highest and lowest margin)
Use summarise() to condense grouped data into summary statistics
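Steps 1 and 2 can be sketched in base R before any dplyr is involved. The company names and figures below are hypothetical toy values, not the dataset from the session, and margin is assumed to be a percentage.

```r
company <- c("Alpha", "Alpha", "Beta", "Beta")   # hypothetical companies
fy      <- c(2022, 2023, 2022, 2023)             # financial year
revenue <- c(100, 120, 80, 90)                   # toy figures
margin  <- c(10, 12, 15, 11)                     # margin in percent

sales <- data.frame(company, fy, revenue, margin)   # column names come from the vectors
# View(sales)   # spreadsheet view in RStudio

# $ creates the new column; the arithmetic runs row by row
sales$profit <- sales$revenue * sales$margin / 100
sales
```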
Concepts and explanations:
data.frame() : Combines vectors into a tabular structure.
$ operator : Accesses or creates columns within a data frame.
mutate() : Adds or modifies columns without reducing the number of rows.
summarise() : Condenses multiple rows into one row per group.
group_by() : A dplyr function that groups rows so that later steps (mutate(), summarise()) run group by group.
install.packages("dplyr") : Installs the dplyr package (only needed once).
library(dplyr) : Loads the dplyr package into the current session.
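A minimal dplyr sketch of step 3, assuming dplyr is installed. It rebuilds a small hypothetical data frame with the columns named above so it runs on its own. The contrast to notice: mutate() keeps all four rows and repeats each company's highest and lowest margin, while summarise() returns one row per company.

```r
library(dplyr)

# Hypothetical toy data with the session's column names
sales <- data.frame(
  company = c("Alpha", "Alpha", "Beta", "Beta"),
  fy      = c(2022, 2023, 2022, 2023),
  revenue = c(100, 120, 80, 90),
  margin  = c(10, 12, 15, 11)
)
sales$profit <- sales$revenue * sales$margin / 100

# mutate(): adds per-company columns, row count stays at 4
with_range <- sales %>%
  group_by(company) %>%
  mutate(highest_margin = max(margin),
         lowest_margin  = min(margin)) %>%
  ungroup()

# summarise(): collapses to one row per company
by_company <- sales %>%
  group_by(company) %>%
  summarise(total_profit = sum(profit),
            avg_margin   = mean(margin))
by_company
```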
Conditional columns can be extended with nested conditions; when there are multiple conditions, dplyr's case_when() is preferred over nested ifelse() calls.
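A sketch comparing the two approaches on hypothetical margin values; the band names and cut-offs (20 and 10) are illustrative assumptions. case_when() checks conditions from top to bottom and uses the first one that is TRUE, with TRUE ~ ... acting as the catch-all.

```r
library(dplyr)

margin <- c(4, 12, 25, 10)   # hypothetical margins in percent

# Nested ifelse(): works, but gets harder to read with each extra level
band_ifelse <- ifelse(margin >= 20, "high",
                      ifelse(margin >= 10, "medium", "low"))

# case_when(): one condition per line, first TRUE match wins
band_cw <- case_when(
  margin >= 20 ~ "high",
  margin >= 10 ~ "medium",
  TRUE         ~ "low"
)
band_cw   # "low" "medium" "high" "medium"
```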
Key Concepts:
Data: sample hotel data.csv - contains hotel reviews (overall rating, date, reviewer type,
and 6 attribute ratings: value, location, sleep quality, rooms, cleanliness, service).
Objective: Basic data analytics for marketing insights (performance, areas for
improvement).
R Functions:
read.csv() : Reads the CSV data.
str() : Shows the structure of the data frame (rows, columns, data types).
names() : Gets column names.
head() : Shows the first few rows.
View() : Opens data in a spreadsheet view.
library(dplyr) : Loads the dplyr package for data manipulation.
group_by() : Groups data by a specific column (e.g., hotel_name_city ).
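summarize() : Creates new columns by applying summary functions (e.g., mean()), returning one row per group.
na.rm = TRUE : Argument in mean() that drops missing values (NA) before averaging; without it, a single NA makes the result NA.
as.data.frame() : Converts the result (e.g., a dplyr tibble) into a regular data frame.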
[rows, columns] : Used for subsetting data frames.
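A minimal sketch of this workflow, assuming dplyr is installed. The read.csv() line is commented out and uses the file name as written in these notes (the real name may differ); a tiny hypothetical stand-in is built instead. hotel_name_city comes from the notes, while overall_rating and cleanliness are assumed column names.

```r
library(dplyr)

# hotel <- read.csv("sample hotel data.csv")   # course file (name as given in the notes)
# Hypothetical stand-in so the sketch runs without the file
hotel <- data.frame(
  hotel_name_city = c("Hotel A", "Hotel A", "Hotel B", "Hotel B", "Hotel B"),
  overall_rating  = c(5, 4, 3, NA, 4),
  cleanliness     = c(5, 5, 3, 4, NA)
)

str(hotel)     # rows, columns, data types
names(hotel)   # column names
head(hotel)    # first few rows

# Mean ratings per hotel; na.rm = TRUE skips the missing entries
hotel_means <- hotel %>%
  group_by(hotel_name_city) %>%
  summarize(mean_overall     = mean(overall_rating, na.rm = TRUE),
            mean_cleanliness = mean(cleanliness, na.rm = TRUE)) %>%
  as.data.frame()

hotel_means[2, c("hotel_name_city", "mean_overall")]   # [rows, columns] subsetting
```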
Focus on understanding the basic R commands used for data loading, exploration, and
summarization, and how these steps can provide initial marketing insights from
customer review data.
Notes from Professor Chatterjee's lecture (Week 1, Session 6):
Key Objectives:
Visualization: Create a bar plot to compare overall and attribute ratings of two hotels.
Regression Introduction: Outline the steps involved in regression analysis to determine
the importance of different hotel aspects on overall rating.
Ordered Logistic Regression: Introduce an alternative method for analyzing ordered
categorical data (like the 1-5 star ratings).
Coding Familiarity: Get more comfortable with basic R coding for data analysis.
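A base-R sketch of the two-hotel bar plot, assuming the mean ratings per hotel have already been computed. The hotel names and all numbers below are made up for illustration. With beside = TRUE, the two hotels' bars sit next to each other within each rating dimension.

```r
# Made-up mean ratings: rows = hypothetical hotels, columns = rating dimensions
ratings <- matrix(
  c(4.2, 3.9, 4.5, 4.0, 3.8, 4.4, 4.1,
    3.7, 4.1, 3.6, 3.9, 3.5, 3.8, 4.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Hotel A", "Hotel B"),
                  c("overall", "value", "location", "sleep quality",
                    "rooms", "cleanliness", "service"))
)

# Grouped bars; las = 2 turns the column labels vertical so they do not overlap
mids <- barplot(ratings, beside = TRUE, ylim = c(0, 5),
                legend.text = rownames(ratings), las = 2,
                ylab = "Mean rating (1 to 5)", main = "Hotel A vs Hotel B")
```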
How to create and interpret a basic bar plot in R for comparing groups.
The fundamental steps involved in preparing data for regression analysis (missing value
handling, outlier detection, correlation check).
The purpose and interpretation of linear regression results (coefficients, significance).
The concept of ordered logistic regression and why it might be suitable for ordered
categorical dependent variables.
Basic R functions used for these analyses (barplot, ifelse, is.na, median, scale,
cor, hist, shapiro.test, lm, polr, as.factor); polr comes from the MASS package.
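A compact sketch of the preparation and regression steps listed above, run on synthetic random data rather than the course dataset; the column names, weights, and the choice to test residual normality are all assumptions for illustration. It uses median imputation for missing values, z-scores via scale() for outliers, a correlation check, lm() for linear regression, and MASS::polr() for ordered logistic regression.

```r
library(MASS)   # provides polr(); ships with standard R installations

# Synthetic stand-in for review data (random numbers, NOT the course dataset)
set.seed(1)
n <- 200
reviews <- data.frame(
  value       = sample(1:5, n, replace = TRUE),
  location    = sample(1:5, n, replace = TRUE),
  cleanliness = sample(1:5, n, replace = TRUE)
)
# Overall rating built from the attributes plus noise, rounded and kept within 1..5
raw <- 0.4 * reviews$value + 0.3 * reviews$location +
       0.3 * reviews$cleanliness + rnorm(n, sd = 0.5)
reviews$overall <- pmin(pmax(round(raw), 1), 5)

# Missing values: blank out a few entries, then fill them with the median
reviews$value[c(3, 17, 58)] <- NA
reviews$value <- ifelse(is.na(reviews$value),
                        median(reviews$value, na.rm = TRUE),
                        reviews$value)

# Outliers: count values more than 3 standard deviations from the mean
z <- scale(reviews$cleanliness)
sum(abs(z) > 3)

# Correlation among predictors (very high values signal overlapping information)
cor(reviews[, c("value", "location", "cleanliness")])

# Linear regression: coefficients and p-values appear in the summary
fit <- lm(overall ~ value + location + cleanliness, data = reviews)
summary(fit)

# Distribution checks on the residuals
hist(residuals(fit), main = "Residuals")
shapiro.test(residuals(fit))

# Ordered logistic regression: the rating treated as ordered categories
reviews$overall_f <- as.factor(reviews$overall)
ord_fit <- polr(overall_f ~ value + location + cleanliness,
                data = reviews, Hess = TRUE)
summary(ord_fit)
```

The linear model treats the 1-5 rating as a continuous number, while polr() only uses its order, which is why it suits ordered categorical outcomes like star ratings.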
The overall goal of using these analytical techniques in a marketing context (understanding
drivers of customer satisfaction).
Important Note for Future Sessions: The professor emphasizes the need to revise basic
statistics, marketing management, and introductory business analytics (especially regression
and basic machine learning concepts) as these will be heavily used in future weeks.