
1. R Programming

This document provides an introduction to R, a language and environment for statistical computing and graphics, emphasizing its strengths in data analysis and visualization. It covers the RStudio interface, basic calculations, data structures like vectors and data frames, and how to manage packages and import data from CSV and Excel files. The document also highlights the importance of getting help and practicing with R to enhance data manipulation skills.

Uploaded by ilias ahmed
Copyright © All Rights Reserved

Class 1: Introduction to R

Introduction to R and Data Basics

Md. Iftekhar Ahmed Khan
Machine Learning Engineer
Bondstein Technologies Limited
Welcome & Today's Goals
• Today you will learn to:
• Understand what R is and why it's used.
• Navigate the RStudio interface.
• Perform basic calculations and use variables.
• Understand and use fundamental data structures (Vectors, Data Frames).
• Manage R packages (install/load).
• Import data from common file types (CSV, Excel) and save results.
• Know how to get help!
What is R?
• R is a language AND an environment for statistical computing
and graphics.
• Strengths:
• Specifically designed for data analysis and visualization.
• Open Source: Free to use, modify, and distribute.
• Huge Community: Active development, extensive documentation, lots
of help online.
• Packages: Thousands of add-ons for specialized tasks (more later!).
• Common Uses: Data cleaning, data exploration, statistical
modeling, machine learning, report generation, creating plots
and dashboards.
Why R for Data Science?
• Vast Package Ecosystem: CRAN (Comprehensive R Archive
Network) hosts thousands of packages (like dplyr for manipulation,
ggplot2 for plotting).
• Powerful Visualization: Tools like ggplot2 allow for creating
complex and publication-quality graphics.
• Data Wrangling: Excellent tools (like the tidyverse) for cleaning,
transforming, and preparing data.
• Reproducibility: Scripts make analyses repeatable and shareable.
• Interoperability: Connects well with databases, other languages
(Python, SQL), and reporting tools (R Markdown).
RStudio
• RStudio: An Integrated Development Environment (IDE) for
R. Think of it as a powerful dashboard for R.
• Code editor with syntax highlighting
• Console to run commands interactively
• Environment pane (workspace browser) to see your variables
• Plots pane
• Help and Files panes
• Packages pane (package manager)
Your First Commands (Use Console)
• R can be used as a powerful calculator.
• Type directly into the Console pane (after the > prompt) and
press Enter.
# Basic Arithmetic
2 + 2
# [1] 4    (this is the output R prints; [1] marks the first element)
5 * 10
# [1] 50
10 / 3
# [1] 3.333333
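A few more calculator operations, as a sketch using only base R operators (the numbers are arbitrary):

```r
# More arithmetic operators
2^3          # exponent
# [1] 8
10 %/% 3     # integer division (whole part only)
# [1] 3
10 %% 3      # remainder (modulo)
# [1] 1
sqrt(16)     # square root, via a built-in function
# [1] 4
(2 + 3) * 4  # parentheses control the order of operations
# [1] 20
```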
Logical Operations & Comparisons
• Used for asking TRUE/FALSE questions. Essential for filtering data later.
• == means "is equal to?" (Note: double equals!)
• != means "is not equal to?"
• >, <, >=, <= (Greater than, Less than, etc.)
# Comparisons
5>3
# [1] TRUE
10 == 10
# [1] TRUE
10 == 5
# [1] FALSE
5 != 6
# [1] TRUE
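Comparisons also work on a whole vector at once, returning one TRUE/FALSE per element, which is what makes them useful for filtering. A minimal sketch; the `salaries` values are made up for illustration, and `&`, `|`, `!` are R's element-wise AND, OR and NOT:

```r
# Comparisons work element by element on a vector
salaries <- c(50000, 65000, 52000, 58000)
salaries > 55000
# [1] FALSE  TRUE FALSE  TRUE

# Combine conditions: & (AND), | (OR), ! (NOT)
salaries > 51000 & salaries < 60000
# [1] FALSE FALSE  TRUE  TRUE
!(salaries > 55000)
# [1]  TRUE FALSE  TRUE FALSE

# A logical vector inside [ ] keeps only the TRUE positions (filtering)
salaries[salaries > 55000]
# [1] 65000 58000
```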
Variables (Objects) in R
• Store values or results using variables (R often calls them
objects).
• Use the assignment operator <- (less than sign, hyphen).
Think of it as an arrow pointing from the value to the variable
name.
• Variable names:
• Must start with a letter (or a dot not followed by a number).
• Can contain letters, numbers, underscores (_), and periods (.).
• Are case-sensitive (myVar is different from myvar).
• Avoid using names of existing functions (like c, mean, data).
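A short sketch of assignment and the naming rules; the variable names and values are illustrative only:

```r
# Store values with <- (read it as "gets")
price <- 250
quantity <- 4
total <- price * quantity
total
# [1] 1000

# Names are case-sensitive: Total is a different object from total
Total <- 0
total
# [1] 1000

# Letters, numbers, _ and . are allowed in names
unit_price.usd <- 12.5

# Starting with a number is not allowed (this line would be an error):
# 2nd_price <- 10
```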
Using Built-in Functions
• Functions perform specific tasks. You provide arguments (inputs)
inside parentheses ().
• R has many built-in functions.
some_numbers <- c(2, 8, 3, 7, 5)
# Use functions on the data
sum(some_numbers) # Calculates the sum
# [1] 25
mean(some_numbers) # Calculates the average (mean)
# [1] 5
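A few other base R functions that work on the same vector; the last line also shows that arguments can be passed by name:

```r
some_numbers <- c(2, 8, 3, 7, 5)
length(some_numbers)        # how many elements
# [1] 5
max(some_numbers)           # largest value (min() gives the smallest)
# [1] 8
sort(some_numbers)          # values in increasing order
# [1] 2 3 5 7 8
sd(some_numbers)            # standard deviation
round(10 / 3, digits = 2)   # the digits argument is passed by name
# [1] 3.33
```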
Getting Help!
• Essential skill! Don't try to memorize everything.
• Use ? followed by the function name (no parentheses needed).
• Use help("function_name").
• Use ?? to search all documentation for a keyword; wrap multi-word phrases in quotes (e.g. ??"linear model").
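Each help route typed into the Console, as a sketch. `apropos()` and `example()` are two extra base R helpers that are not on the slide, included on the assumption that they are useful at this stage:

```r
# Open the help page for a function (both lines are equivalent)
?mean
help("mean")

# Search all installed documentation for a phrase (quotes keep the words together)
??"standard deviation"

# List functions whose names contain a string
apropos("mean")

# Run the worked examples from the bottom of a help page
example(mean)
```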
Packages: Extending R's Power
• Packages are collections of functions, data, and documentation that add
specific capabilities to R.
• Thousands are available from CRAN (Comprehensive R Archive Network)
and other places (like GitHub, Bioconductor).
• Examples: dplyr for data manipulation, ggplot2 for plotting, readxl for
reading Excel files.
• Two Steps:
• Install: Download the package to your computer (only need to do ONCE per R
installation). Use install.packages("package_name").
• Load: Make the package's functions available in your current R session (need to do
EVERY TIME you start a new R session and want to use it). Use
library(package_name).
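The two steps can be wrapped so a script installs a package only when it is missing. `use_package()` below is a hypothetical helper, not a function from R or any package; it relies only on base R's `requireNamespace()`, `install.packages()` and `library()`. The CRAN mirror URL is an assumption; any CRAN mirror works.

```r
# Hypothetical helper: install a package only if it is missing, then load it
use_package <- function(pkg) {
  # requireNamespace() returns FALSE when pkg is not installed; it never attaches pkg
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Step 1: install from CRAN (only needed once per R installation)
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # Step 2: attach the package for this session (needed in every new session)
  library(pkg, character.only = TRUE)
}

# "tools" ships with every R installation, so nothing is downloaded here
use_package("tools")
"package:tools" %in% search()
# [1] TRUE
```

In a real script you would call `use_package("readxl")`; the plain two-step form, `install.packages("readxl")` once and `library(readxl)` in each session, does the same job.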
Data Structures: Organizing Your Data
• Variables store single values. Data structures store collections of values.
• R has several fundamental data structures:
• Vectors: Ordered sequence of elements of the same basic type. (MOST
FUNDAMENTAL)
• Data Frames: Rectangular table (like a spreadsheet), columns can be
different types. (MOST IMPORTANT FOR TABULAR DATA)
• Lists: Ordered, flexible collection, elements can be of different
types/structures.
• Matrices: 2-dimensional array, all elements must be the same type.
• Factors: Special type of vector for representing categorical data
(groups/levels).
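Vectors, data frames and lists each get their own slide, so here is a brief sketch of the other two structures, matrices and factors; the values are illustrative:

```r
# Matrix: 2-D, every element the same type; filled column by column
m <- matrix(1:6, nrow = 2)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
dim(m)
# [1] 2 3

# Factor: categorical data stored as levels (sorted alphabetically by default)
dept <- factor(c("Sales", "IT", "Sales", "HR"))
levels(dept)
# [1] "HR"    "IT"    "Sales"
table(dept)   # count of each level
```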
Data Structure 1: Vectors
• The basic building block. Use the c() function (combine or
concatenate).
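• All elements MUST be the same type (numeric, character,
logical). If you mix types, R silently coerces everything to the most flexible one (logical, then numeric, then character), so mixing text with numbers gives a character vector.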
# Numeric vector
ages <- c(25, 30, 22, 45)
ages
# [1] 25 30 22 45
class(ages)
# [1] "numeric"
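A sketch of the other vector types, of coercion when types are mixed, and of indexing with `[ ]` (the values are illustrative):

```r
# Character and logical vectors
names_vec <- c("Alice", "Bob", "Charlie")
is_manager <- c(TRUE, FALSE, FALSE)
class(names_vec)
# [1] "character"
class(is_manager)
# [1] "logical"

# Mixing types triggers coercion to the most flexible type
c(1, "two", TRUE)
# [1] "1"    "two"  "TRUE"
c(1, TRUE, FALSE)
# [1] 1 1 0

# Indexing with [ ] (R counts from 1, not 0)
ages <- c(25, 30, 22, 45)
ages[2]
# [1] 30
ages[c(1, 3)]
# [1] 25 22
ages[-1]          # a negative index drops that element
# [1] 30 22 45
ages[ages > 24]   # keep elements that meet a condition
# [1] 25 30 45
```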
Data Structure 2: Data Frames
• The go-to structure for datasets (rows = observations, columns =
variables).
• Think spreadsheet: rectangular.
• Columns are typically vectors.
• Columns can be different data types (numeric, character, etc.).
• All columns MUST have the same length (same number of rows).
# Creating a data frame
employee_data <- data.frame(
EmployeeID = c(101, 102, 103, 104),
Name = c("Alice", "Bob", "Charlie", "David"),
Department = c("Sales", "IT", "Sales", "HR"),
Salary = c(50000, 65000, 52000, 58000)
)
# Print the data frame
employee_data
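Once a data frame exists, a handful of base R functions describe it. A sketch that rebuilds the same `employee_data` so it runs on its own; the `Bonus` column and its 10% rate are made-up examples:

```r
employee_data <- data.frame(
  EmployeeID = c(101, 102, 103, 104),
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("Sales", "IT", "Sales", "HR"),
  Salary = c(50000, 65000, 52000, 58000)
)

nrow(employee_data)    # number of rows (observations)
# [1] 4
ncol(employee_data)    # number of columns (variables)
# [1] 4
names(employee_data)   # column names
# [1] "EmployeeID" "Name"       "Department" "Salary"
str(employee_data)     # type and first values of every column
summary(employee_data$Salary)   # min, quartiles, mean, max

# Add a new column; it needs one value per row
employee_data$Bonus <- employee_data$Salary * 0.10
```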
Accessing Data Frame Elements
• Use $ to access columns by name (most common).
• Use [[ ]] to access columns by name or index.
• Use [row, column] indexing.
# Access the 'Name' column
employee_data$Name
# [1] "Alice" "Bob" "Charlie" "David"
# Access the 'Salary' column
employee_data[["Salary"]]
# [1] 50000 65000 52000 58000
# Access the 3rd column (Department)
employee_data[[3]]
# [1] "Sales" "IT" "Sales" "HR"
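The slide mentions `[row, column]` indexing without an example; a sketch using the same `employee_data`, rebuilt so the snippet runs on its own:

```r
employee_data <- data.frame(
  EmployeeID = c(101, 102, 103, 104),
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("Sales", "IT", "Sales", "HR"),
  Salary = c(50000, 65000, 52000, 58000)
)

# [row, column]: a single cell (row 2, column "Name")
employee_data[2, "Name"]
# [1] "Bob"

# Leave the row blank to keep all rows; pick columns by name
employee_data[, c("Name", "Salary")]

# Leave the column blank to keep all columns; pick rows with a condition
employee_data[employee_data$Department == "Sales", ]

# Rows 1 to 2, columns 2 to 3
employee_data[1:2, 2:3]
```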
Data Structure 3: Lists
• Flexible containers. Can hold vectors, data frames, other lists,
mixed types.
my_list <- list(name = "Alice", age = 30, scores = c(85, 92, 78),
                employed = TRUE)
my_list          # Print the list
my_list$scores   # Access list elements by name using $
my_list[[3]]     # Access list elements by index using [[ ]]
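The difference between `[ ]` and `[[ ]]` matters most with lists. A sketch using the same `my_list`; the `department` element and the nested `team` list (with the names Bob and Eve) are illustrative additions:

```r
my_list <- list(name = "Alice", age = 30, scores = c(85, 92, 78), employed = TRUE)

# [ ] returns a smaller list; [[ ]] (or $) returns the element itself
class(my_list[3])
# [1] "list"
class(my_list[[3]])
# [1] "numeric"

# Chain indexing to reach inside an element
my_list$scores[2]
# [1] 92

# Add an element by assigning to a new name
my_list$department <- "Sales"
names(my_list)
# [1] "name"       "age"        "scores"     "employed"   "department"

# Lists can nest other structures, including data frames
team <- list(lead = my_list,
             members = data.frame(id = 1:2, name = c("Bob", "Eve")))
team$members$name
# [1] "Bob" "Eve"
```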
Working Directory & RStudio Projects
• When reading/writing files, R looks in the working directory.
• getwd(): Get Working Directory (see where R is looking).
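• setwd("path/to/your/directory"): Set Working Directory (use forward slashes /,
or doubled backslashes \\, because a single \ is an escape character in R strings). Fragile: a hard-coded path breaks on any other computer.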
• BETTER WAY: RStudio Projects!
• Go to File -> New Project... -> New Directory (or Existing
Directory).
• Create a folder for your course/project.
• RStudio automatically sets the working directory to the project folder when
you open the .Rproj file.
• Keeps scripts, data, and output organized together! Highly recommended.
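A sketch of the working-directory helpers in base R; the `data` subfolder and file name are hypothetical, and `tempdir()` is used only so the `setwd()` demonstration changes nothing permanently:

```r
# Where is R looking right now?
getwd()

# Build paths with file.path() instead of typing / or \ by hand
data_path <- file.path("data", "employee_data.csv")
data_path
# [1] "data/employee_data.csv"

# Check before reading, and list what is in the working directory
file.exists(data_path)
list.files()

# setwd() returns the old directory, so a temporary change can be undone
old_wd <- setwd(tempdir())
getwd()
setwd(old_wd)
```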
Importing Data: CSV Files
• CSV = Comma-Separated Values. A very common plain-text
format.
• Use read.csv() (base R) or read_csv() (from the readr
package, part of the tidyverse; usually faster, and it returns a tibble).
• Make sure the CSV file is in your RStudio Project folder (or
working directory).
# Assume 'employee_data.csv' exists in your project directory
# Using base R:
my_data_csv <- read.csv("employee_data.csv")
head(my_data_csv)
str(my_data_csv)
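The summary also lists `write.csv()`. A round-trip sketch that writes `employee_data` to a temporary CSV and reads it back, so it runs without any file on disk; in a project you would pass a name like "employee_data.csv" instead of `tempfile()`:

```r
employee_data <- data.frame(
  EmployeeID = c(101, 102, 103, 104),
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("Sales", "IT", "Sales", "HR"),
  Salary = c(50000, 65000, 52000, 58000)
)

# Save to CSV; row.names = FALSE drops the extra 1, 2, 3... column
csv_path <- tempfile(fileext = ".csv")
write.csv(employee_data, csv_path, row.names = FALSE)

# Read it back
my_data_csv <- read.csv(csv_path)
head(my_data_csv)   # first rows (up to 6)
str(my_data_csv)    # column types: whole numbers come back as int
```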
Importing Data: Excel Files
• Requires the readxl package (install and load it first!).
• read_excel() function is the main tool.
• Can specify sheet name or number.
library(readxl)  # Load readxl first (install it once with install.packages("readxl"))
# Assume 'employee_data.xlsx' exists in your project directory
# Read the first sheet by default
my_data_excel <- read_excel("employee_data.xlsx")
head(my_data_excel)
str(my_data_excel)
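To try `read_excel()` without a file of your own, readxl includes sample workbooks. A sketch assuming readxl is installed; `excel_sheets()` lists the sheets and the `sheet` argument picks one by name or by position:

```r
library(readxl)

# readxl ships sample workbooks; readxl_example() returns the path to one
path <- readxl_example("datasets.xlsx")

excel_sheets(path)   # list the sheet names
# [1] "iris"     "mtcars"   "chickwts" "quakes"

iris_sheet <- read_excel(path, sheet = "iris")   # choose a sheet by name
quakes_sheet <- read_excel(path, sheet = 4)      # ...or by position
dim(iris_sheet)
# [1] 150   5
```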
Class 1 Summary & Recap
• R is a powerful language for data analysis, and RStudio is a popular IDE that makes it
easier to use.
• You can do calculations, use variables (<-), and call functions ().
• Key Data Structures:
• Vectors: c(), same data type, access with [].
• Data Frames: data.frame(), columns ($, [[]], [,]), rows ([,]).
• Packages extend R: install.packages(), library().
• Use RStudio Projects for organization.
• Import/Export: read.csv(), read_excel(), write.csv(),
write_xlsx() (from the writexl package).
• Getting Help: ?, ??.
Practice & Next Class
• Practice:
• Create different types of vectors.
• Create a simple data frame.
• Practice accessing elements/columns.
• Try importing a sample CSV or Excel file (find one online or create
one).
• Next Class:
• Data Manipulation! We'll learn how to filter, select, rearrange, and
summarize data using the powerful dplyr package.
