Rstudio Cours

R studio_Com

Kamal ZEHRAOUI
Step 1: Creating an Initial Data Frame
data <- data.frame(ID = 1:3, Name = c("Alice", "Bob", "Charlie"), Age = c(24, 30, 28),
                   Gender = as.factor(c("Female", "Male", "Male")))
print(data)
Step 2: Adding a New Row
new_row <- data.frame(ID = 4, Name = "Diana", Age = 27,
                      Gender = factor("Female", levels = levels(data$Gender)))
data <- rbind(data, new_row)
print(data)
Step 3: Adding a New Column
data$Occupation <- c("Engineer", "Doctor", "Artist", "Lawyer")
print(data)
• A character data type in R represents plain text or string data. It
stores data as literal text, which means each entry is treated
independently. Characters are typically used for variables where the
values are unique and don’t necessarily repeat (e.g., names, unique
identifiers).
• A factor is a categorical variable used to store a fixed set of unique
values (called "levels"). Factors are ideal for data that belongs to a
limited number of categories, like gender, status, or education level. R
treats factors differently from characters because they are stored as
integer values with corresponding labels, which helps in data analysis,
especially with categorical data.
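The difference between the two types can be seen in a minimal sketch using only base R. The vector names `names_chr` and `gender_fct` are illustrative and do not come from the slides.

```r
# Character vector: free text, no predefined set of values
names_chr <- c("Alice", "Bob", "Charlie", "Diana")

# Factor: a fixed set of levels, stored internally as integer codes
gender_fct <- factor(c("Female", "Male", "Male", "Female"))

class(names_chr)         # type of the text vector
class(gender_fct)        # type of the categorical vector
levels(gender_fct)       # the allowed categories (sorted alphabetically by default)
as.integer(gender_fct)   # the integer codes behind the labels

# Factors plug directly into frequency tables
gender_counts <- table(gender_fct)
print(gender_counts)
```

Because the levels are fixed, functions such as `table()`, `boxplot()` and `lm()` can treat a factor as a grouping variable without further conversion.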
Basic Data Exploration: Without Packages

# View the structure of the dataset
str(data_kamal)
# Summary statistics for each variable
summary(data_kamal)
# Frequency tables for categorical variables
table(data_kamal$Gender)
table(data_kamal$Department)
# Basic descriptive statistics for quantitative variables
mean(data_kamal$Age)
median(data_kamal$Income)
sd(data_kamal$Weight)
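The commands above assume that a data frame named `data_kamal` already exists with the columns Age, Income, Weight, Gender and Department. The slides do not show how it is built, so the sketch below creates a simulated stand-in for practice. All values, the sample size and the department names are made up for illustration and are not the course data.

```r
# Simulated stand-in for data_kamal (hypothetical values, for practice only)
set.seed(42)   # make the random draws reproducible
n <- 50        # assumed sample size

data_kamal <- data.frame(
  Age        = round(runif(n, min = 20, max = 60)),
  Income     = round(rnorm(n, mean = 3000, sd = 500)),
  Weight     = round(rnorm(n, mean = 70, sd = 10), 1),
  Gender     = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Department = factor(sample(c("Finance", "HR", "IT"), n, replace = TRUE))
)

str(data_kamal)    # check column types
head(data_kamal)   # look at the first rows
```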
Basic Data Exploration: With Packages

# Install (once) and load the dplyr package
install.packages("dplyr")
library(dplyr)
# Summary statistics for quantitative variables
data_kamal %>% summarise(across(c(Age, Income, Weight), list(mean = mean, sd = sd, median = median)))
# Frequency tables for categorical variables
data_kamal %>% count(Gender)
data_kamal %>% count(Department)
Univariate Analysis: Without Packages
# Histogram for quantitative variables
hist(data_kamal$Age, main = "Histogram of Age", xlab = "Age")
# Bar plot for categorical variables
barplot(table(data_kamal$Gender), main = "Gender Distribution", xlab = "Gender", ylab = "Count")
Univariate Analysis: With Packages
# Install (once) and load the ggplot2 package
install.packages("ggplot2")
library(ggplot2)
# Histogram for quantitative variables
ggplot(data_kamal, aes(x = Age)) + geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
ggtitle("Histogram of Age")
# Bar plot for categorical variables
ggplot(data_kamal, aes(x = Gender)) + geom_bar(fill = "lightgreen") + ggtitle("Gender Distribution")
Bivariate Analysis: Without Packages
# Scatter plot for quantitative variables
plot(data_kamal$Age, data_kamal$Income, main = "Age vs Income", xlab = "Age", ylab = "Income")
# Boxplot for categorical vs quantitative
boxplot(data_kamal$Income ~ data_kamal$Gender, main = "Income by Gender", xlab = "Gender", ylab =
"Income")
# Correlation between quantitative variables
cor(data_kamal$Age, data_kamal$Income)
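The `cor()` call above handles one pair of variables. As a sketch, passing several numeric columns at once returns the full correlation matrix in base R. The data frame `toy` below is simulated for illustration and stands in for the course data.

```r
# Simulated numeric data (hypothetical values)
set.seed(10)
toy <- data.frame(Age = runif(25, min = 20, max = 60),
                  Weight = rnorm(25, mean = 70, sd = 10))
toy$Income <- 1500 + 40 * toy$Age + rnorm(25, sd = 300)  # Income built to depend on Age

# Pearson correlation matrix for all pairs of quantitative variables
cor_matrix <- cor(toy[, c("Age", "Income", "Weight")])
print(round(cor_matrix, 2))

# Significance test for a single pair
age_income_test <- cor.test(toy$Age, toy$Income)
print(age_income_test$p.value)
```

The matrix is symmetric with ones on the diagonal, so only the upper or lower triangle needs to be read.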
Bivariate Analysis: With Packages
# Scatter plot for quantitative variables
ggplot(data_kamal, aes(x = Age, y = Income)) + geom_point(color = "blue") + ggtitle("Age vs Income")
# Boxplot for categorical vs quantitative
ggplot(data_kamal, aes(x = Gender, y = Income)) + geom_boxplot(fill = "lightgreen") + ggtitle("Income by Gender")
# Correlation matrix for quantitative variables
install.packages("GGally")
library(GGally)
ggpairs(data_kamal %>% select(Age, Income, Weight), title = "Correlation Matrix")
Multivariate Analysis
Principal Component Analysis (PCA): Without Packages
# Standardize the quantitative data
pca_data <- scale(data_kamal[, c("Age", "Income", "Weight")])
# Perform PCA
pca_result <- prcomp(pca_data)
summary(pca_result)
# Scree plot
plot(pca_result, type = "l", main = "Scree Plot")
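The scree plot displays the variance of each component. As a sketch of the arithmetic behind it, the eigenvalues are the squared standard deviations stored in the `prcomp` result, and dividing by their sum gives the proportion of variance explained. The data frame `toy` is simulated for illustration.

```r
# Simulated quantitative data (hypothetical values)
set.seed(1)
toy <- data.frame(Age = rnorm(30, mean = 40, sd = 10),
                  Income = rnorm(30, mean = 3000, sd = 500),
                  Weight = rnorm(30, mean = 70, sd = 10))

pca_toy <- prcomp(scale(toy))        # PCA on standardized variables

eigenvalues <- pca_toy$sdev^2        # variance carried by each component
prop_var <- eigenvalues / sum(eigenvalues)   # share of total variance
cum_var <- cumsum(prop_var)          # cumulative share

print(round(rbind(eigenvalues, prop_var, cum_var), 3))
```

With standardized variables the eigenvalues sum to the number of variables (here 3), so a component with an eigenvalue above 1 carries more variance than any single original variable.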
Principal Component Analysis (PCA): With Packages
install.packages("FactoMineR")
install.packages("factoextra")
library(FactoMineR)
library(factoextra)
# PCA
pca_result <- PCA(data_kamal %>% select(Age, Income, Weight), graph = FALSE)
fviz_pca_var(pca_result)
fviz_pca_biplot(pca_result, geom.ind = "point", geom.var = "arrow", col.var = "black", title = "PCA Biplot")
# Scree plot of PCA with FactoMineR
fviz_eig(pca_result, addlabels = TRUE, ylim = c(0, 50), title = "Scree Plot")
# Correlation circle (variables) plot
fviz_pca_var(pca_result, col.var = "contrib",   # Color by contribution of variables
             gradient.cols = c("blue", "red"), title = "PCA Variables Contribution")
# PCA individuals (scores) plot
fviz_pca_ind(pca_result, col.ind = "cos2",      # Color by cos2 (quality of representation)
             gradient.cols = c("blue", "green", "red"), title = "PCA Individuals (Scores)")
# Contributions of individuals to the first principal component
fviz_contrib(pca_result, choice = "ind", axes = 1, top = 10, title = "Top 10 Contributions of Individuals to PC1")
# Contributions of variables to the first principal component
fviz_contrib(pca_result, choice = "var", axes = 1, title = "Contributions of Variables to PC1")
Hierarchical Clustering: Without Packages
# Step 1: Standardize the data (optional, but recommended for clustering)
# Using scale() to standardize quantitative variables
data_scaled <- scale(data_kamal[, c("Age", "Income", "Weight")])
# Step 2: Compute the distance matrix
distance_matrix <- dist(data_scaled, method = "euclidean") # Use "manhattan" for Manhattan distance
# Step 3: Perform hierarchical clustering using hclust
hc_result <- hclust(distance_matrix, method = "ward.D2") # Options: "ward.D2", "single", "complete", "average"
# Step 4: Plot the dendrogram
plot(hc_result, main = "Dendrogram of Hierarchical Clustering", xlab = "", sub = "")
# Step 5: Cut the dendrogram into clusters
data_kamal$Cluster <- cutree(hc_result, k = 3) # k is the number of clusters you want
table(data_kamal$Cluster) # View the number of observations in each cluster
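Once each observation has a cluster label, a common next step is to compare the clusters on the original variables. The sketch below repeats the steps above on simulated data and computes the mean of each variable per cluster with `aggregate()`. The names `toy` and `cluster_profile` are illustrative.

```r
# Simulated quantitative data (hypothetical values)
set.seed(7)
toy <- data.frame(Age = rnorm(30, mean = 40, sd = 10),
                  Income = rnorm(30, mean = 3000, sd = 500),
                  Weight = rnorm(30, mean = 70, sd = 10))

# Standardize, compute distances, cluster, and cut into 3 groups
toy_scaled <- scale(toy)
hc_toy <- hclust(dist(toy_scaled, method = "euclidean"), method = "ward.D2")
toy$Cluster <- cutree(hc_toy, k = 3)

# Mean of each original (unscaled) variable within each cluster
cluster_profile <- aggregate(cbind(Age, Income, Weight) ~ Cluster, data = toy, FUN = mean)
print(cluster_profile)
```

Profiling on the unscaled variables keeps the means in their original units, which makes the clusters easier to describe.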
Hierarchical Clustering: With Packages
install.packages("cluster")
install.packages("factoextra")
library(cluster)
library(factoextra)
data_scaled <- scale(data_kamal[, c("Age", "Income", "Weight")])
# Euclidean distance matrix
distance_matrix <- dist(data_scaled, method = "euclidean")
hc_agnes <- agnes(data_scaled, method = "ward") # Options: "ward", "single", "complete", "average"
print(hc_agnes$ac) # Agglomerative coefficient
# Basic dendrogram
fviz_dend(hc_agnes, rect = TRUE, k = 3,  # Adds rectangles around clusters, set k for number of clusters
          main = "Dendrogram of Hierarchical Clustering",
          palette = "jco",               # Set colour palette for clusters
          cex = 0.5)                     # Adjust label size
# Cut the dendrogram into clusters
clusters <- cutree(hc_agnes, k = 3) # Specify the number of clusters
# Add the cluster assignment to your data
data_kamal$Cluster <- as.factor(clusters)
# Visualize clusters in a 2D scatter plot of principal components
fviz_cluster(list(data = data_scaled, cluster = clusters), geom = "point", ellipse.type = "convex",
             palette = "jco", main = "Cluster Plot", ggtheme = theme_minimal())
MCA: Without Packages
# Convert categorical variables into dummy variables (one-hot encoding)

indicator_matrix <- model.matrix(~ Gender + Department - 1, data = data_kamal)

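Apply singular value decomposition (SVD) to the centered indicator matrix. This is a simplified approximation of MCA: it amounts to a PCA of the dummy variables and omits the category-frequency weighting that a full MCA applies.

# Perform SVD on the centered indicator matrix
svd_result <- svd(scale(indicator_matrix, scale = FALSE))

# Extract the principal components (MCA dimensions)
mca_dimensions <- svd_result$u %*% diag(svd_result$d)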

# Basic scatter plot of the first two dimensions

plot(mca_dimensions[, 1], mca_dimensions[, 2], xlab = "Dimension 1", ylab = "Dimension 2", main = "MCA Scatter Plot")
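One detail worth checking in the base R approach: with the formula `~ Gender + Department - 1`, `model.matrix()` keeps every level of the first factor but drops one level of the second, so the result is not a complete indicator matrix. The sketch below, on a small made-up data frame `toy`, requests full dummy coding for every factor through `contrasts.arg`. The department names are hypothetical.

```r
# Small made-up categorical data set
toy <- data.frame(
  Gender     = factor(rep(c("Female", "Male"), times = 6)),
  Department = factor(rep(c("Finance", "HR", "IT"), times = 4))
)

# Default coding: one Department level is dropped
default_matrix <- model.matrix(~ Gender + Department - 1, data = toy)

# Full coding: keep one column per category of every factor
full_matrix <- model.matrix(
  ~ Gender + Department - 1,
  data = toy,
  contrasts.arg = lapply(toy, contrasts, contrasts = FALSE)
)

print(colnames(default_matrix))
print(colnames(full_matrix))
print(colSums(full_matrix))   # number of observations in each category
```

In the full matrix each row sums to the number of categorical variables (here 2), which is the structure MCA expects from its indicator matrix.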

MCA: With Packages

install.packages("FactoMineR")
install.packages("factoextra")
library(FactoMineR)
library(factoextra)

Ensure all categorical variables are coded as factors. Convert them if necessary.

data_kamal$Gender <- as.factor(data_kamal$Gender)

data_kamal$Department <- as.factor(data_kamal$Department)

# Perform MCA on the categorical variables

mca_result <- MCA(data_kamal[, c("Gender", "Department")], graph = FALSE)

fviz_screeplot(mca_result, addlabels = TRUE, ylim = c(0, 50), title = "MCA Scree Plot")

fviz_mca_var(mca_result, repel = TRUE,   # Avoid overlapping labels
             col.var = "contrib",        # Color by contributions to the dimensions
             gradient.cols = c("blue", "red"), title = "MCA Variable Categories Plot")

fviz_mca_ind(mca_result, col.ind = "cos2",   # Color by quality of representation (cos2)
             gradient.cols = c("blue", "green", "red"), repel = TRUE, title = "MCA Individuals Plot")

fviz_mca_biplot(mca_result, repel = TRUE,    # Avoid overlapping labels
                geom.ind = "point", col.var = "blue", col.ind = "red", title = "MCA Biplot")

print(mca_result)
Basic Linear Regression: Without Packages
Simple Linear Regression
# Simple linear regression: Predict Income based on Age
model_simple <- lm(Income ~ Age, data = data_kamal)
# View the summary of the model to see details
summary(model_simple)
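Beyond `summary()`, a fitted `lm` object can be queried for its coefficients, their confidence intervals, and predictions for new values. The sketch below uses simulated data, so the numbers it prints say nothing about the course data set. The names `toy`, `model_toy` and `preds` are illustrative.

```r
# Simulated data where Income increases with Age (hypothetical values)
set.seed(3)
toy <- data.frame(Age = round(runif(40, min = 20, max = 60)))
toy$Income <- 1500 + 40 * toy$Age + rnorm(40, sd = 200)

model_toy <- lm(Income ~ Age, data = toy)

coefs <- coef(model_toy)        # intercept and slope
print(coefs)
print(confint(model_toy))       # 95% confidence intervals for the coefficients

# Predicted Income, with confidence bounds, for two new ages
new_people <- data.frame(Age = c(30, 45))
preds <- predict(model_toy, newdata = new_people, interval = "confidence")
print(preds)
```

The `fit` column of the prediction is simply intercept + slope × Age, and `lwr` and `upr` bound the estimated mean Income at that age.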
Multiple Linear Regression
Convert Categorical Variables
data_kamal$Gender <- as.factor(data_kamal$Gender)
# Multiple linear regression: Predict Income based on Age, Weight, and Gender
model_multiple <- lm(Income ~ Age + Weight + Gender, data = data_kamal)
plot(model_multiple)

Linear Regression: With Packages

install.packages("ggplot2")
install.packages("broom")
library(ggplot2)
library(broom)
Simple Linear Regression with Visualization
# Plot Income vs Age with a linear regression line
ggplot(data_kamal, aes(x = Age, y = Income)) +
  geom_point() +                                 # Scatter plot
  geom_smooth(method = "lm", color = "blue") +   # Linear regression line
  labs(title = "Simple Linear Regression of Income on Age", x = "Age", y = "Income")
Multiple Linear Regression with Broom
# Tidy summary of the multiple regression model
model_multiple_tidy <- tidy(model_multiple)
print(model_multiple_tidy)
# Residuals vs Fitted plot
ggplot(data.frame(fitted = fitted(model_multiple), residuals = resid(model_multiple)),
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red") +
  labs(title = "Residuals vs Fitted Values", x = "Fitted Values", y = "Residuals")
