R Record-1
Ex.No:01
Date:
Write an R script to import and export data.
AIM:
To write an R script to import and export data.
ALGORITHM:
Step 1: Define a data frame df with columns Name, Language, and Age.
Step 2: Use write.table() to export df to a text file named "myDataFrame.txt".
Step 3: Define the column names and add values to the data frame.
Step 4: Call the write.table() function to save the data frame to a file, specifying the file name using the file parameter.
Step 5: Check the file in the working directory to confirm successful export, and open the file to verify the structure and formatting.
Step 6: Use the file.choose() function inside read.csv() to allow the user to manually select a CSV file from their system.
Step 7: Set header = TRUE so that the first row of the file is treated as column names.
Step 8: Display the entire data frame in the console.
Step 9: The function displays all rows and columns of the data frame.
PROGRAM:
# EXPORTING DATA
df = data.frame(
  "Name" = c("Arun", "Bala", "Chitra"), # sample names (values assumed)
  "Language" = c("R", "python", "java"),
  "Age" = c(22, 25, 45))
write.table(df,
  file = "myDataFrame.txt",
  sep = "\t",
  row.names = TRUE,
  col.names = NA)
# IMPORTING DATA
data1 <- read.csv(file.choose(), header = TRUE)
data1
OUTPUT:
RESULT:
The R script to import and export data was executed successfully.
Ex.No:02
Date:
Write an R Script to perform the Data Pre-processing techniques
AIM:
To write an R script to perform data pre-processing techniques.
ALGORITHM:
Step 3: Count and print the total number of missing values in the dataset.
Step 6: Create an additional dataset with car names and randomly assigned country values (USA, Japan, Europe).
Step 7: Add a new column car to store the row names, then merge the datasets based on the car column.
Step 8: Display the first few rows of the final dataset using head(mtcars).
PROGRAM:
library(dplyr)
library(tidyr)
data(mtcars)
head(mtcars)
# 1. Handling Missing Values: count and print the total number of NAs
cat("Total missing values:", sum(is.na(mtcars)), "\n")
# Add a new column car to store the row names (Step 7)
mtcars$car <- rownames(mtcars)
# 2. Removing Duplicates
mtcars <- distinct(mtcars)
# 3. Data Transformation: normalized mpg column
mtcars <- mutate(mtcars, mpg_normalized = as.numeric(scale(mpg)))
# 4. Feature Engineering: power-to-weight ratio
mtcars <- mutate(mtcars, power_to_weight_ratio = hp / wt)
# 5. Data Integration: additional dataset with randomly assigned countries
additional_data <- data.frame(
  car = mtcars$car,
  country = sample(c("USA", "Japan", "Europe"), nrow(mtcars), replace = TRUE))
mtcars <- left_join(mtcars, additional_data, by = "car")
head(mtcars)
OUTPUT:
RESULT:
The R script to perform the data pre-processing techniques was executed successfully.
Ex.No:03
Date:
Write an R script to compute descriptive statistics.
AIM:
To write an R script to compute descriptive statistics for a dataset.
ALGORITHM:
Step 1: Check if the dplyr package is installed; if not, install it. Load the dplyr library.
Step 4: Use summary(data) to get basic statistics (min, max, median, mean, etc.) for numeric columns.
Step 7: Calculate the Pearson correlation between Height and Weight using cor(data$Height, data$Weight).
PROGRAM:
# Load necessary library
if (!require("dplyr")) install.packages("dplyr")
library(dplyr)
# Create a sample dataset (column values assumed)
set.seed(123)
data <- data.frame(
  ID = 1:100,
  Gender = sample(c("Male", "Female"), 100, replace = TRUE),
  Age = sample(18:60, 100, replace = TRUE),
  Height = round(rnorm(100, mean = 165, sd = 10), 1),
  Weight = round(rnorm(100, mean = 65, sd = 12), 1))
head(data)
# Descriptive statistics
summary(data)
# Mean, Median, Variance, and Standard Deviation for a specific column
cat("Mean Age:", mean(data$Age), "\n")
cat("Median Age:", median(data$Age), "\n")
cat("Variance of Age:", var(data$Age), "\n")
cat("SD of Age:", sd(data$Age), "\n")
# Frequency distribution of Gender
gender_distribution <- table(data$Gender)
cat("\nGender Distribution:\n")
print(gender_distribution)
# Percentage distribution
gender_percentage <- prop.table(gender_distribution) * 100
print(round(gender_percentage, 2))
# Grouped statistics by Gender
grouped_stats <- data %>%
  group_by(Gender) %>%
  summarise(
    Mean_Height = mean(Height),
    Mean_Weight = mean(Weight),
    SD_Height = sd(Height),
    SD_Weight = sd(Weight))
print(grouped_stats)
# Pearson correlation between Height and Weight
cat("\nCorrelation (Height, Weight):", cor(data$Height, data$Weight), "\n")
# Histogram of Age
hist(data$Age, main = "Histogram of Age", xlab = "Age", col = "blue", border = "black")
OUTPUT:
RESULT:
The R script to compute descriptive statistics was executed successfully.
Ex.No:04
Date:
Visualizing the data in different graphics using R Script.
AIM:
To visualize the data in different graphics using an R script.
PROGRAM:
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("dplyr")) install.packages("dplyr")
library(ggplot2)
library(dplyr)
set.seed(123)
x = "Height (cm)",
y = "Weight (kg)") +
theme_minimal()
x = "Age (years)",
y = "Frequency") +
theme_minimal()
geom_boxplot() +
x = "Gender",
y = "Height (cm)") +
theme_minimal()
group_by(Gender) %>%
summarise(Count = n())
x = "Gender",
y = "Count") +
theme_minimal()
x = "ID",
y = "Age (years)") +
theme_minimal()
geom_density(alpha = 0.5) +
x = "Weight (kg)",
y = "Density") +
theme_minimal()
group_by(Gender) %>%
summarise(Count = n())
coord_polar("y") +
theme_void()
facet_wrap(~ Gender) +
x = "Height (cm)",
y = "Weight (kg)") +
theme_minimal()
OUTPUT:
RESULT:
The R script to visualize the data in different graphics was executed successfully.
Ex.No:05
Date:
Write an R script to implement the normal and binomial distributions.
AIM:
To write an R script to implement the normal and binomial distributions.
ALGORITHM:
PROGRAM:
library(ggplot2)
set.seed(123)
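# NOTE: only the setup lines of this program survive in the record. The sketch
# below is a minimal reconstruction of a typical normal/binomial demonstration;
# the sample sizes and distribution parameters (n, mean, sd, p) are assumed,
# not taken from the original.
# Normal distribution: simulate values and overlay the theoretical density curve
normal_data <- data.frame(x = rnorm(1000, mean = 50, sd = 10))
ggplot(normal_data, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 fill = "skyblue", color = "black") +
  stat_function(fun = dnorm, args = list(mean = 50, sd = 10), color = "red") +
  labs(x = "Value", y = "Density") +
  theme_minimal()
# Binomial distribution: P(X = k) for n trials with success probability p
n <- 20
p <- 0.5
binom_data <- data.frame(k = 0:n, prob = dbinom(0:n, size = n, prob = p))
ggplot(binom_data, aes(x = k, y = prob)) +
  geom_col(fill = "orange", color = "black") +
  labs(x = "Number of Successes", y = "Probability") +
  theme_minimal()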
OUTPUT:
RESULT:
The R script to implement the normal and binomial distributions was executed successfully.
Ex.No:06
Date:
Write an R script to convert numerical data to categorical data.
AIM:
To write an R script to convert numerical data to categorical data.
ALGORITHM:
Step 1: Set a seed so that the randomly generated numbers are the same every time the script runs.
Step 4: Print the dataset after the categorical variables have been added.
Step 5: Count the occurrences of each age group and of each income bracket.
Step 10: Output: a dataset with numerical and categorical variables, and frequency counts of the categories.
PROGRAM:
set.seed(123)
# Sample dataset (value ranges assumed)
data <- data.frame(
  ID = 1:20,
  Age = sample(18:70, 20, replace = TRUE),
  Income = sample(20000:100000, 20, replace = TRUE))
print("Original Dataset:")
print(data)
# Convert Age to categorical groups (intervals closed on the left)
data$Age_Group <- cut(
  data$Age,
  breaks = c(18, 30, 45, 60, 71),
  labels = c("18-29", "30-44", "45-59", "60+"),
  right = FALSE)
# Convert Income to brackets (intervals closed on the right)
data$Income_Bracket <- cut(
  data$Income,
  breaks = c(0, 40000, 70000, 100000),
  labels = c("Low", "Medium", "High"),
  right = TRUE)
print(data)
# Frequency counts of each category
print(table(data$Age_Group))
print(table(data$Income_Bracket))
# Visualize the category counts
if (!require("ggplot2")) install.packages("ggplot2")
library(ggplot2)
ggplot(data, aes(x = Age_Group)) +
  geom_bar(fill = "steelblue") +
  labs(x = "Age Group", y = "Count") +
  theme_minimal()
ggplot(data, aes(x = Income_Bracket)) +
  geom_bar(fill = "orange") +
  labs(x = "Income Bracket", y = "Count") +
  theme_minimal()
OUTPUT:
RESULT:
The R script to convert numerical data to categorical data was executed successfully.
Ex.No:07
Date:
Write an R script to implement Bayes' theorem.
AIM:
To write an R script to implement Bayes' theorem.
ALGORITHM:
Step 2: Define the probability that the test is positive given the person has the disease.
Step 3: Define the probability that the test is positive given the person does not have the disease.
Step 5: Compute the probability that a person has the disease given a positive test result.
PROGRAM:
# Define probabilities
P_A <- 0.01 # Prior P(A): person has the disease (value assumed)
# Likelihood (P(B|A)): Probability of a positive test result given the person has the disease
P_B_given_A <- 0.95 # Test is 95% accurate for those with the disease
P_B_given_not_A <- 0.05 # P(B|~A): positive test without the disease (value assumed)
# Total probability of a positive test: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
P_B <- P_B_given_A * P_A + P_B_given_not_A * (1 - P_A)
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
P_A_given_B <- (P_B_given_A * P_A) / P_B
cat("P(Disease | Positive test):", P_A_given_B, "\n")
# Visualize the probability components
if (!require("ggplot2")) install.packages("ggplot2")
library(ggplot2)
probs <- data.frame(Component = c("P(A)", "P(B|A)", "P(A|B)"),
                    Probability = c(P_A, P_B_given_A, P_A_given_B))
ggplot(probs, aes(x = Component, y = Probability, fill = Component)) +
  geom_bar(stat = "identity") +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Probability Component", y = "Probability") +
  theme_minimal()
OUTPUT:
RESULT:
The R script to implement Bayes' theorem was executed successfully.
Ex.No:08
Date:
Write an R script to implement time series data analysis and forecasting.
AIM:
To write an R script that implements time series data analysis and forecasting.
ALGORITHM:
Step 1: Before running the analysis, ensure the required packages (forecast, ggplot2, and tseries) are installed and loaded.
Step 2: Decompose the time series into trend, seasonal, and residual components using decompose().
Step 5: Use STL (Seasonal and Trend decomposition using LOESS) for a better decomposition.
Step 7: Visualize the original time series, then decompose it into trend, seasonal, and residual components.
PROGRAM:
if (!require("forecast")) install.packages("forecast")
if (!require("ggplot2")) install.packages("ggplot2")
library(forecast)
library(ggplot2)
set.seed(123)
autoplot(time_series_data) +
x = "Time",
y = "Value") +
theme_minimal()
autoplot(decomposed) +
if (!require("tseries")) install.packages("tseries")
library(tseries)
print(adf_test)
print(summary(arima_model))
autoplot(forecast_values) +
x = "Time",
y = "Value") +
theme_minimal()
autoplot(arima_model$residuals) +
x = "Time",
y = "Residuals") +
theme_minimal()
print(lb_test)
autoplot(stl_decomposed) +
# write.csv(data.frame(forecast_values), "forecasted_values.csv")
OUTPUT:
RESULT:
The R script to implement time series data analysis and forecasting was executed successfully.
Ex.No:09
Date:
Write an R script to perform statistical hypothesis tests.
AIM:
To write an R script to perform statistical hypothesis tests (t-tests, chi-square, ANOVA, and non-parametric tests).
ALGORITHM:
PROGRAM:
if (!require("ggplot2")) install.packages("ggplot2")
library(ggplot2)
# 1. One-Sample t-test
cat("\nOne-Sample t-test:\n")
print(t_test_one_sample)
cat("\nTwo-Sample t-test:\n")
print(t_test_two_sample)
# 3. Paired t-test
cat("\nPaired t-test:\n")
print(t_test_paired)
# 4. Chi-Square Test
cat("\nChi-Square Test:\n")
print(chi_sq_test)
cat("\nANOVA Test:\n")
summary(anova_result)
geom_boxplot() +
x = "Group",
y = "Values") +
theme_minimal()
cat("\nShapiro-Wilk Test:\n")
print(shapiro_test)
print(wilcox_test)
# 9. Correlation Test
cat("\nCorrelation Test:\n")
print(cor_test)
OUTPUT:
RESULT:
The R script to perform statistical hypothesis tests was executed successfully.
Ex.No:10
Date:
Write an R script to implement classification using decision tree and random forest models.
AIM:
To write an R script to implement classification using decision tree and random forest models.
ALGORITHM:
Step 1: Before running the analysis, install and load the necessary libraries.
Step 2: Use caret for machine learning operations like data partitioning and evaluation.
Step 9: Visualize the decision tree and the feature importance from the random forest.
PROGRAM:
if (!require("caret")) install.packages("caret")
if (!require("rpart")) install.packages("rpart")
if (!require("randomForest")) install.packages("randomForest")
if (!require("ggplot2")) install.packages("ggplot2")
library(caret)
library(rpart)
library(randomForest)
library(ggplot2)
set.seed(123)
data <- data.frame(
cat("Sample Dataset:\n")
print(head(data))
set.seed(123)
print(summary(lm_model))
print(tree_model)
print(rf_model)
# 4. Make Predictions
# 5. Evaluate Models
print(confusionMatrix(tree_predictions, testData$Default))
print(confusionMatrix(rf_predictions, testData$Default))
if (!require("rpart.plot")) install.packages("rpart.plot")
library(rpart.plot)
cat("\nFeature Importance:\n")
print(importance)
geom_bar(stat = "identity") +
coord_flip() +
x = "Features",
y = "Importance") +
theme_minimal()
OUTPUT:
RESULT:
The R script to implement classification using decision tree and random forest models was executed successfully.
Ex.No:11
Date:
Write an R script to evaluate model performance using cross-validation.
AIM:
To write an R script to evaluate random forest model performance using cross-validation.
ALGORITHM:
Step 2: Convert the Default column into a factor, since it is a classification problem.
Step 5: Use the plot() function to visualize the model performance and parameter tuning.
PROGRAM:
if (!require("caret")) install.packages("caret")
if (!require("randomForest")) install.packages("randomForest")
library(caret)
library(randomForest)
set.seed(123)
# View dataset
cat("Sample Dataset:\n")
print(head(data))
set.seed(123)
# 4. Model Performance
print(rf_model)
# Print the best model parameters
print(rf_model$bestTune)
cat("\nCross-Validation Results:\n")
print(rf_model$resample)
plot(rf_model)
OUTPUT:
RESULT:
The R script to evaluate model performance using cross-validation was executed successfully.
Ex.No:12
Date:
Write an R script to implement simple linear regression.
AIM:
To write an R script to implement simple linear regression on the mtcars dataset.
ALGORITHM:
Step 3: Display the first six rows of the mtcars dataset for a quick preview using head().
Step 5: Use the summary() function to obtain key details about the regression results.
PROGRAM:
library(ggplot2)
data(mtcars)
# View the first few rows of the dataset
head(mtcars)
# Fit a simple linear regression of mpg on wt (variable choice assumed)
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
# Visualize the regression fit
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon")
OUTPUT:
RESULT:
The R script to implement simple linear regression was executed successfully.
Ex.No:13
Date:
Write an R script to implement linear regression on a sample dataset.
AIM:
To write an R script to model the relationship between Age and Spending using linear regression.
ALGORITHM:
Step 6: Use ggplot2 to visualize the relationship between Age and Spending.
PROGRAM:
if (!require("ggplot2")) install.packages("ggplot2")
library(ggplot2)
# Sample dataset
set.seed(123)
# Model Summary
summary(lm_model)
geom_point() +
OUTPUT:
RESULT:
The R script to model the relationship between Age and Spending using linear regression was executed successfully.
Ex.No:14
Date:
Write an R script to implement K-means clustering.
AIM:
To write an R script to implement K-means clustering on the iris dataset.
ALGORITHM:
Step 4: Add the computed cluster labels (1, 2, or 3) to the iris dataset.
Step 5: Use as.factor() to ensure the cluster labels are treated as categorical values.
PROGRAM:
library(ggplot2)
data(iris)
# K-means clustering with 3 clusters on the numeric features (k from Step 4)
set.seed(123)
kmeans_result <- kmeans(iris[, 1:4], centers = 3)
# Add the cluster labels (1, 2, or 3) as a factor (Steps 4-5)
iris$Cluster <- as.factor(kmeans_result$cluster)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Cluster)) +
  geom_point() +
  labs(x = "Sepal Length", y = "Sepal Width")
OUTPUT:
RESULT:
The R script to implement K-means clustering was executed successfully.
Ex.No:15
Date:
Write an R script to implement the Naive Bayes classifier.
AIM:
To write an R script to implement the Naive Bayes classifier on the iris dataset.
ALGORITHM:
Step 2: Load the built-in iris dataset, which contains 150 samples of iris flowers with four features (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and a Species label.
Step 4: Use the trained model to predict the species of the flowers in the test set.
Step 5: Create a confusion matrix comparing the actual vs. predicted species.
Step 7: Compute the accuracy by dividing the number of correct predictions by the total number of test samples.
Step 9: Compute the probability of each feature value given a class using the Gaussian (normal) distribution.
Step 10: Select the class with the highest probability as the prediction.
PROGRAM:
library(e1071)
# Load the dataset
data(iris)
# Train a Naive Bayes classifier on the full dataset
nb_model <- naiveBayes(Species ~ ., data = iris)
# Confusion matrix: actual vs predicted species
confusion_matrix <- table(Actual = iris$Species, Predicted = predict(nb_model, iris))
print(confusion_matrix)
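# NOTE: the fragment above trains and evaluates on the full dataset, while
# Steps 3-7 of the algorithm describe a train/test split and an accuracy
# computation. A minimal sketch follows, assuming a 70/30 split and seed 123
# (both assumptions, not from the original).
set.seed(123)
idx <- sample(seq_len(nrow(iris)), size = 0.7 * nrow(iris))
train <- iris[idx, ]
test <- iris[-idx, ]
nb_split <- naiveBayes(Species ~ ., data = train)
pred <- predict(nb_split, test)
cm <- table(Actual = test$Species, Predicted = pred)
print(cm)
# Accuracy = correct predictions / total test samples (Step 7)
cat("Accuracy:", sum(diag(cm)) / nrow(test), "\n")
# Steps 9-10: Naive Bayes scores each class with Gaussian likelihoods. For one
# feature of one test sample, the likelihood under class "setosa" is:
mu <- mean(train$Sepal.Length[train$Species == "setosa"])
sigma <- sd(train$Sepal.Length[train$Species == "setosa"])
dnorm(test$Sepal.Length[1], mean = mu, sd = sigma)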
OUTPUT:
RESULT:
The R script to implement the Naive Bayes classifier was executed successfully.