DVT (Lab) - R Language Manual

1. Financial Analysis using Histogram

# Load the required libraries
library(ggplot2)  # for enhanced plotting capabilities

# Generate some random financial data (replace with your actual data)
financial_data <- rnorm(1000, mean = 1000, sd = 200)

# Create a histogram
ggplot(data = NULL, aes(x = financial_data)) +
  geom_histogram(fill = "blue", color = "black", bins = 30) +
  labs(title = "Financial Data Histogram", x = "Value", y = "Frequency")

Histograms are a useful tool for visualizing the distribution of numerical data, which can be helpful in financial analysis. In R, you can create histograms using the base hist() function or, as shown here, the ggplot2 package. Here's an example of how to perform financial analysis using a histogram in R:
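The same distribution can also be drawn with base R's hist() function, with no extra packages. A minimal sketch using the same simulated data:

```r
# Simulated financial data (replace with your actual data)
financial_data <- rnorm(1000, mean = 1000, sd = 200)

# Base-R histogram: 'breaks' suggests the approximate number of bins
hist(financial_data,
     breaks = 30,
     col = "lightblue",
     main = "Financial Data Histogram (base R)",
     xlab = "Value",
     ylab = "Frequency")
```

hist() is convenient for a quick look; ggplot2 gives finer control over styling and layering.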

install.packages("ggplot2")
library(ggplot2)
financial_data <- rnorm(1000, mean = 1000, sd = 200)
ggplot(data = NULL, aes(x = financial_data)) +
  geom_histogram(fill = "blue", color = "black", bins = 30) +
  labs(title = "Financial Data Histogram", x = "Value", y = "Frequency")

In the code above, we start by loading the ggplot2 library, which provides
enhanced plotting capabilities. Next, we generate some random financial
data using the rnorm() function. Replace financial_data with your actual
financial data.

The ggplot() function initializes the plot, and geom_histogram() creates the histogram. You can customize the fill color, outline color, and the number of bins (intervals) by adjusting the fill, color, and bins arguments, respectively.

The labs() function allows you to specify the plot title, x-axis label, and y-
axis label.

After executing the code, you will see a histogram representing the
distribution of your financial data. The x-axis represents the values, and
the y-axis represents the frequency or count of occurrences within each
bin.

Histograms are particularly useful for analyzing the shape, center, and
spread of a distribution. They can help you identify patterns, outliers, and
potential issues with your financial data.
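To back the visual impression with numbers, you can compute the center and spread of the data directly. A small sketch, using the financial_data vector simulated above:

```r
financial_data <- rnorm(1000, mean = 1000, sd = 200)

summary(financial_data)  # min, quartiles, median, mean, max
sd(financial_data)       # standard deviation (spread)
```

Comparing these statistics with the histogram helps confirm whether the distribution is symmetric or skewed.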
2. Heat Map
# Load the required libraries
library(ggplot2)   # for enhanced plotting capabilities
library(reshape2)  # for data manipulation

# Generate some random financial data (replace with your actual data)
financial_data <- matrix(rnorm(100), ncol = 10)

# Compute the correlation matrix
cor_matrix <- cor(financial_data)

# Reshape the correlation matrix into a data frame
cor_df <- melt(cor_matrix)

# Create a heatmap
ggplot(data = cor_df, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(title = "Financial Data Heatmap", x = "Variable 1", y = "Variable 2")

install.packages("ggplot2")
install.packages("reshape2")
library(ggplot2)
library(reshape2)
financial_data <- matrix(rnorm(100), ncol = 10)
cor_matrix <- cor(financial_data)
cor_df <- melt(cor_matrix)
ggplot(data = cor_df, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(title = "Financial Data Heatmap", x = "Variable 1", y = "Variable 2")

In the code above, we start by loading the required libraries: ggplot2 for plotting and reshape2 for data manipulation.

Next, we generate some random financial data using the matrix() function. Replace financial_data with your actual financial data.

We then compute the correlation matrix using the cor() function, which measures the pairwise correlation between variables in the financial data.

To create the heatmap, we reshape the correlation matrix into a data frame using the melt() function from the reshape2 library. This transformation allows us to plot the correlation values as a heatmap.

Finally, we use ggplot() to initialize the plot, geom_tile() to create the heatmap tiles, and scale_fill_gradient() to define the color gradient for the correlation values. You can customize the color palette by adjusting the low and high arguments. The labs() function sets the title, x-axis label, and y-axis label.

After executing the code, you will see a heatmap representing the pairwise
correlations between variables in the financial data. The intensity of the colors
represents the strength and direction of the correlations. Positive correlations are
indicated by warmer colors (e.g., red), while negative correlations are indicated by
cooler colors (e.g., blue).
By analyzing the heatmap, you can identify patterns of correlation and dependencies
among variables in your financial data, which can help in understanding relationships
and making informed decisions.
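If you want the strongly correlated pairs as a table rather than reading them off the colors, you can filter the melted data frame directly. A small sketch; the 0.5 threshold is an arbitrary choice, and cor_df is the melted correlation matrix built above:

```r
# Keep off-diagonal pairs whose absolute correlation exceeds 0.5
strong_pairs <- subset(cor_df, abs(value) > 0.5 & Var1 != Var2)

# Show the strongest relationships first
print(strong_pairs[order(-abs(strong_pairs$value)), ])
```

This is handy when the matrix is large and individual tiles are hard to read.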

Output:

3. Geospatial Data Visualization using R

# Load the required libraries
library(ggplot2)  # for data visualization
library(maps)     # for the 'world' map data used by map_data()

# Create a sample geospatial dataset
data <- data.frame(
  city = c("New York", "Los Angeles", "Chicago"),
  latitude = c(40.7128, 34.0522, 41.8781),
  longitude = c(-74.0060, -118.2437, -87.6298),
  population = c(8622698, 3990456, 2705994)
)

# Create a base map using the 'world' dataset from the 'maps' package
world_map <- map_data("world")

# Plot the base map
ggplot() +
  geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
               fill = "lightgray", color = "gray") +
  coord_equal()  # equal aspect ratio

# Add geospatial points on top of the base map
ggplot() +
  geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
               fill = "lightgray", color = "gray") +
  geom_point(data = data, aes(x = longitude, y = latitude, color = population),
             size = 4) +
  scale_color_gradient(low = "green", high = "red", guide = "legend") +
  coord_equal()  # equal aspect ratio

# Add labels to the geospatial points
ggplot() +
  geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
               fill = "lightgray", color = "gray") +
  geom_point(data = data, aes(x = longitude, y = latitude, color = population),
             size = 4) +
  geom_text(data = data, aes(x = longitude, y = latitude, label = city),
            size = 3, nudge_x = 0.5, nudge_y = 0.5) +
  scale_color_gradient(low = "green", high = "red", guide = "legend") +
  coord_equal()  # equal aspect ratio

In the code above, we start by loading the required libraries: ggplot2 for data visualization and maps, which supplies the map data used by map_data().

We then create a sample geospatial dataset (data) that includes the city names, latitude, longitude, and population.

Next, we create a base map using the map_data() function from the maps package. In this example, we use the "world" dataset.

To plot the base map, we use the geom_polygon() function with the world_map dataset. This creates a gray map background.

To add geospatial points to the plot, we use geom_point() with the data dataset. The size argument adjusts the point size, and the color argument maps the point color to the population values. You can customize the color gradient using the scale_color_gradient() function.

If you want to add labels to the geospatial points, you can use geom_text() with the label argument set to the city names. The nudge_x and nudge_y arguments adjust the position of the labels.

Finally, the coord_equal() function ensures an equal aspect ratio in the plot, providing accurate geospatial representation.

By executing the code, you will create geospatial visualizations in R, including base maps and plots with geospatial points and labels. This allows you to effectively visualize and analyze geospatial data.

4. Weather Forecasting Visualization using R


# Load the required libraries
library(ggplot2)  # for data visualization

# Create a sample weather forecasting dataset
data <- data.frame(
  date = seq(as.Date("2023-05-01"), as.Date("2023-05-07"), by = "day"),
  temperature = c(20, 22, 19, 23, 21, 24, 20),
  precipitation = c(0.2, 0.0, 0.5, 0.1, 0.0, 0.3, 0.2),
  humidity = c(70, 65, 72, 68, 73, 66, 71)
)

# Plot temperature
ggplot(data = data, aes(x = date, y = temperature)) +
  geom_line(color = "blue") +
  geom_point(color = "blue") +
  labs(title = "Temperature Forecast", x = "Date", y = "Temperature (°C)")

# Plot precipitation and humidity
# (humidity is divided by 100 so both series fit on one scale;
#  the secondary axis rescales it back to percent)
ggplot(data = data, aes(x = date)) +
  geom_bar(aes(y = precipitation), stat = "identity", fill = "blue", alpha = 0.5) +
  geom_line(aes(y = humidity / 100), color = "green") +
  geom_point(aes(y = humidity / 100), color = "green") +
  labs(title = "Precipitation and Humidity Forecast", x = "Date", y = "Precipitation") +
  scale_y_continuous(sec.axis = sec_axis(~ . * 100, name = "Humidity (%)"))

# Plot an interactive weather map using leaflet
library(leaflet)  # for interactive maps

# Create a sample location dataset
locations <- data.frame(
  city = c("New York", "Los Angeles", "Chicago"),
  latitude = c(40.7128, 34.0522, 41.8781),
  longitude = c(-74.0060, -118.2437, -87.6298)
)

# Create a leaflet map with markers
leaflet(data = locations) %>%
  addTiles() %>%
  addMarkers(
    lng = ~longitude,
    lat = ~latitude,
    label = ~city,
    popup = ~paste("Latitude: ", latitude, "<br>Longitude: ", longitude)
  )

To visualize weather forecasting data in R, you can use various packages like ggplot2,
plotly, and leaflet. Here's an example of how to create a weather forecasting visualization
using the ggplot2 package:

In the code above, we start by loading the required library, ggplot2, for data visualization.

We then create a sample weather forecasting dataset (data) that includes the date, temperature, precipitation, and humidity.

To plot the temperature forecast, we use geom_line() and geom_point() with the temperature variable. The color argument adjusts the line and point color, and the labs() function sets the title and axis labels.

To plot the precipitation and humidity forecast, we use geom_bar() with stat = "identity" for the precipitation, and geom_line() and geom_point() for the humidity. Customize the fill color, line color, and point color accordingly. The labs() function sets the title and axis labels, and scale_y_continuous() adds a secondary y-axis for the humidity.

For interactive weather visualization on a map, we load the leaflet library. We create a sample location dataset (locations) with city names, latitude, and longitude.

Using leaflet(), we create a map, add tiles with addTiles(), and place markers at the city coordinates with addMarkers(). The label and popup arguments display each city's name and coordinates when a marker is hovered over or clicked.
Output:
5. Multivariate Analysis in R
Multivariate analysis in R involves analyzing data that consists of multiple
variables simultaneously. It allows for exploring relationships, patterns, and
dependencies among variables. R provides several packages for multivariate
analysis, such as stats, psych, and FactoMineR. Here's an example of how to
perform multivariate analysis in R:

# Load the required libraries
# install.packages(c("psych", "FactoMineR"))  # if not already installed
library(stats)       # for basic statistical analysis (loaded by default)
library(psych)       # for factor analysis
library(FactoMineR)  # for principal component analysis

# Create a sample multivariate dataset
# (the values are deliberately not perfectly correlated, so the
#  correlation matrix is non-singular and fa()/PCA() behave sensibly)
data <- data.frame(
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(6, 8, 7, 10, 9),
  var3 = c(15, 11, 14, 12, 13)
)

# Correlation matrix
cor_matrix <- cor(data)
print(cor_matrix)

# Covariance matrix
cov_matrix <- cov(data)
print(cov_matrix)

# Factor analysis
factor_analysis <- fa(data)
print(factor_analysis)

# Principal component analysis
pca <- PCA(data)
print(pca)

In the code above, we start by loading the required libraries: stats for basic statistical
analysis, psych for factor analysis, and FactoMineR for principal component analysis.

We then create a sample multivariate dataset (data) with multiple variables (var1, var2, var3).

To calculate the correlation matrix, we use the cor() function with the data dataset. The
resulting matrix (cor_matrix) shows the pairwise correlations between variables.

To calculate the covariance matrix, we use the cov() function with the data dataset. The
resulting matrix (cov_matrix) shows the pairwise covariances between variables.

For factor analysis, we use the fa() function from the psych package. It performs factor
analysis on the data dataset and returns the factor loadings, communalities, and other related
statistics. The factor_analysis object contains the results.
For principal component analysis (PCA), we use the PCA() function from the FactoMineR
package. It performs PCA on the data dataset and returns the eigenvalues, eigenvectors, and
other related statistics. The pca object contains the results.

By executing the code, you can perform basic multivariate analysis tasks in R, including
calculating correlation and covariance matrices, conducting factor analysis, and performing
principal component analysis. These techniques help uncover patterns, underlying
dimensions, and dependencies within multivariate datasets.
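To see how much variance each principal component explains, you can inspect the eigenvalue table that FactoMineR stores in the result. A sketch, using the pca object created above (the column name follows FactoMineR's convention for the eig matrix):

```r
# Eigenvalues and percentage of variance per component
print(pca$eig)

# Scree plot of the explained variance
barplot(pca$eig[, "percentage of variance"],
        names.arg = paste0("PC", seq_len(nrow(pca$eig))),
        ylab = "% of variance", main = "Scree Plot")
```

A sharp drop in the scree plot suggests how many components are worth retaining.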

Output:

6. Healthcare Visualization using R

Healthcare visualization in R can be done using various packages such as ggplot2, plotly, and ggiraph. Here's an example of how to visualize healthcare data in R:
# Load the required libraries
library(ggplot2) # for data visualization
library(plotly) # for interactive plots
library(ggiraph) # for interactive ggplot2 plots

# Create a sample healthcare dataset
healthcare_data <- data.frame(
  Year = c(2018, 2019, 2020, 2021),
  Hospital_A = c(100, 150, 130, 200),
  Hospital_B = c(120, 160, 180, 150),
  Hospital_C = c(80, 110, 90, 120)
)

# Visualize hospital performance over time using a line chart
ggplot(healthcare_data, aes(x = Year)) +
  geom_line(aes(y = Hospital_A, color = "Hospital A")) +
  geom_line(aes(y = Hospital_B, color = "Hospital B")) +
  geom_line(aes(y = Hospital_C, color = "Hospital C")) +
  labs(title = "Hospital Performance over Time", x = "Year", y = "Number of Patients") +
  scale_color_manual(values = c("Hospital A" = "blue", "Hospital B" = "red",
                                "Hospital C" = "green"))
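Listing one geom_line() per hospital works, but it does not scale to many hospitals. An alternative sketch reshapes healthcare_data to long format with melt() from reshape2 (the same function used in the heatmap section), so a single geom_line() draws every series:

```r
library(reshape2)
library(ggplot2)

# Wide -> long: one row per (Year, Hospital) pair
long_data <- melt(healthcare_data, id.vars = "Year",
                  variable.name = "Hospital", value.name = "Patients")

ggplot(long_data, aes(x = Year, y = Patients, color = Hospital)) +
  geom_line() +
  geom_point() +
  labs(title = "Hospital Performance over Time", x = "Year", y = "Number of Patients")
```

With the data in long format, ggplot2 builds the legend automatically from the Hospital column.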

# Create an interactive bar chart using plotly
plot_ly(healthcare_data, x = ~Year, y = ~Hospital_A, type = "bar", name = "Hospital A") %>%
  add_trace(y = ~Hospital_B, name = "Hospital B") %>%
  add_trace(y = ~Hospital_C, name = "Hospital C") %>%
  layout(title = "Hospital Performance",
         xaxis = list(title = "Year"),
         yaxis = list(title = "Number of Patients"))

# Create an interactive scatter plot using ggiraph
# (each layer maps its own y value and tooltip, so the three
#  hospitals are plotted at their own values, not all at Hospital_A)
gg <- ggplot(healthcare_data, aes(x = Year)) +
  geom_point_interactive(aes(y = Hospital_A, color = "Hospital A",
                             tooltip = paste("Year:", Year, "<br>Patients:", Hospital_A)),
                         size = 4) +
  geom_point_interactive(aes(y = Hospital_B, color = "Hospital B",
                             tooltip = paste("Year:", Year, "<br>Patients:", Hospital_B)),
                         size = 4) +
  geom_point_interactive(aes(y = Hospital_C, color = "Hospital C",
                             tooltip = paste("Year:", Year, "<br>Patients:", Hospital_C)),
                         size = 4) +
  labs(title = "Hospital Performance", x = "Year", y = "Number of Patients") +
  scale_color_manual(values = c("Hospital A" = "blue", "Hospital B" = "red",
                                "Hospital C" = "green"))

girafe(ggobj = gg)

In the code above, we start by loading the required libraries: ggplot2 for data visualization, plotly for interactive plots, and ggiraph for interactive ggplot2 plots.

We then create a sample healthcare dataset (healthcare_data) containing information about hospitals' performance over time.

To visualize hospital performance over time, we use a line chart with ggplot2. The code creates separate lines for each hospital using geom_line(). Customize the aesthetics (aes()), titles (labs()), and colors (scale_color_manual()) according to your dataset and visualization preferences.

For interactive visualization, we use plot_ly() from the plotly package to create an interactive bar chart. The code specifies the x-axis (Year), the y-axis series (Hospital_A, Hospital_B, Hospital_C), and the layout options (layout()).

We also create an interactive scatter plot with ggiraph: geom_point_interactive() attaches a tooltip to each point, and girafe() renders the plot so the tooltips appear on hover.
Output:
7. Acquiring the Data from Excel
Example: record.csv (a CSV file exported from Excel)

id,name,salary,start_date,dept
1,Shubham,613.3,2012-01-01,IT
2,Arpita,525.2,2013-09-23,Operations
3,Vaishali,63,2014-11-15,IT
4,Nishka,749,2014-05-11,HR
5,Gunjan,863.25,2015-03-27,Finance
6,Sumit,588,2013-05-21,IT
7,Anisha,932.8,2013-07-30,Operations
8,Akash,712.5,2014-06-17,Finance

# Reading and printing the data
data <- read.csv("record.csv")
print(data)

# Printing rows and columns
csv_data <- read.csv("record.csv")
print(is.data.frame(csv_data))
print(ncol(csv_data))
print(nrow(csv_data))

# Getting the maximum salary from the data frame
max_sal <- max(csv_data$salary)
print(max_sal)

# Getting the details of the person who has the maximum salary
details <- subset(csv_data, salary == max(salary))
print(details)

# Getting the details of all the persons who are working in the IT department
details <- subset(csv_data, dept == "IT")
print(details)

# Getting the details of all the persons in the IT department with salary above 600
details <- subset(csv_data, dept == "IT" & salary > 600)
print(details)

# Getting details of those people who joined on or after 2014-01-01
details <- subset(csv_data, as.Date(start_date) >= as.Date("2014-01-01"))
print(details)
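Beyond row-wise subsetting, the same data frame can be summarized by group with base R's aggregate(). A small sketch, using the csv_data frame loaded above, that computes the average salary per department:

```r
# Mean salary for each department
dept_salary <- aggregate(salary ~ dept, data = csv_data, FUN = mean)
print(dept_salary)
```

The formula interface (salary ~ dept) reads as "salary, split by dept", and FUN can be swapped for max, length, and so on.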
8. Cluster Analysis using Data Visualization
Clustering is a technique in machine learning that attempts to find clusters of observations
within a dataset.
The goal is to find clusters such that the observations within each cluster are quite similar to
each other, while observations in different clusters are quite different from each other.
Clustering is a form of unsupervised learning because we’re simply attempting to find
structure within a dataset rather than predicting the value of some response variable.

What is K-Means Clustering?

K-means clustering is a technique in which we place each observation in a dataset into one of K clusters.

kmeans(data, centers, nstart)

where:
data: the name of the dataset
centers: the number of clusters, denoted k
nstart: the number of initial configurations
Program
# Library required for the fviz_cluster function
install.packages("factoextra")
library(factoextra)
library(ggplot2)

# Loading the dataset
df <- mtcars

# Omitting any NA values
df <- na.omit(df)

# Scaling the dataset
df <- scale(df)

# k = 4
km <- kmeans(df, centers = 4, nstart = 25)

# Visualize the clusters
fviz_cluster(km, data = df)

# k = 5
km <- kmeans(df, centers = 5, nstart = 25)

# Visualize the clusters
fviz_cluster(km, data = df)

Output
When k = 4

When k = 5
