Big Data Analytics Lab Manual
Ex.No:1 INSTALL, CONFIGURE AND RUN HADOOP AND HDFS
Date:
AIM:
To install, configure and run Hadoop and HDFS.
PROCEDURE:
1) Installing Java
Hadoop is a framework written in Java for running applications on large clusters of
commodity hardware. Hadoop 2.7 needs Java 7 or above to work.
Step 1: Download tar and extract
Download the JDK tar.gz file for Linux 64-bit and extract it into “/opt”.
# cd /opt
# sudo tar xvpzf /home/itadmin/Downloads/jdk-8u5-linux-x64.tar.gz
# cd /opt/jdk1.8.0_05
Step 2: Set environment variables
• Open the “/etc/profile” file and add the following lines as per the installed version.
• Set the environment variables for Java.
• Use the root user to save /etc/profile, or use gedit instead of vi.
• The 'profile' file contains commands that ought to be run for login shells.
# sudo vi /etc/profile
#--insert JAVA_HOME
JAVA_HOME=/opt/jdk1.8.0_05
#--in the PATH variable, just append at the end of the line
PATH=$PATH:$JAVA_HOME/bin
#--append JAVA_HOME at the end of the export statement
export PATH JAVA_HOME
Save the file by pressing the “Esc” key followed by :wq!
Step 3: Source the /etc/profile
# source /etc/profile
Step 4: Update the java alternatives
1. By default the OS will have OpenJDK. Check with “java -version”; it will report “OpenJDK”.
2. If OpenJDK is installed, you will need to update the Java alternatives.
3. If your system has more than one version of Java, configure which one your
system uses by entering the following commands in a terminal window.
4. After updating the alternatives, “java -version” should report “Java HotSpot(TM) 64-Bit Server”.
# update-alternatives --install "/usr/bin/java" java "/opt/jdk1.8.0_05/bin/java" 1
# update-alternatives --config java
--type selection number:
# java -version
2) Configure SSH
• Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local
machine if you want to use Hadoop on it (which is what we want to do in this exercise). For
our single-node setup of Hadoop, we therefore need to configure SSH access to localhost.
Password-less, SSH-key-based authentication is needed so that the master node can log in to
the slave nodes (and the secondary node) to start/stop them easily without any delays for
authentication.
• If you skip this step, you will have to provide a password each time. Generate an SSH key
for the user, then enable password-less SSH access to localhost.
# sudo apt-get install openssh-server
--You will be asked to enter a password,
root@abc[]# ssh localhost
root@abc[]# ssh-keygen
root@abc[]# ssh-copy-id -i localhost
--After the above two steps, you will be connected without a password,
root@abc[]# ssh localhost
root@abc[]# exit
3) Hadoop installation
• Now download Hadoop from the official Apache site, preferably a stable release version of
Hadoop 2.7.x, and extract the contents of the Hadoop package to a location of your choice.
• For example, choose the location “/opt/”.
Step 1: Download the tar.gz file of the latest Hadoop version (hadoop-2.7.x) from the official site.
Step 2: Extract (untar) the downloaded file with these commands to /opt
root@abc[]# cd /opt
root@abc[/opt]# sudo tar xvpzf /home/itadmin/Downloads/hadoop-2.7.0.tar.gz
root@abc[/opt]# cd hadoop-2.7.0/
As with Java, update the Hadoop environment variables in /etc/profile
# sudo vi /etc/profile
#--insert HADOOP_PREFIX
HADOOP_PREFIX=/opt/hadoop-2.7.0
#--in the PATH variable, just append at the end of the line
PATH=$PATH:$HADOOP_PREFIX/bin
#--append HADOOP_PREFIX at the end of the export statement
export PATH JAVA_HOME HADOOP_PREFIX
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
• Or, view the output files directly on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
RESULT:
Thus Hadoop and HDFS were installed, configured and run successfully.
Ex.No:2 IMPLEMENT WORD COUNT / FREQUENCY PROGRAMS USING MAPREDUCE
Date:
AIM:
To write a word count program that demonstrates the use of Map and Reduce tasks.
PROCEDURE:
1. Analyze the input file content.
2. Develop the code.
a. Writing a map function.
b. Writing a reduce function.
c. Writing the Driver class.
3. Compiling the source.
4. Building the JAR file.
5. Starting the DFS.
6. Creating Input path in HDFS and moving the data into Input path.
7. Executing the program.
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount
{
//Step a
public static class TokenizerMapper extends Mapper < Object , Text, Text, IntWritable>
{
//hadoop supported data types
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
//map method that performs the tokenizer job and framing the initial key value pairs
public void map( Object key, Text value, Context context) throws IOException ,
InterruptedException
{
//taking one line at a time and tokenizing the same
StringTokenizer itr = new StringTokenizer(value.toString());
//iterating through all the words available in that line and forming the key value pair
while (itr.hasMoreTokens())
{
word.set(itr.nextToken());
//sending to the context which inturn passes the same to reducer
context.write(word, one);
}
}
}
//Step b
public static class IntSumReducer extends Reducer < Text, IntWritable, Text, IntWritable>
{
private IntWritable result = new IntWritable();
// Reduce method accepts the Key Value pairs from mappers, do the aggregation based on keys
// and produce the final output
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws
IOException, InterruptedException
{
int sum = 0;
/*iterates through all the values available with a key and
add them together and give the final result as the key and sum of its values*/
for (IntWritable val : values)
{
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
//Step c
public static void main( String [] args) throws Exception
{
//creating conf instance for Job Configuration
Configuration conf = new Configuration();
//Parsing the command line arguments
String [] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
if (otherArgs.length < 2)
{
System.err.println("Usage: wordcount <in> [<in>...] <out>");
System.exit(2);
}
//Create a new Job creating a job object and assigning a job name for identification
//purposes
Job job = new Job(conf, "word count" );
job.setJarByClass(WordCount.class);
// Specify various job specific parameters
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
//Setting job object with the Data Type of output Key
job.setOutputKeyClass(Text.class);
//Setting job object with the Data Type of output value
job.setOutputValueClass(IntWritable.class);
//the hdfs input and output directory to be fetched from the command line
for (int i = 0; i < otherArgs.length - 1; ++i)
{
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
RESULT:
Thus the word count program using Map and Reduce tasks was demonstrated successfully.
Ex.No:3 IMPLEMENT AN MR PROGRAM THAT PROCESSES A WEATHER DATASET
Date:
AIM:
To write a MapReduce program that processes a weather dataset.
PROCEDURE:
1. Analyze the input file content.
2. Develop the code.
a. Writing a map function.
b. Writing a reduce function.
c. Writing the Driver class.
3. Compiling the source.
4. Building the JAR file.
5. Starting the DFS.
6. Creating Input path in HDFS and moving the data into Input path.
7. Executing the program.
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
while (strTokens.hasMoreElements()) {
if (counter == 0) {
date = strTokens.nextToken();
} else {
if (counter % 2 == 1) {
currentTime = strTokens.nextToken();
} else {
currnetTemp = Float.parseFloat(strTokens.nextToken());
if (minTemp > currnetTemp) {
minTemp = currnetTemp;
minTempANDTime = minTemp + "AND" + currentTime;
}
if (maxTemp < currnetTemp) {
maxTemp = currnetTemp;
maxTempANDTime = maxTemp + "AND" + currentTime;
}
}
}
counter++;
}
// Write to context - MinTemp, MaxTemp and corresponding time
Text temp = new Text();
temp.set(maxTempANDTime);
Text dateText = new Text();
dateText.set(date);
try {
con.write(dateText, temp);
} catch (Exception e) {
e.printStackTrace();
}
temp.set(minTempANDTime);
dateText.set(date);
con.write(dateText, temp);
}
}
public static class WhetherForcastReducer extends
Reducer<Text, Text, Text, Text> {
MultipleOutputs<Text, Text> mos;
public void setup(Context context) {
mos = new MultipleOutputs<Text, Text>(context);
}
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
int counter = 0;
String reducerInputStr[] = null;
String f1Time = "";
String f2Time = "";
String f1 = "", f2 = "";
Text result = new Text();
for (Text value : values) {
if (counter == 0) {
reducerInputStr = value.toString().split("AND");
f1 = reducerInputStr[0];
f1Time = reducerInputStr[1];
}
else
{ reducerInputStr = value.toString().split("AND");
f2 = reducerInputStr[0];
f2Time = reducerInputStr[1];
}
counter = counter + 1;
}
if (Float.parseFloat(f1) > Float.parseFloat(f2)) {
result = new Text("Time: " + f2Time + " MinTemp: " + f2 + "\t"
+ "Time: " + f1Time + " MaxTemp: " + f1);
} else {
result = new Text("Time: " + f1Time + " MinTemp: " + f1 + "\t"
+ "Time: " + f2Time + " MaxTemp: " + f2);
}
Path pathInput = new Path(
"hdfs://192.168.213.133:54310/wheatherInputData/input_temp.txt");
Path pathOutputDir = new Path(
"hdfs://192.168.213.133:54310/user/hduser1/testfs/output_mapred5");
FileInputFormat.addInputPath(job, pathInput);
FileOutputFormat.setOutputPath(job, pathOutputDir);
try {
int exitCode = job.waitForCompletion(true) ? 0 : 1;
System.out.println("Job executed successfully!!");
System.exit(exitCode);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}}}
Input Dataset:
RESULT:
Thus the MapReduce program that processes a weather dataset was executed successfully.
OUTPUT:
Ex.No:4a) Date: IMPLEMENTATION OF LINEAR REGRESSION
AIM:
To implement linear regression using R.
PROCEDURE:
1. Linear regression is used to predict a quantitative outcome variable (y) on the basis of
one or multiple predictor variables (x)
2. The goal is to build a mathematical formula that defines y as a function of the x variable.
3. When you build a regression model, you need to assess the performance of the predictive
model.
4. Two important metrics are commonly used to assess the performance of the predictive
regression model:
5. Root Mean Squared Error (RMSE), which measures the model prediction error. It corresponds to
the average difference between the observed values of the outcome and the values predicted by
the model, and is computed as RMSE = mean((observeds - predicteds)^2) %>% sqrt(). The lower
the RMSE, the better the model.
6. R-squared (R2), representing the squared correlation between the observed outcome values
and the values predicted by the model. The higher the R2, the better the model. A short sketch
computing both metrics follows the program below.
PROGRAM:
X=c(151,174,138,186,128,136,179,163,152,131)
Y=c(63,81,56,91,47,57,76,72,62,48)
plot(X,Y)
relation=lm(Y~X)
print(relation)
print(summary(relation))
a=data.frame(X=170)
result=predict(relation,a)
print(result)
png(file="linearregression.png")
plot(Y,X,col="green",main="Height & Weight Regression",abline(lm(X~Y)),
cex=1.3,pch=16,xlab="Weight in kg",ylab="Height in cm")
dev.off()
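To assess the fitted model as described in the procedure, RMSE and R-squared could be computed along the following lines (a minimal sketch reusing the relation model and the X and Y vectors from the program above):
# Predicted values for the original X values
predicteds <- predict(relation, data.frame(X = X))
# Root Mean Squared Error: average prediction error (lower is better)
rmse <- sqrt(mean((Y - predicteds)^2))
# R-squared: squared correlation between observed and predicted values (higher is better)
r2 <- cor(Y, predicteds)^2
print(rmse)
print(r2)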
RESULT:
Thus the implementation of linear regression was executed and verified successfully.
OUTPUT:
> a=data.frame(X=170)
>result=predict(relation,a)
>print(result)
1
76.22869
>png(file="linearregression.png")
>plot(Y,X,col="green",main="Height & Weight Regression",abline(lm(X~Y)),
cex=1.3,pch=16,xlab="Weight in kg",ylab="Height in cm")
>dev.off()
RStudioGD
2
Ex.No:4b) Date: IMPLEMENTATION OF LOGISTIC REGRESSION
AIM:
To implement logistic regression using R.
PROCEDURE:
1. Logistic regression is used to predict the class of individuals based on one or multiple
predictor variables (x).
2. It is used to model a binary outcome, that is a variable, which can have only two
possible values: 0 or 1, yes or no, diseased or non-diseased.
3. Logistic regression belongs to a family of models named Generalized Linear Models (GLM),
developed for extending the linear regression model to other situations.
4. Other synonyms are binary logistic regression, binomial logistic regression and logit
model.
5. Logistic regression does not directly return the class of observations. It allows us to
estimate the probability (p) of class membership; the probability ranges between 0 and 1.
A short sketch of computing these probabilities follows the program below.
PROGRAM:
input=mtcars[,c("am","cyl","hp","wt")]
am.data=glm(formula=am~cyl+hp+wt,data=input,family = binomial)
print(summary(am.data))
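To obtain the class-membership probabilities mentioned in step 5 of the procedure, the fitted model could be used for prediction along these lines (a minimal sketch reusing the am.data model and the input data frame from the program above):
# Predicted probability of am = 1 (manual transmission) for each car
prob <- predict(am.data, type = "response")
head(prob)
# Convert the probabilities into class labels using a 0.5 cut-off
pred.class <- ifelse(prob > 0.5, 1, 0)
table(pred.class, input$am)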
RESULT:
Thus the implementation of logistic regression was executed and verified successfully.
OUTPUT:
Call:
glm(formula = am ~ cyl + hp + wt, family = binomial, data = input)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.17272 -0.14907 -0.01464 0.14116 1.27641
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.70288 8.11637 2.428 0.0152 *
cyl 0.48760 1.07162 0.455 0.6491
hp 0.03259 0.01886 1.728 0.0840 .
wt -9.14947 4.15332 -2.203 0.0276 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AIM:
To implement support vector machine (SVM) regression using R and compare its prediction error with linear regression.
PROCEDURE:
x=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
y=c(3,4,5,4,8,10,10,11,14,20,23,24,32,34,35,37,42,48,53,60)
# Combine the two vectors into a training data frame and plot the points
train <- data.frame(x, y)
plot(train, pch=16)
#Linear regression
model<- lm(y ~ x, train)
abline(model)
#SVM
library(e1071)
# Fit a support vector regression model on the same data and predict on the training set
model_svm <- svm(y ~ x, train)
pred <- predict(model_svm, train)
#Plot the predictions and the plot to see our model fit
points(train$x, pred, col = "blue", pch=4)
#Linear model has a residuals part which we can extract and directly calculate rmse
error<- model$residuals
lm_error<- sqrt(mean(error^2)) # 3.832974
#For svm, we have to manually calculate the difference between actual values (train$y) with our predictions (pred)
error_2 <- train$y - pred
svm_error<- sqrt(mean(error_2^2)) # 2.696281
# Tune the SVM over a grid of epsilon and cost values (example grid)
svm_tune <- tune(svm, y ~ x, data = train,
ranges = list(epsilon = seq(0, 1, 0.01), cost = 2^(2:9)))
print(svm_tune)
#- best parameters:
# epsilon cost
#0 8
plot(svm_tune)
# Refit with the best parameters found by tuning and plot the tuned predictions
best_mod <- svm_tune$best.model
best_mod_pred <- predict(best_mod, train)
plot(train,pch=16)
points(train$x, best_mod_pred, col = "blue", pch=4)
RESULT:
Thus the implementation of SVM regression was executed and verified successfully.
AIM:
To implement decision tree classification using R.
PROCEDURE:
PROGRAM:
library(party)
input.dat <- readingSkills[c(1:105),]
png(file = "decision_tree.png")
output.tree <- ctree( nativeSpeaker ~ age + shoeSize + score, data = input.dat)
plot(output.tree)
dev.off()
RESULT:
Thus the implementation of decision tree classification was executed and verified
successfully.
OUTPUT:
null device
AIM:
PROCEDURE:
install.packages("factoextra")
install.packages("cluster")
install.packages("magrittr")
library("factoextra")
library("cluster")
library("magrittr")
# Standardize the data and compute hierarchical clustering (assuming the USArrests data set)
res.hc <- USArrests %>%
scale() %>% # Scale the data
dist(method = "euclidean") %>% # Compute the dissimilarity matrix
hclust(method = "ward.D2") # Compute hierarchical clustering
# Visualize using factoextra
fviz_dend(res.hc, k = 4, cex = 0.5, rect = TRUE)
RESULT:
AIM:
To implement clustering techniques using partitioning (k-means) clustering in R.
PROCEDURE:
1. Partitioning algorithms are clustering techniques that subdivide the data sets into a set of
k groups, where k is the number of groups pre-specified by the analyst.
2. There are different types of partitioning clustering methods. The most popular is the K-
means clustering (MacQueen 1967), in which, each cluster is represented by the center or
means of the data points belonging to the cluster. The K-means method is sensitive to
outliers.
3. An alternative to k-means clustering is the K-medoids clustering or PAM (Partitioning
Around Medoids, Kaufman & Rousseeuw, 1990), which is less sensitive to outliers
compared to k-means.
4. Determining the optimal number of clusters: use factoextra::fviz_nbclust() (a short sketch follows the program below).
5. Compute and visualize k-means clustering.
PROGRAM:
install.packages("factoextra")
install.packages("magrittr")
install.packages("cluster")
library("factoextra")
library("magrittr")
library("cluster")
# Prepare the data (assuming the standardized USArrests data set, as in the other clustering exercises)
my_data <- scale(USArrests)
set.seed(123)
km.res<-kmeans(my_data, 3, nstart=25)
# Visualize
library("factoextra")
fviz_cluster(km.res, data=my_data,
ellipse.type="convex",
palette="jco",
ggtheme=theme_minimal())
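As noted in step 4 of the procedure, the optimal number of clusters could be estimated before calling kmeans(), for example as follows (a minimal sketch reusing my_data from the program above; the gap statistic is one of several methods supported by fviz_nbclust()):
# Suggest an optimal number of clusters using the gap statistic
fviz_nbclust(my_data, kmeans, method = "gap_stat")
# Alternative criteria: method = "wss" (elbow) or method = "silhouette"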
RESULT:
Thus the implementation of clustering techniques using partitioning (k-means) clustering was executed and verified successfully.
AIM:
To implement clustering techniques using fuzzy clustering in R.
PROCEDURE:
PROGRAM:
install.packages("factoextra")
install.packages("magrittr")
install.packages("cluster")
library("factoextra")
library("magrittr")
library("cluster")
library(cluster)
df <- scale(USArrests) # Standardize the data
res.fanny <- fanny(df, 2) # Compute fuzzy clustering with k = 2
head(res.fanny$membership, 3) # Membership coefficients
res.fanny$coeff # Dunn's partition coefficient
head(res.fanny$clustering) # Observation groups
library(factoextra)
fviz_cluster(res.fanny, ellipse.type="norm", repel=TRUE,
palette="jco", ggtheme=theme_minimal(),
legend="right")
RESULT:
Thus the implementation of clustering techniques using fuzzy clustering was executed
and verified successfully.
OUTPUT:
Ex.No:6d) Date: IMPLEMENTATION OF DENSITY BASED CLUSTERING
AIM:
To implement clustering techniques using density-based clustering (DBSCAN) in R.
PROCEDURE:
1. Density-based clustering (DBSCAN) can be used to identify clusters of any shape in a data set containing noise and outliers.
2. Clusters are dense regions in the data space, separated by regions of lower density
of points.
3. The simulated data set multishapes is used.
4. The function fviz_cluster() is used to visualize the clusters.
5. First, install factoextra: install.packages(“factoextra”); then compute and visualize k-
means clustering using the data set multishapes.
6. The goal is to identify dense regions, which can be measured by the number of objects
close to a given point. The cluster assignments can be inspected as shown in the sketch after the program below.
PROGRAM:
install.packages("factoextra")
install.packages("magrittr")
install.packages("cluster")
library("factoextra")
library("magrittr")
library("cluster")
install.packages("fpc")
install.packages("dbscan")
install.packages("factoextra")
data("multishapes", package="factoextra")
df<-multishapes[, 1:2]
library("fpc")
set.seed(123)
library("factoextra")
# Compute DBSCAN (eps and MinPts are example values for the multishapes data)
db <- fpc::dbscan(df, eps = 0.15, MinPts = 5)
# Plot the DBSCAN results
fviz_cluster(db, data = df, stand = FALSE,
ellipse=FALSE,
show.clust.cent=FALSE,
geom="point",palette="jco", ggtheme=theme_classic())
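To relate the plot back to the dense regions described in the procedure, the cluster assignments found by dbscan() could be inspected as follows (a minimal sketch; cluster 0 denotes noise points):
# Summary of cluster sizes (cluster 0 contains the noise/outlier points)
print(db)
# Cluster membership of the first observations
head(db$cluster, 20)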
RESULT:
Thus the implementation of clustering techniques using density based clustering was
executed and verified successfully.
OUTPUT:
Ex.No:6e) Date: IMPLEMENTATION OF MODEL BASED CLUSTERING
AIM:
To implement clustering techniques using model-based clustering in R.
PROCEDURE:
1. In model-based clustering, the data are viewed as coming from a distribution that is a
mixture of two or more clusters.
2. It finds best fit of models to data and estimates the number of clusters.
3. Install the mclust package as follow: install.packages(“mclust”).
4. Model-based clustering results can be drawn using the base function plot.Mclust().
5. fviz_mclust() uses a principal component analysis to reduce the dimensionality of the data.
PROGRAM:
install.packages("factoextra")
install.packages("cluster")
install.packages("magrittr")
library("cluster")
library("factoextra")
library("magrittr")
library("mclust")
data("diabetes")
head(diabetes, 3)
df <- scale(diabetes[, -1]) # Standardize the data (assuming the class column is dropped before clustering)
mc <- Mclust(df) # Model-based clustering; the best model is selected by BIC
summary(mc) # Print a summary of the selected model
library(factoextra)
# BIC values used for choosing the number of clusters
fviz_mclust(mc, "BIC", palette = "jco")
# Classification plot: clusters in the space of the first two principal components
fviz_mclust(mc, "classification", geom = "point",
pointsize=1.5, palette="jco")
# Classification uncertainty
fviz_mclust(mc, "uncertainty", palette = "jco")
RESULT:
Thus the implementation of clustering techniques using model based clustering was
executed and verified successfully.
OUTPUT:
##
## Gaussian finite mixture model fitted by EM algorithm
##
##
## Mclust VVV (ellipsoidal, varying volume, shape, and
orientation) model with 3 components:
##
## log.likelihood n df BIC ICL
## -169 145 29 -483 -501
##
## Clustering table:
## 1 2 3
## 81 36 28
[Plot] Model selection: BIC versus number of components. Best model: VVV | Optimal clusters: n = 3.
[Plot] Cluster plot (classification), Dim1 (73.2%) vs Dim2.
[Plot] Cluster plot (uncertainty), Dim1 (73.2%) vs Dim2.
Ex.No:7a) Date: DATA VISUALIZATION USING PIE CHART
AIM:
To visualize data using a pie chart in R.
PROCEDURE:
1. In R the pie chart is created using the pie() function which takes positive
numbers as a vector input.
2. The additional parameters are used to control labels, color, title etc.
3. The basic syntax for creating a pie chart using R is −
i. pie(x, labels, radius, main, col, clockwise)
4. Following is the description of the parameters used −
a. x is a vector containing the numeric values used in the pie chart.
b. labels is used to give a description to the slices.
c. radius indicates the radius of the circle of the pie chart (value between −1 and +1).
d. main indicates the title of the chart.
e. col indicates the color palette.
f. clockwise is a logical value indicating whether the slices are drawn clockwise or anti-clockwise.
5. We will use the parameter main to add a title to the chart and the parameter col to apply
the rainbow color palette while drawing the chart. The length of the palette should be the same
as the number of values we have for the chart, hence we use length(x).
PROGRAM:
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels<- c("London", "New York", "Singapore", "Mumbai")
# Plot the chart with a title and the rainbow color palette, as described in the procedure.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
RESULT:
Thus the data was visualized using a pie chart in R.
OUTPUT:
Ex.No:7b) Date: DATA VISUALIZATION USING BAR PLOT
AIM:
To visualize data using a bar plot in R.
PROCEDURE:
1. R uses the function barplot() to create bar charts. R can draw both vertical and
horizontal bars in the bar chart. In bar chart each of the bars can be given different
colors.
2. The basic syntax to create a bar chart in R is −
i. barplot(H, xlab, ylab, main, names.arg, col)
3. Following is the description of the parameters used −
a. H is a vector or matrix containing numeric values used in bar chart.
b. xlab is the label for x axis.
c. ylab is the label for y axis.
d. main is the title of the bar chart.
e. names.arg is a vector of names appearing under each bar.
f. col is used to give colors to the bars in the graph.
4. The main parameter is used to add a title. The col parameter is used to add colors to the
bars. The names.arg parameter is a vector having the same number of values as the input
vector; it labels each bar.
PROGRAM:
# Create the data for the chart.
H <- c(7,12,28,3,41)
M <- c("Mar","Apr","May","Jun","Jul")
# Plot the bar chart with axis labels, bar names, a title and a color, as described in the procedure.
barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue", col = "blue", main = "Revenue chart")
RESULT:
Thus the data was visualized using a bar plot in R.
OUTPUT:
Ex.No:7c) Date: DATA VISUALIZATION USING BOX PLOT
AIM:
To visualize data using a box plot in R.
PROCEDURE:
1. Boxplots are created in R by using the boxplot() function.
2. The basic syntax to create a boxplot in R is −
boxplot(x, data, notch, varwidth, names, main)
3. Following is the description of the parameters used −
a. x is a vector or a formula.
b. data is the data frame.
c. notch is a logical value. Set as TRUE to draw a notch.
d. varwidth is a logical value. Set as true to draw width of the box proportionate to
the sample size.
e. names are the group labels which will be printed under each boxplot.
f. main is used to give a title to the graph.
PROGRAM:
# Give the chart file a name.
png(file = "boxplot.png")
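The listing above stops after opening the PNG device; a minimal completion, assuming the built-in mtcars data set with mileage grouped by cylinder count, might look like the following.
# Plot mileage (mpg) grouped by number of cylinders, then close the graphics device.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders", ylab = "Miles Per Gallon", main = "Mileage Data")
dev.off()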
RESULT:
Thus the data was visualized using a box plot in R.
OUTPUT:
Ex.No:7d) Date: DATA VISUALIZATION USING HISTOGRAM
AIM:
To visualize data using a histogram in R.
PROCEDURE:
1. R creates histogram using hist() function. This function takes a vector as an input and
uses some more parameters to plot histograms.
2. The basic syntax for creating a histogram using R is −
i. hist(v, main, xlab, xlim, ylim, breaks, col, border)
3. Following is the description of the parameters used −
a. v is a vector containing numeric values used in histogram.
b. main indicates title of the chart.
c. col is used to set color of the bars.
d. border is used to set border color of each bar.
e. xlab is used to give description of x-axis.
f. xlim is used to specify the range of values on the x-axis.
g. ylim is used to specify the range of values on the y-axis.
h. breaks is used to control the number (or width) of the bins.
PROGRAM:
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Plot the histogram with a color, border and x-axis label, as described in the procedure.
hist(v, xlab = "Weight", col = "yellow", border = "blue")
RESULT:
Thus the data was visualized using a histogram in R.
OUTPUT:
Ex.No:7e) Date: DATA VISUALIZATION USING LINE GRAPH
AIM:
To visualize data using a line graph in R.
PROCEDURE:
1. The plot() function in R is used to create the line graph.
2. The basic syntax to create a line chart in R is −
i. plot(v, type, col, xlab, ylab)
3. Following is the description of the parameters used −
a. v is a vector containing the numeric values.
b. type takes the value "p" to draw only the points, "l" to draw only the lines
and "o" to draw both points and lines.
c. xlab is the label for x axis.
d. ylab is the label for y axis.
e. main is the Title of the chart.
f. col is used to give colors to both the points and lines.
4. We add color to the points and lines, give a title to the chart and add labels to the
axes.
PROGRAM:
# Create the data for the chart.
v <- c(7,12,28,3,41)
# Plot the line chart with both points and lines, a color, a title and axis labels, as described in the procedure.
plot(v, type = "o", col = "red", xlab = "Month", ylab = "Rain fall", main = "Rain fall chart")
RESULT:
Thus the data was visualized using a line graph in R.
OUTPUT:
Ex.No:7f) Date: DATA VISUALIZATION USING SCATTER PLOT
AIM:
To visualize data using a scatter plot in R.
PROCEDURE:
1. The simple scatterplot is created using the plot() function.
2. The basic syntax for creating a scatterplot in R is −
i. plot(x, y, main, xlab, ylab, xlim, ylim, axes)
3. Following is the description of the parameters used −
a. x is the data set whose values are the horizontal coordinates.
b. y is the data set whose values are the vertical coordinates.
c. main is the title of the graph.
d. xlab is the label in the horizontal axis.
e. ylab is the label in the vertical axis.
f. xlim is the limits of the values of x used for plotting.
g. ylim is the limits of the values of y used for plotting.
h. axes indicates whether both axes should be drawn on the plot.
PROGRAM:
# Get the input values.
input<- mtcars[,c('wt','mpg')]
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt,y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage"
)
RESULT:
Thus the data was visualized using a scatter plot in R.
OUTPUT:
Ex.No:8a) Date: APPLICATION TO ADJUST THE NUMBER OF BINS IN THE HISTOGRAM USING R
AIM:
To implement an application to adjust the number of bins in a histogram using the R
language.
PROCEDURE:
Any Shiny app is built using two components:
1. ui.R: This file creates the user interface in a Shiny application. It provides interactivity to
the Shiny app by taking input from the user and dynamically displaying the generated
output on the screen.
2. server.R: This file contains the series of steps to convert the input given by the user into
the desired output to be displayed.
a. Before we proceed further, you need to set up Shiny on your system. Follow
these steps to get started.
1. Create a new project in R Studio
2. Select type as Shiny web application.
3. This creates two scripts in RStudio named ui.R and server.R.
4. Each file is coded separately, and input and output flow between the two.
PROGRAM:
#
# This is a Shiny web application. You can run the application by clicking
# the 'Run App' button above.
#
# Find out more about building applications with Shiny here:
#
# http://shiny.rstudio.com/
#
library(shiny)
ui <- fluidPage(
# Application title
titlePanel("Old Faithful Geyser Data"),
# Sidebar with a slider input for the number of bins
sidebarLayout(
sidebarPanel(
sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30)
),
mainPanel(plotOutput("distPlot"))
)
)
# Server logic required to draw the histogram
server <- function(input, output) {
output$distPlot <- renderPlot({
# generate bins based on input$bins from the UI
x <- faithful[, 2]
bins <- seq(min(x), max(x), length.out = input$bins + 1)
# draw the histogram with the specified number of bins
hist(x, breaks = bins, col = 'darkgray', border = 'white')
})
}
# Run the application
shinyApp(ui = ui, server = server)
AIM:
To analyze and visualize stock market data in R using the quantmod package.
PROCEDURE:
1. Stock data can be obtained from Yahoo! Finance (http://finance.yahoo.com); the quantmod
package provides easy access to it.
2. AAPL is of the xts class (which is also a zoo-class object). xts objects (provided in the
xts package) are seen as improved versions of the ts object for storing time series data.
3. Stock data are stored with time-based indexing and can carry custom attributes, and multiple
(presumably related) time series with the same time index can be stored in the same object.
4. Yahoo! Finance provides six series for each security. Open is the price of the stock at the
beginning of the trading day, High is the highest price of the stock on that trading day, Low is
the lowest price on that trading day, and Close is the price of the stock at closing time.
Volume indicates how many shares were traded. Adjusted is the closing price of the stock
adjusted for corporate actions.
5. Financial data is often plotted with the candleChart() function from quantmod to create a
candlestick chart; a short sketch of these steps follows below.
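A minimal sketch of these steps, assuming the quantmod package is installed and that the AAPL series is downloaded from Yahoo! Finance as described above:
library(quantmod)
# Download Apple (AAPL) price data from Yahoo! Finance into an xts object named AAPL
getSymbols("AAPL", src = "yahoo")
# Inspect the six series: Open, High, Low, Close, Volume, Adjusted
head(AAPL)
# Visualize the prices as a candlestick chart
candleChart(AAPL, up.col = "black", dn.col = "red", theme = "white")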