Big Data Analytics Lab Manual
Ex.No:1 INSTALL, CONFIGURE AND RUN HADOOP AND HDFS
Date:
AIM:
To install, configure and run Hadoop and HDFS.
PROCEDURE:
1) Installing Java
Hadoop is a framework written in Java for running applications on large clusters of
commodity hardware. Hadoop 2.7 needs Java 7 or above to work.
Step 1: Download tar and extract
Download the JDK tar.gz file for Linux 64-bit and extract it into “/opt”.
# cd /opt
# sudo tar xvpzf /home/itadmin/Downloads/jdk-8u5-linux-x64.tar.gz
# cd /opt/jdk1.8.0_05
Step 2: Set environment variables
• Open the “/etc/profile” file and add the following lines as per the installed version.
• Set the environment variables for Java.
• Use the root user to save /etc/profile, or use gedit instead of vi.
• The 'profile' file contains commands that ought to be run for login shells.
# sudo vi /etc/profile
#--insert JAVA_HOME
JAVA_HOME=/opt/jdk1.8.0_05
#--in the PATH variable, just append at the end of the line
PATH=$PATH:$JAVA_HOME/bin
#--append JAVA_HOME at the end of the export statement
export PATH JAVA_HOME
Save the file by pressing the “Esc” key followed by :wq!
Step 3: Source the /etc/profile
# source /etc/profile
Step 4: Update the java alternatives
1. By default the OS will have OpenJDK. Check with “java -version”; it will report “OpenJDK”.
2. If OpenJDK is installed, you will need to update the Java alternatives.
3. If your system has more than one version of Java, configure which one your
system uses by entering the following commands in a terminal window.
4. After updating the alternatives, “java -version” should report “Java HotSpot(TM) 64-Bit Server”.
# update-alternatives --install "/usr/bin/java" java "/opt/jdk1.8.0_05/bin/java" 1
# update-alternatives --config java
--type selection number:
# java -version
2) Configure SSH
• Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local
machine if you want to use Hadoop on it (which is what we want to do in this exercise). For
our single-node setup of Hadoop, we therefore need to configure SSH access to localhost.
Password-less, SSH-key-based authentication is needed so that the master node can log in to
the slave nodes (and the secondary node) to start/stop them easily without any delays for
authentication.
• If you skip this step, you will have to provide a password each time. Generate an SSH key
for the user, then enable password-less SSH access to localhost.
# sudo apt-get install openssh-server
--You will be asked to enter a password,
root@abc[]# ssh localhost
root@abc[]# ssh-keygen
root@abc[]# ssh-copy-id -i localhost
--After the above two steps, you will be connected without a password,
root@abc[]# ssh localhost
root@abc[]# exit
3) Hadoop installation
• Now download Hadoop from the official Apache site, preferably a stable release version of
Hadoop 2.7.x, and extract the contents of the Hadoop package to a location of your choice.
• For example, choose the location “/opt/”.
Step 1: Download the tar.gz file of the latest Hadoop version (hadoop-2.7.x) from the official site.
Step 2: Extract (untar) the downloaded file with these commands to /opt
root@abc[]# cd /opt
root@abc[/opt]# sudo tar xvpzf /home/itadmin/Downloads/hadoop-2.7.0.tar.gz
root@abc[/opt]# cd hadoop-2.7.0/
As with Java, update the Hadoop environment variables in /etc/profile
# sudo vi /etc/profile
#--insert HADOOP_PREFIX
HADOOP_PREFIX=/opt/hadoop-2.7.0
#--in the PATH variable, just append at the end of the line
PATH=$PATH:$HADOOP_PREFIX/bin
#--append HADOOP_PREFIX at the end of the export statement
export PATH JAVA_HOME HADOOP_PREFIX
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
• Or, view the output files directly on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
RESULT:
Thus Hadoop and HDFS were installed, configured and run successfully.
Ex.No:2 IMPLEMENT WORD COUNT / FREQUENCY PROGRAMS USING MAPREDUCE
Date:
AIM:
To write a word count program that demonstrates the use of Map and Reduce tasks.
PROCEDURE:
1. Analyze the input file content.
2. Develop the code.
a. Writing a map function.
b. Writing a reduce function.
c. Writing the Driver class.
3. Compiling the source.
4. Building the JAR file.
5. Starting the DFS.
6. Creating Input path in HDFS and moving the data into Input path.
7. Executing the program.
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount
{
//Step a
public static class TokenizerMapper extends Mapper < Object , Text, Text, IntWritable>
{
//hadoop supported data types
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
//map method that performs the tokenizer job and framing the initial key value pairs
public void map( Object key, Text value, Context context) throws IOException ,
InterruptedException
{
//taking one line at a time and tokenizing the same
StringTokenizer itr = new StringTokenizer(value.toString());
//iterating through all the words available in that line and forming the key value pair
while (itr.hasMoreTokens())
{
word.set(itr.nextToken());
//sending to the context which inturn passes the same to reducer
context.write(word, one);
}
}
}
//Step b
public static class IntSumReducer extends Reducer < Text, IntWritable, Text, IntWritable>
{
private IntWritable result = new IntWritable();
// Reduce method accepts the Key Value pairs from mappers, do the aggregation based on keys
// and produce the final output
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws
IOException, InterruptedException
{
int sum = 0;
/*iterates through all the values available with a key and
add them together and give the final result as the key and sum of its values*/
for (IntWritable val : values)
{
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
//Step c
public static void main( String [] args) throws Exception
{
//creating conf instance for Job Configuration
Configuration conf = new Configuration();
//Parsing the command line arguments
String [] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
if (otherArgs.length < 2)
{
System.err.println("Usage: wordcount <in> [<in>...] <out>");
System.exit(2);
}
//Create a new Job creating a job object and assigning a job name for identification
//purposes
Job job = new Job(conf, "word count" );
job.setJarByClass(WordCount.class);
// Specify various job specific parameters
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
//Setting job object with the Data Type of output Key
job.setOutputKeyClass(Text.class);
//Setting job object with the Data Type of output value
job.setOutputValueClass(IntWritable.class);
//the hdfs input and output directory to be fetched from the command line
for (int i = 0; i < otherArgs.length - 1; ++i)
{
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
RESULT:
Thus the word count program using Map and Reduce tasks was demonstrated successfully.
Ex.No:3 IMPLEMENT AN MR PROGRAM THAT PROCESSES A WEATHER DATASET
Date:
AIM:
To write a MapReduce program that processes a weather dataset.
PROCEDURE:
1. Analyze the input file content.
2. Develop the code.
a. Writing a map function.
b. Writing a reduce function.
c. Writing the Driver class.
3. Compiling the source.
4. Building the JAR file.
5. Starting the DFS.
6. Creating Input path in HDFS and moving the data into Input path.
7. Executing the program.
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
while (strTokens.hasMoreElements()) {
if (counter == 0) {
date = strTokens.nextToken();
} else {
if (counter % 2 == 1) {
currentTime = strTokens.nextToken();
} else {
currnetTemp = Float.parseFloat(strTokens.nextToken());
if (minTemp > currnetTemp) {
minTemp = currnetTemp;
minTempANDTime = minTemp + "AND" + currentTime;
}
if (maxTemp < currnetTemp) {
maxTemp = currnetTemp;
maxTempANDTime = maxTemp + "AND" + currentTime;
}
}
}
counter++;
}
// Write to context - MinTemp, MaxTemp and corresponding time
Text temp = new Text();
temp.set(maxTempANDTime);
Text dateText = new Text();
dateText.set(date);
try {
con.write(dateText, temp);
} catch (Exception e) {
e.printStackTrace();
}
temp.set(minTempANDTime);
dateText.set(date);
con.write(dateText, temp);
}
}
public static class WhetherForcastReducer extends
Reducer<Text, Text, Text, Text> {
MultipleOutputs<Text, Text> mos;
public void setup(Context context) {
mos = new MultipleOutputs<Text, Text>(context);
}
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
int counter = 0;
String reducerInputStr[] = null;
String f1Time = "";
String f2Time = "";
String f1 = "", f2 = "";
Text result = new Text();
for (Text value : values) {
if (counter == 0) {
reducerInputStr = value.toString().split("AND");
f1 = reducerInputStr[0];
f1Time = reducerInputStr[1];
}
else
{ reducerInputStr = value.toString().split("AND");
f2 = reducerInputStr[0];
f2Time = reducerInputStr[1];
}
counter = counter + 1;
}
if (Float.parseFloat(f1) > Float.parseFloat(f2)) {
result = new Text("Time: " + f2Time + " MinTemp: " + f2 + "\t"
+ "Time: " + f1Time + " MaxTemp: " + f1);
} else {
result = new Text("Time: " + f1Time + " MinTemp: " + f1 + "\t"
+ "Time: " + f2Time + " MaxTemp: " + f2);
}
Path pathInput = new Path(
"hdfs://192.168.213.133:54310/wheatherInputData/input_temp.txt");
Path pathOutputDir = new Path(
"hdfs://192.168.213.133:54310/user/hduser1/testfs/output_mapred5");
FileInputFormat.addInputPath(job, pathInput);
FileOutputFormat.setOutputPath(job, pathOutputDir);
try {
int exitCode = job.waitForCompletion(true) ? 0 : 1;
System.out.println("Job executed successfully!!");
System.exit(exitCode);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}}}
Input Dataset:
RESULT:
Thus the MapReduce program that processes a weather dataset was executed successfully.
OUTPUT:
Ex.No:4a) Date: IMPLEMENTATION OF LINEAR REGRESSION
AIM:
To implement linear regression using R.
PROCEDURE:
1. Linear regression is used to predict a quantitative outcome variable (y) on the basis of
one or multiple predictor variables (x)
2. The goal is to build a mathematical formula that defines y as a function of the x variable.
3. When you build a regression model, you need to assess the performance of the predictive
model.
4. Two important metrics are commonly used to assess the performance of the predictive
regression model:
5. Root Mean Squared Error (RMSE), which measures the model prediction error. It corresponds to
the average difference between the observed values of the outcome and the values predicted by
the model, and is computed as RMSE = mean((observeds - predicteds)^2) %>% sqrt(). The lower
the RMSE, the better the model.
6. R-squared (R2), representing the squared correlation between the observed outcome values
and the values predicted by the model. The higher the R2, the better the model. A short sketch
computing both metrics follows the program below.
PROGRAM:
X=c(151,174,138,186,128,136,179,163,152,131)
Y=c(63,81,56,91,47,57,76,72,62,48)
plot(X,Y)
relation=lm(Y~X)
print(relation)
print(summary(relation))
a=data.frame(X=170)
result=predict(relation,a)
print(result)
png(file="linearregression.png")
plot(Y,X,col="green",main="Height & Weight Regression",abline(lm(X~Y)),
cex=1.3,pch=16,xlab="Weight in kg",ylab="Height in cm")
dev.off()
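To assess the fitted model as described in the procedure, RMSE and R-squared could be computed along the following lines (a minimal sketch reusing the relation model and the X and Y vectors from the program above):
# Predicted values for the original X values
predicteds <- predict(relation, data.frame(X = X))
# Root Mean Squared Error: average prediction error (lower is better)
rmse <- sqrt(mean((Y - predicteds)^2))
# R-squared: squared correlation between observed and predicted values (higher is better)
r2 <- cor(Y, predicteds)^2
print(rmse)
print(r2)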
RESULT:
Thus the implementation of linear regression was executed and verified successfully.
OUTPUT:
> a=data.frame(X=170)
>result=predict(relation,a)
>print(result)
1
76.22869
>png(file="linearregression.png")
>plot(Y,X,col="green",main="Height & Weight Regression",abline(lm(X~Y)),
cex=1.3,pch=16,xlab="Weight in kg",ylab="Height in cm")
>dev.off()
RStudioGD
2
Ex.No:4b) Date: IMPLEMENTATION OF LOGISTIC REGRESSION
AIM:
To implement logistic regression using R.
PROCEDURE:
1. Logistic regression is used to predict the class of individuals based on one or multiple
predictor variables (x).
2. It is used to model a binary outcome, that is a variable, which can have only two
possible values: 0 or 1, yes or no, diseased or non-diseased.
3. Logistic regression belongs to a family of models named Generalized Linear Models (GLM),
developed for extending the linear regression model to other situations.
4. Other synonyms are binary logistic regression, binomial logistic regression and logit
model.
5. Logistic regression does not directly return the class of observations. It allows us to
estimate the probability (p) of class membership; the probability ranges between 0 and 1.
A short sketch of computing these probabilities follows the program below.
PROGRAM:
input=mtcars[,c("am","cyl","hp","wt")]
am.data=glm(formula=am~cyl+hp+wt,data=input,family = binomial)
print(summary(am.data))
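To obtain the class-membership probabilities mentioned in step 5 of the procedure, the fitted model could be used for prediction along these lines (a minimal sketch reusing the am.data model and the input data frame from the program above):
# Predicted probability of am = 1 (manual transmission) for each car
prob <- predict(am.data, type = "response")
head(prob)
# Convert the probabilities into class labels using a 0.5 cut-off
pred.class <- ifelse(prob > 0.5, 1, 0)
table(pred.class, input$am)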
RESULT:
Thus the implementation of logistic regression was executed and verified successfully.
OUTPUT:
Call:
glm(formula = am ~ cyl + hp + wt, family = binomial, data = input)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.17272 -0.14907 -0.01464 0.14116 1.27641
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.70288 8.11637 2.428 0.0152 *
cyl 0.48760 1.07162 0.455 0.6491
hp 0.03259 0.01886 1.728 0.0840 .
wt -9.14947 4.15332 -2.203 0.0276 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AIM:
To implement support vector machine (SVM) regression using R and compare its prediction error with linear regression.
PROCEDURE:
x=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
y=c(3,4,5,4,8,10,10,11,14,20,23,24,32,34,35,37,42,48,53,60)
# Combine the two vectors into a training data frame and plot the points
train <- data.frame(x, y)
plot(train, pch=16)
#Linear regression
model<- lm(y ~ x, train)
abline(model)
#SVM
library(e1071)
# Fit a support vector regression model on the same data and predict on the training set
model_svm <- svm(y ~ x, train)
pred <- predict(model_svm, train)
#Plot the predictions and the plot to see our model fit
points(train$x, pred, col = "blue", pch=4)
#Linear model has a residuals part which we can extract and directly calculate rmse
error<- model$residuals
lm_error<- sqrt(mean(error^2)) # 3.832974
#For svm, we have to manually calculate the difference between actual values (train$y) with our predictions (pred)
error_2 <- train$y - pred
svm_error<- sqrt(mean(error_2^2)) # 2.696281
# Tune the SVM over a grid of epsilon and cost values (example grid)
svm_tune <- tune(svm, y ~ x, data = train,
ranges = list(epsilon = seq(0, 1, 0.01), cost = 2^(2:9)))
print(svm_tune)
#- best parameters:
# epsilon cost
#0 8
plot(svm_tune)
# Refit with the best parameters found by tuning and plot the tuned predictions
best_mod <- svm_tune$best.model
best_mod_pred <- predict(best_mod, train)
plot(train,pch=16)
points(train$x, best_mod_pred, col = "blue", pch=4)
RESULT:
Thus the implementation of SVM regression was executed and verified successfully.
AIM:
To implement decision tree classification using R.
PROCEDURE:
PROGRAM:
library(party)
input.dat <- readingSkills[c(1:105),]
png(file = "decision_tree.png")
output.tree <- ctree( nativeSpeaker ~ age + shoeSize + score, data = input.dat)
plot(output.tree)
dev.off()
RESULT:
Thus the implementation of decision tree classification was executed and verified
successfully.
OUTPUT:
null device
AIM:
PROCEDURE:
install.packages("factoextra")
install.packages("cluster")
install.packages("magrittr")
library("factoextra")
library("cluster")
library("magrittr")
# Standardize the data and compute hierarchical clustering (assuming the USArrests data set)
res.hc <- USArrests %>%
scale() %>% # Scale the data
dist(method = "euclidean") %>% # Compute the dissimilarity matrix
hclust(method = "ward.D2") # Compute hierarchical clustering
# Visualize using factoextra
fviz_dend(res.hc, k = 4, cex = 0.5, rect = TRUE)
RESULT:
AIM:
To implement clustering techniques using partitioning (k-means) clustering in R.
PROCEDURE:
1. Partitioning algorithms are clustering techniques that subdivide the data sets into a set of
k groups, where k is the number of groups pre-specified by the analyst.
2. There are different types of partitioning clustering methods. The most popular is the K-
means clustering (MacQueen 1967), in which, each cluster is represented by the center or
means of the data points belonging to the cluster. The K-means method is sensitive to
outliers.
3. An alternative to k-means clustering is the K-medoids clustering or PAM (Partitioning
Around Medoids, Kaufman & Rousseeuw, 1990), which is less sensitive to outliers
compared to k-means.
4. Determining the optimal number of clusters: use factoextra::fviz_nbclust() (a short sketch follows the program below).
5. Compute and visualize k-means clustering.
PROGRAM:
install.packages("factoextra")
install.packages("magrittr")
install.packages("cluster")
library("factoextra")
library("magrittr")
library("cluster")
# Prepare the data (assuming the standardized USArrests data set, as in the other clustering exercises)
my_data <- scale(USArrests)
set.seed(123)
km.res<-kmeans(my_data, 3, nstart=25)
# Visualize
library("factoextra")
fviz_cluster(km.res, data=my_data,
ellipse.type="convex",
palette="jco",
ggtheme=theme_minimal())
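As noted in step 4 of the procedure, the optimal number of clusters could be estimated before calling kmeans(), for example as follows (a minimal sketch reusing my_data from the program above; the gap statistic is one of several methods supported by fviz_nbclust()):
# Suggest an optimal number of clusters using the gap statistic
fviz_nbclust(my_data, kmeans, method = "gap_stat")
# Alternative criteria: method = "wss" (elbow) or method = "silhouette"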
RESULT:
Thus the implementation of clustering techniques using partitioning (k-means) clustering was executed and verified successfully.
AIM:
To implement clustering techniques using fuzzy clustering in R.
PROCEDURE:
PROGRAM:
install.packages("factoextra")
install.packages("magrittr")
install.packages("cluster")
library("factoextra")
library("magrittr")
library("cluster")
library(cluster)
df <- scale(USArrests) # Standardize the data
res.fanny <- fanny(df, 2) # Compute fuzzy clustering with k = 2
head(res.fanny$membership, 3) # Membership coefficients
res.fanny$coeff # Dunn's partition coefficient
head(res.fanny$clustering) # Observation groups
library(factoextra)
fviz_cluster(res.fanny, ellipse.type="norm", repel=TRUE,
palette="jco", ggtheme=theme_minimal(),
legend="right")
RESULT:
Thus the implementation of clustering techniques using fuzzy clustering was executed
and verified successfully.
OUTPUT:
Ex.No:6d) Date: IMPLEMENTATION OF DENSITY BASED CLUSTERING
AIM:
To implement clustering techniques using density-based clustering (DBSCAN) in R.
PROCEDURE:
1. Density-based clustering (DBSCAN) can be used to identify clusters of any shape in a data set containing noise and outliers.
2. Clusters are dense regions in the data space, separated by regions of lower density
of points.
3. The simulated data set multishapes is used.
4. The function fviz_cluster() is used to visualize the clusters.
5. First, install factoextra: install.packages(“factoextra”); then compute and visualize k-
means clustering using the data set multishapes.
6. The goal is to identify dense regions, which can be measured by the number of objects
close to a given point. The cluster assignments can be inspected as shown in the sketch after the program below.
PROGRAM:
install.packages("factoextra")
install.packages("magrittr")
install.packages("cluster")
library("factoextra")
library("magrittr")
library("cluster")
install.packages("fpc")
install.packages("dbscan")
install.packages("factoextra")
data("multishapes", package="factoextra")
df<-multishapes[, 1:2]
library("fpc")
set.seed(123)
library("factoextra")
# Compute DBSCAN (eps and MinPts are example values for the multishapes data)
db <- fpc::dbscan(df, eps = 0.15, MinPts = 5)
# Plot the DBSCAN results
fviz_cluster(db, data = df, stand = FALSE,
ellipse=FALSE,
show.clust.cent=FALSE,
geom="point",palette="jco", ggtheme=theme_classic())
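To relate the plot back to the dense regions described in the procedure, the cluster assignments found by dbscan() could be inspected as follows (a minimal sketch; cluster 0 denotes noise points):
# Summary of cluster sizes (cluster 0 contains the noise/outlier points)
print(db)
# Cluster membership of the first observations
head(db$cluster, 20)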
RESULT:
Thus the implementation of clustering techniques using density based clustering was
executed and verified successfully.
OUTPUT:
Ex.No:6e) Date: IMPLEMENTATION OF MODEL BASED CLUSTERING
AIM:
To implement clustering techniques using model-based clustering in R.
PROCEDURE:
1. In model-based clustering, the data are viewed as coming from a distribution that is a
mixture of two or more clusters.
2. It finds best fit of models to data and estimates the number of clusters.
3. Install the mclust package as follow: install.packages(“mclust”).
4. Model-based clustering results can be drawn using the base function plot.Mclust().
5. fviz_mclust() uses a principal component analysis to reduce the dimensionality of the data.
PROGRAM:
install.packages("factoextra")
install.packages("cluster")
install.packages("magrittr")
library("cluster")
library("factoextra")
library("magrittr")
library("mclust")
data("diabetes")
head(diabetes, 3)
df <- scale(diabetes[, -1]) # Standardize the data (assuming the class column is dropped before clustering)
mc <- Mclust(df) # Model-based clustering; the best model is selected by BIC
summary(mc) # Print a summary of the selected model
library(factoextra)
# BIC values used for choosing the number of clusters
fviz_mclust(mc, "BIC", palette = "jco")
# Classification plot: clusters in the space of the first two principal components
fviz_mclust(mc, "classification", geom = "point",
pointsize=1.5, palette="jco")
# Classification uncertainty
fviz_mclust(mc, "uncertainty", palette = "jco")
RESULT:
Thus the implementation of clustering techniques using model based clustering was
executed and verified successfully.
OUTPUT:
##
## Gaussian finite mixture model fitted by EM algorithm
##
##
## Mclust VVV (ellipsoidal, varying volume, shape, and
orientation) model with 3 components:
##
## log.likelihood n df BIC ICL
## -169 145 29 -483 -501
##
## Clustering table:
## 1 2 3
## 81 36 28
[Plot] Model selection: BIC versus number of components. Best model: VVV | Optimal clusters: n = 3.
[Plot] Cluster plot (classification), Dim1 (73.2%) vs Dim2.
[Plot] Cluster plot (uncertainty), Dim1 (73.2%) vs Dim2.
Ex.No:7a) Date: DATA VISUALIZATION USING PIE CHART
AIM:
To visualize data using a pie chart in R.
PROCEDURE:
1. In R the pie chart is created using the pie() function which takes positive
numbers as a vector input.
2. The additional parameters are used to control labels, color, title etc.
3. The basic syntax for creating a pie chart using R is −
i. pie(x, labels, radius, main, col, clockwise)
4. Following is the description of the parameters used −
a. x is a vector containing the numeric values used in the pie chart.
b. labels is used to give a description to the slices.
c. radius indicates the radius of the circle of the pie chart (value between −1 and +1).
d. main indicates the title of the chart.
e. col indicates the color palette.
f. clockwise is a logical value indicating whether the slices are drawn clockwise or anti-clockwise.
5. We will use the parameter main to add a title to the chart and the parameter col to apply
the rainbow color palette while drawing the chart. The length of the palette should be the same
as the number of values we have for the chart, hence we use length(x).
PROGRAM:
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels<- c("London", "New York", "Singapore", "Mumbai")
# Plot the chart with a title and the rainbow color palette, as described in the procedure.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))
RESULT:
Thus the data was visualized using a pie chart in R.
OUTPUT:
Ex.No:7b) Date: DATA VISUALIZATION USING BAR PLOT
AIM:
To visualize data using a bar plot in R.
PROCEDURE:
1. R uses the function barplot() to create bar charts. R can draw both vertical and
horizontal bars in the bar chart. In bar chart each of the bars can be given different
colors.
2. The basic syntax to create a bar chart in R is −
i. barplot(H, xlab, ylab, main, names.arg, col)
3. Following is the description of the parameters used −
a. H is a vector or matrix containing numeric values used in bar chart.
b. xlab is the label for x axis.
c. ylab is the label for y axis.
d. main is the title of the bar chart.
e. names.arg is a vector of names appearing under each bar.
f. col is used to give colors to the bars in the graph.
4. The main parameter is used to add a title. The col parameter is used to add colors to the
bars. The names.arg parameter is a vector having the same number of values as the input
vector; it labels each bar.
PROGRAM:
# Create the data for the chart.
H <- c(7,12,28,3,41)
M <- c("Mar","Apr","May","Jun","Jul")
# Plot the bar chart with axis labels, bar names, a title and a color, as described in the procedure.
barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue", col = "blue", main = "Revenue chart")
RESULT:
Thus the data was visualized using a bar plot in R.
OUTPUT:
Ex.No:7c) Date: DATA VISUALIZATION USING BOX PLOT
AIM:
To visualize data using a box plot in R.
PROCEDURE:
1. Boxplots are created in R by using the boxplot() function.
2. The basic syntax to create a boxplot in R is −
boxplot(x, data, notch, varwidth, names, main)
3. Following is the description of the parameters used −
a. x is a vector or a formula.
b. data is the data frame.
c. notch is a logical value. Set as TRUE to draw a notch.
d. varwidth is a logical value. Set as true to draw width of the box proportionate to
the sample size.
e. names are the group labels which will be printed under each boxplot.
f. main is used to give a title to the graph.
PROGRAM:
# Give the chart file a name.
png(file = "boxplot.png")
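The listing above stops after opening the PNG device; a minimal completion, assuming the built-in mtcars data set with mileage grouped by cylinder count, might look like the following.
# Plot mileage (mpg) grouped by number of cylinders, then close the graphics device.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders", ylab = "Miles Per Gallon", main = "Mileage Data")
dev.off()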
RESULT:
Thus the data was visualized using a box plot in R.
OUTPUT:
Ex.No:7d) Date: DATA VISUALIZATION USING HISTOGRAM
AIM:
To visualize data using a histogram in R.
PROCEDURE:
1. R creates histogram using hist() function. This function takes a vector as an input and
uses some more parameters to plot histograms.
2. The basic syntax for creating a histogram using R is −
i. hist(v, main, xlab, xlim, ylim, breaks, col, border)
3. Following is the description of the parameters used −
a. v is a vector containing numeric values used in histogram.
b. main indicates title of the chart.
c. col is used to set color of the bars.
d. border is used to set border color of each bar.
e. xlab is used to give description of x-axis.
f. xlim is used to specify the range of values on the x-axis.
g. ylim is used to specify the range of values on the y-axis.
h. breaks is used to control the number (or width) of the bins.
PROGRAM:
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Plot the histogram with a color, border and x-axis label, as described in the procedure.
hist(v, xlab = "Weight", col = "yellow", border = "blue")
RESULT:
Thus the data was visualized using a histogram in R.
OUTPUT:
Ex.No:7e) Date: DATA VISUALIZATION USING LINE GRAPH
AIM:
To visualize data using a line graph in R.
PROCEDURE:
1. The plot() function in R is used to create the line graph.
2. The basic syntax to create a line chart in R is −
i. plot(v, type, col, xlab, ylab)
3. Following is the description of the parameters used −
a. v is a vector containing the numeric values.
b. type takes the value "p" to draw only the points, "l" to draw only the lines
and "o" to draw both points and lines.
c. xlab is the label for x axis.
d. ylab is the label for y axis.
e. main is the Title of the chart.
f. col is used to give colors to both the points and lines.
4. We add color to the points and lines, give a title to the chart and add labels to the
axes.
PROGRAM:
# Create the data for the chart.
v <- c(7,12,28,3,41)
# Plot the line chart with both points and lines, a color, a title and axis labels, as described in the procedure.
plot(v, type = "o", col = "red", xlab = "Month", ylab = "Rain fall", main = "Rain fall chart")
RESULT:
Thus the data was visualized using a line graph in R.
OUTPUT:
Ex.No:7f) Date: DATA VISUALIZATION USING SCATTER PLOT
AIM:
To visualize data using a scatter plot in R.
PROCEDURE:
1. The simple scatterplot is created using the plot() function.
2. The basic syntax for creating a scatterplot in R is −
i. plot(x, y, main, xlab, ylab, xlim, ylim, axes)
3. Following is the description of the parameters used −
a. x is the data set whose values are the horizontal coordinates.
b. y is the data set whose values are the vertical coordinates.
c. main is the title of the graph.
d. xlab is the label in the horizontal axis.
e. ylab is the label in the vertical axis.
f. xlim is the limits of the values of x used for plotting.
g. ylim is the limits of the values of y used for plotting.
h. axes indicates whether both axes should be drawn on the plot.
PROGRAM:
# Get the input values.
input<- mtcars[,c('wt','mpg')]
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt,y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage"
)
RESULT:
Thus the data was visualized using a scatter plot in R.
OUTPUT:
Ex.No:8a) Date: APPLICATION TO ADJUST THE NUMBER OF BINS IN THE HISTOGRAM USING R
AIM:
To implement an application to adjust the number of bins in a histogram using the R
language.
PROCEDURE:
Any Shiny app is built using two components:
1. ui.R: This file creates the user interface in a Shiny application. It provides interactivity to
the Shiny app by taking input from the user and dynamically displaying the generated
output on the screen.
2. server.R: This file contains the series of steps to convert the input given by the user into
the desired output to be displayed.
a. Before we proceed further, you need to set up Shiny on your system. Follow
these steps to get started.
1. Create a new project in R Studio
2. Select type as Shiny web application.
3. This creates two scripts in RStudio named ui.R and server.R.
4. Each file is coded separately, and input and output flow between the two.
PROGRAM:
#
# This is a Shiny web application. You can run the application by clicking
# the 'Run App' button above.
#
# Find out more about building applications with Shiny here:
#
# http://shiny.rstudio.com/
#
library(shiny)
ui <- fluidPage(
# Application title
titlePanel("Old Faithful Geyser Data"),
# Sidebar with a slider input for the number of bins
sidebarLayout(
sidebarPanel(
sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30)
),
mainPanel(plotOutput("distPlot"))
)
)
# Server logic required to draw the histogram
server <- function(input, output) {
output$distPlot <- renderPlot({
# generate bins based on input$bins from the UI
x <- faithful[, 2]
bins <- seq(min(x), max(x), length.out = input$bins + 1)
# draw the histogram with the specified number of bins
hist(x, breaks = bins, col = 'darkgray', border = 'white')
})
}
# Run the application
shinyApp(ui = ui, server = server)
AIM:
To analyze and visualize stock market data in R using the quantmod package.
PROCEDURE:
1. Stock data can be obtained from Yahoo! Finance (http://finance.yahoo.com); the quantmod
package provides easy access to it.
2. AAPL is of the xts class (which is also a zoo-class object). xts objects (provided in the
xts package) are seen as improved versions of the ts object for storing time series data.
3. Stock data are stored with time-based indexing and can carry custom attributes, and multiple
(presumably related) time series with the same time index can be stored in the same object.
4. Yahoo! Finance provides six series for each security. Open is the price of the stock at the
beginning of the trading day, High is the highest price of the stock on that trading day, Low is
the lowest price on that trading day, and Close is the price of the stock at closing time.
Volume indicates how many shares were traded. Adjusted is the closing price of the stock
adjusted for corporate actions.
5. Financial data is often plotted with the candleChart() function from quantmod to create a
candlestick chart; a short sketch of these steps follows below.
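A minimal sketch of these steps, assuming the quantmod package is installed and that the AAPL series is downloaded from Yahoo! Finance as described above:
library(quantmod)
# Download Apple (AAPL) price data from Yahoo! Finance into an xts object named AAPL
getSymbols("AAPL", src = "yahoo")
# Inspect the six series: Open, High, Low, Close, Volume, Adjusted
head(AAPL)
# Visualize the prices as a candlestick chart
candleChart(AAPL, up.col = "black", dn.col = "red", theme = "white")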