DA Lab Manual Final
4. Download Hadoop-2.6.x:
a. Put the extracted Hadoop-2.6.x files into the D drive.
b. Download “hadoop-common-2.6.0-bin-master” and paste all of its files into the
“bin” folder of Hadoop-2.6.x.
c. Create a “data” folder inside Hadoop-2.6.x, and inside it create two more folders
named “data” and “name.”
d. Create a folder to store temporary data during execution of a project, such as
“D:\hadoop\temp.”
e. Create a log folder, such as “D:\hadoop\userlog”
f. Go to Hadoop-2.6.x -> etc -> hadoop and edit the following four files:
i. core-site.xml
ii. hdfs-site.xml
iii. mapred-site.xml
iv. yarn-site.xml
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>D:\hadoop\temp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50071</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop-2.6.0/data/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-2.6.0/data/data</value>
<final>true</final>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>/hadoop-2.6.0/share/hadoop/mapreduce/*,
/hadoop-2.6.0/share/hadoop/mapreduce/lib/*,
/hadoop-2.6.0/share/hadoop/common/*,
/hadoop-2.6.0/share/hadoop/common/lib/*,
/hadoop-2.6.0/share/hadoop/yarn/*,
/hadoop-2.6.0/share/hadoop/yarn/lib/*,
/hadoop-2.6.0/share/hadoop/hdfs/*,
/hadoop-2.6.0/share/hadoop/hdfs/lib/*,
</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>D:\hadoop\userlog</value><final>true</final>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>D:\hadoop\temp\nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>600</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/hadoop-2.6.0/,/hadoop-2.6.0/share/hadoop/common/*,/hadoop-2.6.0/share/hadoop/common/lib/*,/hadoop-2.6.0/share/hadoop/hdfs/*,/hadoop-2.6.0/share/hadoop/hdfs/lib/*,/hadoop-2.6.0/share/hadoop/mapreduce/*,/hadoop-2.6.0/share/hadoop/mapreduce/lib/*,/hadoop-2.6.0/share/hadoop/yarn/*,/hadoop-2.6.0/share/hadoop/yarn/lib/*</value>
</property>
</configuration>
g. Go to the location “Hadoop-2.6.0 -> etc -> hadoop” and edit “hadoop-env.cmd” by adding the line
set JAVA_HOME=C:\java\jdk1.8.0_91
(adjust this path to the JDK installed on your machine).
h. Set environment variables: My Computer -> Properties -> Advanced system
settings -> Advanced -> Environment Variables
i. User variables:
● Variable: HADOOP_HOME
● Value: D:\hadoop-2.6.0
ii. System variables:
● Variable: Path
● Value: D:\hadoop-2.6.0\bin
D:\hadoop-2.6.0\sbin
D:\hadoop-2.6.0\share\hadoop\common\*
D:\hadoop-2.6.0\share\hadoop\hdfs
D:\hadoop-2.6.0\share\hadoop\hdfs\lib\*
D:\hadoop-2.6.0\share\hadoop\hdfs\*
D:\hadoop-2.6.0\share\hadoop\yarn\lib\*
D:\hadoop-2.6.0\share\hadoop\yarn\*
D:\hadoop-2.6.0\share\hadoop\mapreduce\lib\*
D:\hadoop-2.6.0\share\hadoop\mapreduce\*
D:\hadoop-2.6.0\share\hadoop\common\lib\*
i. Check the installation from the Command Prompt.
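For example (with the paths above in place), open a new Command Prompt and run:
hadoop version
hdfs namenode -format
start-dfs.cmd
start-yarn.cmd
jps
hadoop version confirms that HADOOP_HOME and the Path entries are picked up, the format command initialises the “name” folder, the two .cmd scripts start the HDFS and YARN daemons, and jps should then list NameNode, DataNode, ResourceManager, and NodeManager.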
Result:-
Thus the Hadoop environment was installed and configured.
EX.2 Implementation of word count/frequency using MapReduce
Aim:-
To implement a word count program using MapReduce.
Procedure:-
Step 1: Write the Mapper, Reducer, and driver classes for word count (a completed sketch follows the import listing below).
Step 2: Compile the classes and package them into a jar file.
Step 3: Copy the input text file into HDFS.
Step 4: Run the jar with the hadoop jar command and inspect the output directory.
Program:-
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
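The listing above stops at the imports. The classes that complete the program, following the standard Hadoop WordCount example (the class names are illustrative), are sketched below:
public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts received for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures the job and points it at the input and output paths.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Package the class into a jar and run it with, for example, hadoop jar wordcount.jar WordCount /input /output.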
Result:-
Thus the word count program was executed using the Hadoop environment.
EX.3 Implementation of MR program using Weather dataset
Aim:-
To write a program that finds the maximum temperature per year from a sensor
temperature dataset using the Hadoop MapReduce framework.
Procedure:-
Implement the Mapper, Reducer, and driver classes for finding the maximum temperature in Java (a completed sketch follows the partial listing below).
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//Mapper class
class MaxTemperatureMapper
extends Mapper<LongWritable, Text, Text, IntWritable> {
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
//Reducer class
class MaxTemperatureReducer
extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values,
Context context)
throws IOException, InterruptedException {
//Driver Class
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.submit();
}
}
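The method bodies and the Job construction are omitted above. A sketch that completes them is given below; it assumes the fixed-width NCDC record layout used in the classic MaxTemperature example, so the substring offsets are assumptions that must match the actual input file.
// Mapper body: parse the year and the air temperature from one fixed-width record.
public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
  String line = value.toString();
  String year = line.substring(15, 19);                 // assumed year columns
  int airTemperature;
  if (line.charAt(87) == '+') {                         // assumed sign column
    airTemperature = Integer.parseInt(line.substring(88, 92));
  } else {
    airTemperature = Integer.parseInt(line.substring(87, 92));
  }
  String quality = line.substring(92, 93);
  if (airTemperature != 9999 && quality.matches("[01459]")) {  // skip missing or bad readings
    context.write(new Text(year), new IntWritable(airTemperature));
  }
}

// Reducer body: keep the largest temperature seen for each year.
public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
  int maxValue = Integer.MIN_VALUE;
  for (IntWritable value : values) {
    maxValue = Math.max(maxValue, value.get());
  }
  context.write(key, new IntWritable(maxValue));
}

// Driver class: the fragment above shows the same class settings followed by job.submit().
public class MaxTemperature {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max temperature");
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}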
Input
Output
The output text contains the year and the maximum temperature recorded in that year, e.g. 1902 33.
Result:-
Thus the maximum temperature per year in the weather dataset was obtained using MapReduce.
EX.4 INSTALL, CONFIGURE AND RUN SPARK
Aim:-
To install and configure Spark on a standalone machine.
Procedure:-
Step 1: Install Java 8
Apache Spark requires Java 8. You can check to see if Java is installed using the
command prompt.
Open the command line by clicking Start > type cmd > click Command Prompt.
Type the following command in the command prompt:
java -version
If Java is installed, the command responds with the installed Java version details.
Step 2: Install Python
6. You can leave all the boxes checked at this step, or uncheck the options you do
not want.
7. Click Next.
8. Select the box Install for all users and leave other boxes as they are.
9. Under Customize install location, click Browse and navigate to the C drive. Add a
new folder and name it Python.
10. Select that folder and click OK.
Step 3: Download Apache Spark
4. A page with a list of mirrors loads where you can see different servers to download
from. Pick any from the list and save the file to your Downloads folder.
Step 4: Verify Spark Software File
1. Verify the integrity of your download by checking the checksum of the file. This
ensures you are working with unaltered, uncorrupted software.
2. Navigate back to the Spark Download page and open the Checksum link, preferably
in a new tab.
3. Next, open a command line and enter the following command:
certutil -hashfile c:\users\username\Downloads\spark-2.4.5-bin-hadoop2.7.tgz
SHA512
4. Change the username to your username. The system displays a long alphanumeric
code, along with the message Certutil: -hashfile completed successfully.
5. Compare the code to the one you opened in a new browser tab. If they match, your
download file is uncorrupted.
Step 5: Install Apache Spark
Installing Apache Spark involves extracting the downloaded file to the desired location.
1. Create a new folder named Spark in the root of your C: drive. From a command line,
enter the following:
cd \
mkdir Spark
2. In Explorer, locate the Spark file you downloaded.
3. Right-click the file and extract it to C:\Spark using the tool you have on your system
(e.g., 7-Zip).
4. Now, your C:\Spark folder has a new folder spark-2.4.5-bin-hadoop2.7 with the
necessary files inside.
Step 6: Add winutils.exe File
Download the winutils.exe file that matches the Hadoop version underlying the Spark
package you downloaded.
1. Navigate to this URL https://github.com/cdarlint/winutils and, inside the bin folder,
locate winutils.exe, and click it.
2. Find the Download button on the right side to download the file.
3. Now, create a new folder hadoop on C: with a bin folder inside it, using Windows
Explorer or the Command Prompt.
4. Copy the winutils.exe file from the Downloads folder to C:\hadoop\bin.
Step 7: Configure Environment Variables
Configuring environment variables in Windows adds the Spark and Hadoop locations to
your system PATH. It allows you to run the Spark shell directly from a command prompt
window.
1. Click Start and type environment.
2. Select the result labeled Edit the system environment variables.
3. A System Properties dialog box appears. In the lower-right corner, click Environment
Variables and then click New in the next window.
7. You should see a box with entries on the left. On the right, click New.
8. The system highlights a new line. Enter the path to the Spark folder
C:\Spark\spark-2.4.5-bin-hadoop2.7\bin. We recommend using %SPARK_HOME%\bin
to avoid possible issues with the path.
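With the variables in place, you can verify the setup by opening a new command-prompt window and starting the Spark shell:
spark-shell
If the configuration is correct, the Spark banner and a scala> prompt appear.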
7. To exit Spark and close the Scala shell, press ctrl-d in the command-prompt window.
Result:-
Thus, Spark was installed and configured successfully.
EX.5 IMPLEMENT WORD COUNT / FREQUENCY PROGRAMS USING SPARK
Aim:-
To implement a word count / frequency program using Spark.
Program:-
package org.apache.spark.examples;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
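Only the imports are reproduced above. A minimal sketch of the rest of the class, patterned on Spark's standard JavaWordCount example (the class name, the local master setting, and the way the result is printed are illustrative), is:
public final class JavaWordCount {
  private static final Pattern SPACE = Pattern.compile(" ");

  public static void main(String[] args) throws Exception {
    // Local Spark context; pass the input text file as the first argument.
    SparkConf conf = new SparkConf().setAppName("JavaWordCount").setMaster("local[*]");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Read the file, split each line into words, and count each word.
    JavaRDD<String> lines = sc.textFile(args[0]);
    JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(SPACE.split(s)).iterator());
    JavaPairRDD<String, Integer> counts =
        words.mapToPair(w -> new Tuple2<>(w, 1)).reduceByKey((a, b) -> a + b);

    // Collect and print the (word, count) pairs.
    List<Tuple2<String, Integer>> output = counts.collect();
    for (Tuple2<String, Integer> tuple : output) {
      System.out.println(tuple._1() + "=" + tuple._2());
    }
    sc.stop();
  }
}
The program can be run with spark-submit, passing the input text file as the first argument.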
Output:
{e=1, h=2, b=2, j=1, m=1, d=1, a=2, i=2, c=1, l=2, f=1}
Result:-
Thus, the word count program was executed successfully using Spark.
EX.6 IMPLEMENT MACHINE LEARNING USING SPARK
Aim:-
To implement machine learning using Spark MLlib.
Procedure:-
Spark MLlib is a module on top of Spark Core that provides machine learning primitives
as APIs.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.4.3</version>
<scope>provided</scope>
</dependency>
Spark MLlib offers several data types, both local and distributed, to represent the input
data and the corresponding labels. The simplest of these data types is Vector:
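For example, a dense local vector holding the four measurements of one Iris flower could be built as follows (the values are illustrative):
// Requires org.apache.spark.mllib.linalg.Vector and Vectors.
Vector features = Vectors.dense(5.1, 3.5, 1.4, 0.2);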
A training example typically consists of multiple input features and a label, represented
by the class LabeledPoint:
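For instance, a point with label 1.0 and three features (values illustrative):
// Requires org.apache.spark.mllib.regression.LabeledPoint.
LabeledPoint point = new LabeledPoint(1.0, Vectors.dense(0.0, 1.0, 0.3));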
Map<String, Integer> map = new HashMap<>();
map.put("Iris-setosa", 0);
map.put("Iris-versicolor", 1);
map.put("Iris-virginica", 2);
Another important metric to analyze is the correlation between features in the input
data:
Matrix correlMatrix = Statistics.corr(inputData.rdd(), "pearson");
System.out.println("Correlation Matrix:");
System.out.println(correlMatrix.toString());
Iris.data (Input Dataset)
Correlation Matrix:
1.0 -0.10936924995064387 0.8717541573048727 0.8179536333691672
-0.10936924995064387 1.0 -0.4205160964011671 -0.3565440896138163
0.8717541573048727 -0.4205160964011671 1.0 0.9627570970509661
0.8179536333691672 -0.3565440896138163 0.9627570970509661 1.0
Result:-
Thus, machine learning using Spark MLlib was implemented successfully.
Coefficients:
(Intercept) x
-38.4551 0.6746
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-6.3002 -1.6629 0.0412 1.8944 3.9775
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -38.45509 8.04901 -4.778 0.00139 **
x 0.67461 0.05191 12.997 1.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1
76.22869
Logistic Regression:
logistic.r
# Select some columns from mtcars.
input <- mtcars[,c("am","cyl","hp","wt")]
print(head(input))
# Fit the logistic regression model (variable name illustrative; the formula matches the Call: shown in the output below).
am.data <- glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)
print(summary(am.data))
Output:
am cyl hp wt
Mazda RX4 1 6 110 2.620
Mazda RX4 Wag 1 6 110 2.875
Datsun 710 1 4 93 2.320
Hornet 4 Drive 0 6 110 3.215
Hornet Sportabout 0 8 175 3.440
Valiant 0 6 105 3.460
Call:
glm(formula = am ~ cyl + hp + wt, family = binomial, data = input)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.17272 -0.14907 -0.01464 0.14116 1.27641
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.70288 8.11637 2.428 0.0152 *
cyl 0.48760 1.07162 0.455 0.6491
hp 0.03259 0.01886 1.728 0.0840 .
wt -9.14947 4.15332 -2.203 0.0276 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Output:-
null device
1
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo
as.Date, as.Date.numeric
> names(cars.pam)
> table(groups.3,cars.pam$clustering)
groups.3 1 2 3
1 8 0 0
2 0 19 1
3 0 0 10
> cars$Car[groups.3 != cars.pam$clustering]
[1] Audi 5000
> cars$Car[cars.pam$id.med]
[1] Dodge St Regis Dodge Omni Ford Mustang Ghia
> plot(cars.pam)
Result:-
Thus, the clustering using PAM was implemented using R.
EX.10 IMPLEMENTATION OF DATA VISUALIZATION
Aim:-
To implement the Data Visualization using R Program.
Procedure:-
Step 1: Read the input
Step 2: Visualize the data using
i) Pie chart
ii) 3D pie chart
iii) Boxplot
iv) Histogram
v) Line chart
vi) Scatterplot
Program :-
1.Piechart.r
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Draw the pie chart with the city labels.
pie(x, labels)
Output:-
Executing the program....
$Piechart.r
2.ThreeDPiechart.r
# Get the library.
library(plotrix)
# Draw a 3D pie chart of the same city data (pie3D comes from plotrix).
pie3D(c(21, 62, 10, 53), labels = c("London", "New York", "Singapore", "Mumbai"), explode = 0.1)
Output:-
Executing the program....
$ThreeDPiechart.r
3.Boxplot.r
Output
4.Histogram.r
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Draw the histogram of the values.
hist(v)
output
5.Linechart.r
# Create the data for the chart.
v <- c(7,12,28,3,41)
# Draw the line chart; type "o" plots both points and connecting lines.
plot(v, type = "o")
output
6.Scatterplot.r
# Get the input values.
input <- mtcars[,c('wt','mpg')]
# Plot the chart for cars with weight between 2.5 to 5 and mileage between 15 and 30.
plot(x = input$wt,y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage"
)
output
Result:-
Thus the different data visualization techniques were implemented using R.
EX.11 Implementation of an Application
Aim:-
To implement survival analysis using R
Procedure:-
Step 1: Install survival package
Step 2: Display input to check details
Step 3: Create survival object
Step 4: Display the output
Program:-
Survival.r
# Load the library.
library("survival")
# Display the first rows of the pbc dataset bundled with the package (shown in the output below).
print(head(pbc))
Output:-
>print(head(pbc))
id time status trt age sex ascites hepato spiders edema bili chol
1 1 400 2 1 58.76523 f 1 1 1 1.0 14.5 261
2 2 4500 0 1 56.44627 f 0 1 1 0.0 1.1 302
3 3 1012 2 1 70.07255 m 0 0 0 0.5 1.4 176
4 4 1925 2 1 54.74059 f 0 1 1 0.5 1.8 244
5 5 1504 1 2 38.10541 f 0 1 1 0.0 3.4 279
6 6 2503 2 2 66.25873 f 0 1 0 0.0 0.8 248
albumin copper alk.phos ast trig platelet protime stage
1 2.60 156 1718.0 137.95 172 190 12.2 4
2 4.14 54 7394.8 113.52 88 221 10.6 3
3 3.48 210 516.0 96.10 55 151 12.0 4
4 2.54 64 6121.8 60.63 92 183 10.3 4
5 3.53 143 671.0 113.15 72 136 10.9 3
6 3.98 50 944.0 93.00 63 NA 11.0 3
Result:-
Thus, the survival analysis was implemented using R.