
LABORATORY MANUAL

DATA ANALYTICS LAB

B. TECH.
(III YEAR – SIXTH SEM)
LIST OF PROGRAMS

Subject: Data Analytics Lab Code: CS-60

S.No. Name of Experiment

1. Implement and analyze linear regression in Python (single variable & multivariable)
2. Implement and analyze logistic regression in Python
3. Implement and analyze the decision tree algorithm in Python
4. Implement and analyze the random forest algorithm in Python
5. Implementation of the two-sample t-test and paired two-sample t-test in Excel
6. Implementation of one-way and two-way ANOVA in Excel
7. Write steps for installing Hadoop on Windows 10
8. Working with Hadoop commands
9. Implementation of a word count example using MapReduce
10. Implementation of a MapReduce program to count the unique number of times a song is played, based on userid and trackid
Experiment No. 1

Title: Implementation of linear regression in Python (single variable & multivariable)

Regression is a statistical method used in finance, investing, and other disciplines that
attempts to determine the strength and character of the relationship between one dependent
variable (usually denoted by Y) and a series of other variables (known as independent
variables).

1. Linear Regression Single Variable

The table below lists current home prices in ABC township based on square-foot area.

Problem Statement: Given the above data, build a machine learning model that can predict home prices based on square-foot area.
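Below is a minimal sketch of such a model using scikit-learn. The file name homeprices.csv, its columns (area, price), and the 3300 sq. ft. query are hypothetical:

# Fit a single-variable linear regression: price = m * area + b
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv('homeprices.csv')        # hypothetical columns: area, price
model = LinearRegression()
model.fit(df[['area']], df['price'])      # X must be 2-D; y is 1-D

print(model.predict([[3300]]))            # predicted price of a 3300 sq. ft. home
print(model.coef_, model.intercept_)      # learned slope m and intercept b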
2. Linear Regression Multivariable

Now the objective is to predict the price of a house based on several parameters: its area, the number of bedrooms, and its age.
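A minimal sketch of the multivariable case, again with scikit-learn; the file and column names are hypothetical:

# Fit a multivariable linear regression on area, bedrooms and age
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv('homeprices.csv')        # hypothetical columns: area, bedrooms, age, price
model = LinearRegression()
model.fit(df[['area', 'bedrooms', 'age']], df['price'])

# Predicted price of a 3000 sq. ft., 3-bedroom, 40-year-old house
print(model.predict([[3000, 3, 40]]))

Each coefficient in model.coef_ tells you how much the predicted price changes per unit change in the corresponding feature, holding the others fixed.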
Simple linear regression is a regression model that estimates the relationship between one independent variable and one dependent variable using a straight line.

SOME APPLICATIONS WHERE WE CAN USE SIMPLE LINEAR REGRESSION

1. Marks scored by students based on the number of hours studied (ideally): here the marks scored in exams are the dependent variable and the number of hours studied is the independent variable.
2. Predicting crop yields based on the amount of rainfall: yield is the dependent variable while the measure of precipitation is the independent variable.
3. Predicting the salary of a person based on years of experience: experience becomes the independent variable while salary is the dependent variable.
Experiment No. 2

Title: Implementation of logistic regression in Python


Logistic regression estimates the probability of an event occurring, such as voted or
didn't vote, based on a given dataset of independent variables. Since the outcome is a
probability, the dependent variable is bounded between 0 and 1.
In the example given below, we predict whether or not a customer will buy insurance, based on their age.
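A minimal sketch of this experiment with scikit-learn; the file insurance_data.csv and its columns (age, bought_insurance) are hypothetical:

# Predict insurance purchase (0/1) from age with logistic regression
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv('insurance_data.csv')    # hypothetical columns: age, bought_insurance
X_train, X_test, y_train, y_test = train_test_split(
    df[['age']], df['bought_insurance'], test_size=0.2, random_state=1)

model = LogisticRegression()
model.fit(X_train, y_train)

print(model.score(X_test, y_test))        # classification accuracy on the test set
print(model.predict_proba(X_test))        # [P(no), P(yes)] for each test customer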
Applications: Logistic regression is used in various fields, including machine learning,
most medical fields, and social sciences. For example, the Trauma and Injury Severity
Score (TRISS), which is widely used to predict mortality in injured patients, was originally
developed using logistic regression. Many other medical scales used to assess the severity of a patient's condition have also been developed using logistic regression. Logistic regression is mainly used for binary classification, though it can also be used for multiclass classification problems.
Experiment No. 3

Title: Implementation of the decision tree algorithm in Python

Decision Tree is one of the most powerful and popular algorithms. The decision-tree algorithm falls under the category of supervised learning algorithms. It works for both continuous and categorical output variables.

In this example our objective is to predict the salary of an employee based on various
parameters.
A Decision Tree is a kind of supervised machine learning algorithm that has a root node
and leaf nodes. Every node represents a feature, and the links between the nodes show the
decision. Every leaf represents a result.
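A minimal sketch of the salary experiment; the file salaries.csv, its columns, and the assumption that salary is recorded as a yes/no "more than 100k" label are all hypothetical:

# Predict whether salary exceeds 100k from company, job and degree
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv('salaries.csv')   # hypothetical columns: company, job, degree, salary_more_than_100k
for col in ['company', 'job', 'degree']:
    df[col] = LabelEncoder().fit_transform(df[col])   # encode category labels as integers

model = DecisionTreeClassifier()
model.fit(df[['company', 'job', 'degree']], df['salary_more_than_100k'])

print(model.predict([[2, 1, 0]]))   # one employee, given as encoded feature values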

Decision Trees Applications

Marketing: Businesses can use decision trees to enhance the accuracy of their promotional
campaigns by observing the performance of their competitors’ products and services.
Decision trees can help in audience segmentation and support businesses in producing
better-targeted advertisements that have higher conversion rates.

Retention of Customers: Companies use decision trees for customer retention through
analyzing their behaviors and releasing new offers or products to suit those behaviors. By
using decision tree models, companies can figure out the satisfaction levels of their
customers as well.

Diagnosis of Diseases and Ailments: Decision trees can help physicians and medical
professionals in identifying patients that are at a higher risk of developing serious (or
preventable) conditions such as diabetes or dementia. The ability of decision trees to
narrow down possibilities according to specific variables is quite helpful in such cases.
Experiment No. 4

Title: Implementation of the random forest algorithm in Python

Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in ML. It
is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.

In this example we are classifying handwritten digits using the random forest algorithm.
Random Forest is a classifier that contains a number of decision trees on various subsets
of the given dataset and takes the average to improve the predictive accuracy of that
dataset. Instead of relying on one decision tree, the random forest takes the prediction
from each tree and, based on the majority vote of those predictions, produces the final
output.
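A minimal sketch using scikit-learn's built-in handwritten-digits dataset, so no external file is needed:

# Classify 8x8 handwritten digit images with a random forest
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()                    # 1,797 labelled 8x8 digit images
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=1)

model = RandomForestClassifier(n_estimators=100)   # an ensemble of 100 trees
model.fit(X_train, y_train)
print(model.score(X_test, y_test))        # accuracy on unseen digits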
Applications of Random Forest
The random forest algorithm is mostly used in the following areas:

 Banking: Banking sector mostly uses this algorithm for the identification of loan
risk.
 Medicine: With the help of this algorithm, disease trends and risks of the disease
can be identified.
 Land Use: We can identify the areas of similar land use by this algorithm.
 Marketing: Marketing trends can be identified using this algorithm.
Experiment No. 5

Title: Implementation of the two-sample t-test and paired two-sample t-test in Excel

T-tests are hypothesis tests that assess the means of one or two groups. Hypothesis tests
use sample data to infer properties of entire populations. To be able to use a t-test, you need
to obtain a random sample from your target populations. Depending on the t-test and how you
configure it, the test can determine whether:

o Two group means are different.
o Paired means are different.
o One mean is different from a target value.

STEPS

1. Install the Analysis ToolPak in Excel

The Analysis ToolPak add-in must be installed in your copy of Excel to perform t-tests. To
determine whether you have it installed, click Data in Excel’s menu across the
top and look for Data Analysis in the Analyze section. If you don’t see Data Analysis, you
need to install it.

To install the Analysis ToolPak, click the File tab on the top-left and then
click Options on the bottom-left. Then, click Add-Ins. On the Manage drop-down list,
choose Excel Add-ins, and click Go. On the popup that appears, check Analysis
ToolPak and click OK. After you enable it, click Data Analysis in the Data menu to display the
analyses you can perform. Among other options, the popup presents three types of t-test,
which we’ll cover next.
Two-Sample t-Tests in Excel

Two-sample t-tests compare the means of precisely two groups—no more and no less!
Typically, you perform this test to determine whether two population means are different. For
example, do students who learn using Method A have a different mean score than those who
learn using Method B? This form of the test uses independent samples; in other words, the
observations in one group do not depend on the observations in the other group.

If the p-value is less than your significance level (e.g., 0.05), you can reject the null
hypothesis. The difference between the two means is statistically significant. Your sample
provides strong enough evidence to conclude that the two population means are different.

Let’s assume that the variances are equal and use the Assuming Equal Variances version. If
we had chosen the unequal variances form of the test, the steps and interpretation are the
same—only the calculations change.

1. In Excel, click Data Analysis on the Data tab.


2. From the Data Analysis popup, choose t-Test: Two-Sample Assuming Equal
Variances.
3. Under Input, select the ranges for both Variable 1 and Variable 2.
4. In Hypothesized Mean Difference, you’ll typically enter zero. This value is the
null hypothesis value, which represents no effect. In this case, a mean
difference of zero represents no difference between the two methods, which is
no effect.
5. Check the Labels checkbox if you have meaningful variable names in row 1.
This option makes the output easier to interpret. Ensure that you include the
label row in step #3.
6. Excel uses a default Alpha value of 0.05, which is usually a good value.
Alpha is the significance level. Change this value only when you have a
specific reason for doing so.
7. Click OK.
For the example data, your popup should look like the image below:

Interpreting the Two-Sample t-Test Results

The output indicates that the mean for Method A is 71.50362 and for Method B it is 84.74241.
Looking in the Variances row, we can see that they are not exactly equal, but they are close
enough to assume equal variances. The p-value is the most important statistic; the other
statistics reported include the t Stat (i.e., the t-value), df (degrees of freedom), and the
t Critical values.

If the p-value is less than your significance level, the difference between means is statistically
significant. Excel provides p-values for both one-tailed and two-tailed t-tests.

For our results, we’ll use P(T<=t) two-tail, which is the p-value for the two-tailed form of the
t-test. Because our p-value (0.000336) is less than the standard significance level of 0.05, we
can reject the null hypothesis. Our sample data support the hypothesis that the population
means are different. Specifically, Method B’s mean is greater than Method A’s mean.
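The same test can be cross-checked in Python with SciPy; the two score lists below are hypothetical stand-ins for the Method A and Method B columns:

# Two-sample t-test assuming equal variances (SciPy's default)
from scipy import stats

method_a = [68, 74, 70, 76, 69, 72]       # hypothetical scores
method_b = [83, 88, 81, 90, 84, 86]

t_stat, p_value = stats.ttest_ind(method_a, method_b)
print(t_stat, p_value)                    # reject the null hypothesis if p_value < 0.05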
Paired t-Tests in Excel

Paired t-tests assess paired observations, which are often two measurements on the same
person or item.

Step-by-Step Instructions for Running the Paired t-Test in Excel

To perform a paired t-test in Excel, arrange your data into two columns so that each row
represents one person or item, as shown below. Note that the analysis does not use the
subject’s ID number.

1. In Excel, click Data Analysis on the Data tab.


2. From the Data Analysis popup, choose t-Test: Paired Two Sample for Means.
3. Under Input, select the ranges for both Variable 1 and Variable 2.
4. In Hypothesized Mean Difference, you’ll typically enter zero. This value is the null
hypothesis value, which represents no effect. In this case, a mean difference of zero
represents no difference between the two methods, which is no effect.
5. Check the Labels checkbox if you have meaningful variable labels in row 1. This option
helps make the output easier to interpret. Ensure that you include the label row in step #3.
6. Excel uses a default Alpha value of 0.05, which is usually a good value. Alpha is the
significance level. Change this value only when you have a specific reason for doing so.
7. Click OK.
For the example data, your popup should look like the image below:

Interpreting Excel’s Paired t-Test Results

The output indicates that the mean for the Pre-test is 97.06223 and for the Post-test it is 107.8346.

For our results, we’ll use P(T<=t) two-tail, which is the p-value for the two-tailed form of the
t-test. Because our p-value (0.002221) is less than the standard significance level of 0.05, we
can reject the null hypothesis. Our sample data support the hypothesis that the population
means are different. Specifically, the Post-test mean is greater than the Pre-test mean.
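As with the two-sample test, a Python cross-check is possible; the pre/post score lists are hypothetical and must be paired row by row:

# Paired t-test: each position in pre and post refers to the same subject
from scipy import stats

pre = [95, 98, 92, 104, 96, 99]           # hypothetical pre-test scores
post = [105, 110, 101, 112, 108, 109]     # hypothetical post-test scores

t_stat, p_value = stats.ttest_rel(pre, post)
print(t_stat, p_value)                    # reject the null hypothesis if p_value < 0.05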

T-test applications

 The t-test is used to compare the means of two samples, dependent or independent.
 It can also be used to determine whether a sample mean is different from an assumed mean.
 The t-test also has an application in determining the confidence interval for a sample mean.
Experiment No. 6

Title: Implementation of one-way and two-way ANOVA in Excel

Analysis of Variance (ANOVA) is a statistical method that compares the means (or averages) of
different groups by analyzing their variances. It is used in a range of scenarios to determine
whether there is any difference between the means of different groups.

Creating an ANOVA table with Excel

To perform a one-way ANOVA, follow these steps.

 Import your data set in any preferred Excel format.

 Go to the Data tab and click on the Data Analysis sub-tab. If you can’t find it, install the
Analysis ToolPak as described in Experiment 5.

 Select ANOVA: Single Factor and click OK.

 Click on the input range and highlight the dataset you want to use.
 You can decide if you want to view it in the same spreadsheet or another spreadsheet.

In our ANOVA table above, we analysed the sum of squares and the other ANOVA values. With
this, we can solve a one-way ANOVA using Microsoft Excel.
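The same one-way ANOVA can be cross-checked in Python; the three group samples below are hypothetical:

# One-way ANOVA: do the three group means differ?
from scipy import stats

group1 = [25, 30, 28, 36, 29]             # hypothetical observations per group
group2 = [45, 55, 29, 56, 40]
group3 = [30, 29, 33, 37, 27]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f_stat, p_value)                    # reject equal means if p_value < 0.05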


Two-Way ANOVA

In Excel, do the following steps:

1. Click Data Analysis on the Data tab.

2. From the Data Analysis popup, choose Anova: Two-Factor With Replication.

3. Under Input, select the ranges for all columns of data.


4. In Rows per sample, enter 20. This represents the number of observations per group.
5. Excel uses a default Alpha value of 0.05, which is usually a good value. Alpha is the significance level.
Change this value only when you have a specific reason for doing so.

6. Click OK.
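A sketch of a two-way ANOVA in Python using statsmodels; the file crop_data.csv and its columns (fertilizer, variety, yield_) are hypothetical:

# Two-way ANOVA with replication: two factors plus their interaction
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv('crop_data.csv')         # hypothetical columns: fertilizer, variety, yield_
model = ols('yield_ ~ C(fertilizer) * C(variety)', data=df).fit()
print(sm.stats.anova_lm(model, typ=2))    # F and p-values for each factor and the interaction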
Applications of ANOVA
Organizations use ANOVA to make decisions about which alternative to choose among many
possible options. For example, ANOVA can help to:

 Compare the yield of two different wheat varieties under three different
fertilizer brands.
 Compare the effectiveness of various social media advertisements on the sales of
a particular product.
 Compare the effectiveness of different lubricants in different types of vehicles.
Experiment No. 7

Title: Write steps for installing Hadoop on Windows 10

The Apache Hadoop software library is a framework that allows for the distributed
processing of large data sets across clusters of computers using simple programming
models. It is designed to scale up from single servers to thousands of machines, each
offering local computation and storage. Rather than rely on hardware to deliver high-
availability, the library itself is designed to detect and handle failures at the application
layer, so delivering a highly-available service on top of a cluster of computers, each of
which may be prone to failures.

Install Java

– Java JDK link to download:
https://www.oracle.com/java/technologies/javase-jdk8-downloads.html
– Extract and install Java in C:\Java
– Open cmd and type: javac -version (to verify the installation)

Download Hadoop

– https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

– extract to C:\Hadoop
1. Set the JAVA_HOME environment variable.
2. Set the HADOOP_HOME environment variable.
Configurations: -

Edit the file C:/Hadoop-3.3.0/etc/hadoop/core-site.xml, paste the following XML code into it, and save:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
======================================================

Rename “mapred-site.xml.template” to “mapred-site.xml”, edit the file C:/Hadoop-3.3.0/etc/hadoop/mapred-site.xml, paste the following XML code into it, and save:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
======================================================
Create folder “data” under “C:\Hadoop-3.3.0”
Create folder “datanode” under “C:\Hadoop-3.3.0\data”
Create folder “namenode” under “C:\Hadoop-3.3.0\data”

======================================================
Edit the file C:\Hadoop-3.3.0/etc/hadoop/hdfs-site.xml, paste the following XML code into it, and save:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop-3.3.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-3.3.0/data/datanode</value>
</property>
</configuration>
======================================================

Edit the file C:/Hadoop-3.3.0/etc/hadoop/yarn-site.xml, paste the following XML code into it, and save:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
======================================================

Edit the file C:/Hadoop-3.3.0/etc/hadoop/hadoop-env.cmd by replacing the line
“set JAVA_HOME=%JAVA_HOME%” with “set JAVA_HOME=C:\Java”.

======================================================
Testing:

– Open cmd and change directory to C:\Hadoop-3.3.0\sbin
– Type start-all.cmd
– Alternatively, start the namenode and datanode with: start-dfs.cmd
– Then start YARN with: start-yarn.cmd

Make sure these apps are running

– Hadoop Namenode
– Hadoop datanode
– YARN Resource Manager
– YARN Node Manager

Open: http://localhost:8088
======================================================

Hadoop installed Successfully…………

======================================================

Applications of Hadoop

Hadoop was developed by Doug Cutting and Michael J. Cafarella. It is managed by the Apache
Software Foundation and licensed under the Apache License 2.0. It is beneficial for big
businesses because it is based on cheap servers, requiring less cost to store and process
the data. Hadoop helps make better business decisions by providing a history of data and
various company records, so by using this technology a company can improve its business.
Hadoop does lots of processing over the data collected from the company to deduce results
which can help in making future decisions.
Experiment No. 8

Title: Working with Hadoop commands

Apache Hadoop is an open source framework that is used to efficiently store and process
large datasets ranging in size from gigabytes to petabytes of data. Instead of using one large
computer to store and process the data, Hadoop allows clustering multiple computers to
analyze massive datasets in parallel more quickly.

1. version

Hadoop HDFS version Command Usage:
hadoop version

Hadoop HDFS version Command Description:
The version command prints the Hadoop version.

2. mkdir

Hadoop HDFS mkdir Command Usage:
hadoop fs -mkdir <path>

Hadoop HDFS mkdir Command Description:
This command creates the directory in HDFS if it does not already exist.

3. ls

Hadoop HDFS ls Command Usage:
hadoop fs -ls <path>

Hadoop HDFS ls Command Description:
The ls command lists the files and directories present in HDFS at the given path.

4. put

Hadoop HDFS put Command Usage:
hadoop fs -put <localsrc> <dest>

Here in this example, we are copying localfile1 of the local file system to the Hadoop
filesystem.

Hadoop HDFS put Command Description:
The Hadoop fs shell command put is similar to copyFromLocal; it copies files or
directories from the local filesystem to the destination in the Hadoop filesystem.

5. copyFromLocal

Hadoop HDFS copyFromLocal Command Usage:
hadoop fs -copyFromLocal <localsrc> <hdfs-dest>

Hadoop HDFS copyFromLocal Command Example:
Here in this example, we are copying the ‘test1’ file present in the local file
system to the newDataFlair directory of Hadoop.

6. get

Hadoop HDFS get Command Usage:
hadoop fs -get <src> <localdst>

In this example, we are copying the ‘file’ of the Hadoop filesystem to the local file
system.

Hadoop HDFS get Command Description:
The Hadoop fs shell command get copies the file or directory from the Hadoop file system to
the local file system.

7. copyToLocal

Hadoop HDFS copyToLocal Command Usage:
hadoop fs -copyToLocal <hdfs-src> <localdst>

Here in this example, we are copying the ‘sample’ file present in the newDataFlair
directory of HDFS to the local file system.

Hadoop HDFS copyToLocal Command Description:
The copyToLocal command copies the file from HDFS to the local file system.

8. cat

Hadoop HDFS cat Command Usage:
hadoop fs -cat <path>

Here in this example, we are using the cat command to display the content of the ‘sample’
file present in the newDataFlair directory of HDFS.

Hadoop HDFS cat Command Description:
The cat command reads the file in HDFS and displays its content on the console (stdout).

9. mv

Hadoop HDFS mv Command Usage:
hadoop fs -mv <src> <dest>

Hadoop HDFS mv Command Description:
The HDFS mv command moves the files or directories from the source to a destination
within HDFS.

10. cp

Hadoop HDFS cp Command Usage:
hadoop fs -cp <src> <dest>

Hadoop HDFS cp Command Description:
The cp command copies a file from one directory to another directory within the HDFS.
Experiment No. 9

Title: Implementation of a word count example using MapReduce

In the MapReduce word count example, we find the frequency of each word. Here, the role
of the Mapper is to emit a (word, 1) pair for every word it reads, and the role of the
Reducer is to aggregate the values that share a common key. So, everything is represented
in the form of key-value pairs.
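For example, if the input file contains the single line “car bus car train car”, the Mapper emits (car, 1), (bus, 1), (car, 1), (train, 1), (car, 1), and the Reducer then aggregates these pairs into (car, 3), (bus, 1), (train, 1).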

package wordcount2;

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class Wordcount2 {

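// Mapper: splits each input line into tokens and emits (word, 1) for every token.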
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {


private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException,
InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}

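// Reducer: sums the counts received for each word and writes (word, total).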
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

public void reduce(Text key, Iterable<IntWritable> values, Context context)


throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}

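// Driver: configures the MapReduce job and submits it to the cluster.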
public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();
Job job = new Job(conf, "Wordcount2");
job.setJarByClass(Wordcount2.class);
job.setJobName("WordCounter");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.addInputPath(job, new Path(args[0]));


FileOutputFormat.setOutputPath(job, new Path(args[1]));

job.waitForCompletion(true);
}
}
Input File

Output File
Experiment No. 10

Title: Implementation of a MapReduce program to count the unique number of times a song is played, based on userid and trackid

package tempex;
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class Tempex {

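// Mapper: parses lines of the form userId|trackId|... and emits (trackId, userId).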
public static class UniqueListenersMapper extends Mapper<Object, Text, IntWritable, IntWritable> {
IntWritable trackId = new IntWritable();
IntWritable userId = new IntWritable();

public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {

String[] parts = value.toString().split("[|]");
trackId.set(Integer.parseInt(parts[1]));
userId.set(Integer.parseInt(parts[0]));
context.write(trackId, userId);
}
}

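// Reducer: collects the userIds for each track into a set (dropping duplicates)
// and emits (trackId, number of unique listeners).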
public static class UniqueListenersReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
public void reduce(IntWritable trackId, Iterable<IntWritable> userIds, Context context)
throws IOException, InterruptedException {
Set<Integer> userIdSet = new HashSet<Integer>();
for (IntWritable userId : userIds) {
userIdSet.add(userId.get());
}
IntWritable size = new IntWritable(userIdSet.size());
context.write(trackId, size);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
if (args.length != 2) {
System.err.println("Usage: uniquelisteners < in > < out >");
System.exit(2);
}
Job job = new Job(conf, "Unique listeners per track");
job.setJarByClass(Tempex.class);
job.setMapperClass(UniqueListenersMapper.class);
job.setReducerClass(UniqueListenersReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);

}
}

Input

Output:-
Applications/Advantages of MapReduce:

MapReduce is a programming paradigm that enables massive scalability across hundreds or
thousands of servers in a Hadoop cluster. As the processing component, MapReduce is the
heart of Apache Hadoop. The term "MapReduce" refers to two separate and distinct tasks that
Hadoop programs perform. The first is the map job, which takes a set of data and converts it
into another set of data, where individual elements are broken down into tuples (key/value
pairs).

The reduce job takes the output from a map as input and combines those data tuples into a
smaller set of tuples. As the sequence of the name MapReduce implies, the reduce job is
always performed after the map job.

MapReduce programming offers several benefits to help you gain valuable insights from
your big data:

 Scalability. Businesses can process petabytes of data stored in the Hadoop Distributed
File System (HDFS).
 Flexibility. Hadoop enables easier access to multiple sources of data and
multiple types of data.
 Speed. With parallel processing and minimal data movement, Hadoop offers
fast processing of massive amounts of data.
 Simplicity. Developers can write code in a choice of languages, including Java, C++ and
Python.
