DATA ANALYTICS LAB MANUAL
LIST OF EXPERIMENTS
S.NO. NAME
1. Introduction
2. To get the input from user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) in R.
3. To perform data import/export (CSV, XLS, TXT) operations using data frames in R.
4. To get the input matrix from user and perform Matrix Addition, Subtraction, Multiplication, Inverse Transpose and Division operations using vector concept in R.
9. To perform K-Means clustering operation and visualize it for the iris data set.
10. Write R script to diagnose any disease using KNN classification and plot the results.
11. To perform market basket analysis using Association Rules (Apriori).
One of the principal attractions of using the R environment is the ease with which users can write their own programs and custom functions. The R programming syntax is simple to learn, even for users with no prior programming experience. Once the basic R control structures are understood, users can use the R language as a powerful environment to perform complex custom analyses of almost any type of data.
Several code editors are available that provide functionalities like R syntax highlighting, auto code
indenting and utilities to send code/functions to the R console.
History of R:
R is a programming language and free software environment for statistical computing and graphics that is
supported by the R Foundation for Statistical Computing. The R language is widely used among
statisticians and data miners for developing statistical software and data analysis.
R is an implementation of the S programming language combined with lexical scoping semantics inspired
by Scheme. S was created by John Chambers in 1976, while at Bell Labs. There are some important
differences, but much of the code written for S runs unaltered.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is
currently developed by the R Development Core Team, of which Chambers is a member. R is named partly
after the first names of the first two R authors and partly as a play on the name of S. The project was
conceived in 1992, with an initial version released in 1995 and a stable beta version in 2000.
R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, and others. R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages.
R is an interpreted language; users typically access it through a command-line interpreter. If a user types 2+2 at the R command prompt and presses enter, the computer replies with 4, as shown below:
>2+2
[1] 4
Features of R:
As stated earlier, R is a programming language and software environment for statistical analysis, graphics representation and reporting. The following are the important features of R –
R is a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
R has an effective data handling and storage facility.
R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
R provides a large, coherent and integrated collection of tools for data analysis.
R provides graphical facilities for data analysis and display, either directly on the screen or printed on paper.
Getting R:
R can be downloaded from one of the CRAN (Comprehensive R Archive Network) sites http://cran.us.r-
projects.org/. Look in the download and install area. Then, download it according to your OS either Mac
OS or Windows.
What is RStudio?
If you click on the R program you just downloaded, you will find a very basic user interface. For example,
below is what I get on Windows.
Getting RStudio:
Open up RStudio. You should see the interface shown in the figure below which has three windows:
R Data Types:
Let’s now explore what R can do. R is really just a big fancy calculator. For example, type in the
following mathematical expression next to the > in the R console (left window)
1. Characters: used to represent words or letters in R. Anything with quotes will be interpreted as a character.
2. Logicals: take two values, FALSE or TRUE. They are usually constructed with comparison operators.
3. Numeric: separated into two types: integer and double.
4. Factors: are sort of like characters, but not quite. A factor is actually a numeric code with character-valued levels.
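A minimal sketch illustrating the four data types (the object names are illustrative; typeof() reports how R stores each value):
w <- "hello"        # character: anything in quotes
typeof(w)           # "character"
l <- 3 > 5          # logical: built with a comparison operator
typeof(l)           # "logical"
n <- 4.5            # numeric (double by default)
typeof(n)           # "double"
i <- 7L             # the L suffix makes an integer
typeof(i)           # "integer"
f <- factor(c("low", "high", "low"))  # factor: numeric codes with levels
levels(f)           # "high" "low"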
R Data Structures:
Vectors: are the most common and basic R data structure. A vector is simply a sequence of values, which can be of any data type but all of the same type. There are a number of ways to create a vector depending on the data type, but the most common is to insert the data you want to save in a vector into the command c(), for example c(4, 16, 9).
This code does not actually save the values 4, 16, 9; it just presents them on the screen in a vector. If you want to use these values again, you can save them in a data object. You assign data to an object using the arrow sign <-. This will create an object in R's memory that can be called back into the command window at any time.
You should see the object b pop up in the Environment tab on the top right window of your RStudio
Interface.
The type property indicates the data type that the vector is holding. Use the command typeof() to
determine the type.
The command length() determines the number of data values that the vector is storing
You can also coerce a vector of one data type to another. For example, save the value “1” and “2” (both
in quotes) into a vector named x1.
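Putting this together, a short sketch (the object names b and x1 follow the text; the commented results are what R returns):
b <- c(4, 16, 9)     # save the values in an object
b                    # [1]  4 16  9
typeof(b)            # "double"
length(b)            # 3
x1 <- c("1", "2")    # character vector
as.numeric(x1)       # coerce to numeric: [1] 1 2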
Data Frames: are even higher level data structures. They store vectors of the same length.
Create a vector called v2 storing the values 5, 12, 25. We can create a data frame using the command
data.frame() storing the vectors v1 and v2 as columns
We can store different types of vectors in a data frame. For higher level data structures like a data
frame, use the function class() to figure out what kind of object you are working with. We can’t use
length() on a data frame as it has more than one vector. Instead, it has dimensions – the number of rows
and columns. We can find column names by using the command colnames(). Moreover, we can extract
columns from data frames by referring to their names using the $ sign. We can also extract data from
data frames using brackets [ , ].
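A sketch of these operations (v1 is assumed to hold the values from the earlier vector example, since its creation is not shown above):
v1 <- c(4, 16, 9)          # assumed from the vector example above
v2 <- c(5, 12, 25)
df <- data.frame(v1, v2)   # the vectors become columns
class(df)                  # "data.frame"
dim(df)                    # 3 rows, 2 columns
colnames(df)               # "v1" "v2"
df$v2                      # extract a column by name
df[1, 2]                   # extract by [row, column]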
Functions are also known as commands. An R function is a packaged recipe that converts one or more inputs (called arguments) into a single output. You execute most of your tasks in R using functions. Every function in R will have the following basic format:
function_name(argument1, argument2, ...)
Let's use the function seq(), which makes regular sequences of numbers. You can find out what a function does and its options by calling up its help documentation by typing ? and the function name. The help documentation should pop up in the bottom right window of your RStudio interface.
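For example (the argument values here are illustrative):
?seq                             # open the help page for seq()
seq(from = 1, to = 10, by = 2)   # [1] 1 3 5 7 9
seq(0, 1, length.out = 5)        # 5 evenly spaced values from 0 to 1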
In running the few lines of code above, you work directly in the R console and issue commands in an interactive way. That is, you type a command after >, you hit enter/return, R responds, you type the next command, hit enter, R responds, and so on.
So instead of writing the command directly into the console, you should write it in a script. The process
is as follows: Type your command in the script. Run the code from the script. R responds. You get
results. You can write two commands in a script. Run both simultaneously. R responds. You get results.
This is the basic flow.
To get the input from user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) in R.
PROCEDURE:
1. max() function in R language is used to find the maximum element present in an object. This object can
be a vector, a list, a data frame etc.
Syntax: max(object, na.rm)
Parameters:
object: vector, matrix, list, data frame, etc.
na.rm: Boolean value to remove NA elements.
2. min() function in R language is used to find the minimum element present in an object. This object can be
a vector, a list, a data frame etc.
Syntax: min(object, na.rm)
Parameters:
object: vector, matrix, list, data frame, etc.
na.rm: Boolean value to remove NA elements.
5. sqrt() function in R language is used to find the square root of an individual number or an expression.
Syntax: sqrt(numeric_expression)
Parameters:
numeric_expression: it can be a numeric value or a valid numerical expression for which you want to find the square root in R.
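A minimal sketch of the experiment. readline() is one way to take user input; the comma-separated prompt and the as.numeric() conversion are assumptions:
# read a comma-separated list of numbers from the user
input <- readline(prompt = "Enter numbers separated by commas: ")
x <- as.numeric(strsplit(input, ",")[[1]])
max(x, na.rm = TRUE)    # maximum
min(x, na.rm = TRUE)    # minimum
mean(x, na.rm = TRUE)   # average
sum(x, na.rm = TRUE)    # sum
sqrt(x)                 # square root of each element
round(sqrt(x), 2)       # rounded to 2 decimal places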
To perform data import/export (CSV, XLS, TXT) operations using data frames in R.
PROCEDURE:
1. Export Data-
There are numerous methods for exporting R objects into other formats. For SPSS, SAS and Stata, you will need to load the foreign package. For Excel, you will need the xlsx package.
a. To a CSV file- Firstly, you're required to create a data frame. Then, we will export our data frame into a CSV file.
Syntax: write.csv(df, path)
Parameters:
df: dataset to save.
path: a string. Set the destination path: path + filename + extension.
b. To an XLS/XLSX file- the xlsx package provides write.xlsx() to export a data frame to an Excel file.
library(xlsx)
write.xlsx(df, "X.xlsx")
c. To a TXT file- the basic function write.table() can be used to export a data frame to a txt file.
Syntax: write.table(x, file)
Parameters:
x: a matrix or a data frame to be written.
file: a character string specifying the name of the result file.
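A small sketch tying the export functions together (the data frame df and the file names are illustrative):
df <- data.frame(id = 1:3, score = c(90, 85, 78))             # example data frame
write.csv(df, "scores.csv", row.names = FALSE)                # CSV export
write.table(df, "scores.txt", sep = "\t", row.names = FALSE)  # TXT export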
2. Import Data-
a. From SPSS-
#save SPSS dataset in transport format (run in SPSS)
get file='c:\mydata.sav'
export outfile='c:\mydata.por'
#in R
library(Hmisc)
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
#last option converts value labels to R factors
b. From SAS-
#in R
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
#character variables are converted to R factors
c. From Stata-
d. From systat-
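The original code for these two is not shown; a minimal sketch assuming the foreign package, whose read.dta() and read.systat() functions read Stata and Systat files (file paths illustrative):
library(foreign)
mydata <- read.dta("c:/mydata.dta")      # from Stata
mydata <- read.systat("c:/mydata.syd")   # from systat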
To get the input matrix from user and perform Matrix Addition, Subtraction, Multiplication, Inverse Transpose
and Division operations using vector concept in R.
PROCEDURE:
Matrices in R are a bunch of values, either real or complex numbers, arranged in a group of fixed number of
rows and columns. Matrices are used to depict the data in a structured and well-organized format.
It is necessary to enclose the elements of a matrix in parentheses or brackets.
This matrix [M] has 3 rows and 3 columns. Each element of matrix [M] can be referred to by its row and
column number.
Syntax: matrix(data, nrow, ncol, byrow)
Parameters:
data: the input vector of elements.
nrow, ncol: the desired number of rows and columns.
byrow: logical value. If TRUE, the input vector elements are arranged by row; otherwise by column.
1. Matrix addition:
Step1: Creating first matrix
Step2: Creating second matrix
Step3: Getting number of rows and columns
Step4: Creating matrix to store results
Step5: Printing original matrices
Step6: Calculating sum of matrices
Step7: Printing resultant matrix
3. Matrix multiplication:
Step1: Creating first matrix
Step2: Creating second matrix
Step3: Getting number of rows and columns
Step4: Creating matrix to store results
Step5: Printing original matrices
Step6: Calculating product of matrices
Step7: Printing resultant matrix
4. Matrix division:
Step1: Creating first matrix
Step2: Creating second matrix
Step3: Getting number of rows and columns
Step4: Creating matrix to store results
Step5: Printing original matrices
Step6: Calculating element-wise quotient of matrices
Step7: Printing resultant matrix
5. Inverse transpose:
Step1: Create 3 different vectors using combine method
Step2: Bind the three vectors into a matrix using rbind() which is basically row-wise binding
Step3: Print the original matrix
Step4: Use the solve() function to calculate the inverse
Step5: Print the inverse of the matrix
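A minimal sketch of the whole experiment with 3x3 matrices entered by the user via scan(); the element counts and prompts are assumptions:
# read 9 elements for each 3x3 matrix from the user
cat("Enter 9 elements for matrix A:\n")
A <- matrix(scan(n = 9), nrow = 3, byrow = TRUE)
cat("Enter 9 elements for matrix B:\n")
B <- matrix(scan(n = 9), nrow = 3, byrow = TRUE)
A + B      # addition
A - B      # subtraction
A %*% B    # matrix multiplication
A / B      # element-wise division
t(A)       # transpose
solve(A)   # inverse (A must be non-singular)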
To perform statistical operations (Mean, Median, Mode and Standard Deviation) using R.
PROCEDURE:
Statistical analysis in R is performed by using many in-built functions. Most of these functions are part of the
R base package. These functions take R vector as an input along with the arguments and give the result.
1. Mean- It is calculated by taking the sum of the values and dividing by the number of values in a data series.
Syntax: mean(x, trim = 0, na.rm = FALSE, ...)
Parameters:
x: input vector
trim: used to drop some observations from both ends of the sorted order
na.rm: used to remove the missing values from the input vector
2. Median- The middle most value in a data series is called the median. The median() function is used in R
to calculate this value.
Syntax: median(x, na.rm = FALSE)
Parameters:
x: input vector
na.rm: used to remove the missing values from the input vector
4. Standard Deviation- A measure that is used to quantify the amount of variation or dispersion of a set of
data values.
Syntax: sd(x, na.rm = FALSE)
Parameters:
x: input vector
na.rm: used to remove the missing values from the input vector
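A sketch covering all four statistics. Note that R's built-in mode() reports the storage mode of an object, not the statistical mode, so a small helper is written here; the helper name getmode is an assumption:
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5, 12)
mean(x)      # arithmetic mean
median(x)    # middle value
sd(x)        # standard deviation
# statistical mode: the most frequent value
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
getmode(x)   # 12, since it occurs twice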
To perform data pre-processing operations i) Handling Missing data ii) Min-Max normalization
PROCEDURE:
i) In R missing values are represented by NA (not available). Impossible values are represented by the symbol NaN (not a number). Unlike SAS, R uses the same symbol for character and numeric data.
Testing for Missing Values:
is.na(x) #returns TRUE if x is missing
y <- c(1, 2, 3, NA)
is.na(y) #returns a vector (F F F T)
The function na.omit() returns the object with listwise deletion of missing values.
#create new dataset without missing data
newdata <- na.omit(mydata)
ii) Min-max normalization subtracts the minimum value of an attribute from each value of the attribute and then divides the difference by the range of the attribute. These new values are multiplied by the new range of the attribute and finally added to the new minimum value of the attribute. These operations transform the data into a new range, generally [0,1].
Syntax: mmnorm(data, minval = 0, maxval = 1)
Parameters:
data: the dataset to be normalized, including classes
minval: the minimum value of the transformed range
maxval: the maximum value of the transformed range
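A minimal sketch implementing min-max normalization directly, so it runs without the package providing mmnorm(); the helper name minmax and the sample values are assumptions:
# transform a numeric vector into the range [minval, maxval]
minmax <- function(v, minval = 0, maxval = 1) {
  (v - min(v)) / (max(v) - min(v)) * (maxval - minval) + minval
}
minmax(c(10, 20, 50, 100))   # [1] 0.000 0.111 0.444 1.000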
To perform dimensionality reduction operation using PCA for the Houses Data Set
PROCEDURE:
In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing
the number of random variables under consideration, via obtaining a set of principal variables. It can be
divided into feature selection and feature extraction.
Principal component analysis (PCA) is routinely employed on a wide range of problems. From the detection of
outliers to predictive modeling, PCA has the ability of projecting the observations described by p variables into
few orthogonal components defined. It is an unsupervised method, meaning it will always look into the
greatest sources of variation regardless of the data structure.
Step 2: Standardize the data by using scale and apply the prcomp function.
From the summary, we can understand that PC1 explains 62% of the variance, PC2 explains 24%, and so on. Usually, the principal components which together explain about 95% of the variance can be considered for models. The summary also yields the cumulative proportion of the principal components.
It is best to plot the PCA using various types of scree plot; the pcaCharts function declared earlier invokes various forms of scree plot.
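A minimal sketch of the PCA steps. The Houses data set itself is not bundled with R, so the built-in USArrests data stands in here as an assumption:
data(USArrests)                          # stand-in for the Houses data set
pca <- prcomp(USArrests, scale. = TRUE)  # scale. = TRUE standardizes the data
summary(pca)                             # proportion and cumulative proportion of variance
screeplot(pca, type = "lines")           # scree plot of the component variances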
PROCEDURE:
Linear Regression:
It is a commonly used type of predictive analysis. It is a statistical approach for modelling the relationship between a dependent variable and a given set of independent variables.
There are two types of linear regression:
1. Simple Linear Regression- uses only one independent variable.
2. Multiple Linear Regression- uses two or more independent variables.
Simple Linear Regression:
The dataset contains observations about income (in a range of $15k to $75k) and happiness (rated on a scale of 1 to 10) in an imaginary sample of 500 people. The income values are divided by 10,000 to make the income data match the scale of the happiness scores (so a value of $2 represents $20,000, $3 is $30,000, etc.).
First of all, you are required to install the packages you need for the analysis,
install.packages("ggplot2")
install.packages("dplyr")
install.packages("broom")
install.packages("ggpubr")
As both our variables are quantitative, running summary(income.data) shows a table in our console with a numeric summary of the data. This tells us the minimum, median, mean, and maximum values of the independent variable (income) and dependent variable (happiness):
3. Linearity- The relationship between the independent and dependent variable must be linear. We can test
this visually with a scatter plot to see if the distribution of data points could be described with a straight
line.
plot(happiness ~ income, data = income.data)
Note that the par(mfrow) command will divide the Plots window into the number of rows and columns specified in the parentheses. So par(mfrow=c(2,2)) divides it up into two rows and two columns. To go back to plotting one graph in the entire window, set the parameters again and replace the (2,2) with (1,1).
Residuals are the unexplained variance. They are not exactly the same as model error, but they are calculated from it, so seeing a bias in the residuals would also indicate a bias in the error. The most important thing to look for is that the red lines representing the mean of the residuals are all basically horizontal and centered on zero. This means there are no outliers or biases in the data that would make a linear regression invalid.
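A minimal sketch of the full workflow described above (the file name income.data.csv is an assumption; the model formula follows the text):
income.data <- read.csv("income.data.csv")            # load the sample data
summary(income.data)                                  # numeric summary of both variables
income.happiness.lm <- lm(happiness ~ income, data = income.data)
summary(income.happiness.lm)                          # coefficients, R-squared, p-values
par(mfrow = c(2, 2))
plot(income.happiness.lm)                             # residual diagnostic plots
par(mfrow = c(1, 1))                                  # restore single-plot layout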
To perform K-Means clustering operation and visualize it for the iris data set
PROCEDURE:
The Iris dataset contains the data for 50 flowers from each of the 3 species - Setosa, Versicolor and Virginica.
The data gives the measurements in centimeters of the variables sepal length and width and petal length and
width for each of the flowers.
Goal of the study is to perform exploratory analysis on the data and build a K-means clustering model to
cluster them into groups. Here we have assumed we do not have the species column to form clusters and then
used it to check our model performance.
First of all, install the package ggplot2 if needed, then load it:
library(ggplot2)
The dataset has 150 observations, equally distributed among the three species - Setosa, Versicolor and Virginica. The table below shows the summary statistics of all 4 variables.
summary(iris)
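The plot command that follows needs k.max and wss defined first; a sketch computing the within-cluster sum of squares for k = 1..10 (the cap of 10 and the use of the four numeric columns are assumptions):
iris_data <- iris[, 1:4]   # drop the Species column before clustering
k.max <- 10
wss <- sapply(1:k.max, function(k) {
  kmeans(iris_data, centers = k, nstart = 20)$tot.withinss
})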
plot(1:k.max, wss, type = "b", xlab = "Number of clusters (k)", ylab = "Within-cluster sum of squares")
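The elbow in this plot suggests k = 3, matching the three species; a sketch of fitting the final model and checking it against the held-out Species column (the seed value is arbitrary):
set.seed(123)                     # for reproducible cluster assignments
km <- kmeans(iris_data, centers = 3, nstart = 20)
table(km$cluster, iris$Species)   # compare clusters with the true species
ggplot(iris, aes(Petal.Length, Petal.Width, color = factor(km$cluster))) +
  geom_point()                    # visualize the clusters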
Write R script to diagnose any disease using KNN classification and plot the results.
PROCEDURE:
Machine learning finds extensive usage in the pharmaceutical industry, especially in the detection of oncogenic (cancer cell) growth. R finds application in machine learning to build models to predict the abnormal growth of cells, thereby helping in the detection of cancer and benefiting the health system.
Let’s see the process of building this model using KNN algorithm in R Programming.
We will use a data set of 100 patients (created solely for the purpose of practice) to implement the KNN algorithm and thereby interpret the results.
The data set consists of 100 observations and 10 variables: 8 numeric variables, one categorical variable (the diagnosis result) and an ID. The numeric variables are as follows:
1. Radius
2. Texture
3. Perimeter
4. Area
5. Smoothness
6. Compactness
7. Symmetry
8. Fractal dimension
In real life, there are dozens of important parameters needed to measure the probability of cancerous growth
but for simplicity purposes let’s deal with 8 of them.
Let’s make sure that we understand every line of code before proceeding to the next stage:
setwd("C:/Users/Payal/Desktop/KNN") #this command points R to the folder containing the required Prostate_Cancer.csv data file; it does not itself import the file.
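The import step itself is not shown; a sketch of it (the file name comes from the comment above; stringsAsFactors = FALSE keeps the text column as plain characters):
prc <- read.csv("Prostate_Cancer.csv", stringsAsFactors = FALSE)  #import the data file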
str(prc) #we use this command to see whether the data is structured or not.
We find that the data is structured with 10 variables and 100 observations. If we observe the data set, the first
variable ‘id’ is unique in nature and can be removed as it does not provide useful information.
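A one-line sketch of that removal:
prc <- prc[-1]   #drop the first column (id)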
The data set contains patients who have been diagnosed with either Malignant (M) or Benign (B) cancer.
The variable diagnosis_result is our target variable, i.e. this variable will determine the results of the diagnosis based on the 8 numeric variables.
In case we wish to rename B as "Benign" and M as "Malignant" and see the results in percentage form, we may write as:
prc$diagnosis <- factor(prc$diagnosis_result, levels = c("B", "M"), labels = c("Benign", "Malignant"))
round(prop.table(table(prc$diagnosis)) * 100, digits = 1) #it gives the result in percentage form rounded off to 1 decimal place (and so it's digits = 1)
Once we run this code, we are required to normalize the numeric features in the data set. Instead of
normalizing each of the 8 individual variables we use:
The first variable in our data set (after removal of id) is 'diagnosis_result', which is not numeric in nature. So, we start from the 2nd variable. The function lapply() applies normalize() to each feature in the data frame. The final result is stored in the prc_n data frame using the as.data.frame() function.
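A sketch of this normalization step (the helper name normalize matches the text's description; columns 2-9 are the numeric features after the id removal):
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))   #rescale to [0, 1]
}
prc_n <- as.data.frame(lapply(prc[2:9], normalize))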
Let’s check using the variable ‘radius’ whether the data has been normalized.
summary(prc_n$radius)
The results show that the data has been normalized. Do try with the other variables such as perimeter, area etc.
The KNN algorithm is applied to the training data set and the results are verified on the test data set.
For this, we would divide the data set into 2 portions in the ratio of 65: 35 (assumed) for the training and test
data set respectively. You may use a different ratio altogether depending on the business requirement!
We shall divide the prc_n data frame into prc_train and prc_test data frames
prc_train <- prc_n[1:65,]
prc_test <- prc_n[66:100,]
A blank value in each of the above statements indicates that all rows and columns should be included.
Our target variable is 'diagnosis_result', which we have not included in our training and test data sets.
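The target labels are instead kept in separate vectors; a sketch, assuming column 1 of prc holds diagnosis_result after the id removal:
prc_train_labels <- prc[1:65, 1]    #diagnosis_result for the training rows
prc_test_labels <- prc[66:100, 1]   #diagnosis_result for the test rows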
The knn() function needs to be used to train a model for which we need to install a package ‘class’. The knn() function
identifies the k-nearest neighbors using Euclidean distance where k is a user-specified number.
install.packages("class")
library(class)
Now we are ready to use the knn() function to classify the test data.
The value for k is generally chosen as the square root of the number of observations.
knn() returns a factor value of predicted labels for each of the examples in the test data set which is then
assigned to the data frame prc_test_pred.
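A sketch of the classification call (k = 10 since sqrt(100) = 10, per the text):
prc_test_pred <- knn(train = prc_train, test = prc_test,
                     cl = prc_train_labels, k = 10)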
install.packages("gmodels")
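The gmodels package provides CrossTable() to evaluate the predictions against the true labels; a sketch:
library(gmodels)
CrossTable(x = prc_test_labels, y = prc_test_pred, prop.chisq = FALSE)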
The test data consisted of 35 observations, out of which 5 cases have been accurately predicted (TN -> True Negatives) as Benign (B) in nature, which constitutes 14.3%. Also, 16 out of 35 observations were accurately predicted (TP -> True Positives) as Malignant (M) in nature, which constitutes 45.7%. Thus, a total of 16 out of 35 predictions were TP, i.e. True Positive, in nature.
There were no cases of False Negatives (FN), meaning no cases were recorded which actually are malignant in nature but got predicted as benign. FNs, if any, pose a potential threat for this very reason, and the main focus in increasing the accuracy of the model is to reduce FNs.
There were 14 cases of False Positives (FP) meaning 14 cases were actually benign in nature but got predicted
as malignant.
The total accuracy of the model is 60% ((TN+TP)/35 = 21/35), which shows that there may be chances to improve the model performance.
To perform market basket analysis using Association Rules (Apriori).
PROCEDURE:
Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between
items. It works by looking for combinations of items that occur together frequently in transactions. To put it
another way, it allows retailers to identify relationships between the items that people buy.
Association Rules are widely used to analyze retail basket or transaction data, and are intended to identify
strong rules discovered in transaction data using measures of interestingness, based on the concept of strong
rules.
In retailing, most purchases are bought on impulse. Market basket analysis gives clues as to what a customer might have bought if the idea had occurred to them. As a first step, therefore, market basket analysis can be used in deciding the location and promotion of goods inside a store. If, as has been observed, purchasers of Barbie dolls are more likely to buy candy, then high-margin candy can be placed near the Barbie doll display. Customers who would have bought candy with their Barbie dolls had they thought of it will now be suitably tempted.
Association Rules:
There are many ways to see the similarities between items. These are techniques that fall under the general
umbrella of association. The outcome of this type of technique, in simple terms, is a set of rules that can be
understood as “if this, then that”.
Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were
purchased. The receipt is a representation of stuff that went into a customer’s basket — and therefore ‘Market
Basket Analysis’.
That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1
receipt and the items purchased. Each line is called a transaction and each column in a row represents an item.
We can load the data and view the most frequent items with itemFrequencyPlot():
library(arules)
data(Groceries)
itemFrequencyPlot(Groceries, topN = 20, type = "absolute")
We are now ready to mine some rules! You will always have to pass the minimum required support and confidence. Here we set the minimum support to 0.001 and the minimum confidence to 0.8, and cap the rule length at 3:
rules <- apriori(Groceries, parameter = list(sup = 0.001, conf = 0.8, maxlen = 3))
The summary of the rules gives information on the data mined: the total rules mined and the minimum parameters.
The first issue we see here is that the rules are not sorted. Often we will want the most relevant rules first. Let's say we wanted to have the most likely rules: we can easily sort by confidence by executing the following code, after which the top 5 output will be sorted by confidence and the most relevant rules appear first.
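A sketch of the sorting and inspection (inspect() prints rules; the top-5 slice follows the text):
rules <- sort(rules, by = "confidence", decreasing = TRUE)
inspect(rules[1:5])   #show the five highest-confidence rules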
Redundancies
Sometimes, rules will repeat. Redundancy indicates that one item might be a given. As an analyst you can elect to drop the item from the dataset. Alternatively, you can remove the redundant rules generated. We can eliminate these repeated rules using the following snippet of code:
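The snippet itself is not shown above; a minimal sketch using the arules helper is.redundant():
rules <- rules[!is.redundant(rules)]   #keep only the non-redundant rules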
Targeting items:
Now that we know how to generate rules and limit the output, let's say we wanted to target items to generate rules. There are two types of targets we might be interested in, illustrated with the example of "whole milk" (a sketch of the first case follows the list):
1. What are customers likely to buy before buying whole milk?
2. What are customers likely to buy if they purchase whole milk?
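For the first case, the appearance argument of apriori() restricts the right-hand side of the rules to whole milk; the parameter values here are illustrative:
rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.8),
                 appearance = list(default = "lhs", rhs = "whole milk"))
inspect(head(sort(rules, by = "confidence"), 5))   #top 5 rules leading to whole milk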
Visualization
The last step is visualization. Let's say you wanted to map out the rules in a graph. We can do that with another library called "arulesViz".
library(arulesViz)
plot(rules, method = "graph", interactive = TRUE, shading = NA)
You will get a nice graph that you can move around to look like this: