BDA Chapter6

Big Data Analysis

Chapter 6: Big Data Analysis

Data Analytics with R
By
Prof. Mukhtar Ansari
Department of Computer Engineering.
AIKTC – Anjuman-I-Islam’s Kalsekar Technical Campus.
Chapter 5: Data Analytics with R
• Exploring Basic features of R,
• Exploring RGUI,
• Exploring RStudio,
• Handling Basic Expressions in R,
• Variables in R,
• Working with Vectors,
• Storing and Calculating Values in R,
• Creating and using Objects,
• Interacting with users,
• Handling data in R workspace,
• Executing Scripts,
• Creating Plots,
• Accessing help and documentation in R
• Reading datasets and Exporting data from R,
• Manipulating and Processing Data in R,
• Using functions instead of script,
• built-in functions in R
• Data Visualization: Types, Applications
What is R?
• R is a programming language and free software environment created by Ross Ihaka and Robert Gentleman in the early 1990s.

• The language was named R after the first letter of the first names of its two authors (Robert Gentleman and Ross Ihaka).

• R is an open-source programming language used mostly for statistical computing and data analysis. It is available on widely used platforms such as Windows, Linux, and macOS.

• It generally comes with a command-line interface and provides a vast list of packages for performing tasks.
• R is an interpreted language that supports both procedural and object-oriented programming.
A.I. Kalsekar Technical Campus, New Panvel
Why R Programming Language?

Why R ?
• R is a leading tool for machine learning, statistics, and data analysis. Objects, functions, and packages can easily be created in R.
• It is a platform-independent language, so it can be used on all major operating systems.
• It is a free, open-source language, so anyone can install it in any organization without purchasing a license.
• R is not only a statistics package; it also integrates with other languages (C, C++), so you can easily interact with many data sources and statistical packages.
• R has a vast community of users, and it is growing day by day.
• R is currently one of the most requested programming languages in the data science job market.


History of R
The initial version of R, known as R 0.16, was released in 1995.

R quickly gained popularity among statisticians, data analysts, and researchers due to its flexibility, extensibility, and powerful statistical capabilities.

The R language provides a wide range of statistical and graphical techniques, including linear and nonlinear modeling, time series analysis, clustering, and more.

The project was first conceived in 1992. The initial version was released in 1995, and in 2000 the first stable version (1.0.0) was released.

The latest version at the time of writing, R 4.3.1, was released on 16-06-2023.


1991 Created in New Zealand by Ross Ihaka and Robert Gentleman.

1993 August: first announcement of R to the public.

1995 Martin Mächler convinces Ross and Robert to use the GNU General Public License to make R free software.

1996 A public mailing list is created.

1997 The R Core Group is formed. The core group controls the source code for R.

2000 R version 1.0.0 was released.

2013 R version 3.0.2 was released in December.

2014-16 R versions 3.2.x to 3.3.x

2017 R version 3.4.0 was released in April.

2023 R version 4.3.1 was released in June.
Features of the R Programming Language

R is a programming language that supports both procedural and object-oriented programming.

R can be easily integrated with many other technologies and frameworks such as Hadoop and HDFS. It can also integrate with other programming languages such as C, C++, Python, Java, FORTRAN, and JavaScript.

Open-source free language: anyone can install it in any organization without purchasing a license.

R packages: One of the major features of R is its wide availability of libraries. CRAN (the Comprehensive R Archive Network) is a repository holding more than 15,000 packages.

Powerful graphics: R can produce publication-quality graphs and plots of many kinds with its base package. Added packages such as ggplot2 and plotly extend the possibilities further.

No need for a compiler: The R language is interpreted. It does not need a compiler to convert the code into a program.

Cross-platform support: R runs on all major operating systems and in a wide range of software environments.

Performs fast calculations: You can perform a wide variety of complex operations on vectors, arrays, data frames, and other data objects of varying sizes.

Vast community of users: The R programming language has a vast community of users, and it is growing day by day.


Programming in R:
• R's syntax is similar to that of other widely used languages, which makes it easier to learn and to code in.
• Programs can be written in any of the widely used IDEs such as RStudio, Rattle, Tinn-R, etc. After writing the program, save the file with the extension .r.

• To run the program, use the following command on the command line:
Rscript file_name.r
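A script can also be executed from inside a running R session. The following is a minimal sketch, assuming a throwaway script written to a temporary file; the names script_path and total are made up for illustration. source() is the base R function that reads a file and evaluates it.

```r
# Write a tiny script to a temporary file (hypothetical content for illustration)
script_path <- tempfile(fileext = ".r")
writeLines(c("total <- sum(1:10)",
             "print(total)"),
           script_path)

# source() reads the file and evaluates its expressions in the current session
source(script_path)

# Objects created by the script are now available in the workspace
total
```

From a terminal, the same file would be run with Rscript followed by its path.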


Advantages of R:

• R is one of the most comprehensive statistical analysis packages; new techniques and concepts often appear first in R.

• R is open source, so you can run it anywhere and at any time.

• R is cross-platform: it runs on GNU/Linux, Windows, and macOS.

• Everyone is welcome to contribute new packages, bug fixes, and code enhancements.


Disadvantages of R:
• The quality of some R packages is less than perfect.
• R commands give little thought to memory management, so R may consume all available memory.
• There is basically nobody to complain to if something doesn’t work.
• R can be much slower than other programming languages such as Python and MATLAB.


Exploring RGUI



Exploring RStudio

• RStudio is an integrated development environment (IDE) for R.

• The IDE is a GUI where you can write your code, see the results, and see the variables that are created during the course of programming.

• RStudio is available as both open-source and commercial software.

• RStudio is available in both Desktop and Server versions.

• RStudio is available for various platforms such as Windows, Linux, and macOS.


After installation, the RStudio interface looks like this:


• The Console panel (left) is where R waits for you to tell it what to do and where you see the results of the commands you type.
• At the top right is the Environment/History panel. It contains two tabs:
• Environment tab: shows the variables created during the course of programming in the current (temporary) workspace.
• History tab: shows all the commands used so far since you started using RStudio.
• At the bottom right is another panel with multiple tabs: Files, Plots, Packages, Help, and Viewer.
• The Files tab shows the files and directories available in the default workspace of R.
• The Plots tab shows the plots generated during the course of programming.
• The Packages tab lists the packages already installed and provides a user interface for installing new ones.
• The Help tab is the most important one: it gives access to the R documentation for the built-in functions.
• The last tab is the Viewer tab, which can be used to view local web content generated using R.
Basic Expressions in R

• "hello world"
• 100+200
• a <- 60
• b <- 68
• c = a+b
• c
• a<b
• a>b


Variables in R
• A variable in R can store an atomic vector, a group of atomic vectors, or a combination of many R objects. A valid variable name consists of letters, numbers, and the dot or underscore characters. The name must start with a letter, or with a dot that is not followed by a number.

Variable Name         Validity   Reason
var_name2.            valid      Has letters, numbers, dot and underscore.
var_name%             invalid    Has the character '%'. Only dot (.) and underscore are allowed.
2var_name             invalid    Starts with a number.
.var_name, var.name   valid      Can start with a dot (.), but the dot should not be followed by a number.
.2var_name            invalid    The starting dot is followed by a number, making it invalid.
_var_name             invalid    Starts with _, which is not valid.
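The naming rules above can be checked in code. This is a sketch rather than an official validator: it relies on the base function make.names(), which rewrites any invalid name into a valid one, so a name that comes back unchanged was already valid. The variable names candidates and is_valid are illustrative.

```r
# Candidate names taken from the table above
candidates <- c("var_name2.", "var_name%", "2var_name",
                ".var_name", "var.name", ".2var_name", "_var_name")

# make.names() repairs invalid names, so an unchanged name was already valid
is_valid <- make.names(candidates) == candidates

# Show each name next to the result of the check
data.frame(name = candidates, valid = is_valid)
```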


R - Data Types

Data Type   Example                       Verify                         Output
Logical     TRUE, FALSE                   v <- TRUE; print(class(v))     [1] "logical"
Numeric     12.3, 5, 999                  v <- 23.5; print(class(v))     [1] "numeric"
Integer     2L, 34L, 0L                   v <- 2L; print(class(v))       [1] "integer"
Complex     3 + 2i                        v <- 2+5i; print(class(v))     [1] "complex"
Character   'a', "good", "TRUE", '23.4'   v <- "TRUE"; print(class(v))   [1] "character"


Programming Exercises
1. Write an R program to create three vectors holding numeric data, character data, and logical data. Display the content of the vectors and their type.
2. The numbers below are the rainfall amounts for the first ten days of 1996. Read them into a vector using the c() function.
0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
a) What was the mean rainfall? What was the standard deviation?
b) Calculate the cumulative rainfall ("running total") over these ten days. Confirm that the last value of the resulting vector is equal to the total sum of the rainfall.
c) Which day saw the highest rainfall (write code to get the answer)?
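One possible way to approach exercise 2 is sketched below. It is not the only solution; the variable names rainfall and running_total are chosen here for illustration, and all functions used are base R.

```r
rainfall <- c(0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1)

# a) mean and (sample) standard deviation
mean(rainfall)
sd(rainfall)

# b) running total; its last value should match the overall sum
running_total <- cumsum(rainfall)
all.equal(running_total[length(running_total)], sum(rainfall))

# c) index (day number) of the largest value
which.max(rainfall)
```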



3. Write an R program to create a simple bar plot of the marks in four subjects.

marks = c(70, 95, 80, 74)

# Vertical bars: subjects on the x-axis, marks on the y-axis
barplot(marks, main = "Comparing marks of 4 subjects",
        xlab = "Subject", ylab = "Marks",
        names.arg = c("English", "Science", "Math.", "Hist."),
        col = "green", horiz = FALSE)

4. Write an R program to compute the sum, mean, and product of the elements of a given vector.

nums = c(10, 20, 30)
print('Original vector:')
print(nums)
print(paste("Sum of vector elements:", sum(nums)))
print(paste("Mean of vector elements:", mean(nums)))
print(paste("Product of vector elements:", prod(nums)))
5. Write an R program to list the distinct values in a given vector.
v = c(10, 10, 10, 20, 30, 40, 40, 40, 50)
print("Original vector:")
print(v)
print("Distinct values of the said vector:")
print(unique(v))

6. Write an R program to find the elements of a given vector that are not in another given vector.
a = c(0, 10, 10, 10, 20, 30, 40, 40, 40, 50, 60)
b = c(10, 10, 20, 30, 40, 40, 50)
print("Original vector-1:")
print(a)
print("Original vector-2:")
print(b)
print("Elements of a that are not in b:")
result = setdiff(a, b)
print(result)
7. Write an R program to reverse the order of a given vector.

v = c(0, 10, 10, 10, 20, 30, 40, 40, 40, 50, 60)
print("Original vector:")
print(v)
rv = rev(v)
print("The said vector in reverse order:")
print(rv)

8. Write an R program to concatenate the elements of a vector.

a = c("Python","NumPy", "Pandas")
print(a)
x = paste(a, collapse = "")
print("Concatenation of the said string:")
print(x)


9. Write an R program to add 3 to each element in a given vector. Print the original and new vector.

# c() silently drops NULL, so v holds 1 2 3 4
v = c(1, 2, NULL, 3, 4, NULL)
print("Original vector:")
print(v)
# Arithmetic is vectorized: 3 is added to every element
new_v = v + 3
print("New vector:")
print(new_v)


R - Data Types
• In any programming language, you use variables to store information. A variable is a name for a reserved memory location that stores a value, so creating a variable reserves some space in memory.
• In R, the variables are not declared as some data type. The variables are assigned with R-Objects
and the data type of the R-object becomes the data type of the variable. There are many types of R-
objects. The frequently used ones are −
• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames
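As a rough illustration of the six object types listed above, the sketch below creates one of each and asks class() what each variable currently holds. The variable names and values are invented for the example.

```r
v <- c(1.5, 2, 3)                                   # vector
l <- list(1, "a", TRUE)                             # list: elements may differ in type
m <- matrix(1:6, nrow = 2)                          # matrix: 2 rows x 3 columns
a <- array(1:24, dim = c(2, 3, 4))                  # array: 3 dimensions
f <- factor(c("low", "high", "low"))                # factor: categorical values
d <- data.frame(id = 1:3, name = c("x", "y", "z"))  # data frame: table of columns

# class() reports the type of the object; [1] keeps the first class name only
object_classes <- sapply(list(v, l, m, a, f, d), function(obj) class(obj)[1])
object_classes
```

Because the type belongs to the object and not to the variable, assigning a different object to v later would simply change what class(v) reports.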



Creating Plots
• data()
• data(cars)
• cars
• cars$speed
• cars$dist
• plot(cars$speed,cars$dist,xlab="speed",ylab = "Distance", main = "Cars speed and distance")
• barplot(BOD$demand,names.arg=BOD$Time,xlab = "Time", ylab = "Demand", main = "Biochemical Oxygen Demand",col="red",border="black")
• Rainfall_data<-c(18,23,29,24,12)
• month<-c("jun","july","aug","sept","oct")
• png(filename = "Bar chart.png")
• barplot(Rainfall_data,xlab="Month",ylab="Rainfall",main="Rainfall variation in monsoon season",names.arg=month,col="black",border="Red")
• dev.off()


Accessing help and documentation in R

• The help() function and ? help operator in R provide access to the documentation
pages for R functions, data sets, and other objects, both for packages in the
standard R distribution and for contributed packages.
• To access documentation for the standard lm (linear model) function, for example,
enter the command help(lm) or help("lm"), or ?lm or ?"lm" (i.e., the quotes are
optional).
• help()
• help(lm)
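Beyond help() and ?, a few other base R helpers are useful when the exact function name is not known. They are not mentioned in the slides above, so treat this as a supplementary sketch. The calls that open a viewer or print long output are left commented out.

```r
# Search the help system by keyword when the function name is unknown
# (the same as typing ??"linear model" at the prompt)
# help.search("linear model")

# List objects on the search path whose names match a pattern
matches <- apropos("^read\\.csv")
matches

# Show the arguments of a function without opening its help page
args(lm)

# Run the examples from a help page
# example(mean)
```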



Reading Data into R
• # Read data into R using the read.csv function
• # Set working directory
• setwd("C:/Users/91776/Desktop/Bank_project")
• # Read data from csv file
• read.csv("be.csv")
• student <- read.csv("be.csv")
• # view the data frame object in a window
• View(student)


Reading Data into R contd…
• # print data frame object to console
• print(student)
• # view just the names of the variables in the data frame
• names(student)
• studentbe <- read.csv("be.csv")
• # remove data frame
• remove(studentbe)
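The chapter outline also lists exporting data from R. A minimal sketch using write.csv() is shown here. Since be.csv is not available, a small made-up data frame stands in for it, and the output file name student_export.csv is an arbitrary choice.

```r
# A small made-up data frame standing in for the contents of be.csv
student <- data.frame(roll = 1:3,
                      name = c("A", "B", "C"),
                      marks = c(72.5, 85, 64))

# Export to CSV; row.names = FALSE avoids writing an extra index column
out_path <- file.path(tempdir(), "student_export.csv")
write.csv(student, out_path, row.names = FALSE)

# Read the file back to confirm the round trip
student_back <- read.csv(out_path)
print(student_back)
```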


Built-in functions in R

• Functions that are already created or defined in the programming framework are known as built-in functions.
• R has a rich set of functions that can be used to perform almost every task for the user.


Math Functions
• R provides various mathematical functions for performing calculations, such as finding the absolute value, the square root, and much more. The following functions are commonly used:

S. No Function Description Example

1. abs(x): Returns the absolute value of x.
   x <- -4
   print(abs(x))
   Output: [1] 4

2. sqrt(x): Returns the square root of x.
   x <- 4
   print(sqrt(x))
   Output: [1] 2

3. ceiling(x): Returns the smallest integer that is larger than or equal to x.
   x <- 4.5
   print(ceiling(x))
   Output: [1] 5

4. floor(x): Returns the largest integer that is smaller than or equal to x.
   x <- 2.5
   print(floor(x))
   Output: [1] 2

5. trunc(x): Returns x with its fractional part removed.
   x <- c(1.2,2.5,8.1)
   print(trunc(x))
   Output: [1] 1 2 8

6. cos(x), sin(x), tan(x): Return the cosine, sine, and tangent of x (x in radians).
   x <- 4
   print(cos(x))
   print(sin(x))
   print(tan(x))
   Output: [1] -0.6536436
           [1] -0.7568025
           [1] 1.157821

7. log(x): Returns the natural logarithm of x.
   x <- 4
   print(log(x))
   Output: [1] 1.386294

8. log10(x): Returns the common (base-10) logarithm of x.
   x <- 4
   print(log10(x))
   Output: [1] 0.60206

9. exp(x): Returns e raised to the power x.
   x <- 4
   print(exp(x))
   Output: [1] 54.59815
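The difference between floor(), ceiling(), and trunc() only shows up for negative inputs, which the examples above do not cover. A short sketch with one negative and one positive value:

```r
x <- c(-2.5, 2.5)

down <- floor(x)       # rounds towards minus infinity: -3  2
up <- ceiling(x)       # rounds towards plus infinity:  -2  3
to_zero <- trunc(x)    # drops the fractional part:     -2  2

print(down)
print(up)
print(to_zero)
```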


String Functions
• R provides various string functions. These functions allow us to extract substrings from a string, search for patterns, and so on.
• String manipulation is the process of handling and analyzing strings. String functions help manipulate the contents of a string.
• The following string functions are available in R:


S. No Function Description Example

1. paste(): Concatenates strings together, separating them with the sep string. It allows us to combine multiple strings into a single string.
   string1 <- "Hello"
   string2 <- "world"
   result <- paste(string1, string2, sep = ", ")
   print(result)
   Output: "Hello, world"

2. substr(): Extracts substrings from a character vector by specifying the starting and ending positions.
   text <- "Hello World.."
   subs <- substr(text, start = 1, stop = 5)
   print(subs)
   Output: "Hello"

3. toupper(): Converts a given string into uppercase letters.
   text <- "Hello World.."
   up_text <- toupper(text)
   print(up_text)
   Output: "HELLO WORLD.."

4. tolower(): Converts a given string into lowercase letters.
   text <- "Hello World.."
   lo_text <- tolower(text)
   print(lo_text)
   Output: "hello world.."

5. sub(): Finds the first match of a pattern in each element of a character vector and replaces it with the specified replacement text.
   text <- "Hello World.."
   new_text <- sub("World", "everyone", text)
   print(new_text)
   Output: "Hello everyone.."
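Note that sub() replaces only the first match in each string. Its companion gsub(), which is not listed in the table above, replaces every match. The sketch below contrasts the two on an invented sentence.

```r
text <- "Hello World, big World"

first_only <- sub("World", "everyone", text)    # replaces the first match only
all_matches <- gsub("World", "everyone", text)  # replaces every match

print(first_only)    # "Hello everyone, big World"
print(all_matches)   # "Hello everyone, big everyone"
```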
Statistical Probability Functions

• R provides extensive statistical probability functions, allowing programmers to analyze and work with probability distributions.
• These functions cover the normal, binomial, Poisson, and uniform distributions.
• Using these functions, we can calculate cumulative probabilities, quantiles, and densities, and generate random numbers.


S. No Function Description Example

1. pnorm(): Calculates the cumulative probability (area under the curve to the left) of a given number in a standard normal distribution.
   x <- 4.78
   cum_prob <- pnorm(x)
   print(cum_prob)
   Output: 0.9999991

2. qnorm(): Calculates the quantile (inverse cumulative probability) of a given probability in a standard normal distribution.
   x <- 0.75
   quant <- qnorm(x)
   print(quant)
   Output: 0.6744898

3. dnorm(): Calculates the probability density at a given number in a standard normal distribution.
   x <- 1.43
   dens <- dnorm(x)
   print(dens)
   Output: 0.1435046

4. rnorm(): Generates random numbers from a standard normal distribution.
   rnum <- rnorm(10)
   print(rnum)
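The table covers only the normal distribution. The same d/p/q/r prefix pattern applies to the binomial, Poisson, and uniform distributions mentioned earlier. The sketch below uses the base R functions dbinom(), ppois(), qunif(), and runif(); the parameter values and the seed are arbitrary choices for illustration.

```r
# Binomial: probability of exactly 2 successes in 5 trials with p = 0.5
p_two <- dbinom(2, size = 5, prob = 0.5)
print(p_two)

# Poisson: probability of at most 1 event when the mean rate is 1
p_at_most_one <- ppois(1, lambda = 1)
print(p_at_most_one)

# Uniform on [0, 10]: the 25% quantile
q25 <- qunif(0.25, min = 0, max = 10)
print(q25)

# Random draws; set.seed() makes the sequence reproducible
set.seed(42)
draws <- runif(3, min = 0, max = 10)
print(draws)
```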
Other Statistical Functions
S. No Function Description Example

1. cor(): Computes the correlation coefficient between two given vectors, which measures the strength and direction of the linear relationship between the two variables.
   x <- c(1, 5, 15, 20)
   y <- c(2, 6, 18, 24)
   corr <- cor(x, y)
   print(corr)
   Output: 0.9996147

2. var(): Computes the sample variance of a given vector.
   x <- c(5, 7, 9, 12, 15)
   varn <- var(x)
   print(varn)
   Output: 15.8

3. cov(): Measures the covariance between two vectors.
   x <- c(1, 2, 3, 4, 5)
   y <- c(6, 7, 8, 9, 10)
   covr <- cov(x, y)
   print(covr)
   Output: 2.5

4. median(): Computes the sample median of a given numeric vector.
   df <- c(1, 2, 7, 12, 15)
   med_value <- median(df)
   print(med_value)
   Output: 7

5. sd(): Computes the standard deviation of a given set of values.
   df <- c(1, 3, 5, 12, 20)
   std_dev <- sd(df)
   print(std_dev)
   Output: 7.79102

6. range(): Returns a vector with two elements representing the minimum and maximum values of a given dataset.
   df <- c(1, 3, 4, 5, 9, 10)
   rang <- range(df)
   print(rang)
   Output: 1 10

7. diff(): Computes the lagged differences between consecutive elements in a given vector.
   x <- c(4, 8, 12, 16, 20)
   dif <- diff(x)
   print(dif)
   Output: 4 4 4 4


Other Useful Functions
S. No Function Description Example

1. unique(): Extracts only the unique elements or rows from the input object and returns a vector, data frame, or array with duplicate elements removed.
   x <- c(1, 2, 3, 2, 4, 1, 4, 3, 5)
   unique_values <- unique(x)
   print(unique_values)
   Output: 1 2 3 4 5

2. sort(): Sorts a vector in ascending order by default.
   x <- c(5, 2, 7, 1, 4, 9, 8)
   sort_data <- sort(x)
   print(sort_data)
   Output: 1 2 4 5 7 8 9

3. rev(): Returns the reversed version of a data object.
   x <- c(39, 40, 41, 42, 43, 44, 45)
   rev_x <- rev(x)
   print(rev_x)
   Output: 45 44 43 42 41 40 39

4. length(): Determines the length, that is, the number of elements in a vector or an object.
   x <- c(1, 2, 3, 4, 5, 12, 15, 18)
   x_length <- length(x)
   print(x_length)
   Output: 8
Data Visualization in R
• Data visualization is the technique of delivering insights from data using visual cues such as graphs, charts, maps, and many others.
• It is useful because it makes large quantities of data easier to understand and thereby supports better decisions.
• Data Visualization in R Programming Language
• Popular data visualization tools include Tableau, Plotly, R, Google Charts, Infogram, and Kibana. The various platforms have different capabilities, functionality, and use cases.
• R is a language designed for statistical computing, graphical data analysis, and scientific research.
• It is often preferred for data visualization because its packages offer flexibility and require minimal coding.


R provides a series of packages for data visualization. These
packages are as follows:



Types of Data Visualizations

1. Bar Plot
• There are two types of bar plots, horizontal and vertical, which represent data points as horizontal or vertical bars whose lengths are proportional to the value of the data item.
• They are generally used for continuous and categorical variable plotting.
• Setting the horiz parameter to TRUE or FALSE gives a horizontal or vertical bar plot respectively.


# Vertical bar plot
barplot(airquality$Ozone,
        main = 'Ozone Concentration in air',
        xlab = 'ozone levels',
        horiz = FALSE)

# Horizontal bar plot
barplot(airquality$Ozone,
        main = 'Ozone Concentration in air',
        xlab = 'ozone levels',
        horiz = TRUE)


Bar plots are used for the following scenarios:

•To perform a comparative study between the various data categories in the data
set.

•To analyze the change of a variable over time in months or years.



Types of Data Visualizations contd…
2. Histogram
• A histogram is like a bar chart in that it uses bars of varying height to represent the data distribution.
• However, in a histogram the values are grouped into consecutive intervals called bins.
• Continuous values are grouped and displayed in these bins, whose size can be varied.

Histograms are used in the following scenarios:

•To verify an equal and symmetric distribution of the data.
•To identify deviations from expected values.


data(airquality)

hist(airquality$Temp, main ="La Guardia Airport's\nMaximum Temperature(Daily)",
xlab ="Temperature(Fahrenheit)",
xlim = c(50, 125), col ="yellow",
freq = TRUE)

The xlim parameter specifies the interval within which all values are to be displayed.

When freq is set to TRUE, the y-axis shows the frequency (count) of values in each bin; when it is set to FALSE, the y-axis shows probability densities.
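To see what freq = TRUE and freq = FALSE actually plot, the bin information can be inspected without drawing anything. This sketch uses the plot = FALSE argument of hist(); the variable name h is arbitrary.

```r
data(airquality)

# plot = FALSE returns the bin information without drawing anything
h <- hist(airquality$Temp, plot = FALSE)

h$breaks    # bin edges
h$counts    # what freq = TRUE puts on the y-axis
h$density   # what freq = FALSE puts on the y-axis

# Counts add up to the number of observations;
# densities times bin widths add up to 1
sum(h$counts)
sum(h$density * diff(h$breaks))
```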
# Frequencies on the y-axis
hist(airquality$Temp, main ="La Guardia Airport's\nMaximum Temperature(Daily)",
xlab ="Temperature(Fahrenheit)",
xlim = c(50, 125), col ="yellow", freq = TRUE)

# Probability densities on the y-axis
hist(airquality$Temp, main ="La Guardia Airport's\nMaximum Temperature(Daily)",
xlab ="Temperature(Fahrenheit)",
xlim = c(50, 125), col ="yellow", freq = FALSE)
Types of Data Visualizations contd…
3. Box Plot
• A boxplot presents a statistical summary of the given data graphically. It depicts information such as the minimum and maximum data points, the median, the first and third quartiles, and the interquartile range.

Box plots are used:

•To give a comprehensive statistical description of the data through a visual cue.
•To identify outliers, that is, points that lie far outside the interquartile range of the data.


data(airquality)

boxplot(airquality$Wind,
main = "Average wind speed\nat La Guardia Airport",
xlab = "Miles per hour", ylab = "Wind",
col = "orange", border = "brown",
horizontal = TRUE, notch = TRUE)
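The numbers behind a boxplot can be obtained without drawing it. The sketch below uses boxplot.stats() from base R and reproduces its default outlier rule (more than 1.5 times the box height beyond either hinge). The names s, box_height, lower_fence, and upper_fence are chosen here for illustration.

```r
data(airquality)

# boxplot.stats() computes the numbers a boxplot is drawn from
s <- boxplot.stats(airquality$Wind)

s$stats  # lower whisker, lower hinge (Q1), median, upper hinge (Q3), upper whisker
s$out    # points drawn individually as outliers

# By default a point is an outlier when it lies more than
# 1.5 times the box height beyond either hinge
box_height <- s$stats[4] - s$stats[2]
lower_fence <- s$stats[2] - 1.5 * box_height
upper_fence <- s$stats[4] + 1.5 * box_height
all(s$out < lower_fence | s$out > upper_fence)
```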

Types of Data Visualizations contd…
4. Scatter Plot
• A scatter plot is composed of many points on a Cartesian plane. Each point denotes the values taken by two variables and helps us easily identify the relationship between them.

Scatter plots are used in the following scenarios:

•To show whether an association exists between bivariate data.
•To measure the strength and direction of such a relationship.
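A scatter plot shows the relationship visually; a correlation coefficient puts a number on its strength and direction. As a sketch, using the built-in cars dataset and the base functions cor() and cor.test() (the variable names r and p are arbitrary):

```r
data(cars)

# Pearson correlation: the sign gives the direction, the magnitude the strength
r <- cor(cars$speed, cars$dist)
print(r)

# cor.test() adds a significance test for the association
p <- cor.test(cars$speed, cars$dist)$p.value
print(p)
```

A value of r close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.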


Application Areas:
• Presenting analytical conclusions of the data to the non-analyst departments of your company.
• Health monitoring devices use data visualization to track anomalies in blood pressure, cholesterol, and other measurements.
• Discovering repeating patterns and trends in consumer and marketing data.
• Meteorologists use data visualization for assessing prevalent weather changes throughout the world.
• Real-time maps and geo-positioning systems use visualization for traffic monitoring and estimating travel time.


Data Visualization using R Base Package
• 1) Scatter Diagram
• 2) Line Chart
• 3) Bar Chart
• 4) Histogram
• 5) Boxplot
• 6) Correlation matrix
Data Visualization using R Base Package
• # mtcars dataset
• mtcars

• # pressure dataset
• pressure

• # airquality dataset
• airquality


# scatter plot
• plot(mtcars$mpg,mtcars$disp)

• # change the X and Y labels and also give a title
• plot(mtcars$disp,mtcars$mpg,xlab="disp",ylab="mpg",main="disp and mpg")

• # to change the symbol, use the plot character option pch
• # in R base plot functions, two further options are available: lty stands for line type, and lwd for line width
• plot(mtcars$disp,mtcars$mpg,xlab="disp",ylab="mpg",main="disp and mpg",pch=2)

• # to change the color of the symbol, use col and bg; HTML hex color codes can also be used
• plot(mtcars$disp,mtcars$mpg,xlab="disp",ylab="mpg",main="disp and mpg",pch=21,col="red3",bg="slateblue3",lwd=5)

• # color the points by group
• plot(mtcars$disp,mtcars$mpg,xlab="disp",ylab="mpg",main="disp and mpg",
• pch=21,col=mtcars$cyl,bg="slateblue3",lwd=5)
# line chart
• pressure
• plot(pressure$temperature,pressure$pressure)

• # to draw lines instead of points
• plot(pressure$temperature,pressure$pressure,type="l")

• # create a line chart with points together
• plot(pressure$temperature,pressure$pressure,type="b")

• # to change the line type, width, and color
• plot(pressure$temperature,pressure$pressure,type="l",lty=1,lwd=3,col="red",
• main="line chart of temperature and pressure",
• xlab = "temperature",ylab="pressure")

• # when we change both lines and points
• plot(pressure$temperature,pressure$pressure,type="b",lty=1,lwd=3,col=rainbow(7),pch=3,
• main="line chart of temperature and pressure",
• xlab = "temperature",ylab="pressure")


# Bar chart
• # for a categorical variable
• # we want to see the frequency distribution
• # using barplot()

• barplot(mtcars$cyl)

• # need to aggregate the data first
• barplot(table(mtcars$cyl))

• barplot(table(mtcars$cyl),main = "Bar chart showing distribution of \n number of cylinder",
• xlab="No of cylinder",ylab = "Frequency",col="pink",border = "blue")

• barplot(table(mtcars$cyl),main = "Bar chart showing distribution of \n number of cylinder",
• xlab="No of cylinder",ylab = "Frequency",col="pink",border = NA)

• barplot(table(mtcars$cyl),main = "Bar chart showing distribution of \n number of cylinder",
• xlab="No of cylinder",ylab = "Frequency",col=c("pink","blue","red"),border = NA)


Mumbai University Questions
• Explain the Collaborative Filtering based recommendation System. How it is different from
content-based recommendation systems? (10 marks)
• For the Graph given below use the betweenness factor and find all communities. (10 marks)
• What is a Community in a social network graph? (5 marks)
• What is a Community in a social network graph? Explain any one algorithm for finding communities in a social graph. (10 marks)
• Explain the Clique Percolation Method (CPM) used in direct discovery of communities In a
social graph with an example. (10 marks)
• Clearly explain two applications for the Recommendation system. (5 marks)
• What are Recommendation systems? (5 marks)
• How recommendation is done based on the properties of product? Explain with a suitable
example. (10 marks)
