BDA LabManual 2024-25

Lab manual for the Big Data and Analytics (BDA) subject, Semester VII, Computer Engineering.

ATHARVA EDUCATIONAL TRUST'S

ATHARVA COLLEGE OF ENGINEERING


(Approved by AICTE, Recognized by Government of Maharashtra
& Affiliated to University of Mumbai - Estd. 1999 - 2000)
ISO 2100:2018 ISO 14001:2015 ISO 9001:2015
NAAC Accredited A+ Grade

Academic Year: 2024-2025 Semester: VII


Class / Branch / Division: BE/CMPN/1 & 2
Subject: Big Data and Analytics Lab
Lab Outcomes
LO1: To interpret business models and scientific computing paradigms, and apply software tools for big
data analytics.
LO2: To implement algorithms that use MapReduce on structured and unstructured data.
LO3: To gain hands-on experience with NoSQL databases such as Cassandra, Hadoop HBase, MongoDB, etc.
LO4: To implement various data stream algorithms.
LO5: To develop and analyze social network graphs with data visualization techniques.
Experiment List
S No | Title | LO
1 | Copying File to Hadoop. | LO1
  | - Copy from Hadoop File System and deleting file. |
  | - Moving and displaying files in HDFS. |
  | - Programming exercises on Hadoop. |
2 | To install and configure MongoDB/Cassandra/HBase/Hypertable to execute NoSQL commands. | LO3
3 | Experiment on Hadoop MapReduce: | LO2
  | - Write a program to implement word count using MapReduce. |
4 | Experiment on Hadoop MapReduce: | LO2
  | - Implementing simple algorithms in MapReduce: matrix multiplication, etc. |
5 | Create Hive database and descriptive analytics (basic statistics). | LO2
6 | Data Stream Algorithms: | LO4
  | - Implement Bloom Filter using any programming language. |
7 | Social Network Analysis using R (for example: Community Detection Algorithm). | LO5
8 | Mini Project: One real-life large data application to be implemented (use standard datasets available on the web). | LO5
  | - Streaming data analysis: use Flume for data capture and Hive/PySpark for analysis of Twitter data, chat data, weblogs, etc. |
  | - Recommendation System (for example: Health Care System, Stock Market Prediction, Movie Recommendation, etc.) Spatio-temporal Data Analytics |

Subject Incharge: Prof. Ashwini Gaikwad
HOD, CMPN Dept: Dr. Suvarna Pansambal

EXPERIMENT NO: 1

EXPERIMENT NO.2

Aim: To study and implement NoSQL commands.

Theory:

NoSQL databases have grown in popularity with the rise of Big Data applications. In
comparison to relational databases, NoSQL databases are much cheaper to scale, capable
of handling unstructured data, and better suited to current agile development approaches.

The advantages of NoSQL technology are compelling but the thought of replacing a
legacy relational system can be daunting. To explore the possibilities of NoSQL in your
enterprise, consider a small-scale trial of a NoSQL database like MongoDB. NoSQL
databases are typically open source so you can download the software and try it out for
free. From this trial, you can assess the technology without great risk or cost to your
organization.

Commands of Neo4j:

1) Create Clause

Creating a Single Node

CREATE (sample)

Creating Multiple Nodes

CREATE (sample1),(sample2)

Creating a Node with a Label

CREATE (Vidya:Student)

Creating a Node with Properties

CREATE (Vivek:student{name: "Vivek Mittal", age:25, City: "Delhi"})

2) Creating Relationships

Run the following three clauses together as a single query, so that the variables Vivek1 and MU are still bound when the relationship is created.

CREATE (Vivek1:student{name: "Vivek Mittal", age:25, City: "Delhi"})

CREATE (MU:University {name: "Mumbai University"})

CREATE (Vivek1)-[r:STUDENT_OF]->(MU)

3) Creating a Complete Path

CREATE (Vivek2:student{name: "Vivek Mittal", age:25, City: "Delhi"})

CREATE (MU1:University {name: "Mumbai University"})

CREATE (Ind:Country{name:"India"})

CREATE (Vivek2)-[r:STUDENT_OF]->(MU1)

CREATE (MU1)-[r1:UNIVERSITY_OF]->(Ind)

RETURN Vivek2, MU1, Ind

4) Merge

MERGE creates the node only if no matching node already exists.

MERGE (Rashi:person) RETURN Rashi

5) Delete

The following query deletes every node in the database together with its relationships.

MATCH (n) DETACH DELETE n

Sample Output:

EXPERIMENT NO: 3

EXPERIMENT NO: 6

EXPERIMENT NO: 4

Aim: To implement matrix multiplication using MapReduce.

Theory:

MapReduce
MapReduce is a style of computing that has been implemented in several systems,
including Google’s internal implementation (simply called MapReduce) and the popular
open-source implementation Hadoop, which can be obtained, along with the HDFS file
system, from the Apache Foundation. You can use an implementation of MapReduce to
manage many large-scale computations in a way that is tolerant of hardware faults. All
you need to write are two functions, called Map and Reduce, while the system manages
the parallel execution and coordination of tasks that execute Map or Reduce, and also deals
with the possibility that one of these tasks will fail to execute. In brief, a MapReduce
computation executes as follows:
1. Some number of Map tasks each are given one or more chunks from a distributed file
system. These Map tasks turn the chunk into a sequence of key-value pairs. The way
key-value pairs are produced from the input data is determined by the code written by the
user for the Map function.
2. The key-value pairs from each Map task are collected by a master controller and sorted
by key. The keys are divided among all the Reduce tasks, so all key-value pairs with the
same key wind up at the same Reduce task.
3. The Reduce tasks work on one key at a time, and combine all the values associated
with that key in some way. The manner of combination of values is determined by the
code written by the user for the Reduce function.
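
The three steps above can be sketched on a single machine. The following is only an illustrative simulation, not Hadoop code: the function names (map_fn, shuffle, reduce_fn, run_mapreduce) and the two sample chunks are assumptions made for this example, and word counting is used as the Map/Reduce logic.

```python
from collections import defaultdict


def map_fn(chunk):
    """Map: turn one chunk of text into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key and sort by key, as the master controller would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(key, values):
    """Reduce: combine all values for one key (here, by summing them)."""
    return key, sum(values)


def run_mapreduce(chunks):
    """Run Map on every chunk, shuffle the pairs, then Reduce key by key."""
    pairs = [pair for chunk in chunks for pair in map_fn(chunk)]
    return [reduce_fn(key, values) for key, values in shuffle(pairs)]


if __name__ == "__main__":
    # Two hypothetical chunks standing in for blocks of a distributed file.
    chunks = ["big data big", "data tools"]
    print(run_mapreduce(chunks))  # [('big', 2), ('data', 2), ('tools', 1)]
```

In a real cluster the Map and Reduce calls run in parallel on different nodes; only map_fn and reduce_fn correspond to code the user writes.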

Matrix Multiplication

Suppose we have an n×n matrix M, whose element in row i and column j is denoted
by $m_{ij}$. Suppose we also have a vector v of length n, whose jth element is $v_j$. Then the
matrix-vector product is the vector x of length n, whose ith element is $x_i = \sum_{j=1}^{n} m_{ij} v_j$.
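
As a minimal sketch of how this product fits the Map/Reduce pattern, assume the whole vector v fits in memory: Map turns each matrix entry into the pair (i, m_ij * v_j), and Reduce sums the terms for each row i. The function names, the 0-based indices and the 2×2 sample matrix are assumptions made for illustration only.

```python
from collections import defaultdict


def map_entry(i, j, m_ij, v):
    """Map: one matrix entry m_ij produces the pair (i, m_ij * v_j)."""
    return i, m_ij * v[j]


def reduce_row(i, terms):
    """Reduce: sum all terms that share the key i to obtain x_i."""
    return i, sum(terms)


def matrix_vector(entries, v):
    """Multiply a square matrix, given as (i, j, m_ij) triples, by v."""
    groups = defaultdict(list)
    for i, j, m_ij in entries:
        key, term = map_entry(i, j, m_ij, v)
        groups[key].append(term)
    x = [0] * len(v)
    for i, terms in groups.items():
        _, x[i] = reduce_row(i, terms)
    return x


# Hypothetical input: M = [[1, 2], [3, 4]] with 0-based indices, v = [5, 6].
entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
v = [5, 6]
print(matrix_vector(entries, v))  # [17, 39]
```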

Program
Matrix_Mapper.py
#!/usr/bin/env python
import sys

# Each input line has the form name,row,column,value, where name is 'a' or 'b'
for line in sys.stdin:
    line = line.strip()
    entry = line.split(",")
    key = entry[0]
    value = line
    # Emit the matrix name as the key and the whole line as the value
    if key == 'a' or key == 'b':
        print('{0}\t{1}'.format(key, value))
Matrix_Reducer.py
#!/usr/bin/env python
import sys

# Entries of A and B, keyed by (row, column)
a = {}
b = {}
for input_line in sys.stdin:
    input_line = input_line.strip()
    if not input_line:
        continue
    this_key, value = input_line.split("\t", 1)
    # value has the form name,row,column,entry
    v = value.split(",")
    if this_key == 'a':
        a[(int(v[1]), int(v[2]))] = int(v[3])
    elif this_key == 'b':
        b[(int(v[1]), int(v[2]))] = int(v[3])

# Compute C = A*B with Cij = SUM(Aik * Bkj) for k in 0..2,
# where A is 4x3 and B is 3x5; missing entries are treated as 0
for i in range(0, 4):
    for j in range(0, 5):
        result = 0
        for k in range(0, 3):
            result = result + a.get((i, k), 0) * b.get((k, j), 0)
        print("({0},{1})\t{2}".format(i, j, result))
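
The reducer reads the matrix name, row, column and entry from comma-separated fields, so each input line is expected to look like a,0,1,7. The sketch below shows one way to produce such lines and to check a product locally without Hadoop. The helper names (to_lines, multiply_lines) and the two 2×2 matrices are hypothetical; they are not the data set used for the output in this manual.

```python
def to_lines(name, matrix):
    """Serialise a matrix as 'name,row,col,value' lines (0-based indices)."""
    return ["{0},{1},{2},{3}".format(name, i, j, value)
            for i, row in enumerate(matrix)
            for j, value in enumerate(row)]


def multiply_lines(lines, rows, inner, cols):
    """Parse the lines and multiply A (rows x inner) by B (inner x cols)."""
    a, b = {}, {}
    for line in lines:
        name, i, j, value = line.strip().split(",")
        target = a if name == "a" else b
        target[(int(i), int(j))] = int(value)
    return {(i, j): sum(a.get((i, k), 0) * b.get((k, j), 0)
                        for k in range(inner))
            for i in range(rows) for j in range(cols)}


# Hypothetical sample matrices, used only to illustrate the format.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
lines = to_lines("a", A) + to_lines("b", B)
print(lines[0])                       # a,0,0,1
print(multiply_lines(lines, 2, 2, 2))
```

Writing the lines to a text file gives an input that the mapper and reducer scripts can read from standard input.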
Output

(0,0) 30
(0,1) 51
(0,2) -5
(0,3) 15
(0,4) 14
(1,0) 15
(1,1) -12
(1,2) -25
(1,3) 12
(1,4) 28
(2,0) 50
(2,1) 65
(2,2) 5
(2,3) 33
(2,4) -26
(3,0) -5
(3,1) 2
(3,2) -3
(3,3) -6
(3,4) 16

EXPERIMENT NO: 5

Conclusion: The Hive tool was successfully demonstrated.



EXPERIMENT NO.6

Aim: Implement Bloom Filter using any programming language

Theory:

Filtering Streams

Another common process on streams is selection, or filtering. We want to accept those
tuples in the stream that meet a criterion. If the selection criterion is a property of the
tuple that can be calculated (e.g., the first component is less than 10), then the selection is
easy to do. The problem becomes harder when the criterion involves membership in a set
that is too large to hold in main memory. Suppose we have a set S of one billion allowed
email addresses, those that we will allow through because we believe them not to be spam.
The stream consists of pairs: an email address and the email itself. Since the typical email
address is 20 bytes or more, it is not reasonable to store S in main memory.

The Bloom Filter

In the technique known as Bloom filtering, we use the available main memory as a bit array.

A Bloom filter consists of:

1. An array of n bits, initially all 0’s.

2. A collection of hash functions h1, h2, . . . , hk. Each hash function maps “key” values
to n buckets, corresponding to the n bits of the bit-array.

3. A set S of m key values

The purpose of the Bloom filter is to allow through all stream elements whose keys are in
S, while rejecting most of the stream elements whose keys are not in S. To initialize the
bit array, begin with all bits 0. Take each key value in S and hash it using each of the k
hash functions. Set to 1 each bit that is hi(K) for some hash function hi and some key
value K in S. To test a key K that arrives in the stream, check that all of
h1(K), h2(K), . . . , hk(K) are 1’s in the bit-array. If all are 1’s, then let the stream element
through. If one or more of these bits are 0, then K could not be in S, so reject the stream
element.
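
The initialization and test steps described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the program of this experiment: the class name BloomFilter, deriving the k hash functions by salting SHA-256 with the hash index, and the two sample addresses are all choices made for this example.

```python
import hashlib


class BloomFilter:
    """Bit array of n bits with k hash functions derived from SHA-256."""

    def __init__(self, n_bits, k_hashes):
        self.n = n_bits
        self.k = k_hashes
        self.bits = [0] * n_bits  # initially all 0's

    def _positions(self, key):
        """Yield h1(key), ..., hk(key) by salting the key with the index."""
        for i in range(self.k):
            salted = "{0}:{1}".format(i, key).encode("utf-8")
            digest = hashlib.sha256(salted).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, key):
        """Set to 1 every bit that some hash function maps the key to."""
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """True if all k bits are 1; a 0 bit proves the key is not in S."""
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical set S of allowed addresses.
allowed = ["alice@example.com", "bob@example.com"]
bf = BloomFilter(n_bits=1000, k_hashes=3)
for address in allowed:
    bf.add(address)

print(all(bf.might_contain(a) for a in allowed))  # True: no false negatives
```

A key that was never added may still pass the test if other keys happened to set all of its bits, which is why the filter rejects most, but not all, keys outside S.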

Example:
Program
#include <cmath>
#include <iostream>
#include <string>
using namespace std;

// hash 1: sum of the character codes, modulo the array size
int h1(const string& s, int arrSize)
{
    long long hash = 0;
    for (int i = 0; i < (int)s.size(); i++)
    {
        hash = (hash + ((int)s[i]));
        hash = hash % arrSize;
    }
    return hash;
}

// hash 2: characters weighted by powers of 19
int h2(const string& s, int arrSize)
{
    long long hash = 1;
    for (int i = 0; i < (int)s.size(); i++)
    {
        hash = hash + pow(19, i) * s[i];
        hash = hash % arrSize;
    }
    return hash % arrSize;
}

// hash 3: polynomial hash with multiplier 31
int h3(const string& s, int arrSize)
{
    long long hash = 7;
    for (int i = 0; i < (int)s.size(); i++)
    {
        hash = (hash * 31 + s[i]) % arrSize;
    }
    return hash % arrSize;
}

// hash 4: mixes the first character s[0] with powers of 7
int h4(const string& s, int arrSize)
{
    long long hash = 3;
    int p = 7;
    for (int i = 0; i < (int)s.size(); i++) {
        hash += hash * 7 + s[0] * pow(p, i);
        hash = hash % arrSize;
    }
    return hash;
}

// lookup operation: true only if all four bits are set
bool lookup(bool* bitarray, int arrSize, const string& s)
{
    int a = h1(s, arrSize);
    int b = h2(s, arrSize);
    int c = h3(s, arrSize);
    int d = h4(s, arrSize);

    return bitarray[a] && bitarray[b] && bitarray[c] && bitarray[d];
}

// insert operation
void insert(bool* bitarray, int arrSize, const string& s)
{
    // check if the element is already present or not
    if (lookup(bitarray, arrSize, s))
        cout << s << " is Probably already present" << endl;
    else
    {
        int a = h1(s, arrSize);
        int b = h2(s, arrSize);
        int c = h3(s, arrSize);
        int d = h4(s, arrSize);

        bitarray[a] = true;
        bitarray[b] = true;
        bitarray[c] = true;
        bitarray[d] = true;

        cout << s << " inserted" << endl;
    }
}

// Driver Code
int main()
{
    bool bitarray[100] = { false };
    int arrSize = 100;
    string sarray[33]
        = { "abound", "abounds", "abundance",
            "abundant", "accessible", "bloom",
            "blossom", "bolster", "bonny",
            "bonus", "bonuses", "coherent",
            "cohesive", "colorful", "comely",
            "comfort", "gems", "generosity",
            "generous", "generously", "genial",
            "bluff", "cheater", "hate",
            "war", "humanity", "racism",
            "hurt", "nuke", "gloomy",
            "facebook", "flower", "twitter" };
    for (int i = 0; i < 33; i++) {
        insert(bitarray, arrSize, sarray[i]);
    }
    return 0;
}
Output
/tmp/bloomfilter.o
abound inserted
abounds inserted
abundance inserted
abundant inserted
accessible inserted
bloom inserted
blossom inserted
bolster inserted
bonny inserted
bonus inserted
bonuses is Probably already present
coherent inserted
cohesive inserted
colorful inserted
comely is Probably already present
comfort inserted
gems inserted
generosity inserted
generous inserted
generously inserted
genial inserted
bluff is Probably already present
cheater inserted
hate inserted
war is Probably already present


humanity inserted
racism inserted
hurt inserted
nuke is Probably already present
gloomy is Probably already present
facebook inserted
flower inserted
twitter inserted

Conclusion: A Bloom filter was successfully implemented in C++.



EXPERIMENT NO: 7

Aim: Social Network Analysis using R (for example: Community Detection Algorithm)

Theory

Social network Analysis

It is the process of exploring or examining the social structure by using graph theory. It is
used for measuring and analyzing the structural properties of the network. It helps to
measure relationships and flows between groups, organizations and other connected
entities.

Social Network Analysis Terminology

● A network is represented as a graph which shows links between each vertex and
its neighbors
● A line indicating a link between vertices is called an edge
● A group of vertices that are mutually reachable by following edges on the graph is
called a component
● The edges followed from one vertex to another are called a path

The following software is required in order to perform network analysis

● R software
● Package
○ igraph
○ sna(social network analysis)

Community Detection Algorithm

A community, with respect to graphs, can be defined as a subset of nodes that are densely
connected to each other and loosely connected to the nodes in the other communities in
the same graph. Detecting communities in a network is one of the most important tasks in
network analysis. In a large-scale network such as an online social network, we could
have millions of nodes and edges. Detecting communities in such networks becomes a
herculean task. Therefore, we need community detection algorithms that can partition the
network into multiple communities.

Girvan Newman Algorithm for Community Detection


Under the Girvan-Newman algorithm, the communities in a graph are discovered
by iteratively removing edges from the graph, with the edge that has the highest edge
betweenness removed first.

Edge betweenness centrality (EBC)

The edge betweenness centrality can be defined as the number of shortest paths
that pass through an edge in a network. Each edge is given an EBC score based
on the shortest paths among all the nodes in the graph.
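
To make the EBC score and one Girvan-Newman removal step concrete, here is a minimal Python sketch for an undirected, unweighted graph. It is not the R program of this experiment: the function names, the BFS-based (Brandes-style) computation and the six-node sample graph, two triangles joined by a single bridge edge, are assumptions chosen for illustration. When several shortest paths connect a pair of nodes, each path contributes a fractional share.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph via BFS."""
    score = defaultdict(float)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Walk back from the farthest nodes, sharing credit along paths.
        delta = defaultdict(float)
        for w in reversed(order):
            for u in preds[w]:
                credit = sigma[u] / sigma[w] * (1.0 + delta[w])
                score[frozenset((u, w))] += credit
                delta[u] += credit
    # Every pair of nodes was counted once from each end.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components, each returned as a sorted list of nodes."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in group:
                group.add(u)
                stack.extend(adj[u])
        seen |= group
        result.append(sorted(group))
    return result


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
ebc = edge_betweenness(adj)
u, w = max(ebc, key=ebc.get)      # edge with the highest EBC (the bridge)
adj[u].discard(w)
adj[w].discard(u)
print(components(adj))            # [[1, 2, 3], [4, 5, 6]]
```

The full algorithm repeats this step, recomputing the scores after every removal, until the desired number of communities is reached.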

Program

# Social Network Analysis
library(igraph)

g <- graph(c(1,2,2,3,3,4,4,1),
           directed = FALSE,
           n = 7)

g1 <- graph(c("Amy", "Ram", "Ram", "Li", "Li", "Amy",
              "Amy", "Li", "Kate", "Li"),
            directed = TRUE)

# Network measures
degree(g1, mode='all')
degree(g1, mode='in')
degree(g1, mode='out')

diameter(g1, directed = FALSE, weights = NA)
edge_density(g1, loops = FALSE)
ecount(g1)/(vcount(g1)*(vcount(g1)-1))
reciprocity(g1)
closeness(g1, mode='all', weights = NA)
betweenness(g1, directed = TRUE, weights = NA)
edge_betweenness(g1, directed = TRUE, weights = NA)

# Read data file (the CSV is expected to have columns first, second and grade)
data <- read.csv(file.choose(), header = TRUE)
View(data)
y <- data.frame(data$first, data$second, data$grade)
# Create network
net <- graph.data.frame(y,directed=F)
V(net)$label <- V(net)$name
V(net)$degree <- degree(net)

# Histogram of node degree
hist(V(net)$degree)

# Network diagram
plot(net)

# Highlighting degrees & layouts
plot(net,
     vertex.color = rainbow(52),
     vertex.size = V(net)$degree*0.4,
     edge.arrow.size = 0.1,
     layout = layout.fruchterman.reingold)

# Hubs and authorities
hs <- hub_score(net)$vector
as <- authority.score(net)$vector
par(mfrow=c(1,2))
set.seed(123)
plot(net,
     vertex.size = hs*30,
     main = 'Hubs',
     vertex.color = rainbow(52),
     edge.arrow.size = 0.1,
     layout = layout.kamada.kawai)

plot(net,
     vertex.size = as*30,
     main = 'Authorities',
     vertex.color = rainbow(52),
     edge.arrow.size = 0.1,
     layout = layout.kamada.kawai)
par(mfrow=c(1,1))
# Community detection
net <- graph.data.frame(y, directed = F)
cnet <- cluster_edge_betweenness(net)
plot(cnet,net,vertex.size=10,vertex.label.cex=0.8)
Output

EXPERIMENT NO: 8

Aim: Case Study: One real life large data application

Mini Project: One real-life large data application to be implemented (use standard
datasets available on the web).

- Streaming data analysis: use Flume for data capture and Hive/PySpark for analysis
of Twitter data, chat data, weblogs, etc.

- Recommendation System (for example: Health Care System, Stock Market
Prediction, Movie Recommendation, etc.) Spatio-temporal Data Analytics

Note: Each student should choose a different topic for the case study.

Examples:

● Movie recommender system using Hive

● Building a word cloud in R, a text-mining visualization that is easier to
understand than a table
