BDA Module Wise Important Questions

This document serves as a quick reference for students studying Big Data Analysis, providing a comprehensive overview of topics and numericals organized by module. It covers key concepts such as Big Data characteristics, Hadoop components, MapReduce, and various algorithms for mining and analyzing big data. Additionally, it includes links to external resources for further study and examples related to real-world applications.

Uploaded by

darishdias30

SEM VIII    Subject - Big Data Analysis

A quick reference for students to all the documentation and numericals for the subject.

Revision of all the topics, module-wise:

Columns: Module; Topics: Numericals; Topics: Theory (answers to be supported with good examples related to real-world applications, and with graphs, tables, and assumptions suitable to the questions)

1. Introduction to Big Data

Topics: Theory
1. Big Data characteristics (V’s)
2. Big Data Challenges

https://docs.google.com/document/d/16Gdf3qg2IFm07JqT7p-11noLwHX4bCSM/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true

2. Introduction to Big Data Frameworks

Topics: Numericals
Mostly numericals on calculating the number of blocks for a given big file (1 mark question)

Topics: Theory
1. Core Hadoop Components
2. Limitations and advantages of Hadoop
3. Hadoop Ecosystem
4. HDFS Architecture
5. NoSQL architectural patterns (key-value store, column store, document store, graph store)
6. CAP theorem (Brewer’s theorem)
7. NoSQL business drivers
8. Difference between SQL and NoSQL

https://docs.google.com/document/d/1OCdkF-Wl8afBuXaEOYj81UBQWsvAXgU-v9qdWlKMW_Q/edit?usp=sharing
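The block-count numericals in Module 2 reduce to one ceiling division. The sketch below is only an illustration: the function names are made up for this example, and the 128 MB block size and replication factor of 3 are assumed defaults, so substitute whatever values the question gives.

```python
import math

def hdfs_block_count(file_size_mb, block_size_mb=128):
    """Number of HDFS blocks needed for one file (block size is an assumed default)."""
    return math.ceil(file_size_mb / block_size_mb)

def last_block_size_mb(file_size_mb, block_size_mb=128):
    """Size of the final block; it is only as large as the leftover data."""
    remainder = file_size_mb % block_size_mb
    return remainder if remainder else block_size_mb

def total_block_replicas(file_size_mb, block_size_mb=128, replication=3):
    """Block copies stored across the cluster (replication factor is an assumption)."""
    return hdfs_block_count(file_size_mb, block_size_mb) * replication

# A 500 MB file with 128 MB blocks: three full blocks plus one 116 MB block.
blocks = hdfs_block_count(500)          # 4
tail = last_block_size_mb(500)          # 116
replicas = total_block_replicas(500)    # 12
print(blocks, tail, replicas)
```

The last block is not padded to the full block size, which is why the example reports 116 MB rather than 128 MB for it.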
3. MapReduce Paradigm

Topics: Numericals
Introduction
Matrix multiplications (any method can be asked):
1) Matrix vector multiplication
2) 1-step matrix matrix multiplication
3) 2-step matrix matrix multiplication
(pseudocodes)

https://drive.google.com/file/d/1akJU_l1T-X_Bg7usCk5B-JWNCYDWL9M9/view?usp=sharing

https://docs.google.com/document/d/15yPb7OyTGT8QSNdYRXfaQ-yX6nEqJFor/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true

https://drive.google.com/file/d/15nHeLdkHkKm8xb492OuFseOORyBUBcsG/view?usp=sharing

MapReduce: Word count problems
https://drive.google.com/file/d/1J43oANUTDboht12Tizz6w_GEUEjr8f2z/view?usp=sharing

Topics: Theory
1. Relational-Algebra Operations - join, union, intersection, Grouping and Aggregation
https://docs.google.com/presentation/d/1jeqEZ-GEfvN4qYHcw4FF32Be2O0m2KIc/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true
https://medium.com/swlh/relational-operations-using-mapreduce-f49e8bd14e31#:~:text=For%20example%2C%20If%20a%20relation,values%20and%20output%20the%20result.
2. Grouping by Key
3. Shuffle, Sort and Reduce Tasks
4. Combiners
5. Details of MapReduce Execution
6. Coping With Node Failures
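For the Module 3 numericals, a single-machine simulation of one MapReduce round can help when checking a hand-written pseudocode answer. This is a minimal sketch, not Hadoop code: `map_reduce`, `wc_mapper`, `sum_reducer`, and `make_mv_mapper` are hypothetical names, and the matrix-vector version assumes the whole vector fits in memory on every mapper.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Simulate one MapReduce round: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

def sum_reducer(key, values):
    """Reducer shared by both examples: add up all values for a key."""
    return sum(values)

def wc_mapper(line):
    """Word count: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1

def make_mv_mapper(v):
    """Matrix-vector product: for entry (i, j, m_ij) emit (i, m_ij * v[j])."""
    def mapper(entry):
        i, j, m_ij = entry
        yield i, m_ij * v[j]
    return mapper

lines = ["big data is big", "data streams"]
word_counts = map_reduce(lines, wc_mapper, sum_reducer)
# {'big': 2, 'data': 2, 'is': 1, 'streams': 1}

# Matrix [[1, 2], [3, 4]] stored as (row, column, value) triples.
M = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
v = [5, 6]
product = map_reduce(M, make_mv_mapper(v), sum_reducer)
# {0: 17, 1: 39}
print(word_counts, product)
```

Both problems use the same reducer; only the key chosen by the mapper changes (the word, or the row index of the result vector).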

4. Mining Big Data Streams

Topics: Numericals
1) The Bloom Filter (whether the value is present or not)
https://docs.google.com/presentation/d/1ku40VHl8xGJMp71EiKOAKJ5MTrjSXXu0/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true

2) Flajolet-Martin algorithm (FM algorithm) (number of distinct elements)
https://docs.google.com/presentation/d/1ktgOTlkuz0B2XeaIYC8K0laS27VAQPOW/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true
https://drive.google.com/file/d/1s9DzRjMPFhLVu91izW_UZDp-1WekEp2y/view?usp=sharing

3) DGIM Algorithm (count of ones)
https://docs.google.com/presentation/d/1cg_7NGbuFmT00fl3W_HIdMH3-6xtmPX8tvY2QSORJ6A/edit?usp=sharing

Topics: Theory
1. The Stream Data Model: A Data-Stream-Management System (DSMS)
2. Difference between DSMS and DBMS
3. Issues in Stream Processing
4. Sampling Techniques
5. (FM algorithm, DGIM, Bloom filter)
https://docs.google.com/presentation/d/1l2380QtA8XPsUGDzX4MaxWAtK4Gt1NUB/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true
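Two of the Module 4 numericals, the Bloom filter and the Flajolet-Martin estimate, are short enough to check with a few lines of code. The sketch below uses illustrative hash functions and data chosen for this example, not taken from the linked material. Treating a hash value of 0 as having zero trailing zeros is an assumed convention; some texts handle that case differently.

```python
def bloom_build(keys, hash_funcs, n_bits):
    """Set one bit per hash function for every inserted key."""
    bits = [0] * n_bits
    for key in keys:
        for h in hash_funcs:
            bits[h(key) % n_bits] = 1
    return bits

def bloom_might_contain(bits, key, hash_funcs):
    """False means definitely absent; True means possibly present."""
    return all(bits[h(key) % len(bits)] for h in hash_funcs)

def trailing_zeros(x):
    """Trailing zero bits of x; 0 is treated as having none (assumed convention)."""
    if x == 0:
        return 0
    count = 0
    while x % 2 == 0:
        x //= 2
        count += 1
    return count

def fm_estimate(stream, h):
    """Flajolet-Martin: estimate distinct count as 2**R, R = max trailing zeros."""
    return 2 ** max(trailing_zeros(h(x)) for x in stream)

# Bloom filter with 10 bits and two illustrative hash functions.
hashes = [lambda x: x % 10, lambda x: (3 * x + 1) % 10]
bits = bloom_build([25, 37], hashes, 10)           # sets bits 2, 5, 6, 7
present = bloom_might_contain(bits, 25, hashes)     # True (really inserted)
absent = bloom_might_contain(bits, 14, hashes)      # False (bit 4 is 0)
false_pos = bloom_might_contain(bits, 57, hashes)   # True, though 57 was never inserted

# FM with the illustrative hash h(x) = (6x + 1) mod 5.
stream = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]
estimate = fm_estimate(stream, lambda x: (6 * x + 1) % 5)   # 4
print(present, absent, false_pos, estimate)
```

The query for 57 shows why a Bloom filter answer of "present" is only a maybe: both of its bits were set by other keys.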
5. Big Data Mining Algorithms

Topics: Numericals
1) Algorithm of Park, Chen, and Yu (PCY algorithm for frequent items)
https://docs.google.com/presentation/d/1Ohd9p4evmdxSckaG-J-Ap3j7Htnyazc5/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true
https://drive.google.com/file/d/1s9vOThD7tX8Ix5VkFKsnJddnQHHmzK1t/view?usp=sharing

2) Distance measure problems: Jaccard distance, cosine distance, Hamming distance, edit distance
https://docs.google.com/presentation/d/1pBxucVb7NvGwvzrV-y7CZ-nAu2ym5UvF/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true

Topics: Theory
1. CURE algorithm
2. SON algorithm
3. Canopy Clustering
https://drive.google.com/file/d/1tBWgvUnYNZEZNA9jiBom0yJi6Cua-bxN/view?usp=sharing
4. PCY algorithm problem
https://drive.google.com/file/d/1miG1Pr2QCqV5jHZ7HsFM5uzEixp1tqe1/view?usp=sharing
5. Multistage and multi-hash algorithm
https://docs.google.com/document/d/11CxoSopZA4aR1Sh6x7Ee69mVOtESHVR2CUtvCqqEpLU/edit?usp=sharing
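The four distance measures listed under the Module 5 numericals can be checked with the sketch below. Two choices here are assumptions and may differ from what a given question expects: cosine distance is reported as an angle in degrees, and edit distance counts only insertions and deletions (computed from the longest common subsequence), not substitutions. All function names are illustrative.

```python
import math

def jaccard_distance(a, b):
    """1 minus |intersection| / |union| of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

def cosine_distance_degrees(x, y):
    """Angle between two vectors, in degrees (assumed reporting convention)."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    cos = max(-1.0, min(1.0, dot / norm))   # clamp against rounding error
    return math.degrees(math.acos(cos))

def hamming_distance(x, y):
    """Number of positions where two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(p != q for p, q in zip(x, y))

def lcs_length(x, y):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, 1):
            curr.append(prev[j - 1] + 1 if cx == cy else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def edit_distance_indel(x, y):
    """Insert/delete-only edit distance: |x| + |y| - 2 * LCS(x, y)."""
    return len(x) + len(y) - 2 * lcs_length(x, y)

jd = jaccard_distance({1, 2, 3, 4}, {2, 3, 5, 7})      # 1 - 2/6
cd = cosine_distance_degrees([1, 2, -1], [2, 1, 1])    # about 60 degrees
hd = hamming_distance("10101", "11110")                # 3
ed = edit_distance_indel("abcde", "acfdeg")            # 3
print(jd, cd, hd, ed)
```

If a question allows substitutions as a single edit, the edit-distance result can be smaller than the value this insert/delete-only version returns.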
6. Big Data Analytics Applications

Topics: Numericals
1) Page rank using teleportation (damping factor beta)
https://drive.google.com/file/d/1D8HpTMTubxGcfJE0xJmhjKi0Njzr8zeC/view?usp=sharing
https://drive.google.com/file/d/1G2VxVrkTd9R5xundYVZT0zGFjLAU68b2/view?usp=sharing

2) HITS algorithm
https://drive.google.com/file/d/1C2_vSwd3t3nXYgxMj2tPhfZHNL3HEuOJ/view?usp=sharing
Correct answer of Jan 2023 question:
https://drive.google.com/file/d/182qjH6Nkqqzb9f6lPnKnrYq1uKGuE3Ql/view?usp=sharing

3) Girvan Newman algorithm - community finding
https://docs.google.com/presentation/d/1AEr84eiDQgds3dAmAY6FiyH7XOjvOMQV/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true
Problem:
https://drive.google.com/file/d/1EnjiyOl1dxkhmSl_vJPMxt5EPp97RuNm/view?usp=sharing

4) Clique Percolation and community finding algorithm
https://drive.google.com/file/d/1yAfUPcfuIPLIf1WidOJJdpFncY7QEwp6/view?usp=sharing
https://drive.google.com/file/d/1unoDwqxxDM0_L9ytBudYcmYTA57bme0r/view?usp=sharing

Topics: Theory
1. Structure of the web - Bow tie structure
2. Content-Based Recommendations & Collaborative Filtering
https://docs.google.com/presentation/d/1ssk5AIyKsAnEBjJXzmmvrahAyB8KoQ4j/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true
3. HITS algorithm - hubs and authority explanation
4. Page rank algorithm
https://docs.google.com/presentation/d/1B6-WlW6Fs_DGJlbtq8MP8BKXA8-9trNT/edit?usp=sharing&ouid=111378912346065138505&rtpof=true&sd=true
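For the PageRank-with-teleportation numerical in Module 6, the update usually iterated is v' = beta * M * v + (1 - beta) * e / n, where M is the column-stochastic link matrix and n is the number of pages. The sketch below is one possible way to run it, assuming every page has at least one out-link (no dead ends); the four-page graph, beta = 0.8, and the function name are illustrative choices, not taken from the linked problems.

```python
def pagerank(links, beta=0.8, iterations=50):
    """Power iteration with teleportation.

    links maps each page to the list of pages it links to. Assumes every page
    appears as a key and has at least one out-link (no dead ends).
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleport share: every page receives (1 - beta) / n.
        new_rank = {node: (1 - beta) / n for node in nodes}
        # Link share: each page splits beta * rank evenly among its out-links.
        for node, outs in links.items():
            for target in outs:
                new_rank[target] += beta * rank[node] / len(outs)
        rank = new_rank
    return rank

graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "C"],
}
ranks = pagerank(graph, beta=0.8)
# Converges to roughly A = 0.321 and B = C = D = 0.226.
print({page: round(score, 3) for page, score in ranks.items()})
```

In an exam answer the same update is normally carried out by hand for two or three iterations; the code only confirms where those iterations are heading.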

Subject Incharge - Sonali Suryawanshi

