Bda Iat-1 Answer Key

Uploaded by

yash.engineering
REG. NO.: 5113

(Approved by AICTE, affiliated to Anna University & Accredited by NBA)


DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
INTERNAL ASSESSMENT TEST – I – ANSWER KEY

Sem & Branch :III / CSE’A


Subject :BIG DATA ANALYTICS

Sub. Code :CCS334


Part-A
Answer all the questions [6 x 2 = 12 Marks]
1. Differentiate Big Data processing and distributed processing. (K2)(CO1)
Big Data Processing: Refers to the techniques, tools, and frameworks used to handle and analyze
large datasets (often characterized by the "3 Vs": volume, velocity, and variety) that are too
complex for traditional data processing tools. Big data processing often requires specialized
technologies to capture, store, manage, and analyze these vast amounts of data efficiently.
Distributed Processing: Refers to the computing technique where multiple computers (or nodes)
work together to process data in parallel, splitting the workload among different machines. It is a
broader concept that can be used for any kind of data processing (including big data), but the key
idea is that the workload is divided across multiple systems.
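
The idea of dividing a workload can be sketched in a few lines of Python. This is only an illustrative sketch: worker threads on one machine stand in for separate nodes, and the function names and sample lines are assumptions made for this example, not part of any framework.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def split_into_chunks(items, n_workers):
    """Divide the workload into roughly equal slices, one per worker."""
    size = max(1, -(-len(items) // n_workers))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def count_words(lines):
    """Work done independently by one worker on its own slice."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def distributed_word_count(lines, n_workers=3):
    """Split the input, process the slices in parallel, then merge results."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # combine the partial counts
    return total

# Hypothetical sample input for illustration only.
sample_lines = [
    "big data needs distributed processing",
    "distributed processing splits the workload",
    "each node handles part of the data",
]
word_totals = distributed_word_count(sample_lines)
print(word_totals["distributed"], word_totals["data"])
```

In a real cluster the slices would live on different machines and the merge step would run over the network, but the split, process, and combine pattern is the same.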

2. List the difference between inter and trans firewall analytics. (K1)(CO1)

3. Explain NIST definition to define cloud computing. (K2)(CO1)


Cloud computing refers to the on-demand availability of computing resources over the internet.
These resources include servers, storage, databases, software, analytics, networking, and
intelligence, and all of them can be used as per the requirement of the customer. In cloud
computing, customers pay as per use. It is very flexible, and resources can be scaled easily
depending upon the requirement.
4. What are the characteristics of firewall? (K1)(CO1)

• Network size and complexity: Larger and more complex networks benefit more from
inter-firewall analytics for comprehensive monitoring.
• Security needs and threats: Trans-firewall analytics is crucial for networks handling
sensitive data and facing advanced threats.
• Budget and resources: Implementing trans-firewall analytics requires
additional investment in specialized hardware and software.
5. Why is Hadoop important? (K3)(CO1)
Salient features of Apache Hadoop:
• Free to use and offers an efficient storage solution for businesses.
• Offers quick access via HDFS (Hadoop Distributed File System).
• Highly flexible and can be easily implemented with MySQL and JSON.
• Highly scalable as it can distribute a large amount of data in small segments.
• It works on small commodity hardware like JBOD (just a bunch of disks).
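
The point about distributing data in small segments can be illustrated with a short calculation. This is a sketch under stated assumptions: it uses a block size of 128 MB and a replication factor of 3, which are the HDFS defaults in Hadoop 2.x and later, and the function name and the 1000 MB file are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128      # assumed: default HDFS block size in Hadoop 2.x and later
REPLICATION_FACTOR = 3   # assumed: default HDFS replication factor

def hdfs_block_plan(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                    replication=REPLICATION_FACTOR):
    """Estimate how a file is cut into blocks and how much raw storage it needs."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only holds the remainder; HDFS does not pad it.
    last_block_mb = file_size_mb - (n_blocks - 1) * block_size_mb if n_blocks else 0
    return {
        "blocks": n_blocks,
        "last_block_mb": last_block_mb,
        "raw_storage_mb": file_size_mb * replication,
    }

plan = hdfs_block_plan(1000)  # hypothetical 1000 MB file
print(plan)
```

Under these assumptions a 1000 MB file becomes 8 blocks (seven full blocks and one of 104 MB), and the three copies of each block occupy 3000 MB across the cluster.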
6. What is machine generated data? (K1)(CO1)
Machine-generated data refers to information created without human intervention, typically by
devices, sensors, software, or machines. In the context of big data analytics, it plays a crucial role
as it provides large volumes of data in real-time or near real-time. This type of data is often
structured and used for predictive analytics, monitoring, and decision-making processes.
Here are some examples of machine-generated data:
1. Log files: Generated by web servers, application servers, or databases, capturing activities,
transactions, errors, and usage patterns.
2. Sensor data: Created by IoT (Internet of Things) devices, monitoring various environmental
parameters like temperature, humidity, motion, and pressure.
3. Telecommunication data: Call records, network traffic data, and performance metrics.
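
As an illustration of the first example, the sketch below turns web server log lines into structured records. It assumes the lines follow the Common Log Format; the sample lines, the pattern, and the function name are made up for this example.

```python
import re
from collections import Counter

# Pattern for the Common Log Format (assumed layout of the sample lines).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record

# Hypothetical log lines for illustration only.
sample_log = [
    '192.0.2.10 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.11 - - [10/Oct/2024:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153',
    'not a log line',
]
records = [r for r in map(parse_log_line, sample_log) if r is not None]
status_counts = Counter(r["status"] for r in records)  # e.g. errors vs. successes
print(status_counts)
```

Once the lines are structured like this, counting status codes or paths is the kind of monitoring and usage-pattern analysis described above.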
Part-B
Answer any two of the following [2 x 16 = 32 Marks]

7. (i) Elaborate the significance of three Vs in the context of Big Data. (10)(K2)(CO1)
Big Data- CONCEPT
Big data refers to extremely large and diverse collections of structured, unstructured, and
semi-structured data that continue to grow exponentially over time. These datasets are so huge
and complex in volume, velocity, and variety that traditional data management systems cannot
store, process, and analyze them.
The Vs of big data
Volume
Velocity
Variety
Veracity:
Variability:
Value:
8. (i) Define unstructured data. Compare structured and unstructured data. (8)(K1)(CO1)
Types of Big Data

1. Structured data

2. Semi-Structured data
(ii) Explain the concept of web analytics and list its importance in detail. (8)(K2)(CO1)

WEB ANALYTICS
Importance of Web Analytics
Web analytics is needed to assess the success rate of a website and its associated
business. Using web analytics, we can:
• Assess web content problems so that they can be rectified.
• Have a clear perspective of website trends
• Monitor web traffic and user flow
• Demonstrate goals acquisition
• Figure out potential keywords
• Identify segments for improvement

Key Performance Indicator (KPI)


A KPI depends upon the business type and strategy, so KPIs vary from one business to another.
Micro and macro Level Data Insights
Google Analytics gives you more accurate insight into the data. You can understand the data at
two levels: the micro level and the macro level.
Micro Level Analysis
It pertains to an individual or a small group of individuals. For example, the number of times a
job application was submitted, the number of times "print this page" was clicked, etc.
Macro Level Analysis

It is concerned with the primary business objectives and with huge groups of people such as
communities, nations, etc. For example, the number of conversions in a particular demographic.
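
The two levels can be contrasted with a small sketch. The event records, field names, and function names below are hypothetical and are not taken from Google Analytics; they only show that micro-level figures count individual actions while macro-level figures aggregate outcomes over a large group.

```python
from collections import Counter

# Hypothetical event records; field names and values are illustrative only.
events = [
    {"user": "u1", "action": "job_application_submitted", "region": "north", "converted": True},
    {"user": "u1", "action": "print_page_clicked", "region": "north", "converted": False},
    {"user": "u2", "action": "print_page_clicked", "region": "south", "converted": True},
    {"user": "u3", "action": "job_application_submitted", "region": "south", "converted": True},
]

def micro_level(event_list):
    """Micro level: how often each individual action occurred."""
    return Counter(e["action"] for e in event_list)

def macro_level(event_list):
    """Macro level: conversions per demographic group (here, per region)."""
    return Counter(e["region"] for e in event_list if e["converted"])

micro = micro_level(events)
macro = macro_level(events)
print(micro["print_page_clicked"], macro["south"])
```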
9. List the role and implications of crowdsourcing analytics in today’s data-driven landscape.
(16)(K2)(CO1)
CROWD SOURCING ANALYTICS

Crowdsourcing is a sourcing model in which an individual or an organization gets support
from a large, open-minded, and rapidly evolving group of people in the form of ideas, micro-tasks,
finances, etc. Crowdsourcing typically involves the use of the internet to attract a large group of
people to divide tasks or to achieve a target. The term was coined in 2005 by Jeff Howe and Mark
Robinson. Crowdsourcing can help different types of organizations get new ideas and solutions,
deeper consumer engagement, optimization of tasks, and several other things.
Where Can We Use Crowdsourcing?
Crowdsourcing is touching almost all sectors, from education to health. It is not only accelerating
innovation but also democratizing problem-solving methods. Some fields where crowdsourcing can be
used are:
1. Enterprise

2. IT
3. Marketing
4. Education
5. Finance

6. Science and Health


Examples of Crowdsourcing
1. Doritos: One of the companies that has long taken advantage of crowdsourcing for an
advertising initiative. They use consumer-created ads for one of their 30-second Super Bowl
spots (the championship game of American football).
2. Starbucks: Another big venture that used crowdsourcing as a medium for idea generation. Their
White Cup Contest is a famous contest in which customers decorate their Starbucks cup with
an original design, take a photo, and submit it on social media.
3. Lay's: The "Do Us a Flavor" contest of Lay's used crowdsourcing as an idea-generating medium. They
asked customers to submit their opinion about the next chip flavor they want.
4. Airbnb: A very famous travel website that lets people rent out their houses or apartments by
listing them on the website. All the listings are crowdsourced by people.
Here is a list of some famous crowdsourcing and crowdfunding sites.
1. Kickstarter

2. GoFundMe
3. Patreon
4. RocketHub
Part-C (Compulsory)
Answer the questions [1 x 16 = 16 Marks]

10. What is Open-Source technology? Explain the advantages, disadvantages and
applications of Open-Source. (16)(K2)(CO1)

OPEN SOURCE TECHNOLOGIES / BIG DATA ANALYTICS TOOLS


1. Apache Hadoop
Features of Apache Hadoop:
2. Cassandra
Features of Apache Cassandra:
3. Qubole
Features of Qubole:
4. Xplenty
Features of Xplenty:
5. Spark
Features of Apache Spark:
6. MongoDB
Features of MongoDB:
7. Apache Storm
Features of Storm:
8. SAS
Features of SAS:
9. Datapine
Features of Datapine:
10. RapidMiner
Features of RapidMiner:

Knowledge - K1, Comprehension - K2, Application - K3,
Analysis - K4, Synthesis - K5, Evaluation - K6
