Msbte UT 1 QB Answers

2 Marks

Q.1 State features of Hadoop?


 Hadoop is an open-source framework designed for
storing and processing large datasets in a distributed
computing environment.
Features of Hadoop:-
1. Distributed Storage – Hadoop uses HDFS
(Hadoop Distributed File System) to store large
datasets across multiple nodes, ensuring high
availability and fault tolerance.
2. Scalability – It can scale horizontally by adding
more nodes to handle increasing amounts of data
efficiently.
3. Fault Tolerance – Hadoop automatically replicates
data across multiple nodes, ensuring data safety
even if a node fails.
4. Parallel Processing – The MapReduce framework enables the parallel processing of large datasets, improving processing speed and efficiency (sketched below).
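To make point 4 concrete, here is a minimal, self-contained Python simulation of the MapReduce word-count pattern. It runs the map, shuffle, and reduce phases in one process purely to show the data flow; real Hadoop distributes these phases across nodes, and the sample input lines are made up.

```python
# A minimal, self-contained simulation of the MapReduce word-count pattern.
# In real Hadoop the map and reduce phases run in parallel across nodes;
# here they run in one process purely to illustrate the data flow.
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values (here, sum the counts).
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores big data", "hadoop processes big data in parallel"]
print(reduce_phase(shuffle(map_phase(lines))))
# {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1, 'in': 1, 'parallel': 1}
```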
Q.2 List domain-specific features of Hadoop
 Hadoop is an open-source framework designed for storing and processing large datasets in a distributed computing environment.
Domain-specific features of Hadoop are as follows:-
1. Healthcare – Processes large volumes of medical
records, genomic data, and real-time patient
monitoring data for predictive analytics and
disease detection.
2. Finance – Enables fraud detection, risk
management, and real-time transaction analysis by
handling vast amounts of financial data efficiently.
3. E-commerce – Supports recommendation
systems, customer behaviour analysis, and
inventory management by processing big data
from user interactions.
4. Telecommunications – Helps in network
optimization, call detail record analysis, and
predictive maintenance by analyzing large-scale
data traffic patterns.

Q.3 Define BDA and state its types


 Big Data Analytics (BDA) is the process of
examining large and complex datasets to uncover
hidden patterns, correlations, trends, and insights for
decision-making.
Types of BDA:-
1. Descriptive Analytics – Summarizes historical data to understand past trends.
2. Diagnostic Analytics – Identifies causes of past events using data patterns.
3. Predictive Analytics – Uses statistical models and machine learning to forecast future outcomes (contrasted with descriptive analytics in the toy example after this list).
4. Prescriptive Analytics – Recommends actions based on predictive insights to optimize decision-making.
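As a toy contrast between types 1 and 3 above, the following Python sketch computes a descriptive summary (a mean) and a simple predictive forecast (a hand-fitted least-squares trend) on made-up monthly sales figures; it is an illustration, not a production forecasting method.

```python
# A toy illustration of descriptive vs. predictive analytics on made-up
# monthly sales figures (all numbers are assumptions for illustration).
from statistics import mean

sales = [100, 110, 125, 135, 150, 160]  # units sold in months 1..6

# Descriptive: summarize what already happened.
print("average monthly sales:", mean(sales))

# Predictive: fit a least-squares line y = a + b*x and forecast month 7.
n = len(sales)
xs = range(1, n + 1)
b = (n * sum(x * y for x, y in zip(xs, sales)) - sum(xs) * sum(sales)) / \
    (n * sum(x * x for x in xs) - sum(xs) ** 2)
a = mean(sales) - b * mean(xs)
print("forecast for month 7:", round(a + b * 7, 1))  # -> 173.0
```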
Q.4 State different big data stacks
 Here are four different Big Data Stacks:
Hadoop Ecosystem Stack – Includes Hadoop
Distributed File System (HDFS), MapReduce, YARN, and
tools like Hive, Pig, and HBase for big data processing
and storage.
Lambda Architecture Stack – Combines batch processing (Hadoop, Spark) with real-time processing (Apache Storm, Kafka, Flink) for scalable and fault-tolerant data analytics (see the sketch after this list).
Kappa Architecture Stack – Focuses on real-time
data processing using tools like Apache Kafka, Apache
Flink, and Apache Samza, eliminating batch layers.
SMACK Stack – Consists of Spark, Mesos, Akka,
Cassandra, and Kafka, providing a real-time, scalable,
and high-performance big data processing solution.
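Here is a minimal Python sketch of the Lambda architecture's serving idea referenced above: a query merges a complete but slow batch view with a fresh real-time view. The page names and counts are assumptions for illustration.

```python
# A minimal sketch of the Lambda architecture's core idea: serve queries by
# merging a (slow, complete) batch view with a (fast, recent) real-time view.
# The event names and counts are assumptions for illustration.

batch_view = {"page_a": 10_000, "page_b": 7_500}  # recomputed periodically by Hadoop/Spark
realtime_view = {"page_a": 42, "page_c": 5}       # updated continuously by a stream processor

def query(page):
    # Serving layer: combine both views so results are complete AND fresh.
    return batch_view.get(page, 0) + realtime_view.get(page, 0)

for page in ("page_a", "page_b", "page_c"):
    print(page, query(page))
```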

Q.5 State characteristics of big data


 Here are four key characteristics of Big Data:
1. Volume – Refers to the vast amount of data
generated from various sources, such as social
media, sensors, and transactions.
2. Velocity – Represents the speed at which data is
generated, processed, and analyzed in real-time or
near real-time.
3. Variety – Includes different types of data formats, such as structured (databases), semi-structured (XML, JSON), and unstructured (videos, images, text); all three are illustrated after this list.
4. Veracity – Ensures the reliability and accuracy of
data, dealing with inconsistencies, noise, and
uncertainty in big data sources.
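As a toy illustration of Variety (item 3 above), the following Python snippet handles the same kind of fact arriving in structured, semi-structured, and unstructured form; all sample values are made up.

```python
# A toy look at "Variety": the same kind of fact arriving as structured,
# semi-structured, and unstructured data (all sample values are made up).
import csv, io, json

structured = io.StringIO("user,amount\nalice,120\n")   # CSV row (table-like)
semi_structured = '{"user": "bob", "amount": 80}'      # JSON document
unstructured = "carol paid 95 dollars for her order"   # free text

print(next(csv.DictReader(structured)))  # parsed against a fixed schema
print(json.loads(semi_structured))       # self-describing, flexible schema
print(unstructured.split())              # needs NLP/heuristics to extract meaning
```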

Q.6 Evolution of Hadoop?



1. Origins (2003–2006): Inspired by Google’s GFS and MapReduce papers, Hadoop was created by Doug Cutting and Mike Cafarella and became an Apache project in 2006.
2. Growth (2008–2013): Became a top-level Apache project and gained wide industry adoption; alongside HDFS and MapReduce, Hadoop 2 (2013) introduced YARN for better scalability.
3. Advancements (2017–Present): Hadoop 3.0
added erasure coding, container support, and
cloud integration, improving performance and
efficiency.
4. Future Trends: Focus on AI, real-time
processing, hybrid cloud, and security to
enhance big data analytics.

Q.7 Architecture of Hadoop (draw and list)


 The Hadoop architecture mainly consists of four components:
 MapReduce
 HDFS (Hadoop Distributed File System)
 YARN (Yet Another Resource Negotiator)
 Common Utilities (Hadoop Common)
Q.8 Types of analysis:-
1. Descriptive Analytics – Summarizes historical data to understand past trends.
2. Diagnostic Analytics – Identifies causes of past events using data patterns.
3. Predictive Analytics – Uses statistical models
and machine learning to forecast future outcomes.
4. Prescriptive Analytics – Recommends actions
based on predictive insights to optimize decision-
making.

4 Marks:-
Q.1 Explain data science?
Data science is an interdisciplinary field that involves
using various techniques, algorithms, and systems to
analyze and interpret large sets of data to derive
insights and make informed decisions.
Here are 8 key points about data science:
1. Data Collection and Cleaning: Before any
analysis, data needs to be gathered from different
sources and cleaned to ensure accuracy. Raw data
often contains errors, missing values, or
inconsistencies that need to be addressed.
2. Exploratory Data Analysis (EDA): EDA involves
visualizing and summarizing data to understand
patterns, distributions, and relationships. This step
helps data scientists to uncover hidden insights
and decide on further analysis.
3. Statistical Analysis: Data science heavily relies
on statistics to make inferences, test hypotheses,
and understand the likelihood of certain events or
outcomes. This includes tools like regression
analysis, probability, and hypothesis testing.
4. Machine Learning: Machine learning (ML)
algorithms allow computers to learn from data,
identifying patterns without being explicitly
programmed. Common techniques include
classification, regression, clustering, and decision
trees.
5. Big Data Technologies: Handling vast amounts
of data requires specialized tools like Hadoop,
Spark, and cloud computing resources. These tools
allow for processing, storing, and analyzing data
that exceeds traditional computing power.
6. Data Visualization: Visual representations of
data, such as graphs, charts, and dashboards, are
used to communicate findings clearly. Visualization
helps stakeholders easily interpret complex data
insights.
7. Predictive Modeling: One of the primary goals of
data science is to predict future outcomes based
on historical data. This involves building models
that can forecast trends, behaviors, or risks with a
certain level of confidence (a compact sketch of this workflow follows the list).
8. Communication and Decision-Making: Data
science isn't just about analyzing data—it’s also
about communicating the findings to non-technical
stakeholders. Data scientists must be able to
explain their results clearly and help guide
business decisions.
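A compact sketch tying several of these points together (collection/cleaning, EDA, statistics, and predictive modeling), assuming pandas and scikit-learn are installed and using made-up advertising data:

```python
# A compact sketch of the data science workflow above on made-up
# advertising data (all values are assumptions for illustration).
import pandas as pd
from sklearn.linear_model import LinearRegression

# 1-2. Collect and clean: drop the row with a missing value.
df = pd.DataFrame({"ad_spend": [10, 20, 30, None, 50],
                   "sales":    [25, 45, 65, 80, 105]}).dropna()

# 3. EDA / statistics: summarize and check the relationship.
print(df.describe())
print("correlation:", df["ad_spend"].corr(df["sales"]))

# 4-7. Model and predict: fit a regression, forecast sales for a new spend.
model = LinearRegression().fit(df[["ad_spend"]], df["sales"])
print("predicted sales at spend=60:",
      model.predict(pd.DataFrame({"ad_spend": [60]}))[0])
```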
Q.2 Explain the analytics flow of big data?
The analytics flow for big data involves several key
stages:
a. Data Collection: Gather data from diverse sources
like sensors, social media, and logs.
b. Data Storage: Store large datasets in scalable
solutions like Hadoop or NoSQL databases.
c. Data Cleaning and Preprocessing: Clean and
transform data to ensure quality and usability.
d. Data Analysis: Conduct exploratory analysis and
summarize trends or patterns using descriptive
analytics.
e. Modeling and Machine Learning: Apply machine
learning techniques to build models that predict or
classify data.
f. Data Visualization and Reporting: Visualize insights
through dashboards and reports for better
decision-making.
g. Deployment and Integration: Deploy models into
production and integrate insights into business
systems.
h. Monitoring and Maintenance: Continuously monitor
models and data pipelines for accuracy and
performance.
This flow ensures that big data is processed, analyzed, and turned into actionable insights effectively; the minimal sketch below wires several of these stages together.
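Here is a minimal plain-Python sketch of the flow, with one function per stage (cleaning, analysis, reporting) chained into a pipeline; the sensor records are assumptions for illustration.

```python
# A minimal end-to-end sketch of the analytics flow; each stage is a
# function, and the raw sensor records are made up for illustration.
raw = [{"sensor": "s1", "temp": 21.5}, {"sensor": "s2", "temp": None},
       {"sensor": "s1", "temp": 22.1}, {"sensor": "s3", "temp": 19.8}]

def clean(records):                 # stage c: drop unusable records
    return [r for r in records if r["temp"] is not None]

def analyze(records):               # stage d: descriptive summary
    temps = [r["temp"] for r in records]
    return {"count": len(temps), "avg_temp": sum(temps) / len(temps)}

def report(summary):                # stage f: present the insight
    print(f"{summary['count']} readings, average {summary['avg_temp']:.1f} °C")

report(analyze(clean(raw)))         # the stages wired together as one pipeline
```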
Q.3 Explain the data collection process of big data analytics with an example
Big data collection for analytics involves these key
steps:
1. Identify & Select: Define project goals and choose
relevant data sources (e.g., databases, social
media).
2. Acquire: Gather data using methods like web scraping, APIs, or data streaming (sketched below).
3. Store: Use distributed systems (e.g., Hadoop) for
secure and efficient storage.
4. Preprocess: Clean and transform data, handling
missing values and inconsistencies.
5. Govern: Implement policies for data access,
security, and privacy, ensuring compliance.
A retail example: a company wanting to analyze
customer behavior would collect data from
databases, websites, and social media, store it in
Hadoop, clean it, and then analyze it while adhering
to privacy regulations.
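Below is a hedged Python sketch of step 2 (Acquire) pulling JSON records from a REST API. The endpoint, parameters, and response shape are hypothetical placeholders to be replaced with a real source, and the requests library must be installed (pip install requests).

```python
# A hedged sketch of the "Acquire" step: pulling JSON records from a REST API.
# Endpoint, parameters, and response shape are hypothetical placeholders.
import requests

def fetch_reviews(page):
    resp = requests.get("https://api.example.com/reviews",  # hypothetical endpoint
                        params={"page": page, "per_page": 100},
                        timeout=10)
    resp.raise_for_status()   # fail loudly on HTTP errors
    return resp.json()        # assumed: a list of review dicts

# Paginate until the source runs dry, then hand records to storage/cleaning.
records = []
page = 1
while True:
    batch = fetch_reviews(page)
    if not batch:
        break
    records.extend(batch)
    page += 1
```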

Q.4 Describe HDFS.


HDFS (Hadoop Distributed File System) is a
distributed file system designed for large clusters and
high throughput. Key characteristics include:
 Scalability: Handles massive files (GBs to TBs) by
breaking them into blocks and distributing them
across thousands of nodes.
 Replication: Ensures fault tolerance by replicating data blocks across multiple machines (the default replication factor is 3; the default block size is 128 MB in Hadoop 2+, 64 MB in Hadoop 1). The toy sketch below simulates this placement.
 Streaming Access: Optimized for high-
throughput sequential reads and writes, ideal for
batch processing. This trade-off means it's not
suitable for low-latency, interactive access.
 File Appends: While initially designed for
immutable files (write-once), HDFS now supports
appending to files.
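To make the block-and-replica idea concrete, here is a toy Python simulation of HDFS-style placement; the tiny block size and node names are scaled-down assumptions (real HDFS uses 128 MB blocks and a replication factor of 3 by default).

```python
# A toy simulation of HDFS's core storage idea: split a file into fixed-size
# blocks and place each block on several nodes. Block size and node names are
# scaled-down assumptions for illustration.
from itertools import cycle

BLOCK_SIZE = 4        # bytes here; 128 MB in Hadoop 2+ (64 MB in Hadoop 1)
REPLICATION = 3
nodes = ["node1", "node2", "node3", "node4"]

data = b"hello hadoop distributed file system"
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

placement = {}
node_ring = cycle(nodes)
for idx, block in enumerate(blocks):
    # Each block goes to REPLICATION distinct nodes, so losing one node
    # still leaves two live copies (fault tolerance).
    placement[idx] = [next(node_ring) for _ in range(REPLICATION)]

for idx, replicas in placement.items():
    print(f"block {idx} ({blocks[idx]!r}) -> {replicas}")
```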
Q.5 Compare RDBMS vs Hadoop

Feature     | RDBMS                                              | Hadoop
Data Type   | Primarily structured data (rows and columns)       | Handles structured, semi-structured, and unstructured data
Data Volume | Designed for smaller to medium-sized datasets      | Optimized for massive datasets (Big Data)
Processing  | Focuses on online transaction processing (OLTP) with fast, consistent transactions | Emphasizes batch processing for complex analysis of large volumes of data
Scalability | Typically scales vertically (more powerful hardware) | Scales horizontally (add more machines to the cluster)
Data Schema | Static schema defined upfront                      | Flexible schema, often schema-on-read

Q.6 Case study on weather forecasting



Weather forecasting is a complex process that relies
on analyzing vast amounts of data from various
sources. Big data analytics has revolutionized this
field, enabling more accurate and timely predictions.
Here's a case study on weather forecasting using big
data:
Data Sources:
 Weather Stations: Collect real-time data on
temperature, humidity, wind speed, and
precipitation.
 Satellites: Provide images and data on cloud
cover, atmospheric conditions, and land surface
temperatures.
 Radar: Detects precipitation and tracks its
movement.
 Aircraft: Gather data on atmospheric conditions at
different altitudes.
 IoT Sensors: A growing network of sensors
provides hyperlocal data on weather conditions.
 Social Media: Can offer insights into real-time
weather events and their impact.
Big Data Technologies:
 Hadoop: Stores and processes massive datasets
from diverse sources.
 Spark: Enables fast and efficient analysis of
weather data.
 Machine Learning: Algorithms identify patterns
and build predictive models.
 Cloud Computing: Provides scalable
infrastructure for data storage and processing.
Analysis and Prediction:
 Data Preprocessing: Cleaning and transforming
raw data to ensure quality.
 Feature Engineering: Extracting relevant
variables for model building.
 Model Development: Training machine learning models on historical and real-time data (see the sketch after this list).
 Visualization: Presenting weather forecasts
through maps, charts, and reports.
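As a minimal sketch of the Model Development step, the following snippet trains a linear regression on made-up historical readings to predict the next day's temperature; real forecasting combines physics-based numerical models with far richer data, and scikit-learn is assumed installed.

```python
# A minimal sketch of the "Model Development" step: fit a regression on
# (made-up) historical readings to predict the next day's temperature.
from sklearn.linear_model import LinearRegression

# Features: [today's temp (°C), humidity (%), pressure (hPa)];
# target: tomorrow's temperature (°C). All values are invented.
X = [[20, 60, 1012], [22, 55, 1010], [18, 80, 1005],
     [25, 40, 1015], [16, 85, 1002], [23, 50, 1013]]
y = [21, 23, 17, 26, 15, 24]

model = LinearRegression().fit(X, y)
print("forecast:", round(model.predict([[21, 65, 1008]])[0], 1), "°C")
```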
Benefits:
 Improved Accuracy: Big data enhances the
precision of weather forecasts.
 Timely Warnings: Enables early warnings for
severe weather events.
 Enhanced Decision-Making: Helps individuals,
businesses, and governments make informed
choices.
 Better Resource Management: Optimizes
resource allocation for weather-related activities.
Challenges:
 Data Volume and Velocity: Handling the sheer
volume and speed of weather data.
 Data Integration: Combining data from diverse
sources with varying formats.
 Model Complexity: Developing accurate and
reliable predictive models.
 Computational Resources: Requires powerful
computing infrastructure for data processing.

Q.7 Challenges of big data


1. Data Volume and Velocity: The sheer amount of
data being generated is exploding, and it's coming
in faster than ever. This makes it difficult to store,
process, and analyze it all efficiently. Think of
trying to drink from a firehose!
2. Data Variety: Big data comes in all shapes and
sizes - structured (like a spreadsheet), semi-
structured (like a document with some tags), and
unstructured (like social media posts or videos).
This variety makes it tough to integrate and
analyze data from different sources.
3. Data Quality: With so much data, it's easy for errors, inconsistencies, and duplicates to creep in. Poor data quality can lead to inaccurate insights and bad decisions; it's like trying to build a house with faulty materials (a small cleaning sketch follows this list).
4. Data Security and Privacy: Big data often
contains sensitive information, so protecting it from
breaches and misuse is crucial. Companies need to
comply with privacy regulations and ensure data is
accessed and used responsibly.
5. Skills Gap: Analyzing big data requires specialized
skills in areas like data science, machine learning,
and statistics. Finding and retaining professionals
with these skills can be a challenge.
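As a small sketch of tackling the data-quality challenge (point 3 above), the following snippet deduplicates records and imputes a missing value with pandas; the sample records are made up, and pandas is assumed installed.

```python
# A small sketch of addressing data quality with pandas:
# remove duplicate rows and impute a missing value (sample records are made up).
import pandas as pd

df = pd.DataFrame({"customer": ["alice", "alice", "bob", "carol"],
                   "age":      [34, 34, None, 29],
                   "city":     ["Pune", "Pune", "Mumbai", "Nashik"]})

df = df.drop_duplicates()                         # remove exact duplicate rows
df["age"] = df["age"].fillna(df["age"].median())  # impute the missing age
print(df)
```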
