Bda Unit 1

Big Data refers to large volumes of structured and unstructured data generated at high velocity, which traditional systems struggle to manage. It is categorized into structured, unstructured, and semi-structured data, each with its own advantages and challenges. Big Data analytics faces issues such as data quality, integration, storage, scalability, and privacy, while Business Intelligence (BI) transforms raw data into actionable insights through a structured process, although it also has its own set of advantages and disadvantages.

Uploaded by

mshraghvin

1) What is Big Data? Explain the types of Big Data. Discuss the
challenges for Big Data Analytics.
A) Big Data refers to the massive volumes of structured and unstructured data
generated from various sources at high velocity. This data is often too large and
complex for traditional data processing systems to manage effectively.
Types of Big Data
Big Data can be categorized into three primary types:
1. Structured Data:
o This is data that is organized in a predefined manner, usually in
rows and columns, making it easily searchable and analyzable.
o Examples: Databases, spreadsheets, and any data that can be easily
entered into relational databases.
Advantages:
o Easy to Analyze
o High Accuracy
Disadvantages:
o Limited Flexibility
o Inability to Capture Rich Information
2. Unstructured Data:
o This type of data lacks a predefined format or structure, making it
more complex to analyze.
o Examples: Text documents, images, videos, social media posts, and
emails.
Advantages:
o Rich Insights
o Flexibility
Disadvantages:
o Difficult to Analyze
o Data Quality Issues
3. Semi-Structured Data:
o This type contains elements of both structured and unstructured
data. While it may have organizational properties to separate data
elements, it does not fit into a strict schema.
o Examples: JSON and XML documents, and records stored in NoSQL databases.
Advantages:
o Balance of Structure and Flexibility
o Ease of Data Integration
Disadvantages:
o Complexity in Analysis
o Inconsistent Formats
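As an illustrative sketch (not part of the original notes), the three types can be contrasted in a few lines of Python. The customer names, fields, and text are made-up sample data; the point is only how much structure each form gives the program to work with.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured_csv = "customer_id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but fields may vary per record.
semi_structured_json = (
    '[{"id": 1, "name": "Asha", "tags": ["vip"]}, {"id": 2, "name": "Ravi"}]'
)
records = json.loads(semi_structured_json)

# Unstructured: free text with no fields at all.
unstructured_text = "Asha from Pune wrote: the delivery was late but support was helpful."

# Structured data allows direct column access.
cities = [row["city"] for row in rows]

# Semi-structured data must tolerate keys that are missing in some records.
tags = [rec.get("tags", []) for rec in records]

# Unstructured data offers only crude text search without further tooling.
mentions_late = "late" in unstructured_text.lower()

print(cities, tags, mentions_late)
```

The second JSON record has no "tags" key, which is exactly the "inconsistent formats" disadvantage listed for semi-structured data.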
Challenges for Big Data Analytics
Despite its potential, Big Data analytics faces several challenges:
1. Data Quality and Cleansing:
o Ensuring that the data is accurate, consistent, and cleaned is critical.
Poor data quality can lead to incorrect insights and decision-
making.
2. Data Integration:
o Combining data from different sources (structured and unstructured)
can be difficult, especially when these sources use various formats
and protocols.
3. Storage and Management:
o Storing vast amounts of data efficiently while maintaining
performance is a significant challenge. This includes choosing the
right technology stack and managing the costs associated with
storage.
4. Scalability:
o As the volume of data grows, systems must be able to scale
effectively without sacrificing performance. This requires robust
architecture and planning.
5. Data Privacy and Security:
o Protecting sensitive data and ensuring compliance with regulations
(like GDPR) represents a major challenge, particularly with the
increase in data breaches.
6. Skill Gap:
o There is often a shortage of skilled professionals who can analyze
Big Data effectively. This includes data scientists, analysts, and
engineers familiar with Big Data technologies.
7. Real-time Processing:
o Analyzing streaming data in real-time poses technical challenges,
as traditional data processing tools may not be able to handle high-
velocity data streams effectively.
8. Interpreting Data:
o Deriving actionable insights from complex datasets can be
daunting, especially when visualizing the data or when decision-
makers lack data literacy.
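The first challenge, data quality and cleansing, can be sketched in Python. This is a minimal illustration under assumed rules (trim and lowercase emails, reject rows with a missing email or a non-numeric amount, drop exact duplicates); the field names and sample records are hypothetical, and a real pipeline would need rules specific to its data.

```python
raw_records = [
    {"email": " Asha@Example.com ", "amount": "120.50"},
    {"email": "asha@example.com", "amount": "120.50"},  # duplicate after normalization
    {"email": "ravi@example.com", "amount": "n/a"},     # invalid amount
    {"email": "", "amount": "75"},                      # missing key field
]


def clean(records):
    """Normalize, validate, and de-duplicate records.

    Returns (clean_rows, rejected_count).
    """
    seen = set()
    cleaned = []
    rejected = 0
    for rec in records:
        email = rec["email"].strip().lower()
        try:
            amount = float(rec["amount"])
        except ValueError:
            rejected += 1  # amount is not a number
            continue
        if not email:
            rejected += 1  # required field is empty
            continue
        key = (email, amount)
        if key in seen:
            continue  # exact duplicate: drop without counting as rejected
        seen.add(key)
        cleaned.append({"email": email, "amount": amount})
    return cleaned, rejected


clean_rows, rejected_count = clean(raw_records)
print(clean_rows, rejected_count)
```

Of the four input rows, only one survives, which shows how quickly poor-quality data shrinks the usable dataset.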
2) Define Business Intelligence and explain how business intelligence
systems are implemented.
A) Business intelligence, or BI, is a set of practices for collecting,
structuring, and analyzing raw data to turn it into actionable business
insights. BI covers the methods and tools that transform raw, unorganized data
sets and compile them into easy-to-grasp reports or information dashboards.
The main purpose of BI is to support data-driven decision-making.
Business intelligence process: How does BI work?
The whole process of business intelligence can be divided into five main stages.
1. Data gathering involves collecting information from a variety of sources,
either external (e.g., market data providers, industry analytics, etc.) or
internal (Google Analytics, CRM, ERP, etc.).
2. Data cleaning/standardization means preparing collected data for
analysis by validating data quality, ensuring its consistency, and so on.
3. Data storage refers to loading data in the data warehouse and storing it
for further usage.
4. Data analysis is the automated process of turning raw data into
valuable, actionable information by applying various quantitative and
qualitative analytical techniques.
5. Reporting involves generating dashboards, graphical imagery, or other
forms of readable visual representation of analytics results that users can
interact with or extract actionable insights from.
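The five stages above can be sketched end to end in Python. This is a toy illustration, not a real BI implementation: the sales rows are hypothetical, an in-memory SQLite table stands in for the data warehouse, and a printed text report stands in for a dashboard.

```python
import sqlite3

# 1. Data gathering: hypothetical rows from an internal CRM export.
raw_sales = [
    ("north", "120.0"),
    ("North ", "80.0"),
    ("south", "200.0"),
    ("south", None),  # missing amount
]

# 2. Data cleaning/standardization: normalize region names, drop incomplete rows.
clean_sales = [
    (region.strip().lower(), float(amount))
    for region, amount in raw_sales
    if amount is not None
]

# 3. Data storage: load into an in-memory SQLite table standing in for a warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", clean_sales)

# 4. Data analysis: aggregate revenue per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# 5. Reporting: a plain-text report in place of a dashboard.
for region, total in totals:
    print(f"{region:<6} {total:>8.2f}")

conn.close()
```

Without the cleaning step, "north" and "North " would be reported as two different regions, which is why standardization comes before storage and analysis.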
Advantages of BI:
o Data-driven decision making
o Improved efficiency
o Enhanced visualization
o Data mining
o Real-time analytics
Disadvantages of BI:
o High costs
o Complexity
o Data overload
o Dependency on IT
o Security and privacy concerns

3) What are the advantages and disadvantages of Big Data Analytics?


A) Advantages:
1. Smarter Decisions: By analyzing large amounts of data, companies can
make more informed choices, leading to better strategies and outcomes.
2. Personalized Experiences: Understanding customer preferences allows
businesses to tailor products and services to individual needs, enhancing
satisfaction.
3. Boosted Efficiency: Big data helps identify areas where operations can be
streamlined, saving time and resources.
4. Competitive Edge: Access to comprehensive data insights enables
companies to stay ahead in the market by quickly adapting to trends.
5. Innovation Opportunities: Analyzing data can reveal gaps in the market,
inspiring the development of new products or services.
Disadvantages:
1. Privacy Concerns: Collecting vast amounts of personal data can lead to
security risks and potential misuse if not handled properly.
2. High Costs: Implementing and maintaining big data systems can be
expensive, requiring significant investment in technology and talent.
3. Data Overload: With so much information available, it can be challenging to
filter out irrelevant data and focus on what's important.
4. Quality Issues: Not all data collected is accurate or useful; relying on poor-
quality data can lead to faulty conclusions.
5. Complex Analysis: Interpreting big data requires specialized skills and tools,
which may not be readily available to all organizations.
4) Describe the characteristics of Big Data (the 5 V's of Big Data).
A) Characteristics of Big Data:
o Volume
o Veracity
o Variety
o Value
o Velocity

Volume
o The name Big Data itself refers to enormous size. Big Data involves vast
volumes of data generated daily from many sources, such as business processes,
machines, social media platforms, networks, and human interactions.
o Facebook, for example, is reported to generate approximately a billion
messages, 4.5 billion "Like" button clicks, and more than 350 million new
posts each day. Big data technologies are designed to handle data at this
scale.
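To give a feel for these numbers, the short calculation below converts the quoted figure of 350 million new posts per day into a per-second rate. The storage estimate uses an average post size of 2 KB, which is purely an assumption for illustration and not a figure from these notes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

posts_per_day = 350_000_000  # figure quoted in the notes
posts_per_second = posts_per_day / SECONDS_PER_DAY

# Assumption for illustration only: an average post takes 2 KB (2,000 bytes) to store.
assumed_bytes_per_post = 2_000
bytes_per_day = posts_per_day * assumed_bytes_per_post
gigabytes_per_day = bytes_per_day / 1_000_000_000

print(f"{posts_per_second:.0f} posts per second")
print(f"{gigabytes_per_day:.0f} GB of new post data per day")
```

Under these assumptions the system must absorb roughly four thousand posts every second, around the clock.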

Variety
o Big Data can be structured, unstructured, or semi-structured, and it is
collected from many different sources. In the past, data was collected only
from databases and spreadsheets, but today it arrives in an array of forms,
such as PDFs, emails, audio, social media posts, photos, and videos.
The data is categorized as below:
Structured data: Structured data follows a defined schema with all the
required columns and is in tabular form. It is stored in relations, i.e.,
tables, in a relational database management system. OLTP (Online Transaction
Processing) systems are built to work with structured data.
Semi-structured: In semi-structured data, the schema is not strictly defined,
e.g., JSON, XML, CSV, TSV, and email.

Unstructured Data: Files with no predefined structure, such as log files,
audio files, and image files, are unstructured data. Some organizations have
a great deal of data available but do not know how to derive value from it
because the data is raw.

Quasi-structured Data: Textual data with inconsistent formats that can be
formatted only with effort, time, and suitable tools, e.g., web server
clickstream logs.

Veracity
Veracity refers to how reliable and trustworthy the data is. Because data
arrives from many sources with varying quality, it often has to be filtered
or translated before it can be used. Handling veracity means being able to
manage such noisy, inconsistent data efficiently.
For example, Facebook posts with hashtags can be noisy and hard to interpret
reliably.
Value
Value is an essential characteristic of big data. What matters is not merely
the data that we process or store, but the valuable and reliable data that we
store, process, and analyze.
Velocity
Velocity plays an important role alongside the other characteristics. It is
the speed at which data is created, often in real time. It covers the speed
of incoming data sets, their rate of change, and bursts of activity. A primary
requirement of Big Data systems is to deliver in-demand data rapidly.
Big data velocity deals with the speed at which data flows from sources
such as application logs, business processes, networks, social media sites,
sensors, and mobile devices.
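One simple way to measure velocity is to count how many events arrived within a recent time window. The sketch below is only an illustration: the class name `RateMeter`, the 10-second window, and the event timestamps are all hypothetical, and production systems would normally rely on a stream-processing framework instead.

```python
from collections import deque


class RateMeter:
    """Tracks the event rate over the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event and return the current rate in events per second."""
        self.timestamps.append(timestamp)
        # Evict events that have fallen out of the sliding window.
        while self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window_seconds


meter = RateMeter(window_seconds=10)

# A quiet period followed by a burst of activity (timestamps in seconds).
event_times = [0, 1, 2, 3, 20, 20.5, 21, 21.2, 21.4, 21.6]
rates = [meter.record(t) for t in event_times]

print(rates)
```

The rate drops when the old events age out of the window and then climbs quickly during the burst, which mirrors the "rate of change" and "activity bursts" mentioned above.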

5) Discuss the advantages and disadvantages of Business Intelligence.


A) Advantages of Business Intelligence
1. Data-Driven Decision Making:
o BI provides access to valuable data and insights, enabling
companies to make informed decisions based on facts rather than
intuition.
2. Improved Efficiency:
o BI tools automate data analysis and reporting, saving time and
allowing employees to focus on more important tasks, which
boosts productivity and overall performance.
3. Enhanced Visualization:
o BI tools create easy-to-read charts, graphs, and dashboards that
help businesses quickly understand their performance and identify
trends.
4. Data Mining:
o BI systems analyze large datasets to uncover hidden patterns and
insights, helping businesses make proactive decisions and stay
competitive.
5. Real-Time Analytics:
o With BI, companies can access up-to-date information instantly,
allowing for quick responses to market changes and timely
decision-making.
Disadvantages of Business Intelligence
1. High Costs:
o Implementing BI systems can be expensive, including the costs of
software and training, which might be a burden for smaller
businesses.
2. Complexity:
o BI tools can be complicated to set up and use, especially for those
who are not tech-savvy. Proper training and support are essential.
3. Data Overload:
o Access to vast amounts of data can lead to confusion. Companies
need to focus on quality and relevance to avoid being overwhelmed
by unnecessary information.
4. Dependency on IT:
o BI systems often require technical IT support for implementation
and maintenance, which can cause delays and create bottlenecks in
accessing data.
5. Security and Privacy Concerns:
o Storing sensitive data in central databases raises security risks,
making it important for companies to implement strong measures
to protect against breaches.
6) Discuss the Evolution of Big Data.
A) 1. Early Days of Data Management (1950s - 1970s)
o Data Collection Begins: This period involved the early use of computers
to collect and manage data. Businesses started using basic databases and
mainframes to store information.
2. The Rise of the Internet and Data Explosion (1990s - 2000s)
o Internet Growth: The internet became popular, leading to a rapid
increase in data from websites, emails, and online transactions. More data
was generated than ever before.
3. Emergence of Big Data Technologies (2000s - 2010s)
o Big Data Defined: This period saw the introduction of technologies that
could handle large amounts of diverse data (like Hadoop), making it
easier to store and process Big Data efficiently.
4. Advancements in Data Analytics and Machine Learning (2010s - 2020s)
o Sophisticated Analysis: Companies began using advanced techniques
like machine learning and predictive analytics to gain deeper insights
from their data, allowing for better decision-making.
5. Current Trends and Future Directions (2020s and Beyond)
o Future Focus: Businesses are now concentrating on real-time data
analysis, artificial intelligence, and how to manage data effectively to stay
ahead in the market.

7) Differentiate between structured, unstructured and semi-structured data.


A)
| Structured Data | Unstructured Data | Semi-structured Data |
|---|---|---|
| 1) Organized in a predefined format, often in rows and columns. | 1) Lacks a specific format or structure, making it more complex. | 1) Contains elements of both structured and unstructured data; has some organization but not a strict schema. |
| 2) Databases (e.g., SQL), spreadsheets, CSV files. | 2) Text files, emails, social media posts, images, videos. | 2) JSON, XML, NoSQL databases. |
| 3) Stored in tabular forms (e.g., relational databases). | 3) Requires more flexible storage solutions (e.g., file systems). | 3) Can be stored in both database systems and file formats, depending on the structure. |
| 4) Easy to analyze using traditional tools (like SQL). | 4) More challenging to analyze; requires advanced tools and techniques (like NLP). | 4) Easier to analyze than unstructured data, but may need specialized tools for complete analysis. |
| 5) Low complexity; straightforward data management. | 5) High complexity due to vast variability and lack of format. | 5) Moderate complexity; some organization helps but may still require parsing. |
| 6) Easily searchable using standard query languages. | 6) Harder to search; often needs indexing or advanced searching tools. | 6) Can be searched more easily than unstructured data, especially if properly tagged. |
| 7) Financial data, transaction records, customer databases are some of the uses. | 7) Social media analysis, customer feedback, multimedia content are some of the uses. | 7) Log files, web data feeds, and data from APIs are some of the uses. |
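Point 5 notes that semi-structured data "may still require parsing". A minimal sketch of that step is flattening nested JSON into fixed columns so that it can be handled like structured data. The API response, the column names, and the helper `flatten` are hypothetical examples, not part of any real service.

```python
import json

# Hypothetical API response: nested objects, and fields that vary per record.
api_response = """
[
  {"id": 1, "user": {"name": "Asha", "city": "Pune"}, "tags": ["vip", "early"]},
  {"id": 2, "user": {"name": "Ravi"}}
]
"""

COLUMNS = ("id", "name", "city", "tag_count")


def flatten(record):
    """Map one nested record onto the fixed COLUMNS layout."""
    user = record.get("user", {})
    return (
        record["id"],
        user.get("name"),
        user.get("city"),             # None when the field is missing
        len(record.get("tags", [])),  # 0 when there are no tags
    )


table = [flatten(rec) for rec in json.loads(api_response)]

print(COLUMNS)
for row in table:
    print(row)
```

Once flattened, every row has the same four columns, so the data could be loaded into a relational table and queried with SQL.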
