Introduction To Big Data

The document provides an overview of Big Data, including its definition, characteristics (volume, variety, velocity, value, and veracity), and the distinction between Business Intelligence (BI) and Data Analytics (DA). It highlights the importance of data processing and analysis for better decision-making in organizations, as well as the tools and applications used in Big Data analytics. Additionally, it discusses the advantages and disadvantages of both BI and DA, emphasizing their roles in improving business operations.

Uploaded by

Vivek Tiwari
Copyright
© All Rights Reserved

Large Scale Data Processing

Today’s Agenda

• Big Data Overview
• Characteristics of Big Data
• Business Intelligence vs Data Analytics
Introduction to Big Data
Definition: Big data is a concept used to describe a large
volume of data, both structured and unstructured, that any
system or business accumulates day by day. However, it is
not the quantity of data that is essential; what matters is
what a firm or organization can do with the data. Big data
can be analyzed for insights and predictions, which can lead
to better decisions and a more reliable business strategy.
Introduction to Big Data
The concept gained momentum in the early 2000s, when industry analyst Doug Laney
articulated the now-mainstream definition of big data in terms of the three V's:
Volume: Organizations gather data from many different sources, including business
transactions, social media, login data, sensors, and machine-to-machine communication.
Storing this data would once have been a problem, but tools for handling extensive
data, such as Apache Spark and Hadoop, have eased the burden.
Velocity: Data now streams in at unprecedented speed and has to be handled in a
timely manner. Sensors, smart metering, user data, and RFID tags are driving the need
to deal with floods of data in near real time.
Variety: Data from different systems comes in diverse types and formats, ranging from
structured, numeric data in traditional databases to unstructured or non-numeric data
such as text documents, emails, audio and video, stock ticker data, login data,
blockchains' encrypted data, and financial transactions.
Who’s Generating Big Data

• Mobile devices (tracking all objects all the time)
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)
Characteristics of Big Data
The primary characteristics of Big Data are –
1. Volume
Volume refers to the huge amounts of data
that are collected and generated every second
in large organizations. This data comes
from different sources such as IoT devices,
social media, videos, financial transactions,
and customer logs.
Storing and processing this much data used to
be a problem. Now distributed systems such as
Hadoop are used to organize the data collected
from all these sources. The size of the data is
crucial for understanding its value, and volume
also helps determine whether a collection of
data counts as Big Data or not.
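
The way a distributed system such as Hadoop organizes large volumes of data can be illustrated with a small map/reduce sketch. The version below runs on a single machine with only the Python standard library; the function names (map_chunk, reduce_counts) and the sample records are hypothetical, and a real cluster would run the same two phases across many nodes.

```python
from collections import Counter
from functools import reduce

def map_chunk(lines):
    """Map phase: count records per source within one chunk."""
    return Counter(line.split(",")[0] for line in lines)

def reduce_counts(left, right):
    """Reduce phase: merge two partial counts into one."""
    return left + right

# Hypothetical records, already split into chunks as a cluster would do.
chunks = [
    ["iot,22.5", "social,like", "iot,23.1"],
    ["video,play", "social,share", "iot,22.9"],
]

# Each chunk is mapped independently, so this step could run in parallel.
partial_counts = [map_chunk(chunk) for chunk in chunks]

# The partial results are then merged into a single total per source.
totals = reduce(reduce_counts, partial_counts, Counter())
print(dict(totals))
```

Because each chunk is counted on its own, adding more data only means adding more chunks (and, in a real deployment, more machines) rather than a bigger single computer.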
Characteristics of Big Data
2. Variety
Another of the most important Big Data
characteristics is variety. It refers to the
different sources of data and their nature. The
sources of data have changed over the years.
Earlier, data was available only in spreadsheets and
databases. Nowadays, data is also present in photos,
audio files, videos, text files, and PDFs.
The variety of data is crucial for its storage and
analysis.
Data can be classified into four
distinct types:
• Structured data
• Semi-structured data
• Unstructured data
• Quasi-structured data
Types of Big Data
Structured data: Structured data has a defined
schema with all the required columns and is in
tabular form. It is stored in a relational
database management system. OLTP (Online
Transaction Processing) systems are built to
work with structured data, which is stored in
relations, i.e., tables.
Semi-structured data: In semi-structured data, the
schema is not strictly defined, e.g., JSON, XML,
CSV, TSV, and email.
Unstructured data: Free-form files such as
log files, audio files, and image files are
unstructured data. Some organizations have
a great deal of such data available but do not know
how to derive value from it, since the data is raw.
Quasi-structured data: Textual data with
inconsistent formats that can be put into shape
only with effort, time, and some tools.
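
As a rough illustration of how the four types differ in practice, the sketch below handles one small sample of each with the Python standard library. All sample values, the orders table, and the regular expression are assumptions made for this example, not part of the original slides.

```python
import json
import re
import sqlite3

# Structured: a fixed schema in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Asha", 250.0), (2, "Ravi", 400.0)],
)
total_amount = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Semi-structured: self-describing keys, but fields may differ per record.
docs = [json.loads(s) for s in ('{"id": 1, "tags": ["new"]}', '{"id": 2}')]
tags_per_doc = [doc.get("tags", []) for doc in docs]

# Quasi-structured: inconsistent text that needs effort to normalise.
clicks = ["2024-01-05 GET /home", "05/01/2024  get /cart"]
pattern = re.compile(r"(get|post)\s+(\S+)", re.IGNORECASE)
paths = [pattern.search(line).group(2) for line in clicks]

# Unstructured: free text with no schema; only crude measures come cheaply.
comment = "Loved the product, will buy again!"
word_count = len(comment.split())

print(total_amount, tags_per_doc, paths, word_count)
```

The amount of code needed before any analysis can start grows from the structured case (a single query) to the quasi-structured and unstructured cases, which is why variety matters for storage and analysis.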
Structured:
• Machine-generated data: sensors, network
servers, weblogs, GPS, etc.
• Human-generated data: data entered by the
user in their system, such as personal details,
passwords, and documents

Unstructured:
• Your comments, tweets, shares, posts, and likes on
social media, the videos you watch on YouTube,
and the text messages you send via WhatsApp

Semi-structured:
• Data models such as NoSQL documents, which contain
keywords that are used to process the document. CSV files
are also considered semi-structured data.
Characteristics of Big Data
3. Velocity
Velocity refers to the speed at which data is
created or generated. The speed of data
production is also tied to how fast the data has
to be processed, because only after
analysis and processing can the data meet the
demands of clients and users.
Massive amounts of data are produced by
sensors, social media sites, and application logs,
and all of it is continuous. If the data flow is
not continuous, investing time or effort in it
is hard to justify.
As an example, people generate more
than 3.5 billion searches on Google per day.
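
To make the daily figure easier to picture, it can be converted into a per-second rate. The short calculation below assumes only the 3.5 billion searches per day quoted above and an evenly spread load, which real traffic would not follow exactly.

```python
SEARCHES_PER_DAY = 3.5e9          # figure quoted in the slide
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds

# Average rate, assuming searches are spread evenly over the day.
searches_per_second = SEARCHES_PER_DAY / SECONDS_PER_DAY
print(f"{searches_per_second:,.0f} searches per second")
```

That works out to roughly 40,500 searches every second on average, which is the kind of arrival rate a processing system has to keep up with.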
Characteristics of Big Data
4. Value
Among the characteristics of Big Data, value is
perhaps the most important. No matter how fast
the data is produced or how much of it there is,
it has to be reliable and useful. Otherwise, the
data is not good enough for processing or analysis.
Some research suggests that poor-quality data can
cost a company almost 20% of its revenue.
Data scientists first convert raw data into
information. The resulting data set is then cleaned
to retain the most useful data, and analysis and
pattern identification are performed on it. If
the process succeeds, the data can be
considered valuable.
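
The three steps described above (convert raw data into information, clean it, then analyze it) can be sketched as a tiny pipeline. The readings, the function names, and the plausible range of 0 to 50 are all hypothetical choices made for this illustration.

```python
from statistics import mean

# Hypothetical raw readings as they might arrive from a source system.
raw = ["21.5", "n/a", "22.0", "", "21.5", "95.0", "22.5"]

def to_information(records):
    """Step 1: convert raw strings to numbers, skipping unparseable values."""
    parsed = []
    for record in records:
        try:
            parsed.append(float(record))
        except ValueError:
            continue
    return parsed

def clean(values, low=0.0, high=50.0):
    """Step 2: keep only values inside an assumed plausible range."""
    return [v for v in values if low <= v <= high]

# Step 3: a very simple analysis on the cleaned data set.
information = to_information(raw)
useful = clean(information)
average = mean(useful)
print(len(raw), len(information), len(useful), average)
```

Of the seven raw records, only four survive to the analysis step, which shows why the amount of data collected says little about its value until it has been processed.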
Characteristics of Big Data

5. Veracity
This feature of Big Data is connected to the
previous one. It describes the degree of
trustworthiness of the data. As most of the data
you encounter is unstructured, it is important to
filter out the unnecessary information and use
the rest for processing.
Veracity is the characteristic of big data
analytics that reflects data inconsistency as well
as data uncertainty.
As an example, a huge amount of data can
create confusion, while too little data
yields inadequate information.
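
One simple way to put a number on trustworthiness is to measure the share of records that pass basic consistency checks. The sketch below is only an illustration: the records, the canonical country code, and the age range are assumptions, and real veracity checks depend on the data set.

```python
# Hypothetical customer records with typical quality problems.
records = [
    {"id": 1, "country": "IN", "age": 34},
    {"id": 2, "country": "in", "age": None},     # inconsistent code, missing age
    {"id": 3, "country": "India", "age": 29},    # inconsistent code
    {"id": 4, "country": "IN", "age": -5},       # impossible value
]

VALID_COUNTRIES = {"IN"}  # assumed canonical code for this example

def is_trustworthy(record):
    """A record passes if its country code is canonical and its age is plausible."""
    age = record["age"]
    return (
        record["country"] in VALID_COUNTRIES
        and age is not None
        and 0 <= age <= 120
    )

trusted = [r for r in records if is_trustworthy(r)]
veracity_ratio = len(trusted) / len(records)
print(f"{veracity_ratio:.0%} of records passed the checks")
```

Records that fail the checks are the "unnecessary information" to filter out, or to repair, before processing.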
Other Characteristics of Big Data
Beyond these five traits, big data analytics has a few more characteristics,
discussed below:

1. Volatility
Volatility means rapid change, and big data is in continuous change. For example,
data collected from a particular source may change within a span of a few days.
This characteristic of Big Data hampers data homogenization. It is also known as
the variability of data.
2. Visualization
Visualization is one more characteristic of big data analytics. It is the method of
representing the generated big data in the form of graphs and charts. Big data
professionals have to share their insights with non-technical audiences on a daily
basis.
Business Intelligence vs Data Analytics
Business Intelligence:
The term Business Intelligence (BI) refers to technologies, applications, and
practices for the collection, integration, analysis, and presentation of
business information. The purpose of Business Intelligence is to support better
business decision-making. Essentially, Business Intelligence systems are data-driven
Decision Support Systems (DSS). The term is sometimes used
interchangeably with briefing books, report and query tools, and
executive information systems. Business Intelligence systems provide
historical, current, and predictive views of business operations, most
often using data that has been gathered into a
data warehouse or a data mart, and occasionally working
from operational data.
Advantages of Business Intelligence:
• BI is focused on providing insights based on historical data, allowing
businesses to understand trends and patterns in their operations.
• BI provides a comprehensive view of the organization’s operations,
allowing managers to understand performance across multiple
departments and functions.
• BI can help identify opportunities for cost reduction and process
improvement, leading to increased efficiency and profitability.
Business Intelligence vs Data Analytics
Disadvantages of Business Intelligence:
• BI is focused on historical data, which may not provide an accurate
picture of current or future conditions.
• BI can be resource-intensive, requiring significant investment in data
collection and processing, as well as specialized software and hardware.
• BI may not provide the level of detail or granularity needed to address
specific business challenges.
Data analytics:
Data analytics (DA) is the process of examining data sets to
draw conclusions about the information they contain, increasingly with the aid of
specialized systems and software. Data
analytics techniques are widely used in IT companies to help
organizations make better-informed business decisions, and by
scientists and researchers to test or disprove scientific models, theories, and
hypotheses.
Advantages of Data Analytics:
• DA is focused on providing insights based on both historical and real-
time data, allowing businesses to understand trends and patterns in
their operations in real-time.
• DA provides a more granular view of the organization’s operations,
allowing managers to identify trends and insights that may not be
Business Intelligence vs Data Analytics
Disadvantages of Data Analytics:
• DA can be more challenging to implement than traditional BI methods,
requiring advanced data processing and analytics technologies.
• DA requires significant expertise in data science, making it more
difficult for organizations to build and maintain a capable team.
• DA can be resource-intensive, requiring significant investment in data
collection and processing, as well as specialized software and hardware.

Similarities between Business Intelligence and Data Analytics:


• Both BI and DA involve the use of data analysis to provide insights that can help
organizations make better decisions.
• Both approaches use advanced statistical and mathematical models to analyze data.
• Both approaches require significant expertise in statistical analysis and data science.
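
The difference in emphasis can be sketched on the same small data set: a BI-style summary describes what has already happened, while a DA-style model tries to project what comes next. The monthly figures are hypothetical, and the straight-line fit is just one simple modelling choice assumed for this example.

```python
# Hypothetical monthly sales history.
monthly_sales = [100.0, 110.0, 120.0, 130.0]
n = len(monthly_sales)

# BI-style view: summarize historical performance.
total = sum(monthly_sales)
average = total / n

# DA-style view: fit a least-squares line y = intercept + slope * x
# to the history and project the next month.
xs = list(range(n))
x_mean = sum(xs) / n
slope = (
    sum((x - x_mean) * (y - average) for x, y in zip(xs, monthly_sales))
    / sum((x - x_mean) ** 2 for x in xs)
)
intercept = average - slope * x_mean
forecast_next_month = intercept + slope * n

print(f"total={total}, average={average}, forecast={forecast_next_month}")
```

Both views use the same data and the same kind of arithmetic; they differ in whether the result is a report on the past or a model-based estimate.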
Business Intelligence vs Data Analytics
Definition
• Business Intelligence: refers to the information required to enhance business decision-making activities.
• Data Analytics: refers to transforming raw data into a meaningful format.

Purpose
• Business Intelligence: the prime purpose is to support decision-making and help organizations grow their business.
• Data Analytics: the prime purpose is to model, cleanse, predict, and transform the data as per business needs.

Implementation
• Business Intelligence: can be implemented using the various BI tools available in the market. BI is implemented only on historical data stored in data warehouses or data marts.
• Data Analytics: can be implemented using the various data storage tools available in the market. It can also be implemented using BI tools, depending on the approach or strategy designed by the organization.

Debugging
• Business Intelligence: a BI component can be debugged only through the historical data provided and the end-user requirements.
• Data Analytics: can be debugged through the proposed model used to convert the data into a meaningful format.

History
• Business Intelligence: the term came into existence in 1865.
• Data Analytics: has been around since the 19th century, but gained prominence in the 1960s.

When it is used
• Business Intelligence: implemented where an organization has no changes to its current business model and its prime purpose is to meet organizational goals.
• Data Analytics: implemented where an organization is relatively new and needs significant changes to its business model.

Tools
• Business Intelligence: Klipfolio, InsightSquared Sales Analytics, ThoughtSpot, TIBCO Spotfire, Alteryx Platform, Domo, Cyfe, Sisense, Looker, Microsoft Power BI.
• Data Analytics: Tableau Public, SAS, Apache Spark, Excel, RapidMiner, KNIME, QlikView.

Key skills
• Business Intelligence: data collection and management, data warehouse concepts, understanding of diverse data sources and transaction applications, domain and business knowledge.
• Data Analytics: a high level of mathematical ability; programming languages and database tools such as SQL, Oracle, and Python; the ability to analyze, model, and interpret data; problem-solving skills.
Big Data Analytics

• Tableau Public
• OpenRefine
• KNIME
• RapidMiner
• Google Fusion Tables
• NodeXL
• Wolfram Alpha
Applications
1. Discovering consumer shopping habits
2. Personalized marketing
3. Fuel optimization tools for the transportation industry
4. Monitoring health conditions through data from wearables
5. Live road mapping for autonomous vehicles
6. Streamlined media streaming
7. Predictive inventory ordering
8. Personalized health plans for cancer patients
9. Real-time data monitoring and cybersecurity protocols
Any questions?
Thank you