Introduction To Big Data
Today’s Agenda
Mobile devices (tracking all objects all the time)
Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Characteristics of Big Data
The primary characteristics of Big Data are –
1. Volume
Volume refers to the huge amounts of data that are collected and generated
every second in large organizations. This data comes from sources such as
IoT devices, social media, videos, financial transactions, and customer logs.
Storing and processing this much data used to be a problem, but distributed
systems such as Hadoop are now used to organize the data collected from all
these sources. The size of the data is crucial for understanding its value,
and volume also helps determine whether a collection of data counts as
Big Data.
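
The idea of spreading a large data set across many workers can be sketched on a single machine. The example below is only an illustration of the split, count, and merge pattern and does not use Hadoop or its API. The sample records and the helper names `split_into_chunks`, `map_chunk`, and `reduce_counts` are assumptions made for this sketch.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample records in "source,payload" form (made up for illustration).
RECORDS = [
    "iot,temp=21",
    "social,post=hello",
    "iot,temp=22",
    "finance,txn=100",
    "social,post=bye",
    "iot,temp=23",
]


def split_into_chunks(records, chunk_size):
    """Split the data set into fixed-size chunks, one per (imaginary) worker."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count records per source within a single chunk."""
    return Counter(line.split(",", 1)[0] for line in chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


chunks = split_into_chunks(RECORDS, chunk_size=2)
partial_counts = [map_chunk(chunk) for chunk in chunks]
totals = reduce(reduce_counts, partial_counts, Counter())
print(dict(totals))
```

Each chunk is counted independently, so in a real distributed system the map step could run on separate machines and only the small partial counts would need to be combined.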
2. Variety
Variety is another important Big Data characteristic. It refers to the
different sources of data and their nature. The sources of data have changed
over the years. Earlier, data was available only in spreadsheets and
databases. Nowadays, it is also present in photos, audio files, videos,
text files, and PDFs.
The variety of data is crucial for its storage and
analysis.
Data can be classified into four distinct types:
Structured data
Semi-structured data
Unstructured data
Quasi-structured data
Types of Big Data
Structured data: Structured data has a defined schema with all the
required columns. It is in tabular form and is stored in a relational
database management system, i.e., in relations (tables). OLTP (Online
Transaction Processing) systems are built to work with structured data.
Semi-structured data: In semi-structured data, the schema is not strictly
defined, e.g., JSON, XML, CSV, TSV, and email.
Unstructured data: Files with no predefined schema, such as log files,
audio files, and image files, are unstructured data. Some organizations
have a lot of data available but do not know how to derive value from it,
because the data is raw.
Quasi-structured data: Textual data with inconsistent formats (for example,
web clickstream data) that can be brought into a usable format only with
effort, time, and tools.
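
The four types can be illustrated with a small sketch that uses only the Python standard library. All sample values, the table name `customers`, and the log-line layout are assumptions made for this illustration and do not come from a real system.

```python
import json
import re
import sqlite3

# Structured: a fixed schema in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],
)
rows = conn.execute("SELECT id, name, city FROM customers ORDER BY id").fetchall()
conn.close()

# Semi-structured: self-describing keys, but fields may differ per record.
semi_structured_text = (
    '[{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi", "tags": ["vip"]}]'
)
documents = json.loads(semi_structured_text)

# Quasi-structured: text with a loose pattern that takes effort to parse.
log_line = '10.0.0.7 - - [12/Mar/2024:10:15:32] "GET /cart?item=42 HTTP/1.1" 200'
pattern = re.compile(
    r'^(?P<ip>\S+) .* "(?P<method>\w+) (?P<path>\S+) [^"]*" (?P<status>\d{3})$'
)
match = pattern.match(log_line)
log_fields = match.groupdict() if match else {}

# Unstructured: no schema at all; only generic properties such as size are known.
unstructured_bytes = "Great phone, battery could be better!".encode("utf-8")

print(rows)
print(sorted(documents[1].keys()))
print(log_fields)
print(len(unstructured_bytes))
```

The amount of parsing effort grows from the table query, to the JSON load, to the hand-written pattern, while the raw bytes offer nothing to query until some further processing extracts meaning from them.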
[Figure: examples of structured, unstructured, and semi-structured data]
5. Veracity
This feature of Big Data is connected to the
previous one. It defines the degree of
trustworthiness of the data. As most of the data
you encounter is unstructured, it is important to
filter out the unnecessary information and use
the rest for processing.
Veracity is the characteristic of big data analytics that denotes data
inconsistency as well as data uncertainty.
For example, a huge amount of data can create confusion, while too little
data provides inadequate information.
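
Filtering out untrustworthy records before processing can be sketched as follows. This is a minimal example under stated assumptions: the sensor readings, the field names `sensor` and `temp_c`, and the plausible temperature range are all made up for illustration.

```python
# Hypothetical sensor readings; field names and values are assumptions.
READINGS = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s2", "temp_c": None},   # missing value
    {"sensor": "s3", "temp_c": 999.0},  # implausible value
    {"sensor": "s1", "temp_c": 21.5},   # exact duplicate
    {"sensor": "s4", "temp_c": 19.0},
]

VALID_RANGE = (-50.0, 60.0)  # assumed plausible range for this example


def is_trustworthy(reading):
    """Reject readings that are missing or outside the assumed valid range."""
    value = reading.get("temp_c")
    if value is None:
        return False
    low, high = VALID_RANGE
    return low <= value <= high


def clean(readings):
    """Keep the first copy of each trustworthy reading, in original order."""
    seen = set()
    kept = []
    for reading in readings:
        key = (reading["sensor"], reading["temp_c"])
        if key in seen or not is_trustworthy(reading):
            continue
        seen.add(key)
        kept.append(reading)
    return kept


cleaned = clean(READINGS)
print(cleaned)
print(f"kept {len(cleaned)} of {len(READINGS)} readings")
```

The share of records that survive such checks gives a rough indication of how far a data source can be trusted.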
Other Characteristics of Big Data
Beyond these five traits, big data analytics has a few more characteristics, discussed below:
1. Volatility
Volatility means rapid change, and big data is in continuous change. For example, data collected
from a particular source may change within a span of a few days. This characteristic of Big Data
hampers data homogenization. It is also known as the variability of data.
2. Visualization
Visualization is one more characteristic of big data analytics. It is the method of representing
the big data that has been generated in the form of graphs and charts. Big data professionals
have to share their insights with non-technical audiences on a daily basis.
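
Turning numbers into a chart can be sketched without any plotting library. The example below builds a simple text bar chart; the monthly order counts and the helper name `text_bar_chart` are assumptions made for illustration, and a real report would normally use a dedicated charting tool.

```python
# Hypothetical monthly order counts (made up for illustration).
ORDERS = {"Jan": 120, "Feb": 90, "Mar": 150, "Apr": 60}


def text_bar_chart(values, width=30):
    """Return one line per label, with a bar scaled to the largest value."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<4}{bar} {value}")
    return lines


chart = text_bar_chart(ORDERS)
print("\n".join(chart))
```

Even this crude chart makes the peak in March and the drop in April visible at a glance, which is the point of visualizing data for a non-technical audience.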
Business Intelligence vs Data Analytics
Business Intelligence:
The term Business Intelligence (BI) refers to the technologies, applications,
and practices for the collection, integration, analysis, and presentation of
business information. The purpose of Business Intelligence is to support
better business decision making. Essentially, Business Intelligence systems
are data-driven Decision Support Systems (DSS). The term is sometimes used
interchangeably with briefing books, report and query tools, and executive
information systems. Business Intelligence systems provide historical,
current, and predictive views of business operations, most often using data
that has been gathered into a data warehouse or a data mart, and
occasionally working from operational data.
Advantages of Business Intelligence:
• BI is focused on providing insights based on historical data, allowing
businesses to understand trends and patterns in their operations.
• BI provides a comprehensive view of the organization’s operations,
allowing managers to understand performance across multiple
departments and functions.
• BI can help identify opportunities for cost reduction and process
improvement, leading to increased efficiency and profitability.
Disadvantages of Business Intelligence:
• BI is focused on historical data, which may not provide an accurate
picture of current or future conditions.
• BI can be resource-intensive, requiring significant investment in data
collection and processing, as well as specialized software and hardware.
• BI may not provide the level of detail or granularity needed to address
specific business challenges.
Data analytics:
Data analytics (DA) is the process of examining data sets to draw
conclusions about the information they contain, increasingly with the help
of specialized systems and software. Data analytics techniques are widely
used in IT companies to help organizations make better-informed business
decisions, and by scientists and researchers to test or disprove scientific
models, theories, and hypotheses.
Advantages of Data Analytics:
• DA is focused on providing insights based on both historical and real-
time data, allowing businesses to understand trends and patterns in
their operations in real-time.
• DA provides a more granular view of the organization’s operations,
allowing managers to identify trends and insights that may not be
Disadvantages of Data Analytics:
• DA can be more challenging to implement than traditional BI methods,
requiring advanced data processing and analytics technologies.
• DA requires significant expertise in data science, making it more
difficult for organizations to build and maintain a capable team.
• DA can be resource-intensive, requiring significant investment in data
collection and processing, as well as specialized software and hardware.
| Business Intelligence | Data Analytics |
|---|---|
| Refers to the information required to improve business decision-making activities. | Refers to converting raw data into a meaningful format. |
| Its primary purpose is to support decision-making and help organizations grow their business. | Its primary purpose is to model, cleanse, predict, and transform data according to business needs. |
| A BI component can be debugged only through the historical data provided and the end-user requirements. | Data analytics can be debugged through the proposed model used to convert the data into a meaningful format. |
| The term Business Intelligence came into existence in 1865. | Data analytics has been around since the 19th century, but it gained prominence in the 1960s. |
| Key skills: data collection and management, data warehouse concepts, understanding of different data sources and business applications, and domain and business knowledge. | Key skills: a high level of mathematical ability, languages and tools such as SQL, Oracle, and Python, the ability to analyze, model, and interpret data, and problem-solving skills. |
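
The contrast between the two approaches can be sketched on one small data set. In this illustration the BI-style step summarizes what already happened, while the DA-style step fits a simple model and predicts the next value. The revenue figures and the helper name `fit_line` are assumptions, and a straight-line fit is only one of many possible models.

```python
from statistics import mean

# Hypothetical monthly revenue, oldest first (made up for illustration).
REVENUE = [100.0, 110.0, 120.0, 130.0, 140.0]

# BI-style view: describe what already happened.
bi_report = {
    "total": sum(REVENUE),
    "average": mean(REVENUE),
    "best_month_index": REVENUE.index(max(REVENUE)),
}


def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_mean, y_mean = mean(xs), mean(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_mean - slope * x_mean


# DA-style view: model the data and predict the next value.
slope, intercept = fit_line(REVENUE)
next_month_forecast = slope * len(REVENUE) + intercept

print(bi_report)
print(round(next_month_forecast, 2))
```

Both views start from the same historical numbers; the difference lies in whether the output is a description of the past or a model that can be applied to new cases.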
Big Data Analytics
• Tableau Public
• OpenRefine
• KNIME
• RapidMiner
• Google Fusion Tables
• NodeXL
• Wolfram Alpha
Applications
1. Discovering consumer shopping habits
2. Personalized marketing
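
The first application above, discovering consumer shopping habits, can be sketched by counting which items are bought together. The baskets below are made up for illustration, and pair counting is only a simple stand-in for the techniques used in practice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (made up for illustration).
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

pair_counts = Counter()
for basket in BASKETS:
    # Sort so ("bread", "butter") and ("butter", "bread") count as one pair.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

A frequently co-purchased pair like this could also feed the second application, for example by suggesting one item to customers who buy the other.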