Unit 1
Big Data
DATA
• Internet data (e.g., social media, social networking
links)
• Primary research (e.g., surveys, experiments,
observations)
• Secondary research (e.g., competitive and marketplace
data, industry reports, consumer data, business data)
• Location data (e.g., mobile device data)
• Image data (e.g., video, satellite image, surveillance)
• Supply chain data (e.g., vendor catalogs and pricing,
quality
TYPE OF DATA
Structured data, Semi-structured data & Unstructured data
Example Of Data
CASE STUDY EXAMPLE: RESTAURANT
Market Research:
1. Local population's age, income level
2. Competition
3. Location, food trend
Business Plan:
4. Menu
5. Pricing
6. Funding
Operations:
7. Staffing
8. Quality Control
9. Suppliers
10. Customer Service
Marketing:
11. Branding (logo & design)
12. Online Presence
13. Advertising
14. Reviews and Feedback
Customer Experience:
15. Ambiance (Create a welcoming and
16. Menu Variety
17. Service Speed
18. Feedback
Need Of Analytics
Online Retail Store
1. Inventory management (stock management): deciding which new clothes to buy for sale
2. Customers' buying habits
3. Website optimization
4. Sales forecasting: using historical sales data and trends,
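Sales forecasting from historical data can be sketched with a simple moving average. This is only an illustration under stated assumptions: the function name `moving_average_forecast`, the window size and the monthly figures are invented for the example, and real retailers typically use richer models.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` historical values")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical monthly unit sales for one product (made-up numbers).
monthly_sales = [120, 135, 150, 160, 170, 180]

# Forecast for the next month: mean of the last three months.
forecast = moving_average_forecast(monthly_sales, window=3)
print(forecast)
```

A larger window smooths out short-term noise but reacts more slowly to a change in trend.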
Factors behind Big Data (figure): evolution of technology, IoT, social media, other factors
Big data Generation
Computing perfect storm
• Big data analytics is the natural result of four
global trends:
– Moore’s law (transistor counts double roughly every
two years, so computing keeps getting cheaper)
– Mobile computing (smartphones or tablets in
our hands)
– Social networking (Facebook, Twitter, Instagram,
Pinterest)
– Cloud computing (we don’t even have to own
hardware or software anymore; we can rent or lease
someone else’s)
Data perfect storm
• Volumes of transactional data have been around
for decades at most big firms, but the floodgates
have now opened with more volume, velocity,
and variety (the 3 Vs) of data.
• This perfect storm of the three Vs makes data
extremely complex and cumbersome to handle with
current data management and analytics
technology and practices.
Convergence perfect storm
• Traditional data management and analytics software
and hardware technologies, open-source technology, and
commodity hardware are merging to create new
alternatives for IT and business executives to address Big
Data analytics.
A Single View to the Customer
(Figure: "Our Customer" at the center, surrounded by Social Media, Banking, Finance, Gaming, Entertain, Purchase and Known History.)
1. Enterprise resource planning
• Data Volume
– 44x increase from 2009 to 2020
– From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
Exponential increase in
collected/generated data
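The growth figures above can be checked with a little arithmetic. The sketch below takes the two volumes and the two years from the slide and derives the overall growth factor and the implied compound annual growth rate. Steady year-on-year growth is an assumption made only for this calculation.

```python
start_zb = 0.8            # zettabytes in 2009 (from the slide)
end_zb = 35.0             # zettabytes in 2020 (from the slide)
years = 2020 - 2009       # 11 years

# Overall growth factor: close to the "44x" quoted on the slide.
growth_factor = end_zb / start_zb

# Compound annual growth rate, assuming the same growth every year.
annual_rate = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.2f}x")
print(f"implied annual growth: {annual_rate:.1%}")
```

Under that assumption the data volume grows by roughly 41% per year, which is what "exponential" means here: a constant percentage increase, not a constant absolute one.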
Variety (Complexity)
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …
• Streaming Data
– You can only scan the data once
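Because streaming data can be scanned only once, statistics have to be updated incrementally as each value arrives. A minimal sketch using Welford's one-pass algorithm for mean and variance follows. The class name `RunningStats`, the generator `sensor_stream` and the readings are illustrative assumptions.

```python
class RunningStats:
    """One-pass mean and variance (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


def sensor_stream():
    """Hypothetical stream; a generator can be consumed only once."""
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield reading


stats = RunningStats()
for value in sensor_stream():
    stats.update(value)  # each value is seen exactly once, then discarded

print(stats.count, stats.mean, stats.variance)
```

The stream is never stored: memory use stays constant no matter how many values pass through.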
WEB ANALYTICS
• Web analytics is the measurement, collection, analysis and reporting of
web data for purposes of understanding and optimizing web usage.
• Web analytics is not just a process for measuring web traffic but can be
used as a tool for business and market research, and to assess and
improve the effectiveness of a website.
• Web analytics applications can also help companies measure the results of
traditional print or broadcast advertising campaigns.
• It helps one to estimate how traffic to a website changes after the launch
of a new advertising campaign.
• Web analytics provides information about the number of visitors to a
website and the number of page views. It helps gauge traffic and
popularity trends, which is useful for market research.
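The basic measurements mentioned above (visitors, page views, and traffic before and after a campaign) can be sketched in a few lines. The log format, the visitor ids and the campaign launch day below are all assumptions made up for the illustration. Real web analytics tools collect far richer data.

```python
from collections import Counter

# Hypothetical page-view log: (visitor_id, page, day_number).
page_views = [
    ("v1", "/home", 1), ("v1", "/menu", 1), ("v2", "/home", 1),
    ("v2", "/home", 2), ("v3", "/home", 2), ("v3", "/offers", 2),
    ("v4", "/offers", 2),
]
CAMPAIGN_LAUNCH_DAY = 2  # assumption: the advertising campaign starts on day 2

total_page_views = len(page_views)
unique_visitors = len({visitor for visitor, _, _ in page_views})
views_per_page = Counter(page for _, page, _ in page_views)

# Traffic before and after the campaign launch.
before = sum(1 for _, _, day in page_views if day < CAMPAIGN_LAUNCH_DAY)
after = sum(1 for _, _, day in page_views if day >= CAMPAIGN_LAUNCH_DAY)

print(total_page_views, unique_visitors, views_per_page.most_common(1))
print("before:", before, "after:", after)
```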
Most web analytics processes come down to four essential stages or
steps, which are:
Analysis
• Done on structured data
• Descriptive model
• Works on a sample of data
• So error-prone; does not show the real picture
• Faces the challenges of data collection
Analytics
• Done on unstructured data
• Predictive model
• Works on real data
• So less error; reveals the real picture
• Does not face this challenge
FLOW OF DATA ANALYTICS
Real Time Analytics Processing (RTAP)
• Big data ecosystem tools are well suited to analysing time series data.
Time series data is very common in IoT (Internet-of-Things) and
M2M (Machine-to-Machine) applications and in many monitoring devices.
• Sensors are widely used in devices such as household appliances,
smoke detectors and weather stations.
• These sensors collect data that is generated at extremely short intervals of
time.
• Hence it is important that data is stored according to the exact time at which
it is generated.
• Real Time Analytics Processing (RTAP) and stream computing paradigms
are being widely used to process time bound data that is generated in real
time.
• Big data solutions like Apache Storm, S4 and Samza can be used to
perform RTAP, and various queuing mechanisms can be applied to collect
the data from different devices.
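The idea of processing time-stamped readings as they arrive can be illustrated with a sliding time window. This is a plain Python sketch on one machine, not the API of Storm, S4 or Samza. The class name `SlidingWindowAverage` and the sample readings are assumptions, and readings are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the readings received within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp, value):
        """Record one reading and return the current windowed average."""
        self._events.append((timestamp, value))
        self._total += value
        # Keep only readings with timestamp > (now - window); evict the rest.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total / len(self._events)


# Hypothetical temperature readings: (seconds since start, degrees).
readings = [(0, 20.0), (4, 22.0), (8, 24.0), (12, 30.0)]

window = SlidingWindowAverage(window_seconds=10)
averages = [window.add(ts, temp) for ts, temp in readings]
print(averages)
```

The deque plays the role of a small queue: new readings enter at one end and expired readings leave at the other, so each reading is handled a constant number of times.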
The Model Has Changed…
• The Model of Generating/Consuming Data has Changed
Old Model: A few companies generate data; all others consume it
New Model: All of us generate data, and all of us consume data
WHAT’S DRIVING BIG DATA
Main Components Of Big Data
1. Machine Learning
• It is the science of making computers learn by themselves.
• In machine learning, a computer is expected to use algorithms
and statistical models to perform specific tasks without any
explicit instructions.
• Machine learning applications provide results based on past
experience.
• For example, these days, there are some mobile applications
that give you a summary of your finances and bills,
remind you of your bill payments, and may also
suggest saving plans.
• These functions work by reading your emails and text
messages.
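"Results based on past experience" can be made concrete with one of the simplest learning methods, a least-squares line fit. The sketch below learns a trend from past bill amounts and predicts the next one. The function name `fit_line` and the bill figures are invented for illustration, and finance apps are not claimed to work this way.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical past experience: month number -> bill amount.
months = [1, 2, 3, 4, 5]
bills = [52.0, 54.0, 56.0, 58.0, 60.0]

# "Learning": estimate the parameters from past data.
intercept, slope = fit_line(months, bills)

# "Prediction": apply the learned parameters to an unseen month.
predicted_next = intercept + slope * 6
print(intercept, slope, predicted_next)
```

No rule such as "bills rise by 2 each month" is written into the program. The rule is recovered from the data.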
2. Natural Language Processing NLP
• It is the ability of a computer to understand human language as
spoken.
• The most obvious examples that people can relate to these days
are Google Home and Amazon Alexa.
• Both use NLP and other technologies to give us a virtual
assistant experience.
• NLP is all around us without us even realizing it. When we write
an email, spelling mistakes are corrected automatically and
suggestions are offered for completing sentences.
• The mail client also alerts us when we try to send an
email without the attachment that we referenced in the text of
the email. All of this is done by Natural Language Processing
applications running at the backend.
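The missing-attachment reminder can be approximated with a simple keyword rule. This is a deliberately crude sketch: real mail clients rely on much richer language processing, and the pattern `ATTACHMENT_HINT` and the function `missing_attachment` are hypothetical names chosen for the example.

```python
import re

# Assumption: a small hand-written keyword pattern stands in for real NLP.
ATTACHMENT_HINT = re.compile(
    r"\b(attach(ed|ment|ments|ing)?|enclosed)\b", re.IGNORECASE
)


def missing_attachment(body, attachments):
    """Return True when the text mentions an attachment but none is present."""
    mentions_attachment = bool(ATTACHMENT_HINT.search(body))
    return mentions_attachment and not attachments


draft = "Hi team, please find the quarterly report attached."
print(missing_attachment(draft, attachments=[]))
print(missing_attachment(draft, attachments=["report.pdf"]))
print(missing_attachment("See you tomorrow.", attachments=[]))
```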
3. Business Intelligence
• Business Intelligence (BI) is a technology-driven process for
analyzing data and presenting it in a way that lets end users
(usually managers, high-level executives and other corporate
leaders) gain actionable insights and make informed business
decisions.
4. Cloud Computing
• If we go by the name, it should be computing done on clouds.
• Cloud here is a reference to the Internet.
• So we can define cloud computing as the delivery of
computing services (servers, storage, databases, networking,
software, analytics, intelligence, and more) over the Internet
(“the cloud”) to offer faster innovation, flexible resources,
and economies of scale.
Big Data Importance
• Big data analytics is important because it allows data
scientists and statisticians to dig deeper into vast amounts of
data to find new and meaningful insights.
• This is also important for industries from retail to
government in finding ways to improve customer service and
streamline operations.
• The importance of big data analytics has increased along
with the variety of unstructured data that can be mined for
information: social media content, texts, clickstream data,
and the multitude of sensors from the Internet of Things.
• Big data analytics is necessary because traditional data
warehouses and relational databases can’t handle the flood
of unstructured data that defines today’s world. They are
best suited for structured data. They also can’t process the
demands of real-time data.
• Big data analytics fills the growing demand for understanding
unstructured data in real time. This is particularly important for
companies that rely on fast-moving financial markets and on the
volume of website or mobile activity.
• Enterprises see the importance of big data analytics in
helping the bottom line when it comes to finding new
revenue opportunities and improved efficiencies that
provide a competitive edge.
BIG DATA CHALLENGES
• Handling unstructured data:
a) There is no labeled data
b) It is difficult to clean
c) Deriving a model and picking useful data is
difficult
d) Mining unstructured data
e) Data is highly knowledge-intensive
f) Uncertainty
g) Misleading terminology
BIG DATA Applications
• In Health care
• In Marketing
• In Medicine
• In Advertising
BIG DATA IN HEALTHCARE
What Is Big Data In Healthcare? The application of big
data analytics in healthcare has many positive and even
life-saving outcomes. Big data refers to the vast
quantities of information created by the digitization of
everything, which are consolidated and analyzed by
specific technologies. Applied to healthcare, it uses
specific health data of a population (or of a particular
individual) and can potentially help to prevent epidemics,
cure disease, cut down costs, etc. In healthcare data
analytics, prevention is better than cure, and drawing
a comprehensive picture of a patient lets
insurers provide a tailored package.
BIG DATA IN MARKETING
Features of Big Data
• Easy Result Formats
• Raw data Processing
– Data Mining
– Data Modeling
– File Exporting
– Data File Sources
• Prediction apps or Identity Management
• Reporting Feature
• Security Features
• Fraud management
• Technologies Support
• Version Control
• Scalability
• Quick Integrations
Easy Result Formats
• The tools must present results in a form
that gives insight for data analysis and
decision-making.
• The platform should also be able to provide
real-time streams that help in making
quick decisions.
Raw data Processing
• Data processing means collecting and organizing data in
a meaningful manner.
• Data modeling takes complex data sets and displays them in
visual form, as a diagram or chart.
• Here, data should be interpretable and digestible so that it
can be used in making decisions. The features listed below are
essential for data processing tools:
– Data Mining
– Data Modeling
– File Exporting
– Data File Sources
Reporting Feature
• Reporting features help businesses stay on top of their
data.
• Data should be fetched at regular intervals and presented
in a well-organized manner.
• This way decision-makers can take timely decisions and
handle critical situations as well, especially in a society
that is moving rapidly.
• Data tools use dashboards to present KPIs (key performance
indicators) and metrics.
• The reports must be customizable and oriented to the target
data set.
• The expected capabilities of reporting tools are real-time
reporting, dashboard management, and location-based
insights.
Security Features
• For any successful business, it is essential to safeguard its data.
• The tools that are used for big data analytics should offer
safety and security for the data.
• Data encryption is an imperative feature that should be
provided by Big Data analytics tools.
• Encryption means changing data from a readable form into an
unreadable one by using algorithms and keys.
• Sometimes automatic encryption is also offered by web
browsers.
• Comprehensive encryption capabilities are also offered by
data analytics tools. Single sign-on and data
encryption are two of the most widely used security features.
Fraud Management
• Fraud analytics involves a variety of fraud detection
functions.
• In practice, businesses often focus on how
they will deal with fraud after it occurs rather than on
preventing it.
• Fraud detection can be performed by data analytics tools.
• The tools should be able to perform repeated tests on the
data at any time to ensure that it contains no incorrect
data.
• In this way, threats can be identified quickly and efficiently,
with effective fraud analytics and identity management
capabilities.
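One simple kind of repeated test on the data is an outlier check that flags amounts far from the typical value. The sketch below uses a z-score rule. The threshold, the function name `flag_outliers` and the transaction amounts are assumptions for illustration, and production fraud systems combine many more signals.

```python
from statistics import mean, pstdev


def flag_outliers(amounts, threshold=2.0):
    """Return indices of amounts more than `threshold` standard deviations from the mean."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all amounts identical: nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]


# Hypothetical card transactions; the last one is unusually large.
transactions = [25.0, 30.0, 27.5, 22.0, 31.0, 26.0, 29.0, 950.0]

suspicious = flag_outliers(transactions)
print(suspicious)
```

The check can be rerun whenever new transactions arrive, which is the sense in which the test is repeated at any time.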
Technologies Support
• A/B testing: two versions of a webpage are compared on the basis of how users
interact with them, and the better one is chosen.
• As far as technology support is concerned, your tool must be able to
integrate with Hadoop, a set of open-source programs that can work as the
backbone of data-analytics activities.
• Hadoop mainly involves the following four modules with which integration is
expected:
– MapReduce: A programming model that processes large data sets in parallel by
splitting the work into map and reduce steps across the cluster.
– Hadoop Common: The shared Java libraries and utilities that the other
modules rely on.
– YARN: It manages cluster resources and schedules jobs so that
analysis can be performed.
– Hadoop Distributed File System (HDFS): It stores data across the machines
of a cluster.
• If the results of the tools are integrated with these Hadoop modules, the user
can easily send the results back to the user system. In this way flexibility,
interoperability and two-way communication can be ensured between
organizations.
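MapReduce is usually described as three phases: map, shuffle and reduce. The classic word-count example below imitates those phases in plain Python on a single machine. It is a teaching sketch, not Hadoop's API, and the function names and sample documents are assumptions.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input split into three small "documents".
documents = ["big data tools", "Big Data analytics", "data"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

On a real cluster the map calls run on different machines, each near its own share of the data, and the shuffle moves pairs with the same key to the same reducer.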
Version Control
• Most data analytics tools involve
adjusting the parameters of data analytics models,
which may cause problems when the models are pushed into
production.
• The version control feature of big data analytics tools
improves the ability to track
changes and makes it possible to restore previous
versions whenever needed.
Scalability
• Data will not stay the same; it will
grow as your organization grows.
• With big data tools, it is easy to
scale up as soon as new data is collected for
the company, so that it can still be analyzed as
expected.
• Also, the meaningful insights derived from new data
can be integrated with the earlier data
successfully.
Quick Integration
• With integration capabilities, it is easy
to share data results with developers and data
scientists.
• Big data tools support quick
integration with cloud apps, data warehouses,
other databases, etc.