
Unit 1

Big Data
DATA
• Internet data (e.g., social media, social networking links)
• Primary research (e.g., surveys, experiments, observations)
• Secondary research (e.g., competitive and marketplace data, industry
reports, consumer data, business data)
• Location data (e.g., mobile device data)
• Image data (e.g., video, satellite images, surveillance footage)
• Supply chain data (e.g., vendor catalogs, pricing, quality data)
TYPES OF DATA
Structured data, semi-structured data & unstructured data
Example of Data
CASE STUDY: Opening a Restaurant

Market Research:
1. Local population's age and income level
2. Competition
3. Location and food trends
Business Plan:
4. Menu
5. Pricing
6. Funding
Operations:
7. Staffing
8. Quality control
9. Suppliers
10. Customer service
Marketing:
11. Branding (logo & design)
12. Online presence
13. Advertising
14. Reviews and feedback
Customer Experience:
15. Ambiance (a welcoming atmosphere)
16. Menu variety
17. Service speed
18. Feedback
Need of Analytics
Clothing store:
1. Inventory (stock) management: deciding which new stock to buy for sale
2. Understanding customers' buying habits
Online retail store:
3. Website optimization
4. Sales forecasting: using historical sales data and trends
5. Customer feedback analysis

NEED OF VISUALIZATION
Data visualization helps tell stories by presenting data in a form that
is easier to understand, highlighting trends and outliers.
A good visualization tells a story, removing the noise from the data and
highlighting the useful information.
WHAT’S BIG DATA?
• Big data is the term for a collection of data sets so large and
complex that it becomes difficult to process using on-hand
database management tools or traditional data processing
applications.
• The challenges include capture, curation, storage, search,
sharing, transfer, analysis, and visualization.
• The trend to larger data sets is due to the additional
information derivable from analysis of a single large set of
related data, as compared to separate smaller sets with the
same total amount of data, allowing correlations to be found
to "spot business trends, determine quality of research,
prevent diseases, link legal citations, combat crime, and
determine real-time roadway traffic conditions.”

Evolution of Technology
The evolution of technology, the IoT, social media, and other factors
together generate Big Data.
Big Data Generation
Computing perfect storm
• Big data analytics is the natural result of four global trends:
– Moore's law (computing keeps getting cheaper and more powerful)
– Mobile computing (the smartphone or tablet in our hands)
– Social networking (Facebook, Twitter, Instagram, Pinterest)
– Cloud computing (we no longer have to own hardware or software; we
can rent or lease someone else's)
Data perfect storm
• Volumes of transactional data have been around for decades at most
big firms, but the floodgates have now opened with more volume,
velocity, and variety – the 3 Vs – of data.
• This perfect storm of the three Vs makes data extremely complex and
cumbersome to handle with current data management and analytics
technology and practices.
Convergence perfect storm
• Traditional data management and analytics software
and hardware technologies, open-source technology, and
commodity hardware are merging to create new
alternatives for IT and business executives to address Big
Data analytics.
A Single View of the Customer
Data from many sources – social media, banking, finance, gaming,
entertainment, purchase history, and known customer history – is
combined into a single customer profile, drawing on:
1. Enterprise Resource Planning (ERP)
2. Customer Relationship Management (CRM)

BIG DATA TRENDS
• Data is becoming more valuable.
• Data analytics is moving from batch to real time.
STRUCTURED DATA
Data that resides within the fixed confines of a
record or file is known as structured data. Owing
to the fact that structured data – even in large
volumes – can be entered, stored, queried, and
analyzed in a simple and straightforward manner,
this type of data is best served by a traditional
database.
UNSTRUCTURED DATA
Data that comes from a variety of sources, such as
emails, text documents, videos, photos, audio files, and
social media posts, is referred to as unstructured data.
Being both complex and voluminous, unstructured data
cannot be handled or efficiently queried by a traditional
database. Hadoop’s ability to join, aggregate, and
analyze vast stores of multi-source data without having
to structure it first allows organizations to gain deeper
insights quickly.
Thus Hadoop is a perfect fit for companies looking to
store, manage, and analyze large volumes of unstructured
data.
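The contrast above can be sketched in a few lines of Python (the customer record, table, and email text are all invented for illustration):

```python
# Structured vs. unstructured data: a hypothetical illustration.
import sqlite3

# Structured: fixed fields, easily stored and queried in a traditional DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'Pune')")
row = conn.execute("SELECT name FROM customers WHERE city = 'Pune'").fetchone()

# Unstructured: free-form text has no fixed schema; even a simple
# question ("which city?") needs parsing or text analysis first.
email = "Hi team, I spoke to Asha yesterday; she has moved to Pune."
mentions_pune = "Pune" in email  # crude keyword scan, not a schema query

print(row[0], mentions_pune)
```

The structured record answers the query directly; the email needs a text-processing step first, which is the gap Hadoop-style tooling fills at scale.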
INDUSTRY EXAMPLES OF BIG DATA
The 5 V's of Big Data:
• Volume
• Variety
• Velocity
• Value
• Veracity
Volume (Scale)
• Data volume
– Roughly a 44x increase from 2009 to 2020
– From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially.
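The "44x" figure quoted above follows directly from the two volumes:

```python
# Quick check of the growth factor: from 0.8 zettabytes (2009)
# to 35 zettabytes (2020), as quoted in the slide.
volume_2009_zb = 0.8
volume_2020_zb = 35.0
growth_factor = volume_2020_zb / volume_2009_zb
print(round(growth_factor))  # roughly 44x
```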
Variety (Complexity)
• Relational data (tables/transactions/legacy data)
• Text data (web)
• Semi-structured data (XML)
• Graph data
– Social networks, Semantic Web (RDF), …
• Streaming data
– You can only scan the data once
• A single application can be generating/collecting many types of data
• Big public data (online, weather, finance, etc.)
To extract knowledge, all these types of data need to be linked together.
Velocity (Speed)
• Data is being generated fast and needs to be processed fast
• Online data analytics
• Late decisions mean missed opportunities
• Examples
– E-promotions: based on your current location, your purchase history,
and what you like, send promotions right now for the store next to you
– Healthcare monitoring: sensors monitoring your activities and body,
where any abnormal measurement requires an immediate reaction
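A toy sketch of the healthcare-monitoring example above, reacting to each reading as it arrives rather than in a later batch (the readings and threshold are invented):

```python
# Velocity in miniature: decide on each reading as it streams in,
# instead of collecting everything for a nightly batch job.

def monitor(readings, threshold=120):
    """Collect an alert the moment an abnormal reading arrives."""
    alerts = []
    for t, value in readings:
        if value > threshold:  # decide now, not later
            alerts.append((t, value))
    return alerts

stream = [(1, 72), (2, 75), (3, 140), (4, 76)]  # (time, heart rate)
print(monitor(stream))  # [(3, 140)]
```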
WEB ANALYTICS
• Web analytics is the measurement, collection, analysis and reporting of
web data for purposes of understanding and optimizing web usage.
• Web analytics is not just a process for measuring web traffic but can be
used as a tool for business and market research, and to assess and
improve the effectiveness of a website.
• Web analytics applications can also help companies measure the results of
traditional print or broadcast advertising campaigns.
• It helps one to estimate how traffic to a website changes after the launch
of a new advertising campaign.
• Web analytics provides information about the number of visitors to a
website and the number of page views. It helps gauge traffic and
popularity trends which is useful for market research.
Most web analytics processes come down to four essential stages or steps:

Step 1: Collection of data. This stage is the collection of the basic,
elementary data.

Step 2: Processing of data into information. The objective of this stage
is to take the data and transform it into information, specifically
metrics.

Step 3: Developing KPIs (Key Performance Indicators). This stage focuses
on taking the ratios and counts and fusing them with business
strategies; the results are referred to as Key Performance Indicators
(KPIs).

Step 4: Formulating an online strategy. This stage is concerned with the
online goals, objectives, and standards of the organization or business.
These strategies are usually related to making money, saving money, or
increasing market share.
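Stages 2 and 3 above can be sketched as follows: raw collected hits are processed into counts, and a ratio of those counts becomes a KPI (the field names and sample hits are invented):

```python
# Web analytics stages in miniature: collection -> metrics -> KPI.

raw_hits = [  # stage 1: collected, elementary data
    {"visitor": "a", "page": "/product"},
    {"visitor": "a", "page": "/checkout"},
    {"visitor": "b", "page": "/product"},
    {"visitor": "c", "page": "/product"},
]

# Stage 2: processing into metrics (counts)
visitors = {h["visitor"] for h in raw_hits}
buyers = {h["visitor"] for h in raw_hits if h["page"] == "/checkout"}

# Stage 3: a ratio fused with a business goal becomes a KPI
conversion_rate = len(buyers) / len(visitors)
print(conversion_rate)  # about 0.33 for this sample
```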
ANALYTICS
DATA ANALYTICS VS DATA ANALYSIS
• Data analytics is the broader term and includes data analysis as a
necessary subcomponent.
• Analytics defines the science behind the analysis: understanding the
cognitive processes an analyst uses to understand problems and explore
data in meaningful ways.
ANALYSIS VS ANALYTICS

Analysis:
• Done on structured data
• Descriptive model
• Works on a sample of the data
• So it is error prone and does not show the real picture
• Faces the challenges of data collection

Analytics:
• Works on unstructured data
• Predictive model
• Works on the real data
• So there is less error and it reveals the real picture
• Does not face the data collection challenge
FLOW OF DATA ANALYTICS
Real Time Analytics Processing (RTAP)
• Big data ecosystem tools are well suited to analysing time-series
data. Time-series data is very common in IoT (Internet of Things) and
M2M (Machine-to-Machine) applications and in many monitoring devices.
• Sensors are widely used in devices like household appliances, smoke
detectors, and weather stations.
• These sensors generate data at extremely short intervals.
• Hence it is important that data is stored according to the exact time
at which it is generated.
• Real Time Analytics Processing (RTAP) and stream computing paradigms
are widely used to process time-bound data generated in real time.
• Big data solutions like Apache Storm, S4, and Samza can be used to
perform RTAP, and various queuing mechanisms can be applied to collect
the data from different devices.
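A minimal single-process sketch of the idea above: readings stored with the exact timestamp they were generated, then processed by time window (a real deployment would use Storm or Samza plus a queue; sensor names and values are invented):

```python
# Time-series readings keyed by generation time, queried by window.

readings = []  # (epoch_seconds, sensor_id, value)

def record(ts, sensor, value):
    """Store a reading with the exact time it was generated."""
    readings.append((ts, sensor, value))

def window(start, end):
    """Readings generated inside [start, end) — time-bound processing."""
    return [r for r in readings if start <= r[0] < end]

record(100, "smoke-1", 0.01)
record(101, "smoke-1", 0.90)   # spike
record(130, "smoke-1", 0.02)
print(window(100, 120))        # only the first two readings
```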
The Model Has Changed…
• The model of generating/consuming data has changed

Old model: a few companies generate data; everyone else consumes it

New model: all of us generate data, and all of us consume it
WHAT’S DRIVING BIG DATA

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time processing

rather than the traditional:

- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
Main Components of Big Data
1. Machine Learning
It is the science of making computers learn things by themselves.
• In machine learning, a computer is expected to use algorithms and
statistical models to perform specific tasks without explicit
instructions.
• Machine learning applications provide results based on past
experience.
• For example, these days there are mobile applications that give you a
summary of your finances and bills, remind you of your bill payments,
and may also suggest saving plans.
• These functions are performed by reading your emails and text
messages.
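A hedged sketch of "results based on past experience": a tiny statistical model fit to past bills and used to flag an unusual new one (the amounts and threshold are invented):

```python
# Learning from past experience, in miniature: fit mean/stddev to past
# bills, then flag new bills that fall far outside that experience.

from statistics import mean, stdev

past_bills = [410, 395, 420, 405, 400]   # past experience (training data)
mu, sigma = mean(past_bills), stdev(past_bills)

def looks_unusual(bill, k=2):
    """Flag bills more than k standard deviations above the learned mean."""
    return bill > mu + k * sigma

print(looks_unusual(640), looks_unusual(415))  # True False
```

No rule like "flag bills over 600" was ever written; the threshold is derived from the data, which is the essence of the machine learning idea above.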
2. Natural Language Processing (NLP)
• It is the ability of a computer to understand human language as
spoken.
• The most familiar examples these days are Google Home and Amazon
Alexa, both of which use NLP and other technologies to give us a virtual
assistant experience.
• NLP is all around us without our even realizing it. When we write a
mail and make a mistake, it is automatically corrected; auto-suggest
helps complete the mail; and we are automatically alerted when we try
to send an email without the attachment we referenced in the text.
These are all Natural Language Processing applications running in the
backend.
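The autocorrect behaviour described above can be approximated with simple fuzzy matching against a word list (a real NLP system is far more sophisticated; this vocabulary is invented):

```python
# Toy autocorrect: pick the closest known word above a similarity cutoff.

from difflib import get_close_matches

vocabulary = ["attachment", "meeting", "schedule", "regards"]

def autocorrect(word):
    """Return the closest vocabulary word, or the word itself if none match."""
    matches = get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else word

print(autocorrect("atachment"))  # attachment
```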
3. Business Intelligence
Business Intelligence (BI) is a technology-driven method or process for
gaining insights by analyzing data and presenting it in a way that
end-users (usually high-level executives such as managers and corporate
leaders) can draw actionable insights from it and make informed
business decisions.
4. Cloud Computing
• Going by the name, it should be computing done on clouds; "cloud"
here refers to the Internet.
• So we can define cloud computing as the delivery of computing
services – servers, storage, databases, networking, software,
analytics, intelligence, and more – over the Internet ("the cloud") to
offer faster innovation, flexible resources, and economies of scale.
Big Data Importance
• Big data analytics are important because they allow data
scientists and statisticians to dig deeper into vast amounts of
data to find new and meaningful insights.
• This is also important for industries from retail to
government in finding ways to improve customer service and
streamlining operations.
• The importance of big data analytics has increased along
with the variety of unstructured data that can be mined for
information: social media content, texts, clickstream data,
and the multitude of sensors from the Internet of Things.
• Big data analytics is necessary because traditional data
warehouses and relational databases can’t handle the flood
of unstructured data that defines today’s world. They are
best suited for structured data. They also can’t process the
demands of real-time data.
• Big data analytics fills the growing demand for understanding
unstructured data in real time. This is particularly important for
companies that rely on fast-moving financial markets and on the volume
of website or mobile activity.
• Enterprises see the importance of big data analytics in
helping the bottom line when it comes to finding new
revenue opportunities and improved efficiencies that
provide a competitive edge.
BIG DATA CHALLENGES
• Handling unstructured data:
a) There is no labeled data
b) It is difficult to clean
c) Deriving a model and picking useful data is difficult
d) Mining unstructured data is hard
e) The data is highly knowledge-intensive
f) Uncertainty
g) Misleading terminology
BIG DATA Applications
• In Healthcare
• In Marketing
• In Medicine
• In Advertising
BIG DATA IN HEALTHCARE
What is big data in healthcare? The application of big data analytics
in healthcare has many positive and even life-saving outcomes. Big data
refers to the vast quantities of information created by the
digitization of everything, which gets consolidated and analyzed by
specific technologies. Applied to healthcare, it uses the health data
of a population (or of a particular individual) to potentially help
prevent epidemics, cure disease, cut down costs, and so on. With
healthcare data analytics, prevention is better than cure, and drawing
a comprehensive picture of a patient lets insurers provide tailored
packages.
BIG DATA IN MARKETING
Features of Big Data
• Easy Result Formats
• Raw data Processing
– Data Mining
– Data Modeling
– File Exporting
– Data File Sources
• Prediction apps or Identity Management
• Reporting Feature
• Security Features
• Fraud management
• Technologies Support
• Version Control
• Scalability
• Quick Integrations
Easy Result Formats
• The tools must be able to produce results in a way that provides
insights for data analysis and decision-making. The platform should
offer real-time streams that help in making quick, instant decisions.
Raw data Processing
• The data processing means collecting and organizing data in
a meaningful manner.
• Data modeling takes complex data sets and displays them in
the visual form or diagram or chart.
• Here, data should be interpretable and digestible so that it
can be used in making decisions. Below-listed features are
essential for the data processing tools:
– Data Mining
– Data Modeling
– File Exporting
– Data File Sources
Reporting Feature
• Businesses stay on top with the help of reporting features.
• Data should be fetched regularly and represented in a well-organized
manner.
• This way, decision-makers can make timely decisions and handle
critical situations, especially in a society that moves rapidly.
• Data tools use dashboards to present KPIs (key performance
indicators) and metrics.
• Reports must be customizable and oriented to the target data set.
• The expected capabilities of reporting tools are real-time reporting,
dashboard management, and location-based insights.
Security Features
• For any successful business, it is essential to safeguard its data.
• The tools used for big data analytics should offer safety and
security for the data.
• Data encryption is an imperative feature that should be provided by
big data analytics tools.
• Encryption means changing data from a readable form to an unreadable
one using algorithms and codes.
• Sometimes automatic encryption is also offered by web browsers.
• Comprehensive encryption capabilities are also offered by data
analytics tools; single sign-on and data encryption are two of the most
used and popular features.
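A toy illustration of the encryption idea above: the same key makes readable data unreadable and then recovers it. This XOR scheme is for demonstration only; real tools use vetted algorithms such as AES:

```python
# Toy symmetric encryption: XOR each byte with a repeating key.
# NOT secure — purely to show readable -> unreadable -> readable.

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"secret-key"
ciphertext = xor_cipher(b"card=1234", key)   # unreadable form
plaintext = xor_cipher(ciphertext, key)      # the same key reverses it

print(plaintext)  # b'card=1234'
```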
Fraud Management
• A variety of fraud detection functionalities are involved in fraud
analytics.
• With these capabilities, businesses can focus on how they deal with
fraud rather than only on preventing it.
• Fraud detection can be performed by data analytics tools.
• The tools should be able to run repeated tests on the data at any
time to ensure there is no incorrect data.
• In this way, threats can be identified quickly and efficiently, with
effective fraud analytics and identity management capabilities.
Technologies Support
• In A/B testing, two versions are compared on the basis of how users
interact with the webpage, and the better one is chosen.
• Moreover, as far as technical support is concerned, your tool must be
able to integrate with Hadoop, a set of open-source programs that can
work as the backbone of data-analytics activities.
• Hadoop mainly involves the following four modules, with which
integration is expected:
– MapReduce: reads data from the file system and processes it so it
can be presented in a visualized manner.
– Hadoop Common: a collection of Java tools required to read data
stored in the user's file system.
– YARN: manages system resources so that data can be stored and
analysis can be performed.
– Distributed File System (HDFS): allows data to be stored in an
accessible format. If tool results are integrated with these Hadoop
modules, the user can easily send results to the user's system,
ensuring flexibility, interoperability, and two-way communication
between organizations.
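The MapReduce module above follows a pattern that can be shown in miniature: a map step emits (word, 1) pairs and a reduce step sums them per word. Hadoop runs this distributed across a cluster; this is a single-process toy:

```python
# MapReduce word count in miniature.

from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word (key)."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data", "big ideas"]
print(reduce_phase(map_phase(lines)))  # {'big': 2, 'data': 1, 'ideas': 1}
```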
Version Control
• Most data analytics tools involve adjusting data analytics model
parameters, which may cause problems when pushed into production.
• A version control feature in big data analytics tools improves the
ability to track changes and makes it possible to roll back to previous
versions whenever needed.
Scalability
• Data will not stay the same; it grows as your organization grows.
• With big data tools, it is easy to scale up as soon as new data is
collected for the company, and it can be analyzed as expected.
• Also, the meaningful insights derived from the new data are
integrated into the previous data successfully.
Quick Integration
• With integration capabilities, it is easy to share data results with
developers and data scientists.
• Big data tools support quick integration with cloud apps, data
warehouses, and other databases.
