
BDA Unit 1

Big data refers to large and complex data sets characterized by high volume, velocity, and variety, requiring innovative processing methods for valuable insights. It enables organizations to understand customer behavior, optimize operations, and drive innovation, while also presenting challenges such as rapid data growth and security concerns. The evolution of big data has led to its application across various industries, enhancing decision-making and competitive advantage.

UNIT - I

INTRODUCTION
Gartner definition:
Big data is high-volume, high-velocity, and high-variety information assets that demand cost-
effective, innovative forms of information processing for enhanced insight and decision making.
Big data refers to complex and large data sets that have to be processed and analyzed to
uncover valuable information that can benefit businesses and organizations.
Big Data refers to massive amounts of data produced by different sources like social media
platforms, web logs, sensors, IoT devices, and many more. It can be either structured (like tables in
DBMS), semi-structured (like XML files), or unstructured (like audio, video, and images).
It helps companies to generate valuable insights.
Big data does not equate to any specific data volume; big data deployments can involve
terabytes, petabytes, and even exabytes of data captured over time.

High-volume, high-velocity, high-variety information assets —> Cost-effective, innovative
forms of information processing —> Enhanced insight & decision making

The first part "Big data is high-volume, high-velocity, and high-variety information assets"
talks about voluminous data that may have great mixture of structured, semi-structured and
unstructured data and will require a good speed/pace for storage, preparation, processing and
analysis.
The second part "cost effective, innovative forms of information processing" talks about
embracing new techniques and technologies to capture, store, process, persist, integrate and visualize
the high-volume, high-velocity and high-variety data.
The third part "enhanced insight and decision making" talks about deriving deeper, richer
and meaningful insights and then using these insights to make faster and better decisions to gain
business value and thus a competitive edge.
Data—>Information—>Actionable intelligence—>Better decisions—>Enhanced business value

Why Big Data?


Leveraging a Big Data analytics solution helps organizations unlock strategic value and take
full advantage of their data assets.
It helps organizations:
 Understand where, when and why their customers buy
 Protect the company’s client base with improved loyalty programs
 Seize cross-selling and upselling opportunities
 Provide targeted promotional information
 Optimize workforce planning and operations
 Fix inefficiencies in the company’s supply chain
 Predict market trends
 Predict future needs
 Become more innovative and competitive
 Discover new sources of revenue

Companies are using big data to learn what their customers want, who their best
customers are, and why people choose different products. The more a company knows about its
customers, the more competitive it becomes.
Big data can be combined with machine learning to create market strategies based on
predictions about customers. Leveraging big data makes companies customer-centric.
Companies can use historical and real-time data to assess evolving consumers’
preferences. This enables businesses to improve and update their marketing
strategies, which makes companies more responsive to customer needs.

Importance of Big data


Companies in the present market need to collect and analyze data because:
 Cost Savings
Big Data tools like Apache Hadoop, Spark, etc. bring cost-saving benefits to businesses when
they have to store large amounts of data. These tools help organizations in identifying more
effective ways of doing business.
 Time-Saving
Real-time in-memory analytics helps companies to collect data from various sources. Tools
like Hadoop help them to analyze data quickly, thus helping in making fast decisions based
on the learnings.
 Understand the market conditions
Big Data analysis helps businesses to get a better understanding of market situations.
 Social Media Listening
Companies can perform sentiment analysis using big data tools. These enable them to get
feedback about their company, that is, who is saying what about the company.
Companies can use big data tools to improve their online presence.
 Boost Customer Acquisition and Retention
Customers are a vital asset on which any business depends. No single business can achieve
success without building a robust customer base. But even with a solid customer base,
companies cannot ignore the competition in the market.
Big data analytics helps businesses to identify customer related trends and patterns. Customer
behavior analysis leads to a profitable business.
 Solve Advertisers Problem and Offer Marketing Insights
Big data analytics shapes all business operations. It enables companies to fulfill customer
expectations. Big data analytics helps in changing the company’s product line. It ensures
powerful marketing campaigns.
 The driver of Innovations and Product Development
Big data makes companies capable to innovate and redevelop their products.

Real-Time Benefits of Big Data


Big Data analytics has expanded its roots in all the fields. This results in the use of Big
Data in a wide range of industries including Finance and Banking, Healthcare, Education,
Government, Retail, Manufacturing, and many more.
There are many companies like Amazon, Netflix, Spotify, LinkedIn, Swiggy, etc., which
use big data analytics. The banking sector makes the maximum use of Big Data Analytics. The
education sector is also using data analytics to enhance students’ performance as well as to make
teaching easier for instructors.
Big Data analytics helps retailers, from traditional to e-commerce, to understand customer
behaviour and recommend products as per customer interest. This helps them in developing new
and improved products which help the firm enormously.

Challenges of Big Data


 Quick Data Growth
Data growing at such a quick rate is making it a challenge to find insights from it. There is
more and more data generated every second from which the data that is actually relevant and useful
has to be picked up for further analysis.
 Storage
Such large amounts of data are difficult to store and manage by organizations without
appropriate tools and technologies.
 Syncing Across Data Sources
This implies that when organizations import data from different sources the data from one
source might not be up to date as compared to the data from another source.
 Security
The huge amounts of data in organizations can easily become a target for advanced persistent
threats.
 Unreliable Data
One cannot deny that big data is not 100 percent accurate. It might contain redundant or
incomplete data, along with contradictions.
 Miscellaneous Challenges
These are some other challenges that come forward while dealing with big data, like the
integration of data, skill and talent availability, solution expenses and processing a large amount of
data in time and with accuracy so that the data is available for data consumers whenever they need it.
Challenges with Big Data: capture, storage, curation, search, analysis, transfer,
visualization, and privacy violations.
Types of Big Data


Irrespective of the size of the enterprise whether it is big or small, data continues to be a
precious and irreplaceable asset. Data is present in homogeneous sources as well as in heterogeneous
sources. The need of the hour is to understand, manage, process, and take the data for analysis to
draw valuable insights.
Digital data can be structured, unstructured and semi-structured data.
a) Structured
Structured is one of the types of big data that can be processed, stored, and retrieved in a
fixed format. It refers to highly organized information that can be readily and seamlessly stored and
accessed from a database by simple search engine algorithms. When data follows a pre-defined
schema/structure we say it is structured data. This is the data which is in an organized form (e.g., in
rows and columns) and can be easily used by a computer program. Relationships exist between
entities of data, such as classes and their objects. About 10% of an organization's data is in this
format. Data stored in databases is an example of structured data.

b) Unstructured
This is the data which does not conform to a data model or is not in a form which can be used
easily by a computer program. It refers to the data that lacks any specific form or structure
whatsoever. This makes it very difficult and time-consuming to process and analyze unstructured
data. About 80% of an organization's data is in this format; for example, memos, chat rooms,
PowerPoint presentations, images, videos, letters, research papers, white papers, the body of an email, etc.

c) Semi-structured
Semi-structured data is also referred to as self-describing structure. This is the data which does not
conform to a data model but has some structure. It pertains to the data containing both the formats
mentioned above, that is, structured and unstructured data. To be precise, it refers to the data that
although has not been classified under a particular repository (database), yet contains vital
information or tags that segregate individual elements within the data. About 10% of an
organization's data is in this format; for example, HTML, XML, JSON, email data, etc.
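The three types can be contrasted with a small sketch (the records and field names here are invented for illustration):

```python
import json
import csv
import io

# Structured: fixed schema, rows and columns (e.g., a CSV/RDBMS table).
structured = io.StringIO("id,name,city\n1,Asha,Delhi\n2,Ravi,Pune\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing tags/keys, no rigid schema (e.g., JSON).
semi_structured = json.loads('{"id": 3, "name": "Meena", "tags": ["vip", "retail"]}')

# Unstructured: free text with no data model; needs parsing/NLP before use.
unstructured = "Meena wrote: the delivery was late but the product is great."

print(rows[0]["name"])            # field access via the fixed schema
print(semi_structured["tags"])    # field access via self-describing keys
print("great" in unstructured)    # only raw text search is possible here
```

The structured rows can be queried directly by column; the JSON record describes its own fields; the free text yields nothing without further analysis.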

Digital data: structured, semi-structured, and unstructured.

Characteristics of Big Data


The data has three key characteristics:
Composition
The composition of data deals with the structure of data, that is, the sources of data, the
granularity, the types, and the nature of data as to whether it is static or real-time streaming.

Condition
The condition of data deals with the state of data, that is, "Can one use this data as is for
analysis?" or "Does it require cleansing for further enhancement and enrichment?"
Context
The context of data deals with "Where has this data been generated?", "Why was this data
generated?", "How sensitive is this data?", and "What are the events associated with this data?".
Small data (data as it existed prior to the big data revolution) is about certainty. It is about
known data sources; it is about no major changes to the composition or context of data.

Evolution of Big Data


The 1970s and before was the era of mainframes. The data was essentially primitive and
structured. Relational databases evolved in the 1980s and 1990s; it was the era of data-intensive
applications. The World Wide Web (WWW) and the Internet of Things (IoT) have led to an
onslaught of structured, unstructured, and multimedia data.
V’s of Big Data
It is data that is big in volume, velocity, variety, veracity and value.

VOLUME
 Terabyte
 Records
VARIETY  Tables, Files VELOCITY
 Structured  Distributed  Batch
 Unstructured  Real time
 Probabilistic  Processes
 Linked 5 V’s of Big  Stream
Data
VERACITY VALUE
 Authenticity  Statistical
 Reputation  Events
 Availability VARIABILITY  Correlations
 Accountability  Changing data  Hypothetical
 Changing model 
 Linkage

Volume
Volume refers to the huge amount of data. To determine the value of data, the size of the data
plays a very crucial role. If the volume of data is very large, then it is actually considered ‘Big Data’.
Bits—>Bytes—>Kilobytes—>Megabytes—>Gigabytes—>Terabytes—>Petabytes—>
Exabytes—>Zettabytes—>Yottabytes
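Each step in this scale is a factor of 1024 (in the binary convention); a quick sketch of the conversion:

```python
# Each unit is 1024x the previous one (binary convention).
units = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to raw bytes."""
    return value * (1024 ** units.index(unit))

print(to_bytes(1, "TB"))                       # 1_099_511_627_776 bytes in a terabyte
print(to_bytes(2, "PB") / to_bytes(1, "TB"))   # a 2 PB data set is 2048 TB
```

(Decimal units would use a factor of 1000 instead; storage vendors often quote those.)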

Sources of Big Data: archives, data storage, media, sensor data, documents, machine log
data, business apps, the public web, and social media.

Velocity
Velocity refers to the high speed of accumulation of data.
In Big Data velocity data flows in from sources like machines, networks, social media,
mobile phones etc.
Batch—>Periodic—>Near real time—>Real-time processing
There is a massive and continuous flow of data. This determines the potential of data: how
fast the data is generated and processed to meet demands.
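The contrast between batch and real-time (stream) handling can be sketched as follows (the readings and the alert threshold are made up):

```python
readings = [3, 7, 2, 9, 4, 8]  # e.g., sensor values arriving over time

# Batch: collect everything first, then process in one pass.
batch_total = sum(readings)

# Stream: process each record as it arrives, keeping only running state.
running_total = 0
alerts = []
for value in readings:         # in practice this would be an unbounded feed
    running_total += value
    if value > 6:              # react immediately, without waiting for the batch
        alerts.append(value)

print(batch_total)   # 33
print(alerts)        # [7, 9, 8]
```

Both paths reach the same total, but only the streaming path can raise alerts while data is still arriving.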
Variety
It refers to the nature of data: structured, semi-structured and unstructured.
It also refers to heterogeneous sources.
Variety is basically the arrival of data from new sources that are both inside and outside of an
enterprise. It can be structured, semi-structured and unstructured.
o Structured data: From traditional transaction processing systems and RDBMS etc.
o Semi-Structured data: HTML, XML
o Unstructured data: Unstructured text documents, audio, video, email, photos, PDFs, social
media, etc

Veracity
The “truth” or accuracy of data and information assets, which often determines executive-level
confidence.
It refers to inconsistencies and uncertainty in data; quality and accuracy are difficult to
control.
Big Data is also variable because of the multitude of data dimensions resulting from multiple
disparate data types and sources.

Value
The value of big data usually comes from insight discovery and pattern recognition that lead
to more effective operations, stronger customer relationships and other clear and quantifiable
business benefits.

Variability
The changing nature of the data that companies seek to capture, manage and analyze; for
example, in sentiment or text analytics, changes in the meaning of keywords or phrases.

Drivers for Big Data


Volume, variety, velocity and value are the four key drivers of the big data revolution. The
exponential rise in data volumes is putting an increasing strain on the conventional data storage
infrastructures in place in major companies and organizations.
There are three contributing factors: consumers, automation, and monetization.
Sophisticated Consumers
The increase in information level and the associated tools has created a new breed of
sophisticated consumers. These consumers are far more analytic, far savvier at using statistics, and
far more connected, using social media to rapidly collect and collate opinion from others.
Email and text messages rapidly led toward increased interpersonal interactions.
There are many ways to utilize social networks to influence purchase and reuse:
 Studying consumer experience
A fair amount of this data is unstructured. By analyzing the text for sentiments,
intensity, readership, related blogs, referrals, and other information, data can be organized
into positive and negative influences and their impact on the customer base.
 Organizing customer experience
To provide reviews to a prospective buyer, so that they can gauge how others
evaluated the product.
 Influencing social networks
To provide marketing material, product changes, company directions, and celebrity
endorsements to social networks, so that social media may influence and enhance the buzz.
 Feedback to products, operations, or marketing
By using information generated by social media, we can rapidly make changes in the
product mix and marketing to improve the offering to customers.
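The first item, organizing text into positive and negative influences, can be sketched with a toy keyword-based classifier (the word lists and posts are invented; real systems use trained NLP models):

```python
# Toy keyword-based sentiment scoring; real systems use trained NLP models.
POSITIVE = {"great", "love", "excellent", "recommend"}
NEGATIVE = {"poor", "late", "broken", "refund"}

def classify(post):
    """Label a post positive/negative/neutral by counting keyword hits."""
    words = set(post.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

posts = [
    "Great phone, would recommend",
    "Delivery was late and the box arrived broken",
    "It works",
]
print([classify(p) for p in posts])   # ['positive', 'negative', 'neutral']
```

Once posts carry such labels, their impact on the customer base can be aggregated and tracked over time.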

Automation
Interactive Voice Response (IVR), kiosks, mobile devices, email, chat, corporate websites,
third-party applications, and social networks have generated a fair amount of event information about
the customers.
 Product
As products become increasingly electronic, they provide a lot of valuable data to the supplier
regarding product use and product quality. In many cases, suppliers can also collect information
about the context in which a product was used. Products can also supply information related to
frequency of use, interruptions, usage skipping, and other related aspects.
 Electronic touch points
A fair amount of data can be collected from the touch points used for product shopping,
purchase, use, or payment.
 Components
Sometimes, components may provide additional information. This information could include
data about component failures, use, or lack thereof.

Monetization
A data bazaar is the biggest enabler to create an external marketplace, where we collect,
exchange, and sell customer information. We are seeing a new trend in the marketplace, in which
customer experience from one industry is anonymized, packaged, and sold to other industries.
 Location
It is increasingly available to suppliers. Assuming a product is consumed in conjunction with
a mobile device, the location of the consumer becomes an important piece of information that may
be available to the supplier.
 Cookies
Web browsers carry enormous information using web cookies. Some of this may be directly
associated with touch points.
 Usage data
A number of data providers have started to collect, synthesize, categorize, and package
information for reuse. This includes credit-rating agencies that rate consumers, social networks with
blogs published and cable companies with audience information.

Data Environment versus Big Data Environment


Definition
Small data: data in a volume and format that makes it accessible, informative and actionable.
Big data: data sets that are so large or complex that traditional data processing applications
cannot deal with them.

Data source
Small data: data from traditional enterprise systems like Enterprise Resource Planning,
Customer Relationship Management, financial data, payment transaction data from websites.
Big data: purchase data from point-of-sale, clickstream data from websites, GPS stream data,
social media (Facebook, Twitter).

Volume
Small data: most cases in a range of tens or hundreds of GB.
Big data: more than a few terabytes (TB).

Velocity
Small data: controlled and steady data flow; data accumulation is slow.
Big data: data can arrive at very fast speed; enormous data can accumulate within a very short
period of time.

Variety
Small data: structured data in tabular format with fixed schema and semi-structured data in
JSON or XML format.
Big data: high-variety data sets which include tabular data, text files, images, video, audio,
XML, JSON, logs, sensor data, etc.

Veracity
Small data: contains less noise, as data is collected in a controlled manner.
Big data: usually, the quality of data is not guaranteed; rigorous data validation is required
before processing.

Value
Small data: business intelligence, analysis and reporting.
Big data: complex data mining for prediction, recommendation, pattern finding, etc.

Time variance
Small data: historical data equally valid, as data represents solid business interactions.
Big data: in some cases, data gets older soon (e.g., fraud detection).

Data location
Small data: databases within an enterprise, local servers, etc.
Big data: mostly in distributed storage on the cloud or in external file systems.

Infrastructure
Small data: predictable resource allocation; mostly vertically scalable hardware.
Big data: more agile infrastructure with a horizontally scalable architecture.
Business Intelligence versus Big Data
Big Data and Business Intelligence are two technologies used to analyze data to help
companies in the decision-making process, but there are differences between them. They differ
in the way they work as much as in the type of data they analyze.
Traditional BI methodology is based on the principle of grouping all business data into a
central server. Typically, this data is analyzed in offline mode, after storing the information in an
environment called Data Warehouse. The data is structured in a conventional relational database
with an additional set of indexes and forms of access to the tables (multidimensional cubes).
A Big Data solution differs in many aspects from BI. The following are the main differences
between Big Data and Business Intelligence:
1. In a Big Data environment, information is stored on a distributed file system, rather than
on a central server. It is a much safer and more flexible space.
2. Big Data solutions carry the processing functions to the data, rather than the data to the
functions. As the analysis is centered on the information, it's easier to handle larger
amounts of information in a more agile way.
3. Big Data can analyze data in different formats, both structured and unstructured. The
volume of unstructured data is growing at levels much higher than the structured data.
Nevertheless, its analysis carries different challenges. Big Data solutions solve them by
allowing a global analysis of various sources of information.
4. Data processed by Big Data solutions can be historical or come from real-time sources.
Thus, companies can make decisions that affect their business in an agile and efficient way.
5. Big Data technology uses Massively Parallel Processing (MPP) concepts, which improves
the speed of analysis. With MPP many instructions are executed simultaneously, and since
the various jobs are divided into several parallel execution parts, at the end the overall
results are reunited and presented. This allows you to analyze large volumes of
information quickly.
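The MPP idea of dividing a job into parallel parts and reuniting the results can be sketched as follows (a thread pool stands in for the massively parallel servers; the data is invented):

```python
from concurrent.futures import ThreadPoolExecutor

# MPP idea: split the job into parts, run them in parallel, reunite the results.
data = list(range(1, 101))                       # pretend this is a huge table
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

def partial_sum(chunk):
    """Each worker processes only its own slice of the data."""
    return sum(chunk)

with ThreadPoolExecutor(max_workers=4) as pool:  # 4 parallel execution parts
    partials = list(pool.map(partial_sum, chunks))

total = sum(partials)                            # overall results are reunited
print(partials)   # [325, 950, 1575, 2200]
print(total)      # 5050
```

Real MPP systems apply the same split/combine pattern across many machines, each holding its own slice of the data.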

Big Data Analytics


Big Data Analytics is
 Technology-enabled analytics: Quite a few data analytics and visualization tools are
available in the market today from leading vendors such as IBM, Tableau, SAS, R
Analytics, Statistical, World Programming Systems (WPS), etc. to help process and
analyze your big data.
 About gaining a meaningful, deeper, and richer insight into your business to steer it in
the right direction, understanding the customer's demographics to cross-sell and up-sell
to them, better leveraging the services of your vendors and suppliers, etc.
 About gaining a competitive edge over competitors by enabling findings that allow quicker
and better decision-making.
 A tight handshake between three communities: IT, business users, and data scientists.
 Working with datasets whose volume and variety exceed the current storage and
processing capabilities and infrastructure of your enterprise.
 About moving code to data. This makes perfect sense as the program for distributed
processing is tiny (just a few KBs) compared to the data (Terabytes/Petabytes/Exabytes).
In short, Big Data Analytics is about:
 Better, faster decisions in real time
 Moving code to data for greater speed and efficiency
 Richer, deeper insights into customers, partners and the business
 Working with datasets whose volume and variety are beyond the storage and processing
capability of a typical database software
 Competitive advantage
 IT's collaboration with business users and data scientists
 Time-sensitive decisions made in near real time by processing a steady stream of real-time data
 Technology-enabled analytics

Classification of Analytics
There are basically two schools of thought:
 Classify analytics into basic, operationalized, advanced and monetized.
 Classify analytics into analytics 1.0, analytics 2.0, and analytics 3.0.
First School of Thought
Basic analytics: This primarily is slicing and dicing of data to help with basic business insights.
This is about reporting on historical data, basic visualization, etc.
Operationalized analytics: It is operationalized analytics if it gets woven into the enterprise's
business processes.
Advanced analytics: This largely is about forecasting for the future by way of predictive and
prescriptive modelling.
Monetized analytics: This is analytics in use to derive direct business revenue.

Second School of Thought


Analytics 1.0 (era: mid 1990s to 2009)
Descriptive statistics (report on events, occurrences, etc. of the past).
Key questions asked: What happened? Why did it happen?
Data from legacy systems, ERP, CRM, and 3rd-party applications; small and structured data
sources, with data stored in enterprise data warehouses or data marts.
Data was internally sourced.
Technology: relational databases.

Analytics 2.0 (era: 2005 to 2012)
Descriptive statistics + predictive statistics (use data from the past to make predictions for
the future).
Key questions asked: What happened? Why will it happen?
Big data is being taken up seriously. Data is mainly unstructured, arriving at a much higher
pace. This fast flow of data entailed that the influx of big volume data had to be stored and
processed rapidly, often on massive parallel servers running Hadoop.
Data was often externally sourced.
Technology: database appliances, Hadoop clusters, SQL to Hadoop environments, etc.

Analytics 3.0 (era: 2012 to present)
Descriptive + predictive + prescriptive statistics (use data from the past to make prophecies
for the future and at the same time make recommendations to leverage the situation to one's
advantage).
Key questions asked: What will happen? When will it happen? Why will it happen? What
should be the action taken to take advantage of what will happen?
A blend of big data and data from legacy systems, ERP, CRM, and 3rd-party applications; a
blend of big data and traditional analytics to yield insights and offerings with speed and impact.
Data is both internally and externally sourced.
Technology: in-memory analytics, in-database processing, agile analytical methods, machine
learning techniques, etc.

Descriptive Analytics: What happened?
Diagnostic Analytics: Why did it happen?
Predictive Analytics: What will happen?
Prescriptive Analytics: How can we make it happen?
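On a toy monthly-sales series (all numbers invented), the four layers might look like:

```python
sales = [100, 110, 120, 130]          # units sold in the last four months

# Descriptive: what happened?
average = sum(sales) / len(sales)

# Diagnostic: why did it happen? (here: a steady month-on-month increase)
growth = [b - a for a, b in zip(sales, sales[1:])]

# Predictive: what will happen? (naive linear extrapolation)
forecast = sales[-1] + growth[-1]

# Prescriptive: what action should we take? (a made-up stocking rule)
recommended_stock = int(forecast * 1.1)   # keep a 10% buffer

print(average, growth, forecast, recommended_stock)
```

Each layer builds on the previous one: the description feeds the diagnosis, the diagnosis feeds the forecast, and the forecast drives the recommended action.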

Challenges facing Big Data


 Scale: Storage is one major concern: whether to use an RDBMS or NoSQL (Not only SQL)
to handle the need for scaling rapidly and elastically. The need of the hour is a storage
system that can best withstand the onslaught of the large volume, velocity and variety of big data.
 Security: Most of the NoSQL big data platforms have poor security mechanisms (lack of
proper authentication and authorization mechanisms) when it comes to safeguarding big data.
This cannot be ignored, given that big data carries credit card information, personal
information and other sensitive data.
 Schema: Rigid schemas have no place. We want the technology to be able to fit our big data
and not the other way around. The need of the hour is dynamic schema. Static (pre-defined
schemas) are obsolete.
 Continuous availability: The big question here is how to provide 24/7 support because almost
all RDBMS and NoSQL big data platforms have a certain amount of downtime built in.
 Consistency: Should one opt for consistency or eventual consistency?
 Partition tolerant: How to build partition tolerant systems that can take care of both hardware
and software failures?
 Data quality: How to maintain data quality-accuracy, completeness, timeliness?

Importance of Big Data Analytics


 Reactive-Business Intelligence: It allows the businesses to make faster and better decisions
by providing the right information to the right person at the right time in the right format. It is
about analysis of the past or historical data and then displaying the findings of the analysis or
reports in the form of enterprise dashboards, alerts, notifications, etc. It has support for both
pre-specified reports as well as ad hoc querying.
 Reactive-Big Data Analytics: Here the analysis is done on huge datasets but the approach is
still reactive as it is still based on static data.
 Proactive-Analytics: This is to support futuristic decision making by the use of data mining,
predictive modelling, text mining and statistical analysis. This analysis is not done on big data,
as it still uses traditional database management practices and therefore has severe
limitations on storage capacity and processing capability.
 Proactive-Big Data Analytics: This is filtering through terabytes, petabytes, exabytes of
information to filter out the relevant data to analyze. This also includes high performance
analytics to gain rapid insights from big data and the ability to solve complex problems using
more data.

Data Science
Data science is the science of extracting knowledge from data. In other words, it is a science
of drawing out hidden patterns amongst data using statistical and mathematical techniques.
It employs techniques and theories drawn from many fields from the broad areas of
mathematics, statistics, information technology including machine learning, data engineering,
probability models, statistical learning, pattern recognition and learning, etc.
Data Scientist works on massive datasets for weather predictions, oil drillings, earthquake
prediction, financial frauds, terrorist network and activities, global economic impacts, sensor logs,
social media analytics, customer churn, collaborative filtering, regression analysis, etc. Data science
is multi-disciplinary.
Business Acumen Skills
A data scientist should have the following abilities:
 Understanding of domain
 Business strategy
 Problem solving
 Communication
 Presentation
 Keenness

Technology Expertise
The following skills are required as far as technical expertise is concerned:
 Good database knowledge such as RDBMS
 Good NoSQL database knowledge such as MongoDB, Cassandra, HBase
 Programming languages such as Java, Python, C++
 Open-source tools such as Hadoop
 Data warehousing
 Data mining
 Visualization such as Tableau, Flare, Google visualization APIs

Mathematics Expertise
The following are the key skills that a data scientist must have to comprehend, interpret
and analyze data:
 Mathematics
 Statistics
 Artificial Intelligence
 Algorithms
 Machine learning
 Pattern recognition
 Natural Language Processing
To sum it up, the data science process is
 Collecting raw data from multiple different data sources
 Processing the data
 Integrating the data and preparing clean datasets
 Engaging in explorative data analysis using model and algorithms
 Preparing presentations using data visualizations
 Communicating the findings to all stakeholders
 Making faster and better decisions
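The steps above can be sketched end-to-end on invented data (the two "sources" and their fields are hypothetical):

```python
# Toy end-to-end run of the data science process; all data is invented.

# 1. Collect raw data from multiple sources (here, two in-memory "sources").
source_a = [{"id": 1, "spend": "120"}, {"id": 2, "spend": "80"}]
source_b = [{"id": 2, "spend": "85"}, {"id": 3, "spend": None}]

# 2-3. Process and integrate into one clean dataset
#      (deduplicate by id, drop bad rows, cast types).
merged = {row["id"]: row for row in source_a + source_b}   # later source wins
clean = [
    {"id": k, "spend": float(v["spend"])}
    for k, v in sorted(merged.items())
    if v["spend"] is not None
]

# 4. Explore with a simple model/statistic.
avg_spend = sum(r["spend"] for r in clean) / len(clean)

# 5-6. "Visualize" and communicate the finding to stakeholders.
print(f"{len(clean)} customers, average spend {avg_spend:.2f}")
```

A real pipeline would swap the in-memory lists for databases, files and streams, and the single statistic for proper models and visualizations, but the shape of the process is the same.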

Responsibilities
Data Management: A data scientist employs several approaches to develop the relevant datasets for
analysis. Raw data is just "raw" and unsuitable for analysis. The data scientist works on it to
prepare it to reflect the relationships and contexts. This data then becomes useful for processing and further
analysis.
Analytical Techniques: Depending on the business questions which we are trying to find answers
to and the type of data available at hand, the data scientist employs a blend of analytical techniques
to develop models and algorithms to understand the data, interpret relationships, spot trends, and
reveal patterns.
Business Analysis: A data scientist is a business analyst who distinguishes cool facts from insights
and is able to apply his business expertise and domain knowledge to see the results in the business
context.
Communicator: He is a good presenter and communicator who is able to communicate the results
of his findings in a language that is understood by the different business stakeholders.
A data scientist: prepares and integrates large, varied datasets; models and analyzes them to
comprehend and interpret relationships, unveil patterns and spot trends; applies business/domain
knowledge to provide context; and communicates/presents findings and results.

Big Data Analytics Applications


Banking and Securities
 The Securities Exchange Commission (SEC) is using Big Data to monitor financial market
activity. They are currently using network analytics and natural language processors to catch
illegal trading activity in the financial markets.
 Retail traders, Big banks, hedge funds, and other so-called big boys in the financial markets
use Big Data for trade analytics used in high-frequency trading, pre-trade decision-support
analytics, sentiment measurement, Predictive Analytics, etc.

Communications, Media and Entertainment


 Collecting, analyzing, and utilizing consumer insights
 Leveraging mobile and social media content
 Understanding patterns of real-time, media content usage
 Creating content for different target audiences
 Recommending content on demand
 Measuring content performance

Healthcare Providers
The healthcare sector has access to huge amounts of data but has been plagued by failures in
utilizing the data to curb the rising cost of healthcare, and by inefficient systems that stifle faster
and better healthcare benefits across the board.
Some hospitals are using data collected from a cell phone app, gathered from millions of patients,
to allow doctors to practice evidence-based medicine rather than administering a battery of
medical/lab tests to every patient who visits the hospital. A battery of tests can be thorough, but it
is also expensive and often ineffective.
Free public health data and Google Maps have been used to create visual data that allows for
faster identification and efficient analysis of healthcare information, used in tracking the spread of
chronic disease.

Education
Big data is used quite significantly in higher education. In another use case, Big Data is used
to measure teachers' effectiveness to ensure a pleasant experience for both students and
teachers.
Teachers' performance can be fine-tuned and measured against student numbers, subject
matter, student demographics, student aspirations, behavioral classification, and several other
variables.

Manufacturing and Natural Resources


In the natural resources industry, Big Data enables predictive modeling to support decision
making, ingesting and integrating large amounts of geospatial, graphical, textual, and temporal
data.
Big data has also been used in solving today’s manufacturing challenges and to gain a
competitive advantage, among other benefits.

Government
In public services, Big Data has an extensive range of applications, including energy
exploration, financial market analysis, fraud detection, health-related research, and environmental
protection.
Big data is being used in the analysis of large amounts of social disability claims made to the
Social Security Administration (SSA) that arrive in the form of unstructured data. The analytics are
used to process medical information rapidly and efficiently for faster decision making and to detect
suspicious or fraudulent claims.
The Food and Drug Administration (FDA) is using Big Data to detect and study patterns of
food-related illnesses and diseases. This allows for a faster response, which has led to more rapid
treatment and fewer deaths.

Insurance
Big data has been used in the insurance industry to provide customer insights for transparent and
simpler products, by analyzing and predicting customer behavior through data derived from social
media, GPS-enabled devices, and CCTV footage. Big Data also allows insurance companies to
achieve better customer retention.
When it comes to claims management, predictive analytics from Big Data has been used to
offer faster service since massive amounts of data can be analyzed mainly in the underwriting stage.
Fraud detection has also been enhanced.
Through massive data from digital channels and social media, real-time monitoring of claims
throughout the claims cycle has been used to provide insights.

Retail and Wholesale trade


Big data from customer loyalty programs, POS systems, store inventory, and local demographics
continues to be gathered by retail and wholesale stores. It enables:
 Optimized staffing through data from shopping patterns, local events, and so on
 Reduced fraud
 Timely analysis of inventory
 Customer prospecting, customer retention, and promotion of products through social media

Transportation
 Government use of Big Data: traffic control, route planning, intelligent transport systems,
congestion management
 Private-sector use of Big Data: revenue management, technological enhancements, logistics,
and competitive advantage
 Individual use of Big Data: route planning to save fuel and time, travel arrangements in
tourism, etc.

Energy and Utilities


Smart meter readers allow data to be collected almost every 15 minutes as opposed to once a
day with the old meter readers. This granular data is being used to analyze the consumption of
utilities better, which allows for improved customer feedback and better control of utilities use.
In utility companies, the use of Big Data also allows for better asset and workforce
management, which is useful for recognizing errors and correcting them as soon as possible before
complete failure is experienced.
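The granularity gain is easy to quantify: a 15-minute smart meter yields 96 readings per day versus one from a daily meter, and those readings can be aggregated to locate usage peaks. The sketch below uses made-up readings, not real meter data.

```python
# Hypothetical 15-minute smart-meter readings (kWh per interval) for one day.
# 96 intervals/day = 24 hours * 4 readings per hour.
readings = [0.2] * 40 + [0.9] * 8 + [0.2] * 48  # a usage spike mid-morning

intervals_per_day = 24 * 4
assert len(readings) == intervals_per_day  # 96 readings vs. 1 daily reading

daily_total = sum(readings)  # the only figure an old daily meter would report
peak_interval = max(range(len(readings)), key=readings.__getitem__)
peak_hour = peak_interval // 4  # hour of heaviest consumption

print(f"Daily consumption: {daily_total:.1f} kWh, peak around hour {peak_hour}")
```

The daily total is all a legacy meter provides; the per-interval data additionally reveals when consumption peaks, which is what enables the improved feedback and load management described above.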
