
Unit No 1: Understanding of Big Data

Introduction to Big Data, Definition of Big Data, Need of Big Data Management, Sources of Big Data, Characteristics of Big Data, Evolution of Big Data, Differentiating between Data Warehouse and Big Data, Real-time Data Processing, Structure of Big Data, Big Data Life Cycle and Processing, Applications of Big Data, Benefits of Big Data Management, Challenges of Big Data: Privacy, Visualization, Compliance and Security.

Introduction to Big Data

According to Gartner, "Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." This definition clearly answers the "What is Big Data?" question: Big Data refers to complex and large data sets that have to be processed and analyzed to uncover valuable information that can benefit businesses and organizations. A few basic tenets make it even simpler to answer what Big Data is:

 It refers to a massive amount of data that keeps growing exponentially with time.
 It is so voluminous that it cannot be processed or analyzed using conventional data processing techniques.
 It spans data mining, data storage, data analysis, data sharing, and data visualization.
 The term is an all-comprehensive one, including the data itself and data frameworks, along with the tools and techniques used to process and analyze the data.

Definition of Big Data

Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-
effective, innovative forms of information processing that enable enhanced insight, decision
making, and process automation.

Need of Big Data

The importance of big data does not revolve around how much data a company has, but around how the company utilizes the collected data. Every company uses data in its own way; the more efficiently a company uses its data, the more potential it has to grow. A company can take data from any source and analyze it to find answers that enable:



1. Cost Savings: Big Data tools like Hadoop and cloud-based analytics can bring cost advantages to a business when large amounts of data are to be stored, and these tools also help identify more efficient ways of doing business.

2. Time Reductions: The high speed of tools like Hadoop and in-memory analytics can easily identify new sources of data, which helps businesses analyze data immediately and make quick decisions based on what they learn.

3. Understand the market conditions: By analyzing big data, you can get a better understanding of current market conditions. For example, by analyzing customers' purchasing behavior, a company can find out which products sell the most and produce according to that trend. In this way, it can get ahead of its competitors.

4. Control online reputation: Big data tools can perform sentiment analysis, giving you feedback about who is saying what about your company. If you want to monitor and improve the online presence of your business, big data tools can help with all of this.

5. Using Big Data Analytics to Boost Customer Acquisition and Retention: The customer is the most important asset any business depends on. No business can claim success without first establishing a solid customer base. However, even with a customer base, a business cannot afford to ignore the intense competition it faces. If a business is slow to learn what customers are looking for, it is very easy to end up offering poor-quality products. The resulting loss of clientele creates an adverse overall effect on business success. The use of big data allows businesses to observe various customer-related patterns and trends, and observing customer behavior is important for triggering loyalty.

6. Using Big Data Analytics to Solve Advertisers' Problems and Offer Marketing Insights

Sources of Big Data

The voluminous nature of big data makes it crucial for businesses to differentiate, for the purpose of effectiveness, among the disparate big data sources available. Organizations use big data for the sole purpose of analytics. However, before companies can set out to extract insights and valuable information from big data, they must know the several big data sources available. Data, as we know, is massive and exists in various forms; if it is not classified or sourced well, it can end up wasting precious time and resources. To achieve success with big data, it is important that companies have the know-how to sift among the various data sources available and accordingly classify their usability and relevance.

MEDIA AS A BIG DATA SOURCE

Media is the most popular source of big data, as it provides valuable insights on consumer preferences and changing trends. Since it is self-broadcast and crosses all physical and demographic barriers, it is the fastest way for businesses to get an in-depth overview of their target audience, draw patterns and conclusions, and enhance their decision-making. Media includes social media and interactive platforms, like Google, Facebook, Twitter, YouTube, and Instagram, as well as generic media like images, videos, audio, and podcasts, all of which provide quantitative and qualitative insights on every aspect of user interaction.

CLOUD AS A BIG DATA SOURCE

Today, companies have moved beyond traditional data sources by shifting their data to the cloud. Cloud storage accommodates structured and unstructured data and provides businesses with real-time information and on-demand insights. The main attributes of cloud computing are its flexibility and scalability. As big data can be stored and sourced on public or private clouds, via networks and servers, the cloud makes for an efficient and economical data source.

THE WEB AS A BIG DATA SOURCE

The public web constitutes big data that is widespread and easily accessible. Data on the Web, or 'Internet', is commonly available to individuals and companies alike. Moreover, web services such as Wikipedia provide free and quick informational insights to everyone. The enormity of the Web ensures its diverse usability, which is especially beneficial to start-ups and SMEs, as they don't have to wait to develop their own big data infrastructure and repositories before they can leverage big data.

IOT AS A BIG DATA SOURCE

Machine-generated content, or data created by IoT devices, constitutes a valuable source of big data. This data is usually generated by sensors connected to electronic devices, and its sourcing capacity depends on the ability of the sensors to provide real-time, accurate information. IoT is now gaining momentum and includes big data generated not only from computers and smartphones, but potentially from every device that can emit data. With IoT, data can now be sourced from medical devices, vehicular processes, video games, meters, cameras, household appliances, and the like.



DATABASES AS A BIG DATA SOURCE

Businesses today prefer to use an amalgamation of traditional and modern databases to acquire relevant big data. This integration paves the way for a hybrid data model and requires low investment and IT infrastructure costs. These databases are also deployed for several business intelligence purposes, providing for the extraction of insights that are used to drive business profits. Popular databases include a variety of data sources, such as MS Access, DB2, Oracle, SQL, and Amazon SimpleDB, among others.

Extracting and analyzing data from extensive big data sources is complex and can be frustrating and time-consuming. These complications can be resolved if organizations take all the necessary considerations of big data into account, choose relevant data sources, and deploy them in a manner well tuned to their organizational goals.

Big Data Characteristics

Big Data is a large amount of data that cannot be processed by traditional data storage or processing units. Many multinational companies use it to process data and run their business operations. By some estimates, global data flow exceeds 150 exabytes per day before replication.

There are five V's of Big Data that explain its characteristics.

5 V's of Big Data

o Volume
o Veracity
o Variety
o Value
o Velocity



Volume

The name Big Data itself refers to enormous size. Big Data is vast volumes of data generated daily from many sources, such as business processes, machines, social media platforms, networks, human interactions, and many more.

Facebook alone generates approximately a billion messages, records 4.5 billion clicks of the "Like" button, and receives more than 350 million new posts each day. Big data technologies are built to handle such large amounts of data.

Variety

Big Data can be structured, unstructured, or semi-structured, collected from different sources. In the past, data was collected only from databases and spreadsheets, but these days data comes in an array of forms: PDFs, emails, audio, social media posts, photos, videos, and more.

The data is categorized as below:


a. Structured data: Structured data follows a pre-defined schema with all the required columns and is in tabular form. It is stored in a relational database management system.
b. Semi-structured data: In semi-structured data, the schema is not rigidly defined; examples include JSON, XML, CSV, TSV, and email. Such data carries tags or markers but does not fit neatly into relational tables.
c. Unstructured data: Unstructured files, such as log files, audio files, and image files, fall into this category. Many organizations have plenty of such data available but do not know how to derive value from it, since the data is raw.
d. Quasi-structured data: Textual data with inconsistent formats that can be shaped into a usable structure only with time, effort, and some tools.
Example: web server logs, i.e., log files created and maintained by a server that contain a list of activities. A minimal parsing sketch follows.
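To make the quasi-structured case concrete, here is a minimal Python sketch that turns one web server log line (Common Log Format) into structured fields. The sample line and field names are illustrative assumptions, not taken from the notes above:

```python
import re

# Common Log Format, e.g.:
# 127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

def parse_line(line):
    """Extract structured fields from one quasi-structured log line."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None  # None -> line is corrupt

line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(parse_line(line))
# {'ip': '127.0.0.1', 'timestamp': '10/Oct/2023:13:55:36 +0000', 'method': 'GET', ...}
```

Once each line is reduced to named fields like this, the log can be loaded into a structured store and analyzed like any table.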
Veracity
Veracity means how reliable the data is. Because data arrives from many sources in many forms, it must be filtered and translated before it can be trusted; being able to handle and manage data quality efficiently is essential for business development.
For example, Facebook posts with hashtags are noisy and inconsistent, so their reliability must be assessed before analysis.
Value
Value is an essential characteristic of big data. What matters is not the raw data that we process or store, but the valuable and reliable data that we store, process, and analyze.

Velocity

Velocity plays an important role compared to the other characteristics. Velocity is the speed at which data is created, often in real time. It covers the speed of incoming data sets, their rate of change, and bursts of activity. A primary requirement of Big Data is to make this fast-arriving data available rapidly.

Big data velocity deals with the speed at which data flows from sources like application logs, business processes, networks, social media sites, sensors, mobile devices, and so on.

Evolution of Big Data

Big data has revolutionized the modern business environment in recent years. A mixture of
structured, semistructured and unstructured data, big data is a collection of information that
organizations can mine for business purposes through machine learning, predictive modeling, and
other advanced data analytics applications.
At one time the concept of big data may have seemed like a buzzword, but the reality is the impact
of big data on the world around us has been tremendous. As you will see from this timeline
covering the history of big data, big data analytics builds on concepts that have been around for
centuries.
The history data analysis that led to today's advanced big data analytics starts way back in the 17th
century in London. Let's begin our journey.

The bedrock of big data


A foundational period where clever people started seeing the value of turning to statistics and
analysis to make sense of the world around them.

1663
John Graunt introduces statistical data analysis during the bubonic plague. The London haberdasher published the first collection of public health records when he recorded death rates and their variations during the bubonic plague in England.

1865



Richard Millar Devens coins the term "business intelligence." As we understand it today, business
intelligence is the process of analyzing data, and then using it to deliver actionable information. In
his "Cyclopædia of Commercial and Business Anecdotes," Devens described how a banker used
information from his environment to turn a profit.

1884
Herman Hollerith invents the punch card tabulating machine, marking the beginning of data
processing. The tabulating device Hollerith developed was used to process data from the 1890 U.S.
Census. Later, in 1911, he founded the Computing-Tabulating-Recording Company, which would
eventually become IBM.

1926
Nikola Tesla predicts humans will one day have access to large swaths of data via an instrument that can be carried "in [one's] vest pocket." Tesla managed to predict our modern affinity for smartphones and other handheld devices based on his understanding of wireless technology: "When wireless is perfectly applied, the whole earth will be converted into a huge brain, which in fact it is, all things being particles of a real and rhythmic whole. We shall be able to communicate with one another instantly, irrespective of distance."

1928
Fritz Pfleumer invents a way to store information on tape. Pfleumer's process for putting metal
stripes on magnetic papers eventually led him to create magnetic tape, which formed the
foundation for video cassettes, movie reels and more.

1943
The U.K. builds one of the first data processing machines to decipher Nazi codes during WWII. The Colossus, as it was called, performed Boolean and counting operations to analyze large volumes of intercepted data.
1959
Arthur Samuel, a programmer at IBM and pioneer of artificial intelligence, coins the term machine learning (ML).



1965
The U.S. government plans to build the first data center to store millions of tax returns and fingerprints on magnetic tape.

1969
Advanced Research Projects Agency Network (ARPANET), the first wide area network with distributed control and, later, the TCP/IP protocol suite, is created. This formed the foundation of today's internet.
The internet age: The dawn of big data
As computers start sharing information at exponentially greater rates due to the internet, the next
stage in the history of big data takes shape.

1989 and 1990


Tim Berners-Lee and Robert Cailliau propose the World Wide Web and develop HTML, URLs, and HTTP while working at CERN. The internet age of widespread and easy access to data begins.

1996
Digital data storage becomes more cost-effective than storing information on paper for the first
time in 1996, as reported by R.J.T. Morris and B.J. Truskowski in their 2003 IBM Systems Journal
paper, "The Evolution of Storage Systems."

1997
The domain google.com is registered a year before launching, starting the search engine's climb to
dominance and development of numerous other technological innovations, including in the areas
of machine learning, big data and analytics.

1998
Carlo Strozzi develops NoSQL, an open source relational database that does not use SQL. The name was later adopted for databases that store and retrieve data modeled differently from the traditional tabular methods found in relational databases.



1999
Based on data from 1999, the first edition of the influential book, How Much Information, by Hal
R. Varian and Peter Lyman (published in 2000), attempts to quantify the amount of digital
information available in the world to date.
Big Data in the 21st century
Big data as we know it finally arrives, and the explosion of ingenuity that it brings with it cannot
be overestimated. Everyone, and everything, is impacted.

2001
Doug Laney of analyst firm Gartner coins the 3Vs (volume, variety and velocity), defining the
dimensions and properties of big data. The Vs encapsulate the true definition of big data and usher
in a new period where big data can be viewed as a dominant feature of the 21st century. Additional
Vs -- such as veracity, value and variability -- have since been added to the list.

2005

Computer scientists Doug Cutting and Mike Cafarella create Apache Hadoop, the open source
framework used to store and process large data sets, with a team of engineers spun off from Yahoo.

2006



Amazon Web Services (AWS) starts offering web-based computing infrastructure services, now known
as cloud computing. Currently, AWS dominates the cloud services industry with roughly one-third of
the global market share.

2008
The world's CPUs process over 9.57 zettabytes (9.57 trillion gigabytes) of data, about equal to 12 gigabytes of information per person per day. Global production of new information hits an estimated 14.7 exabytes.
2009
Gartner reports business intelligence as the top priority for CIOs. As companies
face a period of economic volatility and uncertainty due to the Great Recession,
squeezing value out of data becomes paramount.
2011
McKinsey reports that by 2018 the U.S. will face an analytics talent shortage, lacking between 140,000 and 190,000 people with deep analytical skills and a further 1.5 million analysts and managers with the ability to make accurate data-driven decisions.
Also in 2011, Facebook launches the Open Compute Project to share specifications for energy-efficient data centers. The initiative's goal is to deliver a 38% increase in energy efficiency at a 24% lower cost.
2012
The Obama administration announces the Big Data Research and Development Initiative with a $200 million commitment, citing a need to improve the ability to extract valuable insights from data, accelerate the pace of growth in STEM (science, technology, engineering, and mathematics), enhance national security, and transform learning. The acronym has since become STEAM, adding an A for the arts.



Harvard Business Review names data scientist the sexiest job of the 21st
century. As more companies recognized the need to sort and gain insights from
unstructured data, demand for data scientists soared.
2013
The global market for big data reaches $10 billion.
2014
For the first time, more mobile devices access the internet than desktop
computers in the U.S. The rest of the world follows suit two years later, in 2016.
2016
Ninety percent of the world's data was created in the last two years alone, and
IBM reports that 2.5 quintillion bytes of data is created every day (that's 18
zeroes).
2017
IDC forecasts that the big data analytics market will reach $203 billion by 2020.
2020
Allied Market Research reports that the big data and business analytics market hit $193.14 billion in 2019, and estimates it will grow to $420.98 billion by 2027, at a compound annual growth rate of 10.9%.


Difference between Big Data and Data Warehouse

Big Data and Data Warehouse both are used as main source of input for Business Intelligence, such as
creation of Analytical results and Report generation, in order to provision effective business decision-
making processes. Big Data allows unrefined data from any source, but Data Warehouse allows only
processed data, as it has to maintain the reliability and consistency of the data. The unprocessed data in
Big Data systems can be of any size depending on the type their formats. Almost all the data in Data
Warehouse are of common size due to its refined structured system organization.



Data Warehouse vs Big Data

1. A Data Warehouse is an architecture for data storage, i.e., a data repository. Big Data is a technology to handle huge data and prepare such a repository.
2. A Data Warehouse accepts only DBMS data. Big Data accepts all kinds of data, including transactional data, social media data, machine data, and DBMS data.
3. A Data Warehouse handles only structured data. Big Data can handle structured, unstructured, and semi-structured data.
4. A Data Warehouse has no notion of a distributed file system. Big Data normally uses a distributed file system to load huge data in a distributed way.
5. A Data Warehouse is built on relational databases, so storing and fetching data works with normal SQL queries. Big Data does not follow a rigid database structure, so engines such as Hive or Spark SQL are needed to query the data.
6. Nearly 100% of the data loaded into a data warehouse is used for analytics reports. Of the data loaded via Hadoop, at most about 0.5% has been used for analytics reports so far; the rest is loaded into the system but sits unused.
7. A Data Warehouse is never able to handle totally unstructured data. Big Data platforms (such as Apache Hadoop) are the practical option for handling unstructured data.
8. In a data warehouse, fetch times grow with data volume. Big Data systems can fetch huge data in a short time because the work is distributed.
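As rows 5 and 7 of the comparison note, Big Data is usually queried through engines such as Hive or Spark SQL rather than a conventional RDBMS. A minimal PySpark sketch, assuming PySpark is installed and a hypothetical events.json file with an event_date column:

```python
from pyspark.sql import SparkSession

# Start a local Spark session (assumes PySpark is installed).
spark = SparkSession.builder.appName("dw-vs-bigdata").getOrCreate()

# Load semi-structured data; "events.json" and its columns are hypothetical.
events = spark.read.json("events.json")
events.createOrReplaceTempView("events")

# Query it with familiar SQL, even though it never lived in an RDBMS.
daily = spark.sql("""
    SELECT event_date, COUNT(*) AS n_events
    FROM events
    GROUP BY event_date
    ORDER BY event_date
""")
daily.show()

spark.stop()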

Real-Time Data Processing


Real-time data processing is the processing of data within a short time period, providing near-instantaneous output. The processing is done as the data is input, so it needs a continuous stream of input data in order to provide a continuous output. Good examples of real-time data processing systems are bank ATMs, traffic control systems, and modern computer systems such as PCs and mobile devices. In contrast, a batch data processing system collects data and then processes it all in bulk at a later time, which also means output is received at a later time.
Real-time data processing is also known as stream processing.
A real-time data processing system is able to take input of rapidly changing data and then provide output near-instantaneously, so that change over time is readily seen. For example, a radar system depends on a continuous flow of input data, which is processed by a computer to reveal the location of aircraft flying within the radar's range and display it on a screen, so that anyone looking at the screen can know the actual location of an aircraft at that moment.



Real-time data processing is called stream processing because it requires a continuous stream of input data to yield output for that moment. Good examples are e-commerce order processing, online booking and reservations, and real-time credit card fraud detection. The biggest benefit of real-time data processing is getting instantaneous results from input data, ensuring everything is up to date. With batch processing, by contrast, data is no longer timely.
There are two types of real-time analytics:

On-demand real-time analytics: This is a reactive approach. It awaits a query from the end user, then processes the request and delivers the analytics. For example, a web analyst monitors site traffic to avoid a potential crash of the website.
Continuous real-time analytics: This is a proactive approach. It alerts users with continuous updates in real time, for example, tracking the stock market with various visualizations on a website. A toy sketch of the continuous model follows.
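A toy Python sketch of the continuous model described above: each record is processed the moment it arrives, and the output (a running average) is always up to date. The simulated sensor source is an assumption for illustration:

```python
import random
import time

def sensor_stream():
    """Simulate an unbounded stream of readings (stand-in for a real source)."""
    while True:
        yield random.gauss(100.0, 5.0)
        time.sleep(0.1)

count, total = 0, 0.0
for reading in sensor_stream():
    # Process each record as it arrives and emit an up-to-date result.
    count += 1
    total += reading
    print(f"reading={reading:.1f}  running_mean={total / count:.1f}")
    if count >= 10:   # stop the demo; a real stream never ends
        break
```

A batch system would instead collect all ten readings first and compute the mean once, at a later time.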
Structure Of Big Data
In the last few years, big data has become central to the tech landscape. You can consider big data a collection of massive and complex datasets that are difficult to store and process using traditional database management tools and traditional data processing applications. The key challenges include capturing, storing, managing, analyzing, and visualizing that data.

When it comes to the structure of big data, you can consider it a collection of data values, the relationships between them, and the operations or functions that can be applied to that data.

These days, many resources (social media platforms being the number one) have become available to companies from which they can capture massive amounts of data. This captured data is used by enterprises to develop a better understanding of, and closer relationships with, their target customers. It's important to understand that every new customer action creates a more complete picture of the customer, helping organizations achieve a more detailed understanding of their ideal customers. It is therefore easy to see why companies across the globe are striving to leverage big data. Put simply, big data has the potential to redefine a business, and organizations that succeed in analyzing big data effectively stand a strong chance of becoming global leaders in their business domain.
Structures of big data
Big data structures can be divided into three categories: structured, unstructured, and semi-structured. Let's have a look at them in detail.
1- Structured data
It's the data which follows a pre-defined format and is thus straightforward to analyze. It conforms to a tabular format, with relationships between different rows and columns. You can think of SQL databases as a common example. Structured data depends on a pre-defined model of how the data is stored, processed, and accessed, and it is considered the most "traditional" type of data storage.
2- Unstructured data
This type of big data has no pre-defined form; it cannot be stored in traditional ways and cannot be analyzed unless it is transformed into a structured format. You can think of multimedia content like audio, videos, and images as examples of unstructured data. It's important to understand that, these days, unstructured data is growing faster than the other types of big data.
3- Semi-structured data
It's a type of big data that doesn't conform to a formal data model structure, but it carries organizational tags or other markers that help separate semantic elements and enforce hierarchies of fields and records within the data. You can think of JSON documents or XML files as this type of big data. The reason this category exists is that semi-structured data is significantly easier to analyze than unstructured data: a significant number of big data solutions and tools can read and process XML files or JSON documents, reducing the complexity of the analysis, as the sketch below illustrates.
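A short Python sketch of why the tags in semi-structured data reduce analysis complexity: a JSON document can be parsed directly into addressable fields, with no layout guessing as there would be for free text. The document shown is hypothetical:

```python
import json

# A semi-structured record: no fixed table schema, but self-describing tags.
doc = '''
{
  "user": "rohit",
  "posted": "2023-10-10",
  "tags": ["bigdata", "analytics"],
  "text": "Started the Big Data unit today."
}
'''

record = json.loads(doc)                # parse the markers into a dict
print(record["user"], record["tags"])   # fields are directly addressable
```

An unstructured equivalent (the same sentence buried in a plain text file) would first need information extraction before any field could be queried.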

Big Data Analytics Life Cycle

The Big Data Analytics life cycle is divided into nine phases, namely:
1. Business Case/Problem Definition
2. Data Identification
3. Data Acquisition and Filtration
4. Data Extraction
5. Data Munging (Validation and Cleaning)
6. Data Aggregation & Representation (Storage)
7. Exploratory Data Analysis
8. Data Visualization (Preparation for Modeling and Assessment)
9. Utilization of Analysis Results
Let us discuss each phase:
 Phase I Business Problem Definition –

In this stage, the team learns about the business domain, which presents the motivation and goals for carrying out the analysis. The problem is identified, and assumptions are made about how much potential gain the company will make after carrying out the analysis. Important activities in this step include framing the business problem as an analytics challenge that can be addressed in subsequent phases. This helps decision-makers understand the business resources that will need to be utilized, thereby determining the underlying budget required to carry out the project.
Moreover, it can be determined whether the problem identified is a Big Data problem or not, based on the business requirements in the business case. To qualify as a Big Data problem, the business case should be directly related to one (or more) of the Big Data characteristics of volume, velocity, or variety.

 Phase II Data Identification –

Once the business case is identified, it is time to find the appropriate datasets to work with. In this stage, analysis is done to see what other companies have done in similar cases.
Depending on the business case and the scope of analysis of the project being addressed, the sources of datasets can be either internal or external to the company. Internal datasets include data collected from internal sources, such as feedback forms or existing software; external datasets include those from third-party providers.

 Phase III Data Acquisition and Filtration –

Once the sources of data are identified, it is time to gather the data from them. This kind of data is mostly unstructured. It is then subjected to filtration, such as the removal of corrupt or irrelevant data that is outside the scope of the analysis objective. Here, corrupt data means data with missing records or incompatible data types.
After filtration, a copy of the filtered data is stored and compressed, as it can be of use in the future for some other analysis. A minimal sketch of the filtration step follows.
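A minimal Python sketch of the filtration rule described above, dropping records with missing or incompatible values while keeping a copy of the rejects. The record layout is a hypothetical example:

```python
raw_records = [
    {"id": 1, "amount": "250.0"},
    {"id": 2, "amount": None},    # missing value -> corrupt
    {"id": 3, "amount": "abc"},   # incompatible type -> corrupt
    {"id": 4, "amount": "99.9"},
]

clean, rejected = [], []
for rec in raw_records:
    try:
        rec["amount"] = float(rec["amount"])   # fails on None and "abc"
        clean.append(rec)
    except (TypeError, ValueError):
        rejected.append(rec)                   # keep a copy for later audits

print(len(clean), "kept;", len(rejected), "filtered out")
```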

 Phase IV Data Extraction –

The data is now filtered, but there might be a possibility that some of the entries are still incompatible. To rectify this issue, a separate phase is created, known as the data extraction phase. In this phase, data that does not match the underlying scope of the analysis is extracted and transformed into a compatible form.

 Phase V Data Munging –

As mentioned in Phase III, the data is collected from various sources, which results in the data being unstructured. There might also be values that violate constraints, which can lead to false results. Hence, there is a need to clean and validate the data.
This includes removing any invalid data and establishing complex validation rules. There are many ways to validate and clean the data. For example, a dataset might contain a few rows with null entries. If a similar dataset is present, those entries are copied from that dataset; otherwise, those rows are dropped, as in the sketch below.
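A small pandas sketch of that null-entry rule, assuming pandas is available; the dataset and column names are hypothetical:

```python
import pandas as pd

# Primary dataset with null entries, and a similar reference dataset.
primary = pd.DataFrame({"roll_no": [1, 2, 3], "city": ["Sangli", None, None]})
similar = pd.DataFrame({"roll_no": [1, 2], "city": ["Sangli", "Pune"]})

# Copy missing values from the similar dataset where available...
merged = primary.merge(similar, on="roll_no", how="left", suffixes=("", "_ref"))
merged["city"] = merged["city"].fillna(merged["city_ref"])

# ...and drop the rows that are still null.
cleaned = merged.drop(columns=["city_ref"]).dropna(subset=["city"])
print(cleaned)   # roll_no 2 gets "Pune"; roll_no 3 is dropped
```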



 Phase VI Data Aggregation & Representation –

The data has now been cleansed and validated against rules set by the enterprise. But the data might be spread across multiple datasets, and it is not advisable to work with multiple datasets separately. Hence, the datasets are joined together. For example, if there are two datasets, one for the Student Academic section and one for Student Personal Details, both can be joined via a common field, i.e., roll number (see the sketch below).
This phase calls for intensive operations, since the amount of data can be very large. Automation can be brought in so that these steps are executed without any human intervention.
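The student example above as a pandas sketch; the column names and values are hypothetical:

```python
import pandas as pd

academic = pd.DataFrame({
    "roll_no": [101, 102, 103],
    "grade":   ["A", "B", "A"],
})
personal = pd.DataFrame({
    "roll_no": [101, 102, 103],
    "name":    ["Asha", "Ravi", "Meera"],
})

# Join the two datasets on their common field, roll_no.
students = academic.merge(personal, on="roll_no", how="inner")
print(students)   # one row per student, with both grade and name
```

At Big Data scale the same join would run on a distributed engine (e.g., Spark), but the aggregation idea is identical.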

 Phase VII Exploratory Data Analysis –

Here comes the actual analysis task. Depending on the nature of the Big Data problem, analysis is carried out. Data analysis can be classified as confirmatory analysis or exploratory analysis. In confirmatory analysis, a cause for a phenomenon is proposed in advance; this assumption is called a hypothesis, and the data is analyzed to confirm or refute it.
This kind of analysis provides definitive answers to specific questions and establishes whether an assumption was true or not. In exploratory analysis, the data is explored to discover why a phenomenon occurred. This type of analysis answers "why" a phenomenon occurred; it does not provide definitive answers, but it enables the discovery of patterns.

 Phase VIII Data Visualization –

We now have answers to some questions, using the information from the data in the datasets. But these answers are still in a form that cannot be presented to business users. Some form of representation is required to obtain value or conclusions from the analysis. Hence, various tools are used to visualize the data in graphic form, which business users can easily interpret (a minimal sketch follows).
Visualization is said to influence the interpretation of the results. Moreover, it allows users to discover answers to questions that are yet to be formulated.
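A minimal matplotlib sketch of the visualization step, turning analysis output into a chart a business user can read; the numbers are placeholders:

```python
import matplotlib.pyplot as plt

# Hypothetical analysis output: sales by region.
regions = ["North", "South", "East", "West"]
sales   = [120, 95, 140, 80]

plt.bar(regions, sales)
plt.title("Sales by Region")   # a headline business users can act on
plt.ylabel("Units sold")
plt.savefig("sales_by_region.png")
```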

 Phase IX Utilization of Analysis Results –

The analysis is done and the results are visualized; now it is time for the business users to make decisions utilizing the results. The results can be used for optimization, to refine the business process, or as input to systems to enhance their performance.
The block diagram of the life cycle (omitted here) makes it evident that Phase VII, i.e., Exploratory Data Analysis, is repeated successively until it is performed satisfactorily, with emphasis put on error correction. Moreover, one can move back from Phase VIII to Phase VII if a satisfactory result is not achieved. In this manner, it is ensured that the data is analyzed properly.
Applications of Big Data
Here is a list of the top 10 industries using big data applications:

Banking and Securities
Communications, Media and Entertainment
Healthcare Providers
Education
Manufacturing and Natural Resources
Government
Insurance
Retail and Wholesale Trade
Transportation
Energy and Utilities

In this section we will examine how the above-listed ten industry verticals are using Big Data, the industry-specific challenges they face, and how Big Data solves these challenges.

1. Banking and Securities

Industry-specific Big Data Challenges
A study of 16 projects in 10 top investment and retail banks shows that the challenges in this industry include securities fraud early warning, tick analytics, card fraud detection, archival of audit trails, enterprise credit risk reporting, trade visibility, customer data transformation, social analytics for trading, IT operations analytics, and IT policy compliance analytics, among others.

Applications of Big Data in the Banking and Securities Industry

The Securities and Exchange Commission (SEC) is using Big Data to monitor financial market activity. It currently uses network analytics and natural language processors to catch illegal trading activity in the financial markets.

Retail traders, big banks, hedge funds, and other big players in the financial markets use Big Data for the trade analytics behind high-frequency trading, pre-trade decision-support analytics, sentiment measurement, predictive analytics, and more.

This industry also relies heavily on Big Data for risk analytics, including anti-money laundering, demand enterprise risk management, "Know Your Customer", and fraud mitigation.

Big Data providers specific to this industry include 1010data, Panopticon Software, StreamBase Systems, NICE Actimize, and Quartet FS.



2. Communications, Media and Entertainment
Industry-specific Big Data Challenges
Since consumers expect rich media on demand, in different formats and on a variety of devices, some Big Data challenges in the communications, media, and entertainment industry include:

Collecting, analyzing, and utilizing consumer insights
Leveraging mobile and social media content
Understanding patterns of real-time media content usage
Applications of Big Data in the Communications, Media and Entertainment Industry
Organizations in this industry simultaneously analyze customer data along with behavioral data to create detailed customer profiles that can be used to:

Create content for different target audiences
Recommend content on demand
Measure content performance

A case in point is the Wimbledon Championships, which leverages Big Data to deliver detailed sentiment analysis of the tennis matches to TV, mobile, and web users in real time.

Spotify, an on-demand music service, uses Hadoop Big Data analytics to collect data from its millions of users worldwide, and then uses the analyzed data to give informed music recommendations to individual users.

Amazon Prime, which aims to provide a great customer experience by offering video, music, and Kindle books in a one-stop shop, also heavily utilizes Big Data.

Big Data providers in this industry include Infochimps, Splunk, Pervasive Software, and Visible Measures.

3. Healthcare Providers
Industry-specific Big Data Challenges
The healthcare sector has access to huge amounts of data, but it has been plagued by failures to utilize the data to curb rising healthcare costs and by inefficient systems that stifle faster and better healthcare benefits across the board.

This is mainly because electronic data is unavailable, inadequate, or unusable. Additionally, the healthcare databases that hold health-related information have made it difficult to link data that could reveal patterns useful in the medical field.

[Figure: Big Data uses in the healthcare sector. Source: Big Data in the Healthcare Sector Revolutionizing the Management of Laborious Tasks]

Other challenges related to Big Data include the exclusion of patients from the decision-making process and the use of data from different readily available sensors.

Applications of Big Data in the Healthcare Sector

Some hospitals, like Beth Israel, are using data collected from a cell phone app, drawn from millions of patients, to allow doctors to practice evidence-based medicine as opposed to administering several medical/lab tests to every patient who goes to the hospital. A battery of tests can be thorough, but it can also be expensive and often ineffective.

Free public health data and Google Maps have been used by the University of Florida to create visual data that allows for faster identification and more efficient analysis of healthcare information, used in tracking the spread of chronic disease. Obamacare has also utilized Big Data in a variety of ways. Big Data providers in this industry include Recombinant Data, Humedica, Explorys, and Cerner.

4. Education
Industry-specific Big Data Challenges
From a technical point of view, a significant challenge in the education industry is to incorporate Big Data from different sources and vendors and to utilize it on platforms that were not designed for such varying data.

From a practical point of view, staff and institutions have to learn new data management and analysis tools.

On the technical side, there are challenges in integrating data from different sources, on different platforms, and from different vendors that were not designed to work with one another. Politically, issues of privacy and personal data protection associated with Big Data used for educational purposes are a challenge.

Applications of Big Data in Education

Big data is used quite significantly in higher education. For example, the University of Tasmania, an Australian university with over 26,000 students, has deployed a learning management system that tracks, among other things, when a student logs onto the system, how much time is spent on different pages in the system, and the overall progress of a student over time.

In a different use case, Big Data is used in education to measure teachers' effectiveness, to ensure a pleasant experience for both students and teachers. Teachers' performance can be fine-tuned and measured against student numbers, subject matter, student demographics, student aspirations, behavioral classification, and several other variables.

On a governmental level, the Office of Educational Technology in the U.S. Department of Education is using Big Data to develop analytics that help course-correct students who are going astray in online courses. Click patterns are also being used to detect boredom.

Big Data providers in this industry include Knewton, Carnegie Learning, and MyFit/Naviance.

5. Manufacturing and Natural Resources
Industry-specific Big Data Challenges
Increasing demand for natural resources, including oil, agricultural products, minerals, gas, and metals, has led to an increase in the volume, complexity, and velocity of data, which is a challenge to handle.

Similarly, large volumes of data from the manufacturing industry remain untapped. The underutilization of this information prevents improvements in product quality, energy efficiency, reliability, and profit margins.

Applications of Big Data in Manufacturing and Natural Resources

In the natural resources industry, Big Data allows for predictive modeling to support decision-making, and it has been utilized for ingesting and integrating large amounts of geospatial, graphical, textual, and temporal data. Areas of interest where this has been used include seismic interpretation and reservoir characterization.

Big data has also been used to solve today's manufacturing challenges and to gain competitive advantage, among other benefits.

A study by Deloitte examines the supply chain capabilities from Big Data currently in use and their expected use in the future. [Figure omitted. Source: Supply Chain Talent of the Future]

Big Data providers in this industry include CSC, Aspen Technology, Invensys, and Pentaho.

6. Government
Industry-specific Big Data Challenges
In government, the most significant challenges are the integration and interoperability of Big Data across different government departments and affiliated organizations.

Applications of Big Data in Government

In public services, Big Data has an extensive range of applications, including energy exploration, financial market analysis, fraud detection, health-related research, and environmental protection.

Some more specific examples are as follows:

Big Data is used in the analysis of the large volume of Social Security disability claims that arrive at the Social Security Administration (SSA) in the form of unstructured data. The analytics are used to process medical information rapidly and efficiently, for faster decision-making and to detect suspicious or fraudulent claims.

The Food and Drug Administration (FDA) is using Big Data to detect and study patterns of food-related illnesses and diseases. This allows for a faster response, which has led to more rapid treatment and fewer deaths.

The Department of Homeland Security uses Big Data for several different use cases; Big Data from various government agencies is analyzed and used to protect the country.

Big Data providers in this industry include Digital Reasoning, Socrata, and HP.

7. Insurance
Industry-specific Big Data Challenges
Lack of personalized services, lack of personalized pricing, and the lack of targeted services for new and specific market segments are some of the main challenges.

In a survey conducted by Marketforce, challenges identified by professionals in the insurance industry include the underutilization of data gathered by loss adjusters and a hunger for better insight.

Applications of Big Data in the Insurance Industry

Big Data has been used in the industry to provide customer insights for transparent and simpler products, by analyzing and predicting customer behavior through data derived from social media, GPS-enabled devices, and CCTV footage. Big Data also allows insurance companies to achieve better customer retention.

When it comes to claims management, predictive analytics from Big Data has been used to offer faster service, since massive amounts of data can be analyzed, particularly at the underwriting stage. Fraud detection has also been enhanced.

Through massive data from digital channels and social media, real-time monitoring of claims throughout the claims cycle has been used to provide insights.

Big Data providers in this industry include Sprint, Qualcomm, Octo Telematics, and The Climate Corporation.

8. Retail and Wholesale Trade

Industry-specific Big Data Challenges
From traditional brick-and-mortar retailers and wholesalers to current-day e-commerce traders, the industry has gathered a lot of data over time. This data, derived from customer loyalty cards, POS scanners, RFID, and so on, is not being used well enough to improve customer experiences on the whole, and any changes and improvements made have been quite slow.

Applications of Big Data in the Retail and Wholesale Industry

Big data from customer loyalty programs, POS scanners, store inventory, and local demographics continues to be gathered by retail and wholesale stores.

At New York's Big Show retail trade conference in 2014, companies like Microsoft, Cisco, and IBM pitched the need for the retail industry to utilize Big Data for analytics and other uses, including:

Optimized staffing through data from shopping patterns, local events, and so on
Reduced fraud
Timely analysis of inventory

Social media also has a lot of potential uses and continues to be slowly but surely adopted, especially by brick-and-mortar stores. Social media is used for customer prospecting, customer retention, promotion of products, and more.

Big Data providers in this industry include First Retail, First Insight, Fujitsu, Infor, Epicor, and Vistex.

9. Transportation
Industry-specific Big Data Challenges
In recent times, huge amounts of data from location-based social networks and high-speed data from telecoms have affected travel behavior. Regrettably, research to understand travel behavior has not progressed as quickly.

In most places, transport demand models are still based on poorly understood new data sources, such as social media structures.

Applications of Big Data in the Transportation Industry

Some applications of Big Data by governments, private organizations, and individuals include:

Government use of Big Data: traffic control, route planning, intelligent transport systems, and congestion management (by predicting traffic conditions)
Private-sector use of Big Data in transport: revenue management, technological enhancements, logistics, and competitive advantage (by consolidating shipments and optimizing freight movement)
Individual use of Big Data: route planning to save fuel and time, travel arrangements in tourism, and so on

[Infographic omitted: real-time applications of Big Data in the transport sector. Source: Using Big Data in the Transport Sector]

Big Data providers in this industry include Qualcomm and Manhattan Associates.

10. Energy and Utilities

Industry-specific Big Data Challenges
[Figure omitted: some of the main challenges in the energy and utility industry.]

Applications of Big Data in the Energy and Utility Industry

Smart meter readers allow data to be collected almost every 15 minutes, as opposed to once a day with the old meter readers. This granular data is being used to better analyze the consumption of utilities, which allows for improved customer feedback and better control of utility use. A sketch of this roll-up follows below.

In utility companies, the use of Big Data also allows for better asset and workforce management, which is useful for recognizing errors and correcting them as soon as possible, before complete failure is experienced.
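A pandas sketch of the granularity point, rolling simulated 15-minute smart-meter readings up to the old once-a-day view; the readings are random placeholders:

```python
import numpy as np
import pandas as pd

# Simulated 15-minute meter readings over two days (kWh per interval).
idx = pd.date_range("2023-10-01", periods=2 * 96, freq="15min")
readings = pd.Series(np.random.uniform(0.1, 0.5, size=len(idx)), index=idx)

# The old once-a-day view is just an aggregate of the granular data;
# the granular series also supports intraday analysis the old meters could not.
daily = readings.resample("D").sum()
print(daily)   # two rows, one total per day
```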

Note: Content of this file is collected from different web sites.

Dr. V.S.Jadhav, VPIMSR Sangli
