Big Data Analytics

Dr. Md. Abdul Hannan Mia

B. Com, M. Com, PGD, MSc, MBA, FCMA
Department of Management Information Systems (MIS)
Faculty of Business Studies
University of Dhaka

This paper aims to impart knowledge on big data analytics with a view to motivating professionals and practitioners to develop their data analytics skills, which eventually improves individual, organizational and national performance. In the digital era, big data analytics has spread its roots into business, social and governmental records, and decision making seldom becomes effective without incorporating big data into decision support systems (DSS) and other decision-making systems. The paper covers the concept of big data and its characteristics, the concept of big data analytics, the types of data analytics, the cloud and its services as a platform for big data analytics, the skills required to be a big data analyst, and the types of decisions taken using data analytics. The study is based on secondary documents: related books, journal articles and other relevant published materials are the sources of data used to prepare the paper. The literature review outlined several big data analytics tools and software packages, such as Microsoft Excel, Solver, Oracle, Statistical Package for the Social Sciences (SPSS), R, Stata, Amos Graphics, and the simulation tools iThink, Arena and Simul8.
Keywords: Big Data, Data Analytics, Decision Support System, Analytics Software.

1. Introduction
1.1 The Prelude: Big data has emerged as the most powerful weapon of the digital era, in which data is produced, collected, bought and sold, processed and used to produce information and knowledge, and eventually wisdom. The digital world can only be understood from a systems perspective, where all subsystems are interdependent and interacting, and big data is the vital resource that flows through all the subsystems and creates business value. "Big data, big decisions, big payoff" is the slogan of business in the digital era. In this paper we elaborate on big data analytics with a view to developing the interest of academicians and practitioners.


1.2 Objectives and methodology: This article aims to impart knowledge on big data, the characteristics of big data, big data analytics, the types of big data analytics, the importance and application of big data analytics, and future trends in big data analytics. To achieve these objectives, the article is based on secondary data. Literature review was the primary method of data collection: books, journals, features, monographs, etc. were reviewed in pursuit of the objectives.

2. Big Data Analytics
The world's most powerful resource for competitive advantage among nations, organizations and businesses is nothing but data. It is estimated that 2.5 quintillion bytes of data are produced every day from a variety of sources. To understand big data, we need a clear understanding of data, as discussed in section 2.1.

2.1 Data: The word data is defined from various angles depending on the interests of the parties concerned. For example, researchers see data as raw facts derived from many sources, to be processed into a final output. The quantities, characters, or symbols on which operations are performed by a computer, and which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media, are known as data. In computer science, data is measured in bits, bytes, kilobytes, megabytes, gigabytes, etc., and the term is often used to distinguish binary machine-readable information from textual human-readable information. For example, some applications make a distinction between data files (files that contain binary data) and text files (files that contain ASCII¹ data). In this context, we define data as anything that consumes bits or bytes of a computer's hard disk or memory. Data has become so powerful that it has given birth to many disciplines, such as database management systems (DBMS), data science and data analytics. Data has also helped evolve many data-related concepts. As technology advances and changes, numerous phrases have been used over the years to describe data. How we use and analyze data, data structures, massive volumes, etc. have given data a new shape, which is now called Big Data. Phrases like data integrity or data mining are still widely used today. Data-related definitions help us better understand data and its role in information technology and business analysis.

¹ Pronounced ask-ee, ASCII is the acronym for the American Standard Code for Information Interchange.

2.2 Big Data: Big data is a massive volume of both structured (programmed) and unstructured (non-programmed) data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios the volume of data is too big, it moves too fast, or it exceeds current processing capacity. Big data is also a field of study that focuses on the extraction and analysis of information that is too complex to be dealt with by traditional data processing systems. Big data has the potential to help companies improve operations and make faster, more intelligent decisions. The data is collected from several sources, including emails, mobile devices, applications, databases, servers and other means (Grimes, 2019). This data, when captured, formatted, manipulated, stored and then analyzed, can help a company gain useful insight to increase revenues, get or retain customers and improve operations. Big data can be well understood through its characteristics, as described in section 2.2.1.

2.2.1 Characteristics of Big Data
Big data is characterized by the 6Vs shown in Figure 2.1 and discussed below.

(i) Volume – The name Big Data itself refers to enormous size. The size of data plays a very crucial role in determining the value that can be derived from it, and whether a dataset can be considered Big Data or not depends on its volume. Hence, 'Volume' is one characteristic that needs to be considered when dealing with Big Data. The volume of data is measured in bits, bytes, kilobytes, megabytes, gigabytes, terabytes, petabytes, exabytes, etc.

(ii) Variety – The next aspect of Big Data is its variety. Variety refers to the heterogeneous sources and nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources
of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also being considered in analysis applications. Data can also be grouped in terms of structure: some data can be classified as structured, some as unstructured, and others fall into the spectrum called semi-structured and quasi-structured, as shown in Figure 2.2. This variety of unstructured data poses certain issues for storage, mining and analysis.

Figure 2.1 Characteristics of Big Data
Figure 2.2 Variety of data

Structured Data refers to any data that resides in a fixed field within a record or file. This includes data contained in relational databases and spreadsheets. Structured data first depends on creating a data model, i.e., a model of the types of business data that will be recorded and how they will be stored, processed and accessed. This includes defining what fields of data will be stored and how that data will be stored: the data type (numeric, currency, alphabetic, name, date, address) and any restrictions on the data input (number of characters; restriction to certain terms such as Mr., Ms. or Dr.; M or F, etc.). Structured data has the advantage of being easily entered, stored, queried and analyzed. At one time, because of the high cost and performance limitations of storage, memory and processing, relational databases and spreadsheets using structured data were the only way to manage data effectively. Anything that couldn't fit into a tightly organized structure had to be stored on paper in a filing cabinet.

Unstructured Data refers to any data that doesn't reside in a traditional row-column database. As you might expect, it's the opposite of structured data. Unstructured data files often include text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly in a database. Experts estimate that 80 to 90 percent of the data in any organization is unstructured, and the amount of unstructured data in enterprises is growing significantly, often many times faster than structured databases are growing.

Semi-structured Data: In this form of data, the schema is not properly defined, i.e., both forms of data are present. Basically, semi-structured data has a structured form, but it isn't formally defined, e.g., JSON, XML, CSV, TSV, and email. Web application data that is unstructured consists of transaction history files, log files, etc. OLTP (Online Transaction Processing) systems are built to work with structured data, where the data is stored in relations, i.e., tables.

Quasi-structured Data: This data format consists of textual data with inconsistent formats that can be formatted with effort and time, and with the help of several tools. An example is a web server log, i.e., a log file that is automatically created and maintained by a server and contains a list of activities.

(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. The speed of data generation and accumulation also plays a role in determining whether data is classified as big data or small data. With the development of technology, velocity has become an essential element without which today's new technologies cannot work effectively. Twitter, Facebook, instant messages, Google searches, e-mails, new data creation, etc. have increased the need for velocity, as shown in Figure
2.3 below. At first, mainframes were used and few people used computers. Then came the client/server model, and more and more computers came into use. After this, web applications came into the picture and spread over the internet, and everyone began using them. These applications were then used on more and more devices, such as mobiles, as they were very easy to access. Hence, a lot of data! As is clear from the image, a huge amount of data is generated every 60 seconds. How fast the data is generated and processed to meet demands determines the real potential of the data. Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.

Figure 2.3 Velocity of data

(iv) Variability – This refers to the inconsistency that data can show at times, which hampers the ability to handle and manage the data effectively. Data loads become challenging to maintain, especially with the increasing use of social media, which generally causes peaks in data loads when certain events occur.

(v) Value Chain – This refers to the process of making big data more valuable by adding value through a chain of extraction, manipulation, use and implementation of big data. The process that extracts, cleans and processes data adds a great deal of value. At the outset, one needs to mine the data, i.e., turn raw data into useful data. Then, analysis is done on the data that has been cleaned or retrieved from the raw data. Afterwards, one needs to make sure that the analysis benefits the business, for example by finding insights and results that were not possible earlier. One also needs to make sure that whatever raw data is given has been cleaned so that it can be used for deriving business insights. After the data is cleaned, a further challenge arises: during the process of dumping a huge amount of data, some data packets might have been lost.

Figure 2.4 Increasing the Value of Data

(vi) Veracity – The quality of data is not always reliable, as data comes from various uncontrolled sources. The actual and potential value of Big Data is virtually worthless if the data is not accurate. This is particularly true in programs that involve automated decision-making or feed data into an unsupervised machine learning algorithm. The results of such programs are only as good as the data they work with. For example, suppose you have customer behavior data and want to predict customers' purchase intent. The customer data held in log files, in various formats across various systems, may be incomplete or contain noise and errors, and has to be copied, translated and unified. Data analysts must make sure the data is cleaned up before they start working with it. Sometimes the job of the data analyst revolves so much around cleaning up messy data that he or she is more of a 'data janitor' than a data scientist or data analyst. What is crucial to understanding Big Data is its messy, noisy nature, and the amount of work that goes into producing an accurate dataset before analysis can even begin.

2.3 Other Data Related Phrases: Big Data conceptualization needs other data-related phrases, as discussed hereunder.

Data Center: A physical or virtual infrastructure used by enterprises to house computer, server and networking systems and components for the company's information and communication technology (ICT) needs.

Data Integrity: Refers to the validity of data. Data integrity can be compromised in several ways, such as human data entry errors; errors that occur during data transmission from one source to another; software bugs or viruses; hardware malfunctions, such as disk crashes; and natural disasters, such as fires and floods.

Data Miner: A software application that monitors and/or analyzes the activities of a computer, and subsequently its user, for the purpose of collecting information. The two most common forms of data miners are data mining programs that an organization uses to analyze its own data to look for significant patterns, and spyware programs that are uploaded to a user's computer to monitor the user's activity and send the data back to the organization.

Data Mining: A class of database applications that look for hidden patterns in a group
of data that can be used to predict future behavior. For example, data mining software can help retail companies find customers with common interests. The phrase data mining is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation but discovers previously unknown relationships among the data. Data mining is popular in any field that requires analysis and analytics.

Database: A database is basically a collection of information organized in such a way that a computer program can quickly select desired pieces of data. One can think of a traditional database as an electronic filing system, organized by fields, records, and files. A field is a single piece of information; a record is one complete set of fields; and a file is a collection of records. For example, a student register book is analogous to a file. It contains a list of records, each of which consists of three fields: name, address, and registration number. An alternative concept in database design is known as Hypertext. In a Hypertext database, any object, whether it be a piece of text, a picture, or a film, can be linked to any other object. Hypertext databases are particularly useful for organizing large amounts of disparate information, but they are not designed for numerical analysis. To access information from a database, we need a database management system (DBMS), which is a collection of programs that enables you to enter, organize, and select data in a database.

Raw Data: Information that has been collected but not formatted or analyzed. Raw data is often collected in a database, where it can be analyzed and made useful. Raw data can be structured or unstructured, as discussed under the characteristics of Big Data.

2.4 Big Data Analytics: The process of collecting, organizing and analyzing large sets of data to discover obvious and non-obvious patterns and other useful information is known as data analytics. Data analytics is the science of processing raw data to produce information. Data analytics finds trends and metrics in the information produced for better decision making. Any type of information can be analyzed to make better or improved decisions with the help of data analytics (Investopedia, 2019).

Data analytics includes data analysis and follows several steps. First, the data requirements need to be determined. For this it is necessary to find how data is grouped. For example, data may be grouped by age, gender, income, etc., or it may be separated by category or by ranking order. The second step is to find the source of data (Arora and Malik, 2015). Data can be found through various sources, such as computers, media, online sources and personnel. Data are collected through those sources and organized in the required order. The third step is to clean the data for analysis. Errors such as duplicated, missing and incomplete data are identified and corrected at this stage. Finally, the data, now in the right form, is analyzed. Data analytics is an automated or mechanical process in this information era. So, data analytics can be defined as the way of identifying valuable information from data that is varied in nature and varying in scale. The broader purpose of integrating data analytics is to create efficiency in business operations and to perform better.

The terms data analysis, data analytics and data science sound similar, but they differ from each other. Data analysis focuses on process and function. Data analytics focuses on information and reporting (Lu, 2019). Data science focuses on cleaning the data for further investigation (Ahmed, 2017). Big Data analytics starts with the question "what happened?". This is the process of finding historical patterns in the data. In this sense, data analytics focuses on summarizing data into meaningful insights based on historical trends, without predicting decisions; Return on Investment (ROI) is an example. On the other hand, data analytics also focuses on the question "what if?". This part of data analytics turns data into valuable predictions that identify future trends; for example, future trends in consumer buying behavior can be found through data analytics. Data analytics is used to make a function or a process more efficient through better decision making, by accumulating answers to the questions what (present), why, what (future) and what (course of action). So, big data analytics can be defined as the process of converting raw data into meaningful information with the help of machine learning and algorithms, by asking "what happened?" or "what if?" to find historical or advanced trends, or what course of action to prescribe for better decision making and performance.

The terms data analysis and data analytics have some differences, yet people other than data scientists and data analysts use them interchangeably. Data analysis is breaking data into small pieces to understand its details. This analysis task is done on past data. That is why it is said that data analysis describes the past, i.e., what happened and how it happened.
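The idea of data analysis as breaking data into small pieces to describe what happened can be shown with a minimal Python sketch. The sales records and the helper `summarize_by` are hypothetical, chosen only for illustration: the sketch groups past records by one field and reports a count, total and average per group, which is the kind of backward-looking summary that the tools named in this paper (Excel pivot tables, SPSS, R, etc.) also produce.

```python
from collections import defaultdict

# Hypothetical past sales records: (region, month, units_sold).
records = [
    ("North", "Jan", 120),
    ("North", "Feb", 135),
    ("South", "Jan", 80),
    ("South", "Feb", 70),
    ("East", "Jan", 45),
    ("East", "Feb", 55),
]


def summarize_by(rows, key_index, value_index):
    """Split rows into groups on one field and describe each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {
        key: {
            "count": len(vals),
            "total": sum(vals),
            "average": sum(vals) / len(vals),
        }
        for key, vals in groups.items()
    }


# Two "small pieces" views of the same past data.
by_region = summarize_by(records, key_index=0, value_index=2)
by_month = summarize_by(records, key_index=1, value_index=2)

if __name__ == "__main__":
    for region, stats in by_region.items():
        print(region, stats)
```

Each summary only reports what has already happened (for instance, the North region sold 255 units in total); it makes no forecast, which is the line this section draws between data analysis and data analytics.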


Such analysis lets us describe or visualize existing patterns available and seen in the data. A data analyst, or any other party involved, needs little or no wisdom to understand the knowledge, impact and patterns therein. Data analysis tools are Microsoft Excel, Statistical Package for the Social Sciences (SPSS), data envelopment analysis (DEA), interactive data extraction and analysis (IDEA), Stata, R, etc.

Data Analytics, on the other hand, refers to the use of the analytical capabilities of people and software to go forward from the existing data. This analytics task uses past data to describe the future. That is why it is said that data analytics describes the future, i.e., what is going to happen? How will it happen? Through analytics we can describe or visualize non-obvious and not-yet-existing patterns latent in the data set. A data analyst, or any other party involved, needs an analytical blend of knowledge and wisdom to understand and develop ideas, products, and non-obvious and not-yet-existing impacts and patterns therein. Data analytics tools are Microsoft Excel (higher-level applications like Solver), Statistical Package for the Social Sciences (SPSS), interactive data extraction and analysis (IDEA), Stata, R, Simul8, iThink, Arena, Vensim, etc.

2.5 Types of Big Data Analytics:
There are four different types of big data analytics: descriptive, diagnostic, predictive and prescriptive. Big data analytics is a broad field, so each type of analytics has individual goals and objectives in business applications (Northeastern.edu, 2019), as discussed below.

Figure 2.5 Types of Data Analytics

2.5.1 Descriptive Data Analytics:
Descriptive data analysis deals with revealing "what happened?". It turns large data sets into a digestible and coherent form. This type of analysis is more data analysis than data analytics. It requires collecting the relevant past data, processing it and presenting it in visual form. It represents what happened in the past. This is the most basic type of analysis and provides the backbone for other data analytics. It answers questions like 'What is going on?' and 'How is it going on?' Examples of descriptive analysis are key performance indicators (KPIs), which track success and failure; Return on Investment (ROI), which measures the return on an investment after a given period; and metrics on specialized indicators, which measure the performance of industries. Descriptive data analysis tools are Microsoft Excel, Statistical Package for the Social Sciences (SPSS), data envelopment analysis (DEA), interactive data extraction and analysis (IDEA), Stata, R, etc.

2.5.2 Diagnostic Analytics:
Diagnostic analytics deals with the question "why did it happen?". It provides deeper analysis than descriptive analytics and identifies the cause and reaction of a situation. This type of data analytics takes the information from descriptive analytics and investigates it further to understand why a situation got better or worse. Example: selecting the best candidate based on quality, competence and tenure. Diagnostic analytics requires three steps to conduct any study. The first is anomaly detection, which means finding any interruption or unexpected change in an activity, for example an unexpected rise in the demand for salt. The second step is the investigation of the anomaly: the process of finding how the interruption or unexpected change happened. Data similar to the anomaly is identified at this stage, along with the sources of the data and the patterns in those data (Grover and Kar, 2017). For example, increased onion prices and a shortage of onions in the warehouse may be the cause of the unexpected demand for salt in the marketplace. The third and final step is the determination of causal relationships. In this stage, the relationships and trends behind the anomalies are identified. To identify the cause and reaction, statistical analysis such as probability analysis can be applied.
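The three diagnostic steps described above can be sketched in Python under stated assumptions. The daily figures for salt demand, onion price and rainfall are invented, and the helper names `detect_anomalies`, `investigate` and `pearson` are hypothetical. Step 1 flags days whose z-score exceeds an assumed threshold of 2.0 (with only eight observations, a population z-score can never exceed the square root of 7, about 2.65, so the common cut-off of 3 would never fire); step 2 keeps the candidate drivers that are also anomalous on the same day; step 3 is approximated by a Pearson correlation, which only suggests, and does not prove, a causal link.

```python
from statistics import mean, pstdev

# Hypothetical daily observations over eight days (invented for illustration).
salt_demand = [100, 102, 98, 101, 99, 100, 160, 103]   # units of salt sold
candidates = {
    "onion_price": [40, 41, 39, 40, 40, 41, 80, 42],    # price per kg
    "rainfall_mm": [5, 3, 6, 4, 5, 4, 5, 6],            # daily rainfall
}


def detect_anomalies(values, threshold=2.0):
    """Step 1 (anomaly detection): indices whose |z-score| exceeds threshold.

    The threshold of 2.0 is an assumption suited to this short series.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def investigate(target, drivers, threshold=2.0):
    """Step 2 (investigation): drivers that are also anomalous on the
    same days as the target series."""
    target_days = set(detect_anomalies(target, threshold))
    return [
        name
        for name, series in drivers.items()
        if target_days & set(detect_anomalies(series, threshold))
    ]


def pearson(xs, ys):
    """Step 3 (relationship): Pearson correlation of two equal-length series.

    A high value suggests, but does not prove, a causal relationship.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    print("Anomalous days:", detect_anomalies(salt_demand))
    for name in investigate(salt_demand, candidates):
        r = pearson(salt_demand, candidates[name])
        print(f"{name}: coincides with the anomaly, correlation r = {r:.3f}")
```

On this invented data, day 6 is flagged for salt demand, onion price is the only candidate that is also anomalous that day, and the two series are strongly correlated. A real study would confirm such a link with the regression or time series tools named in this section rather than rely on correlation alone.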


Time series analysis or regression analysis can also be applied to establish such relationships. Diagnostic analytics tools are Microsoft Excel, Statistical Package for the Social Sciences (SPSS), data envelopment analysis (DEA), interactive data extraction and analysis (IDEA), Stata, R, etc.

2.5.3 Predictive Analytics:
Predictive analytics deals with the question "what will happen?". It identifies future trends and possible outcomes of a situation. This type of analytics needs historical data to forecast future occurrences. Neural networks, decision trees, regression analysis, etc. are used for predictive analysis (Amado, 2018). Examples include predicting results, the probabilities of certain scenarios, possible outcomes of diseases, market trends in the stock exchange and future trends in consumer behavior, and determining the marketing mix. Predictive analytics tools are Microsoft Excel (higher-level features like regression, forecasting and Solver), Statistical Package for the Social Sciences (SPSS), interactive data extraction and analysis (IDEA), Stata, R, iThink, Arena, Vensim, etc.

2.5.4 Prescriptive Analytics:
Prescriptive analytics deals with the question "what should be done?". It identifies possible solutions to the problems identified in predictive analysis (Lepenioti et al., 2019). This type of analytics needs data from predictive analysis to provide possible courses of action for a given scenario. For example: What product price maximizes profit? How can an organization choose projects that maximize corporate objectives subject to limited resources? What minimizes operating costs? Where should warehouses be located to minimize the distance that shipments must travel? How should workers be assigned to jobs to make both the bosses and the employees happy? Prescriptive analytics tools are Microsoft Excel (higher-level features like regression, forecasting and Solver), Statistical Package for the Social Sciences (SPSS), interactive data extraction and analysis (IDEA), Stata, R, iThink, Arena, Vensim, etc.

2.5.5 Other types of Big Data Analytics:
Other types of analytics are defined by the objectives and the types of data analyzed to get specific insights into the business. These include text analytics, sentiment analytics and web analytics.

Text Analytics: Text analytics is the process of converting unstructured text into meaningful, structured data. It uses linguistic, statistical and machine learning techniques. Text analytics is used to measure customer opinion, product reviews, feedback and entity modeling to enhance decision making. It requires tagging, clustering, pattern recognition, visualization, etc. to extract meaningful data from unstructured text (Clarabridge, 2019).

Sentiment Analytics: Sentiment analytics is the process of determining whether a piece of writing is positive, negative or neutral. A sentiment analysis system for text combines natural language processing and machine learning techniques to assign weighted sentiment scores to the entities, topics, themes and categories within a sentence or phrase. It helps data analysts within large enterprises gauge public opinion, conduct nuanced market research, monitor brand and product reputation, and understand customer experiences (Towardsdatascience, 2019).

Web Analytics: Web analytics is the process of analyzing the behavior of visitors to a Web site (Sterne, 2002). The use of Web analytics is said to enable a business to attract more visitors, retain or attract new customers for goods or services, or increase the dollar volume each customer spends. It addresses questions about the navigation path and intentions of visitors, the traffic on the site, the hits generated, and the observations and recommendations made, as shown in Figure 2.6.

Figure 2.6 Web Analytics (questions shown: What are they viewing? How much time are customers spending on the website? Did they find what they wanted? Where are they from?)

2.6 Skills Set of a Data Analyst:
A data analyst is a person who uses data analysis or analytics software to convert raw data into meaningful insight so that effective decisions can be made (Dataquest, 2019). Data analysts combine a specific set of skills, as stated by several scholars. Dataquest (2019) identified the skill set as (i) Computing
Skill: use advanced computerized models to extract the data needed; (ii) Editing Skill: remove corrupted data; (iii) Validation Skill: perform initial analysis to assess the quality of the data; (iv) Analysis Skill: perform further analysis to determine the meaning of the data; (v) Justification Skill: perform final analysis to provide additional data screening; and (vi) Reporting Skill: prepare reports based on the analysis and present them to management. Brink and Stoel (2019) also prescribed the skills needed to be a data analyst as high-level mathematical ability; the ability to analyze, model and interpret data; problem-solving skills; a methodical and logical approach; the ability to plan work and meet deadlines; accuracy and attention to detail; and interpersonal skills.

The skill set for big data analytics is multidisciplinary. It includes programming skills. While a traditional data analyst might be able to get away without being a full-fledged programmer, a big data analyst needs to be very comfortable with coding. One of the main reasons for this requirement is that big data is still in an evolution phase: not many standard processes are set around the large, complex datasets a big data analyst must deal with, and a lot of customization is required on a daily basis to deal with unstructured data. Data scientists need specific computer languages such as R, Python, Java, C++, Ruby, SQL, Hive, SAS, SPSS, MATLAB, Weka, Julia and Scala. At a minimum, one needs to know R, Python and Java, and while working one may end up using various other tools.

Big data analytics also requires data warehousing knowledge. Experience with relational and non-relational database systems is a must. Examples of relational databases include MySQL, Oracle and DB2. Examples of non-relational (NoSQL) databases include HBase, MongoDB, CouchDB and Cassandra. Related big data storage platforms include HDFS (the Hadoop Distributed File System) and Teradata (a relational data warehouse).

It also needs computational skills, i.e., a good understanding of and familiarity with computational frameworks such as Apache Spark, Apache Storm, Apache Samza, Apache Flink and the classic MapReduce and Hadoop. These technologies support big data processing, much of which can be done on streaming data. Quantitative aptitude and statistical skills are needed as well. While the processing of big data requires great use of technology, fundamental to any analysis of data is good knowledge of statistics and linear algebra. Statistics is a basic building block of data science, and an understanding of core concepts like summary statistics, probability distributions, random variables and the hypothesis testing framework is important for a data scientist of any genre.

Business knowledge is important for conducting good analytics. To keep the analysis focused, and to validate, sort, relate and evaluate the data, the most critical skill of a big data scientist is good knowledge of the domain one is working in. In fact, the reason big data analysts are so much in demand is that it's very rare to find people who have a thorough understanding of technical aspects, statistics and business. There are analysts who are good at business and statistics but not at programming, and there are expert programmers without the know-how to put programs in the context of the business goal. Lastly, a good hold on machine learning is highly beneficial, as it helps in managing complex data structures and learning patterns that are too difficult to handle using traditional data analytics.

2.7 Software needed for Data Analytics:
Data analytics is done with the help of machine learning and algorithms, so software is needed to analyze data (Zhang and Xie, 2019). The software needed in data analytics is of various types. Microsoft Excel is used to create forms, Power Pivot models, pivot tables, dashboards, etc. for analyzing data and reporting it. It has many features, including reporting, describing, diagnosing, analyzing, forecasting, making queries, reengineering and restructuring, filtering, formatting, looking up, visualizing and modeling. Business intelligence tools are used for data cleaning, processing, modeling and visualization; examples are Tableau, Power BI and FineReport. Features of these tools include Ranking Reports, What-If Analysis, Executive Dashboards, Interactive Reports, Geospatial Mapping, Operational Reports, Pivot Tables and Ad-Hoc Reports. R, Python, Stata, SPSS Amos, etc. are used to code anything that one wants to
make a decision (Rprojectorg, 2019). The features of R, for example, include data handling and storage facilities, calculations on arrays and matrices, a collection of big data tools for data analysis, and graphical facilities for data analysis. Such tools help analysts discover insights and solve problems faster by analyzing structured and unstructured data; they offer an intuitive interface that everyone can learn, cloud and hybrid deployment options, and a quick way to choose the best-performing algorithm based on model performance. Simulation software is used to model realistic scenarios so that predicted results can be viewed visually; examples are iThink, Arena, Vensim and Simul8. Features of simulation software generally include animation, statistical features, documentation, output reports and plots, visualization, forecasting and scenario analysis. Other tools include Sisense, Looker, Periscope Data, Zoho Analytics, Yellowfin, Domo, Qlik Sense, GoodData, Birst, IBM Analytics, IBM Cognos, IBM Watson, MATLAB, Minitab, Google Analytics, Stata, Apache Hadoop, Apache Spark and the SAP business intelligence platform (Financeonline, 2019).

2.8 Big Data Importance:
The amount of data matters little if a company does not use it. Every company uses data in its own way, and the more efficiently a company uses its data, the more potential it has to grow. The importance of big data therefore lies in the usefulness of the data, and it can be seen from many angles.

Perhaps the most important factor is cost savings. Big data analytics tools such as Hadoop and cloud-based analytics can bring cost advantages to a business when large amounts of data must be stored and processed in parallel, multiple reports generated, multiple vendors supported and multiple stakeholders served. Time reduction is another benefit that makes big data analytics important. The high speed of tools like Hadoop and in-memory analytics makes it easy to identify new sources of data, which helps businesses analyze data immediately and make quick decisions based on what they learn. There are numerous opportunities for cost and time reduction using big data: big data technologies such as Hadoop clusters are emerging as a significantly lower-cost option than traditional databases, and big data can support real-time decisions about promoting offers and services to customers based on their current locations (Davenport & Dyché, 2013).

Big data also helps a company understand market conditions. By analyzing big data, one can get a better understanding of current market conditions. For example, by analyzing customers' purchasing behavior, a company can find out which products sell the most and produce products according to this trend. In this way, big data analytics keeps a company ahead of its competitors. Control of online reputation through sentiment analysis is another benefit of big data analytics: it gives feedback on who is saying what about the company. If one wants to monitor and improve the online presence of a business, big data tools can help with all of this.

Big data is also significant in boosting customer acquisition and retention. The customer is the most important asset any business depends on, and no business can claim success without first establishing a solid customer base. However, even with a customer base, a business cannot afford to disregard the high competition it faces. If a business is slow to learn what customers are looking for, it can very easily end up offering poor-quality products. In the end, clients will be lost, which has an adverse overall effect on business success. The use of big data allows businesses to observe various customer-related patterns and trends, and observing customer behavior is important to trigger loyalty.

2.9 Applications of Big Data Analytics:
Big data analytics is used in almost all industries, organizations and nations. It is being applied significantly in manufacturing companies. The exact location of a product can be tracked with big data analytics. Customer needs are identified by focusing on the target market, and big data helps to predict those needs. This forecasting can be done when companies refer to their supply chain, and it can help an organization improve its profitability and workforce (Zhang, 2017).

Figure 2.7 Big Data Analytics in Manufacturing Co.

Manufacturing organizations need to maintain their machinery, and the machinery needs to be upgraded to keep running efficiently. Sensors on machines can collect information easily, and the data gathered from these devices help organizations determine when
and how much maintenance a specific machine requires. Big data analytics can help manufacturers keep track of their machines by continually analyzing them and focusing on how to improve the efficiency of the devices. Everyday activities need to be monitored in any organization. Big data collects information from every corner of the organization and thus provides valuable insights into day-to-day business operations. Sources of such data include operational machines, databases logging the number of units produced, and employee records. This information can help companies decide on changes that can be profitable for the organization.

Health care organizations such as hospitals and clinics apply big data analytics successfully in their operations. Big data analytics is applied for diagnostic purposes, through data mining and analysis to identify the causes of illness, and predictive analytics on genetic, lifestyle and social data is used to prevent disease. Leveraging aggregate data to drive hyper-personalized care, called precision medicine, is another application of big data analytics. Data-driven medical and pharmacological research to cure disease and discover new treatments and medicines is a very important outcome of big data analytics in the health care industry. Big data can also be harnessed to spot medication errors and flag potential adverse reactions. Health data analytics can identify the value that drives better outcomes and long-term savings. It also helps identify disease trends and shape health strategy based on demographic, geographic and socioeconomic factors, an approach called population health (Catalyst, 2019).

Big data analytics is used in the finance and banking industry for many practical purposes. Such analytics allows an institution to develop meaningful customer insight. Banks and financial service companies follow their customers' behavior patterns in almost all of their banking activities, and big data provides accurate, real-time information about customers. Customer service in finance, banking and insurance organizations has become a successful instance of big data analytics. Customers' historical data, current web data and similar sources are used through predictive analytics to identify customer issues proactively and resolve them even before the customer complains; analyzing customers' geographical data, for example, helps banks optimize ATM locations. Banks gather customer-experience data through big data analytics to customize their websites in real time, which gives customers a pleasant experience. Banks can also use analytics to send real-time messages or other communications about an account, based on predictive and prescriptive analysis using what-if scenarios. With big data analytics, banks can be proactive in enhancing customer service. Big data analytics is also used for boosting sales revenue: it can accurately assess customers' needs so that banks can promote the right types of solutions. For example, customers who browse house-finance pages on a bank's website are most likely in need of a housing loan, which can be identified by big data analytics. Fraud detection is a major contribution of big data analytics: it can detect fraud in real time and help prevent it effectively. Data from third parties and banking networks hold valuable information about customer interactions. Study and analysis of big data can help detect misuse of credit and debit cards and money laundering, and can support credit risk management, business transparency, tracking of changes in customer data and risk mitigation.

Governments are using big data analytics for many purposes, and big data analytics in government has local, national and international impact. Governments need to deal with a lot of information to take decisions for millions of people. They need to keep track of various records and databases about citizens, for example births, deaths and taxpayer identification numbers (TINs). Proper study and analysis of these data help governments in endless ways, which is possible with big data.

Big data analytics is also being applied effectively in the insurance industry. Insurance is important not only for individuals but also for businesses, because it supports people during times of uncertainty. The data the industry collects come in varying formats and change at tremendous speed, and insurers use big data analytics to collect and process this information. Because big data involves gathering data from disparate sources, it creates a crucial use case for the insurance industry. For example, when a customer needs car insurance, companies can use big data to calculate the driving safety level in the buyer's vicinity and to review the buyer's past driving record, and on this basis calculate the cost of the car insurance effectively. Gaining customer insight is another predominant application of big data in insurance companies.
Determining customer experience and making customers the center of a company's attention is of prime importance to organizations, and both can be supported with big data. Insurance fraud is common, and big data is highly effective for detecting and reducing it. Threat mapping is also done through big data analytics: when an insurance agency sells a policy, it wants to be aware of all the ways things could go wrong for the customer and lead them to file a claim. Those possibilities can be identified with big data.

Application of big data in social media is perhaps the best use in the history of big data analytics. Social media is currently considered the largest generator of data. Statistics show that around 500+ terabytes of new data are generated into the databases of social media every day, particularly in the case of Facebook. The data generated mainly consist of videos, photos, message exchanges, etc. A single activity on any social media site generates a lot of data, which is stored and processed whenever required. Since the stored data run to terabytes, processing them with legacy systems would take a long time; big data is a solution to this problem.

Figure 2.8 Social Media Analytics

Arguably the world's most popular social media network, with more than two billion monthly active users worldwide, Facebook stores enormous amounts of user data, making it a massive data wonderland. It was estimated that there would be more than 169 million Facebook users in the United States alone by 2018. Facebook is the fifth most valuable public company in the world, with a market value of approximately $321 billion. Every day, we feed Facebook's data beast with mounds of information. Every 60 seconds, 136,000 photos are uploaded, 510,000 comments are posted and 293,000 status updates are posted, all of which creates big data. At first, this information may not seem to mean very much. But with data like this, Facebook knows who our friends are, what we look like, where we are, what we are doing, our likes, our dislikes, and much more. Some researchers even say Facebook has enough data to know us better than our therapists! The volume, variety and velocity of Facebook data can be seen in the linked video: https://www.youtube.com/watch?v=_r97qdyQtIk&t=277s.

Big data in the education industry is another real instance around the world. Big data in education has a vital impact on students, school systems and curricula. By interpreting big data, people can ensure students' growth, identify at-risk students, and achieve an improved system for evaluation and for assisting administrators and teachers. For example, the education sector holds a lot of information about curricula, students and faculty, which is analyzed to get insights that can enhance the operational effectiveness of the educational organization. Collecting and analyzing student information such as attendance, assignments, test scores and grades takes up a lot of data, so big data provides a progressive framework in which these data can be stored and analyzed, making them easier for institutions to work with.

Figure 2.9 Big data in e-commerce

Big data in e-commerce is another instance of data analytics today. Maintaining customer relationships is the most important concern in the e-commerce industry. E-commerce websites use different marketing ideas to sell their merchandise to customers, manage transactions and implement better tactics, using innovative ideas with big data to improve their business. Amazon, Flipkart, Alibaba etc. are huge e-commerce websites dealing with a lot of traffic daily. When a pre-announced sale runs on these sites, traffic grows exponentially and can crash the website, so they use big data to handle this kind of traffic and data. Big data helps in organizing and analyzing the data for further use.

Another real-life example of big data and its analytics can be found in stock exchanges around the world. The New York Stock Exchange (NYSE) generates about one terabyte of new trade data every single day. Imagine how much data there would be to process in a whole year if one terabyte is generated every day: this is what big data is used for.
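The stock-exchange example above invites a quick calculation. The short Python sketch below scales the rates quoted in this section (about one terabyte of NYSE trade data per day, and Facebook's per-minute counts) to yearly and daily totals. It assumes decimal units (1 PB = 1,000 TB) and a 365-day year; the constant and function names are illustrative only, and the rates are simply the figures quoted in the text, not independently verified.

```python
# Back-of-the-envelope scaling of the data rates quoted in this section.
# Assumptions (illustrative only): decimal units (1 PB = 1,000 TB) and a
# 365-day year. The rates are the figures quoted in the text.

TB_PER_DAY_NYSE = 1          # "about one terabyte of new trade data every single day"
DAYS_PER_YEAR = 365
MINUTES_PER_DAY = 24 * 60

FACEBOOK_PER_MINUTE = {      # "every 60 seconds" figures quoted for Facebook
    "photos uploaded": 136_000,
    "comments posted": 510_000,
    "status updates posted": 293_000,
}


def yearly_volume_tb(tb_per_day: float, days: int = DAYS_PER_YEAR) -> float:
    """Total terabytes accumulated over `days` at a constant daily rate."""
    return tb_per_day * days


def per_day(per_minute: int) -> int:
    """Scale a per-minute count to a per-day count."""
    return per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    nyse_tb = yearly_volume_tb(TB_PER_DAY_NYSE)
    print(f"NYSE: {nyse_tb} TB per year (= {nyse_tb / 1000} PB)")
    for item, rate in FACEBOOK_PER_MINUTE.items():
        print(f"Facebook {item}: {per_day(rate):,} per day")
```

Running it shows that one terabyte per day accumulates to 365 TB (0.365 PB) in a year, and that the quoted per-minute photo rate for Facebook corresponds to 195,840,000 photos per day.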


2.10 Big data growth trend:
The amount of data created each year is growing faster than ever before. By 2020, every human on the planet will be creating 1.7 megabytes of information each second, and the accumulated world data will grow to 44 zettabytes (44 trillion gigabytes), up from about 4.4 zettabytes in 2013. The revenues generated by big data analytics (BDA) worldwide were $42 billion in 2018; by 2027 they are projected to reach $103 billion, growing at about 10.5% a year. Hadoop is the most popular big data processing software. Its market is expanding fast and is anticipated to hold 53.7% for the period 2015 to 2022. The Chinese big data market is one of the fastest growing worldwide, at 31.72%. By 2020, its revenue is projected to reach ¥57.8 billion (about $9 billion), up from only ¥8.4 billion (about $1.2 billion) in 2014.

Figure 2.10 Big data Analytics in Education

Statistics show that big data adoption can increase retail sales by 3% to 4%. As more and more companies harness the power of BDA, the need for tools to process the information rises as well: big data software is projected to grow at 12.6%, reaching $46 billion in 2027. By 2020, the Internet of Things (IoT) is projected to generate over $300 billion annually, with the market growing at 28.5% (Sources: Forbes, Disruptor Daily, Towards Data Science, TechTarget, ExplainComputers, Cukie).
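The BDA revenue figures above can be cross-checked with the standard compound annual growth rate (CAGR) relation, V(2027) = V(2018) × (1 + r)^9. The Python sketch below applies it to the three quoted numbers only ($42 billion in 2018, $103 billion in 2027, about 10.5% a year). The function names are illustrative, and the check confirms only that the quoted figures are consistent with one another, not that the forecasts are accurate.

```python
# Consistency check of the quoted BDA revenue figures using the standard
# compound-annual-growth-rate (CAGR) formulas. Only the three numbers
# ($42B in 2018, $103B in 2027, ~10.5% per year) come from the text.


def project(value: float, rate: float, years: int) -> float:
    """Value after `years` of compound growth at `rate` per year."""
    return value * (1 + rate) ** years


def cagr(start: float, end: float, years: int) -> float:
    """Constant annual rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    years = 2027 - 2018  # 9 compounding periods
    print(f"Projected 2027 revenue: ${project(42, 0.105, years):.1f} billion")
    print(f"Implied CAGR 2018-2027: {cagr(42, 103, years):.1%}")
```

Compounding $42 billion at 10.5% for nine years gives about $103.2 billion, and the rate implied by going from $42 billion to $103 billion is about 10.5%, so the quoted figures agree with one another.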

Figure 2.11 Estimated worldwide big data market revenues for business analytics in the next 20 years (Source: Economic perspective analysis of protecting big data security and privacy, 2019).

3. Big Data and Decision Making

Data come from various sources and may be structured, unstructured or semi-structured. Big data taps these data to support strategic, tactical and operational decisions. Big data helps in decision making in areas such as:
Understanding Customer Journeys: Website clicks, online transactions and call center logs are the sources of data that big data analytics uses to understand customers' journeys.
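To make the customer-journey idea concrete, the Python sketch below merges events from the three sources named above (website clicks, online transactions and call center logs) into one time-ordered journey per customer. It is a hypothetical illustration: the `Event` record, its field names and the sample data are assumptions, not part of any system described in the text, and a real pipeline would read these events from its own data stores at far larger scale.

```python
# Hypothetical sketch: merge events from three sources (website clicks,
# online transactions, call center logs) into one time-ordered journey
# per customer. Field names and sample records are invented for illustration.
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    customer_id: str
    timestamp: datetime
    source: str      # e.g. "click", "transaction" or "call_center"
    detail: str


def build_journeys(*sources: list[Event]) -> dict[str, list[Event]]:
    """Group events from all sources by customer and sort each group by time."""
    journeys: dict[str, list[Event]] = defaultdict(list)
    for events in sources:
        for event in events:
            journeys[event.customer_id].append(event)
    for events in journeys.values():
        events.sort(key=lambda e: e.timestamp)
    return dict(journeys)


if __name__ == "__main__":
    clicks = [Event("c1", datetime(2019, 5, 1, 10, 0), "click", "viewed product page")]
    orders = [Event("c1", datetime(2019, 5, 1, 10, 7), "transaction", "bought 1 item")]
    calls = [Event("c1", datetime(2019, 5, 3, 9, 30), "call_center", "asked about delivery")]
    # Sources are passed in a deliberately mixed order; sorting fixes the sequence.
    for customer, journey in build_journeys(calls, clicks, orders).items():
        print(customer, " -> ".join(f"{e.source}: {e.detail}" for e in journey))
```

Grouping first and sorting each customer's events afterwards makes the resulting journey independent of the order in which the sources are read.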

Figure 2.12 Big Data & Decision-Making Process (Develop Data → Data Mining → Data Analysis → Business Decisions).


