Data Analytics Unit-1
Koteswari
P. Preetha
Evolution of Data Analytics
Data Analytics Overview
Types of Data Analytics: Descriptive Analytics, Diagnostic Analytics,
Predictive Analytics, Prescriptive Analytics
Importance and Benefits of Data Analytics.
Different Applications of Analytics in Business
Text Analytics and Web Analytics
Skills for Business Analytics.
Data: Data is a set of values of qualitative or quantitative variables. It is information
in raw or unorganized form, such as facts, figures, characters, or symbols. Data can
be numeric, like a record of daily weather or daily sales, or alphanumeric,
such as the names of employees and customers.
Information: Information is meaningful, organized data; it comes from analyzing
data.
Database: A database is a modeled collection of data that is accessible in many
ways. A data model can be designed to integrate the operational data of the
organization. The data model abstracts the key entities involved in an action and their
relationships. Most databases today follow the relational data model and its variants.
Data Warehouse:
A data warehouse is an organized store of data from all over the
organization, specially designed to help make management decisions. Data
can be extracted from operational databases to answer a particular set of
queries. This data, combined with other data, can be rolled up to a
consistent granularity and uploaded to a separate data store called the data
warehouse. The data warehouse is therefore a simpler version of the
operational database, built only to address reporting and
decision-making needs.
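The roll-up step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (timestamp, store, amount), the sample values, and the function name roll_up_daily are hypothetical and do not come from any particular warehouse product.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical operational records: (timestamp, store, amount)
transactions = [
    ("2024-01-01 09:15", "store_a", 120.0),
    ("2024-01-01 17:40", "store_a", 80.0),
    ("2024-01-01 11:05", "store_b", 50.0),
    ("2024-01-02 10:30", "store_a", 200.0),
]

def roll_up_daily(rows):
    """Aggregate transaction-level rows to one total per (date, store)."""
    totals = defaultdict(float)
    for timestamp, store, amount in rows:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").date()
        totals[(day.isoformat(), store)] += amount
    return dict(totals)

# Daily granularity is the "consistent granularity" assumed here.
daily_sales = roll_up_daily(transactions)
for key in sorted(daily_sales):
    print(key, daily_sales[key])
```

Each transaction is reduced to the same grain (one row per day and store), which is the form a reporting query would typically read from the warehouse.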
Data Mining:
Data Mining is the art and science of discovering useful, innovative patterns
from data. There is a wide variety of patterns that can be found in the data.
Why Data Analytics?
Organizations today handle and store billions of rows of data, possibly with
millions of combinations. Data Analytics has been hailed as a ‘game
changer’ because businesses can transform raw data into something
actionable that improves their profits. Some of the first applications of
analytics were found in the fields of marketing, sales, and customer
relationship management.
Once firms had analyzed their data, they found a plethora of information,
ranging from insights into customers’ needs and consumer behavior to
an understanding of the demand for products and services.
Evolution of Analytics:
Analytics era 1.0:
The first era is also known as the era of ‘Business Intelligence’. Analytics
1.0 was a time of real progress in gaining an objective, deep
understanding of important business phenomena and giving managers the
fact-based comprehension to go beyond intuition when making decisions.
For the first time, data about production processes, sales, customer
interactions, and more were recorded, aggregated, and analyzed. Data
sets were small enough in volume and static enough in velocity to be
segregated in warehouses for analysis.
However, readying a data set for inclusion in a warehouse was difficult.
Analysts spent much of their time preparing data for analysis.
Analytics era 2.0:
Also known as the era of ‘Big Data’. The Analytics 1.0 era
lasted until the mid-2000s. As analytics entered the 2.0 phase, the need for
powerful new tools, and the opportunity to profit by providing them, quickly
became apparent. Companies rushed to build new capabilities and acquire new
customers.
Innovative technologies of many kinds had to be created, acquired, and
mastered in this era.
Big data could not fit or be analyzed fast enough on a single server, so it was
processed with Hadoop, an open source software framework for fast batch data
processing across parallel servers.
To deal with relatively unstructured data, companies turned to a new class of
databases known as NoSQL.
Much information was stored and analyzed in public or private cloud-
computing environments.
Machine-learning methods were used to rapidly generate models from
the fast-moving data.
The competencies and skills required for Analytics 2.0 were thus quite
different from those needed for 1.0.
The next-generation quantitative analysts were called data scientists, and
they possessed both computational and analytical skills.
Analytics era 3.0:
Like the first two eras of analytics, this one brings new challenges and
opportunities, both for the companies that want to compete on analytics
and for the vendors that supply the data and tools with which to do so.
High-performing companies will embed analytics directly into decision
and operational processes, and take advantage of machine-learning and
other technologies to generate insights in the millions per second rather
than an “insight a week or month.”
The pictorial representation of the evolution of Data Analytics:
The representation shows that the concept of Data Analytics started in
the early 1980s.
In the 1980s, Data Analytics was limited to reporting.
Reporting answers the question “what happened?” in the data obtained.
In the early 1990s, Data Analytics moved into its second phase, in which
analysis came into existence.
This period focused on the question “why did it happen?”
From 2000 onwards, the emphasis shifted to monitoring, using dashboards
and scorecards.
Monitoring gives a clear idea of what is happening in the data right now.
After 2010, the focus moved to prediction based on the data and its
inputs.
The main question asked in this period is “what will happen?”
Methods from statistics, data mining, and optimization are used in this
period.
We are now in an era of more detailed data analytics that is
prescriptive in nature.
In this period we train machines to be smarter and focus on
computations that take less time and less effort.
We can therefore conclude that we are in a period dominated by AI.
What is Analytics?
Analytics is the use of tools and processes to combine and examine sets of
data to identify patterns, relationships and trends.
The goal of analytics is to answer specific questions, discover new insights,
and help organizations make better, data-driven decisions.
Phase 4: Model Building – The team develops datasets for testing, training, and
production purposes.
The team also considers whether its existing tools will suffice for running the models or
whether it needs a more robust environment for executing them.
Free or open-source tools: R and PL/R, Octave, WEKA.
1. Business Context
2. Technology
3. Data Science
Business Context:
Business analytics projects start with the business context and the ability of the
organization to ask the right questions.
Another good example of business context driving analytics is the ‘did you
forget’ feature used by the Indian online grocery store bigbasket.com
(Abraham et al., 2016). Many customers tend to forget items
they intended to buy. They may buy the forgotten items from a
store near where they live, which reduces future basket sizes
for online grocery stores such as bigbasket.com.
Alternatively, the customer may place another order for the forgotten items, but
this time the basket is likely to be small, resulting in unnecessary
logistics cost. Thus, the ability to predict the items that a customer may have
forgotten to order can have a significant impact on the profits of online grocers
such as bigbasket.com.
Another problem that online grocery customers face is the
time taken to place an order. Unlike customers of Amazon or Flipkart, online grocery
customers order several items each time; the number of items in an order may exceed
100. Searching for all the items that a customer would like to order is a
time-consuming exercise, especially when ordering on a smartphone. Thus, bigbasket.com
created a ‘smart basket’, a basket of items that a customer is likely
to buy (a recommended basket), reducing the time required to place the order.
The above examples (the ‘did you forget’ and smart basket features at bigbasket.com)
demonstrate the importance of business context in business analytics; that is, the ability to
ask the right questions is an important success criterion for analytics projects.
Technology:
To find out whether a customer has forgotten to place an order for an item, we need
data. In both cases, point-of-sale data consisting of the customer’s past
purchases has to be captured. Information Technology (IT) is used for data capture,
data storage, data preparation, data analysis, and data sharing. Today most data is
unstructured; data that is not in the form of a matrix (rows and columns) is called
unstructured data. Images, text, voice, video, and clickstreams are a few examples of
unstructured data. To analyze data, one may need to use software such as R, Python,
SAS, SPSS, or Tableau. For example, in the case of Target, technology can be used to
personalize coupons that can be sent to individual customers.
Data Science:
Data Science is the most important component of analytics. It consists of statistical and
operations research techniques, and machine learning and deep learning algorithms.
There are several techniques available for solving classification problems, such as logistic
regression, classification trees, random forests, adaptive boosting, and neural networks.
The objective of the data science component is to identify the technique that performs
best on a chosen measure of accuracy.
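The selection step can be illustrated with a small Python sketch. To keep it dependency-free, two very simple classifiers (a majority-class baseline and 1-nearest neighbor) stand in for the techniques named above; they are substitutes chosen for brevity, not the methods the text lists. The churn data set and all names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def fit_majority(X, y):
    """Baseline: always predict the most common training label."""
    majority = max(set(y), key=y.count)
    return lambda x: majority

def fit_nearest_neighbor(X, y):
    """1-nearest neighbor: predict the label of the closest training point."""
    def predict(x):
        distances = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[distances.index(min(distances))]
    return predict

# Hypothetical data: (monthly visits, average basket value) -> churned (1) or not (0)
X_train = [(1, 20), (2, 25), (8, 60), (9, 70), (3, 30), (7, 65), (2, 28)]
y_train = [1, 1, 0, 0, 1, 0, 1]
X_test = [(2, 22), (8, 68), (1, 35), (9, 55)]
y_test = [1, 0, 1, 0]

candidates = {
    "majority baseline": fit_majority,
    "1-nearest neighbor": fit_nearest_neighbor,
}

# Fit every candidate on the training set and score it on held-out data.
scores = {}
for name, fit in candidates.items():
    model = fit(X_train, y_train)
    scores[name] = accuracy(y_test, [model(x) for x in X_test])

best = max(scores, key=scores.get)
print(scores, "->", best)
```

The same loop works for any set of candidate techniques, as long as each is scored on data it was not trained on.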
What is Web Analytics?
Web analytics is the gathering, synthesizing, and analysis of website data with
the goal of improving the website user experience.
Web Analytics is the methodological study of online/offline patterns and
trends. It is a technique that you can employ to collect, measure, report, and
analyze your website data. It is normally carried out to analyze the
performance of a website and optimize its web usage.
We use web analytics to track key metrics and analyze visitors’ activity and
traffic flow.
It is a tactical approach to collect data and generate reports.
Web analytics enables a business to retain customers, attract more
visitors, and increase the amount each customer spends.
Analytics can help in the following ways:
Determine the likelihood that a given customer will repurchase a
product after purchasing it in the past.
Personalize the site to customers who visit it repeatedly.
Monitor the amount of money individual customers or specific groups of
customers spend.
Observe the geographic regions from which the most and the fewest
customers visit the site and purchase specific products.
Predict which products customers are most and least likely to buy in the
future.
Web Analytics is an ongoing process that helps attract more traffic to a site,
thereby increasing the return on investment.
The web analytics process involves the following steps:
1. Setting goals:
The first step in the web analytics process is for businesses to determine goals and the
end results they are trying to achieve. These goals can include increased sales, customer
satisfaction and brand awareness.
2. Collecting data:
The second step in web analytics is the collection and storage of data. Businesses can
collect data directly from a website or from a web analytics tool, such as Google Analytics. The
data mainly comes from Hypertext Transfer Protocol (HTTP) requests. For example, a
user’s Internet Protocol (IP) address is typically associated with many factors, including
geographic location and click-through rates.
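As an illustration of collecting data from HTTP requests, the sketch below parses web server log lines. It assumes the logs are in the widely used Common Log Format; the sample lines and the use of documentation-range IP addresses are hypothetical, and real deployments often rely on an analytics tool instead of raw logs.

```python
import re
from collections import Counter

# Pattern for the Common Log Format written by many web servers.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict of request fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Hypothetical log lines.
log_lines = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.10 - - [10/Oct/2023:13:56:01 +0000] "GET /products HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2023:14:02:11 +0000] "GET /index.html HTTP/1.1" 200 2326',
    'malformed line',
]

# Keep only lines that parsed, then summarize page views and visitors.
parsed = [r for r in map(parse_line, log_lines) if r is not None]
page_views = Counter(r["path"] for r in parsed)
unique_visitors = len({r["ip"] for r in parsed})
print(page_views.most_common(), unique_visitors)
# prints [('/index.html', 2), ('/products', 1)] 2
```

Counting distinct IP addresses is only a rough proxy for unique visitors, since several users can share one address.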
3. Processing data:
The next stage of the web analytics funnel involves businesses processing the collected
data into actionable information.
4. Identifying key performance indicators (KPIs):
In web analytics, a KPI is a quantifiable measure used to monitor and analyze user
behavior on a website. Examples include user sessions and on-site search queries.
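The user-sessions KPI can be computed from raw page-view events roughly as follows. This sketch assumes a session ends after 30 minutes of inactivity; that cutoff is a common convention but is an assumption here, not something stated in these notes. The event data and the function name count_sessions are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff

def count_sessions(events):
    """Count sessions from (visitor_id, timestamp) page-view events.

    A new session starts on a visitor's first event, or when more than
    SESSION_TIMEOUT has passed since that visitor's previous event.
    """
    last_seen = {}
    sessions = 0
    for visitor, ts in sorted(events, key=lambda e: e[1]):
        previous = last_seen.get(visitor)
        if previous is None or ts - previous > SESSION_TIMEOUT:
            sessions += 1
        last_seen[visitor] = ts
    return sessions

# Hypothetical page-view events.
events = [
    ("v1", datetime(2024, 1, 1, 9, 0)),
    ("v1", datetime(2024, 1, 1, 9, 10)),
    ("v1", datetime(2024, 1, 1, 11, 0)),
    ("v2", datetime(2024, 1, 1, 9, 5)),
]
print(count_sessions(events))  # prints 3
```

Visitor v1 contributes two sessions (the 11:00 visit comes after a long gap) and v2 contributes one.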
5. Developing a strategy:
This stage involves implementing insights to formulate strategies that align with an
organization's goals. For example, search queries conducted on-site can help an
organization develop a content strategy based on what users are searching for on its
website.
6. Experimenting and testing:
Businesses need to experiment with different strategies in order to find the one that
yields the best results.
For example, A/B testing is a simple strategy for learning how an audience
responds to different content. The process involves creating two or more versions of
content and then displaying them to different audience segments to reveal which
version performs better.
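One common way to judge which version performs better is a two-proportion z-test on conversion rates, sketched below using only the Python standard library. The notes do not prescribe a particular test, so this choice is an assumption, and the visitor and conversion counts are made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion
    rates between version A and version B (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (every visitor or no visitor converted).
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical results: 2,400 visitors saw each version.
z, p = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"rate A = {120 / 2400:.3f}, rate B = {156 / 2400:.3f}, z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the versions is unlikely to be due to chance alone; the threshold for "small" (often 0.05) should be fixed before the experiment is run.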
Text Analytics is the process of converting unstructured text data into
meaningful data for analysis. It is used to measure customer opinions, product
reviews, and feedback, and to provide search facilities, sentiment analysis, and
entity modeling in support of fact-based decision making.
Text analytics yields quantitative data obtained by analyzing
patterns across multiple samples of text. The results are presented in charts,
tables, or graphs.
Text analytics helps you determine whether there is a particular trend or
pattern in the results of analyzing thousands of pieces of feedback.
Text analysis, meanwhile, can be used to determine whether an individual
customer’s feedback is positive or negative.
Text analytics in business:
Every business strives to provide the best to its customers. To achieve this,
businesses depend on text analytics to study and understand patterns and drifts in
behavior through positive and negative feedback, buying trends,
consumer opinions, blogs, etc.
They then modify their approach to satisfy customer needs, which can make a greater
impact on the business.
By implementing text-based analytics, a business can bridge the gap and uncover
the real needs and demands of its customers.
Text analytics focuses on quantitative insights that capture ‘why’ a
particular problem arises, ‘what’ the reasons are and, once these are understood,
‘how’ a business can overcome it most effectively.
Various tools such as HANA, Python, R, and Microsoft Excel can be used to
accomplish the important tasks of text analytics discussed below.
Important Tasks in Text Analytics:
Information Extraction: It involves extracting the relevant information from
large volumes of textual data. It centres on extracting attributes and entities. This
information can be used for further analysis.
Information Retrieval: Information Retrieval (IR) refers to extracting relevant
and associated patterns based on a specific set of words or
phrases. In this text mining technique, IR systems use various
algorithms to track and monitor user behavior and find relevant information
accordingly. The Google and Yahoo search engines are two of the best-known IR systems.
Clustering: It seeks to identify intrinsic structures in text-based data and
organize them into relevant subgroups or ‘clusters’ for further analysis. A significant
challenge in the clustering process is to form meaningful clusters from the unlabelled
text-based data without any prior information about them.
Summarization: This text mining technique creates a
summary of a large volume of text in a way that preserves the meaning and
intent of the original document.
Categorization: This technique is used to classify text (a review,
paragraph, or document) into a relevant category. The text could be the
reviews provided by different users for a product, and the reviews
could be classified as positive or negative. Similarly, an email can be
classified as spam or non-spam.
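The categorization task can be sketched with a small multinomial naive Bayes classifier written in plain Python. Naive Bayes is one common choice for text categorization, not a method named in these notes, and the four training reviews are invented for illustration; a real classifier would need far more labelled text.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labelled_texts):
    """Count documents and words per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled_texts:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled reviews.
reviews = [
    ("great product, works well", "positive"),
    ("love it, excellent quality", "positive"),
    ("terrible, stopped working", "negative"),
    ("poor quality, very bad", "negative"),
]
model = train(reviews)
print(classify("excellent product", model))  # prints positive
print(classify("very bad quality", model))   # prints negative
```

The same structure applies to spam filtering: replace the labels with "spam" and "non-spam" and train on labelled emails.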
Business analytics refers to the process of extracting insights from
data to make informed decisions regarding a business question or
challenge.
Here are five skills you can develop to improve your understanding of
business analytics.
1. Data Literacy
One of the fundamental skills to build before diving into business analytics
is data literacy. At its most basic, data literacy means you’re familiar with
the language of data, including different types, sources, and analytical
tools and techniques.
Being data literate also means you’re comfortable working with data in
various ways—from evaluating it to manipulating it and gaining insights.
2. Data Collection
The first step in leveraging analytics to drive business decisions is to collect a
data sample from which conclusions can be drawn.
In some cases, a dataset already exists, and it’s up to the business analyst to
pull relevant information. For example, if you’re interested in discovering a
retail store’s most profitable products, you might start by pulling historical
sales data for transactions that took place over a specific period.
3. Statistical Analysis
Several statistical methods can be helpful when it comes to analysis,
including:
Hypothesis testing, which is a statistical means of testing an assumption.