Business analytics is an iterative process of solving a business problem: finding and analyzing the data required to obtain a solution, and interpreting the results to provide recommendations for decision making. There are four types of business analytics (descriptive, diagnostic, predictive, and prescriptive), ranging from simple reports to the most advanced optimization techniques. These are usually implemented in interrelated stages that offer various insights.

Data Steward: Develops, enforces, and maintains an organization's data governance process to ensure the availability and ethical use of high-quality data.

Data Engineer: Designs, constructs, tests, and maintains data infrastructures, including applications that extract, clean, transform, and load data from transactional systems to centralized data repositories.

Data Scientist: Leverages statistical techniques and creates analytical models to derive new insights from quantitative and qualitative data.

Functional Analyst: Utilizes data and leverages derived insights to help organizations make better decisions in a specific functional domain.

Analytics Manager: Develops and guides data-driven projects, from initiation to planning, execution to performance monitoring, to closure.

Data query: A request for information with certain characteristics from a database. For example, a query to a logistics company's (e.g., LBC's) database might be for all records of shipments to a particular distribution center in the month of May. This query provides descriptive information about these parcels, such as the number of parcels, how much was included in each parcel, the date of each parcel, and so on. A report summarizing relevant information for management might be conveyed through descriptive statistics (means, measures of variation, etc.) and data-visualization tools (tables, charts, and maps). Even simple descriptive statistics and data-visualization techniques can be used to find patterns or relationships in a large database.
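As a minimal sketch of what such a query could look like in code (the table layout, column names, and records are all invented for illustration; pandas stands in here for a real database):

```python
import pandas as pd

# Hypothetical shipment records; column names and values are assumptions.
shipments = pd.DataFrame({
    "parcel_id": [101, 102, 103, 104],
    "destination": ["Cebu DC", "Davao DC", "Cebu DC", "Cebu DC"],
    "ship_date": pd.to_datetime(["2021-05-02", "2021-05-10",
                                 "2021-06-01", "2021-05-21"]),
    "weight_kg": [2.5, 1.0, 3.2, 0.8],
})

# Query: all shipments to a particular distribution center in the month of May.
may_to_cebu = shipments[
    (shipments["destination"] == "Cebu DC")
    & (shipments["ship_date"].dt.month == 5)
]

# Descriptive summary for a management report.
print(len(may_to_cebu))                  # number of parcels
print(may_to_cebu["weight_kg"].mean())   # average parcel weight
```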
Data dashboards: Collections of tables, charts, maps, and summary statistics that are updated as new data become available. Uses of dashboards:

- To help management monitor specific aspects of the company's performance related to their decision-making responsibilities
- For corporate-level managers, daily data dashboards might summarize sales by region, current inventory levels, and other company-wide metrics
- Front-line managers may view dashboards that contain metrics related to staffing levels, local inventory levels, and short-term sales forecasts

Data mining: The use of analytical techniques to better understand patterns and relationships that exist in large data sets. For example, by analyzing text on social network platforms such as Facebook and Twitter, data-mining techniques are used by companies to better understand their customers. By categorizing certain words as positive or negative and keeping track of how often those words appear in tweets, a company like Samsung can better understand how its customers are feeling about a product like the Galaxy Z Flip 5G.
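A bare-bones sketch of that word-counting idea (the word lists and sample tweets are made up, and real sentiment analysis would be far more sophisticated):

```python
# Toy dictionary-based sentiment tally over tweets (all data here is invented).
POSITIVE = {"love", "great", "amazing", "sleek"}
NEGATIVE = {"broken", "hate", "fragile", "disappointed"}

tweets = [
    "I love the new flip phone, the screen is amazing",
    "hinge feels fragile, kind of disappointed",
]

pos_count = neg_count = 0
for tweet in tweets:
    for word in tweet.lower().split():
        if word in POSITIVE:
            pos_count += 1
        elif word in NEGATIVE:
            neg_count += 1

print(pos_count, neg_count)  # 2 positive vs. 2 negative mentions
```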
Diagnostic analytics: Used to determine why something happened in the past. At times, businesses are required to think critically about the nature of the data and understand the descriptive analysis in depth. In order to find issues in the data, we need to find anomalous patterns that might contribute to the poor performance of our model.
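One simple way to surface such anomalous patterns is to flag values that sit far from the mean; a minimal z-score sketch (the data and the 2-standard-deviation threshold are invented for illustration):

```python
import statistics

# Daily sales figures (made up); the spike is the anomaly we want to flag.
sales = [120, 118, 125, 119, 410, 122, 117]

mean = statistics.mean(sales)
stdev = statistics.stdev(sales)

# Flag observations more than 2 standard deviations from the mean.
anomalies = [x for x in sales if abs(x - mean) / stdev > 2]
print(anomalies)  # [410]
```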
With diagnostic analysis, you are able to diagnose various problems that are exhibited through your data. Businesses use this technique to reduce their losses and optimize their performance. Some examples of where businesses use diagnostic analysis are:

- Businesses implement diagnostic analysis to reduce latency in logistics and optimize their production processes.
- With the help of diagnostic analysis in the sales domain, one can update the marketing strategies which would otherwise diminish the total revenue. In time-series data of sales, diagnostic analytics would help you understand why sales decreased or increased in a specific year. However, this type of analytics has a limited ability to give actionable insights; it provides an understanding of causal relationships and sequences while looking backward.

Predictive analytics: Consists of techniques that use models constructed from past data to predict the future or ascertain the impact of one variable on another. With the help of predictive analysis, we determine the future outcome: based on the analysis of historical data, we are able to forecast the future. It makes use of descriptive analysis to generate predictions about the future, and with the help of technological advancements and machine learning, we are able to obtain predictive insights about the future. For example, survey data and past purchase behavior may be used to help predict the market share of a new product.

Linear regression: A regression analysis in which the relationship between the independent variables and the dependent variable is approximated by a straight line.
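As a minimal sketch, fitting such a straight line to made-up advertising-vs-sales data with NumPy's least-squares polynomial fit:

```python
import numpy as np

# Made-up historical data: advertising spend (x) and sales (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

# Fit y ≈ slope * x + intercept with a degree-1 least-squares fit.
slope, intercept = np.polyfit(x, y, 1)

# Predict the outcome for a new value of the independent variable.
print(slope * 6.0 + intercept)  # forecast of sales at spend = 6.0
```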
Time series: A set of observations on a variable measured at successive points in time or over successive periods of time.

Data mining: Used to find patterns or relationships among elements of the data in a large database; often used in predictive analytics.

Simulation: Involves the use of probability and statistics to construct a computer model to study the impact of uncertainty on a decision.
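A tiny Monte Carlo sketch of that idea: simulate uncertain demand many times and look at the resulting distribution of profit (all parameters are invented):

```python
import random

# Monte Carlo simulation of profit under uncertain demand (invented numbers).
PRICE, UNIT_COST, FIXED_COST = 25.0, 10.0, 5000.0

profits = []
for _ in range(10_000):
    demand = random.gauss(1000, 200)  # uncertain demand, normally distributed
    profits.append((PRICE - UNIT_COST) * demand - FIXED_COST)

# Summarize the impact of uncertainty on the decision.
print(sum(profits) / len(profits))                 # average profit
print(sum(p < 0 for p in profits) / len(profits))  # estimated probability of a loss
```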
Prescriptive analytics: Combines insights from all of the above analytical techniques. It indicates a course of action to take; the output of a prescriptive model is a decision. Prescriptive analytics allows companies to make decisions based on these insights. It makes heavy use of artificial intelligence to help companies make careful business decisions. It is also referred to as the final frontier of data analytics.

Rule-based models: Prescriptive models that rely on a rule or set of rules. For example, we may develop a model to predict the probability that a person will default on a loan. If we create a rule that says we should not award a loan when the estimated probability of default is more than 0.6, this predictive model together with the rule becomes a prescriptive model.
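That loan rule is small enough to write down directly; a minimal sketch (the probabilities fed in are placeholders for the output of some upstream predictive model):

```python
# Prescriptive rule layered on top of a predictive model's output.
DEFAULT_THRESHOLD = 0.6  # the rule described above

def loan_decision(estimated_default_probability: float) -> str:
    """Turn a predicted default probability into a decision."""
    if estimated_default_probability > DEFAULT_THRESHOLD:
        return "reject"  # too risky: do not award the loan
    return "approve"

# Placeholder predictions standing in for a real model.
for p in (0.25, 0.61, 0.59):
    print(p, loan_decision(p))  # 0.25 approve, 0.61 reject, 0.59 approve
```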
Optimization models: Models that give the best decision subject to the constraints of the situation.

Portfolio models (finance): Use historical investment return data to determine the mix of investments that yields the highest expected return while controlling or limiting exposure to risk.

Supply network design models (operations): Provide the cost-minimizing plant and distribution center locations subject to meeting customer service requirements.

Price markdown models (retailing): Use historical data to yield revenue-maximizing discount levels and the timing of discount offers when goods have not sold as planned.
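A toy sketch of an optimization model in the spirit of the portfolio model above, using SciPy's linear programming routine (the returns, risk scores, and risk cap are all invented):

```python
from scipy.optimize import linprog

# Toy portfolio model: maximize expected return subject to a risk budget.
expected_return = [0.08, 0.12, 0.05]  # per asset (invented)
risk_score      = [0.10, 0.30, 0.02]  # per asset (invented)

res = linprog(
    c=[-r for r in expected_return],  # linprog minimizes, so negate returns
    A_ub=[risk_score],                # weighted risk must stay under the cap
    b_ub=[0.15],
    A_eq=[[1, 1, 1]],                 # weights sum to 1 (fully invested)
    b_eq=[1],
    bounds=[(0, 1)] * 3,              # no short selling
)
print(res.x)  # optimal asset weights
```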
Simulation optimization: Combines the use of probability and statistics to model uncertainty with optimization techniques to find good decisions in highly complex and highly uncertain settings.

Decision analysis: Can be used to develop an optimal strategy when a decision maker is faced with several decision alternatives and an uncertain set of future events. It employs utility theory, which assigns values to outcomes based on the decision maker's attitude toward risk, loss, and other factors.

The computations include optimization of some functions that are related to the desired outcome. For example, when calling for a Grab car online, the application uses GPS to connect you to the correct driver from among a number of drivers found nearby. Hence, it optimizes the distance for a faster arrival time. Recommendation engines also use prescriptive analytics.
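The nearest-driver step reduces to minimizing a distance function over the candidate drivers; a simplified sketch (coordinates are invented, and straight-line distance stands in for real road-network routing):

```python
import math

# Candidate drivers with (x, y) positions; all coordinates are made up.
drivers = {"driver_a": (2.0, 3.0), "driver_b": (0.5, 1.0), "driver_c": (4.0, 0.5)}
rider = (1.0, 1.0)

def distance(p, q):
    """Straight-line distance; a real app would use road travel time."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Pick the driver that minimizes distance to the rider.
best = min(drivers, key=lambda name: distance(drivers[name], rider))
print(best)  # driver_b
```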
Major industry players like Facebook, Netflix, Amazon, and Google are using prescriptive analytics to make key business decisions. Furthermore, financial institutions are gradually leveraging the power of this technique to increase their revenue.

Big data: Any set of data that is too large or too complex to be handled by standard data-processing techniques and typical desktop software.

The advent of big data analytics was in response to the rise of big data, which began in the 1990s. Long before the term "big data" was coined, the concept was applied at the dawn of the computer age, when businesses used large spreadsheets to analyze numbers and look for trends. The sheer amount of data generated in the late 1990s and early 2000s was fueled by new sources of data. The popularity of search engines and mobile devices created more data than any company knew what to do with. Speed was another factor: the faster data was created, the more that had to be handled. A study by International Data Corporation (IDC) projected that data creation would grow tenfold globally by 2020. Whoever could tame the massive amounts of raw, unstructured information would open a treasure chest of insights about consumer behavior, business operations, natural phenomena, and population changes never seen before. Traditional data warehouses and relational databases could not handle the task; innovation was needed. In 2006, Hadoop was created by engineers at Yahoo and launched as an Apache open source project. The distributed processing framework made it possible to run big data applications on a clustered platform. This is the main difference between traditional and big data analytics. At first, only large companies like Google and Facebook took advantage of big data analysis. By the 2010s, retailers, banks, manufacturers, and healthcare companies began to see the value of also being big data analytics companies. Large organizations with on-premises data systems were initially best suited for collecting and analyzing massive data sets, but Amazon Web Services (AWS) and other cloud platform vendors made it easier for any business to use a big data analytics platform. The ability to set up Hadoop clusters in the cloud gave a company of any size the freedom to spin up and run only what it needs on demand.

A big data analytics ecosystem is a key component of agility, which is essential for today's companies to find success. Insights can be discovered faster and more efficiently, which translates into immediate business decisions that can determine a win.

Hadoop: An open-source programming environment that supports big data processing through distributed storage and processing over multiple computers.

MapReduce: A programming model used within Hadoop that performs two major steps, the map step and the reduce step, for processing massive amounts of unstructured data in parallel across a distributed cluster.
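A minimal in-memory imitation of those two steps is the classic word count sketched below; this only mimics the programming model on one machine, whereas real MapReduce distributes the work across a cluster:

```python
from collections import defaultdict

documents = ["big data big insights", "data beats opinion"]

# Map step: emit (key, value) pairs from each input record.
mapped = []
for doc in documents:
    for word in doc.split():
        mapped.append((word, 1))

# Shuffle: group values by key.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce step: combine each key's values into a single result.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)  # {'big': 2, 'data': 2, 'insights': 1, 'beats': 1, 'opinion': 1}
```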
Apache Kafka: A scalable messaging system that lets users publish and consume large numbers of messages in real time by subscription.

HBase: A column-oriented key/value data store that runs on the Hadoop Distributed File System.

Hive: An open source data warehouse system for analyzing data sets in Hadoop files.

Pig: An open source technology for parallel programming of MapReduce jobs on Hadoop clusters.

Spark: An open source parallel processing framework for running large-scale data analytics applications across clustered systems.

YARN: Cluster management technology in second-generation Hadoop.

Data security is the protection of stored data from destructive forces or unauthorized users, and is of critical importance to companies. For example, credit card transactions are potentially very useful for understanding consumer behavior, but compromise of these data could lead to unauthorized use of the credit card or identity theft.

Data collection is defined as the procedure of collecting, measuring, and analyzing accurate insights for research using standard validated techniques. A researcher can evaluate their hypothesis on the basis of collected data. In most cases, data collection is the primary and most important step for research, irrespective of the field of research. The approach to data collection differs across fields of study, depending on the required information. The most critical objective of data collection is ensuring that information-rich and reliable data is collected for statistical analysis so that data-driven decisions can be made for research.

Independent variable: A variable thought to be the cause of some effect. This term is usually used in experimental research to denote a variable that the experimenter has manipulated.

Dependent variable: A variable thought to be affected by changes in an independent variable. You can think of this variable as an outcome.

Predictor variable: A variable thought to predict an outcome variable. This is basically another term for independent variable (although some people won't like me saying that; I think life would be easier if we talked only about predictors and outcomes).

Outcome variable: A variable thought to change as a function of changes in a predictor variable. This term could be synonymous with 'dependent variable' for the sake of an easy life.

The relationship between what is being measured and the numbers that represent what is being measured is known as the level of measurement. Broadly speaking, variables can be categorical or continuous, and can have different levels of measurement.

Binary variable: There are only two categories (e.g. dead or alive).

Nominal variable: There are more than two categories (e.g. whether someone is an omnivore, vegetarian, vegan, or fruitarian).

Ordinal variable: The same as a nominal variable, but the categories have a logical order (e.g. whether people got a fail, a pass, a merit, or a distinction in their exam).

Interval variable: Equal intervals on the variable represent equal differences in the property being measured (e.g. the difference between 6 and 8 is equivalent to the difference between 13 and 15).

Ratio variable: The same as an interval variable, but the ratios of scores on the scale must also make sense (e.g. a score of 16 on an anxiety scale means that the person is, in reality, twice as anxious as someone scoring 8).

There will often be a discrepancy between the numbers we use to represent the thing we're measuring and the actual value of the thing we're measuring (i.e. the value we would get if we could measure it directly). This discrepancy is known as measurement error.
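A quick numerical sketch of that idea: simulate noisy readings of a known true value and compare (the true value and noise level are invented):

```python
import random

# If we could measure directly, we would always get the true value.
TRUE_VALUE = 70.0  # e.g. someone's actual weight in kg (invented)

# Each observed score = true value + measurement error.
readings = [TRUE_VALUE + random.gauss(0, 1.5) for _ in range(5)]

for observed in readings:
    print(observed, "error:", observed - TRUE_VALUE)
```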
Validity: The first property, which is whether an instrument actually measures what it sets out to measure.

Criterion validity: Whether the instrument is measuring what it claims to measure. In an ideal world, you could assess this by relating scores on your measure to real-world observations. For example, we could take an objective measure of how helpful lecturers were and compare these observations to students' ratings.

Content validity: With self-report measures/questionnaires, we can also assess the degree to which individual items represent the construct being measured and cover the full range of the construct.

Reliability: The second property, which is whether an instrument can be interpreted consistently across different situations.

Test-retest reliability: Validity is a necessary but not sufficient condition of a measure. A second consideration is reliability, which is the ability of the measure to produce the same results under the same conditions. To be valid, the instrument must first be reliable. The easiest way to assess reliability is to test the same group of people twice: a reliable instrument will produce similar scores at both points in time. Sometimes, however, you will want to measure something that varies over time (e.g. exam scores, productivity rate). Statistical methods can also be used to determine reliability.
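One such statistical method is to correlate the two administrations; a minimal sketch with invented scores:

```python
import statistics

# Invented scores for the same group of people tested twice.
test_1 = [12, 15, 19, 22, 25, 30]
test_2 = [13, 14, 20, 21, 26, 29]

# Pearson correlation between the two administrations (Python 3.10+):
# a value near 1 suggests the instrument is reliable.
r = statistics.correlation(test_1, test_2)
print(r)
```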
Data collection is a methodical process of gathering and analyzing specific information to proffer solutions to relevant questions and evaluate the results. It focuses on finding out all there is to a particular subject matter. Data is collected to be further subjected to hypothesis testing, which seeks to explain a phenomenon. Hypothesis testing eliminates assumptions while making a proposition from the basis of reason.

For collectors of data, there is a range of outcomes for which the data is collected. But the key purpose for which data is collected is to put a researcher in a vantage position to make predictions about future probabilities and trends.

The core forms in which data can be collected are primary and secondary data. While the former is collected by a researcher through first-hand sources, the latter is collected by an individual other than the user.

Primary data collection, by definition, is the gathering of raw data collected at the source. It is a process of collecting the original data collected by a researcher for a specific research purpose. It can be further divided into two segments: qualitative research and quantitative data collection methods.

The qualitative research method of data collection does not involve the collection of data that involves numbers or that needs to be deduced through a mathematical calculation; rather, it is based on non-quantifiable elements like the feelings or emotions of the researcher. An example of such a method is an open-ended questionnaire.

Quantitative methods are presented in numbers and require a mathematical calculation to deduce. An example would be the use of a questionnaire with closed-ended questions to arrive at figures to be calculated mathematically. Also included are methods of correlation and regression, and the mean, mode, and median.

Secondary data collection, on the other hand, is referred to as the gathering of second-hand data collected by an individual who is not the original user. It is the process of collecting data that already exists, be it in already published books, journals, and/or online portals. In terms of ease, it is much less expensive and easier to collect.

An interview is a face-to-face conversation between two individuals with the sole purpose of collecting relevant information to satisfy a research purpose.

Structured interviews: A verbally administered questionnaire. In terms of depth, it is surface level and is usually completed within a short period. It is highly recommendable for speed and efficiency, but it lacks depth.

Semi-structured interviews: Several key questions subsist which cover the scope of the areas to be explored. It allows a little more leeway for the researcher to explore the subject matter.

Unstructured interviews: An in-depth interview that allows the researcher to collect a wide range of information with a purpose. An advantage of this method is the freedom it gives a researcher to combine structure with flexibility, even though it is more time-consuming.

Questionnaires: The process of collecting data through an instrument consisting of a series of questions and prompts to receive a response from the individuals it is administered to. Questionnaires are designed to collect data from a group. For clarity, it is important to note that a questionnaire isn't a survey; rather, it forms a part of it. A survey is a process of data gathering involving a variety of data collection methods, including a questionnaire. On a questionnaire, there are three kinds of questions used: fixed-alternative, scale, and open-ended, with each of the questions tailored to the nature and scope of the research.

Reporting: By definition, data reporting is the process of gathering and submitting data to be further subjected to analysis. The key aspect of data reporting is reporting accurate data, because inaccurate data reporting leads to uninformed decision making.

Existing data: This is the introduction of new investigative questions in addition to, or other than, the ones originally used when the data was initially gathered. It involves adding measurement to a study or research. An example would be sourcing data from an archive.

Observation: This is a data collection method by which information on a phenomenon is gathered through observation. The observation could be accomplished either as a complete observer, an observer as a participant, a participant as an observer, or a complete participant. This method is a key base for formulating a hypothesis.

Focus groups: The opposite of quantitative research, which involves numerical data; this data collection method focuses more on qualitative research. It falls under the primary category for data based on the feelings and opinions of the respondents. This research involves asking open-ended questions to a group of individuals, usually ranging from 6 to 10 people, to provide feedback.

Combination research: This method of data collection encompasses the use of innovative methods to enhance the participation of both individuals and groups. Also under the primary category, it is a combination of interviews and focus groups while collecting qualitative data. This method is key when addressing sensitive subjects.