Business Analytics 3
Introduction to Business Forecasting and Predictive Analytics – Logic- and Data-Driven Models
– Data Mining and Predictive Analysis Modeling – Machine Learning for Predictive Analytics.
Companies conduct business forecasts to determine their goals, targets, and project plans for
each new period, whether quarterly, annually, or even 2–5-year planning. Some companies
utilize predictive analytics software to collect and analyze the data necessary to make an
accurate business forecast. Predictive analytics solutions give you the tools to store data,
organize information into comprehensive datasets, develop predictive models to forecast
business opportunities, adapt datasets to data changes, and allow import/export from other data
channels.
Forecasting helps managers guide strategy and make informed decisions about critical business
operations such as sales, expenses, revenue, and resource allocation. When done right,
forecasting adds a competitive advantage and can be the difference between successful and
unsuccessful companies.
A qualitative technique is applied when enough data is not available – e.g., when a
product is launched in the market for the first time. Qualitative techniques use human evaluation
and rating schemes to convert qualitative judgments into quantitative estimates.
The goal is to gather all information and considerations related to the factors being evaluated in a
logical, impartial, and systematic manner. Such methods are often used in the field of new
technologies, where the development of product ideas may require more “invention”, so that
research and development requirements are difficult to estimate, and market acceptance and
penetration are highly uncertain.
Qualitative models are most successful with short-term projections. They are expert-
driven, bringing in contrasting opinions and relying on judgment rather than calculable data.
Examples of qualitative models in business forecasting include market surveys and expert opinion polling.
Time Series is a set of observations on the values that a variable takes at different times.
Examples: sales trends, stock market prices, weather forecasts, etc. In simple terms, take
monthly sales data: you would have figures connected month by month – in January you sold
150 units, in February about 300, and so on for all 12 months. That sales data becomes a time
series, and given that there is a pattern in it, we can predict future sales of the same unit, as the
sketch below illustrates.
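To make this concrete, here is a minimal Python sketch of the example above (January = 150,
February = 300; the remaining months are invented for illustration) that treats the monthly data
as a time series and projects the next value from the average month-over-month change:

    # Monthly sales for one year; only the first two figures come from the text.
    monthly_sales = [150, 300, 320, 280, 350, 400, 380, 420, 450, 430, 470, 500]

    # The average month-over-month change captures the overall trend.
    changes = [b - a for a, b in zip(monthly_sales, monthly_sales[1:])]
    avg_change = sum(changes) / len(changes)

    # Naive trend forecast: last observed value plus the average change.
    forecast = monthly_sales[-1] + avg_change
    print(f"Forecast for month 13: {forecast:.0f} units")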
Causal Methods:
Causal forecasting recognizes that the predicted dependent variable is affected by one or more
independent variables. Causal methods take into account all possible factors that may affect the
dependent variable. Consequently, the data necessary for such forecasting can vary from internal
data to external data, such as surveys, macroeconomic indicators, product characteristics, social
chatter, etc. Typically, causal models are continually revised to ensure that the latest data are
included in the model.
Use quantitative forecasting when there is accurate past data available to analyze patterns and
predict the probability of future events in your business or industry.
Quantitative forecasting extracts trends from existing data to determine the more probable
results. It connects and analyzes different variables to establish cause and effect between events,
elements, and outcomes. An example of data used in quantitative forecasting is past sales
numbers.
Quantitative models work with data, numbers, and formulas. There is little human interference in
quantitative analysis. Examples of quantitative models in business forecasting include:
• The indicator approach: This approach depends on the relationship between specific
indicators being stable over time, e.g., GDP and the unemployment rate. By following the
relationship between these two factors, forecasters can estimate a business's performance.
• The average approach: This approach infers that the predictions of future values are equal
to the average of the past data. It is best to use this approach only when assuming that the
future will resemble the past.
• Econometric modeling: Econometric modeling is a mathematically rigorous approach to
forecasting. Forecasters assume the relationships between indicators stay the same and test
the consistency and strength of the relationship between datasets.
• Time-series methods: Time-series methods use historical data to predict future outcomes.
By tracking what happened in the past, forecasters expect to get a near-accurate view of the
future, as the sketch below illustrates.
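As a rough illustration of two of these approaches, the Python sketch below (with hypothetical
quarterly sales figures) contrasts the average approach with a simple time-series method, a
3-period moving average:

    # Hypothetical past sales observations.
    sales = [120, 135, 128, 142, 150, 145, 160, 158]

    # Average approach: the forecast equals the mean of all past data.
    average_forecast = sum(sales) / len(sales)

    # Time-series method (3-period moving average): the forecast equals
    # the mean of the most recent three observations.
    window = 3
    moving_average_forecast = sum(sales[-window:]) / window

    print(f"Average approach forecast:        {average_forecast:.1f}")
    print(f"3-period moving average forecast: {moving_average_forecast:.1f}")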
Choosing the right business forecasting technique depends on many factors. Some of these are:
Managers and forecasters must consider the stage of the product or business as this influences
the availability of data and how you establish relationships between variables. A new startup
with no previous revenue data would be unable to use quantitative methods in its forecast. The
more you understand the use, capabilities, and impact of different forecasting techniques, the
better equipped you are to select the right one for your situation.
While there are different forecasting techniques and methods, all forecasts follow the same
process on a conceptual level. Standard elements of business forecasting include:
• Prepare the stage: Before you begin, develop a system to investigate the current state of
business.
• Choose a data point: An example for any business could be "What is our sales projection
for next quarter?"
• Choose indicators and data sets: Identify the relevant indicators and data sets you need
and decide how to collect the data.
• Make initial assumptions: To kick start the forecasting process, forecasters may
make some assumptions to measure against variables and indicators.
• Select forecasting technique: Pick the technique that fits your forecast best.
• Analyze data: Analyze available data using your selected forecasting technique.
• Estimate forecasts: Estimate future conditions based on data you've gathered to reach data-
backed estimates.
• Verify forecasts: Compare your forecast to the eventual results. This helps you identify any
problems, tweak errant variables, correct deviations, and continue to improve your
forecasting technique.
• Review forecasting process: Review any deviations between your forecasts and actual
performance data.
Choose the best forecasting methods based on the stage of the product or business life cycle, availability
of past data, and skills of the forecasters and managers leading the project.
When you have these answers, you can start collecting data from two main sources:
• Primary sources: These sources are gathered first-hand using reporting tools — you or
members of your team source data through interviews, surveys, research, or observations.
• Secondary sources: Secondary sources are second-hand information or data that others
have collected. Examples include government reports, publications, financial statements,
competitors' annual reports, journals, and other periodicals.
BUSINESS FORECASTING PROCESS
The way a company forecasts is always unique to its needs and resources, but the primary
forecasting process can be summed up in five steps. These steps outline how business forecasting
starts with a problem and ends with not only a solution but valuable learnings.
1. Identify the problem
The first step in predicting the future is choosing the problem you’re trying to solve or the
question you’re trying to answer. This can be as simple as determining whether your audience
will be interested in a new product your company is developing. Because this step doesn’t yet
involve any data, it relies on internal considerations and decisions to define the problem at hand.
2. Gather relevant data
The next step in forecasting is to collect as much data as possible and decide how to use it. This
may require digging up some extensive historical company data and examining the past and
present market trends. Suppose your company is trying to launch a new product. In this case, the
gathered data can be a culmination of the performance of your previous product and the current
performance of similar competing products in the target market.
3. Choose the right forecasting technique
After collecting the necessary data, it’s time to choose a business forecasting technique that
works with the available resources and the type of prediction. All the forecasting models are
effective and get you on the right track, but one may be more favorable than others in creating a
unique, comprehensive forecast.
For example, if you have extensive data on hand, quantitative forecasting is ideal for
interpretation. Qualitative forecasting is best if you have less hard data available and are willing
to invest in extensive market research.
4. Analyze the data
Once the ball starts rolling, you can begin identifying patterns in the past and predict the
probability of their repetition. This information will help your company’s decision-makers
determine what to do beforehand to prepare for the predicted scenarios.
5. Verify your findings
The end of business forecasting is simple. You wait to see if what you predicted actually
happens. This step is especially important in determining not only the success of your forecast
but also the effectiveness of the entire process. Having done some forecasting, you can compare
the present experience with these forecasts to identify potential areas for growth.
When in doubt, never throw away “old” data. The final information of one forecasting process
can also be used as the past data for another forecast. It’s like a life cycle of business
development predictions.
Business forecasting also supports decisions such as:
1. Calculating cash flow forecasts, i.e., predicting your financial needs within a timeframe
2. Estimating the threat of new entrants into your market
A rapidly evolving modern business climate has proven how fast things can change, with
businesses evolving beside it to succeed. In fact, today’s world requires agile strategy and
management.
This is where business forecasting can help, enabling businesses to plan for unexpected
events. In this section, you’ll learn the basic principles of business forecasting and how to
implement forecasting techniques in your business planning.
Now that you understand the basics of business forecasting, it’s time to see how it works in
practice. Read the following examples to better understand the different approaches to
business forecasting.
1. A company forecasting its sales through the end of the year
Let’s suppose a small greeting card company wants to forecast its sales through the end of
the year. The company has just a year and a half of experience and limited data to use for
predictions. Though the first few quarters were slow to start, they have gained a great
reputation in the last three quarters. For this reason, sales are on the rise.
Since the business has limited historical data, they might consider a qualitative model for
predicting future sales. By polling their customers, the greeting card company can gauge the
willingness of their audience to buy new cards and pricing for the remaining quarters of the
year. Market surveys are a type of qualitative forecasting, which utilizes questionnaires to
estimate future customer behavior.
2. A company forecasting its profits for the next quarter
In this example, let’s suppose a well-established shoe brand is forecasting profits for the next
quarter. Normally, this company would use the time series forecasting technique to estimate
profits for the next quarter. However, economic conditions have shifted, and the
unemployment rate is higher than normal. As a result, the company chooses the indicator
approach to predict the actual performance of its product.
3. A company forecasting demand for a new product
In this next example, let’s suppose a loungewear company plans on rolling out a new product:
slippers. Since this product is new to the company, there are no official metrics for pricing
and popularity. For this reason, the company needs to gauge the interest level of its target
audience.
In this case, demand forecasting would be a great approach to gauge how much customers
are willing to spend and how much the company will need to invest in terms of materials. By
using this forecasting process, the loungewear company can decide if the product will
perform well and what kind of demand exists. Ultimately, this will help the team make
informed business decisions for production as well as sales.
PREDICTIVE ANALYTICS
Predictive analytics uses historical data to predict future events. Typically, historical data is used
to build a mathematical model that captures important trends. That predictive model is then used
on current data to predict what will happen next, or to suggest actions to take for optimal
outcomes.
Predictive analytics has received a lot of attention in recent years due to advances in supporting
technology, particularly in the areas of big data and machine learning.
Predictive analytics is often discussed in the context of big data. Engineering data, for example,
comes from sensors, instruments, and connected systems out in the world. Business system data
at a company might include transaction data, sales results, customer complaints, and marketing
information. Increasingly, businesses make data-driven decisions based on this valuable trove of
information.
Increasing Competition
With increased competition, businesses seek an edge in bringing products and services to
crowded markets. Data-driven predictive models can help companies solve long-standing
problems in new ways.
Equipment manufacturers, for example, can find it hard to innovate in hardware alone. Product
developers can add predictive capabilities to existing solutions to increase value to the customer.
Using predictive analytics for equipment maintenance, or predictive maintenance, can anticipate
equipment failures, forecast energy needs, and reduce operating costs. For example, sensors
that measure vibrations in automotive parts can signal the need for maintenance before the
vehicle fails on the road.
Companies also use predictive analytics to create more accurate forecasts, such as forecasting the
demand for electricity on the electrical grid. These forecasts enable resource planning (for
example, scheduling of various power plants) to be done more effectively.
Predictive analytics is the process of using data analytics to make predictions based on data.
This process uses data along with analysis, statistics, and machine learning techniques to create
a predictive model for forecasting future events.
The term “predictive analytics” describes the application of a statistical or machine learning
technique to create a quantitative prediction about the future. Frequently, supervised machine
learning techniques are used to predict a future value (How long can this machine run before
requiring maintenance?) or to estimate a probability (How likely is this customer to default on a
loan?).
Predictive analytics starts with a business goal: to use data to reduce waste, save time, or cut
costs. The process harnesses heterogeneous, often massive, data sets into models that can
generate clear, actionable outcomes to support achieving that goal, such as less material waste,
less stocked inventory, and manufactured product that meets specifications.
We are all familiar with predictive models for weather forecasting. A vital industry application of
predictive models relates to energy load forecasting to predict energy demand. In this case,
energy producers, grid operators, and traders need accurate forecasts of energy load to make
decisions for managing loads in the electric grid. Vast amounts of data are available, and using
predictive analytics, grid operators can turn this information into actionable insights.
1. Import data from varied sources, such as web archives, databases, and
spreadsheets. Data sources include energy load data in a CSV file and national weather
data showing temperature and dew point.
2. Clean the data by removing outliers and combining data sources.
Identify data spikes, missing data, or anomalous points to remove from the data. Then
aggregate different data sources together – in this case, creating a single table including
energy load, temperature, and dew point.
3. Develop an accurate predictive model based on the aggregated data using
statistics, curve fitting tools, or machine learning.
Energy forecasting is a complex process with many variables, so you might choose to use
neural networks to build and train a predictive model. Iterate through your training data set
to try different approaches. When the training is complete, you can try the model against
new data to see how well it performs.
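Steps 1 and 2 of this workflow might look like the following pandas sketch; the file names
(energy_load.csv, weather.csv) and column names are assumptions made for illustration, not a
reference implementation:

    import pandas as pd

    # Step 1: import energy load data and national weather data.
    load = pd.read_csv("energy_load.csv", parse_dates=["timestamp"])
    weather = pd.read_csv("weather.csv", parse_dates=["timestamp"])

    # Step 2: clean - drop missing readings and remove outliers more than
    # three standard deviations from the mean load.
    load = load.dropna(subset=["load_mw"])
    z = (load["load_mw"] - load["load_mw"].mean()) / load["load_mw"].std()
    load = load[z.abs() <= 3]

    # Aggregate the sources into a single table of load, temperature, dew point.
    combined = load.merge(weather[["timestamp", "temperature", "dew_point"]],
                          on="timestamp", how="inner")
    print(combined.head())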
A linear regression model would be useful when a doctor wants to predict a new patient’s
cholesterol based only on their body mass index (BMI). In this example, the analyst would know
to put the data the doctor gathered from his 5,000 other patients—including each of their BMIs
and cholesterol levels—into the linear regression model. They are hoping to predict an unknown
based on a predetermined set of quantifiable data.
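A minimal scikit-learn sketch of this scenario follows; the patient data is synthetic, standing in
for the 5,000 real records the analyst would use:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic stand-in for 5,000 patients: BMI (X) and cholesterol (y, mg/dL).
    rng = np.random.default_rng(0)
    bmi = rng.uniform(18, 40, size=5000).reshape(-1, 1)
    cholesterol = 120 + 2.5 * bmi.ravel() + rng.normal(0, 10, size=5000)

    # Fit the linear regression model on the historical records.
    model = LinearRegression().fit(bmi, cholesterol)

    # Predict cholesterol for a new patient with a BMI of 31.
    new_patient = np.array([[31.0]])
    print(f"Predicted cholesterol: {model.predict(new_patient)[0]:.1f} mg/dL")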
Logic-Driven Models
Logic-driven models are created on the basis of inferences and postulations that the sample space and existing
conditions provide. Creating logical models requires a solid understanding of business functional areas, the
logical skills to evaluate propositions, and knowledge of business practices and research.
To understand this better, take the example of a customer who visits a restaurant around six times a year and
spends around ₹5,000 per visit. The restaurant earns around a 40% margin on each visit’s billing amount. The
annual gross profit on that customer works out to 5000 × 6 × 0.40 = ₹12,000. 30% of the customers do not
return each year, while 70% do return and provide more business to the restaurant.
A logic-driven model is one based on experience, knowledge, and logical relationships of variables and
constants connected to the desired business performance outcome situation. The question here is how to put
variables and constants together to create a model that can predict the future. Doing this requires business
experience. Model building requires an understanding of business systems and the relationships of variables
and constants that seek to generate a desirable business performance outcome. To help conceptualize the
relationships inherent in a business system, diagramming methods can be helpful. For example, the cause-and-
effect diagram is a visual aid diagram that permits a user to hypothesize relationships between potential causes
of an outcome. This diagram lists potential causes in terms of human, technology, policy, and process
resources in an effort to establish some basic relationships that impact business performance. The diagram is
used by tracing contributing and relational factors from the desired business performance goal back to possible
causes, thus allowing the user to better picture sources of potential causes that could affect the performance.
This diagram is sometimes referred to as a fishbone diagram because of its appearance.
Another useful diagram to conceptualize potential relationships with business performance variables is called the influence
diagram. According to Evans, influence diagrams can be useful to conceptualize the relationships of variables in the
development of models. It maps the relationship of variables and a constant to the desired business performance outcome of
profit. From such a diagram, it is easy to convert the information into a quantitative model with constants and variables that
define profit in this situation:
Profit = (Unit Price × Quantity Sold) - [(Fixed Cost) + (Variable Cost × Quantity Sold)], or equivalently,
Profit = (Unit Price - Variable Cost) × Quantity Sold - Fixed Cost.
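Translated into code, the influence-diagram model becomes a one-line function; the figures
passed in below are purely hypothetical:

    def profit(unit_price, quantity_sold, fixed_cost, variable_cost):
        # Profit model from the influence diagram above.
        return (unit_price * quantity_sold) - (fixed_cost + variable_cost * quantity_sold)

    # Hypothetical example: $25 price, $10 unit variable cost,
    # $30,000 fixed cost, 3,000 units sold.
    print(profit(unit_price=25, quantity_sold=3000,
                 fixed_cost=30_000, variable_cost=10))  # -> 15000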
Data-Driven Models
Logic-driven modeling is often used as a first step to establish relationships through data-driven models (using
data collected from many sources to quantitatively establish model relationships). Common types of data-driven
models include:
Regression Analysis
Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent
variable and one or more independent variables. It can be utilized to assess the strength of the relationship
between variables and for modeling the future relationship between them.
Correlation Analysis
Correlation analysis is a statistical method used to discover whether there is a relationship between two
variables/datasets, and how strong that relationship may be.
Probability Distribution
The probability distribution gives the possibility of each outcome of a random experiment or event. It provides
the probabilities of different possible occurrences.
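A brief NumPy sketch ties regression and correlation analysis together on the same hypothetical
paired data (advertising spend vs. sales):

    import numpy as np

    # Hypothetical paired observations.
    ad_spend = np.array([10, 15, 20, 25, 30, 35, 40], dtype=float)
    sales = np.array([95, 110, 118, 135, 150, 148, 170], dtype=float)

    # Regression analysis: fit sales = slope * ad_spend + intercept.
    slope, intercept = np.polyfit(ad_spend, sales, deg=1)

    # Correlation analysis: strength of the linear relationship (-1 to 1).
    r = np.corrcoef(ad_spend, sales)[0, 1]

    print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}, r = {r:.3f}")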
Example –
A restaurant customer dines 6 times a year and spends an average of $50 per visit. The restaurant
realizes a 40% margin on the average bill for food and drinks.
30% of customers do not return each year, so the average lifetime of a customer = 1/0.3 ≈ 3.33 years.
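Putting those numbers together gives an estimated customer lifetime value; the sketch below
simply restates the arithmetic of the example:

    avg_spend_per_visit = 50.0   # dollars per visit
    margin = 0.40                # restaurant's margin on the bill
    visits_per_year = 6
    defection_rate = 0.30        # 30% of customers do not return each year

    annual_gross_profit = avg_spend_per_visit * margin * visits_per_year  # $120
    avg_lifetime_years = 1 / defection_rate                               # ~3.33

    clv = annual_gross_profit * avg_lifetime_years
    print(f"Average customer lifetime value: ${clv:.2f}")  # ~$400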
Predictive modeling is a method of predicting future outcomes by using data modeling. It’s one
of the premier ways a business can see its path forward and make plans accordingly. While not
foolproof, this method tends to have high accuracy rates, which is why it is so commonly used.
In short, predictive modeling is a statistical technique using machine learning and data mining to
predict and forecast likely future outcomes with the aid of historical and existing data. It works
by analyzing current and historical data and projecting what it learns on a model generated to
forecast likely outcomes. Predictive modeling can be used to predict just about anything, from
TV ratings and a customer’s next purchase to credit risks and corporate earnings.
A predictive model is not fixed; it is validated or revised regularly to incorporate changes in the
underlying data. In other words, it’s not a one-and-done prediction. Predictive models make
assumptions based on what has happened in the past and what is happening now. If incoming,
new data shows changes in what is happening now, the impact on the likely future outcome must
be recalculated, too. For example, a software company could model historical sales data against
marketing expenditures across multiple regions to create a model for future revenue based on the
impact of the marketing spend.
Most predictive models work fast and often complete their calculations in real time. That’s why
banks and retailers can, for example, calculate the risk of an online mortgage or credit card
application and accept or decline the request almost instantly based on that prediction.
Some predictive models are more complex, such as those used in computational
biology and quantum computing; the resulting outputs take longer to compute than a credit card
application but are done much more quickly than was possible in the past thanks to advances in
technological capabilities, including computing power.
Top 5 Types of Predictive Models
Fortunately, predictive models don’t have to be created from scratch for every application.
Predictive analytics tools use a variety of vetted models and algorithms that can be applied to a
wide spread of use cases.
1. Classification model: Considered the simplest model, it categorizes data for simple and
direct query response. An example use case would be to answer the question “Is this a
fraudulent transaction?”
2. Clustering model: This model nests data together by common attributes. It works by
grouping things or people with shared characteristics or behaviors and plans strategies for
each group at a larger scale. An example is in determining credit risk for a loan applicant
based on what other people in the same or a similar situation did in the past.
3. Forecast model: This is a very popular model, and it works on anything with a numerical
value based on learning from historical data. For example, in answering how much
lettuce a restaurant should order next week or how many calls a customer support agent
should be able to handle per day or week, the system looks back to historical data.
4. Outliers model: This model works by analyzing abnormal or outlying data points. For
example, a bank might use an outlier model to identify fraud by asking whether a
transaction is outside of the customer’s normal buying habits or whether an expense in a
given category is normal or not. For example, a $1,000 credit card charge for a washer
and dryer in the cardholder’s preferred big box store would not be alarming, but $1,000
spent on designer clothing in a location where the customer has never charged other
items might be indicative of a breached account.
5. Time series model: This model evaluates a sequence of data points based on time. For
example, the number of stroke patients admitted to the hospital in the last four months is
used to predict how many patients the hospital might expect to admit next week, next
month or the rest of the year. A single metric measured and compared over time is thus
more meaningful than a simple average.
1. Random Forest: This algorithm combines many unrelated decision trees and can use
both classification and regression to classify vast amounts of data, as the sketch below
illustrates.
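A minimal scikit-learn sketch of a random forest classifier on synthetic data (a stand-in for a
real business dataset) might look like this:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic labeled data standing in for a real business dataset.
    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # An ensemble of 100 decision trees, each fit on a random subsample.
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)
    print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")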
DATA MINING
The data mining process is used to extract patterns and probabilities from large datasets, which
is why it is widely used in business for forecasting trends. It is also used in fields like marketing,
manufacturing, finance, and government to make predictions and analyses using tools and
techniques like the R language and Oracle Data Mining.
Advantages
The advantages of data mining include those related to business as well as those in fields like medicine,
weather forecasting, healthcare, transportation, insurance, government, etc. Some of the advantages include:
1. Marketing/Retail: It helps all the marketing companies and firms to build models which
are based on a historical set of data and information to predict the responsiveness to the
marketing campaigns prevailing today, such as online marketing campaigns, direct mail,
etc.
2. Finance/Banking: Data mining gives financial institutions information about loans and
credit reporting. When the model is built on historical information, good or bad loans can
then be determined by the financial institutions.
3. Manufacturing: Faulty equipment and the quality of manufactured products can be
determined using the optimal controlling parameters. For example, for some semiconductor
manufacturers, water hardness and quality become a major challenge, as they directly affect
the quality of the product.
The data mining process involves the following stages:
1. Data cleansing: This is the initial stage in data mining, where the classification of the
data becomes an essential component to obtain final data analysis. It involves identifying
and removing inaccurate and tricky data from a set of tables, databases, and record sets.
Some techniques include ignoring the tuple, which is mainly applied when the class
label is missing, and filling in the missing values, replacing missing and incorrect
values with global constants or with predicted or mean values.
2. Data integration: It is a technique that involves merging the new set of information with
the existing group. The source may, however, involve many data sets, databases or flat
files. The customary implementation for data integration is creating an EDW (enterprise data
warehouse), which involves two concepts – tight and loose coupling – though we will not dig into
the details here.
3. Data transformation: This requires transforming data between formats, generally from the
source system to the required destination system. Some strategies include smoothing,
aggregation, and normalization.
4. Data discretization: The technique that can split the continuous attribute domain along
intervals is called data discretization. The datasets are stored in small chunks, thereby
making our study much more efficient. The two strategies are top-down discretization
and bottom-up discretization.
5. Concept hierarchies: They minimize the data by collecting and replacing low-level
concepts with high-level concepts. Concept hierarchies define the multi-dimensional data
with multiple levels of abstraction. The methods are Binning, histogram analysis, cluster
analysis, etc.
6. Pattern evaluation and data presentation: If the data is presented efficiently, the client
and the customers can make use of it in the best possible way. After going through the
above set of stages, the data is presented in graphs and diagrams and can thereby be put to
effective use. The following two are among the most popular data mining tools and techniques:
1. R language: R is a free, open-source tool for statistical computing and graphics, offering a
wide variety of techniques such as classification, clustering, time-series
analysis, etc. It makes use of effective storage facilities and data handling.
2. Oracle Data Mining: Popularly known as ODM, it is part of the Oracle Advanced
Analytics database and generates detailed insights and predictions, specifically used
to detect customer behavior, develop customer profiles, and identify cross-selling
opportunities.
One drawback can be the training of resources on the software, which can be a
complicated and time-consuming task. Data mining has become a necessary component of today’s
systems, and by making efficient use of it, businesses can grow and predict their future trends.
According to the association principle, when you go to an online store to buy earrings, you will
immediately be offered a matching bracelet, pendant, and rings – and with a swimsuit, a straw
hat, sunglasses, and sandals.
It is precisely such an ideally structured array of specific information that makes it possible to
identify a suspicious declaration of income among millions of others of the same kind.
Data mining is conventionally divided into three stages:
• Exploration, in which the data is sorted into essential and non-essential (cleaning, data
transformation, selection of subsets)
• Model building or hidden pattern identification, in which the same datasets are applied to
different models, allowing the better choice to be made. This is called competitive evaluation
of models
• Deployment - the selected data model is used to predict the results
Data mining is handled by highly qualified mathematicians and engineers as well as AI/ML
experts.
A common misconception is that predictive analytics and machine learning are the same thing.
This is not the case. (Where the two do overlap, however, is predictive modelling – but more
on that later.)
Machine learning, on the other hand, is a subfield of computer science that, as per Arthur
Samuel’s definition from 1959, gives ‘computers the ability to learn without being explicitly
programmed’. Machine learning evolved from the study of pattern recognition and explores
the notion that algorithms can learn from and make predictions on data. And, as they begin to
become more ‘intelligent’, these algorithms can go beyond their explicit program instructions to make highly
accurate, data-driven decisions.
The most widely used predictive models are:
Decision trees:
Decision trees are a simple, but powerful form of multiple variable analysis. They are produced
by algorithms that identify various ways of splitting data into branch-like segments. Decision
trees partition data into subsets based on categories of input variables, helping you to understand
someone’s path of decisions.
Neural networks
Patterned after the operation of neurons in the human brain, neural networks (also called
artificial neural networks) are a variety of deep learning technologies. They’re typically used to
solve complex pattern recognition problems – and are incredibly useful for analyzing large data
sets. They are great at handling nonlinear relationships in data – and work well when certain
variables are unknown.
Other classifiers:
Time Series Algorithms: Time series algorithms sequentially plot data and are useful for
forecasting continuous values over time.
Clustering Algorithms: Clustering algorithms organise data into groups whose members are
similar.
Ensemble Models: Ensemble models use multiple machine learning algorithms to obtain better
predictive performance than what could be obtained from one algorithm alone.
Naïve Bayes: The Naïve Bayes classifier allows us to predict a class/category based on a given
set of features, using probability.
Support vector machines: Support vector machines are supervised machine learning techniques
that use associated learning algorithms to analyze data and recognize patterns.
Each classifier approaches data in a different way; therefore, to get the results they need,
organizations must choose the right classifiers and models, as the comparison sketch below
illustrates.
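A short scikit-learn sketch of that model-selection idea: two of the classifiers named above
(Naïve Bayes and a support vector machine) are compared on the same synthetic data via
cross-validation:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    # Synthetic data; in practice this would be the organisation's own dataset.
    X, y = make_classification(n_samples=500, n_features=8, random_state=0)

    # Each classifier approaches the data differently; cross-validation
    # helps pick the one that fits the problem best.
    for name, clf in [("Naive Bayes", GaussianNB()),
                      ("Support vector machine", SVC(kernel="rbf"))]:
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.2f}")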
For organizations overflowing with data but struggling to turn it into useful insights, predictive
analytics and machine learning can provide the solution. No matter how much data an
organization has, if it can’t use that data to enhance internal and external processes and meet
objectives, the data becomes a useless resource.
Predictive analytics is most commonly used for security, marketing, operations, and risk and
fraud detection, and it is utilised in these ways across many different industries.
While machine learning and predictive analytics can be a boon for any organisation,
implementing these solutions haphazardly, without considering how they will fit into everyday
operations, will drastically hinder their ability to deliver the insights the organisation needs.
To get the most out of predictive analytics and machine learning, organisations need to ensure
they have the architecture in place to support these solutions, as well as high-quality data to feed
them and help them to learn. Data preparation and quality are key enablers of predictive
analytics. Input data, which may span multiple platforms and contain multiple big data sources,
must be centralised, unified and in a coherent format.
In order to achieve this, organisations must develop a sound data governance program to police
the overall management of data and ensure only high-quality data is captured and recorded.
Secondly, existing processes will need to be altered to include predictive analytics and machine
learning as this will enable organisations to drive efficiency at every point in the business. Lastly,
organisations need to know what problems they are looking to solve, as this will help them to
determine the best and most applicable model to use.
Typically, an organisation’s data scientists and IT experts are tasked with choosing the right
predictive models – or building their own – to meet the organisation’s needs.
Today, however, predictive analytics and machine learning are no longer just the domain
of mathematicians, statisticians and data scientists, but also that of business analysts and
consultants. More and more of a business’ employees are using it to develop insights and
improve business operations – but problems arise when employees do not know what model to
use, how to deploy it, or need information right away.
At SAS, we develop sophisticated software to support organisations with their data governance
and analytics. Our data governance solutions help organisations to maintain high-quality data, as
well as align operations across the business and pinpoint data problems within the
same environment. Our predictive analytics solutions help organisations to turn their data into
timely insights for better, faster decision making.
Machine learning derives insightful information from large volumes of data by leveraging
algorithms to identify patterns and learn in an iterative process. ML algorithms use computational
methods to learn directly from data instead of relying on any predetermined equation that may
serve as a model.
While machine learning is not a new concept – dating back to World War II when the Enigma
Machine was used – the ability to apply complex mathematical calculations automatically to
growing volumes and varieties of available data is a relatively recent development.
Today, with the rise of big data, IoT, and ubiquitous computing, machine learning has become
essential for solving problems across numerous areas, such as energy forecasting, credit scoring,
image recognition, and computational biology.
Machine learning algorithms are molded on a training dataset to create a model. As new input
data is introduced to the trained ML algorithm, it uses the developed model to make a prediction.
Machine learning algorithms can be trained in many ways, with each method having its pros and
cons. Based on these methods and ways of learning, machine learning is broadly categorized into
four main types:
1. Supervised machine learning
This type of ML involves supervision, where machines are trained on labeled datasets and
enabled to predict outputs based on the provided training. The labeled dataset specifies that some
input and output parameters are already mapped. Hence, the machine is trained with the input
and corresponding output. The machine is then made to predict the outcome using the test
dataset in subsequent phases.
For example, consider an input dataset of parrot and crow images. Initially, the machine is
trained to understand the pictures, including the parrot and crow’s color, eyes, shape, and size.
Post-training, an input picture of a parrot is provided, and the machine is expected to identify the
object and predict the output. The trained machine checks for the various features of the object,
such as color, eyes, shape, etc., in the input picture, to make a final prediction. This is the process
of object identification in supervised machine learning.
The primary objective of the supervised learning technique is to map the input variable (a) with
the output variable (b). Supervised machine learning is further classified into two broad
categories: classification and regression.
Some known classification algorithms include the Random Forest Algorithm, Decision Tree
Algorithm, Logistic Regression Algorithm, and Support Vector Machine Algorithm.
Popular regression algorithms include the Simple Linear Regression Algorithm, Multivariate
Regression Algorithm, Decision Tree Algorithm, and Lasso Regression.
2. Unsupervised machine learning
Unsupervised learning refers to a learning technique that’s devoid of supervision. Here, the
machine is trained using an unlabeled dataset and is enabled to predict the output without any
supervision. An unsupervised learning algorithm aims to group the unsorted dataset based on the
input’s similarities, differences, and patterns.
For example, consider an input dataset of images of a fruit-filled container. Here, the images are
not known to the machine learning model. When we input the dataset into the ML model, its
task is to identify patterns in the objects – such as color, shape, or differences between them –
and group the images accordingly, using techniques such as the following:
• Clustering: The clustering technique refers to grouping objects into clusters based
on parameters such as similarities or differences between objects. For example,
grouping customers by the products they purchase.
Some known clustering algorithms include the K-Means Clustering Algorithm, Mean-Shift
Algorithm, DBSCAN Algorithm, Principal Component Analysis, and Independent Component
Analysis.
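For instance, the customer-grouping case above might be sketched with K-Means in
scikit-learn; the customer data here is synthetic and purely illustrative:

    import numpy as np
    from sklearn.cluster import KMeans

    # Synthetic customers: [annual purchases, average basket size].
    rng = np.random.default_rng(1)
    customers = np.vstack([
        rng.normal([5, 20], 2, size=(50, 2)),   # occasional, small baskets
        rng.normal([40, 80], 5, size=(50, 2)),  # frequent, large baskets
    ])

    # Group the customers into two clusters without any labels.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(customers)
    print(kmeans.cluster_centers_)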
• Association: Association rule learning identifies typical relationships between variables
in a large dataset – for example, products that are frequently purchased together.
Popular algorithms obeying association rules include the Apriori Algorithm, Eclat Algorithm,
and FP-Growth Algorithm.
3. Semi-supervised learning
Semi-supervised learning sits between supervised and unsupervised learning: the machine is
trained on a small amount of labeled data combined with a large amount of unlabeled data.
4. Reinforcement learning
Unlike supervised learning, reinforcement learning lacks labeled data, and the agents learn via
experience only. Consider video games: the game specifies the environment, and each move of
the reinforcement agent defines its state. The agent receives feedback in the form of rewards
and penalties, and the toy sketch below illustrates the idea.
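As a toy illustration of learning from reward feedback alone, the sketch below implements an
epsilon-greedy agent choosing among three slot machines with hidden (hypothetical) payout
probabilities; it is a deliberate simplification of reinforcement learning, not a full game-playing
agent:

    import random

    true_payouts = [0.3, 0.5, 0.8]   # hidden from the agent
    estimates = [0.0] * 3            # the agent's learned value estimates
    counts = [0] * 3
    epsilon = 0.1                    # fraction of the time spent exploring

    random.seed(0)
    for step in range(5000):
        # Explore occasionally; otherwise exploit the best estimate so far.
        if random.random() < epsilon:
            arm = random.randrange(3)
        else:
            arm = max(range(3), key=lambda a: estimates[a])
        reward = 1 if random.random() < true_payouts[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean

    print([round(e, 2) for e in estimates])  # should roughly approach the true payouts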
Predictive Modelling
Machine learning is an AI technique where the algorithms are given data and are asked to
process it without predetermined rules, whereas predictive analysis is the analysis of historical
data as well as existing external data to find patterns and behaviors.
1. Machine learning algorithms are trained to learn from their past mistakes to improve
future performance, whereas predictive analytics makes informed predictions based upon
historical and massive amounts of data.
2. Predictive analysis is a study and practice rather than a particular technology, and it existed
long before machine learning came into existence; Alan Turing had already made use of
this technique to decode encrypted messages during World War II.
3. Related practices and learning techniques for machine learning include supervised and
unsupervised learning.
4. Once our machine learning model is trained and tested on a relatively small dataset,
the same method can be applied to unseen data. The data must not be biased, as that
would result in bad decision-making. In the case of predictive analysis, data is useful
when it is complete, accurate, and substantial, so data quality needs to be taken care of
when data is first ingested. Organizations use predictive analysis to produce forecasts,
anticipate consumer behavior, and make rational decisions based on their findings.