
SCI2206: Data Analytics – Comprehensive Notes (Part 1)

Topics Covered:

TOPIC 1.

Introduction to Data Analytics

Concepts of Data Analytics

Data Analytics is the process of collecting, organizing, analyzing, and interpreting data to uncover meaningful patterns and support decision-making. It’s widely used across industries to improve efficiency, gain insights, and drive strategic planning.

Types of Data Analytics:

Descriptive Analytics – Explains what happened using historical data.

Example: Monthly sales reports.

Diagnostic Analytics – Explores why something happened.

Example: Analyzing churn after customer complaints.

Predictive Analytics – Predicts what might happen in the future.

Example: Forecasting next quarter’s revenue.

Prescriptive Analytics – Suggests what should be done.

Example: Recommending optimal pricing strategies.


Qualitative Analysis

Involves analyzing non-numerical data like text, audio, or video. It’s used to
interpret meanings, opinions, and themes.

Example:

Analyzing interview transcripts to identify customer satisfaction themes.

Tools & Techniques:

NVivo

Manual coding

Thematic analysis

Content analysis

Quantitative Analysis
Focuses on numeric data, using statistical or mathematical models to
analyze and interpret measurable variables.

Example:

Using correlation analysis to understand the relationship between study time and exam scores.
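
As a hedged illustration of this example, the short Python sketch below computes a Pearson correlation with scipy; the study-time and score values are made up for demonstration and are not from the notes.

```python
# Illustrative only: correlation between study time and exam scores.
from scipy.stats import pearsonr

study_hours = [2, 4, 5, 7, 8, 10, 12]
exam_scores = [55, 60, 62, 70, 74, 82, 90]

r, p_value = pearsonr(study_hours, exam_scores)
print(f"Pearson r = {r:.2f}, p-value = {p_value:.4f}")
```

A value of r close to +1 would suggest that more study time is associated with higher scores.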

Common Techniques:

Descriptive statistics (mean, median, mode)

Regression analysis

Probability distributions

Hypothesis testing

Data Analytics Tools


Trends in Data Analytics

AI & Machine Learning: Automating analysis and predictions.

Real-time Data Processing: Stream processing (e.g., for financial trades).

Self-Service BI: Enabling users to run their own reports.

Edge Analytics: Performing analysis closer to where data is generated (e.g., IoT).

Data Democratization: Making data more accessible and usable across all
departments.

TOPIC 2.

The Data Ecosystem

Types of Data

1. Structured Data

Organized in rows and columns.

Stored in relational databases.


Easily searchable.

Examples:

Customer records in Excel

SQL database tables

2. Semi-Structured Data

Doesn’t conform to rigid table formats but has organizational markers (tags,
keys).

Examples:

JSON and XML files

NoSQL databases

Web logs
3. Unstructured Data

No predefined structure.

Requires advanced tools to process and analyze.

Examples:

Emails

Images

Videos

Social media posts

Audio recordings

File Formats in Data Analytics


Sources of Data

Internal Systems: Sales databases, HR records, CRM software

Social Media Platforms: Facebook, Twitter analytics

IoT Devices: Sensors, smartwatches, manufacturing systems

Surveys & Interviews: Questionnaires and open-ended feedback

Open Data Repositories: Government, research institutions

Languages Used in Data Manipulation

ETL Process (Extract, Transform, Load)

A foundational pipeline process for moving and preparing data:

1. Extract – Pulling raw data from various sources (databases, APIs, files).
2. Transform – Cleaning and formatting data to make it usable. This
includes:

Removing duplicates

Normalizing values

Parsing dates or text

3. Load – Inserting transformed data into target systems like data warehouses or BI dashboards.

Example:

Extract product data from Shopify → Clean in Python (remove missing prices)
→ Load into Power BI for visualization.
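
A minimal pandas sketch of the Transform step is shown below; the file name and column names are assumptions for illustration, and the Extract and Load steps would in practice use the Shopify export/API and Power BI respectively.

```python
# Illustrative Transform step of an ETL pipeline using pandas.
# "products_raw.csv" and its columns are hypothetical.
import pandas as pd

raw = pd.read_csv("products_raw.csv")            # extracted product data

clean = (
    raw.drop_duplicates()                        # remove duplicate rows
       .dropna(subset=["price"])                 # drop rows with missing prices
)
clean["created_at"] = pd.to_datetime(clean["created_at"])   # parse dates

clean.to_csv("products_clean.csv", index=False)  # hand off for loading into Power BI
```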

TOPIC 3.

Basic Simulation and Modelling

Concepts and Classification


Simulation:

Definition: A technique used to model the operation of a system or process over time. It involves using mathematical models to replicate real-world systems and processes in order to understand their behavior and predict outcomes.

Types:

1. Monte Carlo Simulation: Uses random sampling to obtain numerical results for problems that may be deterministic in principle. It is often used for risk analysis and decision-making.

Example: Simulating stock market returns by generating random price changes based on historical data.
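
A small numpy sketch of this idea follows; the drift and volatility figures are illustrative assumptions rather than estimates from real market data.

```python
# Monte Carlo simulation of one year of daily stock returns (illustrative).
import numpy as np

rng = np.random.default_rng(seed=42)
mu, sigma = 0.0004, 0.01            # assumed daily mean return and volatility
n_days, n_paths = 252, 10_000

daily_returns = rng.normal(mu, sigma, size=(n_paths, n_days))
final_prices = 100 * np.prod(1 + daily_returns, axis=1)   # starting price of 100

print("Mean final price:", round(final_prices.mean(), 2))
print("5th percentile (downside scenario):", round(np.percentile(final_prices, 5), 2))
```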

2. Discrete Event Simulation: Models systems where events occur at distinct points in time, such as a queue in a bank or customer service center.

Example: Simulating customer arrival times and service times at a call center
to optimize staffing.
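
A bare-bones single-agent queue sketch in plain Python is given below; the arrival and service rates are assumptions, and a real study would calibrate them from call-centre logs and model multiple agents.

```python
# Minimal discrete-event-style simulation of a single-agent call centre queue.
import random

random.seed(1)
arrival_rate = 1 / 3.0      # assumed: one call every 3 minutes on average
service_rate = 1 / 2.5      # assumed: calls take 2.5 minutes on average
n_calls = 10_000

clock = server_free_at = total_wait = 0.0
for _ in range(n_calls):
    clock += random.expovariate(arrival_rate)   # time of the next arrival
    start = max(clock, server_free_at)          # wait if the agent is busy
    total_wait += start - clock
    server_free_at = start + random.expovariate(service_rate)

print(f"Average wait per call: {total_wait / n_calls:.2f} minutes")
```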

Classification:
Deterministic Models: The outcome is fully determined by the initial
conditions, and there is no randomness involved.

Example: A mathematical model predicting the growth of a population with a fixed growth rate.

Stochastic Models: The outcome involves randomness and uncertainty, and the same set of initial conditions can lead to different outcomes.

Example: Predicting customer behavior based on probabilistic distributions.

Simulation and Modelling Methodologies

1. Mathematical Models: Represent real-world processes with mathematical equations. These models are often used in simulation to describe the system’s behavior.

Example: A model describing the motion of a car using Newton’s laws of motion.
2. Agent-Based Modelling (ABM): Involves creating models based on
individual entities (agents) and simulating their interactions over time.

Example: Modeling traffic flow by simulating individual vehicles and their interactions at intersections.

3. System Dynamics: Focuses on understanding the feedback loops and time delays within complex systems.

Example: Modeling population dynamics using birth and death rates to simulate future population growth.

Verification and Validation

1. Verification: The process of ensuring that the model works as intended and that there are no errors in the design, coding, or logic of the model.

Example: Checking whether a simulation program correctly implements the equations and logic as defined in the model.
2. Validation: The process of ensuring that the model accurately
represents the real-world system. Validation confirms whether the
model’s predictions align with actual observed outcomes.

Example: Comparing simulation results with real-world traffic data to see if the model accurately predicts traffic patterns.

Simulation Tools

1. Monte Carlo Tools: Tools like @Risk and Crystal Ball can be used to
perform Monte Carlo simulations for risk analysis and decision support.

Example: Using @Risk to model the financial risk of an investment portfolio.

2. AnyLogic: A multi-method simulation modeling tool that supports discrete event, agent-based, and system dynamics modeling.

Example: Using AnyLogic to simulate supply chain logistics and optimize inventory management.
3. Arena: A discrete event simulation software widely used for modeling
manufacturing processes, queuing systems, and other business
processes.

Example: Simulating the operation of a manufacturing plant to optimize production efficiency.

4. Simulink: Used for modeling and simulating dynamic systems, particularly in engineering and control systems.

Example: Simulating the dynamics of a robotic arm for precise movement control.

TOPIC 4.

Data Collection, Analysis, and Visualization

Sampling
Definition: Sampling is the process of selecting a subset of individuals or
observations from a larger population to estimate characteristics of the
whole population.

Why Sampling?

Cost-effective: It is often impractical or expensive to collect data from an entire population, so sampling allows for more efficient data collection.

Time-saving: Sampling allows for quicker results since collecting data from
the entire population can be time-consuming.

Statistical Inference: Sampling enables statistical methods to make inferences about the entire population based on the sample.

Types of Sampling:

1. Random Sampling: Each individual in the population has an equal chance of being selected.

Example: Randomly selecting survey participants from a list of registered voters.

2. Stratified Sampling: The population is divided into subgroups (strata), and samples are taken from each subgroup.

Example: Dividing a population by age groups (e.g., 18-24, 25-34) and selecting random samples from each group to ensure representation across all age categories.

3. Systematic Sampling: Every nth item is selected from a list or ordered population.

Example: Selecting every 10th customer from a queue to survey about their experience (see the pandas sketch after this list).

4. Cluster Sampling: The population is divided into clusters, and entire clusters are randomly selected.

Example: Randomly selecting a few schools from a district and surveying all students in those schools.

5. Convenience Sampling: Sampling based on the ease of access or availability of the data.

Example: Surveying people who are readily available, such as walking through a mall and asking shoppers for their opinions.
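
The pandas sketch referenced above shows how random, stratified, and systematic samples might be drawn from a hypothetical table of survey respondents; the DataFrame and its columns are invented for illustration.

```python
# Illustrative sampling methods with pandas (hypothetical respondent data).
import pandas as pd

population = pd.DataFrame({
    "respondent_id": range(1, 1001),
    "age_group": ["18-24", "25-34", "35-44", "45+"] * 250,
})

# 1. Simple random sampling: every row has an equal chance of selection.
random_sample = population.sample(n=100, random_state=42)

# 2. Stratified sampling: draw the same number of rows from each age group.
stratified_sample = population.groupby("age_group", group_keys=False).sample(n=25, random_state=42)

# 3. Systematic sampling: take every 10th respondent.
systematic_sample = population.iloc[::10]

print(len(random_sample), len(stratified_sample), len(systematic_sample))
```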

Data Collection

Definition: Data collection is the systematic process of gathering information from various sources to answer research questions or to inform decision-making.

Methods of Data Collection:

1. Surveys and Questionnaires:

A common method for collecting data, where participants respond to a set of questions.

Example: An online survey asking customers about their satisfaction with a product.

2. Interviews:

Collecting qualitative data through one-on-one or group interviews.

Example: Conducting interviews with employees to understand workplace satisfaction.
3. Observations:

Collecting data through direct observation of behaviors or events.

Example: Observing how customers navigate a retail store to understand purchasing behavior.

4. Existing Data:

Using already available datasets from previous studies, databases, or public sources.

Example: Accessing government statistics for demographic research.

5. Experiments:

Conducting controlled experiments where researchers manipulate variables to observe effects.

Example: Testing the effectiveness of a new drug by randomly assigning participants to treatment and control groups.
Analysis

Definition: Data analysis involves inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making.

Steps in Data Analysis:

1. Data Cleaning:

Addressing issues like missing values, outliers, and inconsistencies in the dataset to ensure the data is ready for analysis.

Example: Removing duplicate entries from a customer database or filling in missing demographic information.

2. Exploratory Data Analysis (EDA):

Using visual and statistical methods to explore and understand the data
before applying complex modeling techniques.

Example: Generating summary statistics like mean, median, and standard deviation to understand the central tendency of customer ages in a dataset.
3. Statistical Analysis:

Using statistical techniques to examine the relationships between variables and test hypotheses.

Example: Conducting a t-test to determine if there is a significant difference between the average sales of two products.

4. Predictive Analysis:

Using historical data to make predictions about future events or outcomes.

Example: Using past sales data to forecast future demand for a product.

5. Descriptive Analysis:

Summarizing the data to describe its main features.

Example: Calculating the average income of survey respondents to describe the income level of a population.
Visual Results

Definition: Data visualization is the process of representing data in a graphical format, such as charts, graphs, and maps, to help communicate findings effectively.

Types of Data Visualizations:

1. Bar Charts:

Display categorical data with rectangular bars to represent the frequency or amount of data for each category.

Example: A bar chart showing the number of sales for each product category.

2. Histograms:

Similar to bar charts but used for continuous data. They show the distribution
of a single variable.

Example: A histogram representing the distribution of exam scores in a class.


3. Pie Charts:

Display proportions of a whole. Each slice represents a category’s contribution to the total.

Example: A pie chart showing the market share of different smartphone brands.

4. Line Charts:

Used to display data trends over time, where each point represents a data
value at a specific time.

Example: A line chart showing the stock price of a company over the past
year.

5. Scatter Plots:

Represent data points on a two-dimensional plane to identify relationships between two variables.

Example: A scatter plot showing the correlation between advertising spend and sales revenue.
6. Heatmaps:

A graphical representation of data where values are depicted by color gradients, used for analyzing complex datasets with multiple variables.

Example: A heatmap showing the intensity of website clicks across different sections of a webpage.

7. Box Plots:

Visualize the distribution of a dataset through quartiles and highlight outliers.

Example: A box plot showing the distribution of income levels across different age groups.

TOPIC 5.

Statistical Analysis
Sampling Distributions

Definition: A sampling distribution is the probability distribution of a given statistic based on a random sample. It describes how the sample statistic (e.g., sample mean, sample proportion) varies from one sample to another drawn from the same population.

Central Limit Theorem (CLT): The Central Limit Theorem states that the
distribution of the sample mean will be approximately normal (bell-shaped) if
the sample size is sufficiently large, regardless of the shape of the population
distribution. This is fundamental in statistical inference.

Example: If you take many random samples from a population and calculate
the mean for each sample, the distribution of those sample means will
approach a normal distribution as the sample size increases.

Standard Error: The standard error is the standard deviation of the sampling
distribution. It represents the variability of the sample statistic.

Formula: SE = σ / √n

Where:

σ = population standard deviation

n = sample size

Example: If the population mean income is $50,000 with a standard deviation of $5,000, and a sample of 100 people is taken, the standard error of the mean is 5,000 / √100 = $500.
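
The short numpy sketch below checks this empirically: repeated sample means from a skewed income-like population spread out by roughly σ/√n. The gamma-shaped population is an illustrative assumption chosen so that its mean and standard deviation match the $50,000 and $5,000 figures above.

```python
# Empirical check of the standard error: sigma / sqrt(n) = 5000 / 10 = 500.
import numpy as np

rng = np.random.default_rng(0)
# Skewed population with mean ~50,000 and std dev ~5,000 (assumed shape).
population = rng.gamma(shape=100, scale=500, size=1_000_000)

sample_means = [rng.choice(population, size=100).mean() for _ in range(2_000)]

print("Theoretical standard error:", 5000 / np.sqrt(100))        # 500.0
print("Observed SD of sample means:", round(float(np.std(sample_means)), 1))
```

The observed spread of the sample means comes out close to 500, and their histogram looks approximately normal, as the Central Limit Theorem predicts.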

Random Variables, Probability Distributions, and Statistical Functions

1. Random Variable:

A random variable is a variable whose values are determined by the outcomes of a random phenomenon.

Types:

Discrete Random Variable: Takes on a finite or countably infinite set of values. E.g., number of heads in 10 coin flips.

Continuous Random Variable: Can take on any value within a given range.
E.g., height of a person or temperature at a specific location.

2. Probability Distribution:
A probability distribution describes the likelihood of each possible outcome of
a random variable.

Types of Distributions:

Binomial Distribution (for discrete variables): Used when there are two
possible outcomes, such as success/failure in trials.

Example: Tossing a coin 10 times and counting the number of heads.

Normal Distribution (for continuous variables): A bell-shaped curve where most of the values cluster around the mean.

Example: Heights of people in a population.

Poisson Distribution: Used for modeling the number of events occurring within a fixed interval of time or space.

Example: The number of customer arrivals at a store in an hour.

3. Statistical Functions:
Mean: The average of all values.

Formula: x̄ = (Σ xᵢ) / n

Variance: Measures the spread of the data.

Formula: s² = Σ (xᵢ − x̄)² / (n − 1) (sample variance)

Standard Deviation: The square root of the variance; it gives a sense of how spread out the data is.

Formula: s = √(s²)
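
The numpy/scipy sketch below computes these summary statistics for a small made-up dataset and evaluates the three distributions listed earlier; all numbers are illustrative.

```python
# Basic statistical functions and common probability distributions (illustrative).
import numpy as np
from scipy import stats

data = np.array([12, 15, 14, 10, 18, 20, 16])
print("Mean:", data.mean())
print("Sample variance:", data.var(ddof=1))    # ddof=1 -> divide by n - 1
print("Sample std dev:", data.std(ddof=1))

# Binomial: probability of exactly 6 heads in 10 fair coin tosses.
print("P(X = 6):", stats.binom.pmf(6, n=10, p=0.5))

# Normal: probability a value falls below 180 if the mean is 170 and sd is 8.
print("P(X < 180):", stats.norm.cdf(180, loc=170, scale=8))

# Poisson: probability of exactly 3 arrivals in an hour when the mean rate is 5.
print("P(X = 3):", stats.poisson.pmf(3, mu=5))
```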

Statistical Inference

Definition: Statistical inference is the process of drawing conclusions about a population based on a sample. It allows researchers to make estimates or test hypotheses about a population without having to collect data from every individual in the population.
Key Concepts in Statistical Inference:

1. Point Estimation:

A point estimate is a single value used to estimate a population parameter.

Example: Using the sample mean as an estimate of the population mean.

2. Confidence Intervals:

A confidence interval is a range of values used to estimate the true value of a population parameter. It is associated with a confidence level, such as 95%.

Formula: CI = X̄ ± Z (σ / √n)

Where:

X̄ = sample mean

Z = Z-score corresponding to the desired confidence level (e.g., 1.96 for 95% confidence)

σ = population standard deviation

n = sample size


Example: A 95% confidence interval for the average salary of employees might be ($45,000, $50,000), meaning we can be 95% confident that the interval contains the true population mean.
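
A short sketch of the calculation follows; the salary values are fabricated, and σ is treated as known so the Z formula above applies (with σ unknown, a t-based interval would normally be used instead).

```python
# 95% confidence interval for a mean, using the Z formula from the notes.
import numpy as np

sample = np.array([46_000, 48_500, 47_200, 49_800, 45_500, 50_200, 47_900])
sigma = 5_000     # assumed known population standard deviation
z = 1.96          # Z-score for 95% confidence

x_bar = sample.mean()
margin = z * sigma / np.sqrt(len(sample))
print(f"95% CI: ({x_bar - margin:,.0f}, {x_bar + margin:,.0f})")
```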

3. Hypothesis Testing:

Hypothesis testing is used to make decisions about population parameters based on sample data. It involves comparing a null hypothesis (H₀) against an alternative hypothesis (H₁).

Steps:

1. State the hypotheses (Null and Alternative).

2. Select the significance level (α), typically 0.05 or 0.01.

3. Compute the test statistic (e.g., t-test, chi-square test).

4. Determine the p-value: The probability of observing the test statistic or something more extreme, assuming the null hypothesis is true.

5. Decision: If the p-value is less than the significance level (α), reject the
null hypothesis.
Example: Testing whether a new drug is effective. Null hypothesis: The drug
has no effect. Alternative hypothesis: The drug is effective. If the p-value is
less than 0.05, reject the null hypothesis.
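
A minimal scipy sketch of these steps is shown below, comparing the average sales of two products; the sales figures are fabricated for illustration.

```python
# Two-sample t-test: is there a significant difference in average sales?
from scipy import stats

product_a = [120, 135, 128, 140, 132, 125, 138]
product_b = [110, 118, 122, 115, 120, 117, 119]

t_stat, p_value = stats.ttest_ind(product_a, product_b)

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```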

4. Types of Tests:

Z-test: Used when the sample size is large (n > 30) and the population
variance is known.

T-test: Used when the sample size is small (n < 30) and the population
variance is unknown.

Chi-Square Test: Used to test relationships between categorical variables.

Example: Testing if there is a relationship between gender and choice of product.


TOPIC 6.

Text Analytics

Natural Language Basics


Definition: Natural language refers to the language that humans use to communicate, such as English, Spanish, or Chinese. In the context of text analytics, it involves processing and analyzing text data to extract meaningful information.

Natural Language Processing (NLP) is the field of study that focuses on enabling computers to understand, interpret, and manipulate human language.

Goal of NLP: To convert human language into a form that machines can
process, and then extract useful insights from that data.

Key Concepts in NLP:

1. Tokenization:

The process of splitting text into individual units, such as words or phrases,
known as tokens.

Example: The sentence “Text analytics is exciting” is tokenized into [“Text”, “analytics”, “is”, “exciting”].

2. Stop Words:
Common words (such as “the”, “and”, “is”) that do not carry significant
meaning and are often removed in text analysis to focus on the more
important terms.

3. Stemming:

The process of reducing words to their root form by chopping off suffixes. For example, “running” becomes “run” and “studies” becomes “studi”.

4. Lemmatization:

Similar to stemming, but it involves reducing words to their base or dictionary form (lemma). For example, “better” becomes “good”, and “running” becomes “run”.

5. Part of Speech Tagging (POS):

Assigning labels to each word in a sentence based on its grammatical role, such as noun, verb, adjective, etc.

Example: In the sentence “The dog runs fast”, the POS tags would be:
[(“The”, “DT”), (“dog”, “NN”), (“runs”, “VBZ”), (“fast”, “RB”)].
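
The NLTK sketch below ties these concepts together (tokenization, stop-word removal, stemming, lemmatization, and POS tagging); the example sentence is invented, and the exact resource names to download can vary slightly between NLTK versions.

```python
# Basic NLP preprocessing with NLTK (illustrative sentence).
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the corpora/models used below.
for pkg in ["punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"]:
    nltk.download(pkg, quiet=True)

sentence = "The students are running several text analytics experiments"

tokens = nltk.word_tokenize(sentence)                         # tokenization
content = [t for t in tokens
           if t.lower() not in stopwords.words("english")]    # stop-word removal

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print([stemmer.stem(t) for t in content])                     # stemming
print([lemmatizer.lemmatize(t, pos="v") for t in content])    # lemmatization
print(nltk.pos_tag(tokens))                                   # POS tagging
```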
Text Analytics Methods

Definition: Text analytics methods are the techniques and algorithms used to
analyze and extract meaningful patterns and insights from text data.

Key Methods in Text Analytics:

1. Sentiment Analysis:

The process of determining the sentiment (positive, negative, or neutral) expressed in a piece of text.

Example: Analyzing customer reviews to determine if they are positive (e.g., “Great product!”) or negative (e.g., “Very poor quality”).

2. Topic Modeling:

A method for discovering abstract topics within a collection of texts. It groups words that frequently occur together into topics.
Example: A set of news articles might be analyzed to identify topics like
“politics”, “sports”, and “technology”.

3. Text Classification:

Categorizing text into predefined classes or categories based on its content.

Example: Categorizing news articles as “sports”, “politics”, or “entertainment”.

4. Named Entity Recognition (NER):

The process of identifying and classifying entities (such as names of people, organizations, dates, locations) in text.

Example: “Apple Inc. was founded by Steve Jobs in 1976” would result in entities like “Apple Inc.” (organization), “Steve Jobs” (person), and “1976” (date).

5. Text Summarization:

Creating a concise summary of a longer text document while retaining its essential meaning.
Example: Summarizing a lengthy research paper into a few key sentences
highlighting the major findings.

Applications of Text Analytics

Definition: Text analytics has various real-world applications where extracting insights from large volumes of textual data is essential.

Key Applications:

1. Customer Feedback Analysis:

Companies use text analytics to analyze customer reviews, surveys, and social media posts to understand customer sentiment, identify product issues, and improve customer satisfaction.

Example: Analyzing tweets about a product to assess public sentiment and identify any recurring complaints.

2. Social Media Monitoring:


Text analytics is used to track public opinion on social media platforms, detect trends, and monitor brand health.

Example: Analyzing Twitter feeds to track public sentiment about a political figure or event.

3. Healthcare Text Mining:

Analyzing medical records, research papers, and clinical notes to extract valuable information for improving patient care and supporting medical research.

Example: Extracting information about patient symptoms and diagnoses from electronic health records.

4. Legal Document Analysis:

Text analytics can be applied to legal documents, contracts, and case law to
identify key terms, clauses, and precedents that are relevant to a case.

Example: Automating the extraction of key clauses from contracts to assist in legal research and due diligence.
5. Fraud Detection:

Analyzing emails, transaction descriptions, and other forms of text to detect fraudulent activities or patterns.

Example: Detecting fraudulent insurance claims by analyzing the language used in claim descriptions.

Text Analytics Tools

Definition: Text analytics tools are software platforms that facilitate the
process of analyzing, processing, and extracting insights from textual data.

Popular Text Analytics Tools:

1. NLTK (Natural Language Toolkit):

A comprehensive Python library for working with human language data. It includes functions for tokenization, parsing, stemming, POS tagging, and more.

Example: Using NLTK to perform sentiment analysis on a set of customer reviews.
2. spaCy:

A fast and efficient NLP library for Python, often used for large-scale text
processing tasks such as tokenization, named entity recognition, and
dependency parsing.

Example: Extracting entities like people, organizations, and dates from a news article using spaCy.
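
A minimal spaCy sketch of named entity recognition is shown below; it assumes the small English model has already been installed with python -m spacy download en_core_web_sm.

```python
# Named entity recognition with spaCy, using the sentence from the NER example.
import spacy

nlp = spacy.load("en_core_web_sm")   # small English pipeline (installed separately)
doc = nlp("Apple Inc. was founded by Steve Jobs in 1976.")

for ent in doc.ents:
    print(ent.text, "->", ent.label_)   # e.g. Apple Inc. -> ORG, 1976 -> DATE
```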

3. TextBlob:

A simple Python library for processing textual data. It provides basic functionalities like part-of-speech tagging, noun phrase extraction, and sentiment analysis.

Example: Performing basic sentiment analysis of product reviews using TextBlob.

4. Apache OpenNLP:
An open-source machine learning-based toolkit for processing natural
language text. It supports various tasks such as tokenization, POS tagging,
and parsing.

Example: Using OpenNLP to parse text and extract sentence structures.

5. Google Cloud Natural Language API:

A cloud-based API that allows developers to perform tasks like sentiment analysis, entity recognition, and syntax analysis without needing to manage the underlying infrastructure.

Example: Analyzing customer feedback from surveys using the Google Cloud
Natural Language API.

TOPIC 7.

Predictive Analytics

Definition: Predictive analytics uses statistical techniques, machine learning models, and data mining to analyze historical data and make predictions about future events. It identifies patterns and trends in data that can be used to forecast outcomes.

 Goal of Predictive Analytics: The goal is to forecast the likelihood of future events based on past data, enabling businesses and organizations to make informed decisions.
 Key Techniques:
1. Regression Analysis: A statistical method used to predict the
value of a dependent variable based on the value of one or more
independent variables.
 Example: Predicting house prices based on features like size, location, and number of bedrooms (a scikit-learn sketch follows this list).
2. Time Series Forecasting: A method used for forecasting future
values based on past data collected over time.
 Example: Predicting stock prices or sales numbers over
the next few months based on historical trends.
3. Classification Models: These models predict categorical
outcomes, such as whether an event will happen or not.
 Example: Predicting whether a customer will churn (leave
a service) or remain loyal based on their behavior and
historical data.
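
The scikit-learn sketch referenced in the list above illustrates regression-based prediction; the house sizes and prices are fabricated purely for demonstration.

```python
# Linear regression predicting house prices from size and bedroom count (illustrative data).
from sklearn.linear_model import LinearRegression

X = [[90, 2], [120, 3], [150, 3], [200, 4], [250, 5]]   # size (m^2), bedrooms
y = [150_000, 195_000, 240_000, 310_000, 380_000]       # sale prices

model = LinearRegression().fit(X, y)
predicted = model.predict([[170, 4]])                    # a new, unseen house
print(f"Predicted price: {predicted[0]:,.0f}")
```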

Uses of Predictive Analytics

Predictive analytics is used across a wide range of industries to inform decisions and strategies.

1. Customer Segmentation:
o Businesses use predictive analytics to segment customers into
groups based on their likelihood to respond to certain offers,
make purchases, or engage with the brand.
o Example: Retailers predicting which customers are most likely to
buy specific products during a sale.

2. Risk Management:
o Predictive analytics helps organizations assess and manage risks
by forecasting potential threats or financial losses.
o Example: Banks using predictive models to assess the likelihood
of loan defaults based on borrower history.

3. Supply Chain Optimization:


o By analyzing past demand patterns, predictive analytics helps
businesses predict future demand and optimize inventory and
logistics.
o Example: A retail company predicting demand for certain
products during peak shopping seasons and adjusting inventory
accordingly.

4. Fraud Detection:
o Predictive analytics models can analyze patterns in transactional
data to identify anomalies that may indicate fraudulent activities.
o Example: Credit card companies predicting fraudulent
transactions by analyzing spending behavior patterns.

5. Maintenance and Operations:


o Predictive maintenance uses analytics to predict when
equipment or machinery is likely to fail, allowing businesses to
perform maintenance before failures occur.
o Example: Airlines using predictive analytics to schedule
maintenance for aircraft based on usage patterns and historical
data.

6. Healthcare and Medical Predictions:


o Predictive analytics can be used to predict patient outcomes,
disease outbreaks, and treatment effectiveness based on
historical medical data.
o Example: Predicting the likelihood of a patient developing a
chronic condition like diabetes based on medical history and
lifestyle data.

Types of Predictive Analytical Models

Predictive analytics relies on various models, each suited for different types
of problems. Some common predictive models include:

1. Linear Regression:
o A statistical method used to model the relationship between a
dependent variable and one or more independent variables. It is
often used for predicting continuous outcomes.
o Example: Predicting the sales revenue of a product based on its
advertising spend.

2. Logistic Regression:
o A type of regression used for binary classification problems,
where the outcome is one of two categories (e.g., yes/no,
win/lose).
o Example: Predicting whether a customer will purchase a product
(yes/no) based on past behavior and demographics.

3. Decision Trees:
o A machine learning model that splits data into subsets based on
the value of input features to make predictions. It’s a popular
method for classification and regression tasks.
o Example: Predicting whether a loan application will be approved
based on features such as credit score, income, and loan
amount.

4. Random Forest:
o An ensemble method that uses multiple decision trees to
improve accuracy and reduce overfitting. It aggregates the
predictions from multiple trees to make a final prediction.
o Example: Predicting whether a customer will churn using various behavioral and demographic features (a scikit-learn sketch appears after this list).

5. Support Vector Machines (SVM):


o A powerful machine learning algorithm used for classification
tasks. It works by finding the optimal hyperplane that separates
classes of data.
o Example: Classifying emails as spam or not spam based on their
content.

6. Neural Networks:
o A model inspired by the human brain, neural networks consist of
layers of nodes that process data through activation functions.
They are particularly useful for handling complex patterns and
large datasets.
o Example: Predicting stock market trends based on historical
prices and technical indicators.

7. K-Nearest Neighbors (KNN):


o A simple algorithm that makes predictions based on the majority
vote of the nearest data points. It is commonly used for
classification tasks.
o Example: Classifying new customer data into predefined
categories based on the behavior of similar customers.

8. Ensemble Methods:
o These methods combine multiple models to improve predictive
performance. Popular ensemble techniques include boosting,
bagging, and stacking.
o Example: Using an ensemble of decision trees and logistic
regression models to predict customer behavior.
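
The scikit-learn sketch referenced in the Random Forest entry above trains a churn-style classifier on a synthetic dataset; the generated features stand in for real behavioral and demographic data.

```python
# Random forest classifier on synthetic "churn" data (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for behavioral/demographic customer features.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```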
