Big Data - Unit 5
Data analysis plays a crucial role in business decision-making by providing insights into past
performance and future trends. There are four main types of data analysis techniques used across
industries, each serving a different purpose and operating at a different level of complexity.
1. Descriptive Analysis
Definition: Descriptive analysis focuses on summarizing historical data to understand what has
happened.
Key Questions: What happened? What are the key trends and patterns?
Examples:
o KPI Dashboards: Visual summaries of key metrics like sales, revenue, and customer
acquisition.
o Monthly Revenue Reports: Summaries detailing revenue performance across different
months or quarters.
o Sales Leads Overview: Analysis of leads generated and their conversion rates.
2. Diagnostic Analysis
Definition: Diagnostic analysis seeks to understand why certain events occurred by drilling
deeper into data.
Key Questions: Why did it happen? What were the causes behind the outcomes observed in
descriptive analysis?
Examples:
o Investigating Slow Shipments: Identifying factors contributing to delays in specific
regions.
o Marketing Effectiveness: Analyzing which campaigns or channels contributed most to
customer trials in a SaaS company.
3. Predictive Analysis
Definition: Predictive analysis forecasts future outcomes based on historical data and statistical
modeling.
Key Questions: What is likely to happen? What can we expect in the future based on past
trends?
Examples:
o Risk Assessment: Predicting the likelihood of default for loans or credit risk.
o Sales Forecasting: Estimating future sales based on historical sales data and market
trends (see the sketch after this list).
o Customer Segmentation: Identifying which customer segments are likely to respond
positively to marketing campaigns.
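As a minimal illustration of the forecasting idea referenced above, the sketch below fits a linear trend to made-up monthly sales figures with NumPy and extrapolates one month ahead; the data and the choice of a simple linear model are assumptions for demonstration only.

import numpy as np

# Hypothetical monthly sales (units) for the past six months -- illustration only
sales = np.array([120, 132, 141, 150, 158, 171], dtype=float)
months = np.arange(len(sales))

# Fit a straight-line trend to the historical data
slope, intercept = np.polyfit(months, sales, deg=1)

# Extrapolate the trend to the next month
forecast = slope * len(sales) + intercept
print(f"Forecast for next month: {forecast:.1f} units")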
4. Prescriptive Analysis
Definition: Prescriptive analysis recommends the actions to take, building on predictive insights
to suggest the best course of action.
Key Questions: What should we do? Which action will produce the best outcome?
Examples:
o Recommendation Engines: Suggesting products or content that individual customers are
most likely to want.
o Route Optimization: Choosing delivery routes that minimize time and cost.
Text Analysis
With text analysis, you can get accurate information from your sources more quickly. The process
is fully automated and consistent, and it produces data you can act on. For example, text analysis
software lets you immediately detect negative sentiment in social media posts so you can work to
solve the problem.
Sentiment analysis
Sentiment analysis, or opinion mining, uses text analysis methods to understand the opinion
conveyed in a piece of text. You can apply sentiment analysis to reviews, blogs, forums, and other
online media to determine whether your customers are happy with their purchases. Sentiment
analysis helps you spot new trends, track sentiment changes, and tackle PR issues. By combining
sentiment analysis with keyword tracking, you can follow changes in customer opinion and
identify the root cause of a problem.
Record management
Text analysis enables efficient management, categorization, and searching of documents. This
includes automating patient record management, monitoring brand mentions, and detecting
insurance fraud. For example, LexisNexis Legal & Professional uses text extraction to identify
specific records among 200 million documents.
Personalized customer experience
You can use text analysis software to process emails, reviews, chats, and other text-based
correspondence. With insights about customers' preferences, buying habits, and overall brand
perception, you can tailor personalized experiences for different customer segments.
Text analysis software works on the principles of deep learning and natural language processing.
Deep learning
Artificial intelligence is the field of data science that teaches computers to think like humans.
Machine learning is a technique within artificial intelligence that uses specific methods to teach or
train computers. Deep learning is a highly specialized machine learning method that uses neural
networks, software structures that mimic the human brain. Deep learning technology powers text
analysis software so these networks can read text in a way similar to the human brain.
Text Analysis Techniques
1. Text Classification
o Definition: Assigning predefined tags or categories to unstructured text.
o Applications: Sentiment analysis, topic modeling, language detection, intent
detection.
o Example: Classifying customer reviews as positive, negative, or neutral to gauge
sentiment.
2. Text Extraction
o Definition: Extracting specific pieces of data (e.g., keywords, prices, names) from
text.
o Applications: Populating spreadsheets, extracting product specifications from
reviews.
o Example: Extracting customer names and complaint details from support tickets.
3. Word Frequency
o Definition: Measuring how often words occur in a text, often weighted with TF-IDF
(term frequency-inverse document frequency); see the sketch after this list.
o Applications: Analyzing common topics or issues in customer feedback.
o Example: Identifying frequently mentioned topics like 'delivery' in negative
customer reviews.
4. Collocation
o Definition: Identifying words that frequently occur together (bigrams and
trigrams).
o Applications: Finding related terms in customer feedback or product reviews.
o Example: Identifying common phrases like 'customer support' in customer
reviews.
5. Concordance
o Definition: Showing the context and instances of words or phrases in a text.
o Applications: Understanding how specific terms are used across different
contexts.
o Example: Analyzing how the word 'simple' is used in app reviews to understand
user perceptions.
6. Word Sense Disambiguation
o Definition: Resolving ambiguity in word meanings based on context.
o Applications: Understanding multiple meanings of words like 'light' (weight,
color, etc.).
o Example: Distinguishing between different senses of 'bank' (financial institution
vs. river bank).
7. Clustering
o Definition: Grouping similar documents or texts into clusters based on similarity.
o Applications: Organizing search results, grouping related articles or documents.
o Example: Google clustering search results based on relevance to search queries.
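As a minimal sketch of three of the techniques above (word frequency, collocation, and concordance), the snippet below uses NLTK; the sample feedback is made up, and NLTK with its 'punkt' tokenizer data is assumed to be installed.

import nltk
from nltk import FreqDist, Text, word_tokenize
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Hypothetical customer feedback -- illustration only
reviews = ("The delivery was late and the delivery box was damaged. "
           "Customer support was helpful, but customer support took two days to reply.")

tokens = [t.lower() for t in word_tokenize(reviews) if t.isalpha()]

# Word frequency: count how often each word occurs
freq = FreqDist(tokens)
print(freq.most_common(5))

# Collocation: find word pairs (bigrams) that frequently occur together
finder = BigramCollocationFinder.from_words(tokens)
print(finder.nbest(BigramAssocMeasures.raw_freq, 3))

# Concordance: show each occurrence of a word in context
Text(tokens).concordance("delivery")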
Stage 1—Data gathering
Internal data
Internal data is text content that is internal to your business and is readily available—for example,
emails, chats, invoices, and employee surveys.
External data
You can find external data in sources such as social media posts, online reviews, news articles, and
online forums. It is harder to acquire external data because it is beyond your control. You might need
to use web scraping tools or integrate with third-party solutions to extract external data.
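As a minimal sketch of acquiring external data, the snippet below downloads a page and strips its HTML to plain text using the requests and BeautifulSoup libraries; the URL is a hypothetical placeholder, and both libraries are assumed to be installed.

import requests
from bs4 import BeautifulSoup

# Hypothetical review page -- replace with a real, scrape-permitted URL
url = "https://example.com/reviews"

response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML and keep only the visible text for later analysis
soup = BeautifulSoup(response.text, "html.parser")
page_text = soup.get_text(separator=" ", strip=True)
print(page_text[:200])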
Stage 2—Data preparation
Tokenization
Tokenization splits the raw text into multiple parts that make semantic sense. For example, the
phrase text analytics benefits businesses tokenizes to the words text, analytics, benefits,
and businesses.
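A minimal sketch of this step with NLTK, assuming the 'punkt' tokenizer data has been downloaded:

import nltk
from nltk import word_tokenize

# nltk.download('punkt')  # one-time download of tokenizer data, if needed
tokens = word_tokenize("text analytics benefits businesses")
print(tokens)  # ['text', 'analytics', 'benefits', 'businesses']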
Part-of-speech tagging
Part-of-speech tagging assigns grammatical tags to the tokenized text. For example, applying this
step to the previously mentioned tokens results in text: Noun; analytics: Noun; benefits: Verb;
businesses: Noun.
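Continuing the same example with NLTK's part-of-speech tagger, assuming its tagger data is available:

import nltk
from nltk import pos_tag, word_tokenize

# nltk.download('averaged_perceptron_tagger')  # one-time download, if needed
tokens = word_tokenize("text analytics benefits businesses")
print(pos_tag(tokens))  # pairs each token with a tag; exact tags depend on the model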
Parsing
Parsing establishes meaningful connections between the tokenized words using the rules of
English grammar. It helps the text analysis software visualize the relationships between words.
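One way to inspect these relationships is a dependency parse; the sketch below uses spaCy, assuming the library and its small English model en_core_web_sm are installed.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("text analytics benefits businesses")

# Print each word, its grammatical relation, and the word it depends on
for token in doc:
    print(token.text, token.dep_, token.head.text)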
Lemmatization
Lemmatization is a linguistic process that simplifies words into their dictionary form, or lemma. For
example, the dictionary form of visualizing is visualize.
Stop word removal
Stop words are words that offer little or no semantic context to a sentence, such as and, or, and for.
Depending on the use case, the software might remove them from the structured text.
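A minimal sketch of lemmatization and stop word removal with NLTK, assuming the 'wordnet' and 'stopwords' corpora have been downloaded:

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# nltk.download('wordnet'); nltk.download('stopwords')  # one-time downloads, if needed
lemmatizer = WordNetLemmatizer()
tokens = ["visualizing", "benefits", "and", "businesses", "for", "analytics"]

# Reduce each token to its lemma (treated as a verb where one exists), then drop stop words
lemmas = [lemmatizer.lemmatize(t, pos="v") for t in tokens]
filtered = [t for t in lemmas if t not in stopwords.words("english")]
print(filtered)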
Stage 3—Text analysis
Text classification
Classification is the process of assigning tags to the text data using rule-based or machine
learning-based systems.
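As a minimal machine learning sketch, the pipeline below trains a TF-IDF plus logistic regression classifier on a few made-up labeled reviews using scikit-learn; the training data is an assumption for illustration, and a real system would need far more examples.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set -- illustration only
texts = ["great product, works perfectly", "terrible quality, broke quickly",
         "love it, highly recommend", "waste of money, very disappointed"]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF features feeding a logistic regression classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["the product broke and I want a refund"]))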
Text extraction
Extraction involves identifying the presence of specific keywords in the text and associating them
with tags. The software uses methods such as regular expressions and conditional random fields
(CRFs) to do this.
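A minimal regular expression sketch of the extraction idea; the ticket text and patterns are made up for illustration, and trained extractors such as CRFs replace such hand-written rules in practice.

import re

ticket = "Order #48213: customer reports the item arrived damaged; refund of $29.99 requested."

# Hand-written patterns standing in for a trained extractor
order_id = re.search(r"Order #(\d+)", ticket)
price = re.search(r"\$(\d+\.\d{2})", ticket)

print(order_id.group(1))  # 48213
print(price.group(1))     # 29.99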
Stage 4—Visualization
Visualization turns the text analysis results into an easily understandable format. You will find
text analytics results in graphs, charts, and tables. The visualized results help you identify
patterns and trends and build action plans. For example, suppose you're getting a spike in product
returns but have trouble finding the causes. With visualization, you look for words such
as defects, wrong size, or not a good fit in the feedback and tabulate them into a chart. Then you'll
know which issue is the biggest and deserves top priority.
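A minimal sketch of that example: counting the return-reason keywords in made-up feedback and charting them with matplotlib.

import matplotlib.pyplot as plt

# Hypothetical return feedback -- illustration only
feedback = ["arrived with defects", "wrong size, had to return",
            "not a good fit for me", "wrong size again", "defects on the screen"]

# Count how often each return reason appears in the feedback
reasons = ["defects", "wrong size", "not a good fit"]
counts = [sum(reason in f for f in feedback) for reason in reasons]

# A bar chart makes the dominant issue obvious at a glance
plt.bar(reasons, counts)
plt.ylabel("Mentions")
plt.title("Return reasons in customer feedback")
plt.show()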
Ensemble methods fall into two broad categories: sequential ensemble techniques and parallel
ensemble techniques. Sequential ensemble techniques generate base learners one after another,
e.g., Adaptive Boosting (AdaBoost); the sequential generation creates dependence between the
base learners, and performance is improved by assigning higher weights to previously
misclassified examples. Parallel ensemble techniques generate base learners independently, e.g.,
bagging, and combine their predictions. The three main ensemble techniques are:
1. Bagging
2. Boosting
3. Stacking
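A minimal sketch of a sequential ensemble using scikit-learn's AdaBoostClassifier on a synthetic dataset; the data and parameters are assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data -- illustration only
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost builds weak learners one after another, reweighting misclassified examples
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))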
Variance Reduction
Ensemble methods are ideal for reducing the variance in models, thereby increasing the accuracy
of predictions. Variance is reduced when multiple models are combined into a single prediction,
typically by averaging or voting over the individual models' outputs. Because an ensemble
considers the predictions of all its base models, the resulting prediction is more stable and closer
to the best possible than the prediction of any single model.
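A minimal sketch of variance reduction through a parallel ensemble: scikit-learn's BaggingClassifier combines many decision trees trained on bootstrap samples; the data and parameters are assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data -- illustration only
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Compare a single high-variance tree with a bagged ensemble of such trees
single_tree = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

print("Single tree:", cross_val_score(single_tree, X, y, cv=5).mean())
print("Bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())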