Topic Modeling P.P.T
Prepared By:
Maryam Kashani
Rozhin Ahmadi Khamene
Zoya Chegini
Ahmad Shahriari
Table of Contents
01 An Introduction to Topic Modeling
02 Topic Modeling Application
03 Topic Modelling Techniques
04 How to Apply STM?
05 Techniques, Challenges, and Future Directions for STM
06 Conclusion
An Introduction to Topic Modeling
LSA is much faster to train than LDA, but has lower accuracy.
Topic Modeling Techniques
LDA & LSA Hierarchy
Differences Between LSA and LDA
Latent Semantic Analysis (LSA)
Type: Non-probabilistic model
Strengths:
Fast computation for smaller datasets.
Effective at revealing hidden relationships between terms.
Limitations:
Lacks interpretability of topics.
Sensitive to noise in data.
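LSA is usually described as a truncated singular value decomposition (SVD) of a term-document matrix. The sketch below illustrates how that can reveal a hidden relationship between two terms that never co-occur. The tiny matrix, the term list, and the helper names are illustrative assumptions, and the power iteration is only a standard-library stand-in for a proper SVD routine.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def norm(vector):
    return math.sqrt(sum(x * x for x in vector))

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def truncated_svd(matrix, k, iterations=200):
    """Top-k singular triplets (sigma, u, v) via power iteration with deflation."""
    residual = [row[:] for row in matrix]
    triplets = []
    for _ in range(k):
        transposed = [list(col) for col in zip(*residual)]
        # Deterministic, non-uniform starting vector (one entry per document).
        v = [1.0 + 0.1 * j for j in range(len(transposed))]
        for _ in range(iterations):
            w = matvec(transposed, matvec(residual, v))
            scale = norm(w)
            v = [x / scale for x in w]
        av = matvec(residual, v)
        sigma = norm(av)
        u = [x / sigma for x in av]
        triplets.append((sigma, u, v))
        # Deflate: remove the component just found before looking for the next.
        residual = [[residual[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
                    for i in range(len(u))]
    return triplets

# Hypothetical term-document counts: rows are terms, columns are documents.
terms = ["cat", "kitten", "pet", "stock", "market"]
term_doc = [
    [1, 0, 0, 0],  # cat
    [0, 1, 0, 0],  # kitten
    [1, 1, 0, 0],  # pet
    [0, 0, 1, 2],  # stock
    [0, 0, 1, 1],  # market
]

triplets = truncated_svd(term_doc, k=2)
# Each term is represented by its coordinates in the 2-dimensional latent space.
term_vectors = {t: [s * u[i] for s, u, _ in triplets] for i, t in enumerate(terms)}

raw_similarity = cosine(term_doc[0], term_doc[1])
latent_similarity = cosine(term_vectors["cat"], term_vectors["kitten"])
print("cat vs kitten, raw counts:", raw_similarity)
print("cat vs kitten, latent space:", round(latent_similarity, 3))
```

"cat" and "kitten" never share a document, so their raw count vectors are orthogonal, but both co-occur with "pet", which places them close together in the latent space.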
Latent Dirichlet Allocation (LDA)
Focus: Identifies latent topics in a corpus and how these topics are distributed across documents.
Applications: Topic modeling in large text corpora, such as academic papers or social media content.
Strengths:
Provides interpretable topics linked to specific words.
Handles large datasets effectively.
Limitations:
Requires careful tuning of hyperparameters.
Computationally intensive compared to LSA.
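To make "how topics are distributed across documents" concrete, here is a minimal sketch of the generative story LDA assumes: each document draws its own topic proportions from a Dirichlet distribution, then each word is drawn from one of the topics. The two topics, their word probabilities, and the `alpha` values are hypothetical, and the sketch only generates text; it does not fit a model.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one document under the LDA generative assumptions."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Hypothetical topics: each maps words to probabilities.
topics = [
    {"election": 0.5, "vote": 0.3, "party": 0.2},
    {"goal": 0.4, "match": 0.4, "team": 0.2},
]

rng = random.Random(0)
theta, words = generate_document(topics, alpha=[0.5, 0.5], length=8, rng=rng)
print("topic proportions:", [round(t, 2) for t in theta])
print("document:", " ".join(words))
```

Fitting LDA runs this story in reverse: given only the words, it estimates the topic proportions and the word distributions.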
Structural Topic Modeling (STM)
Type: Probabilistic topic model that builds on LDA principles and incorporates document-level metadata.
Focus: Allows for the inclusion of external variables (e.g., author, date) to understand topic prevalence better.
Applications: Social science research, political discourse analysis, and marketing insights.
Strengths:
Provides richer insights by considering contextual factors.
Enhances interpretability by linking topics to metadata.
Limitations:
More complex and computationally demanding than both LSA and LDA.
Requires careful selection and preprocessing of metadata.
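One way to picture how metadata drives topic prevalence: in STM a document's topic proportions follow a logistic-normal distribution whose mean is a linear function of the document's covariates. The sketch below shows only that mean mapping and leaves out the random variation around it. The covariate `is_recent` and the coefficient values in `gamma` are made up for illustration.

```python
import math

def expected_topic_proportions(covariates, gamma):
    """Map covariates to topic proportions at the mean of the latent variable.

    covariates: list of p values (the first is the intercept, fixed at 1.0)
    gamma: p x (K-1) coefficient matrix; the K-th topic is the reference
           category with its latent value fixed at 0.
    """
    n_free_topics = len(gamma[0])
    eta = [sum(x * gamma[i][j] for i, x in enumerate(covariates))
           for j in range(n_free_topics)]
    eta.append(0.0)  # reference topic
    # Softmax, shifted by the maximum for numerical stability.
    shift = max(eta)
    exps = [math.exp(e - shift) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients: rows are [intercept, is_recent], columns are topics 1 and 2.
gamma = [
    [0.0, 0.5],
    [1.5, -0.5],
]

older_doc = expected_topic_proportions([1.0, 0.0], gamma)
recent_doc = expected_topic_proportions([1.0, 1.0], gamma)
print("older document: ", [round(p, 3) for p in older_doc])
print("recent document:", [round(p, 3) for p in recent_doc])
```

With these made-up coefficients, switching `is_recent` from 0 to 1 raises the expected share of topic 1, which is the kind of prevalence effect the slide refers to.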
How to Apply STM?
01 Data Collection
02 Data Preparation
03 Topic Modeling
04 Topic Visualization
05 Topic Interpretation
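The slide does not spell out what step 02, Data Preparation, involves. A minimal sketch, assuming it covers lowercasing, tokenizing, stop-word removal, and building per-document word counts, might look like this. The stop-word list and example documents are illustrative only.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real projects use a much longer one.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

def prepare(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def document_term_counts(docs):
    """One word-count table per document, ready for a topic model."""
    return [Counter(prepare(d)) for d in docs]

docs = [
    "The economy is growing.",
    "Growth of the economy and jobs.",
]
counts = document_term_counts(docs)
for doc, table in zip(docs, counts):
    print(doc, "->", dict(table))
```

The resulting counts, together with the document metadata, are the usual input to step 03.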
Software for Running STM
Heuristic Description of STM Package Features
How does STM work
Pkojhnklmcjidsovmcgpseprgzjtsevygtzjhivotyhysopzfesxfopyz0fuerziogjrtyuiogsfoaesdyucghjsduicyh
ajskifuohydvcbjsdkhyifcvxcuoiahscpuoozdsgsbzunvaschiszuofhpoiuytwfdghjklkmanbvcxzsdrtyuiawu
ystrdfhjkaociucyxfvwsbndpscoiuydsgcvbneldpeoiuyctsrfcvbjskociusydgwbnelqwoiduxysgbnxkcijusyd
gwhkdofciudsgvcbnmklskodiuygshdjckos0duyfhjskelpfcouehfjkoisdcv9uyhdjcnhuytfsdcgvchysdtrfcvb
njkdiokjflefkiuvhsncjhuysdtgfvdbhaejwiduhj2keiuydtfgshu7yawdhjkoiuryqghbcpsoiugvbhgfreqtyauwijd
kcbxvgctyefuijdcxfdstkjvhhyidsfbsyiecngisegfaonxfanifxucngrnysdnfygusdgvuifreovbvnhjdkwsjfrdegts
ujwldqwdefefwefwefjwiduhj2keiuydtfgshu7yawdhjkoiuryqghbcpsoiugvbhgfreqtyauwijdkcbxvgctyefuijd
cxfdstkjvhhyidsyfhjskelpfcouehfjkoisdcv9uyhdjcnhuytfsdcgvchysdtrfcvbnjkdiokjflefkiuvhsncjhuysdtgfv
dbhaejwiduhj2keiuydtfgshu7yawdhjkoiuryqghbcpsoiugvbhgfreqtyauwijdkcbxvgctyefuijdcxfdstkjvhhyid
sfbsyiecngisegPkojhnklmcjidsovmcgpseprgzjtsevygtzjhivotyhysopzfesxfopyz0fuerziogjrtyuiogsfoaes
dyucghjsduicyhajskifuohydvcbjsdkhyifcvxcuoiahscpuoozdsgsbzunvaschiszuofhpoiuytwfdghjklkmanb
vcxzsdrtyuiawuystrdfhjkaociucyxfvwsbndpscoiuydsgcvbneldpeoiuyctsrfcvbjskociusydgwbnelqwoidux
ysgbnxkcijusydgwhkdofciudsgvcbnmklskodiuygshdjckos0duyfhjskelpfcouehfjkoisdcv9uyhdjcnhuytfsd
cgvchysdtrfcvbnjkdiokjflefkiuvhsncjhuysdtgfvdbhaejwiduhj2keiuydtfgshu7yawdhjkoiuryqghbcpsoiugv
bhgfreqtyauwijdkcbxvgctyefuijdcxfdstkjvhhyidsfbsyiecngisegfaonxfanifxucngrnysdnfypoiuyopokouytf
eertyuipomzx
[Annotation on the sample text: "25 0 letters"]
[Chart: Number of first 4 letters of the alphabet in each part of the sample text. x-axis: Part 1, Part 2, Part 3, Part 4. Bar labels as extracted: 73, 66, 62, 51.]
A higher k (number of topics) narrows each topic's focus, like tunnel vision on a single subject.
[Chart: Number of 3rd letter of the alphabet in each part of the sample text. x-axis: Part 1, Part 2, Part 3, Part 4. Bar labels as extracted: 29, 19, 19, 17.]
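The letter-counting exercise on these slides can be reproduced in a few lines. This is one reading of the analogy: a broad "topic" (the letters a to d) is compared with a narrow one (the letter c alone) across four equal parts of a text. The short sample string is made up and is not the text from the slides.

```python
def split_into_parts(text, parts=4):
    """Split text into `parts` consecutive chunks of near-equal length."""
    size, extra = divmod(len(text), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(text[start:end])
        start = end
    return chunks

def count_letters(text, letters, parts=4):
    """Count how many characters in each part belong to `letters`."""
    wanted = set(letters)
    return [sum(ch in wanted for ch in chunk)
            for chunk in split_into_parts(text, parts)]

# Hypothetical sample: four parts of eight characters each.
sample = "abcdxxxx" + "aaxxxxxx" + "ccxxxxxb" + "xxxxxxxx"

broad = count_letters(sample, "abcd")  # a broad "topic": letters a to d
narrow = count_letters(sample, "c")    # a narrow "topic": letter c only
print("a-d per part:", broad)
print("c per part:  ", narrow)
```

The narrow count is never larger than the broad one in any part, which mirrors the idea that a more specific topic captures a smaller slice of each part.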
What is FREX?
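FREX (FRequency and EXclusivity) ranks the words of a topic by a weighted harmonic mean of two ranks: how frequent the word is within the topic, and how exclusive it is to that topic compared with the others. The sketch below is a simplified illustration, not the stm package's own code. The vocabulary, the topic-word probabilities in `beta`, and the convention that `w` weights exclusivity are assumptions.

```python
def ecdf_scores(values):
    """Empirical CDF: fraction of entries less than or equal to each value."""
    n = len(values)
    return [sum(other <= v for other in values) / n for v in values]

def frex(beta, w=0.5):
    """FREX score for every (topic, word) pair.

    beta: K x V matrix of word probabilities per topic (rows are topics).
    w: weight on exclusivity; 0.5 treats both ranks equally.
    """
    n_topics, n_words = len(beta), len(beta[0])
    column_totals = [sum(beta[k][v] for k in range(n_topics))
                     for v in range(n_words)]
    scores = []
    for k in range(n_topics):
        # Exclusivity: share of a word's probability mass that sits in topic k.
        exclusivity = [beta[k][v] / column_totals[v] for v in range(n_words)]
        freq_rank = ecdf_scores(beta[k])
        excl_rank = ecdf_scores(exclusivity)
        scores.append([1.0 / (w / e + (1.0 - w) / f)
                       for e, f in zip(excl_rank, freq_rank)])
    return scores

# Hypothetical vocabulary and topic-word probabilities.
vocab = ["the", "tax", "budget", "goal"]
beta = [
    [0.40, 0.30, 0.25, 0.05],  # topic 0: public finance
    [0.40, 0.05, 0.05, 0.50],  # topic 1: football
]

scores = frex(beta)
top_by_probability = [vocab[max(range(len(vocab)), key=row.__getitem__)] for row in beta]
top_by_frex = [vocab[max(range(len(vocab)), key=row.__getitem__)] for row in scores]
print("top word by probability:", top_by_probability)
print("top word by FREX:       ", top_by_frex)
```

In topic 0 the most probable word is "the", which is equally common in topic 1, so FREX ranks the more exclusive "tax" first instead.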
Topic Identification by STM
Topical Content: This enables metadata to affect the word distribution within a topic, allowing for nuanced interpretations of how topics are framed based on external factors.
Variational Inference: The estimation process in STM is typically accomplished through a fast variational approximation, which enhances computational efficiency and scalability, particularly for large datasets [15].
Model Initialization Techniques: Proper initialization is crucial due to the non-convex nature of the posterior distribution. Techniques like spectral initialization, which uses non-negative matrix factorization, help stabilize results across different runs [15].
Model Selection and Evaluation: The selectModel function automates the evaluation of multiple models based on different initializations, allowing researchers to identify models with desirable properties. This includes calculating held-out log-likelihood and performing residual analyses to select the optimal number of topics.
Techniques, Challenges, and Future Directions
Challenges in Implementing STM
Sensitivity to Initialization: The multi-modal estimation problem can lead to different results based on initial parameter values. This necessitates careful model selection and multiple runs to ensure robustness [15].
Determining the Number of Topics: There is no definitive method for selecting the appropriate number of topics, which can lead to subjective decisions. Automated methods like 'searchK' can assist, but they may not always yield clear results [15].
Complexity of Interpretation: Analyzing and interpreting the results from STM can be complex, especially when dealing with multiple metadata covariates. Researchers must be adept at using visualization tools and statistical tests to draw meaningful conclusions from the model outputs.
The Future of Structural Topic Models:
Integration with Machine Learning: Combining STM with advanced machine learning techniques could enhance its predictive capabilities and allow for more sophisticated analyses of large text datasets [5].
Improved User Interfaces: Developing more intuitive interfaces for tools like the stm package could broaden access and usability for researchers without extensive programming backgrounds [15].
Enhanced Model Flexibility: Future iterations of STM could incorporate more flexible modeling structures that account for temporal dynamics or hierarchical relationships within data, improving its applicability across different contexts and research questions [5].
CONCLUSION
Questions?