
SOCIAL MEDIA ANALYTICS

EXPERIMENT 4
Aim: To build an LDA topic model from the given Google CSV dataset and use
word clouds to visualize the top N keywords in each topic.
Libraries Used:
NLTK: The Natural Language Toolkit, or more commonly NLTK, is a suite of
libraries and programs for symbolic and statistical natural language processing
for English written in the Python programming language. It supports
classification, tokenization, stemming, tagging, parsing, and semantic
reasoning functionalities.
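As a minimal sketch of NLTK's tokenization and stemming (the sample sentence is my own; this assumes NLTK is installed, and uses components that need no downloaded corpora):

```python
from nltk.tokenize import TreebankWordTokenizer
from nltk.stem import PorterStemmer

# Rule-based tokenizer: works without any downloaded NLTK data files
tokens = TreebankWordTokenizer().tokenize("Topic models summarize large text collections.")

# Reduce each token to its stem (PorterStemmer lowercases by default)
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]
print(stems)
```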
RE: A regular expression (or RE) specifies a set of strings that matches it; the
functions in this module let you check if a particular string matches a given
regular expression (or if a given regular expression matches a particular string,
which comes down to the same thing).
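For example, a small pattern match with the re module (the hashtag pattern and sample string are my own illustration):

```python
import re

# Find all hashtag-like tokens in a string
pattern = re.compile(r"#\w+")
matches = pattern.findall("Loving #Python and #NLP today!")
print(matches)  # ['#Python', '#NLP']
```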
NUMPY: NumPy is a library for the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along with a
large collection of high-level mathematical functions to operate on these
arrays.
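A small illustration of NumPy's multi-dimensional arrays and vectorized operations (the values are made up):

```python
import numpy as np

# A 2-D array (matrix) and a vectorized operation on it
counts = np.array([[1, 2, 3], [4, 5, 6]])
print(counts.shape)        # (2, 3)
print(counts.sum(axis=0))  # column sums: [5 7 9]
```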
Pandas: Pandas is a Python library used for working with data sets. It has
functions for analyzing, cleaning, exploring, and manipulating data. Pandas
is a fast, powerful, flexible, and easy-to-use open-source data analysis and
manipulation tool, built on top of the Python programming language.
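As a sketch of the kind of filtering used in this experiment (the rows below are made up, standing in for the Google CSV dataset):

```python
import pandas as pd

# A small DataFrame standing in for the loaded CSV data
df = pd.DataFrame({"text": ["great phone", "bad battery", "great camera"],
                   "likes": [10, 2, 7]})

# Select rows by a condition on one column
popular = df[df["likes"] > 5]["text"].tolist()
print(popular)  # ['great phone', 'great camera']
```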
Pprint: The pprint module provides a capability to “pretty-print” arbitrary
Python data structures in a form which can be used as input to the interpreter.
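For instance, nested topic-keyword output (the values here are hypothetical) is much easier to read when pretty-printed:

```python
from pprint import pprint

# Hypothetical topic -> (word, weight) output from a topic model
topics = {0: [("data", 0.12), ("model", 0.09)], 1: [("cloud", 0.11)]}
pprint(topics, width=40)  # one entry per line instead of one long repr
```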
Gensim: Gensim is an open-source library for unsupervised topic modelling,
document indexing, retrieval by similarity, and other natural language
processing functionalities, using modern statistical machine learning. Gensim is
implemented in Python.
Spacy: spaCy is a free, open-source library for NLP in Python written in Cython.
spaCy is designed to make it easy to build systems for information
extraction or general-purpose natural language processing.
Logging: is used to import the built-in logging module. This module allows you
to use a logger to log messages. Logging is the process of keeping a record
of events that occur in a computer system, these events can include problems,
errors, or information on current operations.
Matplotlib: Matplotlib is an amazing visualization library in Python for 2D
plots of arrays. Matplotlib is a multi-platform data visualization library built on
NumPy arrays and designed to work with the broader SciPy stack. It was
introduced by John Hunter in the year 2002. One of the greatest benefits of
visualization is that it allows us visual access to huge amounts of data in
easily digestible visuals. Matplotlib consists of several plots like line, bar,
scatter, histogram etc.
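As a small sketch of a Matplotlib bar plot (the topic sizes are hypothetical; the Agg backend is used so the figure renders to a file without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt

# Bar chart of (hypothetical) document counts per topic
topic_ids = [0, 1, 2]
doc_counts = [40, 25, 35]
fig, ax = plt.subplots()
ax.bar(topic_ids, doc_counts)
ax.set_xlabel("Topic")
ax.set_ylabel("Number of documents")
fig.savefig("topic_sizes.png")
```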

CoherenceModel: CoherenceModel is used for the evaluation of topic models. The
coherence score in topic modeling measures how interpretable the topics are
to humans. In this case, topics are represented as the top N words with the
highest probability of belonging to that particular topic. Briefly, the
coherence score measures how similar these words are to each other.
Lemmatize: Lemmatization is the process of grouping together the different
inflected forms of a word so they can be analyzed as a single item.
Lemmatization is similar to stemming, but it brings context to the words:
it links words with similar meanings to one word.

PyLDAvis: A Python library for interactive topic model visualization.
pyLDAvis is designed to help users interpret the topics in a topic model
that has been fit to a corpus of text data. The package extracts information
from a fitted LDA topic model to inform an interactive web-based
visualization.

Here are some theoretical points related to the experiment:

LDA topic model using Gensim: The topic-modelling strategy used by LDA is to
assign the text in a document to a specific topic, and LDA constructs two
Dirichlet distributions as its model: a distribution of topics per document
and a distribution of words per topic. During training, the LDA algorithm
re-arranges the topic-keyword assignments to produce a good composition of
both distributions: the distribution of topics within each document and the
distribution of keywords within each topic.
Every document is modelled as a multinomial distribution over topics, and
every topic is represented by a multinomial distribution over words. Because
LDA assumes that each piece of text contains related terms, we should choose
a proper corpus of data.

WordCloud: A word cloud is a data visualization technique for representing
text data in which the size of each word indicates its frequency or
importance. Significant textual data points can be highlighted using a word
cloud. Word clouds are widely used for analyzing data from social network
websites. For generating a word cloud in Python, the modules needed are
matplotlib, pandas, and wordcloud.
Bigram and Trigram Model: An N-gram is a sequence of n items (words, in this
case) from a given sample of text or speech. For example, given the text
“Susan is a kind soul, she will help you out as long as it is within her
boundaries”:
bigrams: [‘susan is’, ‘is a’, ‘a kind’, ‘kind soul’, ‘soul she’, ‘she will’,
‘will help’, ‘help you’, …]
trigrams: [‘susan is a’, ‘is a kind’, ‘a kind soul’, ‘kind soul she’,
‘soul she will’, ‘she will help’, …]
From the examples above, we can see that n in n-grams can take different
values: a sequence of 2 grams is called a bigram, and a sequence of 3 grams
is called a trigram.
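The n-gram construction above can be sketched in a few lines of plain Python (the helper name `ngrams` is my own):

```python
def ngrams(tokens, n):
    """Return the list of n-grams (joined as strings) over a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "susan is a kind soul she will help you out".split()
print(ngrams(tokens, 2)[:4])  # ['susan is', 'is a', 'a kind', 'kind soul']
print(ngrams(tokens, 3)[:2])  # ['susan is a', 'is a kind']
```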
t-SNE: t-distributed Stochastic Neighbour Embedding (t-SNE) is a technique
for dimensionality reduction. It is used to visualize high-dimensional data
by embedding it into a lower-dimensional space, such as 2D or 3D. The
algorithm models pairwise similarities in the original data to determine how
best to represent it using fewer dimensions. t-SNE is a non-linear
dimensionality reduction technique, which means it can separate data that
cannot be separated by a straight line. It can be used for data exploration
and visualization.
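A minimal t-SNE sketch using scikit-learn (the random 5-dimensional points stand in for document-topic vectors; note that perplexity must be smaller than the number of samples):

```python
import numpy as np
from sklearn.manifold import TSNE

# 20 random points in 5 dimensions, standing in for document-topic vectors
rng = np.random.default_rng(0)
X = rng.random((20, 5))

# Embed into 2D for visualization; perplexity=5 suits this tiny sample
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X)
print(embedding.shape)  # (20, 2)
```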

CONCLUSION:

This experiment successfully demonstrated the creation of an LDA (Latent
Dirichlet Allocation) topic model using the provided Google CSV dataset. LDA
is a powerful technique for uncovering hidden thematic structures within a
large collection of text documents. By employing this model, we were able to
identify and extract meaningful topics from the dataset, shedding light on
the underlying themes present in the data.
The use of word clouds to visualize the top N keywords within each
identified topic added an informative and visually engaging dimension to our
analysis. Word clouds provide a clear and concise representation of the most
prominent terms associated with each topic, aiding in the interpretation and
understanding of the topics generated by the LDA model.
Overall, this experiment showcased the potential of LDA topic modeling and
word cloud visualization as powerful tools for uncovering and communicating
the key themes and keywords within a dataset, facilitating better
comprehension and decision-making in various applications involving textual
data.
