Data Science Project
Contents
1 Abstract
2 Introduction
3 Dataset
  3.1 Data source
  3.2 Data preprocessing
  3.3 Exploratory Data Analysis
    3.3.1 Analyzing text statistics
      3.3.1.1 Number of words in reviews
      3.3.1.2 Number of stop words in reviews
    3.3.2 N-gram exploration and word cloud
    3.3.3 Topic modeling exploration with pyLDAvis
  3.4 TF-IDF
  3.5 Word Embedding (word2vec)
    3.5.1 One-Hot Vectors
    3.5.2 The Skip-Gram Model
    3.5.3 The Continuous Bag of Words Model
4 Methodology
  4.1 Machine learning methods
    4.1.1 XGBoost
    4.1.2 SVM
  4.2 Deep learning methods
    4.2.1 Convolutional neural network
    4.2.2 RNN
    4.2.3 LSTM
    4.2.4 GRU
    4.2.5 BERT
5 Results
  5.1 Metrics used
  5.2 Model selection
  5.3 Results
Sentiment analysis for up-to-date film reviews
1 Abstract
Nowadays, the number of movies being released has skyrocketed thanks to the development of the film industry. As a result, there are numerous good movies, but many bad-quality movies remain as well. On the other hand, there are only a few websites for reviewing up-to-date movies. Our project therefore gathers opinions (comments, ratings) on previously released films by scraping data from Rotten Tomatoes and IMDb, which gives us a dataset of reviews. We then clean and preprocess the dataset and apply sentiment analysis techniques to correctly classify reviewers' opinions, which reflect the quality of those movies. By doing so, we aim to create a program that can rate up-to-date movies based on their reviews.
2 Introduction
In the vibrant realm of movies, where stories unfold and emotions run high, the voices of
film enthusiasts reverberate through the digital corridors of reviews. This project sets out to
explore the beating heart of audience reactions in up-to-date film critiques. Using sentiment
analysis, we aim to tap into the diverse sentiments expressed by viewers, unraveling the threads
of excitement, disappointment, and everything in between. As we navigate the ever-changing
landscape of cinema, this analysis promises to unveil the authentic emotions that shape our
cinematic experiences.
Movies, with their ability to transport us to different worlds and evoke a myriad of feelings,
inspire conversations that resonate across online platforms. In this project, we take a closer
look at the sentiments embedded in contemporary film reviews, embracing the vast spectrum
of opinions shared by audiences. Through the lens of sentiment analysis, our goal is to capture
the essence of what viewers truly feel about the latest cinematic releases. In a world buzzing
with instant reactions, this exploration promises to uncover the collective emotions that define
our relationship with the silver screen.
In a world where every film elicits a unique emotional response, the digital landscape becomes a canvas for audiences to paint their opinions. This project immerses itself in the rich
tapestry of up-to-date film reviews, employing sentiment analysis as a compass to navigate the
sea of sentiments. As we embark on this journey, our aim is to understand the ebb and flow
of audience reactions, from the highs of cinematic delight to the lows of critique. Through the
natural language processing lens, we endeavor to capture the genuine emotions that breathe
life into the words of film enthusiasts in our interconnected world.
3 Dataset
3.1 Data source
Our dataset comprises over 140,000 film reviews gathered from prominent movie-review
platforms. The data extraction was performed on IMDB and Rotten Tomatoes, both globally
recognized platforms. Selenium, an automated web browser interaction tool, was utilized for
the crawling process. The procedure for crawling can be outlined as follows:
- Find the list of movie titles and their corresponding URL links.
- For each movie's URL, we crawl the reviewer's name, the rating, the date the review was posted, the review body, and the movie name.
- We assign each review a class based on its rating score (1-2: Very Bad, 3-4: Bad, 5-6: Decent, 7-8: Recommend, 9-10: Exceptional). A sketch of this crawling loop is shown below.
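As an illustration, here is a minimal sketch of the crawling loop with Selenium. The CSS selectors and the `movie_urls` list are hypothetical placeholders; the actual page structures of IMDB and Rotten Tomatoes differ and change over time.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

def rating_to_class(rating):
    """Map a 1-10 rating to one of the five sentiment classes."""
    return {1: "Very Bad", 2: "Very Bad", 3: "Bad", 4: "Bad",
            5: "Decent", 6: "Decent", 7: "Recommend", 8: "Recommend",
            9: "Exceptional", 10: "Exceptional"}[rating]

driver = webdriver.Chrome()
reviews = []
for movie_name, url in movie_urls:  # list built in the first step (assumed)
    driver.get(url)
    # ".review-card" etc. are placeholder selectors, not the real markup.
    for card in driver.find_elements(By.CSS_SELECTOR, ".review-card"):
        rating = int(card.find_element(By.CSS_SELECTOR, ".rating").text)
        reviews.append({
            "movie": movie_name,
            "reviewer": card.find_element(By.CSS_SELECTOR, ".reviewer").text,
            "date": card.find_element(By.CSS_SELECTOR, ".review-date").text,
            "body": card.find_element(By.CSS_SELECTOR, ".review-body").text,
            "rating": rating,
            "class": rating_to_class(rating),
        })
driver.quit()
```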
After crawling, our dataset has over 140,000 samples, with the distribution of ratings shown in the figure below:
• The data is skewed: there are nearly 40,000 instances each of the Recommend and Exceptional sentiments, while there are only a little over 15,000 instances of Bad and 20,000 instances of Very Bad.
• There is a lot of irrelevant data (e.g., a customer types irrelevant text just to fill in the comment).
• The data has many misspelled words, emojis, emoticons, etc., which need to be preprocessed.
• These problems make our dataset noisy, and cleaning the data so that it can be used in the models is a big challenge for us.
3.2 Data preprocessing
We apply the following preprocessing steps:
• Remove URLs in sentences: Numerous reviews contain URLs (e.g., URLs of actors' profiles). If we do not handle these URLs, our data will be noisy and our models cannot learn properly, as the URLs contain irrelevant words.
• Remove duplicate characters in words and duplicate words in sentences: Handling these duplicates ensures that models recognize the words correctly, preventing the loss of information from sentences.
• Stemming: The purpose of Stemming is to reduce the size of our vocabulary by converting
a word to its most general form, or stem.
• Handle emoticons and emojis: Since body language and verbal tone do not translate into text messages or e-mails, people have developed alternate ways to convey nuanced meaning, the most prominent being emoticons and emoji. Emoticons are punctuation marks, letters, and numbers arranged into pictorial icons that generally display an emotion or sentiment. Emoji, on the other hand, are pictographs of faces, objects, and symbols. Our dataset contains many emoji and emoticons, and handling them properly is crucial, as they may carry sentiment information we need. We tested two approaches, removing all emoji/emoticons and replacing them with words, and found that replacing them with words using the Emoji library gives better results. A sketch of these cleaning steps follows.
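A condensed sketch of these cleaning steps is given below, using `re` for URLs and repeated characters, NLTK's PorterStemmer for stemming, and the emoji library's `demojize` to replace emoji with words. The exact rules in our pipeline may differ; this is illustrative.

```python
import re
import emoji                          # pip install emoji
from nltk.stem import PorterStemmer   # pip install nltk

stemmer = PorterStemmer()

def clean_review(text):
    # Remove URLs (e.g., links to actors' profiles).
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    # Collapse characters repeated 3+ times ("goooood" -> "good").
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Replace emoji with their textual names, which worked better
    # for us than deleting them outright.
    text = emoji.demojize(text, delimiters=(" ", " "))
    # Stem every token to its most general form.
    tokens = [stemmer.stem(tok) for tok in text.split()]
    # Drop immediately repeated words ("very very good" -> "very good").
    deduped = [tok for i, tok in enumerate(tokens)
               if i == 0 or tok != tokens[i - 1]]
    return " ".join(deduped)

print(clean_review("Soooo good 😍 http://example.com check it out!!"))
```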
3.3 Exploratory Data Analysis
3.3.1 Analyzing text statistics
3.3.1.1 Number of words in reviews
The histogram shows that reviews range from 0 to 6,000 characters and generally fall between 0 and 2,000 characters. In terms of words, reviews range from 0 to 1,000 words and mostly fall between 0 and 250 words.
Figure below presents the kernel density estimation of the lengths of reviews. Kernel density
estimation involves applying kernel smoothing for the purpose of probability density estimation.
Consider a set of independent and identically distributed samples $x_1, x_2, \ldots, x_n$ drawn from some univariate distribution with an unknown density $f$ at any given point $x$. The kernel density estimator for $f$ is expressed as:
\[ \hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \]
In this formula:
• $n$ represents the number of samples;
• $K$ is the kernel, a non-negative function that integrates to one;
• $h > 0$ is the bandwidth, a smoothing parameter.
The provided distribution exhibits a mean of approximately 254 and a variance of about 42,297. Our objective is to determine whether this distribution follows a normal distribution. To achieve this, we employ two numerical indicators of shape, namely skewness and kurtosis. For this particular distribution, the skewness is 1.86, indicating a positive skew: the distribution is skewed to the right. The kurtosis is 4.36, suggesting that the distribution has more values concentrated in the tails, or is more peaked, than a normal distribution with the same variance. Based on these measures, it can be inferred that the distribution of review lengths deviates from a normal distribution.
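These shape statistics can be computed directly with SciPy; a minimal sketch, assuming the per-review word counts have already been collected in `review_word_counts`:

```python
import numpy as np
from scipy.stats import skew, kurtosis

lengths = np.array(review_word_counts)   # per-review word counts, assumed precomputed

print("mean     :", lengths.mean())
print("variance :", lengths.var())
print("skewness :", skew(lengths))                     # > 0 means right-skewed
print("kurtosis :", kurtosis(lengths, fisher=False))   # Pearson definition; equals 3 for a normal
```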
We want to test whether there is a relationship between the number of words and the sentiments. To do so, we use the point biserial correlation to measure the relation between review lengths and sentiment. The point biserial correlation coefficient $r_{pb}$ is a special case of Pearson's correlation coefficient that measures the relationship between two variables: one continuous variable and one naturally binary variable. It has the formula:
\[ r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y} \sqrt{\frac{N_1 N_0}{N(N-1)}} \]
where $\bar{Y}_1$ and $\bar{Y}_0$ are the means of the metric observations coded 1 and 0, respectively; $s_Y$ is the standard deviation of all the metric observations; $N_1$ and $N_0$ are the numbers of observations coded 1 and 0, respectively; and $N$ is the total number of observations.
The figure below shows the box plots of review lengths for all sentiments.
A value of $r_{pb}$ that is significantly different from zero is completely equivalent to a significant difference in means between the two groups. Thus, an independent-groups t-test with $N - 2$ degrees of freedom may be used to test whether $r_{pb}$ is nonzero. The relation between the t-statistic for comparing two independent groups and $r_{pb}$ is given by:
\[ t = r_{pb} \sqrt{\frac{N - 2}{1 - r_{pb}^2}} \]
We conduct a one-tailed t-test with the null hypothesis that there is no correlation between the number of words and the sentiment. A sketch of this test appears below.
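SciPy implements both the correlation and its significance test; the sketch below uses hypothetical data, coding one sentiment against the rest:

```python
import numpy as np
from scipy.stats import pointbiserialr

# Hypothetical inputs: word count per review and a 0/1 coding of the
# sentiment (e.g., 1 = Exceptional, 0 = everything else).
lengths = np.array([120, 340, 80, 560, 210, 95, 430, 150])
is_exceptional = np.array([0, 1, 0, 1, 0, 0, 1, 0])

# SciPy reports a two-sided p-value; halve it for a one-tailed test.
r_pb, p_value = pointbiserialr(is_exceptional, lengths)
print(f"r_pb = {r_pb:.3f}, two-sided p = {p_value:.3f}")

# Equivalent t-statistic with N - 2 degrees of freedom:
N = len(lengths)
t = r_pb * np.sqrt((N - 2) / (1 - r_pb**2))
print(f"t = {t:.3f}")
```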
3.3.1.2 Number of stop words in reviews
We can clearly see that stop words such as “the”, “and”, and “a” dominate in movie reviews.
3.3.2 N-gram exploration and word cloud
The figure below shows the word clouds for all sentiments together and for three individual sentiments (Recommend, Decent, Bad) after removing stop words. The bigger a word is, the more often it appears in the dataset. Overall, the most common words in the dataset are verbs and adjectives.
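Word clouds like these can be generated with the wordcloud package; a minimal sketch, assuming the reviews of one sentiment have been collected in `recommend_reviews`:

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

text = " ".join(recommend_reviews)   # all "Recommend" reviews, assumed collected earlier

wc = WordCloud(width=800, height=400, background_color="white",
               stopwords=STOPWORDS).generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```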
3.3.3 Topic modeling exploration with pyLDAvis
This estimation can be performed using the Expectation-Maximization algorithm or other inference methods such as Variational Bayes.
To evaluate an LDA model, we can use perplexity and the coherence score:
• Perplexity is a measure of how well the model fits the data. It is calculated as the exponentiated negative average log-likelihood of the held-out (unseen) data. The formula for perplexity is given by:
\[ \text{Perplexity} = e^{-\frac{1}{N} \sum_{i=1}^{N} \log p(w_i \mid w_1, \ldots, w_{i-1})} \]
where $N$ is the total number of words in the held-out data, $w_i$ is the $i$-th word in the held-out data, and $p(w_i \mid w_1, \ldots, w_{i-1})$ is the probability of the $i$-th word given the previous words.
• Coherence score is a measure of the semantic similarity between the words in a topic.
A high coherence score indicates that the words in a topic are semantically similar and
form a coherent topic. The formula for coherence score depends on the specific coherence
measure being used. The formula of Cv Coherence is:
\[ C_v(t) = \sum_{i,j} \frac{\log p(w_i \mid w_j) + \log p(w_j \mid w_i)}{2}, \quad \forall\, w_i, w_j \in t \]
where $t$ is a topic, and $p(w_i \mid w_j)$ and $p(w_j \mid w_i)$ are the probabilities of the words $w_i$ and $w_j$ given each other.
In this project, we apply the LDA model and choose the model with the lowest perplexity score. The final model has 5 topics, equal to the number of sentiments, with a perplexity of −6.18. The coherence score is 0.66, indicating that the words within a topic are semantically similar. A sketch of this selection procedure is given below.
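The sketch below illustrates this selection loop with gensim; `tokenized_reviews` (a list of token lists) is assumed to come from the preprocessing step, and the candidate topic counts are illustrative. Note that gensim's `log_perplexity` returns a per-word likelihood bound, from which perplexity is obtained as 2 raised to the negative bound.

```python
from gensim import corpora
from gensim.models import LdaModel, CoherenceModel

dictionary = corpora.Dictionary(tokenized_reviews)   # tokenized_reviews: list of token lists
corpus = [dictionary.doc2bow(doc) for doc in tokenized_reviews]

best = None
for k in range(2, 11):                               # candidate topic counts (illustrative)
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=k, passes=10, random_state=42)
    bound = lda.log_perplexity(corpus)               # per-word log-likelihood bound
    perplexity = 2 ** (-bound)                       # lower perplexity is better
    if best is None or perplexity < best[0]:
        best = (perplexity, k, lda)

perplexity, k, lda = best
coherence = CoherenceModel(model=lda, texts=tokenized_reviews, dictionary=dictionary,
                           coherence="c_v").get_coherence()
print(f"topics={k}, perplexity={perplexity:.2f}, C_v coherence={coherence:.2f}")

# The interactive view of Figure 5 can then be produced with pyLDAvis:
# import pyLDAvis.gensim_models
# pyLDAvis.display(pyLDAvis.gensim_models.prepare(lda, corpus, dictionary))
```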
To get a better understanding of the LDA model, we use the pyLDAvis library to obtain the complete visualization of the model in Figure 5. Overall, there are some notable elements in the visualization:
• Intertopic Distance Map: The map displays a two-dimensional representation of the topics found by the LDA model. Topics that are close together are considered similar in terms of the words they contain. We can observe that there are 5 topics with significant differences.
• Topic Terms: The list of words in the right-side panel shows the most relevant words for
each selected topic. The size of the word indicates its relevance to the topic.
• Topic and Term Frequency: The term frequency is shown in the table below the plot.
We can observe the frequency of each term in the corpus, as well as its frequency in the
selected topic.
3.4 TF-IDF
Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic that reflects
the importance of a word in a document relative to a collection of documents, often used in
natural language processing and information retrieval. It is a combination of two components:
Term Frequency (TF) and Inverse Document Frequency (IDF).
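In scikit-learn, computing TF-IDF features takes a few lines; a minimal sketch, assuming the cleaned review texts are in `cleaned_reviews` (the vectorizer settings are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=20000,   # cap the vocabulary size
                             ngram_range=(1, 2),   # unigrams and bigrams
                             stop_words="english")
X = vectorizer.fit_transform(cleaned_reviews)      # sparse (n_reviews, n_terms) matrix
print(X.shape)
```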
3.5 Word Embedding (word2vec)
3.5.2 The Skip-Gram Model
The Skip-Gram objective function aggregates the log probabilities of the $n$ surrounding words to the left and right of the target word $w_t$:
\[ J_\theta = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-n \le j \le n \\ j \ne 0}} \log p(w_{t+j} \mid w_t) \]
• $\theta$: all variables to be optimized.
• $n$: the size of the context window.
• $T$: the number of words in the training corpus.
3.5.3 The Continuous Bag of Words Model
The CBOW model has a number of advantages over the skip-gram model. First, it can learn more complex relationships between words, as it takes into account the context of the center word. Second, it is more efficient to train, as it only needs to calculate the gradient for the center word rather than for all of the surrounding words. A sketch of training both architectures follows.
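Both architectures are available through gensim's Word2Vec class, toggled by the `sg` flag; a minimal sketch with illustrative hyperparameters, reusing `tokenized_reviews` from earlier:

```python
from gensim.models import Word2Vec

# sg=1 selects Skip-Gram, sg=0 selects CBOW; window is the context size n.
skipgram = Word2Vec(sentences=tokenized_reviews, vector_size=100,
                    window=5, min_count=5, sg=1, epochs=5)
cbow = Word2Vec(sentences=tokenized_reviews, vector_size=100,
                window=5, min_count=5, sg=0, epochs=5)

print(skipgram.wv.most_similar("film", topn=5))  # nearest neighbours in embedding space
```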
4 Methodology
4.1 Machine learning methods
4.1.1 XGBoost
XGBoost, or eXtreme Gradient Boosting, is an advanced machine learning algorithm that
has gained prominence for its exceptional performance in supervised learning tasks. It operates
within the realm of ensemble learning, a methodology that combines the predictive strength of
multiple models to enhance overall accuracy and generalization. What sets XGBoost apart is its
sequential construction of decision trees, where each subsequent tree focuses on correcting the
errors of the preceding ones. This process, known as boosting, allows XGBoost to incrementally
refine its predictions, producing a robust and accurate model.
The algorithm’s optimization hinges on a carefully crafted objective function that comprises
two crucial components: a loss function and a regularization term. The loss function quantifies
the disparity between predicted and actual values, while the regularization term helps prevent
the model from becoming overly complex and overfitting the training data. Through an iterative process, XGBoost minimizes this combined objective function, striking a balance between precision and model simplicity. One key feature that contributes to XGBoost's popularity is its interpretability. The algorithm provides valuable insights into the importance of each feature in the dataset, offering a clear understanding of how these features influence the model's predictions. This interpretability is especially advantageous in scenarios where understanding the underlying decision-making process is as crucial as predictive accuracy.
XGBoost's versatility is evident in its applicability to a wide range of machine learning tasks. Whether tackling classification problems, regression analyses, or ranking challenges, XGBoost has proven itself as a reliable and efficient solution. Its adaptability to diverse datasets and ability to handle large-scale, high-dimensional data make it a favored choice among data scientists and machine learning practitioners seeking powerful models for real-world applications.
In essence, XGBoost stands as a sophisticated and versatile tool, making it a cornerstone in
the toolkit of machine learning professionals.
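As a sketch, fitting XGBoost on the TF-IDF features from Section 3.4 looks as follows; `X` and `y` are assumed built earlier (labels encoded 0-4), and the hyperparameters are illustrative rather than our tuned values:

```python
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

# X: TF-IDF feature matrix; y: sentiment labels encoded as 0-4 (assumed built earlier).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(objective="multi:softprob", n_estimators=300,
                      max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```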
4.1.2 SVM
Support Vector Machines (SVM) are a versatile class of supervised learning algorithms with
a robust mathematical foundation. At their core, SVMs are designed to find the hyperplane
that best separates classes in the feature space. The algorithm achieves this by identifying
support vectors—data points crucial for defining the decision boundary. The concept of a
margin, representing the distance between the hyperplane and the nearest data point of any
class, is central to SVM. SVMs excel in scenarios where the goal is to maximize this margin,
as it leads to a more resilient and generalizable model.
One notable feature of SVM is its ability to handle non-linear decision boundaries. This
is accomplished through the use of kernel functions, such as radial basis function (RBF) or
polynomial kernels, which implicitly map input features into a higher-dimensional space where
a linear separation is feasible. The RBF kernel has the formula:
\[ K(x, z) = \exp\!\left(-\frac{\lVert x - z \rVert^2}{2\sigma^2}\right), \quad \sigma > 0 \]
The regularization parameter (C) in SVM controls the trade-off between achieving a smooth
decision boundary and correctly classifying training data. A smaller C value encourages a
broader margin, potentially allowing for some misclassifications, while a larger C emphasizes
accurate classification, potentially leading to a narrower margin.
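As a sketch, this corresponds to scikit-learn's SVC with an RBF kernel, where `gamma` plays the role of $1/(2\sigma^2)$; the values below are illustrative, not our tuned settings, and the split from the XGBoost sketch above is reused:

```python
from sklearn.svm import SVC

# RBF-kernel SVM on the TF-IDF features; gamma corresponds to 1/(2*sigma^2).
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)
print("accuracy:", svm.score(X_test, y_test))
```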
SVMs find application in diverse domains, including image recognition, text categorization,
and bioinformatics. Their efficacy in high-dimensional spaces, even when the number of features
exceeds the number of samples, makes SVMs particularly suitable for tasks with complex data
structures. While SVMs are powerful, their performance can be influenced by the choice of
kernel and tuning parameters, requiring careful consideration in practice. Nevertheless, SVMs
remain a foundational tool in machine learning, celebrated for their ability to handle various
types of data and produce robust decision boundaries.
4.2 Deep learning methods
4.2.2 RNN
RNN is a class of powerful deep neural networks that uses internal memory with loops to deal with sequence data. In this project, we use the many-to-one RNN architecture.
One challenge associated with Recurrent Neural Networks (RNNs) is the exploding gradient problem, wherein gradients grow excessively large, leading to numerical instability and hindering the optimization process. This problem can be addressed by effective measures such as proper weight initialization, careful selection of activation functions, and the application of gradient clipping (see the sketch below). These measures collectively mitigate the exploding gradient problem and contribute to more stable and effective training of RNNs.
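In Keras, for example, gradient clipping is a one-argument change on the optimizer; a minimal sketch (the threshold 1.0 is an illustrative choice):

```python
import tensorflow as tf

# Clipping the global gradient norm keeps a single large gradient step
# from destabilizing training.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
```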
Another variant of RNN we use is the Bi-RNN. A Bidirectional Recurrent Neural Network (Bi-RNN) is a neural network architecture that incorporates two RNNs operating in distinct
directions. The forward RNN processes the input sequence from the beginning to the end, while
the backward RNN processes it in the reverse direction, from end to start. These two RNNs
are stacked on top of each other, and their states are commonly combined by concatenating the
two resulting vectors. This bidirectional approach allows the network to capture information
from both the past and future context of each element in the input sequence, enhancing its
ability to understand and model sequential data.
4.2.3 LSTM
Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) that is
capable of learning long-term dependencies in sequence data. This is particularly useful for
tasks involving sequential inputs like natural language processing, speech recognition, and time
series prediction. LSTM models overcome the limitations of traditional RNNs, such as the vanishing gradient problem, by using a unique gating mechanism that controls the flow of information
between cells in the network.
The LSTM layers take into account not only the word order but also their contextual
significance within the sentence. Through this process, the model discerns key patterns residing
within the sequences, identifying their correlation with specific emotions. The ultimate layer of
the model typically consists of a softmax layer, producing a probability distribution of potential
emotions. With the highest-probability emotion selected as its prediction, this model possesses
a key advantage in its adaptability to varying sentence lengths due to the inherent properties
of LSTM networks. As such, it boasts immense versatility and efficacy in accurately classifying
the conveyed emotion.
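Putting the last two subsections together, below is a minimal many-to-one bidirectional LSTM classifier in Keras; the vocabulary size, dimensions, and other hyperparameters are illustrative, not our exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Embedding(input_dim=20000, output_dim=128),  # learned word vectors
    layers.Bidirectional(layers.LSTM(64)),              # forward + backward pass over the review
    layers.Dense(64, activation="relu"),
    layers.Dense(5, activation="softmax"),              # distribution over the 5 sentiment classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Train on padded integer sequences, e.g.:
# model.fit(X_train_pad, y_train, validation_split=0.1, epochs=5, batch_size=64)
```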
4.2.4 GRU
Introduced in 2014 by Kyunghyun Cho et al., GRU has established itself as a well-known
type of recurrent neural network. It was designed as a simpler alternative to Long Short-Term
Memory (LSTM) networks, with fewer parameters, making it more computationally efficient.
GRU has gained popularity in the field of deep learning, particularly in tasks involving
sequential data like natural language processing, speech recognition, and time-series prediction.
Its simpler architecture compared to LSTM makes it faster to train, which can be advantageous
in projects where computational resources or time are limiting factors.
However, it’s important to note that while GRU has been widely adopted, LSTM is still
more prevalent in certain applications, especially those that require modeling longer sequences
and more complex dependencies. The choice between GRU and LSTM often depends on the
specific requirements of the task at hand. Despite its relative simplicity, GRU has proven to
be a powerful tool in the deep learning toolkit.
4.2.5 BERT
One of the biggest challenges in natural language processing (NLP) is the shortage of training
data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets
contain only a few thousand or a few hundred thousand human-labeled training examples.
However, modern deep learning-based NLP models see benefits from much larger amounts of
data, improving when trained on millions, or billions, of annotated training examples. To help
close this gap in data, researchers have developed a variety of techniques for training general
purpose language representation models using the enormous amount of unannotated text on
the web (known as pre-training). The pre-trained model can then be fine-tuned on small-data
NLP tasks like question answering and sentiment analysis, resulting in substantial accuracy
improvements compared to training on these datasets from scratch.
Google open-sourced a technique for NLP pre-training called Bidirectional Encoder Representations from Transformers, or BERT. With this release, anyone in the world can train their own state-of-the-art question answering system (or a variety of other models) in about 30 minutes on a single Cloud TPU, or in a few hours using a single GPU. The release includes source code built on top of TensorFlow and a number of pre-trained language representation models. The associated paper demonstrates state-of-the-art results on 11 NLP tasks, including the very competitive Stanford Question Answering Dataset (SQuAD v1.1).
Input: The input of BERT combines 3 components: token embeddings, which represent the individual tokens of an input; segment embeddings, which help the model distinguish the different sentences in an input; and positional embeddings, which indicate the position of each token.
MLM: The first idea used in BERT is Masked Language Modeling (MLM). BERT tries to predict words in a sentence that have been randomly masked. The masking proportion is about 15 percent, and each masked word is replaced by the token [MASK]. BERT uses a bidirectional approach, so it can look at both previous and following tokens and understand the full context of the sentence to predict the masked words.
NSP: The second idea in BERT is Next Sentence Prediction (NSP). This technique is used to learn the relationship between two sentences. BERT receives two sentences as input and tries to predict whether the second sentence is the actual next sentence of the first. During training, half of the time the true next sentence is fed with the first sentence, and half of the time the second sentence is a random sentence.
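As a sketch, fine-tuning a pre-trained BERT for our five sentiment classes can be set up with the Hugging Face transformers library; the model name and sizes are illustrative, and the training step itself is omitted:

```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5)  # fresh classification head for the 5 sentiments

batch = tokenizer(["A must-watch masterpiece!", "Two wasted hours of my life."],
                  padding=True, truncation=True, max_length=256, return_tensors="tf")
outputs = model(**batch)
print(outputs.logits)  # fine-tune (e.g., with model.fit on the labeled reviews) before real use
```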
5 Results
5.1 Metrics used
Let:
• $TP_i$ (true positive) be the number of instances of class $c_i$ that are correctly assigned to $c_i$.
• $TN_i$ (true negative) be the number of instances outside $c_i$ that are correctly assigned to another class.
• $FP_i$ (false positive) be the number of instances that are incorrectly assigned to class $c_i$.
• $FN_i$ (false negative) be the number of instances inside $c_i$ that are incorrectly assigned to another class.
Precision and recall for class $c_i$ are then $Precision_i = \frac{TP_i}{TP_i + FP_i}$ and $Recall_i = \frac{TP_i}{TP_i + FN_i}$.
The F1-score is the harmonic mean of precision and recall. It provides a unified view of the performance of a classifier and is computed as:
\[ F_1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \]
The models are trained with the cross-entropy loss, $CE = -\sum_i t_i \log(p_i)$, where $t_i$ is the truth label and $p_i$ is the softmax probability for the $i$-th class.
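These metrics can be computed with scikit-learn; a minimal sketch on hypothetical labels:

```python
from sklearn.metrics import classification_report, f1_score

y_true = [0, 1, 2, 2, 3, 4, 4, 1]   # hypothetical true sentiment labels (0-4)
y_pred = [0, 1, 2, 3, 3, 4, 1, 1]   # hypothetical model predictions

print(classification_report(y_true, y_pred))           # per-class precision, recall, F1
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
```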
5.2 Model selection
The training setup for our neural models includes:
• Weight initialization: Xavier initialization. All the weights of a layer $L$ are picked randomly from a normal distribution with mean $\mu = 0$ and variance $\sigma^2 = \frac{1}{n_L}$, where $n_L$ is the number of neurons in layer $L-1$. This prevents the gradients of the network's activations from vanishing or exploding.
• Optimizer: Adam. It integrates the pros of both Momentum and RMSprop: it utilizes squared gradients to scale the learning rate, as RMSprop does, and it uses a moving average of the gradient, as Momentum does. Its advantages are better memory efficiency and lower computational cost.
5.3 Results
Figure 4 shows the results of our models with the best embedding technique for each (here, Pre stands for pretrained embedding):
• SVM, despite being a simple model, achieves a pretty good result of 56.05. We can see that TF-IDF with machine learning methods still performs quite well on the sentiment analysis task.
• The LSTM architecture's performance is slightly better than CNN's in this case, and only lower than BERT's.
• BERT, with its huge improvements in dealing with NLP, has outperformed all other models by a big margin.
Users simply input the movie they wish to assess, and our model promptly crawls up to 100 reviews for that movie.
The sentiment of these reviews is then determined using our pretrained BERT model. After aggregating the sentiment scores from all the reviews and applying predefined criteria, the system generates a result, categorizing the film as a "Bad movie," "Decent movie," "Good movie," or "Must-watch movie."
Users can easily try our work in this notebook: https://www.kaggle.com/great23u5/project-movie-ratings
In this project, we gathered a dataset comprising more than 140,000 film reviews from diverse platforms. Subsequently, we applied a range of preprocessing techniques. Our Exploratory Data Analysis encompassed various approaches, including hypothesis testing for word count, analysis of stop words, and topic modeling using the LDA model.
For word embedding, we employed TF-IDF, CBOW, Skip-gram, and pretrained embeddings. In the modeling phase, we explored traditional machine learning models such as Random Forest and Gradient Tree Boosting, alongside various deep learning methods such as the Convolutional Neural Network, Long Short-Term Memory, and Gated Recurrent Unit in bidirectional form. Additionally, we integrated BERT, a powerful model in Natural Language Processing (NLP).
Looking ahead, our future plans involve an extended focus on data preparation, Exploratory
Data Analysis (EDA), and modeling. Our strategy includes broadening the dataset by collecting
additional data, experimenting with diverse preprocessing and augmentation techniques, and
conducting further EDA methods to gain more profound insights. On the modeling front, we
aim to create a user-friendly website for easy accessibility to our project.