
Key Data Extraction and Emotion Analysis of Digital Shopping Based on BERT

SARIKA JAY
20MAI1011
M.Tech (CSE AI&ML), 2nd semester

Supervisor: Dr. B.V.A.N.S.S PRABHAKAR RAO


Natural Language Processing (NLP)
A subfield of artificial intelligence that helps machines understand natural human language.

Approaches for extracting information:

1. Named Entity Recognition
2. Sentiment Analysis
3. Text Summarization
4. Aspect Mining
5. Topic Modelling

Techniques:

1. Seq2Seq
2. Long Short-Term Memory (LSTM)
3. Convolutional Neural Network (CNN)
4. Recurrent Neural Network (RNN)
5. Bidirectional Encoder Representations from Transformers (BERT)
6. Transformers
Why BERT (Bidirectional Encoder Representations from Transformers)?
• Builds upon recent work in pre-training contextual representations.
• One of the best methods in NLP for understanding context-heavy texts.
• BERT provides pre-trained language models for English and 103 other languages that can be fine-tuned to fit your needs; in particular, the English model can be fine-tuned for sentiment analysis (see the sketch below).
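As an illustration of the last point, a minimal sketch of loading a pre-trained English BERT with a two-class sentiment head, assuming the Hugging Face transformers library with TensorFlow (the bert-base-uncased checkpoint name is an assumption, not taken from the slides):

```python
# Minimal sketch: load a pre-trained English BERT and attach a 2-class
# sentiment head for later fine-tuning (assumes the Hugging Face
# `transformers` library with TensorFlow installed).
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 2 labels: negative / positive
)

# Encode a sample review and get the raw (unnormalised) class scores.
inputs = tokenizer("The delivery was quick and the product works great!",
                   return_tensors="tf")
logits = model(inputs).logits
```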
Problem Statement
To accelerate digital sales by identifying key trends and predicting their performance in the current market.

Challenge targets:

• Companies find it difficult to understand current public needs in the pandemic situation.
• Knowing these needs, and making predictions based on them, boosts digital sales, since online shopping is currently the preferred way to buy.
Target:
• Kaggle datasets.
• Concentrate on review content and a collection of positive and negative words for training.
• Pre-train and fine-tune the BERT model with category-wise words.
• Classify the vectorized output with TF-IDF and CRF models to understand the emotion (see the sketch below).
System Architecture & Modules Explained

Steps:
1. Obtain scraped reviews of products under different categories.
2. Data wrangling.
3. EDA with pre-trained BERT along with a neural network classifier.

Major modules:
• Load the BERT classifier and tokenizer along with the input modules.
• Configure the loaded BERT model and train it for fine-tuning.
• Make predictions with the fine-tuned model.

System Architecture
The architecture consists of the BERT model, a dropout layer and a classifier. The IMDB dataset is used for binary sentiment classification, i.e. deciding whether a review is positive or negative. It contains 25,000 labelled movie reviews for training and 25,000 for testing.

Libraries used: Transformers, TensorFlow and Pandas.
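A minimal sketch of loading these IMDB splits, assuming the tensorflow_datasets package (not listed on the slide):

```python
# Sketch of loading the labelled IMDB review splits (assumes the
# `tensorflow_datasets` package; the slides only name TensorFlow and Pandas).
import tensorflow_datasets as tfds

(train_ds, test_ds), info = tfds.load(
    "imdb_reviews",
    split=["train", "test"],   # 25,000 reviews in each split
    as_supervised=True,        # yields (text, label) pairs
    with_info=True,
)
print(info.splits["train"].num_examples)  # 25000
```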


Two main functions:
1. Accept the train and test datasets and convert each row into an InputExample object.
2. Tokenize the InputExample objects, create the required input format from the tokenized objects, and finally build a sequenced input dataset that can be fed to the model.
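A hedged sketch of these two helpers, assuming a pandas DataFrame with text and label columns and the tokenizer loaded earlier (the function names and the maximum length of 128 are illustrative, not taken from the slides):

```python
# Sketch of the two helper functions described above.
import tensorflow as tf
from transformers import InputExample

def convert_data_to_examples(df, text_col="text", label_col="label"):
    # Wrap each DataFrame row in an InputExample (guid is unused here).
    return [InputExample(guid=None, text_a=row[text_col], label=row[label_col])
            for _, row in df.iterrows()]

def examples_to_tf_dataset(examples, tokenizer, max_length=128):
    # Tokenize every example and collect the tensors BERT expects.
    features = tokenizer([e.text_a for e in examples],
                         max_length=max_length, padding="max_length",
                         truncation=True, return_tensors="tf")
    labels = tf.constant([e.label for e in examples])
    return tf.data.Dataset.from_tensor_slices((dict(features), labels))
```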

Optimization and final functions:

Adam as the optimizer, CategoricalCrossentropy as the loss function, and SparseCategoricalAccuracy as the accuracy metric.
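A sketch of the compile/train/predict steps under those choices, assuming the model, tokenizer and tokenized tf.data datasets from the earlier snippets; the sparse variant of the cross-entropy loss is used here because the labels are integers, and the learning rate of 3e-5 is an assumption:

```python
# Sketch of compiling, fine-tuning, and predicting with the model.
import tensorflow as tf

# train_data / test_data: tokenized tf.data.Dataset objects built with the
# helpers above (names are illustrative).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy("accuracy")],
)
model.fit(train_data.shuffle(1000).batch(16), epochs=2,
          validation_data=test_data.batch(16))

# Prediction with the fine-tuned model on new review text.
enc = tokenizer(["Arrived broken and support never replied."],
                padding=True, truncation=True, return_tensors="tf")
pred = tf.argmax(model(enc).logits, axis=-1)  # 0 = negative, 1 = positive
```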
• We can see that negative sentences are longer on average.
• To judge how significant this difference is → permutation testing and a p-value.
• First, define a function that generates a permutation sample from two arrays. Then generate permutation replicates, each a single statistic computed from a permutation sample.
• Last, compute the probability of getting a difference in means of at least 5.91 under the hypothesis that the word-count distributions are identical.
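A sketch of that permutation test with NumPy, assuming two arrays of per-review word counts, neg_lengths and pos_lengths, that are not shown in the slides:

```python
# Permutation test for the difference in mean review length.
import numpy as np

def permutation_replicate(a, b, rng):
    # Pool, shuffle, and re-split the data, then recompute the statistic.
    pooled = rng.permutation(np.concatenate([a, b]))
    return pooled[:len(a)].mean() - pooled[len(a):].mean()

rng = np.random.default_rng(0)
# neg_lengths / pos_lengths: assumed arrays of word counts per review.
observed = neg_lengths.mean() - pos_lengths.mean()   # ~5.91 per the slide
reps = np.array([permutation_replicate(neg_lengths, pos_lengths, rng)
                 for _ in range(10_000)])
p_value = np.mean(reps >= observed)  # probability of a difference >= observed
```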
Training setup:
• Initialize the pre-trained model.
• From the config, use 768 as the dimensionality of the encoder layers and the pooler layer, and dropout probabilities of 0.2.
• Compute the logits, the final scores for the input sequence.

1. Apply weight decay to all parameters except 'bias' and 'LayerNorm' (see the sketch after this list).
2. Lookahead optimizer (improves learning stability and lowers the variance of its inner optimizer).
3. OneCycleLRWithWarmup with 0 warmup steps, cosine annealing from 5e-5 to 1e-8.
4. Gradient accumulation for large-batch training.
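Lookahead and OneCycleLRWithWarmup suggest a PyTorch/Catalyst setup; a rough sketch of the weight-decay grouping (item 1) in plain PyTorch, with CosineAnnealingLR as a stand-in for item 3 (the weight-decay value, step count and checkpoint name are assumptions, not the exact training code from the slides):

```python
# Sketch: dropout 0.2 from the config, weight decay excluding bias/LayerNorm,
# and cosine annealing of the learning rate from 5e-5 towards 1e-8.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2,
    hidden_dropout_prob=0.2, attention_probs_dropout_prob=0.2)

no_decay = ("bias", "LayerNorm")
grouped = [
    {"params": [p for n, p in model.named_parameters()
                if not any(k in n for k in no_decay)], "weight_decay": 0.01},
    {"params": [p for n, p in model.named_parameters()
                if any(k in n for k in no_decay)], "weight_decay": 0.0},
]
optimizer = torch.optim.AdamW(grouped, lr=5e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=1000, eta_min=1e-8)  # T_max = training steps (placeholder)
```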
BERT model for classification
After two epochs we reach 96.22% accuracy, which is about 6% higher than logistic regression.
To improve the result → fine-tuning with a frozen encoder (sketched below).
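A minimal sketch of freezing the encoder and training only the classification head, continuing the PyTorch model from the previous snippet (illustrative, not the exact code behind the 96.22% figure; the head learning rate is an assumption):

```python
# Freeze the BERT encoder and fine-tune only the classification head.
for param in model.bert.parameters():
    param.requires_grad = False  # encoder weights stay fixed

head_optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```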

Test content:
