Lit Review - Webscraping
Analysis
Apurva Wankhade, Tanvi Paigude, Prof. Dr. Anny Leema
Abstract
Artificial Intelligence (AI)-powered product review analysis is a growing field that uses AI to
extract insights from customer reviews, which can inform product development, marketing, and
customer service. Natural language processing (NLP) can extract keywords and phrases from
reviews to identify customer-specific gain points, such as key features and benefits, which can
then be used to improve product development and marketing. AI can likewise extract customers'
pain points from the negative parts of their reviews. The proposed work develops a
product-specific focused crawler that first searches the web to collect customer reviews from
the most trusted websites. It then extracts the positive and negative sentiments expressed
about specific features of the product under consideration, yielding its gain points and pain
points respectively. As a secondary use, the same tool can analyze competitors' products to
understand their gain and pain points. The proposed approach aims to uncover the most
discussed aspects of products, helping businesses understand customer perceptions and
preferences and make better decisions, thereby saving time and money.
Introduction
In today's competitive market, understanding customer sentiments is vital for businesses.
Artificial Intelligence (AI) and Natural Language Processing (NLP) offer a solution known as AI-
powered product review analysis. This field extracts insights from customer reviews to enhance
product development, marketing, and customer service. It identifies "gain points" (positive
aspects) and "pain points" (negative aspects) using NLP. To achieve this, we propose developing
a product-specific web crawler to gather customer reviews from trusted websites. The collected
data can guide improvements to our own products and, when applied to rival products, provide a
competitive edge. By identifying what customers like and dislike, businesses can make informed
decisions, saving time and resources. In summary, AI-powered review analysis offers a
systematic way to understand customer preferences and supports better business choices.
Problem Statement
The growing field of AI-powered product review analysis aims to leverage artificial intelligence
techniques to extract valuable insights from customer reviews across various products. These
insights are crucial for improving product development, enhancing marketing strategies, and
elevating customer service. The objective of this research is to develop a product-specific
focused crawler. This crawler will search the web to gather customer reviews from trusted
websites. It will then analyze these reviews to extract positive and negative sentiments related
to specific product features, commonly referred to as gain points and pain points. Additionally,
this tool can be used to analyze competitors' products to understand their strengths and
weaknesses. Ultimately, this approach seeks to uncover the most discussed aspects of products,
providing businesses with a deeper understanding of customer perceptions and preferences to
facilitate more informed decision-making and resource optimization.
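To make the notion of gain points and pain points concrete, the following is a minimal sketch of feature-level polarity counting. It is not the proposed system: the aspect list, the two word lists, and the function name `gain_and_pain_points` are all illustrative assumptions, and a simple word-matching rule stands in for a trained sentiment model.

```python
import re
from collections import defaultdict

# Illustrative vocabularies (assumptions, not taken from any dataset).
ASPECTS = {"battery", "screen", "camera", "price"}
POSITIVE = {"great", "excellent", "good", "love"}
NEGATIVE = {"poor", "dim", "terrible", "slow", "bad"}


def gain_and_pain_points(reviews):
    """Count, per product aspect, the sentences that mention it positively or negatively."""
    gains = defaultdict(int)
    pains = defaultdict(int)
    for review in reviews:
        # Treat each sentence as one opinion unit.
        for sentence in re.split(r"[.!?]+", review.lower()):
            words = set(re.findall(r"[a-z']+", sentence))
            polarity = len(words & POSITIVE) - len(words & NEGATIVE)
            for aspect in words & ASPECTS:
                if polarity > 0:
                    gains[aspect] += 1
                elif polarity < 0:
                    pains[aspect] += 1
    return dict(gains), dict(pains)


reviews = [
    "The battery is great. The screen is dim.",
    "Excellent battery. The camera is poor.",
    "Terrible screen!",
]
gains, pains = gain_and_pain_points(reviews)
print(gains)  # aspects mentioned in positive sentences
print(pains)  # aspects mentioned in negative sentences
```

A sentence that mixes positive and negative words about different aspects nets out to zero under this rule and is ignored, which is one reason a learned aspect-level model would be preferable in practice.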
Literature Review
The advent of Artificial Intelligence (AI) has ushered in transformative changes across various
industries, particularly in the realm of business. A key application of AI in the business landscape
is the analysis of product reviews, which has garnered significant attention from researchers.
Several scholars have highlighted the importance of extracting and summarizing customer
opinions from product reviews. For instance, Htay and Lynn [7] propose techniques that leverage
linguistic rules and part-of-speech tagging for this purpose. Similarly, C. Chauhan et al. [8]
emphasize the significance of sentiment analysis in understanding customer sentiments on
platforms like Amazon and Flipkart, underlining its role in improving products and marketing
strategies through web scraping and polarity analysis. Additionally, Zhang, Z. et al. [5] introduce
a unique approach that combines sentiment analysis and the intuitionistic fuzzy TODIM method
for product selection based on online reviews, enhancing the decision-making process.
In the field of sentiment analysis, researchers have proposed various techniques to enhance
opinion mining. L. Yang et al. [1] present the SLCABG model, which integrates sentiment
lexicons, Convolutional Neural Networks (CNN), and Bidirectional Gated Recurrent Unit (BiGRU)
to extract sentiment features and context from reviews. Additionally, Zhao et al. [2], [14]
introduce a weakly-supervised deep embedding (WDE) learning framework, with variants built on
Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architectures for
sentiment analysis, each offering its own advantages and trade-offs. In [6], SenticNet 2 is
introduced as a valuable
resource that associates semantics and sentics with common-sense concepts, enhancing
opinion mining and sentiment analysis with a comprehensive approach. It leverages a paradigm
known as sentic computing, which combines computer and social sciences to better recognize
and interpret opinions and sentiments on the web.
The applications of sentiment analysis extend beyond product reviews. S. A. A. Shah et al. [3]
propose a novel Bi-directional LSTM with a CNN model for detecting e-commerce entities,
particularly focusing on products sold on the dark web. They report that their approach
outperforms the models they compare against. Hussein [4] surveys sentiment analysis
challenges and their impact on
sentiment evaluation. The study reveals connections between sentiment review structures and
these challenges, emphasizing domain dependence as a crucial factor. Efficient preprocessing of
online reviews is also highlighted as a critical step, as discussed by Kavanagh et al. [10]. Furthermore,
researchers like S. Dey et al. [15] conduct comparative analyses between classifiers, adding
depth to the exploration of sentiment analysis in various contexts. These diverse applications
and ongoing challenges underscore the evolving landscape of sentiment analysis in AI-driven
business insights.
Work on data collection and on learning-based classification is also relevant. M. S. Parvez et
al. [18] analyze web data extraction techniques and report that web wrappers and machine
learning approaches yield fast and accurate results for both structured and unstructured HTML
pages. Haque, T. U. et al. [20] develop a supervised learning model to polarize large,
unlabeled product review datasets, reporting strong results and highlighting the potential of
hybrid feature extraction approaches. Deep learning methods are increasingly prevalent in
sentiment analysis: Zhang, L. et al. [19] survey the surge in applying deep learning to the
task, covering various deep learning architectures and their successful applications, which
points to continued improvement in sentiment analysis techniques. Taken together, this review
emphasizes the dynamic nature of sentiment analysis and its pivotal role in AI-driven business
insights.
System Design
1. Web Crawling
Use a web crawler or scraping tool to collect product reviews from Amazon. You will typically
need to specify the product(s) and the number of reviews you want to scrape.
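As an illustration of the extraction part of this step, here is a minimal sketch that pulls review text out of an already-downloaded HTML page using only the Python standard library. Fetching the page is omitted. The CSS class `review-text` and the sample markup are hypothetical; a real site's markup will differ and must be inspected first.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ReviewExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, target_class="review-text"):  # hypothetical class name
        super().__init__()
        self.target_class = target_class
        self.reviews = []   # completed review strings
        self._depth = 0     # nesting depth inside the current review element
        self._chunks = []   # text fragments of the current review

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self._depth:
                self._chunks.append(" ")  # treat <br> and similar as whitespace
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1              # nested tag inside a review
        elif self.target_class in classes:
            self._depth = 1               # entering a review element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:              # leaving the review element
            text = " ".join("".join(self._chunks).split())
            if text:
                self.reviews.append(text)
            self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_reviews(page_html, max_reviews=None, target_class="review-text"):
    """Return up to max_reviews review texts found in page_html."""
    parser = ReviewExtractor(target_class)
    parser.feed(page_html)
    parser.close()
    return parser.reviews[:max_reviews]


# Made-up page fragment used only to exercise the parser.
SAMPLE_PAGE = """
<div class="review"><span class="review-text">Battery life is <b>great</b>.<br>Charges fast.</span></div>
<div class="review"><span class="review-text">Screen cracked   after a week.</span></div>
<div class="ad">Buy now!</div>
"""

for review in extract_reviews(SAMPLE_PAGE, max_reviews=2):
    print(review)
```

The `max_reviews` argument corresponds to the number of reviews to scrape mentioned above. Selecting the product would happen at the fetching stage, which this sketch leaves out.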
2. Data Preprocessing
Clean and preprocess the scraped review data, including removing HTML tags, special
characters, and irrelevant information. Tokenize the text into sentences or words.
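The cleaning and tokenization described in this step could look roughly like the sketch below. It uses only regular expressions from the standard library; the function names and the exact set of characters kept are assumptions made for illustration, and a production pipeline would more likely rely on an NLP toolkit's tokenizer.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")                   # any HTML tag
NON_TEXT_RE = re.compile(r"[^a-z0-9\s.!?']")      # anything but letters, digits, basic punctuation
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")  # whitespace that follows ., ! or ?
WORD_RE = re.compile(r"[a-z0-9']+")


def clean_review(raw):
    """Strip tags, decode HTML entities, lowercase, and drop special characters."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)         # e.g. "&amp;" becomes "&"
    text = text.lower()
    text = NON_TEXT_RE.sub(" ", text)
    return " ".join(text.split())      # collapse runs of whitespace


def split_sentences(text):
    """Split cleaned text into sentences on terminal punctuation."""
    return [s for s in SENTENCE_SPLIT_RE.split(text) if s]


def tokenize_words(text):
    """Split cleaned text into word tokens."""
    return WORD_RE.findall(text)


raw = "<p>Great phone!&nbsp;Battery lasts 2 days &amp; it's fast.</p>"
cleaned = clean_review(raw)
print(cleaned)
print(split_sentences(cleaned))
print(tokenize_words(cleaned))
```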
3. Model Training
Train two separate models for sentiment analysis and text summarization. For sentiment
analysis, train an LSTM model on labeled review data to classify sentiments (e.g., positive,
negative, neutral). For text summarization, train a Seq2Seq model on pairs of reviews and their
corresponding summaries (if available).
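For reference, the computation that an LSTM sentiment classifier of this kind performs can be written with the standard LSTM cell equations. The notation is an assumption introduced here: x_t is the embedding of the t-th token, h_t and c_t are the hidden and cell states, T is the review length, and y is the one-hot sentiment label over the three classes.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
\hat{y} &= \operatorname{softmax}(W_y h_T + b_y) \\
\mathcal{L} &= -\sum_{k \in \{\text{pos},\,\text{neg},\,\text{neu}\}} y_k \log \hat{y}_k
\end{aligned}
```

Training adjusts the weight matrices and biases to minimize the cross-entropy loss over the labeled reviews. Classifying from the final hidden state h_T is one common choice; pooling over all h_t is another.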
4. Inference
Use the trained LSTM model to perform sentiment analysis on each review to determine
sentiment polarity (positive/negative/neutral). Use the trained Seq2Seq model to generate
summaries for the reviews. Alternatively, you can use prebuilt models for sentiment analysis
and text summarization.
5. Sentiment Analysis and Summarization
For each review, apply sentiment analysis to determine the sentiment polarity (positive,
negative, or neutral) and text summarization to generate a concise summary. This can help
condense lengthy reviews into key points.
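The per-review step can be sketched as a small loop that applies a classifier and a summarizer to each review. In the sketch below both are deliberately simple stand-ins, not the LSTM and Seq2Seq models described above: `toy_sentiment` counts words from two made-up lexicons, and `lead_sentence_summary` just returns the first sentence. Any trained or prebuilt model exposing the same call signature could be passed in instead.

```python
import re

# Toy lexicons standing in for a trained classifier (assumption).
POSITIVE = {"great", "good", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "broken", "slow", "terrible"}


def toy_sentiment(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def lead_sentence_summary(text):
    """Baseline summarizer: return the first sentence of the review."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return sentences[0]


def analyze_reviews(reviews, classify=toy_sentiment, summarize=lead_sentence_summary):
    """Apply the classifier and the summarizer to every review."""
    return [
        {"review": r, "sentiment": classify(r), "summary": summarize(r)}
        for r in reviews
    ]


results = analyze_reviews([
    "The camera is great and the screen is excellent. Delivery took a week.",
    "Battery is poor. It is slow to charge.",
    "It arrived on Tuesday.",
])
for row in results:
    print(row["sentiment"], "|", row["summary"])
```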
6. Analysis and Insights
Analyze the sentiment analysis results to understand the overall sentiment distribution of the
reviews. Examine the generated summaries to extract insights and key information from the
reviews. Use the results and insights for decision-making, product improvement, or further
analysis.
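The overall sentiment distribution can be computed directly from the per-review labels. A minimal sketch, assuming the three labels used above and a hypothetical helper named `sentiment_distribution`:

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_distribution(labels):
    """Return the fraction of reviews carrying each sentiment label."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


predicted = ["positive", "positive", "negative", "neutral"]
print(sentiment_distribution(predicted))
```

Tracking these fractions per product, or per time window, is one simple way to turn the classifier output into a signal for decision-making.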
References
[1] L. Yang, Y. Li, J. Wang, and R. S. Sherratt, "Sentiment Analysis for E-Commerce Product
Reviews in Chinese Based on Sentiment Lexicon and Deep Learning," in IEEE Access, vol. 8,
pp. 23522-23530, 2020, doi: 10.1109/ACCESS.2020.2969854.
[2] W. Zhao et al., "Weakly-Supervised Deep Embedding for Product Review Sentiment Analysis,"
in IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 1, pp. 185-197, 1 Jan.
2018, doi: 10.1109/TKDE.2017.2756658.
[3] S. A. A. Shah, M. Ali Masood, and A. Yasin, "Dark Web: E-Commerce Information Extraction
Based on Name Entity Recognition Using Bidirectional-LSTM," in IEEE Access, vol. 10, pp.
99633-99645, 2022, doi: 10.1109/ACCESS.2022.3206539.
[4] Doaa Mohey El-Din Mohamed Hussein, "A survey on sentiment analysis challenges," Journal of
King Saud University – Engineering Sciences, Volume 30, Issue 4, 2018, Pages 330-338, ISSN
1018-3639.
[5] Zhang, Z., Guo, J., Zhang, H. et al. Product selection based on sentiment analysis of online
reviews: an intuitionistic fuzzy TODIM method. Complex Intell. Syst. 8, 3349–3362 (2022).
[6] SenticNet 2: A semantic and affective resource for opinion mining and sentiment analysis,
Proceedings of the 25th International Florida Artificial Intelligence Research Society
Conference, FLAIRS-25.
[7] Htay SS, Lynn KT. Extracting product features and opinion words using pattern knowledge in
customer reviews. ScientificWorldJournal. 2013 Dec 26;2013:394758. doi:
10.1155/2013/394758. PMID: 24459430; PMCID: PMC3888732.
[8] C. Chauhan and S. Sehgal, "Sentiment analysis on product reviews," 2017 International
Conference on Computing, Communication and Automation (ICCCA), Greater Noida, India,
2017, pp. 26-31, doi: 10.1109/CCAA.2017.8229825.
[9] M.D. Devika, C. Sunitha, Amal Ganesh, Sentiment Analysis: A Comparative Study on Different
Approaches, Procedia Computer Science, Volume 87, 2016, Pages 44-49, ISSN 1877-0509.
[10] Kavanagh, James, Greenhow, Keith and Jordanous, Anna (2023) Assessing the Effects of
Lemmatisation and Spell Checking on Sentiment Analysis of Online Reviews. In: 17th IEEE
International Conference on SEMANTIC COMPUTING (ICSC), 1-3 Feb 2023, Laguna Hills, USA.
[11] Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca Passonneau. 2011.
Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social
Media (LSM '11). Association for Computational Linguistics, USA, 30–38.
[12] Akkaya, Cem, Wiebe, Janyce, and Mihalcea, Rada. Subjectivity Word Sense Disambiguation. In
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing.
Association for Computational Linguistics.
[13] Socher, Richard, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, A. Ng
and Christopher Potts. “Recursive Deep Models for Semantic Compositionality Over a
Sentiment Treebank.” Conference on Empirical Methods in Natural Language Processing.
[14] Zhao, Wei & Guan, Ziyu & Chen, Long & He, Xiaofei & Cai, Deng & Wang, Beidou &
Wang, Quan. (2017). Weakly-Supervised Deep Embedding for Product Review Sentiment
Analysis. IEEE Transactions on Knowledge and Data Engineering. PP. 1-1.
10.1109/TKDE.2017.2756658.
[15] S. Dey, S. Wasif, D. S. Tonmoy, S. Sultana, J. Sarkar, and M. Dey, "A Comparative Study of
Support Vector Machine and Naive Bayes Classifier for Sentiment Analysis on Amazon
Product Reviews," 2020 International Conference on Contemporary Computing and
Applications (IC3A), Lucknow, India, 2020, pp. 217-220, doi:
10.1109/IC3A48958.2020.233300.
[16] Kim, S. G., & Kang, J. (2018). Analyzing the discriminative attributes of products using
text mining focused on cosmetic reviews. Information Processing & Management, 54(6),
938-957.
[17] Perwej, Dr. Yusuf & Divya, Km & Rastogi, Dr & Yadav, Puneet. (2022). Sentimental
Analysis on Web Scraping Using Machine Learning Method. Journal of Information and
Computational Science. Volume 12. 10.12733/JICS.2022/V12I08.535569.67004.
[18] M. S. Parvez, K. S. A. Tasneem, S. S. Rajendra and K. R. Bodke, "Analysis Of Different Web
Data Extraction Techniques," 2018 International Conference on Smart City and Emerging
Technology (ICSCET), Mumbai, India, 2018, pp. 1-7, doi: 10.1109/ICSCET.2018.8537333.
[19] Zhang, L., Wang, S., & Liu, B. (2018). Deep learning for sentiment analysis: A survey.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1253.
doi:10.1002/widm.1253
[20] Haque, T. U., Saber, N. N., & Shah, F. M. (2018). Sentiment analysis on large scale
Amazon product reviews. 2018 IEEE International Conference on Innovative Research and
Development (ICIRD). doi:10.1109/icird.2018.8376299