App Review Summarizer For Google Play Store Using OpenAI
1 ANSHUMAN JHA
Table of Contents
1. Overview
2. Objectives
3. Key Features
4. Step-by-Step Walkthrough
5. Benefits
6. Conclusion
7. Call to Action
8. Link of Example Google Colab Notebook
1. Overview
enables you to make data-driven
decisions with clarity and efficiency.
2. Objectives
1. Enhance Customer Insights:
Quickly gain a structured overview
of user opinions, highlighting key
positive and negative aspects of your
app.
3. Key Features
4. LLM Summarization: Generating
concise summaries using LangChain
and OpenAI.
4. Step-by-Step Walkthrough
• Use RecursiveCharacterTextSplitter
to manage and prepare review texts
for embedding.
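To illustrate the idea behind recursive character splitting, here is a minimal pure-Python sketch. It is not LangChain's implementation: the function name `split_recursively` and its parameters are hypothetical, and it omits the chunk overlap that `RecursiveCharacterTextSplitter` supports. It only shows the principle of trying coarse separators first and falling back to finer ones when a piece is still too long.

```python
from typing import List


def split_recursively(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Split `text` into pieces of at most `chunk_size` characters.

    Illustrative stand-in only: tries the first separator, and any piece
    that is still too long is split again with the remaining separators.
    With no separators left, the text is cut at fixed character offsets.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing small pieces together
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators
            chunks.extend(split_recursively(piece, chunk_size, rest))
    if current.strip():
        chunks.append(current)
    return chunks


reviews_text = (
    "Great app. Love it!\n\n"
    "Crashes on startup after the latest update.\n\n"
    "Too many ads."
)
chunks = split_recursively(reviews_text, 40, ["\n\n", "\n", " "])
for chunk in chunks:
    print(repr(chunk))
```

Splitting on paragraph breaks first keeps each review intact where possible, so a chunk rarely mixes the end of one review with the start of another.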
4.7 Defining the Review Summarization Prompt
5. Benefits
• Time-Saving
Automate the process of review
analysis and gain insights faster.
• Accuracy
Use an LLM to produce summaries that
stay close to the sentiments users
actually express in their reviews.
• Scalability
Handle large volumes of reviews with
ease, making it suitable for apps with
extensive user feedback.
6. Conclusion
7. Call to Action
8. Link of Example Google Colab Notebook
App_Review_Summarizer_for_Google_Play_Store_using_OpenAI
[ ]: import openai
from langchain import LLMChain, PromptTemplate
import chromadb
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from google_play_scraper import Sort, reviews_all
import pandas as pd
from tqdm import tqdm
[ ]: openai.api_key = 'your_openai_api_key'
Get the app ID of the Play Store application whose reviews you want to fetch.
For example, the Google Play Store link for the Facebook app is:
https://play.google.com/store/apps/details/Facebook?id=com.facebook.katana&hl=en_ZA
So the app ID for Facebook is com.facebook.katana (the value of the id parameter).
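As a small convenience, the app ID can also be read from the link programmatically. The sketch below uses only the standard library; the helper name `extract_app_id` is hypothetical and not part of the notebook.

```python
from urllib.parse import urlparse, parse_qs


def extract_app_id(play_store_url: str) -> str:
    """Return the value of the `id` query parameter of a Play Store URL."""
    query = parse_qs(urlparse(play_store_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError(f"No 'id' parameter found in URL: {play_store_url}")
    return ids[0]


url = "https://play.google.com/store/apps/details/Facebook?id=com.facebook.katana&hl=en_ZA"
print(extract_app_id(url))  # com.facebook.katana
```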
[ ]: app_id = 'com.facebook.katana'
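import pandas as pd
from datetime import datetime
from tqdm import tqdm
import time
import json
from time import sleep
from typing import List, Optional, Tuple
# Internal google_play_scraper helpers used by the functions below
from google_play_scraper.constants.regex import Regex
from google_play_scraper.constants.request import Formats
from google_play_scraper.utils.request import post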
[ ]: MAX_COUNT_EACH_FETCH = 199
class _ContinuationToken:
    __slots__ = (
        "token",
        "lang",
        "country",
        "sort",
        "count",
        "filter_score_with",
        "filter_device_with",
    )

    def __init__(
        self, token, lang, country, sort, count, filter_score_with, filter_device_with
    ):
        self.token = token
        self.lang = lang
        self.country = country
        self.sort = sort
        self.count = count
        self.filter_score_with = filter_score_with
        self.filter_device_with = filter_device_with
def _fetch_review_items(
    url: str,
    app_id: str,
    sort: int,
    count: int,
    filter_score_with: Optional[int],
    filter_device_with: Optional[int],
    pagination_token: Optional[str],
):
    dom = post(
        url,
        Formats.Reviews.build_body(
            app_id,
            sort,
            count,
            "null" if filter_score_with is None else filter_score_with,
            "null" if filter_device_with is None else filter_device_with,
            pagination_token,
        ),
        {"content-type": "application/x-www-form-urlencoded"},
    )
    match = json.loads(Regex.REVIEWS.findall(dom)[0])
def reviews(
    app_id: str,
    lang: str = "en",
    country: str = "us",
    sort: Sort = Sort.MOST_RELEVANT,
    count: int = 100,
    filter_score_with: Optional[int] = None,
    filter_device_with: Optional[int] = None,
    continuation_token: Optional[_ContinuationToken] = None,
) -> Tuple[List[dict], _ContinuationToken]:
    sort = sort.value
    if continuation_token is not None:
        # Resume from a previous call: reuse its token and settings
        token = continuation_token.token
        if token is None:
            return (
                [],
                continuation_token,
            )
        lang = continuation_token.lang
        country = continuation_token.country
        sort = continuation_token.sort
        count = continuation_token.count
        filter_score_with = continuation_token.filter_score_with
        filter_device_with = continuation_token.filter_device_with
    else:
        token = None
    _fetch_count = count
    result = []
    while True:
        if _fetch_count == 0:
            break
        try:
            review_items, token = _fetch_review_items(
                url,
                app_id,
                sort,
                _fetch_count,
                filter_score_with,
                filter_device_with,
                token,
            )
        except (TypeError, IndexError):
            # funnan MOD start: on an unparsable response, retry from the
            # token this call started with (None on the very first call)
            token = continuation_token.token if continuation_token is not None else None
            continue
            # MOD end
        if isinstance(token, list):
            token = None
            break
    return (
        result,
        _ContinuationToken(
            token, lang, country, sort, count, filter_score_with, filter_device_with
        ),
    )
def reviews_all(app_id: str, sleep_milliseconds: int = 0, **kwargs) -> list:
    continuation_token = None
    result = []
    while True:
        _result, continuation_token = reviews(
            app_id,
            count=MAX_COUNT_EACH_FETCH,
            continuation_token=continuation_token,
            **kwargs
        )
        result += _result
        if continuation_token.token is None:
            break
        if sleep_milliseconds:
            sleep(sleep_milliseconds / 1000)
    return result
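The error handler in `reviews` retries the same request whenever the response cannot be parsed, which can loop indefinitely if the failure is persistent. One possible safeguard, sketched here under assumptions, is to cap the number of attempts and back off between them. The helper `call_with_retries` and the `flaky_fetch` stand-in are hypothetical and not part of google_play_scraper.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying on TypeError/IndexError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except (TypeError, IndexError):
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Wait 0.5s, 1s, 2s, ... before trying again
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Hypothetical fetcher that fails twice before succeeding
attempts = {"count": 0}


def flaky_fetch() -> List[str]:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise IndexError("empty response")
    return ["review-1", "review-2"]


delays: List[float] = []
page = call_with_retries(flaky_fetch, max_attempts=5, sleep=delays.append)
print(page, delays)
```

Passing `sleep` as a parameter keeps the example fast to run; in real use the default `time.sleep` would apply the delays.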
[ ]: reviews_count = 25000
[ ]: result = []
continuation_token = None
pbar = tqdm(total=reviews_count)  # progress bar over the target review count
while len(result) < reviews_count:
    new_result, continuation_token = reviews(
        app_id,
        continuation_token=continuation_token,
        lang='en',  # language of the reviews
        country='in',  # country to scrape reviews for
        sort=Sort.NEWEST,
        filter_score_with=None,
        count=199  # page size; no need to change this
    )
    if not new_result:
        break
    result.extend(new_result)
    pbar.update(len(new_result))
pbar.close()
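The collection loop above follows a common pagination pattern: request a page, keep the returned continuation token, and stop once the target count is reached or the source runs out. The following self-contained sketch shows that pattern against an in-memory list. `collect_reviews`, `fake_fetch_page`, and the integer token are hypothetical stand-ins for the Play Store endpoint, not the library's API.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[str], Optional[int]]


def collect_reviews(
    fetch_page: Callable[[Optional[int]], Page], target_count: int
) -> List[str]:
    """Fetch pages until `target_count` items are collected or the source ends."""
    collected: List[str] = []
    token: Optional[int] = None
    while len(collected) < target_count:
        items, token = fetch_page(token)
        if not items:
            break  # nothing more to read
        collected.extend(items)
        if token is None:
            break  # last page reached
    # The final page may overshoot the target, so trim the result
    return collected[:target_count]


# Hypothetical in-memory source standing in for the remote endpoint
FAKE_REVIEWS = [f"review {i}" for i in range(10)]
PAGE_SIZE = 4


def fake_fetch_page(token: Optional[int]) -> Page:
    start = token or 0
    items = FAKE_REVIEWS[start:start + PAGE_SIZE]
    next_start = start + PAGE_SIZE
    next_token = next_start if next_start < len(FAKE_REVIEWS) else None
    return items, next_token


print(len(collect_reviews(fake_fetch_page, 6)))
print(len(collect_reviews(fake_fetch_page, 25)))
```

Trimming at the end is a design choice: because pages arrive in fixed sizes, a loop that only checks the count before each request can return more items than requested.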
[ ]: df = pd.DataFrame(result)
df.head(5)
[ ]: reviewId userName \
0 94313fff-72a7-476c-ad2f-387aac0bc58a Norbert Jardeleza
1 4d6010c2-8140-4236-b44c-d5f95810b4f9 Donna McMurren
2 29c05994-1cd3-4528-8c72-b213febee8c0 Tobiloba Jesuferanmi
3 226f5781-1b80-4f6b-8d19-ad23b4193df3 Skyler Lee
4 2d0f8a85-f224-4242-bb51-034cd92199b6 Solange Ntube
userImage \
0 https://play-lh.googleusercontent.com/a-/ALV-U…
1 https://play-lh.googleusercontent.com/a/ACg8oc…
2 https://play-lh.googleusercontent.com/a-/ALV-U…
3 https://play-lh.googleusercontent.com/a-/ALV-U…
4 https://play-lh.googleusercontent.com/a-/ALV-U…
appVersion
0 480.0.0.54.88
1 324.0.0.48.120
2 480.0.0.54.88
3 None
4 463.1.0.53.85
# Initialize ChromaDB
chroma_db = Chroma(embedding_function=embeddings)
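To make the role of the vector store concrete, here is a toy sketch of embedding-based retrieval in pure Python. It is an assumption-laden illustration, not what Chroma or OpenAI embeddings do internally: the `embed` function is a simple bag-of-words count rather than a learned embedding, and `top_k` is a hypothetical helper that ranks texts by cosine similarity to a query.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: lowercase word counts (not a learned embedding)."""
    return dict(Counter(re.findall(r"[a-z']+", text.lower())))


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(value * b.get(word, 0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, documents: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    query_vector = embed(query)
    scored = [(cosine_similarity(query_vector, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


sample_reviews = [
    "Too many ads in the feed",
    "The interface is intuitive and easy to use",
    "Marketplace search keeps reverting to 250 miles",
]
best = top_k("ads everywhere", sample_reviews, k=1)
print(best[0][1])
```

A real embedding model would also match reviews that express the same complaint in different words, which word counts cannot do.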
Start with a brief 'Customers say' section that summarizes the overall sentiment and main points.
----
"""
)
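The prompt is later filled with two values, `name` and `reviews`. As a rough sketch of how such a template is populated, the example below uses plain `str.format` instead of LangChain's `PromptTemplate`. The template text is a shortened, hypothetical version: only the 'Customers say' instruction is taken from the notebook, and `build_prompt` is an illustrative helper.

```python
from typing import List

# Shortened, hypothetical template; only the 'Customers say' line is
# taken from the notebook's prompt.
REVIEW_PROMPT = (
    "Summarize the customer reviews for {name}.\n"
    "Start with a brief 'Customers say' section that summarizes the overall "
    "sentiment and main points.\n"
    "----\n"
    "{reviews}\n"
    "----\n"
)


def build_prompt(name: str, reviews: List[str]) -> str:
    """Fill the template with the product name and a bulleted review list."""
    joined = "\n".join(f"- {review.strip()}" for review in reviews if review.strip())
    return REVIEW_PROMPT.format(name=name, reviews=joined)


prompt = build_prompt("Example Product", ["Great app", "  ", "Too many ads "])
print(prompt)
```

Blank reviews are skipped so that empty entries do not take up space in the prompt.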
# Generate summaries for each chunk
summaries = []
for chunk in relevant_chunks:
    summary = llm_chain.run(name="Example Product", reviews=chunk)
    summaries.append(summary)
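Per-chunk summaries can then be condensed into one overall summary. The notebook does not show how its final summary is produced, so the following is only one common approach, sketched with a stub: summarize each chunk, then summarize the concatenated partial summaries. `summarize_in_two_passes` and `fake_llm` are hypothetical; in practice the callable would wrap the LLM chain.

```python
from typing import Callable, List


def summarize_in_two_passes(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    if len(partial_summaries) == 1:
        return partial_summaries[0]  # nothing to combine
    return summarize("\n\n".join(partial_summaries))


# Stub standing in for an LLM call; it records its inputs for inspection
calls: List[str] = []


def fake_llm(text: str) -> str:
    calls.append(text)
    return f"summary({len(text.split())} words)"


final = summarize_in_two_passes(
    ["Easy to use. Lots of features.", "Too many ads. Drains battery."],
    fake_llm,
)
print(final, len(calls))
```

With two chunks the stub is called three times: once per chunk and once for the combined text.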
Customers say: Overall, customers have positive feedback for Example Product,
praising its excellent designs and very good quality materials. They also
appreciate its versatility as a social platform and intuitive user interface,
but mention some concerns with resource usage, privacy, and frequent ads.
Positive aspects: Customers frequently mention the excellent designs and very
good quality materials of Example Product. They also appreciate its versatility
as a social platform and intuitive user interface.
Negative aspects: Some customers report concerns with resource usage, privacy,
and frequent ads while using the app.
Mixed opinions: Some customers have varying opinions on the app's resource
usage and privacy concerns.
Positive aspects:
1. Easy to use: Many customers mention that Example Product is straightforward
to use.
2. Informative: Users are impressed with the amount of new information they
learn from Example Product.
3. Five-star rating: The majority of customers give Example Product a five-star
rating, indicating high satisfaction.
Mixed opinions:
- None mentioned.
Negative aspects:
1. None mentioned.
Key features:
- User-friendly interface
- High-quality information
- Five-star rating
Customers say: Many customers report experiencing visual and marketplace
glitches in the app, but praise its social media integration.
Positive Aspects:
- Social media integration
- Up-to-date app
- Suggested local listings
Negative Aspects:
- Visual and marketplace glitches
- Search area reverting to 250 miles
Mixed Opinions:
- Specific keyword searches may return results outside of the desired area.
Key Features:
- Social media integration
- Up-to-date app
- Marketplace with suggested local listings
- Visual glitches
- Search area reverting to 250 miles
- Keyword search functionality
Customers say: Customers are overall very happy with this product, describing it
as good, wonderful, and a great way to connect with others and advertise
products. Some customers also mention that the app is nice and interesting.
Positive aspects: The app is easy to use, has a lot of features, and allows
for voice comments.
Negative aspects: Some customers report a need for updates, specifically for
voice comments.
Mixed opinions: There are mixed opinions about the app's design, with some
finding it impressive and others finding it discreet.
opinions on its design and some request updates for certain features.
Customers say: Customers generally have mixed opinions about Example Product.
While some appreciate the convenience of using the app for Facebook, others are
not satisfied with being constantly asked to leave a review.
Positive: Many customers mention the ease and convenience of using the app for
Facebook, as well as the absence of any problems while using it.
Negative: Some customers report being annoyed with the constant requests to
leave a review for the app.
Mixed opinions: There are mixed opinions about the usefulness and necessity of
leaving a review for the app.
Key features:
- Easy to use for Facebook
- Constant requests for reviews
- Mixed opinions on the usefulness of leaving a review.
Overall, customers have mixed opinions about Example Product, with some
appreciating its convenience for using Facebook and others being annoyed with
the frequent review requests. While the app seems to be generally problem-free,
the need to leave a review is a point of contention among users.
1.0.10 Conclusion
In this tutorial, we built an LLM-based product review summarization tool using Google Play
Store reviews. We used a combination of web scraping, text chunking, embedding with ChromaDB,
and querying through LangChain to generate well-structured, easy-to-read summaries of product
reviews. This approach can be adapted to summarize reviews from other platforms or any large
text dataset.