
App Review Summarizer For Google Play Store Using OpenAI

The document outlines a tutorial for creating a Google Play Store app review summarization tool using OpenAI. It details objectives such as enhancing customer insights and streamlining data processing, along with key features like web scraping and LLM summarization. The guide includes a step-by-step walkthrough for setting up the environment, scraping reviews, processing text, and generating concise summaries to aid in data-driven decision-making.

Uploaded by anitha

App Review Summarizer for Google Play Store using OpenAI

ANSHUMAN JHA
Table of Contents

1. Overview
2. Objectives
3. Key Features
4. Step-by-Step Walkthrough
5. Benefits
6. Conclusion
7. Call to Action
8. Link to Example Google Colab Notebook
1. Overview

In today's digital world, understanding customer feedback quickly and effectively can be a game-changer for businesses.

Our tutorial provides a comprehensive guide to building a tool that scrapes and summarizes Google Play Store reviews using OpenAI.

This tool enhances your ability to process and interpret large amounts of user feedback. It also helps you make data-driven decisions with clarity and efficiency.
2. Objectives

1. Enhance Customer Insights: Quickly gain a structured overview of user opinions, highlighting the key positive and negative aspects of your app.

2. Streamline Data Processing: Automatically process thousands of reviews to uncover actionable insights without manual effort.

3. Leverage Advanced AI: Use state-of-the-art AI models to summarize and categorize reviews into a concise, user-friendly format.
3. Key Features

1. Web Scraping: Scrape up to 25,000 reviews from the Google Play Store.

2. Text Chunking: Split the combined review text into smaller, manageable chunks using RecursiveCharacterTextSplitter.

3. Embeddings with ChromaDB: Store text embeddings in ChromaDB for efficient similarity search.

4. LLM Summarization: Generate concise summaries using LangChain and OpenAI.
4. Step-by-Step Walkthrough

4.1 Setting Up the Environment
• Install the required libraries with a single pip command.

4.2 Importing Required Libraries
• Import the libraries needed for web scraping, text processing, embedding, and summarization.

4.3 Initializing OpenAI API
• Set your OpenAI API key so the tool can call OpenAI's language and embedding models.

4.4 Scraping Google Play Store Reviews
• Fetch up to 25,000 reviews for your chosen app using the google-play-scraper library.
• Page through the results with continuation tokens, and retry requests whose responses fail to parse.

4.5 Splitting Reviews into Chunks
• Use RecursiveCharacterTextSplitter to split the review text into chunks ready for embedding.

4.6 Embedding and Storing in ChromaDB
• Embed the text chunks with OpenAI embeddings and store them in ChromaDB for efficient querying.

4.7 Defining the Review Summarization Prompt
• Create a detailed prompt template that guides the model to summarize reviews into structured, actionable insights.

4.8 Querying and Summarization
• Retrieve relevant chunks from ChromaDB and use OpenAI's language model to summarize them.

4.9 Outputting the Summary
• Display the final summary of the reviews for easy interpretation and decision-making.
5. Benefits

• Time-Saving
Automate review analysis and gain insights faster.

• Accuracy
Use advanced AI to produce summaries that reflect the main sentiments users express.

• Scalability
Handle large volumes of reviews, which makes the tool suitable for apps with extensive user feedback.
6. Conclusion

This tool efficiently summarizes large volumes of Google Play Store app reviews into concise, actionable insights using OpenAI's LLM.

The combination of web scraping, ChromaDB for embedding storage, and LangChain for querying provides a powerful solution for handling large text datasets.

This framework can be easily adapted for other platforms or review data sources.
7. Call to Action

Ready to streamline your review analysis and gain deeper insights into customer feedback? Follow our step-by-step tutorial to build your own Google Play Store app review summarization tool and stay ahead of the competition.
For more details and to get started, visit [Your Tutorial Link Here].
8. Link to Example Google Colab Notebook
App_Review_Summarizer_for_Google_Play_Store_using_OpenAI

September 12, 2024

1 Building a Google Play Store App Review Summarization Tool Using OpenAI
In this tutorial, we will build an application that scrapes product reviews from the Google Play Store
and uses a large language model (LLM) to summarize the reviews into a concise and structured
format. We’ll integrate web scraping, text chunking, embedding with ChromaDB, and querying
using LangChain to achieve this.

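1.0.1 1. Setting Up the Environment

First, make sure you have the required libraries installed. LangChain's OpenAI integrations ship in the separate langchain-openai package, and its Chroma wrapper ships in langchain-community:
[ ]: !pip install openai langchain langchain-openai langchain-community chromadb google-play-scraper pandas tqdm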
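LangChain's import paths have moved between releases. For example, the OpenAI classes now live in the separate langchain-openai package. Because of this, it can help to confirm which versions were actually installed before running the later cells. The sketch below uses only the standard library's importlib.metadata. `installed_versions` is a hypothetical helper name, and the package list is simply the set this notebook installs.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names installed by the pip cell above
for pkg, ver in installed_versions(
    ["openai", "langchain", "langchain-openai", "langchain-community",
     "chromadb", "google-play-scraper"]
).items():
    print(f"{pkg}: {ver or 'not installed'}")
```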

1.0.2 2. Import Required Libraries

[ ]: import openai
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from google_play_scraper import Sort
import pandas as pd
from tqdm import tqdm

1.0.3 3. Initialize OpenAI API

[ ]: openai.api_key = 'your_openai_api_key'
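Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative is to load the key at runtime from an environment variable. This assumes you have exported it as OPENAI_API_KEY, which is the name the openai package and langchain-openai read by default. `load_openai_key` is a hypothetical helper and is not part of either library.

```python
import os


def load_openai_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message rather than on the first API call
        raise RuntimeError(f"Environment variable {var_name} is not set.")
    return key
```

You would then write `openai.api_key = load_openai_key()` in place of the literal string.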

1.0.4 4. Scraping Google Play Store Reviews

We'll scrape up to 25,000 reviews from the Google Play Store using the google-play-scraper library.
[ ]: import google_play_scraper

Get the app ID of the application whose reviews you want to fetch. The ID is the value of the id parameter in the app's Google Play Store link. For example, the Facebook app's link is:
https://play.google.com/store/apps/details/Facebook?id=com.facebook.katana&hl=en_ZA
So the app ID for Facebook is com.facebook.katana.
[ ]: app_id = 'com.facebook.katana'
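You can parse the ID out of the store link instead of copying it by hand. This sketch assumes the ID is always carried in the `id` query parameter, as in the Facebook example above. `app_id_from_url` is a hypothetical helper built on `urllib.parse`.

```python
from urllib.parse import parse_qs, urlparse


def app_id_from_url(play_url: str) -> str:
    """Return the value of the `id` query parameter in a Play Store link."""
    query = parse_qs(urlparse(play_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError(f"No 'id' parameter in {play_url!r}")
    return ids[0]


print(app_id_from_url(
    "https://play.google.com/store/apps/details/Facebook?id=com.facebook.katana&hl=en_ZA"
))  # com.facebook.katana
```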

[ ]: import json
from time import sleep
from typing import List, Optional, Tuple

# Internal helpers of google-play-scraper, reused to re-implement `reviews` below
from google_play_scraper import Sort
from google_play_scraper.constants.element import ElementSpecs
from google_play_scraper.constants.regex import Regex
from google_play_scraper.constants.request import Formats
from google_play_scraper.utils.request import post

[ ]: MAX_COUNT_EACH_FETCH = 199
# Added safeguard: consecutive parse failures tolerated per call (arbitrary value)
MAX_RETRIES = 5


class _ContinuationToken:
    __slots__ = (
        "token",
        "lang",
        "country",
        "sort",
        "count",
        "filter_score_with",
        "filter_device_with",
    )

    def __init__(
        self, token, lang, country, sort, count, filter_score_with, filter_device_with
    ):
        self.token = token
        self.lang = lang
        self.country = country
        self.sort = sort
        self.count = count
        self.filter_score_with = filter_score_with
        self.filter_device_with = filter_device_with


def _fetch_review_items(
    url: str,
    app_id: str,
    sort: int,
    count: int,
    filter_score_with: Optional[int],
    filter_device_with: Optional[int],
    pagination_token: Optional[str],
):
    # POST one page request to the Play Store reviews endpoint
    dom = post(
        url,
        Formats.Reviews.build_body(
            app_id,
            sort,
            count,
            "null" if filter_score_with is None else filter_score_with,
            "null" if filter_device_with is None else filter_device_with,
            pagination_token,
        ),
        {"content-type": "application/x-www-form-urlencoded"},
    )
    match = json.loads(Regex.REVIEWS.findall(dom)[0])
    payload = json.loads(match[0][2])

    # Raw review items, plus the token for the next page
    return payload[0], payload[-2][-1]


def reviews(
    app_id: str,
    lang: str = "en",
    country: str = "us",
    sort: Sort = Sort.MOST_RELEVANT,
    count: int = 100,
    filter_score_with: Optional[int] = None,
    filter_device_with: Optional[int] = None,
    continuation_token: Optional[_ContinuationToken] = None,
) -> Tuple[List[dict], _ContinuationToken]:
    sort = sort.value

    if continuation_token is not None:
        token = continuation_token.token

        if token is None:
            # The previous call already reached the last page
            return (
                [],
                continuation_token,
            )

        # Resume with the settings stored in the token
        lang = continuation_token.lang
        country = continuation_token.country
        sort = continuation_token.sort
        count = continuation_token.count
        filter_score_with = continuation_token.filter_score_with
        filter_device_with = continuation_token.filter_device_with
    else:
        token = None

    url = Formats.Reviews.build(lang=lang, country=country)

    _fetch_count = count
    retries = 0

    result = []

    while True:
        if _fetch_count == 0:
            break

        if _fetch_count > MAX_COUNT_EACH_FETCH:
            _fetch_count = MAX_COUNT_EACH_FETCH

        try:
            review_items, token = _fetch_review_items(
                url,
                app_id,
                sort,
                _fetch_count,
                filter_score_with,
                filter_device_with,
                token,
            )
        except (TypeError, IndexError):
            # Some responses cannot be parsed. `token` is unchanged here, so
            # `continue` retries the same page (community patch). The original
            # reset to continuation_token.token, which crashed on the first page
            # and could re-fetch earlier pages. Stop after MAX_RETRIES failures.
            retries += 1
            if retries > MAX_RETRIES:
                break
            continue
        retries = 0

        for review in review_items:
            result.append(
                {
                    k: spec.extract_content(review)
                    for k, spec in ElementSpecs.Review.items()
                }
            )

        _fetch_count = count - len(result)

        # A list instead of a string token means there are no more pages
        if isinstance(token, list):
            token = None
            break

    return (
        result,
        _ContinuationToken(
            token, lang, country, sort, count, filter_score_with, filter_device_with
        ),
    )


def reviews_all(app_id: str, sleep_milliseconds: int = 0, **kwargs) -> list:
    # Fetch every available review by following continuation tokens
    kwargs.pop("count", None)
    kwargs.pop("continuation_token", None)

    continuation_token = None

    result = []

    while True:
        _result, continuation_token = reviews(
            app_id,
            count=MAX_COUNT_EACH_FETCH,
            continuation_token=continuation_token,
            **kwargs
        )

        result += _result

        # Stop at the last page, or if a call returned nothing (e.g. retries exhausted)
        if continuation_token.token is None or not _result:
            break

        if sleep_milliseconds:
            sleep(sleep_milliseconds / 1000)

    return result
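The `reviews` and `reviews_all` functions above follow a cursor-style pagination pattern: request a page, keep the continuation token that comes back, and stop once no token is returned. The sketch below isolates that control flow so it can run offline. It is a simplified illustration, not the library's code. `fetch_all`, `fake_fetch_page`, and the in-memory `PAGES` data are hypothetical stand-ins for the Play Store request.

```python
from typing import Callable, Dict, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]  # (items on this page, token for the next page)


def fetch_all(fetch_page: Callable[[Optional[str]], Page], limit: int) -> List[str]:
    """Follow continuation tokens until the source is exhausted or `limit` items are collected."""
    items: List[str] = []
    token: Optional[str] = None
    while len(items) < limit:
        page, token = fetch_page(token)
        if not page:
            break
        items.extend(page)
        if token is None:  # no further pages
            break
    return items[:limit]


# Hypothetical in-memory source: three pages of two items each
PAGES: Dict[Optional[str], Page] = {
    None: (["r1", "r2"], "t1"),
    "t1": (["r3", "r4"], "t2"),
    "t2": (["r5", "r6"], None),
}


def fake_fetch_page(token: Optional[str]) -> Page:
    return PAGES[token]


print(fetch_all(fake_fetch_page, limit=5))  # ['r1', 'r2', 'r3', 'r4', 'r5']
```

The scraping loop below can overshoot its target slightly (25,074 reviews for a target of 25,000). This version differs by trimming the result to `limit`.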

[ ]: reviews_count = 25000

[ ]: result = []
continuation_token = None

with tqdm(total=reviews_count, position=0, leave=True) as pbar:
    while len(result) < reviews_count:
        new_result, continuation_token = reviews(
            app_id,
            continuation_token=continuation_token,
            lang='en',  # The language of the reviews
            country='in',  # Country whose Play Store to scrape ('in' = India)
            sort=Sort.NEWEST,
            filter_score_with=None,  # None = reviews with any star rating
            count=199  # Page size; no need to change this
        )
        if not new_result:
            break
        result.extend(new_result)
        pbar.update(len(new_result))

25074it [01:10, 358.13it/s]

[ ]: df = pd.DataFrame(result)
df.head(5)

[ ]: reviewId userName \
0 94313fff-72a7-476c-ad2f-387aac0bc58a Norbert Jardeleza
1 4d6010c2-8140-4236-b44c-d5f95810b4f9 Donna McMurren
2 29c05994-1cd3-4528-8c72-b213febee8c0 Tobiloba Jesuferanmi
3 226f5781-1b80-4f6b-8d19-ad23b4193df3 Skyler Lee
4 2d0f8a85-f224-4242-bb51-034cd92199b6 Solange Ntube

userImage \
0 https://play-lh.googleusercontent.com/a-/ALV-U…
1 https://play-lh.googleusercontent.com/a/ACg8oc…
2 https://play-lh.googleusercontent.com/a-/ALV-U…
3 https://play-lh.googleusercontent.com/a-/ALV-U…
4 https://play-lh.googleusercontent.com/a-/ALV-U…

content score thumbsUpCount \
0 Since the update, reactions don't always show,… 1 0
1 Happy Anniversary !!!!! 5 0
2 Good app 5 0
3 Facebook is always good in the world's than an… 5 0
4 I love this app so much 5 0

reviewCreatedVersion at replyContent repliedAt \
0 480.0.0.54.88 2024-09-11 21:45:23 None None
1 324.0.0.48.120 2024-09-11 21:42:45 None None
2 480.0.0.54.88 2024-09-11 21:42:02 None None
3 None 2024-09-11 21:40:18 None None
4 463.1.0.53.85 2024-09-11 21:39:28 None None

appVersion
0 480.0.0.54.88
1 324.0.0.48.120
2 480.0.0.54.88
3 None
4 463.1.0.53.85
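A quick sanity check of the scraped data before summarizing can catch surprises, such as a strong skew toward one star rating. It can also catch reviews with no text, since a None value in `content` would make a plain string join fail. The sketch assumes each scraped review is a dict with `score` and `content` keys, as in the DataFrame above. The helper names and the three-item `sample` list are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def rating_distribution(review_dicts: List[dict]) -> Dict[int, int]:
    """Count reviews per star rating (1-5) using each dict's 'score' field."""
    counts = Counter(r.get("score") for r in review_dicts)
    return {stars: counts.get(stars, 0) for stars in range(1, 6)}


def share_without_text(review_dicts: List[dict]) -> float:
    """Fraction of reviews whose 'content' is missing or blank."""
    if not review_dicts:
        return 0.0
    empty = sum(1 for r in review_dicts if not (r.get("content") or "").strip())
    return empty / len(review_dicts)


# Made-up sample in the same shape as the scraped `result` list
sample = [
    {"score": 5, "content": "Good app"},
    {"score": 1, "content": "Since the update, reactions don't always show"},
    {"score": 5, "content": None},
]
print(rating_distribution(sample))  # {1: 1, 2: 0, 3: 0, 4: 0, 5: 2}
print(share_without_text(sample))
```

On the real data, the calls would be `rating_distribution(result)` and `share_without_text(result)`.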

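1.0.5 5. Splitting Reviews into Chunks

Individual reviews are usually short, but together they are far more text than fits in a single prompt. We therefore combine all review texts into one string and split it into manageable chunks using RecursiveCharacterTextSplitter.
[ ]: # Drop reviews without text, and separate reviews with blank lines so the
# splitter prefers to break between reviews ("\n\n" is its first default separator)
reviews_text = "\n\n".join(df['content'].dropna())

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

chunks = text_splitter.split_text(reviews_text)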
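To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simpler fixed-window splitter. RecursiveCharacterTextSplitter itself is smarter. By default it tries paragraph breaks first, then line breaks, then spaces, then single characters, and it merges the pieces up to `chunk_size`. Its chunks will therefore not match this sketch exactly. `sliding_window_chunks` is a hypothetical name used only for illustration.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks. That is why the notebook sets a small `chunk_overlap=50` rather than zero.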

1.0.6 6. Embedding and Storing in ChromaDB

We will embed the text chunks using OpenAI embeddings and store them in ChromaDB.
[ ]: from langchain_core.documents import Document

# Initialize the embeddings model
embeddings = OpenAIEmbeddings(openai_api_key=openai.api_key)

# Initialize a Chroma vector store that embeds text with the model above
chroma_db = Chroma(embedding_function=embeddings)

# Convert chunks to Document objects and add them to ChromaDB
documents = [Document(page_content=chunk) for chunk in chunks]
chroma_db.add_documents(documents)
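Conceptually, `similarity_search` embeds the query with the same embeddings model and returns the stored chunks whose vectors are closest to it. The toy sketch below ranks made-up 3-dimensional vectors by cosine similarity. Real OpenAI embeddings have far more dimensions, and Chroma may be configured with a different distance metric. Treat this as an illustration of the idea, not of Chroma's internals.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], docs: List[Tuple[str, List[float]]], k: int = 2) -> List[str]:
    """Return the texts of the k stored vectors most similar to query_vec."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up "embeddings" for three review snippets
toy_docs = [
    ("app crashes after update", [0.9, 0.1, 0.0]),
    ("love the new design", [0.0, 0.2, 0.9]),
    ("keeps freezing on startup", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], toy_docs, k=2))
# ['app crashes after update', 'keeps freezing on startup']
```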

1.0.7 7. Defining the Review Summarization Prompt

Now, we set up a detailed prompt template for the LLM to summarize the reviews. The prompt will extract key points and categorize customer feedback into positive, negative, and mixed opinions.
[ ]: # Define the prompt template
prompt_template = PromptTemplate(
    input_variables=["name", "reviews"],
    template="""
You are an AI assistant specialized in analyzing and summarizing product reviews.

Your task is to synthesize multiple customer reviews into a concise summary that highlights key points about the product.

Follow these guidelines:

Start with a brief 'Customers say' section that summarizes the overall sentiment and main points.

Identify 3-5 positive aspects that customers frequently mention.
Identify 2-3 negative aspects or issues that some customers report.
Note any mixed opinions or areas where customer feedback varies.
Create a list of key features or aspects of the product, categorizing them as follows:

Positive (use a green checkmark emoji ✅)
Negative (use a red X emoji ❌)
Mixed opinions (use an orange circle emoji 🟠)

Present the information in a short paragraph no more than 5 sentences long.
Use clear, concise language and avoid unnecessary detail.
If the product is aimed at a specific user group (e.g., beginners), mention this if it comes up frequently in reviews.

----

Now for the products and reviews in question

Product Name: {name}

Product Reviews: {reviews}

"""
)
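PromptTemplate fills `{name}` and `{reviews}` much like Python's `str.format`, because f-string style is its default template format. This also means any literal braces in a template would need to be doubled. You can fill a template by hand to preview, or cap, what the model receives. The sketch uses a shortened stand-in template. `fill_prompt` and its `max_review_chars` budget are hypothetical additions, not part of the tutorial.

```python
# Shortened stand-in for the tutorial's template, with the same two placeholders
SHORT_TEMPLATE = """Summarize these reviews.

Product Name: {name}

Product Reviews: {reviews}
"""


def fill_prompt(name: str, reviews: str, max_review_chars: int = 2000) -> str:
    """Fill the placeholders, trimming the reviews to a hypothetical character budget."""
    if len(reviews) > max_review_chars:
        reviews = reviews[:max_review_chars] + " [truncated]"
    return SHORT_TEMPLATE.format(name=name, reviews=reviews)


print(fill_prompt("Example Product", "Good app\n\nI love this app so much"))
```

With the real template, `prompt_template.format(name="Example Product", reviews=chunks[0])` returns the filled prompt string for inspection.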

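1.0.8 8. Querying and Summarization

Now, we use the prompt template with LangChain to generate the summary. We first retrieve from ChromaDB the five chunks most relevant to a general summarization query. We then pass each chunk to the LLM chain separately, which produces one partial summary per chunk.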
[ ]: from langchain_openai import OpenAI

# Initialize the OpenAI completion-model wrapper with your API key
llm = OpenAI(openai_api_key=openai.api_key)

[ ]: # Create the LLMChain with the OpenAI model and prompt
llm_chain = LLMChain(
    llm=llm,
    prompt=prompt_template
)

[ ]: # Retrieve the 5 chunks most similar to the query
query = "Summarize the key points from the product reviews."
relevant_chunks = chroma_db.similarity_search(query, k=5)

# Generate one summary per retrieved chunk
summaries = []
for chunk in relevant_chunks:
    # Pass the chunk's text, not the Document object itself.
    # "Example Product" is only a display name; replace it with your app's name.
    output = llm_chain.invoke({"name": "Example Product", "reviews": chunk.page_content})
    summaries.append(output["text"])

# Combine summaries into the final output
final_summary = " ".join(summaries)
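Joining the per-chunk summaries with a space leaves several overlapping summaries side by side. One option, in the spirit of map-reduce summarization, is a second "reduce" call that asks the model to merge them. In the sketch, `reduce_summaries` and its prompt wording are hypothetical. `fake_llm` is an offline stub so the example runs without an API key.

```python
from typing import Callable, List


def reduce_summaries(summaries: List[str], call_llm: Callable[[str], str]) -> str:
    """Merge per-chunk summaries with one extra model call (the "reduce" step)."""
    numbered = "\n\n".join(
        f"Summary {i}:\n{s.strip()}" for i, s in enumerate(summaries, start=1)
    )
    prompt = (
        "The following are partial summaries of customer reviews for the same app.\n"
        "Merge them into one summary in the same format, removing duplicates "
        "and noting where the partial summaries disagree.\n\n" + numbered
    )
    return call_llm(prompt)


# Offline stub standing in for the real model; it only reports the prompt length
def fake_llm(prompt: str) -> str:
    return f"[merged summary from a {len(prompt)}-character prompt]"


print(reduce_summaries(["Customers say: good app.", "Customers say: too many ads."], fake_llm))
```

In the notebook you could pass the real model instead of the stub, for example `final_summary = reduce_summaries(summaries, llm.invoke)`.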

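1.0.9 9. Output the Summary

Finally, we print the result. Each retrieved chunk was summarized independently, so the output below contains one "Customers say" summary per chunk rather than a single merged summary.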
[ ]: print("Final Product Review Summary:")
print(final_summary)

Final Product Review Summary:

Customers say: Overall, customers have positive feedback for Example Product,
praising its excellent designs and very good quality materials. They also
appreciate its versatility as a social platform and intuitive user interface,
but mention some concerns with resource usage, privacy, and frequent ads.

✅ Positive aspects: Customers frequently mention the excellent designs and very
good quality materials of Example Product. They also appreciate its versatility
as a social platform and intuitive user interface.

❌ Negative aspects: Some customers report concerns with resource usage, privacy,
and frequent ads while using the app.

🟠 Mixed opinions: Some customers have varying opinions on the app's resource
usage and privacy concerns.

Key features: Excellent designs, very good quality materials, versatility as a
social platform, intuitive user interface, resource-intensive, privacy concerns,
frequent ads.
Customers say: Overall, customers are happy with Example Product and give it a
five-star rating. They find it easy to use and appreciate its ability to provide
new information.

✅ Positive aspects:
1. Easy to use: Many customers mention that Example Product is straightforward
to use.
2. Informative: Users are impressed with the amount of new information they
learn from Example Product.
3. Five-star rating: The majority of customers give Example Product a five-star
rating, indicating high satisfaction.

🟠 Mixed opinions:
- None mentioned.

❌ Negative aspects:
1. None mentioned.

Key features:
- User-friendly interface ✅
- High-quality information ✅
- Five-star rating ✅
Customers say: Many customers report experiencing visual and marketplace
glitches in the app, but praise its social media integration.

✅ Positive Aspects:
- Social media integration
- Up-to-date app
- Suggested local listings

❌ Negative Aspects:
- Visual and marketplace glitches
- Search area reverting to 250 miles

🟠 Mixed Opinions:
- Specific keyword searches may return results outside of the desired area.

Key Features:
- Social media integration ✅
- Up-to-date app ✅
- Marketplace with suggested local listings ✅
- Visual glitches ❌
- Search area reverting to 250 miles ❌
- Keyword search functionality 🟠
Customers say: Customers are overall very happy with this product, describing it
as good, wonderful, and a great way to connect with others and advertise
products. Some customers also mention that the app is nice and interesting.

✅ Positive aspects: The app is easy to use, has a lot of features, and allows
for voice comments.

❌ Negative aspects: Some customers report a need for updates, specifically for
voice comments.

🟠 Mixed opinions: There are mixed opinions about the app's design, with some
finding it impressive and others finding it discreet.

Key features: Easy to use interface ✅, diverse features ✅, voice comments ✅,
need for updates ❌, mixed opinions on design 🟠. Overall, customers are satisfied
with this app as a way to connect and advertise products, but have mixed
opinions on its design and some request updates for certain features.
Customers say: Customers generally have mixed opinions about Example Product.
While some appreciate the convenience of using the app for Facebook, others are
not satisfied with being constantly asked to leave a review.

✅ Positive: Many customers mention the ease and convenience of using the app for
Facebook, as well as the absence of any problems while using it.

❌ Negative: Some customers report being annoyed with the constant requests to
leave a review for the app.

🟠 Mixed opinions: There are mixed opinions about the usefulness and necessity of
leaving a review for the app.

Key features:
✅ Easy to use for Facebook
❌ Constant requests for reviews
🟠 Mixed opinions on the usefulness of leaving a review.

Overall, customers have mixed opinions about Example Product, with some
appreciating its convenience for using Facebook and others being annoyed with
the frequent review requests. While the app seems to be generally problem-free,
the need to leave a review is a point of contention among users.

1.0.10 Conclusion
In this tutorial, we built an LLM-based product review summarization tool using Google Play
Store reviews. We used a combination of web scraping, text chunking, embedding with ChromaDB,
and querying through LangChain to generate well-structured, easy-to-read summaries of product
reviews. This approach can be adapted to summarize reviews from other platforms or any large
text dataset.
