
BIA-5401-0LC

Professor Viktoria Varga


Case Study #3
Submitted Date: 2 April 2025
David Akporuere

Isha Ajit Phakatkar

Ismail Sahin

Meshack Oniera

Rakshini Prabu

Tran Bao Ngoc Nguyen

Contents
Summary of the case
Deliverables
Step 1: Text Analysis Application for Business
Step 2: Data Collection Strategy
Step 3: Data Storage Strategy
Step 4: Text Corpus Construction in Python
Step 5: Discussion of Challenges
Results and Key Findings
Conclusion

Summary of the case:


This assignment is a text analysis project focused on customer feedback about the iPhone 16.
The project applies business intelligence techniques to extract insights from user-generated content,
supporting data-driven decisions on product improvement and marketing strategy. Key steps
include collecting data from several tech platforms, running sentiment analysis in Python with NLTK,
and storing the results in CSV format. The analysis revealed a predominantly neutral to positive sentiment
among users, with concerns about features such as battery life and screen refresh rate. The main challenges
were the limits of manual data collection and the complexity of natural language. The proposed
solutions involve automation and more advanced NLP models.

Deliverables
Step 1: Text Analysis Application for Business:
● Data Source: User-generated content (reviews, forum posts, expert commentary)
pertaining to the iPhone 16.
● Business Value: Analyzing customer feedback provides direct insights into
customer perceptions, preferences, and pain points. This informs data-driven
decisions to improve products/services, enhance customer satisfaction, and
drive competitive advantage.
● Example: Prevalent complaints about battery life clearly indicate an area for
improvement in future iPhone models.
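The battery-life example can be made concrete by counting how many reviews mention each product aspect. The sketch below shows one minimal keyword-matching approach. It is not the method used in this project. The aspect names, the keyword lists and the function name count_aspect_mentions are hypothetical. Plain substring matching will also miss paraphrases and can over-match (for example, "photo" also matches "photography").

```python
# Hypothetical aspect -> keyword map; the report does not define one.
ASPECT_KEYWORDS = {
    "battery": ("battery", "charge", "charging"),
    "display": ("screen", "refresh rate", "display"),
    "camera": ("camera", "photo"),
}


def count_aspect_mentions(reviews, aspects=ASPECT_KEYWORDS):
    """Count how many reviews mention each aspect at least once (case-insensitive)."""
    counts = {aspect: 0 for aspect in aspects}
    for review in reviews:
        text = review.lower()
        for aspect, keywords in aspects.items():
            # A review counts once per aspect, however many keywords it contains.
            if any(keyword in text for keyword in keywords):
                counts[aspect] += 1
    return counts
```

If this is applied only to reviews labelled Negative, the counts rank the most frequent complaint themes.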

Step 2: Data Collection Strategy


● Sources: PCMag, Reddit, Medium, Apple Community, GSM Arena, The Verge, and
other tech platforms.
● Method: Manual curation of user and expert review snippets.
● Format: Plain text (.txt) file named "iPhone16_reviews.txt".
● Tools:
    ● Data Source: Websites and online tech review platforms
    ● Data Format: .txt file
    ● Data Tools: Python's built-in open() for reading the file; NLTK for sentiment analysis
● Rationale: Manual collection let us select diverse, relevant content without having to
deal with API access or authentication. The collected reviews were saved in a .txt file
of 15 lines, one review per line. The file was read into Python with a simple file
handler (open()), and each line was treated as a separate review record for analysis.
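The loading step described above can be sketched in a few lines. This sketch assumes the file is UTF-8 encoded and holds one review per line. The report does not state the encoding, and the function name load_reviews is hypothetical. Blank lines are skipped so they are not counted as empty reviews.

```python
def load_reviews(path="iPhone16_reviews.txt"):
    """Return one review string per non-empty line of a UTF-8 text file."""
    with open(path, encoding="utf-8") as fh:
        # strip() removes the trailing newline and stray spaces; blank lines are dropped.
        return [line.strip() for line in fh if line.strip()]
```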
The following pie chart illustrates the sentiment distribution of the iPhone 16 reviews:
● Neutral: 62.5% of the reviews had a neutral sentiment.
● Positive: 34.4% of the reviews were positive.
● Negative: Only 3.1% of the reviews were negative.
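Percentages like these come from two steps: each review is labelled from its VADER compound score, and then the labels are counted. The sketch below assumes the compound scores were already computed with NLTK, via SentimentIntensityAnalyzer().polarity_scores(text)["compound"]. It applies the ±0.05 cut-offs commonly recommended for VADER. The report does not say which thresholds it used, so these cut-offs are an assumption, as are the function names.

```python
from collections import Counter

LABELS = ("Positive", "Neutral", "Negative")


def label_from_compound(score, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label (assumed cut-offs)."""
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


def sentiment_distribution(scores):
    """Percentage of reviews per label, rounded to one decimal place."""
    if not scores:
        return {}
    counts = Counter(label_from_compound(s) for s in scores)
    # Counter returns 0 for labels that never occur.
    return {label: round(100 * counts[label] / len(scores), 1) for label in LABELS}
```

Because each share is rounded separately, the three percentages may not add up to exactly 100.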

Step 3: Data Storage Strategy
1. Data Structure: The CSV file stores the data in tabular form. Each row
represents a single review, and the columns hold attributes of the review and
its analysis. The notebook writes the following columns:
● "Review #": The index or number of the review.
● "Review": The actual text of the iPhone 16 review.
● "Sentiment": The overall sentiment classification assigned to the review
(e.g., "Positive", "Negative", or "Neutral").
● "Score": The compound sentiment score produced by the VADER
sentiment analyzer, a normalized value between -1 (most negative) and +1
(most positive) that indicates the direction and intensity of the sentiment.
2. Implementation: We used the pandas library to build a DataFrame from the
sentiment analysis results and saved it to a CSV file with the to_csv()
method. The index=False argument prevents pandas from writing the
DataFrame index as a separate column in the CSV.
3. Rationale: We chose CSV for this project for the following reasons:

● Simplicity: CSV is a very simple file format, making it easy to understand,
create, and parse.
● Portability: CSV files can be opened and viewed in almost any spreadsheet
program (Excel, Google Sheets, etc.) or text editor.
● Ease of Use with Pandas: pandas provides excellent support for reading
and writing CSV files, making it a natural choice for data analysis
workflows.
● Suitable for Tabular Data: The sentiment analysis results are inherently
tabular, making CSV a good fit.

4. Advantages in this Specific Case:


● The project processes only a small number of iPhone 16 reviews (15 in the
collected dataset), a scale for which CSV is entirely adequate.
● The analysis is relatively straightforward (sentiment classification and
scoring). Complex querying or data relationships aren't required.
● The focus is on analyzing the sentiment of the reviews and storing the
results for further examination. CSV provides a simple way to achieve this.
5. Limitations and Alternatives:
● Scalability: CSV is not suitable for very large datasets (millions or billions
of reviews).

● Querying: CSV files are not efficient for complex queries, because we would need to
load the entire file into memory before filtering or searching it.
● Concurrency: CSV files are not designed for concurrent access by multiple
users or processes.
● Data Integrity: CSV files do not enforce data types or constraints.
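The storage step can be sketched as follows. This sketch assumes the results are held as (review number, text, label, score) tuples. The function name save_results, the output filename iPhone16_sentiment.csv and the placeholder row are hypothetical. The code builds a DataFrame with the four columns listed above and writes it with to_csv(index=False).

```python
import pandas as pd

COLUMNS = ["Review #", "Review", "Sentiment", "Score"]


def save_results(results, path):
    """Write one row per review to a CSV file and return the DataFrame."""
    df = pd.DataFrame(results, columns=COLUMNS)
    # index=False stops pandas from adding the 0..n-1 row index as an extra first column.
    df.to_csv(path, index=False)
    return df


if __name__ == "__main__":
    # Placeholder row for illustration only; not a result from this project.
    save_results([(1, "Example review text.", "Neutral", 0.0)], "iPhone16_sentiment.csv")
```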

Step 4: Text Corpus Construction in Python


In this step, we structured and cleaned our collected reviews to create a well-prepared text
corpus for analysis. The main objective was to transform raw, unstructured text into a format
suitable for natural language processing (NLP).

Text Preprocessing Steps:

To ensure consistency and accuracy in analysis, we applied several preprocessing techniques:

● Text Normalization: Convert text to a standard format by handling special characters and encoding issues.
● Lowercasing: Standardize all text by converting it to lowercase.
● Tokenization: Split text into individual words for further processing.
● Stopword Removal: Eliminate common words that add little meaning (e.g., “the,” “is,” “and”).
● Punctuation & Special Character Removal: Remove unnecessary symbols and digits.
● Lemmatization: Reduce words to their base form (e.g., “running” → “run”).
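The preprocessing steps above can be sketched with the Python standard library alone. This is an illustrative version, not the project's notebook code, and it departs from the notebook in three ways:
● The stopword set is a small hypothetical sample. A full list, such as NLTK's English stopwords, would normally be used.
● Punctuation and digits are stripped before the text is split on whitespace.
● Lemmatization is a pluggable function that defaults to "no change". A real lemmatizer, such as NLTK's WordNetLemmatizer, needs an external word database.

```python
import re
import unicodedata

# Small illustrative stopword set (assumption, not the project's list).
STOPWORDS = {"the", "is", "and", "a", "an", "it", "to", "of", "this", "was"}


def preprocess(text, lemmatize=lambda word: word):
    """Return a list of cleaned tokens for one review."""
    # Normalization: NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with errors="ignore" then drops the marks and any other
    # non-ASCII symbols such as emoji.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Lowercasing.
    text = text.lower()
    # Punctuation, special character and digit removal: keep only letters and whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Tokenization on whitespace, then stopword removal and lemmatization.
    return [lemmatize(token) for token in text.split() if token not in STOPWORDS]


print(preprocess("The camera is AMAZING!!! 📷"))  # ['camera', 'amazing']
```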

Results & Key Changes

Step 5: Discussion of Challenges

The team encountered several challenges during the text analysis process, which
influenced the project's efficiency, accuracy, and scalability.
1. Limitations in Manual Data Collection
● Challenge: Data collection was labor-intensive and potentially biased because
no APIs or web scraping tools were used.
● Proposed Solution: Employ web scraping tools (e.g., BeautifulSoup) or
leverage APIs (e.g., Reddit API, Google Search API) to automate data
collection, expand data volume, and ensure a more representative dataset.
2. Presence of Data Noise and Inconsistencies
● Challenge: Non-standard expressions, spelling errors, emojis, and
inconsistent formatting compromised analysis quality.
● Proposed Solution: Implement advanced text-cleaning functions, such as
emoji removal and typo correction with libraries like TextBlob, to enhance
data quality further.
3. Ambiguity and Complexity in Natural Language
● Challenge: Ambiguity in natural language, sarcasm, and mixed sentiments
complicated the interpretation of results from the VADER sentiment
analyzer.
● Proposed Solution: Combine VADER with other sentiment tools such as TextBlob,
or use transformer-based models such as BERT (especially for larger
datasets), to improve the accuracy of sentiment detection.

4. Limited Size of the Dataset
● Challenge: The small dataset (15 reviews) limited the generalizability of
sentiment findings and increased susceptibility to outliers.
● Proposed Solution: Automate data collection to increase the dataset to
hundreds or thousands of reviews, enabling more thorough statistical
analysis and trend visualization.
5. Constraints in Storage and Data Management
● Challenge: CSV files are poorly suited to larger or real-time data and
restrict querying and simultaneous access.
● Proposed Solution: Adopt a structured database such as SQLite, or a NoSQL
option such as MongoDB, for efficient storage, indexing, and
querying of unstructured text data in larger-scale analyses.
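Challenge 3 proposes combining VADER with a second tool. One simple way to do this is to average the two scores and flag reviews where they disagree for manual review. The sketch assumes the scores were already computed with NLTK's VADER (compound score) and with TextBlob (TextBlob(text).sentiment.polarity). Both scores fall in the range -1 to +1. The equal weighting, the 0.5 disagreement threshold and the function name combine_scores are hypothetical choices, not values from this project.

```python
def combine_scores(vader_compound, textblob_polarity, max_gap=0.5):
    """Average two polarity scores in [-1, 1] and flag likely mixed or sarcastic reviews."""
    combined = (vader_compound + textblob_polarity) / 2
    # Opposite signs or a large gap mean the two tools read the review differently,
    # which often happens with sarcasm or mixed sentiment.
    conflict = vader_compound * textblob_polarity < 0 or abs(vader_compound - textblob_polarity) > max_gap
    return combined, conflict
```

Averaging keeps the result on the same -1 to +1 scale as its inputs. The flag sends ambiguous cases to a human reviewer rather than trusting either tool alone.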

Results and Key Findings:


1. Sentiment Analysis Findings:
● The sentiment distribution of iPhone 16 reviews revealed that:
    ● 62.5% of the reviews were neutral.
    ● 34.4% were positive.
    ● Only 3.1% were negative.
● This indicates a predominantly neutral to positive perception of the iPhone
16, with minimal outright dissatisfaction.
2. Insights from Customer Feedback:
● Positive aspects highlighted include seamless functionality, excellent
camera performance, and high-quality construction.
● Negative aspects focused on issues such as battery life, notification
system usability, and occasional hardware failures like charging port
malfunctions.
3. Challenges in Data Processing:
● Manual data collection was labor-intensive and limited in scale.
● Ambiguities in natural language, such as sarcasm and mixed sentiments,
complicated sentiment classification.
● The dataset was small (15 reviews), limiting generalizability.

4. Proposed Solutions:
● Automation of data collection using web scraping or APIs to increase
dataset size and diversity.
● Use of advanced NLP models (e.g., BERT) for more nuanced sentiment
analysis.
● Transition to structured databases (e.g., SQLite, MongoDB) for efficient
storage and querying of larger datasets.
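A move from CSV to a structured database could look like the sketch below, which uses Python's built-in sqlite3 module. The table name, the column names and the helper functions are hypothetical. Unlike a CSV file, the table enforces a primary key and a CHECK constraint on the sentiment label. Queries such as "all negative reviews, most negative first" run inside the database, and an index on the sentiment column speeds up that filter.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    review_id INTEGER PRIMARY KEY,
    review    TEXT NOT NULL,
    sentiment TEXT NOT NULL CHECK (sentiment IN ('Positive', 'Neutral', 'Negative')),
    score     REAL
)
"""


def store_reviews(conn, rows):
    """Insert (review_id, review, sentiment, score) rows, creating the table if needed."""
    conn.execute(SCHEMA)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_reviews_sentiment ON reviews (sentiment)")
    # The with-block commits on success and rolls back if any row violates a constraint.
    with conn:
        conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", rows)


def negative_reviews(conn):
    """Return negative reviews, most negative first."""
    return conn.execute(
        "SELECT review_id, review, score FROM reviews "
        "WHERE sentiment = 'Negative' ORDER BY score"
    ).fetchall()
```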

Conclusion:
The text analysis project successfully demonstrated the value of customer feedback in
identifying areas for improvement and guiding product development for the iPhone 16.
While the sentiment analysis provided actionable insights into customer perceptions,
scalability and accuracy challenges were evident due to the manual data collection
process and limited dataset size. Future iterations should leverage automation tools
and advanced NLP techniques to enhance efficiency and analytical depth, ensuring
more comprehensive and reliable insights for business decision-making.
