ON
OF
SUBMITTED BY
Submitted by
is a bonafide student of this institute, and the work has been carried out by him/her under the
supervision of Mrs. Sunayana Sutar. It is approved in partial fulfillment of the requirements
of Savitribai Phule Pune University for the award of the degree of Bachelor of Engineering
(Artificial Intelligence and Data Science).
Place : Pune
Date :
ACKNOWLEDGEMENT
It gives us great pleasure to present the preliminary project report on ‘NutriCheck: Decoding
Everyday Products for Optimal Living’.
We would like to take this opportunity to thank our internal guide, Mrs. Sunayana Sutar, for giving
us all the help and guidance we needed. We are grateful for her kind support and valuable
suggestions.
We are also grateful to Dr. Suvarna Patil, Head of the Artificial Intelligence and Data Science
Department, Dr. D. Y. Patil Institute of Engineering, Management & Research, for her indispensable
support and suggestions.
Finally, our special thanks go to Mrs. Sneha Kanawade for providing resources such as a laboratory
with all the required software platforms, and for her continuous guidance throughout our project.
Aniket Bhand
Nilkanth Ahire
Shreyash Chavan
Vaibhav Jadhav
ABSTRACT
Understanding product labels is a challenge that many people face today, especially when health is
a concern. NutriCheck aims to address this issue by parsing ingredient lists and nutritional
information. NutriCheck is a web-based application that immediately analyzes and interprets the
nutritional information on product labels by combining Optical Character Recognition (OCR) with
Machine Learning and Natural Language Processing (NLP) techniques. The system works as follows:
text is extracted from an image and analyzed, and a short report is generated that lists the benefits
of the product, warns about harmful components and their risks, and proposes healthier substitutes.
This helps users reach the health-related targets they set for themselves. In addition, NutriCheck's
ingredient analysis encourages everyday evaluation of products and enables consumers to choose
healthier options. Whether shopping or examining products at home, users get an easy and quick way
to align their purchases with their health aspirations and lead a healthier life.
Keywords: Optical Character Recognition (OCR), Machine Learning, Large Language Models
(LLMs), Healthy Choices, Ingredient Analysis, Daily Products, Healthier Living.
TABLE OF CONTENTS
LIST OF ABBREVIATIONS i
LIST OF FIGURES ii
LIST OF TABLES iii
3.3.2 Software Requirements (Platform Choice) 19
3.3.3 Hardware Requirements 19
3.6 Analysis Models: SDLC Model to be applied 19
3.7 System Implementation Plan 21
04 System Design 22
4.1 System Architecture 22
4.2 Data Flow Diagrams 23
4.3 Entity Relationship Diagrams 25
4.4 UML Diagram 26
4.5 Sequence Diagrams 26
4.6 Activity Diagram 27
05 Other Specification 28
5.1 Advantages 28
5.2 Limitations 28
5.3 Applications 28
Appendix A: 29
Appendix B:
References 29
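LIST OF ABBREVIATIONS
Abbreviation Full Form
CBF Content-Based Filtering
CNN Convolutional Neural Network
LLM Large Language Model
NCF Neural Collaborative Filtering
NLP Natural Language Processing
OCR Optical Character Recognition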
LIST OF FIGURES
1 Agile Model 20
2 System Architecture 22
3 DFD Level 0 23
4 DFD Level 1 23
5 DFD Level 2 24
7 UML diagram 26
8 Sequence Diagram 26
9 Activity Diagram 27
LIST OF TABLES
01. INTRODUCTION
1.1 OVERVIEW
Knowing what products contain is part of a healthy lifestyle, but translating complex product
labels into a message consumers can easily read is not always feasible. "NutriCheck: Decoding
Everyday Products for Optimal Living" addresses this with a web-based tool that uses Optical
Character Recognition (OCR), Natural Language Processing (NLP), and Machine Learning to
automatically decode and analyze ingredient lists. NutriCheck transforms the text on product labels
into easy-to-understand reports. Users can quickly scan for health risks and spot beneficial
ingredients, and the tool suggests healthier alternatives. It helps consumers make informed choices
toward a healthier lifestyle while promoting greater transparency in everyday products.
1.2 MOTIVATION
In today's health-conscious world, consumers look for more transparency in what they consume
daily. However, product labels are hard to decipher: ingredient lists full of scientific jargon
that most people are unfamiliar with cause confusion and lead to poor decisions about which
products are better. The problem is exacerbated by the vast number of products on the market,
which makes it nearly impossible for consumers to research every product they encounter.
This was the inspiration behind NutriCheck: the need for a streamlined system that performs
real-time ingredient analysis and makes the results actionable. Using OCR, Machine Learning, and
NLP, NutriCheck lets users easily understand what is written on a product label, enabling better
choices and, ultimately, a healthier lifestyle.
02. LITERATURE SURVEY
Sr. No. | Title | Year | Dataset | Technique | Advantages | Limitations
6. | Online Analysis of Ingredient Safety | 2024 | SQL/NoSQL ingredient database | OCR, NLP, and LLMs for safety analysis | Automated ingredient analysis enhances consumer safety | High costs, barcode dependency, underutilized LLMs
10. | Single Model for Organic and Inorganic Chemical NER | 2022 | CHEMDNER corpus with 84,355 annotations | SciBERT for chemical NER tasks | Near state-of-the-art performance in chemical NER | Slightly lower F1 scores than state-of-the-art
03. SOFTWARE REQUIREMENT SPECIFICATION
3.1 INTRODUCTION
3.1.1 PROJECT SCOPE
• Ingredient Analysis and Reporting: Analyze extracted ingredient data to identify harmful
substances and assess potential health benefits, generating comprehensive reports for users.
• Personalized Health Recommendations: Provide users with tailored suggestions for healthier
alternatives based on the analysis of ingredient lists.
• User-Friendly Interface: Design an intuitive web interface that facilitates easy navigation and
interaction for users of all backgrounds, ensuring accessibility.
• Educational Resources: Incorporate informative content that enhances user understanding of
ingredient labels and nutrition, promoting informed decision-making.
• Scalability: Develop the system to efficiently handle a high volume of users and data,
ensuring consistent performance during peak traffic periods.
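The first scope item, ingredient analysis and reporting, can be sketched in Python as two steps: split the OCR text into normalized ingredient names, then sort each name into benefits or concerns. This is a minimal illustration under stated assumptions, not NutriCheck's implementation: the lookup tables `CONCERNS` and `BENEFITS`, their entries, and the names `parse_ingredients`, `analyze`, and `Report` are hypothetical, and a real system would draw on a curated ingredient database and the NLP model described in this report instead of exact string matching.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lookup tables for illustration only; a real system would load
# curated ingredient data from its database rather than hard-code it.
CONCERNS = {
    "sugar": "added sugar",
    "palm oil": "high in saturated fat",
    "sodium benzoate": "artificial preservative",
}
BENEFITS = {
    "oats": "source of dietary fibre",
}


@dataclass
class Report:
    benefits: dict = field(default_factory=dict)
    concerns: dict = field(default_factory=dict)
    unknown: list = field(default_factory=list)  # found in neither table


def parse_ingredients(ocr_text: str) -> list:
    """Split raw OCR text of an ingredient panel into normalized names."""
    text = ocr_text.lower()
    text = re.sub(r"^\s*ingredients?\s*:\s*", "", text)  # drop the label
    text = re.sub(r"\([^)]*\)", "", text)  # drop E-numbers, percentages, etc.
    items = (item.strip(" .\t\n") for item in re.split(r"[,;]", text))
    return [item for item in items if item]


def analyze(ingredients: list) -> Report:
    """Sort each ingredient into benefits, concerns, or unknown."""
    report = Report()
    for name in ingredients:
        if name in CONCERNS:
            report.concerns[name] = CONCERNS[name]
        elif name in BENEFITS:
            report.benefits[name] = BENEFITS[name]
        else:
            report.unknown.append(name)
    return report
```

For the OCR text `Ingredients: Oats, Sugar, Palm Oil, Salt, Sodium Benzoate (E211).`, `parse_ingredients` yields `["oats", "sugar", "palm oil", "salt", "sodium benzoate"]`, and `analyze` flags sugar, palm oil, and sodium benzoate as concerns, oats as a benefit, and leaves salt as unknown. Because parenthesized text is discarded, compound ingredients such as `chocolate (sugar, cocoa)` lose their sub-ingredients in this sketch.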
Dependencies:
• Reliable OCR Technology: NutriCheck depends on OCR technology to read text from
ingredient images accurately. If the OCR doesn’t perform well, it could lead to incorrect
analyses.
• Machine Learning Libraries: We use machine learning libraries like TensorFlow and PyTorch
to process the text and generate health assessments. Their performance is crucial for accurate
results.
• Database Infrastructure: A strong and scalable database is needed to store user profiles and
interaction history. This ensures users can easily access their saved analyses.
• User Interface Frameworks: NutriCheck uses web development frameworks such as React
and Django to create a friendly and easy-to-navigate interface. Smooth performance is
important for keeping users engaged.
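As an illustration of the database dependency, the following Python sketch stores user profiles and saved analyses with the standard-library `sqlite3` module. The schema, table and column names, and helper functions (`connect`, `add_user`, `save_analysis`, `history`, `delete_analysis`) are hypothetical; the report's stated backend is Django, whose ORM models would normally define these tables, and a production deployment would use a scalable database server rather than an in-memory SQLite file.

```python
import json
import sqlite3

# Hypothetical schema for illustration. Django ORM models would normally
# define these tables; plain sqlite3 keeps this sketch self-contained.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    health_goals TEXT
);
CREATE TABLE IF NOT EXISTS analyses (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    image_name TEXT NOT NULL,
    report_json TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the user_id reference
    conn.executescript(SCHEMA)
    return conn


def add_user(conn, name, health_goals=""):
    cur = conn.execute(
        "INSERT INTO users (name, health_goals) VALUES (?, ?)", (name, health_goals)
    )
    conn.commit()
    return cur.lastrowid


def save_analysis(conn, user_id, image_name, report):
    """Store one generated report (any JSON-serializable dict) in the history."""
    cur = conn.execute(
        "INSERT INTO analyses (user_id, image_name, report_json) VALUES (?, ?, ?)",
        (user_id, image_name, json.dumps(report)),
    )
    conn.commit()
    return cur.lastrowid


def history(conn, user_id):
    """Return (id, image_name, report) tuples for a user, oldest first."""
    rows = conn.execute(
        "SELECT id, image_name, report_json FROM analyses "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [(rid, name, json.loads(rep)) for rid, name, rep in rows]


def delete_analysis(conn, user_id, analysis_id):
    """Delete one entry; the user_id check stops users deleting others' history."""
    cur = conn.execute(
        "DELETE FROM analyses WHERE id = ? AND user_id = ?", (analysis_id, user_id)
    )
    conn.commit()
    return cur.rowcount == 1
```

Scoping `delete_analysis` by both `id` and `user_id` is a deliberate choice: a user can only remove entries from their own history, which matches the requirement that users manage their own saved analyses.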
The mathematical modeling for NutriCheck revolves around two primary components: Optical
Character Recognition (OCR) and Natural Language Processing (NLP) integrated with machine
learning techniques. This combination allows for efficient extraction and analysis of ingredient data
from user-uploaded images, ultimately providing personalized health insights.
1. Optical Character Recognition (OCR)
• Image Input Matrix (I): An image $I$ is represented as a matrix in which each pixel's value
indicates intensity or color. The OCR process transforms this matrix into a textual
representation $T$ by identifying and extracting characters and words.
• Character Recognition Model: The OCR system uses a machine learning model (often a
Convolutional Neural Network, CNN) to estimate the probability $P(c \mid I)$ of a character $c$
given the image input $I$. The output is a sequence of recognized characters forming the
ingredient list.
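To make the two OCR quantities above concrete, here is a minimal Python sketch under simplifying assumptions: the image $I$ is a grayscale matrix (a list of rows of 0-255 intensities) with dark text on a light background, and a recognition model has already produced one probability distribution $P(c \mid I)$ per segmented character position, from which each character is read off as $c^* = \arg\max_c P(c \mid I)$. The function names `binarize` and `decode`, the threshold of 128, and the example alphabet are hypothetical. Production OCR engines usually decode whole text lines with sequence models rather than by independent per-position argmax, so this is a simplification, not NutriCheck's actual recognizer.

```python
# Assumption: a grayscale image is a list of rows of 0-255 intensities, with
# dark text on a light background.
def binarize(image, threshold=128):
    """Map each pixel to 1 (ink) if darker than threshold, else 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def decode(prob_rows, alphabet):
    """Greedy decoding: at each character position pick c* = argmax_c P(c | I).

    prob_rows[i][k] is the model's probability that position i shows
    alphabet[k]; the model itself (e.g. a CNN) is outside this sketch.
    """
    chars = []
    for probs in prob_rows:
        if len(probs) != len(alphabet):
            raise ValueError("each row needs one probability per alphabet symbol")
        best = max(range(len(alphabet)), key=probs.__getitem__)
        chars.append(alphabet[best])
    return "".join(chars)
```

For example, four probability rows over the alphabet `"aost"` whose largest entries fall on o, a, t, and s decode to `"oats"`.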
2. Natural Language Processing (NLP)
After OCR extracts the text, NLP techniques analyze the ingredients to provide meaningful health
insights:
• Health Assessment Model: Machine learning models evaluate the safety and healthiness of the
extracted ingredients. This can be expressed as:
$$H = f(V)$$
where $V$ is the feature vector extracted from the recognized text $T$ and $f$ is the trained
assessment model.
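• Personalized Recommendation: The health assessment is then combined with the user's preferences:
$$R_u = \alpha H + (1 - \alpha) P$$
Here, $R_u$ is the final recommendation score for user $u$, $H$ is the health assessment derived
from the NLP model, $P$ represents user preferences or historical data, and $\alpha \in [0, 1]$ is
a tuning parameter that balances these contributions.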
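The two formulas can be illustrated with a short Python sketch. It assumes, purely for illustration, that the feature vector $V$ counts ingredients in a few flagged categories, that $f$ is a logistic model so that $H$ lies in $(0, 1)$, and that the recommendation is the linear blend $R_u = \alpha H + (1 - \alpha) P$ with $\alpha \in [0, 1]$. The category names, weights, bias, default $\alpha = 0.7$, and function names are hypothetical, not values from NutriCheck.

```python
import math

# Hypothetical feature categories: V maps each category to an ingredient count.
FEATURES = ("added_sugar", "artificial_additive", "whole_grain")
# Hypothetical learned weights and bias for f; a positive weight raises H.
WEIGHTS = {"added_sugar": -1.2, "artificial_additive": -0.8, "whole_grain": 0.9}
BIAS = 0.5


def health_score(v):
    """H = f(V): logistic model mapping the feature vector V to a score in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * v.get(name, 0) for name in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


def recommendation_score(h, p, alpha=0.7):
    """R_u = alpha * H + (1 - alpha) * P, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * h + (1.0 - alpha) * p
```

For instance, a product with two added-sugar ingredients and one whole-grain ingredient gives $z = 0.5 - 2.4 + 0.9 = -1$, so $H = 1/(1 + e) \approx 0.27$; with $P = 1.0$ and $\alpha = 0.5$, `recommendation_score` returns about $0.63$. Raising $\alpha$ makes recommendations follow health more closely, while lowering it gives more weight to the user's history.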
Explanation
• OCR transforms ingredient images into machine-readable text, which serves as the input for all
later analysis.
• NLP analyzes this text to provide insights into health implications, using both feature
extraction and predictive modeling.
• By integrating these components, NutriCheck delivers personalized reports on the healthiness of
everyday products, empowering users to make informed dietary choices.
The hybrid approach improves recommendation quality, ensuring that NutriCheck delivers timely,
personalized, and relevant health insights.
3.2 FUNCTIONAL REQUIREMENTS
• The system should allow users to create, update, and manage their profiles, including their
dietary preferences and health goals.
• Users must have the option to save their ingredient analysis history for future reference and
easy access.
• The system must enable users to upload images of product ingredient lists effortlessly.
• Upon upload, the system should process the image using Optical Character Recognition
(OCR) to extract text from the ingredient list accurately.
• The system should analyze the extracted ingredients using Natural Language Processing
(NLP) to provide detailed health insights.
• The insights should include potential health benefits, concerns regarding specific
ingredients, and recommendations for healthier alternatives.
• The system must maintain a history of user interactions, including uploaded images and
generated reports, accessible in a "Saved Posts" section.
• Users should be able to view, edit, and delete entries from their interaction history at any
time.
• Personalized Recommendations:
• Based on user profiles and past interactions, the system should offer personalized
recommendations for healthier products and alternatives.
• Recommendations must update dynamically as users upload new images and provide
feedback on the insights received.
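The last two requirements, filtering by user profile and updating recommendations as feedback arrives, can be sketched in Python as follows. Assumptions, all hypothetical: each candidate product already has a health score $H$ in $[0, 1]$; the profile holds a set of ingredients to avoid and a preference $P$ per product; feedback is a like or dislike folded into $P$ with an exponential moving average $P \leftarrow (1 - \beta) P + \beta r$; and candidates are ranked by the blend $\alpha H + (1 - \alpha) P$. The class and function names, the neutral prior of 0.5, and the defaults $\alpha = 0.7$ and $\beta = 0.3$ are illustrative choices, not NutriCheck's design.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    health: float            # H in [0, 1], from the health assessment
    ingredients: frozenset   # normalized ingredient names


@dataclass
class UserProfile:
    avoid: set = field(default_factory=set)         # dietary restrictions
    preference: dict = field(default_factory=dict)  # P per product, in [0, 1]


NEUTRAL_PRIOR = 0.5  # assumed P for products with no feedback yet


def update_preference(user, product_name, liked, beta=0.3):
    """Exponential moving average: P <- (1 - beta) * P + beta * r, r in {0, 1}."""
    old = user.preference.get(product_name, NEUTRAL_PRIOR)
    r = 1.0 if liked else 0.0
    user.preference[product_name] = (1 - beta) * old + beta * r


def recommend(user, candidates, alpha=0.7, k=3):
    """Drop products with avoided ingredients, then rank by alpha*H + (1-alpha)*P."""
    allowed = [p for p in candidates if not (p.ingredients & user.avoid)]

    def score(p):
        pref = user.preference.get(p.name, NEUTRAL_PRIOR)
        return alpha * p.health + (1 - alpha) * pref

    return [p.name for p in sorted(allowed, key=score, reverse=True)[:k]]
```

With a user who avoids peanuts, a peanut-containing product is never suggested, however high its score. Two dislikes of an otherwise higher-ranked product lower its $P$ from 0.5 to 0.245, enough to rank a slightly less healthy alternative above it. The moving average lets new feedback shift rankings gradually without a single rating dominating.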
3.3 EXTERNAL INTERFACE REQUIREMENTS
3.3.1 USER INTERFACES
• A clean, user-friendly dashboard where users can upload product images, view ingredient
analyses, and access their saved history.
• An admin panel for system administrators to manage updates, monitor performance, and
review system health.
3.5 SYSTEM REQUIREMENTS
• Database: A scalable database for storing user preferences and content data.
• Software: Python for the algorithms, Django for the backend, and HTML and JavaScript for
frontend development.
• Hardware: Cloud-based infrastructure to support real-time recommendations.
Figure 3.6.1: Agile Model
3.7 SYSTEM IMPLEMENTATION PLAN
Phase 1: Requirement Gathering and Analysis
• Objective: Collect detailed system requirements from stakeholders.
• Key Tasks: Identify user needs, analyze similar systems, define success criteria.
• Outcome: Comprehensive requirement specification document.
Phase 2: System Design
• Objective: Create system architecture and design for the recommendation engine,
database, and user interface.
• Key Tasks: Design the hybrid recommendation engine, create database schemas, design
user interfaces and data flow diagrams.
• Outcome: Complete system architecture and design documentation.
Phase 3: Development
• Objective: Build core system components, including the recommendation engine and
ingredient analysis.
• Key Tasks: Implement the NCF and CBF models, build user interfaces, integrate the OCR
and NLP modules, develop admin panel.
• Outcome: A functional prototype of NutriCheck ready for testing.
Phase 4: Testing
• Objective: Ensure system functionality, performance, and security.
• Key Tasks: Conduct unit, integration, performance, usability, and security testing.
• Outcome: Fully tested system, free of critical bugs, ready for deployment.
Phase 5: Deployment
• Objective: Deploy the system into the production environment.
• Key Tasks: Deploy on cloud infrastructure, set up CI/CD pipelines, ensure cross-
platform compatibility.
• Outcome: Live version of NutriCheck, accessible to users and providers.
Phase 6: Maintenance and Updates
• Objective: Continuously monitor and improve the system.
• Key Tasks: Monitor performance, fix bugs, provide updates, optimize security.
• Outcome: A maintained and evolving system with regular updates and high user
engagement.
04. SYSTEM DESIGN
4.2 DATA FLOW DIAGRAMS
Figure 4.2.3: DFD Level 2
4.3 ENTITY RELATIONSHIP DIAGRAMS
4.4 UML DIAGRAM
05. OTHER SPECIFICATION
5.1 ADVANTAGES
• Personalized Ingredient Analysis: Delivers tailored insights based on the user's health
preferences and uploaded product ingredients, enhancing decision-making for healthier
choices.
• Enhanced Product Understanding: Utilizes OCR and LLMs to provide a detailed
breakdown of product ingredients, allowing users to understand even complex labels quickly
and easily.
• Time-Saving Insights: Generates concise, easy-to-understand summaries of product
ingredients, ideal for users looking to make quick, informed decisions.
• Real-Time Updates: Processes images of ingredient lists in real-time, providing immediate
feedback on product suitability and potential healthier alternatives.
• Scalability: Built on cloud infrastructure, NutriCheck can scale to support a growing user base
and increasing product uploads while maintaining consistent performance.
• User Engagement and Retention: Offers personalized recommendations and saves user
interaction history, encouraging users to return and fostering long-term engagement.
• Health-Conscious Recommendations: Provides users with healthier product alternatives,
supporting their journey towards making more informed and nutritious choices.
5.2 LIMITATIONS
• OCR Accuracy: The quality of ingredient recognition may be impacted by poor image
resolution or unclear product labels, leading to less accurate analysis.
• Dependency on Language Models: The system's responses may vary in accuracy
depending on the complexity of the ingredient list and the performance of the underlying
language models.
• Limited Data Sources: The tool relies on predefined data sets for ingredient analysis and
health recommendations, which may not cover all products or regional variations.
• User Input Quality: The accuracy of the results depends heavily on the quality of the images
uploaded by users; poor-quality images may lead to incorrect recommendations.
• Processing Time: While designed for real-time processing, large images or highly detailed
ingredient lists may take slightly longer to analyze, affecting user experience.
06. CONCLUSION & FUTURE WORK
Conclusion:
NutriCheck is an innovative web tool designed to simplify ingredient analysis and promote healthier
lifestyle choices. By utilizing OCR and advanced language models, the system decodes complex
ingredient lists and provides users with clear, actionable insights. The tool empowers users to make
informed decisions about the products they consume, offering healthier alternatives when necessary.
With features like ingredient history tracking, NutriCheck not only educates users but also helps them
develop long-term, health-conscious habits in a world of often overwhelming product choices.
Future scope:
• Expanded Language Support: Enhance the OCR and LLM modules to support multiple
languages for a global user base.
• Mobile App Development: Develop a dedicated mobile application for easier access and
ingredient scanning on the go.
• Health Recommendations: Integrate personalized health recommendations based on user
dietary preferences or restrictions.
• AI-Driven Insights: Utilize advanced AI models to provide more detailed insights into potential
health risks of specific ingredients.
• Integration with Wearables: Sync with health-monitoring wearables to offer real-time dietary
advice based on user health data.
References:
1. Kim D, Kim S-Y, Yoo R, Choo J, Yang H (2024) Innovative AI methods for monitoring front-of-package
information: A case study on infant foods. PLoS ONE 19(5): e0303083.
https://doi.org/10.1371/journal.pone.0303083
2. Saltelli, A., et al. “Sensitivity Analysis as an Ingredient of Modeling.” Statistical Science, vol. 15, no. 4,
2000, pp. 377–95. JSTOR. Accessed 17 Oct. 2024.
https://www.jstor.org/stable/2676831
3. Kim JH, Kim TS, Yoon HJ, et al. Health risk assessment of dermal and inhalation exposure to deodorants
in Korea. Sci. Total Environ. 2018;625:1369–1379.
https://doi.org/10.4491/eer.2023.123
4. Pettersson, Tobias, Maria Riveiro, and Tuwe Löfström. "Multimodal fine-grained grocery product recognition
using image and OCR text." Machine Vision and Applications 35.4 (2024): 79.
https://github.com/Tubbias/finegrainocr
5. Perkovic, Sonja. "Utilizing consumer-based label equity to signal consumer products free from endocrine-
disrupting chemicals." Journal of Retailing and Consumer Services 76 (2024).
https://doi.org/10.1016/j.jretconser.2023.103611
6. V. C P, A. D, D. D. Kedilaya, S. S. Gondkar and S. Halhalli, "Online Analysis of Ingredient Safety,
Leveraging OCR and Machine Learning for Enhanced Consumer Product Safety," 2024 2nd International
Conference on Artificial Intelligence and Machine Learning Applications Theme: Healthcare and Internet
of Things (AIMLA), Namakkal, India, 2024, pp. 1-6, doi: 10.1109/AIMLA59606.2024.10531558.
https://ieeexplore.ieee.org/document/10531558
7. Wu, F., Li, Dingsheng, and Sangwon Suh. "Health risks of chemicals in consumer products: A review."
Environment international 123 (2019): 580-587.
https://doi.org/10.1016/j.envint.2018.12.033
8. Pastor-Nieto, María-Antonia, and María-Elena Gatica-Ortega. "Ubiquity, hazardous effects, and risk
assessment of fragrances in consumer products." Current treatment options in allergy 8 (2021): 21-41.
https://link.springer.com/article/10.1007/s40521-020-00275-7
9. Kim D, Kim S-Y, Yoo R, Choo J, Yang H (2024) Innovative AI methods for monitoring front-of-package
information: A case study on infant foods. PLoS ONE 19(5): e0303083.
https://doi.org/10.1371/journal.pone.0303083
10. Wu, F., Li, Dingsheng, and Sangwon Suh. "Health risks of chemicals in consumer products: A review."
Environment international 123 (2019): 580-587.
https://doi.org/10.1016/j.envint.2018.12.033