
TRIBHUVAN UNIVERSITY

INSTITUTE OF SCIENCE AND TECHNOLOGY


BHAKTAPUR MULTIPLE CAMPUS
Dudhpati, Bhaktapur

A Final Year Internship Report

on

CHAT APPLICATION AND NEWS FOR YOU


AT

HAMRO PATRO INC.

Under the Supervision of


Asst. Prof. Surya Bam
Bhaktapur Multiple Campus

Submitted by:

KRISHNA RIJAL (15144/074)

Submitted to:
Institute of Science and Technology
Tribhuvan University

In partial fulfillment of the requirement for the Bachelor’s Degree in Computer
Science and Information Technology

September 2022
SUPERVISOR’S RECOMMENDATION

I hereby recommend that the report prepared under my supervision by Krishna Rijal
(15144/074) entitled “Chat Application and News for You” in partial fulfillment of the
requirements for the degree of B.Sc. in Computer Science and Information Technology be
processed for evaluation.

………………………………..

Asst. Prof. Surya Bam

Project Supervisor

Department of Computer Science and Information Technology

Bhaktapur Multiple Campus


CERTIFICATE OF APPROVAL

This is to certify that this project, prepared by Krishna Rijal (15144/074) and entitled
“Chat Application and News for You”, in partial fulfillment of the requirement for the
degree of Bachelor of Science in Computer Science and Information Technology (B.Sc. CSIT),
has been well studied. In our opinion, it is satisfactory in scope and quality for the
required degree.

……………………………………                 ……………………………………
Supervisor                        Head of Department
Asst. Prof. Surya Bam             Mr. Sushant Poudel
Bhaktapur Multiple Campus         Bhaktapur Multiple Campus

……………………………………                 ……………………………………
Internal Examiner                 External Examiner
Bhaktapur Multiple Campus         Arjun Singh Saud
                                  IOST
ACKNOWLEDGEMENT

First of all, I would like to express special gratitude to the Institute of Science and
Technology, Tribhuvan University, for providing the opportunity to explore interests
and ideas in the field of computer science through this internship project, undertaken in
partial fulfillment of the requirements for the bachelor’s degree in Computer Science and
Information Technology.

I would like to thank Hamro Patro Inc. for providing this opportunity to undertake an
internship which was a great platform for learning and developing professionalism. I would
like to express my sincere gratitude to Mr. Shankar Uprety, CEO, Hamro Patro Inc.,
who helped me complete my internship. I would also like to express my gratitude to Er.
Aayush Subedi, Engineering Manager, Hamro Patro Inc. who has been a continuous
source of inspiration as my mentor during my internship period. My gratitude for his trust
and generosity goes beyond words.

Furthermore, I would also like to acknowledge with much appreciation the crucial role of
the staff of the B.Sc. CSIT department, who gave permission to use all required equipment
and the necessary materials to complete the task. Special thanks go to our seniors and to
the teaching staff of the B.Sc. CSIT department for their constant support, which helped me
complete this project successfully. Many thanks go to the supervisor, Asst. Prof. Surya Bam,
who invested his full effort in guiding me toward the goal.

Thank You.

Krishna Rijal (15144/074)

Bhaktapur Multiple Campus

September 2022

ABSTRACT

Hamro Patro Inc. provided an opportunity to be a part of their team and work on the
“Chat Application” and “News for You” projects. The “Chat Application” was part of
the learning phase, intended to build an understanding of the tech stack used in the
organization. “News For You” aims to replace the existing news recommendation system of the
Hamro Patro application with a more optimized recommendation system that communicates with
the backend system to collect overall information about the user. The news recommendation
system recommends news to users based on the content they prefer. A clustering mechanism is
also implemented to group users based on the similarity of their reading patterns.

Keywords: Chat, News recommendation, Recommendation system, Backend system

TABLE OF CONTENTS

ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS

CHAPTER 1: INTRODUCTION
  1.1 Introduction
  1.2 Statement of the Problem
  1.3 Objective
  1.4 Scope and Limitations
  1.5 Report Organization

CHAPTER 2: ORGANIZATIONAL DETAILS AND LITERATURE REVIEW
  2.1 Introduction to Organization
    2.1.1 Organization Background
    2.1.2 Services Provided by the Organization
  2.2 Organization Hierarchy
  2.3 Working Domains of Organization
  2.4 Description of Intern Department
    2.4.1 Placement
    2.4.2 Duration
    2.4.3 Company Culture and Working Structure
  2.5 Literature Review
    2.5.1. Study of the existing system
    2.5.2 Literature Review

CHAPTER 3: INTERNSHIP ACTIVITIES
  3.1 Roles and Responsibilities
  3.2 Weekly Activity log
  3.3 Description of the Projects Involved During Internship
  3.4 Tasks/Activities Performed
    3.4.1 Requirement Collection
      3.4.1.1 Functional Requirement
      3.4.1.2 Non-Functional Requirements
    3.4.2 Feasibility Analysis
    3.4.3 System Design
      3.4.3.1 Architectural Design
      3.4.3.2 Data Modeling using Class Diagram
    3.4.4 Implementation
    3.4.5 Testing

CHAPTER 4: CONCLUSION AND LEARNING OUTCOMES
  4.1 Conclusion
  4.2 Learning Outcomes

REFERENCES

APPENDICES
LIST OF FIGURES

Figure 2.1: Organization Hierarchy
Figure 3.1: Use Case Diagram
Figure 3.2: Architectural design of the system
Figure 3.3: Class Diagram
Figure 3.4: DFD Level-0
Figure 3.5: DFD Level-1
Figure 3.6: Activity Diagram
Figure 3.7: Sequence Diagram
Figure 3.8: Clustering the TFIDF based model
Figure 3.9: Clustering the Word2Vec based model
Figure 3.10: Parameters tuning for GridSearchCV
Figure 3.11: Parameters tuning for RandomizedSearchCV
LIST OF TABLES

Table 2.1: Internship Duration
Table 3.1: Weekly activity log
Table 3.2: Test Case-1
Table 3.3: Test Case-2
Table 3.4: Test Case-3
Table 3.5: Test Case-4
ABBREVIATIONS

Some of the abbreviations used in this report are shown below:

API: Application Programming Interface

CF: Collaborative Filtering

CRUD: Create Read Update Delete

gRPC: google Remote Procedure Calls

HTTP/2: Hypertext Transfer Protocol/2

IDE: Integrated Development Environment

NEB: National Examinations Board

NMC: Nepal Medical Council

QA: Quality Assurance

RPC: Remote Procedure Call

RS: Recommender System

SQL: Structured Query Language

TFIDF: Term Frequency – Inverse Document Frequency

CHAPTER 1: INTRODUCTION

1.1 Introduction

An internship is a period of work experience offered by an organization for a limited period
of time (contributors, 2022). Typically, an internship consists of an exchange of services for
experience between the intern and the organization. In addition, an internship can be used
to build a professional network that can assist with letters of recommendation or lead to
future opportunities. Internships give current college students the chance to work in a
field of their choice and receive hands-on learning about a particular future career,
preparing them for full-time work following graduation.

As an intern at Hamro Patro Inc., I was assigned a project entitled News for You. However,
in the first days of the internship, my first task was to study the tech stack used in the
organization. For that purpose, a project entitled Chat Application was assigned. Chat
Application was a learning-phase project for getting familiar with the technologies and
gaining insight into, and proper knowledge of, the system. It was intended solely for
learning. The application itself enables communication between members within the
organization.

News For You is a news recommendation system for all the news received within the
system. It aims to optimize the news received by the system and recommend it appropriately
to the users of the Hamro Patro application, according to their behavior. Nowadays, more
and more news readers read news online, where they have access to millions of news articles
from multiple sources. To help users find relevant content, news recommender systems (NRS)
are developed to relieve the information overload problem and suggest news items that might
be of interest to the readers.

1.2 Statement of the Problem

Hamro Patro provides online news services by collecting news from different news portals. It
also offers news recommendations to users, but the news is recommended according to
current trends and breaking news. The existing system recommends news to users
according to their geographical region and location. Among the 1.4 million daily users of
the application, only a small percentage log in to the system and read the news. The data
available about those readers is therefore very limited, which makes it difficult to track
their reading and recommend news to them.

Recommending news to the user requires a recommendation engine that works with data
collected from the user. This data can include the type of news, the genre of news, the
news portals the news was read from, the time of reading, and so on. These different kinds
of data can be collected and used to train a model and design an effective system that
recommends the news users want to read. However, because few users log in to the Hamro
Patro app to read news, it is challenging to process the data and provide a meaningful
recommendation to the user. For the recommendation system to work effectively, there
should be good data analytics, so that the data from users can be used precisely to create
a recommendation engine that recommends news to users according to their interests.
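
The kind of data described above can be pictured as a stream of reading events per user. The following is a minimal sketch in Python, not Hamro Patro's actual schema: the `ReadEvent` fields and the `genre_profile` helper are hypothetical names chosen for illustration, and the sample events are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadEvent:
    """One news article read by one user (hypothetical schema)."""
    user_id: str
    genre: str    # e.g. "sports", "politics"
    portal: str   # news portal the article came from
    hour: int     # hour of day (0-23) when the article was read


def genre_profile(events, user_id):
    """Return the share of each genre in a user's reading history."""
    counts = Counter(e.genre for e in events if e.user_id == user_id)
    total = sum(counts.values())
    if total == 0:
        return {}  # no history: nothing to base a recommendation on
    return {genre: n / total for genre, n in counts.items()}


# Made-up sample data for illustration only.
events = [
    ReadEvent("u1", "sports", "portal-a", 8),
    ReadEvent("u1", "sports", "portal-b", 9),
    ReadEvent("u1", "politics", "portal-a", 20),
    ReadEvent("u2", "lifestyle", "portal-c", 13),
]
profile = genre_profile(events, "u1")
```

A user with no recorded events gets an empty profile, which mirrors the difficulty noted above: without reading data there is little to recommend from.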

1.3 Objective

The main objective of the project is to provide recommended news to the users. The
challenge is to recommend news to users who have not created an account in the system and
to fetch news continuously on a daily basis. The main objectives of the project are as
follows:

● Recommend news content to users based on their news reading history.

● Implement a more automated and more accurate recommendation system that replaces the
existing one, which is based on topics and current or highlighted news.

1.4 Scope and Limitations

News For You is a project that recommends news to users. The scope of the project is as
follows:

● Understanding the architecture and workflow of the current recommendation system.

● Understanding the technological stack used in the current recommendation system.

● Finding the best possible solution to recommend news to users.

● Making the backend system communicate with the recommendation system.

● Collecting insights and data from users in order to analyze and implement a better
recommendation system.

Although the project has a wide scope and is feasible to implement, it has some
limitations, listed below:

● Lack of data availability.

● Few users to track and recommend news to.

● The recommendation engine may not be accurate when recommending news because of the
limited insights available.

1.5 Report Organization

The report contains information that will assist anyone in gaining a complete understanding
of the system, including its working mechanism. The report is organized into four chapters,
which are summarized below:

The first chapter gives a general overview of the project, including its problem
statement, objectives, scope, limitations, and the report organization. The second chapter
is dedicated to organization details and the literature review. The third chapter provides
the weekly activity log, describes the methodology for gathering requirements, and lays out
the system requirements. It also covers the class diagrams, sequence diagrams, and activity
diagrams used to design the proposed recommendation system, and contains a feasibility
analysis that covers technical, economic, and operational feasibility. The fourth chapter
presents the project conclusion, including the lessons learned and recommendations.

CHAPTER 2: ORGANIZATIONAL DETAILS AND
LITERATURE REVIEW
2.1 Introduction to Organization

2.1.1 Organization Background

Smart Ideas Pvt. Ltd., commonly known as Hamro Patro Inc., is a professional software
company located in Kathmandu, Nepal. Hamro Patro, launched in 2010, is one of the first
Nepali apps to include a Nepali Patro. It started as a Nepali calendar mobile app to help
Nepalese living abroad stay in touch with Nepalese festivals and important dates in the
Nepali calendar year. Later, to cater to people who could not type in Nepali using fonts
like Preeti and Ganesh, or even Nepali Unicode, Hamro Patro built a Nepali mobile keyboard
called Hamro Nepali Keyboard. The company believes in providing cost-effective and
reliable services to its users and clients through its highly skilled team members. Hamro
Patro is currently the only Nepali app to have reached the milestone of being the
highest-rated app in Nepal, with over ten million downloads and more than one million daily
active users.

Hamro Patro has a team of professional and expert developers and offers custom website
design, web development, e-commerce website design and development, community website
design, website redesign, and website maintenance services. Its experts build stylish,
elegant, and easy-to-navigate websites. Recently, Smart Ideas launched a new feature called
“Hamro Recharge”, which allows users to easily transfer recharge from abroad to Nepal.

Over years of experience, Smart Ideas Pvt. Ltd. has achieved a prominent position as a
high-level software company with some of the best analytical minds. Its transparent,
efficient, and flexible software development process minimizes the risk of project failure
and creates powerful software solutions that meet present as well as future demands.

2.1.2 Services Provided by the Organization

Hamro Patro is a mobile application that offers a wide range of features across multiple
areas within a single app. Some of the main features provided by the application are
mentioned below:

● Hamro Nepali Keyboard: A keyboard for smartphones that allows users to type
in Nepali as well as in Romanized Nepali, where Nepali words typed with English
letters are converted into Nepali.

● Nepali Calendar and Patro: A service that shows all the dates, festivals, patro,
sacred ceremony timings and sun-moon alignment that fall within a calendar year.

● Hamro Recharge: A service to transfer recharge to mobile devices within Nepal
as well as from abroad.

● Hamro Patro Remit: A service to transfer remittance from foreign countries to
Nepal.

● Hamro Health: A doctor appointment platform through which users can book an
appointment to get health consultancy from NMC registered doctors.

● Hamro Learning Center: An education platform that provides video lectures for
different subjects taught from class 4 to class 12 falling under curriculum designed
by NEB.

● Online News: A service that recommends breaking news and news from different
news portals to the users.

● Hamro Gifts: An e-commerce platform that provides service to buy and send gifts
within Nepal and from abroad.

● Hamro Jyotish Sewa: A platform that provides astrological consultancy to the
users.

● Hamro Chautari: A video conferencing platform that can be used by users for
video as well as audio communication.

2.2 Organization Hierarchy

The organizational structure of Hamro Patro Inc. is headed by the CEO. Multiple
departments and directors handle the overall organization. The hierarchy is shown in the
figure below:

Figure 2.1: Organization Hierarchy.

2.3 Working Domains of Organization

Smart Ideas Pvt. Ltd. is a youth-driven and innovation-charged organization. It has offered
many opportunities and the infrastructure required to understand the technical aspects of
real-world scenarios. Its team is skilled and experienced, and it is constantly launching
new features in the Hamro Patro application. The main working domains of Hamro Patro Inc.
are the mobile application and the desktop website, where features like News, Rashifal,
Remit, etc. are provided within the same platform.

2.4 Description of Intern Department

The intern department at Smart Ideas Pvt. Ltd. consists of all the interns and their
respective assigned mentors. The interns are divided into teams, and each team consists of
a project manager, developers, a QA engineer, and an engineering manager as mentor. Each
team is assigned a different project, and its members work together to achieve their goals
in the designated time. The interns are always looked after by their mentors, and all
queries and tasks are discussed during daily stand-ups. Weekly reports are also maintained
to stay ahead of schedule by stating what is to be done in that particular week.
Additionally, OpenProject is used to track the progress of the project, assign tickets,
and maintain the backlog.

2.4.1 Placement

The internship period was six months, divided into three months as an intern and three
months of probation. During the internship, interns were first assigned a project to learn
the technology used in the organization, gain knowledge of the working environment, and get
used to the overall working culture. My role in the organization depended on the project I
was assigned to. At the start of the internship, I was assigned a backend role, with the
help and guidance of the mentors. The evaluation was performed after three months.

2.4.2 Duration

The standard internship period fixed by Tribhuvan University is six credit hours, which
is equivalent to ten weeks, three months, or a minimum of 180 hours. However, the
internship period allocated by the company was six months.

Table 2.1: Internship Duration

Time period                        15th March 2022 – 15th June 2022
Days per week                      5 days
Office time                        10 am – 6 pm
Working hours                      8 hours per day
Position                           Intern
Mentor                             Mr. Aayush Subedi
Average working hours in a week    40 hours
Holidays                           Saturdays and Sundays

2.4.3 Company Culture and Working Structure

1. Design: Figma was used to design the initial wireframes and create the UI/UX design to
be implemented in the project.

2. Frontend: The project uses React as the JavaScript library for frontend functions and
components, with Material UI as the UI framework within the components.

3. Backend: Java is the preferred programming language, with Micronaut as the base
framework and Gradle as the build tool. MongoDB is the database of choice due to its
simplicity. Docker is used to containerize the backend and frontend for efficient
communication and testing. WebSocket is used for one-way communication and gRPC for
bi-directional communication. Bitbucket hosts the Git repository and is used to maintain
project status and logs.

4. Recommendation Engine: Python is used for the recommendation engine, with Word2Vec for
vectorization and K-means for clustering.

5. QA: Both manual and automated testing were performed during the project. Manual testing
of every API call and function implementation was done by the frontend and backend
developers respectively, while automated testing was done by the QA engineer assigned to
the project.

6. Project Progress Tracking: OpenProject was used as the project management tool for
keeping track of the project timeline and project structure.

7. Communication: Slack was used for communication among team members and with the mentor
for reviews, tips, and help.

2.5 Literature Review

2.5.1. Study of the existing system

Online news is one of the prominent features of the Hamro Patro application. Hamro Patro
collects news from different news portals, organizes it, and recommends news to the
user from this collection. Recommendation here refers to the news notifications sent
and the news shown in the news feed section of the Hamro Patro application.
Notifications are mostly sent from the news portals that users have selected to get news
from. Notifications are also sent according to the chosen news genres, such as Sports,
Politics, Lifestyle, and so on.

The recommendation system currently existing in the Hamro Patro application is not
automated. Instead, it works on the basis of users selecting the type and genre of news.
Users who have opted to get news from a specific news portal get news recommendations from
that portal and its content. Apart from that, breaking news is recommended to almost all
users who have opted in to notifications about it. The system currently lacks an approach
for recommending news based on what users actually choose to read, so that they are
notified about the news they are most likely to find interesting.
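
The selection-based behavior described above amounts to a fixed rule rather than a learned model. The sketch below illustrates that idea in Python under stated assumptions: the `News` and `Preferences` types and the `should_notify` function are hypothetical, and combining portal and genre choices with a logical "or" is an assumption, since the report does not say how the two selections interact.

```python
from dataclasses import dataclass, field


@dataclass
class News:
    title: str
    portal: str
    genre: str
    breaking: bool = False


@dataclass
class Preferences:
    portals: set = field(default_factory=set)  # portals the user selected
    genres: set = field(default_factory=set)   # genres the user selected
    breaking_news: bool = False                 # opted in to breaking news


def should_notify(news: News, prefs: Preferences) -> bool:
    """Rule-based check: nothing is learned from what the user reads."""
    if news.breaking and prefs.breaking_news:
        return True
    # Assumption: a match on either the portal or the genre is enough.
    return news.portal in prefs.portals or news.genre in prefs.genres


prefs = Preferences(portals={"portal-a"}, genres={"sports"}, breaking_news=True)
matches_portal = should_notify(News("Budget update", "portal-a", "politics"), prefs)
no_match = should_notify(News("Fashion week", "portal-b", "lifestyle"), prefs)
```

Because the rule only looks at explicit selections, two users with identical settings always receive the same notifications, regardless of what each of them actually reads.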

2.5.2 Literature Review

The Recommendation System (RS) has emerged as a major research interest that aims to help
users find items online by providing suggestions that closely match their interests (Singh
et al., 2021). A recommendation system is needed when suggestions must be made to users
based on their preferences. It recommends items that a user is likely to prefer by using
automatic information filtering methods. It deals with the detection and delivery of
information that the user is likely to find interesting or useful, and assists users by
filtering the data source and delivering relevant information to them (Singhal et al.,
2021).

Recommender systems use filtering algorithms to provide recommendations to users.
These algorithms are mainly classified into collaborative filtering, content-based
filtering, and hybrid algorithms. Collaborative Filtering (CF) refers to an
algorithm or technique that recommends items or products (in the case of e-commerce) to
users based on the past ratings given to those items by other users with similar interests
or preferences. It works by collecting users' feedback in the form of ratings
for items in a given domain and exploring similarities in rating behavior among several
users to determine how to recommend an item. This technique is subdivided into
neighborhood-based and model-based techniques. Content-based recommenders provide
recommendations by comparing the representation of the content describing an item or a
product with the representation of the content describing the interests of the user (the
user's profile of interest). This is sometimes referred to as content-based filtering. A
hybrid algorithm uses both content-based and collaborative techniques to produce separate
ranked lists of recommendations and then merges their results to produce a final list of
recommendations (Philip et al., 2014).
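
To make the neighborhood-based variant of collaborative filtering more concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the method used at Hamro Patro: the `ratings` data, user names, and item IDs are made up, and cosine similarity with a similarity-weighted average is just one common choice among several.

```python
import math

# Made-up ratings: user -> {news item -> rating from 1 to 5}.
ratings = {
    "alice": {"n1": 5, "n2": 4, "n3": 1},
    "bob": {"n1": 4, "n2": 5, "n4": 5},
    "carol": {"n1": 1, "n3": 5, "n4": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine(ratings[user], their_ratings)
        numerator += similarity * their_ratings[item]
        denominator += similarity
    if denominator == 0:
        return None  # nobody comparable has rated this item
    return numerator / denominator


predicted = predict("alice", "n4")
```

Here alice's predicted rating for `n4` leans toward bob's rating rather than carol's, because her past ratings resemble bob's more closely. A content-based recommender would instead compare item descriptions with a profile built from alice's own reading.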

CHAPTER 3: INTERNSHIP ACTIVITIES

3.1 Roles and Responsibilities

As an intern, my first responsibility at Hamro Patro Inc. was to understand the
technological stack used, learn it, and apply it to backend programming. Java was the
preferred language for the backend, and all the application systems were developed in
Java. The preferred framework for Java programming was Micronaut. Micronaut is a
software framework for the Java Virtual Machine platform. It is designed to avoid
reflection, thus reducing memory consumption and improving start times. Features that would
typically be implemented at run time are instead pre-computed at compile time.

While learning about the different parts of the tech stack, gRPC proved to be a very
important concept for communication between frontend and backend. gRPC is a modern,
open-source RPC framework that can run anywhere. It uses HTTP/2 for transport and Protocol
Buffers as the interface description language, and provides features such as
authentication, bidirectional streaming and flow control, blocking or nonblocking bindings,
and cancellation and timeouts. The most common usage scenarios include connecting services
in a microservice-style architecture and connecting mobile device clients to backend
services. gRPC's complex use of HTTP/2 makes it impossible to implement a gRPC client
directly in the browser; a proxy is required instead.

The responsibilities assigned to me as a backend developer were as follows:

● Create services that perform the CRUD operations required for the app.
● Make API calls using gRPC to communicate with the frontend.
● Collaborate with the frontend to integrate user-facing elements with server-side logic.
● Use tools like MySQL and SQL Server to find, save, or change data and serve it to the
frontend.
● Participate in clearing errors and bugs in the application to improve its performance.
● Use BloomRPC to test the data and to verify that the various APIs behave correctly.
For the recommendation project, my task was to study the existing system and propose a
better solution for recommendation. The news recommendation system at Hamro Patro
Inc. is a manual system in which news is recommended based on region, age, etc.
The main purpose was to implement a clustering methodology to group users
with similar interests and recommend news accordingly. My other roles and
responsibilities in the ‘News For You’ project were as follows:

● Study the existing recommendation system and research possible ways to
enhance it with a more appropriate recommendation model.
● Create a server using gRPC to communicate with the backend services.
● Implement suitable word embedding methods and vectorize the news content.
● Implement suitable clustering methods to cluster the content of the news.
● Train the model, including hyperparameter tuning, and test which cluster newly
arriving news belongs to.
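
The clustering and cluster-assignment steps listed above can be sketched as follows. This is a simplified illustration rather than the project's implementation: it uses hand-made two-dimensional vectors in place of real Word2Vec embeddings, a plain-Python K-means instead of a library implementation, and a deterministic initialization (the first `k` points) chosen only to keep the example reproducible. The function names `nearest` and `kmeans` are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=10):
    """Plain K-means: assign points to centroids, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # assumption: simple fixed init
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


# Stand-ins for news embeddings: two visibly separate groups.
vectors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
centroids = kmeans(vectors, k=2)

# A newly arrived news item is assigned to the nearest existing cluster.
new_item = (4.8, 5.2)
cluster_of_new_item = nearest(new_item, centroids)
```

Once users or articles are grouped this way, recommending news reduces to suggesting items from the cluster that a user's reading history falls into.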

3.2 Weekly Activity log

The weekly activity log of the internship is given as follows:

Table 3.1: Weekly activity log

Weeks     Task Done

Week 1
● Understanding the basic technology and learning and practicing the system for
organization.
● Learning the basic implementation of Java programming language, especially Micronaut
framework.

Week 2
● Use Bitbucket to store all the code
● Understanding the working flow of Bitbucket, how to work on the team in the same project.
● Working on multiple features by creating the branch
● Studying and implementing what gRPC actually is with proper implementation.
Week 3
● Working on the server side of the Chat Application.
● Defining the initial proto file for the system.

Week 4: Working on admin section
● Implementing the login system
● Defining admin services, i.e. the admin’s username and password
● Defining gRPC for admin

Week 5: Working on user section
● Defining the required UserServices in the system, i.e. addUser, updateUser,
deleteUser, getAllUserExcept, getAllUser, getJoinResponse, getLoginResponse
● Creating UserRepository
● Creating User model
● Defining gRPC services for UserServices

Week 6: Working on message section
● Defining the MessageServices service, i.e. getMessage and saveMessage
● Defining MessageRepository
● Creating Message model
● Defining gRPC services for MessageServices

Working on chat room
● Creating Room model with roomId, senderId, receiverID
● Defining RoomServices, i.e. getRoomId and isValidRoomId
● Defining RoomRepository, i.e. getRoom, createNewRoom
● Defining gRPC services for RoomServices

Week 7: Database
● Implementing a Docker image of MongoDB for storage and retrieval of the
information
● Creating MongoMessageRepository, i.e. getId, getCollection, saveMessage,
getMessage
● Creating MongoUserRepository
● Creating MongoRoomRepository

Week 8: Implementing client to test connection
● Testing the API via BloomRPC to check whether communication works properly
● Implementing the WebSocket for one-way communication on the client side
● Integrating the frontend and backend systems for testing

Week 9: New Project: News for You
● Studying how recommendation systems generally work
● Learning the basic recommendation approaches: content-based, collaborative, and a
hybrid of both
● Researching basic machine learning algorithms, both supervised and unsupervised
● Trying out some scraping concepts and looking into the database

Week 10: Identifying the requirements for the project
● Preparing the Use Case Diagram
● Preparing the Entity Diagram
● Preparing the DFD Level-1 and Level-2 Diagrams
● Analyzing the candidate algorithms and their possible implementation

Week 11
● Researching and studying possible implementations of gRPC in the Python
programming language, including examples
● Implementing a gRPC server to communicate with the backend service and forward
the news to the frontend
● Creating a dummy database to store and access data for testing server
communication

Week 12
● Studying the available data and the related fields of the data received by the
system
● Working out which fields of the data are necessary for recommendation
● Testing the API for data retrieval

Week 13
● Researching the best method for recommending news in the system
● Considering the best approach for the recommendation system
● Cleaning the data after properly analyzing the available data

Week 14: Best Model Identification
● Identifying the best model for vectorizing the news content
● Implementing the TF-IDF method for vectorization
● Identifying errors while vectorizing the data with the TF-IDF method

Week 15
● Researching the Word2Vec embedding method to properly vectorize the content
● Analyzing and comparing the results of TF-IDF and Word2Vec embedding

Week 16
● Studying clustering techniques
● Implementing K-Means Clustering to cluster the data based on the content
embeddings generated from Word2Vec as well as TF-IDF
● Implementing Mini Batch K-Means Clustering to cluster the data based on the
content embeddings generated from Word2Vec as well as TF-IDF

Week 17
● Training the model on the news content of the system
● Saving the model

Week 18
● For test data, merging all the news content for an individual user and performing
the clustering operation

3.3 Description of the Projects Involved During Internship

At the start of the internship, to get used to the technology and the overall development
environment at Hamro Patro Inc., we were assigned a Chat Application project as
practice. The project was intended to build fluency in Micronaut, a newer framework for
the Java programming language. Implementing gRPC for communication was another
important part. After the design phase, our role was to implement the Chat Application
according to the design. The following activities were performed in the Chat Application:

a) Implementing a static admin panel for login, which includes defining the admin
services (username and password), the proto file for the application, and the gRPC
services for the admin.
b) In the user section, defining the required user services, such as addUser,
updateUser and deleteUser, and creating the user repository, user model and gRPC
services.
c) In the message section, defining services such as getMessage and saveMessage,
along with the message repository, message model and gRPC services.
d) For the chat room, defining the room services, room repository and gRPC services
so that each sender and receiver pair gets a separate room.
e) For the database, using a Docker image of MongoDB for simplicity and implementing
a separate repository for the database operations of each service.
f) For communication, implementing a WebSocket for one-way communication.
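
The room handling described in item d) can be illustrated with a small sketch. This is not the project's code, which was written in Java with Micronaut and backed by MongoDB; it is a hypothetical Python stand-in that only mirrors the names mentioned in the report (getRoomId, isValidRoomId, getRoom, createNewRoom). The rule that a sender and receiver share one room regardless of direction is an assumption made for the example.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Room:
    """Mirrors the Room model fields named in the report."""
    room_id: str
    sender_id: str
    receiver_id: str


class InMemoryRoomRepository:
    """Illustrative stand-in for a database-backed room repository."""

    def __init__(self) -> None:
        # Assumption: the key is the unordered pair of user ids, so that
        # A -> B and B -> A resolve to the same room.
        self._rooms: Dict[Tuple[str, ...], Room] = {}

    @staticmethod
    def _key(user_a: str, user_b: str) -> Tuple[str, ...]:
        return tuple(sorted((user_a, user_b)))

    def get_room(self, sender_id: str, receiver_id: str) -> Optional[Room]:
        return self._rooms.get(self._key(sender_id, receiver_id))

    def create_new_room(self, sender_id: str, receiver_id: str) -> Room:
        room = Room(uuid.uuid4().hex, sender_id, receiver_id)
        self._rooms[self._key(sender_id, receiver_id)] = room
        return room


class RoomService:
    """Hypothetical service layer on top of the repository."""

    def __init__(self, repository: InMemoryRoomRepository) -> None:
        self._repository = repository

    def get_room_id(self, sender_id: str, receiver_id: str) -> str:
        # Reuse the existing room for this pair, or create one on first use.
        room = self._repository.get_room(sender_id, receiver_id)
        if room is None:
            room = self._repository.create_new_room(sender_id, receiver_id)
        return room.room_id

    def is_valid_room_id(self, room_id: str, sender_id: str, receiver_id: str) -> bool:
        room = self._repository.get_room(sender_id, receiver_id)
        return room is not None and room.room_id == room_id
```

Keeping the repository behind a small interface like this is what would allow an in-memory version to be swapped for a MongoDB-backed one without changing the service.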

The “News For You” project recommends news to users based on the content those
individual users have viewed. The project's main scope is to deliver a proof of concept
and to replace the existing system.

A recommendation system is essential for recommending news to users. The Hamro Patro
application has 1.4 million daily users, and its usage is growing rapidly. A
recommendation system would be an added feature for these users. To keep the methods
and approach simple, the recommendation system was initially scoped to use a clustering
approach that forms clusters based on the content viewed by the users. The system aims
to provide smoother recommendations to the users based on the overall news content
available. The user-based recommendation observes each user's preferences and
recommends accordingly.

The overall News For You project can be summarized in the following steps:

a) Researching possible methods that are more efficient and lightweight and that can
replace the manual recommendation system.
b) Gathering the news database from the legacy app and performing data
preprocessing.
c) Tokenizing the data and removing all unnecessary parts of the data to make it
ready for training.
d) Implementing Word2Vec to perform vectorization and generate embeddings from
the tokenized content.
e) Clustering the content with K-Means, after hyperparameter tuning, based on the
embeddings, to find the appropriate cluster for each individual user's content.
f) For any new news content, after preprocessing and vectorizing it, identifying its
similarity to the prebuilt clusters and recommending that content to the users
present in the matching cluster.

3.4 Tasks/Activities Performed

Since I worked on two projects, my roles were defined according to the requirements of
each project. My involvement in the projects is described below:

Backend

As a backend developer, my role in the “Chat Application” project was first to get
familiar with the technology and then to develop an API through which the frontend
communicates with multiple services. Creating services and implementing them according
to the requirements was the main role. Testing the APIs and services for proper
operation was also part of my role.

Machine Learning System

The “News for You” project aimed to replace the existing news recommendation system
with a more machine learning oriented approach and to automate the existing manual
recommendation process. My responsibility leaned towards researching the best possible
ways to recommend news in the system. The aim was to implement the simplest machine
learning approach for recommending news to users based on the content they view.

The existing system took the simplest approach to recommending news to users, based on
geography, age and trending news. Most of the recommendation work is done manually
and delivered through push notifications. Since the old recommendation system is largely
manual, replacing it was the initial objective. First, together with the project managers,
mentors and members of the other teams, the initial recommendation approach for the
project was determined. The brief approach of the project is given as follows:

a) Requirement Collection
b) Design the approach to the system implementation.
c) Implement the methodologies to perform recommendations.
d) Test the possible approach for the system.
e) Integrate the system into the main system.

3.4.1 Requirement Collection

In the requirement collection phase, the original recommendation system was studied and
understood. Based on this understanding, the functional and non-functional requirements
that the new system must meet were determined. The requirements are as follows:

3.4.1.1 Functional Requirement

Functional requirements describe the functional aspects, or behavior, of the system. They
specify how the system should react in a given situation and how it produces output for
a given input. The following are the functional requirements for the “News For You”
system:

a) Generate embeddings and vectorized docs
b) Train the model
c) Cluster the users based on the contents
d) View recommendations for users

Use Case Diagram

It shows the interaction between the system and the user in a particular environment. The
use case model contains actors and the use cases. The actors are the external entities, and
the use cases are the system's functions.

Figure 3.1: Use Case Diagram.

3.4.1.2 Non-Functional Requirements

Accuracy: The Word2Vec model is used in place of TF-IDF vectorization for better
accuracy. The system's accuracy depends on how well the word embeddings are created,
how well the hyperparameters are tuned, and how well the clusters fit the content.

Efficiency: For greater efficiency, news content is recommended automatically to the
appropriate clusters of users.

Availability: The recommendation system will be available as an API. The end system
will run with as little downtime as possible.

3.4.2 Feasibility Analysis

Here, we have studied all the feasibility aspects of the project under consideration to
check whether the project is feasible with the decided requirements and the available
information, technologies, and budget.

Technical
The first step will be data preprocessing and cleaning. The project is based on data from
the organization and uses Elasticsearch to access the data. The cleaning process is
carried out on my personal laptop.

Operational

Operational feasibility refers to solving problems and building new systems with the help
of a new proposed system. It takes the ideas and opportunities developed during the initial
phase and the insights from requirement gathering to build a new system. The proposed
system will be used to recommend news to the users based on their viewed content.

Economic

The project only requires an ordinary laptop to build the system, which makes it
economically feasible. Heavy computing resources will not be needed after the system is
trained; inference can be carried out on smaller computing devices too.

3.4.3 System Design

System Design focuses on how to accomplish the objective of the system. System Design
is to deliver the requirements as specified in the feasibility report. System Design is the
process of planning a new business system or replacing an existing system by defining its
components or modules to satisfy the specific requirements.

3.4.3.1 Architectural Design

Architectural design is the process of defining a collection of hardware and software
components and their interfaces to establish the framework for the development of a
computer system. Software built for computer-based systems can exhibit one of many
architectural styles. In this system, the API communicates with the legacy app to
recommend news to the users. The system takes raw text as input, then processes and
cleans the text to generate a list of tokens. The tokens are passed to the Word2Vec
model, which generates the embeddings. The embeddings are then passed through the
K-Means clustering model, which generates cluster labels identifying the cluster to which
the particular content belongs. In this way, any new news article is recommended to
users based on its cluster label.

Figure 3.2: Architectural design of the system

3.4.3.2 Data Modeling using Class Diagram

In the object-oriented approach, a class diagram defines and provides the overview and
structure of a system in terms of classes, objects, attributes, and methods and their
relationship. The class diagram can also be termed as a type of structure diagram which
provides a conceptual model and architecture of the system being developed.

Figure 3.3: Class Diagram of NFY System

The class diagram shows that we have classes like Users, News For You, Embeddings, and
Clustering. Each class has its own properties and functionalities. An individual news item
is recommended based on the cluster label generated by K-Means Clustering and
forwarded accordingly.

3.4.3.3 Process Modeling

The overall operation of the system is represented with multiple levels of data flow
diagrams (DFDs). Each level of the DFD is given as follows.

DFD Diagram

a. DFD Level-0 Diagram

In the level-0 DFD, the whole News For You system is represented as a single process
with a single entity. The flow of data between the admin and the NFY system is
represented by incoming and outgoing arrows.

Figure 3.4: DFD Level-0 diagram.

b. DFD Level-1 Diagram

The level-0 diagram is expanded into sub-processes. The NFY system is divided into
Login, Recommendation Services, Data Preprocessing and Clustering. The data stores for
the respective sub-processes, namely MongoDB, the HP database and the News Database,
are also shown.

Figure 3.5: DFD Level-1 diagram.

Activity Diagram

An activity diagram is essentially an advanced form of a flow chart that generally describes
the model's flow. The activity diagram follows a behavioral approach which shows the flow
from one activity to another from start to end.

Figure 3.6: News For You Activity Diagram

The activity diagram elaborates the flow of the whole system from the starting state to
the ending state. The activity starts with the raw news content provided to the system as
input. Word embeddings are first generated from the input using Word2Vec. The
vectorized content is then assigned a cluster label, and the news is recommended to the
users associated with that cluster label.

3.4.3.4 Dynamic Modeling with Sequence Diagram

The sequence diagram shows the interaction between objects in sequential order. It shows
how objects interact with one another and in what order. The following sequence
diagram depicts the flow of information in News For You.

Figure 3.7: Sequence Diagram of NFY system

The sequence diagram above describes the sequential interaction of the system from the
input to the generated output. The news content is passed to the system, where the raw
content is preprocessed and an embedding is generated. The embedding is then passed to
the pretrained clustering module, which provides the cluster labels. The cluster labels
are used to recommend the news to the users.

3.4.4 Implementation

Tools Used:

Java was the main tool for backend programming and Python for machine learning tasks.
The tools used while creating the projects are as follows:
Backend

a) Java Programming Language: Java is the language of choice for backend
programming. For the Chat Application project, Java is used for the backend and
TypeScript for the frontend. Micronaut is the preferred framework, as it is newer and
more advanced than its predecessors.
b) Docker: Docker is used for cross-platform and cross-device access and
communication, and to keep dependencies consistent across the devices that run the
project.

Machine Learning

a) Python: Python is the language of choice for all machine learning tasks because of its
versatility and wide range of available packages. It provides libraries such as NumPy
and Pandas, along with implementations of Word2Vec, K-Means clustering and many
more.
b) Server: For communication between the Java backend and the News For You
recommendation system, gRPC is used in conjunction with protobuf. gRPC in Python
supports communication between the backend services written in Java and the
recommendation engine.

IDE

a) IntelliJ IDEA: For the Java backend, IntelliJ IDEA is the platform of choice as it
provides more support for programming in Java.
b) For machine learning projects, and especially News For You, PyCharm, Visual
Studio Code and Jupyter Notebook are the tools of choice.

Other Tools

a) BloomRPC is used to test communication with the API in Java. It makes it possible
to call the services and test them with different data and conditions.

For News For You, the implementation consists of the following steps:

a) Datasets

The News For You project makes use of the news database, that is, the news content
available in the legacy database. For the purpose of recommendation, the news content
and the news specifications are used. The backend services fetch the data from the legacy
database and dump it into the MongoDB database, which makes it available for further
processing. The data is then provided to the recommendation engine for preprocessing.

b) Preprocessing

Since the available data is in a raw and uncleaned format, preprocessing is necessary.
The recommendation system is based solely on the content of the news, so it requires
clean data, obtained by removing stop words, tokenizing and lemmatizing. Individual
users view multiple news items over a certain period of time. The news items viewed by
an individual user are merged, and the model is trained on the merged content of each
user. All the merged content is preprocessed and tokenized.
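
As a rough illustration of this step, the following sketch tokenizes each viewed article, drops stop words, and merges the tokens per user. It is a minimal standard-library example and not the project's code: the stop-word list, the `tokenize` and `merge_user_content` names, and the sample data are all assumptions, and lemmatization is left out for brevity.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Tiny illustrative stop-word list; a real system would use a fuller,
# language-appropriate list.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "in", "on", "to", "for"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def merge_user_content(views: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Merge the tokens of every article a user viewed into one list per user."""
    merged: Dict[str, List[str]] = defaultdict(list)
    for user_id, article_text in views:
        merged[user_id].extend(tokenize(article_text))
    return dict(merged)


# Hypothetical (user id, article text) view log.
views = [
    ("u1", "The election results are in."),
    ("u1", "Parties react to the election."),
    ("u2", "Monsoon rains flood the valley."),
]
user_tokens = merge_user_content(views)
```

Here `user_tokens["u1"]` holds the tokens of both articles viewed by that user, which is the per-user merged content that would be passed on to vectorization.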

c) Model Training

To train on the news content for each user, a simple vectorization approach was used.
Initially, TF-IDF was used to train the model. When the resulting clusters were plotted,
the results were not as accurate as expected for the available data: in the scatter plot,
the content was not distributed across the clusters as distinctly as expected.

Figure 3.8: Clustering the TFIDF based model
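
For reference, TF-IDF weights each term by how often it occurs in a document and how rare it is across documents. The sketch below uses the plain textbook form, tf(t, d) × log(N / df(t)), written with the standard library only. It is an illustration under that assumption and not necessarily the variant used in the project (library implementations often add smoothing and normalization); the `tf_idf` name and the sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["election", "results", "election"],
    ["election", "monsoon"],
    ["monsoon", "flood"],
]
weights = tf_idf(docs)
```

A term that appears in every document gets a weight of zero under this form, which is one reason sparse TF-IDF vectors can carry little shared signal between short, differently worded articles.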

Applying the same clustering to the Word2Vec embeddings was much more accurate and
produced more consistent clusters of the vectorized content.

Figure 3.9: Clustering the Word2Vec based model
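
The report does not state how word vectors were combined into one vector per document. A common choice, assumed here purely for illustration, is to average the vectors of the words in the document. In the sketch below, `WORD_VECTORS` is a hypothetical hand-written table standing in for the lookup table of a trained Word2Vec model, and `document_vector` is an illustrative name.

```python
from typing import Dict, List, Sequence

# Hypothetical 3-dimensional word vectors standing in for a trained Word2Vec
# model's lookup table; real vectors are learned from the corpus.
WORD_VECTORS: Dict[str, List[float]] = {
    "election": [1.0, 0.0, 0.0],
    "vote": [0.8, 0.2, 0.0],
    "monsoon": [0.0, 1.0, 0.0],
    "flood": [0.0, 0.8, 0.2],
}


def document_vector(tokens: Sequence[str], dim: int = 3) -> List[float]:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [WORD_VECTORS[token] for token in tokens if token in WORD_VECTORS]
    if not known:
        # No known words: fall back to the zero vector.
        return [0.0] * dim
    return [sum(component) / len(known) for component in zip(*known)]


doc_vec = document_vector(["election", "vote", "unknownword"])
```

Unlike TF-IDF, documents that use different but related words ("election", "vote") end up with nearby vectors, which is consistent with the more coherent clusters observed for Word2Vec.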

d) Fine Tuning

Before clustering, the best parameters for the K-Means clustering model were searched
for. For hyperparameter tuning, both GridSearchCV and RandomizedSearchCV were tested
to see which performs better. The parameters that were tested were as follows:

Figure 3.10: Parameters tuning for GridSearchCV

Figure 3.11: Parameters tuning for RandomizedSearchCV

Based on the tuning, the obtained parameters are as follows:

GridSearchCV: <bound method KMeans.score of KMeans(max_iter=100, n_clusters=7, n_init=4)>

RandomizedSearchCV: <bound method KMeans.score of KMeans(max_iter=125, n_clusters=11, n_init=4)>

Of the two methods, GridSearchCV gave the better parameters for the available data.
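
The idea behind the grid search can be sketched without any library: try every combination of parameters, run K-Means for each, and keep the combination with the best score. The example below is a simplified standard-library stand-in and not the scikit-learn search used in the project. The `k_means` and `grid_search` functions, the toy points, and the grid values are assumptions; the score is the K-Means inertia (sum of squared distances to the nearest centroid), lower being better.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], n_clusters: int, max_iter: int,
            n_init: int, seed: int = 0) -> Tuple[List[List[float]], float]:
    """Plain Lloyd's algorithm; keeps the best of n_init random restarts."""
    rng = random.Random(seed)
    best_centroids: List[List[float]] = []
    best_inertia = float("inf")
    for _ in range(n_init):
        centroids = [list(p) for p in rng.sample(points, n_clusters)]
        for _ in range(max_iter):
            # Assignment step: attach every point to its nearest centroid.
            groups: List[List[Point]] = [[] for _ in range(n_clusters)]
            for p in points:
                nearest = min(range(n_clusters),
                              key=lambda i: sq_dist(p, centroids[i]))
                groups[nearest].append(p)
            # Update step: move each centroid to the mean of its group
            # (an empty group keeps its previous centroid).
            updated = [
                [sum(axis) / len(g) for axis in zip(*g)] if g else centroids[i]
                for i, g in enumerate(groups)
            ]
            if updated == centroids:
                break
            centroids = updated
        inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
        if inertia < best_inertia:
            best_centroids, best_inertia = centroids, inertia
    return best_centroids, best_inertia


def grid_search(points: List[Point],
                grid: Dict[str, List[int]]) -> Tuple[Dict[str, int], float]:
    """Evaluate every parameter combination and return the lowest-inertia one."""
    names = sorted(grid)
    best_params: Dict[str, int] = {}
    best_inertia = float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        _, inertia = k_means(points, **params)
        if inertia < best_inertia:
            best_params, best_inertia = params, inertia
    return best_params, best_inertia


# Toy 2-D "embeddings" forming two visible groups.
points = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.1), (12.0, 0.0)]
grid = {"n_clusters": [2, 3], "max_iter": [10, 100], "n_init": [4]}
best_params, best_inertia = grid_search(points, grid)
```

One caveat worth keeping in mind: inertia generally decreases as the number of clusters grows, so a search scored on inertia alone (which is what `KMeans.score` reports, with the sign flipped) tends to favor larger `n_clusters`. Measures such as the silhouette score or an elbow plot are common ways to cross-check the chosen value.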

e) Clustering

Since the model is based on clustering, the best K-Means parameters are first selected by
tuning on the dataset. The resulting dictionary of parameters is then used for clustering.
The clustering produces a cluster label, which is used to recommend that particular news
content to the users who fall in that cluster based on the content they viewed.

When an individual news item arrives in the system, it is processed and clustered, and
the generated label decides to whom that news is recommended.
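
The final lookup can be illustrated as follows: a new article's vector is assigned to the nearest cluster centroid, and the users whose viewed content fell in that cluster become the recipients. This is a minimal sketch, not the project's code; the centroids, the user-to-cluster mapping, and the `nearest_cluster` and `users_to_notify` names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def nearest_cluster(vector: Sequence[float],
                    centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to the vector (Euclidean)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(vector, centroids[i]))


def users_to_notify(article_vector: Sequence[float],
                    centroids: Sequence[Sequence[float]],
                    user_clusters: Dict[str, int]) -> List[str]:
    """List the users whose cluster matches the article's cluster label."""
    label = nearest_cluster(article_vector, centroids)
    return sorted(user for user, cluster in user_clusters.items()
                  if cluster == label)


# Hypothetical centroids from a trained model and per-user cluster labels.
centroids = [[0.9, 0.1], [0.1, 0.9]]
user_clusters = {"u1": 0, "u2": 1, "u3": 0}
recipients = users_to_notify([0.8, 0.3], centroids, user_clusters)
```

In this toy setup the article vector lies closest to the first centroid, so `recipients` contains the users labelled with cluster 0.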

3.4.5 Testing

Test Cases for Unit Testing:

Unit testing deals with the functional correctness of the system. Unit testing is a software
development process in which the smallest testable parts of an application, called units, are
individually and independently tested for proper operation. Unit testing can be done
manually but is often automated. Some of the test cases used for individual
functionalities are as follows:

Table 4.1 Test Case-1

Test ID TC01

Title Get data from legacy application

Test Action Implement the API

Expected Result List of all the news with their parameters such as: ID and contents

Obtained Result News content with multiple parameters

Final Result PASS

Table 4.2 Test Case-2

Test ID TC02

Title Preprocessing

Test Action Read the file and ‘Stop words’ removal and tokenization

Expected Result Preprocessed content

Obtained Result Preprocessed content

Final Result PASS

Table 4.3 Test Case-3

Test ID TC03

Title Word Embedding Generation

Test Action Take tokenized content as input and return the embedded array

Expected Result Return embeddings as NumPy array

Obtained Result Obtained embeddings as NumPy array

Final Result PASS

Table 4.4 Test Case-4

Test ID TC04

Prerequisite Hyperparameter Tuning

Test Action Take embedded array as well as clustering model as input and
return the best parameters for clustering

Expected Result Return a python dictionary of best parameters for clustering

Obtained Result Obtained dictionary of best parameters for clustering

Final Result PASS
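
A test case such as TC02 could be automated along the following lines. This is a hedged sketch using Python's built-in `unittest` module; the `preprocess` function and its stop-word list are simplified, hypothetical stand-ins for the project's preprocessing code.

```python
import unittest

STOP_WORDS = {"the", "is", "a", "of"}  # illustrative subset only


def preprocess(text):
    """Split on whitespace, strip punctuation, lowercase, drop stop words."""
    tokens = (word.strip(".,!?").lower() for word in text.split())
    return [token for token in tokens if token and token not in STOP_WORDS]


class PreprocessingTest(unittest.TestCase):
    def test_stop_words_are_removed(self):
        self.assertEqual(
            preprocess("The price of gold is rising."),
            ["price", "gold", "rising"],
        )

    def test_empty_input_gives_no_tokens(self):
        self.assertEqual(preprocess(""), [])
```

Saved in a file, these tests can be run with `python -m unittest`, which reports each test as passed or failed in the same way the PASS entries are recorded in the tables above.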

CHAPTER 4: CONCLUSION AND LEARNING OUTCOMES

4.1 Conclusion

Throughout my internship, all the senior developers and mentors were very friendly and
fully committed to teaching the concepts required to complete the project. They treated
me as a colleague rather than an intern. The friendly working environment also helped
me to adapt and perform my tasks efficiently. In a nutshell, the internship at Hamro
Patro was a really great experience. I am confident that the knowledge and experience I
gained during this internship will be very beneficial for my career growth in the near
future.

The Chat Application project was for learning purposes: to get to know the technology
stack and the basics of the Java programming language, especially the Micronaut
framework. The project covers only the backend and includes the services necessary for
communication between sender and receiver. The application also includes the room
services and the services for the database. The News For You project was a
recommendation system based on the news content viewed by the users. It was mainly
intended to recommend news to users based on that content.

4.2 Learning Outcomes

The internship was a valuable experience of working in an organization and provided
exposure to a real working environment. It was a great opportunity to learn the
discipline, effort, hardships and morals necessary in a real workplace. During the
internship, various interpersonal and professional skills were learned at Hamro Patro
Inc. Some of them are listed below:
● Learn technologies and programming languages like Java, Micronaut, gRPC,
WebSocket, Python, etc.
● Create APIs to communicate with the frontend and exchange data.
● Perform research and identify suitable outcomes and requirements for the
recommendation system.
● Create and manage the project pipeline for the recommendation system.
● Containerize the project using Docker for efficient execution.
● Use the Python programming language to perform recommendations.
● Use tools such as Postman and BloomRPC to test that APIs work correctly.

REFERENCES
Wikipedia contributors. (2022, July 16). Internship. Wikipedia.
https://en.wikipedia.org/wiki/Internship

Philip, S., Shola, P., & Ovye John, A. (2014). Application of Content-Based Approach in
Research Paper Recommendation System for a Digital Library. International Journal of
Advanced Computer Science and Applications (IJACSA), 5(10).
www.ijacsa.thesai.org

Raza, S., & Ding, C. (2022). News recommender system: a review of recent progress,
challenges, and opportunities. Artificial Intelligence Review, 55(1), 749–800.
https://doi.org/10.1007/s10462-021-10043-x

Singh, P. K., Kumar Dey, A., Choudhury, P., Kanti, P., Pramanik, D., & Singh, P. K.
(2021). Recommender systems: an overview, research trends, and future directions. Int.
J. Business and Systems Research, 15(1).
https://www.researchgate.net/publication/339172772

Singhal, A., Rastogi, S., Panchal, N., & Varshney, S. (2021). Research Paper On
Recommendation System. http://sciplore.org/publications/2009Sc

APPENDICES

Figure: Hamro patro News Dashboard

Figure: Code Snippet of gRPC implementation

Figure: Code Snippet of vectorized docs
