CERTIFICATE
DECLARATION
This is to certify that the Summer Internship entitled Artificial Intelligence and Machine Learning was carried out as an internship in the Google Academy. The report is based on the project work done entirely by the student during the internship period.
2451-22-750-018
ACKNOWLEDGEMENT
I would like to express my gratitude to all the people behind the screen who helped me transform an idea into a real and working project.
I would like to thank K. Padma, Asst. Professor, Department of Computer Science and Engineering, for her technical guidance, constant encouragement and support in carrying out my project at college.
I profoundly thank J. Prasanna Kumar, Head of the Department of Computer Science and Engineering
who has been an excellent guide and also a great source of inspiration to my work.
I would like to express my heart-felt gratitude to my parents without whom I would not have been
privileged to achieve and fulfil my dreams. I am grateful to our principal Dr. Vijaya Gunturu, who most ably runs the institution and has had a major hand in enabling me to do my project.
The satisfaction and euphoria that accompany the successful completion of the task would be incomplete without mentioning the people who made it possible, whose constant guidance and encouragement crowned all the efforts with success. In this context, I would like to thank all the other staff members, both teaching and non-teaching, who have extended their timely help and eased my task.
VISION AND MISSION
VISION
• To impart technical education of the highest standards, producing competent and confident
engineers with an ability to use computer science knowledge to solve societal problems.
MISSION
The Bachelor’s program in Computer Science and Engineering is aimed at preparing graduates
who will:
PEO-2: Practice life-long learning by pursuing professional certifications, higher education or
research in the emerging areas of information processing and intelligent systems at a global level.
Program Outcomes (PO’s)
3. Design and Development of Solutions: Design and develop software solutions and systems to meet specified needs, considering public health, safety, and environmental concerns.
4. Conduct Investigations of Complex Problems: Investigate complex problems, design experiments, and interpret data to reach valid conclusions.
5. Modern Tool Usage: Select and apply modern tools, techniques, and resources for engineering activities.
6. The Engineer and Society: Assess societal, legal, health, safety, and cultural issues relevant to professional engineering practice.
12. Lifelong Learning: Recognize the need for lifelong learning and the ability to adapt to technological change.
Course Objectives:
1. To give the students experience in solving real-life practical problems with all their constraints.
2. To give an opportunity to integrate different aspects of learning with reference to real-life problems.
3. To enhance the confidence of the students while communicating with industry engineers.
Course Outcomes:
2. Able to complete the task or realize a pre-specified target, with limited scope, rather than taking up an open-ended problem.
3. Able to learn to find alternate viable solutions for a given problem and evaluate these alternatives against the given constraints.
ABSTRACT
The internship covers a broad range of AI/ML topics, including but not limited to the use of TensorFlow. Interns undertake different areas of work, including theory and practice courses that are meant to promote practical application of the knowledge gained.
Progress through the course is demonstrated by successively unlocking badges on the Google Developer Profile, which reflects proficiency in essential skills such as object detection, image classification and product image search.
By working hands-on with TensorFlow over the course of the internship, participants gain the ability to design and deploy machine learning models with confidence. They also use Google Colab, an online platform that provides a flexible environment for scientific computing and for writing and running notebooks.
The program structure involves mentorship meetings, group projects, and code reviews together with other interns, aiming at a balanced learning experience and skill building.
These skills are put to good use in the real world through the practical application of AI/ML. During code reviews and presentations, communication and problem-solving skills are further exercised, thus preparing students for real-world workplace scenarios. The technology employed in the program accurately reflects industry standards, and the emphasis on team learning ensures that participants are trained with skills and knowledge that are highly valued in the modern AI/ML industry. At the end of the internship, trainees come away with a strong knowledge of the basic concepts and the practical tools needed to solve real tasks. It is through the internship program that trainees are well equipped for work in the field.
Table of Contents
Certificate ...................................................................................................................................ii
Acknowledgement .................................................................................................................... iv
Abstract ...................................................................................................................................... x
1.INTRODUCTION ................................................................................................................ 14
3.1 Add ML Kit Object Detection and Tracking API to the project .................................... 19
3.2 Add on-device object detection ...................................................................................... 21
5.1 Detect objects in images to build a visual product search with ML Kit:Android .......... 30
5.1.2 Add the dependencies for ML Kit Object Detection and Tracking............................. 30
CONCLUSION ........................................................................................................................ 48
LIST OF FIGURES
Fig 6.3 Interface of product image search app after connecting the two APIs ........................ 42
1.INTRODUCTION
Today's world needs understanding and handling of visual information in almost every field, for instance self-driving technology, e-health and e-commerce, among others. Computer vision allows machines to process visual information and has a number of applications that improve user and business processes.
This report focuses on three important computer vision problems, namely object detection, image classification and product search. TensorFlow is a broad open-source library created by Google that allows developers to build and deploy such models for uses ranging from image understanding to content security. Besides that, visual product search has also gained importance, especially in online shopping, where images are used to search for products instead of written text.
Within the scope of this project, we will consider the relevant methods, algorithms and tools, and will demonstrate the application of TensorFlow in building effective and precise vision applications.
The internship itself is built around industry-leading tools and platforms. Interns engage in hands-on projects covering key topics such as supervised and unsupervised learning, neural networks, natural language processing and computer vision. Participants not only gain technical expertise but also develop the problem-solving and analytical skills necessary to work as AI/ML professionals, with in-depth insights into the industry fostering a deep understanding of the field.
2.Program neural networks with TensorFlow
This field focuses on teaching machines to understand and interpret visual data. A core operation is convolution, which involves applying filters to images and helps in detecting patterns such as edges, textures, and corners. These filters are small matrices that move across the image, capturing essential features.
CNNs are deep learning models designed specifically for processing and analysing visual data. They are built from three main kinds of layers:
● Convolutional layers - apply small learnable filters (kernels) to extract feature maps.
● Pooling layers - reduce the dimensionality of the image data, retaining the most important information.
● Fully connected layers - connect the final features to the output, making predictions like classification.
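To make the sliding-filter idea concrete, below is a minimal sketch in plain Kotlin, written for this report rather than taken from the internship material; the 3x3 kernel and the tiny test image are made-up examples.

// Plain-Kotlin sketch: slide a small kernel matrix across a grayscale image and
// accumulate the weighted sum at each position, the core operation of a
// convolutional layer.
fun convolve(image: Array<DoubleArray>, kernel: Array<DoubleArray>): Array<DoubleArray> {
    val k = kernel.size                              // square kernel, e.g. 3x3
    val outRows = image.size - k + 1
    val outCols = image[0].size - k + 1
    val output = Array(outRows) { DoubleArray(outCols) }
    for (r in 0 until outRows) {
        for (c in 0 until outCols) {
            var sum = 0.0
            for (i in 0 until k) {
                for (j in 0 until k) {
                    sum += image[r + i][c + j] * kernel[i][j]
                }
            }
            output[r][c] = sum                       // one value of the feature map
        }
    }
    return output
}

fun main() {
    // A vertical-edge kernel responds strongly where pixel values change from left to right.
    val edgeKernel = arrayOf(
        doubleArrayOf(1.0, 0.0, -1.0),
        doubleArrayOf(1.0, 0.0, -1.0),
        doubleArrayOf(1.0, 0.0, -1.0)
    )
    val image = Array(5) { DoubleArray(5) { col -> if (col < 3) 0.0 else 1.0 } }
    convolve(image, edgeKernel).forEach { row -> println(row.joinToString()) }
}

Running this prints strong responses only in the columns where the dark half of the image meets the bright half, which is the kind of local feature a convolutional layer learns to pick out.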
CNNs are widely used in image recognition tasks because they efficiently capture the spatial structure of images.
When working with large datasets, CNNs show exceptional performance but require a lot of computational power. Key techniques for managing large datasets include:
● Transfer learning: involves using pre-trained CNNs and fine-tuning them for specific tasks. This saves time and computational resources while leveraging the features already learned from large datasets.
By employing CNNs and these techniques, developers can handle complex, large-scale visual data, building systems that achieve high accuracy in image classification, object detection, and more.
3.Get started with object detection
ML Kit is a mobile SDK that brings Google's on-device machine learning expertise to Android and iOS apps. You can use the powerful yet simple to use Vision and Natural Language APIs to solve common challenges in your apps or create brand-new user experiences. All are powered by Google's best-in-class ML models and offered to you at no cost.
ML Kit's APIs all run on-device, allowing for real-time use cases where you want to process a live camera stream, for example. This also means that the functionality is available offline. This codelab will walk you through simple steps to add Object Detection and Tracking (ODT) for a given image into your existing Android app.
3.1 Add ML Kit Object Detection and Tracking API to the project
Try out the Take photo button, follow the prompts to take a photo, accept the photo, and it will be displayed inside the app.
Fig 3.1 Interface of the object detection app; Fig 3.2 Capturing a photo in the camera app; Fig 3.3 The captured photo
3.2 Add on-device object detection
In this step, you will add the functionality to the starter app to detect objects in
images. As you saw in the previous step, the starter app contains boilerplate code to take
photos with the camera app on the device. There are also 3 preset images in the app that you
can try object detection on if you are running the codelab on an Android emulator.
When you have selected an image, either from the preset images or by taking a photo with the camera app, the boilerplate code decodes that image into a Bitmap instance, shows it on the screen and calls the runObjectDetection method with the image.
There are only 3 simple steps with 3 APIs to set up ML Kit ODT:
ML Kit follows the Builder design pattern: you pass the configuration to a builder and then acquire a detector from it. There are 3 options to configure: the detector mode, whether to detect multiple objects, and whether to enable classification (the latter two are enabled in this codelab).
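As a hedged sketch of this setup and of the three steps (create an InputImage, acquire a detector, process the image), the ML Kit calls look roughly like this; the exact code in the codelab may differ slightly.

import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

fun runObjectDetection(bitmap: Bitmap) {
    // Step 1: create an InputImage from the Bitmap (rotation 0 for an upright image).
    val image = InputImage.fromBitmap(bitmap, 0)

    // Step 2: configure the detector through the builder, then acquire it.
    val options = ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
        .enableMultipleObjects()
        .enableClassification()
        .build()
    val detector = ObjectDetection.getClient(options)

    // Step 3: feed the image to the detector and handle the asynchronous result.
    detector.process(image)
        .addOnSuccessListener { results -> /* visualize the detected objects */ }
        .addOnFailureListener { e -> e.printStackTrace() }
}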
In this section, you'll make use of the detection result by drawing it onto the input image.
3.5 Understand the visualization utilities
drawDetectionResult(bitmap, detectionResults): Bitmap - this method draws the object detection results in detectionResults on the input bitmap and returns the modified copy of it.
Use the visualization utilities to draw the ML Kit object detection result on top of the input image. Once the app loads, press the Button with the camera icon, point your camera at an object, take a photo, accept the photo (in the Camera App), or simply tap any of the preset images. You should see the detection results; press the Button again or select another image to repeat the detection.
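The drawing utility itself is supplied by the codelab. As a rough illustration only (the function below is ours, not the codelab's exact drawDetectionResult), painting each bounding box onto a mutable copy of the input bitmap can be done like this:

import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import com.google.mlkit.vision.objects.DetectedObject

// Illustrative sketch: draw each detected object's bounding box on a copy of the image.
fun drawBoxes(bitmap: Bitmap, detectionResults: List<DetectedObject>): Bitmap {
    val output = bitmap.copy(Bitmap.Config.ARGB_8888, true)   // mutable copy
    val canvas = Canvas(output)
    val paint = Paint().apply {
        style = Paint.Style.STROKE
        strokeWidth = 8f
        color = Color.RED
    }
    for (obj in detectionResults) {
        canvas.drawRect(obj.boundingBox, paint)               // Rect provided by ML Kit
    }
    return output
}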
fig 3.5 detection result of a photo
4. Go further with object detection
In this unit, you'll learn how to train a custom object detection model using a set of training images with TFLite Model Maker, then deploy your model to an Android app using the TFLite Task Library. You will:
● Integrate a TFLite pre-trained object detection model and see the limit of what the pre-trained model can detect.
● Train a custom object detection model to detect the ingredients of a meal using a custom dataset called salad and TFLite Model Maker.
● Deploy the custom model to the Android app using the TFLite Task Library.
Object detection is a set of computer vision tasks that can detect and locate objects in a digital image. Given an image or a video stream, an object detection model can identify which of a known set of objects might be present, and provide information about their positions within the image.
TensorFlow provides pre-trained, mobile-optimized models that can detect common objects, such as cars, oranges, etc. You can integrate these pre-trained models in your mobile app with just a few lines of code. However, you may want or need to detect objects in more distinctive or offbeat categories. That requires collecting your own training images, then training and deploying your own object detection model.
TensorFlow Lite is a cross-platform machine learning library optimized for running machine learning models on edge devices, including Android and iOS mobile devices. TensorFlow Lite is actually the core engine used inside ML Kit to run machine learning models. There are two components in the TensorFlow Lite ecosystem that make it easy to train and deploy machine learning models on mobile devices:
● Model Maker is a Python library that makes it easy to train TensorFlow Lite models using your own data with just a few lines of code, no machine learning expertise required.
● Task Library is a cross-platform library that makes it easy to deploy TensorFlow Lite models with just a few lines of code in your mobile apps.
Here is an example of an output of the drawDetectionResult utility method.
The TFLite Task Library makes it easy to integrate mobile-optimized machine learning models into a mobile app. It supports many popular machine learning use cases, including object detection, image classification, and text classification. You can load the TFLite model and run it with just a few lines of code.
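As a hedged sketch of that integration (the asset name, thresholds and logging below are illustrative choices, not the codelab's exact code), loading a TFLite object detection model with the Task Library and running it on a Bitmap looks roughly like this:

import android.content.Context
import android.graphics.Bitmap
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.detector.ObjectDetector

fun detectWithTaskLibrary(context: Context, bitmap: Bitmap) {
    // Keep at most 5 results whose confidence score is above 0.3.
    val options = ObjectDetector.ObjectDetectorOptions.builder()
        .setMaxResults(5)
        .setScoreThreshold(0.3f)
        .build()

    // "salad.tflite" is a placeholder asset name for the custom model.
    val detector = ObjectDetector.createFromFileAndOptions(context, "salad.tflite", options)

    // Run inference and print each detection's top category and bounding box.
    val results = detector.detect(TensorImage.fromBitmap(bitmap))
    for (detection in results) {
        val top = detection.categories.first()
        println("${top.label} (${top.score}) at ${detection.boundingBox}")
    }
}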
The starter app already contains methods for taking pictures and presenting the object detection output. You will add the object detection functionality to the application by filling out the code that runs the model on the input image.
It uses a pre-trained object detection model that is designed to be mobile-efficient and is trained on the COCO 2017 dataset.
● Add dependencies
● You will train a custom model to detect meal ingredients using TFLite Model Maker and Google Colab. The dataset is composed of labelled images of salad ingredients.
Fig 4.2 accuracy of the predicted items
You have developed an Android application that can detect objects in images, first with a TFLite pre-trained model and then with a custom object detection model that you trained and deployed. You have utilized TFLite Model Maker for model training and the TFLite Task Library for its integration into the application.
5.Get started with product image search
5.1 Detect objects in images to build a visual product search with ML Kit:Android
Have you seen the Google Lens demo, where you can point your phone camera at an
object and find where you can buy it online? If you want to learn how you can add the same
feature to your app, then this codelab is for you. It is part of a learning pathway that teaches
you how to build a product image search feature into a mobile app.
In this codelab, you will learn the first step to build a product image search feature:
how to detect objects in images and let the user choose the objects they want to search for.
You will use ML Kit Object Detection and Tracking to build this feature.
Go to Android Studio, select Import Project (Gradle, Eclipse ADT, etc.) and choose the
starter folder from the source code that you have downloaded earlier.
5.1.2 Add the dependencies for ML Kit Object Detection and Tracking
The ML Kit dependencies allow you to integrate the ML Kit ODT SDK in your app. Go to the app/build.gradle file of your project and confirm that the dependency is already there:
build.gradle
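The dependency block itself is not reproduced in this report. For reference, the ML Kit object detection dependency typically looks like the following (shown here in Gradle's Kotlin DSL form; the version number is only an example and may differ from the one pinned in the codelab's starter project):

dependencies {
    // ML Kit Object Detection and Tracking with the bundled on-device model
    implementation("com.google.mlkit:object-detection:17.0.0")
}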
5.2 Add on-device object detection
In this step, you'll add the functionality to the starter app to detect objects in images. As you
saw in the previous step, the starter app contains boilerplate code to take photos with the
camera app on the device. There are also 3 preset images in the app that you can try object detection on. When you select an image, either from the preset images or by taking a photo with the camera app, the boilerplate code decodes that image into a Bitmap instance and shows it on the screen.
In this step, you will add code to the runObjectDetection method to do object detection!
Upon completion, the detector notifies you with a list of detected objects. Each detected object contains:
● boundingBox: a rectangle describing the position of the object within the image
● trackingId: an integer you use to track it across frames (NOT used in this codelab)
● labels: a list of label(s) for the detected object (only when classification is enabled), each with a text describing the coarse category (such as "Fashion Goods", "Food" or "Home Goods") and a confidence value
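A short sketch of reading these fields inside the success callback (the logging is ours, purely for illustration):

import android.util.Log
import com.google.mlkit.vision.objects.DetectedObject

fun logDetectionResults(results: List<DetectedObject>) {
    for (obj in results) {
        Log.d("MLKit", "box: ${obj.boundingBox}, trackingId: ${obj.trackingId}")
        for (label in obj.labels) {
            // text is the coarse category; confidence is a value between 0 and 1.
            Log.d("MLKit", "  label: ${label.text} (confidence ${label.confidence})")
        }
    }
}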
There is some boilerplate code inside the codelab to help you visualize the detection result. The starter code also contains a callback to receive the cropped image that contains only the object that the user has tapped on. You will send this cropped image to the image search backend in a later codelab to get a visually similar result. In this codelab, you won't use this method yet.
fig 5.1 interface of the product image search app
6.Go further with product image search
Have you seen the Google Lens demo, where you can point your phone camera to an
object and find where you can buy it online? If you want to learn how you can add the same
feature to your app, then this codelab is for you. It is part of a learning pathway that teaches
you how to build a product image search feature into a mobile app.
In this codelab, you will learn how to call a backend built with Vision API Product
Search from a mobile app. This backend can take a query image and search for visually similar products in a product catalog.
Vision API Product Search is a feature in Google Cloud that allows users to search for visually similar products from a product catalog. Retailers can create products, each containing reference images that visually describe the product from a set of viewpoints. You can then add these products to product sets (i.e. a product catalog). Currently Vision API Product Search supports the following product categories: home goods, apparel, toys, packaged goods, and general.
When users query the product set with their own images, Vision API Product Search applies
machine learning to compare the product in the user's query image with the images in the
retailer's product set, and then returns a ranked list of visually and semantically similar
results.
Now you'll add code to allow users to select an object from the image and start
the product search. The starter app already has the capability to detect objects in the
image. It's possible that there are multiple objects in the image, or the detected object
only occupies a small portion of the image. Therefore, you need to have the user tap
on one of the detected objects to indicate which object they want to use for product
search.
The starter code already provides several helper methods, including onObjectClickListener:
This is a callback to receive the cropped image that contains only the object that the user has tapped on. You will send this cropped image to the product search backend.
fig 6.1 interface of the product image search app
The onObjectClickListener is called whenever the user taps on any of the detected objects on
the screen.
It receives the cropped image that contains only the selected object.
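As a rough illustration of where such a cropped image can come from (this helper is ours, not the codelab's implementation), the bounding box of the tapped object can be used to crop the original photo:

import android.graphics.Bitmap
import com.google.mlkit.vision.objects.DetectedObject

// Illustrative helper: crop the tapped object's region out of the full photo so that
// only the selected object is sent to the product search backend. It assumes the
// bounding box lies entirely inside the bitmap.
fun cropTappedObject(fullImage: Bitmap, tapped: DetectedObject): Bitmap {
    val box = tapped.boundingBox
    return Bitmap.createBitmap(fullImage, box.left, box.top, box.width(), box.height())
}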
● The logic for detecting objects and querying the backend has been split into 2 activities only to make the codelab easier to understand. It's up to you to decide how to structure them in your own app.
●You need to write the query image into a file and pass the image URI between
activities because the query image can be larger than the 1MB size limit of an Android intent.
●You can store the query image in PNG because it's a lossless format.
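A hedged sketch of those two points taken together (the file name and helper below are ours, purely for illustration): write the cropped query image to a cache file as PNG and pass only its URI between the activities.

import android.content.Context
import android.graphics.Bitmap
import android.net.Uri
import java.io.File
import java.io.FileOutputStream

// Returns a URI that can be passed through Intent extras to the search activity,
// e.g. intent.putExtra("query_image_uri", uri.toString()), instead of the Bitmap
// itself, which could exceed the ~1MB Intent size limit.
fun saveQueryImage(context: Context, croppedQueryImage: Bitmap): Uri {
    val file = File(context.cacheDir, "query_image.png")
    FileOutputStream(file).use { out ->
        croppedQueryImage.compress(Bitmap.CompressFormat.PNG, 100, out)  // PNG is lossless
    }
    return Uri.fromFile(file)
}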
This codelab requires a product search backend built with Vision API Product Search.
Option 1: Use the demo backend that has been deployed for you
Option 2: Create your own backend by following the Vision API Product Search
quickstart
You will come across these concepts when interacting with the product search backend:
● Product Set: A product set is a simple container for a group of products. A product catalog can be represented as a product set and its products.
● Product: After you have created a product set, you can create products and add them to the product set.
● Product's Reference Images: They are images containing various views of your products. Reference images are used to search for visually similar products.
● Search for products: Once you have created your product set and the product set has been indexed, you can query the product set using the Cloud Vision API.
The product search demo backend used in this codelab was created using the Vision API Product Search and a product catalog of about a hundred shoes and dress images.
You can call the Vision API Product Search directly from a mobile app by setting up a Google Cloud API key and restricting access to the API key to just your app.
To keep this codelab simple, a proxy endpoint has been set up that allows you to access the demo backend without worrying about the API key and authentication. It receives the HTTP request from the mobile app, appends the API key, and forwards the request to the Vision API Product Search backend. Then the proxy receives the response from the backend and returns it to the mobile app. You will then take the product image IDs returned by this API call and send them to the backend to retrieve the reference images of the matched products.
Now you'll implement code to call the product search backend in a dedicated class called ProductSearchAPIClient. Some boilerplate code has been implemented for you:
● class ProductSearchAPIClient: This class is mostly empty now but it has some methods that you will implement later.
● SearchResult.kt: This file contains several data classes to represent the types returned by the backend.
You can find similar products to a given image by passing the image's Google
Cloud Storage URI, web URL, or base64 encoded string to Vision API Product Search.
Here are some important fields in the product search result object:
● product.name: the resource name of the product, in the format projects/{project-id}/locations/{location-id}/products/{product_id}
● product.score: a value indicating how similar the search result is to the query image, where higher values mean more similarity
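A hypothetical Kotlin data class (ours, not necessarily the exact shape of the codelab's SearchResult.kt) capturing these fields could look like this:

// Hypothetical shape of one product search result; the fields mirror the
// response fields discussed above.
data class ProductSearchResult(
    val name: String,       // projects/{project-id}/locations/{location-id}/products/{product_id}
    val score: Float,       // similarity to the query image; higher means more similar
    val imageUri: String    // gs:// URI of the matched reference image
)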
To fetch a product's reference image, you'll send a GET HTTP request with an empty request body to the reference image endpoint. The reference images of the demo product search backend were set up to have public-read permission. Therefore, you can easily convert the GCS URI to an HTTP URL and display it in the app UI. You only need to replace the gs:// prefix with https://storage.googleapis.com/.
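That conversion is a simple string replacement; a minimal sketch:

// Convert gs://bucket/object into a public HTTP URL. This only works because the
// demo backend's reference images were given public-read permission.
fun gcsUriToHttpUrl(gcsUri: String): String =
    gcsUri.replaceFirst("gs://", "https://storage.googleapis.com/")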
Next, craft a product search API request and send it to the backend; you'll use Volley to make the HTTP calls. Go back to annotateImage and modify it to get all the reference images' HTTP URLs and display them. Once the app loads, tap any preset image, select a detected object, and tap the Search button to see the search results, this time with the product images.
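As a hedged sketch of a Volley call (the endpoint URL below is a placeholder, not the codelab's real proxy address), a simple GET request and its callbacks look like this:

import android.content.Context
import android.util.Log
import com.android.volley.Request
import com.android.volley.toolbox.StringRequest
import com.android.volley.toolbox.Volley

fun fetchFromBackend(context: Context, path: String) {
    val queue = Volley.newRequestQueue(context)
    // Placeholder endpoint; the real proxy URL is provided by the codelab.
    val url = "https://<proxy-endpoint>/$path"
    val request = StringRequest(
        Request.Method.GET, url,
        { response -> Log.d("ProductSearch", "Received ${response.length} characters") },
        { error -> Log.e("ProductSearch", "Request failed", error) }
    )
    queue.add(request)
}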
Fig 6.3 interface of product image search app after connecting the two APIs
7.Go further with image classification
In the previous codelab you created an app for Android and iOS that used a basic image labelling model. It recognized a picture of a flower very generically, seeing petals, flower, plant, and sky.
To update the app to recognize specific flowers, daisies or roses for example, you'll need a custom model that's trained on lots of examples of each type of flower you want to recognize.
This codelab will not go into the specifics of how a model is built. Instead, you'll
learn about the APIs from TensorFlow Lite Model Maker that make it easy.
Install TensorFlow Lite Model Maker. You can do this with a pip install. The &>
/dev/null at the end just suppresses the output. Model Maker outputs a lot of stuff that isn't
immediately relevant. It's been suppressed so you can focus on the task at hand.
If your images are organized into folders, and those folders are zipped up, then if you download the zip and decompress it, you'll automatically get your images labelled based on the names of those folders. This data path can then be loaded with ImageClassifierDataLoader to train a neural network model, and you're good to go. One important element in training models with machine learning is to not use all of your data for training. Hold back a little to test the model with data it hasn't previously seen. This is easy to do with the split method of the dataset that comes back from ImageClassifierDataLoader.
Model Maker abstracts a lot of the specifics of designing the neural network so you don't have to deal with network design, and things like convolutions, dense layers, relu, flatten, loss functions and optimizers.
The model went through 5 epochs, where an epoch is a full cycle of training in which the neural network tries to match the images to their labels. By the time it went through 5 epochs, in around 1 minute, it was 93.85% accurate on the training data. Given that there are 5 classes, a random guess would be 20% accurate, so that's progress!
Now that the model is trained, the next step is to export it in the .tflite format that a mobile application can use. Model Maker provides an easy export method that you can use for this.
fig 7.1 classification of an image
For the rest of this lab, I'll be running the app in the iPhone simulator, which should support the build targets from the codelab. If you want to use your own device, you might need to change the build target in your project settings to match your device.
The model you created in the previous codelab was trained to detect 5 varieties of flower: daisy, dandelion, rose, sunflower and tulip.
For the rest of this codelab, you'll look at what it takes to upgrade the app to use the custom model.
1. Open your ViewController.swift file. You may see an error on the 'import MLKitImageLabeling' at the top of the file. This is because you removed the generic image labelling library in the previous step. Replace that import with the following:
import MLKitVision
import MLKit
import MLKitImageLabelingCommon
import MLKitImageLabelingCustom
It might be easy to speed read these and think that they're repeating the same code! But they are two different libraries, ImageLabelingCommon and ImageLabelingCustom.
2. Next you'll load the custom model that you added in the previous step.
3. Find the code for specifying the options for the generic ImageLabeler. It's probably giving you an error since those libraries were removed: let options = ImageLabelerOptions()
Replace that with code that uses CustomImageLabelerOptions and specifies the custom model.
...and that's it! Try running your app now! When you try to classify the image it should be more accurate, and tell you that you're looking at a daisy with high probability!
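That step is shown for iOS and Swift in the codelab. For comparison with the Android work in the earlier chapters, the equivalent ML Kit setup on Android would look roughly like the sketch below (this is our own hedged illustration, not part of the codelab, and the model file name is a placeholder):

import com.google.mlkit.common.model.LocalModel
import com.google.mlkit.vision.label.ImageLabeling
import com.google.mlkit.vision.label.custom.CustomImageLabelerOptions

// Point ML Kit at the custom flower model bundled in the app's assets
// ("flower_model.tflite" is a placeholder name).
val localModel = LocalModel.Builder()
    .setAssetFilePath("flower_model.tflite")
    .build()

// Use CustomImageLabelerOptions instead of the generic ImageLabelerOptions.
val customOptions = CustomImageLabelerOptions.Builder(localModel)
    .setConfidenceThreshold(0.7f)
    .setMaxResultCount(5)
    .build()

val labeler = ImageLabeling.getClient(customOptions)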
fig 7.2 showing accuracy of the classification of an image.
CONCLUSION
The internship offered comprehensive courses and hands-on projects that provided me with a strong foundation in artificial intelligence and machine learning.
Starting with the AI Foundations course, I gained essential knowledge about the fundamental concepts of AI and ML. This course covered critical topics such as supervised and unsupervised learning, neural networks, and deep learning algorithms. Understanding these key concepts has been instrumental in shaping my perspective on the growing impact of AI and ML across industries. The later coursework offered deeper insights into deploying machine learning models in real-world applications using industry frameworks. The hands-on experience from this course equipped me with the ability to build and evaluate practical models.
The internship also included a project using these tools, where I implemented a machine learning solution to a real business problem. This project allowed me to apply everything I learned, from data preprocessing and model training to evaluation and deployment.
Overall, this virtual internship has not only expanded my technical knowledge but also solidified my passion for pursuing a career in artificial intelligence and machine learning. The combination of theoretical learning and practical application has prepared me to leverage this knowledge as I continue to explore the field and contribute to innovative solutions.