Intern Report - Example
Bachelor of Technology
In
Submitted by:
Avvaru Srilakshmi
22F01A4604
Dr. M. Ramesh
Professor in CSE – CS
St. Anns College of Engineering and Technology
&
Ramesh, Professor, Department of CSE – Cyber Security, for all his support and valuable
guidance during my internship at the AIMER Society, Vijayawada.
I also take this opportunity to express my deep-felt gratitude to my external guide, Mr. D Sai
Satish, CEO, Indian Servers, for his valuable guidance and advice throughout this work.
I would like to thank the management, who have given us the opportunity to work with the AIMER Society.
My parents have been the moving spirit behind this work. This acknowledgement is only a small token of my gratitude to them.
Finally, I would like to express my sincere thanks to the Principal, the Management, and all those
who were directly involved in bringing the dissertation work to its final form.
(AVVARU SRILAKSHMI)
CONTENTS

College Certificate
Industry Certificate
Acknowledgements
CHAPTER – 1: Introduction of the Industry
CHAPTER – 2: Training Schedule
CHAPTER – 3: Literature Survey
    3.1 Computer Vision
    3.2 Convolutional Neural Networks
    3.3 Artificial Intelligence Models
CHAPTER – 4: Tasks
CHAPTER – 8: Observations
References
Name: Artificial Intelligence Medical and Engineering Researchers Society (AIMER Society)
Overview:
The Artificial Intelligence Medical and Engineering Researchers Society (AIMER Society)
stands as a premier professional organization at the forefront of the advancement of
Artificial Intelligence (AI) within the realms of medical and engineering research. This
esteemed society is committed to driving innovation and excellence in AI by fostering a
collaborative environment among researchers, practitioners, and students from diverse
backgrounds and disciplines.
The AIMER Society's mission is to serve as a catalyst for the development and application of
cutting-edge AI technologies that can address complex challenges in healthcare and
engineering. By creating a vibrant and inclusive platform, the society facilitates the exchange
of knowledge, ideas, and best practices among its members. This collaborative approach
ensures that AI research is not only innovative but also practically applicable, leading to real-world
solutions that can significantly improve medical outcomes and engineering practice.
In pursuit of its mission, the AIMER Society organizes a wide array of activities and initiatives
designed to promote AI research and development. These include annual conferences,
symposiums, and workshops that bring together leading AI experts to discuss the latest
advancements and trends. Such events provide invaluable opportunities for networking,
collaboration, and professional growth.
Mission:
The mission of the AIMER Society is to promote the development and application of AI
technologies to solve complex medical and engineering problems, improve healthcare
outcomes, and enhance engineering solutions. The society aims to bridge the gap between
theoretical research and practical implementation, encouraging interdisciplinary
collaboration and real-world impact.
Objectives:
Key Activities:
Membership:
The AIMER Society offers various membership categories, including individual, student, and
corporate memberships. Members gain access to exclusive resources, networking
opportunities, and discounts on events and publications. The society encourages
participation from AI enthusiasts, researchers, practitioners, and organizations interested in
the advancement of AI technologies.
Leadership:
The AIMER Society is led by a team of experienced professionals and experts in the fields of
AI, medical research, and engineering. The leadership team is responsible for strategic
planning, organizing events, and guiding the society towards achieving its mission and
objectives.
Future Goals:
Contact Information:
-Email: [email protected]
3.1 Computer Vision
Computer vision works much the same as human vision, except humans have a
head start. Human sight has the advantage of lifetimes of context to train how
to tell objects apart, how far away they are, whether they are moving, and
whether something is wrong with an image.
Computer vision is used in industries that range from energy and utilities to
manufacturing and automotive—and the market is continuing to grow. It is
expected to reach USD 48.6 billion by 2022.
Scientists and engineers have been trying to develop ways for machines to see
and understand visual data for about 60 years. Experimentation began in 1959
when neurophysiologists showed a cat an array of images, attempting to
correlate a response in its brain. They discovered that it responded first to hard
edges or lines; scientifically, this meant that image processing starts with
simple shapes like straight edges.
At about the same time, the first computer image scanning technology was
developed, enabling computers to digitize and acquire images. Another
milestone was reached in 1963 when computers were able to transform two-
dimensional images into three-dimensional forms. In the 1960s, AI emerged as
an academic field of study and it also marked the beginning of the AI quest to
solve the human vision problem.
By 2000, the focus of study was on object recognition; and by 2001, the first
real-time face recognition applications appeared. Standardization of how visual
data sets are tagged and annotated emerged through the 2000s. In 2010, the
ImageNet data set became available. It contained millions of tagged images
across a thousand object classes and provides a foundation for CNNs and deep
learning models used today. In 2012, a team from the University of Toronto
entered a CNN into an image recognition contest. The model, called AlexNet,
significantly reduced the error rate for image recognition. After this
breakthrough, error rates have fallen to just a few percent.
Optical Character Recognition (OCR): The process of converting different
types of documents, such as scanned paper documents or PDFs, into
editable and searchable data. This is useful for digitizing printed texts.
3D Vision: Creating three-dimensional models from 2D images, which is
used in areas like virtual reality, augmented reality, robotics, and
autonomous navigation.
Neural Networks and Deep Learning: Utilizing neural networks, especially
convolutional neural networks (CNNs), which are particularly effective for
tasks involving image and video data due to their ability to capture spatial
hierarchies in visual information.
AI Computer Vision is applied in various industries, including healthcare,
automotive, retail, security, and entertainment, transforming how tasks are
performed and enabling new capabilities.
Image Preprocessing:
• Keypoint Detection: Finding and describing local features, such as
corners and blobs, which are invariant to changes in scale and
rotation.
Healthcare:
b) Pathology: Automated analysis of histopathological slides to identify
abnormalities and diseases.
c) Telemedicine: Using image-based diagnostics for remote consultations.
Automotive:
Retail:
Agriculture:
b) Special Effects: Enhancing movies and games with realistic visual effects
generated by AI.
Computational Resources:
1. Processing Power: Training deep learning models
requires significant computational resources, often necessitating
specialized hardware like GPUs.
Generalization and Bias: Models trained on limited or unrepresentative data may generalize
poorly to new settings and produce biased or inaccurate predictions.
Interpretability:
3.2 Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNNs) are a class of deep neural networks that
are particularly effective for analyzing visual data. They are designed to
automatically and adaptively learn spatial hierarchies of features through
backpropagation, using multiple building blocks such as convolutional layers,
pooling layers, and fully connected layers.
Img: CNN
Convolution Layer:
Pooling Layer:
Max Pooling: Reduces the spatial dimensions of the input by taking the
maximum value in each patch of the feature map.
Average Pooling: Reduces the spatial dimensions by taking the average value in
each patch of the feature map.
Dense Layer: Connects every neuron in one layer to every neuron in the next
layer, similar to traditional neural networks.
Output Layer: Typically a softmax activation function is used in the final layer
for classification tasks, giving a probability distribution over classes.
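To make the roles of these layers concrete, the following is a minimal sketch of a small image classifier in Keras; the input size (64x64 RGB), layer widths, and class count are illustrative assumptions rather than values from the report:

from tensorflow.keras import layers, models

# A small CNN: convolution layers learn local features, pooling layers
# shrink the spatial dimensions, and dense layers perform the final
# classification into 10 illustrative classes.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution layer
    layers.MaxPooling2D((2, 2)),                    # max pooling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.AveragePooling2D((2, 2)),                # average pooling
    layers.Flatten(),
    layers.Dropout(0.5),                            # dropout (regularization)
    layers.Dense(128, activation="relu"),           # dense (fully connected) layer
    layers.Dense(10, activation="softmax"),         # output layer: class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()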
Dropout:
Object Detection:
Image Segmentation:
Facial Recognition:
Medical Imaging:
Disease Detection: Analyzing medical images like X-rays, MRIs, and CT scans to
detect conditions such as tumors, fractures, and anomalies.
Video Analysis:
Self-Driving Cars:
Robotics:
Navigation and Interaction: Enabling robots to navigate and interact with their
environment by recognizing objects and understanding their surroundings.
Style Transfer: Applying the artistic style of one image to another image.
Advantages of CNNs
ii. Translation Invariance: CNNs can recognize objects even if they are
translated or slightly transformed within the image.
iii. Scalability: They can be scaled to handle large datasets and complex tasks.
Challenges and Limitations
3.3 AI Models
Overview:
Summarization involves condensing a longer text into a shorter version, capturing the
main ideas and essential information. This is useful for quickly understanding large
volumes of text, such as articles, reports, and documents.
Summarization Steps:
Generate Summary: Use the model to generate a summary of the input text.
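As a concrete illustration of these steps, here is a minimal sketch using the Hugging Face transformers pipeline; the model checkpoint and the sample input text are illustrative assumptions:

from transformers import pipeline

# Load a pre-trained summarization model (the checkpoint name is an illustrative choice).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = (
    "Computer vision is a field of artificial intelligence that trains computers "
    "to interpret and understand the visual world. Using digital images from "
    "cameras and videos together with deep learning models, machines can "
    "identify and classify objects and react to what they see."
)

# Generate a short summary of the input text.
summary = summarizer(text, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])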
CHAPTER – 4: Tasks
Overview:
Question Answering (QA) systems extract answers from a given context based on a
posed question. QA models are trained to understand the context and locate the span
of text that answers the question.
Steps:
Select a Pre-trained Model: Choose a model designed for question answering, such
as BERT, RoBERTa, or DistilBERT.
Load the Model and Tokenizer: Initialize the model and tokenizer using the
transformers library.
Prepare the Context and Question: Input the context (passage of text) and the
question you want to answer.
Generate Answer: Use the model to find and generate the answer from the context.
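A minimal sketch of these steps with the transformers pipeline is given below; the model checkpoint, context passage, and question are illustrative assumptions:

from transformers import pipeline

# Load a pre-trained question-answering model (checkpoint name is an illustrative choice).
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = (
    "The AIMER Society promotes the development and application of AI technologies "
    "to solve complex medical and engineering problems."
)
question = "What does the AIMER Society promote?"

# The model locates the span of the context that answers the question.
result = qa(question=question, context=context)
print(result["answer"], result["score"])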
Fill Mask Overview:
The fill-mask task involves predicting missing words in a sentence. It is commonly used in
language modeling and text completion. The model predicts masked words based on the
context provided by the surrounding words.
Steps:
Select a Pre-trained Model: Choose a model designed for fill-mask tasks, such as
BERT, RoBERTa, or DistilBERT.
Load the Model and Tokenizer: Initialize the model and tokenizer using the
transformers library.
Prepare the Input Sentence: Input a sentence with a masked word (e.g., "Artificial
intelligence is [MASK] by machines.").
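A minimal sketch of the fill-mask task with the transformers pipeline follows; the checkpoint name is an illustrative assumption, and BERT-style models expect the [MASK] token shown above:

from transformers import pipeline

# Load a pre-trained masked language model (checkpoint name is an illustrative choice).
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Predict the masked word from the surrounding context.
predictions = fill_mask("Artificial intelligence is [MASK] by machines.")
for p in predictions[:3]:
    print(p["token_str"], round(p["score"], 3))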
4.2 ChatBot
A chatbot creates an interaction between a human and an AI: a person can interact with the AI
directly in natural language. Here I developed a “Telegram Bot” using ChatGPT, API keys, and
Telegram.
Send a /start command to BotFather.
5. It asks you to choose a name for your bot; give your bot a name.
6. It then asks for a username for your bot; give your bot a username.
7. It generates your Telegram bot token; copy it. It also provides your bot's link.
8. The bot will not work yet because it has no backend. For that we use Python code; you can
run the code on any Python platform. Here I am using Google Colab: open a new notebook,
install the required packages, and run the main code. In that code we need to replace the
Telegram bot token with the one generated by BotFather and also replace the API key with
your own generated key. Then run the code, go to your bot, and ask it something; it will
interact with you. It can interact with us only while the code is running.
To connect to your bot, you need to copy the token and place it in the TELEGRAM BOT TOKEN variable.
Now run the above cell. Generate an API key, paste it into genai.configure, and run the cell.
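A minimal sketch of such a backend cell is shown below, assuming the python-telegram-bot and google-generativeai packages; the package choice, model name, and placeholder token/key values are illustrative assumptions, not the exact code used during the internship:

# pip install python-telegram-bot google-generativeai
import google.generativeai as genai
from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters

TELEGRAM_BOT_TOKEN = "PASTE_YOUR_BOTFATHER_TOKEN_HERE"   # token from BotFather
GOOGLE_API_KEY = "PASTE_YOUR_API_KEY_HERE"               # your generated API key

genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel("gemini-pro")              # illustrative model name

async def reply(update: Update, context: ContextTypes.DEFAULT_TYPE):
    # Send the user's message to the AI model and return its answer to the chat.
    response = model.generate_content(update.message.text)
    await update.message.reply_text(response.text)

app = ApplicationBuilder().token(TELEGRAM_BOT_TOKEN).build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, reply))
app.run_polling()   # the bot answers only while this cell keeps running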
It interacts in an engaging way and answers everything we ask. Finally, this is my
“Telegram Bot”.
4.3 YOLO (you only look once)
YOLO, which stands for "You Only Look Once," is a state-of-the-art real-time object
detection system. YOLO has several versions, such as YOLOv3, YOLOv5, YOLOv6, YOLOv8, and
YOLOv9. YOLOv8 was developed by Ultralytics and is the version used in this work.
3. After that, you can upload a minimum of 500 images, or you can upload a YouTube
link, and then we have to label all the images for the objects we need to detect. All 500
images need to be labelled correctly.
4. Otherwise, we have an option called Universe: Roboflow provides a number of Universe
datasets that are already labelled. We can use those datasets as well.
Then, you will be asked to invite collaborators to your workspace. These
collaborators can help you annotate images or manage the vision projects in your
workspace. Once you have invited people to your workspace (if you want to), you
will be able to create a project.
5. Select a dataset you want and download it; you must use the “YOLOv8” format, and it
then generates a code snippet that you should copy. Then go to the AI model called
YOLOv8; you can train the model on Colab, Kaggle, etc. Here you need to choose Colab.
6. After that, for training in Colab you must connect to a GPU runtime.
7. Then train the model by running the cells. You can customize the model here and change
the epoch count, which is the number of training iterations you need. After that you can run
inference with the model.
8. You need to download the best.pt file; after the iterations are completed this file is
generated and you must download it.
9. Finally it gives a path like runs/detect/predict where your output is stored; check it
and download it. Otherwise, there is an option to connect with your drive: you can connect
your Google Drive and drag the output into it.
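A minimal sketch of these training and inference steps with the ultralytics package is given below; the dataset path, epoch count, and test image are illustrative placeholders:

# pip install ultralytics
from ultralytics import YOLO

# Start from a small pre-trained checkpoint and fine-tune it on the Roboflow export.
model = YOLO("yolov8n.pt")
model.train(data="/content/dataset/data.yaml", epochs=50, imgsz=640)  # illustrative values

# After training, the best weights are saved as best.pt under runs/detect/train/weights.
best = YOLO("runs/detect/train/weights/best.pt")

# Run inference; annotated results are saved under runs/detect/predict.
best.predict(source="/content/test.jpg", save=True)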
Object Detection (YOLOV8)
The task of detecting instances of objects of a certain class within an image. Object
detection is a computer vision task that involves identifying and locating objects in
images or videos. It is an important part of many applications, such as self-driving
cars, robotics, and video surveillance. Over the years, many methods and algorithms
have been developed to find objects in images and their positions. The best quality in
performing these tasks comes from using convolutional neural networks.
One of the most popular neural networks for this task is YOLO, created in 2015 by
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi in their famous
research paper "You Only Look Once: Unified, Real-Time Object Detection".
Since that time, there have been quite a few versions of YOLO. Recent releases can
do even more than object detection. The newest release is YOLOv8, which we are
going to use in this tutorial.
To show the main features of this network for object detection, we will first use a pre-
trained model to detect common object classes like cats and dogs. Then, I will show
how to train your own model to detect specific object types that you select, and how
to prepare the data for this process. Finally, we will create a web application to
detect objects in images right in a web browser using the custom-trained model.
We can use the YOLOv8 network to solve classification, object detection, and image
segmentation problems. All these methods detect objects in images or in videos in
different ways, as you can see in the image below:
The neural network that's created and trained for image classification determines a
class of object on the image and returns its name and the probability of this
prediction.
For example, on the left image, it returned that this is a "cat" and that the
confidence level of this prediction is 92% (0.92).
The neural network for object detection, in addition to the object type and
probability, returns the coordinates of the object on the image: x, y, width and
height, as shown on the second image. Object detection neural networks can also
detect several objects in the image and their bounding boxes.
Finally, in addition to object types and bounding boxes, the neural network trained
for image segmentation detects the shapes of the objects, as shown on the right
image.
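The same ultralytics interface can be pointed at each of the three task types described above; a short sketch follows, where the checkpoint names follow the library's standard naming and the image file is a placeholder:

from ultralytics import YOLO

# Image classification: returns a class name and the probability of the prediction.
cls_results = YOLO("yolov8n-cls.pt")("cat.jpg")

# Object detection: also returns bounding boxes (x, y, width, height) for each object.
det_results = YOLO("yolov8n.pt")("cat.jpg")

# Instance segmentation: additionally returns the shape (mask) of each object.
seg_results = YOLO("yolov8n-seg.pt")("cat.jpg")

for r in det_results:
    print(r.boxes.cls, r.boxes.conf, r.boxes.xywh)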
Object Detection using YOLOv8:
Adding Data:
Click “Create Project.” to continue.
Paste the YouTube link and then click Next; it will process the video. Here the video
is nothing but a series of images; we will take 1 frame/second, then click on
choose frame rate.
Now click on Universe and click on self-driving; there are hundreds of datasets. Select the
“Vehicles Computer Vision Project” and click on download this dataset. In format,
select YOLOv8.
After selecting YOLOv8, click on Continue and copy the code. Now click on a notebook;
from your model library select YOLOv8. After clicking YOLOv8 you can see Train on
Colab on the right side; open Train on Colab. Here you can see the following code
cells.
Now run the cell. This cell is used to check whether you are connected to a
GPU (Graphics Processing Unit) or not.
Now run the above cell; this cell sets the current directory as HOME.
After running the above cell, the result will be saved in runs/detect/predict.
Paste the video link in source, then run the cell.
After running the cell you will find the result video in runs/detect/predict2.
After running the cell you can find the result video in your Google Drive.
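The Colab cells referred to above look roughly like the following sketch; the exact weight path, video file name, and Drive folder are illustrative placeholders:

# Check whether the runtime is connected to a GPU.
!nvidia-smi

import os
HOME = os.getcwd()   # use the current directory as HOME

# Detect vehicles in the video with the trained weights; results are saved under
# runs/detect/predict (then predict2, predict3, ... on later runs).
!yolo task=detect mode=predict model={HOME}/runs/detect/train/weights/best.pt source="vehicles.mp4" save=True

# Optionally mount Google Drive and copy the result video there.
from google.colab import drive
drive.mount("/content/drive")
!cp -r {HOME}/runs/detect/predict2 /content/drive/MyDrive/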
CHAPTER – 5: OpenCV Basics
Video Capture and Analysis: OpenCV can capture video from cameras, video
files, or image sequences. It also supports real-time video processing, making it
useful for applications like video surveillance and motion tracking.
Object Detection: The library includes pre-trained models and functions for
detecting objects such as faces, eyes, and cars. It supports various object
detection algorithms, including Haar cascades, HOG + SVM, and deep learning-
based methods like YOLO and SSD.
Feature Detection and Matching: OpenCV can detect and match features
between images using algorithms like SIFT, SURF, and ORB. This is essential for
tasks like image stitching, 3D reconstruction, and object recognition.
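As a small sketch of feature detection and matching, the following uses ORB with a brute-force matcher; the image file names are placeholders:

import cv2

# Load two overlapping views of the same scene (file names are placeholders).
img1 = cv2.imread("scene1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene2.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute binary descriptors with ORB.
orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance suits ORB's binary descriptors.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Draw the 20 best matches for visual inspection.
vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:20], None)
cv2.imwrite("matches.jpg", vis)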
Camera Calibration: OpenCV includes tools for calibrating cameras,
estimating camera parameters, and correcting lens distortion. This is critical for
applications that require precise camera measurements and 3D reconstruction.
GUI Features: The library provides simple functions to create graphical user
interfaces, allowing users to create windows, display images, and capture mouse
and keyboard events.
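A short sketch of the video capture and GUI features described above, reading frames from the default camera and displaying them until 'q' is pressed (assumes a machine with a webcam; a video file path would also work):

import cv2

# Open the default camera (index 0); a video file path can be passed instead.
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # Simple per-frame processing step: convert to grayscale.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # GUI features: display windows and capture keyboard events.
    cv2.imshow("frame", frame)
    cv2.imshow("gray", gray)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()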
Applications of OpenCV
OpenCV is highly versatile and can be used in various industries and research
areas. Its wide range of functionalities and ease of use make it a popular choice
for both beginners and experienced practitioners in computer vision and image
processing.
CHAPTER – 6: Object Tracking:
Object tracking in OpenCV involves detecting an object in a video frame and then
following it as it moves across subsequent frames. Here’s a conceptual overview of
how object tracking works in OpenCV:
Object Detection: First, the object of interest must be detected. This can be done
using various techniques like background subtraction, frame differencing, or using
pre-trained models (e.g., Haar cascades, YOLO).
Tracking: The tracker updates the position of the object in each new frame based
on its appearance and motion. Different tracking algorithms can be used depending
on the requirements and complexity of the task.
MedianFlow Tracker: Tracks the object by estimating the median of the flow
vectors; it works best for smooth, predictable motion and can reliably report tracking failure.
CSRT Tracker (Discriminative Correlation Filter with Channel and Spatial
Reliability): Provides higher accuracy and robustness to occlusions and variations in
scale.
MOSSE Tracker (Minimum Output Sum of Squared Error): Fast and efficient,
suitable for real-time applications but less accurate.
Conceptual Workflow
Initialization:
Detect the object to track and initialize the tracker with the object's bounding box.
Tracking Loop:
Detect the face using a pre-trained Haar cascade classifier.
Initialize the tracker with the bounding box of the detected face.
Track the face across subsequent video frames using one of the tracking algorithms.
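A minimal sketch of this face-tracking workflow is given below; it assumes opencv-contrib-python is installed (older builds expose cv2.TrackerCSRT_create() instead of the cv2.legacy variant) and a webcam with at least one visible face:

import cv2

# Detect the face once with a pre-trained Haar cascade, then follow it with a tracker.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)
ok, frame = cap.read()

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
x, y, w, h = faces[0]                       # assume at least one face was detected

# Initialize a CSRT tracker with the detected face's bounding box.
tracker = cv2.legacy.TrackerCSRT_create()
tracker.init(frame, (x, y, w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    ok, box = tracker.update(frame)         # update the position in each new frame
    if ok:
        x, y, w, h = [int(v) for v in box]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()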
Ease of Use: High-level functions and a wide range of tutorials and examples make
it easy to implement tracking.
By leveraging these features and understanding the workflow, you can implement
robust object tracking systems using OpenCV for various applications.
CHAPTER – 7: AI Talking Bot
Set Up Environment:
Install libraries and obtain API keys for translation and TTS services.
Translation:
Use a translation API to convert text from the source language to the target language.
Text-to-Speech (TTS):
Use a TTS API to convert the translated text into spoken words.
Integration:
Create a script or application that takes input text, translates it, converts it to speech,
and plays the audio.
Process Overview:
Input:
The user provides text in a specific language.
Translate:
The text is sent to a translation service to be translated into the desired language.
Convert to Speech:
The translated text is sent to a TTS service to generate an audio file.
Play Audio:
The audio file is played, allowing the robot to "speak" the translated text.
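A minimal sketch of this pipeline is given below, assuming the google-cloud-translate and gTTS packages; the target language, output file name, and playback method are illustrative choices, and GOOGLE_APPLICATION_CREDENTIALS must point to a Google Cloud service-account key as noted in the Key Points below:

# pip install google-cloud-translate gTTS
from google.cloud import translate_v2 as translate
from gtts import gTTS

def speak_translated(text: str, target_lang: str = "hi") -> None:
    # Translate the input text into the target language.
    client = translate.Client()               # uses GOOGLE_APPLICATION_CREDENTIALS
    result = client.translate(text, target_language=target_lang)
    translated = result["translatedText"]
    print("Translated:", translated)

    # Convert the translated text to speech and save it as an audio file.
    gTTS(text=translated, lang=target_lang).save("output.mp3")
    # Play output.mp3 with any audio player (e.g. IPython.display.Audio in Colab).

speak_translated("Hello, how can I help you today?", "hi")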
Translation API:
Example: Google Cloud Translation API.
Function: Translate text between languages.
Text-to-Speech API:
Example: Google Text-to-Speech, Amazon Polly.
Function: Convert text to spoken audio.
Key Points:
Google Cloud Translation API:
Translate text from one language to another.
Requires setting up a project on Google Cloud and enabling the Translation API.
Environment Variables:
GOOGLE_APPLICATION_CREDENTIALS should point to your Google Cloud service
account JSON file.
This example can be extended to include more languages, different TTS services, and
more sophisticated error handling and user interaction features.
CHAPTER –8: OBSERVATIONS
The Telegram Bot setup provides an intuitive way for users to interact
with AI using natural language directly via the Telegram platform.
Using BotFather simplifies bot creation, but connecting the bot with a
backend, such as a Python script on Google Colab, enables dynamic
interaction by linking to APIs.
The bot responds accurately as long as the Python code is running,
making it ideal for real-time interactions but limited by the runtime of
the backend environment.
A generative AI robot that can interact in multiple languages can be
developed by following the simplified steps described in Chapter 7.
CHAPTER – 9: Learning Outcomes
CHAPTER – 10: Conclusion and Future Extensions
References
1. Google Cloud (2023). Cloud Translation API Documentation. https://cloud.google.com/translate/docs
2. Microsoft. Azure Translator Documentation. https://learn.microsoft.com/en-us/azure/cognitive-services/translator/
3. Google Cloud. Text-to-Speech API Documentation. https://cloud.google.com/text-to-speech/docs