3D Animation Using YoloV3
ARTICLE INFO

Article History: Accepted: 05 May 2023; Published: 16 May 2023
Publication Issue: Volume 10, Issue 3, May-June-2023
Page Number: 159-164

ABSTRACT

Simulating and demonstrating a dynamic system can be done in a variety of ways. Our emphasis is on building, understanding, and weighing the positive and negative attributes of three distinct methods that are comparatively uncomplicated to implement with software that is inexpensive and freely available: MATLAB and Simulink together with MATLAB's preloaded animation functions; Simulink 3D Animation; and SolidWorks (basic and Motion Manager) combined with Windows (Live) Movie Maker. A MATLAB/Simulink Motion Manager-based animation file may be used for animation creation. In this regard, the final SolidWorks data imported into Simulink 3D Animation must include information ingested from the MATLAB environment and edited with the VRML Editor component, in order to create the geometric constraints that are represented as an animation sink block within the Simulink model of the dynamic system. Every scenario may be addressed with a You Only Look Once (YOLO) Version 3 model. To compare and appraise the three methods, a benchmark challenge was formulated: a tandem-parking vehicle with four wheels and front steering.

Keywords: 3D Animation, Simulink, VRML Editor.
Copyright: © 2023, the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
S. Thejaswini et al Int J Sci Res Sci & Technol. May-June-2023, 10 (3) : 174-180
methods that give computers the ability to detect and understand images and videos much like people do. Applications for computer vision systems include robotics, surveillance, autonomous vehicles, medical imaging, and more. Computer vision is a difficult field that combines techniques from various areas of study, including mathematics, physics, statistics, and computer science. Image segmentation, object detection, image classification, and image recognition are a few of the methods used within computer vision.

Image Processing

The specialty of image processing, by contrast, emphasises improving and altering digital images to facilitate better viewing or interpretation. It involves developing algorithms and techniques that can remove noise, enhance contrast, and sharpen images. Medical imaging, digital photography, and computer graphics are just a few of the applications that make use of image processing techniques. The main objective of image processing is to enhance the visual quality of an image or make it easier to interpret. Some of the key techniques used in image processing include image filtering, image enhancement, and image restoration.

Computer Vision vs. Image Processing

Although image processing and computer vision are remarkably similar, there are some key differences between the two fields. The primary difference is that computer vision is concerned with extracting information from images to understand the world around us, while image processing is concerned with manipulating and improving images for better visualization or interpretation. Another difference is the type of input data used in each field. Computer vision typically uses image or video data captured by cameras or sensors, while image processing can also work with other types of data, such as signals from medical equipment or satellite imagery. Finally, computer vision tends to use more advanced techniques and algorithms, such as machine learning and deep learning, to extract information from images, while image processing typically uses more traditional signal processing techniques.

3D animation is the process of producing animated three-dimensional visuals in a computer setting. These images are produced using 3D software, which enables animators to create digital objects that appear three-dimensional despite being shown on a two-dimensional surface. Animation professionals can make anything appear to move in three dimensions, from a video game character to an automobile in an advertisement, through the use of visual effects and exact timing.

YOLOv3 (You Only Look Once version 3) is primarily used for object detection and tracking in 2D images and videos. While YOLOv3 itself is not designed for 3D animation, it can be used as a tool to help automate object tracking and motion capture, which are important components of 3D animation. To use YOLOv3 for 3D animation, you would first detect and track the movements of real-world objects or people in a video using YOLOv3's object detection capabilities, which can identify and track objects in a video frame by frame. Once you have tracked the movements of the objects or people, you can use this data to drive the animation of corresponding 3D objects in a virtual space. This is typically done using motion capture techniques, which map the movements of real-world objects onto digital 3D models.

II. RELATED WORKS

A novel method of object detection was introduced with YOLOv1. Detection had previously been done by repurposing classifiers; YOLOv1 instead frames object detection as a regression problem to spatially separated bounding boxes and their associated class probabilities. A single neural network predicts the bounding boxes and class probabilities in just one analysis of the full image.
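The YOLOv3-to-animation workflow described in the introduction (detect objects frame by frame, then map their movements onto 3D models) can be sketched as below. This is a minimal illustration, not the paper's implementation: the detector is replaced by a stub that fabricates one moving car, and the pixel-to-world mapping assumes a simple linear camera model.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    x: float        # box centre, pixels
    y: float
    w: float
    h: float
    confidence: float

def detect_objects(frame_index):
    """Stand-in for a YOLOv3 forward pass on one video frame.

    A real pipeline would run the network on the frame here; this
    sketch fabricates a single car moving left to right."""
    return [Detection("car", x=100.0 + 5.0 * frame_index, y=240.0,
                      w=80.0, h=40.0, confidence=0.9)]

def pixel_to_world(det, frame_w=640, frame_h=480, scene_width=20.0):
    """Map a 2D box centre onto a 3D translation for the virtual object.

    The camera model is an assumption: the x pixel coordinate maps
    linearly onto the world x axis, and the image row gives a crude
    depth estimate."""
    world_x = (det.x / frame_w) * scene_width
    world_y = 0.0                       # object stays on the ground plane
    world_z = (det.y / frame_h) * 2.0   # rough depth guess from image row
    return (world_x, world_y, world_z)

# Track the car frame by frame and collect the 3D positions that would
# drive the corresponding model in the virtual scene.
trajectory = []
for frame_index in range(5):
    cars = [d for d in detect_objects(frame_index) if d.label == "car"]
    if cars:
        best = max(cars, key=lambda d: d.confidence)
        trajectory.append(pixel_to_world(best))

print(trajectory[0])  # → (3.125, 0.0, 1.0)
```

In a real system the trajectory would be smoothed and retargeted onto the rig of the 3D model rather than applied as raw translations.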
International Journal of Scientific Research in Science and Technology (www.ijsrst.com) | Volume 10 | Issue 3 175
This unified, single-stage design is what makes YOLOv1 simple: the network predicts the boxes alongside its other tasks, negating the need for a convoluted pipeline with numerous steps [1].

YOLOv1 takes the whole image for detection; after scaling, the model divides the input image into N equal-sized grid cells. Each cell determines whether the centre of an object lies inside it, and the cell containing the centre is responsible for that object. The bounding box for every recognised object is composed of five variables: the box's coordinate location, its height and width, and a confidence value. The most important additional value is the probability that the discovered object belongs to a specific class; the identified class is stored as a number. The output for each detected bounding box is stored in an array containing the four location values along with the detected class number and class probability. Thus, the YOLO prediction has a total shape of N × N × (number of boxes) × (5 + class probabilities).

YOLOv2 performs localization and classification in a single stage. Unlike SSD, only one feature map is used for prediction, and the anchor box regression is modified by using a sigmoid function to predict the centre coordinates. Instead of VGG-16, the authors propose Darknet-19 as the base network; Darknet-19 has fewer parameters, which results in decreased inference time while its classification accuracy remains comparable to VGG-16. To determine the anchor box dimensions, k-means clustering is employed [2].

The YOLO neural network integrates candidate-box extraction, feature extraction, and object classification into one network: it extracts candidate boxes directly from images, and objects are detected from the features of the entire image. Traffic sign detection applies candidate-box extraction to an input image to determine whether it contains traffic signs and, if so, to output their locations. The data set used for training and evaluation comes from ImageNet; none of the images are tagged, and they are independent of the ones used in pre-training. From these, 1,652 images containing traffic signs were selected as the data set; 1,318 of them form the standard training set, and the remaining 334 were used as the test set. All traffic signs in the images are labelled [3].

To recognize objects in real-world images, shape features are used. Since most objects are better explained by their shape than by their texture, shape features are compared to local features such as SIFT. Shape features supplement local features, since local features cannot avoid a large amount of background clutter [4].

III. METHODOLOGY

The approach used for achieving animated objects and people in three dimensions (3D animation) relies on computer-generated imagery (CGI) and three-dimensional models. It involves designing and building 3D models, rigging them for animation, creating textures, lighting, and
animating them in order to give the impression of depth and motion.

3D animation production can be a labor-intensive process broken down into several stages:
1. Pre-production: This stage involves conceptualizing and planning the project, including developing the story, characters, and environment. It also includes creating storyboards and animatics, which are rough visual representations of the animation.
2. Modeling: This stage involves creating 3D models of the characters, objects, and environment using specialized software. The models can be created from scratch or from pre-existing templates.
3. Texturing: This stage involves adding colour, texture, and other details to the 3D models. Texturing can be done using various techniques, including painting, sculpting, or applying pre-existing textures.
4. Rigging: This stage involves adding a digital skeleton or rig to the 3D models, which allows them to be animated. The rigging process involves creating bones, joints, and controls for the model.
5. Animation: This stage involves bringing the 3D models to life by animating them. The animator creates keyframes, which are the main poses or actions of the character, and then fills in the in-between frames to create smooth motion.
6. Lighting: This stage involves adding lighting and shadows to the scene to create the desired mood and atmosphere.
7. Rendering: This stage involves exporting the animation from the software into a final video file.

In a 2D universe the graphics all appear flat because they only span the x- and y-axes. The z-axis, which adds depth, is the crucial third axis in 3D animation. 2D animation may be seen in classic Disney animated films like "Sleeping Beauty" and "Bambi"; in comparison, examples of 3D animation include "Frozen" and "Ice Age". This essential distinction can be attributed to the technical procedures used to produce 2D versus 3D animation. What distinguishes 3D animation from 2D animation? To produce a 2D animation, the animator draws a series of images on a flat surface. By slightly changing the position of the figure between frames, which are then played back quickly to produce a dynamic image, the illusion of movement is created.

In a variety of fields, such as film, television, video games, and advertising, 3D animation has emerged as a crucial tool. Toy Story, Shrek, and The Incredibles are just a few of the 3D animated films and TV shows that have achieved great popularity. In the world of video games, realistic and compelling surroundings and characters are made possible via 3D animation: developers produce animations for cutscenes, trailers, and in-game play in 3D. The advertising sector uses 3D animation to produce captivating commercials and promotional videos; it allows advertisers to create highly realistic or stylized visuals, which can be used to demonstrate the features and benefits of a product or service.

Animation is possible because of a biological phenomenon known as persistence of vision: the ability of the human eye to retain an image for a short length of time after it has vanished. This phenomenon is a crucial component of how we perceive motion in the world around us, and it is the basis for many visual technologies, including film, television, and animation.

When an image is projected onto the retina of the eye, it causes chemical and electrical reactions in the retinal cells, which convey messages to the brain. The viewer experiences an image as a result of these impulses. However, even after the image is no longer being projected onto the retina, the cells continue to send signals to the brain for a fraction of a second. This creates a brief afterimage that appears to linger in the viewer's mind.
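Stage 5 above (animation) builds motion from keyframes plus in-between frames. The in-betweening step can be sketched as follows; this assumes simple linear interpolation between two key poses, whereas real animation software eases in and out along curves.

```python
def inbetween(key_a, key_b, n_frames):
    """Generate the poses between two keyframes by linear interpolation.

    key_a and key_b are tuples of coordinates (the two key poses);
    n_frames is the number of intervals, so n_frames + 1 poses are
    returned, including both keyframes."""
    frames = []
    for i in range(n_frames + 1):
        t = i / n_frames  # interpolation parameter in [0, 1]
        frames.append(tuple(a + (b - a) * t for a, b in zip(key_a, key_b)))
    return frames

# Two key poses of a joint position (x, y, z), with 3 in-betweens.
poses = inbetween((0.0, 1.0, 0.0), (2.0, 1.0, 1.0), 4)
print(poses[2])  # → (1.0, 1.0, 0.5), the midpoint pose
```

Played back quickly, these closely spaced poses exploit the persistence of vision described below to read as smooth motion.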
This afterimage is what allows us to perceive motion in a collection of still photos, such as those found in a movie or television show. The afterimages that result from showing a series of images quickly one after another combine to produce the appearance of motion. This effect is called the phi phenomenon, and it is the foundation of animation.

The persistence of vision has been studied extensively by scientists and artists throughout history. Some of the earliest experiments were conducted by the Dutch scientist Christiaan Huygens in the 17th century, who discovered that a spinning disc with numerous slits would appear to show a continuous image when viewed through the openings. The technique was later refined and used in early film projectors, which created an illusion of motion with a spinning shutter mechanism. The persistence of vision is a fundamental concept in many fields today, such as aesthetics, psychology, and neuroscience. It is also a fundamental part of several contemporary technologies, such as digital projectors, LED displays, and virtual reality equipment.

Rendering animated illustrations that appear to be in two dimensions is known as "2D space animation." This type of animation is often used in traditional hand-drawn animation, as well as in digital animation software such as Adobe Flash and Toon Boom Harmony. In 2D space animation, images are created on a flat plane, with no apparent depth or volume. However, through the use of various techniques such as perspective, shading, and layering, the images can appear to have depth and movement.

Pseudo Algorithm:
• Create a virtual world.
• Examine the virtual world properties.
• Create the YOLOv3 model.
• Run YOLOv3 in real time on the virtual world.
• Create a VRNODE object connected to the VRML node "Automobile", which represents a vehicle on a road.
• To reach the next section of the road, turn the car slightly.
• The 'rotation' property of the 'Automobile' node is set to achieve this.
• Access the neural network's YOLO real-time object detections.

Fig 1: Vehicle Dynamics Visualization

IV. RESULTS AND DISCUSSIONS

Average precision was used to measure and evaluate the model; the object detection results are obtained using YOLOv3. In the vehicle dynamics simulation, the 3D output of a synthetic camera attached to the car is processed using video processing, and a space mouse is used to control a manipulator in the virtual scene.
VI. REFERENCES