Unit I: AR/VR (Student Notes)
3D Graphics
3D computer graphics refer to the creation and rendering of 3D objects and scenes
using computer software.
The images are typically displayed on a 2D screen, such as a computer monitor or a TV.
They are generated for various purposes, including movies, video games, architectural
visualizations, and more.
These graphics provide depth and a sense of realism, but unlike VR, the user does not
have an immersive experience or interact directly with the virtual objects.
To summarize,
Virtual Reality (VR) involves wearing a headset and being fully immersed in a virtual
environment,
while 3D computer graphics refer to the creation of three-dimensional objects and
scenes displayed on a 2D screen.
VR provides an interactive and immersive experience,
whereas 3D computer graphics are primarily visual representations.
1. HMDs [Head Mounted Displays]:
An HMD is a display that consists of two screens presenting the virtual world in front of the
user's eyes.
HMDs have motion sensors that detect the orientation and position of your head and adjust
the picture accordingly.
They have built-in headphones or external audio connectors to output sound.
Moreover, they have a blackout blindfold to ensure the users are fully disconnected from
the outside world.
2. Computing device:
3. Sensor(s):
4. Audio systems:
Audio ensures a convincing VR experience in which the user's brain is led to believe that
they are in the artificial world.
They are mostly integrated inside the HMD.
VR provides spatial audio, so the users feel how real the virtual world is.
5. Software:
1.6 Introduction to Augmented Reality [AR]
Definition: It is an interactive experience that combines the real world and computer-generated
content.
The content can span multiple sensory modalities, including visual, auditory, etc.
System structure of AR
1. User: The user can be a student, a doctor, an employee, and so on. This user is
responsible for the creation of AR models.
4. Virtual Content: The virtual content is nothing but the 3D model created or generated
by the system or AR application.
Virtual content is the type of information that can be integrated into the user's real-world
environment.
This virtual content can be 3D models, textures, text, images, etc.
5. Tracking: Tracking helps the device determine where to place or integrate the 3D model
in the real-world environment.
6. Real-life entity: These entities can be a tree, a book, fruit, a computer, or anything
visible on the screen.
The AR application does not change the position of real-life entities. It only integrates
the digital information with these entities.
1.7 System Structure of Augmented Reality
Applications of AR
Retail: Consumers use a store's online app to see how products, such as furniture, will
look in their own homes before buying.
Entertainment and gaming: AR can be used to overlay a virtual game onto the real world.
Navigation: AR can be used to overlay a route to the user's destination over a live view
of a road.
AR used for navigation can also display information about local businesses in the user's
immediate surroundings.
Tools and measurement: Mobile devices can use AR to measure different 3D points
in the user's environment.
Architecture: AR can help architects visualize a building project.
Military: Data can be displayed on a vehicle's windshield that indicates destination
directions, distances, weather and road conditions.
Archaeology: AR has aided archaeological research by helping archaeologists
reconstruct sites.
3D models help museum visitors and future archaeologists experience an excavation
site as if they were there.
1.8 Key Technology in AR
Hardware
Hardware for AR includes a processor, display, sensors, and input devices.
Modern mobile computing devices such as smartphones and tablet computers contain these
elements and often include a camera, making them suitable AR platforms.
The two main AR display techniques are 1) diffractive waveguides and 2) reflective waveguides.
Display
Displays used in AR include optical projection systems, monitors, handheld devices, and
display systems worn on the human body.
HMD
An HMD is a display device worn on the head, mounted on a harness or helmet.
HMDs place images of both the physical world and virtual objects over the user's field of
view.
Modern HMDs often employ sensors for six-degrees-of-freedom monitoring that allow the
system to align virtual information with the physical world.
HMDs can provide VR users with mobile and collaborative experiences.
Specific providers, such as uSens and Gestigon, include gesture controls for full virtual
immersion.
Eyeglasses
AR displays can be rendered on devices resembling eyeglasses.
These devices employ cameras to intercept the real-world view and re-display its augmented
view through the eyepieces.
Contact lenses
Contact lenses that display AR imaging are in development.
Components embedded into the lens include integrated circuitry, LEDs, and an antenna for
wireless communication.
The first contact lens display was patented in 1999 by Steve Mann and was intended to work
in combination with AR spectacles.
EyeTap: Captures rays of light that would otherwise pass through the centre of the lens of the
wearer's eye, and substitutes synthetic computer-controlled light for each ray of real light.
Handheld
A Handheld display employs a small display that fits in a user's hand.
Projection mapping
Projection mapping augments real-world objects and scenes without the use of special
displays such as monitors, head-mounted displays, or handheld devices.
Projection mapping makes use of digital projectors to display graphical information onto
physical objects.
The key difference in projection mapping is that the display is separated from the users of
the system.
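As a rough illustration of the projection side (not from the original notes), the Python/OpenCV sketch below warps a flat graphic onto the quadrilateral a physical surface occupies in projector coordinates. The file names, corner coordinates, and projector resolution are assumptions; a real setup would obtain the corners by calibrating the projector against the surface.

```python
# Hedged sketch: warping content onto a projector-facing quad with OpenCV.
import cv2
import numpy as np

content = cv2.imread("content.png")          # graphic to project (placeholder file name)
h, w = content.shape[:2]

# Corners of the flat content (source) and of the physical surface as seen in
# projector coordinates (destination) -- the destination values are made up here.
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
dst = np.float32([[120, 80], [900, 120], [880, 640], [100, 600]])

# Perspective transform that maps the flat content onto the surface's quad.
H = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(content, H, (1024, 768))  # assumed projector resolution

cv2.imshow("projector output", warped)
cv2.waitKey(0)
```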
Tracking
Tracking technologies include digital cameras, optical sensors, accelerometers, GPS,
gyroscopes, solid-state compasses, and radio-frequency identification (RFID).
These technologies offer varying levels of accuracy and precision.
These technologies are implemented in the ARKit API by Apple and the ARCore API by Google
to allow tracking for their respective mobile device platforms.
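The notes list accelerometers and gyroscopes among the tracking sensors. The hedged Python sketch below shows a complementary filter, one common way such readings can be fused for orientation tracking; it is not the actual ARKit or ARCore implementation, and all values are illustrative.

```python
# Hedged sketch: complementary filter fusing gyroscope and accelerometer data.
import math

def complementary_filter(pitch, gyro_rate, accel_x, accel_z, dt, alpha=0.98):
    """Fuse a gyroscope rate (deg/s) with an accelerometer tilt estimate (deg)."""
    # Integrating the gyroscope is smooth but drifts over time.
    gyro_pitch = pitch + gyro_rate * dt
    # The accelerometer tilt is noisy but drift-free while the device is near rest.
    accel_pitch = math.degrees(math.atan2(accel_x, accel_z))
    # Blend the two; alpha close to 1 trusts the gyro in the short term.
    return alpha * gyro_pitch + (1 - alpha) * accel_pitch

# Example update at 100 Hz with made-up sensor readings:
pitch = complementary_filter(pitch=10.0, gyro_rate=2.5,
                             accel_x=0.17, accel_z=0.98, dt=0.01)
```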
Networking
Mobile AR applications are gaining popularity because of the wide adoption of mobile and
especially wearable devices.
However, they rely on computationally intensive algorithms with extreme latency requirements.
To compensate for the lack of computing power, offloading data processing to a distant
machine is often desired.
Input devices
Speech recognition systems translate a user's spoken words into computer instructions.
Gesture recognition systems interpret a user's body movements by visual detection or from
sensors embedded in a peripheral device.
Peripheral devices such as a wand, stylus, pointer, or glove can serve as controllers for AR.
Computer
The computer analyses the sensed visual and other data to synthesize and position
augmentations.
Computers are responsible for the graphics that go with AR.
Augmented Reality uses a computer-generated image which has a striking effect on the way
the real world is shown.
Projector
The projector can throw a virtual object on a projection screen and the viewer can interact
with this virtual object.
Projection surfaces can be many objects such as walls or glass panes.
Augmented Reality Markup Language (ARML)
ARML is a data standard developed within the Open Geospatial Consortium (OGC), which
consists of an XML grammar to describe the location and appearance of virtual objects in
the scene, as well as ECMAScript bindings to allow dynamic access to the properties of
virtual objects.
Development
AR in consumer products requires considering the design of the applications and the related
constraints of the technology platform.
Because AR systems rely heavily on the immersion of the user and the interaction between
the user and the system, design can facilitate the adoption of virtuality.
Environmental/context design
Context design focuses on the end-user's physical surroundings, spatial space, and
accessibility that may play a role when using the AR system.
Designers should be aware of the possible physical scenarios the end-user may be in.
Interaction design
Interaction design in augmented reality technology centres on the user's engagement with
the end product to improve the overall user experience and enjoyment.
The purpose of interaction design is to avoid alienating or confusing the user by organizing
the information presented.
Visual design
Visual design is the appearance of the developing application that engages the user.
Developers may use visual cues to inform the user which elements of the UI are designed to
be interacted with and how to interact with them.
1.9 3D Vision
Depth perception, or “3D vision,” depends on the ability to use both eyes together at the
highest level.
3D vision relies on both eyes working together to accurately focus on the same point in
space.
The brain is then able to interpret the image each eye sees to create your perception of depth.
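As a rough numerical illustration (not part of the original notes), depth from two viewpoints is often modelled with the simplified pinhole-stereo relation Z = f * B / d, where B is the separation between the viewpoints and d is the disparity between the two images. The Python sketch below uses assumed values.

```python
# Hedged sketch: depth from binocular disparity, Z = f * B / d (illustrative numbers).
focal_length_px = 800.0      # assumed focal length in pixels
baseline_m = 0.063           # ~63 mm, a typical interpupillary distance
disparity_px = 12.0          # horizontal offset of the same point in the two views

depth_m = focal_length_px * baseline_m / disparity_px
print(f"estimated depth: {depth_m:.2f} m")   # ~4.2 m for these values
```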
Deficiencies in depth perception can result in a lack of 3D vision or headaches and eyestrain
during 3D movies.
Discomfort: 3D content requires the viewer to focus their eyes either in front of or behind
the screen, which can result in headaches, eyestrain, or fatigue.
Dizziness: Often times people will complain of dizziness or nausea after viewing 3D
material.
One reason for these complaints is that some of the technology used to create 3D can
worsen Visual Motion Hypersensitivity (VMH).
Depth: People who are unable to use both eyes together to achieve binocularity will not
see the depth of 3D content.
Marker-based AR:
Marker-based AR uses visual markers such as QR codes to trigger the display of digital
content when detected by a camera-equipped device, like a smartphone or AR headset.
When the marker is recognized, digital objects and animations can be overlaid onto the
marker's location in the real world.
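As a hedged sketch of the detection step, the Python example below uses OpenCV's aruco module (the opencv-contrib-python package is assumed) as a stand-in for whatever marker system a given AR platform actually uses; exact class and function names vary between OpenCV versions, and the input file is a placeholder.

```python
# Hedged sketch: detecting a fiducial marker with OpenCV's aruco module.
import cv2

frame = cv2.imread("camera_frame.png")               # placeholder camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters()
detector = cv2.aruco.ArucoDetector(aruco_dict, params)

corners, ids, _ = detector.detectMarkers(gray)
if ids is not None:
    # Each detected marker gives four corner points where digital content
    # (an image, or a 3D model's projection) could be anchored.
    cv2.aruco.drawDetectedMarkers(frame, corners, ids)
```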
Projection-based AR:
Projection-based AR projects digital content onto physical surfaces or objects in the real
world using projectors.
This approach is often used in interactive installations, art exhibitions, and marketing
campaigns to create virtual environments.
SLAM AR (Simultaneous Localization and Mapping)
SLAM allows AR devices to map the surrounding environment, place digital objects accurately
within it, and maintain their position as the user moves.
Mobile-based AR:
AR apps for smartphones and tablets use the device's camera and sensors to provide AR
experiences.
They can recognize images, surfaces, or objects and overlay digital content on the
device's screen.
Platforms like Apple's ARKit and Google's ARCore provide tools and frameworks for
developing AR applications.
Web-based AR:
1.11 Alternative Interface Paradigms
Gesture-Based Interfaces:
Allow users to interact with digital objects in the real world using hand gestures or body
movements.
Devices like Microsoft's HoloLens and Leap Motion have popularized this approach.
Users can control, manipulate, or select virtual objects by performing specific hand or
body movements.
Voice Commands:Voice commands in AR enable users to interact with digital content and
applications using natural language.
Siri, Google Assistant, and AR-specific voice command systems can recognize and
respond to spoken instructions.
It is useful for hands-free and eyes-free interaction in situations where touch or
gesture-based input is impractical.
Eye-Tracking Interfaces:
Eye-tracking technology can be integrated into AR headsets to determine where a user
is looking.
Eye tracking can be used for selecting objects, adjusting focus, or providing additional
information based on where the user is looking.
It can enhance user engagement and streamline interactions in AR applications.
Haptic Feedback:
Haptic feedback provides physical sensations to the user,
enhancing the sense of touch in AR experiences.
It can be delivered through vibrations, force feedback, or other tactile feedback
mechanisms.
In AR, haptic feedback can simulate the feeling of interacting with virtual objects,
providing a more immersive experience.
Augmented Reality Mark-up Language (ARML):
ARML is designed for creating AR experiences that include interactive 3D models, animations,
and information overlays.
It allows developers to define the structure and behaviour of AR content.
It is used to create interactive AR apps that respond to user actions, such as selecting,
moving, or resizing virtual objects.
1.12 Spatial AR
Spatial AR is a type of AR technology that combines virtual and real objects by projecting
virtual images onto real objects.
1.14 3D Position Trackers
3D Tracking:
Like vehicle GPS navigation, coordinate data are used to show where an object is in 3D
space, and where it needs to go next.
With vehicle GPS navigation, the car is moving and the destination is fixed.
A map supplies the travel route.
Only a 2D view of the vehicle's position is available, and only movement along the
longitude and latitude lines (X and Y) is reported.
Altitude information (Z-axis) is missing, as is rotational data – only two degrees of
freedom are reported.
A view that limited cannot support complex OEM surgical navigation applications
(OEM: Original Equipment Manufacturer).
The Polaris optical measurement solution, and the Aurora and 3D Guidance electromagnetic
(EM) tracking solutions, capture both the position (X-Y-Z coordinate data) and the
orientation of tracked objects.
Position and orientation measurements are also described in terms of the "degrees of
freedom" (DOF) in which an object moves in 3D space.
There are six degrees of freedom in total; NDI’s solutions capture all six degrees of
freedom in real-time.
This type of technology, known as 3D measurement, spatial measurement,
or 3D motion tracking, can be used for real-time tool tracking and navigation purposes.
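As a minimal illustration (not a vendor API), a six-degrees-of-freedom sample from such a tracker can be represented as a position plus an orientation. The Python sketch below uses illustrative field names and values.

```python
# Hedged sketch: a minimal 6-DOF pose record as a tracker might report each frame.
from dataclasses import dataclass

@dataclass
class Pose6DOF:
    x: float      # position in metres
    y: float
    z: float
    roll: float   # orientation in degrees
    pitch: float
    yaw: float

sample = Pose6DOF(x=0.12, y=-0.03, z=0.85, roll=1.5, pitch=-4.0, yaw=92.0)
```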
1.15 Types of 3D Trackers
1. Single-Point Tracking
A single-point tracker refers to tracking an object using a single point of reference within a
composition.
The software is given a single point in a clip to focus on and tracks the movement of the
camera around that single point (a minimal sketch follows this list).
2. Two-Point Tracking
Two-point tracking allows you to apply two separate tracking points to an image and use
each one to track a different type of motion.
3. Four-Point Tracking
Four-point tracking allows you to track each corner of a four-point surface throughout a
shot (such as a smartphone screen).
4. Planar tracking
Planar tracking is a method that looks at multiple track points at the same time to
calculate and assume a plane, rather than looking at each track point individually.
5. Spline tracking
This complex motion tracking technique is one of the most accurate of all techniques, but
comes with a significant learning curve.
Spline tracking allows you to trace around an object that you want to track instead of
focusing on a single point or set of points.
6. 3D Camera Tracking
The 3D tracking tool automatically generates dozens of possible motion tracking points,
then allows the user to select which points they would like to track.
This takes a lot of the manual labor out of setting tracking points, but does require quite a
bit of time and processing power to use.
3D camera tracking is a feature of After Effects that has gained mounting popularity among
hobbyists and professional visual effects artists alike.
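Referring back to single-point tracking above, the hedged Python sketch below follows one user-chosen point between frames with Lucas-Kanade optical flow in OpenCV. It is one possible approach, not the implementation used by any particular editing package, and the clip name and starting point are assumptions.

```python
# Hedged sketch: single-point tracking with Lucas-Kanade optical flow in OpenCV.
import cv2
import numpy as np

cap = cv2.VideoCapture("clip.mp4")                        # placeholder clip
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

point = np.array([[[320.0, 240.0]]], dtype=np.float32)    # user-chosen point

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Estimate where the point moved between the previous and current frame.
    new_point, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, point, None)
    if status[0][0] == 1:
        x, y = new_point.ravel()
        cv2.circle(frame, (int(x), int(y)), 5, (0, 255, 0), -1)  # mark tracked point
        point = new_point
    prev_gray = gray
```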
1.16 Navigation and Manipulation Interfaces
Gesture recognition is the process by which gestures formed by a user are made known
to the system. In completely immersive VR environments, the keyboard is generally not
included, and some other means of control over the environment is needed.
Fig 2: Hand points
A rough overview of the main parts that make up the framework is shown in Fig 1. The
framework supports an easy swap of the Hand Tracker component, allowing many hand
tracking solutions to be used with the framework. The Gesture Recorder allows recording and
saving gestures, which are then stored in a set of known gestures.
These gestures are then recognized by the Gesture Recognizer by comparing stored data
with live data from the hand tracker. A Gesture Interpreter is used to communicate with the
desired application using events that inform when gestures are performed. Fig 1 shows how
all of these parts fit together.
The hand detection in the proposed framework is realized using a hand tracker built into
an HMD. The tracking part is crucial and serves as an entry point to the gesture recognition
framework as it is responsible for accurate pose matching.
The hand tracker primarily used in the development of the framework provides 23 points
for each hand (see Fig 2); other configurations are supported as well. The more points a tracking
device provides, the more fine-grained the gesture recognition is. Two primary features are
required for recognizing one-handed gestures.
Hand poses and shapes can be stored for later recognition of static hand gestures, e.g.,
while a user is performing the hand movement, it can be matched against a predefined set of
hand shapes. In order to increase the recognition performance for users with hands below or
above the average human hand size, the hand positions are adjusted by a scaling factor. This
normalization is necessary to increase the recognition accuracy for gestures that were recorded
by a different user than the user performing the gestures. To achieve this, each joint position is
divided by the hand scaling factor.
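A minimal sketch of the scaling and matching steps just described, assuming a (23, 3) array of joint positions and an illustrative error threshold; this is an interpretation of the description above, not the framework's actual code.

```python
# Hedged sketch: normalize joint positions by a hand scale factor, then compare
# the live hand shape against a stored gesture.
import numpy as np

def normalize_hand(joints: np.ndarray, hand_scale: float) -> np.ndarray:
    """joints: (23, 3) array of joint positions relative to the wrist."""
    return joints / hand_scale

def matches_stored_gesture(live: np.ndarray, stored: np.ndarray,
                           threshold: float = 0.02) -> bool:
    """Compare normalized live joints with a recorded hand shape."""
    mean_error = np.mean(np.linalg.norm(live - stored, axis=1))
    return mean_error < threshold
```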
The raw hand shape does not take hand rotation or orientation into account and therefore
has rotation invariance. Performing a hand shape that resembles a “thumbs up” will therefore
be indistinguishable from a “thumbs down”, but it should likely have a different meaning.
Furthermore, for some gestures, it is necessary to know whether the hand is facing the user.
Gesture input and output devices are technology tools that enable users to interact with
computers, smartphones, and other digital devices through hand and body movements. These
devices can be categorized into various types based on their functions and capabilities. Here
are some common types of gesture input and output devices:
Touchscreens: Touchscreens are one of the most common gesture input devices, allowing
users to interact with a device by tapping, swiping, pinching, and zooming using their fingers
or a stylus.
Motion Controllers: Motion controllers, such as the ones used with gaming consoles like the
PlayStation Move or the Xbox Kinect, capture the user's hand and body movements to control
on-screen actions in games and applications.
Gesture Recognition Cameras: Devices like the Kinect for Xbox and various webcams
equipped with gesture recognition software can track hand and body movements to control
applications and games.
Gesture Gloves: Specialized gloves with embedded sensors can detect hand and finger
movements, making them suitable for virtual reality and augmented reality applications.
Inertial Measurement Units (IMUs): IMUs are sensors that can be attached to various body
parts to track their movements, commonly used in motion capture and gesture recognition
systems.
Depth-Sensing Cameras: Cameras like the Intel RealSense and the Orbbec Astra use infrared
technology to create 3D depth maps, allowing for precise gesture recognition and tracking.
Haptic Feedback Devices: These devices provide tactile feedback to the user, such as
vibration or force feedback, in response to specific gestures. Examples include haptic feedback
in smartphones and game controllers.
Augmented Reality (AR) and Virtual Reality (VR) Headsets: AR and VR headsets, like the
Oculus Rift and HoloLens, offer immersive experiences where gestures can control virtual
objects and environments.
Projectors: Projectors can display interactive content on various surfaces, enabling users to
interact with projected images and interfaces through gestures and touch.
Interactive Whiteboards: These large touchscreen displays are used in educational and
business settings to allow users to control and interact with digital content using gestures and
digital pens.
Smart TVs: Some modern smart TVs come with gesture control features, allowing users to
change channels, adjust volume, and navigate menus with hand movements.
Interactive Tables: Tables equipped with touch-sensitive or gesture-sensing surfaces are used
in various applications, such as restaurants, retail, and collaborative workspaces.
Hand Supported Displays
The user can hold the device in one or both hands in order to periodically view a synthetic
scene.
This allows the user to go in and out of the simulation environment as demanded by the
application.
The device has push buttons that can be used to interact with the virtual scene.
The latest application of augmented reality is flooring. It works by using a smartphone camera
to project various virtual objects onto the floor. The app can simulate various objects, such as
water waves. Researchers at McGill University came up with the idea of putting floor tiles on
the floor, which can mimic different objects. The app also allows users to share these
measurements with contractors. It also allows for private-label versions. This will make the
process of selecting the right floor plan easier.
Large Volume Displays
Large-volume displays are used in VR environments that allow more than one user located
in close proximity to share the experience.
Large-volume displays in augmented reality (AR) refer to systems or setups that enable the
projection of AR content into a physical space on a larger scale, often beyond the confines of
typical handheld devices or headsets. These displays can be used for various purposes, such as
virtual design and prototyping, immersive gaming experiences, architectural visualization, and
more. Various methods and technologies are used for creating large-volume AR displays.
The Visual System
The human visual system can be regarded as consisting of two parts.
The eyes act as image receptors which capture light and convert it into signals which
are then transmitted to image processing centres in the brain.
These centres process the signals received from the eyes and build an internal “picture”
of the scene being viewed.
Processing by the brain consists partly of simple image processing and partly of higher
functions which build and manipulate an internal model of the outside world.
Although the division of function between the eyes and the brain is not clear-cut, it is
useful to consider each of the components separately.
The basic structure of the eye is displayed in the figure, a cross-section of a right eye. The
cornea and aqueous humour act as a primary lens which performs crude focusing of the
incoming light signal.
A muscle called the zonula controls both the shape and positioning (forward and
backwards) of the eye’s lens.
This provides a fine control over how the light entering the eye is focused.
The iris is a muscle which, when contracted, covers all but a small central portion of the
lens.
This allows dynamic control of the amount of light entering the eye, so that the eye can
work well in a wide range of viewing conditions, from dim to very bright light.
The portion of the lens not covered by the iris is called the pupil.
The retina provides a photo-sensitive screen at the back of the eye onto which incoming
light is focused.
Light hitting the retina is converted into nerve signals.
A small central region of the retina, called the fovea, is particularly sensitive because it
is tightly packed with photo-sensitive cells.
It provides very good resolution and is used for close inspection of objects in the visual
field.
The optic nerve transmits the signals generated by the retina to the vision processing
centres of the brain.
The retina is composed of a thin layer of cells lining the interior back and sides of the
eye.
Many of the cells making up the retina are specialised nerve cells which are quite similar
to the tissue of the brain.
Other cells are light-sensitive and convert incoming light into nerve signals which are
transmitted by the other retinal cells to the optic nerve and from there to the brain.
There are two general classes of light-sensitive cells in the retina: rods and cones.
Rod cells are very sensitive and provide visual capability at very low light levels.
Cone cells perform best at normal light levels.
They provide our daytime visual faculties, including the ability to see in colour (which
we discuss in the next chapter).
There are roughly 120 million rod cells and 6 million cone cells in the retina. There are
many more rods than cones because rods are used at low light levels, and so more of
them are required to gather the light.
To ensure full immersion in VR systems, the spatial sounds need to match the spatial
characteristics of the visuals – so if you see a car moving away from you in the VR
environment, you will also expect to hear the car moving away from you.
Stereo sound
If we have two loudspeakers (stereo), we can move the perceived position of a sound
source anywhere along the horizontal plane between the two loudspeakers. We can ‘pan’ the
sound to the left side by increasing the amplitude level of the left loudspeaker and lowering the
amplitude of the right loudspeaker.
If the sound is played at the same amplitude level through both loudspeakers it will be
heard as if coming from directly in between the two.
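A hedged Python sketch of this amplitude-panning idea, using a constant-power pan law (one common choice, not necessarily the one any given VR system uses); the signal length and frequency are illustrative.

```python
# Hedged sketch: constant-power panning of a mono source between two loudspeakers.
# pan = -1.0 is hard left, +1.0 is hard right, 0.0 is centre.
import numpy as np

def pan_stereo(mono: np.ndarray, pan: float) -> np.ndarray:
    """Return an (N, 2) stereo signal from a mono signal."""
    angle = (pan + 1.0) * np.pi / 4.0          # map [-1, 1] to [0, pi/2]
    left_gain = np.cos(angle)
    right_gain = np.sin(angle)
    return np.column_stack((mono * left_gain, mono * right_gain))

# Example: a 440 Hz tone panned halfway to the right.
sr = 44100
t = np.linspace(0, 1.0, sr, endpoint=False)
stereo = pan_stereo(np.sin(2 * np.pi * 440 * t), pan=0.5)
```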
So – how do we go about delivering a 360 degree immersive sound scene over a pair of
headphones?
Spatial sound
In How Your Ear Changes Sound we discovered how sounds are filtered by the outer
ear, mainly the pinna. Think about a sound placed to your right, slightly above your head. The
acoustic wave that reaches your ears from this sound has travelled directly to your right ear,
but has had to travel around your head to reach your left ear. The shape of your head actually
filters the sound, meaning that some frequencies are dampened and the overall tone is altered
– these are spectral cues.
There are two more types of binaural cue your brain can use:
The interaural level difference (ILD) – in our example the sound has travelled further to
get to your left ear, so it’s quieter because it’s lost more energy on the way
The interaural time difference (ITD) – in our example the sound reaches your right ear a
fraction of a millisecond before it reaches your left ear.
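For a rough sense of scale (not part of the original material), the classic Woodworth approximation estimates the ITD as (r / c) * (theta + sin theta) for a source at azimuth theta; the Python sketch below assumes a typical head radius and is only an approximation.

```python
# Hedged sketch: Woodworth approximation of the interaural time difference.
import math

def itd_seconds(azimuth_deg: float, head_radius_m: float = 0.0875,
                speed_of_sound: float = 343.0) -> float:
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source 90 degrees to one side gives roughly 0.65 ms of delay.
print(itd_seconds(90.0) * 1000, "ms")
```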
The auditory system changes a wide range of weak mechanical signals into a complex
series of electrical signals in the central nervous system. Sound is a series of pressure changes
in the air. Sounds often vary in frequency and intensity over time. Humans can detect sounds
that cause movements only slightly greater than those of Brownian movement.
The inner ear is filled with fluid. Since fluid is incompressible, as the stapes moves in
and out there needs to be a compensatory movement in the opposite direction. Notice that the
round window membrane, located beneath the oval window, moves in the opposite direction.
Because the tympanic membrane has a larger area than the stapes footplate there is a
hydraulic amplification of the sound pressure. Also because the arm of the malleus to which
the tympanic membrane is attached is longer than the arm of the incus to which the stapes is
attached, there is a slight amplification of the sound pressure by a lever action. These two
impedance matching mechanisms effectively transmit airborne sound into the fluid of the inner
ear. If the middle-ear apparatus (ear drum and ossicles) were absent, then sound reaching the
oval and round windows would be largely reflected.
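A back-of-the-envelope calculation of these two mechanisms, using typical textbook values (roughly 55 mm² of effective eardrum area, a 3.2 mm² stapes footplate, and a 1.3:1 lever ratio); the exact figures vary between sources.

```python
# Hedged arithmetic: approximate middle-ear pressure gain from the area ratio
# and the ossicular lever action, using typical textbook values.
import math

tympanic_area_mm2 = 55.0     # approximate effective eardrum area
footplate_area_mm2 = 3.2     # approximate stapes footplate area
lever_ratio = 1.3            # malleus arm / incus arm, approximate

pressure_gain = (tympanic_area_mm2 / footplate_area_mm2) * lever_ratio
gain_db = 20 * math.log10(pressure_gain)
print(f"~{pressure_gain:.0f}x pressure gain, about {gain_db:.0f} dB")
# Roughly 22x, on the order of 27 dB.
```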