Computer Vision
History
In the late 1960s, computer vision began at universities which were pioneering artificial intelligence. It
was meant to mimic the human visual system, as a stepping stone to endowing robots with intelligent
behavior. In 1966, it was believed that this could be achieved through a summer project, by attaching a
camera to a computer and having it "describe what it saw".
What distinguished computer vision from the prevalent field of digital image processing at that time was
a desire to extract three-dimensional structure from images with the goal of achieving full scene
understanding. Studies in the 1970s formed the early foundations for many of the computer
vision algorithms that exist today, including extraction of edges from images, labeling of lines, non-
polyhedral and polyhedral modeling, representation of objects as interconnections of smaller
structures, optical flow, and motion estimation.
The next decade saw studies based on more rigorous mathematical analysis and quantitative aspects of
computer vision. These include the concept of scale-space, the inference of shape from various cues
such as shading, texture and focus, and contour models known as snakes. Researchers also realized that
many of these mathematical concepts could be treated within the same optimization framework
as regularization and Markov random fields. By the 1990s, some of the previous research topics became
more active than others. Research in projective 3-D reconstruction led to a better understanding
of camera calibration. With the advent of optimization methods for camera calibration, it was realized
that a lot of the ideas were already explored in bundle adjustment theory from the field
of photogrammetry. This led to methods for sparse 3-D reconstructions of scenes from multiple images.
Progress was made on the dense stereo correspondence problem and further multi-view stereo
techniques. At the same time, variations of graph cut were used to solve image segmentation. This
decade also marked the first time statistical learning techniques were used in practice to recognize faces
in images (see Eigenface). Toward the end of the 1990s, a significant change came about with the
increased interaction between the fields of computer graphics and computer vision.
This included image-based rendering, image morphing, view interpolation, panoramic image
stitching and early light-field rendering. Recent work has seen the resurgence of feature-based
methods, used in conjunction with machine learning techniques and complex optimization
frameworks. The advancement of deep learning techniques has brought further life to the field of
computer vision. The accuracy of deep learning algorithms on several benchmark computer vision data
sets, for tasks including classification, segmentation, and optical flow, has surpassed prior methods.
Hardware
There are many kinds of computer vision systems; however, all of them contain these basic elements: a
power source, at least one image acquisition device (camera, CCD sensor, etc.), a processor, and control and
communication cables or some kind of wireless interconnection mechanism. In addition, a practical
vision system contains software, as well as a display in order to monitor the system. Vision systems for
indoor spaces, such as most industrial ones, contain an illumination system and may be placed in a
controlled environment. Furthermore, a complete system includes many accessories such as camera
supports, cables, and connectors.
Most computer vision systems use visible-light cameras passively viewing a scene at frame rates of at
most 60 frames per second (usually far slower).
A few computer vision systems use image-acquisition hardware with active illumination or something
other than visible light or both, such as structured-light 3D scanners, thermographic cameras,
hyperspectral imagers, radar imaging, lidar scanners, magnetic resonance imaging, side-scan sonar,
synthetic aperture sonar, etc. Such hardware captures "images" that are then often processed using the
same computer vision algorithms used to process visible-light images.
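As an illustration of that processing, here is a minimal sketch of grabbing frames from an ordinary visible-light camera and running a classic edge detector on each one. It assumes OpenCV (the cv2 Python package) is installed; the camera index and window handling are illustrative, and the same Canny call would apply equally to an 8-bit "image" from any of the sensing modalities listed above.

```python
# A minimal sketch: passive frame capture plus a classic edge detector.
# Assumes OpenCV (cv2) is installed; camera index 0 is an assumption.
import cv2

cap = cv2.VideoCapture(0)              # open the default visible-light camera
cap.set(cv2.CAP_PROP_FPS, 30)          # request a typical consumer frame rate

while cap.isOpened():
    ok, frame = cap.read()             # grab the next frame (a BGR image)
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # the same call works on any 8-bit "image"
    cv2.imshow("edges", edges)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to stop
        break

cap.release()
cv2.destroyAllWindows()
```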
While traditional broadcast and consumer video systems operate at a rate of 30 frames per second,
advances in digital signal processing and consumer graphics hardware have made high-speed image
acquisition, processing, and display possible for real-time systems on the order of hundreds to
thousands of frames per second. For applications in robotics, fast, real-time video systems are critically
important and often can simplify the processing needed for certain algorithms. When combined with a
high-speed projector, fast image acquisition allows 3D measurement and feature tracking to be realized.
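To make the feature-tracking step concrete, the following is a minimal sketch of tracking corners between two consecutive frames with pyramidal Lucas-Kanade optical flow. It assumes OpenCV (cv2); the frame file names are illustrative, and in a real high-speed rig the frames would come straight from the acquisition loop shown earlier.

```python
# A minimal sketch of sparse feature tracking between two frames (Lucas-Kanade).
# Assumes OpenCV (cv2); the input file names are hypothetical.
import cv2

prev = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)   # hypothetical frame files
curr = cv2.imread("frame_0002.png", cv2.IMREAD_GRAYSCALE)

# Pick strong corners in the first frame, then track them into the second.
points = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=7)
new_points, status, _err = cv2.calcOpticalFlowPyrLK(prev, curr, points, None)

tracked = new_points[status.flatten() == 1]                  # keep successfully tracked features
print(f"tracked {len(tracked)} of {len(points)} features")
```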
Egocentric vision systems are composed of a wearable camera that automatically takes pictures from a
first-person perspective.
As of 2016, vision processing units are emerging as a new class of processor to complement CPUs
and graphics processing units (GPUs) for vision-processing workloads.
6. Retail analytics
Density is a startup that anonymously tracks the movement of people around workspaces, using a small
piece of hardware that can track movement through doorways. There are many uses of this data, notably
in safety, but they include tracking how busy a store is or how long queues and wait times are. Automated
footfall counters have, of course, been available for a while, but advances in computer vision mean people
tracking is now sophisticated enough to be used in the optimization of merchandising (a sketch of the
underlying detection step follows the list below). RetailNext is one company which provides such retail
analytics, allowing store owners to ask:
Where do shoppers go in my store (and where do they not go)?
Where do shoppers stop and engage with fixtures or sales associates?
How long do they stay engaged?
Which are my most effective fixtures, and which ones are underperforming?
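As a rough illustration of the detection step such systems build on, here is a minimal sketch of counting people in a single frame with OpenCV's stock HOG pedestrian detector. Production systems use far more sophisticated models; the file names and parameters here are illustrative assumptions.

```python
# A minimal sketch of person detection as a crude footfall count.
# Assumes OpenCV (cv2); "storefront.jpg" is a hypothetical camera frame.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("storefront.jpg")
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))

print(f"people detected: {len(boxes)}")      # a crude footfall count for this frame
for (x, y, w, h) in boxes:                   # draw boxes for visual inspection
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("annotated.jpg", frame)
```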
7. Emotional analytics
In January 2016 Mediacom announced that it would be using facial detection and analytics technology
developed by Realeyes as part of content testing and media planning. The technology recruits remote
panels of users and uses their existing webcams to capture their reactions to ads and content.
Realeyes CEO Mikhel Jaatma told Martech Today that emotional analytics is "faster and cheaper" than
traditional online surveys or focus groups, and gathers direct responses rather than drawing on
subjective or inferred opinions. Other companies in the emotional analytics space include Unruly, in
partnership with Nielsen.
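As an illustration of the first step in such a pipeline, here is a minimal sketch of detecting faces in a single webcam frame with OpenCV's stock Haar cascade. Realeyes' actual models are proprietary; the camera index and detector parameters below are assumptions.

```python
# A minimal sketch of webcam-based face detection.
# Assumes OpenCV (cv2); camera index 0 is an assumption.
import cv2

# Load the stock frontal-face Haar cascade that ships with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)                    # the viewer's existing webcam
ok, frame = cap.read()
cap.release()

if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print(f"faces found: {len(faces)}")      # downstream models would then score expressions
```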
8. Image search
As computer vision improves, it can be used to perform automated general tagging of images. This may
eventually mean that manual and inconsistent tagging is no longer needed, making image organization on a
large scale quicker and more accurate. This has profound implications when querying large sets of
images: as Gaurav Oberoi suggests in a blog post, a user could ask the question “what kinds of things are
shown on movie posters, and do they differ by genre?”, for example. Eventually, when applied to video,
the data available will be mind-boggling, and how we access and archive imagery may fundamentally
change. Though this is still a long way off, many will already be familiar with the power of image search
in Google Photos, which is trained to recognize thousands of objects, and with doing a reverse image
search within Google’s search engine or in a stock photo archive.
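As a rough sketch of what automated tagging can look like, the following uses a pretrained torchvision classifier to produce the top few labels for an image. The model choice and file name are illustrative assumptions, and real tagging services use much larger, multi-label models.

```python
# A minimal sketch of automated image tagging with a pretrained classifier.
# Assumes PyTorch and torchvision are installed; "poster.jpg" is a hypothetical file.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()            # the preprocessing this model expects

image = Image.open("poster.jpg").convert("RGB")
with torch.no_grad():
    probs = model(preprocess(image).unsqueeze(0)).softmax(dim=1)[0]

# Report the five most likely ImageNet categories as candidate tags.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{weights.meta['categories'][idx.item()]}: {p.item():.2f}")
```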
9. Augmented reality
From Snapchat Lenses to as-yet commercially unproven technology involving headsets such as
HoloLens, augmented reality is increasingly mentioned as a possible next step for mobile technology.
Indeed, Tim Cook seems particularly excited about it.