Computer Vision

Computer vision is an interdisciplinary scientific field that deals with how computers can gain
high-level understanding from digital images or videos. From the perspective of engineering, it seeks to
understand and automate tasks that the human visual system can do.
Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital
images, and for extracting high-dimensional data from the real world in order to produce numerical or
symbolic information, e.g., in the form of decisions. Understanding in this context means the
transformation of visual images (the input of the retina) into descriptions of the world that make sense
to thought processes and can elicit appropriate action. This image understanding can be seen as the
disentangling of symbolic information from image data using models constructed with the aid of
geometry, physics, statistics, and learning theory.
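
The extraction of symbolic information from image data can be made concrete with a minimal sketch. The snippet below, which assumes the OpenCV library is installed and uses a placeholder file name ("scene.jpg") and illustrative Canny thresholds, acquires an image, processes it to grayscale, and reduces it to symbolic edge information:

```python
# Minimal sketch: acquire -> process -> analyze, assuming OpenCV (cv2)
# is installed and "scene.jpg" is any local photograph.
import cv2

image = cv2.imread("scene.jpg")                  # acquire a digital image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # process: drop color
edges = cv2.Canny(gray, 100, 200)                # analyze: extract edge pixels
print(f"{(edges > 0).sum()} edge pixels extracted")
```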
The scientific discipline of computer vision is concerned with the theory behind artificial systems that
extract information from images. The image data can take many forms, such as video sequences, views
from multiple cameras, multi-dimensional data from a 3D scanner, or a medical scanning device. The
technological discipline of computer vision seeks to apply its theories and models to the construction of
computer vision systems.
Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object
recognition, 3D pose estimation, learning, indexing, motion estimation, visual servoing, 3D scene
modeling, and image restoration.

History
In the late 1960s, computer vision began at universities which were pioneering artificial intelligence. It
was meant to mimic the human visual system, as a stepping stone to endowing robots with intelligent
behavior. In 1966, it was believed that this could be achieved through a summer project, by attaching a
camera to a computer and having it "describe what it saw".
What distinguished computer vision from the prevalent field of digital image processing at that time was
a desire to extract three-dimensional structure from images with the goal of achieving full scene
understanding. Studies in the 1970s formed the early foundations for many of the computer
vision algorithms that exist today, including extraction of edges from images, labeling of lines, non-
polyhedral and polyhedral modeling, representation of objects as interconnections of smaller
structures, optical flow, and motion estimation.
The next decade saw studies based on more rigorous mathematical analysis and quantitative aspects of
computer vision. These include the concept of scale-space, the inference of shape from various cues
such as shading, texture and focus, and contour models known as snakes. Researchers also realized that
many of these mathematical concepts could be treated within the same optimization framework
as regularization and Markov random fields. By the 1990s, some of the previous research topics became
more active than others. Research in projective 3-D reconstructions led to better understanding
of camera calibration. With the advent of optimization methods for camera calibration, it was realized
that a lot of the ideas were already explored in bundle adjustment theory from the field
of photogrammetry. This led to methods for sparse 3-D reconstructions of scenes from multiple images.
Progress was made on the dense stereo correspondence problem and further multi-view stereo
techniques. At the same time, variations of graph cut were used to solve image segmentation. This
decade also marked the first time statistical learning techniques were used in practice to recognize faces
in images (see Eigenface). Toward the end of the 1990s, a significant change came about with the
increased interaction between the fields of computer graphics and computer vision.
This included image-based rendering, image morphing, view interpolation, panoramic image
stitching and early light-field rendering. Recent work has seen the resurgence of feature-based
methods, used in conjunction with machine learning techniques and complex optimization
frameworks. The advancement of deep learning techniques has brought further life to the field of
computer vision. The accuracy of deep learning algorithms on several benchmark computer vision data
sets, for tasks including classification, segmentation, and optical flow, has surpassed that of prior methods.

Hardware
There are many kinds of computer vision systems; however, all of them contain these basic elements: a
power source, at least one image acquisition device (camera, CCD sensor, etc.), a processor, and control and
communication cables or some kind of wireless interconnection mechanism. In addition, a practical
vision system contains software, as well as a display for monitoring the system. Vision systems for
indoor spaces, such as most industrial ones, contain an illumination system and may be placed in a controlled
environment. Furthermore, a complete system includes many accessories such as camera supports,
cables, and connectors.
Most computer vision systems use visible-light cameras passively viewing a scene at frame rates of at
most 60 frames per second (usually far slower).
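
As a hedged illustration of these frame rates, the following sketch grabs frames from a camera with OpenCV and measures the rate actually achieved; device index 0 and the 120-frame sample size are assumptions.

```python
# Measure the achieved frame rate of a visible-light camera with OpenCV.
# Device index 0 is an assumption; adjust for your hardware.
import time
import cv2

cap = cv2.VideoCapture(0)
start, frames = time.time(), 0
while frames < 120:            # sample up to 120 frames
    ok, _frame = cap.read()
    if not ok:
        break
    frames += 1
cap.release()
elapsed = time.time() - start
print(f"captured {frames} frames at ~{frames / elapsed:.1f} fps")
```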
A few computer vision systems use image-acquisition hardware with active illumination, something
other than visible light, or both, such as structured-light 3D scanners, thermographic cameras, hyperspectral
imagers, radar imaging, lidar scanners, magnetic resonance imaging, side-scan sonar, synthetic
aperture sonar, etc. Such hardware captures "images" that are then processed, often using the same
computer vision algorithms used to process visible-light images.
While traditional broadcast and consumer video systems operate at a rate of 30 frames per second,
advances in digital signal processing and consumer graphics hardware have made high-speed image
acquisition, processing, and display possible for real-time systems on the order of hundreds to
thousands of frames per second. For applications in robotics, fast, real-time video systems are critically
important and often can simplify the processing needed for certain algorithms. When combined with a
high-speed projector, fast image acquisition allows 3D measurement and feature tracking to be realized.
Egocentric vision systems are composed of a wearable camera that automatically takes pictures from a
first-person perspective.
As of 2016, vision processing units are emerging as a new class of processor, to complement CPUs
and graphics processing units (GPUs) in this role.

Ways Computer Vision Helps Marketers See Better Performance


The capability to consume and translate images into a dataset has become vital as online
communication evolves. Visual content is poised to overtake text in its importance in social media,
publishing platforms, and online marketing campaigns. “The camera is replacing the keyboard,” said
Richard Lee, the CEO of a visual technology company. “Images are clearly the overwhelming data piece
of social. And now, it is possible for brands to extract data and insights from the plethora of images
available online at scale.” To illustrate just how useful and versatile computer vision is becoming to
marketers, here are seven of the most exciting applications of computer vision we expect to see in the
near future.
1. Helping Customers Discover Products Based on Visual Traits
When browsing a website but looking for a specific item, customers generally rely on the search or filter
function to help find the category of products they’re looking for. Similarly, sites often provide
recommended or related items based on a selected product. On the back end, this typically requires an
extensive set of “tags” that are manually assigned to products and reflect the retailer’s own subjective
judgment. For instance, one brand’s “culottes” can be another’s nearly identical “gaucho pants,” and
details can get even more confusing when proprietary terms are used for specific clothing lines or styles.
Visual search can instead help customers browse, compare, and narrow their choices through image-generated
similarities rather than manually attributed classifications. A customer can use a shoe they like, for example, to
find more options in a similar style without the need for specific terminology or the potential bias
of a tagged category. Features like these minimize the need for customers to know brand jargon and
simplify their product hunt.
Data generated from browsing using visual cue modifiers can be just as useful for retailers, who can use
retail content management systems (CMS) to trace patterns not visible through language-based tagging
alone. For example, two seemingly unrelated products might be purchased together because they have
complementary style elements, enabling the system to suggest similar pairings moving forward.
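
One common way to implement this kind of visual similarity, sketched below under stated assumptions, is to embed each product image with a pretrained convolutional network and rank the catalogue by cosine similarity to a query image. The file names and the choice of ResNet-18 are illustrative, not the method of any particular retailer.

```python
# Hypothetical visual product discovery: embed images with a pretrained
# ResNet-18 (torchvision) and rank catalogue items by cosine similarity.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.fc = torch.nn.Identity()    # expose the 512-dim penultimate-layer embedding
model.eval()
preprocess = weights.transforms()

def embed(path: str) -> torch.Tensor:
    """Return an embedding vector for the image at `path`."""
    with torch.no_grad():
        return model(preprocess(Image.open(path).convert("RGB")).unsqueeze(0))[0]

# Placeholder file names standing in for a product catalogue.
catalogue = {p: embed(p) for p in ["shoe_a.jpg", "shoe_b.jpg", "shoe_c.jpg"]}
query = embed("customer_photo.jpg")
ranked = sorted(catalogue,
                key=lambda p: -torch.cosine_similarity(query, catalogue[p], dim=0).item())
print("most visually similar items:", ranked)
```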

2. Using the Camera over the Keyboard to Search for Products


Computer vision technology enables the use of photo inputs to begin a search query. For instance, if
someone sees a great hat or backpack, they can snap a photo of the item and submit it for product
information.
Think of it like a reverse Google image search. Rather than entering text information to find images, you
can submit a photo for analysis to return information on it. And because computer vision can identify
not only the contents of an image but also its context, it knows that the item is not just a pair of
sunglasses, but Ray-Ban Clubmaster Classics. This can shorten the customer journey by directing the
searcher straight to an item’s product page, minimizing the steps to purchase or avoiding a missed
opportunity altogether. For this reason, platforms like Pinterest and eBay have been exploring ways to use photos
instead of text within search. Shoppers can also use photos of items they already own to get
complementary suggestions, such as submitting a photo of their car to find floor mats that fit.
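
A lightweight way to prototype this idea, shown below, is perceptual hashing, which maps visually similar photos to nearby hash codes; production systems at Pinterest or eBay rely on learned embeddings, so treat this purely as a toy sketch. The `imagehash` package and all file names are assumptions.

```python
# Toy reverse image search via perceptual hashing (pip install imagehash).
# File names are placeholders for a product index and a customer photo.
from PIL import Image
import imagehash

index = {name: imagehash.phash(Image.open(name))
         for name in ["hat.jpg", "backpack.jpg", "sunglasses.jpg"]}

snap = imagehash.phash(Image.open("customer_snap.jpg"))
# Subtracting two hashes gives the Hamming distance; smaller is more similar.
best = min(index, key=lambda name: snap - index[name])
print("closest catalogue item:", best)
```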

3. Scraping Data from Social and Video Channels for Discovery


Computer vision can scrape images and videos for metadata to be used for image-based discovery. For
instance, Instagram could use computer vision to recognize people and products in photos that are then
searchable within their Explore function. Instead of searching for specific hashtags (which would have
to be manually added by the uploader), users could use general terms like “beach bag” or specific
product names to return user images or videos with those contents identified in them. This would aid
potential customers in product research and help brands identify influencers. The technology can also
track fashion trends by drawing patterns from popular photos and videos. This information can supply
insights and feedback to optimize creative and improve targeting through image-based performance
analytics.
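
As a rough sketch of how such metadata could be generated, the snippet below tags images with the top predictions of an off-the-shelf ImageNet classifier. Real platforms train custom taggers on their own taxonomies, and the file names here are hypothetical.

```python
# Auto-tagging sketch: store the top-3 ImageNet labels for each scraped
# image as searchable metadata. Uses a pretrained torchvision ResNet-50.
import torch
from torchvision.models import resnet50, ResNet50_Weights
from PIL import Image

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()

metadata = {}
for path in ["post_001.jpg", "post_002.jpg"]:   # placeholder uploads
    batch = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = model(batch).softmax(dim=1)[0]
    top = probs.topk(3)
    metadata[path] = [weights.meta["categories"][i] for i in top.indices.tolist()]

print(metadata)   # e.g. {"post_001.jpg": ["mailbag", "backpack", ...]}
```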

4. Serving Relevant and Personalized Creative


With computer vision’s ability to ascribe detailed attributes and text descriptors to images, this
metadata can then be used in algorithms to guide machine learning selection of creatives within ad or
marketing campaigns. Someone who regularly browses fitness sites, for instance, can be served creative
imagery that has been tagged with descriptors that correspond to an active person’s lifestyle. The
process can also work in reverse, such as when a brand of soy-based milk substitute places ads on sites
showing images of people engaged in fitness activities or making healthy meals. Similarly, contextual ads
for a specific cosmetic brand can be served as an in-video overlay on a makeup tutorial featuring their
products.
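
The selection step itself can be very simple once images carry descriptors; the toy sketch below picks the creative whose tags best overlap a user's inferred interests. The tags and interest profile are invented for illustration.

```python
# Toy creative selection: serve the ad image whose computer-vision tags
# overlap most with the user's inferred interests. All data is invented.
creatives = {
    "running_banner.jpg": {"fitness", "outdoors", "shoes"},
    "kitchen_banner.jpg": {"cooking", "healthy", "home"},
    "beach_banner.jpg":   {"travel", "summer", "outdoors"},
}
user_interests = {"fitness", "healthy", "outdoors"}

best = max(creatives, key=lambda name: len(creatives[name] & user_interests))
print("serve:", best)   # running_banner.jpg (two overlapping tags)
```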

5. People Tracking for Optimization and Non-Digital Ad Attribution


Modern computer vision technology has progressed to tracking human behaviors in real time through a
live video feed, much as autonomous cars sense pedestrians. Cameras in a retail location, for
instance, can draw conclusions about store layout and shelf arrangement based on customer traffic,
how customers move through the space, and, using facial recognition, where their gaze falls. For example, the
system can accurately track the first area that a majority of customers go to when they enter a store and
which products are drawing the most attention. This technology can also serve as a form of attribution
for ad formats such as outdoor signage and potentially TV viewership. Just as digital ads can track
impressions versus clicks, an outdoor sign could track how many people walked past the ad versus how
many actually looked at it, how long they looked at it, and even estimate individual demographics
like age and gender.
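
For a sense of the underlying mechanics, the sketch below counts people per frame with OpenCV's built-in HOG pedestrian detector. Real retail systems layer tracking and gaze estimation on top; "store_feed.mp4" is a placeholder.

```python
# People counting with OpenCV's default HOG pedestrian detector.
# "store_feed.mp4" is a placeholder for a retail camera feed.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("store_feed.mp4")
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes, _weights = hog.detectMultiScale(frame, winStride=(8, 8))
    frame_idx += 1
    print(f"frame {frame_idx}: {len(boxes)} people detected")
cap.release()
```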

6. Gathering Data for Emotional Analytics and Tracking Consumer Attention


Similar to how social listening technology can gauge sentiment within written content, computer vision
systems can track and measure emotional reactions to ad creative. This data is important because self-
reported emotions can be inaccurate, especially if the subject’s face is telling a different story.
Annalect’s Moodometer experiment, for instance, revealed that Super Bowl watchers had the most
positive reaction to an ad that they had ranked 55th out of 63. The experiment demonstrated that the
creative had an impact despite its lower ranking and that consumer surveys do not always provide a
complete picture of a campaign’s effectiveness.
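
The first stage of any such webcam pipeline is simply locating the viewer's face. The sketch below does that with an OpenCV Haar cascade; the emotion classifier itself is a separate model, not shown, and the frame file name is assumed.

```python
# Face localization, the first step of an emotion-analytics pipeline.
# Haar cascade files ship with OpenCV; the frame path is a placeholder.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("viewer_webcam_frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face_crop = frame[y:y + h, x:x + w]   # would feed an emotion model here
print(f"{len(faces)} face(s) located")
```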

7. Using Visual Data for Customer Personalization


Using computer vision, companies can gather real-time visual data on customers to personalize
experiences and inform marketing strategy. Select McDonald’s locations have implemented camera-
equipped kiosks that suggest menu items based on the customer’s perceived age and gender. In another
example, some analysts theorize that the addition of a camera-equipped smart speaker to the Amazon
Echo lineup could give Amazon the ability to gather customer data for more effective cross-sells. By
observing what people wear and what they bring into their homes, the company can learn which products
to restock or suggest for purchase.

Uses of computer vision in marketing & customer experience


The ability for computers to recognize novel images – to ‘see’ – is perhaps the most exciting side of AI.
But while most of the hype is centered around computer vision in self-driving cars, there are already
plenty of implications for marketers.

1. Smarter online merchandising


Merchandising in ecommerce is traditionally all about tagging. Each product has numerous tags, which
allow the customer to filter for particular attributes, but also allow recommendation algorithms to
surface related products (these algorithms may also analyze behavioral and purchase data). Online
retailers can also override these algorithms if they want to surface a particularly important product –
perhaps something new. However, AI-based software such as Sentient Aware is now allowing for visual
product discovery, which negates the need for most metadata and surfaces similar products based on
their visual affinity. This means that as an alternative to using a standard filtering system, shoppers can
select a product they like and be shown visually similar products. The advantages of visual product
discovery are manifold. It can surface a greater proportion of a product catalogue and find products in
separate categories that a customer may not have otherwise encountered (e.g. a ‘sports’ shoe that looks
similar to one from the ‘lifestyle’ category). Where once online retailers had to decide between selling a
focused/curated selection of products or a larger range, visual product discovery allows the best of both
worlds.

2. More effective retargeting


The same technology used for merchandising can also be applied to retargeting. Retargeting site visitors
with display advertising for a single product (perhaps after cart abandonment) is effective but can often
be a blunt tactic. Dynamic creative that features a range of visually similar products may have more
success, especially as retailers may be unsure whether a customer has already bought a specific product offline.
3. Real-world product & content discovery
Pinterest has only recently launched a tool called Lens, which functions like Shazam but for the visual
world. This gives the consumer the ability to point their smartphone camera at an object and perform a
Pinterest search, surfacing content related to that particular product.
The social network has had a form of visual search functionality since 2015 (as has Houzz), allowing
users to select part of an image and search for related items, but has expanded this further with Lens, as
well as allowing brands to surface products found with the camera.

4. Image-aware social listening

Brands chiefly monitor social media for mention of their products and services. But text forms only a
part of what social media users post online – images and video are arguably just as important. There are already
companies (such as Ditto or GumGum) providing social listening that can recognize the use of brand
logos, helping community managers find good and bad feedback. Subtweeters, beware.

5. Frictionless store experiences


Amazon Go hit the headlines in December 2016. Customers enter the store via a turnstile which scans a
barcode on their Amazon Go app. Computer vision technology then tracks the customer around the
store (presumably alongside some form of phone tracking, though Amazon has not released full details),
and sensors on the shelves detect when the customer selects an item. Once you’ve got everything you
need, you simply leave the store, with the Go app knowing what you have taken with you.

6. Retail analytics
Density is a startup that anonymously tracks the movement of people around work
spaces, using a small piece of hardware that can track movement through doorways. There are many
uses of this data, notably in safety, but they include tracking how busy a store is or how long a
queue or wait time is. Of course, automated footfall counters have been available for a while, but
advances in computer vision mean people tracking is sophisticated enough to be used in the
optimization of merchandising. RetailNext is one company which provides such retail analytics, allowing
store owners to ask:
Where do shoppers go in my store (and where do they not go)?
Where do shoppers stop and engage with fixtures or sales associates?
How long do they stay engaged?
Which are my most effective fixtures, and which ones are underperforming?

7. Emotional analytics
In January 2016 Mediacom announced that it would be using facial detection and analytics technology
developed by Realeyes as part of content testing and media planning. The tech uses remote panels of
users and works using their existing webcams to capture their reactions to ads and content.
Realeyes CEO Mihkel Jäätma told Martech Today that emotional analytics is “faster and cheaper” than
traditional online surveys or focus groups, and gathers direct responses rather than drawing on
subjective or inferred opinions. Other companies in the emotional analytics space include Unruly, in
partnership with Nielsen.

8. Image search
As computer vision improves, it can be used to perform automated general tagging of images. This may
eventually mean that manual and inconsistent tagging is not needed, making image organization on a
large scale quicker and more accurate. This has profound implications when querying large sets of
images: as Gaurav Oberoi suggests in a blog post, a user could ask the question “what kinds of things are
shown on movie posters and do they differ by genre?”, for example. Eventually, when applied to video,
the data available will be mind-boggling, and how we access and archive imagery may fundamentally
change. Though this is still a long way off, many will already be familiar with the power of image search
in Google Photos, which is trained to recognize thousands of objects, and with doing a reverse image
search within Google’s search engine or in a stock photo archive.

9. Augmented reality
From Snapchat Lenses to as-yet commercially unproven technology involving headsets such as
HoloLens, augmented reality is increasingly mentioned as a possible next step for mobile technology.
Indeed, Tim Cook seems particularly excited about it.

10. Direct mail processing


An essential use of optical character recognition (OCR): Royal Mail in the UK spent £150m on an automated
facility near Heathrow in 2004, which scans the front and back of envelopes and translates addresses
into machine-readable code.
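
The core OCR step is easy to illustrate. The hedged sketch below reads an address block from an envelope scan with the open-source Tesseract engine; Royal Mail's facility uses custom hardware, and the file name is a placeholder.

```python
# OCR sketch: extract address text from an envelope scan with Tesseract.
# Requires the tesseract binary plus `pip install pytesseract pillow`.
from PIL import Image
import pytesseract

envelope = Image.open("envelope_front.png")
address_text = pytesseract.image_to_string(envelope)
print(address_text)   # a sorting system would map this to a routing code
```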

How computer vision may impact the future of marketing


It’s important to note that these applications are most likely to be found in retail or broad B2C markets.

1. Contextual ads/in-image ads


There are companies (GumGum is one of them) that can display advertisements over images, by
contextually identifying what is in the image and displaying relevant ads on the image itself. For
example, an image featuring playing kittens might be a good place to advertise a cat food brand or an
image of a tropical beach might be a good place to advertise vacation rentals in the Bahamas. This is a
challenging task that hasn’t been possible until relatively recently thanks to major developments in
machine vision in the last two to three years. “Until very recently, it hasn’t been possible for a computer
to get a semantic — that is to say, a human-level — understanding of pictures,” machine vision guru
Nathan Hurst, a distinguished engineer at Shutterstock, told me. In a recent interview, he explained
how past approaches almost always boiled down to tagging images to identify their contents — until
engineers built machine learning models that could be trained on massive image data sets. With
algorithms that can distinguish not just a “car,” but a “2004 Honda Civic,” and not just a “dog,” but a
“cocker spaniel,” advertisers now have the ability to use image contexts to target their ads. An e-
commerce business targeting Honda owners can not only target branded search terms (in Google
AdWords, for example), but might also target only the images of Honda cars on related websites.
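
At its simplest, contextual matching is a classify-then-lookup step, as in the sketch below. The label-to-ad mapping and the file name are invented, and real vendors use far richer taxonomies than ImageNet labels.

```python
# Contextual ad matching sketch: classify the page image, then look up an
# ad category for the predicted label. Mapping and file name are invented.
import torch
from torchvision.models import resnet50, ResNet50_Weights
from PIL import Image

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()

batch = weights.transforms()(Image.open("article_photo.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    label = weights.meta["categories"][model(batch).argmax().item()]

ad_map = {"tabby": "cat food", "Egyptian cat": "cat food",
          "seashore": "vacation rentals", "sports car": "auto insurance"}
print("detected:", label, "-> serve:", ad_map.get(label, "house ad"))
```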

2. Programmatically generating advertising creatives


The online world is moving to video — with Cisco research predicting that 80 percent of web traffic by
2019 will be from engagement with video. Because of this trend, not only are major journalistic sites
(such as Mic and The Verge) pivoting to video, but brands are also aiming to win in the video game — but
it’s not easy. If a sunglasses brand has 100 images of its newest design, which should it serve?
Montreal-based Envision.ai is working on applications to parse through myriad image and video options
to match the right media to the right user at the right time. Because a certain user or demographic group
may change click-through behavior depending on the time of day, an AI system could be trained to
adjust advertising media on these real-time factors. For instance, Unilever’s Axe body spray has run
social media campaigns with 100,000 different versions of its “Romeo Reboot” video, according to a
post by the project’s visual effects director. As this kind of deep “calibration” to users and segments
becomes the norm, large consumer brands may be forced to follow suit to match the innovators in
online media engagement.
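
One simple mechanism for the real-time adjustment described above, sketched here as a toy, is an epsilon-greedy policy over creative variants: serve mostly the best-performing version while occasionally exploring alternatives. The variant names and click probabilities are invented.

```python
# Toy epsilon-greedy selection among video creative variants.
# Click probabilities below are simulated, purely for illustration.
import random

stats = {"romeo_v1.mp4": [0, 0], "romeo_v2.mp4": [0, 0]}   # [clicks, views]

def choose(epsilon: float = 0.1) -> str:
    if random.random() < epsilon:
        return random.choice(list(stats))   # explore a random variant
    # Exploit: pick the variant with the best observed click-through rate.
    return max(stats, key=lambda k: stats[k][0] / max(1, stats[k][1]))

for _ in range(1000):
    name = choose()
    stats[name][1] += 1
    # Simulate a click with a hidden per-variant probability.
    stats[name][0] += random.random() < (0.05 if name == "romeo_v2.mp4" else 0.02)

print({k: f"{c} clicks / {n} views" for k, (c, n) in stats.items()})
```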

3. Facial recognition for advertising feedback


One of the benefits of online advertising is the fact that it’s trackable. Advertisers know how many
sessions, users, clicks, and so on happen in a given day or minute. They can calibrate specific ads to
certain types of users or geolocations or days of the week and so on. This digital “footprint” allows
a tremendous amount of data to be collected to help optimize an advertiser’s efforts. But outdoor
advertising hasn’t been able to keep up. Tracking “users” and tracking “number of people who walk
within 10 feet of this signage” are very, very different — the latter being much more challenging.
Tracking “number of clicks to video content” and tracking “number of passersby who look at this
outdoor advertisement for more than 3 seconds” are very different — again, the latter being
much more challenging. The limitations of the physical world are being overcome, however, by
innovative companies that are taking the principles of online testing and variation and bringing them
offline. London’s M&C Saatchi has experimented with outdoor advertisements that track physical
equivalents of “engagement” and vary the signage in real time based on the responses of the
people who walk past.
