Artificial Passenger
BY
NAME: XXXXXXXX
MATRIC NO:
XXXXXXXX
TABLE OF CONTENTS
CHAPTER ONE
INTRODUCTION
1.1 BACKGROUND OF THE STUDY
CHAPTER TWO
LITERATURE REVIEW
2.1 HISTORY OF ARTIFICIAL INTELLIGENCE
2.2 RECOGNIZING ARTIFICIAL INTELLIGENCE
2.3 WORKING OF ARTIFICIAL PASSENGER
CHAPTER THREE
DISCUSSION
3.1 FUNCTION OF ARTIFICIAL PASSENGER
3.2 FEATURES OF ARTIFICIAL PASSENGER
3.3 APPLICATION OF ARTIFICIAL PASSENGER
3.4 FUTURE APPLICATION
CHAPTER FOUR
CONCLUSION
REFERENCES
CHAPTER ONE
INTRODUCTION
The Artificial Passenger is a telematics device, developed by IBM, that interacts verbally with a
driver to reduce the likelihood of the driver falling asleep at the wheel. It is based on
inventions covered by U.S. patent 6,236,968. The Artificial Passenger is equipped to engage a
vehicle operator by carrying on conversations, playing verbal games, controlling the vehicle's
stereo system, and so on. It also monitors the driver's speech patterns to detect fatigue and, in
response, can suggest that the driver take a break or get some sleep. The Artificial Passenger may
also be integrated with wireless services to provide weather and road information, driving
directions, and other such notification services.
The AP is an artificial intelligence based companion that will be resident in software and chips
embedded in the automobile dashboard. The heart of the system is a conversation planner that
holds a profile of you, including details of your interests and profession. When activated, the AP
uses the profile to cook up provocative questions such as “Who was the first person you dated?” via
a speech generator and in-car speakers. A microphone picks up your answer and breaks it down
into separate words with speech-recognition software. A camera built into the dashboard also
tracks your lip movements to improve the accuracy of the speech recognition. A voice analyzer
then looks for signs of tiredness by checking to see if the answer matches your profile. Slow
responses and a lack of intonation are signs of fatigue. If you reply quickly and clearly, the system
judges you to be alert and tells the conversation planner to continue the line of questioning. If your
response is slow or doesn’t make sense, the voice analyzer assumes you are dropping off and acts
to get your attention. The system, according to its inventors, does not go through a suite of rote
questions demanding rote answers. Rather, it knows your tastes and will even, if you wish, make
certain you never miss Paul Harvey again. This is from the patent application: “An even further
object of the present invention is to provide a natural dialog car system that understands content
of tapes, books, and radio programs and extracts and reproduces appropriate phrases from those
materials while it is talking with a driver. For example, a system can find out if someone is singing
on a channel of a radio station. The system will state, “And now you will hear a wonderful song!”
or detect that there is news and state, “Do you know what happened? Now hear the following!” and
play some news. The system also includes a recognition system to detect who is speaking over
the radio and alert the driver if the person speaking is one the driver wishes to hear.” Just because
you can express the rules of grammar in software doesn’t mean a driver is going to use them. The
AP is ready for that possibility: it provides for a natural dialog car system designed around human-
factor engineering, accommodating, for example, people who use different strategies when talking
(for instance, short vs. elaborate responses).
CHAPTER TWO
LITERATURE REVIEW
2.1 HISTORY OF ARTIFICIAL INTELLIGENCE
This chapter sheds more light into the historical perspective and genesis of Artificial Intelligence.
The term artificial intelligence was first coined by John McCarthy in 1956, when he held
the first academic conference on the subject. But the journey to understand whether machines can
truly think began long before that. In his seminal work As We May Think [Bush45], Vannevar Bush
proposed a system that amplifies people’s own knowledge and understanding. Five years
later, Alan Turing wrote a paper on the notion of machines being able to simulate human beings
and to do intelligent things, such as play chess [Turing50]. No one can refute a
computer’s ability to process logic, but whether a machine can think is, to many, still unknown.
The precise definition of ‘think’ is important, because there has been some strong opposition as
to whether this notion is even possible. For example, there is the so-called ‘Chinese room’
argument [Searle80]. Imagine someone locked in a room who is passed notes in Chinese. Using
an entire library of rules and look-up tables, they would be able to produce valid responses in
Chinese, but would they really ‘understand’ the language? The argument is that since computers
would always be applying rote fact lookup, they could never ‘understand’ a subject.
Early work in AI focused on using cognitive and biological models to simulate and explain human
information processing skills, on "logical" systems that perform common-sense and expert
reasoning, and on robots that perceive and interact with their environment. This early work was
spurred by visionary funding from the Defense Advanced Research Projects Agency (DARPA)
and Office of Naval Research (ONR), which began on a large scale in the early 1960s and
continues to this day. Basic AI research support from DARPA and ONR -- as well as support from
NSF, NIH, AFOSR, NASA, and the U.S. Army beginning in the 1970s -- led to theoretical
advances and to practical technologies for solving military, scientific, medical, and industrial
information processing problems.
By the early 1980s an "expert systems" industry had emerged, and Japan and Europe dramatically
increased their funding of AI research. In some cases, early expert systems success led to inflated
claims and unrealistic expectations: while the technology produced many highly effective systems,
it proved very difficult to identify and encode the necessary expertise. The field did not grow as
rapidly as investors had been led to expect, and this translated into some temporary
disillusionment. AI researchers responded by developing new technologies, including streamlined
methods for eliciting expert knowledge, automatic methods for learning and refining knowledge,
and common sense knowledge to cover the gaps in expert information. These technologies have
given rise to a new generation of expert systems that are easier to develop, maintain, and adapt to
changing needs.
Today developers can build systems that meet the advanced information processing needs of
government and industry by choosing from a broad palette of mature technologies. Sophisticated
methods for reasoning about uncertainty and for coping with incomplete knowledge have led to
more robust diagnostic and planning systems. Hybrid technologies that combine symbolic
representations of knowledge with more quantitative representations inspired by biological
information processing systems have resulted in more flexible, human-like behavior. AI ideas also
have been adopted by other computer scientists -- for example, "data mining," which combines
ideas from databases, AI learning, and statistics to yield systems that find interesting patterns in
large databases, given only very broad guidelines.
It is difficult to define exactly what we mean when saying that a computer program should behave
intelligently. Most people can give an abstract definition of intelligence, and anyone can look it
up in a dictionary. However, conventional definitions of intelligence, like many commonly used
expressions, are too ambiguous to be directly and usefully applied to computers. It is impossible
to describe artificial intelligence, or to gauge our progress in that field, without knowing how
intelligence applies to computers.
In a paper in 1950, Alan Turing proposed a test to measure the intelligence of computer programs
[Turing50]. Turing refers to this test as the ‘imitation game’ (though it has since been dubbed
simply the Turing test). In the imitation game, a human judge uses a Teletype or some other simple
interface to interrogate both a man (A) and a woman (B). The interrogator does not know in
advance whether A is male and B is female, or vice versa. It is A’s job to convince the interrogator
that A is actually a woman. If asked, for example, the length of his hair, A might indicate that it
is straight and layered, with the longest strands being several inches. It is B’s job to help the
interrogator figure out which interrogatee is male and which is female. B might type things like,
“I am the woman! Trust me!” Such statements, however, would be of limited value, since A could
easily type the same.
Roughly half of the time, the interrogator might be fooled into believing that A is actually the
woman. Suppose, however, that A were a computer rather than a man. If that computer could win
the imitation game, i.e. fool the human interrogator, with the same frequency as a man, then the
computer is said to have passed the Turing test. In terms of Turing’s original paper, the computer
might be judged capable of thinking. While passing the Turing test implies some definition of
artificial intelligence, it is insufficient for describing modern AI systems. As computer science
has begun to mature, we have developed new goals and uses for artificial intelligence, as well as
new technologies for achieving those goals. Intelligent systems need not be designed to fool a
human judge. Nor is such a facade necessarily desirable. A human working in a factory, for
example, would require rest, supervision, and incentive to continue working. These are not
characteristics we choose to emulate in computer programs. Yet there seems to be something
intelligent about a robotic system that can, for example, build or design cars.
It is perhaps better to think of artificial intelligence as the study and design of computer programs
that respond flexibly in unanticipated situations. A computer program can give the illusion of
intelligence if it is designed to react sensibly to a large number of likely and unlikely situations.
This is similar to the way we might judge human intelligence, by a person’s ability to solve
problems and cope effectively with a wide variety of situations [Dean]. In this case, it is not
necessary for an intelligent program (or person) to develop an original solution to a problem. Still,
to say that a computer program should react sensibly to situations is analogous to saying that it
should react intelligently. In other words, the meaning of intelligence in terms of computers
remains elusive. For the purposes of this paper, we will say that artificial intelligence is defined
by two major methodologies and their purposes. Weak artificial intelligence is the design of
computer programs with the intention of adding functionality while decreasing user intervention.
Many modern word processors are designed to indicate misspelled words without being asked to
do so by the user. Some programs will even correct misspellings automatically. This is an example
of weak artificial intelligence.
Strong artificial intelligence is the design of a computer program that may be considered a self-
contained intelligence (or intelligent entity). The intelligence of these programs is defined more
in terms of human thought. They are designed to think in the same way that people think. Passage
of the Turing test, for example, might be one criterion for development of a strong AI system. The
ethical issues in this paper deal largely with the strong AI methodology. However, the bulk of
useful artificial intelligence applications lie in the realm of weak AI.
The main devices that are used in the Artificial Passenger are:
Eye Tracker
Voice recognizer or speech recognizer
A. Eye tracker
Collecting eye movement data requires hardware and software specifically designed to perform
this function. Eye-tracking hardware is either mounted on a user's head or mounted remotely. Both
systems measure the corneal reflection of an infrared light emitting diode (LED), which
illuminates and generates a reflection off the surface of the eye. This action causes the pupil to
appear as a bright disk in contrast to the surrounding iris and creates a small glint underneath the
pupil. It is this glint that head-mounted and remote systems use for calibration and tracking.
B. Algorithm for monitoring head/eye motion for driver alertness with one camera
The invention robustly tracks a person's head and facial features with a single on-board camera,
using a fully automatic system that initializes automatically, reinitializes when it needs
to, and provides outputs in real time. The system can classify rotation in all viewing directions,
detect eye/mouth occlusion, detect eye blinking, and recover the 3D gaze of the eyes. In
addition, the system is able to track both through occlusions such as eye blinking and through
occlusions such as rotation. Outputs can be visual and sound alarms issued to the driver directly.
C. The method for detecting driver vigilance comprises the following steps:
Aiming a single camera at the head of the driver of a vehicle and detecting, with the
camera, the frequency of up-and-down nodding and left-to-right rotations of the head
within a selected time period.
Determining the frequency of eye blinking and eye closing of the driver within the
selected time period with the camera.
Determining the left-to-right head rotations and the up-and-down head nods by examining
approximately 10 consecutive video frames.
Determining the frequency of yawning of the driver within the selected period with the camera.
Generating an alarm signal in real time if the frequency of the up-and-down nodding, the
left-to-right rotations, the eye blinking, the eye closings, or the yawning exceeds a
selected threshold value.
Determining eye blinking and eye closing using the number and intensity of pixels in the
eye region.
The system goes to the eye center of the previous frame and finds the center of mass of the
eye-region pixels, then looks for the darkest pixel, which corresponds to the pupil.
This estimate is tested to see whether it is close enough to the previous eye location:
feasibility occurs when the newly computed eye centers are close, in pixel distance units,
to the previous frame's computed eye centers. This makes sense because the video data is
30 frames per second, so the eye motion between individual frames should be relatively
small.
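The pupil-location and feasibility steps above can be sketched as follows. This is an illustrative sketch, not the patented algorithm: the image representation (a 2D list of grayscale intensities) and the distance threshold are assumptions.

```python
# Minimal sketch of the eye-tracking steps above: locate the pupil as the
# darkest pixel in the eye region, and accept the estimate only if it is close
# to the previous frame's eye center (threshold is an assumption).

def track_eye(region, prev_center, max_jump=5.0):
    """region: 2D list of grayscale intensities; prev_center: (row, col)."""
    # The darkest pixel corresponds to the pupil.
    best = min(
        ((r, c) for r in range(len(region)) for c in range(len(region[0]))),
        key=lambda rc: region[rc[0]][rc[1]],
    )
    # Feasibility check: at 30 frames/s the eye moves little between frames.
    dist = ((best[0] - prev_center[0]) ** 2 + (best[1] - prev_center[1]) ** 2) ** 0.5
    return best if dist <= max_jump else prev_center
```

If the new estimate jumps implausibly far, the sketch keeps the previous center, mirroring the reinitialization idea in the text.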
CHAPTER THREE
DISCUSSION
One of the ways to address driver safety concerns is to develop an efficient system that relies on
voice instead of hands to control telematics devices. Conversational Interactivity for Telematics
(CIT) speech systems can significantly improve the driver-vehicle relationship and contribute to
driving safety. But the development of full-fledged Natural Language Understanding (NLU) for
CIT is a difficult problem that typically requires significant computer resources, which are
usually not available in the local computer processors that car manufacturers provide for their
cars. To address this, NLU components should be located on a server that cars access remotely, or
NLU should be downsized to run on local computing devices (typically based on embedded chips).
Our department is developing a “quasi-NLU” component, a “reduced” variant of NLU that can run on
CPU systems with relatively limited resources. In our approach, possible variants for spoken
commands are kept in special grammar files (one file for each topic or application).
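The grammar-file idea above can be sketched as a simple per-topic lookup: rather than full language understanding, each topic holds its known spoken-command variants, and an utterance is resolved by exact match. The topics, phrases, and action names below are invented for illustration; they are not IBM's actual grammar format.

```python
# Sketch of the "quasi-NLU" approach above: one small grammar table per topic,
# mapping known command variants to actions. All entries are illustrative.

GRAMMARS = {
    "radio": {
        "turn on the radio": "RADIO_ON",
        "switch station": "RADIO_NEXT",
        "radio off": "RADIO_OFF",
    },
    "navigation": {
        "take me home": "NAV_HOME",
        "shortest route": "NAV_FASTEST",
    },
}

def resolve_command(utterance):
    """Return (topic, action) for a recognized phrase, or None if out of grammar."""
    text = utterance.strip().lower()
    for topic, grammar in GRAMMARS.items():
        if text in grammar:
            return topic, grammar[text]
    return None
```

Keeping each topic in its own table is what lets the component run on limited embedded hardware: the recognizer only ever has to match against a small, fixed set of phrases.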
Logically, a speech system works as follows: speech is measured in terms of frequency and is
sampled 16,000 times per second. We empirically observed that noise sources, such as car noise,
have significant energy in the low frequencies, while speech energy is mainly concentrated in
frequencies above 200 Hz. Filters are therefore placed in the frequency range 200 Hz to 5500 Hz;
discarding the low frequencies in this way improves the robustness of the system to noise.
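The effect of discarding low-frequency energy can be illustrated with a crude stand-in filter: subtracting a short moving average removes slowly varying (low-frequency) content such as engine rumble while letting faster speech-band variation through. This is only an illustrative sketch; a real system would use a properly designed 200-5500 Hz band-pass filter.

```python
# Crude illustrative stand-in for the band-pass stage described above:
# subtracting a centered moving average acts as a simple high-pass filter.

def highpass(samples, window=9):
    """Subtract a centered moving average from each sample."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        avg = sum(samples[lo:hi]) / (hi - lo)
        out.append(samples[i] - avg)
    return out

# A constant (0 Hz) input is removed almost entirely:
print(max(abs(x) for x in highpass([3.0] * 32)))  # -> 0.0
```

A rapidly alternating (high-frequency) input, by contrast, passes through largely intact, which is the behavior the text attributes to the real filter bank.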
Any sound is then identified by matching it to its closest entry in the database of such graphs,
producing a number, called the feature number, that describes the sound. The unit-matching system
provides likelihoods of a match of all sequences of speech-recognition units to the input speech.
These units may be phones, diphones, or whole-word units. Lexical decoding constrains the unit-
matching system to follow only those search-path sequences whose speech units are present in a
word dictionary. A grammar is then applied so the speech recognizer knows what phonemes to
expect. Figuring out which phonemes are spoken is a quite difficult task, as different words sound
different when spoken by different people. Also, background noise picked up by the microphone can
make the recognizer hear a different vector. Thus, a probability analysis is done during
recognition by an HMM (Hidden Markov Model).
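The HMM probability analysis mentioned above can be sketched with a tiny Viterbi decoder: given per-frame likelihoods for each phoneme-like state and transition probabilities between states, it finds the most likely state sequence. The states and probabilities below are made up for illustration; this is the standard textbook algorithm, not IBM's decoder.

```python
# Tiny Viterbi decoder illustrating the HMM probability analysis above.

def viterbi(obs_probs, trans, init):
    """obs_probs: list of {state: P(frame|state)}; trans/init: dicts of probs."""
    states = list(init)
    # best[s] = (probability, path) of the best path ending in state s
    best = {s: (init[s] * obs_probs[0][s], [s]) for s in states}
    for frame in obs_probs[1:]:
        new = {}
        for s in states:
            prev, (p, path) = max(
                ((r, best[r]) for r in states),
                key=lambda item: item[1][0] * trans[item[0]][s],
            )
            new[s] = (p * trans[prev][s] * frame[s], path + [s])
        best = new
    return max(best.values(), key=lambda v: v[0])[1]
```

With two invented states "s" and "t", frames whose likelihoods favor s, s, then t decode to the sequence ["s", "s", "t"], which is how the recognizer picks the most probable phoneme sequence despite noisy per-frame evidence.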
Fatigue causes more than 240,000 vehicular accidents every year. Driving, however, occupies the
driver’s eyes and hands, thereby limiting most current interactive options. Among the efforts
presented in this general direction, the invention suggests fighting drowsiness by detecting
drowsiness via speech biometrics and, if needed, by increasing arousal via speech interactivity. It
is a common experience for drivers to talk to other people while they are driving to keep
themselves awake. The purpose of the Artificial Passenger part of the CIT project at IBM is to
provide a higher level of interaction with the driver than current media, such as CD players or
radio stations, can offer.
This is envisioned as a series of interactive modules within Artificial Passenger that increase driver
awareness and help to determine if the driver is losing focus. This can include both conversational
dialog and interactive games, using voice only. The scenarios for Artificial Passenger currently
include: quiz games, reading jokes, asking questions, and interactive books. In the Artificial
Passenger paradigm, the awareness state of the driver will be monitored, and the content will be
modified accordingly. Drivers evidencing fatigue, for example, will be presented with more
stimulating content than drivers who appear to be alert. Most well-known emotion researchers
agree that arousal (high vs. low) is a fundamental dimension of emotion. Arousal reflects the
level of stimulation of the person as measured by physiological aspects such as heart rate,
cortical activation, and respiration.
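The content-adaptation idea above can be sketched as a small mapping from the monitored awareness state to the next interactive module. The module names and the mapping are invented for illustration and are not the actual Artificial Passenger design.

```python
# Illustrative sketch of content adaptation: drowsy drivers get more
# stimulating modules than alert ones. All entries are assumptions.

CONTENT = {
    "alert":  ["interactive book", "conversational dialog", "quiz game"],
    "drowsy": ["fast-paced quiz", "joke", "provocative question"],
}

def pick_content(awareness_state):
    """Choose the next interactive module for the monitored awareness state."""
    options = CONTENT.get(awareness_state, CONTENT["alert"])
    return options[0]  # a real planner would also consult the driver's profile
```

The point of the sketch is the dispatch on awareness state; in the real system the choice would also feed back into the arousal monitoring loop described above.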
D. Workload Manager
The workload manager is a key component of the Driver Safety Manager. Its object is to produce
a moment-to-moment analysis of the user's cognitive workload. It accomplishes this by collecting
data about user conditions, monitoring local and remote events, and prioritizing message delivery.
There is rapid growth in the use of sensory technology in cars. These sensors allow for the
monitoring of driver actions (e.g. application of brakes, changing lanes), provide information
about local events (e.g. heavy rain), and provide information about driver characteristics (e.g.
speaking speed, eyelid status). There is also a growing amount of distracting information that may
be presented to the driver (e.g. phone calls, radio, music, e-mail, etc.) and a growing range of
actions that a driver can perform in cars via voice control. The workload manager is closely
related to the event manager, which detects when to trigger actions and/or make decisions about
potential actions. The system uses a set of rules for starting and stopping the interactions (or
interventions).
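The rule-based behavior described above can be sketched as follows: sensor events are reduced to a workload score, and low-priority messages are deferred while the driver is busy. The sensor names, scores, and thresholds are invented for illustration, not the actual rule set.

```python
# Minimal sketch of the workload manager above: score workload from sensor
# events and gate message delivery on it. All rules are assumptions.

def workload(sensors):
    """Score cognitive workload from simple boolean sensor events."""
    score = 0
    if sensors.get("braking"):        score += 2
    if sensors.get("changing_lanes"): score += 2
    if sensors.get("heavy_rain"):     score += 1
    return score

def deliver(message_priority, sensors, threshold=2):
    """Deliver a message now, or defer it while workload is high.

    message_priority: 0 = routine (e.g. e-mail), 2 = urgent (e.g. hazard).
    """
    if message_priority >= 2 or workload(sensors) < threshold:
        return "deliver"
    return "defer"
```

A routine e-mail notification is held back while the driver is braking, but an urgent hazard warning is delivered regardless, which is the prioritization behavior the text describes.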
IBM’s Artificial Passenger is like having a butler in your car: someone who looks after you, takes
care of your every need, is bent on providing service, and has enough intelligence to anticipate
your needs. This voice-actuated telematics system helps you perform certain actions within your
car hands-free: turn on the radio, switch stations, make a cell phone call, and more. It provides
uniform access to devices and networked services in and outside your car. It reports car conditions
and external hazards with minimal distraction. Plus, it helps you stay awake with some form of
entertainment when it detects you’re getting drowsy.
You’re driving at 70 mph, it’s raining hard, a truck is passing, the car radio is blasting, and the
A/C is on. Such noisy environments are a challenge to speech recognition systems, including the
Artificial Passenger. IBM’s Audio Visual Speech Recognition (AVSR) cuts through the noise. It
reads lips to augment speech recognition. Cameras focused on the driver’s mouth do the lip
reading; IBM’s Embedded ViaVoice does the speech recognition. In places with moderate noise,
where conventional speech recognition has a 1% error rate, the error rate of AVSR is less than 1%.
In places roughly ten times noisier, speech recognition has about a 2% error rate; AVSR’s is still
pretty good (1% error rate).
E. Analyzing Data
IBM’s Automated Analysis Initiative is a data management system for identifying failure trends
and predicting specific vehicle failures before they happen. The system comprises capturing,
retrieving, storing, and analyzing vehicle data; exploring data to identify features and trends;
developing and testing reusable analytics; and evaluating as well as deriving corrective measures.
It involves several reasoning techniques, including filters, transformations, fuzzy logic, and
clustering/mining. An Internet-based diagnostics server reads the car data to determine the root
cause of a problem or lead the technician through a series of tests. The server also takes a
“snapshot” of the data and repair steps. Should the problem reappear, the system has the fix readily
available.
The system will provide you with “shortest time” routing based on road conditions that change
because of weather and traffic, remote diagnostics of your car and of cars along your route,
destination requirements (your flight is delayed), and so on.
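The "shortest time" routing above can be sketched as a standard Dijkstra search in which each road segment's base travel time is scaled by a weather/traffic factor before the search. The road graph and the scaling factors below are invented for illustration.

```python
# Sketch of "shortest time" routing under changing conditions: base edge times
# are multiplied by condition factors, then Dijkstra finds the fastest path.
import heapq

def shortest_time(graph, factors, start, goal):
    """graph: {node: {neighbor: base_minutes}}; factors: {(a, b): multiplier}."""
    queue, seen = [(0.0, start, [start])], set()
    while queue:
        t, node, path = heapq.heappop(queue)
        if node == goal:
            return t, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, base in graph.get(node, {}).items():
            if nxt not in seen:
                cost = base * factors.get((node, nxt), 1.0)  # rain/traffic scaling
                heapq.heappush(queue, (t + cost, nxt, path + [nxt]))
    return float("inf"), []
```

When heavy rain slows one segment, its factor rises and the search automatically reroutes through the now-faster alternative, which is the behavior the text envisions.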
CHAPTER FOUR
CONCLUSION
We suggested that important issues related to driver safety, such as controlling telematics
devices and combating drowsiness, can be addressed by a special speech interface. This interface
requires interactions with workload, dialog, event, privacy, situation, and other modules. We
showed that basic speech interactions can be done in a low-resource embedded processor, and this
allows the development of a useful local component of the Safety Driver Manager. The reduction of
conventional speech processing to low-resource processing was done by reducing the signal-
processing and decoding load in such a way that it did not significantly affect decoding accuracy,
and by the development of quasi-NLU principles. We observed that an important application like
the Artificial Passenger can be sufficiently entertaining for a driver with relatively little
dialog complexity: playing simple voice games with a vocabulary containing a few words. Successful
implementation of the Safety Driver Manager would allow the use of various services in cars (like
reading e-mail, navigation, downloading music titles, etc.) without compromising driver safety.
REFERENCES
L.R. Bahl et al., "Performance of the IBM Large Vocabulary Continuous Speech Recognition
System on the ARPA Wall Street Journal Task," ICASSP 1995, vol. 1, pp. 41-44.
Lawrence
http://www.scribd.com/doc/19067668/artificialpassenger
http://www.freepatentsonline.com/4682348.html
http://seminars4you.wordpress.com/artificialpassenger/
www.visualexpert.com/Resources/roadaccidents.html