Human Computer Interaction: Chapter 2 Human in HCI
The human, the user, is, after all, the one whom computer systems are designed to assist. The
requirements of the user should therefore be our first priority.
In this chapter we will look at areas of human psychology coming under the general banner of
cognitive psychology. This may seem a far cry from designing and building interactive computer
systems, but it is not. In order to design something for someone, we need to understand their
capabilities and limitations. We need to know if there are things that they will find difficult or,
even, impossible. It will also help us to know what people find easy and how we can help them by
encouraging these things. We will look at aspects of cognitive psychology which have a bearing
on the use of computer systems: how humans perceive the world around them, how they store and
process information and solve problems, and how they physically manipulate objects.
Card, Moran and Newell introduced the Model Human Processor, which is a simplified view of
the human processing involved in interacting with computer systems. The model comprises three
subsystems: the perceptual system, handling sensory stimulus from the outside world, the motor
system, which controls actions, and the cognitive system, which provides the processing needed
to connect the two. Each of these subsystems has its own processor and memory, although
obviously the complexity of these varies depending on the complexity of the tasks the subsystem
has to perform. The model also includes a number of principles of operation which dictate the
behavior of the systems under certain conditions.
Information comes in, is stored and processed, and information is passed out. We will therefore
discuss three components of this system: input–output, memory and processing. In the human,
we are dealing with an intelligent information-processing system, and processing therefore
includes problem solving, learning, and, consequently, making mistakes. This model is obviously
a simplification of the real situation, since memory and processing are required at all levels, as we
have seen in the Model Human Processor. However, it is convenient as a way of grasping how
information is handled by the human system. The human, unlike the computer, is also influenced
by external factors such as the social and organizational environment, and we need to be aware of
these influences as well.
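The Model Human Processor can be sketched as a small calculation. The cycle times below are the typical values quoted by Card, Moran and Newell (roughly 100 ms perceptual, 70 ms cognitive, 70 ms motor); treat them as illustrative textbook figures, not measurements of any particular user.

```python
# Sketch of the Model Human Processor: three subsystems, each with a
# processor. Cycle times are the typical textbook values (assumptions).

PERCEPTUAL_CYCLE_MS = 100  # perceptual processor: senses -> internal representation
COGNITIVE_CYCLE_MS = 70    # cognitive processor: decide on a response
MOTOR_CYCLE_MS = 70        # motor processor: issue the muscle command

def simple_reaction_time_ms():
    """One perceive -> decide -> act pass, e.g. pressing a key when a light appears."""
    return PERCEPTUAL_CYCLE_MS + COGNITIVE_CYCLE_MS + MOTOR_CYCLE_MS

print(simple_reaction_time_ms())  # 240
```

Even this crude sum lands in the right ballpark for the reaction times discussed later in the chapter.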
Input-output channels
A person’s interaction with the outside world occurs through information being received and sent:
input and output. In an interaction with a computer the user receives information that is output by the
computer, and responds by providing input to the computer – the user's output becomes the computer's
input and vice versa. Consequently the use of the terms input and output may lead to confusion so we
shall blur the distinction somewhat and concentrate on the channels involved. This blurring is
appropriate since, although a particular channel may have a primary role as input or output in the
interaction, it is more than likely that it is also used in the other role. For example, sight may be used
primarily in receiving information from the computer, but it can also be used to provide information
to the computer, for example by fixating on a particular screen point when using an eye gaze system.
Input-Output in the human occurs mainly through the senses and motor control of the effectors.
✓ Senses: sight, hearing, touch, taste and smell. Of these, the first three are the most important
to HCI. Taste and smell do not currently play a significant role in HCI, and it is not clear
whether they could be exploited at all in general computer systems, although they could
have a role to play in more specialized systems or in augmented reality systems. However,
vision, hearing and touch are central.
✓ Effectors: the limbs, fingers, eyes, head and vocal system. In the interaction with the
computer, the fingers play the primary role, through typing or mouse control, with some
use of voice, and eye, head and body position.
Vision
Human vision is a highly complex activity with a range of physical and perceptual limitations, yet
it is the primary source of information for the average person. We can roughly divide visual
perception into two stages:
✓ the physical reception of the stimulus from the outside world, and
✓ the processing and interpretation of that stimulus.
On the one hand, the physical properties of the eye and the visual system mean that there are certain
things that cannot be seen by the human; on the other, the interpretive capabilities of visual
processing allow images to be constructed from incomplete information. We need to understand
both stages, as both influence what can and cannot be perceived visually by a human being, which
in turn directly affects the way that we design computer systems. We will begin by looking at the
eye as a physical receptor, and then go on to consider the processing involved in basic vision.
The human eye
Vision begins with light. The eye is a mechanism for receiving light and transforming it into
electrical energy. Light is reflected from objects in the world and their image is focused upside
down on the back of the eye. The receptors in the eye transform it into electrical signals, which are
passed to the brain.
The eye has a number of important components as you can see in the figure. Let us take a deeper
look. The cornea and lens at the front of the eye focus the light into a sharp image on the back of the
eye, the retina. The retina is light sensitive and contains two types of photoreceptor: rods and cones.
Rods are highly sensitive to light and therefore allow us to see under a low level of illumination.
However, they are unable to resolve fine detail and are subject to light saturation. This is the reason
for the temporary blindness we get when moving from a darkened room into sunlight: the rods have
been active and are saturated by the sudden light. The cones do not operate either as they are suppressed
by the rods. We are therefore temporarily unable to see at all. There are approximately 120 million
rods per eye, which are mainly situated towards the edges of the retina. Rods therefore dominate
peripheral vision.
Cones are the second type of receptor in the eye. They are less sensitive to light than the rods and can
therefore tolerate more light. There are three types of cone, each sensitive to a different wavelength of
light. This allows color vision. The eye has approximately 6 million cones, mainly concentrated on
the fovea.
Hearing (Auditory input channel)
The sense of hearing is often considered secondary to sight, but we tend to underestimate the amount
of information that we receive through our ears. Hearing begins with vibrations in the air, or sound
waves. The ear receives these vibrations and transmits them, through various stages, to the auditory
nerves. The ear comprises three sections, commonly known as the outer ear, middle ear and inner
ear.
The outer ear is the visible part of the ear. It has two parts: the pinna, which is the structure that is
attached to the sides of the head, and the auditory canal, along which sound waves are passed to the
middle ear. The outer ear serves two purposes. First, it protects the sensitive middle ear from damage.
The auditory canal contains wax which prevents dust, dirt and over-inquisitive insects reaching the
middle ear. It also maintains the middle ear at a constant temperature. Secondly, the pinna and auditory
canal serve to amplify some sounds.
The middle ear is a small cavity connected to the outer ear by the tympanic membrane, or ear drum,
and to the inner ear by the cochlea. Within the cavity are the ossicles, the smallest bones in the body.
Sound waves pass along the auditory canal and vibrate the ear drum which in turn vibrates the ossicles,
which transmit the vibrations to the cochlea, and so into the inner ear. This 'relay' is required because,
unlike the air-filled outer and middle ears, the inner ear is filled with a denser cochlear liquid. If passed
directly from the air to the liquid, the transmission of the sound waves would be poor. By transmitting
them via the ossicles the sound waves are concentrated and amplified.
Processing sound
As we have seen, sound is changes or vibrations in air pressure. It has a number of characteristics
which we can differentiate.
✓ Pitch is the frequency of the sound. A low frequency produces a low pitch, a high frequency, a high
pitch.
✓ Loudness is proportional to the amplitude of the sound; the frequency remains constant.
✓ Timbre relates to the type of the sound: sounds may have the same pitch and loudness but be made
by different instruments and so vary in timbre.
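The three characteristics above can be illustrated by sampling a synthetic tone: pitch corresponds to the base frequency, loudness to the amplitude, and timbre to the mix of harmonics. The frequencies, amplitudes and harmonic weights below are arbitrary illustrative values, not perceptual data.

```python
import math

# Illustrative sketch: pitch ~ frequency, loudness ~ amplitude, timbre ~
# harmonic content. All numeric values are arbitrary example choices.

def sample_tone(freq_hz, amplitude, t_s, harmonics=(1.0,)):
    """Sample a tone at time t_s; the `harmonics` weights give a crude timbre."""
    return amplitude * sum(
        w * math.sin(2 * math.pi * freq_hz * (i + 1) * t_s)
        for i, w in enumerate(harmonics)
    )

# Same pitch (440 Hz) and same loudness (amplitude 0.5), different timbre:
pure = sample_tone(440.0, 0.5, 0.001)                    # pure sine
rich = sample_tone(440.0, 0.5, 0.001, (1.0, 0.3, 0.1))   # added harmonics
```

The two samples differ even though pitch and loudness are identical, which is exactly the distinction the timbre bullet draws.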
We can also identify a sound's location, since the two ears receive slightly different sounds, owing to
the time difference between the sound reaching the two ears and the reduction in intensity caused by
the sound waves reflecting from the head. The human ear can hear frequencies from about 20 Hz to 15
kHz. It can distinguish frequency changes of less than 1.5 Hz at low frequencies but is less accurate at
high frequencies. Different frequencies trigger activity in neurons in different parts of the auditory
system, and cause different rates of firing of nerve impulses. The auditory system performs some
filtering of the sounds received, allowing us to ignore background noise and concentrate on important
information. We are selective in our hearing, as illustrated by the cocktail party effect, where we can
pick out our name spoken across a crowded noisy room. If sounds are too loud or frequencies are too
similar, we will be unable to differentiate sound. Sound can convey a remarkable amount of
information. In interface design, sound is usually used for warnings and notifications. It could be
used more extensively to convey information about the system state.
Touch
Touch provides us with vital information about our environment. It tells us when we touch something
hot or cold, and can therefore act as a warning. It also provides us with feedback when we attempt to
lift an object, for example. Consider the act of picking up a glass of water. If we could only see the
glass and not feel when our hand made contact with it or feel its shape, the speed and accuracy of the
action would be reduced. This is the experience of users of certain virtual reality games: they can see
the computer-generated objects which they need to manipulate but they have no physical sensation of
touching them. Watching such users can be an informative and amusing experience! Touch is therefore
an important means of feedback, and this is no less so in using computer systems. Feeling buttons
depress is an important part of the task of pressing the button. Also, we should be aware that, although for the
average person, haptic perception is a secondary source of information, for those whose other senses
are impaired, it may be vitally important. For such users, interfaces such as braille may be the primary
source of information in the interaction. The apparatus of touch differs from that of sight and hearing
in that it is not localized. The skin contains three types of sensory receptor: thermoreceptors respond
to heat and cold, nociceptors respond to intense pressure, heat and pain, and mechanoreceptors respond
to pressure.
Movement
A simple action such as hitting a button in response to a question involves a number of processing
stages. The stimulus (of the question) is received through the sensory receptors and transmitted to
the brain. The question is processed and a valid response generated. The brain then tells the
appropriate muscles to respond. Each of these stages takes time, which can be roughly divided into
reaction time and movement time.
Movement time is dependent largely on the physical characteristics of the subjects: their age and
fitness, for example. Reaction time varies according to the sensory channel through which the
stimulus is received. A person can react to an auditory signal in approximately 150 ms, to a visual
signal in 200 ms and to pain in 700 ms. A second measure of motor skill is accuracy. One question
that we should ask is whether speed of reaction results in reduced accuracy. This is dependent on
the task and the user. In some cases, requiring increased reaction time reduces accuracy. This is
the premise behind many arcade and video games where less skilled users fail at levels of play that
require faster responses. Speed and accuracy of movement are important considerations in the
design of interactive systems, primarily in terms of the time taken to move to a particular
target on a screen.
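The time taken to move to a target on screen is classically modelled by Fitts' law, the standard model for pointing tasks (not named explicitly above). The constants `a` and `b` below are device- and user-specific; the values used here are arbitrary illustrative assumptions.

```python
import math

# Fitts' law (Shannon formulation): movement time grows with the "index of
# difficulty" of the target. a and b are illustrative placeholder constants.

def movement_time_ms(distance, width, a=50.0, b=150.0):
    """Predicted time to acquire a target of `width` at `distance` (same units)."""
    index_of_difficulty = math.log2(distance / width + 1)
    return a + b * index_of_difficulty

# A small, far-away target takes longer to hit than a large, nearby one.
far_small = movement_time_ms(distance=800, width=10)
near_big = movement_time_ms(distance=100, width=50)
```

This is why interface guidelines favour large, close targets (and screen edges, which behave as infinitely deep targets) for frequent actions.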
Sensory memory
The sensory memories act as buffers for stimuli received through the senses. A sensory memory exists
for each sensory channel: iconic memory for visual stimuli, echoic memory for aural stimuli and haptic
memory for touch. These memories are constantly overwritten by new information coming in on these
channels.
The existence of echoic memory is evidenced by our ability to ascertain the direction from which a
sound originates. This is due to information being received by both ears. Since this information is
received at different times, we must store the stimulus in the meantime. Echoic memory allows brief
'play-back' of information. Information is passed from sensory memory into short-term memory by
attention, thereby filtering the stimuli to only those which are of interest at a given time.
Attention is the concentration of the mind on one out of a number of competing stimuli or thoughts. It
is clear that we are able to focus our attention selectively, choosing to attend to one thing rather than
another. This is due to the limited capacity of our sensory and mental processes.
Short-term memory
Short-term memory or working memory acts as a 'scratch-pad' for temporary recall of information. It
is used to store information which is only required fleetingly. Short-term memory can be accessed
rapidly, in the order of 70 ms. It also decays rapidly, meaning that information can only be held there
temporarily, in the order of 200 ms. Short-term memory also has a limited capacity. There are two
basic methods for measuring memory capacity. The first involves determining the length of a sequence
which can be remembered in order. The second allows items to be freely recalled in any order.
Long-term memory
If short-term memory is our working memory or 'scratch-pad', long-term memory is our main
resource. Here we store factual information, experiential knowledge, procedural rules of behavior – in
fact, everything that we 'know'. It differs from short-term memory in a number of significant ways.
First, it has a huge, if not unlimited, capacity. Secondly, it has a relatively slow access time of
approximately a tenth of a second. Thirdly, forgetting occurs more slowly in long-term memory, if at
all. Long-term memory is intended for the long-term storage of information. Information is placed
there from working memory through rehearsal. Unlike working memory there is little decay: long-term
recall after minutes is the same as that after hours or days.
Thinking can require different amounts of knowledge. Some thinking activities are very directed and
the knowledge required is constrained. Others require vast amounts of knowledge from different
domains. For example, performing a subtraction calculation requires a relatively small amount of
knowledge, from a constrained domain, whereas understanding newspaper headlines demands a much
broader range of knowledge drawn from many domains.
Reasoning
Reasoning is the process by which we use the knowledge we have to draw conclusions or infer
something new about the domain of interest. There are a number of different types of reasoning:
deductive, inductive and abductive. We use each of these types of reasoning in everyday life, but
they differ in significant ways.
Deductive reasoning: Deductive reasoning derives the logically necessary conclusion from the
given premises.
For example,
If it is Friday then she will go to work
It is Friday
Therefore she will go to work.
Inductive reasoning: Induction is generalizing from cases we have seen to infer information about
cases we have not seen. Induction is a useful process, which we use constantly in learning about our
environment. We can never see all the elephants that have ever lived or will ever live, but we have
certain knowledge about elephants which we are prepared to trust for all practical purposes, which has
largely been inferred by induction. Even if we saw an elephant without a trunk, we would be unlikely
to move from our position that 'All elephants have trunks', since we are better at using positive than
negative evidence.
Abductive reasoning: The third type of reasoning is abduction. Abduction reasons from a fact to the
action or state that caused it. This is the method we use to derive explanations for the events we observe.
For example, suppose we know that Sam always drives too fast when she has been drinking. If we see
Sam driving too fast we may infer that she has been drinking. Of course, this too is unreliable since
there may be another reason why she is driving fast: she may have been called to an emergency.
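The three reasoning styles can be contrasted in a few lines of code. The rules and facts below (including the Friday and drinking examples from the text) are toy illustrations, not a real inference engine.

```python
# Toy sketches of the three reasoning types described above.
# The rule base is a hypothetical list of (premise, conclusion) pairs.

def deduce(rules, fact):
    """Deduction: from 'if P then Q' and P, conclude Q (logically sound)."""
    return {q for p, q in rules if p == fact}

def induce(observations):
    """Induction: generalize properties shared by all seen cases (plausible, not certain)."""
    properties = [set(props) for _, props in observations]
    return set.intersection(*properties) if properties else set()

def abduce(rules, observed):
    """Abduction: from 'if P then Q' and Q, hypothesize P (unreliable)."""
    return {p for p, q in rules if q == observed}

rules = [("friday", "goes to work"),
         ("drinking", "drives fast"),
         ("emergency", "drives fast")]

print(deduce(rules, "friday"))       # {'goes to work'}
print(abduce(rules, "drives fast"))  # two rival explanations: drinking or emergency

elephants = [("e1", ["trunk", "grey"]), ("e2", ["trunk", "grey"])]
print(induce(elephants))             # properties shared by every case seen so far
```

Note how `abduce` returns more than one candidate cause, mirroring the point that Sam may be driving fast for a reason other than drinking.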
Problem solving
Human problem solving is characterized by the ability to adapt the information we have to deal with
new situations; often solutions seem to be original and creative. There are a number of different views
of how people solve problems.
The Gestalt view is that problem solving involves both the reuse of knowledge and insight. This has
been largely superseded, but the questions it was trying to address remain and its influence can be seen
in later research. Problem space theory, proposed in the 1970s by Newell and Simon, takes the view
that the mind is a limited information processor.
Gestalt theory
Gestalt psychologists were answering the claim, made by behaviorists, that problem solving is a matter
of reproducing known responses or trial and error. This explanation was considered by the Gestalt
school to be insufficient to account for human problem-solving behavior. Instead, they claimed,
problem solving is both productive and reproductive. Reproductive problem solving draws on previous
experience as the behaviorists claimed, but productive problem solving involves insight and
restructuring of the problem. Indeed, reproductive problem solving could be a hindrance to finding a
solution, since a person may 'fixate' on the known aspects of the problem and so be unable to see novel
interpretations that might lead to a solution. Gestalt psychologists backed up their claims with
experimental evidence.
Problem space theory
Newell and Simon proposed that problem solving centers on the problem space. The problem space
comprises problem states, and problem solving involves generating these states using legal state
transition operators. The problem has an initial state and a goal state and people use the operators to
move from the former to the latter. Such problem spaces may be huge, and so heuristics are employed
to select appropriate operators to reach the goal. One such heuristic is means–ends analysis. In means–
ends analysis the initial state is compared with the goal state and an operator chosen to reduce the
difference between the two.
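The means-ends heuristic can be sketched as a greedy search over a toy problem space: at each step, apply the operator whose result most reduces the difference from the goal. The numeric states and operators below are hypothetical illustrations, not Newell and Simon's General Problem Solver.

```python
# Minimal sketch of problem space search with a means-ends flavour.
# States are integers; operators are hypothetical state transitions.

def difference(state, goal):
    """A crude measure of how far a state is from the goal."""
    return abs(goal - state)

def means_ends(start, goal, operators, max_steps=50):
    """Greedily apply the operator that best reduces the remaining difference."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            return path
        best = min((op(state) for op in operators),
                   key=lambda s: difference(s, goal))
        if difference(best, goal) >= difference(state, goal):
            return None  # heuristic is stuck: no operator reduces the difference
        state = best
        path.append(state)
    return None

ops = [lambda s: s + 10, lambda s: s + 1, lambda s: s - 1]
print(means_ends(0, 23, ops))  # [0, 10, 20, 21, 22, 23]
```

The early bail-out when no operator helps illustrates why heuristics matter: without them, a huge problem space would have to be searched exhaustively, and with a purely greedy one the search can fail even when a solution exists.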
Newell and Simon's theory, and their General Problem Solver model which is based on it, have largely
been applied to problem solving in well-defined domains, for example solving puzzles. These problems
may be unfamiliar but the knowledge that is required to solve them is present in the statement of the
problem and the expected solution is clear. In real-world problems finding the knowledge required to
solve the problem may be part of the problem, or specifying the goal may be difficult.