Assignment 1
By the 1920s, John B. Watson had left academic psychology, and other behaviorists were
becoming influential, proposing new forms of learning other than classical conditioning.
Perhaps the most important of these was Burrhus Frederic Skinner, who, for obvious
reasons, is more commonly known as B.F. Skinner.
Skinner’s views were slightly less extreme than Watson’s (1913). Skinner believed that we
do have such a thing as a mind, but that it is simply more productive to study observable
behavior rather than internal mental events.
Skinner’s work was rooted in the view that classical conditioning was far too simplistic to
fully explain complex human behavior. He believed that the best way to understand behavior
is to examine its causes and consequences. He called this approach operant conditioning.
How It Works
Skinner is regarded as the father of Operant Conditioning, but his work was based
on Thorndike’s (1898) Law of Effect. According to this principle, behavior that is followed
by pleasant consequences is likely to be repeated, and behavior followed by unpleasant
consequences is less likely to be repeated.
Skinner introduced a new term into the Law of Effect – Reinforcement. Behavior that is
reinforced tends to be repeated (i.e., strengthened); behavior that is not reinforced tends to die
out or be extinguished (i.e., weakened).
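As a rough illustration, the strengthening and weakening described above can be sketched as a simple probability update. The linear update rule, the learning rate, and the function name are assumptions made for this sketch; they are not Skinner's or Thorndike's own formulation.

```python
def update_probability(p, reinforced, rate=0.2):
    """Return the new probability of a behavior after one trial.

    Assumed rule (illustrative only): a reinforced behavior moves toward 1,
    an unreinforced behavior moves toward 0.
    """
    if reinforced:
        return p + rate * (1.0 - p)
    return p - rate * p


p = 0.1
for _ in range(10):  # ten reinforced trials: the behavior is strengthened
    p = update_probability(p, reinforced=True)
strengthened = p

for _ in range(10):  # ten unreinforced trials: the behavior dies out
    p = update_probability(p, reinforced=False)
extinguished = p
```

Under these assumptions, the probability rises while the behavior is reinforced and falls back once reinforcement stops, which mirrors the strengthening and extinction described above.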
Skinner (1948) studied operant conditioning by conducting experiments using animals, which
he placed in a “Skinner Box,” a device similar to Thorndike’s puzzle box.
A Skinner box, also known as an operant conditioning chamber, is a device used to
objectively record an animal’s behavior in a compressed time frame. An animal can be
rewarded or punished for engaging in certain behaviors, such as lever pressing (for rats) or
key pecking (for pigeons).
Skinner identified three types of responses, or operants, that can follow behavior.
Neutral operants: Responses from the environment that neither increase nor decrease the
probability of a behavior being repeated.
Reinforcers: Responses from the environment that increase the probability of a behavior
being repeated. Reinforcers can be either positive or negative.
Punishers: Responses from the environment that decrease the likelihood of a behavior
being repeated. Punishment weakens behavior.
We can all think of examples of how reinforcers and punishers have affected our behavior.
As a child, you probably tried out a number of behaviors and learned from their
consequences.
For example, when you were younger, if you tried smoking at school, and the chief
consequence was that you got in with the crowd you always wanted to hang out with, you
would have been positively reinforced (i.e., rewarded) and would be likely to repeat the
behavior.
If, however, the main consequence was that you were caught, caned, suspended from school,
and your parents became involved, you would most certainly have been punished, and you
would consequently be much less likely to smoke now.
Positive Reinforcement
B. F. Skinner’s theory of operant conditioning describes positive reinforcement. In positive
reinforcement, a response or behavior is strengthened by rewards, leading to the repetition of
the desired behavior. The reward is a reinforcing stimulus.
Primary reinforcers are stimuli that are naturally reinforcing because they are not learned and
directly satisfy a need, such as food or water.
Secondary reinforcers, such as money or school grades, are stimuli that become reinforcing
through their association with a primary reinforcer. They do not directly satisfy an innate
need but may be the means to do so. So a secondary reinforcer can be just as powerful a
motivator as a primary reinforcer.
Skinner showed how positive reinforcement worked by placing a hungry rat in his Skinner
box. The box contained a lever on the side, and as the rat moved about the box, it would
accidentally knock the lever. As soon as it did so, a food pellet would drop into a
container next to the lever.
After being put in the box a few times, the rats quickly learned to go straight to the lever. The
consequence of receiving food if they pressed the lever ensured that they would repeat the
action again and again.
Positive reinforcement strengthens a behavior by providing a consequence an individual finds
rewarding. For example, if your teacher gives you £5 each time you complete your
homework (i.e., a reward), you will be more likely to repeat this behavior in the future, thus
strengthening the behavior of completing your homework.
In this way, a less preferred activity (completing homework) is incentivized by associating
it with a desirable outcome, which strengthens that activity.
Negative Reinforcement
Negative reinforcement is the termination of an unpleasant state following a response.
For example, if you do not complete your homework, you give your teacher £5. You will
complete your homework to avoid paying £5, thus strengthening the behavior of completing
your homework.
Skinner showed how negative reinforcement worked by placing a rat in his Skinner box and
then subjecting it to an unpleasant electric current which caused it some discomfort. As the
rat moved about the box it would accidentally knock the lever.
As soon as it did so, the electric current would be switched off. The rats quickly learned to
go straight to the lever after being put in the box a few times. The consequence of escaping
the electric current ensured that they would repeat the action again and again.
In fact, Skinner even taught the rats to avoid the electric current by turning on a light just
before the electric current came on. The rats soon learned to press the lever when the light
came on because they knew that this would stop the electric current from being switched on.
These two learned responses are known as Escape Learning and Avoidance Learning.
Punishment
Punishment is the opposite of reinforcement since it is designed to weaken or eliminate a
response rather than increase it. It is an aversive event that decreases the behavior that it
follows.
Like reinforcement, punishment can work either by directly applying an unpleasant stimulus
like a shock after a response or by removing a potentially rewarding stimulus, for instance,
deducting someone’s pocket money to punish undesirable behavior.
Note: It is not always easy to distinguish between punishment and negative reinforcement.
Punishment decreases a behavior, whereas negative reinforcement increases it.
Punishment itself comes in two distinct forms. Both decrease the likelihood of a specific
behavior occurring again, but they involve different types of consequences:
1. Positive Punishment:
Positive punishment involves adding an aversive stimulus or something unpleasant
immediately following a behavior to decrease the likelihood of that behavior happening
in the future.
It aims to weaken the target behavior by associating it with an undesirable
consequence.
Example: A child receives a scolding (an aversive stimulus) from their parent
immediately after hitting their sibling. This is intended to decrease the likelihood of the
child hitting their sibling again.
2. Negative Punishment:
Negative punishment involves removing a desirable stimulus or something rewarding
immediately following a behavior to decrease the likelihood of that behavior happening
in the future.
It aims to weaken the target behavior by taking away something the individual values
or enjoys.
Example: A teenager loses their video game privileges (a desirable stimulus) for not
completing their chores. This is intended to decrease the likelihood of the teenager
neglecting their chores in the future.
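The distinctions drawn in this section reduce to two questions: is a stimulus added or removed after the behavior, and does the behavior become more or less likely? A minimal sketch of that two-by-two classification follows. The function name and the string labels are hypothetical, chosen only for this illustration.

```python
def classify_consequence(stimulus_change, effect_on_behavior):
    """Name the operant procedure for a consequence.

    stimulus_change: "added" or "removed" (what happens after the behavior)
    effect_on_behavior: "increases" or "decreases" (future likelihood)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[effect_on_behavior]
    return f"{sign} {kind}"


examples = [
    ("added", "increases"),    # food pellet after a lever press
    ("removed", "increases"),  # electric current switched off after a lever press
    ("added", "decreases"),    # scolding after hitting a sibling
    ("removed", "decreases"),  # losing video game privileges
]
labels = [classify_consequence(s, e) for s, e in examples]
```

Note that "positive" and "negative" describe only whether something is added or taken away, not whether the consequence is pleasant.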
There are several problems with using punishment:
It causes increased aggression, since it shows that aggression is a way to cope with problems.
It creates fear that can generalize to undesirable behaviors, e.g., fear of school.
It does not necessarily guide you toward desired behavior: reinforcement tells you what to
do, and punishment only tells you what not to do.
2. Negative Reinforcement: If you notice your team working together effectively and
exhibiting excellent team spirit during a tough training session, you might end the training
session earlier than planned, which the team perceives as a relief. They understand that
teamwork leads to positive outcomes, reinforcing team behavior.
3. Negative Punishment: If an office worker continually arrives late, their manager might
revoke the privilege of flexible working hours. This removal of a positive stimulus
encourages the employee to be punctual.
4. Positive Reinforcement: Training a cat to use a litter box can be achieved by giving it a
treat each time it uses it correctly. The cat will associate the behavior with the reward and
will likely repeat it.
5. Negative Punishment: If teenagers stay out past their curfew, their parents might take
away their gaming console for a week. This makes the teenager more likely to respect their
curfew in the future to avoid losing something they value.
6. Ineffective Punishment: Your child refuses to finish their vegetables at dinner. You
punish them by not allowing dessert, but the child still refuses to eat vegetables next time.
The punishment seems ineffective.
7. Premack Principle Application: You could motivate your child to eat vegetables by
offering an activity they love after they finish their meal. For instance, for every vegetable
eaten, they get an extra five minutes of video game time. They value video game time,
which might encourage them to eat vegetables.
8. Other Premack Principle Examples:
A student who dislikes history but loves art might earn extra time in the art studio for
each history chapter reviewed.
For every 10 minutes a person spends on household chores, they can spend 5 minutes
on a favorite hobby.
For each successful day of healthy eating, an individual allows themselves a small
piece of dark chocolate at the end of the day.
A child can choose between taking out the trash or washing the dishes. Giving them the
choice makes them more likely to complete the chore willingly.
One of the most famous of these experiments is often colloquially referred to as “Superstition
in the Pigeon.”
The Experiment:
1. Pigeons were brought to a state of hunger, reduced to 75% of their well-fed weight.
2. They were placed in a cage with a food hopper that could be presented for five seconds at a
time.
3. Instead of the food being given as a result of any specific action by the pigeon, it was
presented at regular intervals, regardless of the pigeon’s behavior.
Observation:
1. Over time, Skinner observed that the pigeons began to associate whatever random action
they were doing when food was delivered with the delivery of the food itself.
2. This led the pigeons to repeat these actions, believing (in anthropomorphic terms) that their
behavior was causing the food to appear.
Findings:
2. These behaviors did not appear until the food hopper was introduced and presented
periodically.
3. These behaviors were not initially related to the food delivery but became linked in the
pigeon’s mind due to the coincidental timing of the food dispensing.
4. The behaviors seemed to be associated with the environment, suggesting the pigeons were
responding to certain aspects of their surroundings.
5. The rate of reinforcement (how often the food was presented) played a significant role.
Shorter intervals between food presentations led to more rapid and defined conditioning.
6. Once a behavior was established, the interval between reinforcements could be increased
without diminishing the behavior.
Superstitious Behavior:
The pigeons began to act as if their behaviors had a direct effect on the presentation of food,
even though there was no such connection. This is likened to human superstitions, where
rituals are believed to change outcomes, even if they have no real effect.
For example, a card player might have rituals to change their luck, or a bowler might make
gestures believing they can influence a ball already in motion.
Conclusion:
This experiment demonstrates that behaviors can be conditioned even without a direct cause-
and-effect relationship. Just like humans, pigeons can develop “superstitious” behaviors
based on coincidental occurrences.
This study not only illuminates the intricacies of operant conditioning but also draws parallels
between animal and human behaviors in the face of random reinforcements.
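The logic of the experiment can be sketched as a small simulation: food arrives on a fixed timer, and whatever the pigeon happens to be doing at that moment is strengthened. The behavior names, the 15-step interval, and the weight-based choice rule are all assumptions of this sketch, not details taken from Skinner's procedure.

```python
import random


def simulate_superstition(steps=600, interval=15, boost=1.0, seed=0):
    """Deliver food every `interval` steps regardless of behavior."""
    rng = random.Random(seed)
    # Hypothetical behaviors; each starts out equally likely.
    behaviors = ["turning", "head bobbing", "pecking floor", "wing flapping"]
    weights = {b: 1.0 for b in behaviors}
    deliveries = 0
    for t in range(1, steps + 1):
        # The pigeon does something; stronger behaviors are chosen more often.
        current = rng.choices(behaviors, weights=[weights[b] for b in behaviors])[0]
        if t % interval == 0:
            # Food arrives on schedule; the coinciding behavior is strengthened.
            weights[current] += boost
            deliveries += 1
    return weights, deliveries


weights, deliveries = simulate_superstition()
```

Because a strengthened behavior is more likely to be occurring at the next delivery, early coincidences tend to snowball, even though no behavior ever causes the food to appear.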
Schedules Of Reinforcement
Imagine a rat in a “Skinner box.” In operant conditioning, if no food pellet is delivered
immediately after the lever is pressed, then after several attempts, the rat stops pressing the
lever (how long would someone continue to go to work if their employer stopped paying
them?). The behavior has been extinguished.
Behaviorists discovered that different patterns (or schedules) of reinforcement had different
effects on the speed of learning and extinction. Ferster and Skinner (1957) devised different
ways of delivering reinforcement and found that this had effects on:
1. The Response Rate – The rate at which the rat pressed the lever (i.e., how hard the rat
worked).
2. The Extinction Rate – The rate at which lever pressing dies out (i.e., how soon the rat
gave up).
Skinner found that variable-ratio reinforcement produces the slowest rate of extinction (i.e.,
people will continue repeating the behavior for the longest time without reinforcement). The
type of reinforcement with the quickest rate of extinction is continuous reinforcement.
Continuous reinforcement: An animal or human is positively reinforced every time a specific
behavior occurs, e.g., every time a lever is pressed, a pellet is delivered, and then food
delivery is shut off.
Fixed-ratio reinforcement: Behavior is reinforced only after it occurs a specified number of
times, e.g., one reinforcement is given after every 5th correct response. For example, a
child receives a star for every five words spelled correctly.
Fixed-interval reinforcement: One reinforcement is given after a fixed time interval,
providing at least one correct response has been made. An example is being paid by the
hour. Another example would be a pellet delivered every 15 minutes (half hour, hour, etc.),
providing at least one lever press has been made, and then food delivery is shut off.
Variable-ratio reinforcement: Behavior is reinforced after an unpredictable number of
responses.
Response rate is FAST
Extinction rate is SLOW (very hard to extinguish because of unpredictability)
Variable-interval reinforcement: Providing one correct response has been made,
reinforcement is given after an unpredictable amount of time has passed, e.g., on average
every 5 minutes. An example is a self-employed person being paid at unpredictable times.
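The delivery rules described in this section can be sketched in code, using the standard schedule names (continuous, fixed-ratio, variable-ratio, fixed-interval, variable-interval). This is an illustrative sketch only: the function names, the per-response probability used for the variable-ratio rule, and the uniform waiting time used for the variable-interval rule are assumptions, not Ferster and Skinner's procedures.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response (n=1 gives continuous reinforcement)."""
    count = 0

    def respond(now=0.0):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce each response with probability 1/mean_n (assumed rule),
    so rewards arrive after an unpredictable number of responses."""

    def respond(now=0.0):
        return rng.random() < 1.0 / mean_n

    return respond


def fixed_interval(seconds):
    """Reinforce the first response made once `seconds` have passed
    since the last reinforcement."""
    last = 0.0

    def respond(now):
        nonlocal last
        if now - last >= seconds:
            last = now
            return True
        return False

    return respond


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the required wait is redrawn each time
    (uniform between 0 and twice the mean, an assumption of this sketch)."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)

    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return respond


# Ten lever presses on a fixed-ratio schedule of 5.
fr5 = fixed_ratio(5)
fr5_outcomes = [fr5(now=t) for t in range(1, 11)]

# One lever press per second for a minute on a 15-second fixed interval.
fi15 = fixed_interval(15)
fi15_rewards = sum(fi15(now=t) for t in range(1, 61))
```

Here the fixed-ratio schedule reinforces the 5th and 10th presses, while the fixed-interval schedule pays out only four times in 60 seconds even though the lever is pressed every second.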
Applications In Psychology
There are different types of positive reinforcements. Primary reinforcement is when a reward
strengthens a behavior by itself. Secondary reinforcement is when something strengthens a
behavior because it leads to a primary reinforcer.
Examples of behavior modification therapy include token economy and behavior shaping.
Token Economy
Token economy is a system in which targeted behaviors are reinforced with tokens
(secondary reinforcers) and later exchanged for rewards (primary reinforcers).
Tokens can be in the form of fake money, buttons, poker chips, stickers, etc., while the
rewards can range anywhere from snacks to privileges or activities. For example, teachers use
token economy at primary school by giving young children stickers to reward good behavior.
Token economy has been found to be very effective in managing psychiatric patients.
However, the patients can become over-reliant on the tokens, making it difficult for them to
adjust to society once they leave prison, hospital, etc.
Staff implementing a token economy program have a lot of power. It is important that staff
do not favor or ignore certain individuals if the program is to work. Therefore, staff need to
be trained to give tokens fairly and consistently even when there are shift changes such as in
prisons or in a psychiatric hospital.
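The bookkeeping behind a token economy can be sketched as follows. The class name, the behaviors, the token values, and the reward costs are all hypothetical. Using one fixed table of token values is one way to picture the requirement that tokens be given fairly and consistently, whoever is on shift.

```python
class TokenEconomy:
    """Minimal ledger: target behaviors earn tokens, tokens buy rewards."""

    def __init__(self, token_values, reward_costs):
        self.token_values = token_values  # tokens earned per target behavior
        self.reward_costs = reward_costs  # tokens needed per reward
        self.balances = {}                # tokens currently held by each person

    def reinforce(self, person, behavior):
        """Award tokens for a target behavior; other behaviors earn nothing."""
        earned = self.token_values.get(behavior, 0)
        self.balances[person] = self.balances.get(person, 0) + earned
        return earned

    def exchange(self, person, reward):
        """Trade tokens for a reward if the person holds enough of them."""
        cost = self.reward_costs[reward]
        if self.balances.get(person, 0) < cost:
            return False
        self.balances[person] -= cost
        return True


economy = TokenEconomy(
    token_values={"tidied desk": 1, "helped classmate": 2},
    reward_costs={"snack": 3},
)
economy.reinforce("Sam", "tidied desk")
economy.reinforce("Sam", "helped classmate")
first_exchange = economy.exchange("Sam", "snack")   # enough tokens
second_exchange = economy.exchange("Sam", "snack")  # balance is now empty
```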
Behavior Shaping
A further important contribution made by Skinner (1951) is the notion of behavior shaping
through successive approximation.
Skinner argues that the principles of operant conditioning can be used to produce extremely
complex behavior if rewards and punishments are delivered in such a way as to
move an organism closer and closer to the desired behavior each time.
In shaping, the form of an existing response is gradually changed across successive trials
towards a desired target behavior by rewarding exact segments of behavior.
To do this, the conditions (or contingencies) required to receive the reward should shift each
time the organism moves a step closer to the desired behavior.
According to Skinner, most animal and human behavior (including language) can be
explained as a product of this type of successive approximation.
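Successive approximation can be sketched as a loop in which the criterion for reward tightens every time the organism meets it. Representing a response as a single number, modeling natural variation as Gaussian noise, and the names used here are all assumptions of this sketch rather than part of Skinner's account.

```python
import random


def shape(target=10.0, start=0.0, steps=200, seed=1):
    """Reward only responses closer to the target than any rewarded so far."""
    rng = random.Random(seed)
    typical = start                  # the organism's current typical response
    criterion = abs(target - start)  # how close a response must be to earn a reward
    history = [criterion]
    for _ in range(steps):
        # Behavior varies naturally around its current typical form.
        response = typical + rng.gauss(0, 1.0)
        if abs(target - response) < criterion:
            # Reward: the closer form becomes typical and the contingency shifts.
            typical = response
            criterion = abs(target - response)
            history.append(criterion)
    return typical, history


final_response, criteria = shape()
```

Each entry in `criteria` is stricter than the one before it, so the reward contingency shifts a step closer to the target behavior every time the organism improves.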
Educational Applications
In the conventional learning situation, operant conditioning applies largely to issues of class
and student management, rather than to learning content. It is very relevant to shaping skill
performance.
A variable-ratio schedule produces the highest response rate for students learning a new task, whereby
initial reinforcement (e.g., praise) occurs at frequent intervals, and as the performance
improves reinforcement occurs less frequently, until eventually only exceptional outcomes
are reinforced.
For example, if a teacher wanted to encourage students to answer questions in class they
should praise them for every attempt (regardless of whether their answer is correct).
Gradually the teacher will only praise the students when their answer is correct, and over time
only exceptional answers will be praised.
Unwanted behaviors, such as tardiness and dominating class discussion can be extinguished
through being ignored by the teacher (rather than being reinforced by having attention drawn
to them). This is not an easy task, as the teacher may appear insincere if he/she thinks too
much about the way to behave.
Learning type: Classical conditioning
Learning process: Over time, the person responds to the neutral stimulus as if it were the
unconditioned stimulus, even when presented alone. The response is involuntary and automatic.
An example is a dog salivating (response) at the sound of a bell (neutral stimulus) after it
has been repeatedly paired with food (unconditioned stimulus).
Learning type: Operant conditioning
Learning process: For instance, if a child gets praised (pleasant consequence) for cleaning
their room (behavior), they’re more likely to clean their room in the future.
Conversely, if they get scolded (unpleasant consequence) for not doing their homework,
they’re more likely to complete it next time to avoid the scolding.
The timing of the response relative to the stimulus differs between classical and operant
conditioning:
Classical Conditioning (response after the stimulus): In this form of conditioning, the
response occurs after the stimulus. The behavior (response) is determined by what
precedes it (stimulus).
For example, in Pavlov’s classic experiment, the dogs started to salivate (response) after
they heard the bell (stimulus) because they associated it with food.
Operant Conditioning (response before the stimulus): In this form of conditioning, the
response generally occurs before the consequence (which acts as the stimulus for future
behavior).
The behavior is influenced by the anticipated consequence, that is, by what follows it. It is a more active
form of learning, where behaviors are reinforced or punished, thus influencing their
likelihood of repetition.
Summary
Looking at Skinner’s classic studies on pigeons’ and rats’ behavior, we can identify some of
the major assumptions of the behaviorist approach.
• Psychology should be seen as a science, to be studied in a scientific manner. Skinner’s
study of behavior in rats was conducted under carefully controlled laboratory conditions.
• The major influence on human behavior is learning from our environment. In the Skinner
study, because food followed a particular behavior the rats learned to repeat that behavior,
i.e., operant conditioning.
• There is little difference between the learning that takes place in humans and that in other
animals. Therefore research (e.g., operant conditioning) can be carried out on animals (Rats /
Pigeons) as well as on humans. Skinner proposed that the way humans learn behavior is
much the same as the way the rats learned to press a lever.
So, if your layperson’s idea of psychology has always been of people in laboratories wearing
white coats and watching hapless rats try to negotiate mazes to get to their dinner, then you
are probably thinking of behavioral psychology.
Behaviorism and its offshoots tend to be among the most scientific of the psychological
perspectives. The emphasis of behavioral psychology is on how we learn to behave in certain
ways.
We are all constantly learning new behaviors and how to modify our existing behavior.
Behavioral psychology is the psychological approach that focuses on how this learning takes
place.
Critical Evaluation
Operant conditioning can explain a wide variety of behaviors, from the learning process to
addiction and language acquisition. It also has practical applications (such as token economy)
that can be used in classrooms, prisons, and psychiatric hospitals.
Researchers have found innovative ways to apply operant conditioning principles to promote
health and habit change in humans.
In a recent study, operant conditioning using virtual reality (VR) helped stroke patients use
their weakened limb more often during rehabilitation. Patients shifted their weight in VR
games by maneuvering a virtual object. When they increased weight on their weakened side,
they received rewards like stars. This positive reinforcement conditioned greater paretic limb
use (Kumar et al., 2019).
Another study utilized operant conditioning to assist smoking cessation. Participants earned
vouchers exchangeable for goods and services for reducing smoking. This reward system
reinforced decreasing cigarette use. Many participants achieved long-term abstinence
(Dallery et al., 2017).
Through repeated reinforcement, operant conditioning can facilitate forming exercise and
eating habits. A person trying to exercise more might earn TV time for every 10 minutes
spent working out. An individual aiming to eat healthier may allow themselves a daily dark
chocolate square for sticking to nutritious meals. Providing consistent rewards for desired
actions can instill new habits (Michie et al., 2009).
Apps like Habitica apply operant conditioning by gamifying habit tracking. Users earn points
and collect rewards in a fantasy game for completing real-life habits. This virtual
reinforcement helps ingrain positive behaviors (Eckerstorfer et al., 2019).
Operant conditioning also shows promise for managing ADHD and OCD. Rewarding
concentration and focus in ADHD children, for example, can strengthen their attention skills
(Rosén et al., 2018). Similarly, reinforcing OCD patients for resisting compulsions may
diminish obsessive behaviors (Twohig et al., 2010).
However, operant conditioning fails to take into account the role of inherited and cognitive
factors in learning, and thus is an incomplete explanation of the learning process in humans
and animals.
For example, Kohler (1924) found that primates often seem to solve problems in a flash of
insight rather than by trial-and-error learning. Also, social learning theory (Bandura, 1977)
suggests that humans can learn automatically through observation rather than through
personal experience.
The use of animal research in operant conditioning studies also raises the issue of
extrapolation. Some psychologists argue we cannot generalize from studies on animals to
humans as their anatomy and physiology are different from humans, and they cannot think
about their experiences and invoke reason, patience, memory or self-comfort.
Frequently Asked Questions
Operant conditioning was discovered by B.F. Skinner, an American psychologist, in the mid-
20th century. Skinner is often regarded as the father of operant conditioning, and his work
extensively dealt with the mechanism of reward and punishment for behaviors, with the
concept being that behaviors followed by positive outcomes are reinforced, while those
followed by negative outcomes are discouraged.
Operant conditioning differs from classical conditioning, focusing on how voluntary behavior
is shaped and maintained by consequences, such as rewards and punishments.
While both types of conditioning involve learning and behavior modification, operant
conditioning emphasizes the role of reinforcement and punishment in shaping voluntary
behavior.
Operant conditioning is a core component of social learning theory, which emphasizes the
importance of observational learning and modeling in acquiring and modifying behavior.
Social learning theory suggests that individuals can learn new behaviors by observing others
and the consequences of their actions, which is similar to the reinforcement and punishment
processes in operant conditioning.
By observing and imitating models, individuals can acquire new skills and behaviors and
modify their own behavior based on the outcomes they observe in others.
Overall, both operant conditioning and social learning theory highlight the importance of
environmental factors in shaping behavior and learning.
The downsides of using operant conditioning on individuals include the potential for
unintended negative consequences, particularly with the use of punishment. Punishment may
lead to increased aggression or avoidance behaviors.
Additionally, some behaviors may be difficult to shape or modify using operant conditioning
techniques, particularly when they are highly ingrained or tied to complex internal states.
Furthermore, individuals may resist changing their behaviors to meet the expectations of
others, particularly if they perceive the demands or consequences of the reinforcement or
punishment to be undesirable or unjust.
For example, a student may earn extra recess time (positive reinforcement) for completing
homework on time, or lose the privilege to use class computers (negative punishment) for
misbehavior.
References
Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice Hall.
Dallery, J., Meredith, S., & Glenn, I. M. (2017). A deposit contract method to deliver
abstinence reinforcement for cigarette smoking. Journal of Applied Behavior Analysis,
50(2), 234–248.
Eckerstorfer, L., Tanzer, N. K., Vogrincic-Haselbacher, C., Kedia, G., Brohmer, H.,
Dinslaken, I., & Corbasson, R. (2019). Key elements of mHealth interventions to
successfully increase physical activity: Meta-regression. JMIR mHealth and uHealth,
7(11), e12100.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-
Century-Crofts.
Kohler, W. (1924). The mentality of apes. London: Routledge & Kegan Paul.
Kumar, D., Sinha, N., Dutta, A., & Lahiri, U. (2019). Virtual reality-based balance training
system augmented with operant conditioning paradigm. Biomedical Engineering
Online, 18(1), 1-23.
Michie, S., Abraham, C., Whittington, C., McAteer, J., & Gupta, S. (2009). Effective
techniques in healthy eating and physical activity interventions: A meta-regression. Health
Psychology, 28(6), 690–701.
Rosén, E., Westerlund, J., Rolseth, V., Johnson, R. M., Viken Fusen, A., Årmann, E.,
Ommundsen, R., Lunde, L.-K., Ulleberg, P., Daae Zachrisson, H., & Jahnsen, H. (2018).
Effects of QbTest-guided ADHD treatment: A randomized controlled trial. European
Child & Adolescent Psychiatry, 27(4), 447–459.
Twohig, M. P., Whittal, M. L., Cox, J. M., & Gunter, R. (2010). An initial investigation
into the processes of change in ACT, CT, and ERP for OCD. International Journal of
Behavioral Consultation and Therapy, 6(2), 67–83.
Watson, J. B. (1913). Psychology as the behaviorist views it. Psychological Review, 20,
158–177.