
REINFORCEMENT


In operant conditioning, 'reinforcement' is an increase in the strength of a response following the presentation of a stimulus contingent on that response. Response strength can be assessed by measures such as the frequency with which the response is made (for example, a pigeon may increase the rate at which it pecks a key), or the speed with which it is made (for example, a rat may run a maze faster). The stimulus contingent on a response is called a 'reinforcer'. Reinforcement can only be confirmed retrospectively, as objects, items, food or other potential 'reinforcers' can only be called such by demonstrating increases in behavior after their administration. It is the strength of the response that is reinforced, not the organism.

Contents
Types of reinforcement
Descriptive types
Primary reinforcers
Secondary reinforcers
Other reinforcement terms
Schedules of reinforcement
Simple schedules
Effects of different types of simple schedules
Compound schedules
Shaping
Chaining
Controversies
History of the terms
See also
References
External links

Types of reinforcement


B.F. Skinner, the researcher who articulated the major theoretical constructs of reinforcement and behaviorism, refused to specify causal origins of reinforcers. Skinner argued that reinforcers are defined by a change in response strength (that is, functionally rather than causally), and that what is a reinforcer to one person may not be to another. Accordingly, activities, foods or items which are generally considered pleasant or enjoyable may not necessarily be reinforcing; they can only be considered so if the behavior that immediately precedes the potential reinforcer increases in similar future situations. If a child receives a cookie when he or she asks for one, and the frequency of 'cookie-requesting behavior' increases, the cookie can be seen as reinforcing 'cookie-requesting behavior'. If however, cookie-requesting behavior does not increase, the cookie cannot be considered reinforcing. The sole criterion which can determine if an item, activity or food is reinforcing is the change in the probability of a behavior after the administration of a potential reinforcer. Other theories may focus on additional factors such as whether the person expected the strategy to work at some point, but a behavioral theory of reinforcement would focus specifically upon the probability of the behavior.
The study of reinforcement has produced an enormous body of reproducible experimental results. Reinforcement is the central concept and procedure in the experimental analysis of behavior and much of quantitative analysis of behavior.
Descriptive types

There are two types of behavioral reinforcers and two types of behavioral punishers.

★ 'Positive reinforcement' is an increase in the likelihood of a behavior due to the ''addition'' of a reinforcer after a behavior. Giving (or ''adding'') food to a dog contingent on its remaining in a sitting position for a specified length of time is an example of positive reinforcement (if this increases the likelihood of the dog sitting in the future).

★ 'Negative reinforcement' is an increase in the likelihood of a behavior when the consequence is the ''removal'' of an aversive stimulus. Turning off (or ''removing'') a shock when a rat presses a bar is an example of negative reinforcement (if this increases the likelihood of the rat pressing the bar in the future).


★ ''Avoidance conditioning'' is a form of negative reinforcement that occurs when a behavior prevents an aversive stimulus from starting or being applied.


★ ''Escape conditioning'' is a form of negative reinforcement that occurs when behavior removes an aversive stimulus that has already started.

Punishment is the opposite of reinforcement, and causes the probability of behaviors to decrease after a punisher is applied. Like reinforcement, punishment comes in two forms:

★ 'Positive punishment' changes the surroundings by ''adding'' an aversive stimulus following a behaviour in order to decrease the likelihood of the behaviour occurring in the future. An example is shocking an animal whenever it presses a lever, after lever pressing had previously been reinforced.

★ 'Negative punishment' changes the surroundings by ''removing'' a stimulus that is a reinforcer. An example is removal of a food supply contingent on undesirable behavior.

           | decreases likelihood of behavior | increases likelihood of behavior
presented  | positive punishment              | positive reinforcement
taken away | negative punishment              | negative reinforcement
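
The two-by-two classification above can be sketched as a small lookup. This is only an illustration: the function name `classify_consequence` and its string arguments are hypothetical, chosen to mirror the row and column labels of the table.

```python
def classify_consequence(stimulus_change: str, behavior_change: str) -> str:
    """Name the procedure for a stimulus change and its observed effect.

    stimulus_change: "presented" or "taken away"
    behavior_change: "increases" or "decreases" (likelihood of the behavior)
    """
    polarity = {"presented": "positive", "taken away": "negative"}
    effect = {"increases": "reinforcement", "decreases": "punishment"}
    if stimulus_change not in polarity or behavior_change not in effect:
        raise ValueError("unknown stimulus change or behavior change")
    return f"{polarity[stimulus_change]} {effect[behavior_change]}"


# Turning off a shock (taken away) makes bar pressing more likely.
print(classify_consequence("taken away", "increases"))  # negative reinforcement
```

Note that the second argument is the observed change in behavior, so the label can only be assigned after the fact, in line with the functional definition given earlier.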

Distinguishing "positive" from "negative" can be difficult, and is often largely a matter of emphasis. For example, in a very warm room, a current of external air can be described as positive reinforcement because it is pleasantly cool, or as negative reinforcement because it removes uncomfortably hot air. Some reinforcement can be simultaneously positive and negative, such as a drug addict taking drugs both for the added euphoria and to eliminate withdrawal symptoms. Many behavioral psychologists simply refer to reinforcement or punishment, without polarity, to cover all consequent environmental changes.
Primary reinforcers

A 'primary reinforcer,' sometimes called an ''unconditioned reinforcer'', is a stimulus that does not require pairing to be reinforcing and is necessary for a species' survival. Examples of 'primary reinforcers' include sleep, food, air, water, and sex. Other primary reinforcers, such as certain drugs, may mimic the effects of other primary reinforcers. While these primary reinforcers are fairly stable through life and across individuals, the reinforcing value of different primary reinforcers varies due to multiple factors (e.g., genetics, experience). Thus, one person may prefer one type of food while another abhors it. Or one person may eat lots of food while another eats very little. So even though food is a primary reinforcer for both individuals, the value of food as a reinforcer differs between them.
Often primary reinforcers shift their reinforcing value temporarily through 'satiation' and 'deprivation.' Food, for example, will cease to be effective in increasing response strength once a certain amount has been consumed (satiation). After a period during which it does not receive any of the primary reinforcer (deprivation), however, the primary reinforcer will once more be effective in increasing response strength.
Secondary reinforcers

A 'secondary reinforcer', sometimes called a ''conditioned reinforcer'', is a stimulus or situation that has acquired reinforcing power after being associated with a primary reinforcer or an earlier conditioned reinforcer (such as money). An example of a secondary reinforcer would be a clicker, as used in clicker training. A dog associates the clicker with praise or treats, and now the clicker is reinforcing. As with primary reinforcers, an organism can experience satiation and deprivation with secondary reinforcers.
Other reinforcement terms


★ A 'generalized reinforcer' is a conditioned reinforcer that has been paired with many other reinforcers (such as money, a secondary generalized reinforcer).

★ In 'reinforcer sampling' a potentially reinforcing but unfamiliar stimulus is presented to an animal without regard to any prior behavior. The stimulus may then later be used more effectively in reinforcement.

★ 'Social reinforcement' involves various sorts of access to and interaction with others.

★ 'Premack principle' is a special case of reinforcement elaborated by David Premack, which states that a commonly occurring action can be used effectively as a reinforcer for a less commonly occurring one.

★ 'Reinforcement hierarchy' is a list of actions, starting with the most desirable and ending with the least desirable. A reinforcement hierarchy can be used to determine the relative frequency and desirability of different actions, and is employed when applying the Premack principle.

★ 'Contingent' outcomes are more likely to reinforce behavior than non-contingent outcomes. Contingent outcomes are those directly linked to a causal behavior, such as a light turning on being contingent on flipping a switch. Note that contingent outcomes are 'not' necessary to demonstrate reinforcement, but perceived contingency may increase learning.

★ 'Contiguous' outcomes are those closely associated in time and space with specific behaviors. Contiguity reduces the amount of time needed to learn a behavior while increasing its resistance to extinction. Giving a dog a piece of food immediately after sitting is more contiguous with (and therefore more likely to reinforce) sitting behavior than giving the dog food several minutes after sitting.

Schedules of reinforcement


When an animal's surroundings are controlled, its behavior patterns after reinforcement become predictable, even for very complex behavior patterns. A 'schedule of reinforcement' is the protocol for determining when responses or behaviors will be reinforced, ranging from continuous reinforcement, in which every response is reinforced, to extinction, in which no response is reinforced. Between these extremes is ''intermittent'' or ''partial reinforcement'', where only some responses are reinforced.
Specific variations of intermittent reinforcement reliably induce specific patterns of response, irrespective of the species being investigated (including humans in some conditions). The orderliness and predictability of behaviour under schedules of reinforcement was evidence for B. F. Skinner's claim that using operant conditioning he could obtain "control over behaviour", in a way that rendered the theoretical disputes of contemporary comparative psychology obsolete. The reliability of schedule control supported the idea that a radical behaviourist experimental analysis of behavior could be the foundation for a psychology that did not refer to mental or cognitive processes. The reliability of schedules also led to the development of Applied Behavior Analysis as a means of controlling or altering behavior.
Many of the simpler possibilities, and some of the more complex ones, were investigated at great length by Skinner using pigeons, but new schedules continue to be defined and investigated.
Simple schedules

Figure: a chart showing the different response rates of the four simple schedules of reinforcement. Each hatch mark designates a reinforcer being given.

Simple schedules have a single rule to determine when a single type of reinforcer is delivered for a specific response.

★ 'Fixed ratio' (FR) schedules deliver reinforcement after every ''n''th response


★ 'Example:' FR2 = every second response is reinforced


★ 'Lab example:' FR5 = rat reinforced with food after every 5 bar presses in a Skinner box.


★ 'Real-world example:' FR10 = used car dealer gets a $1000 bonus for every 10 cars sold on the lot.

★ 'Continuous reinforcement' (CRF) schedules are a special form of fixed ratio (FR1). In a continuous reinforcement schedule, reinforcement follows each and every response.


★ 'Lab example:' each time a rat presses a bar it gets a pellet of food


★ 'Real world example:' each time a dog defecates outside, its owner gives it a treat

★ 'Fixed interval' (FI) schedules deliver reinforcement for the first response after a fixed length of time since the last reinforcement, while premature responses are not reinforced.


★ 'Example:' FI1" = reinforcement provided for the first response after 1 second


★ 'Lab example:' FI15" = rat is reinforced for the first bar press after 15 seconds have passed since the last reinforcement


★ 'Real world example:' FI 24 hours = calling a radio station is reinforced with a chance to win a prize, but the person can only sign up once per day

★ 'Variable ratio' (VR) schedules deliver reinforcement after a random number of responses (based upon a predetermined average)


★ 'Example:' VR3 = on average, every third response is reinforced


★ 'Lab example:' VR10 = on average, a rat is reinforced for every 10 bar presses


★ 'Real world example:' VR100 = on average, a particular bachelor will get the phone number of one in every 100 bachelorettes he approaches

★ 'Variable interval' (VI) schedules deliver reinforcement for the first response after a variable length of time (based upon a predetermined average) has passed since the last reinforcement


★ 'Example:' VI3" = reinforcement is provided for the first response after an average of 3 seconds since the last reinforcement.


★ 'Lab example:' VI10" = a rat is reinforced for the first bar press after an average of 10 seconds passes since the last reinforcement


★ 'Real world example:' a predator can expect to come across a prey on a variable interval schedule
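
The four schedule rules above (FR, VR, FI, VI) can be sketched as small classes that decide, response by response, whether a reinforcer is delivered. This is a minimal sketch under stated assumptions, not a standard implementation: the class names, the `respond(now)` method, the helper `reinforced_times`, the uniform distributions used for the variable schedules, and starting the interval clock at time 0 are all illustrative choices that do not come from the article.

```python
import random


class FixedRatio:
    """FR n: every n-th response is reinforced."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, now):
        self.count += 1
        if self.count >= self.n:
            self.count = 0  # start counting toward the next reinforcer
            return True
        return False


class VariableRatio:
    """VR n: the required number of responses varies around a mean of n."""

    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 1 .. 2*mean - 1, which averages to `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, now):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """FI t: first response at least t seconds after the last reinforcer."""

    def __init__(self, interval):
        self.interval = interval
        self.last_reinforcer = 0.0  # assumption: the clock starts at time 0

    def respond(self, now):
        if now - self.last_reinforcer >= self.interval:
            self.last_reinforcer = now
            return True
        return False  # premature responses are not reinforced


class VariableInterval:
    """VI t: like FI, but the wait varies around a mean of t seconds."""

    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.last_reinforcer = 0.0
        self.wait = self._draw()

    def _draw(self):
        # Assumption: uniform on 0 .. 2*mean seconds, averaging to `mean`.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, now):
        if now - self.last_reinforcer >= self.wait:
            self.last_reinforcer = now
            self.wait = self._draw()
            return True
        return False


def reinforced_times(schedule, response_times):
    """Return the response times at which a reinforcer was delivered."""
    return [t for t in response_times if schedule.respond(t)]


# One response per second for 20 seconds.
seconds = range(1, 21)
print(reinforced_times(FixedRatio(5), seconds))      # [5, 10, 15, 20]
print(reinforced_times(FixedInterval(15), seconds))  # [15]
```

With the same steady responding, FR5 pays off four times while FI15" pays off once, which illustrates why ratio schedules reward fast responding and interval schedules do not.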
Other simple schedules include:

★ 'Differential reinforcement of incompatible behavior' (DRI) is used to reduce a frequent behavior without punishing it by reinforcing an incompatible response. An example would be reinforcing clapping to reduce nose picking.

★ 'Differential reinforcement of other behavior' (DRO) is used to reduce a frequent behavior by reinforcing ''any'' behavior other than the undesired one. An example would be reinforcing any hand action other than nose picking.

★ 'Differential reinforcement of low response rate' (DRL) is used to encourage low rates of responding. It is like an interval schedule, except that premature responses reset the time required between behaviors.


★ 'Lab example:' DRL10" = a rat is reinforced for the first response after 10 seconds, but if the rat responds earlier than 10 seconds there is no reinforcement and the rat has to wait 10 seconds from that premature response without another response before bar pressing will lead to reinforcement.


★ 'Real world example:' "If you ask me for a potato chip no more than once every 10 minutes, I will give it to you. If you ask more often, I will give you none."

★ 'Differential reinforcement of high rate' (DRH) is used to encourage high rates of responding. It is like an interval schedule, except that a minimum number of responses are required in the interval in order to receive reinforcement.


★ 'Lab example:' DRH10"/15 responses = a rat must press a bar 15 times within a 10 second increment in order to be reinforced


★ 'Real world example:' "If Lance Armstrong is going to win the Tour de France, he has to pedal x number of times during the y-hour race."

★ 'Fixed Time' (FT) provides reinforcement at a fixed time since the last reinforcement, irrespective of whether the subject has responded or not. In other words, it is a non-contingent schedule.


★ 'Lab example:' FT5" = rat gets food every 5 seconds regardless of its behavior.


★ 'Real world example:' a person gets an annuity check every month regardless of behavior between checks

★ 'Variable Time' (VT) provides reinforcement after a variable time (around a predetermined average) since the last reinforcement, regardless of whether the subject has responded or not

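
The DRL, DRH and FT rules above can be sketched in the same style. Again this is a hedged illustration: the names `DRL`, `DRH` and `fixed_time_deliveries` are hypothetical, the clock is assumed to start at time 0, and DRH is modelled with a sliding time window, which is one possible reading of "within a 10 second increment" (fixed, non-overlapping windows would be another).

```python
from collections import deque


class DRL:
    """DRL t: a response is reinforced only if at least t seconds have
    passed since the previous response; every response restarts the clock."""

    def __init__(self, min_gap):
        self.min_gap = min_gap
        self.last_response = 0.0  # assumption: the clock starts at time 0

    def respond(self, now):
        reinforced = (now - self.last_response) >= self.min_gap
        self.last_response = now  # early or not, the wait starts over
        return reinforced


class DRH:
    """DRH t/n: reinforce once n responses fall within the last t seconds.

    Assumption: a sliding window is used rather than fixed time bins.
    """

    def __init__(self, window, required):
        self.window = window
        self.required = required
        self.recent = deque()  # times of responses still inside the window

    def respond(self, now):
        self.recent.append(now)
        while now - self.recent[0] >= self.window:
            self.recent.popleft()  # forget responses that are too old
        if len(self.recent) >= self.required:
            self.recent.clear()  # the count starts over after a reinforcer
            return True
        return False


def fixed_time_deliveries(period, session_length):
    """FT: delivery times depend only on the clock, never on responses."""
    return list(range(period, session_length + 1, period))


drl = DRL(10)
print([drl.respond(t) for t in (10, 15, 25, 30, 41)])
# [True, False, True, False, True]

drh = DRH(window=10, required=3)
print([drh.respond(t) for t in (0, 1, 2)])  # [False, False, True]

print(fixed_time_deliveries(5, 20))  # [5, 10, 15, 20]
```

In the DRL run, the responses at 15 and 30 seconds come too soon after the previous response, so they earn nothing and also postpone the next opportunity.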
Effects of different types of simple schedules


★ Ratio schedules produce higher rates of responding than interval schedules

★ Variable schedules produce higher rates and greater resistance to extinction than most fixed schedules

★ The variable ratio schedule produces both the highest rate of responding and the greatest resistance to extinction (an example would be the behavior of gamblers at slot machines)

★ Fixed schedules produce 'post-reinforcement pauses' (PRP), where responses will briefly cease immediately following reinforcement


★ The PRP of a fixed interval schedule is typically followed by a gradually accelerating, scallop-shaped pattern of responding, while responding after the PRP of a fixed ratio schedule is more angular.

★ Organisms whose schedules of reinforcement are 'thinned' (that is, requiring more responses or a greater wait before reinforcement) may experience 'ratio strain' if thinned too quickly. This produces behavior similar to that seen during extinction.

★ Partial reinforcement schedules are more resistant to extinction than continuous reinforcement schedules.


★ Ratio schedules are more resistant than interval schedules and variable schedules more resistant than fixed ones.
Compound schedules

Compound schedules combine two or more different simple schedules in some way using the same reinforcer for the same behaviour. There are many possibilities; among those most often used are:

★ 'Multiple schedules' - either of two, or more, schedules may occur with a stimulus indicating which is in force.


★ 'Example': FR4 when given a whistle and FI 6 when given a bell ring.

★ 'Mixed schedules' - either of two, or more, schedules may occur with no stimulus indicating which is in force.


★ 'Example': FI6 and then VR 3 without any stimulus warning of the change in schedule.

★ 'Concurrent schedules' - two schedules are simultaneously in force though not necessarily on two different response devices.

★ 'Chained schedules' - reinforcement occurs after two or more successive schedules have been completed, with a stimulus indicating when one schedule has been completed and the next has started.


★ 'Example': FR10 under a green light; when it is completed, a yellow light indicates FR 3; when that is completed, a red light indicates VI 6, etc. At the end of the chain, a reinforcer is given.

★ 'Tandem schedules' - reinforcement occurs when two or more successive schedule requirements have been completed, with no stimulus indicating when a schedule has been completed and the next has started.


★ 'Example': VR 10, after it is completed the schedule is changed without warning to FR 10, after that it is changed without warning to FR 16, etc. At the end of the series of schedules, a reinforcer is finally given.

★ 'Higher order schedules' - completion of one schedule is reinforced according to a second schedule; e.g. in FR2 (FI 10 secs), two successive fixed interval schedules would have to be completed before a response is reinforced.
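
A chained schedule can be sketched by running simple schedules one after another and delivering the reinforcer only when the last one is completed. The sketch below is an assumption-laden illustration: the class names and the `respond(now)` method are hypothetical, and the chain is shortened to two fixed ratio links (FR10 under a green light, then FR 3 under a yellow light) to keep the output deterministic.

```python
class FixedRatio:
    """FR n: the component is completed on every n-th response."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, now):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class ChainedSchedule:
    """Components run in order, each signalled by its own stimulus.

    Only completing the final component delivers the reinforcer.
    """

    def __init__(self, links):
        self.links = links  # list of (stimulus, schedule) pairs
        self.index = 0      # which component is currently in force

    @property
    def stimulus(self):
        """The signal shown to the subject for the current component."""
        return self.links[self.index][0]

    def respond(self, now):
        _, schedule = self.links[self.index]
        if not schedule.respond(now):
            return False
        # Component completed: move to the next one, wrapping at the end.
        self.index = (self.index + 1) % len(self.links)
        return self.index == 0  # True only when the whole chain is done


chain = ChainedSchedule([("green", FixedRatio(10)), ("yellow", FixedRatio(3))])
reinforced_at = [t for t in range(1, 27) if chain.respond(t)]
print(reinforced_at)  # [13, 26]
```

A tandem schedule could use the same logic while never exposing `stimulus` to the subject, since the only difference described above is whether the change of component is signalled.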

Shaping


'Shaping' involves reinforcing successive, increasingly accurate approximations of a response desired by a trainer. In training a rat to press a lever, for example, simply turning toward the lever will be reinforced at first. Then, only turning and stepping toward it will be reinforced. As training progresses, the response reinforced becomes progressively more like the desired behavior.
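
The idea of a progressively stricter criterion can be sketched as follows. This is only an illustration of the procedure described above: the `Shaper` class, the extra "touch lever" step, and the rule of advancing after a fixed number of reinforced responses are hypothetical choices, not part of the article.

```python
STEPS = ["turn toward lever", "step toward lever", "touch lever", "press lever"]


class Shaper:
    """Reinforce successive approximations, tightening the criterion over time."""

    def __init__(self, steps, successes_to_advance=3):
        self.steps = steps
        self.successes_to_advance = successes_to_advance  # hypothetical rule
        self.level = 0      # index of the weakest approximation still reinforced
        self.successes = 0  # reinforced responses at the current level

    def observe(self, behavior):
        """Return True if the behavior is reinforced under the current criterion."""
        if behavior not in self.steps:
            return False
        if self.steps.index(behavior) < self.level:
            return False  # earlier approximations are no longer reinforced
        self.successes += 1
        at_final_step = self.level == len(self.steps) - 1
        if self.successes >= self.successes_to_advance and not at_final_step:
            self.level += 1  # demand a closer approximation from now on
            self.successes = 0
        return True


trainer = Shaper(STEPS, successes_to_advance=2)
print(trainer.observe("turn toward lever"))  # True
print(trainer.observe("turn toward lever"))  # True, criterion now tightens
print(trainer.observe("turn toward lever"))  # False
print(trainer.observe("step toward lever"))  # True
```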

Chaining


'Chaining' involves linking discrete behaviors together in a series, such that the result of each behavior is both the reinforcement (or consequence) for the previous behavior and the stimulus (or antecedent) for the next behavior. There are many ways to teach chaining, such as forward chaining (starting from the first behavior in the chain), backward chaining (starting from the last behavior) and total task chaining (in which the entire behavior is taught from beginning to end, rather than as a series of steps).
An example would be opening a locked door. First the key is inserted, then turned, then the door opened. Forward chaining would teach the subject first to insert the key. Once that task is mastered, they are told to insert the key, and taught to turn it. Once that task is mastered, they are told to perform the first two steps, then taught to open the door.
Backward chaining would involve the teacher first inserting and turning the key, while the subject is taught to open the door. Once that is learned, the teacher inserts the key, and the subject is taught to turn it, then opens the door as the next step. Finally, the subject is taught to insert the key, and then turns it and opens the door. Once this first step is mastered, the entire task has been taught.
Total task chaining would involve teaching the entire task as a single series, prompting through all steps. Prompts are faded (reduced) at each step as it is mastered.
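
The order of teaching in forward and backward chaining can be sketched by listing, for each stage, who performs which steps and which step is newly taught. The function names and the dictionary layout are hypothetical; the door example is the one from the text.

```python
def forward_chaining(steps):
    """Teach the first step first; each stage adds the next step in the chain."""
    stages = []
    for k in range(len(steps)):
        stages.append({
            "teacher": [],              # the teacher performs no steps
            "learner": steps[:k + 1],   # the learner performs steps up to the new one
            "taught": steps[k],         # the step being taught at this stage
        })
    return stages


def backward_chaining(steps):
    """Teach the last step first; the teacher performs whatever comes before."""
    stages = []
    for k in range(len(steps) - 1, -1, -1):
        stages.append({
            "teacher": steps[:k],   # steps the teacher still performs
            "learner": steps[k:],   # the learner finishes the chain from here
            "taught": steps[k],     # the step being taught at this stage
        })
    return stages


door = ["insert key", "turn key", "open door"]

for stage in backward_chaining(door):
    print(stage["teacher"], "->", stage["learner"], "| taught:", stage["taught"])
# ['insert key', 'turn key'] -> ['open door'] | taught: open door
# ['insert key'] -> ['turn key', 'open door'] | taught: turn key
# [] -> ['insert key', 'turn key', 'open door'] | taught: insert key
```

In both variants the final stage has the learner performing the whole chain; they differ only in which end of the chain is mastered first.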

Controversies


The standard definition of behavioral reinforcement has been criticized as circular, since it appears to argue that response strength is increased by reinforcement while defining reinforcement as something which increases response strength; that is, the standard definition says only that response strength is increased by things which increase response strength. Other definitions have been proposed, such as F. D. Sheffield's "consummatory behavior contingent on a response," but these are not broadly used in psychology.[1]

History of the terms


In the 1920s Russian physiologist Ivan Pavlov may have been the first to use the word ''reinforcement'' with respect to behavior, but (according to Dinsmoor) he used its approximate Russian cognate sparingly, and even then it referred to strengthening an already-learned but weakening response. He did not use it, as it is today, for selecting and strengthening new behavior. Pavlov's introduction of the word ''extinction'' (in Russian) approximates today's psychological use.
In popular use, ''positive reinforcement'' is often used as a synonym for ''reward'', with people (not behavior) thus being "reinforced," but this is contrary to the term's consistent technical usage. ''Negative reinforcement'' is often used by laypeople and even social scientists outside psychology as a synonym for ''punishment''. This is contrary to modern technical use, but it was B. F. Skinner who first used it this way in his 1938 book. By 1953, however, he followed others in thus employing the word ''punishment'', and he re-cast ''negative reinforcement'' for the removal of aversive stimuli.

See also



Dog training

Reward system

Reinforcement learning

Society for Quantitative Analysis of Behavior

References



1. Franco J. Vaccarino, Bernard B. Schiff, and Stephen E. Glickman (1989). Biological view of reinforcement. In Stephen B. Klein and Robert Mowrer, ''Contemporary learning theories: Instrumental conditioning theory and the impact of biological constraints on learning''. Hillsdale, NJ: Lawrence Erlbaum Associates.


See also:

★ Chance, Paul. (2003) ''Learning and Behavior.'' 5th edition Toronto: Thomson-Wadsworth.

★ Dinsmoor, James A. (2004) "The etymology of basic concepts in the experimental analysis of behavior." ''Journal of the Experimental Analysis of Behavior'', '82' (3): 311-316.

★ Ferster, C. B., & Skinner, B. F. (1957). ''Schedules of reinforcement''. New York: Appleton-Century-Crofts

★ Michael, Jack. (1975) "Positive and negative reinforcement, a distinction that is no longer necessary; or a better way to talk about bad things." ''Behaviorism'', '3' (1): 33-44.

★ Skinner, B. F. (1938). ''The behavior of organisms''. New York: Appleton-Century-Crofts.

★ Skinner, B. F. (1956). A case history in scientific method. ''American Psychologist, 11'', 221-33.

★ Zeiler, M. D. (1968). Fixed and variable schedules of response-independent reinforcement. ''Journal of the Experimental Analysis of Behavior, 11'', 405–414.

Glossary of terms on clickertraining.com

Glossary of reinforcement terms at the University of Iowa

External links



An On-Line Positive Reinforcement Tutorial

Scholarpedia Reinforcement

This article provided by Wikipedia.