Instrumental Conditioning:  Determining Conditions

     Another fundamentally important question in the study of instrumental conditioning asks "what are the determining (i.e., important) factors governing instrumental learning?"  Another, perhaps related, question asks "what are the determining factors governing instrumental performance (once learning has already taken place)?"  The first of these questions is exactly parallel to the determining-conditions question we examined in Pavlovian learning.  Our treatment of the second question will center on how motivational concepts (drive, incentive contrast, incentive learning) might influence instrumental responding.

A.  Determining Factors in Instrumental Learning

            1.  Temporal Contiguity.  One factor shown to be influential in Pavlovian learning is the time between the two events to be associated.  Here, this refers to the time between the response and the reinforcer.  Response-reinforcer temporal contiguity has been studied in a number of ways, but the most straightforward recent study was reported by Dickinson, Watt, & Griffiths (1992).  These investigators varied contiguity between groups.  Different groups of rats were given a number of instrumental conditioning sessions in which the lever press response was followed by a food pellet reinforcer after different delays (0, 2, 4, 16, 32, or 64 sec), according to some schedule of reinforcement.  They observed that after a number of sessions there was an orderly function in which response rate decreased as the delay increased, with maximum responding occurring when there was no delay to reinforcement.  These results clearly demonstrate that response-reinforcer temporal contiguity is an important variable in determining instrumental learning.
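
            One way to picture why delayed reinforcers support less learning is to assume that the credit a response receives decays with the response-reinforcer delay.  The sketch below illustrates this idea with an exponential decay; the exponential form and the decay parameter are illustrative assumptions, not part of Dickinson, Watt, & Griffiths' analysis.

    import math

    def credit(delay_sec, tau=8.0):
        # Hypothetical credit assigned to a response that is reinforced
        # after delay_sec seconds; tau (an assumed constant) controls how
        # quickly the response's eligibility for strengthening decays.
        return math.exp(-delay_sec / tau)

    # The delays studied by Dickinson, Watt, & Griffiths (1992):
    for d in [0, 2, 4, 16, 32, 64]:
        print(f"{d:>2} sec delay -> relative learning {credit(d):.2f}")

On this toy model, learning falls off smoothly as the delay grows, mirroring the orderly decreasing function the investigators observed.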

            2.  Response-Reinforcer Contingency.  Investigators have asked whether response-reinforcer contingency or contiguity is more fundamental in instrumental learning.  Skinner presented some provocative thoughts on this in his "superstition" experiment with pigeons.  He simply placed pigeons, individually, in operant chambers and delivered grain reinforcement every 15 sec, independent of the bird's behavior.  He noticed that the birds developed stereotyped behaviors in response to this regime, and he interpreted this result to imply that even though no response-reinforcer contingency applied, the birds learned as though such a contingency were in effect.  Specifically, he suggested that occasional chance pairings between a response that the bird happened to engage in at the time of reinforcement and the reinforcer itself were sufficient to generate instrumental learning.  Others have argued, however, that this experiment was flawed in the sense that the birds could have learned a temporal expectation of when grain would occur, and that Pavlovian CRs may account for the behaviors that Skinner observed and attributed to instrumental learning due to chance pairings.
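
            The positive feedback loop Skinner had in mind can be illustrated with a toy simulation: whatever behavior happens to coincide with a grain delivery is strengthened, which makes it more likely to coincide with the next delivery.  The behaviors, the strengthening increment, and the choice rule below are all invented for illustration; this is the proposed mechanism in schematic form, not Skinner's procedure or data.

    import random

    random.seed(1)
    behaviors = ["turn", "peck floor", "head bob", "wing flap"]  # hypothetical repertoire
    weights = [1.0] * len(behaviors)  # relative probability of emitting each behavior

    # Grain arrives every 15 sec regardless of behavior.  Whatever behavior
    # happens to coincide with a delivery is adventitiously strengthened,
    # making it more likely to coincide with the next delivery as well.
    for _ in range(100):
        b = random.choices(range(len(behaviors)), weights=weights)[0]
        weights[b] += 0.5  # assumed strengthening increment

    for name, w in zip(behaviors, weights):
        print(f"{name:<10} final weight {w:.1f}")

Because strengthened behaviors are more likely to be strengthened again, repeated runs typically end with an arbitrary behavior accumulating far more strength than the rest, which is one way to read the stereotypy Skinner reported.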

            Later, Hammond (1980) ran an experiment with rats that directly paralleled those performed by Rescorla in the study of contingency effects in Pavlovian learning.  Hammond divided the session into many 1-sec intervals.  If the rat pressed the lever during a given 1-sec interval, then food reinforcement would be delivered with some probability (e.g., .05).  He trained rats to press the lever in this way, but after several conditioning sessions he degraded the contingency.  In a zero contingency group, the rats continued to receive reinforcement with the same probability for lever pressing, but food pellets were now also delivered with an equal probability during 1-sec intervals in which the rat had not pressed the lever.  Rats in a negative contingency group received reinforcement only when they did not lever press.  Hammond observed that rats in both groups dramatically reduced their response rates when the contingency was reduced.  However, it is unclear from these results whether they reflect an effect of the reduced contingency on instrumental learning or on performance.  For example, it could be the case that competing responses were being reinforced when pellets occurred without a lever press.  These other behaviors could have competed with lever pressing to lower its overall level.  This means that either contiguity or contingency could have governed instrumental learning in this situation.
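
            Hammond's manipulation is usually summarized with the contingency statistic ΔP, the difference between the probability of reinforcement in a 1-sec interval containing a response and the probability in an interval without one.  Using the example probability of .05 throughout for illustration:

    ΔP = P(food | press) - P(food | no press)

    Positive contingency (training):  ΔP = .05 - .00 = +.05
    Zero contingency:                 ΔP = .05 - .05 =  .00
    Negative contingency:             ΔP = .00 - .05 = -.05

Both of the degraded schedules reduce ΔP from its training value, and both reduced responding.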

            Colwill & Rescorla (1986) ran an experiment similar to Hammond's but one that controlled for the possibility that competing responses could play a role.  They conditioned two different responses that were each reinforced with a different reinforcer.  The response-reinforcer contingency for only one of these responses was then degraded by additionally delivering reinforcers of only one type at times when neither of the two instrumental responses had occurred.  They observed that this treatment reduced the response whose contingency had been degraded more than the response whose contingency remained intact.  Because any competing responses should have competed equally with both responses, this difference must reflect an effect of the contingency manipulation on instrumental learning, not performance.
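
            The logic of the design can be summarized schematically (R1, R2, O1, and O2 are generic labels for the two responses and two reinforcers, not Colwill & Rescorla's own notation):

    Training:     R1 -> O1    and    R2 -> O2
    Degradation:  free deliveries of O1 added
                  (R1's contingency degraded; R2's left intact)
    Result:       R1 declines more than R2

Any behavior reinforced by the free O1 deliveries should interfere with R1 and R2 equally, so the selective drop in R1 points to an effect on learning rather than response competition.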

            3.  Reinforcer Surprise.  Pearce and Hall (1978) examined the impact of reinforcer surprise on instrumental learning.  They studied this question with rats learning to press a lever for a food pellet.  They had two groups of rats that were each trained to press the lever according to an intermittent schedule of reinforcement.  For Group Correlated, whenever the lever press response was followed by reinforcement, a 1-sec light stimulus occurred immediately after the response and just before the food pellet.  The light did not occur when the response was not reinforced.  In Group Uncorrelated, the light could occur equally often after both reinforced and nonreinforced lever press responses.  It is important to realize that the two groups were run on the same intermittent schedule of reinforcement; they differed only in whether the light was correlated with the food pellet.  What they found was that Group Uncorrelated came to respond at higher levels over the course of training than Group Correlated.  The obvious interpretation is that this was due to a reduced amount of instrumental learning (of the lever press-food association) in Group Correlated, because when the pellet occurred it was rendered non-surprising by the light.  If surprising reinforcers are more effective at reinforcing instrumental responses than non-surprising ones, then there should have been less learning in Group Correlated.
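
            This interpretation can be made concrete with a Rescorla-Wagner-style error-correction rule, in which each learning increment is proportional to how surprising the pellet is (the prediction error).  The sketch below is schematic: the parameter values are arbitrary, and the uncorrelated light is simply omitted from the prediction for simplicity.

    def train(light_predicts_pellet, alpha=0.2, lam=1.0, trials=50):
        # Growth of the response-food association when each learning step
        # is proportional to the prediction error (lam - total prediction).
        # If the light reliably precedes the pellet, the light's own
        # association absorbs part of the error, leaving less for the response.
        v_resp, v_light = 0.0, 0.0
        for _ in range(trials):
            v_total = v_resp + (v_light if light_predicts_pellet else 0.0)
            error = lam - v_total              # surprise at the pellet
            v_resp += alpha * error
            if light_predicts_pellet:
                v_light += alpha * error       # the light competes for learning
        return v_resp

    print("Correlated light  :", round(train(True), 2))   # about 0.50
    print("Uncorrelated light:", round(train(False), 2))  # about 1.00

Because the correlated light comes to predict the pellet, it absorbs part of the prediction error and leaves less to strengthen the response-food association, which parallels the deficit seen in Group Correlated.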

            4.  Response-Reinforcer Relevance.  Shettleworth (1975) ran an experiment in which she examined the role of the motivational relevance of the response to the reinforcer in instrumental learning.  She hypothesized that responses and reinforcers that are relevant to the same motivational system would be associated more easily than responses and reinforcers that come from different motivational classes.  She first identified several behaviors (in hamsters) that appear relevant to food, as measured by increases in their levels of occurrence when the animals were food deprived compared to when they were not.  Then she reinforced different subgroups of animals for making one particular response.  Three groups were each reinforced for a different response that was "relevant" to hunger motivation, and three groups were each reinforced for a different response determined not to be relevant to hunger motivation.  She observed that the groups with "relevant" responses reinforced by food showed increases in responding over training, but the groups with the non-relevant responses showed little or no increase over training, in spite of the fact that their responses were reinforced in just the same way.  One explanation for these data is that response-reinforcer relevance is an important determining condition for instrumental learning.  Apparently, some combinations of response and reinforcer are better learned about than others.

            However, another interpretation of these data is simply that the "relevant" behaviors have a higher baseline of occurrence, and responses with higher baselines may be more easily reinforced than behaviors with lower baselines.  This interpretation was questioned by Shettleworth in a follow-up experiment in which she increased the baseline level of occurrence of one of the "non-relevant" responses, face washing, by spraying a mist of water on the hamsters' faces, and still found that food reinforcement was ineffective at increasing levels of this response.

            Though these data support the view that learning about some response-reinforcer combinations is easier than learning about others, no research has investigated this topic using the more powerful experimental design used by Garcia & Koelling (1966), who investigated this sort of effect in Pavlovian learning.


B.  Determining Factors in Instrumental Performance

            1.  Drive.  In a classic experiment, Clark (1958) demonstrated that the animal's drive level can influence instrumental performance.  He trained hungry rats to lever press for food reinforcement.  Then he tested different groups of rats during an extinction test, where the groups differed in their level of food deprivation.  He found that rats responded more in this test the more food deprived they were.  This suggests that the general "drive" level can modulate how much the animal responds.  Specifically, a high hunger drive may generally energize the animal's behavior, and this may explain why the lever press response was also elevated.
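
            This pattern is naturally described by Hull's classic multiplicative formulation, in which the strength of performance (excitatory potential, E) is the product of learned habit strength (H) and drive (D):

    E = H × D

Since all of Clark's rats received the same training, H was equated across groups, so the differences in extinction responding track differences in D alone.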

            2.  Incentive Contrast.  Another factor shown to influence the amount of instrumental responding is the incentive value that is assigned to the reinforcer.  Mellgren (1972) demonstrated how the incentive value of a food reinforcer partly depends upon the prior experience that a subject has with that and related reinforcers.  In his study, rats ran down a runway in order to obtain food reinforcement.  In one group, 2 pellets were placed in the goal box, while 22 pellets were placed in the goal box for a second group.  During phase 2 of the experiment, rats who had earlier been reinforced with 2 pellets either continued to be reinforced with 2 pellets (Gp 2-2) or were switched to receiving 22 pellets (Gp 2-22).  Conversely, rats who had earlier been reinforced with 22 pellets either continued to be reinforced with 22 pellets (Gp 22-22) or were switched to receiving 2 pellets (Gp 22-2).  Mellgren observed that rats who had been upshifted (Gp 2-22) ran faster than rats who had been reinforced with 22 pellets all along (Gp 22-22).  This effect is referred to as positive contrast.  However, rats who were downshifted from 22 to 2 pellets (Gp 22-2) ran more slowly than rats reinforced with 2 pellets all along (Gp 2-2).  This effect is known as negative contrast.  Together, these results imply that the incentive value of a reinforcer can depend upon the prior experience the subject has had with related reinforcers.  In other words, if you are used to getting a big reward but are then shifted to a small reward, the value of that small reward is low compared to its value for someone who has only ever received small rewards.  On the other hand, receiving a big reward after only getting small rewards is much better than getting that same big reward if it is what you have received all along.
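
            One simple way to capture both effects is to treat the effective incentive value of the current reward as its objective size judged against an expectation built up from prior rewards.  This additive comparison is an illustrative assumption, not Mellgren's own model:

    effective value = current reward - expected reward

    Gp 2-22:   22 - 2  = +20   vs.  Gp 22-22:  22 - 22 = 0   (positive contrast)
    Gp 22-2:    2 - 22 = -20   vs.  Gp 2-2:     2 - 2  = 0   (negative contrast)

On this reckoning, the upshifted group values 22 pellets more than the group that has always received 22, and the downshifted group values 2 pellets less than the group that has always received 2.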

            3.  Incentive Learning.  Dickinson, Balleine, and their colleagues have re-examined the effect on instrumental performance of shifting motivational states from a training phase to a test phase.  In one of his experiments, Balleine (1992) trained hungry rats to press a lever for a food pellet, and then tested rats either hungry or not hungry (i.e., food satiated).  Unlike Clark (1958), Balleine found no difference in lever pressing between the groups.  Balleine and Dickinson have argued that this failure to show an effect of the motivational shift occurs because the rats must first learn about the new incentive value of the food in the shifted drive state before such a shift can influence instrumental performance.

            In a second experiment, Balleine first gave two different groups of rats differential experience with the food pellet in hungry and sated drive states.  Both groups were alternately made hungry on one day and then satiated on the next.  One of these groups was given free food pellets on hungry days, while the other group was given free food pellets on satiated days.  The lever was not in the chamber during this phase for either group.  In the second phase of this study, all rats were taught to lever press for food pellets while they were hungry.  In the test phase, lever press responding was measured (under extinction conditions) while the rats were either hungry or satiated.  One half of the rats that had earlier been exposed to the pellets while hungry were tested hungry, while the other half were tested satiated.  Similarly, one half of the rats that had earlier been exposed to the pellets while satiated were tested hungry, while the other half were tested satiated.  During the test session, rats that had been exposed to the pellets while hungry responded equally whether tested hungry or satiated.  In contrast, rats that had been exposed to the pellets while satiated responded less when tested satiated than when tested hungry.
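
            The design and outcome of the second experiment can be summarized as follows (all groups received identical lever training while hungry in phase 2; "High" and "Low" are relative labels for the pattern described above, not Balleine's reported values):

    Phase 1 exposure          Test state    Lever pressing in extinction
    ----------------------    ----------    ----------------------------
    Pellets while hungry      Hungry        High
    Pellets while hungry      Satiated      High (satiation has no effect)
    Pellets while satiated    Hungry        High
    Pellets while satiated    Satiated      Low (satiation reduces responding)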

These results offer strong support for the view that in order for an animal's instrumental behavior to be sensitive to a change in motivational state, the animal must first learn about the incentive value of the reinforcer in the test drive state.  In other words, incentive learning appears necessary for an instrumentally-trained response to show sensitivity to a shift in motivational conditions.  The only apparent discrepancy in need of clarification is why Clark's rats were sensitive to shifts in drive whereas Balleine's were not.  Balleine suggested that this may be because earlier studies used pellet reinforcers that were made from the rats' home cage chow, whereas today pellet reinforcers and home cage chow are different foods.  If this were true, then notice how Clark's rats could have received the relevant incentive learning in their home cage before the experiment ever began.  Thus, Clark's animals may, indeed, have been more similar to Balleine's animals that received exposure to the pellets while satiated prior to the instrumental training phase.