The question of 'What is Learned?' in instrumental conditioning amounts to asking about the associative structures that are assumed to underlie instrumental conditioning. The issue has been of great historical significance in the study of instrumental conditioning, and it has more recently (within the last 15 years) been investigated with techniques that permit more solid conclusions. Within the last couple of years there has been a fair amount of work directed at finding the central nervous system substrates that may instantiate some of the ideas stemming from the behavioral work that we will review here. We will restrict our attention to a select subset of studies that have examined the question of what is learned during instrumental conditioning in free operant procedures as well as in situations involving stimulus control. First, we will quickly review the historical perspective on the problem.
A. Historical Perspective: S-R associations vs R-O expectations
1. Thorndike. In his classic studies of cats performing in puzzle boxes, Thorndike discovered that instrumental learning proceeded gradually rather than quickly (as would be expected if the cats had learned "insightfully"). On the basis of this observation, together with an inspiration from Darwinian evolutionary concepts, Thorndike deduced his famous "Law of Effect." This law states that when a response is followed by a "pleasant state of affairs," this will have the effect of strengthening a connection between the stimulus situation in which the response occurred and the response itself. In other words, when a reinforcer follows some response, that reinforcer will act as a catalyst in strengthening a connection between the stimulus situation (e.g., the conditioning box, S) and the most recent response (e.g., the string pull response, R). Good instrumental learning requires that this S-R association be repeatedly strengthened. In other words, Thorndike assumed that the association was not learned all at once, but required repeated applications of the law of effect before the S-R association would be firmly in place. Once firmly in place, then whenever the animal encountered that stimulus, the response would naturally follow.
Incidentally, the law of effect had another side as well. When a response was followed by an "unpleasant state of affairs," this would have the effect of weakening the previously established S-R bond. The consequence of this is that the animal would be less likely to display the response when the stimulus occurs. Punishment is one scenario in which this would occur.
Notice that an important aspect of this approach is that the reinforcer, although important in establishing the S-R association, is not itself coded as part of the associative structure assumed to be learned. Tolman picked up on this and argued for an alternative position.
2. Tolman. One of the strong opponents to the position set forth by Thorndike was Tolman. He argued, in contrast, that during instrumental learning animals did learn about the reinforcer. In particular, he argued that animals formed response-outcome expectations. In other words, he argued that the animal in some sense understood what the consequences of its own actions were. Thorndike's position did not allow for this because the reinforcer was not coded as part of the associative structure. For Thorndike, the S-R association merely caused the response to occur whenever the animal encountered the stimulus. Tolman, in contrast, maintained that the animal responded because it knew about the consequences of its actions. Tolman was not very clear as to what exactly he meant by a response-outcome expectation and how this resulted in the animal responding, and because of this another famous learning theorist of the time (Guthrie) accused him of leaving his rat "buried in thought."
Today, most people understand Tolman's response-outcome expectation in terms of a response-outcome association (R-O association). So the question amounts to asking whether an S-R or an R-O association underlies instrumental learning. Fortunately there have been more refined experiments in recent years that allow us to determine with greater confidence which of these positions seems more accurate. Also, with that additional refinement has come an increasingly sophisticated understanding of the variables and psychological mechanisms that are important in determining what is learned.
B. A More Modern Perspective
1. Free Operant Situations: S-R vs R-O associations.
a. Colwill & Rescorla (1985). In this experiment, the authors trained their rats to make two different instrumental actions, each for a different reinforcer, according to the same VI schedule of reinforcement. Following instrumental training of these two different R-O pairs, one of the outcomes was then devalued by pairing it (in the absence of the opportunity to engage in instrumental responding) with LiCl. After the animals stopped consuming the outcome that was devalued, the rats were tested on both instrumental responses under extinction conditions (i.e., no reinforcers were delivered in these tests). They observed that the response whose paired outcome had been devalued occurred at a lower rate than the response whose paired outcome was still valued. In addition, the response whose outcome had been devalued occurred at a non-zero level, indicating that the reinforcer devaluation effect was not complete (there was still some residual responding even after outcome devaluation). These results have been taken to mean that R-O associations can be learned during instrumental conditioning, and that S-R associations may also be learned. The first conclusion is justified by the observation of differential responding depending upon the different outcome values, and the second is justified on the basis of the fact that there was residual responding.
b. Adams (1982). In his studies on this problem, Adams asked whether the extent of instrumental training might influence which of these two associations controls the animal's responding. Adams trained two different groups of rats to different degrees (one group lever pressed for 100 pellet reinforcers and a second group lever pressed for 500 pellet reinforcers). Then each of these groups was sub-divided into 2 separate sub-groups. In one of these sub-groups an aversion was established to the pellet (by pairing it with LiCl), and in the other sub-group no pellet aversion was established (pellets and LiCl occurred on separate days). Finally, the rats in the four groups were given an extinction test with the lever. He observed that the subjects given limited instrumental training (100 reinforcements) showed sensitivity to the devaluation manipulation but the subjects given extended instrumental training did not. On the basis of these data, Adams concluded that with limited instrumental training an R-O association controls instrumental responding, but with extended training control over the response shifts to the S-R association. In other words, with extended training the response becomes "habitual."
c. Dickinson, Nicholas, & Adams (1983). These investigators followed up on Adams' results and asked if the schedule of reinforcement might influence which association might govern instrumental performance. Initially, they trained two groups of rats to lever press for food pellets according to either a VI or a VR schedule of reinforcement. Then each of these two groups was sub-divided into separate pellet devaluation and control sub-groups, just like in the earlier study by Adams (1982). Finally, the rats were tested in extinction (as was also done in the earlier study by Adams). In this experiment, the rats that were trained initially with the VR schedule showed sensitivity to the reinforcer devaluation treatment, but rats trained with the VI schedule showed no such sensitivity. It looks as though training with a VR schedule produces an animal whose behavior is controlled by an R-O association, but training with a VI schedule (under these conditions) produces an animal whose behavior is controlled by an S-R association.
d. Dickinson's Experienced Behavior Rate - Reinforcer Rate Correlation. Dickinson later integrated these results by suggesting that the animal's behavior will be controlled by an R-O association when it experiences a correlation between its own rate of responding and the obtained rate of reinforcement. If the animal experiences no such correlation, then he assumed that the animal's behavior will be controlled by an S-R association. The molar feedback function shows us that under ratio schedules there exists a correlation between response rate and reinforcement rate; however, no such correlation exists when the animal is working under an interval schedule of reinforcement. Thus, so long as the animal can experience this correlation under the ratio schedule, it should be sensitive to the reinforcer devaluation treatment. In the Dickinson, Nicholas, & Adams study, since the animals on the VI and VR schedules were given limited amounts of lever training to begin with, only the VR subjects should have experienced the response rate - reinforcer rate correlation. Only this group should have been sensitive to the reinforcer devaluation treatment. In order to explain the results from the Adams (1982) study, Dickinson's idea requires that the extended training group no longer experiences the response rate - reinforcer rate correlation. This idea makes some sense given that an overtrained subject will not change its level of responding much from day to day. This being the case, the animal will not experience changes in its own responding being accompanied by changes in the reinforcement rate, which would be required for the animal to experience the correlation. Thus, the subjects given extended training in the Adams study should be insensitive to reinforcer devaluation, and their lever pressing should, instead, be governed by S-R associations. These ideas are speculative, but they are probably the best available for understanding the results to date.
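The contrast between the molar feedback functions of ratio and interval schedules can be sketched numerically. The sketch below is illustrative only: the schedule parameters (VR 20, a VI 60 s schedule) are hypothetical, and the VI function uses one common approximation in which the obtained reinforcement rate is capped by the programmed interval regardless of how fast the animal responds.

```python
def vr_reinforcement_rate(response_rate, ratio=20):
    """VR schedule: on average every `ratio`-th response is reinforced,
    so reinforcement rate is directly proportional to response rate."""
    return response_rate / ratio

def vi_reinforcement_rate(response_rate, interval=60.0):
    """VI schedule: a reinforcer is set up on average every `interval`
    seconds. One standard approximation of the molar feedback function:
    the obtained rate approaches 1/interval and is nearly flat once the
    animal responds at any appreciable rate."""
    return 1.0 / (interval + 1.0 / response_rate)

# Response rates in responses per second; reinforcement rates per second.
for rate in (0.1, 0.5, 1.0, 2.0):
    print(f"{rate:4.1f} resp/s  VR: {vr_reinforcement_rate(rate):.4f}"
          f"  VI: {vi_reinforcement_rate(rate):.4f} reinforcers/s")
```

Doubling the response rate doubles the reinforcement rate on the VR schedule but barely moves it on the VI schedule, which is the asymmetry Dickinson's account turns on: only the ratio-trained animal has a response rate - reinforcer rate correlation available to experience.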
2. Stimulus Control: Binary and Hierarchical Associations.
a. Definition. Stimulus control refers to a situation in which the animal's instrumentally conditioned behavior occurs more in the presence of one stimulus situation than in the absence of that stimulus situation. When this happens we say that the stimulus in question exerts stimulus control over the instrumental response. This can happen when a differential reinforcement contingency is in effect during the stimulus (usually a tone, for example) compared to when the stimulus is absent (periods in which no tone is present, for example). The key issue that we will be concerned with here is to determine what kinds of associative structures govern the animal's behavior when it comes under stimulus control.
b. Some Possibilities. One view is Thorndike's. If an animal simply learned an S-R association, then this would explain why the animal responded more in the presence vs the absence of the stimulus. Another possibility is that the animal learns both Pavlovian S-O and instrumental R-O associations, and that these two associations summate their activating effects when the stimulus comes on. This would also explain the basic effect of more responding in the presence vs the absence of the stimulus. A third possibility is that an S-O, O-R associative chain is learned. This view was advanced by Trapold and Overmier (1972) and used to explain the differential outcome effect (see below). Briefly, the animal is assumed to develop a Pavlovian S-O association. Thorndike's S-R association is also assumed to be learned. However, because of the Pavlovian S-O association, the outcome representation is assumed to be active at the time that the animal responds and gets reinforced. If the activated outcome representation can act as a kind of internal stimulus, then the Law of Effect states that an association will form between the outcome representation and the response that was reinforced. This is the O-R association part of the chaining model. A fourth possibility is that a hierarchical associative structure develops in stimulus control. The form of this association could be one in which the stimulus associates with an association between the response and the reinforcing outcome (an S-[R-O] association).
c. The Differential Outcome Effect. In this study (which has been replicated in a variety of different settings), rats were trained to choose one of two available responses (right vs left lever press responses) in the presence of different stimuli (e.g., two different tones). The contingencies were such that choice of one of the responses was correct in the presence of one stimulus, but choice of the other was correct in the presence of the second stimulus. One group of rats, the non-differential outcome group, was trained with a single reinforcer given for all correct choice responses. In a second group, the differential outcome group, the different correct response choices were reinforced with qualitatively different reinforcers (e.g., pellet for one and sucrose for the other correct choice response). The differential outcome effect refers to the fact that Group Differential Outcomes learns the discrimination task more rapidly than does Group Non-Differential Outcome. This result has been taken as evidence in support of the S-O, O-R chaining account. The idea is that in Group Differential, there are different S-O associations forming in the presence of the different stimuli. The effect of this would be for each stimulus to activate a representation of a particular outcome before a choice response is made. When the correct choice response is made in the presence of the activated outcome representation, then this (through the law of effect) would strengthen a connection between the outcome representation as a stimulus and that choice response. Further, since each correct choice response is reinforced in the presence of a different outcome expectancy, different O-R associations will develop and come to control the different choice responses. In contrast, in Group Non-Differential, since only one reinforcer is used for both correct choice responses, the same outcome will associate with each stimulus.
This will mean that each stimulus will activate the same outcome representation on each trial prior to each correct response. By Thorndike's law of effect, this will mean that the same outcome representation will come to associate with each correct response. This will present a problem for the animal because the S-O, O-R associative chains controlling behavior will result in the same outcome representation on each trial activating both responses (because of the O associated with each R).
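The chaining logic above can be sketched with a toy model. All labels here (tone1, pellet, left, etc.) are hypothetical placeholders, not stimuli or reinforcers from the original studies; the point is only the structural difference between the two training groups.

```python
# Differential outcomes: each stimulus activates a distinct outcome
# representation, and each outcome has been associated with a unique
# correct response via the O-R limb of the chain.
s_o_diff = {"tone1": "pellet", "tone2": "sucrose"}
o_r_diff = {"pellet": {"left"}, "sucrose": {"right"}}

# Non-differential outcomes: both stimuli activate the SAME outcome
# representation, and that one outcome has been associated with BOTH
# correct responses over training.
s_o_same = {"tone1": "pellet", "tone2": "pellet"}
o_r_same = {"pellet": {"left", "right"}}

def primed_responses(stimulus, s_o, o_r):
    """Responses activated via the S-O, O-R chain for a given stimulus."""
    return o_r[s_o[stimulus]]

print(primed_responses("tone1", s_o_diff, o_r_diff))  # only one response primed
print(primed_responses("tone1", s_o_same, o_r_same))  # both responses primed
```

In the differential group the chain singles out one response per stimulus, whereas in the non-differential group the shared outcome representation primes both responses on every trial, which is exactly the ambiguity the text says slows discrimination learning.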
d. Colwill & Rescorla (1990) Study. Colwill & Rescorla ran an experiment designed to determine whether hierarchical relations can be evidenced in a situation where the various approaches based on binary associations cannot exert differential effects. In their study, each of two responses was reinforced with a different outcome in the presence of one discriminative stimulus (e.g., S1: R1 -> O1, R2 -> O2). However, in the presence of a second discriminative stimulus, the reinforcement contingencies were switched (i.e., S2: R1 -> O2, R2 -> O1). In the absence of each stimulus, neither response was reinforced. The animals responded more in the presence than the absence of the stimuli, and this indicated that the stimuli exerted stimulus control. At issue was whether this stimulus control occurred because the animals learned about the hierarchical relations among the Ss, Rs, and Os. In the next phase, the O1 outcome was devalued by pairing it with LiCl. Finally, in the test phase subjects were given a choice between the two instrumental responses in the presence of S1, S2, and during the intertrial interval. No reinforcements could be earned during this test session. They observed in this test session that the rats preferred the instrumental response whose associated outcome was still valued in the presence of a particular stimulus. This meant that in the presence of S1 the rats chose R2 over R1, but in the presence of S2 they chose R1 over R2. These results cannot be explained in terms of an account based on binary associations, and suggest, instead, that the animals are capable of learning hierarchical S-[R-O] associations. This associative structure allows the animals to keep track of the fact that each response was paired with each outcome, but that different R-O relations held in the presence of the different stimuli.
If the animals had merely acquired various binary associations (e.g., S-O and R-O; S-O and O-R; or S-R associations), then since each S and each R was associated with each O, there would be no basis for choosing different responses in the presence of the different stimuli. Therefore, the results from this experiment offer strong support for the view that hierarchical S-[R-O] associations are learned in instrumental stimulus control situations.
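The logic of the test can be made concrete with a small sketch. The stimulus, response, and outcome labels follow the text (S1, S2, R1, R2, O1, O2); the numerical outcome values are illustrative stand-ins for "valued" vs "devalued," not measured quantities.

```python
# Hierarchical structure: each stimulus signals its own R -> O mapping,
# exactly as in the design S1: R1->O1, R2->O2 and S2: R1->O2, R2->O1.
hierarchical = {
    "S1": {"R1": "O1", "R2": "O2"},
    "S2": {"R1": "O2", "R2": "O1"},
}

# After the LiCl phase, O1 is devalued while O2 remains valued.
outcome_value = {"O1": 0.0, "O2": 1.0}

def choose(stimulus):
    """Pick the response whose outcome, GIVEN this stimulus, is still valued."""
    r_o = hierarchical[stimulus]
    return max(r_o, key=lambda r: outcome_value[r_o[r]])

print(choose("S1"))  # R2: in S1, R2's outcome (O2) is the valued one
print(choose("S2"))  # R1: in S2, R1's outcome (O2) is the valued one

# A purely binary R-O store collapses across stimuli: each response was
# paired with both outcomes equally often, so devaluing O1 provides no
# basis for a differential choice between R1 and R2.
binary_r_o = {"R1": {"O1", "O2"}, "R2": {"O1", "O2"}}
assert binary_r_o["R1"] == binary_r_o["R2"]  # identical: no discrimination
```

Only the nested (hierarchical) structure lets the choice flip with the stimulus, which is the pattern Colwill & Rescorla observed.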