The Rescorla-Wagner model has been and continues to be an extremely influential model of Pavlovian conditioning. It is formalized as a mathematical description of the changes in associative strength (V) that take place on individual conditioning trials. The basic notions are (1) that learning is conceptualized in terms of a change in association (i.e., connection) strength between the CS and some aspect of the US (the model is actually agnostic as to whether this connection is between the CS and the stimulus or response properties of the US), and (2) that stimulus salience and US surprise govern changes in associative strength. What has made the model successful is its applicability to a wide range of conditioning phenomena not previously regarded as being related to one another, and its ability to give rise to predictions about the existence of new phenomena. In spite of the model's success, there are several limitations that are also worth considering.
1. The Model: deltaV = alpha * beta * (lambda - SumV)
deltaV refers to a change in associative strength that occurs on a given trial.
alpha and beta are learning rate (i.e., salience) parameters for the CS & US, respectively.
lambda refers to the maximum amount of associative strength that the US can support.
SumV refers to the total expectation of the US and is calculated by adding the current associative strengths of all the CSs present on the trial.
(lambda - SumV) corresponds, psychologically, to US surprise on a given trial.
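The equation can be written as a one-line function. The following is just an illustrative sketch; the function name and parameter values are my own, not part of the model's standard presentation.

```python
def rw_update(sum_v, alpha, beta, lam):
    """One Rescorla-Wagner trial: deltaV = alpha * beta * (lambda - SumV).

    sum_v: total associative strength of all CSs present on the trial (SumV)
    alpha, beta: salience (learning rate) parameters for the CS and US
    lam: maximum associative strength the US can support (lambda)
    Returns deltaV, the change in the CS's associative strength on this trial.
    """
    return alpha * beta * (lam - sum_v)
```

For example, on a first conditioning trial (SumV = 0) with alpha = beta = 0.5 and lambda = 1, deltaV = 0.25; once SumV has reached lambda, deltaV = 0 and learning stops.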
2. Application to existing phenomena
a. Acquisition. Notice how on the very first conditioning trial there will be no expectation of the US when the CS is presented (SumV = 0). However, when the US occurs, that generates a big discrepancy between what is expected to occur and what actually occurs (lambda - SumV). Assuming that the CS & US have a non-zero salience level (i.e., assuming that alpha*beta > 0), then there will be learning on the trial.
Notice also that the amount of learning on each successive conditioning trial (deltaV) will decrease until the total associative strength (SumV) reaches the maximum amount of associative strength that the US can support (lambda). In other words, at asymptote, V = lambda.
If you were to compare the effects of varying CS and US salience, different effects would show up. In particular, if CS salience were low in one group but high in another (e.g., alpha = 0.2 versus 0.4, with beta held constant), then V would ultimately reach the same asymptote in both groups, but the more salient CS would get there faster.
The situation is different when comparing two groups conditioned with the same CS, but different US intensities. In this case, the more intense (i.e., salient) US would have higher beta and lambda values. This would result in V reaching a higher asymptote at a quicker rate when the CS is paired with the more intense US compared to when the CS is paired with the less intense US.
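Both comparisons can be checked with a short simulation. The parameter values below are arbitrary illustrations of my own choosing:

```python
def acquire(alpha, beta, lam, n_trials):
    """Simulate acquisition with a single CS; return V after each trial."""
    v, history = 0.0, []
    for _ in range(n_trials):
        v += alpha * beta * (lam - v)   # deltaV = alpha*beta*(lambda - SumV)
        history.append(v)
    return history

# Varying CS salience (alpha): same asymptote, reached faster by the salient CS.
low_salience  = acquire(alpha=0.2, beta=1.0, lam=1.0, n_trials=50)
high_salience = acquire(alpha=0.4, beta=1.0, lam=1.0, n_trials=50)

# Varying US intensity (both beta and lambda): higher asymptote, reached faster.
weak_us   = acquire(alpha=0.3, beta=0.5, lam=0.8, n_trials=50)
strong_us = acquire(alpha=0.3, beta=1.0, lam=1.0, n_trials=50)
```

After 50 trials the two salience groups sit at the same asymptote (lambda = 1), but the high-salience group was closer to it on every earlier trial; the strong-US group both learns faster and ends higher than the weak-US group.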
b. Extinction. Notice how extinction trials will result in unlearning of the association. In other words, because the US is expected to occur but is absent on the trial (on a nonreinforced trial, lambda = 0), a negative discrepancy is generated (lambda - SumV = a negative number). This will result in deltaV being a negative number on the trial. Thus, the CS will actually lose associative strength on the trial. This will continue to occur with repeated nonreinforced CS presentations until the CS no longer has any associative strength.
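A minimal sketch of extinction, with an arbitrary salience value of my own choosing:

```python
alpha, beta = 0.3, 1.0   # illustrative saliences
v = 1.0                  # CS previously conditioned to asymptote (lambda = 1)
for trial in range(30):
    delta = alpha * beta * (0.0 - v)   # lambda = 0 on nonreinforced trials
    v += delta                         # delta is negative on every trial
# v decays toward zero across the extinction series
```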
c. Blocking. Kamin's blocking effect is predicted by the model by virtue of the assumption that the total expectation of the US on a given trial is arrived at by summing together the associative strengths of all the CSs present on the trial. In the first phase of the blocking experiment, CS1 is conditioned to asymptote (i.e., its V = lambda). At the beginning of the compound conditioning phase, SumV = lambda because the associative strength of CS1 (V = lambda) + the associative strength of the added stimulus CS2 (V = 0) combine to equal the amount of learning supported by the US on the trial (i.e., lambda). Therefore, there will be no discrepancy on the 1st and on subsequent compound conditioning trials between what is expected and what occurs (i.e., lambda - SumV = 0), and consequently there will be no acquisition of associative strength to the added CS2.
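The two-phase design can be sketched directly; again the parameter values are arbitrary illustrations:

```python
a1, a2, beta, lam = 0.3, 0.3, 1.0, 1.0   # equal saliences for CS1 and CS2
v1 = v2 = 0.0

for _ in range(100):              # Phase 1: CS1 -> US, trained to asymptote
    v1 += a1 * beta * (lam - v1)

for _ in range(20):               # Phase 2: CS1 + CS2 -> US
    err = lam - (v1 + v2)         # SumV already equals lambda, so err ~ 0
    v1 += a1 * beta * err
    v2 += a2 * beta * err
# v2 remains essentially zero: CS1 blocks learning about CS2
```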
d. Overshadowing. This phenomenon is a cousin of blocking, and occurs in the model for similar reasons (though there is an important difference). When two CSs are combined and paired with the US, each CS will acquire some associative strength. On the next trial, both CSs contribute their associative strengths towards the total expectation of the US. Two things happen here. First, because both CSs contribute to the total expectation, the asymptote of learning will be reached faster than if only one CS had been paired with the US all along. Second, since both CSs are contributing to the total associative strength, they share the associative strength that the US can support. SumV = lambda at the end of conditioning, but since SumV = V of CS1 + V of CS2, two equally salient CSs will each acquire 1/2 lambda worth of associative strength by the end of conditioning. Notice that the model predicts this overshadowing effect (a CS conditioned by itself will always achieve a higher level than when conditioned with another CS) even when the saliences of the two CSs differ. In other words, overshadowing is predicted to be a reciprocal affair (the stronger CS can overshadow the weaker, but the weaker can also overshadow the stronger CS). Another interesting feature of the model here is that no overshadowing is predicted to occur on the very first conditioning trial. This differs from the analysis of blocking - as we saw above, blocking should occur on the very first compound conditioning trial.
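A sketch with unequal (arbitrary) saliences shows both the reciprocal sharing of lambda and the absence of overshadowing on trial 1:

```python
aA, aB, beta, lam = 0.3, 0.1, 1.0, 1.0   # A is more salient than B (arbitrary values)
vA = vB = 0.0
first_trial_deltas = None
for trial in range(200):
    err = lam - (vA + vB)
    dA, dB = aA * beta * err, aB * beta * err
    if trial == 0:
        # On trial 1, SumV = 0, so each CS gains exactly what it would have
        # gained if conditioned alone: no overshadowing yet.
        first_trial_deltas = (dA, dB)
    vA += dA
    vB += dB
# At asymptote, vA + vB = lambda, split in proportion to salience (3:1 here),
# so each CS ends below the asymptote (1.0) it would reach if trained alone.
```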
e. Contingency effect. The harmful effects of degrading the CS-US contingency by adding unsignaled USs in the inter-trial interval can be readily understood with this model. By assuming that the conditioning context can function as a conditioned stimulus, itself capable of associating with the US, then the contingency effect can be seen as a special case of blocking (by context). In the group with the degraded contingency, the conditioning context is assumed to have a higher associative strength than in the group not given the US during the inter-trial interval. This means that during a conditioning trial the context + CS compound will result in less learning of the CS because the higher associative strength of the context will contribute more towards the total expectation of the US and this will allow it to compete more effectively with the CS for associative strength. The context will have low associative strength in the control group since it does not get the US during the inter-trial interval, and any associative strength conditioned to the context during a context + CS conditioning trial will extinguish during the (nonreinforced) inter-trial intervals.
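The context-blocking account can be sketched by treating each inter-trial interval as a context-alone trial. This is a rough simplification of my own (one conditioning trial plus one ITI period per block, with arbitrary parameter values), not a procedure from the text:

```python
def contingency(iti_reinforced, n_blocks=200, a_cs=0.3, a_ctx=0.1, beta=1.0, lam=1.0):
    """Each block = one context+CS conditioning trial, then one context-alone
    ITI period. Degraded group: the ITI period is reinforced (unsignaled US).
    Control group: the ITI period is nonreinforced (lambda = 0), so context
    strength extinguishes between trials."""
    v_cs = v_ctx = 0.0
    for _ in range(n_blocks):
        err = lam - (v_cs + v_ctx)                  # context + CS trial, US presented
        v_cs += a_cs * beta * err
        v_ctx += a_ctx * beta * err
        iti_lam = lam if iti_reinforced else 0.0
        v_ctx += a_ctx * beta * (iti_lam - v_ctx)   # context alone during the ITI
    return v_cs, v_ctx

degraded_cs, degraded_ctx = contingency(iti_reinforced=True)
control_cs, control_ctx = contingency(iti_reinforced=False)
```

In the degraded group the context soaks up the associative strength and the CS ends near zero; in the control group the context extinguishes during the ITIs and the CS ends near lambda.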
According to this analysis, what should happen if you were to signal each inter-trial US with a different CS?
f. US preexposure effect. This effect, like the contingency effect, can be understood in terms of context blocking. Here, the idea is that during the initial phase in which only the US is repeatedly presented, a lot of associative strength is conditioned to the context. By the time CS - US pairings occur in phase two, the context will be effective at blocking learning about the CS. In other words, the context already evokes the expectation of the US, and this renders the US less surprising when the CS is actually paired with it than in the control group.
According to this analysis, what should happen if you were to preexpose the US in one context and then condition the CS in either the same or a different context?
g. Conditioned Inhibition. In the Pavlovian conditioned inhibition procedure, A+ and AX- trials are interspersed until the subject displays a good discrimination (i.e., it responds a lot on A+ trials, but little on AX- trials). The model anticipates that A should acquire associative strength, with an asymptote of lambda. However, on AX- trials, since no US occurs, there will be a negative discrepancy (lambda - SumV < 0). Remember that the value of lambda on a nonreinforced trial = 0. As a result, X will acquire negative associative strength. With enough training, X's associative strength will continue to decrease until its negative associative strength completely counteracts A's positive associative strength. The process will end when A's associative strength equals lambda and X's associative strength equals negative lambda.
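Interspersing the two trial types in a simulation (with arbitrary parameter values) shows the predicted endpoint, V of A = lambda and V of X = -lambda:

```python
aA, aX, beta, lam = 0.3, 0.3, 1.0, 1.0   # illustrative parameters
vA = vX = 0.0
for _ in range(300):
    vA += aA * beta * (lam - vA)   # A+ trial: US presented
    err = 0.0 - (vA + vX)          # AX- trial: lambda = 0, SumV = vA + vX
    vA += aA * beta * err
    vX += aX * beta * err
# vA settles at +lambda and vX at -lambda, so SumV = 0 on AX- trials
```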
3. A Couple of Unique Predictions
a. Overexpectation effect. When you separately condition two different CSs and then pair the compound of the two with the US, the model predicts that a negative discrepancy should occur on the compound trials. This is because the two CSs were each individually trained to their asymptotes. When you combine the two CSs, the total expectation of the US is double what the US can support. The resulting negative discrepancy reduces the associative strength of each CS, even though the compound is still reinforced. Apply these ideas to the experiment we considered in class.
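A sketch of the compound phase (arbitrary parameter values; both CSs start at asymptote):

```python
a, beta, lam = 0.3, 1.0, 1.0
v1 = v2 = 1.0                  # each CS separately trained to asymptote (lambda = 1)
first_err = lam - (v1 + v2)    # total expectation (2) exceeds lambda (1): err = -1
for _ in range(50):
    err = lam - (v1 + v2)      # US is still presented on every compound trial
    v1 += a * beta * err
    v2 += a * beta * err
# Despite continued reinforcement, each CS loses strength until v1 + v2 = lambda
```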
b. Excitatory conditioning with nonreinforcement. When you combine a CS that was previously trained as a conditioned inhibitor together with a new CS and nonreinforce the stimulus compound, a positive discrepancy is generated (lambda - SumV = 0 - (-lambda) = a positive number). This will result in some excitatory conditioning occurring to the new CS even though it was never paired with the US. Finish this reasoning by applying the ideas to the experiment we discussed in class. In other words, explain why the control group fails to show the effect.
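The prediction can be checked with a few nonreinforced compound trials (arbitrary parameter values; X starts at -lambda as a trained inhibitor):

```python
aX, aN, beta = 0.3, 0.3, 1.0
vX, vN = -1.0, 0.0             # X: trained inhibitor (V = -lambda); N: novel CS
for _ in range(20):
    err = 0.0 - (vX + vN)      # nonreinforced compound: 0 - (-1) = +1 at first
    vX += aX * beta * err
    vN += aN * beta * err
# vN ends positive even though N was never paired with the US
```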
4. Some Problems (& possible solutions where appropriate)
a. Extinction doesn't seem to involve unlearning. Phenomena such as spontaneous recovery and renewal suggest that the original association survives extinction, contrary to the model's claim that V is simply driven back toward zero.
b. CS preexposure effect. Since the associative strength of the preexposed CS should not change during the preexposure phase, the slower learning seen in the preexposed group is not explained. However, by allowing the salience of the CS to become reduced during the preexposure phase (due to habituation?), slow learning can be attributed to a lower learning rate. Wagner's priming theory assumed that reductions in CS salience were related to the strength of the Context - CS association learned during the preexposure phase. If the CS is predicted by the context, then its actual occurrence will receive less processing. Therefore, what should happen if we were to compare two groups of subjects, one preexposed and conditioned in the same context and the other in different contexts?
c. Learned irrelevance. Again, since a zero contingency procedure is expected to result in the context blocking conditioning to the CS, the associative strength of the CS should simply be zero by the end of the random contingency phase. The model therefore does not anticipate how greatly disrupted learning is when the CS and US are subsequently paired. There is some uncertainty, however, as to whether learned irrelevance reflects the sum of the CS and US preexposure effects combined or something else.
d. Configural learning (also known as negative patterning). In this phenomenon, A+, B+, and AB- trials are all interspersed, and the subject learns to respond more on A+ and B+ trials than on AB- trials. Since the total expectation of the US on any trial reflects the sum of the associative strengths of all the stimuli present on a given trial, it is not immediately obvious why the subjects do not respond twice as much on AB- trials as they do on A+ and B+ trials. However, further reflection allows us to assume that a third, compound-unique stimulus is generated on AB- trials. Including this stimulus allows us to understand better how the model could anticipate successful discrimination learning in this case. The compound-unique stimulus would be assumed, at asymptote, to acquire enough negative associative strength to completely counteract the positive strengths conditioned to the A and B stimuli (on A+ and B+ trials). Thus, the simple fact of configural learning does not present a problem for the model. This analysis, though, makes crystal clear the fact that this model is an elementalistic model, not a configural one. In other words, the assumption is that when multiple stimuli are presented simultaneously, the compound is processed by decomposing it into its separate components. The whole is equal to the sum of its parts by this account. There are other models that start with the different assumption that stimulus compounds are processed configurally, where the whole is greater than the sum of its parts.
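The compound-unique-stimulus solution can be sketched by adding a third element (call it C) that is present only on AB- trials; parameter values are arbitrary illustrations:

```python
a, beta, lam = 0.2, 1.0, 1.0
vA = vB = vC = 0.0     # vC: the compound-unique (configural) stimulus on AB- trials
for _ in range(2000):
    vA += a * beta * (lam - vA)      # A+ trial
    vB += a * beta * (lam - vB)      # B+ trial
    err = 0.0 - (vA + vB + vC)       # AB- trial: configural cue also present
    vA += a * beta * err
    vB += a * beta * err
    vC += a * beta * err

resp_A = vA                # expectation on an A trial
resp_AB = vA + vB + vC     # expectation on an AB trial
```

At asymptote A and B each carry +lambda while the configural cue carries -2*lambda, so the net expectation on AB- trials is zero: successful negative patterning from purely elementalistic summation.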
e. CS-US relevance. Garcia and Koelling's finding that certain CS-US combinations are better learned about than are others presents a problem for which the model does not have a ready fix. The problem is that a CS that seems to have a very high salience for one US seems to have a low salience for a different US. In Garcia & Koelling's experiment they actually conditioned two stimuli together with different USs in different groups. Thus, CS1 + CS2 -> US1 or CS1 + CS2 -> US2. In terms of the model they found that the salience of CS1 > CS2 for US1, but that the salience of CS1 < CS2 for US2. The model has no ready way of explaining how saliencies could fluctuate across different USs.