Multiplying Modalities:
Presentational, Orientational, and Organizational Meaning


If we are concerned with the kinds of meaning that can be made with hypermedia, we need to examine two kinds of resources that extend beyond the affordances of plain text. One of these is the semantics of hypertextuality, which will be considered in the next section. The other is the semiotics of multimedia, particularly the integration of verbal and visual resources for meaning.

 I take the position that, fundamentally, all semiosis is multimodal (cf. Kress & van Leeuwen 1996, Mitchell 1994): you cannot make meaning that is construable through only one analytically distinguishable semiotic resource system. Even if for many purposes we analytically distinguish the linguistic semiotic resource system from that of depiction or visual-graphic presentations, and both from others such as the music-sound system or the behavioral-action system, the fact that all signifiers are material phenomena means that their signifying potential cannot be exhausted by any one system of contrasting features for making and analyzing meaning.

 If I speak aloud, yes, you may interpret the acoustical sounds I make through the linguistic system as presentations of lexical items, organized according to a linguistic grammar, etc. But you may also interpret them as indexical signs of my personal identity, individuality, my social category memberships, my state of health and my emotional condition. And I may manipulate vocal features of my speech, which are phonologically and lexically non-distinctive, so that I am heard to speak the same words, even with the same formal intonation patterns of the language, but in ways that present a foreign accent, identifiable dialect-associated features, a child-like timbre, a breathy seductive tone, nervousness, etc. The skills of accomplished actors demonstrate all this quite well. Interpretation of my speech-sound-stream through the terms of the linguistic system alone does not and cannot exhaust its possible meanings in the community.

 Likewise, if I choose to write down my words, eliminating the affordances of vocal speech that give rise to this supra-linguistic meaning potential, I must still create material signs, which now again afford other ways of meaning: in handwriting there are many indexical nuances of meaning, in print there are choices of typefaces and fonts, page layout, headers and footers, headings and sidebars, etc. Each of these conveys additional kinds of meaning about the historical provenance of the text, its individual authorship, the state of the author (in the case of handwriting), the conventions of the printer, which parts of the text are to be seen as more salient, how the text is to be seen as organized logically, etc. -- all through non-linguistic features of the visible text.

 Beyond this, we can hardly help interpreting word-pictures with pictorial imaginations, visualizing what we hear or read, whether as image, technical or abstract diagram, graph, table, etc. And conversely, when we see a visual-graphical image, whether a recognizable scene or an abstract representation of logical or mathematical relationships, we cannot help in most cases also interpreting it verbally. Language and visual representation have co-evolved culturally and historically to complement and supplement one another, to be co-ordinated and integrated (Lemke 1998b, 1998c). Only purists and puristic genres insist on their separation or monomodality. In normal human meaning-making practice, they are inseparably integrated on most occasions.

 But how? We know that there are specific genre conventions (cf. Lemke 1998b), e.g. verbal captions to visual figures, verbal labels within visual figures, verbal explanatory text which cites or refers to visual figures, visual placement to indicate which words are to be linked how to which other words (paragraphing, outlining, tables, sidebars, headers, etc.), visual signs to connect words (e.g. arrows), etc. We also know that the meaning of an image changes depending on the verbal label or accompanying text, and the meaning of text changes depending on the accompanying visual figures. But what is the capacity of this phenomenon? What can it do, and how does it happen? 

These are the basic questions of multimedia semiotics: What kinds of meanings can be made by combining verbal, visual, and other signs from other semiotic resource systems? How do the meanings of multimodal complexes differ from the default meanings of their monomodal components in isolation? How do we construe the meanings of components in multimodal complexes and of whole complexes as such? 

My basic thesis is that the meaning potential, the meaning-resource capacity, of multi-modal constructs is the logical product, in a multiplicative sense, of the capacities of the constituent semiotic resource systems. When we combine text and images, each specific imagetext (cf. Mitchell 1994) is now one possible selection from the universe of all possible imagetexts, and that universe is the multiplicative product of the set of all possible linguistic texts and the set of all possible images. Accordingly, the specificity and precision which are possible with an imagetext are vastly greater than what is possible with text alone or with image alone.

 That said, there are a few important qualifications. First, the existence of cultural traditions means that not all the things that can be said or pictured are said or pictured, and in particular, the probabilities for all possible combinations of textual items (on any text scale) or of all possible visual features (on any image scale) are never equal. So the Shannon information in any text or image or imagetext is always a great deal less than the maximum information possible if all combinations occurred with equal probability. In fact in semiotic terms humans make meaning by selective contextualization (Lemke 1995); we do not deal with all possible combinations and meanings, but rather we work across scales of organization of signs and events-as-sign to construe words and images appropriately for situations and genres, reading them against the intertextual frequencies with which in our experience various signifiers/features and meanings are likely to co-occur in various contexts. Within these layers of contextualization, however, the sets of possible signs still multiply the semiotic possibilities.
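The two quantitative claims here -- that the imagetext universe is the product of the text set and the image set, and that genre conventions make the actual Shannon information far less than the uniform-distribution maximum -- can be illustrated with a toy calculation. The inventory sizes and the skewed distribution below are entirely hypothetical, chosen only to make the arithmetic visible; they are not drawn from the text.

```python
import math

def shannon_entropy(probs):
    """Shannon information (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical toy inventories: 8 possible texts, 4 possible images.
n_texts, n_images = 8, 4
n_imagetexts = n_texts * n_images  # the multiplicative universe: 32 imagetexts

# Maximum information: all 32 text-image pairings equally probable.
uniform = [1 / n_imagetexts] * n_imagetexts
h_max = shannon_entropy(uniform)  # log2(32) = 5 bits

# Genre conventions skew the distribution: a few pairings dominate,
# the rest share what little probability mass remains.
skewed = [0.5, 0.2, 0.1] + [0.2 / 29] * 29
h_actual = shannon_entropy(skewed)

assert abs(sum(skewed) - 1.0) < 1e-9
assert h_actual < h_max  # conventionalized combination carries less information
```

The point of the sketch is only the inequality at the end: unequal co-occurrence probabilities always yield less Shannon information than the equiprobable maximum, even though the combinatorial universe itself remains multiplicative.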

 It would only be in a culture in which language and image were entirely redundant, where there was one and only one picture that could be associated with each text, and one and only one text that could be associated with each picture, that this multiplicative model would not apply.

Moreover, new meanings are made all the time; the universe of potential meanings is semiotically always larger than the set of meanings actually made so far. At the edge of new meanings, we require even more guidance from the cross-specifications of multi-modal representations because of the greater uncertainty in what a new meaning implies. A new word, a new kind of image, a new verbal idea take on meaning as they are used across contexts, and these include discursive contexts, multimodal representational contexts, and actional contexts at least. The new meaning comes to be as the community establishes conventions regarding how it functions and how it is represented in various contexts and various modalities of semiosis.

 Consider also the issue of cross-modal translations. Even though a culture may create conventions about how, say, a painting is to be described in words, or commented on in scholarly fashion, or how a mathematical equation is to be graphically represented, text, image, and other semiotic forms are sui generis. No text is an image. No text has the exact same set of meaning-affordances as any image. No image or visual representation means in all and only the same ways that some text can mean. It is this essential incommensurability that enables genuine new meanings to be made from the combinations of modalities.

 For meaning to actually multiply usefully across semiotic modalities, there must be some common denominators. At what level of abstraction can we say that images and texts and other kinds of semiotic productions make meaning in the same way?

 All semiosis, I believe, on every occasion, and in the interpretation of every sign, makes meaning in three simultaneous ways. These are the generalizations across modalities of what Halliday (1978) first demonstrated for linguistic signs, when considered functionally as resources for making meanings. Every text and image makes meaning presentationally, orientationally, and organizationally. These three generalized semiotic functions are the common denominator by which multimodal semiosis makes potentially multiplicative hybrid meanings.

 Presentational meanings are those which present some state of affairs. We construe a state of affairs principally from the ideational content of a text: what it says about processes, relations, events, participants, and circumstances. For images, one could apply the same terms, recognizing what is shown or portrayed, whether figural or abstract (cf. Kress & van Leeuwen 1996). It is this aspect of meaning which allows us to interpret the child’s unfamiliar scrawl on paper through his use of the word ‘cat’, or his indecipherable speech through his pantomime of eating.

 Orientational meanings are more deeply presupposed; they are those which indicate to us what is happening in the communicative relationship and what stance its participants may have to each other and to the presentational content. These are the meanings by which we orient to each other in action and feeling, and to our community in terms of point of view, attitudes, and values. In text, we orient to the communication situation primarily in terms of speech acts and exchanges: are we being offered something, or is something being demanded of us? Are we being treated intimately or distantly, respectfully or disdainfully? We assess point of view in terms of how states of affairs are evaluated and which rhetorics and discourses are being deployed. The actual signs range from the mood of a clause (interrogative, imperative) to its modality (uncertainty, insistence), from markers of formality to the lexis of peer-status, from sentence adverbials (unfortunately, surprisingly) to explicit evaluations (it’s terrible that …). Visually, there is also a presumptive communicative or rhetorical relationship in which the image mediates between creators and viewers and projects a stance or point of view both toward the viewer and toward the content presented in the image.

 Organizational meanings are largely instrumental and backgrounded; they enable the other two kinds of meaning to achieve greater degrees of complexity and precision. Most fundamentally, organizational resources for meaning enable us to make and tell which other signs go together into larger units. These may be structural units, which are contiguous in text or image-space, and usually contain elements which are differentiated in function (subject/predicate in the clause; foreground/background in image composition). Or they may be cohesive or catenative chains, which may be distributed rather than contiguous, and in which similarity and contrast-within-similarity of features tie together longer stretches of text or greater extents of image as a unity or whole (repetition of words and synonyms; unity of palette).

 In multimodal semiosis, we make cross-modal presentational (orientational, organizational) meaning by integrating the contributions to the net or total presentational (orientational, organizational) meaning from the presentational meanings of each contributing modality (Lemke 1998b). Indeed, in many multimodal genres (and in all multimedia productions to some extent), the presentational (orientational, organizational) aspect of the meaning of a multimodal unit (at any scale, see Appendix) is underdetermined if we consider only the contribution from one modality. It may be ambiguous or unidentifiable or simply too vague and imprecise to be useful in the context of the next larger whole or embedding activity.

 As a simple formula this componential multiplicative principle might be represented as:

Pr[L,V] = Pr[L] × Pr[V];  Or[L,V] = Or[L] × Or[V];  Org[L,V] = Org[L] × Org[V]
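Read set-theoretically, the multiplication in this formula is a Cartesian product: every presentational selection available in language pairs with every presentational selection available in imagery, and each pairing is a distinct combined option. The mini-inventories below are my own invented labels, used only to make the product concrete; they are not examples from the text.

```python
from itertools import product

# Hypothetical mini-inventories of presentational selections (invented
# labels, not Lemke's): what a short caption can present, and what a
# small accompanying image can present.
Pr_L = {"cat sleeping", "cat hunting"}   # linguistic presentational options
Pr_V = {"close-up", "wide shot"}         # visual presentational options

# Pr[L,V] = Pr[L] x Pr[V]: every text-image pairing is a distinct
# imagetext meaning-option.
Pr_LV = set(product(Pr_L, Pr_V))

assert len(Pr_LV) == len(Pr_L) * len(Pr_V)  # 2 * 2 = 4 combined options
```

The same construction applies, term by term, to the orientational and organizational products Or[L,V] and Org[L,V]: the capacity of the combination grows multiplicatively, not additively, with the capacities of the contributing modalities.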


In the complexity of real meaning-making, there are further complications to this basic principle. First, within a semiotic modality, presentational, orientational, and organizational meanings are not by any means totally independent of one another. The possible combinations do not all occur with equal probability, and functionally each one helps us to interpret the others, especially in short, ambiguous, or unfamiliar texts or images. Secondly, this same cross-functional phenomenon is very important in multimodal semiosis. Not only do Or[L] and Org[L] help to disambiguate and interpret Pr[L] (and it helps interpret each of them), and, by our first principle, Pr[V] helps interpret Pr[L] (and vice versa), but Or[V] and Org[V] also play a role in making sense of Pr[L], and even more so of Pr[L,V].

 Human semiotic interpretation is both gestalt and iterative. That is, we recognize patterns by parallel processing of information of different kinds from different sources, where we are not aware of any sequential logic, and we refine our perceptions and interpretations as we notice and integrate new information into prior patterns in ways that depend in part on our having already constructed those prior, now provisional patterns. It is well known, in the case of reading a text of some length, that we form expectations about text-to-come and we revise our interpretation of text-already-read in relation both to new text we read and to the expectations we had already formed before reading it. In dialogic interaction we know well that what was said moments ago can at some future time come to have meant something quite different from what it seemed to mean at the time it was first said.

 The viewing of images proceeds in a somewhat different fashion, but still undergoes similar processes through time. We may see a certain gestalt of a whole image, but if the image is complex enough in its details, if there are many scales of visual organization embedded within one another in its composition, then we will not have taken in all the details at first glance, nor will we have become aware of the many kinds of relationships, contiguous and at-a-distance in and across the total image. We examine relationships within different scales of organization, and we move our attention along different pathways through the image until we have exhausted these possibilities and made provisional interpretations, which then lead us to examine still more details at various scales, through the iterative process which may, as with text, converge on some overall interpretation, or diverge into many possibilities, or simply be unstable.

  I believe that it is customary in our culture to pay conscious attention primarily to presentational meanings, to orientational ones only in special circumstances, and to organizational ones only if you are a professional user of the medium. We rely on familiarity with genre conventions to automate our use of organizational and orientational cues and allow us to proceed directly to presentational information, at least in institutionalized use of media, where we are taught that it is only the presentational content which is important for institutional purposes. Such approaches, of course, are highly uncritical. They ignore power relationships, presupposing institutional roles. They ignore the limitations of genre conventions on possible new meanings. They increase a certain narrow kind of efficiency, and minimize the ongoing threat to the social status quo. As professional analysts and designers, we concern ourselves very much with organizational meaning in an instrumental sense: as means to orientational and presentational ends. We also pay attention to orientational meaning, but again very selectively. We may design rhetorical strategies, but we may not question our own role or imagine alternative possible relationships to the users of what we design. We are likely to adopt a particular evaluative stance toward our presentational content (desirable, likely, surprising, obligatory, usual), but we may not consider where that stance positions us in the social universe of discourses about these matters or in our social relations to others and their interests.

 In the particular case of webpages, which I will analyze after some further discussion of the semantics of hypertext, I recommend starting with Organizational aspects of meaning. What are the largest wholes and the largest components of these? What are the most salient and extensive chains of similarities and contrasts within sameness? And then work down toward finer features. Next it is useful to pay attention to Orientational meaning: What is the basic stance of the creator/work to the viewer/user? What demands does it make, what options does it offer? How does it seek to constrain and impose or empower? What stance does it take toward its own presentational content regarding probability/realism, usuality, normativity, importance, etc.? And finally, what does it present by way of states of affairs, real and potential, and the processes, relations, participants and qualities, and circumstances thereof?

 That is a first iteration. Then one needs to look specifically at how the Pr, Or, and Org elements of each of the participating semiotic modalities interact with and modify the meanings of the others, coming through an iterative interpretative process (cf. the hermeneutic cycle) to some provisional hybrid net meanings on various scales.