Language and Evolution: Homepage Robin Allott



1. The origin and evolution of language was the result of a transfer of motor patterning from that controlling bodily movement generally to the articulatory organs.


The following extracts from other authors writing on language and motor control demonstrate that the idea of a close link between the structures of language and those of the cerebral motor system is increasingly accepted as a central theme:

"Language as a function is not purely emergent but evolved in relation to central programs for motor activity ... language as an elaboration, extension and abstraction of sensorimotor function ... the protoman made utterances that were coincident with and driven by the same rhythm as the movement in question." (Kinsbourne 1978: 553)

"linguistic structure may emerge from, and may even be viewed as, a special case of motoric structure, the structure of action." (Studdert-Kennedy 1983: 5) "we may speculate that the evolution of spoken language entailed the replication and adaptation to vocal function of neural circuitry evolved in the left hemisphere of prehominid primates for right-handed manipulation and bimanual coordination (Studdert-Kennedy 1991: 89)

" the motoric and perceptual mechanisms were in place long before language entered the stage. ... how the newcomers, speech and language, could acquire some of their properties by adapting to the phylogenetically older structures rather than the other way round." (Lindblom 1991: 22)

"The organizational characteristics of speech as a motor control system are fundamentally similar to other sequential motor actions and are felt to involve a limited number of general sensorimotor control processes" (Gracco 1990: 21)

"the networks for speech in the brain and in the model could be organised in the same way as those organising body movements and behaviour. ... developments of the motor and memory systems could lead to the development of language." (Kien 1992: 252)

This paper aims to explore how this broad concept of a cerebral motor basis can be applied to understand some much more specific aspects of the structure and processes of language.


2. There are basic (innate) elementary neural motor programs from which all bodily movements are constructed.


The following extracts show the theoretical and research support for the presence of innate elementary motor programs:

"it seems plausible that in nature the controlled variable of many motor behaviors is a functional combination of several interlaced actions may be that the precise combination of motor schemata defines skilled movement"(Abbs 1982: 541-2)

"These changes suggest a clear segmentation of the movement into units of action which overlap but do not coincide with the figural units as defined by the discontinuities of the movement (cuspids, points of inflection)." (Viviani and Terzuolo 1982: 431)

"The evidence ... indicates that infants are born with a considerable part of the neural structures that will coordinate the functional patterns of muscle activity [walking and reaching ] in adults ... modifications of movement sequences that they can produce without stimulation." (Trevarthen 1984: 247)

"one must show that secondary motor elements are indeed separate entities which may be called up by many different motor program instructions. Expressed another way, particular components of a specific movement should recur intact in many other movements." (Mackay 1985: 103)

"This segmentation may also indicate that movements are a combination of elementary dynamic components. This concept of movement organisation has been successfully applied to simulations of hand movements during writing. ... It is likely that the most complex motor acts, such as speech production, are based on the ability of the nervous system to combine different motor components." (Berkinblit, Feldman and Fukson 1986: 594-8)

"both the primitives which are acted upon by the procedure, and the result of the procedure constitute a finite, discrete set of logically identifiable entities ... the observed decomposition of movements into discriminable units is related to the hierarchical nature of the representation and control systems " (Viviani 1986: 214)

"A Vocabulary of Motor Acts. ... We propose that in inferior area 6 there is a vocabulary of elementary motor acts coded at the single neuron level. This vocabulary is essentially related to arm-mouth movements." (Rizzolatti and Gentilucci 1988: 281)

"Movement plans may be complex in the sense of being composed of separable component tasks. These components may be coordinated at some level by the voluntary motor system, in order to combine tasks into appropriate actions ." (Haggard 1991: 153)

It seems probable that there is a limited set of elements in the motor system, a limited set of motor subroutines, which can be used to produce an open-ended collection of distinct patterns of movement (in much the same way as speech elements can be combined to produce an indefinitely large collection of distinct words and word strings).
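The combinatorial point can be illustrated with a minimal sketch. It assumes, purely for illustration, that a movement pattern is an ordered sequence of elementary units; the five labels below are hypothetical stand-ins and are not a proposed inventory of motor subroutines.

```python
from itertools import product

# Hypothetical labels standing in for a small set of elementary motor subroutines.
PRIMITIVES = ("flex", "extend", "abduct", "adduct", "rotate")


def count_patterns(n_primitives: int, max_length: int) -> int:
    """Count the ordered sequences of length 1..max_length over n_primitives units."""
    return sum(n_primitives ** length for length in range(1, max_length + 1))


def enumerate_patterns(primitives, length):
    """List every ordered sequence of exactly `length` primitives."""
    return list(product(primitives, repeat=length))


if __name__ == "__main__":
    # Five units already give 5 + 25 + 125 = 155 sequences of up to three steps.
    print(count_patterns(len(PRIMITIVES), 3))
    print(len(enumerate_patterns(PRIMITIVES, 2)))
```

Because the count grows exponentially with sequence length, even a very small set of units yields a collection of patterns that is open-ended for practical purposes, which is the parallel with speech elements drawn above.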


3. These elementary motor programs specifically control all the precise ballistic and targeted movements of the hand and arm. Movements of the hand and arm can be seen to be segmented into elementary movements (when for example there is damage to the cerebellum).


Some of the most complex bodily movements are those of the arm and hand. Consider, for example, playing the piano or other instruments, games such as tennis, cricket or baseball, many types of fine movement in the use of tools, typing and handwriting. It is implausible that there should be distinct complex motor programs for every possible variation in the way in which a tennis ball is struck, a cricket ball or baseball is caught or thrown, a word is written, a piece of music is played. Organisational economy suggests that such skilled actions must be constructed from a limited number of basic or elementary movement-patterns.

Much of the research referred to in the extracts cited in relation to Proposition 2 was focused on the analysis of arm movements, and there has been extensive parallel research into the replication or simulation by robots of arm and hand movements.

"Both the effects of simplifying the dynamics computation and the limitations of feedback control in biological arms ... strongly suggest that there must exist substantially correct preprograms in order for humans to make accurate fast arm movements."(Hollerbach 1985: 140)

If human bodily action is a syntactic system organised hierarchically, as Lashley (1960: 526) proposed, then it seems appropriate and necessary to consider what the elementary movements might be, in particular what the movement elements might be for the arm (and hand). Bernstein concluded that the action units could not be specific contractions and extensions of individual muscles. In motor programs "we cannot discover any other determining factor than the image or representation of the result of the action" (Bernstein 1967: 49 - my emphasis).

It is a straightforward matter to describe what the elementary movements for the arm might be (see, for example, Rosse and Clawson 1970, Introduction to the Musculoskeletal System):

- the arm can move forward and up starting from the side of the body to a position vertically above the shoulder.

- the arm can move out to the side and up through rather less than a semi-circle.

- the arm can move in the reverse direction across the body, though to a smaller extent than the outward movement.

- the forearm can bend to touch the upper arm.

- movement of the upper arm and bending of the forearm can combine; for example, the arm can move across or out from the body with the forearm bent to a greater or lesser degree.

- the arm (upper arm, forearm and hand) can turn clockwise or anti-clockwise

- the arm (upper arm and forearm) can move to or be held at any intermediate position or move from one intermediate position to another

The range of movements and positions of the arm may seem too obvious to be described. Abduction and adduction, flexion and extension, pronation and supination, rotation and circumduction are familiar terms. The significance is that the movements and positions are the elements which go to form the gestural patterns considered later in the paper.


4. The elementary motor programs, when directed to the articulatory organs, produce an equivalent set of elementary speech sounds (elementary articulatory programs).


The elementary motor programs are those which are necessary most obviously for control of all movements of the arm, hand and fingers. These same programs also control movements of other muscle-joint systems in the body, for example, of the legs and head.

A program of action can be executed in a variety of ways. As many authors have pointed out, we can write our signature with either hand, even at a pinch with our foot, with our nose or with a pen attached to the forehead (as in the case of some cerebral-palsy sufferers). The action program is the same but it is applied to completely different sets of muscle-joint systems.

Movement patterns normally executed by the arm and hand can be directed to the mouth and the articulatory organs. Darwin instanced the movements of the tongue by children learning to write. Arm and hand-movements are constructed from a limited set of innate motor programs or action units. These same elements are linked to control movements of the tongue and other articulatory organs and so produce a differentiated set of articulations, speech sounds.

Every element in the movement and positioning of the hand and arm (that is every element in the neuromuscular program) can be redirected to generate an articulatory complex, a structurally-related set of speech sounds.

The following extracts show that the relation between articulation and arm movements is now a familiar one in speech research:

"For many years we have known in a general way that speech and limb movements are related" (Munhall 1994: 174 reviewing Hammond: Cerebral control of speech and limb movements 1990) .

"Phonetic gestures can, then, be seen as adaptations to motoric and perceptual constraints that are language- independent and in no way special to speech." (Lindblom 1991: 21)

"A fundamental premise in the present model [of speech motor control] is that there are characteristic patterns stored in the nervous system" (Gracco 1992: 38)

"into a hierarchy originally established to control manual movements, for example, lower modes can be substituted which control speech articulatory movements. ... deeply embedded within the speech process can be manual actions and the schemas of representation which they support" (McNeill 1981: 205).

"comparing findings on the motor organisation of speech with the organization of voluntary movements about the elbow ...We have found that the kinematic patterns for movements of the tongue dorsum were similar to those of voluntary flexion-extension movements about the elbow" (Ostry and Cooke 1987: 223).

"the task dynamic model we are using for speech was exactly the model used for controlling arm movements, with the articulators of the vocal tract simply substituted for those of the arm." (Browman and Goldstein 1991: 314)

"positing that prelinguistic units of action are harnessed into (gestural) phonological structures through differentiation and coordination. " (Browman and Goldstein 1992: 155)

"such gestures not only can characterize the movements of the speech articulators but also can act as phonological primitives" (Browman and Goldstein 1991: 313)


5. Every program controlling movement of the hand and arm can be redirected to form an equivalent articulatory program; similarly every articulatory program can be redirected to produce an equivalent movement of the hand and arm.


6. Every articulatory gesture can be redirected to produce an equivalent gesture of the hand and arm; every gesture of the hand and arm can be redirected to produce an equivalent articulatory gesture.


These propositions go further than an equivalence between elementary movement patterns and elementary articulations (speech sounds). The equivalence extends to motor programs formed from a number of elementary action units, both for movements of the arm and hand and for articulation.

Each speech sound is coordinated with a particular body movement (a position or partial movement of the hand and arm). Each concatenation of speech sounds is coordinated with a homologous pattern of movements of the arm and hand.

We can convey a meaning to another person by moving our hand or arm in several ways: by imitating (or in fact performing) a particular action such as HITTING, GIVING or HOLDING; by using a movement of the hand and arm to draw a visual contour in the air, a picture of what we want to refer to (e.g. a circling movement to represent something circular); or by pointing to an object, perhaps some part of ourselves: our mouth if we are hungry or thirsty, our eye or ear to show that we can see or hear something, and so on.

With a shift of attention, i.e. a redirection of the program of action, to the mouth and throat, and given a stream of air on which changes in the positions and movements of the articulatory muscles and cartilages can operate, the step from action to language is taken. The word initially is the by-product of the action because it transduces the action into a patterning of the articulatory system.

Similarly, the action-program involved in production of a certain pattern of speech-sounds when transferred to control the position and movement of the hand and arm is manifested as gesture accompanying speech.

"We also examined movement from the point of view of how it is related in form to the lexical content of the speech. We found ... that the movement in speech began before the lexical item it was to mark, but was completed the moment the lexical item was completed. It seems that the speech-accompanying movement is produced along with the speech as if the speech production process is manifested in two forms of activity simultaneously - in the vocal organs and also in bodily movement, particularly in movements of the hands and arms." (Kendon 1972: 205)

"their [Browman and Goldstein's] conception of articulatory gestures ... the coordinative movement of groups of articulators ... [that] can be characterized using dynamical equations.. [which] stem from work done by Kelso and others modelling the control of other muscular systems (e.g. arm movement) and thus do not represent a domain of control unique to speech. This view of [articulatory] gesture is very attractive." (Fox 1994: 110)


7. Gestures of the hand and arm in a number of different ways represent, or more precisely, are structured by the contours of perceived objects or of larger bodily actions. A gesture can be structured by a perceived circle or square, by the contour of a tree or a house, by the perceived action of another person or by recall of a particular object or action.


"While people talk, they also use their hands. 'illustrative gestures' are used to indicate shapes, sizes, directions and to point, for example to describe a spiral staircase. .. Where illustrative gestures are similar in form to their reference, emblems [gestures with arbitrary meanings] usually are not" (Argyle 1987: 63)

Gesture can be classified in terms of its duration, its elaborateness and in relation to the semantic content of speech: (a) Small unclassified gestures - mere movements (b) Word-gestures i.e. gestures which clearly emphasise or illustrate a particular word used by the speaker (c) Elementary sentence-gestures - at the simplest, a nod of the head meaning 'I agree'.

Word-gestures emphasise or illustrate single words: the grasping movement referred to by Whorf to illustrate the grasping of an idea, the circling gesture of the hand which some may use in saying that 'Something or somebody went round and round', the listening gesture, with the hand moved towards the ear, to accompany 'What did you say?', the throwing up of a hand accompanying 'Ah, well', the vigorous movement down of the fist to accompany the word 'stand' in saying 'I simply can't stand it', the forward movement of the hand and arm in saying 'Go!', the beckoning of the finger in saying 'Come on!', the fingers touching the chest to emphasise 'I' in 'What I think is this', the downward movement of the hand accompanying 'one' in 'There is just one thing', the outward movement of the hand and arm accompanying 'away' in 'Oh, go away', the finger pressed against the forehead in 'I just can't think what to do', or the hand raised with the fingers pointing up and elbow bent in 'Look! I've just about had enough'.

A gesture may resemble its subject of reference by the kind of movement involved. The simplest gesture for hitting something referred to in speech is a movement of hitting. The gesture and the action referred to by the word are in this case virtually identical and the only difference between gesture and action is that the object hit is not present but only referred to in speech. The simplest gesture for giving something, for taking something, is the normal movement of giving or taking. There is thus an extensive category of what one might call action-gestures where the meaning of the gesture is immediately obvious.

A gesture may resemble its subject by the kind of shape traced out by the stationary or moving hand and arm. The simplest example of this might be the gesture normally used for a circle which is a circling movement of the hand and arm. Much the same gesture is used for expressing the word 'around'. Another example would be the gesture often seen if someone says that a thing is 'huge' - both the arms are spread apart to indicate in a direct way the size of the object referred to. A zig-zag movement would be another example of a shape- or form-gesture.

A gesture may be an indication. This is perhaps not so much resemblance as a variant of the action-gesture. The most rudimentary gesture is to point to the object referred to or, more particularly, to the feature of the body referred to. So the gesture for 'me' is simply the hand pointing to the chest (or touching it in emphatic speech). A gesture for the ear is to point to or touch the ear, and so on.

"It is a remarkable fact that apes, though they selectively orient, do not point . Rieber. Why is that? Kinsbourne. I don't know. " (Kinsbourne in Rieber 1983:161)

And there can be gestures for function words: "William James speaks of specific feelings accompanying the use of such words as 'and', 'if', 'or'. And there is no doubt that at least certain gestures are often connected with such words, as a collecting gesture with 'and', and a dismissing gesture with 'not'" (Wittgenstein 1960: 78)

Finally, there is the difficult question of how we are able to interpret gesture, a problem closely linked in principle to the question of how we are able to perceive and interpret the utterances of another person, that is, speech perception. "We respond to gestures with an extreme alertness and, one might almost say, in accordance with an elaborate and secret code that is written nowhere, known by none and understood by all." (Sapir quoted by Plutchik 1980: 269)


8. Every gesture structured by a perceived object or action or by a recalled object or action can be redirected to produce an equivalent articulatory action.


The perception of the object is transduced into an action, a movement of the arm and hand representing the object. From this movement in turn is derived the articulatory gesture, the utterance relating to the perceived object. This same process occurs when the object is mentally imaged and not externally perceived.

The problem is to explain how this complicated inter-relation between perception (external or internal), bodily movement (gesture of the hand and arm) and the production of an articulatory pattern related to the perception and the gesture can operate.

"When people think in words, it does not necessarily serve any adaptive purpose for them simultaneously to move. Nevertheless, involuntary changes in position of which the subject is quite unaware can be observed during verbal thought ... a behavioral spin-off of shift in the distribution of neural excitation within the brain due to the adoption of a verbal mental set." (Kinsbourne in Rieber 1980: 161)

"the same areas of the early visual cortex that are excited by visual stimulation are also activated during mental representation of the same stimulus." (Le Bihan et al. 1993: 11802)

"translating sensory inputs to motor outputs ... visual inputs must be transformed from retinal coordinates to coordinates that specify the location of visual objects with respect to the body to perform accurately directed movements" (Zipser and Andersen 1988: 679)

"It would seem that our perception of objects, and particularly of their spatial relations, is determined in part by the laws governing the movements of the eye" (Davson 1972: )

The essential idea in the detailed development of the hypothesis of phonological/semantic equivalence is that the gross muscular expression of the word/articulatory pattern can be observed and analysed in the form of gesture and that complex gestures can be broken down into gestural elements associated with particular sound-elements.

"we should regard the gesture and the spoken utterance as different sides of a single underlying mental process..... I credit the discovery that there is a unity of speech and gesture to Adam Kendon ... gesture and language are one system" (McNeill 1992: 1-2)

"[different types of gesture] gestures that depict the visual appearance of something ... gestures that suggest a pattern of action of some sort. ... movement patterns which can be used to depict concrete actions, but which also can be used as metaphors for processes of thought." (Kendon 1991: 4)

"speech and gesture arise as interacting elements of a single system" (McNeill 1987: 503

"since gestural expressions are fully integrated with spoken aspects, ... however ideas are stored in our heads, they must be stored in a way that allows them to be at least as readily encoded in gestural form as in verbal form ..." (Kendon 1986: 42)

"the close, if not inextricable, interaction between perception and action. ... recognising an object as a table is not so much experiencing a mental picture as being able to plan what one can do with object ... This complex association of perceptual and other knowledge may be referred to as a schema" (Roth and Frisby 1986: 189)

"The central thesis is that the visual system and the motor system are functionally inseparable ... they are components of a unified perceptuo-motor system, which is itself a component of the organism-environment system." (Lee 1980: 281)

"... the role of vision in controlling pointing and reaching movements in man. Vision is devoted not only to building up an internal representation of the external world. It also has a motor function. Visually directed action implies continuous transformation of incoming visual stimuli into motor commands." (Jeannerod 1986: 41)


9. Specific articulatory gestures generate specific phonetic-phonological patternings of utterances.


10. Speech-sounds, and beyond them aggregations of speech-sounds in words, are equivalent to, homeomorphic with, gestures structured by perceived or recalled objects or actions.


The relation between gesture and spoken language has been the subject of a considerable amount of theory and research. Approaches to the issue can be divided into a broad approach and a narrower approach. Authors who have followed the broad approach include David McNeill and Adam Kendon (and many other notable figures such as Wundt, Mead and Paget) together with, of course, Gordon Hewes in his extensive work on the gestural origin of language. The broad approach has considered the relation between gesture and language at the lexical and syntactic levels. Recently, particularly at the Haskins Laboratories, there has been emphasis on a narrower approach in terms of the relation between articulatory 'gestures' and gesture in the more familiar form of arm and hand movements. In the final sections of this paper I attempt to adopt neither the broad nor the narrow approach but a more specific one, which demonstrates the relation between particular speech sounds and particular positions and movements of the hand and arm, and between particular words and particular gestural patterns.



"[London Pub Scene filmed by Birdwhistell and Van Vlack, of people as they drank and talked in the private lounge of a London pub] ... The main conclusions ... may be regarded as hypotheses about how speech and movement are related ... we may see the pattern of body movement that one associates with it [the flow of speech] as organized in a similar fashion, as if each unit of speech [locutions] has its `equivalent' in body motion." (Kendon 1972: 204)

"changes in the patterning of movement which occurs as the subject is speaking are coordinated with changes in the pattern of sound." (Kendon 1972: 205)

Adam Kendon surveyed the position at the 1991 De Kalb meeting of LOS in his paper "Revisiting the gesture theory of language origins": "clear evidence that the gesture phrase is fully organized either prior to, or at the same time as, the spoken phrase" (1991: 3) "The use of gestures that have been so extensively attested to by McNeill ... and others suggest that visual and perceptuo-motor images are continually mobilized as an integral and indispensable part of the process by which we organize our meanings." (1991: 5) "If the same metaphors can find representation in both gesture and speech, it would seem that both modalities are drawing on the same representational substrate. This, it seems, is derived from internalizations of our sensory, especially visual perceptions and manipulations of the physical world." (1991: 12)


McNeill has also explored over a number of years the relation between gesture and spoken language, most recently in his Hand and Mind: What Gestures Reveal about Thought (1992).

"speech and gesture arise as interacting elements of a single system"(1987: 503)

" gesture and language are one system" (1992: 2) " [from Kendon's 1972 paper table] ... It appears that each speech unit has its equivalent unit of body motion"(1992: 84)

"Gestures and speech break down together in aphasia ... a parallel interference of speech and gesture does appear [with lesions]" (1992: 24, 323)

"All [experiments described] would suggest that thought is image and word." (1992: 271) "Iconic gestures appear to be images of concepts and imply the existence of schemas which produce them" (1981: 203)

"the half of the brain that is dominant for language also appears to be the significant locus of the gestures that accompany speech" (1992: 331)

"language is a form of action"... "many utterances are constructed in terms of concrete models of reality ... simultaneously parts of action" (1979: xi)

"Linguistic intersubjectivity is reconstructed by Mead from the structure of gestural communication, which is connected more closely with the body and founded in cooperative action." (Joas 1980: 14)


"Articulatory phonology does not take the goal to be auditory. Rather, it hypothesizes that the goal of the speaker's behavior is a particular organized ensemble of articulatory gestures." (Browman and Goldstein 1992: 222) "In articulatory phonology, the basic units of phonological contrast are gestures ... Utterances are modeled as organized patterns ... of gestures, in which gestural units may overlap in time. The phonological structures defined in this way provide a set of articulatorily based natural classes" (Browman and Goldstein 1992: 155) "we show that such gestures not only can characterize the movements of the speech articulators but also can act as phonological primitives" (Browman and Goldstein 1990: 313) "We will examine how contrastive words (and morphemes) differ from one another in terms of their component gestures and their organization ... the same gestural structures ... directly ... characterize the articulatory movements of these words ... Gestures are not the models themselves but rather abstract characterizations of the movements." (Browman and Goldstein 1991: 315) "to show how lexical representations can be viewed as gestural structures ... contributes to an understanding of phonological inventories and processes ... the range of phenomena that can be handled with the relatively simple assumptions that we are making: that phonological structures consist of temporally overlapping, dynamically defined gestural units ..." (Browman and Goldstein 1989: 19)


11. Distinct speech-sounds (consonants and vowels) are equivalent to, homeomorphic with, distinct positions and movements of the hand and arm. These equivalences can be observed and specified.


Earlier sections have argued that distinct speech-sounds are the result of the transfer to the organs of articulation of the set of elementary motor programs that originally evolved for control of bodily movement. Articulatory 'gestures' are produced by specific motor programs.

Any motor program can be executed in a variety of ways, by applying it to different muscle-joint complexes. The motor programs producing the elementary speech sounds can be redirected to produce bodily movement other than that of the articulatory organs.

The articulatory 'gestures' are perhaps the only route to discovering what the elementary motor programs are. We know what simple movements of the arm and hand are possible and that these simple movements are the product of the set of elementary motor programs.

To identify what the specific motor elements are, the articulatory 'gestures' have to be redirected to the muscle-joint complex of the arm and hand. Every speech sound should then produce a distinct patterning of arm and hand movement or position.

To redirect motor programs from one muscle-joint complex to another, we have to "image" the familiar action and then shift attention to the muscle-joint complex to which we want to transfer the program. So we "image" the writing of our signature and then transfer it, for example, to our left hand, our foot, to our nose, to our forehead, as previously discussed. The action of writing the signature is then executed by the left hand, the foot, the nose or the forehead.

The same technique (essentially the ideomotor process described by William James) can be applied to transfer the articulatory 'gestures', the motor programs that produce the range of speech sounds. Catford has usefully described "the technique of silent practice of sounds". Auditory sensations, he says, mask the proprioceptive sensations, that is, the kinesthetic awareness of the articulatory process. "To get at the latter, you have to eliminate the auditory sensations." (Catford 1991: 176)

To transfer the articulatory 'gesture' for a speech sound to produce a change in the position or movement of the hand and arm, the first step is to use the Catford technique to separate the sound from the motor aspect of the articulation. Without uttering the sound, "image" the production of the speech sound. Then in the same way as signing one's name can be transferred to the foot, so with a shift of attention we transfer the motor program for the speech sound to the hand and arm. We should then observe a movement or position of the hand and arm which is structurally related to the articulatory 'gesture'.

The ability to do this is not acquired easily, any more than the phonetician can without difficulty introspect and so visualise articulatory movements and positions associated with particular speech sounds. What is needed is a capacity to focus on the particular speech-sound, and then the ability to move one's attention to the hand and arm.

There is no direct method for determining the relation between articulatory and manual gestures in terms of neuromuscular commands. The neurologists are not yet able to say what the specific relation is between activity of cortical cells and the articulation of phonemes. However, one can try to examine the relation in the following way:

Start with the vowels. Say out loud the vowels A E I O U to increase awareness of the articulatory 'gestures' which produce these sounds. Then, without uttering them, have an auditory 'image' of each sound and be aware of the process of articulating it. Observe the movement and position of the tongue, the lips and the mouth generally. With the right arm hanging loosely at the side, as one forms the mouth to produce the vowel sound, be aware of the position and tension of the muscles in the right hand and arm. Transfer attention to the hand and arm. Let the sound of the vowels be felt in the hand and arm at the same time as one hears the sounds in one's head.

For the consonants, follow the same procedure, applying it first of all to the consonantal sounds B C D F G. Without uttering any sound, think and form the sounds by themselves, and be aware of the articulatory movements and positions required to form the consonant without a following vowel. Observe the effect when attention is shifted to concentrate on the position of the hand and arm. A similar procedure can then be followed for L M N R.


12. Specific aggregations of speech-sounds, words, can be correlated with specific gestures structured by perceived or recalled objects or actions. These equivalences can be observed and specified.


Words in origin are natural because they carry in their structure either a direct representation of a percept or action or an indirect clue or indication of the percept or action to which the word relates.

Words are formed from speech sounds, each of which is the product of an elementary motor program, an articulatory 'gesture'. For each articulatory 'gesture' there is a corresponding movement or position of the hand and arm. If we have discovered these corresponding movements, then the manual gesture correlated to the word can be formed from combining the elementary movements of the hand and arm corresponding to the speech sounds forming the word.
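
The combination described here can be pictured as a simple look-up followed by concatenation. The Python sketch below is only an illustration under stated assumptions: the table entries are hypothetical placeholders (this paper does not list the actual hand and arm movements for each sound), and the names ELEMENTARY_GESTURES and word_gesture are invented for the example.

```python
# Hypothetical placeholder table: speech sound -> the hand/arm movement or
# position observed for it. These entries are NOT data from the paper; each
# observer would fill them in from his or her own observation.
ELEMENTARY_GESTURES = {
    "n": "observed gesture for n",
    "e": "observed gesture for e",
    "k": "observed gesture for k",
}


def word_gesture(sounds, table=ELEMENTARY_GESTURES):
    """Return the sequence of elementary gestures for a word.

    `sounds` is the word already split into its speech sounds, in order.
    A sound missing from the table raises KeyError, because its
    corresponding gesture has not yet been identified.
    """
    return [table[sound] for sound in sounds]


# 'neck' treated as the three speech sounds n, e, k.
print(word_gesture(["n", "e", "k"]))
```

The sketch deliberately says nothing about how successive gestures blend into one another; it shows only that the word gesture is assembled, in order, from the gestures of its component sounds.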

However, before the set of speech sounds and correlated arm movements and positions has been identified, it is possible, following a technique similar to that outlined in the previous section, to explore the manual gestures associated with individual words, considered as combinations of elementary articulatory 'gestures'.

Examine a number of words and consider how far there is any felt relation between the sound-structure of the word and its meaning. At the most obvious, there are words like 'hiss' 'cuckoo' 'cough' 'howl' 'tick-tock' 'ding-dong' which are clearly onomatopoeic - but what is it that makes them onomatopoeic, what is the specific nature of the resemblance between the word and its meaning?

A next set of words might be 'whistle' 'wind' 'whip', which are nearly onomatopoeic. What is the nature of the relation between the sound and the meaning of these words? A third set of words might include 'yawn' 'suck' 'spit' 'smile' 'gnash' 'sneer'. They derive their naturalness from some relation between the movements made in pronouncing them and the movements of the face to which they refer (see Ojemann on the neural relation between words and orofacial motor control).

The words listed so far have had a clear natural resemblance to their meaning. What kind of resemblance can one feel or perceive for other words where this does not apply so obviously, words such as 'clap' 'jump' 'push' 'crush' 'hurrah'? Where does the appropriateness derive from? Or consider similarly 'sparkle' 'splash' 'splosh' 'flash' 'quick' 'weary' 'burst', words where no action by the speaker is involved. Concentrate on words such as 'neck' 'nod' 'nose' 'eye' 'lip' 'ankle' 'finger' 'knuckle' 'elbow' 'wrist' 'high' 'head' 'hill'. Say them silently but forcefully. Observe whether any associated feeling or movement is apparent.

For example, say 'neck' sharply, vigorously, with emphasis, several times aloud. Then mentally "image" the production of the word. Note any bodily movement that follows or accompanies the saying or imaging of the word. With the word 'nod', follow the same procedure. Or reverse the process. Make some simple movement and accompany it by saying the word which describes it. For example, say 'hit' at the same time as making a hitting movement. Consider how well or otherwise the word as spoken goes with the movement as it is made. Consider how far a word such as 'clasp' is appropriate for the action of clasping or 'grasp' is appropriate for grasping. What relation may there be between 'clasp' and 'clap', between 'grasp' and 'grip' or between 'clap' and 'pat'?

This is a simple, direct but unavoidably subjective method of seeking to assess the naturalness of the association between the sound-structure and the meaning of individual words. It was by very similar methods of self-observation that phoneticians like Henry Sweet and Daniel Jones laid the foundations for the systematic study of speech-sounds and their relation to articulatory movements. In present-day linguistics, grammatical correctness is assessed by linguists on the basis of their own intuitions, and syntactic theory is based on subjective observation.

There are observable regularities between the sound-structure of words and their meaning. For example, words which start with /l/ /m/ /n/ or /r/ typically have associated with them some aspect of meaning involving turning, bending or curvature. Words with initial /s/ /w/ /sh/ or /z/ frequently have aspects of meaning involving a movement sideways or from side to side ('shake' 'wag' 'side' 'sway' 'zig-zag'). Words beginning with the speech-sounds /j/ /p/ /t/ /ch/ /v/ /th (uv)/ /y/ frequently have an aspect of meaning involving moving energetically forward: pointing, touching, jumping, throwing, tossing, vaulting, and so on. Consider the relation between speech-sounds and meaning for the words: TOUCH HIT HIGH UP EYE AT THAT THIS THE A.
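
The three groupings just listed can be set out as a small look-up, which may help in checking further words against them. The Python sketch below is an illustration only: the sound classes and glosses are taken from the paragraph above, but the names INITIAL_SOUND_CLASSES and initial_class are invented for the example, and matching on spelling rather than on the actual speech-sound is a simplifying assumption. The regularities are stated as tendencies ('typically', 'frequently'), so the result is a suggestion to be tested against the felt meaning of the word, not a rule.

```python
# Sound classes and glosses taken from the paragraph above. Matching is done
# on spelling, which is only a rough stand-in for the initial speech-sound
# (an assumption of this sketch): 'wrist' is matched as w, not r, and voiced
# and unvoiced 'th' are not distinguished.
INITIAL_SOUND_CLASSES = [
    (("l", "m", "n", "r"), "turning, bending or curvature"),
    (("s", "w", "sh", "z"), "movement sideways or from side to side"),
    (("j", "p", "t", "ch", "v", "th", "y"), "moving energetically forward"),
]


def initial_class(word):
    """Return the gloss suggested by the word's initial letters, or None."""
    word = word.lower()
    best = None  # (length of matched prefix, gloss)
    for prefixes, gloss in INITIAL_SOUND_CLASSES:
        for prefix in prefixes:
            # Prefer the longest matching prefix, so 'sh' wins over 's'.
            if word.startswith(prefix) and (best is None or len(prefix) > best[0]):
                best = (len(prefix), gloss)
    return best[1] if best else None


for example in ("shake", "wag", "nod", "jump", "touch", "hit"):
    print(example, "->", initial_class(example))
```

Words whose initial sound falls outside the three groups, such as 'hit' or 'high', return None here; the paragraph above leaves their relation to meaning as a question for the reader.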


13. How does the perceiver/receiver understand speech?


"What the Motor Theory [of Speech Perception] says is that there ought to be a common locus for perception and production." (Liberman 1991: 186)

"the objects of speech perception are the intended phonetic gestures of the speaker, represented in the brain as invariant motor commands ... To perceive an utterance, then, is to perceive a specific pattern of intended gestures ... the link is innately specified, requiring only epigenetic development to bring it into play". (Liberman 1985: 2-3)

"in the non-human case ... evidence that production and perception of communicative signals have genetic and neurobiological roots in common. Surely it is reasonable to assume a similar arrangement for phonetic communication in humans. ... production and perception are simply different faces of the same module, sharing a common set of (gestural) primitives and, to the extent possible, a common set of processes." (Liberman 1991: 446)

"the recent revised Motor Theory has changed ... to something that is not particularly well specified." (Klatt 1991: 175)

"In articulatory phonology, the basic units of phonological contrast are gestures ... Utterances are modeled as organized patterns ... of gestures, in which gestural units may overlap in time. The phonological structures defined in this way provide a set of articulatorily based natural classes ... Articulatory phonology does not take the goal to be auditory. Rather, it hypothesizes that the goal of the speaker's behavior is a particular organized ensemble of articulatory gestures." (Browman and Goldstein 1992: 155, 222)

"lexical representations can be viewed as gestural structures ... ... phonological structures consist of temporally overlapping, dynamically defined gestural units" ... "a listener ultimately recovers the set of gestures that are part of a given lexical entry" (Browman and Goldstein 1989: 19, 1)

" This [Browman and Goldstein's] view of [articulatory] gesture is very attractive .... However, little space is given to discussion of how a listener might make (in any direct manner as would be required by the Motor Theory) the conversion from acoustic patterns to gestural patterns." (Allen 1994: 110)

"perception involves recovery of distal events from proximal stimuli. this holds for perception of other acoustic stimuli and for visual stimuli ... " ... "the motor theory ... is fundamentally correct in its claim that listeners perceive phonetic gestures ... It is wrong ... in the view that closed modules must be invoked to explain why distal events are perceived ... our own direct-realist theory ... fits better in a universal theory of perception " (Fowler and Rosenblum 1991: 56, 33)

"in both the approaches to the theory of speech perception considered here, ... a motor entity called the gesture [considered to be innate] is regarded as a unit of perception. Both approaches share the problem that the concept of gesture has not been adequately explicated."(MacNeilage 1991: 66)

"Several populations [of neurons] ... responded in the same way to both speech production and perception ... The neuron illustrated in Fig. 14 ... also seemed to have some similar response to the same phoneme during speech perception and production ... The same cortical sites that are essential for orofacial motor control were often essential for phoneme identification (i.e. for perception, not production of phonemes) ... suggests some common mechanism for both orofacial motorcontrol and speech sound identification " (Ojemann 1991: 208, 220)

"This association between motor movements and phoneme identification is strong and unique direct evidence for a motor model of speech perception" ... One population [of neurons] ... seemed to have a specific pattern of activity each time a specific word was perceived, a pattern that also seemed to be present with production of that word " (Ojemann 1991: 225, 122)

"although listeners obviously cannot have kinesthetic feedback from someone else's articulation, they interpret what they hear by implicit motor- matching. ... actual movements of the organs of speech become unnecessary; the appropriate pattern of impulses within the central nervous system is enough." (Hockett 1987: 39)

"The main roadblock for understanding vocal learning in birds - or in humans ... may have been the idea that two physically and procedurally separate processes are involved, one dealing with sensory learning and the other dealing with efferent control. Many of the paradoxes may disappear if perception and production are two closely related states of a same system." (Nottebohm et al. 1990: 122)

"To some extent perception and production may be two closely related states of a same system. ... Why do the muscles of the syrinx respond to sound? ... We suggest that there is a close relation between how a bird learns to produce its song patterns and how it perceives conspecific song ... " (Nottebohm et al. 1990: 115)

There is perhaps some truth in each of the theories. The important step is to examine what the neurological basis could be for the relation of perception and production.


14. How does the perceiver/receiver understand gesture?


"We respond to gestures with an extreme alertness and, one might almost say, in accordance with an elaborate and secret code that is written nowhere, known by none and understood by all." (Sapir quoted by Plutchik 1980: 269)

Meltzoff and Moore (1977, 1983) carried out well-conceived, well-controlled and statistically-validated experiments to investigate imitation by newborn infants of adult facial gestures.

"Infants between 12 and 21 days of age can imitate both facial and manual gestures... Such imitation implies that human neonates can equate their own unseen behaviors with gestures they see others perform." (Meltzoff and Moore 1977:75)

"The hypothesis we favor is that this imitation is based on the neonate's capacity to represent visually and proprioceptively perceived information in a form common to both modalities. The infant could thus compare the sensory information from his own unseen motor behavior to a `supramodal' representation of the visually perceived gesture and construct the match required. ... Our recent observations of facial imitation in six newborns- one only 60 minutes old - suggest to us that the ability to use intermodal equivalences is an innate ability of humans." (Meltzoff and Moore 1977: 78)

"Newborn infants ranging in age from 0.7 to 71 hours old were tested for their ability to imitate 2 adult facial gestures: mouth opening and tongue protrusion. ... The results showed that newborn infants can imitate both adult displays. 3 possible mechanisms underlying the early imitative behavior are suggested ... ... It is argued that the data favor the third account [active intermodal matching]." (Meltzoff and Moore 1983: 702)

"we postulate that infants use the equivalence between the act seen and the act done as the fundamental basis for generating the behavioral match. ... this imitation is mediated by a representational system that allows infants to unite within one common framework their own body transformations and those of others. According to this view, both visual and motor transformations of the body can be represented in a common form and thus directly compared." (Meltzoff and Moore 1983: 708)

Kuhl and Meltzoff carried out parallel experiments on the ability of infants to respond to mouth movements and speech sounds:

"by 18-20 weeks of age infants relate the speech sounds they hear to ones they see being produced. Moreover, they relate the sounds they hear to the motor movements necessary to produce them, and initiate these movements to create the event themselves. it seems likely that both these phenomena - cross- modal perception and vocal imitation - are linked to some common representation of speech. The notion that the auditory, visual and motor systems for speech are linked in infants is further reinforced" (Kuhl and Meltzoff 1988: 263)

"The central issue is imitation. How does a child (or, for that matter, an adult) transfer a pattern of light or sound into a pattern of muscular control that serves to reproduce a structure functionally equivalent to the model? The hypothesis to be entertained is that imitation is a specialised mode of action in which the structure of an amodal percept directly specifies the structure of the action to be performed." (Studdert-Kennedy 1986: 206)

Trevarthen reports similar findings:

"Although infants ... do learn by imitation ... the structural foundations for the imitative movements cannot be learned. It is necessary to assume an innate structure that at least partly matches the structure of the adult models to explain both imitation and more complex reciprocal or complementary interactions which are characteristic of communication between child and adult from immediately after birth." ... "The neural basis for empathic response would underlie imitation in both directions" (Trevarthen 1984: 253, 256)

"infant's movements precisely synchronised to the segmentation of the adult's speech at both the phoneme and syllable boundaries" (Condon 1974: 459)

"it seems to have been cerebral reorganisation that has been decisive for the origin of speechlike communication, such as the ability to form crossmodal associations and increased memory." (Wind 1976: 626)

"... it is concluded from our data that there is a strong selective advantage in developing multimodal integrative capacities, and this is seen in at least two groups, mammals and birds." (Rehkamper, Frahin and Ziller 1991: 139)

"Perception and action are mutually entailed components of a single system. Their interlocking operation is possible because the information picked up by a perceptual system is amodal and directly specifies within the constraints of the actor's goal, the action to be performed." (Studdert-Kennedy 1986: 212)

What seems clear from the above is that the intermodal or amodal relationship of perception and action provides the basis for imitation. This also seems to be the case for the relation between speech production and speech perception, and for the relation between gesture and speech.


15. The role of the cerebellum.


The cell population of the entire human brain is usually given as 10,000 million ... a figure not likely to include the cells of the cerebellum, in which one cell type alone, the granule cell, is supposed to number around 10^10 or 10^11. A single neuron in the cerebellum may accommodate 200,000 synapses. (Kuffler and Nicholls)

" The evolution in complexity and size of the cerebellum is at least as striking as that of the cerebrum.... The critical point is that we need a way of going from points or vectors in sensory space to points or vectors in motor space. (Churchland 412 ff).

"It has long been known that lesions of the cerebellum lead to a marked deterioration in the regulation and coordination of movements. ... The cerebellum as a meta-system. ..the role ... to tune or regulate the activity of various motor mechanisms. It is not an integral part of these mechanisms, but it is in parallel with them as a `meta-system'." (Harvey 1985: 223-224)

"there is little agreement on the precise functions of the cerebellum in control of movement ... The last of the theories is most recent. ... The idea is that several brain regions contribute (in parallel) separate components of an intended movement. ... The role of the cerebellum is suggested to be that of scaling the amplitude of these components so that the movement is accomplished by the correct (and complementary) amounts of movement at each point. It can be viewed as transforming what we want to do into how we do it." (Rothwell 282-283)

"the remarkable fact that the cerebellum receives information from all the motor centres and from the majority of receptors and, in its turn, sends the signals to all the motor centres." (Arshavsky, Gelfand and Orlovsky 1985: 95)

"Recently it has been suggested that the role of the cerebellum may not be limited only to actual motor performance but that it is also involved in mental aspects such as imagination of sequences of movement (motor imagery, M.I.)" (Ryding, et al. 1993: 94)

"the cerebellum is associated with sensory systems used for tracking movements of targets in the environment, as well as movements made by the animal itself, in all vertebrates, not just in a few isolated cases ... movement and perception are interrelated ... it is not surprising to find that a neural structure that is important for fine control and coordination of movements is also important for sensation and perception." (Paulin 1993: 39, 47)


16. The chameleon theory of speech/gesture perception


I have already quoted Studdert-Kennedy's remarks that the central issue is imitation. How does a child (or, for that matter, an adult) transfer a pattern of light or sound into a pattern of muscular control that serves to reproduce a structure functionally equivalent to the model?

"We come back to a very knotty problem: how is it that man achieved this extraordinarily perfect imitative ability, the ability immediately to imitate the sound which he hears? As far as we know, no other animal, apart from the birds, can do this." (Thorpe 1967: 10)

"a great capacity for imitating, that is, translating perceived into performed movements. This may indeed have been one of the most important steps in the development of the brain." (Hayek 1973: 241)

"Although infants ... do learn by imitation ... the structural foundations for the imitative movements cannot be learned. It is necessary to assume an innate structure that at least partly matches the structure of the adult models to explain ... imitation ... from immediately after birth." (Trevarthen 1984: 256)

If you see someone yawning, you will probably yawn. If you think about (visualise or form a mental image of yourself) yawning, you will probably yawn.

All bodily movements are changes in posture - and posture is body-image. We visualise any movement (of arm, leg, head, hand, mouth) as a change in body-image, a change in posture - the movement brings our actual posture into coincidence with our visualised new body-image, visualised posture. This is readily linked to the newborn infant's ability to mimic adult facial movements. How does the hours-old baby do this, protruding its tongue when it sees the adult do so?

The changes in brain-patterning from seeing are translated into specific motor-patterning which produces the infant's re-presentation of the adult facial movement. There is a transduction of seen visual patterning into a corresponding motor-patterning of the infant's face.

Perception appears to be a process similar to that by which the chameleon changes its bodily state to match its background. Perception (on the chameleon theory) is internal ordering guided by external ordering. The perception of speech and the perception of gesture are aspects of this chameleon-process. Ideomotor action is the reverse process. Both stem from the integration of the motor system and perception.