Language and Evolution: Robin Allott

Structural interrelation of language and the processes underlying visual perception and action
Indications of isomorphism

In the previous paper, it was proposed that language (both the surface syntax of a language and the words forming its lexicon) cannot be arbitrary, as orthodox linguistics assumes, but must be naturally based, for a whole variety of reasons set out in that paper. One of the most powerful arguments is drawn from the facility, rapidity and regularity with which children acquire the different aspects of language: not only the recognition of speech-sound and of the discriminable phonemic categories (distinguished very soon after birth) but also the words of their mother-tongue and then the grammatical morphemes and processes of the community's language (which they achieve substantially by the age of two). The proposition in the preceding paper is that children's ability to learn language can only be based on a process analogous to imprinting (the ethological process originally described by Konrad Lorenz), and that the infant is pre-tuned neurologically (in a way very similar to the pre-tuning of the song-bird for learning its species' song) to understand, and in due course to speak, any one of a multitude of languages which satisfy certain general characteristics, notably the use of a certain range of sound (speech-sound) and a certain set of speech-sounds (the phonemes). The infant is genetically prepared to extract the particularly meaningful speech-sounds from the stream of speech-sound, to perceive the formation of these phonemes into a particular class of word-forms relatable to items found in perceptual experience, and to comprehend, and later to form, words in an ordered way into strings or sentences.
The infant's ability to acquire language rapidly in this way is possible because the words and syntactic forms found in each language result from selection among different possible natural sets of speech-sounds, words and syntactic forms, a selection determined by the development of the genetic character of the community over a long period of time and involving genetic changes affecting both the articulatory apparatus and the underlying neurological structures controlling speech and language.

The question striking anyone reading this must be: 'Whilst I can envisage that in some respects language may not be arbitrary, I cannot see how the myriad rules of syntax or the thousands of words which form the lexicon of any single language could be preprogrammed genetically in the infant, or even how there could be in the child an innate propensity to learn the thousands of words peculiar to that particular language. How can there be genetic determination, within a community, of the detail of the lexical and syntactic forms of a given language?' Thus Tax (as quoted by Hewes) says:

"Cultural behaviour has a quality of arbitrariness. It does not flow through the genes and is therefore not anchored in the individual. This is seen most clearly in the arbitrariness of the symbols of language." Similarly, Gregory, though accepting the possibility of a genetic base for syntax, denies the possibility of such a base for the lexicon: "Just because words differ between languages, and because languages are so recent and change so rapidly, it is quite clear that our knowledge of the names of things cannot be innate. It cannot be built into the nervous system. Words and names cannot be inherited." But if the words of a particular language have no genetic base, they must have been invented, imposed or produced by deliberate convention, and the arguments against this, set out in the previous paper, are even more powerful than those advanced against any natural genetic origin.

The hypothesis examined in the present paper is that language derives its non-arbitrary character, and therefore its ability to be pretuned in the infant neurologically and physiologically, from a direct relation between the structures serving language and the structures serving the other major aspects of human functioning, notably the physiological and neurological organisations underlying visual perception and bodily action. The proposition is that there is an especially intimate relation between visual perception and language (both of which are in their turn very directly related to the physiological and neurological structuring making possible effective bodily action).

The idea that there must be some special relation of language and perception (particularly visual perception) is an ancient one. Humboldt, Sapir and Whorf elaborated a theory of the relation which would make the character of perception dependent on language - the theory that the world we see is systematically distorted by language. Others have judged this implausible and have suggested the reverse relation, that perception is primary and has deeply affected the structure of language. The relation between language and perception has been discussed by writers drawing on very different types of expertise: philosophers, psychologists, physiologists and neurologists.

Perception and language, and the interrelation of the two, have long been key topics for philosophers, in recent years attracting particular attention as a result both of the development of the linguistic approach to philosophy and of renewed interest in the philosophy of language. Perception and language were central concerns of both the earlier and the later Wittgenstein (even though he abandoned his first theme of the logical isomorphism of language and the world in favour of a conventional view of language encapsulated in his concept of the 'language game'). In his own words: "The difficulty of my theory of logical portrayal was that of finding a connection between the signs on paper and a situation outside in the world. The proposition is a picture of reality - a model of reality as we think it is, a description of a fact."

Amongst psychologists, one can refer to the views of Miller and Johnson-Laird in their book 'Language and Perception'. Their stated goal there was to formulate a theory relating language and perception, "since if, in fact, perception is the only source of valid knowledge about the world, studies of perception should show us rather directly the foundations of human language and the meanings of human words in particular." The central problem for them was to explain "how, for example, the word LAMP becomes associated with the object 'lamp'." They rejected as clearly inadequate any normal theory of associational learning but in the end found themselves unable to resolve the problem: "Given this morass of complexity and ignorance any account of the labelling of percepts by words must be speculative and incomplete."

Another psychologist, Osgood, tackled the problem at the level of the sentence. From an interesting series of experiments on the relation between the perception of a given scene and the sentences subjects used to describe it under varying conditions, he concluded: "It seems perfectly reasonable to think that much, if not all, that is universal in human language is attributable to underlying cognitive structures and processes. Many properties of grammar are present in some form in pre-linguistic perceptuo-motor behaviour. Not only does the sheer ability to 'paraphrase things' in language imply that perceptual and linguistic signs must share the same underlying cognitive or semantic system, but the detailed ways in which non-linguistic, perceptual 'pre-suppositions' determine the form and content of descriptive sentences imply a very intimate interaction between these channels. Perceptual and linguistic signs and sequences must, at some level, share a common representational (semantic) system and a common set of organisational (syntactic) rules ... cognitive in nature."

In an article under the title "Scientific perspectives and philosophical dead-ends", Luria (one of the most distinguished Soviet researchers into language and thought) said: "We must look for the roots of basic linguistic structures in the relations between the active subject and reality and not in the mind itself. Language is thus a system of codes used to express the relations of the subject with the outside world".

Amongst linguists proper, few have paid attention to the relation between language and perception, or thought it compatible with the principles of their science to consider it. However, it is of interest to quote here some incidental comments of Chomsky: "There seems no more reason for assuming that the basic principles of grammar are learned than there is for making a comparable assumption about, let us say, visual perception ... any more than supposing the same person learns to analyse the visual field in terms of line, angle, motion, solidity, persons with faces etc. To acquire language, a child must select from the store of potential grammars a specific one that is appropriate to the data available to him ... If other animals do have systems with the formal properties of human language, I imagine that the proper place to look for them is not in the communication systems but rather in systems for the organisation of perception, or something of that type."

Some of the more stimulating views on the possible relation of language and visual perception have come from physiologists and neurologists. Helmholtz, the great pioneer of research into vision, noted the remarkable analogies between vision and language. Karl Lashley's classic discussion of the serial organisation of behaviour emphasised parallels of function and structure between language and other serially ordered behaviour. Teuber, summing up as a neurophysiologist at a conference some years ago on the brain and language, concluded: "Language imposes order on events by permitting their classification. What is important here is the hoped-for parallel between the known pattern-extraction machinery in the visual system and the still unknown arrangements in the auditory nervous system for the patterning of linguistic inputs. The formal analogy ... between Chomsky-type grammars and the Hubel/Wiesel type of analysis of visual systems may provide a hint about the kind of hierarchical arrangements we might have to look for in those regions of the brain where language input is received and analysed." Gregory, writing both as a student of the physiology of vision and as a psychologist, said very relevantly in an article on 'The Grammar of Vision':

"It now looks as though our use of these visual metaphors is no accident: seeing, thinking and language are inextricably tied up in the brain - so much so that the brain deals with language and vision in much the same ways. What if any is the connection between the inherited structures of perception that we use to interpret events in the world, and the deep structure of language? Is there a grammar of vision something like the grammar of a language? Was there slow development, over millions of years, of something originally serving some other use - which we finally cashed in on for our language? If so, we may suppose that ... deep structure did not originally serve language; but rather something else. The suggestion is that ... deep structure of language has its roots in the brain's rules for ordering retinal patterns in terms of objects ... From the work of Hubel and Wiesel we begin to understand the words of the language of perception, but it is far from clear how they are put together. In other words, the mechanism of the grammar underlying the perceptual sentences is still hidden from us. It may be that language and vision are indeed based on common ground and that the basic problems of both must be solved together."

These extracts show the sharp realisation by a number of authors (psychologists, physiologists, neurologists and linguists) of the significant analogies between vision and language, and of the need to find an explanation of the close interlinking of these two vital human capabilities. One possible way of approaching the problem of finding an explanation, or at least a plausible account, of the relation between language and perception is to consider the fundamental character and purposes of perception and the organisation of action in the animal and the human being, and of language in the human being:

1. For the animal and the human, the primary issue is the organisation of action - to acquire food, to escape from enemies, to fight, to reproduce, to move about - and the primary manifestation of the animal's behaviour, of its objectives, is in bodily action, both locomotion and the movement of parts of the body without locomotion (for example in eating, drinking, grasping);

2. Few animals enjoy anything like a constant environment. For their bodily action to be effective (to help them and their genes to survive) they have constantly to adjust their action to take account of the real environment, the changes continuously taking place in their neighbourhood, and the behaviour of both prey and predator. So perception develops in the service of action-organisation, taking all the forms available to increase the effectiveness of an animal's action: vision, hearing, smell, touch, lateral line organs, electrical-sensing organs and so on;

3. But for perception to be useful, the information it provides must be fed to the action-organisation and integrated with it as precisely as possible. It is of no use for an animal to see its prey if there is no reliable way of converting the information given in the visual perception into precise instructions for the pattern of bodily action required to seize the prey: information about distance, size, movement and position must be exactly translatable into central terms for the precise guidance of the action-organisation. There must be a direct and intimate contact between the structures controlling the organisation of bodily action and the structures responsible for the organisation and interpretation of perception;

4. Beyond this, perception itself is not a passive process but an active one, in the sense that the direction of perception has to be chosen and the mechanisms of perception themselves require control by the muscular organisation for action. Eyes must move to follow objects and to find objects, to focus on objects and to estimate their distance; the head must be moved to help locate prey by sound; the ears, in some species, may be moved to improve acuity and directional sensitivity; the limbs must be moved to allow identification by touch. Thus perception must not only be integrated with action to guide action effectively; it must also be integrated with the action-system more fundamentally, to allow precise control of the mechanisms of perception, which themselves have to be appropriately adjusted. The integration of perception and action has to be extraordinarily complex and complete. There must be totally effective processes for transduction between the patterning responsible for action and the patterning responsible for perception;

5. There is a further stage in the development of processes to serve the central animal function. Language is initially, and essentially, a means for collaboration, extending the pre-linguistic systems by which individuals of a species (wolves, wild dogs, hyenas, lions, birds, bees, ants) co-ordinate their action. It functions by allowing a pooling of perceptions between a number of individuals. Language is a means of translating the present content of an individual's perception into a distant stimulus, the sound-pattern, so that the individual's perception can be transmitted across space (a physical space or a space in time) to another individual or to other individuals. Later on, language becomes a more generalised means of communicating the whole range of perception (internal as well as external) between individuals, where the effect or purpose may be not so much to modify the physical action of the hearing individual as his internal ('mental') organisation, though 'changing the other individual's mind' through speech may in the end result in later changes in his pattern of physical action. By serving perception in this way, language also serves the objective of perception itself, that is, effective external action.

6. For language to function effectively in the service of perception, it has to be integrated as closely as possible with perception; there must be as reliable a relation between perception and language as possible. Language must be so constructed as to convey with precision the content of one individual's perception, with as much detail as will be useful for the action of another individual or for the joint action of several individuals. At the same time, language must be able to transmit the content of perception rapidly and completely, whilst maintaining precision in representing the contents of the speaking individual's perception. Thus there has to be a close integration between the structure and contents of perception and the structure and contents of language; and since perception comes before language (visual perception is an inheritance we share with other animals) and is the more vital activity for serving the needs of action, the structure and contents of language must have been derived from, and be dependent on, the structure and contents of perception. Language, in evolutionary terms, had to be homoeomorphic with external perception in all its forms (visual, auditory, tactile etc.). But beyond this, language was structured more fundamentally by the needs of action, not simply because perception is structured by action, but because language, as a bodily process expressed in the system of articulation, itself depends directly on the organisation of action for its production and comprehension. Nevertheless, visual perception is the most important part of total perception (something like 90% of the total external information we receive comes in the form of visual information), and in so far as language is structured by perception, it must primarily be structured by visual perception.

In summary, language developed in the service of perception; and perception, as a general animal function, developed as a means of increasing the effectiveness of bodily action and so improving the chances of survival. For language and perception to be most effective, they must in their turn be closely integrated, neurologically and physiologically, with the organisation of bodily action. The structures and processes underlying action determine and are necessarily linked with the structures and processes underlying perception (WE LOOK AND GRASP); the structures and processes underlying speech must be capable of conveying the content of perception precisely and rapidly, and therefore must be integrated with the physiological and neurological organisation of perception (WE LOOK AND SPEAK). Language and vision are, as aspects of human behaviour, parallel hierarchical systems, in the case of language built up from speech-sounds formed into words, with the words then combined into sentences.

The above expresses the basic thought of this paper, the intimate relation of vision, action, and language, in its most general form. However, this is still rather far removed from a detailed, physiological and neurological explanation of the natural, non-arbitrary foundation of the specific features of any given language - and the manner in which between languages there can be such considerable differences in lexicon and syntax whilst maintaining a direct physical relation between language and other human function.

The thesis to which this view of the relation of language, vision and action leads is that at every level there is correspondence between the processes underlying speech, vision and action: from the elementary speech-sounds (the phonemes), parallel to elementary units of visual perception and action; through the formation of words from elementary speech-sounds (matched by the formation of visual elements into shapes and of units of action into action-routines, so that corresponding to any word there is a specific visual or action contour indicating or giving a clue to the meaning of the word); and finally, at the stage of greatest complexity, to the formation of word-strings or sentences, to be treated in parallel with the formation of the continuous visual scene and the continuous action-sequence.

For such a hypothesis to be made plausible, one must go on to examine exactly how the elementary units of speech-sound might be related to the elementary units of visual perception (or of action-organisation), how a correspondence can be directly established between individual words and the particular percepts or simple actions to which they refer, and how the complex structures of language, the word-string or sentence, can be set in apposition to or parallel with the comparable levels of complexity of the visual scene or of the action-sequence which forms a complex action. To give the central hypothesis concrete and detailed content in this way clearly involves a most searching examination of the results so far from investigation of the physiology and neurology of vision, of action and of language itself. If the aim is to demonstrate a direct, close relationship between these different human functions, or modes of behaviour, then one has to look for the real elementary units of vision, for the specific correspondences between the elementary units of one function and those of the other two, and to examine in parallel how in each modality the elementary units are formed into larger structures (visual shapes, words, action routines) and then into the complex structures of continuous visual perception (the visual scene), continuous action (the complex action-sequence) and continuous speech (the sentence). How can one tackle such an ambitious task? So far, undoubtedly the most profound and successful research into any of the three functions (language, vision and action) has been in relation to visual perception. The most promising line therefore seems to be to examine what modern research has to tell us about the structures and processes underlying vision.

The organisation of visual perception

Vision is a much more complex process than one might first suppose. In seeing, we are presented with tiny, distorted, upside-down images in the eye, and from them we manage to arrive at the perception of separate solid objects in surrounding space. Because the retina is a mosaic of individual receptor cells, variations in the light reflected from objects are translated at the retina into myriad points of light of varying brightness and colour. The eyes feed the brain with information derived from these spots of light, coded into neural activity: chains of electrical impulses in the brain cells. In tackling perception as a scientific problem, the theoretical question has always been how organisms construct the stable perceived world of tangible objects out of the flux of light-stimuli. There are very great difficulties in understanding the decoding process, that is, how the pattern of spots of light and of electrical impulses from the retinal cells is analysed more centrally and restructured to allow recovery of the information originally contained in the perceived environment. A close analogy can be drawn between the decoding problem in vision and the problem of understanding how the auditory apparatus decodes the crude acoustic information falling on the basilar membrane, the receptive apparatus of the ear.

Visual information is processed in successive stages: the neural response to light starts at the receptors (rods or cones) in the retina and progresses through the sequential layers of cells in the retina, via the optic nerve, to the complex system within the brain devoted to vision. Within the brain, the information is channelled through a series of stages, with transformation and integration of the input occurring at each level. The processing is hierarchical, proceeding from one functionally related group of cells to the next, and the hypothesis is that at each level larger and larger parts of the total structuring inherent in the original light stimulus are extracted. Visual processing is carried out essentially by innate, wired-in segments of the system, though some aspects of vision do in fact develop through early interaction between the visual apparatus and the environment. There seems to be a predisposition to select, from the stream of visual information playing on the retina, those structures which reliably represent real features that must be apprehended if the animal or human being is to live safely in the world.

Vision (in a way strikingly similar to the functioning of language) is found to depend for successful perception to a high degree on the clues provided by context, on the help in decoding ambiguous information provided by prior knowledge and expectations - where decoding means that the restructured crude initial information at the retina has to be made relatable to the existing perceptual experience of the person viewing an object or scene and to his intentions and patterns of action.

With this broad outline of what is involved in visual perception, one can now proceed to the first of the stages in the detailed development of the hypothesis in this paper, that is consideration of what the elementary units might be taken to be as a preliminary to matching them with elementary units of speech or action.

Visual units

If in fact there is a natural relation between visual perception, action and language, one must look for as reliable evidence as possible of what the basic units, physiologically and neurologically, of vision, action and language are. Hypothetical elementary units will be of no use for the purpose; what is to be treated as a basic visual unit must be validated as far as possible by the trend of neurological and physiological research. The dots of light falling on the retina are not the elementary units of visual processing but simply the raw material from which the elementary units are constructed.

Perhaps the prime and most reliable evidence comes from Hubel and Wiesel's research into the relation between retinal stimulation and the response of specific cells in the visual cortex. Remarkable progress has been made in determining how the activity of individual neurones (cells in the visual cortex of the brain) is related to specific features of visual perception. Applying to the visual cortex methods originally developed by Mountcastle for studying the somato-sensory cortex, Hubel and Wiesel developed a refined technique for recording the activity of individual neurones in the visual cortex when the retina is stimulated in a defined way with a spot of light or an illuminated pattern. Typically such a neurone can only be activated by light falling on a limited area of the retina, the receptive field of that particular neurone. The receptive fields for visual brain-cells at the retinal surface are the building blocks for the synthesis and perception of the complex visual world. Activation of specific cortical brain-cells requires highly specific shapes or forms to be presented at the retina - in particular, the forms must be lines or edges with specific orientation of the lines and specific position on the retina. Some categories of neurones are specialised to respond to angles or corners or to movements in one direction but not in another. A scheme of this kind that assumes a hierarchically ordered series of ascending connections at the different cell-levels in the brain explains many features of how neurones respond selectively to specific stimuli. At each stage, brain-cells with relatively simple properties combine to form fields of progressively greater complexity and visual content.
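The first stage of the hierarchy just described, cells tuned to lines of one orientation, can be illustrated with a toy model. What follows is a minimal sketch in Python under stated assumptions, not Hubel and Wiesel's own model or data: each hypothetical "simple cell" has an excitatory strip along its preferred orientation flanked by inhibitory strips, and its response is the rectified weighted sum of light falling on a 15 x 15 patch of "retina". The patch size, weights, the four test orientations and all function names are illustrative choices.

```python
import math


def distance_from_line(x, y, centre, angle_deg):
    """Perpendicular distance of pixel (x, y) from a line through the centre at angle_deg."""
    theta = math.radians(angle_deg)
    # Project the pixel's offset from the centre onto the unit normal of the line.
    return abs(-(x - centre) * math.sin(theta) + (y - centre) * math.cos(theta))


def bar_stimulus(size, angle_deg):
    """Hypothetical stimulus: a one-pixel-wide bright bar through the centre of the patch."""
    c = (size - 1) / 2
    return [[1.0 if distance_from_line(x, y, c, angle_deg) <= 0.5 else 0.0
             for x in range(size)] for y in range(size)]


def receptive_field(size, preferred_deg):
    """Hypothetical simple-cell weights: excitatory centre strip, inhibitory flanks."""
    c = (size - 1) / 2
    weights = []
    for y in range(size):
        row = []
        for x in range(size):
            d = distance_from_line(x, y, c, preferred_deg)
            if d <= 0.5:
                row.append(1.0)    # excitatory: light here drives the cell
            elif d <= 2.5:
                row.append(-1.0)   # inhibitory flank: light here suppresses it
            else:
                row.append(0.0)    # outside the receptive field
        weights.append(row)
    return weights


def response(weights, image):
    """Firing-rate proxy: weighted sum of light, rectified so it is never negative."""
    total = sum(w * p for wrow, prow in zip(weights, image) for w, p in zip(wrow, prow))
    return max(0.0, total)


SIZE = 15
ORIENTATIONS = [0, 45, 90, 135]  # preferred orientations of four illustrative cells
cells = {phi: receptive_field(SIZE, phi) for phi in ORIENTATIONS}

if __name__ == "__main__":
    for bar_angle in ORIENTATIONS:
        stimulus = bar_stimulus(SIZE, bar_angle)
        rates = {phi: response(w, stimulus) for phi, w in cells.items()}
        print(bar_angle, rates)
```

Each bar drives most strongly the cell whose preferred orientation matches it; in this toy version the mismatched cells are silenced entirely, which is far sharper tuning than real neurones show. Later stages of the hierarchy (for instance complex cells, which respond to a given orientation with less regard to its exact position) would pool the outputs of many such units.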

The main significance of these discoveries (which have been matched by the discovery of similar organisation in other sensory areas of the cortex) is that the visual area of the brain, the visual cortex, is now seen to be a highly specialised structure designed to factor out and emphasise the main features of optical inputs that are relevant to the perception of shapes and figures, that is, contours and their elements. As Hubel and Wiesel recognise, the cells they discovered still represent only a very elementary stage in the handling of complex forms, occupied as they are with a relatively simple region-by-region analysis of retinal contours; perhaps even more important than the specific identifications of neuronal response has been the evidence they presented that visual processing is essentially a hierarchical operation. One final point is the remarkable emphasis the system places on precise orientations and contours. There is an amazingly exact structural relation between the cortical organisation and sensitivity to small changes in orientation. In Hubel and Wiesel's experiments, moving an electrode implanted in the visual cortex by as little as 30 thousandths of a millimetre was matched by a change of about 10 degrees in the orientation (slope) of the line, presented to the retina, to which the recorded cells responded. Such precision of discrimination must surely have a special significance.
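To put this figure in proportion, one can assume, purely for illustration and not as a reported result, that the preferred orientation keeps shifting at the same rate of about 10 degrees for every 0.03 mm of electrode travel. The cortical distance over which it would then rotate through the full 180 degrees of possible line orientations is:

```latex
\[
\frac{180^\circ}{10^\circ} \times 0.03\,\text{mm} \;=\; 18 \times 0.03\,\text{mm} \;=\; 0.54\,\text{mm}
\]
```

On that assumption, a strip of cortex only about half a millimetre long would contain cells tuned to every orientation, roughly one degree of orientation for every 3 thousandths of a millimetre.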
The general picture presented by Hubel and Wiesel's work on the elementary units of visual perception fits reasonably well with evidence from a quite different source, that is the set of psychological experiments concerned with the mode of fragmentation of images stabilised on the retina, and with the practical experience in the artificial intelligence field of using a distinctive features approach to machine-vision (the features used are essentially the same as those identified by Hubel and Wiesel).

If, in the light of the work described, one had to specify the elementary units of vision with the greatest research support, they might include straight lines and edges, bars, angles, corners and curved lines, coupled with the relational aspects of orientation, discontinuity, movement, location and colour. In addition to the evidence for hierarchical processing and the concern of the system with precise degrees of orientation of the stimulus at the retina, one other major point emerges from the research: the profound preoccupation of the visual system with the contours of visual shapes, which suggests that contours must play a special part in pattern perception. This preoccupation also provides some justification for the approach adopted in this paper of relating the elements of visual contour (as described) to the elements of action contour (discussed next) and to the contours of word-forms considered as articulatory patterning.

Elementary units of action

Complex forms of action, such as walking, throwing a ball or playing the piano, must all ultimately be built from primitive elements of action open to the different parts of the body, smoothly integrated in an orderly sequence as skill develops. Identification of the elementary units of action which go to form complex actions must be speculative, though guidance can be derived from study of the development of skills in children: "Typically, the motor behaviour of young infants appears staccato and halting... The disorder of the sequencing of movements is something occurring not only in skilled motor performance but also in speech. Articulation after all is the expression of a sequence of movements and the results suggest that there may be some common factor in the mechanisms for the control of body movement and that for the control of speech processes... Skilled motor behaviour ... can be analysed into its component units ... The adult appears to possess a number of modular routines which can be linked together to perform any number of different actions. In this way an analogy can be drawn between skills and language ... A relatively small number of reliable and precise sub-routines are deployed by a rich motor syntax into a very large array of motor programmes." (Connolly)

To list the elementary units of action one has to consider in turn the possibilities for voluntary movement of the different parts of the body, each of which has its own range of movements. Elementary units of action can be expressed in a general form, applicable to the interaction of any particular muscles and joints: e.g. one can specify bending to a certain degree (as applied to the arm, the hand at the wrist, the leg, the head, and the movements of the mouth and jaws) or turning (of the arm, the head, the leg, the tongue). One has, so to speak, an abstract representation of any particular action, the realisation of which depends on which joint or set of joints, and which muscles controlling the joints, are used to execute it. From this one can derive the concept of the 'action word', realisable in a variety of elementary actions performed by different systems of joints (or muscles) in the body. So a gesture of the arm can represent a structurally similar movement of some other part of the body. Looked at in this way, the elementary units of action closely resemble the elementary units of speech (a phoneme may be realised in different ways in different speech-contexts by people whose articulatory apparatus differs widely). Equally interestingly, a pattern of elementary action in this respect resembles the elementary unit of vision (where a line-element is realised not simply through the activation of a precise set of cells on the retinal surface but through the activation of a pattern at the level of the visual cortex, marking the existence of a line-segment located at some point or other on the retina). The elementary action unit, the elementary speech unit and the elementary visual unit may thus all, in a sense, be generalised, abstract patterns, derived from crude muscle-sense, acoustic or visual experience.

The correspondence of the elementary units of vision, action and language

The next step is to consider on what plausible basis a specific correspondence might be established between the elementary units of vision, action and language. That there can be correspondence between vision, language and action in broad outline is a matter of familiar experience. In gesture accompanying vigorous speech, and in the pictorial action-language of the deaf, there is clearly a translation of speech into action and of action into visual representation. What is sought at this point, however, is a correspondence at a basic level between the three functions. For the reasons given earlier, there must be, at some level, physiologically and neurologically, a real correspondence between the elementary units, because they all form part of one integrally organised system of body and brain, and all depend on the same central coordinating system for guiding the expression, in particular movements of the muscles (of the eye in visual perception, of the articulatory organs in speech and of the limbs etc. in action), of unified schemes of perception, articulation and activity.

In 'The Physical Foundation of Language' (published a few years ago), the author focused on the relation between language and gesture (one type of action) and proposed a fully worked-out set of equivalences between groupings of speech-sounds (one category of vowels and four categories of consonants) and five corresponding categories of gestural movement: raising the arm through a range of specified positions; moving the arm laterally; turning the arm; bending the arm and raising it across the body; and moving the arm forward and upward through a range of positions in a touching, pointing or throwing type of action. The scheme proposed that each speech-sound in the five categories could be matched with one particular position of the arm within the range of positions or movements covered by the corresponding gestural category. It is not feasible to present the scheme in detail here, and in any case it now has to be superseded, in the light of the new emphasis on the relation between visual elements and speech-sounds, by an extended scheme proposing a specific relation between each speech-sound (articulatory movement-pattern), each visual element (eye-movement pattern) and each action-element. Such a scheme has been formulated provisionally and will be described in a forthcoming book. It suggests, for example, that the set of speech-sounds earlier matched with the bending upwards of the arm across the body (part of the gestural system) can now also be matched with the eye movements involved in the perception of curves (with the degree of movement of the arm being put in systematic correspondence with the degree of orientation of the curve in visual space). The new system seems plausible and appears to fit well with the relation proposed between the contours of visual shapes, action (in the form of gesture) and the contours of word-forms.

Though any scheme of this kind can only be tentative and will need much more extended scrutiny, the important point is that the mere fact that such a scheme of equivalences between visual, action and speech elements can be formulated demonstrates that it is at least conceivable that words could be naturally based, deriving their form from a real relation between their structure and the structure of the percepts or actions to which they refer.

If one is prepared, though only as a provisional hypothesis, to envisage such a correspondence between the elementary units, the next issue is how such elementary units (lines, curves, angles etc. for vision; movements and positions of the limbs for action; and the basic speech-sounds, the phonemes, for language) might be put together, in parallel, to form the higher level of structure: the visual shape, the action sub-routine or the word-form referring to the percept or action. Again, the most promising line seems to be to consider first how the elementary units of vision are, or may be, put together to form visual shapes, so far as vision research throws light on this.

The construction of visual shapes

The emphasis in this section is on the individual visual shape, not so much on the way it is recognised in practical experience as on the manner in which it is formed, or derives its structure, from the elementary units of vision. The comparison of critical interest is with the overall structure of the word: the formation of the sound-structure of a particular word from the elementary sound-units. As yet there is no universally accepted account of how visual shapes are perceived, though a massive research programme has been under way for many years and there have been many theories, not least those stimulated more recently by Hubel and Wiesel's discovery of cells in the visual cortex selectively responsive to differing elementary stimuli at the retina.

Despite differing theoretical approaches, there seems to be general agreement on two important aspects of the visual perception of form. The first is that visual perception is an intensely active process, the perception of form depending on the construction of particular percepts from the unceasing movement of the eyes and of the muscles forming part of the oculomotor apparatus. Lashley had drawn attention to this many years ago: "I have come to feel that the problem of scanning underlies many other problems of neurophysiology ... most of our perception of objects is derived from a succession of scanning movements." The second vital element in the perception of visual form is the preoccupation of the visual system with contour.

One of the more interesting and ambitious accounts of the process of visual shape perception is that by Sommerhoff. This starts from the assumption that shape recognition is based on a tracking of contours by fixation of the perceived object in relation to the fovea:

1. At the lowest level, there are the 'primary analysers', wired-in detectors for elementary units such as those identified by Hubel and Wiesel

2. The next stage is that of scanning: exploratory activities such as contour-following, which pass from the salient features or elements of a shape to the spatial relations between them. The brain does this by abstracting the information gained from registering the eye-movements required to pass from one contour element to another, in effect compiling a list of the characteristic properties of the contours in question, that is, of the shape formed by the elementary units.

3. The aggregate of the registrations of the eye-movements forms an adequate neural basis for categorising the shape according to the mutual relations between the centrally recognisable elements (lines, curves, angles, terminations etc.).

What seems directly relevant for the hypothesis in this paper is:

a. initially shapes appear to be recognised by the elementary features they contain and the eye-movements involved in transition from one elementary feature to another

b. recognition of a shape involves, as Lashley says, readiness to recognise a motor sequence; eye-movement and shape perception seem to be interlocked, mutually dependent processes; the interaction between movement and perception seems to remain reciprocal.

c. the key question is how the succession of retinal images resulting from scanning is converted into a single stable impression of form. At an early stage of development recognition seems to depend on elementary feature-detection; later a 'template' or 'schema' matching system develops, with which can be associated the permanent labelling of visual shapes and objects provided by word-forms.

The discussion of visual shape perception can be related to the vital importance of the contour of the word-shape, formed from elementary units of speech-sounds, chained together by the succession of movements of the articulatory organs necessary for the production of the word. The characteristic pattern of movements identifies the stable word in much the same way as the characteristic pattern of eye-movements identifies the stable visual shape. For both vision and speech, there is a parallel progression, by hierarchical stages, from the simplest to the most complex forms.

The construction of action sub-routines from the elementary units

What, in the hierarchical organisation of bodily action, is the stage equivalent to the visual shape or the word-form? Clearly not complex sequences of activity such as playing the piano or driving a car, but the immediate sub-routines of action (stretching out or touching, holding, throwing, walking), sub-routines which, organised into sequences, go to form the syntactic expression of action. A stroke in tennis, for example, is made up of many of these action sub-routines. As with the discussion of visual shape, or indeed of the construction of a word-form, what is involved is a structure which is to an extent a generalised representation of a sequence of elementary units of action: the inner structure of the action of hitting or throwing, whose components are familiar and can readily be specified.

The points of vital importance for the formation of visual shape (the significance of contour, its formation from elementary units and the chaining together of those units into a more complex structure by the succession of movements) apply with equal force to the formation of the intermediate action-types just discussed. Just as for vision and speech there are templates or 'schemas' developed for familiar, frequently used shapes, objects and words, so there are centrally recorded templates or schemas, at a high and rather abstract level, for the production of the frequently used and absolutely essential component action-forms from which complex sequences of skilled activity are constructed.

The parallel construction of word-forms from the elementary speech-sound units

The enquiry parallel to the two preceding ones (on the formation of visual shapes and of action sub-routines) is how the elementary units of speech-sound, essentially the familiar set of phonemes, are formed into word-structures, the words which constitute the lexicon. Liberman's work indicates that the discrimination of phonemes is a reality not in terms of the acoustic trace but in terms of patterning at the level of the neuromuscular system controlling the distinct articulation of different phonemes (providing for their integration into smooth sequences of permitted combinations). The constraints on the formation of word-structures from particular combinations of phonemes are not accidental or conventional but reflect underlying biases and constraints on the combination of articulatory movements and, at a higher level, on the patterning of neural commands for words.

Acoustic patterns (which constitute words as heard) are, to put it in a rather extreme way, simply a by-product of the 'controlled gestures of the vocal organs' (Ladefoged's phrase) by which speech is produced physiologically and neurologically; the events taking place in the individual are the sequences of neural impulses to the muscles which control the shape and movement of the various organs forming the bodily speech-apparatus. Given the essential identity of central nervous control of speech-movements and other types of voluntary movement, it is a reasonable hypothesis that the production of speech involves sequences of movements similar to those of rapid typing or playing the piano. Taking this essential identity to its logical extreme, one could analyse the production of speech (the control and patterning of movements of the articulatory organs) as a subdivision of 'action-organisation', as a part of total human behaviour.

The analysis of the formation of word-structures can readily proceed in parallel with much of what has been said about the formation of visual shapes and action-routines. The concept of contour is just as important for word-formation; others have remarked on the profound preoccupation of the auditory system with articulatory contours, with the unique discriminable articulatory pattern (the word) matching the unique discriminable visual pattern (the shape or object). The word is not just an assembly of the elementary sounds but a unity or pattern at a higher level of organisation, forming what Head described as a 'word-schema' centrally, an ideal form which may be translated in varying ways into the spoken form depending on the context, the preceding positions of the articulatory organs, and general anatomical factors varying from individual to individual; this high-level invariance is no different in principle from the invariance already described for the motor-image of the visual shape or of the action-routine.

But can one relate this theoretical account to the brute fact of the thousands of words which go to form the lexicon of any language? Surely, for many words, the link between word and specific percept or specific picturable action is extremely remote or non-existent, certainly as one moves up into the realm of vocabulary used for scientific or philosophical discussion? This leads into another large question which it is impossible to treat here at the length its importance deserves: the structure and mode of development of the lexicon of any language. The lexicon of a language is not a disorderly mass of separate words but a structured system in its own right. Both phylogenetically (in terms of the language-community) and ontogenetically (in terms of the individual acquiring his mother-tongue and its lexicon), the lexicon grows over time in an orderly way from a central core. The approach of the present author is centred on the concept of the Primitive Vocabulary: the idea that all language is built round a limited number of primitive words, used for referring to a limited number of primitive sensory percepts, actions and states, internal or external. The relation between the particular words in the primitive vocabulary and the percepts etc. to which they refer can be established only by actual experience. The Primitive Vocabulary represents the categories into which our 'naive' experience is divided, without reflection, without the discursive use of language and without second-hand communication of knowledge. The initial set of words learned by a child is the prime example of the primitive vocabulary and, as research has shown, this initial vocabulary consists very largely of words referring to concrete percepts and to action, that is, to the experience most directly relatable to visual perception and the organisation of bodily action.
The acquisition of words in the Primitive Vocabulary goes in step with (though occasionally follows) the formation of what could be described as the child's primitive perceptual repertoire, that is those shapes, things and relations that a child identifies first as its visual experience develops.

The Primitive Vocabulary (and the associated primitive perceptual repertoire) is structured in terms of the sensory modalities involved, the frames of experience in which the objects or actions are normally encountered, and the distinct physical frameworks to which particular objects or actions are directly related. This Primitive Vocabulary is set in contrast to the full lexicon acquired by the adult. The primitive perceptual repertoire is similarly contrasted with the much more extensive repertoire of perceptual and other experience acquired by the adult. The adult's lexicon (and the adult's perceptual repertoire) develop from, and retain at their core, the primitive vocabulary and the primitive perceptual repertoire. The full lexicon of a language develops gradually from the primitive vocabulary by familiar processes of composition and extension of application and, at the higher stages, by the vital creative process of metaphorical transformation of primitive words and concepts.

The conclusion of the discussion in this section of the paper is that it is possible and plausible to assume that individual words (certainly for what is perceived or for actions) can have a natural origin, a natural relation between sound and meaning, because of the underlying physiological and neurological parallelisms between the processes of visual perception, bodily action and speech. What emerges from detailed consideration of the processes underlying vision, action and speech is the overriding importance in behaviour of the organisation of movement, of the central co-ordination neurally of complex sequences which may relate in a parallel way to movements of the eyes (the oculomotor apparatus) in seeing, to movements of the parts of the body and of the body as a whole in the organisation of action or to movements of the vocal organs, the articulatory apparatus in speech. The highest level of control of bodily skill is mediated by a set of motor schemas or motor programs that can be executed with a wide variety of initial positions of the muscles and organs involved and of the local environment. Vision, action and speech are all examples of highly skilled human activities, and the probability is very high that they are organised and executed in very similar ways - by neural structures which in their formation and manner of operation are closely similar. Everything points to a central role for the motor cortex not only in relation to bodily action but also to visual perception and speech; and this coming together in the brain of the structures for planning complex action in these different forms provides a possible (and a probable) basis for the detailed interrelation between visual forms, action-forms and word-forms.

Such a relation between patterning at the cortical level would allow one to understand how the structure of visual shapes and action-routines (which everyone would accept as natural and not arbitrary in any sense) can be related to the structure of words (which traditionally have been thought wholly arbitrary, despite their inexplicably close relation in consciousness with the percepts and actions to which they refer). Penfield has remarked that 'the image of how to speak a word is in reality a pattern of the motor complex required to produce the word', and it is in this precise fact that one should look for the naturalness of language and for the appropriateness of the structure of the individual word to the meaning it has.

The correspondence of visual scene, complex action and sentence

The paper up to now has presented an account of how the elementary units of visual perception, action and speech might be identified and put in correspondence with each other; and then, as the second stage in the hierarchical process, how the elementary units might go to form more elaborate structures: the visual shape, the action-routine and the word referring to a percept or action. One now comes to the highest level: the formation of the sentence from words, of the visual scene from shapes, and of the complex action-sequence from action-routines.

In tracing the interlinking of action, perception and language, the task becomes more difficult because the complexity of the material grows geometrically rather than arithmetically. At this stage, language has to be capable, as the general semanticists would say, of mapping an empirical and intellectual terrain that is forever growing more extended and more intricate. There are immeasurable complexities involved when in effect the whole meaning-structure of the individual, his whole knowledge-structure, has to be brought into a specific relation with his ongoing experience: the union of the temporal order (in the syntax of vision, action and speech) with the simultaneous structure into which his action-skills, his perceptual experiences of the world and his lexicon of words and meanings are formed. The vital thread to hang on to is that action, vision and language are biologically and evolutionarily unified. The structure of a sentence must be related in a definite way to the structure of an action or the structure of a perception. It is not possible to treat at length here the vast range of questions which the relation between continuous speech, continuous perception and continuous action opens up. Perhaps the basic thought can best be conveyed by quoting again Karl Lashley's seminal article on 'The problem of serial order in behaviour':

"Generality of the problem of syntax

I have devoted so much time to discussion of the problem of syntax not only because language is one of the most important products of human cerebral action, but also because the problems raised by the organisation of language seem to be characteristic of almost all other cerebral activity. There is a series of hierarchies of organisations: the order of the vocal movements in pronouncing the word, the order of words in the sentence, the order of sentences in the paragraph, the rational order of paragraphs in a discourse. Not only speech but all skilled acts seem to involve the same problems of serial ordering, even down to the temporal coordination of muscular contractions in such a movement as reaching and grasping ... the syntax of the act, which can be described as the habitual order or mode of relating the expressive elements (the individual words or adaptive acts), a generalised pattern or schema of integration which may be imposed upon a wide variety of specific acts. This is the essential problem of serial order: the existence of generalised schemata of action which determine the sequence of specific acts ... The problems of the syntax of action are far removed from anything we can study by direct physiological methods today, yet ... we cannot ignore them. Serial order is typical of the problems raised by cerebral activity".

From examination separately of the concepts of the visual scene, the complex action and the complex word-formation (the sentence), one arrives at some broad parallelisms:

1. nothing in research in any of the three fields casts doubt on the general proposition from which discussion in this paper started, the evolutionary point that any animal to survive must be organised to adapt its behaviour as effectively as possible to the perceived world; the content of perception must be such as to serve the needs of action-organisation and be as precisely and closely integrated with action-organisation as possible; in the same way, language, to serve perception and action effectively, must in its content and organisation be such that it can as accurately as possible provide the information needed and can be integrated as closely and precisely as possible with perception and action;

2. nothing casts doubt on belief in an essentially hierarchical underlying organisation for vision, action and speech, with its higher stages built on material drawn from the lower stages;

3. contours are of key importance for visual shapes, actions and words. Central motor patterning seems to be the level at which the processes underlying the three functions are brought into contact with each other and in fact integrated;

4. at the level of what Lashley described as the parallel syntaxes of vision, action and speech, there is an extraordinarily complicated but similar interrelationship of the semantic content and the syntactic ordering;

5. there are a large number of parallel processes found in vision, speech and action, which display an instantly recognisable unity of plan, reflecting the basic functioning of the body/brain system. For example:

a. the unity, integrity, of the sentence matches the coherent meaning of a visual scene and the coherent intent or objective of a complex action

b. the serial ordering of the sentence, the word-order, closely matches the serial ordering of processes in vision (scanning by saccades and fixations) and the serial processes in the planning and execution of complex actions

c. Gestalt concepts such as Figure and Ground are as relevant to the analysis of the sentence as they are to the analysis of the visual scene or the identification of the purposeful action

d. the interaction of the organisation of the semantic field and syntax in the sentence is matched in vision by the interaction between the available visual schemas and the relational principles governing the total organisation of the scene

e. the significance for the sentence of context and presuppositions is matched by the equal importance for vision of context and expectations and for action of the framework of the particular action and projections of intended action

f. analysis of ambiguity in the sentence raises very similar issues to analysis of ambiguity in the visual scene (there are verbal illusions to match the well-known visual illusions)

g. there is a close parallel between sentence-types and the patterning of the visual scene and the complex action

h. most important of all, closely similar problems of decoding arise in relation to the sentence and to the interpretation of the visual scene, and similar techniques of decoding have to be used (the identification of 'markers' or 'features', regional disambiguation and then successful disambiguation of the total sentence or total scene).

The comparative problem

Though the discussion and development of the evidence has inevitably been abbreviated (within the constraints imposed by the format of this paper), the conclusion reached (for which a great deal of additional evidence not presented here could be adduced) is that clear indications can be found at every level, from the elementary units to the most complex sequences and combinations of those units, of a similarity, and in fact a direct physical relationship, between the physiological and neurological structures and processes underlying language, visual perception and action.

The development of the argument in the paper has not depended in any narrow way on reference to the lexicon or syntax of any single language (though the precise schemes of equivalence between the elementary units of vision, action and speech and between words, visual shapes and action-routines which are developed elsewhere - in The Physical Foundation of Language and the forthcoming book already referred to - are of course presented in relation to the English language). The reader may say that if verification of the hypothesis depends on its application in detail to a particular language, then this still leaves the awkward fact that the details of the lexicon and syntax of other languages differ at very many points from those, say, of English. If it is claimed that for the given language an isomorphism can in the finest detail be established between the specific features of the language and the processes of vision and action, how can such an isomorphism possibly extend to other languages with different lexicons and syntaxes? The physiological capabilities of members of other communities or other races in terms of visual perception and co-ordination of voluntary action are surely no different and anatomically their articulatory equipment for speech is the same. Are you suggesting that there are neurophysiological differences (affecting speech, perception or action-organisation) between members of different communities speaking different languages, if for them also there is asserted to be a natural relation between language, vision and action?

The dilemma is essentially the same as that discussed in the previous paper, where arguments were presented for thinking that language cannot be arbitrary or artificial but at the same time the diversity of languages made it difficult to see how languages could be natural.

The central question of comparative linguistics, following Jakobson's proposition that every language is isomorphic with every other language because the interconversion of the codes by translation is possible, is how can one arrive at any idea of the universal principles of structure and functioning of language which constitute the basis of the isomorphism?

Assuming that current languages are in fact equally effective media of communication and that, on the hypothesis defended, their structures are related to the structures underlying visual perception and action-organisation, the issue has to be faced whether differences in language structure imply or involve differences in perceptual processing or other neurological differences between those speaking different languages, though the existence of such differences would not necessarily mean that the outcome, the comprehension of the visual scene or the execution of the complex action, would differ in any way - since there are equally valid alternative ways of doing things and there is no reason why there should not be equally valid (neutral in terms of effectiveness) ways of scanning a visual scene or indeed describing in a sentence a particular event or pattern (see the illuminating experiments of Osgood on this latter point).

At the most obvious level there can be neurophysiological differences, between individuals even when they speak the same language and between members of different language-communities, in scanning a visual scene, performing a particular complex action or expressing in language a particular sentence-meaning, simply because the different ways of doing these things involve the use of different groupings of muscles, different timing and sequences of action and different central programming. The answer to the question whether people speaking different languages must, to some extent, perceive differently or organise action differently is Yes; but the differences lie only in the underlying processes of action and perception, not necessarily in the overt effective outcome.

There is no need to labour the obvious point that in different communities activities with the same final objective are performed in very different ways; at the most familiar level, different ways of cooking food, building houses and making clothes are used to produce the same products or effects: to feed, shelter, keep warm and adorn.

More significant is the specific experimental evidence of differences between individuals, communities and races in the processes underlying perception and other aspects of behaviour. Yarbus, a great Russian authority on the functioning of the eye and eye-movements, goes so far as to say, on the basis of his extensive study of scanning patterns, that people who think differently also, to some extent, see differently. Noton and Stark's research into visual scanning patterns showed very clearly that the scan paths for any given pattern may vary from individual to individual. Davidoff, in his book 'Differences in Visual Perception', refers to the enormous variety of differences in visual perception that are known to exist, and to the large differences between individuals belonging to different cultures in their perception of standard visual illusions (such as the Müller-Lyer or arrow illusion). Simon's article in 'Formal Theories of Visual Perception' concludes that "there are usually a number of alternative ways of representing any particular pattern, and the actual encoding will vary as a function of the subject's previous experience and the strategy he adopts for performing a particular task".

The conclusion must be that language, perception and cognition are inextricably interrelated in the human brain, and that the evidence of individual variation in underlying perceptual and cognitive processes suggests there must be ample natural sources of variation to explain the differing forms which the world's languages take, in both syntax and lexicon, while maintaining the structural relation between any particular language and the underlying processes of visual perception and action-organisation.

The argument in this paper is that the structure of language is isomorphic with the structure and functioning of visual perception and action, and that this isomorphism can be demonstrated, or at least plausibly illustrated, in terms of any single language. A satisfactory account of the relation of language, perception and action in one language can be the starting point for identifying the universal principles of language which constitute the isomorphism between languages to which Jakobson refers. An isomorphism between language, vision and action established for one language is potentially a universal isomorphism. To put the conclusion most succinctly:

"The isomorphism for any one language between the structures underlying speech, vision and action is at the same time the isomorphism in the universal context, underlying the structures of all languages and making translation between languages possible."


In real life there is no sharp division of our experience into vision or action or language. Vision and action proceed together mutually modifying each other; language itself is action, can refer to action, can cause action and can be modified in its form from moment to moment by the train of visual perception. Even if, in evolutionary terms, the needs of action gave rise to the need to perfect perception, and language developed to support and supplement both action and perception, the three are now co-operating aspects of human behaviour, three intersecting circles - where one might, quite helpfully, conceive of the central area which the three circles share as that part of the physiological and neurological processes, controlled by the cortex and particularly the motor cortex, where the patterns of action, vision and speech come together and mutually determine each other. In this central area (or in some area of the brain closely related to it) one kind of action is enabled to regulate and be regulated by other types of action, one kind of perception (external perception or perception of internal state) can be scrutinised and interpreted by another more abstract form of perception (perception of perception) and language at a more abstract level acquires the ability not only to speak about the contents of action and perception but also about the character of language itself - as we are doing now.