Mirror neurons and action observation. Is simulation involved?
Gergely Csibra
Moderators: Gloria Origgi, Dan Sperber

The discovery of mirror neurons (MNs) has been hailed as the most important finding of the last decade in neuroscience (Ramachandran, 2000), paving the road for the explanation of diverse phenomena from the evolution of language (Rizzolatti & Arbib, 1998), through imitation (Iacoboni et al., 1999), to intersubjectivity (Gallese, 2003). Here I am concerned with the more modest claim that credits a fundamental role to mirror neurons in understanding observed actions (section 1). In particular, mirror neurons have been argued to support simulation theories of action understanding and mind reading (Gallese & Goldman, 1998; Gallese et al., 2004), an idea that has also been dubbed the 'direct-matching hypothesis' (Rizzolatti et al., 2001). I shall argue that the evidence published on the response properties of MNs in monkeys is incompatible with these theories of action understanding because (a) MNs' activation reflects not the commencement but the conclusion of action interpretation (section 2), and because (b) MNs do not 'mirror' observed actions with sufficient accuracy for effective simulation (section 3). I shall offer an alternative interpretation, which preserves the MNs' role in action understanding without imposing a simulation function on them (section 4).

1. Simulation theories of action understanding

According to the direct-matching hypothesis, "an action is understood when its observation causes the motor system of the observer to 'resonate'" (Rizzolatti et al., 2001, p. 661). The mapping of the observed actions onto one's own motor system is direct, automatic, and does not involve a sophisticated perceptual analysis: "Each time an individual sees an action done by another individual, neurons that represent that action are activated in the motor cortex.
This automatically induced, motor representation of the observed action corresponds to what is spontaneously generated during active action and whose outcome is known to the acting individual. Thus, the mirror system transforms visual information into knowledge" (Rizzolatti & Craighero, 2004, p. 172). Understanding an observed action therefore implies linking it to its outcome: Mirror neurons "allow us to directly understand the meaning of actions ... of others by internally replicating ('simulating') them ... The observer understands the action because he knows its outcomes when he does it" (Gallese et al., 2004, p. 396). The idea of the direct-matching hypothesis is depicted in Figure 1.

A more sophisticated version of simulationist action understanding appeared in Gallese and Goldman (1998). Here actions are understood not just in terms of their outcomes, but also in terms of the mental states, and especially the goals, that have generated them. Since simulation provides a straight answer only to the predictive, 'forward' question ('What will she do if she has a certain goal?'), the retrodictive, 'inverse' question ('What goal has made her do this?') requires generating conjectures about the other's goals, feeding them into the simulation device, and testing whether the outcome matches the observed actions (cf. Wolpert et al., 2003). Thus, according to this account, action understanding involves finding a 'pretend' goal that would generate an action plan in the observer's own motor planning system that matches the observed action. Presumably, when the simulated motor action does not match the observed one, a new hypothesis is generated (cf. the 'generate-and-test' model in Goldman and Sripada, in press). Figure 2 (adapted from Gallese and Goldman, 1998) represents this version of simulationist action understanding.

2. Evidence for automatic and direct matching

When the monkey performs a certain goal-directed action (say, he grasps a piece of food), some neurons are active in the F5 area in his premotor cortex. The same neurons also discharge when the monkey observes the same action performed by another individual (say, the experimenter grasps a piece of food). Now, consider what happens when the monkey watches an action without any object involved (say, the experimenter mimics grasping a piece of food). The neurons that are active during grasping remain silent, indicating that the monkey has not understood the observed action as an instance of grasping food. In fact, no MNs are activated at all in this situation (Gallese et al., 1996). Why? Presumably, they do not fire because a mimicked action has no 'outcome', has no 'meaning', does not imply a goal, and hence, is not 'understood' by the monkey.

Now, how do MNs know that an action has no meaning (outcome, goal)? According to simulation theories, whether an action has meaning or not is decided by simulation, and so all observed actions (that are performed by effectors that the monkey possesses) should be simulated in order to be interpreted. In fact, both versions of simulation theory (Figs. 1 & 2) predict that simulation will be performed even for actions that are subsequently judged meaningless. In version 1 (Fig. 1), simulation is automatic and obligatory (Gallese, 2004) as long as the observed action is within the monkey's motor repertoire. Mimicked grasping certainly satisfies this condition. In version 2 (Fig. 2), only actions generated by hypothesized pretend goals are simulated. Nevertheless, the simulation system must display at least some attempts to interpret an action, and re-try with new hypotheses in case of finding a non-match between the simulated and observed actions. In this model, simulation is always performed unless goal conjectures are not generated at all.
The fact that goal hypotheses are not generated for meaningless actions implies that the goal of the action (or its absence) is decided (and understood) without resorting to simulation. In other words, if only actions with clear goals are simulated by mirror neurons, then simulation is not involved in the judgement of whether an action has a meaning and what it is. The available evidence suggests that MNs are activated only by observation of meaningful actions, which contradicts the predictions derived from simulation theories. Judging from their activation pattern, if mirror neurons correspond to anything on Figs. 1 & 2, it is not the motor activation boxes, but the last stage ('Understanding' and 'Goal Attribution', respectively) representing the final interpretation of the observed action. If simulation is involved at all in the functioning of MNs, the monkey 'simulates' because he has understood an action rather than he understands the action because he has simulated. Given the anatomical connections between the brain regions involved in action understanding, this conclusion is not surprising. Despite the claims to the contrary, visual (and other perceptual) information about observed actions does not reach the premotor cortex in a 'direct' and 'unmediated' way. Observed actions are processed at various brain regions, including the superior temporal sulcus (STS; see e.g., Jellema et al., 2000). The STS, together with parietal regions, like PF (Gallese et al., 2002), is considered to be a part of the 'mirror system' (Rizzolatti & Craighero, 2004), but it does not show any motor activation itself. In spite of lacking mirror properties, STS neurons seem to 'understand' actions quite well, and it is plausible to assume that they send (via PF) pre-processed signals about actions to the premotor areas that include information about the goal or the meaning of the observed action. 
MNs' sensitivity to the mode of presentation of observed actions is also incompatible with the idea of automatically induced simulation. In particular, MNs in the ventral premotor cortex are not activated when the action is presented on video rather than live (Ferrari et al., 2003; Keysers & Perrett, 2004). MNs do not respond to observed actions even with 3-D, stereoscopic presentation. According to the simulation account, lack of MN activation would indicate that the monkey has not understood the observed action, and has not been able to extract its meaning. However, activation patterns of STS neurons do suggest that monkeys give fairly high-level interpretation to actions observed on a video monitor (Jellema et al., 2000) and are not blind to their meaning. MNs in the premotor area probably do not respond to televised actions because, however well they are understood, they are irrelevant to the monkey's actual situation. MNs have been claimed to be unselective to the significance of target object of the action, especially whether it is a piece of food or a geometric solid (Rizzolatti & Craighero, 2004). Nevertheless, according to my understanding of these experiments, food was always involved in testing the motor properties of MNs because monkeys had been rewarded for performing object-directed actions. If this was the case, the real goal of monkeys' grasping actions was arguably not the target object itself, but the subsequent food reward. In other words, the grasped object was interesting for the monkey as long as grasping it was an instrumental action that led to immediate food reward. Thus, it is difficult to tell whether a MN that discharges in these experiments whenever the monkey grasps an object assigns the action the meaning of 'grasping an object' or the meaning of 'grasping an object for food'. 
When observing actions, MNs are activated by non-food-related hand actions even if neither the monkey nor the observed individual is rewarded with food afterwards. This seems to indicate that food reward is not required to map an observed action to an executed action. Nevertheless, while MNs always discharge to food-related actions, the response diminishes or disappears "after a few or even the first presentation" (Gallese et al., 1996, p. 605) to non-food three-dimensional solids. This reduced responsivity was attributed to the lack of attention that the monkey paid to the action after the first few trials. However, simulation is supposed to be 'automatically induced', regardless of the attentional state of the monkey. It is understandable that a monkey is less interested in geometric solids than in food, but how could his MNs know that the currently observed action is about an uninteresting object before figuring out the meaning of the action? A more plausible explanation for the reduced activity to repeatedly observed non-food-related actions is that the monkey quickly learns that, unlike his own object-related actions and many other actions that he sees in the laboratory, the current action will not be rewarded, and hence, is not an instrumental action for food.

This view of MNs' sensitivity to reward can take two interpretations. The simpler one is that MNs are not interested in anything that is not related to food. The more complex, and more interesting, interpretation would say that MNs might code not just the immediate goal of an action but also those further states to which the action is instrumental, as long as the monkey can figure those out. I will return to this interpretation in the conclusion.

3. Evidence for execution-observation congruence

Contrary to the widespread belief that MNs have been defined by their mirror properties, this is not the way the researchers who discovered them used this term.
'Mirror neurons' were either defined as the cells in the ventral premotor cortex that responded when the monkey observed meaningful actions (Rizzolatti et al., 1996), or those that "discharged both when the monkey made active movements and when it observed specific meaningful actions" (Gallese et al., 1996, p. 595). (When I refer to 'mirror neurons' in this paper, I apply this latter definition.) Thus, mirror properties are not defining features but further non-trivial characteristics of these neurons, as they were reported to discharge when the monkey observed 'similar' actions to those that activated the neurons when he performed them. What is the degree of similarity here? First, not all neurons that respond to action observation have motor properties. In fact, a sizeable proportion of these neurons are not mirror neurons under Gallese et al.'s (1996) definition. This proportion varies from 21% (Gallese et al., 1996; Rizzolatti et al., 1996), through 25 % (di Pellegrino et al., 1992), to 30 % in case of the PF region of the parietal cortex (Gallese et al., 2002). What do these neurons do? They certainly do not translate visual information into motor code (i.e., they do not simulate), because they do not seem to represent any motor code. Second, though most mirror neurons are selective to one type of executed action, many of them respond to two or even three different types of observed action. The proportion of MNs sensitive to multiple actions is 21 % (di Pellegrino et al., 1992), 33 % (ingestive actions in Ferrari et al., 2003), 37 % (PF MNs, Gallese et al., 2002), about 40 % (Rizzolatti et al., 1996), 45 % (Gallese et al., 1996), and even 68 % (Umiltà et al., 2001). The fact that more than one action can activate a MN implies, for example, that a neuron that is associated with the 'grasping by hand' motor action could be activated by the observation of 'hands interaction', or by 'grasping with the mouth' (Gallese et al., 1996). 
Taking the idea of simulation seriously, these instances of MN activation should be classified as mis-simulation or mis-interpretation of the observed action. As the high proportion of MNs responding to multiple actions suggests, mis-interpretation of observed actions would not be exceptional. Third, even when a MN is activated only by the observation of a single action, it is not necessarily the same action as defined by the motor properties of the neuron. Examples are the so-called 'broadly congruent' mirror neurons, which make up approximately 60 % of all MNs (Fogassi & Gallese, 2002), though this number may also include the above neurons that respond to multiple actions. For example, di Pellegrino et al. (1992) reported that in 11 of the 29 MNs (38 %) the effective observed and effective executed actions were logically related. "For example, the effective observed action was placing an object on the table, whereas the effective executed action was bringing food to the mouth or grasping the object" (di Pellegrino et al., 1992, p. 179). If mirror neurons implemented a simulation procedure, this example would literally mean that the monkey understood the object-placing action as having the same meaning as when he grasps (or eats) an object. One can, of course, claim that these two actions "can be considered to be part of a logical sequence" (Fogassi & Gallese, 2002, p. 19), but one can also claim that these two actions (placing an object onto a surface, and grasping an object from a surface) involve goals that are practically opposite to each other. In addition, if simulation constitutes the first step towards understanding an observed action, what system would figure out which actions are logically related to the observed action in order to simulate them? Fourth, even with this loose definition of congruency, in a further 10 % of MNs no relation was found between the effective executed and observed actions (Fogassi & Gallese, 2002). 
Although these neurons satisfy the definition of MNs, I wonder why they are called 'mirror neurons' at all. Finally, about one third of all MNs show a clear one-to-one congruence ('strict congruence') between visual and motor properties of the cells (di Pellegrino et al. (1992): 41 %; Gallese et al. (1996): 32 %; Ferrari et al. (2003): 37 %; PF MNs in Gallese et al. (2002): 19 %). It is difficult to determine whether these proportions are low or high because statistical analyses have never been reported on visuo-motor congruence in mirror neurons. A suitable null-hypothesis for such an analysis would be the assumption that motor and visual properties are randomly distributed across neurons, with no systematic relation between them. The above papers did not report the data in such details that would allow this post-hoc analysis, but they enable us to make some estimates. If we consider all 92 mirror neurons (including those that respond to multiple actions) in Gallese et al. (1996), then 69 (75 %) of them respond to the observation of grasping and 71 (77 %) of them are activated during the motor action of grasping. With random, unrelated distribution across neurons, one could expect to find 53 MNs (58 %) that are activated by both observation and execution of grasping. If we consider only the MNs that are associated with a single action, 30 of 41 respond to observation of hand grasping, while 60 of 81 are activated while the monkey performs a grasping action. With random distribution, one would expect that 54 % of these MNs would be found congruent for grasping (while ignoring other types of actions). These calculations do not take into account that 'strictly congruent' MNs were defined by not only the general action category (e.g., grasping) but also by the specific way (e.g., grip type) that the action was executed, and therefore cannot be compared directly to the published proportions of congruent neurons (see above). 
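The chance-level estimates above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of the null hypothesis described in the text, namely that visual and motor selectivity for grasping are statistically independent across neurons; the function name `expected_congruent` is a hypothetical helper introduced here, and the counts are the ones quoted above from Gallese et al. (1996). It is not a substitute for the contingency analysis that the original data would permit.

```python
def expected_congruent(p_observed: float, p_executed: float) -> float:
    """Proportion of neurons expected to respond to grasping in BOTH domains
    if visual and motor selectivity were independent (the null hypothesis)."""
    return p_observed * p_executed


# All 92 mirror neurons, including those responding to multiple actions.
n_total = 92
p_both_all = expected_congruent(69 / n_total, 71 / n_total)
n_both_all = p_both_all * n_total  # expected count of 'congruent' neurons

# Neurons associated with a single action; the text gives different
# denominators for the two domains (30 of 41 observed, 60 of 81 executed).
p_both_single = expected_congruent(30 / 41, 60 / 81)

print(f"all MNs: {n_both_all:.2f} neurons expected ({p_both_all:.1%})")
print(f"single-action MNs: {p_both_single:.1%} expected congruent")
```

Rounded, these values correspond to the 53 neurons (58 %) and the 54 % given in the text. Because the products depend only on the marginal frequencies of grasping in each domain, they show how a heavily over-represented action can yield apparent congruence in the absence of any systematic visuo-motor relation.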
My point here is not that the reported figures of congruence are too low, but rather that these figures should be evaluated against the distribution of various actions among MNs in the two domains (observation and execution). With strongly unequal distribution of types of action or types of grip, one could find a relatively high proportion of good match between the domains even if there were no causal relation between them. Without such a statistical analysis, it remains uncertain whether the cells that satisfy the definition of 'mirror neurons' (i.e., the ones that discharge both with execution and observation of actions) do indeed have 'mirror properties' in the everyday use of this term (i.e., are generally activated by the same action in both domains). The term 'mirror neurons' is indeed a "great terminological choice" (Sperber, 2004), but it does not replace the need for proof of 'mirroring'.

In sum, when a monkey observes a hand action, neurons in his brain's ventral premotor (and parietal PF) area discharge. Some of these neurons have no motor action associated with them, some are also activated by different types of observed actions, some have motor properties of unrelated or opposite actions, and in some the associated motor action matches the observed action. These findings are hardly compatible with the direct-matching hypothesis. Simulation theories imply that the more accurate the simulation, the better the understanding of the observed action. If a monkey relied on the simulation provided by his mirror neurons, he would not understand much of others' actions.

The generally weak congruence between motor and perceptual properties of MNs suggests that while the same neural structure is recruited for representing executed and observed actions of the same effectors (e.g., hands), the actual representations do not necessarily match across domains. Rizzolatti et al. (2001) realized this problem and asserted that the broad congruence found in MNs indicated that they "generalize the goal of the observed action across many instances of it" (p. 662). They were probably right. However, one cannot have one's cake and eat it too. MNs either simulate observed actions in order to understand them, or generalize already interpreted actions into abstract action-concepts. The broad congruence found in MNs is more compatible with the latter idea. This conclusion is consistent with the conclusion of the previous section, according to which MNs are more likely to be involved in the representation and further processing of highly interpreted actions than in simulation.

4. Conclusions

The above analysis reveals serious incompatibilities between the facts that have been reported about MNs and the theories that ascribe to them the functional role of making observed actions understood by simulation. I want to emphasize that my challenge addresses only these incompatibilities, and not simulation theories or mirror neurons per se. Simulation theories of action understanding may well be correct, even if they are not implemented in mirror neurons. Likewise, I do not deny a possible role of mirror neurons in some aspects of action understanding. In particular, mirror neurons seem to play a role in representing high-level, abstract relations between actions and subsequent states or actions, especially when they are relevant to the monkeys themselves. In fact, a plausible counter-hypothesis for the role of MNs would be that they are involved in the prediction or anticipation of subsequent actions of the observed individual, rather than in the simulation of concurrent ones.
MNs seem to be primarily sensitive to instrumental actions that are performed in order to enable further actions, and the 'logically' related actions that have been described as examples of broad congruence between executed and observed actions may indeed reflect this kind of sequential relation. This interpretation of MN functioning would also eliminate the problems with weak observation-execution congruence (e.g., sensitivity to multiple actions), because predictions and anticipations do not require one-to-one matching across domains. At the same time, while this hypothesis would assign an important role in action interpretation to MNs, it would also entail that mirror neurons, despite their name, do not 'mirror' actions.

Having no access to the raw experimental results of the MN studies, I cannot assess the soundness of this alternative interpretation about the role of MNs in action understanding. I have raised this hypothesis here not as a serious proposal but as an illustration that alternative accounts should also be explored when theory and data (in this case, simulation theory and the activation pattern of mirror neurons) do not match. If the 'prediction' hypothesis turned out to be correct (and even if it did not), mirror neurons may indeed be considered to be the most important discovery in neuroscience during the last decade (Ramachandran, 2000), but not because they support a simulation theory of action understanding.

Acknowledgements

I thank György Gergely, Kazuo Hiraki, Mark Johnson, John S. Watson, and Jennifer Yoon for their valuable comments on earlier versions of this paper.

References

di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (1992). Understanding motor events: a neurophysiological study. Experimental Brain Research, 91, 176-180.
Ferrari, P. F., Gallese, V., Rizzolatti, G., & Fogassi, L. (2003). Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. European Journal of Neuroscience, 17, 1703-1714.
Fogassi, L. & Gallese, V. (2002). The neural correlates of action understanding in non-human primates. In M. I. Stamenov & V. Gallese (Eds.), Mirror Neurons and the Evolution of Brain and Language (pp. 13-35). Amsterdam: John Benjamins Publ.
Gallese, V. (2003). The manifold nature of interpersonal relations: the quest for a common mechanism. Philosophical Transactions of the Royal Society, London B, 358, 517-528.
Gallese, V. (2004). Intentional attunement. The Mirror Neuron system and its role in interpersonal relations. http://www.interdisciplines.org/mirror/papers/1
Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593-609.
Gallese, V., Fogassi, L., Fadiga, L., & Rizzolatti, G. (2002). Action representation and the inferior parietal lobule. In W. Prinz & B. Hommel (Eds.), Attention and Performance XIX. Common Mechanisms in Perception and Action (pp. 334-355). New York: OUP.
Gallese, V. & Goldman, A. (1998). Mirror neurons and the simulation theory of mind reading. Trends in Cognitive Sciences, 2, 493-501.
Gallese, V., Keysers, C., & Rizzolatti, G. (2004). A unifying view of the basis of social cognition. Trends in Cognitive Sciences, 8, 396-403.
Goldman, A. I. & Sripada, C. S. (in press). Simulationist models of face-based emotion recognition. Cognition.
Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286, 2526-2528.
Jellema, T., Baker, C. I., Wicker, B., & Perrett, D. I. (2000). Neural representation for the perception of the intentionality of actions. Brain and Cognition, 44, 280-302.
Keysers, C. & Perrett, D. I. (2004). Demystifying social cognition: a Hebbian perspective. Trends in Cognitive Sciences, 8, 501-507.
Ramachandran, V. S. (2000). Mirror neurons and imitation learning as the driving force behind "the great leap forward" in human evolution. Edge. http://www.edge.org/3rd_culture/ramachandran/ramachandran_p1.html
Rizzolatti, G. & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21, 188-194.
Rizzolatti, G. & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169-192.
Rizzolatti, G., Fadiga, L., Fogassi, L., & Gallese, V. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131-141.
Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience, 2, 661-670.
Sperber, D. (2004). "Mirror neurons" or "Concept neurons"? http://www.interdisciplines.org/mirror/papers/1/8
Umiltà, M. A., Kohler, E., Gallese, V., Fogassi, L., Fadiga, L., Keysers, C., & Rizzolatti, G. (2001). I know what you are doing: A neurophysiological study. Neuron, 32, 91-101.
Wolpert, D. M., Doya, K., & Kawato, M. (2003). A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society, London B, 358, 593-602.