Mathematical Background

by John F. Sowa

This web page is a revised and extended version of Appendix A from the book Conceptual Structures by John F. Sowa. It presents a brief summary of the following topics for students and general readers of that book and related books such as Knowledge Representation and books on logic, linguistics, and computer science.

  1. Sets, Bags, and Sequences
  2. Functions
  3. Lambda Calculus
  4. Graphs
  5. Relations
  6. Representing Relations by Graphs
  7. Lattices
  8. Propositional Logic
  9. Predicate Logic
  10. Axioms and Proofs
  11. Formal Grammars
  12. Game Graphs
  13. Model Theory
  14. References


1. Sets, Bags, and Sequences

Elementary or "naive" set theory is used to define basic mathematical structures. A set is an arbitrary collection of elements, which may be real or imaginary, physical or abstract. In mathematics, sets are usually composed of abstract things like numbers and points, but one can also talk about sets of apples, oranges, people, or canaries. In computer science, sets are composed of bits, bytes, pointers, and blocks of storage. In many applications, the elements are never defined, but are left as abstractions that could be represented in many different ways in the human brain, on a piece of paper, or in computer storage.

Curly braces are used to enclose a set specification. For small, finite sets, the specification of a set can be an exhaustive list of all its elements:

{1, 97, 63, 12}.
This specifies a set consisting of the four integers 1, 97, 63, and 12. Since the order of listing the elements is immaterial, the following specification is equivalent to the one above:
{12, 63, 97, 1}.
If the set is very large, like the set of all mammals, a complete listing is impossible. It is hard enough to enumerate all the people in a single city, let alone all the cats, dogs, mice, deer, sheep, and kangaroos in the entire world. For such sets, the specification must state some rule or property that determines which elements are in the set:
{x | vertebrate(x) and warmBlooded(x) and hasHair(x) and lactiferous(x)}
This specification may be read the set of all x such that x is vertebrate, x is warm blooded, x has hair, and x is lactiferous. A given set may be specified in more than one way. The following four specifications all determine the same set:
{1, 2, 3}
{x | x is an integer and 0<x<4}
{x | x is a positive integer, x divides 6, and x≠6}
{x | x=1 or x=2 or x=3}
A set specification that lists all elements explicitly is called a definition by extension. A specification that states a property that must be true of each element is called a definition by intension. Only finite sets can be defined by extension. Infinite sets must always be defined by intension or by some operations upon other infinite sets that were previously defined by intension.
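
For readers who think in terms of programs, the two styles of definition can be mimicked with Python set literals and set comprehensions; the bound 100 below is an arbitrary stand-in for an unbounded range of integers:

    # Definition by extension: list every element explicitly.
    by_extension = {1, 2, 3}

    # Definitions by intension: state a property the elements must satisfy.
    by_intension = {x for x in range(100) if 0 < x < 4}
    divisors_of_6 = {x for x in range(1, 100) if 6 % x == 0 and x != 6}

    print(by_extension == by_intension == divisors_of_6)   # True: three rules, one set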

In any theory using sets, there are two privileged sets: the empty set {}, which contains no elements at all, and the universal set U, which contains every element that is being considered. In mathematical discussions, for example, the universal set may be the set of all integers Z or the set of all real numbers R. In most discussions, the universal set is usually defined at the beginning of the presentation. Thereafter, other sets are built up from U: subsets of U, pairs of elements of U, sets of sets of elements from U, etc.

Of all the operators that deal with sets, the most basic is ∈, which states whether a particular element is in a set: the notation x∈S means that x is an element of the set S; it may also be read x is a member of the set S or simply x is in S. All other operators on sets can be defined in terms of ∈. Let A and B be any two sets. Following are the common operators of set theory; listed for each one is its name, standard symbol, informal English definition, and formal definition in terms of ∈:

    Subset: A⊆B; every element of A is also an element of B; for every x, if x∈A, then x∈B.
    Union: A∪B; the set of all elements in A or in B (or in both); {x | x∈A or x∈B}.
    Intersection: A∩B; the set of all elements in both A and B; {x | x∈A and x∈B}.
    Difference: A−B; the set of all elements of A that are not in B; {x | x∈A and not x∈B}.
    Complement: −A; the set of all elements of U that are not in A; {x | x∈U and not x∈A}.

The operators for union, intersection, and complement satisfy several standard identities. Some of the identities listed below are similar to the rules of ordinary arithmetic. Addition and multiplication, for example, obey the rules of commutativity and associativity, and the minus sign obeys the rule of double complementation. Idempotency, absorption, and De Morgan's laws, however, do not hold for ordinary arithmetic. Distributivity holds for multiplication over addition, but addition does not distribute over multiplication.
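
As a rough illustration, the following Python fragment checks several of these identities for a few sample sets; the small universal set U is an arbitrary choice:

    U = set(range(10))                 # a small universal set
    A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 5, 9}
    complement = lambda S: U - S

    assert A | A == A and A & A == A                             # idempotency
    assert A | B == B | A and A & B == B & A                     # commutativity
    assert complement(complement(A)) == A                        # double complementation
    assert complement(A | B) == complement(A) & complement(B)    # De Morgan
    assert complement(A & B) == complement(A) | complement(B)    # De Morgan
    assert A | (B & C) == (A | B) & (A | C)                      # distributivity
    assert A & (B | C) == (A & B) | (A & C)                      # distributivity
    print("all identities hold for these samples")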

For complex sets, the rule for determining which elements are in the set may be too complex to state in a single expression. An example is the set of all grammatical sentences in some language, natural or artificial. Such sets are typically specified by a recursive definition, which has two parts: a starting set of elements that are listed explicitly, and one or more generating operations that produce new elements from elements already in the set.

The set resulting from these operations is said to be the closure of the starting set under the given generating operations. As an example of a recursive definition, the set S of all positive integers not divisible by 3 could be specified by intension:
S = {x | x is an integer, x>0, and 3 does not divide x}.
But the property x is an integer depends on some prior definition of the set of all integers. The following recursive definition depends only on the operation of adding 3: the numbers 1 and 2 are in S; and whenever a number x is in S, the number x+3 is also in S. All elements of S may be enumerated by starting with {1, 2}. The first stage of adding 3 generates the new elements 4 and 5, adding 3 to them gives 7 and 8, then 10 and 11, and so on. The set S is the closure of the set {1, 2} under the operation of adding 3. A recursive definition is a special kind of definition by intension. The formal grammars presented in Section 11 define languages by a recursive definition in which the generating operations are specified by a set of production rules. For a discussion and comparison of various methods of definition, see the notes on definitions by Norman Swartz.

A set has no duplicate elements. Since all duplicates are discarded in computing the union of two sets, the union operator is idempotent: A∪A=A. In some cases, one may want to allow duplicates; therefore, a bag is a collection of things with possible duplicates. Since there may be more than one occurrence of a given element x, the count operator @ is a generalization of the element operator ∈. The expression x@A is the number of times the element x occurs in the bag A. Bags are useful for many purposes, such as taking averages: if four men have heights of 178cm, 184cm, 178cm, and 181cm, then the set of those numbers is {178, 181, 184} with the average 181; but the bag of the numbers is {178, 178, 181, 184} with average 180.25.
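
One convenient way to represent a bag in a program is the Counter class from the Python standard library, which records the count of each element; the following sketch reproduces the height calculation above:

    from collections import Counter
    from statistics import mean

    heights = [178, 184, 178, 181]      # the bag, written as a list
    bag = Counter(heights)              # the same bag with explicit counts

    print(bag[178])                     # 2: the count operator applied to 178
    print(mean(heights))                # 180.25: average over the bag
    print(mean(set(heights)))           # 181: average over the set, duplicates discarded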

A sequence is an ordered bag. To distinguish ordered sequences from unordered sets and bags, the elements of a sequence are enclosed in angle brackets: ⟨178, 184, 178, 181⟩; the empty sequence is written ⟨⟩. If a sequence has n elements, the elements are numbered from 1 to n (or alternatively from 0 to n-1). A sequence of two elements is sometimes called an ordered pair; a sequence of three elements, a triple; a sequence of four, a quadruple; a sequence of five, a quintuple; and a sequence of n elements, an n-tuple. Historically, the theory of sets was first defined without considering order. On a piece of paper or in computer storage, however, the elements of a set must be listed in some order. Sequences are therefore easier to represent than bags, and bags are easier to represent than sets: a bag is a sequence with the ordering ignored, and a set is a sequence with both order and duplicates ignored.

New sets may be created by combining elements from the universe U in various ways. The cross product of two sets A and B, written A×B, is the set of all possible ordered pairs with the first element of each pair taken from A and the second element from B. If A is the set {1,2} and B is the set {x,y,z}, then A×B is the set,

{⟨1,x⟩, ⟨1,y⟩, ⟨1,z⟩, ⟨2,x⟩, ⟨2,y⟩, ⟨2,z⟩}.
With the notation for defining a set by a property or rule, it is possible to give a general definition for the cross product A×B:
{⟨x,y⟩ | x∈A and y∈B}.
The cross product can also be extended to three or more sets. The product A×B×C is defined as
{⟨x,y,z⟩ | x∈A, y∈B, and z∈C}.
Since René Descartes introduced pairs of numbers for identifying points on a plane, the cross product is also called the Cartesian product in his honor.
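
For finite sets, the cross product can be computed directly with itertools.product in Python, as in this small sketch of the example above:

    from itertools import product

    A = {1, 2}
    B = {'x', 'y', 'z'}

    cross = set(product(A, B))          # the Cartesian product A x B
    print(sorted(cross))
    # [(1, 'x'), (1, 'y'), (1, 'z'), (2, 'x'), (2, 'y'), (2, 'z')]

    C = {True, False}
    print(len(set(product(A, B, C))))   # 12: the product A x B x C has 2 * 3 * 2 triples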

In this book, most sets are finite. Inside a computer or the human brain, all sets that are explicitly stored must be finite. But mathematical definitions and proofs are generally simpler if there is no upper limit on the size of sets. Therefore, definitions in computer science often permit infinite sets, but with the understanding that any implementation will only choose a finite subset. Most infinite sets discussed in computer science are assumed to be countable: a countably infinite set is one whose elements can be put in a one-to-one correspondence with the integers. The set of all real numbers is uncountable, but such sets are far beyond anything that can be implemented in computer systems.

The terminology for sets is quite standard, although some authors use the word class for set and others make a distinction between classes and sets. Bags are not used as commonly as sets, and the terminology is less standard. Some authors use the word multiset for a bag. Sequences are sometimes called lists or vectors, but some authors draw distinctions between them. Some authors use the symbol ∅ for the empty set, but the notation {} is more consistent with the notation ⟨⟩ for the empty sequence.


2. Functions

A function is a rule for mapping the elements of one set to elements of another set. The notation f: A → B means that f is a function that maps any element x in the set A to some element f(x) in the set B. The set A is called the domain of f, and B is called the range of f. In mathematics, the element x is called the argument, and f(x) is called the result or the image of x under the mapping f. In computer science, x is called the input and f(x) is called the output.

Suppose Z is the set of all integers, and N is the set of non-negative integers (i.e. the positive integers and zero). Then define a function square: Z → N with the mapping rule,

square(x) = x².
The function square applies to all elements in its domain Z, but not all elements in its range N are images of some element of Z. For example, 17 is not the square of any integer. Conversely, some elements in N are images of two different elements of Z, since square(3)=9 and square(-3)=9.

A function is onto if every element of its range is the image of some element of its domain. As an example, define the absolute value function, abs: Z → N, with the mapping,

abs(x) = +x, if x ≥ 0;
abs(x) = -x, if x < 0.
Every element of N is the image of at least one element of Z under the mapping abs; therefore abs is onto. Note that abs is onto only because its range is limited to N. If the range had been Z, it would not be onto because no negative integer is the absolute value of anything in Z.

A function is one-to-one if no two elements of its domain are mapped into the same element of its range. The function abs is not one-to-one because all the elements of N except 0 are the images of two different elements of Z. For example, abs(-3) and abs(3) are both 3. As a more subtle example, consider the function g: Z → N with the mapping,

g(x) = 2x² + x.
Then g(0)=0, g(1)=3, g(-1)=1, g(2)=10, g(-2)=6, etc. The function g is one-to-one since no two elements of Z are mapped into the same element of N. However, g is not onto because many elements of N are not images of any element of Z. Note that g is one-to-one only over the domain Z of integers. If its domain and range were extended to the set R of all real numbers, it would not be one-to-one: g(-0.5) and g(0), for example, are both 0.

A function that is both one-to-one and onto is called an isomorphism. The two sets that form the domain and range of the function are said to be isomorphic to each other. Let E be the set of even integers, and let O be the set of odd integers. Then define the function increment: E → O with the mapping,

increment(x) = x + 1.
This function is an isomorphism from the set E to the set O because it is both one-to-one and onto. Therefore, the sets E and O are isomorphic. Instead of the terms one-to-one, onto, and isomorphic, many authors use the equivalent terms injective, surjective, and bijective.

For many applications, isomorphic structures are considered equivalent. In old-fashioned computer systems, for example, holes on a punched card could represent the same data as magnetized spots on tape or currents flowing in transistors. Differences in the hardware are critical for the engineer, but irrelevant to the programmer. When programmers copied data from cards to tape, they would blithely talk about "loading cards to tape" as if the actual paper were moved. One mythical programmer even wrote a suggestion for reducing the shipping costs in recycling old cards: load the cards to tape and punch them out at the recycling works.

If f is an isomorphism from A to B, then there exists an inverse function, f⁻¹: B → A. The inverse of the function increment is the function decrement: O → E with the mapping,

decrement(x) = x - 1.
The composition of two functions is the application of one function to the result of the other. Suppose that f: A → B and g: B → C are two functions. Then their composition g(f(x)) is a function from A to C. The composition of a function with its inverse produces the identity function, which maps any element to itself. For any x in E, decrement(increment(x)) is the original element x.
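
The following Python sketch illustrates these points with the increment and decrement functions and a small composition helper; the helper names are arbitrary:

    def increment(x):          # maps an even integer to an odd integer
        return x + 1

    def decrement(x):          # the inverse mapping, from odd back to even
        return x - 1

    def compose(g, f):
        """Return the function that maps x to g(f(x))."""
        return lambda x: g(f(x))

    identity_on_E = compose(decrement, increment)
    print(identity_on_E(42))   # 42: decrement(increment(42)) returns the original element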

Functions may have more than one argument. A function of two arguments whose first argument comes from a set A, second argument from a set B, and result from a set C is specified f: A×B → C. A function with one argument is called monadic, with two arguments dyadic, with three arguments triadic, and with n arguments n-adic. Those terms are derived from Greek. Some authors prefer the Latin terms unary, binary, ternary, and n-ary. The number of arguments of a function is sometimes called its valence, adicity, or arity.

The rule that defines a function f: A → B as a mapping from a set A to a set B is called the intension of the function f. The extension of f is the set of ordered pairs determined by such a rule:

{⟨a1,b1⟩, ⟨a2,b2⟩, ⟨a3,b3⟩, ...},
where the first element of each pair is an element x in A, and the second is the image f(x) in the set B. A definition by extension is only possible when the domain A is finite. In all other cases, the function must be defined by intension. (One could, of course, define the extension of a function as an infinite set, but the set itself would have to be defined by intension.)

Since a function is a rule for mapping one set to another, the term mapping is sometimes used as a synonym for function. Another synonym for function is the term operator. Addition, subtraction, multiplication, and division are dyadic functions defined over the real numbers, but they are usually called operators. A common distinction is that functions have ordinary alphabetic names, but operators are designated by special symbols like + or ×. Traditional mathematical practice has tended to use several different terms, such as mapping and operator, as informal synonyms for function.


3. Lambda Calculus

Defining a function by a rule is more natural or intuitive than defining it as a set of ordered pairs. But a question arises when functions defined by different rules or intensions happen to have exactly the same sets of ordered pairs or extensions. In developing his theory of lambda calculus, the logician Alonzo Church (1941) distinguished equality by intension from equality by extension:
It is possible, however, to allow two functions to be different on the ground that the rule of correspondence is different in meaning in the two cases although always yielding the same result when applied to any particular argument. When this is done, we shall say that we are dealing with functions in intension. The notion of difference in meaning between two rules of correspondence is a vague one, but in terms of some system of notation, it can be made exact in various ways.
Church's way of making the notion precise was to define lambda calculus as a notation for defining functions and a method for converting any given definition into other equivalent definitions.

In mathematics, the traditional way of defining a function is to specify the name of a function and its formal parameter on the left side of an equation and to put the defining expression on the right:

f(x) = 2x² + 3x - 2.
This method of definition makes it impossible to specify the name f of the function independently of the name x of the formal parameter. To separate them, Church adopted the Greek letter λ as a marker of the defining lambda expression:
f = λx(2x² + 3x - 2).
In this equation, the name f appears by itself on the left, and its definition is a separate expression on the right. The symbol x that appears after λ is called the formal parameter or bound variable, and the remainder of the expression is called the body.

Church's rules for lambda conversion are formal statements of the common techniques for defining and evaluating functions. Whenever a function is applied to its arguments, such as f(5), the function may be evaluated by replacing the name f with the body of the definition and substituting the argument 5 for every occurrence of the formal parameter x. Church also defined additional operators, which combined with function evaluation to produce a computational system that is as general as a Turing machine.
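
Python's lambda keyword, inherited from LISP and ultimately from Church's notation, keeps the same separation between a function and its name; the sketch below defines the function of the earlier equation and evaluates it by substituting an argument for the bound variable:

    # f = λx(2x² + 3x − 2), with the name f assigned separately from the body
    f = lambda x: 2*x**2 + 3*x - 2

    print(f(5))    # 63: the argument 5 replaces every occurrence of x in the body

    # Because the expression needs no name, it can be passed directly to another function
    print(list(map(lambda x: 2*x**2 + 3*x - 2, [0, 1, 2])))    # [-2, 3, 12]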

With such rules, Church answered the question about equality of functions: they are equal by extension if they have the same sets of ordered pairs, and they are equal by intension if their definitions are reducible to the same canonical form by the rules of lambda conversion. An important result of the lambda calculus is the Church-Rosser theorem: when an expression has more than one function application that can be evaluated, the order of evaluation is irrelevant because the same canonical form would be obtained with any sequence of evaluations.

In computer science, the clear separation of the name of a function from its defining expression enables a lambda expression to be used anywhere that a function name could be used. This feature is especially useful for applications that create new functions dynamically and pass them as arguments to other functions that evaluate them. John McCarthy (1960) adopted the lambda notation as the basis for defining and evaluating functions in the LISP programming language. A common technique of computational linguistics is to translate natural language phrases to lambda expressions that define their semantics. William Woods (1968) used that technique for defining the semantics of the English quantifiers every and some as well as extended quantifiers such as more than two or less than seven. He implemented his definitions in LISP programs that translated English questions to lambda expressions, which were later evaluated to compute the answers. Richard Montague (1970) adopted a similar technique for his treatment of quantifiers in natural language semantics.


4. Graphs

In diagrams, a graph is normally drawn as a network of nodes connected by arcs. Such diagrams introduce arbitrary conventions that are irrelevant to the mathematical definitions and theorems: Are the arcs drawn curved or straight? Short or long? Are the nodes drawn as dots, circles, or other shapes? Is there any significance in having node x above, below, to the right, or to the left of node y? To avoid such questions, a graph is defined formally without any reference to a diagram. Diagrams are then introduced as informal illustrations. Diagrams are essential for developing an intuitive understanding, but the definitions and proofs are independent of any features of the drawing that are not explicitly mentioned in the formal definitions.

A sample graph

Figure 1: A sample graph

Formally, a graph G consists of a set N of nodes and a set A of arcs. Every arc in A is a pair of nodes from the set N. For the sample graph in Figure 1, the set of nodes is {A, B, C, D, E}, and the set of arcs is {⟨A,B⟩, ⟨A,D⟩, ⟨B,C⟩, ⟨C,D⟩, ⟨D,E⟩}. Notice that node D happens to be an endpoint of three different arcs. That property can be seen instantly from the diagram, but it takes careful checking to verify it from the set of pairs. For people, diagrams are the most convenient way of thinking about graphs. For mathematical theories, a set of pairs is easier to axiomatize. And for computer implementations, many different data structures are used, such as blocks of storage for the nodes and pointers for the arcs.
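
As a small illustration, the sample graph of Figure 1 can be stored as two Python sets, and the property of node D can be checked by counting the arcs in which D occurs:

    nodes = {'A', 'B', 'C', 'D', 'E'}
    arcs = {('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'D'), ('D', 'E')}

    incident_to_D = [arc for arc in arcs if 'D' in arc]
    print(len(incident_to_D))    # 3: D is an endpoint of three different arcs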

Another graph

Figure 2: An alternate way of drawing the same graph as Figure 1.

Figure 2 is another way of drawing the same graph shown in Figure 1. The two diagrams look very different, but their abstract representations as sets of nodes and arcs are the same. Even when graphs are defined in a purely abstract way, questions may arise about the order of the two nodes of an arc. If the order is irrelevant, the notation {A,B} shows that the arc is an unordered set of two nodes. A graph whose arcs are unordered pairs is said to be undirected. If the order is significant, ⟨A,B⟩ and ⟨B,A⟩ represent distinct arcs, and the graph is said to be directed. For the directed graph represented in Figures 1 and 2, an arrowhead on each arc points to the second node of each ordered pair.

Although graphs are defined abstractly, mathematicians normally visualize them as diagrams. The common conventions for drawing graphs are reflected in descriptive terms like endpoint, loop, path, and cycle. Let e be the arc ⟨a,b⟩. Then the nodes a and b are called endpoints of e, and e is said to connect a and b. If e is an arc of a directed graph, then the first endpoint a is called the source of e, and the second endpoint b is called the target of e. The word target is easy to remember since that is the direction the arrow points. A loop is an arc e whose endpoints are the same node: e=⟨a,a⟩.

Combinations of arcs are often named by the methods of traversing a graph. A walk through a graph is a sequence of nodes ⟨a0, a1, ..., an⟩ for which any two adjacent nodes ai and ai+1 are the endpoints of some arc. Any arc whose endpoints are adjacent nodes of a walk is said to be traversed by the walk. A walk that contains n+1 nodes must traverse n arcs and is therefore said to be of length n. A path is a walk in which all nodes are distinct. A walk with only one node ⟨a0⟩ is a path of length 0. If the first and last nodes of a walk are the same, but all other nodes are distinct, then the walk is called a cycle. Every loop is a cycle of length 1, but cycles may traverse more than one arc.

For the graph in Figure 2, the walk ⟨E, D, A, B⟩ is a path because all nodes are distinct. The path is of length 3, which is equal to the number of arcs traversed by a point that moves along the path. The walk ⟨D, C, B, A, D⟩ is a cycle because it starts and ends at the same node.

If G is a directed graph, then a walk, path, or cycle through G may or may not follow the same direction as the arrows. A walk, path, or cycle through G is said to be directed if adjacent nodes occur in the same order in which they occur in some arc of G: if ai and ai+1 are adjacent nodes on the walk, then the ordered pair ⟨ai,ai+1⟩ must be an arc of G. An arc of a directed graph is like a one-way street, and a directed walk obeys all the one-way signs (arrowheads). An undirected walk through a directed graph is possible, simply by ignoring the ordering.
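
These definitions translate almost word for word into code. The following sketch, using the arc set of Figures 1 and 2, checks whether a sequence of nodes is a directed walk and whether it is a path:

    arcs = {('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'D'), ('D', 'E')}

    def is_directed_walk(walk, arcs):
        """Every adjacent pair of nodes must occur, in order, as an arc."""
        return all((walk[i], walk[i + 1]) in arcs for i in range(len(walk) - 1))

    def is_directed_path(walk, arcs):
        """A path is a walk in which all nodes are distinct."""
        return is_directed_walk(walk, arcs) and len(set(walk)) == len(walk)

    print(is_directed_walk(['A', 'D', 'E'], arcs))    # True: follows the arrowheads
    print(is_directed_walk(['E', 'D', 'A'], arcs))    # False: goes against the arrows
    print(is_directed_path(['A', 'D', 'E'], arcs))    # True: all nodes are distinct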

A graph is connected if there is a possible path (directed or undirected) between any two nodes. If it is not connected, then it breaks down into disjoint components, each of which is connected, but none of which has a path linking it to any other component. A cutpoint of a graph is a node that, when removed, causes the graph (or the component in which it is located) to separate into two or more disconnected components.

Certain special cases of graphs are important enough to be given special names: an acyclic graph is one that has no cycles, and a tree is an acyclic connected graph for which the path between any two nodes is unique. The most commonly used trees are rooted trees, in which one node is singled out as the root. In a rooted tree, the node just above a given node is called its parent, the nodes just below it are called its children, a node with no children is called a leaf, and a node with children is called a branching node.

The terminology of trees is extended to related graphs: a forest is a collection of disconnected trees; a chain is a tree with no branches, in which all the nodes lie along a single path; and a seed has only one node and no arcs. Some authors require every graph to have at least one node, but other authors include the empty graph or blank, which has no nodes or arcs.

A binary tree

Figure 3: A binary tree

A binary tree is a rooted tree where every node that is not a leaf has exactly two children (Figure 3). In a binary tree, the two children of each node are usually designated as the left child and the right child. Since a tree has no cycles, a common convention for simplifying the diagrams is to omit the arrowheads on the arcs, but to draw the parent nodes at a higher level than their children. For Figure 3, the root A, which has no parent, is at the top; and the leaves, which have no children, are arranged along the bottom.

In computer applications, each node of a tree or other graph may have some associated data. To process that data, a program can take a walk through the tree and process the data at each node it visits. For the tree in Figure 3, imagine a walk that starts at the root, visits every node at least once, and stops when it returns to the root. Assume that the left child is always visited before the right child. Such a walk will visit the leaves of the tree only once, but it will visit each of the branching nodes three times: ⟨A, B, D, B, E, B, A, C, F, H, F, I, F, C, G, C, A⟩. There are therefore three options for processing the data at the branching nodes, which are illustrated by the sketch below:

    Preorder: process the data at a branching node on the first visit, before either of its children.
    Inorder: process the data on the second visit, after the left child but before the right child.
    Postorder: process the data on the last visit, after both children.
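
The sketch below implements the three options for the binary tree of Figure 3; the tuple encoding of the tree, in which each branching node is written (data, left, right) and each leaf is just its own label, is an arbitrary choice for the illustration:

    tree = ('A',
            ('B', 'D', 'E'),
            ('C', ('F', 'H', 'I'), 'G'))

    def preorder(t):
        if isinstance(t, tuple):
            data, left, right = t
            return [data] + preorder(left) + preorder(right)
        return [t]

    def inorder(t):
        if isinstance(t, tuple):
            data, left, right = t
            return inorder(left) + [data] + inorder(right)
        return [t]

    def postorder(t):
        if isinstance(t, tuple):
            data, left, right = t
            return postorder(left) + postorder(right) + [data]
        return [t]

    print(preorder(tree))    # ['A', 'B', 'D', 'E', 'C', 'F', 'H', 'I', 'G']
    print(inorder(tree))     # ['D', 'B', 'E', 'A', 'H', 'F', 'I', 'C', 'G']
    print(postorder(tree))   # ['D', 'E', 'B', 'H', 'I', 'F', 'G', 'C', 'A']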

These definitions can be generalized to trees with an arbitrary number of children at each branching node. They can also be generalized to graphs by finding a spanning tree, which includes all the nodes of the graph, but omits some subset of the arcs.

A common application of graph or tree walking algorithms is the translation of a parse tree or a conceptual graph to some natural or artificial language. The patterns of word order in various natural languages can be generated by different ways of walking through a conceptual graph and translating the concept nodes to words of the target language (Sowa 1984). Irish and Biblical Hebrew, for example, are preorder languages that put the verb first, Latin and Japanese are postorder languages that put the verb last, and English and Chinese are inorder languages that put the verb in the middle.

The terminology for graphs in this section is fairly standard, but many of the ideas have been developed independently by different people, who have introduced different terms. Some authors use the terms vertex and edge instead of node and arc. Others distinguish degrees of connectivity in a directed graph: it is strongly connected if there is a directed path between any two nodes, and it is weakly connected if there is only an undirected path between some pair of nodes. Some authors use the term digraph as an abbreviation for directed graph, but that use is confusing, since digraph should mean double graph. Occasionally, people introduce fancy terms like arborescence for rooted tree, but the simpler terminology is more descriptive.


5. Relations

A relation is a function of one or more arguments whose range is the set of truth values {true,false}. An example of a dyadic or binary relation is the function less than represented by the operator symbol <. Its domain is the set of ordered pairs of real numbers, R×R:
<: R×R → {true,false}.
When applied to the numbers 5 and 12, 5<12 has the value true, but 12<5 has the value false. Dyadic relations are often written as infix operators with special symbols, such as x<y or x∈S. Sometimes, relations are represented by single letters, such as R(x,y) or S(x,y,z). For improved readability, they may be represented by arbitrarily long alphanumeric strings, such as Mother(x,y) or Between(x,y,z). Traditional mathematics uses single letters or symbols, but programming languages and database systems usually have so many variables, functions, and relations that longer names are preferred. The term predicate is often used as a synonym for relation. Some authors, however, say that relations must have two or more arguments and call a predicate with one argument a property.

As with other functions, relations may be defined either by intension or by extension. An intensional definition is a rule for computing a value true or false for each possible input. An extensional definition is a set of all n-tuples of arguments for which the relation is true; for all other arguments, the relation is false. One instance of a relation being true is represented by a single n-tuple, called a relationship; the relation itself is the totality of all relationships of the same type. A marriage, for example, is a relationship between two individuals, but the relation called marriage is the totality of all marriages.
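
In a program, an intensional definition is simply a function that returns true or false, and an extensional definition is a set of tuples, which is practical only when the relation is restricted to a small finite domain; the following Python sketch shows both styles for the less-than relation:

    # Intensional definition: a rule that computes a truth value
    def less_than(x, y):
        return x < y

    # Extensional definition: the set of all pairs for which the relation is true,
    # shown here only for the finite domain {0, 1, 2, 3}
    domain = range(4)
    less_than_ext = {(x, y) for x in domain for y in domain if x < y}

    print(less_than(5, 12), less_than(12, 5))    # True False
    print(sorted(less_than_ext))
    # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]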

The following table lists some common types of relations, an axiom that states the defining constraint for each type, and an example of the type. The symbol → represents an arbitrary dyadic relation.

Type           Axiom                                    Example
Reflexive      (∀x) x→x                                 x is as old as y
Irreflexive    (∀x) not(x→x)                            x is the mother of y
Symmetric      (∀x,y) x→y implies y→x                   x is the spouse of y
Asymmetric     (∀x,y) x→y implies not(y→x)              x is the husband of y
Antisymmetric  (∀x,y) x→y and y→x implies x=y           x was present at y's birth
Transitive     (∀x,y,z) x→y and y→z implies x→z         x is an ancestor of y

The symbol ∀, called the universal quantifier, may be read for every or for all; it is discussed further in Section 9 on predicate logic. Some important types of relations satisfy two or more of the above axioms: a partial ordering, such as the subset relation ⊆, is reflexive, antisymmetric, and transitive; an equivalence relation, such as equality, is reflexive, symmetric, and transitive.
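
For a relation given by extension over a finite domain, each of these axioms can be tested directly, as in the following sketch, which checks the divides relation on a three-element domain; the helper names are illustrative:

    def is_reflexive(rel, domain):
        return all((x, x) in rel for x in domain)

    def is_symmetric(rel):
        return all((y, x) in rel for (x, y) in rel)

    def is_transitive(rel):
        return all((x, z) in rel
                   for (x, y) in rel
                   for (w, z) in rel if w == y)

    domain = {1, 2, 3}
    divides = {(x, y) for x in domain for y in domain if y % x == 0}

    print(is_reflexive(divides, domain))    # True: every number divides itself
    print(is_transitive(divides))           # True
    print(is_symmetric(divides))            # False: 1 divides 2, but 2 does not divide 1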

In a relational database, the n-tuples of the relations are explicitly grouped in tables, which are accessed by means of the Structured Query Language (SQL). A stored relation is one whose values are physically stored by extension, and a virtual relation is computed by some rule. In an object-oriented database, the n-tuples of the relations are grouped with the entities or objects that occur in the n-tuples. The resulting data structures resemble graphs rather than tables, but they can represent exactly the same logical information (as shown in Section 6). In theory, implementation issues are irrelevant since the equivalent logical operations can be performed on either form. In practice, the question of how a relation is implemented may have important implications for efficiency or ease of use. As an extreme example, the relation x<y is trivial to compute, but it would require an infinite amount of space to store.


6. Representing Relations by Graphs

Graphs and dyadic relations are mathematical structures that look different, but they can represent the same information in logically equivalent ways. Historically, graphs are more closely associated with geometric properties that can be seen from diagrams, and relations are associated with more abstract mathematics and logic. But every dyadic relation can be represented as a graph, and every graph defines a dyadic relation.

Let G be a graph, and let the symbol → represent the corresponding dyadic relation. If x and y are nodes in the graph G, define x→y=true if the pair ⟨x,y⟩ is an arc of G, and x→y=false if ⟨x,y⟩ is not an arc of G. If the graph is undirected, then → is symmetric because it satisfies the axiom x→y=y→x.

Although every dyadic relation can be represented by a graph, some extensions are necessary to represent multiple relations in the same graph. A common technique is to label the arcs of a graph with the names of the relations they represent. Informally, a labeled graph can be viewed as a set of graphs, one for each relation, overlaid on top of one another with the labels showing which relation each arc has been derived from. Formally, a labeled graph G can be constructed by the following steps:

  1. Given a set {R1,...,Rn} of n dyadic relations, let {G1,...,Gn} be a set of graphs that represent each of those relations.
  2. The set of nodes N of the labeled graph G is defined as the union of the nodes Ni of each graph Gi for each i from 1 to n.
  3. The set of arcs A of G is the set of all triples of the form ⟨i,a,b⟩ where i is an integer from 1 to n and ⟨a,b⟩ is an arc in the set Ai of the graph Gi. The integer i is called the label of the arc.
This construction uses integers as labels, but it could be generalized to use character strings or other symbols as labels. It could even be generalized to infinite graphs or uncountably infinite graphs with real numbers used as labels. Such graphs could never be stored in a computer, but it is possible to define them mathematically and prove theorems about them.

Further extensions are needed to represent n-adic relations where n≠2. Two kinds of generalized graphs may be used to represent n-adic relations: hypergraphs, in which a single arc may connect any number of nodes, and bipartite graphs, in which the nodes are divided into two disjoint sets and every arc connects a node of one set to a node of the other. The unlabeled versions can represent a single relation, and the labeled versions can represent an arbitrary collection of n-adic relations, including the possibility of allowing different relations to have a different valence or number of arguments.

As these definitions illustrate, the mapping from relations to hypergraphs is somewhat simpler than the mapping from relations to bipartite graphs. Formally, however, hypergraphs and bipartite graphs are isomorphic, and they can represent the same relations in equivalent ways.

John is going to Boston by bus.

Figure 4: A conceptual graph represented as a labeled bipartite graph.

When bipartite graphs are used to represent conceptual graphs (CGs), the nodes in set C are called concepts and the nodes in set R are called conceptual relations. Figure 4 shows a conceptual graph that represents the English sentence John is going to Boston by bus. The boxes in Figure 4 represent concept nodes, and the circles represent conceptual relation nodes; every arc links a conceptual relation to a concept. Although a CG could be defined formally as either a hypergraph or a bipartite graph, the terminology of bipartite graphs maps more directly to the traditional way of drawing CGs. For a tutorial, see the examples of conceptual graphs. For the formal definitions, see the draft proposed ANSI standard.


7. Lattices

A mathematical structure is a set together with one or more relations or operators defined over the set. A lattice is a structure consisting of a set L, a partial ordering ≤, and two dyadic operators ∩ and ∪. If a and b are elements of L, a∩b is called the greatest lower bound or infimum of a and b, and a∪b is called the least upper bound or supremum of a and b. For any a, b, and c in L, these operators satisfy the following axioms: a∩b ≤ a and a∩b ≤ b; if c ≤ a and c ≤ b, then c ≤ a∩b; a ≤ a∪b and b ≤ a∪b; and if a ≤ c and b ≤ c, then a∪b ≤ c. The symbols ∩ and ∪ are the same as the symbols for intersection and union of sets. This similarity is not an accident because the set of all subsets of a universal set U forms a lattice with the subset relation ⊆ as the partial ordering. A bounded lattice is one with a top ⊤ and a bottom ⊥, where for any element a in the lattice, ⊥ ≤ a ≤ ⊤. All finite lattices are bounded, and so are many infinite ones. In a lattice of subsets, the universal set U itself is ⊤, and the empty set {} is ⊥.
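
The subsets of a small universal set give a concrete finite example of a bounded lattice, with intersection as the infimum and union as the supremum; in the Python sketch below, the <= operator on sets is the subset relation:

    from itertools import combinations

    U = {1, 2, 3}
    L = [set(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]

    a, b = {1, 2}, {2, 3}
    inf, sup = a & b, a | b        # greatest lower bound {2}, least upper bound {1, 2, 3}

    assert inf <= a and inf <= b   # the infimum is below both a and b
    assert a <= sup and b <= sup   # the supremum is above both a and b

    top, bottom = U, set()
    assert all(bottom <= x <= top for x in L)    # the lattice is bounded
    print(len(L))                                # 8: all subsets of a 3-element set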

Since the ∪ and ∩ operators on lattices are so similar to the unions and intersections of sets, they have many of the same properties. Following are the identities defined for the set operators that also hold for lattice operators: idempotency (a∪a = a and a∩a = a), commutativity, associativity, and absorption (a∪(a∩b) = a and a∩(a∪b) = a).

These identities hold for all lattices. A distributive lattice is one for which the distributive law also holds. A complemented lattice is one for which a complement operator and De Morgan's laws apply. The lattice of all subsets of some set is an example of a bounded distributive complemented lattice, for which all the identities hold.

Graphs of mathematical lattices look like the lattices that one would expect a wisteria vine to climb on. Figure 5 shows the graphs for three kinds of partial orderings: a tree, a lattice, and a general acyclic graph. To simplify the drawings, a common convention for acyclic graphs is to omit the arrows on the arcs, but to assume that the arcs are directed either from the higher node to the lower node or from the lower node to the higher. For the partial ordering a≤b, the arc would be directed from a lower node a to a higher node b.

A lattice, a tree, and an acyclic graph

Figure 5: A lattice, a tree, and an acyclic graph

The term hierarchy is often used indiscriminately for any partial ordering. Some authors use the term hierarchy to mean a tree, and tangled hierarchy to mean an acyclic graph that is not a tree. In general, every tree is an acyclic graph, and every lattice is also an acyclic graph; but most lattices are not trees, and most trees are not lattices. In fact, the only graphs that are both trees and lattices are the simple chains (which are linearly ordered).

All the type hierarchies discussed in the book Knowledge Representation are partial orderings, and many of them are also lattices. Leibniz's Universal Characteristic, which is discussed in Section 1.1 of that book, defines a lattice of concept types in which the partial ordering operator ≤ is called subtype.

Another representation, which is isomorphic to Leibniz's products of primes, would represent each concept type by a bit string. If n is the number of primitive attributes, each product of primes could be mapped to a bit string of length n: each bit position corresponds to one primitive attribute, and the bit is 1 if the concept type has that attribute and 0 if it does not. Note the inverse relationship between the number of attributes that define a concept type and the number of entities to which it applies. The type Dog applies to fewer entities in the real world than its supertype Animal, but more attributes are required to describe it. This inverse relationship between the number of attributes required to define a concept and the number of entities to which it applies was first noted by Aristotle. It is called the duality of intension and extension.

Leibniz's method generates lattices with all possible combinations of attributes, but most combinations never occur in practice. The following table of beverages, which is taken from a paper by Michael Erdmann (1998), illustrates a typical situation in which many combinations do not occur. Some combinations are impossible, such as a beverage that is simultaneously alcoholic and nonalcoholic. Others are merely unlikely, such as hot and sparkling.

Concept Type    nonalcoholic   hot   alcoholic   caffeinic   sparkling
HerbTea              X          X
Coffee               X          X                    X
MineralWater         X                                            X
Wine                                     X
Beer                                     X                        X
Cola                 X                               X            X
Champagne                                X                        X

Table of beverage types and attributes

To generate the minimal lattice for classifying the beverages in the above table, Erdmann applied the method of formal concept analysis (FCA), developed by Bernhard Ganter and Rudolf Wille (1999) and implemented in an automated tool called Toscana. Figure 6 shows the resulting lattice; attributes begin with lower-case letters, and concept types begin with upper-case letters.

Lattice of beverages

Figure 6: Lattice constructed by the method of formal concept analysis

In Figure 6, beer and champagne are both classified at the same node, since they have exactly the same attributes. To distinguish them more clearly, wine and champagne could be assigned the attribute madeFromGrapes, and beer the attribute madeFromGrain. Then the Toscana system would automatically generate a new lattice with three added nodes: a node for madeFromGrapes, under which wine and champagne are classified; a node below it for champagne, which is also sparkling; and a node for beer with the attribute madeFromGrain.

Figure 7 shows the revised lattice with the new nodes and attributes.

Revised lattice

Figure 7: Revised lattice with new attributes

Note that the attribute nonalcoholic is redundant, since it is the complement of the attribute alcoholic. If that attribute had been omitted from the table, the FCA method would still have constructed the same lattice. The only difference is that the node corresponding to the attribute nonalcoholic would not have a label. In a lattice for a familiar domain, such as beverages, most of the nodes correspond to common words or phrases. In Figure 7, the only node that does not correspond to a common word or phrase in English is sparkling∩alcoholic.

Lattices are especially important for representing ontology and the techniques for revising, refining, and sharing ontologies. Each addition of a new attribute results in a new lattice, which is a refinement of the previous lattice. A refinement generated by FCA contains only the minimal number of nodes needed to accommodate the new attribute and its subtypes. Leibniz's method, which generates all possible combinations, would introduce superfluous nodes, such as hot ∩ caffeinic ∩ sparkling ∩ madeFromGrapes. The FCA lattices, however, contain only the known concept types and likely generalizations, such as sparkling ∩ alcoholic. For this example, Leibniz's method would generate a lattice of 64 nodes, but the FCA method generates only 14 nodes. A Leibniz-style lattice is the ultimate refinement for a given set of attributes, and it may be useful when all possible combinations must be considered. But the more compact FCA lattices avoid the nonexistent combinations.

A further study of ontology raises questions about the origin of the various attributes and their relationships to one another. In Leibniz's method and the FCA method, the attributes madeFromGrapes and madeFromGrain are treated as independent primitives. Yet both of them could be analyzed as the dyadic madeFrom relation combined with either grapes or grain. Then madeFrom could be further analyzed into make and from, but the verb make would raise new questions about the difference between making wine from grapes and making a milkshake from milk. The plural noun grapes and the mass noun grain would also raise questions about quantification and measurement. A lattice is an important structure for organizing concept types, but a complete definition of those types leads to all the problems of language, logic, and ontology. For further discussion of those issues, see the paper "Concepts in the Lexicon".


8. Propositional Logic

Symbolic logic has two main branches: propositional calculus and predicate calculus, which are also called propositional logic and predicate logic. Propositional logic deals with statements or propositions and the connections between them. The symbol p, for example, could represent the proposition Lillian is the mother of Leslie. Predicate logic, however, would represent that proposition by a predicate Mother applied to the names of the two individuals: Mother(Lillian,Leslie). Whereas propositional logic represents a complete statement by a single symbol, predicate logic analyzes the statement into finer components.

Besides symbols for propositions, propositional logic also includes Boolean operators that represent logical relations such as and, or, not, and if-then. Let p be the proposition The sun is shining, and let q be the proposition It is raining. The most commonly used operators in propositional logic correspond to the English words and, or, not, if-then, and if-and-only-if: conjunction p∧q (p and q), disjunction p∨q (p or q), negation ~p (not p), material implication p⊃q (if p then q), and equivalence p≡q (p if and only if q).

The propositions represented in symbolic logic may be true or false. The rules of propositional logic compute the truth value of a compound proposition from the truth or falsity of the elementary propositions contained within it. They are therefore called truth functions, whose inputs are the truth values T for true and F for false. The following table, called a truth table, shows the outputs generated by the five truth functions ∧, ∨, ~, ⊃, and ≡ for all possible combinations of the two inputs p and q.

Inputs     Outputs
p  q    p∧q   p∨q   ~p   p⊃q   p≡q
T  T     T     T     F     T     T
T  F     F     T     F     F     F
F  T     F     T     T     T     F
F  F     F     F     T     T     T

There are 16 possible truth functions of two arguments, but the five listed in this table are the most commonly used. Another operator that is sometimes used is exclusive or, which is equivalent to p or q, but not both. Two operators commonly used in computer circuit design are nand and nor, written here with the symbols ↑ and ↓:
p↑q is equivalent to ~(p∧q),
p↓q is equivalent to ~(p∨q).

If one or two Boolean operators are taken as primitives, the others can be defined in terms of them. One common choice of primitives is the pair ~ and ∧. The other operators can be defined by the following combinations:
p∨q is equivalent to ~(~p ∧ ~q)
p⊃q is equivalent to ~(p ∧ ~q)
p≡q is equivalent to ~(p ∧ ~q) ∧ ~(~p ∧ q)

In fact, only one primitive operator, either ↑ or ↓, is necessary since both ~ and ∧ can be defined in terms of either one of them:
~p is equivalent to (p ↑ p)
~p is equivalent to (p ↓ p)
p∧q is equivalent to (p ↑ q) ↑ (p ↑ q)
p∧q is equivalent to (p ↓ p) ↓ (q ↓ q)
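
Claims like these are easy to check by brute force over the four combinations of truth values; the following sketch defines nand as a function and confirms that negation, conjunction, and disjunction defined from it agree with the ordinary Boolean operators:

    from itertools import product

    def nand(p, q):
        return not (p and q)

    def not_(p):
        return nand(p, p)

    def and_(p, q):
        return nand(nand(p, q), nand(p, q))

    def or_(p, q):
        return nand(nand(p, p), nand(q, q))

    for p, q in product([True, False], repeat=2):
        assert not_(p) == (not p)
        assert and_(p, q) == (p and q)
        assert or_(p, q) == (p or q)
    print("all definitions in terms of nand agree")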

Peirce's existential graphs, which are discussed in Chapter 5 of the book Knowledge Representation, use negation and conjunction as the two primitives. Peirce was also the first person to discover that all the other Boolean operators could be defined in terms of ↑ or ↓.


9. Predicate Logic

In propositional logic, the proposition All peaches are fuzzy may be represented by a single symbol p. In predicate logic, however, the fine structure of the proposition is analyzed in detail. Then p would be represented by an entire formula,
∀x(peach(x) ⊃ fuzzy(x)).
The symbol ∀ is called the universal quantifier, and the symbols peach and fuzzy are predicates or relations, such as the ones described in Section 5 on relations. The combination ∀x may be read for every x or for all x, the combination peach(x) may be read x is a peach, and the combination fuzzy(x) may be read x is fuzzy. The entire formula, therefore, may be read For every x, if x is a peach, then x is fuzzy. Since predicates (or relations) are functions that yield truth values as results and since the Boolean operators are functions that take truth values as their inputs, predicates can be combined with the same operators used in the propositional logic.

Besides the universal quantifier, predicate logic has an existential quantifier represented as ∃. The combination ∃x may be read there exists an x such that. The following formula uses an existential quantifier:

~∃x(peach(x) ∧ ~fuzzy(x)).
This may be read It is false that there exists an x such that x is a peach and x is not fuzzy. Formulas with more than one quantifier are possible. The English statement For every integer x, there is a prime number greater than x is represented as
∀x∃y(integer(x) ⊃ (prime(y) ∧ x<y)).
Literally, this formula may be read For every x, there exists a y such that if x is an integer, then y is prime and x is less than y.

The two kinds of quantifiers, Boolean operators, variables, predicates, and the rules for putting them together in formulas make up the entire notation of first-order predicate calculus, which is also known as first-order logic or FOL. It is called first order because the range of quantifiers is restricted to simple, unanalyzable individuals.

Higher-order logic or HOL goes beyond FOL by allowing function variables and predicate variables to be governed by quantifiers. An example of a higher-order formula is the axiom of induction:

∀P((P(0) ∧ ∀n(P(n) ⊃ P(n+1))) ⊃ ∀nP(n)).
This formula may be read For every predicate P, if P is true of 0, and for every n, P(n) implies P(n+1), then P is true of every n. This is the only axiom for arithmetic that requires more expressive power than first-order logic.

Any of the functions, operators, relations, and predicates of Sections 1 through 6 can also be used in the formulas of first-order logic. Following are the formation rules that define the syntax of formulas:

  1. Terms: every variable and every constant is a term; and if f is an n-adic function and t1,...,tn are terms, then f(t1,...,tn) is a term.
  2. Atoms: if P is an n-adic predicate and t1,...,tn are terms, then P(t1,...,tn) is an atom.
  3. Formulas: every atom is a formula; if A is a formula, then ~A is a formula; if A and B are formulas, then (A∧B), (A∨B), (A⊃B), and (A≡B) are formulas; and if A is a formula and x is a variable, then ∀xA and ∃xA are formulas.

Completely formalized presentations of logic go into great detail about the syntactic form of variables, constants, and functions. When those details are omitted, the nature of the terms can be stated by a declaration, such as let x be a variable or let c be a constant.

The formation rules of first-order logic are an example of a recursive definition. By applying them repeatedly, any possible formula can be derived. Suppose that f is a monadic function and + is a dyadic operator; then f(x) and 2+2 are terms. (By the conventions of Section 2, functions written with single characters are called operators, but they form terms just like other functions.) If P is a dyadic predicate and Q is a monadic predicate, then P(f(x),2+2) and Q(7) are atoms. Since all atoms are formulas, these two formulas can be combined by the Boolean operator ⊃ to form a new formula:

(P(f(x),2+2) ⊃ Q(7)).
Since any formula may be preceded by ~ to form another formula, the following formula may be derived:
~(P(f(x),2+2) ⊃ Q(7)).
Putting the quantifier ∀y in front of it produces
∀y~(P(f(x),2+2) ⊃ Q(7)).
Adding another quantifier ∃x produces
∃x∀y~(P(f(x),2+2) ⊃ Q(7)).
And preceding this formula with ~ produces
~∃x∀y~(P(f(x),2+2) ⊃ Q(7)).
In this formula, the occurrence of x in f(x) is bound by the quantifier ∃x, but the quantifier ∀y has no effect on the formula since there is no other occurrence of the variable y.

The order of quantifiers in predicate logic makes a crucial difference, as it does in English. Consider the sentence Every man in department C99 married a woman who came from Boston, which may be represented by the formula,

∀x∃y((man(x) ∧ dept(x,C99)) ⊃
     (woman(y) ∧ hometown(y,Boston) ∧ married(x,y))).
This formula says that for every x there exists a y such that if x is a man and x works in department C99, then y is a woman, the home town of y is Boston, and x married y. Since the dyadic predicate married is symmetric, married(Ike,Mamie) is equivalent to married(Mamie,Ike). Interchanging the arguments of that predicate makes no difference, but interchanging the two quantifiers leads to the formula,
∃y∀x((man(x) ∧ dept(x,C99)) ⊃
     (woman(y) ∧ hometown(y,Boston) ∧ married(x,y))).
This formula says that there exists a y such that for every x, if x is a man and x works in department C99, then y is a woman, the home town of y is Boston, and x married y. In ordinary English, that would be the same as saying, A woman who came from Boston married every man in department C99. If there is more than one man in department C99, this sentence has implications that are very different from the preceding one.
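
The difference between the two quantifier orders can be seen by evaluating both formulas over a small finite model; in the sketch below, the individuals other than Ike and Mamie are invented for the illustration, and membership in the lists men and women plays the role of the predicates man(x) and woman(y):

    men = ['Ike', 'Adlai', 'Harry']                 # the men in department C99
    women = ['Mamie', 'Bess']                       # the women from Boston
    married = {('Ike', 'Mamie'), ('Adlai', 'Bess'), ('Harry', 'Bess')}

    # For every x there exists a y: every man married some woman (possibly different ones)
    forall_exists = all(any((x, y) in married for y in women) for x in men)

    # There exists a y such that for every x: one single woman married every man
    exists_forall = any(all((x, y) in married for x in men) for y in women)

    print(forall_exists)    # True
    print(exists_forall)    # False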

The first version of predicate logic was developed by Gottlob Frege (1879), but in a notation that no one else ever used. The more common algebraic notation, which has been presented in this section, was defined by Charles Sanders Peirce (1883, 1885), but with a different choice of symbols for the quantifiers and Boolean operators. Giuseppe Peano (1889) adopted the notation from Peirce and introduced the symbols that are still in use today. That notation is sometimes called Peano-Russell notation, since Alfred North Whitehead and Bertrand Russell popularized it in the Principia Mathematica. But it is more accurate to call it Peirce-Peano notation, since the extensions that Russell and Whitehead added are rarely used today. The notation presented in this section is Peirce's algebraic notation with Peano's choice of symbols. For a survey of other notations for logic, see the examples that compare predicate calculus to conceptual graphs and the Knowledge Interchange Format (KIF). Aristotle presented his original syllogisms in a stylized version of Greek, and modern computer systems sometimes represent predicate calculus in a stylized or controlled natural language.


10. Axioms and Proofs

Formation rules define the syntax of first-order logic. They do not, however, say anything about the meaning of the formulas, their truth or falsity, or how they are related to one another. Model theory, which is discussed in Section 13, defines an evaluation function that determines whether a formula is true or false in terms of some model. Proof theory, which is discussed in this section, determines how true formulas can be derived from other true formulas. Since the time of Aristotle, many different proof procedures have been developed for logic, but they all share some common characteristics: every proof starts from axioms or hypotheses that are assumed to be true, and each step applies some rule of inference to derive a new formula from formulas already assumed or derived. As an example, the following two formulas are equivalent; each one implies the other:
∀x(peach(x) ⊃ fuzzy(x))

~∃x(peach(x) ∧ ~fuzzy(x))
To prove that these formulas are equivalent, it is necessary to find two proofs. One proof would start with the first formula as hypothesis and apply the rules of inference to derive the second. The second proof would start with the second formula as hypothesis and derive the first.

Any equivalence, whether assumed as a definition or proved as a theorem, can be used to derive one formula from another by the rule of substitution. For any formula A, either of the following two equivalences may be assumed as a definition, and the other can be proved as a theorem:
∃xA is equivalent to ~∀x~A
∀xA is equivalent to ~∃x~A

Since these equivalences are true when A is any formula whatever, they remain true when any particular formula, such as (peach(x) ⊃ fuzzy(x)), is substituted for A. With this substitution in both sides of the equivalence for ∀, the left side becomes identical to the first peach formula above, and the right side becomes the following formula:

~∃x~(peach(x) ⊃ fuzzy(x)).
This formula can be transformed to other equivalent formulas by using any of the equivalences given in Section 8. For any formulas p and q, the formula p⊃q was defined as equivalent to the formula ~(p∧~q). Therefore, the next formula can be derived by substituting peach(x) for p and fuzzy(x) for q:
~∃x~~(peach(x) ∧ ~fuzzy(x)).
For any formula p, the double negation ~~p is equivalent to p. Therefore, the double negation in the previous formula can be deleted to derive
~∃x(peach(x) ∧ ~fuzzy(x)),
which is identical to the second peach formula. Therefore, the second peach formula can be proved from the first.

To prove that both peach formulas are equivalent, another proof must start with the second formula as hypothesis and derive the first formula. For this example, the second proof is easy to find because each step in the previous proof used an equivalence. Therefore, each step can be reversed to derive the first formula from the second. Most proofs, however, are not reversible because some of the most important rules of inference are not equivalences.

Following are some rules of inference for the propositional logic. The symbols p, q, and r represent any formulas whatever. Since these symbols can also represent formulas that include predicates and quantifiers, these same rules can be used for predicate logic.

    Modus ponens: from p and p⊃q, derive q.
    Modus tollens: from ~q and p⊃q, derive ~p.
    Hypothetical syllogism: from p⊃q and q⊃r, derive p⊃r.
    Disjunctive syllogism: from p∨q and ~p, derive q.
    Conjunction: from p and q, derive p∧q.
    Addition: from p, derive p∨q.
    Simplification: from p∧q, derive p.

Of these rules, only the rule of conjunction is an equivalence; none of the others are reversible.

Not all of these rules are primitive. In developing a theory of logic, logicians try to minimize the number of primitive rules. Then they show that other rules, called derived rules of inference, can be defined in terms of them. Nevertheless, for all versions of classical propositional and predicate logic, these rules are either primitive or derived rules of inference.

Following are some common equivalences. When two formulas are equivalent, either one can be substituted for any occurrence of the other, either alone or as part of some larger formula:

    Idempotency: p∧p ≡ p, and p∨p ≡ p.
    Commutativity: p∧q ≡ q∧p, and p∨q ≡ q∨p.
    Associativity: p∧(q∧r) ≡ (p∧q)∧r, and p∨(q∨r) ≡ (p∨q)∨r.
    Distributivity: p∧(q∨r) ≡ (p∧q)∨(p∧r), and p∨(q∧r) ≡ (p∨q)∧(p∨r).
    Absorption: p∧(p∨q) ≡ p, and p∨(p∧q) ≡ p.
    Double negation: ~~p ≡ p.
    De Morgan's laws: ~(p∧q) ≡ ~p∨~q, and ~(p∨q) ≡ ~p∧~q.

These equivalences happen to have exactly the same form as the ones for union and intersection of sets. This similarity is not an accident because George Boole used the same operators of Boolean algebra for sets and propositions. Giuseppe Peano chose a different selection of symbols because he wanted to use logic to prove theorems about sets, and it would be confusing to use the same symbols for both kinds of operators. It is helpful, however, to remember that the formulas for Ç and È can be used for Ù and Ú just by making the rounded ends pointy.

For predicate logic, the rules of inference include all the rules of propositional logic with additional rules about substituting values for quantified variables. Before those rules can be stated, however, a distinction must be drawn between free occurrences and bound occurrences of a variable: an occurrence of a variable x in a formula is said to be bound if it lies within the scope of a quantifier ∀x or ∃x; any occurrence that is not bound is said to be free.

Once the definitions of free and bound occurrences have been stated, rules of inference that deal with quantifiers can be given. Let F(x) be a formula containing a free occurrence of a variable x. Then F(t) is the result of substituting a term t for every free occurrence of x in F. Following are the additional rules of inference:

  Universal instantiation: from "xF(x), derive F(t) for any term t, provided that no variable in t becomes bound by the substitution.
  Existential generalization: from F(t), derive $xF(x).
  Universal generalization: from F(x), where x does not occur free in any hypothesis, derive "xF(x).
  Existential instantiation: from $xF(x), derive F(c), where c is a constant that does not occur in any earlier step of the proof.

In a sound theory, the rules of inference preserve truth: only true formulas can be proved. In a complete theory, all true formulas can be proved. Kurt Gödel (1930) showed that the rules of inference for first-order logic are both sound and complete. In the following year, he showed that the rules of inference for higher-order logic are incomplete; i.e., there exist true formulas that are not provable.
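
The distinction between free and bound occurrences, and the substitution F(t), can be made concrete by representing formulas as nested tuples. The following Python sketch uses an invented tuple encoding (one convention among many) and replaces a term for the free occurrences of a variable while leaving bound occurrences untouched:

    # A formula is a nested tuple:
    #   ('atom', name, arg1, ..., argn)            an atomic formula
    #   ('not', f), ('and', f, g), ('or', f, g)    Boolean operators
    #   ('forall', x, f), ('exists', x, f)         quantifiers binding the variable x
    def substitute(formula, var, term):
        """Replace every free occurrence of var in formula by term."""
        op = formula[0]
        if op == 'atom':
            name, *args = formula[1:]
            return ('atom', name, *[term if a == var else a for a in args])
        if op in ('forall', 'exists'):
            bound, body = formula[1], formula[2]
            if bound == var:           # var is bound here, so nothing below is free
                return formula
            return (op, bound, substitute(body, var, term))
        return (op, *[substitute(f, var, term) for f in formula[1:]])

    # In (forall y) p(x,y), the occurrence of x is free and y is bound:
    f = ('forall', 'y', ('atom', 'p', 'x', 'y'))
    print(substitute(f, 'x', 'c'))   # ('forall', 'y', ('atom', 'p', 'c', 'y'))
    print(substitute(f, 'y', 'c'))   # unchanged, because y is bound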


11. Formal Grammars

A formal grammar is a system for defining the syntax of a language by specifying the strings of symbols or sentences that are considered grammatical. Since the set of grammatical sentences of a language may be very large or infinite, they are usually derived by a recursive definition. The derivations are generated by production rules, which were developed by Axel Thue (1914) as a method for transforming strings of symbols. Emil Post (1943) showed that production rules were general enough to simulate a Turing machine. Andrei Andreyevich Markov (1954) developed a general theory of algorithms based on Post production rules. Besides string transformations, Markov showed how production rules could compute any mathematical function that was computable by recursive functions or the lambda calculus.

Noam Chomsky (1956, 1957) used production rules for specifying the syntax of natural languages, and John Backus (1959) used them to specify programming languages. Although Chomsky and Backus both adopted their notations from Post, they found that the completely unrestricted versions used by Thue, Post, and Markov were more powerful than they needed. Backus limited his grammars to the context-free rules, while Chomsky also used the more general, but still restricted context-sensitive rules. The unrestricted rules can be inefficient or undecidable, but the more restricted rules allow simpler, more efficient algorithms for analyzing or parsing a sentence.

A grammar has two main categories of symbols: terminal symbols like the, dog, or jump, which appear in the sentences of the language itself; and nonterminal symbols like N, NP, and S, which represent the grammatical categories noun, noun phrase, and sentence. The production rules state how the nonterminal symbols are transformed in generating sentences of the language. Terminal symbols are called terminal because no production rules apply to them: when a derivation generates a string consisting only of terminal symbols, it must terminate. Nonterminal symbols, however, keep getting replaced during the derivation. A formal grammar G has four components:

  1. A set T of terminal symbols.
  2. A set N of nonterminal symbols.
  3. A start symbol S, which is one of the nonterminals in N.
  4. A set P of production rules, each of which specifies how one string of symbols may be rewritten as another.

The start symbol corresponds to the highest level category that is recognized by the grammar, such as sentence. The production rules generate sentences by starting with the start symbol S and systematically replacing nonterminal symbols until a string consisting only of terminals is derived. A parsing program applies the rules in reverse to determine whether a given string is a sentence that can be generated from S. Grammars of this form are called phrase-structure grammars because they determine the structure of a sentence as a hierarchy of phrases.

Some convention is needed to distinguish terminals from nonterminals. Some people write terminal symbols in lower case letters and nonterminals in upper case; other people adopt the opposite convention. To be explicit, this section will enclose terminal symbols in double quotes, as in "the" or "dog". To illustrate the formalism, the following grammar defines a small subset of English:

T = {"the", "a", "cat", "dog", "saw", "chased"}
N = {S, NP, VP, Det, N, V}

The set T defines a 6-word vocabulary, and the set N defines the basic grammatical categories. The starting symbol S represents a complete sentence. The symbol NP represents a noun phrase, VP a verb phrase, Det a determiner, N a noun, and V a verb. The following 9 production rules determine the grammatical combinations for this language:
S   ®  NP VP
NP  ®  Det N
VP  ®  V NP
Det ®  "the"
Det ®  "a"
N   ®  "cat"
N   ®  "dog"
V   ®  "saw"
V   ®  "chased"
This grammar may be used to generate sentences by starting with the symbol S and successively replacing nonterminal symbols on the left-hand side of some rule with the string of symbols on the right:
S
NP VP
Det N VP
a N VP
a dog VP
a dog V NP
a dog chased NP
a dog chased Det N
a dog chased the N
a dog chased the dog
Since the last line contains only terminal symbols, the derivation stops. When more than one rule applies, any one may be used. The symbol V, for example, could have been replaced by saw instead of chased. The same grammar could be used to parse a sentence by applying the rules in reverse. The parsing would start with a sentence like a dog chased the dog and reduce it to the start symbol S.

The production rules for the above grammar belong to the category of context-free rules. Other classes of grammars are ranked according to the complexity of their production rules. The following four categories of complexity were originally defined by Chomsky:

  1. Finite-state (regular) grammars, in which the left-hand side of every rule is a single nonterminal and the right-hand side is a terminal symbol optionally followed by a single nonterminal.
  2. Context-free grammars, in which the left-hand side of every rule is a single nonterminal, but the right-hand side may be any string of symbols.
  3. Context-sensitive grammars, in which the left-hand side may contain more than one symbol, but the right-hand side must be at least as long as the left-hand side.
  4. General-rewrite grammars, in which there are no restrictions on the production rules.

Each one of these classes of grammars is more general than the one before and requires more complex parsing algorithms to recognize sentences in the language. Every finite-state grammar is also context free, every context-free grammar is also context sensitive, and every context-sensitive grammar is also a general-rewrite grammar. But the converses do not hold. For both programming languages and natural languages, intermediate levels of complexity have been defined for which parsing algorithms of greater or lesser efficiency can be written.

Once a grammar is defined, all grammatical sentences in the language can be generated by the following procedure:

  1. Write the start symbol as the first line of a derivation.
  2. To derive a new line, find some production rule whose left-hand side matches a substring of symbols in the current line. Then copy the current line, replacing the matching substring with the symbols on the right-hand side of the production rule.
  3. If more than one production rule has a matching left-hand side, then any one of them may be applied.
  4. If no production rule can be applied to the current line, then stop; otherwise, go back to step #2.
The last line in a derivation is called a sentence of the language defined by the given grammar.
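
This procedure can be carried out by a short program. The following Python sketch (an illustration using the 9-rule grammar given above) repeatedly replaces the leftmost nonterminal in the current line and yields every sentence of the sample language, confirming that there are exactly 32:

    # The 9 production rules of the sample grammar; terminals are lowercase words.
    RULES = {
        'S':   [['NP', 'VP']],
        'NP':  [['Det', 'N']],
        'VP':  [['V', 'NP']],
        'Det': [['the'], ['a']],
        'N':   [['cat'], ['dog']],
        'V':   [['saw'], ['chased']],
    }

    def sentences(line):
        """Yield every terminal string derivable from the given line of symbols."""
        for i, symbol in enumerate(line):
            if symbol in RULES:                       # leftmost nonterminal found
                for rhs in RULES[symbol]:             # try every applicable rule
                    yield from sentences(line[:i] + rhs + line[i+1:])
                return
        yield ' '.join(line)                          # only terminals remain

    all_sentences = list(sentences(['S']))
    print(len(all_sentences))      # 32
    print(all_sentences[0])        # "the cat saw the cat"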

For convenience, production rules may be written in an extended notation, which uses some additional symbols. The new symbols do not increase the number of sentences that can be derived, but they reduce the total number of grammar rules that need to be written. The following extensions, which define regular expressions, are used in Unix utilities such as grep and awk:

  A vertical bar | separates alternatives, either of which may be chosen.
  A question mark ? indicates that the preceding symbol or group is optional.
  An asterisk * indicates zero or more repetitions of the preceding symbol or group.
  A plus sign + indicates one or more repetitions of the preceding symbol or group.
  Parentheses may be used to group a string of symbols so that ?, *, or + applies to the group as a whole.

With these extensions, any finite-state or regular grammar can be expressed in a single production rule whose right-hand side consists of a regular expression. The more complex grammars require more than one production rule, but usually many fewer than with the basic notation.

Some examples may help to show how the extended notation reduces the number of rules. By using the vertical bar, the following two production rules,

N  ®  "cat"
N  ®  "dog"
can be combined in a single rule:
N  ®  "cat" | "dog"
If the grammar permits a noun phrase to contain an optional adjective, it might use the following two rules to define NP:
NP  ®  Det N
NP  ®  Det Adj N
Then both rules can be combined by using the question mark to indicate an optional adjective:
NP  ®  Det Adj? N
If an optional list of adjectives is permitted, then the question mark can be replaced with an asterisk:
NP  ®  Det Adj* N
This single rule in the extended notation is equivalent to the following four rules in the basic notation:
NP  ®  Det N
NP  ®  Det AdjList N
AdjList  ®  Adj
AdjList  ®  Adj AdjList
The last production rule is tail recursive: the symbol AdjList on the left-hand side is replaced by a string that ends with the same symbol. To generate an unlimited number of possible sentences, a grammar must have at least one rule that is directly or indirectly recursive. Since the sample grammar for a fragment of English had no recursive rules, it could only generate a finite number of different sentences (in this case 32). With the addition of a recursive rule such as this, it could generate infinitely many sentences (all of which would be rather boring).
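
The connection with the regular expressions used by utilities such as grep can be shown directly. The following Python sketch builds a regular expression corresponding to the rule NP ® Det Adj* N; the adjectives big and small are invented additions to the sample vocabulary:

    import re

    # A regular expression for NP ® Det Adj* N, written over words rather than characters.
    NP = re.compile(r'(the|a)( (big|small))* (cat|dog)$')

    print(bool(NP.match('the dog')))            # True
    print(bool(NP.match('a big small cat')))    # True
    print(bool(NP.match('the big')))            # False: no noun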

By allowing embedded recursions, the more general context-free grammars can enforce constraints that cannot be expressed in a finite-state or regular grammar. The following two rules, for example, generate all strings consisting of n left parentheses followed by n right parentheses:

S  ®  "(" S ")"
S  ®  "(" ")"
Since each rule adds a balanced pair of parentheses, the result of applying these rules any number of times must always be balanced. A finite-state grammar or a regular expression cannot guarantee that the number of parentheses on the right matches the number on the left. The following regular expression, for example, generates too many strings:
S  ®  "("+ ")"+
Besides the strings of balanced parentheses, it generates strings like "())))" and "((((())". A context-free grammar can ensure that the two sides are balanced by generating part of the right side and the corresponding part of the left side in the same rule. A context-sensitive grammar can impose more general constraints that depend on combinations of symbols that occur anywhere in the string. A general-rewrite grammar can impose any constraint that can be formulated in any programming language.
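
The contrast can be demonstrated by generating strings with the two context-free rules and testing them, along with some unbalanced strings, against the regular expression. The following Python sketch is one way of doing so:

    import re

    def balanced(n):
        # Apply S -> "(" S ")" a total of n-1 times, then S -> "(" ")".
        return '()' if n == 1 else '(' + balanced(n - 1) + ')'

    regex = re.compile(r'\(+\)+$')   # the regular expression "("+ ")"+

    for s in [balanced(1), balanced(3), '())))', '((((())']:
        # Every string here is a run of "(" followed by a run of ")", so it is
        # balanced exactly when the two counts are equal; the regex cannot tell.
        print(s, bool(regex.match(s)), s.count('(') == s.count(')'))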


12. Game Graphs

Programs in artificial intelligence use trees and graphs to represent games. Chess, checkers, and tic-tac-toe may be represented by directed graphs whose nodes represent game positions or states and whose arcs represent moves from one state to the next. A complete play of a game, called a game path, is a directed walk from some starting state to an ending state that determines a win, loss, or draw.

This section describes a common type of games called two-person zero-sum perfect-information games. They are called two-person games to distinguish them from games like poker with many players; they are zero-sum games because anything that one player loses the other player wins (as distinguished from negative-sum games where the house takes a cut or positive-sum games where new values are created); and they are perfect-information games because each player can see the complete state at all times (as distinguished from poker or bridge where some of the most significant information is hidden). Following are some basic definitions:

  A game graph is a directed graph whose nodes represent states and whose arcs represent moves; for each state áp1,...,pnñ, the first element p1 names the player on move, either A or B.
  Certain states are designated as starting states, and the states from which no moves are possible are the ending states.
  A payoff function assigns a number to each ending state; player A tries to maximize the payoff, and player B tries to minimize it.

This definition is not general enough to represent all possible games, but it is adequate for typical board games. It allows the possibility of games with more than one starting state, as in a game of chess where one player is given a handicap. For many games, the payoff is +1 for a win, and -1 for a loss. Other games have a numerical score with a wider range.

The play of the game consists of moves from state to state. If more than one move is permitted in a given state, the player on move has the right to choose which one to play. For a state áp1,...,pnñ, the first symbol p1 identifies the player on move, and the remaining information p2,...,pn depends on the type of game. In chess, for example, the current state describes the location of all the pieces on the board, but it also includes control information about the possibility of castling or en passant pawn captures.

Since the payoff function is only defined for ending states, the value at nonending states may be computed on the assumption that each player makes an optimal choice at each move. In choosing moves, A's strategy is to maximize the payoff, and B's strategy is to minimize the payoff. Therefore, the usual method for computing the value at a state P of a game G is called a minimax algorithm because it alternately tries to minimize or maximize the predicted value depending on which player is on move. The value at state P is determined by a recursive function value(P):

  If P is an ending state, then value(P) is the payoff of P.
  If player A is on move at P, then value(P) is the maximum of value(Q) for all states Q that can be reached from P by a single move.
  If player B is on move at P, then value(P) is the minimum of value(Q) for all states Q that can be reached from P by a single move.

The value function computes the expected payoff if both players make the best moves at each turn. If some game paths are infinite, value(P) may be undefined for some state P; if all game paths are finite, value(P) is defined for all P. Books on AI that cover game-playing programs discuss algorithms for evaluating this function efficiently or approximating it if an exact evaluation would take too much time. The result of the value function is also called the backed-up value because it computes the value of a nonending state P by looking ahead to the ending states that are reachable from P and computing backward until it determines the value for P.
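
When the game graph is small enough to explore completely, the value function can be written as a short recursive program. The following Python sketch computes the backed-up value from the payoffs at the ending states; the tiny three-move game used to exercise it is invented for the illustration:

    def value(state, moves, payoff):
        """Backed-up value of a state; player 'A' maximizes, player 'B' minimizes."""
        successors = moves(state)
        if not successors:                     # an ending state
            return payoff(state)
        child_values = [value(s, moves, payoff) for s in successors]
        player = state[0]                      # first element names the player on move
        return max(child_values) if player == 'A' else min(child_values)

    # A tiny example: A chooses left or right; B then picks the worse outcome for A.
    MOVES = {
        ('A', 'start'): [('B', 'left'), ('B', 'right')],
        ('B', 'left'):  [('A', 'l1'), ('A', 'l2')],
        ('B', 'right'): [('A', 'r1'), ('A', 'r2')],
    }
    PAYOFF = {('A', 'l1'): 3, ('A', 'l2'): 5, ('A', 'r1'): 2, ('A', 'r2'): 9}

    print(value(('A', 'start'),
                lambda s: MOVES.get(s, []),
                lambda s: PAYOFF[s]))          # 3: A moves left, and B then picks l1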


13. Model Theory

In two famous papers, "The Concept of Truth in Formalized Languages" and "On the Concept of Logical Consequence," Alfred Tarski (1935, 1936) began the development of model theory. He described his theory as a formalization in symbolic logic of Aristotle's classic definition (Metaphysics, 1011b27):
To say of what is that it is not, or of what is not that it is, is false,
while to say of what is that it is, or of what is not that it is not, is true.
For Aristotle, these conditions define what it means for a sentence in a natural language to be true about the world. Tarski, however, made two important simplifications. First, he restricted his definitions to formalized languages, in particular, to first-order predicate logic. Second, he replaced the complexities of the real world by an abstract "model of an axiom system." For an informal discussion of Tarski's approach, see his paper "The Semantic Conception of Truth".

Formally, a model M consists of a set D of entities called a domain of individuals and a set R of relations defined over the individuals in D. To determine whether a sentence is true or false, model theory defines an evaluation function F, which can be applied to any formula p in first-order logic and any model M=áD,Rñ: the value F(p,M), called the denotation of p, is true if the formula p is true of the model M, and false otherwise.

The technical material in Tarski's papers defines the evaluation function F in terms of the syntax of first-order logic.

One of the most complicated features of Tarski's original definition is his method of assigning entities in the domain D to the variables in the formula p. Although his definition is mathematically correct, it is inefficient to compute for finite domains and impossible to compute for infinite domains. As an example, consider the statement For every integer n, the integer 2n is even. In predicate calculus, that statement could be expressed by the following formula:

("n)($m)(integer(n) É (times(2,n,m) Ù even(m))).
According to Tarski's original definition, the truth of this formula is determined by assigning all possible integers to the variable n and checking whether there exists a value of m that makes the body of the formula true for each assignment. Such a computation is impossible for infinite domains and extremely inefficient even for finite domains.

To simplify the evaluation of F, the logician Jaakko Hintikka (1973, 1985) developed game-theoretical semantics as a systematic method for assigning values to the variables one at a time. Risto Hilpinen (1982) showed that game-theoretical semantics is equivalent to the method of endoporeutic, which Peirce developed for determining the truth value of an existential graph. Sowa (1984) adopted game-theoretical semantics as the method for defining the semantics of conceptual graphs. In an introductory textbook for teaching model theory, Barwise and Etchemendy (1993) adopted game-theoretical semantics and wrote a computer program to teach their readers how to play the game. Game-theoretical semantics is mathematically equivalent to Tarski's original definition, but it is easier to explain, easier to generalize to a wide variety of languages, and more efficient to implement in computer programs.

In game-theoretical semantics, every formula p determines a two-person zero-sum perfect-information game, as defined in Section 12. The two players in the evaluation game are the proposer, who wins the game by showing that p is true, and the skeptic, who wins by showing that p is false. To play the game, the two players analyze p from the outside in, according to the following rules. The rules progressively simplify the formula p by removing quantifiers and Boolean operators until the formula is reduced to a single atom.

  For a formula of the form "xA, the skeptic chooses some individual c in the domain D, and the game continues with the result of substituting c for every free occurrence of x in A.
  For $xA, the proposer chooses some individual c in D, and the game continues with the result of substituting c for x in A.
  For pÙq, the skeptic chooses either p or q, and the game continues with the chosen formula.
  For pÚq, the proposer chooses either p or q, and the game continues with the chosen formula.
  For ~p, the two players switch roles (the proposer becomes the skeptic and vice versa), and the game continues with p.
  For pÉq, the formula is replaced by ~pÚq, and the game continues.

Each rule except implication simplifies the formula p by removing one quantifier or Boolean operator. The rules for conjunction and disjunction may even throw away half of the formula. Although the implication rule replaces one operator with two, the new operators are simpler than É. Sooner or later, the game ends when p is reduced to a single atom. Then the winner or loser is determined by checking whether that atom consists of the name P of some n-adic relation in R applied to an n-tuple of arguments ác1,...,cnñ for which that relation is true. If neither player made a bad move at any step, the original p is true if the player who started the game as proposer is the winner and false if the player who started the game as skeptic is the winner. In effect, the procedure for computing the evaluation function F(p,M) is equivalent to determining which side has a winning strategy in the game specified by the formula p for the given model M.
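
For a finite model, the evaluation game can be played exhaustively by a program that takes the proposer's choices at $ and Ú and the skeptic's choices at " and Ù. The following Python sketch (the nested-tuple encoding of formulas is the same invented convention as in the earlier substitution sketch, and the two-element model is also invented) reports which player has a winning strategy:

    def holds(formula, model, env=None):
        """True if the proposer has a winning strategy for the formula in the model."""
        env = env or {}
        op = formula[0]
        if op == 'atom':
            name, *args = formula[1:]
            return tuple(env.get(a, a) for a in args) in model['relations'][name]
        if op == 'not':          # the two players switch roles
            return not holds(formula[1], model, env)
        if op == 'and':          # the skeptic picks the conjunct to attack
            return all(holds(f, model, env) for f in formula[1:])
        if op == 'or':           # the proposer picks the disjunct to defend
            return any(holds(f, model, env) for f in formula[1:])
        x, body = formula[1], formula[2]
        if op == 'exists':       # the proposer chooses a value for x
            return any(holds(body, model, {**env, x: c}) for c in model['domain'])
        if op == 'forall':       # the skeptic chooses a value for x
            return all(holds(body, model, {**env, x: c}) for c in model['domain'])

    MODEL = {'domain': {'a', 'b'},
             'relations': {'likes': {('a', 'b'), ('b', 'b')}}}
    # "Everyone likes someone": for every x there exists a y such that likes(x, y).
    f = ('forall', 'x', ('exists', 'y', ('atom', 'likes', 'x', 'y')))
    print(holds(f, MODEL))       # True: whatever the skeptic picks, the proposer answers b

Determining the winner of this exhaustive game is exactly the computation of the evaluation function F(p,M) described above.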

To illustrate the evaluation game, compute the denotation of the formula that expresses the statement For every integer n, the integer 2n is even:

("n)($m)(integer(n) É (times(2,n,m) Ù even(m))).
For this example, the domain D is the set of all integers and the set of relations R would include the relations named times, integer, and even. Since the extensions of these relations contain infinitely many tuples, they could not be stored explicitly on a computer, but they could easily be computed as needed.
  1. Since the first quantifier is ", the skeptic makes the first move by choosing any element of D. Suppose that the skeptic chooses n=17. Then the new version of p becomes
     ($m)(integer(17) É (times(2,17,m) Ù even(m))).
  2. Since the quantifier is $, the proposer makes the next move. By peeking ahead, a clever proposer would guess m=34. Then the new version of p becomes
     integer(17) É (times(2,17,34) Ù even(34)).
  3. By the rule for É, this formula is converted to
     ~integer(17) Ú (times(2,17,34) Ù even(34)).
  4. By the rule for Ú, the proposer can choose either side of Ú. Suppose that the proposer chooses the right-hand side:
     times(2,17,34) Ù even(34).
  5. By the rule for Ù, the skeptic can choose either side of the Ù. Choosing the left side,
     times(2,17,34).
  6. Finally, the proposer and the skeptic search the table of triples for the times relation. Since á2,17,34ñ is in the table, the proposer wins, and p is declared to be true.
For the proposer to win one game is not sufficient to show that the formula has denotation true. Instead, the proposer must have a winning strategy for any possible move by the skeptic. Since the domain of integers is infinite, the skeptic has infinitely many options at step 1. But for any choice n by the skeptic at step 1, the proposer can always choose the value 2n for m at step 2. The next option for the skeptic is at step 5, where the formula would become
times(2,n,2n) Ù even(2n).
Since both sides of the Ù operator must be true, the skeptic has no possibility of winning. Therefore, the proposer is guaranteed to win, and the formula is true.

This example shows that a winning strategy can often be found for models that allow infinitely many possible moves. Leon Henkin (1959) showed that game-theoretical semantics could also be applied to infinitely long formulas. In some cases, the truth value of an infinite formula can be computed in just a finite number of steps. For example, consider an infinite conjunction:

p1 Ù p2 Ù p3 Ù ...
If this formula happens to be true, the evaluation game would take infinitely long, since every pi would have to be tested. But if it is false, the game would stop when the skeptic finds the first false pi. A similar optimization holds for infinite disjunctions:
p1 Ú p2 Ú p3 Ú ...
If a disjunctive formula is false, the evaluation game would take infinitely long. But if it is true, the game would stop when the proposer finds the first true pi. Although nothing implemented in a computer can be truly infinite, a typical database or knowledge base may have millions of options. Tarski's original definition would require an exhaustive check of all possibilities, but the game-theoretical method can prune away large branches of the computation. In effect, the optimizations are equivalent to the optimizations used in computer game-playing programs. They are also similar to the methods used to answer an SQL query in terms of a relational database.
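
The same short-circuiting is provided by lazy evaluation in most programming languages. In the following Python sketch (the predicates are invented), an infinite conjunction and an infinite disjunction are represented as generators; all() stops at the first false conjunct, and any() stops at the first true disjunct:

    from itertools import count

    # An "infinite conjunction": the i-th conjunct says that i is not equal to 1000.
    conjuncts = (i != 1000 for i in count(1))
    print(all(conjuncts))      # False, after examining only the first 1000 conjuncts

    # An "infinite disjunction": the i-th disjunct says that i is divisible by 7.
    disjuncts = (i % 7 == 0 for i in count(1))
    print(any(disjuncts))      # True, after examining only the first 7 disjuncts

If the conjunction were true or the disjunction were false, these calls would never terminate, just as the text describes.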


14. References

All bibliographical references have been moved to the combined bibliography for this web site.

Anyone who has found these notes useful may also be interested in the following on-line resources:


Copyright ©1999 by John F. Sowa. Permission is hereby granted for anyone to make verbatim copies of this document for teaching, self-study, or other noncommercial purposes provided that the copies cite the author, the URL of this document, and this permission statement. Please send comments to John Sowa.

   Last Modified: