The aim of this research is to gain insight into the mental mechanisms involved in understanding natural language by observing and simulating on a computer the relationships between texts and corresponding formal structures in small problem areas where the tasks for the computer are sufficiently clear.
The first question arising when one speaks of meaning representation is what is understood by "meaning". One might think of meaning as some kind of formal structure for which the natural language is a method of encoding; but then the same question will arise as for ordinary languages, namely, what is the relation of this formal structure to the real world it intends to describe. Replacing one formal object (a sequence of characters constituting a text in a natural language) by another formal object (e.g., a labelled graph) doesn't seem to improve our understanding of the relationships between a text and the real world.
The situation is different when the use of the natural language is restricted to a sharply defined narrow problem area. Well-known examples are names for moments of time or for family relationships. In such cases meaning has an obvious formal representation, clearly related to the part of the real world it describes; studying the correspondence between such formal structures and their representation in a natural language becomes a meaningful task. However, the problem is that such clearly defined areas are very narrow and they cannot account for the complex mechanisms of natural languages.
The approach to "meaning" chosen in this work is that I don't seek to represent the mechanisms of natural languages (or the part of those mechanisms to which we restrict our attention) as a unique well-defined formal system. A natural language (in contrast to any mathematically defined calculus) is an open system that can be easily modified to incorporate every new field of human activities, and the structure of the language must support this flexibility. Thus I treat the language as a free combination of a great number of subsystems each serving some particular small field of activity which probably can have a precise mathematical model.
The subsystems can be easily added or replaced (e.g., when a mathematical model becomes too crude for new applications) and their fields can overlap, so the same situation can be described in terms of several subsystems (which is known to be very fruitful in solving various problems). However, the subsystems are not completely independent of each other, because each of them has to account for relations between its particular field and natural language texts (or utterances), and therefore all subsystems can share some common language mechanisms, part of which is known as grammar. No particular language mechanism is mandatory for a subsystem, but the rich choice of such mechanisms present in the language makes it easy to build new subsystems or to modify existing ones, and it is only on rare occasions that a new mechanism must be added. Thus the basic structures of a language are sufficiently stable.
It is the variety of linguistic mechanisms serving various problem areas that I am going to discuss. It comprises more than what we traditionally call morphology and syntax and what can be described more or less accurately by means of traditional "universal" mathematical models. Many commonly used words do not belong to any particular problem area; they can be used in many areas, each time representing different things in the real world but preserving some properties and relationships (e.g., "to close" means to cancel the effect of "to open", no matter what you had opened, a bottle or a computer file; the relationship "open" - "use" - "close" serves well in both areas and in many others). Such words are also part of what the language offers to every problem area.
The object of my studies is the process leading from a text to some appropriate response in a specified problem area. The ultimate response is an external action, but for a text which is in some way incomplete the response may amount to remembering some internal information which will be used to produce responses to future texts. And we have to develop an internal representation for such information. Then we can describe the "meaning" of a word or of a statement in terms of changes it makes in this internal representation.
The internal representation has the form of an "association net" (essentially, a common semantic net), consisting of nodes representing some entities (usually specific objects found either in the text or in the real world) and directed arcs connecting the nodes; some nodes may represent numbers, strings of characters, procedures, or finite sets of other nodes. The important feature is that the arcs are named, and the names of arcs issuing from the same node must be all distinct. One of the operations on such a net is merging two nodes; identically named arcs issuing from the two nodes have to be merged too.
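As an illustration only (this is not the implementation referred to in [3]), the net and its merge operation can be sketched in Python as follows. The class name Net, its method names, and the decision to reject a duplicate arc name with an error are assumptions made for the sketch.

```python
class Net:
    """Toy association net: nodes are integers, arcs are named and directed."""

    def __init__(self):
        self.parent = []  # parent[n] == n while n has not been merged away
        self.arcs = []    # arcs[n]: arc name -> target node

    def new_node(self):
        self.parent.append(len(self.parent))
        self.arcs.append({})
        return len(self.parent) - 1

    def find(self, n):
        """Return the node that n currently stands for after any mergers."""
        while self.parent[n] != n:
            n = self.parent[n]
        return n

    def add_arc(self, src, name, dst):
        src = self.find(src)
        if name in self.arcs[src]:
            # Assumption: a second arc with an existing name is rejected,
            # since arc names issuing from one node must be distinct.
            raise ValueError(f"node {src} already has an arc named {name!r}")
        self.arcs[src][name] = dst

    def target(self, src, name):
        dst = self.arcs[self.find(src)].get(name)
        return None if dst is None else self.find(dst)

    def merge(self, a, b):
        """Merge b into a, then merge the targets of identically named arcs."""
        pending = [(a, b)]
        while pending:
            x, y = (self.find(n) for n in pending.pop())
            if x == y:
                continue
            self.parent[y] = x
            for name, dst in self.arcs[y].items():
                if name in self.arcs[x]:
                    pending.append((self.arcs[x][name], dst))
                else:
                    self.arcs[x][name] = dst
            self.arcs[y] = {}
        return self.find(a)


# Two independently built word nodes, each with its own 'reference'.
net = Net()
word1, word2, obj1, obj2 = (net.new_node() for _ in range(4))
net.add_arc(word1, "reference", obj1)
net.add_arc(word2, "reference", obj2)
net.merge(word1, word2)
print(net.find(obj1) == net.find(obj2))
```

Merging the two word nodes forces their 'reference' targets to merge as well, which is the propagation along identically named arcs described above.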
Thus arc names are meaningful, because they help match independently formed structures. The vocabulary of such names is one of the things the language offers to its independent subsystems. The names need not be actual words of the natural language; most often they are special symbols designating roles, e.g., syntactic or semantic cases. The words "open" and "close" from the preceding example can be regarded as such names applicable to a number of subsystems, each time identifying an arc leading from an object to the description of the respective action. And there can be a general "open--close" frame describing the relationships between the actions that are supposed to be valid whenever the language chooses to use this metaphor in a particular area. (The relationships need not be mandatory; they are used as defaults that may be overridden; e.g., mathematicians don't normally use the word "addition" for a non-commutative operation, but addition of ordinals is a counter-example. On the other hand, if too many relations specified by a general frame do not apply to a particular subsystem there is no point in using this frame, and it is better to find a different word for the situation. I will not discuss this problem further here; in what follows, the relations specified by the chosen general frame are never cancelled.)
The general frames just mentioned are much like M.Minsky's frames in that each of them represents some general idea and may comprise both data and procedures. Of course, they are not as powerful as his, but not as simple as standard structures offered by some "AI shells". Technically, a frame is a separately stored association net with one node designated as the main node. Using a frame consists in making its copy for a particular situation and then probably merging the copy of the main node with some previously defined node and further propagating the merger along identically named arcs. Thus arc names serve as the frame's interface to the rest of the net (which might have been built out of copies of other frames) and the names of the arcs issuing from the main node are analogous to "slots". Applying a frame to a node is informally equivalent to making a statement that the object represented by the node belongs to the class represented by the frame. This brings additional information to the object, including some new procedures.
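A minimal sketch of storing a frame, copying it, and applying it to a node is given below. It is an illustration under stated assumptions, not the system's actual code: the names Node, rep, follow, merge, copy_frame and apply_frame, the arc name 'cancels', and the particular contents of the "open--close" frame are all hypothetical.

```python
class Node:
    def __init__(self, label=None):
        self.label = label
        self.arcs = {}           # arc name -> Node
        self.merged_into = None  # set once this node has been merged away


def rep(node):
    """Follow merger links to the node that currently represents `node`."""
    while node.merged_into is not None:
        node = node.merged_into
    return node


def follow(node, name):
    return rep(rep(node).arcs[name])


def merge(a, b):
    """Merge b into a, propagating along identically named arcs."""
    pending = [(a, b)]
    while pending:
        x, y = (rep(n) for n in pending.pop())
        if x is y:
            continue
        y.merged_into = x
        for name, dst in y.arcs.items():
            if name in x.arcs:
                pending.append((x.arcs[name], dst))
            else:
                x.arcs[name] = dst
    return rep(a)


def copy_frame(main):
    """Copy every node reachable from `main`; return the copy of `main`."""
    copies = {}

    def clone(node):
        node = rep(node)
        if id(node) not in copies:
            twin = copies[id(node)] = Node(node.label)
            for name, dst in node.arcs.items():
                twin.arcs[name] = clone(dst)
        return copies[id(node)]

    return clone(main)


def apply_frame(main, node):
    """Apply a stored frame to `node`: copy it, then merge the main-node copy."""
    return merge(node, copy_frame(main))


# Hypothetical "open--close" frame; its main node stands for any openable thing.
openable = Node("openable thing")
open_act, close_act = Node("opening"), Node("closing")
openable.arcs.update({"open": open_act, "close": close_act})
close_act.arcs["cancels"] = open_act

# A node built elsewhere: a file for which an 'open' action is already known.
file, file_open = Node("file"), Node("opening the file")
file.arcs["open"] = file_open

apply_frame(openable, file)
print(follow(follow(file, "close"), "cancels").label)
```

After the frame is applied, the file node acquires a 'close' arc, and the action it leads to cancels the file's own 'open' action, while the stored frame itself is left untouched for later copies.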
The process of interpretation of a natural language text can now be described as follows. For each meaningful word we must have a predefined frame, representing the objects, relations and actions connected with the meaning of the word. (Auxiliary words have their frames too because we need to represent syntactic relationships as well as semantics.) E.g., a frame for a verb would include a node representing the word itself and another node to represent the action or situation of the real (or imaginary) world named by this verb. The second would be attached to the first by an arc with the standard name 'reference'. Then there probably would be a node representing the subject (a noun) of this verb, attached by an arc named 'subject'. The latter node would have its own 'reference' to a real object that is expected to play some role in the action or the situation represented by the verb. Thus the 'reference' of the verb would have an arc leading to the 'reference' of the subject; the name of this arc may depend on the verb. Similarly, one can introduce some nodes for a direct object of the verb, etc. A frame for the verb "to buy" would include the buyer, the seller, the object and the money (and probably something else, e.g., a tax), each both as a word (because a word corresponding to the respective role can be syntactically attached to the verb) and as its 'reference'. We have to distinguish between the syntactic and semantic entities (words and their 'references') because two distinct words may refer to one and the same object.
Of course, most words are ambiguous, and each word may have several alternative syntactic and semantic frames. I do not consider the problem at this stage; instead I shall assume that the correct alternative has somehow been chosen.
With frame copies for all occurrences of words in the text one can proceed to build a net for the whole text. This is accomplished by merging certain nodes from different frame copies. E.g., the node for the subject of the verb introduced by the verb frame should be merged with the node from the actual noun serving as the subject (supposing that such a noun is present; if the subject were omitted the node would still be present). Again, determining which nodes should be merged is a complex and ambiguous process, but let us assume that it has already been done. Then we get an association net describing the "current situation", and this net can be further used to develop a response action.
Representing a situation by means of an association net might look like an over-simplification. A logician might substitute variables for the nodes and two-place predicates for the arcs and then say that the power of this language is restricted to mere conjunctions of simple atomic formulas. This might have been the case if the net were construed as a representation of the meaning. It is not. The function of the net is to serve as an internal representation of data while the meaningful part of the work is building and modifying the net, probably utilizing procedures found in the net.
A somewhat more complex example of relations between the net and the "meaning" is when a text represents several elementary situations, e.g., a real situation compared with a similar imaginary situation ("a car travelled a distance of ... in ... hours, but if its speed were ... mph higher, it would take ... hours less", etc.), or one situation introduced as a condition for another situation, etc. The respective association net would have special nodes to represent the situations; a node representing an action might have a pointer to the situation it belongs to, and a conditional relation between two situations might be represented by a special node. Different (and possibly incompatible) situations might have some nodes in common. Quantification (including quantifier scopes) might be represented by means of dependencies between nodes corresponding to variables, etc. Thus the representation of a complex logical statement would reflect its syntax rather than its truth value or its derivation. It is up to logicians to decide whether it has anything to do with the "meaning" they want to grasp.
If the goal is to construct a response action this method of representation may be quite useful. Some of the objects (or actions) represented in the net might be actually observed at the current moment (e.g., some objects might have their actual co-ordinates in space, or pointers to database records, etc.), and some immediate action related to those objects might be possible (or required). This cannot be done for objects that are only mentioned as possible rather than actual, and all one can do is to keep the information in this nearly syntactic form until such objects actually appear in the scene.
It is interesting to note that a number of words in the natural language imply a comparison of several situations. E.g., the meaning of "being late" implies a situation in which one comes (or does something else) on time, which is an imaginary situation but a "due" one, compared with the "actual" situation where the participants are the same but the moment of the action is different, and occupies a later position on the time scale. One of the meanings of the word "to remain" ("there were three apples, one was eaten and two remained") implies a comparison of three similar situations, one for "the whole", another for "the first part" and one more for "the remainder". Some syntactic constructs of natural languages also imply a kind of comparison. If we are going to build association nets out of such words or constructs their predefined frames should contain several situations.
Now I proceed to the question of how to build a connected net out of copies of predefined frames for separate words, that is, how to decide which nodes of different copies should actually be one and the same node. This is the most interesting part of the research, because it is here that various properties of words and syntactic constructs become manifest, and this approach offers a precise language to describe these properties. Sometimes what we usually think to be a subtle shade of meaning inherent in the word turns out to be a piece of technical information used in the process of node merging. In the preceding discussion I introduced separate nodes for words and for objects or actions referred to by the words. If one knew in advance which nodes have to be merged the net could be restricted to its "semantic" part, and there would be no need for nodes representing the words (unless the text is about words). The real function of the syntax is to control this process of combining the "meanings" of words in order to produce the "meaning" of the text.
The simplest case when syntactic relations help to build semantic connections is when a frame contains semantic relations between the 'references' of the words involved. This case was considered earlier. If an occurrence of the verb "to buy" has as its direct object a certain occurrence of the word "book", then the node representing the latter occurrence will be merged with the 'direct object' node of the frame introduced by the verb. This merger will propagate along the 'reference' arc and this will force the node representing the book itself to merge with the respective node in the verb frame, for which there is a predefined connection with the node representing the act of buying/selling. Thus we infer that the 'merchandise' in this act is the book.
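The "to buy" / "book" example can be traced with a small sketch. It is only an illustration: the helper names (Node, rep, follow, merge) and the node labels are assumptions, and only the arcs needed for this one inference are included in the verb frame.

```python
class Node:
    def __init__(self, label=None):
        self.label = label
        self.arcs = {}           # arc name -> Node
        self.merged_into = None  # set once this node has been merged away


def rep(node):
    while node.merged_into is not None:
        node = node.merged_into
    return node


def follow(node, name):
    return rep(rep(node).arcs[name])


def merge(a, b):
    """Merge b into a, propagating along identically named arcs."""
    pending = [(a, b)]
    while pending:
        x, y = (rep(n) for n in pending.pop())
        if x is y:
            continue
        y.merged_into = x
        for name, dst in y.arcs.items():
            if name in x.arcs:
                pending.append((x.arcs[name], dst))
            else:
                x.arcs[name] = dst
    return rep(a)


# Frame copy for one occurrence of "to buy" (only part of the frame is shown).
buy = Node("word: buy")
act = Node("act of buying")
dobj = Node("word: <direct object>")
goods = Node()  # the thing bought, not yet identified
buy.arcs.update({"reference": act, "direct object": dobj})
dobj.arcs["reference"] = goods
act.arcs["merchandise"] = goods

# Frame copy for one occurrence of "book".
book_word, book = Node("word: book"), Node("the book itself")
book_word.arcs["reference"] = book

# Syntax says that this occurrence of "book" is the direct object of "buy".
merge(book_word, dobj)
print(follow(follow(buy, "reference"), "merchandise").label)
```

The only merger requested explicitly is the syntactic one between the two word nodes; the identification of the merchandise with the book follows from propagation along the 'reference' arc.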
One more example of semantic relation directly forced by the syntax is in the sentence "I have a nice neighbour" (the word "have" doesn't imply here that my neighbour is my property). A "neighbour" is always a "neighbour of someone/something", and the meaning of the verb "to have" (in various domains) is the transformation "I have a neighbour" -> "a neighbour of mine" (cf. "I have a knife" -> "a knife of mine").
There are some more complex syntactic patterns that also force semantic nodes to form exactly defined relations. But in general the relationships between syntax and semantics are not quite straightforward. The other extreme is merging two semantic nodes introduced by completely unrelated words (indeed, the two words may belong to different sentences). The reason for doing so is purely semantic: an object is mentioned once, and a similar object is mentioned later, thus they might be one and the same object. They might not be, especially if they have some incompatible characteristics or are already connected by a relation that requires that they be distinct. There can be syntactic obstacles as well (e.g., an indefinite article preceding the second occurrence of the noun). However, if there are no explicit obstacles it is natural to think that the two objects are one and the same object. A technique that works well in very simple problem areas is dividing all semantic entities into a small number of predefined "semantic classes"; then two nodes with identical semantic class markers are likely candidates for merging.
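The criteria for "free" merging listed above can be put together in a short sketch. Everything in it is an assumption made for illustration: the Mention record, its field names, and the way the three obstacles (incompatible characteristics, a relation requiring distinctness, an indefinite article) are encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    name: str
    sem_class: str                      # predefined semantic class marker
    attrs: dict = field(default_factory=dict)
    indefinite: bool = False            # e.g. preceded by an indefinite article


def compatible(a, b):
    """True unless the two mentions disagree on an attribute both specify."""
    return all(b.attrs.get(key, value) == value for key, value in a.attrs.items())


def merge_candidates(earlier, later, must_differ=frozenset()):
    """Return the earlier mentions with which `later` may be identified."""
    if later.indefinite:
        return []  # syntactic obstacle: a new object is being introduced
    return [
        m for m in earlier
        if m.sem_class == later.sem_class
        and compatible(m, later)
        and frozenset((m.name, later.name)) not in must_differ
    ]


car1 = Mention("car#1", "vehicle", {"colour": "red"})
man = Mention("man#1", "person")
earlier = [car1, man]

same = Mention("car#2", "vehicle")
blue = Mention("car#3", "vehicle", {"colour": "blue"})
new = Mention("car#4", "vehicle", indefinite=True)

print([m.name for m in merge_candidates(earlier, same)])
print([m.name for m in merge_candidates(earlier, blue)])
print([m.name for m in merge_candidates(earlier, new)])
```

A function of this kind only proposes candidates; whether a proposed merger is accepted still depends on the checks that follow it.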
The semantic nodes to be merged need not be directly represented by words occurring in the text. On hearing a sentence like "he sold his car and bought a video" we tend to think that the money received from the first deal was used in the second deal, though there was no mention of money in either case. This is an example of how this system of representation clarifies otherwise obscure relationships. (Of course, the conclusion that the money is the same is not definitive, and is only acceptable when none of its immediate consequences contains a contradiction; e.g., we wouldn't accept this conclusion so easily if the order of the two deals were reversed.)
There are a number of linguistic instruments of identification that lie between the two extremes, "forced" merging and "free" merging. The most typical situation is that syntax does not impose a forced merger but defines a portion of the text where a candidate for merging can be found. This is characteristic of "weak" syntactic connections, prepositional phrases, etc. E.g., by saying "in this book the author proves a number of theorems" we don't imply that the author is sitting inside the book, but we do imply that the person is the author of this book. 'Author' is a two-place relation, and the corresponding frame will contain both a node for the author and a node for the product of his/her work. The latter remains "unresolved" until another node is found to be merged with it; in this case it is the 'reference' of the word "book". The syntactic position of this word makes it a probable candidate for such a match. Cf. "in this book the conclusion summarizes the author's views": both the conclusion and the author are "of the book"; similarly, "a book with a long preface" produces the same type of relation, but with a different syntactic construct.
Using "anaphoric" words that imply a reference to the preceding text (like "the", "that", "it", "such", etc.) is one more instrument for identification of objects referred to by different words. Apart from semantic restrictions on what nodes can be merged, such words impose syntactic restrictions in terms of words introducing the semantic nodes rather than in terms of the nodes being merged.
An interesting (and difficult) class of constructs are "parallel" constructs used to combine or to compare two objects. Semantically, such constructs contain a binary relation in which the place fillers are supposed to be of similar types. In the text one of the fillers is expressed in full while the other is usually shortened. Consider sentences like "the distance between the new and the old buildings is ... ", "my lecture is shorter than his", "my lecture is shorter than usually", etc. The second member (the incomplete one) is represented by no more than a single word ("the old", "his", etc.). The semantic representation of that member is obtained by making a copy of the representation of the first member and then replacing some of its parts by explicitly introduced objects from the incomplete member. Thus "the new building" becomes "the old building", "the duration of my lecture" becomes "the duration of his lecture". In the last example the part superseded by "usually" is only implied: "this time".
This analysis suggests a method of representation of the syntactic structure of such sentences, which are usually treated as elliptic. The words actually present in the incomplete member have no usual syntactic connections to the rest of the sentence, and there is no point in reconstructing them by inserting omitted words. The important thing is to show for each word in the incomplete member which portion of the first (complete) member will be replaced by nodes introduced by this word. In other words, it is necessary to show for a word in the incomplete member its "prototype" (if any) in the complete member.
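The copy-and-replace treatment of the incomplete member can be illustrated as follows. This is a sketch under simplifying assumptions: the representation of a member is flattened into nested dictionaries rather than a net, a "prototype" is given as a path of keys, and the key names (kind, of, owner, occasion) are invented for the example.

```python
import copy


def complete_member(full, replacements):
    """Copy the complete member, then overwrite the parts named by prototypes.

    `replacements` maps a prototype (a path of keys into `full`) to the
    object introduced by a word of the incomplete member.
    """
    result = copy.deepcopy(full)
    for path, value in replacements.items():
        target = result
        for key in path[:-1]:
            target = target[key]
        target[path[-1]] = value
    return result


# "my lecture is shorter than his / than usually"
mine = {
    "kind": "duration",
    "of": {"kind": "lecture", "owner": "speaker", "occasion": "this time"},
}

his = complete_member(mine, {("of", "owner"): "he"})
usual = complete_member(mine, {("of", "occasion"): "usually"})

print(his["of"])
print(usual["of"])
```

In the second call the prototype is the implied "this time", which has no word of its own in the complete member; the first member is never modified, only copied.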
Semantic criteria of merging, free of syntax or partially forced, can also depend on special features of the words introducing the nodes. An obvious restriction is that a merger can be prohibited due to incompatibility of semantic classes or of other attributes. Some other interesting criteria of merging are asymmetric (though a merger itself is a symmetrical act). Some words produce nodes that are in an active search of a merging partner, for the word would make no sense without such a merger. The word "author" (from a previous example) cannot be used unless it is implied in some way "of what" the author is. Thus the respective frame will have a semantic node representing the 'author' (the 'reference' of the header node) and another node for the 'product'; the latter will have to find a merging partner.
There is another characteristic of a node, viz., its 'accessibility' as a merging partner. Consider two sentences: "a train ran from Leningrad to Moscow" and "a train left from Leningrad to Moscow"; both describe the same class of events because a running train surely left some time ago, and if it left it was running for at least some time afterwards. If asked about the difference, one might reply vaguely that in the second instance a sort of "emphasis" is made on the initial state of the motion. But the difference becomes clear when one starts the next sentence with "two hours later ..."; it makes sense with the second sentence but not with the first. A more formal explanation is that the node representing the initial state of the motion (which probably must be present both in the frame for "to leave" and "to run") in the second case is accessible to nodes in search of a merging partner, but not in the first case. However, in both cases the node can participate in mergers due to other reasons.
One more reason for nodes to merge seems to lie outside linguistics. Consider the text "a train left at 10 p.m. and arrived at 6 a.m.". If it is in a school sum we know that one is expected to compute something from it, probably the time the travel took. But to do it one must know something about the days to which both moments belong. They cannot belong to the same day because the arrival cannot precede the departure. The assumption that the day of arrival immediately follows that of departure is taken as the simplest option in view of the need to make some assumption in order to solve the problem.
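The arithmetic behind this assumption is small enough to write out. The function name and the convention of counting minutes since midnight are choices made for this sketch.

```python
MINUTES_PER_DAY = 24 * 60


def travel_minutes(departure, arrival):
    """Duration of a trip given clock times in minutes since midnight.

    Assumption (the "simplest option"): the arrival falls on the earliest
    day on which it does not precede the departure, i.e. the same day if
    possible and the next day otherwise.
    """
    days_later = 0 if arrival >= departure else 1
    return arrival + days_later * MINUTES_PER_DAY - departure


left = 22 * 60    # 10 p.m.
arrived = 6 * 60  # 6 a.m.
print(travel_minutes(left, arrived) / 60)  # 8.0 hours
```

Without the assumption about the days the two clock times alone determine the duration only up to a whole number of days.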
The net obtained when all mergers are done is expected to be used to build a response to the text, as noted above. It also looks like a sort of representation of the "meaning" of the text but it is too problem-dependent for this role. A question was mentioned before, whether the frame for the verb "to buy" should contain a node to represent the sales tax. The answer depends on the application. Some applications will require it as well as nodes to represent the type of currency and the method of payment, while in other applications even the 'seller' node may be irrelevant. There seem to be as many frames for the verb as there are applications, and the fact that some of them share some arc names in the respective frames makes it possible to combine different views of the same situation.
This system is partly implemented on a computer, using a rather old implementation of association nets with the necessary primitives for fetching frames, merging nodes, calling procedures, restricted backtracking, etc. (it works very slowly because it spends a lot of time loading frames from files).
The overall control structure is as follows. Dictionary search and treatment of inflections are a separate step. Then all primary frames are brought into memory, including all variants for ambiguous words. There are no separate stages of disambiguation and syntactic parsing; all this is interleaved with semantic processing. Sometimes a syntactic connection is set first, and semantic mergers are derived from it; sometimes it is the other way round. If a semantic connection is found, it still has to be checked against the syntax to prevent misunderstanding. So a perfect understanding of the text must include both syntax and semantics.
The active elements of the system are the "hypotheses" (or predictions) contained in the frames. Each hypothesis has its "goal node" for which it attempts to find a merging partner and a procedure to do it (normally it is one of a few standard procedures). After the merger some additional actions are taken to check its validity, including invoking other hypotheses and some problem-oriented semantic processing, of which any can fail. Some hypotheses are initially marked as incompatible, and whenever one of them becomes active the others are blocked (they are unblocked if the first hypothesis fails). At first the hypotheses are ordered for execution according to their predefined priorities, but this order is modified due to mutual invocation. Pragmatic processing (problem solving) can be partly embedded in the hypotheses (thus providing a feedback to syntax) and partly done at the final stage.
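A much simplified sketch of this control scheme is shown below. It covers only priority ordering and the blocking and unblocking of incompatible hypotheses; mutual invocation, validity checks after a merger, and backtracking are left out. The names Hypothesis, run and always, and the two competing readings in the example, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Hypothesis:
    name: str
    priority: int
    attempt: Callable[[], bool]  # tries to merge the goal node; True on success
    excludes: set = field(default_factory=set)  # names of incompatible rivals


def run(hypotheses):
    """Try hypotheses in priority order, honouring mutual exclusion."""
    blockers = {}  # hypothesis name -> names of hypotheses blocking it
    accepted = []
    for hyp in sorted(hypotheses, key=lambda h: -h.priority):
        if blockers.get(hyp.name):
            continue  # an incompatible hypothesis is in force
        for rival in hyp.excludes:
            blockers.setdefault(rival, set()).add(hyp.name)
        if hyp.attempt():
            accepted.append(hyp.name)
        else:
            # The hypothesis failed, so its rivals are unblocked again.
            for rival in hyp.excludes:
                blockers[rival].discard(hyp.name)
    return accepted


def always(result):
    """Stand-in for a real merging procedure with a fixed outcome."""
    return lambda: result


# Two incompatible readings of one word; the preferred one fails.
readings = [
    Hypothesis("verb reading", 2, always(False), {"adjective reading"}),
    Hypothesis("adjective reading", 1, always(True), {"verb reading"}),
]
print(run(readings))
```

Blocking is recorded per blocker, so that the failure of one hypothesis does not release a rival that is still excluded by another hypothesis in force.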
As a whole, this work cannot claim to give a complete description of syntax, semantics, or their relationships (indeed, only a small part of syntax has actually been studied). But it offers a language to describe such relationships and to put clear questions to what would otherwise be written off to "intuition". It also shows how formal methods can be applied to the natural language without restricting it to a unique all-embracing formal model. Finally, it shows how meaningful processing of meaning in non-mathematical domains can be done without the standard concepts and instruments of logical inference.
This work is a part of a team project, and I must mention here my colleagues N.A.Krupko, T.N.Nevleva, I.M.Novitskaya, L.N.Smirnova and M.M.Zheleznyakov who did much work with various problem domains. A previous overview of this project is found in [1], a discussion of the "de-centralized" approach to language is in [2], for details of programming with association nets see [3].
1. M.M.Zheleznyakov, T.N.Nevleva, I.M.Novitskaya, L.N.Smirnova, G.S.Tseytin. An attempt of development of a "text -> reality" model using association nets (in Russian). In: Mashinnyi fond russkogo yazyka: predproektnye issledovaniya, Institut russkogo yazyka AN SSSR, Moscow, 1988, 140-167.
2. G.S.Tseytin. On relationship between natural language and formal model (in Russian). In: Voprosy kibernetiki (obshchenie s EVM na estestvennom yazyke), Moscow, 1982, 20-34.
3. G.S.Tseytin. Programming with association nets (in Russian). In: EVM v proektirovanii i proizvodstve, vyp.2, "Mashinostroenie", Leningr.otd., Leningrad, 1985, 16-48 (G.V.Orlovsky, ed.)