Learning @ Georgetown

Now You're Reading My Language: Dr. Graham Katz and the Search for Semantic Meaning

By Megan Weintraub

We use the Internet every day to perform tasks as varied as reading the news, buying concert tickets, or finding the number of feet in a mile. As soon as we submit our request to a search engine, it prowls the Web looking for information to satisfy our curiosity. However, if not for the act of typing “number of feet in a mile” and hitting submit, we would never actually retrieve any results. In other words, this exercise relies on human action of some type. Computational linguists, such as Dr. Graham Katz, an assistant professor in the Department of Linguistics at Georgetown, are working on new ways to teach machines how to mimic the logic in human brains in order to enrich the type and scope of information we can find on the Web.

As any researcher working in the field of artificial intelligence will tell you, teaching machines how to understand human language is no small task. Our sentences are complex and they often bury bits of meaning, also known as semantic information, under layers of verb tense and grammatical structure. This is especially the case with sentences that describe particular events in the past or future.

“As we learn a new language, we come to understand the signals that reveal information about the meaning of a sentence,” Dr. Katz explains.

For example, in order to understand the sentence, “I told my friend about my skydiving trip,” we rely on our knowledge of verb tense to interpret the two events embedded in the statement—a conversation between the speaker and his friend, and the skydiving trip itself. From the properties of the sentence, we understand that the skydiving trip preceded the conversation between the friends.

Even the most sophisticated search engines on the Web do not always excel at making this distinction. Dr. Katz examines the intricacies of examples like these and then builds computer modeling systems that can tease apart the various parts of the sentence to glean more information about the semantic meaning. His research falls into two areas: theoretical semantics and natural language processing, both of which offer practical applications to processes we use in our digital world. For instance, the engineers who developed the Google search engine created a complex mathematical system that makes connections between bits of information on the Web without relying on human input.

“Google knows how to display the most relevant information you can find on the Web,” explains Dr. Katz. “Computational linguistics has gotten a big boost from the rise of the Internet, and Google is just one example of how we rely on statistical algorithms every day.”

In addition to enriching the information we retrieve from the Web, researchers in these fields are also using their findings to build systems that search for event clues embedded in online texts, such as websites. One area in which this is highly relevant is in monitoring systems that track online terrorist activity. Since 9/11, governments and corporations have invested significant funding into these systems, which work by turning our knowledge of language into a semantic lexicon, or a virtual warehouse of information. Computer modeling systems can then use the semantic lexicon as a reference, much like we would use a dictionary to define a word. When the computer modeling system reads an online text, such as a news source, it can pick out relevant bits of information.
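The lexicon-as-reference idea described above can be sketched in a few lines. This is a toy illustration only: the entries, category labels, and function names below are invented for the example and are far simpler than a real semantic lexicon.

```python
# Toy "semantic lexicon": a mapping from event-related terms to
# semantic categories. Real lexicons are far richer; these entries
# and labels are invented for illustration.
SEMANTIC_LEXICON = {
    "meeting": "event:gathering",
    "purchased": "event:transaction",
    "traveled": "event:movement",
}

def extract_events(text):
    """Scan a text and return the words found in the lexicon,
    paired with their semantic categories."""
    words = text.lower().replace(".", "").split()
    return [(w, SEMANTIC_LEXICON[w]) for w in words if w in SEMANTIC_LEXICON]

print(extract_events("The group purchased supplies before the meeting."))
# [('purchased', 'event:transaction'), ('meeting', 'event:gathering')]
```

A monitoring system would replace the simple word match with morphological analysis and disambiguation, but the lookup pattern, consulting the lexicon much as a person consults a dictionary, is the same.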

Computer modeling systems can also produce important insight into the nature of the information conveyed in a sentence from its adverbs. For example, in the sentence, “I slowly told my friend about my skydiving trip,” “slowly” refers to the conversation between friends. The fact that it appears in the sentence indicates to us that “telling” events can be done at different speeds. Dr. Katz has investigated how the properties of adverbs can make it possible for a computational system to “learn” the meanings of words. For example, one of the most important things for a computational system to figure out is which sentences in a text describe events—“I drove to Boston last Monday”—and which characterize background information—“I live in Boston.” This can be done automatically by taking advantage of what Dr. Katz has dubbed the Stative Adverb Gap, which describes the fact that some adverbs, like “quickly,” occur in event sentences but not in state sentences. By observing which adverbs appear with which verbs, a computational system can infer which verbs describe events and which describe states.

“Event-describing sentences make implicit reference to an event,” explains Dr. Katz. “As human speakers of language, we already intuitively understand the stative adverb gap.”

This distinction is relevant to researchers because it allows them to build logic into the computer modeling systems that truly captures the ways that humans process language.
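The heuristic behind the Stative Adverb Gap can be sketched as a small classifier. The adverb list, corpus format, and function below are hypothetical simplifications: the idea is only that a verb observed with a manner adverb like “quickly” or “slowly” is likely eventive, while a verb never seen with such adverbs may be stative.

```python
# Hypothetical cue list: manner adverbs that, per the Stative Adverb Gap,
# occur in event sentences but not in state sentences.
EVENTIVE_ADVERBS = {"quickly", "slowly", "carefully"}

def classify_verbs(observations):
    """observations: (verb, adverb-or-None) pairs gathered from a corpus.
    A verb seen at least once with an eventive adverb is tagged 'event';
    otherwise it defaults to 'state'."""
    eventive = set()
    seen = set()
    for verb, adverb in observations:
        seen.add(verb)
        if adverb in EVENTIVE_ADVERBS:
            eventive.add(verb)
    return {v: ("event" if v in eventive else "state") for v in seen}

corpus = [
    ("told", "slowly"),    # "I slowly told my friend about my skydiving trip"
    ("drove", "quickly"),  # "I quickly drove to Boston"
    ("live", None),        # "I live in Boston" -- no manner adverb observed
]
print(classify_verbs(corpus))
# {'told': 'event', 'drove': 'event', 'live': 'state'}
```

A real system would need far more data and would weigh evidence statistically rather than on a single sighting, but the inference, from adverb distribution to verb class, follows the pattern described above.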

Dr. Katz, a recent addition to the Georgetown community, contributes to the university’s long tradition of research and teaching in the field of semantics. In fact, Georgetown boasts an unusually high number of formal semanticists who work closely together on these issues. He looks forward to co-teaching a seminar next year with Dr. Paul Portner, a fellow professor of linguistics.

“We’re aiming to make Georgetown a powerhouse in semantics and computational linguistics,” he says. “So much of our world today relies on searching and modifying semantic information, and this is a very exciting time to be working in this field, especially at Georgetown.”
