

Filling In the Web's Historical Gaps


Dr. Graham Katz is recruiting Georgetown students to help with his summer research project. (Photo: Roland Dimaya)

By Megan Weintraub

As we all know, the Internet is a rich source of news. But when we search for information about a particular historical event, we can run into trouble pinning down when the event actually took place. That is because news stories often use words such as "today" or "Thursday" rather than explicit dates to convey temporal information. For example, a story about the fall of the Berlin Wall might state, "The Berlin Wall fell on Thursday," but without a reference to the actual date, November 9, 1989, we still cannot tell when the wall came down.

To address this confusion, Dr. Graham Katz builds computational modeling systems that can sift through a text and pull out the details needed to fill in those gaps for the computer reading the story. For instance, the system would scan the article about the Berlin Wall and attempt to connect the word "Thursday" with the date the article was published, so that when a human asks, "When did the Berlin Wall fall?" the computer can answer "November 9, 1989."
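The article does not describe how this anchoring is implemented, but the core idea can be shown in a minimal sketch: given a story's publication date, a bare weekday mention is resolved to the most recent matching calendar date. The function name and the "most recent weekday" heuristic below are illustrative assumptions, not Dr. Katz's actual system.

```python
from datetime import date, timedelta

# Map weekday names to Python's weekday() indices (Monday = 0).
WEEKDAYS = {
    "monday": 0, "tuesday": 1, "wednesday": 2, "thursday": 3,
    "friday": 4, "saturday": 5, "sunday": 6,
}

def resolve_weekday(mention: str, publication_date: date) -> date:
    """Resolve a bare weekday mention (e.g. "Thursday") to the most
    recent date with that weekday on or before the publication date."""
    target = WEEKDAYS[mention.lower()]
    offset = (publication_date.weekday() - target) % 7
    return publication_date - timedelta(days=offset)

# A story published Friday, November 10, 1989, saying the Wall
# "fell on Thursday" resolves to November 9, 1989.
print(resolve_weekday("Thursday", date(1989, 11, 10)))  # 1989-11-09
```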

“We’re trying to build an accurate representation of how human language speakers really understand what they’re reading,” explains Dr. Katz. “There are many cases in which getting things right relies on the modeling system doing a lot of understanding. But before that can even happen, a lot of structural information needs to go into forming an overall understanding of an event.”

Since 2002, Dr. Katz has been involved in the development of a large-scale markup language project called TimeML. He looks for the type of information that would be useful for a computer to know before it tries to make sense of dates and events, such as how verbs relate to other parts of the sentence. The system includes seven different "relations," such as "before," "after," and "simultaneous," each indicating a particular temporal relationship between events in a text.
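The article does not reproduce any markup, but a condensed, TimeML-style example makes the idea concrete: events and time expressions are tagged inline, and a link element records the relation between them. Real TimeML annotation is richer (it uses MAKEINSTANCE elements and additional attributes), and the attribute values below are made up for illustration; the snippet is simply parsed here with Python's standard library.

```python
import xml.etree.ElementTree as ET

# A simplified, TimeML-style annotation of one sentence: the event and
# the time expression are tagged inline, and a TLINK records how they
# relate. Attribute values are illustrative, not from a real corpus.
ANNOTATED = """
<TimeML>
  The Berlin Wall <EVENT eid="e1" class="OCCURRENCE">fell</EVENT> on
  <TIMEX3 tid="t1" type="DATE" value="1989-11-09">Thursday</TIMEX3>.
  <TLINK lid="l1" eventID="e1" relatedToTime="t1" relType="IS_INCLUDED"/>
</TimeML>
"""

root = ET.fromstring(ANNOTATED.strip())
for link in root.iter("TLINK"):
    print(link.get("eventID"), link.get("relType"), link.get("relatedToTime"))
# e1 IS_INCLUDED t1
```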

TimeML serves as the basis for automatic event recognition programs, a fancy term for complex computer systems that ask a series of questions about a text. First comes event identification: "How many events are being described by this text?" "What are they?" Next comes relation identification: "How are the events related?" "Did one precede the other, or did they overlap?"
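As a toy sketch of that two-stage pipeline (the hand-picked verb list and the ordering heuristic below are placeholder assumptions, not Dr. Katz's actual approach), the code first picks out event mentions and then proposes a relation for each adjacent pair.

```python
from dataclasses import dataclass

@dataclass
class Event:
    id: str
    text: str

# Stage 1: event identification -- which events does the text describe?
# A real system would use a trained tagger; this placeholder just treats
# a hand-picked set of event-denoting verbs as events.
EVENT_VERBS = {"fell", "opened", "announced"}

def identify_events(tokens: list[str]) -> list[Event]:
    return [Event(f"e{i}", tok) for i, tok in enumerate(tokens)
            if tok.lower() in EVENT_VERBS]

# Stage 2: relation identification -- how are the events related?
# This naively assumes textual order equals temporal order; a real system
# would classify each pair with one of TimeML's relation types.
def identify_relations(events: list[Event]) -> list[tuple[str, str, str]]:
    return [(a.id, "BEFORE", b.id) for a, b in zip(events, events[1:])]

tokens = "The government announced the opening and the Wall fell".split()
events = identify_events(tokens)
print(events)
print(identify_relations(events))
```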

This summer, Dr. Katz has secured a research grant to work on an application of TimeML to narrative texts, such as novels, instead of news sources. This is an important step in expanding the system's understanding of language because events in narrative texts tend to work differently from events in the news. For instance, newspaper texts tend to describe events in order of importance, with the important events described first and the background events described later, while narratives describe events in chronological order. By studying novels in conjunction with newspaper texts, researchers hope to better understand how events are described in everyday language.

“We’re hoping that as we apply TimeML to the format of the novel, we will learn information about event language that will inform our study of each,” Dr. Katz says.

Dr. Katz hopes to recruit undergraduate researchers at Georgetown to help with this ambitious and important project.
