Sunday, December 26, 2010

Semantic Suggestion

The weakest point of the keyword query interface for structured information is the vocabulary mismatch between the user's vocabulary and the vocabulary the search system accepts. In any given domain there are simply too many natural language expressions for a single concept, and the search system cannot understand all of them.

Even in a very restricted domain there are too many concepts, and the concepts the search system understands are usually a very small subset of them. As a result, most user queries fail. This phenomenon is called overshooting the capabilities of the search system.

NaverLab Semantic Movie Search provides semantic query suggestion to mitigate the overshooting problem. The semantic auto-completion also helps, but it works only for a single object, whereas the semantic suggestion works for multi-keyword queries.

Auto-completion is activated while the first word is being typed in the search box. Once the first word is completed and a space is inserted, the semantic suggestion is activated. The semantic suggestions are multi-keyword queries that have answers in the database. The suggested queries are not plain text but object queries.

The following is a sequence of semantic suggestions starting from "gladiator". When a space is inserted after "gladiator", two-word queries are suggested. Select one of the suggested queries and insert a space, and three-word queries are suggested. Queries of up to five words are suggested.

(step 1) Two-word query suggestions
(step 2) Three-word query suggestions
(step 3) Four-word query suggestions
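The step-by-step suggestion described above can be sketched as follows. This is only an illustration, not the actual implementation: the toy records, the `has_answer` and `suggest` names, and the idea of storing each movie as a flat set of terms are all assumptions made for this example. The one property it demonstrates is the one stated above: every suggested query is one term longer than the current one and still has an answer in the database.

```python
# Toy database (assumed for illustration): each record is the set of
# terms attached to one movie.
RECORDS = [
    {"gladiator", "ridley scott", "russell crowe"},
    {"titanic", "james cameron", "leonardo dicaprio"},
    {"avatar", "james cameron"},
]

MAX_TERMS = 5  # queries of up to five words are suggested


def has_answer(terms):
    """A query has an answer if some record contains all of its terms."""
    wanted = set(terms)
    return any(wanted <= record for record in RECORDS)


def suggest(terms):
    """Return queries one term longer than `terms` that still have an answer."""
    if len(terms) >= MAX_TERMS:
        return []
    current = set(terms)
    candidates = set()
    for record in RECORDS:
        if current <= record:
            # Any other term of a matching record keeps the query answerable.
            candidates |= record - current
    return [list(terms) + [term] for term in sorted(candidates)]


if __name__ == "__main__":
    print(suggest(["gladiator"]))                  # two-word suggestions
    print(suggest(["gladiator", "ridley scott"]))  # three-word suggestions
```

Because candidates are drawn only from records that already match the current terms, a query that overshoots the database can never be suggested.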

Semantic Auto-completion

NaverLab Semantic Movie Search provides semantic auto-completion of the keyword typed in the search box. It works just like ordinary keyword auto-completion, except that the auto-completed items are not simple text but objects. Each object is one of People, Movie, School, Company, Award, or Country.

There are two modes of auto-completion: vertical and horizontal. They work similarly, but the horizontal mode provides more suggested items. In the vertical mode, at most three objects are displayed for each of Movies, People, and Etc (School, Company, Award, Country), so at most 9 objects are shown in total. In the horizontal mode, at most three objects of each group are visible, as in the vertical mode, but more objects become available when the list is scrolled down.
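The grouping rule of the two modes can be sketched roughly as below. The object names other than "Gladiator" are made-up placeholders, and the `complete` function, prefix matching, and the `per_group` parameter are assumptions for illustration. With `per_group=3` the function mimics the vertical mode (at most 3 x 3 = 9 objects); passing `per_group=None` returns the full lists, which is one way the scrollable lists of the horizontal mode could be fed.

```python
# Made-up sample objects as (name, kind) pairs; only "Gladiator" is a real title.
OBJECTS = (
    [("Gladiator", "Movie")]
    + [(f"Gladiator sample {i}", "Movie") for i in range(1, 5)]
    + [("Gladiator Academy", "School"), ("Gladiator Pictures", "Company")]
)

# Movie and People get their own group; every other kind goes to "Etc".
GROUP_OF = {"Movie": "Movies", "People": "People"}


def complete(prefix, per_group=3):
    """Group objects whose name starts with `prefix`, capped per group."""
    groups = {"Movies": [], "People": [], "Etc": []}
    wanted = prefix.lower()
    for name, kind in OBJECTS:
        if not name.lower().startswith(wanted):
            continue
        bucket = groups[GROUP_OF.get(kind, "Etc")]
        if per_group is None or len(bucket) < per_group:
            bucket.append((name, kind))  # completed items are typed objects
    return groups


if __name__ == "__main__":
    print(complete("gladiator"))                  # vertical-mode style cap
    print(complete("gladiator", per_group=None))  # everything, for scrolling
```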

Below are examples of the auto-completion in the vertical mode and the horizontal mode, respectively, when "gladiator" is typed in.

Saturday, December 25, 2010

Going back and forth between Object Search and Visual Data Browser

Fig.1. Linking between Object Search and Visual Data Browser

In NaverLab Semantic Movie Search, the user goes back and forth between Object Search and Visual Data Browser: Object Search is used to find the object of interest, and Visual Data Browser is used to explore information related to that object.

The user starts from Object Search and finds the object of interest with a keyword search, then expands the object to explore it further. There are two ways to expand an object. One is to click the [expand + more] button (arrow No. 1 in Fig. 1). The other is to click the object itself to pop up the property menu and then click one of the object's properties (arrow No. 2 in Fig. 1).

While browsing the data space, the user may decide to search for a new object. Typing keywords in the search box and clicking the search button (arrow No. 3 in Fig. 1) brings the user back to Object Search. Another way to return to Object Search is to click the object itself to pop up the property menu and then click the small magnifier icon inside the thumbnail image (arrow No. 4 in Fig. 1).

Keyword search and Object search

Fig. 1. Transforming the keyword query to the object query using the auto-completion

The keyword query is the most common query interface for text information. In the Web era especially, the keyword query interface is ubiquitous for all kinds of searching. Keyword search is easy to use and very effective for searching the Web, and users have become thoroughly accustomed to it. It would not be easy to develop a new query interface to replace it.

Keyword search matches a keyword query against the text surrogates of the objects. In NaverLab Semantic Movie Search, a keyword query is transformed into multiple object queries, because natural language is ambiguous and one keyword may refer to several objects.

Object search, on the other hand, matches an object query directly against the objects, so no keyword-to-object mapping is necessary. There are no ambiguities caused by natural language, but there are still ambiguities caused by the structure of the RDF graph.
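The contrast between the two kinds of search can be illustrated with a small sketch. Everything here is assumed for the example: the `MovieObject` class, the `movie:N` identifier scheme, the use of the title as the text surrogate, and the placeholder director of the second Avatar. A keyword query is matched against text and may return several objects, while an object query addresses exactly one object by its identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovieObject:
    object_id: str  # hypothetical identifier scheme
    title: str      # used here as the object's text surrogate
    director: str


# Toy catalog; the director of the second "Avatar" is a placeholder.
CATALOG = [
    MovieObject("movie:1", "Avatar", "James Cameron"),
    MovieObject("movie:2", "Avatar", "(unknown)"),
    MovieObject("movie:3", "Titanic", "James Cameron"),
]


def keyword_search(keyword):
    """Match the keyword against text surrogates; may be ambiguous."""
    wanted = keyword.strip().lower()
    return [obj for obj in CATALOG if obj.title.lower() == wanted]


def object_search(object_id):
    """Match an object query directly against the objects; no text mapping."""
    return [obj for obj in CATALOG if obj.object_id == object_id]


if __name__ == "__main__":
    print(keyword_search("avatar"))  # both movies titled "Avatar"
    print(object_search("movie:1"))  # exactly the selected object
```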

In NaverLab Semantic Movie Search, users use both keyword queries and object queries. An object query is used when auto-completion is applied (Fig. 1). After auto-completion, the keywords in the search box look like plain strings, but they are actually objects.

It is difficult for users to notice the difference between keyword search and object search. Let's see the difference with the query "avatar". Fig. 2 shows the search result of the keyword query "avatar", and Fig. 3 shows the search result of the object query after auto-completion.

Fig. 2. The search result of the keyword query "avatar"

Fig. 3. The search result of the object query "avatar" after auto-completion

Thursday, December 23, 2010

From keywords to objects

Fig. 1. Three meanings of the query "Ben Hur"

In NaverLab Semantic Movie Search, the user always starts a search from the keyword interface. This allows the meaning of the keyword query to be disambiguated. If the query is ambiguous, in other words if it has two or more meanings, the search results for all possible meanings are displayed, and the user then decides which meaning to explore further.

If only one meaning remains after disambiguation, it would be possible to launch the visual data browser directly. In NaverLab Semantic Movie Search, however, the keyword interface (Object Search) is launched anyway. This is simply a policy decision: we thought this way would be more intuitive and usable.

Let's run the query "Ben Hur". It has three meanings, and three movies with the same title "Ben Hur" are returned (Fig. 1). Now consider a query that has only one meaning. The search result of "shawshank redemption" is shown in Fig. 2. Only one movie is returned, but the results of other expanded queries (director, leading actors, etc.) follow it. This is just for the user's convenience.

Fig. 2. In the case of single meaning, the results of the expanded queries are shown together for user's convenience
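The policy described in this post (show every meaning when the query is ambiguous, and add expanded results when only one meaning remains) might look like the sketch below. The data layout, the substring matching in `find_meanings`, and the shape of the returned dictionary are assumptions for illustration; the three "Ben Hur" entries carry placeholder values because the post does not list their details.

```python
# Toy data; the three "Ben Hur" entries use placeholder details.
MOVIES = [
    {"id": "m1", "title": "Ben Hur", "director": "(placeholder A)", "actors": []},
    {"id": "m2", "title": "Ben Hur", "director": "(placeholder B)", "actors": []},
    {"id": "m3", "title": "Ben Hur", "director": "(placeholder C)", "actors": []},
    {"id": "m4", "title": "The Shawshank Redemption",
     "director": "Frank Darabont",
     "actors": ["Tim Robbins", "Morgan Freeman"]},
]


def find_meanings(query):
    """Return every movie whose title contains the query (case-insensitive)."""
    wanted = query.lower()
    return [movie for movie in MOVIES if wanted in movie["title"].lower()]


def search(query):
    """Show all meanings; add expanded results only for a single meaning."""
    meanings = find_meanings(query)
    expanded = {}
    if len(meanings) == 1:
        only = meanings[0]
        expanded = {
            "director": [only["director"]],
            "leading actors": list(only["actors"]),
        }
    return {"objects": meanings, "expanded": expanded}


if __name__ == "__main__":
    print(search("Ben Hur"))               # three objects, nothing expanded
    print(search("shawshank redemption"))  # one object plus expanded results
```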

Keyword query interface for the object search

Fig. 1. An ambiguous query. Two movie objects for the query "avatar".

The keyword interface is very convenient and indispensable for object search. Without it, the query must be formulated from the objects themselves. If the query patterns are simple and the number of objects is relatively small, a menu-based or visual interface may be useful. If the number of objects is large and the query patterns are varied, however, menu-based and visual interfaces become more complex and less usable.

The downside of the keyword interface is semantic ambiguity. Users may express the same meaning with many different natural language expressions: different spellings, different word orders, synonyms, and so on. A usable keyword interface should handle all these language variations properly.
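One simple way to absorb some of these variations is to normalize every query into a canonical key before matching. The sketch below is an assumption for illustration, not the method used by the system: the `normalize` function and the tiny synonym table are made up. It handles punctuation and case (a small part of spelling variation), word order, and synonyms.

```python
import re

# Hypothetical synonym table mapping variants to one canonical word.
SYNONYMS = {"film": "movie", "films": "movie", "movies": "movie"}


def normalize(query):
    """Map a query to a canonical key that ignores case, punctuation,
    word order, and known synonyms."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    tokens = [SYNONYMS.get(token, token) for token in tokens]
    return tuple(sorted(tokens))


if __name__ == "__main__":
    print(normalize("Ben-Hur"))
    print(normalize("hur ben"))
    print(normalize("James Cameron films"))
```

Sorting the tokens discards word order entirely, which may merge queries that a real system would want to keep apart, so this is a trade-off rather than a general solution.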

Semantic ambiguity is unavoidable. There are multiple movies with the same title and many people with the same name. For example, run the query "avatar": two movies with that title exist (Fig. 1).

Fig. 2. The semantic auto-completion

The easiest way to disambiguate a query is for the user to specify the intended meaning explicitly. In the semantic auto-completion (Fig. 2), if the user explicitly chooses the Avatar of James Cameron, the search result gives additional information about that movie object (Fig. 3).

Fig. 3. The search result after selecting the Avatar of James Cameron in the auto-completion

Monday, December 20, 2010

Visualizing the concept of object

Fig. 1. Movie object, People object, Award object, Country object, School object (from the top)

NaverLab Semantic Movie Search is an object search. In object search, the unit of information is the object, and the target of search is also objects. There are Movie objects (Avatar, Gladiator, Titanic, etc.), People objects (James Cameron, Leonardo DiCaprio, Russell Crowe, etc.), and also Award, Country, and School objects.

The search result of NaverLab Semantic Movie Search is always a set of objects of these 5 kinds (Movie, People, Award, Country, School). NaverLab Semantic Movie Search explicitly visualizes the concept of the object. Regardless of the kind (class) of the object, the same form of visualization is used. Fig. 1 shows samples of the five kinds of objects. A thumbnail image is on the left side, and three important properties and their values are on the right side. The Country and School objects do not have any properties.
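The uniform visual form described above could be modeled by a single data structure shared by all five kinds. This is a minimal sketch under assumptions: the `ObjectCard` class, the `make_card` helper, and the thumbnail file names are hypothetical. It keeps at most three properties per card and none for Country and School objects.

```python
from dataclasses import dataclass, field

KINDS_WITHOUT_PROPERTIES = {"Country", "School"}
MAX_PROPERTIES = 3  # three important properties are shown per object


@dataclass
class ObjectCard:
    kind: str       # Movie, People, Award, Country, or School
    name: str
    thumbnail: str  # hypothetical image path, shown on the left side
    properties: list = field(default_factory=list)  # (property, value) pairs


def make_card(kind, name, thumbnail, properties=()):
    """Build the same card shape for every kind of object."""
    if kind in KINDS_WITHOUT_PROPERTIES:
        shown = []
    else:
        shown = list(properties)[:MAX_PROPERTIES]
    return ObjectCard(kind, name, thumbnail, shown)


if __name__ == "__main__":
    movie = make_card(
        "Movie", "Avatar", "avatar.png",
        [("director", "James Cameron"), ("kind", "Movie"),
         ("title", "Avatar"), ("extra", "not shown")],
    )
    print(movie)
    print(make_card("Country", "(some country)", "flag.png", [("x", "y")]))
```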