Collective Intelligence and Semantic Web: 2010

Sunday, December 26, 2010

Semantic Suggestion

The weakest point of the keyword query interface for structured information is the vocabulary mismatch between the user's vocabulary and the vocabulary accepted by the search system. In any given domain there are simply too many natural language expressions for a concept, and the search system cannot understand all of them.

Even in a very restricted domain there are too many concepts, and the concepts understood by the search system are usually a very small subset of them. So most user queries will fail. This phenomenon is called overshooting the capabilities of the search system.

NaverLab Semantic Movie Search provides semantic query suggestion to mitigate the overshooting problem. The semantic auto-completion also helps, but it only works for a single object, whereas the semantic suggestion works for multi-keyword queries.

While the first word is being typed in the search box, the auto-completion is active. Once the first word is completed and a space is inserted, the semantic suggestion is activated. The semantic suggestions are multi-keyword queries that have answers in the database. The suggested queries are not plain text but object queries.

The following is a sequence of semantic suggestions starting from "gladiator". If a space is inserted after "gladiator", two-word queries are suggested. Select one of the suggested queries and insert a space, and three-word queries are suggested. Queries of up to five words are suggested.
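The suggestion rule described above (only offer longer queries that still have answers, up to five words) can be sketched in Python as follows. This is a minimal illustration, not the service's actual implementation: the toy `RECORDS` data and the names `has_answer` and `suggest` are assumptions made for the example.

```python
from typing import List

MAX_WORDS = 5  # the post states that suggested queries have at most five words

# Hypothetical toy database: each record is the set of terms one answer covers.
RECORDS = [
    {"gladiator", "director", "ridley scott"},
    {"gladiator", "actor", "russell crowe"},
    {"gladiator", "award", "academy"},
]


def has_answer(terms: List[str]) -> bool:
    """A query is answerable if some record covers all of its terms."""
    wanted = set(terms)
    return any(wanted <= record for record in RECORDS)


def suggest(terms: List[str]) -> List[List[str]]:
    """Extend the query by one term, keeping only extensions that have answers."""
    if len(terms) >= MAX_WORDS:
        return []
    vocabulary = sorted(set().union(*RECORDS) - set(terms))
    return [terms + [word] for word in vocabulary if has_answer(terms + [word])]
```

Calling `suggest(["gladiator"])` yields two-word queries, and calling `suggest` again on one of those results yields three-word queries, mirroring the step-by-step sequence in the post.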

Auto-completion

(step 1) Two-word query suggestions

(step 2) Three-word query suggestions

(step 3) Four-word query suggestions

Semantic Auto-completion

NaverLab Semantic Movie Search provides semantic auto-completion of the keyword typed in the search box. It works just like general keyword auto-completion, but the auto-completed items are not simple text but objects. Each object is one of People, Movie, School, Company, Award, or Country.

There are two modes of auto-completion: vertical mode and horizontal mode. They work similarly, but the horizontal mode provides more suggested items. In the vertical mode, at most three objects are displayed for each of Movies, People and Etc (School, Company, Award, Country), so at most 9 objects are shown in total. In the horizontal mode, at most three objects of each group are visible at a time, as in the vertical mode, but more objects become available when the list is scrolled down.
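The grouping rule can be illustrated with a short Python sketch. Prefix matching, the candidate labels, and the function name `complete` are assumptions for the example; the post does not describe how candidates are matched or ranked.

```python
from typing import Dict, List, Optional, Tuple

# Display group for each object class, as described in the post.
GROUP_OF_CLASS = {
    "Movie": "Movies",
    "People": "People",
    "School": "Etc",
    "Company": "Etc",
    "Award": "Etc",
    "Country": "Etc",
}

# Hypothetical candidates as (label, class) pairs.
CANDIDATES: List[Tuple[str, str]] = [
    ("Gladiator", "Movie"),
    ("Gladiator 2", "Movie"),
    ("Gladiator 3", "Movie"),
    ("Gladiator 4", "Movie"),
    ("Gladiator Award", "Award"),
    ("Gladys Example", "People"),
]


def complete(
    prefix: str,
    objects: List[Tuple[str, str]],
    per_group_limit: Optional[int] = 3,
) -> Dict[str, List[str]]:
    """Group matching objects; cap each group (None means no cap)."""
    groups: Dict[str, List[str]] = {"Movies": [], "People": [], "Etc": []}
    needle = prefix.lower()
    for label, cls in objects:
        if not label.lower().startswith(needle):
            continue
        group = groups[GROUP_OF_CLASS[cls]]
        if per_group_limit is None or len(group) < per_group_limit:
            group.append(label)
    return groups
```

With the default limit of three per group, the total can never exceed nine items, which matches the vertical mode. Passing `per_group_limit=None` loosely models the scrollable horizontal mode, where more items are reachable.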

Below are examples of the auto-completion in the vertical mode and the horizontal mode, respectively, when "gladiator" is typed in.

Saturday, December 25, 2010

Going back and forth between Object Search and Visual Data Browser

Fig.1. Linking between Object Search and Visual Data Browser

In NaverLab Semantic Movie Search, the user goes back and forth between Object Search and Visual Data Browser: search for the object of interest using Object Search, and explore related information about the object using Visual Data Browser.

The user initiates the search from Object Search and finds the object of interest with a keyword search. The user then expands the object to explore further. There are two ways to expand an object. One is to click the [expand + more] button (see arrow No. 1 in Fig. 1). The other is to click the object itself to pop up the property menu, then click one of the properties of the object (see arrow No. 2 in Fig. 1).

While browsing the data space, the user may decide to search for a new object. The user types keywords in the search box and clicks the search button (see arrow No. 3 in Fig. 1). This brings the user back to Object Search. Another way to return to Object Search is to click the object itself to pop up the property menu, then click the small magnifier icon inside the thumbnail image (see arrow No. 4 in Fig. 1).

Keyword search and Object search

Fig. 1. Transforming the keyword query to the object query using the auto-completion

The keyword query is the most common query interface for text information. Especially in the Web era, the keyword query interface is ubiquitous for all kinds of searching. Keyword search is easy to use and very effective for searching the Web, and users have become accustomed to it. It would not be easy to develop a new query interface to replace the keyword query interface.

Keyword search matches a keyword query against the text surrogates of the objects. In NaverLab Semantic Movie Search, a keyword query is transformed into multiple object queries because of the ambiguity of natural language.

Object search, on the other hand, matches an object query directly against the objects, so the keyword-object mapping is not necessary. There are no ambiguities caused by natural language, but there are ambiguities caused by the structure of the RDF graph.

In NaverLab Semantic Movie Search, users use both the keyword query and the object query. The object query is used when the auto-completion is applied (Fig. 1). After auto-completion, the keywords in the search box look just like plain strings, but they are actually objects.
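The distinction between the two query types can be sketched in Python. The identifiers, the `note` field, and the function names are hypothetical; the point is only that a keyword may match several objects, while an object query names exactly one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MovieObject:
    object_id: str  # hypothetical identifier carried by an auto-completed item
    title: str      # the text surrogate that keyword search matches against
    note: str


# Toy data: two movies share the title "Avatar", as in the post's example.
OBJECTS = [
    MovieObject("movie:1", "Avatar", "directed by James Cameron"),
    MovieObject("movie:2", "Avatar", "another movie with the same title"),
    MovieObject("movie:3", "Titanic", "directed by James Cameron"),
]


def keyword_search(keyword: str) -> List[MovieObject]:
    """Match the keyword against titles; the result may be ambiguous."""
    needle = keyword.lower()
    return [obj for obj in OBJECTS if needle in obj.title.lower()]


def object_search(object_id: str) -> Optional[MovieObject]:
    """Look up one object directly; no keyword-object mapping is involved."""
    return next((obj for obj in OBJECTS if obj.object_id == object_id), None)
```

Here `keyword_search("avatar")` returns both movies, while `object_search("movie:1")` returns only the one the user picked in the auto-completion list.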

It is difficult for users to notice the difference between the keyword search and the object search. Let's see the difference with the query "avatar". Fig. 2 is the search result of the keyword query "avatar" and Fig. 3 is the search result of the object query after auto-completion.

Fig. 2. The search result of the keyword query "avatar"

Fig. 3. The search result of the object query "avatar" after auto-completion

Thursday, December 23, 2010

From keywords to objects

Fig. 1. Three meanings of the query "Ben Hur"

In NaverLab Semantic Movie Search, the user always initiates a search from the keyword interface. This is to disambiguate the meaning of the keyword query. If the query is ambiguous, in other words if it has two or more meanings, the search results for all the possible meanings are displayed. The user then decides which meaning to explore further.

If only one meaning remains after the disambiguation, it would be possible to launch the visual data browser directly. But NaverLab Semantic Movie Search launches the keyword interface (Object Search) anyway. This is simply a matter of policy: we thought this way was more intuitive and usable.

Let's run the query "Ben Hur". It has three meanings, and three movies with the same title "Ben Hur" are returned (Fig. 1). Let's see another query that has only one meaning. The search result of "shawshank redemption" is in Fig. 2. Only one movie is returned, but the results of other expanded queries (director, leading actors, etc.) follow it. This is just for the user's convenience.

Fig. 2. In the case of single meaning, the results of the expanded queries are shown together for user's convenience

Keyword query interface for the object search

Fig. 1. An ambiguous query. Two movie objects for the query "avatar".

The keyword interface is very convenient and indispensable for object search. Without it, the query must be formulated directly from objects. If the query patterns are simple and the number of objects is relatively small, a menu-based or visual interface may be useful. If the number of objects is large and the query patterns are highly varied, however, menu-based and visual interfaces become more complex and less usable.

The downside of the keyword interface is semantic ambiguity. Users may express the same meaning with many different natural language expressions: different spellings, different word orders, synonyms, and so on. A usable keyword interface should handle all these language variations properly.

Semantic ambiguity is unavoidable. There are multiple movies with the same title and many people with the same name. Let's run the query "avatar": there are two movies with that title (Fig. 1).

Fig. 2. The semantic auto-completion

The easiest way to disambiguate the query is for the user to specify the intended meaning explicitly. In the semantic auto-completion (Fig. 2), if the user explicitly chooses the Avatar of James Cameron, the search result gives additional information about the movie object (Fig. 3).

Fig. 3. The search result after selecting the Avatar of James Cameron in the auto-completion

Monday, December 20, 2010

Visualizing the concept of object

Fig. 1. Movie object, People object, Award object, Country object, School object (from the top)

NaverLab Semantic Movie Search is an object search. In object search the unit of information is the object, and the targets of search are also objects. There are Movie objects: Avatar, Gladiator, Titanic, etc. There are People objects: James Cameron, Leonardo DiCaprio, Russell Crowe, etc. There are also Award objects, Country objects, School objects, and so on.

The search result of NaverLab Semantic Movie Search is always a set of the 5 kinds of objects (Movie, People, Award, Country, School). NaverLab Semantic Movie Search explicitly visualizes the concept of the object. Regardless of the kind (class) of object, the same form of visualization is used. Fig. 1 shows samples of the five kinds of objects. A thumbnail image is on the left side, and three important properties and their values are on the right side. The Country and School objects do not have any properties.

Sunday, December 19, 2010

Composing Complex Queries in the Visual Data Browser

Fig. 1. Avatar - Director - Movie directed - Cast

The visual data browser of NaverLab Semantic Movie Search has a very powerful search capability. The user may compose complex queries and run them instantly in the visual data browser. Complex querying is made possible mainly by displaying the 4 most recent nodes of the browsing history path.

It is not difficult to add various complex query composing capabilities to the visual interface. What is very difficult is to make the complex query interface easy for general users to understand and use. An interface that can't be used is of no use at all.

Let's see a complex query composing scenario. Start browsing with Avatar and follow the director link. Next follow the movie-directed link, and then the cast link. The result is in Fig. 1. The third column shows all the movies James Cameron has directed. The fourth column shows the cast of all of Cameron's movies. Select Leonardo DiCaprio, and the objects in the 3rd column change: only Titanic remains (Fig. 2). The 3rd column is now the search result of the query "Movies that James Cameron directed and Leonardo DiCaprio appeared in".

You may deselect Leonardo DiCaprio by clicking the x-button and select another object, so you can easily change the actor to see which of Cameron's movies he or she appeared in. Select Arnold Schwarzenegger. In the 3rd column, True Lies, Terminator 2 and Terminator remain (Fig. 3).
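The effect of selecting an actor in the 4th column can be sketched as a filter over the 3rd column. The following Python example uses a small, deliberately incomplete cast table; the data layout and function names are assumptions for illustration, not the browser's actual data model.

```python
from typing import Dict, List

# Toy data limited to the movies named in the post; casts are partial.
DIRECTED: Dict[str, List[str]] = {
    "James Cameron": ["Avatar", "Titanic", "True Lies", "Terminator 2", "Terminator"],
}
CAST: Dict[str, List[str]] = {
    "Avatar": ["Sam Worthington"],
    "Titanic": ["Leonardo DiCaprio", "Kate Winslet"],
    "True Lies": ["Arnold Schwarzenegger"],
    "Terminator 2": ["Arnold Schwarzenegger"],
    "Terminator": ["Arnold Schwarzenegger"],
}


def cast_of(movies: List[str]) -> List[str]:
    """4th column: everyone who appears in any movie of the 3rd column."""
    return sorted({actor for movie in movies for actor in CAST[movie]})


def restrict_by_actor(movies: List[str], actor: str) -> List[str]:
    """Selecting an actor keeps only the movies that actor appears in."""
    return [movie for movie in movies if actor in CAST[movie]]
```

Starting from `DIRECTED["James Cameron"]` as the 3rd column, `restrict_by_actor` with "Leonardo DiCaprio" leaves only Titanic, and with "Arnold Schwarzenegger" leaves True Lies, Terminator 2 and Terminator. Deselecting corresponds to going back to the unfiltered list.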

Fig. 2. Selecting Leonardo DiCaprio restricts the objects in the 3rd column to the movies that Cameron directed and DiCaprio appeared in.

Fig. 3. Selecting Arnold Schwarzenegger restricts the objects in the 3rd column to the movies that Cameron directed and Schwarzenegger appeared in.

Saturday, December 18, 2010

Set-based VS. Entity-based Browsing

Fig. 1. The class browser panel and the object browser panel

In the visual data browser of Naver Semantic Movie Search, the user may explicitly choose between the set-based and the entity (object)-based browsing mode. The class browser panel can be opened or closed (Fig. 1). If the class browser panel is closed, the browsing mode changes to the entity-based mode.

Set-based data browsing is possible only in the set-based mode. In the entity-based mode, you cannot initiate browsing from a set of entities, only from a single entity. So, in order to keep browsing, a specific object (entity) must be selected.

Let's start browsing with "Ridley Scott" in the entity-based mode. Assume we are at the second column. With the class browser closed, no property links are shown to follow, so a specific object must be selected. Select the Gladiator object. If you click the image of the Gladiator object, the labeled links pop up and the other objects are dimmed (disabled) (Fig. 2). Select the cast link. Now all the actors and actresses who appeared in Gladiator are displayed in the 3rd column (Fig. 3).

Users may have a hard time understanding set-based browsing. It might be a good idea to make the entity-based mode the default, but currently the default is the set-based mode.

Fig 2. Select the Gladiator object, then the labeled links are popped up and other objects are dimmed.

Fig 3. Follow the cast link of Gladiator, then all the actors are displayed in the 3rd column.

Set-based Browsing

Fig 1. Start browsing with "Christopher Nolan"

The set-based data browsing demonstrated in the Parallax data browser is also possible in our visual data browser. Unlike web browsing, which follows a link from a single entity to a single entity, set-based browsing follows a link from a set of entities to a set of entities. Set-based browsing is one of the fundamental characteristics of data browsing.

You start with Christopher Nolan (Fig 1). Then follow the link labeled movies he directed. Now you have all the movies he directed (Fig 2). Next, from the set of movies, follow the link labeled cast of the movies. This time you have all the actors and actresses who appeared in Christopher Nolan's movies (Fig 3). You can keep going like this.

Set-based browsing is a very powerful and useful search capability. Without it, aggregating data would be tedious and laborious.
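Following a link from a set of entities to a set of entities can be sketched in a few lines of Python. The triples below are a small illustrative sample, and the property names `directed` and `cast` are assumptions rather than the service's actual vocabulary.

```python
from typing import List, Set, Tuple

# Toy (subject, property, object) triples; a partial, illustrative sample.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Christopher Nolan", "directed", "Inception"),
    ("Christopher Nolan", "directed", "The Dark Knight"),
    ("Inception", "cast", "Leonardo DiCaprio"),
    ("The Dark Knight", "cast", "Christian Bale"),
    ("The Dark Knight", "cast", "Heath Ledger"),
]


def follow(entities: Set[str], prop: str) -> Set[str]:
    """Follow one property from every entity in the set and merge the results."""
    return {obj for subj, p, obj in TRIPLES if subj in entities and p == prop}


movies = follow({"Christopher Nolan"}, "directed")  # 2nd column
actors = follow(movies, "cast")                     # 3rd column
```

Entity-based browsing is the special case where the input set has exactly one member; the aggregation over many entities is what the set-based mode adds.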

Fig 2. After following the movies he directed, the 2nd column shows all of Nolan's movies.

Fig 3. After following the cast of the movies, the 3rd column shows all the actors who appeared in Nolan's movies.

Object-Object relation search

Fig. 1. The search result of "Angelina Jolie Brad Pitt"

Given two objects in the query, the keyword interface of NaverLab Semantic Movie Search searches for the possible relationships between them. This is a very useful search capability. Have you ever been curious whether two actors or actresses appear in the same movie? Or whether a director and an actor worked on the same movie?

Let's run the query "Angelina Jolie Brad Pitt". The search result is in Fig. 1. There are two results. There is only one movie in which the two appear together as leading actor and leading actress: "Mr. & Mrs. Smith" (2005). There is another movie, "A Mighty Heart", in which Brad Pitt participated as a producer.

Searching for relations among more than two objects is also possible. Leonardo DiCaprio and Kate Winslet appear in Titanic (1997). The user finds that Kathy Bates also appears in the movie as a supporting actress, and gets curious whether there are other movies in which the three appear together. The query "Leonardo DiCaprio Kate Winslet Kathy Bates" has the answer: Revolutionary Road (2008) (Fig. 2).
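One simple way to picture relation search is to look for movies whose credits include every person in the query and to report each person's role. The Python sketch below uses toy credit data; the role labels not stated in the post, the data layout, and the function name `relations` are assumptions.

```python
from typing import Dict, List

# Toy credits: movie -> {person: role}. Only a few credits are listed.
CREDITS: Dict[str, Dict[str, str]] = {
    "Mr. & Mrs. Smith": {
        "Brad Pitt": "leading actor",
        "Angelina Jolie": "leading actress",
    },
    "A Mighty Heart": {
        "Brad Pitt": "producer",
        "Angelina Jolie": "leading actress",
    },
    "Titanic": {
        "Leonardo DiCaprio": "leading actor",
        "Kate Winslet": "leading actress",
        "Kathy Bates": "supporting actress",
    },
    "Revolutionary Road": {
        "Leonardo DiCaprio": "leading actor",
        "Kate Winslet": "leading actress",
        "Kathy Bates": "supporting actress",
    },
}


def relations(people: List[str]) -> Dict[str, Dict[str, str]]:
    """Return the movies that connect all the given people, with their roles."""
    return {
        movie: {person: credits[person] for person in people}
        for movie, credits in CREDITS.items()
        if all(person in credits for person in people)
    }
```

For example, `relations(["Angelina Jolie", "Brad Pitt"])` returns two movies with different role combinations, which corresponds to the two results in Fig. 1. The same function works unchanged for three or more people.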

Fig. 2. The search result of "Leonardo DiCaprio Kate Winslet Kathy Bates"

Friday, December 17, 2010

Keyword query interface for structured data

Fig 1. The search result of "Actress from France"

Are you curious which actors or actresses come from France? Which actors graduated from Stanford? Which actors or actresses went to the same school as you? Which movies won 2010 Academy Awards? There are endless possible questions about the facts in a movie database. All the answers to the above questions exist in a movie database like IMDB. However, it is not easy at all to search for the answers to all the possible questions.

Fig 2. The search result of "2010 Korean action movie"

The keyword interface of NaverLab Semantic Movie Search gives general users an easy way to search for answers in database data. It functions like a restricted natural language interface. It is not a full natural language interface, but it accepts a multi-keyword query in the form of a noun phrase. The properties may be expressed as adjectives or prepositional phrases attached to a noun.

Let's see another sample query, "2010 action Korean movie" (Fig 2). '2010', 'action' and 'Korean' are properties applied to 'movie'. The query intent is obvious: movies that were produced in 2010, belong to the action genre, and were made in Korea. Many interesting queries are possible by simply listing properties like this. This form of keyword query is easy for general users to understand and use.
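The idea of reading each keyword as a property constraint on the head noun can be sketched in Python. The `LEXICON` table, the placeholder movie records, and the function names are all hypothetical; the real system's vocabulary and parsing are not described in the post.

```python
from typing import Dict, List, Union

Value = Union[str, int]

# Hypothetical lexicon: keyword -> (property, value).
LEXICON: Dict[str, tuple] = {
    "movie": ("class", "Movie"),
    "action": ("genre", "action"),
    "korean": ("country", "Korea"),
    "2010": ("year", 2010),
}

# Placeholder records, not real movies.
MOVIES: List[Dict[str, Value]] = [
    {"title": "Movie A", "class": "Movie", "genre": "action", "country": "Korea", "year": 2010},
    {"title": "Movie B", "class": "Movie", "genre": "drama", "country": "Korea", "year": 2010},
    {"title": "Movie C", "class": "Movie", "genre": "action", "country": "Korea", "year": 2009},
]


def parse(query: str) -> Dict[str, Value]:
    """Turn each keyword into a property constraint; word order is irrelevant."""
    constraints: Dict[str, Value] = {}
    for word in query.lower().split():
        prop, value = LEXICON[word]  # unknown words raise KeyError in this sketch
        constraints[prop] = value
    return constraints


def search(query: str) -> List[Value]:
    """Return the titles of the records that satisfy every constraint."""
    constraints = parse(query)
    return [
        movie["title"]
        for movie in MOVIES
        if all(movie.get(prop) == value for prop, value in constraints.items())
    ]
```

Because the constraints are collected into a dictionary, "2010 action Korean movie" and "2010 Korean action movie" parse to the same query.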

Thursday, December 16, 2010

Semantic Search

Fig 1. The search result of a query "Brad Pitt movie"

NaverLab Semantic Movie Search is a semantic search engine. It doesn’t simply search for data that contain the query keywords but analyses the meaning of the query. It maps the query keywords onto the concepts it already knows in the movie domain and infers all the possible meanings.

Let's run a sample query. The search result is in Fig 1. The query is “Brad Pitt movie” or “Movies of Brad Pitt” (strictly speaking, this is the English translation of the actual Korean query). The search engine recognizes that “Brad Pitt” is an actor and that “movie” is an abstract concept in the movie domain. Then it searches for the possible relationships between the two concepts “Brad Pitt” and “movie”. It finds the following five meanings:

The movies in which Brad Pitt appears as a leading actor
The movies in which Brad Pitt appears as a supporting actor
The movies in which Brad Pitt appears in a minor role
Brad Pitt's debut movie
The movies in which Brad Pitt has participated as staff
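The two steps described above (map keywords to known concepts, then enumerate the relationships that can hold between them) can be sketched in Python as follows. The concept table, the relation list, and the function name are assumptions chosen to reproduce the five meanings listed; they do not reflect the engine's real schema.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from query keywords to classes in the movie domain.
CONCEPTS: Dict[str, str] = {
    "brad pitt": "Person",
    "movie": "Movie",
    "award": "Award",
}

# Hypothetical schema: (class, relationship label, class).
RELATIONS: List[Tuple[str, str, str]] = [
    ("Person", "appears as a leading actor in", "Movie"),
    ("Person", "appears as a supporting actor in", "Movie"),
    ("Person", "appears in a minor role in", "Movie"),
    ("Person", "made a debut in", "Movie"),
    ("Person", "participated as staff in", "Movie"),
    ("Movie", "won", "Award"),
]


def candidate_meanings(term_a: str, term_b: str) -> List[str]:
    """List every schema relationship that connects the two terms' classes."""
    class_a = CONCEPTS[term_a.lower()]
    class_b = CONCEPTS[term_b.lower()]
    return [
        label
        for subject, label, obj in RELATIONS
        if {subject, obj} == {class_a, class_b}
    ]
```

Each returned label stands for one candidate meaning, which would then be run as its own query against the data.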

Let's try another query. The search result of "Actors appearing in Titanic" is in Fig. 2. "Titanic" is mapped onto the Movie object "Titanic". "Actors" is mapped to the concept Actor. "appear" is recognized as the relationship between the Movie object and the Actor object. Then the engine finds the following three meanings:

The actors/actresses who appear in Titanic as a leading actor
The actors/actresses who appear in Titanic as a supporting actor
The actors/actresses who appear in Titanic in a minor role

Fig. 2. The search result of "Actors appearing in Titanic"

Monday, December 13, 2010

Visual Data Browser

Figure 1. Start browsing with "Gladiator"

One of the main motivations of NaverLab Semantic Movie Search is to provide a convenient vehicle for ceaseless navigation through the movie data space. During navigation one may come across a lot of unexpected discoveries. The visual browsing interface of NaverLab Semantic Movie Search is an excellent vehicle for fast and easy navigation. A single mouse click leads to a new search and instantly brings new results.

The following is a simple browsing scenario starting from “Gladiator”. There are six properties to select from: director, country, crew, award, actor, role. (Fig. 1)

Select “actor”, and all the actors in Gladiator are displayed in the 2nd column. (Fig. 2)
Click “Russell Crowe”. This selects a specific object. (Fig. 3) You may deselect the object by clicking the x-button and select another object.
Select “movie” to follow the movies that Russell Crowe appears in. All of Russell Crowe’s movies are then displayed in the 3rd column. (Fig. 4)
Select “actor”, and all the actors who appear in any of Russell Crowe’s movies are displayed in the 4th column. (Fig. 5)
Click “Leonardo DiCaprio”. A very interesting change then happens: the movies in the 3rd column disappear, except for the movies in which both Leonardo DiCaprio and Russell Crowe appear. (Fig. 6) This is one of the most powerful querying capabilities of our visual browser.

At most 4 columns are displayed, but you may keep navigating to the right. When a new column comes in, the left-most column automatically slides out to the left. The 10 most recent columns are kept, and the 4-column window may slide left or right within the 10-column range. So you may backtrack to a past column and take another path.
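The column behaviour (a 4-column window sliding over the 10 most recent columns) can be modelled with a bounded queue. The Python sketch below is an illustration only; the class and method names are assumptions, and branching onto a different path after backtracking is left out.

```python
from collections import deque
from typing import Deque, List


class ColumnHistory:
    """Keeps the most recent columns and exposes a sliding visible window."""

    def __init__(self, window: int = 4, capacity: int = 10) -> None:
        self.window = window
        self.columns: Deque[str] = deque(maxlen=capacity)  # oldest columns drop off
        self.start = 0  # index of the left-most visible column

    def push(self, column: str) -> None:
        """Add a new column on the right and show the newest window."""
        self.columns.append(column)
        self.start = max(0, len(self.columns) - self.window)

    def slide(self, offset: int) -> None:
        """Move the window left (negative) or right (positive), within bounds."""
        last_start = max(0, len(self.columns) - self.window)
        self.start = min(max(self.start + offset, 0), last_start)

    def visible(self) -> List[str]:
        """Return the columns currently shown on screen."""
        return list(self.columns)[self.start:self.start + self.window]
```

After twelve pushes only the last ten columns are retained, the newest four are visible, and `slide(-1)` reveals the previous column again.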

Figure 2. All the actors appearing in Gladiator, in the 2nd column.

Figure 3. Selecting Russell Crowe

Figure 4. All the movies that Russell Crowe appears in, in the 3rd column.

Figure 5. All the actors who appear in all of Russell Crowe’s movies, in the 4th column.

Figure 6. Selecting Leonardo DiCaprio, which results in displaying only the movies that both Russell Crowe and Leonardo DiCaprio appear in.

Sunday, December 12, 2010

Some keyword query examples of NaverLab Semantic Movie Search

To understand NaverLab Semantic Movie Search, it would be best to experience the search service yourself. Unfortunately the service currently supports only Korean. It is almost impossible for someone who doesn’t understand Korean to use the search service.

So the second best option is to show you some real search examples or scenarios of NaverLab Semantic Movie Search. To help you follow them, English translations of some of the Korean text are given.

Query: Movie of Angelina Jolie and Brad Pitt
Intent: All movies in which both Angelina Jolie and Brad Pitt appear or participate

Query: Cast of Inception
Intent: All actors appearing in the movie Inception

Query: Academy art direction award
Intent: All movies that won the art direction award in Academy history

NaverLab Semantic Movie Search

Figure 1. The keyword query interface of NaverLab Semantic Movie Search

Figure 2. The visual browsing interface of NaverLab Semantic Movie Search

Querying and browsing. Browsing and querying. Mindless navigation on the Web has become a kind of everyday entertainment. Unexpected discoveries are fun and sometimes informative. Especially in the movie domain, navigation should be great fun.

NaverLab Semantic Movie Search provides a useful vehicle for joyful navigation in the movie domain. The movie search service effectively interweaves the querying and browsing interfaces to bring a ceaseless navigation experience to users.

Ceaseless navigation on the Web is made possible by the graph structure of the Web and by giant search engines like Google. By analogy, to enable ceaseless navigation on structured data, a graph structure among data items and search engines tuned for structured data are required.

Based on this idea, an RDF graph is used as the graph structure among data items, and a new keyword interface on the RDF graph has been developed. A multi-keyword query is transformed into possible sub-graph patterns, and these query patterns are matched against the RDF graph. The search algorithm is basically language independent. The keyword interface (Fig 1) performs well and acts like a restricted natural language interface.
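To make the pattern matching step concrete, here is a minimal Python sketch that matches sub-graph patterns (triples containing `?variables`) against a tiny in-memory set of triples. It is a generic conjunctive matcher written for illustration, not the service's algorithm; the property names `leadingActorIn` and `producerOf` and all function names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy RDF-like graph; property names are hypothetical.
TRIPLES: List[Triple] = [
    ("Mr. & Mrs. Smith", "type", "Movie"),
    ("A Mighty Heart", "type", "Movie"),
    ("Brad Pitt", "leadingActorIn", "Mr. & Mrs. Smith"),
    ("Brad Pitt", "producerOf", "A Mighty Heart"),
]
PROPERTIES = ["leadingActorIn", "producerOf"]


def unify(pattern: Triple, triple: Triple, bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend the bindings so the pattern equals the triple, or return None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant does not match
    return new


def match(patterns: List[Triple], bindings: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)
        return
    for triple in TRIPLES:
        extended = unify(patterns[0], triple, bindings)
        if extended is not None:
            yield from match(patterns[1:], extended)


def answer(person: str) -> Dict[str, List[str]]:
    """Try one sub-graph pattern per property, as for a '<person> movie' query."""
    results: Dict[str, List[str]] = {}
    for prop in PROPERTIES:
        pattern = [(person, prop, "?movie"), ("?movie", "type", "Movie")]
        movies = sorted(b["?movie"] for b in match(pattern))
        if movies:
            results[prop] = movies
    return results
```

Each property yields a separate candidate pattern, so a single keyword query can produce several result groups, one per meaning that actually has answers in the graph.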

The visual browser part of the interface (Fig. 2) is unique in design. It shows the 5 most recently visited nodes of the navigation path, so it is possible to go back to previous nodes in the navigation history and take another path. More importantly, it makes it possible to compose a complex query and run it instantly.

The visual interface provides two browsing modes: class-level and object-level browsing. It shows the class browser in the top part of the window and the object browser in the bottom part. You may close the class-level browser to gain more space for the object display. Together, the class browser and the object browser provide powerful querying and browsing capabilities for structured data.

NaverLab Semantic Movie Search is an experimental service in Naver Lab. Unfortunately the service currently supports only the Korean language. But you may type a movie title or a person's name in English in the search box. Of course you need Korean fonts installed on your PC. You also need MS Silverlight installed.

I plan to write about Naver Semantic Movie Search in more detail in upcoming posts.
