Invented by Dwayne E. Bowman, Ruben E. Ortega, Michael L. Hamrick, Joel R. Spiegel, Timothy R. Kohn, Amazon Technologies Inc
The Amazon Technologies Inc invention works as follows: A search engine is disclosed that suggests related terms to help the user refine a search. The related terms are generated from query term correlation data, which reflects how frequently particular terms have appeared together in the same query. The correlation data is generated and stored in a table by an offline process that parses a query log file. The table is periodically regenerated from the most recent query submissions (e.g., the last two weeks), so it strongly reflects users’ current preferences. For each related term, the user can select a hyperlink to submit a correspondingly modified query. In one embodiment, the related terms are selected and added to the table in a way that ensures the modified queries do not produce NULL query results.
Background for Refining searches by suggesting correlated terms from previous searches
This invention is related to query processing and, more specifically, to techniques that facilitate the refinement of search queries.
With the growing popularity of the Internet and the World Wide Web, online users increasingly rely on search engines to find the information they need. A search engine lets users search a large collection, such as a set of web pages, for the few items that are relevant. For example, some web index sites allow users to search among well-known web sites for a particular website. Many online merchants, such as booksellers, allow users to search their entire product range. Other online services, such as Lexis, provide similar search capabilities.
To perform a search, a user submits a query that contains one or more query terms. The query can also identify, explicitly or implicitly, a field or segment of a record to be searched, such as the title, author, or subject classification. A user of an online bookstore, for example, may submit a query containing terms that the user believes appear in the title of a book. The query server program of the search engine processes the query and identifies the items that match its terms. The set of items identified by the query server program is the query result. In the online bookstore example, the query result is a list of books whose titles contain some or all of the query terms. In the web index example, the query result is a list of documents or web sites. In web-based implementations, the query result is usually presented to the user as a list of the located items.
If the scope of the search is broad, the query result may contain thousands, or even millions, of items. When the user is searching for one item or a small set of items, conventional methods of ordering the query result often fail to place the desired items near the top of the list. The user must then read through many items before finding the desired one. As part of a “search refinement” process, certain search engines, such as Excite, suggest related query terms to the user. The user can refine the query by selecting related query terms that more accurately reflect what the user is looking for. These search engines generate the related query terms from the content of the query result, for example by identifying the terms used most frequently within the located documents. If a user submits a query for the term “FOOD,” for instance, the query result may include several thousand items. The search engine may then examine all or some of these items and offer the user query terms drawn from their content, such as “RESTAURANTS,” “RECIPES,” and “FDA,” with which to refine the query.
The related query terms are often presented to the user together with corresponding checkboxes that the user can selectively mark to add terms to the query. In some implementations, the user can instead select the related query terms from drop-down menus on the query results page. The user can then add the selected terms to the query and resubmit the modified query. This technique allows the user to narrow the query result to a manageable list of mostly relevant items.
A problem with existing techniques for generating related query terms is that the suggested terms are often of little or no value to the search refinement process. Adding a related term can sometimes produce a NULL query result. In addition, parsing the query result to identify frequently used terms can consume significant processor resources and increase the time the user waits to see the query result. These and other shortcomings of the existing techniques can frustrate users.
The present invention provides a search refinement system and method for generating and displaying related query terms (“related terms”). According to the invention, the related terms are generated using query term correlation data that is based on historical query submissions. The correlation data is preferably based on the frequency with which specific terms have historically been submitted together in the same query. Using this historical query data tends to produce related terms that other users have frequently combined with the submitted query terms, which increases the likelihood that the related terms will be helpful to the search refinement process. To further increase that likelihood, the correlation data is preferably generated only from historical queries that produced at least one match.
According to one aspect of the present invention, the correlation data is stored in a correlation data structure (a table, a database, etc.) that is used to look up related terms in response to query submissions. The data structure can be generated by an off-line process that parses a query log, or it can be generated in real time as users submit queries. In one embodiment, the data structure is regenerated periodically (e.g., once per day) from the most recent query submissions (e.g., the last M days of entries in the query log), so that it strongly reflects users’ current preferences. As a result, the related terms suggested by the search engine strongly reflect users’ current tastes. In the context of a search engine for an online merchant, for example, the search engine tends to suggest terms related to the best-selling items.
In a preferred implementation, each entry in the data structure consists of a key term and a related terms list. Each related terms list contains the terms that have historically appeared together with the respective key term (in the same query) with the highest frequency. The data structure thus provides an efficient way to look up the related terms for any given query term.
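The key term / related terms list structure described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_correlation_table`, the `(query, match_count)` log format, and the `top_n` cutoff are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_correlation_table(query_log, top_n=3):
    """Map each key term to the terms that most often shared a query with it.

    query_log is an iterable of (query_string, match_count) pairs; this
    log format is hypothetical and chosen only for the example.
    """
    counts = defaultdict(Counter)
    for query, match_count in query_log:
        if match_count < 1:
            continue  # use only queries that produced at least one match
        terms = sorted(set(query.lower().split()))
        # Count every ordered pair so each term gets its own table entry.
        for key, other in permutations(terms, 2):
            counts[key][other] += 1

    table = {}
    for key, counter in counts.items():
        # Highest frequency first; ties broken alphabetically for determinism.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        table[key] = [term for term, _ in ranked[:top_n]]
    return table


log = [
    ("walking dog", 12),
    ("dog training", 40),
    ("dog training books", 7),
    ("dog xyzzy", 0),  # no matches, so it contributes nothing
]
table = build_correlation_table(log)
print(table["dog"])
```

Keeping only the top few terms per key keeps each lookup to a single dictionary access at query time, at the cost of dropping rarely co-occurring terms.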
When a query is submitted, the correlation data structure is used to look up the related terms list for each term in the query. If this step yields multiple related terms lists (as with a query that contains multiple terms), the lists are preferably combined. This can be done by taking the intersection of the lists, that is, by deleting the terms that do not appear on all lists. The remaining related terms are those that have appeared, in at least one previous successful query submission, in conjunction with each term in the current query. Assuming items are not deleted from the database, any one of these terms can be added to the current query with a guarantee that the modified query will not return a NULL query result. To take advantage of this feature, the related terms are preferably presented through a user interface that requires the user to add no more than one related term per query submission. In other embodiments, the related terms are selected and displayed without guaranteeing that the modified query will be successful.
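The list-combining step can be illustrated with a short sketch. It assumes a table shaped like a plain dictionary from key term to related terms list; the function name `suggest_related_terms` and the sample data are hypothetical. The sketch shows only the intersection itself and does not attempt to demonstrate the non-NULL guarantee, which depends on how the table was built.

```python
def suggest_related_terms(query, table):
    """Return the related terms that appear on the list of every query term."""
    terms = query.lower().split()
    lists = [table.get(term, []) for term in terms]
    if not lists:
        return []
    # Intersect all lists; a term missing from any one list is dropped.
    common = set(lists[0]).intersection(*(set(lst) for lst in lists[1:]))
    # Preserve the ranking of the first list and skip terms already in the query.
    return [t for t in lists[0] if t in common and t not in terms]


# Hypothetical table contents, for illustration only.
example_table = {
    "dog": ["training", "books", "walking"],
    "training": ["dog", "books"],
}

single = suggest_related_terms("dog", example_table)
multi = suggest_related_terms("dog training", example_table)
unknown = suggest_related_terms("dog unicorn", example_table)
print(single, multi, unknown)
```

A query term with no table entry contributes an empty list, so the intersection is empty and no suggestions are offered, which is the conservative choice when success is meant to be guaranteed.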
Because the related terms are looked up from previously generated correlation data, the system can identify them at query time without having to parse documents or correlate terms.
The present invention is a search refinement system and method for generating related query terms (“related terms”) based on a history of queries submitted to a search engine. The system generates query term correlation data that reflects how often specific terms occur together in the same query. The system then uses this correlation data, in combination with the query terms entered by the user, to suggest additional query terms for refining the search. Using historical query data tends to produce related terms that other users have frequently combined with the submitted query terms, increasing the likelihood that the related terms will help refine the search. To further increase that likelihood, the correlation data is preferably generated only from historical queries that produced at least one match.
In the preferred embodiment, the query term correlation data is regenerated periodically from recent query submissions, for example from the last M days of entries in a query log, so that it reflects users’ current tastes. As a result, the search engine tends to suggest related terms that correspond to the items searched for most often during the relevant period. In the context of a search engine for an online merchant, for instance, the search engine tends to suggest terms related to the best-selling items. In one embodiment, the related terms are generated and presented to the user using a technique that guarantees the modified query will not produce a NULL query result.
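Restricting regeneration to the last M days can be sketched as a simple window over per-day logs. The layout used here (a dictionary from date to that day's query strings) and the name `recent_queries` are assumptions for illustration; the source does not specify a log format.

```python
from datetime import date, timedelta


def recent_queries(daily_logs, today, m_days):
    """Collect queries from the M most recent days, oldest day first.

    daily_logs maps a date to the list of queries logged on that day
    (a hypothetical layout chosen for this example).
    """
    cutoff = today - timedelta(days=m_days)
    selected = []
    for day in sorted(daily_logs):
        if cutoff < day <= today:
            selected.extend(daily_logs[day])
    return selected


logs = {
    date(2000, 1, 12): ["old query"],
    date(2000, 1, 14): ["dog training"],
    date(2000, 1, 15): ["walking dog"],
}
window = recent_queries(logs, today=date(2000, 1, 15), m_days=2)
print(window)
```

Rebuilding the correlation data from such a window on a schedule means older queries age out automatically, which is what lets the suggestions track current user interest.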
The search refinement method of the invention can be implemented as part of an Internet site, a web page, an online services network, a document retrieval system, or any other type of computer system that provides searching capabilities to users. The method can also be combined with other methods of suggesting related terms, such as those that process the content of located documents.
The preferred web-based implementation will now be described with reference to FIGS. 1-9. By way of example, the system described here is a search engine used by Amazon.com Inc. to help customers find items (e.g., books, CDs, etc.) in an online catalog of products. Throughout the description, reference will be made to various implementation-specific details of the Amazon.com implementation. These details are included to illustrate the preferred embodiment of the present invention and not to limit its scope. The scope of the invention is defined by the appended claims.
I. Overview of Web Site and Search Engine
As is well known in the field of Internet commerce, the Amazon.com website includes functionality that allows users to search and browse an online catalog of books, music, and other items, and to make purchases via the Internet 120. Because the catalog contains millions of items, it is important that the site help users find the items they want.
As shown in FIG. The web site 130 contains a web server application 131 (“web server”) that processes requests from users’ computers 110 over the Internet 120. These requests include the search queries that users submit against the online catalog. The web server 131 records user transactions, including the queries users submit, in a query log 135. In the embodiment shown in FIG. The query log 135 is made up of a series of daily query log files 135(1)-135(M), each representing one day of transactions.