Alphabet – Darrell Anderson, Paul Buchheit, Alexander Paul Carobus, Yingwei Cui, Jeffrey A. Dean, Georges R. Harik, Deepak Jindal, Narayanan Shivakumar, Google LLC, Google Technology Holdings LLC

Abstract for “Serving ads based on content”

Advertisers are permitted to place targeted ads on any page on the Web (or on any other document of any type). The invention may do this by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to the content, and/or (iii) combining the content with the ads determined to be relevant to it.

Background for “Serving ads based on content”

The present invention concerns advertising. In particular, it concerns expanding the opportunities for advertisers to target their ads.

Advertising using traditional media such as radio, television and newspapers is well-known. Even with accurate demographic data and reasonable assumptions about the audience of different media outlets, advertisers realize that a lot of their advertising budget is wasted. It is also very difficult to find and eliminate this waste.

Advertising over interactive media has become more popular in recent years. Advertisers have come to value the Internet, and the services it offers, as a potentially powerful way to promote their products.

Advertisers use a variety of strategies to maximize the value of their advertising. In one strategy, advertisers use popular presences or means for providing interactive media or services (referred to as “Web sites” in the specification without loss of generality) as conduits for reaching a large audience. For example, an advertiser might place ads on the home page of the New York Times Web site or the USA Today Web site. In another strategy, an advertiser might target specific niche audiences to increase the chance of a positive response. For example, an agency that promotes tourism in Costa Rica’s rainforest may place ads in the ecotourism-travel directory of the Yahoo Web site. Advertisers usually determine this targeting manually.

Regardless of the strategy, Web site-based ads (also known as “Web ads”) are often presented to their audience as “banner ads”, that is, rectangular boxes that include graphic components. When a member of the advertising audience (referred to as a “viewer” or “user” in the specification without loss of generality) selects one of these banner ads by clicking on it, embedded hypertext links typically direct the viewer to the advertiser’s Web site. This process, in which the viewer selects an ad, is commonly referred to as a “click-through” (a term intended to cover any user selection). The ratio of the number of click-throughs to the number of impressions of the ad (i.e., the number of times the ad is displayed) is commonly referred to as the “click-through rate” of the ad. A “conversion” occurs when a user consummates a transaction related to a previously served ad. What constitutes a conversion can be defined in many ways. For example, a conversion could occur when a user clicks on an ad, is directed to the advertiser’s Web site, and makes a purchase there before leaving. Alternatively, a conversion could be defined as a user being shown an ad and then making a purchase on the advertiser’s Web site within a specified time, such as seven days. Many other definitions are possible. The ratio of the number of conversions to the number of impressions of the ad is commonly referred to as the “conversion rate”. If a conversion is defined as something that can occur within a predetermined time after the ad was served, one possible definition of the conversion rate considers only ads that were served more than that predetermined time in the past.
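The two ratios described above can be illustrated with a minimal Python sketch. The class and function names are hypothetical and chosen only for illustration, and the windowed variant is just one possible reading of a conversion rate that counts only ads served longer ago than the conversion window.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class AdStats:
    """Hypothetical per-ad counters; the field names are illustrative only."""
    impressions: int
    clicks: int
    conversions: int


def click_through_rate(stats: AdStats) -> float:
    """Ratio of click-throughs to impressions (0.0 if the ad was never shown)."""
    return stats.clicks / stats.impressions if stats.impressions else 0.0


def conversion_rate(stats: AdStats) -> float:
    """Ratio of conversions to impressions (0.0 if the ad was never shown)."""
    return stats.conversions / stats.impressions if stats.impressions else 0.0


def windowed_conversion_rate(
    impressions: Iterable[Tuple[float, bool]], now: float, window: float
) -> float:
    """Conversion rate over impressions served more than `window` ago.

    Each impression is a (served_at, converted) pair. Newer impressions are
    skipped because they could still convert within the window.
    """
    eligible = [converted for served_at, converted in impressions
                if now - served_at > window]
    if not eligible:
        return 0.0
    return sum(1 for converted in eligible if converted) / len(eligible)
```

For instance, an ad with 1,000 impressions, 25 click-throughs and 5 conversions would have a click-through rate of 2.5% and a conversion rate of 0.5% under these definitions.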

Web site-based advertising promises a lot, but existing approaches still have many problems. Advertisers are able to reach large audiences, but they are often dissatisfied with their return on investment. Some have tried to improve ad performance by tracking the online behavior of users, but this has raised privacy concerns.

Similarly, the hosts of Web sites on which the ads are presented (referred to as “Web site hosts” or “ad consumers”) face the challenge of maximizing ad revenue without impairing their users’ experience. Some Web site hosts have chosen to place advertising revenue above the interests of users. One example is “Overture.com”, which hosts a so-called “search engine” service that returns advertisements disguised as “search results” in response to user queries. Advertisers can pay to position an ad for their Web site (or a target Web site) higher on the Overture.com list. When the advertiser pays only if a user clicks on the ad (i.e., cost-per-click), the advertiser lacks an incentive to target its ads effectively, since a poorly targeted ad will not be clicked and therefore will not require payment. Consequently, ads with a high cost-per-click appear at or near the top of the list, but do not necessarily generate revenue for the ad publisher because viewers don’t click on them. Ads that viewers might click on are often further down the list or not on it at all, so the relevancy of the ads is compromised.

Google and other search engines have made it possible for advertisers to target their ads so that they are displayed with search result pages responsive to queries relevant to the ad. While search result pages give advertisers a great opportunity to target their ads to a more receptive audience, they account for only a fraction of the page views on the World Wide Web.

Some have tried to manually map Web pages to one or more categories using a category taxonomy. This manual classification of Web pages has many disadvantages. It is time-consuming and costly. It can also lead to inconsistent results because different classifiers apply the categories subjectively. Finally, the large number of Web pages and their frequent content changes make manual classification difficult.

It would be helpful to allow advertisers to place targeted ads on any page of the Web (or any other document of any type), rather than just on search result pages. Such a scheme should avoid manual classification, which often leads to insurmountable problems.

The present invention allows advertisers to place targeted ads on any page of the Web (or any other document of any type). This may be achieved by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to the content, and/or (iii) combining the content with the ads determined to be relevant to it.

The invention may include novel methods, apparatus, message formats and data structures that allow advertisers to place targeted, content-relevant ads on any page of the Web (or any other document of any type). The description is presented to enable one skilled in the art to make and use the invention, and is provided in the context of particular applications and their requirements. Various modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles described below can be applied to other embodiments and applications. The invention is not intended to be limited to the disclosed embodiments. Accordingly, the inventors regard their invention as any patentable subject matter described.

The advertising environment shown in the figures may include an ad entry, maintenance and delivery system 120. Advertisers 110 can directly or indirectly enter, maintain and track ad information in the system 120. The ads can take the form of text ads, banner ads, image ads, audio ads, video ads, or ads that combine one or more of these components. Ads may also include embedded information such as links, meta information and/or machine-executable instructions. Ad consumers 130 can submit requests for ads to the system 120, accept ads responsive to their requests from it, and provide usage information to it. Other entities, although not shown, may also provide usage information to the system 120 (e.g., whether a click-through or conversion related to the ad occurred). This usage information could include observed or measured user behavior in relation to ads that were served.

One example of an ad consumer 130 is a general content server that receives requests for content (e.g., articles, discussion threads, music, etc.). The content server processes the request and retrieves the requested content. The content server may then submit a request for ads to the system 120. This request may include the number of ads desired. It may also include information about the requested content. This information could include the content itself (e.g., a page), a category corresponding to the content or to the content request (e.g., arts, business, computers, arts-movies, arts-music, etc.), part or all of the content request, the content age, the content type (e.g., text, graphics, audio, mixed media, etc.), geolocation information, etc.

The content server may combine the requested content with one or more of the advertisements provided by the system 120. The combined information, including the content and the advertisements, is then forwarded to the end user who requested the content, for presentation to the viewer. Finally, the content server may transmit information about the ads, including when, where and how they were rendered (e.g., position, click-through or not, impression time, impression date, size, conversion or not, etc.), back to the system 120. Alternatively, or in addition, such information may be provided back to the system 120 by some other means.

Another example of an ad consumer 130 is a search engine. A search engine may receive queries for search results and, in response, retrieve relevant search results. An exemplary search engine is described in the article S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Seventh International World Wide Web Conference, Brisbane, Australia, and in U.S. Pat. No. 6,285,999 (both incorporated by reference). Such search results can include lists of Web page titles, snippets of text extracted from those Web pages, and hypertext links to those Web pages. They may also be grouped into a predetermined number of (e.g., ten) search results.

The search engine may submit a request for ads to the system 120. The request may include the number of ads desired. This number may depend on the search results, the amount of screen or page space occupied by the search results, the size and shape of the ads, etc. In one embodiment, the number of desired ads will be from one to ten, and preferably from three to five. The request for ads can also include the query (as entered or parsed), information based on the query (such as geolocation information, whether the query came from an affiliate, and an identifier of such an affiliate), and/or information associated with or based on the search results. This information could include, for instance, identifiers related to the search results (e.g., document identifiers or “docIDs”), scores related to the search results (e.g., information retrieval (“IR”) scores such as dot products of feature vectors corresponding to a query and a document, Page Rank scores, and/or combinations of IR scores and Page Rank scores), snippets of text extracted from identified documents (e.g., Web pages), the full text of identified documents, feature vectors of identified documents, etc.

The search engine may combine the search results with one or more of the advertisements provided by the system 120. The combined information, including the search results and the advertisement(s), is then forwarded to the user who submitted the query, for presentation. Preferably, the search results are kept distinct from the advertisements, so as not to confuse the user between paid advertisements and presumably neutral search results.

Finally, the search engine may transmit information about the ad and when, where and/or how it was rendered (e.g., position, click-through or not, impression time, impression date, size, conversion or not, etc.) back to the system 120. Alternatively, or in addition, such information may be provided back to the system 120 by some other means.

The figures also show an exemplary ad system 120′. The exemplary ad system 120′ may include an inventory system 210 and may store ad information 205 and usage information 245. The exemplary system 120′ may support ad information entry and management operations 215, campaign (e.g., targeting) assistance operations 220, optimization operations 240, relative presentation attribute assignment (e.g., position ordering) operations 250, fraud detection operations 255, and results interface operations 260.

Advertisers 110 may interface with the system 120′ via the ad information entry and management operations, as indicated by interface 216. Ad consumers 130 may interface with the system 120′, as indicated by interface 231. Ad consumers 130 and/or other entities (not shown) may also interface with the system 120′ via the results interface operations, as indicated by interface 261.

An advertising program may include information about accounts, campaigns, creatives and targeting. The term “account” refers to information about a particular advertiser (e.g., a unique email address, a password, billing information, etc.). A “campaign” or “ad campaign” refers to one or more groups of one or more advertisements. It may include a start date, an end date, budget information, geo-targeting information and syndication information. For example, Honda might have one advertising campaign for its automobile line and a separate campaign for its motorcycle line. A campaign may have one or more ad groups, each containing one or more ads. Each ad group can include a set of keywords and a maximum cost bid (cost per click-through, cost per conversion, etc.). Alternatively, or in addition, each ad group may include an average cost bid (e.g., average cost per click-through, average cost per conversion, etc.). A single maximum cost bid and/or a single average cost bid may also be associated with a keyword. As stated, each ad group can have one or more ads or “creatives” (that is, the ad content that is eventually rendered to the end user). The ad information 205 may contain more or less information and can be organized in many different ways.
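The account, campaign, ad group and creative hierarchy described above could be modeled roughly as follows. This is only a sketch: every class and field name is an assumption made for illustration, and the text itself notes that ad information can be organized in many different ways.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Creative:
    """Ad content that is eventually rendered to the end user."""
    title: str
    text: str
    link: str


@dataclass
class AdGroup:
    """A set of keywords with cost bids and one or more creatives."""
    keywords: List[str]
    max_cost_bid: float                      # e.g. maximum cost per click-through
    average_cost_bid: Optional[float] = None
    creatives: List[Creative] = field(default_factory=list)


@dataclass
class Campaign:
    """One or more ad groups plus scheduling and budget information."""
    name: str
    start_date: str
    end_date: str
    budget: float
    ad_groups: List[AdGroup] = field(default_factory=list)


@dataclass
class Account:
    """Information about a particular advertiser."""
    email: str
    campaigns: List[Campaign] = field(default_factory=list)


def count_creatives(account: Account) -> int:
    """Total number of creatives across all campaigns and ad groups."""
    return sum(len(group.creatives)
               for campaign in account.campaigns
               for group in campaign.ad_groups)
```

Nesting the lists keeps the one-to-many relationships explicit: an account owns campaigns, a campaign owns ad groups, and an ad group owns creatives.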

Ad information 205 can be entered and managed through the ad information entry and management operations 215. Campaign (e.g., targeting) assistance operations 220 may be used to help advertisers 110 create effective ad campaigns. The campaign assistance operations 220 may use information provided by the inventory system 210, which tracks all possible ad impressions, ad impressions already reserved, and keywords available. Ad serving operations 230 can service requests for ads from ad consumers 130. To determine candidate ads for a given request, the ad serving operations 230 may use relevancy determination operations 235. The ad serving operations 230 may then use optimization operations 240 to select one or more of the candidate ads, and relative presentation attribute assignment operations 250 to order the ads to be returned. The fraud detection operations 255 may be used to reduce fraudulent use of the advertising system (e.g., by advertisers), such as the use of stolen credit card numbers. The results interface operations 260 may be used to accept result information (from the ad consumers 130 or another entity) about an ad actually served, such as whether a click-through occurred and whether a conversion occurred (e.g., whether an item or service was purchased within a predetermined period after the ad was rendered). Such results information may be accepted at interface 261. It may include information identifying the ad, the time it was served, and the associated result.

Online ads, such as those used in the exemplary systems discussed above, may have various features. Such features may be specified by an application and/or an advertiser, and are referred to as “ad features” below. For example, in the case of a text ad, ad features could include a title, ad text, executable code and an embedded link. In the case of an image ad, ad features can additionally include images. Depending on the type of online ad, ad features may include one or more of text, a link, an audio file, a video file, an image file, executable code, embedded information, etc. When an online ad is served, one or more parameters can be used to describe how, when and/or where the ad was served. These parameters are referred to as “serving parameters” below. Serving parameters could include, for example, one or more of the following: features of (including information on) a page on which the ad is served, including one or more topics or concepts associated with the page, information or content located on or within the page, and information about the page such as the host of the page (e.g., AOL, Yahoo, etc.), the importance of the page (as measured by, e.g., traffic, freshness, or the quantity and quality of links to the page), the location of the page within a directory structure, etc.; a search query or search results associated with the serving of the ad; a user characteristic (e.g., geographic location, language used, previous page views, previous behavior); and an affiliate site (e.g., America Online, Google or Yahoo) that initiated the request in response to which the ad is served. Many other serving parameters can be used in the context of the invention.

Although serving parameters may be extrinsic to ad features, they can be associated with an ad as conditions or constraints. When used in this way, such serving parameters are referred to simply as “serving constraints”. For example, in some systems, an advertiser may be able to specify that its ad is to be served only on weekdays, no lower than a certain position, and only to users in a particular location. As another example, an advertiser might specify that its ad is to be served only if a page or search query includes certain keywords or phrases.
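The way serving constraints gate an ad can be sketched in Python. The structure below is an assumption made for illustration (the field names, the keyword-matching rule and the position numbering, where 1 is the top position, are not taken from the specification).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set


@dataclass
class ServingConstraints:
    """Hypothetical advertiser-specified constraints on serving an ad."""
    weekdays_only: bool = False
    lowest_position: Optional[int] = None    # 1 is the top; None means any position
    locations: Set[str] = field(default_factory=set)          # empty means anywhere
    required_keywords: Set[str] = field(default_factory=set)  # lowercase; empty means none


def may_serve(constraints: ServingConstraints, when: datetime, position: int,
              user_location: str, page_text: str) -> bool:
    """Return True only if every specified constraint is satisfied."""
    if constraints.weekdays_only and when.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    if constraints.lowest_position is not None and position > constraints.lowest_position:
        return False
    if constraints.locations and user_location not in constraints.locations:
        return False
    if constraints.required_keywords:
        words = set(page_text.lower().split())
        # Assumption: at least one required keyword must appear in the page or query.
        if not constraints.required_keywords & words:
            return False
    return True
```

Each check is skipped when the advertiser leaves the corresponding constraint unset, so an ad with no constraints can always be served.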

“Ad information” may include any combination of ad features, ad serving constraints, information derivable from ad features or ad serving constraints (referred to as “ad derived information”), and/or information related to the ad (referred to as “ad related information”).

A “document” is to be interpreted broadly to include any machine-readable or machine-storable work product. A document can be a file, a combination of files, or one or more files with embedded links to other files. The files can be of any type, including text, audio, image and video. Parts of a document that are to be rendered to an end user can be thought of as the “content” of the document. Ad spots in a document may be defined by embedded information or instructions. In the context of the Internet, a common document is a Web page. Web pages often include content and may include embedded information (such as meta information, hyperlinks, etc.) and/or embedded instructions (such as Javascript, etc.). In many cases, a document has a unique, addressable storage location and can therefore be uniquely identified by it. A universal resource locator (URL) is a unique address used to access information on the Internet.

“Document information” may include any information included in the document, information derivable from information included in the document (referred to as “document derived information”), and/or information related to the document (referred to as “document related information”), as well as extensions of such information (e.g., information derived from related information). An example of document derived information is a classification based on the textual content of a document. Examples of document related information include information from other documents with links to the instant document, and information from other documents to which the instant document links.

Content from a document may be rendered on a content rendering application or device. Examples of content rendering applications include an Internet browser (e.g., Explorer or Netscape), a media player (e.g., an MP3 player, a Realnetworks streaming audio player, etc.), a viewer (e.g., an Adobe Acrobat PDF reader), etc.

Referring back to FIG. 4, recall that document relevance information 434 is determined from document information. This section describes a variety of ways to obtain document information. The examples are given in the context of Web pages identified by URLs, but the invention is not limited to such cases.

There are several ways to obtain document information (e.g., Web page contents). First, a third party such as a Web page host or advertiser may provide the document information. This could be information contained within the document itself, or a reference (e.g., a URL) through which such information can be accessed. Second, document information (e.g., Web page contents) may be obtained during an ad request. For example, an end user’s content rendering application (e.g., a browser) might be instructed to send Web page contents as part of an ad request. Alternatively, the document information could be fetched as part of content-relevant ad serving operations 410. Third, document information (e.g., Web page contents) can be pre-fetched (i.e., obtained before a specific request) for use in future content-relevant ad targeting. Other methods of obtaining document information are also available, such as those described in a U.S. patent application filed Mar. 29, 2002; in U.S. patent application Ser. No. 09/734,886, entitled “HYPERTEXT BROWSER ASSISTANT,” filed Dec. 13, 2000; and in a further U.S. patent application, also filed Dec. 13, 2000. Each of these applications is incorporated herein by reference.

FIG. 5 illustrates operations for obtaining document information. Content-relevant ad serving operations 510 may include document information (or ad information) request distribution and reply combination operations 515. (Note that ad information and ad relevance information, as well as operations such as relevance information extraction/generation operations 412, ad-document relevance information comparison operations 414 and ad(s)-document association operations 416, are not shown in FIG. 5, to simplify the figure.) The operations 515 can be used when multiple sources of (pre-fetched) document information 520 (or ad information) are to be used. These sources may include one or more of cached document information 530, a larger set of “untargeted” document information 540, and a smaller set of “targeted” document information 550. A crawl (or some other method of retrieval) of targeted documents will generally be “deeper” than that of untargeted documents. Requests for document (or ad) information move down the double-arrow lines in FIG. 5, and responses to such requests move up them.

Information for documents with relatively static content can be fetched in advance (pre-fetched), or may be requested in real time. Information for documents with more dynamic content may be better retrieved in real time, in response to a request.

The cached information 530 could include information about documents that were requested frequently or recently.

The larger set of “untargeted” document information 540 may have been built, and may be updated, using a search engine crawler 560. An exemplary search engine crawler 560 is described in U.S. Pat. No. 6,285,999, which has been incorporated by reference. Although information about many documents may be available, information about one particular document might not be. In a so-called non-blocking implementation, the content-relevant ad serving operations do not wait to obtain document information that has not been previously obtained or is not presently stored. In that case, a request for ads for such a document might be served with so-called “house ads” (ads for the ad server, ads shown at no cost, or ads that do not generate revenue), or with random ads or generally well-performing ads if ad revenue depends on a user action (e.g., a click-through or conversion). Note that the performance statistics of random ads and generally well-performing ads should not be affected when they are served in this untargeted manner. Alternatively, when a request is made for ads for a document whose document information is not available, the document information may be estimated. This estimate could be made by looking at the document’s position within a directory structure and then using information from the directory (category), or from other documents of the same, similar or higher (narrower) classification. Alternatively, logs of search queries that led to search results including the document, or traffic to the document, could be examined to identify alternative documents. In such cases, the Web site hosting the document may also be contacted and asked to provide the information.

The smaller set of “targeted” document information 550 can be obtained and maintained in a variety of ways. Targeted document information retrieval (e.g., crawling) operations 580 can be used to crawl specific content provider Web sites, such as partner Web sites 588. Some or all of these partner Web sites may have been entered via content provider interface operations 585. A content provider such as a Web publisher can also provide document information (e.g., Web pages, or URLs of newly added Web pages) directly via the content provider interface operations 585.

A self-service syndication method allows content providers, such as publishers, to sign up to place content-relevant ads onto their Websites through an easy, standard and fast process. This self-service syndication method can support any of the following:

FIG. 6 is a flow chart of an exemplary method 600 that can be used to obtain document information as part of content-relevant ad serving operations in accordance with the principles of the invention. A document identifier (e.g., a URL) is accepted. (Block 610) It is then determined whether document relevance information is available. (Decision block 620) If so (referred to as a “hit”), the ad serving process continues using that document relevance information. If, on the other hand, document relevance information is not available, it is determined whether document information is available (e.g., in the cache 530, main repository 540 and/or GRAS repository 550). (Block 630) If so, the document relevance information is extracted from the document information (Block 640), and the ad serving process proceeds. If not (referred to as a “miss” below), it may be determined whether the content provider (e.g., a partner) has documents that are easily retrieved (e.g., crawled). (Block 644) A Web site can be considered difficult to crawl if it is dynamically assembled, changes frequently (e.g., news, stocks), and/or has multiple alternatives (e.g., people finders). If the content provider is difficult to crawl and has embedded scripts or links, executable instructions (e.g., Javascript) can be used to obtain the document information (Block 645), and the method 600 continues at block 640. If the content provider is easy to crawl, it is determined whether the content-relevant ad server uses blocking or non-blocking ad serving. (Decision block 650) If blocking ad serving is used, the document information is retrieved immediately (Block 660), and the method 600 continues at block 640. If non-blocking ad serving is used, the document identifier (e.g., URL) can be stored (e.g., in the log of unfilled requests 570) for later retrieval, and alternative ad serving may be done. (Block 675) Alternatively, if document relevance information is not available, a “best guess” of it, as disclosed previously, may be used.
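The hit, miss, blocking and non-blocking branches of method 600 can be summarized in a short Python sketch. It is a simplification under stated assumptions: the repositories are modeled as plain dictionaries, the function and parameter names are hypothetical, and the hard-to-crawl branch (blocks 644 and 645) is omitted.

```python
from typing import Callable, Dict, List, Optional


def get_relevance_info(
    url: str,
    relevance_store: Dict[str, str],
    document_stores: List[Dict[str, str]],
    extract: Callable[[str], str],
    fetch_now: Callable[[str], str],
    blocking: bool,
    unfilled_log: List[str],
) -> Optional[str]:
    """Return document relevance information for `url`, or None on a
    non-blocking miss (the caller then falls back to alternative ads)."""
    # Hit: relevance information is already available.
    if url in relevance_store:
        return relevance_store[url]
    # Document information is available in one of the sources
    # (e.g. cache, main repository, targeted repository), checked in order.
    for store in document_stores:
        if url in store:
            return extract(store[url])
    # Miss with blocking ad serving: retrieve the document immediately.
    if blocking:
        return extract(fetch_now(url))
    # Miss with non-blocking ad serving: log the identifier for later retrieval.
    unfilled_log.append(url)
    return None
```

Passing the sources as an ordered list lets the caller decide which repository is consulted first, for example the cache before the larger repositories.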

Referring to FIG. 5, the targeted document information retrieval (e.g., crawling) operations 580 may process the log(s) of unfilled requests 570, together with identifiers (such as URLs) of (partner) content provider Web sites, and retrieve the relevant document information into the GRAS repository 550 for future use. The targeted crawling operations 580 can also be used to pre-crawl the Web pages of a Web site in order to “pre-warm” the GRAS repository 550. This helps ensure that content-relevant ads will be available.

FIG. 7 is a flow diagram of an exemplary method 700 that can be used to retrieve targeted document information in accordance with the principles of the invention. As indicated by trigger event block 710, document identifiers are accepted in response to a trigger event. (Block 720) For each document identifier (Loop 730-750), document information is retrieved. (Block 740)
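The retrieval loop of method 700, combined with the processing of the log of unfilled requests described above, might look roughly like the following Python sketch. The function name, the dictionary-based repository and the choice to treat `OSError` as a retrieval failure are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List


def retrieve_targeted_documents(
    unfilled_log: List[str],
    partner_urls: Iterable[str],
    fetch: Callable[[str], str],
    repository: Dict[str, str],
) -> List[str]:
    """Fetch document information for logged and partner identifiers.

    Returns the identifiers that could not be retrieved; these stay in
    the log so that a later run can try again.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order.
    pending = list(dict.fromkeys([*unfilled_log, *partner_urls]))
    failed: List[str] = []
    for doc_id in pending:
        try:
            repository[doc_id] = fetch(doc_id)
        except OSError:
            failed.append(doc_id)
    unfilled_log[:] = failed  # keep only the identifiers that are still unfilled
    return failed
```

A trigger event, such as a timer or a newly registered partner site, would call this function with the current log and the partner identifiers.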

For Web page documents identified by URLs, a URL may include information that varies across sessions, so that the Web site can tell different sessions apart. Examples of such additional information added to URLs include shopper IDs and session IDs. With this information removed, the URLs address the same Web page. If session information is not removed from a URL, stored information associated with that Web page might not be found when the URL containing the session information is used as the lookup key. As a result, Web page content or other information may be treated as unavailable, even though it is available, simply because of the session information in the URL. To avoid this, document identifier (e.g., URL) rewrite operations 595 may be used to remove such session information from URLs and make them canonical. The canonical URLs can then be used as search keys to access and retrieve the document information stored in the repositories 530, 540 and 550.
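A minimal Python sketch of such URL canonicalization is shown below, using the standard `urllib.parse` module. The parameter names in `SESSION_PARAMS` are assumptions chosen for illustration, and the sketch also drops any URL fragment, which is a design choice not discussed in the text.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of session-specific query parameters (illustrative only).
SESSION_PARAMS = {"sessionid", "shopperid"}


def canonicalize_url(url: str) -> str:
    """Remove session-specific query parameters so that URLs for the
    same page map to the same lookup key."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SESSION_PARAMS]
    # The fragment is dropped because it does not identify a different page.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Two requests that differ only in their session identifier then resolve to the same repository key.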

The targeted document information retrieval operations 580 could work in conjunction with the search engine crawler 560 (which may crawl the Web less frequently). In one embodiment, the targeted document information retrieval operations 580 may handle a limited number of Web pages per day (e.g., 2.5M Web pages/day). The operations 580 could be used to supplement the search engine crawler 560 and/or to reduce the time it takes to launch a partner Web site. It may be beneficial to delegate as much of the ongoing work as possible to the search engine crawler 560. To that end, it may be useful (i) to keep a log of URLs for which no document information is available, and (ii) for the search engine crawler 560 to retrieve the document information for those URLs into its own repository 540. Over time, the main repository 540 will then hold information for more documents.

Often, some pages cannot be crawled. Examples include dynamic Web pages such as those generated by a search engine, pages generated by filling out forms, personalized pages, and pages that require a login and password. Document information for such Web pages can be obtained using real-time document information extraction operations 590. In one embodiment, the document information is extracted using embedded instructions (e.g., Javascript) included in the document. The embedded instructions send the extracted document information to the content-relevant ad serving operations 410, which return one or more targeted ads for the dynamic document. The “interesting” document information extracted from a Web page can include meta tags, headers and titles. Both fetching and content extraction occur in real time.
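The kind of “interesting” information mentioned above (titles, meta tags and headers) can be illustrated with a short Python sketch built on the standard `html.parser` module. This is not the embedded Javascript the specification refers to, which would run in the viewer’s browser; it is an assumed server-side analogue, and the class name, function name and choice of header levels are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class InterestingInfoParser(HTMLParser):
    """Collects the title, named meta tags, and h1-h3 headers of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: Dict[str, str] = {}
        self.headers: List[str] = []
        self._current: Optional[str] = None  # tag whose text is being collected

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attributes.get("content") or ""
        elif tag in ("title", "h1", "h2", "h3"):
            self._current = tag
            if tag != "title":
                self.headers.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current is not None:
            self.headers[-1] += data


def extract_interesting(html: str) -> dict:
    """Return the title, meta tags and headers found in `html`."""
    parser = InterestingInfoParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "headers": [header.strip() for header in parser.headers],
    }
```

The returned dictionary is small compared with the full page, which is the point of sending only the interesting parts to the ad serving operations.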

In one embodiment, Javascript is used as a proxy. This Javascript extracts the “interesting” document information, such as titles, meta tags, headers, etc., from any Web page in which it is embedded. Such Javascript could be embedded directly on a target page.

The above example of real-time document information extraction 590 is useful, but it has some drawbacks. First, the Javascript is large and may take a while to execute on each page. Second, the Javascript may need to be modified over time to make it more efficient. The larger and more frequently updated the Javascript is, the greater the chance that different versions will be in use.

A static Javascript link is an alternative to embedded Javascript. Using a static Javascript link can reduce the page's size by about 4 KBytes. Here is an example of a static Javascript link:

” ”

Most browsers will cache the Javascript referenced by the link, so the Javascript is fetched only when it is needed.

A second option uses a two-phase model to avoid unconditionally sending 1 KByte of content to the content-relevant advertising operations 410 for every Web page view. The first phase attempts to serve ads using existing document information (e.g., at cache 530, main repository 540, and/or GRAS repository 550), without sending the content to the content-relevant ad serving operations 410. In the second phase, if no such information is available, Javascript is provided to the browser; this Javascript sends the “interesting” document information so that targeted ads can be served. A target page could include, for example:

If document information or document relevance information (e.g., content) is available, this iframe will get one or more content-relevant ads. If not, the iframe will receive Javascript that fetches the document information (e.g., content). Javascript's “same origin policy” may prevent this scheme from working as described: under that policy, a frame in one domain (e.g., pagead.google.com) cannot read content from another domain (e.g., aol.com). To work around this, the two-phase approach described above may be modified as follows:

The two-phase approach can sometimes be inefficient because the entire Javascript is sent to the browser whenever document information (e.g., content) or document relevance information is not available in cache 530. A third option is a three-phase approach, which corresponds to the two-phase approach but uses a static Javascript link. The three-phase approach exploits the browser's cache by returning a link to static Javascript; the browser loads the full Javascript only if necessary.

The three-phase approach always sends two requests to the content-relevant advertising operations 410. Because these requests are sent in parallel, they should not affect end-user latency. They do, however, add to the backend load. This additional load may be acceptable because the additional request in the three-phase approach can be handled relatively easily.

FIG. 8 is a flow diagram of an exemplary method 800 that may be used to perform real-time document information retrieval in a manner consistent with the principles of the present invention. Both the two-phase and the three-phase approaches are shown. A request for an executable (e.g., Javascript) is accepted (Block 810). It is then determined whether the document information is available (e.g., at cache 530, main repository 540, or GRAS repository 550) (Decision block 820). If the document information is available, an empty executable (e.g., an empty script) is returned to the content rendering application (e.g., browser) that requested it (Block 850) before the method 800 is left (Node 860). If, on the other hand, the document information is not available, either an executable for reading the document information (e.g., Javascript) (two-phase model) or a link to such an executable (three-phase model) is returned to the content rendering application (e.g., browser) that requested it (Block 830). The document identifier is then set so that the proper document information is addressed (e.g., the ads iframe URL is reset to include the page content) (Block 840) before the method 800 is left (Node 860).
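The decision at the heart of the method 800 can be sketched as follows. This is a minimal Python model under stated assumptions, not the patent's implementation: the three stores are represented as plain sets of URLs, and the reply is reduced to a label (`"empty"`, `"full_script"`, or `"script_link"`) plus a flag saying whether the iframe URL should be reset. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutableReply:
    kind: str               # "empty", "full_script", or "script_link"
    reset_iframe_url: bool  # True when the ads iframe URL must carry page content


def handle_executable_request(url, stores, three_phase=False):
    """Answer a browser's request for the content-reading executable.

    `stores` is any iterable of containers (stand-ins for the cache, the main
    repository, and the GRAS repository) that support `url in store`.
    """
    # Decision block: is document information already available anywhere?
    if any(url in store for store in stores):
        # Block 850: nothing to read, so return an empty script.
        return ExecutableReply(kind="empty", reset_iframe_url=False)

    # Block 830: return the reader itself (two-phase) or a link to a static,
    # browser-cacheable copy of it (three-phase).
    kind = "script_link" if three_phase else "full_script"

    # Block 840: the iframe URL is reset so the content reaches the ad server.
    return ExecutableReply(kind=kind, reset_iframe_url=True)
```

For example, `handle_executable_request("http://example.com/a", [cache, main, gras], three_phase=True)` returns a `"script_link"` reply only when none of the three stores already knows the URL.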

A fourth option is a four-phase approach that avoids issuing two requests in the general case. It relies on the following trick. The iframe portion is the same as before. The Javascript in the footer tries to determine whether this iframe is showing the correct ad. If there is no document information or document relevance information (e.g., content) for the document in cache 530, the iframe is redirected to about:blank (which serves either a blank ad or an ad that says “place your ads here”). In this instance, the Javascript is able to read the contents of the iframe because the iframe is not in another domain. If reading the iframe raises a security exception, the iframe is showing a good ad. If the Javascript does not get a security exception, it reads the document information and/or document relevance information (e.g., content) and obtains targeted ads. The four-phase approach is more difficult to implement and depends on additional browser features (redirect, onload).

Assuming that (i) the Javascript size is 4 KB, (ii) the content sent by the Javascript (URL plus content) is 1 KB, (iii) the browser cache hit rate is 90%, (iv) the cached document information hit rate is 95%, and (v) the browser cache hit rate and the cached information hit rate are independent, the three-phase approach provides a good combination of latency performance and bandwidth performance.
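The bandwidth side of this comparison can be checked with a back-of-the-envelope calculation. The cost model below is an assumption made for illustration (the text gives only the four input figures): it counts the script bytes downloaded plus the content bytes uploaded per page view and ignores request overhead and the size of the static link itself.

```python
SCRIPT_KB = 4.0           # size of the content-reading Javascript
CONTENT_KB = 1.0          # URL plus content sent to the ad server
BROWSER_CACHE_HIT = 0.90  # static script already in the browser cache
INFO_CACHE_HIT = 0.95     # document information already cached server-side


def expected_kb_per_view():
    """Expected KB transferred per page view under the assumed cost model."""
    script_miss = 1.0 - BROWSER_CACHE_HIT
    info_miss = 1.0 - INFO_CACHE_HIT
    return {
        # Script embedded in every page, content sent on every view.
        "embedded": SCRIPT_KB + CONTENT_KB,
        # Script fetched only on a browser-cache miss, content always sent.
        "static_link": script_miss * SCRIPT_KB + CONTENT_KB,
        # Script and content exchanged only when the information is missing.
        "two_phase": info_miss * (SCRIPT_KB + CONTENT_KB),
        # As two-phase, but the script is also subject to the browser cache
        # (the two hit rates are treated as independent).
        "three_phase": info_miss * (script_miss * SCRIPT_KB + CONTENT_KB),
    }
```

Under these assumptions the four variants cost roughly 5, 1.4, 0.25, and 0.07 KB per page view, which is consistent with the three-phase approach being the most bandwidth-efficient of the options modeled.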

If ad statistics are being tracked, special care may be needed in determining the number of page views. Two ad requests will be issued if document information or document relevance information (e.g., content) is not stored in cache 530 (or in any of cache 530, main repository 540, or GRAS repository 550). Page views may then be overcounted, which could distort important statistics such as revenue per thousand impressions (RPM). It is possible to show ads or a static ad in the iframe when the document information or document relevance information (e.g., content) is not present. However, this could corrupt statistics for pages that do not have the Javascript in their footer. For example, certain content providers may not allow the Javascript footer on privacy-sensitive pages. To solve this problem, an additional flag may be added to the URL of the iframe to identify pages that do not have the footer Javascript.

Although Javascript implementations were described, the invention is not limited to Javascript; another script or executable may be used. For example, a toolbar or client may be added to the user's content rendering application (e.g., browser) or operating system to send the document information to the ad server. Alternatively, an HTTP proxy may monitor all document information (e.g., content) sent to a user and transmit this information to the ad server.

In the context of Web pages, targeting ads according to the content of a Web page requires retrieving document information (e.g., content) from the Web page, which in turn requires the URL of the Web page on which the ads will appear. FIG. 9A shows a Web page with one or more ad spots. The Web page 900 contains content 910 and has a URL (URLMP) 915. The Web page 900 may also contain one or more iframes 920a, 920b, each with its own URL 925a, 925b. A script (or a pointer to a script) may be required to allow the content-relevant ad server to fetch information (e.g., content) from the Web page 900. However, the URLMP 915 of the main page may differ from the URLs 925a, 925b of the one or more iframes. Some content provider partners may place the script (or a pointer to it) 930' directly on the main page 900' (as shown in FIG. 9C), while others may place the script (or a pointer to it) 930a or 930b within an iframe 920a or 920b having a URL (URLIF1 925a or URLIF2 925b) different from the URLMP 915 of the parent Web page 900 (as shown in FIG. 9B). If the script 930' is located on the main Web page as shown in FIG. 9C, the Javascript attribute “document.location” identifies the location (URLMP) 915 of the main Web page 900'. If the script 930a or 930b is located in an iframe 920a or 920b as shown in FIG. 9B, the Javascript attribute “document.referrer” identifies the location (URLMP) 915 of the main Web page 900, whereas “document.location” would return URLIF1 925a or URLIF2 925b instead. Variants such as “window.document.location” may be used instead. In any event, for the script to get the appropriate document information (e.g., content 910), it needs the proper URLMP and therefore needs to know which of the two attributes (“document.location” or “document.referrer”) to use. While different content providers could be given different Javascript for these cases, doing so complicates matters and requires each partner to use the correct script for its page.

The exemplary method 1000 uses Javascript exception handling and the iframe security model to determine the main page URL. The ad location (“document.location”) is compared with the main page location (“window.top.location”) (Block 1010). If the two are identical, “document.location” is used to determine the root document location (Blocks 1020 and 1030). If they differ, the comparison either fails (if the main page and the iframe are in the same domain) or generates a security violation exception (an iframe cannot examine values belonging to a document in another domain). If there is a mismatch or an exception, “document.referrer” is used to determine the root document (main page) location (Blocks 1020 and 1040). This combination of exception handling and the iframe security model provides a novel and powerful way to identify the main page URLMP.
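The logic of the method 1000 can be modeled outside a browser. The Python sketch below is only a simulation under stated assumptions: the hypothetical `Frame` class stands in for the script's view of the page, and reading `top_location` raises a hypothetical `SecurityError` when the top page is in another domain, mimicking the browser's same origin policy.

```python
class SecurityError(Exception):
    """Stand-in for the browser's cross-domain access exception."""


class Frame:
    """Simulated view of the page from where the ad script is running."""

    def __init__(self, location, referrer, top_location, top_readable=True):
        self.location = location    # models document.location
        self.referrer = referrer    # models document.referrer
        self._top_location = top_location
        self._top_readable = top_readable

    @property
    def top_location(self):         # models window.top.location
        if not self._top_readable:
            raise SecurityError("top page is in another domain")
        return self._top_location


def main_page_url(frame):
    """Return the URL of the main page containing the ad spot."""
    try:
        if frame.location == frame.top_location:
            # Script runs on the main page itself.
            return frame.location
    except SecurityError:
        # Cross-domain iframe: the comparison could not even be made.
        pass
    # Mismatch or exception: the script is in an iframe, so the page that
    # loaded the iframe (the referrer) is the main page.
    return frame.referrer
```

The same three outcomes (match, mismatch, exception) are what a single script would see in a browser, which is why one script can serve partners who place it either on the main page or in an iframe.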

In one embodiment, Javascript “onerror” exception handling is used. An alternative is to use “try/catch” exception handling.

a) fetch before a request (pre-fetch)

b) fetch on demand, blocking (on-demand fetch)

c) fetch on demand, non-blocking (on-demand fetch after the request)

a) fetch only the Web page

b) fetch the Web page and follow its hyperlinks

a) use a separate crawler

b) use a fetcher embedded within the content-relevant ad targeting system

The above-described implementation uses a separate crawler (recall 580 in FIG. 5) that fetches the Web page and its links before a request (pre-fetch). If the requested document information is not available, the Web page is fetched after the request has been processed. (Recall blocks 670-675 in FIG. 6.)
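The combination of pre-fetching and non-blocking on-demand fetching can be sketched as a small store that never blocks a request. This is a minimal illustration under stated assumptions: `fetch` is a hypothetical callable that returns document information for a URL, and the class name is invented for the example.

```python
from collections import deque


class DocumentInfoStore:
    """Pre-fetched document information plus a log of URLs still to fetch."""

    def __init__(self):
        self._info = {}
        self.pending = deque()  # URLs seen in requests but not yet fetched

    def prefetch(self, url, fetch):
        """Fetch before any request arrives (pre-fetch)."""
        self._info[url] = fetch(url)

    def lookup(self, url):
        """Serve a request without blocking; log the URL on a miss."""
        info = self._info.get(url)
        if info is None and url not in self.pending:
            self.pending.append(url)
        return info

    def fetch_pending(self, fetch):
        """Fetch logged URLs after their requests have been answered."""
        while self.pending:
            url = self.pending.popleft()
            self._info[url] = fetch(url)
```

A request for an unknown URL gets `None` immediately (so an untargeted response can be returned), and a later call to `fetch_pending` fills the gap for subsequent requests.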

Once fetched, the document information may be subject to further processing.

Several ways of extracting and/or generating relevance information are described in a U.S. provisional application and a U.S. patent application that list Jeffrey A. Dean, Georges R. Harik, and Paul Buchheit as inventors. These applications are incorporated herein by reference and are collectively referred to as “the relevant ad server applications.” Relevance information may be thought of as a topic or cluster to which an ad or document belongs. U.S. Provisional Application Ser. No. 60/416,144, entitled “Methods and Apparatus for Probabilistic Hierarchical Inferential Learner,” filed Oct. 3, 2002 (incorporated herein by reference), describes examples of how to determine one or more concepts or topics (referred to as “phil clusters”) of information that may be used in a manner consistent with the principles of the present invention.

In an exemplary embodiment of the present invention, a dump of the complete ads database is used to generate an index that maps topics (e.g., phil cluster identifiers) to a set of matching ad groups. This may be done using one or more of (i) a set of serving constraints (targeting criteria), (ii) the text of the ads within the ad group, (iii) content on the advertiser's Web site, etc.
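One way to picture such an index is as an inverted map from topic identifier to ad group identifiers. The sketch below assumes each ad group in the dump is a dict with `id`, `targeting_criteria`, and `ad_texts` fields, and that `topics_for_text` is some topic classifier supplied by the caller; both the record layout and the classifier are hypothetical.

```python
from collections import defaultdict


def build_topic_index(ad_groups, topics_for_text):
    """Map each topic identifier to the set of ad group ids that match it.

    `ad_groups` is an iterable of dicts; `topics_for_text` is a callable that
    returns the topic identifiers associated with a piece of text.
    """
    index = defaultdict(set)
    for group in ad_groups:
        # Use the targeting criteria and the ad text as evidence of topic.
        text = " ".join([*group["targeting_criteria"], *group["ad_texts"]])
        for topic in topics_for_text(text):
            index[topic].add(group["id"])
    return dict(index)
```

With such an index, the topics determined for a document can be looked up directly to obtain candidate ad groups.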

Various similarity techniques, such as those described in the relevant ad server applications, may be used to determine the degree of similarity between an ad and a document. Such similarity techniques may use the extracted and/or generated relevance information. Based on the similarity determinations, one or more content-relevant ads may be associated with a document. For example, an ad may be associated with a document if the degree of similarity exceeds some absolute or relative threshold.
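The text leaves the similarity technique open. As one possible instance, the sketch below uses cosine similarity over topic-weight vectors with an absolute threshold; the choice of cosine similarity, the vector representation, and the function names are assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse topic-weight dicts."""
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate_ads(doc_topics, ads, threshold=0.6):
    """Return ids of ads whose similarity to the document exceeds the threshold.

    `ads` maps ad id to its topic-weight dict; results are ordered from most
    to least similar.
    """
    scored = [(cosine(doc_topics, topics), ad_id) for ad_id, topics in ads.items()]
    scored.sort(reverse=True)
    return [ad_id for score, ad_id in scored if score > threshold]
```

A relative threshold (for example, keeping ads within some fraction of the best score) would be a small variation on the same loop.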

Once a set of the best ad groups has been selected, a final set of one or more ads may be chosen using a list of criteria from the best ad group(s). The content-relevant ad server can use this list to request that an ad be returned if K of the M criteria match a single ad group. If so, the ad is sent back to the requestor.

Performance information, such as a history of conversions per URL or per domain, may be fed back to the system so that Web pages, or clusters of Web pages, that perform better for certain kinds of ads (e.g., ads belonging to a particular topic or cluster) can be identified. This information can be used to re-rank content-relevant ads so that ads are served based on both content relevance and performance. A number of performance optimizations may also be applied. For example, the mapping from a URL to its set of relevant ad groups may be cached to avoid re-computation for frequently viewed pages.
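Both ideas in this paragraph are easy to sketch. The text does not say how relevance and performance are combined, so the linear blend below (and its `weight` parameter) is purely an assumption; the cache is a plain dictionary keyed by URL, and all names are hypothetical.

```python
def rerank(candidates, performance, weight=0.5):
    """Order ads by a blend of content relevance and observed performance.

    `candidates` is a list of (ad_id, relevance) pairs with relevance in
    [0, 1]; `performance` maps ad_id to, e.g., a conversion rate observed for
    this URL or domain. `weight` is the share given to performance.
    """
    def score(item):
        ad_id, relevance = item
        return (1.0 - weight) * relevance + weight * performance.get(ad_id, 0.0)

    return [ad_id for ad_id, _ in sorted(candidates, key=score, reverse=True)]


class AdGroupCache:
    """Caches the URL-to-ad-groups mapping for frequently viewed pages."""

    def __init__(self, compute_ad_groups):
        self._compute = compute_ad_groups  # expensive relevance computation
        self._by_url = {}
        self.computations = 0

    def ad_groups_for(self, url):
        if url not in self._by_url:
            self.computations += 1
            self._by_url[url] = self._compute(url)
        return self._by_url[url]
```

A production cache would also need an eviction and refresh policy, since page content and the ads database both change over time.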

Content-relevant ads may be combined with their associated documents either in advance or on demand, in real time. This combination may be performed by the content-relevant ad server, by the content provider, or by the end user's content rendering application (e.g., a browser).

Summary for “Serving ads based on content”

Web site-based advertising promises a lot, but problems remain with existing approaches. Advertisers are able to reach large audiences, but they are often dissatisfied with their return on investment. Some have attempted to improve ad performance by tracking the online behavior of users, but this has raised privacy concerns.

Similarly, the hosts of the Web sites on which the ads are presented (referred to as “Web site hosts” or “ad consumers”) face the challenge of maximizing ad revenue without impairing their users' experience. Some Web site hosts have chosen to place advertising revenue above the interests of users. One example is “Overture.com,” which hosts a so-called “search engine” service that returns advertisements disguised as “search results” in response to user queries. The Overture.com Web site permits advertisers to pay to position an ad for their Web site (or a target Web site) higher on the list of purported search results. Where the advertiser pays only if a user clicks on the ad (i.e., cost-per-click), the advertiser lacks an incentive to target its ads effectively, since a poorly targeted ad will not be clicked and therefore will not require payment. Consequently, high cost-per-click ads appear at or near the top of the list but do not necessarily translate into real revenue for the ad publisher, because viewers do not click on them. Furthermore, ads that viewers would click on are further down the list, or not on the list at all, and so the relevancy of ads is compromised.

Search engines such as Google have enabled advertisers to target their ads so that they are rendered with search results pages that respond to queries relevant to the ad. While search results pages give advertisers a great opportunity to target their ads at a more receptive audience, they account for only a fraction of the page views on the World Wide Web.

Some have attempted to manually map Web pages to one or more categories of a category taxonomy. Such manual classification has many disadvantages. It is time-consuming and costly. It can also be applied inconsistently, because different classifiers make subjective judgments. Finally, the large number of Web pages and frequent changes to their content make manual classification difficult.

It would be useful to allow advertisers to place targeted ads on any page on the Web (or some other document of any type), rather than just on search results pages. Such a scheme should avoid manual classification, which often raises insurmountable problems.

The present invention allows advertisers to place targeted ads on any page on the Web (or some other document of any type). It may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to the content, and/or (iii) combining the content with ads determined to be relevant to the content.

The present invention may involve novel methods, apparatus, message formats, and/or data structures for allowing advertisers to place targeted, content-relevant ads on any page on the Web (or some other document of any type). The following description is presented to enable one skilled in the art to make and use the invention, and is provided in the context of particular applications and their requirements. Various modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles set forth below may be applied to other embodiments and applications. Thus, the present invention is not intended to be limited to the disclosed embodiments, and the inventors regard their invention as including any patentable subject matter described.

The advertising environment may include an ad entry, maintenance, and delivery system 120. Advertisers 110 may directly or indirectly enter, maintain, and track ad information in the system 120. The ads may be in the form of text ads, banner ads, image ads, audio ads, video ads, ads combining one or more of these components, etc. The ads may also include embedded information, such as links, meta information, and/or machine-executable instructions. Ad consumers 130 may submit requests for ads to, accept ads responsive to their requests from, and provide usage information to, the system 120. Although not shown, other entities may also provide usage information (e.g., whether a click-through or conversion related to the ad occurred) to the system 120. This usage information may include measured or observed user behavior related to ads that have been served.

One example of an ad consumer 130 is a general content server that receives requests for content (e.g., articles, discussion threads, music, etc.) and retrieves the requested content in response. The content server may submit a request for ads to the system 120. Such an ad request may include the number of ads desired. The ad request may also include information about the content requested. This information may include the content itself (e.g., a page), a category corresponding to the content or the content request (e.g., arts, business, computers, arts-movies, arts-music, etc.), part or all of the content request, content date, content type (e.g., text, graphics, audio, mixed media, etc.), geolocation information, etc.

The content server may combine the requested content with one or more of the advertisements provided by the system 120. This combined information, including the content and the advertisement(s), is then forwarded to the end user who requested the content, for presentation to the viewer. Finally, the content server may transmit information about the ads and about how, when, and/or where they were rendered (e.g., position, click-through or not, impression time, impression date, size, conversion or not, etc.) back to the system 120. Alternatively, or in addition, such information may be provided to the system 120 by some other means.

Another example of an ad consumer 130 is a search engine. A search engine may receive queries for search results and respond with relevant search results. An exemplary search engine is described in the article S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Seventh International World Wide Web Conference, Brisbane, Australia, and in U.S. Pat. No. 6,285,999 (both incorporated herein by reference). Such search results may include, for example, lists of Web page titles, snippets of text extracted from those Web pages, and hypertext links to those Web pages, and may be grouped into a predetermined number of (e.g., ten) search results.

The search engine may submit a request for ads to the system 120. The request may include the number of ads desired. This number may depend on the search results, the amount of page space occupied by the search results, the size and shape of the ads, etc. In one embodiment, the number of desired ads will be from one to ten, and preferably from three to five. The request for ads may also include the query (as entered or parsed), information based on the query (such as geolocation information, whether the query came from an affiliate, and an identifier of such an affiliate), and/or information associated with, or based on, the search results. Such information may include, for example, identifiers related to the search results (e.g., document identifiers or “docIDs”), scores related to the search results (e.g., information retrieval (“IR”) scores such as dot products of feature vectors corresponding to a query and a document, Page Rank scores, and/or combinations of IR scores and Page Rank scores), snippets of text extracted from identified documents (e.g., Web pages), full text of identified documents, feature vectors of identified documents, etc.

The search engine may combine the search results with one or more of the advertisements provided by the system 120. This combined information, including the search results and the advertisement(s), is then forwarded to the user who requested the content, for presentation. Preferably, the search results are kept distinct from the advertisements, so as not to confuse the user between paid advertisements and presumably neutral search results.

Finally, the search engine may transmit information about the ad and about when, where, and/or how it was rendered (e.g., position, click-through or not, impression time, impression date, size, conversion or not, etc.) back to the system 120. Alternatively, or in addition, such information may be provided to the system 120 by some other means.

The exemplary ad system 120' may include an inventory system 210 and may store ad information 205 and usage information 245. The exemplary system 120' may support ad information entry and management operations 215, campaign (e.g., targeting) assistance operations 220, optimization operations 240, relative presentation attribute assignment (e.g., position ordering) operations 250, fraud detection operations 255, and results interface operations 260.

Advertisers 110 may interface with the system 120' via interface 216 for ad information entry and management. Ad consumers 130 may interface with the system 120' via interface 231. Ad consumers 130 and/or other entities (not shown) may also interface with the system 120' via interface 261 of the results interface operations 260.

An advertising program may include information concerning accounts, campaigns, creatives, targeting, etc. The term “account” refers to information about a given advertiser (e.g., a unique email address, a password, billing information, etc.). A “campaign” or “ad campaign” refers to one or more groups of one or more advertisements, and may include a start date, an end date, budget information, geo-targeting information, syndication information, etc. For example, Honda may have one advertising campaign for its automobile line and a separate campaign for its motorcycle line. Each campaign may have one or more ad groups, each containing one or more ads. Each ad group may include a set of keywords and a maximum cost bid (cost per click-through, cost per conversion, etc.). Alternatively, or in addition, each ad group may include an average cost bid (e.g., average cost per click-through, average cost per conversion, etc.). Therefore, a single maximum cost bid and/or a single average cost bid may be associated with one or more keywords. As stated, each ad group may have one or more ads or “creatives” (that is, ad content that is ultimately rendered to an end user). Naturally, the ad information 205 may include more or less information, and may be organized in many different ways.
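The account, campaign, ad group, and creative hierarchy described above maps naturally onto nested records. The Python dataclasses below are one possible layout, offered as a sketch; the field names and types are assumptions, and the text itself notes that ad information may be organized in many different ways.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Creative:
    """Ad content that is ultimately rendered to an end user."""
    title: str
    text: str
    link: str


@dataclass
class AdGroup:
    keywords: List[str]
    max_cost_bid: Optional[float] = None  # e.g., maximum cost per click-through
    avg_cost_bid: Optional[float] = None  # e.g., average cost per click-through
    creatives: List[Creative] = field(default_factory=list)


@dataclass
class Campaign:
    name: str
    start_date: str
    end_date: str
    budget: float
    ad_groups: List[AdGroup] = field(default_factory=list)


@dataclass
class Account:
    email: str
    campaigns: List[Campaign] = field(default_factory=list)


def keyword_bids(campaign):
    """Return the maximum cost bid associated with each keyword in a campaign."""
    return {kw: g.max_cost_bid for g in campaign.ad_groups for kw in g.keywords}
```

Because the bid lives on the ad group, every keyword in a group shares that group's bid, which matches the statement that a single bid may be associated with one or more keywords.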

The ad information 205 can be entered and managed via the ad information entry and management operations 215. Campaign (e.g., targeting) assistance operations 220 may be used to help advertisers 110 create effective ad campaigns. For example, the campaign assistance operations 220 may use information provided by the inventory system 210, which may track all possible ad impressions, ad impressions already reserved, and ad impressions available for given keywords. The ad serving operations 230 may service requests for ads from ad consumers 130. The ad serving operations 230 may use relevancy determination operations 235 to determine candidate ads for a given request. The ad serving operations 230 may then use optimization operations 240 to select a final set of one or more of the candidate ads, and may use relative presentation attribute assignment operations 250 to order the presentation of the ads to be returned. Fraud detection operations 255 may be used to reduce fraudulent use of the advertising system (e.g., by advertisers), such as through the use of stolen credit cards. Finally, the results interface operations 260 may be used to accept result information (from the ad consumers 130 or some other entity) about an ad actually served, such as whether a click-through occurred and whether a conversion occurred (e.g., whether an advertised item or service was purchased within a predetermined time from the rendering of the ad). Such results information may be accepted at interface 261 and may include information identifying the ad and the time it was served, as well as the associated result.

Online ads, such as those used in the exemplary systems described above or in any other system, may have various intrinsic features. Such features may be specified by an application and/or an advertiser. These features are referred to as “ad features” below. For example, in the case of a text ad, ad features may include a title line, ad text, executable code, and an embedded link. In the case of an image ad, ad features may additionally include images. Depending on the type of online ad, ad features may include one or more of the following: text, a link, an audio file, a video file, an image file, executable code, embedded information, etc. When an online ad is served, one or more parameters may be used to describe how, when, and/or where the ad was served. These parameters are referred to as “serving parameters” below. Serving parameters may include, for example, one or more of the following: features of (including information on) a page on which the ad is served (including one or more topics or concepts associated with the page, information or content located on or within the page, information about the page such as the host of the page (e.g., AOL, Yahoo, etc.), the importance of the page as measured by, e.g., traffic, freshness, quantity and quality of links to the page, the location of the page within a directory structure, etc.), a search query or search results associated with the serving of the ad, a user characteristic (e.g., geographic location, language used, previous page views, previous behavior), and a host or affiliate site (e.g., America Online, Google, Yahoo) that initiated the request in response to which the ad was served. Naturally, there are other serving parameters that may be used in the context of the invention.

Although serving parameters may be extrinsic to ad features, they may be associated with an ad as serving conditions or constraints. When used in this way, such serving parameters are referred to simply as “serving constraints.” For example, in some systems, an advertiser may be able to specify that its ad is to be served only on weekdays, no lower than a certain position, only to users in a certain location, etc. As another example, an advertiser may specify that its ad is to be served only if a page or search query includes certain keywords or phrases.
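The example constraints in this paragraph can be expressed as a simple predicate. The sketch below is illustrative only: the field names are hypothetical, positions are assumed to be numbered from 1 at the top, keywords are assumed to be lowercase single words, and a page is taken to satisfy the keyword constraint if it contains any one of them.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ServingConstraints:
    weekdays_only: bool = False
    lowest_position: Optional[int] = None      # 1 is the top position
    locations: FrozenSet[str] = frozenset()    # empty means any location
    keywords: FrozenSet[str] = frozenset()     # empty means no keyword constraint


def may_serve(constraints, *, day, position, user_location, page_text):
    """Return True if an ad with these constraints may be served."""
    # Monday is 0 and Sunday is 6, so 5 and 6 are the weekend.
    if constraints.weekdays_only and day.weekday() >= 5:
        return False
    if constraints.lowest_position is not None and position > constraints.lowest_position:
        return False
    if constraints.locations and user_location not in constraints.locations:
        return False
    if constraints.keywords:
        words = set(page_text.lower().split())
        if not constraints.keywords & words:
            return False
    return True
```

In a serving system, a check like this would typically run over the candidate ads after relevance has been determined and before the final set is selected.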

“Ad information” may include any combination of ad features, ad serving constraints, information derivable from ad features or ad serving constraints, and/or information related to the ad (referred to as “ad related information”).

A “document” is to be interpreted broadly to include any machine-readable and machine-storable work product. A document may be a file, a combination of files, one or more files with embedded links to other files, etc. The files may be of any type, such as text, audio, image, video, etc. Parts of a document to be rendered to an end user can be thought of as the “content” of the document. Ad spots in a document may be defined by embedded information or instructions. In the context of the Internet, a common document is a Web page. Web pages often include content and may include embedded information (such as meta information, hyperlinks, etc.) and/or embedded instructions (such as Javascript, etc.). In many cases, a document has a unique, addressable storage location and can therefore be uniquely identified by that location. A universal resource locator (URL) is a unique address used to access information on the Internet.

“Document information” may include any information included in the document, information derivable from information included in the document (referred to as “document derived information”), and/or information related to the document (referred to as “document related information”), as well as extensions of such information (e.g., information derived from related information). An example of document derived information is a classification based on the textual content of a document. Examples of document related information include document information from other documents with links to the instant document, as well as document information from other documents to which the instant document links.

Content from a document may be rendered on a content rendering application or device. Examples of content rendering applications include an Internet browser (e.g., Explorer or Netscape), a media player (e.g., an MP3 player, a Realnetworks streaming audio player, etc.), a viewer (e.g., an Adobe Acrobat PDF reader), etc.

Referring back to FIG. 4, recall that document relevance information 434 is determined from document information. This section describes a variety of ways to obtain document information. The following examples are given in the context of Web pages identified by URLs; however, the invention is not limited to these examples.

There are several ways to obtain document information (e.g., Web page contents). First, a third party, such as a Web page host or an advertiser, may provide the document information. This document information could include information contained in the document itself or other information (e.g., a URL) that allows such information to be accessed. Second, document information (e.g., Web page contents) may be obtained during an ad request. For example, an end user's content rendering application (e.g., a browser) might be instructed to send Web page contents as part of an ad request, or the document information could be fetched as part of the content-relevant ad serving operations 410. Third, document information (e.g., Web page contents) can be pre-fetched (i.e., obtained before a specific request) for future content-relevant ad targeting. Other methods for obtaining document information may also be used, such as those described in a U.S. patent application (serial number not given) dated Mar. 29, 2002; U.S. patent application Ser. No. 09/734,886, entitled “HYPERTEXT BROWSER ASSISTANT”, filed Dec. 13, 2000; and a further U.S. patent application (serial number not given) filed Dec. 13, 2000. Each of these applications is incorporated herein by reference.

In FIG. 5, content-relevant ad serving operations 510 may include document information (or ad information) request distribution and reply combination operations 515. (Note that ad information, ad relevance information, and operations such as relevance information extraction/generation operations 412, ad-document relevance information comparison operations 414, and ad(s)-document association operations 416 are not shown in FIG. 5, to simplify the figure.) These operations 515 can be used when there are multiple sources of (pre-fetched) document information 520 (or ad information). The sources may include one or more of cached document information 530, a larger number of “untargeted” document information 540, and a smaller number of “targeted” document information 550. A crawl (or some other method of retrieval) of targeted documents will generally be “deeper”. As indicated by the arrows in the left margin of FIG. 5, requests for document (or ad) information move down the double-arrow lines, and responses to such requests move up the double-arrow lines.
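One way to picture request distribution and reply combination across several sources is sketched below. This is an illustrative assumption rather than the patent's method: each source is modeled as a plain dictionary keyed by URL, the function name `lookup_document_info` is hypothetical, and the combination rule (the earliest source in the list wins) is simply one possible choice.

```python
def lookup_document_info(url, sources):
    """Ask every source for `url` and combine the replies.

    `sources` is an ordered list of (name, mapping) pairs, for example
    cache first, then targeted, then untargeted document information.
    The combination rule assumed here: the earliest source that replies wins.
    """
    replies = [(name, store[url]) for name, store in sources if url in store]
    if not replies:
        return None  # no source holds information for this document
    name, info = replies[0]
    return {"source": name, "info": info}
```

In use, the ordering of `sources` encodes which repository is trusted most when more than one holds information about the same document.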

Information for documents with relatively static content can be pre-fetched, whereas information that is not yet available may be requested in real time. Dynamic information may be more conveniently retrieved in real time, in response to a request.

“The cached information 530 could include information about documents that were requested frequently or recently.”

The larger number of “untargeted” document information 540 may have been built, and may be updated, using a search engine crawler 560. An exemplary search engine crawler 560 is described in U.S. Pat. No. 6,285,999, which is incorporated by reference. Although information about many documents may be available, information about a particular document might not be. In a so-called non-blocking implementation, the content-relevant ad serving operations do not wait to obtain document information that has not been previously obtained or is not presently stored. In that case, a request for ads on such a document may be served with so-called “house ads” (ads for the ad server itself, ads shown at no cost, or ads that otherwise do not generate revenue), or, if ad revenue depends on a user action (e.g., a click-through or conversion), with random ads or generally well-performing ads. Note that the performance statistics of random ads and generally well-performing ads served in this untargeted manner should not be affected. Alternatively, when ads are requested for a document whose information is not available, an attempt may be made to estimate the document information. The estimate could be made by looking at the document's position within a directory structure and using information from the directory (categories) or from other documents of the same, a similar, or a higher (or narrower) classification. Another option is to examine a log of search queries whose results led traffic to the document, and to identify alternative documents. In such cases, the Web site hosting the document may also be contacted and asked to provide the information.
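Estimating document information from a document's position in a directory structure can be sketched as a walk up the URL path. This is an assumption-laden illustration: the function name `estimate_from_directory` and the `known_info` mapping (keyed by host plus directory path) are hypothetical, and a real system would likely weigh several ancestors rather than stop at the first match.

```python
import posixpath
from urllib.parse import urlsplit


def estimate_from_directory(url, known_info):
    """Return information for the nearest ancestor directory of `url`.

    `known_info` maps "host/dir" strings to already-known document
    information (for example, a category label). Returns None if no
    ancestor directory is known.
    """
    parts = urlsplit(url)
    path = posixpath.dirname(parts.path)
    while True:
        key = parts.netloc + path
        if key in known_info:
            return known_info[key]
        if path in ("", "/"):
            return None  # reached the site root without a match
        path = posixpath.dirname(path)  # move one directory up
```

For instance, a page at `/travel/costa-rica/rainforest.html` with no stored information would inherit whatever is known about `/travel/costa-rica` or, failing that, `/travel`.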

The smaller number of “targeted” document information 550 can be obtained and maintained in a variety of ways. Targeted document information retrieval (e.g., crawling) operations 580 can be used to crawl specific content provider Web sites, such as partner Web sites 588. Some or all of these partner Web sites may have been entered via content provider interface operations 585. A content provider, such as a Web publisher, can also provide document information (e.g., Web pages, or URLs of newly added Web pages) directly via content provider interface operations 585.

A self-service syndication method allows content providers, such as publishers, to sign up to place content-relevant ads onto their Websites through an easy, standard and fast process. This self-service syndication method can support any of the following:

FIG. 6 is a flow chart of an exemplary method 600 that can be used to obtain document information as part of content-relevant ad serving operations in accordance with the principles of the invention. A document identifier (e.g., a URL) is accepted (Block 610). It is then determined whether document relevance information is available (Decision block 620). If the document relevance information is available (referred to as a “hit”), ad serving processing continues using that document relevance information. If, on the other hand, the document relevance information is not available, it is determined whether the document information is available (e.g., in the cache 530, main repository 540, and/or GRAS repository 550) (Decision block 630). If so, the document relevance information is extracted from the document information (Block 640), and ad serving processing proceeds. If not (referred to below as a “miss”), it may be determined whether the content provider (e.g., a partner) has documents that are easily retrieved (e.g., crawled) (Decision block 644). A Web site can be considered difficult to crawl if it is dynamically assembled, changes frequently (e.g., news, stocks), and/or has many alternatives (e.g., people finders). If the content provider is difficult to crawl and has embedded scripts or links, executable instructions (e.g., Javascript) can be used to obtain the document information (Block 645), and the method 600 continues at block 640. If the content provider is easy to crawl, it is determined whether the content-relevant ad server uses blocking or non-blocking ad serving (Decision block 650). If blocking ad serving is used, the document information is retrieved immediately (Block 660) and the method 600 continues at block 640. If non-blocking ad serving is used, the document identifier (e.g., URL) can be stored for later retrieval (e.g., in the log of unfilled requests 570) (Block 670), and alternative ad serving may be done (Block 675). Alternatively, a “best guess” at the document relevance information, as described previously, may be used when the document relevance information is not available.
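The decision flow of FIG. 6 can be summarized in a short sketch. This is one reading of the flow, under stated assumptions: the stores are plain dictionaries, `extract`, `hard_to_crawl`, and `fetch` are hypothetical callables supplied by the caller, and the block numbers in the comments follow the description above rather than the figure itself.

```python
def resolve_relevance(url, relevance_store, doc_store, extract,
                      hard_to_crawl, blocking, fetch, unfilled_log):
    """Return (relevance_info, outcome) for a document identifier.

    outcome is one of: "hit", "extracted", "script", "fetched", "alternative".
    """
    if url in relevance_store:                        # decision 620: hit
        return relevance_store[url], "hit"
    if url in doc_store:                              # decision 630
        return extract(doc_store[url]), "extracted"   # block 640
    if hard_to_crawl(url):                            # decision 644
        # Block 645: the page's own embedded script is expected to supply
        # the document information, so nothing is fetched here.
        return None, "script"
    if blocking:                                      # decision 650
        return extract(fetch(url)), "fetched"         # blocks 660 and 640
    unfilled_log.append(url)                          # block 670: retrieve later
    return None, "alternative"                        # block 675: e.g. house ads
```

The non-blocking branch is what keeps latency bounded: the request is answered with alternative ads immediately, and the logged URL is picked up later by the targeted retrieval operations.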

Referring to FIG. 5, the targeted document information retrieval (e.g., crawling) operations 580 may process the log(s) of unfilled requests 570 and identifiers (e.g., URLs) of (partner) content provider Web sites, and retrieve the relevant document information into the GRAS repository 550 for future reference. Targeted crawling operations 580 can also pre-crawl the Web pages of a Web site to “prewarm” the GRAS repository 550, which helps ensure that content-relevant ads will be available.

FIG. 7 is a flow diagram of an exemplary method 700 that can be used to retrieve targeted document information in accordance with the principles of the invention. In response to a trigger event 710, document identifiers are accepted (Block 720). For each document identifier (Loop 730-750), document information is retrieved (Block 740).

For Web page documents identified by URLs, a URL may contain information that differs across sessions, used to distinguish different sessions on the same Web site. Examples of such additional information added to URLs include shopperids and sessionids. With this information removed, the URL still addresses the same Web page. If session information is not removed from a URL, stored information associated with the URL might not be found when the URL containing the session information is used as the key. As a result, Web page content or other information that is in fact available may be treated as unavailable because of the session information in the URL. Document identifier (e.g., URL) rewrite operations 595 may be used to remove such session information from URLs and make them canonical. The canonical URLs can then be used as search keys to access and retrieve the document information stored in the repositories 530, 540, and 550.
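A URL rewrite of this kind can be sketched with the Python standard library. The sketch assumes that session information travels in query parameters and that the parameter names are `sessionid` and `shopperid` (the two examples named above); real sites may carry session data elsewhere, such as in the path, which this sketch does not handle.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; compared case-insensitively.
SESSION_PARAMS = {"sessionid", "shopperid"}


def canonicalize(url):
    """Drop session-specific query parameters so the URL can serve as a stable key."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    # The fragment is also dropped, since it does not change which page is served.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Two requests for the same page made in different sessions then map to the same repository key.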

The targeted document information retrieval operations 580 could work in conjunction with the search engine crawler 560 (which may crawl the Web less frequently). In one embodiment, the targeted document information retrieval operations 580 may handle a limited number of Web pages per day (e.g., 2.5M Web pages/day). They could be used to supplement the search engine crawler 560 and/or to reduce the time it takes to launch a partner Web site. It might be beneficial to delegate as much of the ongoing work as possible to the search engine crawler 560. For example, it might be beneficial (i) to keep a log of URLs for which no document information was available, and (ii) to have the search engine crawler 560 retrieve the document information for those URLs into its own repository 540. The main repository 540 will then eventually hold more of the needed information.

Some pages cannot be crawled. One example is dynamic Web pages, such as those created using a search engine. Other examples are pages generated by filling out forms, personalized pages, and pages that require a login and password. Document information from these Web pages can be extracted using real-time document information extraction operations 590. One embodiment extracts document information using embedded instructions (e.g., Javascript) included in the document. The embedded instructions can send the document information to the content-relevant ad serving operations 410 in order to receive one or more targeted ads for dynamic documents. “Interesting” document information extracted from a Web page can include meta tags, headers, and titles. Fetching and content extraction occur in real time.

In one embodiment, Javascript is used as a proxy. This Javascript extracts the “interesting” document information (titles, meta tags, headers, etc.) from whatever Web page it is placed on. The following Javascript could be embedded on a target page:

The above example of real-time document information extraction 590 is useful, but it has some drawbacks. First, the Javascript is large and may take a while to execute on each page. Second, the Javascript may need to be modified over time to make it more efficient. The larger the Javascript and the more frequently it is updated, the greater the chance that different versions will be in use.

A static Javascript link is an alternative to embedded Javascript. Using a static Javascript link can reduce the page's size by about 4 KBytes. Here's an example of a static Javascript link:


Most browsers will cache the Javascript that the link points to, so the Javascript is fetched only when it is needed.

A second option uses a two-phase model to avoid unconditionally sending about 1 KByte of content to the content-relevant ad serving operations 410 on every Web page view. The first phase tries to serve ads using existing document information (e.g., at cache 530, main repository 540, and/or GRAS repository 550), without sending the content to the content-relevant ad serving operations 410. If that fails, Javascript is provided to the browser in the second phase; the browser then sends the “interesting” document information in order to get targeted ads. A target page could include, for example:

If document information or document relevance information (e.g., content) is available, this iframe will get one or more content-relevant ads. If not, the iframe will receive Javascript that fetches the document information (e.g., content). Javascript's “Same Origin Policy” might make this scheme ineffective: under that policy, a frame in one domain (e.g., pagead.google.com) cannot read content from another domain (e.g., aol.com). The two-phase approach described above may therefore be modified as follows:

The two-phase approach can be inefficient because the entire Javascript is sent to the browser whenever document information (e.g., content) or document relevance information is not available in cache 530. A third option is a three-phase approach, which corresponds to the two-phase approach but uses a static Javascript link. The three-phase approach exploits the browser's cache by returning a link to static Javascript; the browser loads the full Javascript only if necessary.

The three-phase approach always sends two requests to the content-relevant ad serving operations 410. Because these requests are sent in parallel, they do not affect the latency seen by the end user, but they do add to the backend load. This additional load may be acceptable because the additional request in the three-phase approach can be handled relatively easily.

FIG. 8 is a flow diagram of an exemplary method 800 that can be used to perform real-time document information retrieval in accordance with the principles of this invention. Both the two-phase and the three-phase methods are shown. A request for an executable (e.g., Javascript) is accepted (Block 801). It is determined whether the document information is available at cache 530 (or at cache 530, main repository 540, or GRAS repository 550) (Decision block 802). If the document information is available, an empty executable (e.g., an empty script) is returned to the content rendering application (e.g., browser) that requested it (Block 850), before the method 800 is left (Node 860). If, on the other hand, the document information is not available, either an executable to read the document information (two-phase model) or a link to such an executable (three-phase model) is returned to the content rendering application (e.g., browser) that requested it (Block 830). The document identifier is then set to address the correct document information (e.g., the ads iframe URL is reset to include the page content) (Block 840), before the method 800 is left (Node 860).
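The server-side choice made in FIG. 8 can be sketched as a single function. This is a minimal illustration under assumptions: the repositories are modeled as dictionaries, and the three return values (`EMPTY_SCRIPT`, `STATIC_LINK`, `FULL_SCRIPT`) are placeholders, including a made-up script URL, not the actual payloads.

```python
EMPTY_SCRIPT = ""  # nothing for the browser to do
# Hypothetical URL; in the three-phase model the browser can cache this script.
STATIC_LINK = "<script src='https://ads.example/extract.js'></script>"
FULL_SCRIPT = "/* full extraction script would go here */"  # placeholder body


def respond_to_script_request(url, stores, three_phase):
    """Choose what to return to the browser for a script request.

    `stores` is a list of mappings (e.g. cache, main repository, targeted
    repository) keyed by document URL.
    """
    if any(url in store for store in stores):
        return EMPTY_SCRIPT  # information already held: no extraction needed
    # Information missing: ask the browser to extract and send it.
    return STATIC_LINK if three_phase else FULL_SCRIPT
```

The only difference between the two models is the payload on a miss: the full script every time, or a short link that the browser can usually satisfy from its own cache.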

A fourth option is a four-phase approach that avoids always issuing two requests. It relies on the following trick. The iframe portion is the same as before. Javascript in the footer attempts to determine whether this iframe is showing a proper ad. If there is no document information or document relevance information (e.g., content) for the document in cache 530, the iframe is redirected to about:blank (or serves a blank ad or an ad that says “place your ads here”). In this instance the Javascript is able to read the contents of the iframe, because the iframe is not in another domain. If reading the iframe raises a security exception, the iframe is showing a good ad. If the Javascript does not get a security exception, it can supply the document information and/or document relevance information (e.g., content) and receive targeted ads. The four-phase approach is more difficult to implement and requires additional browser features (redirect, onload).

If (i) the Javascript size is 4 KB, (ii) the Javascript-extracted content is 1 KB for URLs with content, (iii) the browser cache hit rate is 90%, (iv) the cached information hit rate is 95%, and (v) the browser cache hit rate and the cached information hit rate are independent, then the three-phase technique provides a good combination of latency performance and bandwidth performance.
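The figures above can be turned into a rough expected-bandwidth comparison. The model below is an assumption made for illustration, not a calculation given in the source: it counts only script and content bytes per page view, ignores request overhead and the size of the static link itself, and uses the stated independence of the two hit rates to multiply the miss probabilities.

```python
JS_KB = 4.0          # size of the extraction Javascript
CONTENT_KB = 1.0     # extracted content sent per page
BROWSER_HIT = 0.90   # probability the browser already has the static script cached
INFO_HIT = 0.95      # probability the server already holds the document information


def expected_kb_per_view():
    """Expected script-plus-content kilobytes per page view under each approach."""
    js_miss = 1 - BROWSER_HIT
    info_miss = 1 - INFO_HIT
    return {
        # Script embedded in every page, content sent on every view.
        "embedded": JS_KB + CONTENT_KB,
        # Script fetched only on a browser cache miss, content still always sent.
        "static_link": js_miss * JS_KB + CONTENT_KB,
        # Script and content transferred only when the server lacks the information.
        "two_phase": info_miss * (JS_KB + CONTENT_KB),
        # As two-phase, but the script is fetched only on a browser cache miss too.
        "three_phase": info_miss * (js_miss * JS_KB + CONTENT_KB),
    }
```

Under these assumptions the three-phase approach transfers the least, because a full script download requires both a server-side miss and a browser cache miss.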

If ad statistics are being tracked, special care may be needed when counting page views. If document information or document relevance information (e.g., content) is not stored in cache 530 (or in cache 530, main repository 540, or GRAS repository 550), two ad requests will be issued for a single page view. Page views can therefore be overcounted, which distorts important statistics such as revenue per thousand impressions (RPM). One option is to show no ads, or a static ad, in the iframe when the document information or document relevance information (e.g., content) is not present. However, this could corrupt statistics for pages that have no Javascript in their footer; certain content providers will not allow the footer Javascript on privacy-sensitive pages. To address this, an additional flag could be added to the URL of the iframe to identify pages that do not have the footer Javascript.

Although Javascript implementations were described, the invention is not limited to Javascript; another script or executable may be used instead. For example, a toolbar or client may be added to the user's content rendering application (e.g., browser) or operating system to send the document information to the ad server. Alternatively, an HTTP proxy can be used to monitor all document information (e.g., content) sent to a user and transmit that information to the ad server.

In the context of Web pages, targeting ads to the content of a Web page requires retrieving information (e.g., content) from that Web page, which in turn requires the URL of the Web page on which the ads will appear. FIG. 9A shows a Web page with one or more ad spots. The Web page 900 contains content 910 and has URLMP 915. The Web page 900 may also contain one or more iframes 920a, 920b, each with its own URL 925a, 925b. A script (or a pointer to a script) may be needed to allow the content-relevant ad server to fetch information (e.g., content) from the Web page 900. However, the URLMP 915 of the main page may differ from the URLs 925a, 925b of the iframes. Some content provider partners may place the script (or a pointer to it) 930 directly on the main page 900 (as shown in FIG. 9C). Others may place the script, or a pointer to it, 930a or 930b within an iframe 920a or 920b whose URL (URLIF1 925a or URLIF2 925b) differs from the URLMP 915 of the parent Web page 900 (as shown in FIG. 9B). If the script 930 is located on the main Web page, as shown in FIG. 9C, the Javascript attribute “document.location” identifies the location (URLMP 915) of the main Web page 900. If the script 930a or 930b is located in an iframe 920a or 920b, as shown in FIG. 9B, the Javascript attribute “document.referrer” identifies the location (URLMP 915) of the main Web page 900, whereas “document.location” would instead return URLIF1 925a or URLIF2 925b. Variants such as “window.document.location”, etc., may be used instead. In any event, for the script to get the appropriate document information (e.g., content 910), it needs the proper URLMP, and therefore needs to know which of the two methods, “document.location” or “document.referrer”, to use. Different content providers could be given different Javascript for these two cases, but that complicates matters and requires each partner to use the correct script for its pages.

FIG. 10 illustrates an exemplary method 1000 for determining the main page location. The method 1000 uses Javascript exception handling together with the iframe security model. The ad location (“document.location”) is compared with the main page location (“window.top.location”) (Block 1010). If the two are identical, the “document.location” method is used to determine the root document location (Blocks 1020 and 1030). If they differ, the comparison either fails (if the main page and the iframe are in the same domain) or generates a security violation exception (an iframe cannot examine any values other than its own). If there is a mismatch or an exception, the “document.referrer” method is used to determine the root document (main page) location (Blocks 1020 and 1040). This combination of exception handling and the iframe security model provides a novel and powerful way to identify the URLMP of the main page.
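The control flow of this method can be modeled outside the browser. The sketch below is an illustration only: the browser values are passed in as arguments, `read_top_location` is a hypothetical callable standing in for reading “window.top.location”, and `SecurityError` stands in for the cross-domain exception a browser would raise.

```python
class SecurityError(Exception):
    """Stands in for the browser's cross-domain access exception."""


def main_page_url(document_location, document_referrer, read_top_location):
    """Pick the main page URL using the compare-then-fall-back rule.

    `read_top_location` either returns the top window's location or raises
    SecurityError when the script runs in an iframe from another domain.
    """
    try:
        same = read_top_location() == document_location  # block 1010
    except SecurityError:
        same = False  # cross-domain iframe: treated like a mismatch
    # Same location: the script is on the main page itself.
    # Mismatch or exception: the script is in an iframe, so use the referrer.
    return document_location if same else document_referrer
```

The same script can therefore be handed to every content provider, whether they place it on the main page or inside an iframe.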

In one embodiment, Javascript “onerror” exception handling is used. Alternatively, “try/catch” exception handling may be used.

a) fetch before the request (pre-fetch)
b) blocking fetch on demand (fetch during the request)
c) non-blocking fetch on demand (fetch after the request)

a) fetch only the Web page
b) fetch the Web page and follow its hyperlinks

a) use a separate crawler
b) use a fetcher embedded within the content-relevant ad targeting system

The above-described implementation uses a separate crawler (recall 580 in FIG. 5) that fetches the Web page and its links before a request (pre-fetch). If the requested document information is not available, the Web page is fetched after the request has been processed (recall blocks 670-675 in FIG. 6).

“The document information could be subject to further processing once it has been fetched.”

Referring back to FIG. 4, several ways of extracting and/or generating relevance information are described in a U.S. provisional application (serial number not given) and in a U.S. patent application (serial number not given) listing Dean, Georges R. Harik, and Paul Buchheit as inventors. These applications are incorporated by reference and are collectively referred to as “the relevant ad server applications”. Relevance information can be thought of as a topic or cluster to which an ad or document belongs. U.S. Provisional Application Ser. No. 60/416,144, entitled “Methods and Apparatus for Probabilistic Hierarchical Inferential Learner”, filed Oct. 3, 2002 (incorporated by reference herein), describes examples of how to determine one or more concepts or topics (referred to as “phil clusters”) of information that can be used in accordance with the principles of this invention.

In an exemplary embodiment, the present invention uses a dump of the complete ads database to generate an index that maps topics (e.g., phil cluster identifiers) to a set of matching ad groups. This can be done using one or more of: (i) a set of serving constraints (targeting criteria), (ii) the text of the ads within the group, (iii) content on the advertiser's Web site, etc.
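An index from topics to matching ad groups is essentially an inverted index. The sketch below assumes the topics for each ad group have already been determined (by whatever means) and are supplied as a plain mapping; the function name `build_topic_index` and the data layout are hypothetical.

```python
from collections import defaultdict


def build_topic_index(ad_groups):
    """Invert a mapping of ad group id -> topic ids into topic id -> ad group ids."""
    index = defaultdict(set)
    for group_id, topics in ad_groups.items():
        for topic in topics:
            index[topic].add(group_id)
    return dict(index)
```

Given the topics of a document, candidate ad groups can then be found by looking up each topic in the index rather than scanning every ad group.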

Recall from FIG. 4 that ad relevance information is compared with document relevance information. Various similarity techniques, such as those described in the relevant ad server applications, can be used to determine the degree of similarity between an ad and a document. Such techniques can use the extracted and/or generated relevance information. Based on the similarity determinations, one or more content-relevant ads can be associated with a document. For example, an ad might be associated with a document if the degree of similarity exceeds some absolute or relative threshold.
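Threshold-based association can be sketched with any similarity measure. The example below substitutes cosine similarity over topic-weight dictionaries purely for illustration; it is not the similarity technique of the referenced applications, and the function names, the data layout, and the default threshold of 0.5 are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse topic-weight dictionaries."""
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevant_ads(doc_topics, ads, threshold=0.5):
    """Return ids of ads whose similarity to the document meets an absolute threshold.

    `ads` maps ad id -> topic-weight dictionary. Results are ordered from
    most to least similar.
    """
    scored = [(cosine(doc_topics, topics), ad_id) for ad_id, topics in ads.items()]
    return [ad_id for score, ad_id in sorted(scored, reverse=True) if score >= threshold]
```

A relative threshold could be modeled instead by keeping, say, only ads scoring within some fraction of the best score.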

Once a set of best ad groups has been selected, one or more ads can be chosen from the best group(s) using a list of criteria. The content-relevant ad server can use this list to request that an ad be returned if a sufficient number of the M criteria match a single ad group. If the criteria match an ad group, the ad is sent back to the requestor.

Performance information, such as a history of conversions per URL or per domain, may be fed back into the system so that Web pages, or clusters of Web pages, that perform better for certain types of ads (e.g., ads belonging to a specific topic or cluster) can be identified. This information can be used to re-rank content-relevant ads, so that ads are served based on both content relevance and performance. A number of performance optimizations can also be applied. For example, the mapping from a URL to its relevant ad groups can be cached to avoid re-computation for frequently viewed pages.
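Caching the mapping from a URL to its relevant ad groups can be sketched as a small least-recently-used cache. The class name `AdGroupCache`, the `compute` callable, and the eviction policy are assumptions made for illustration; the source does not specify how the cache is organized.

```python
from collections import OrderedDict


class AdGroupCache:
    """Least-recently-used cache from URL to its relevant ad groups."""

    def __init__(self, compute, capacity=1000):
        self._compute = compute        # hypothetical expensive URL -> ad groups function
        self._capacity = capacity
        self._entries = OrderedDict()  # oldest entry first
        self.misses = 0

    def get(self, url):
        if url in self._entries:
            self._entries.move_to_end(url)  # mark as most recently used
            return self._entries[url]
        self.misses += 1
        groups = self._compute(url)
        self._entries[url] = groups
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used URL
        return groups
```

Frequently viewed pages stay near the fresh end of the cache, so the relevance computation runs for them only once until they are evicted.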

Content-relevant ads may be combined with their associated documents either in advance or on demand, in real time. The combination can be performed by the content-relevant ad server, the content provider, or the end user's content rendering application (e.g., browser).
