Internet – Brian Pugh, Frank Eugene Pecjak, comScore Inc

Abstract for “Measurements based upon panel and census data”

This article describes measuring an audience on a network. A first set of network usage data is collected based on access to a resource by a first group of client systems. A second set is based on access to the resource by a second group of client systems, using a monitoring application installed on the second group of client systems. The first set of network usage data is used to determine the usage of the resource over a period of time. One or more adjustment factors are calculated based on the second set and applied to the determined usage. Finally, audience reports for the resource are generated using the adjusted usage.

Background for “Measurements based upon panel and census data”

For a variety of reasons, "Internet audience measurement" may prove useful. Some organizations may want to be able to make claims about the growth of their audiences or technology. Understanding consumer behavior, including how consumers interact with a specific website or group of websites, can help organizations make better decisions about their traffic flow and the purpose of their website. Understanding the habits and visitation patterns of Internet users can also be useful for planning, buying, and selling advertising.

In one aspect, a system includes one or more processing devices and one or more storage devices that store instructions. When executed by the one or more processing devices, the instructions cause them to access a first set of usage data for a first set of resources on a network. A first group of client systems accessed the first set of resources. The first set of usage data is determined using information received from those client systems as a result of beacon instructions included with the resources. The instructions also cause the one or more processing devices to access a second set of usage data for a second set of resources on the network. The second set of usage data is determined based on information received from monitoring applications installed on a second group of client systems that accessed the second set of resources. The users of the second group of client systems are a representative sample of a larger group of users who use resources on the network. The instructions further cause the one or more processing devices to determine initial usage measurement data for a third set of resources on the network based on the first set of usage data. The third set contains one or more common resources that are also included in the second set.

Implementations may include one or more of the following features. For example, the information received from the first group of client systems may include one or more beacon messages that identify the common resources and include a beacon cookie with a unique client system identifier. The instructions may include instructions that, when executed, cause the one or more processing devices to determine an initial count of unique visitors who accessed the third set of resources within a given time period. This is done by counting the beacon messages that identify the common resources and include beacon cookies with different unique identifiers.
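The initial unique-visitor count described above amounts to counting distinct cookie identifiers. Below is a minimal Python sketch under stated assumptions. The `BeaconMessage` record and the `initial_unique_visitors` function are hypothetical names. The sketch also assumes that each beacon message has already been parsed into a URL, an optional cookie ID, and a receipt time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class BeaconMessage:
    url: str                  # resource identified by the beacon message
    cookie_id: Optional[str]  # unique client-system ID from the beacon cookie, if any
    received: datetime        # time stamp applied by the collection server


def initial_unique_visitors(messages: Iterable[BeaconMessage],
                            common_resources: Set[str],
                            start: datetime, end: datetime) -> int:
    """Count distinct beacon-cookie IDs seen for the common resources in [start, end)."""
    ids = {
        m.cookie_id
        for m in messages
        if m.url in common_resources
        and m.cookie_id is not None
        and start <= m.received < end
    }
    return len(ids)
```

The result counts beacon cookies, not people. That is why it is only an initial figure, to be corrected by the adjustment factors.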

The one or more adjustment factors may include a cookie-per-person adjustment factor, which reflects the number of beacon cookies per person who accessed the common resources during the time period. The instructions may include instructions that cause the one or more processing devices to determine the cookie-per-person adjustment factor. It is the ratio of two projected totals for the time period: the total number of beacon cookies set on client systems that accessed the common resources, and the total number of people who accessed the common resources.

The one or more adjustment factors may include a person-per-cookie adjustment factor, which reflects the number of people per beacon cookie who accessed the common resources during the time period. The instructions may include instructions that cause the one or more processing devices to determine the person-per-cookie adjustment factor. It is the ratio of two projected totals for the time period: the total number of people who accessed the common resources, and the total number of beacon cookies set on client systems that accessed those resources.
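The two cookie factors are reciprocals of the same pair of projected totals. The Python sketch below is illustrative only and rests on several assumptions. First, it assumes the panel application has recorded which beacon cookies each panelist's beacon messages carried. Second, `person_weights` is a hypothetical mapping of panelists to their projection weights. Third, multiplying an initial cookie count by the person-per-cookie factor is one plausible way to apply it. None of these names come from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cookie_factors(panel_beacons: Iterable[Tuple[str, str]],
                   person_weights: Dict[str, float]) -> Tuple[float, float]:
    """Return (cookies_per_person, people_per_cookie) for the common resources.

    panel_beacons: (person_id, cookie_id) pairs the panel application observed
    in beacon messages for the common resources during the time period.
    """
    cookies_by_person = defaultdict(set)
    for person, cookie in panel_beacons:
        cookies_by_person[person].add(cookie)
    # Projected totals: each panelist's distinct cookies and each panelist once,
    # scaled by that panelist's projection weight.
    projected_cookies = sum(person_weights[p] * len(c) for p, c in cookies_by_person.items())
    projected_people = sum(person_weights[p] for p in cookies_by_person)
    return projected_cookies / projected_people, projected_people / projected_cookies


def adjust_cookie_count(initial_cookie_count: float, people_per_cookie: float) -> float:
    # ASSUMPTION: the factor is applied by scaling a cookie-based count to people.
    return initial_cookie_count * people_per_cookie
```

For example, suppose panelists projecting to 400 people carry cookies projecting to 500. The cookie-per-person factor is then 1.25 and the person-per-cookie factor is 0.8, so 10,000 unique cookies would correspond to about 8,000 people.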

The one or more adjustment factors could include a machine overlap adjustment factor. It reflects the number of client systems per person used to access the common resources during the time period. The instructions may include instructions that cause the one or more processing devices to determine the machine overlap adjustment factor. This factor is determined, at least in part, from three inputs for the time period: an incremental number of client systems per person who accessed the common resources, a frequency per person who accessed them, and an average number of accesses of the common resources per day. The incremental number of client systems per person is derived from the total number of client systems that accessed the common resources during the time period and the total number of people who accessed them. That quantity is then taken relative to the total number of people who accessed the common resources during that period.
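The text names the inputs to the machine overlap factor but not how they are combined, so the Python sketch below stops at the inputs. It makes two assumptions, labeled in the code. First, "incremental" is read as client systems beyond the first per person: (projected machines - projected people) / projected people. Second, "frequency per person" is read as the average number of distinct access days. The function name and input shapes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def overlap_inputs(panel_accesses: Iterable[Tuple[str, str, int]],
                   person_weights: Dict[str, float]) -> Tuple[float, float]:
    """Return (incremental_machines_per_person, frequency_per_person).

    panel_accesses: (person_id, machine_id, day) for accesses to the common resources.
    ASSUMPTION: incremental = (projected machines - projected people) / projected people.
    ASSUMPTION: frequency = weighted average number of distinct access days per person.
    """
    machines = defaultdict(set)
    days = defaultdict(set)
    for person, machine, day in panel_accesses:
        machines[person].add(machine)
        days[person].add(day)
    people = sum(person_weights[p] for p in machines)
    projected_machines = sum(person_weights[p] * len(m) for p, m in machines.items())
    incremental = (projected_machines - people) / people
    frequency = sum(person_weights[p] * len(d) for p, d in days.items()) / people
    return incremental, frequency
```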

The one or more adjustment factors can include a non-beaconed adjustment factor. It represents the number of unique visitors who accessed one or more resources in the third set that are not in the first set. The instructions may include instructions that cause the one or more processing devices to determine the non-beaconed adjustment factor. The one or more processing devices first determine the projected number of unique visitors who accessed the third set of resources, then determine the projected number of unique visitors who accessed the common resources. The second number is then subtracted from the first.
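The subtraction above can be sketched in Python as follows, with several assumptions. The sketch assumes panel records have been reduced to `(person_id, url)` pairs and that `weights` holds each panelist's projection weight. Both names are hypothetical. It also assumes the common resources are a subset of the third set.

```python
from typing import Dict, Iterable, Set, Tuple


def projected_unique_visitors(panel_records: Iterable[Tuple[str, str]],
                              weights: Dict[str, float],
                              resources: Set[str]) -> float:
    """Sum projection weights of panelists who accessed any resource in `resources`."""
    visitors = {person for person, url in panel_records if url in resources}
    return sum(weights[p] for p in visitors)


def non_beaconed_uv(panel_records: Iterable[Tuple[str, str]],
                    weights: Dict[str, float],
                    third_set: Set[str], common: Set[str]) -> float:
    records = list(panel_records)  # iterated twice below
    # Projected visitors to the whole third set minus those reached via beaconed pages.
    return (projected_unique_visitors(records, weights, third_set)
            - projected_unique_visitors(records, weights, common))
```

A panelist who visited both a beaconed and a non-beaconed page is already counted among the common-resource visitors. Only visitors who never reached a beaconed page contribute to the factor.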

The instructions can include instructions that, when executed, cause the one or more processing devices to determine an initial count of page views for the third set of resources during a given time period. This is done by counting the total number of beacon messages that identify the common resources. The one or more adjustment factors can include a non-beaconed adjustment factor that reflects the number of page views for resources in the third set that are part of the second set of resources but not the first.
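The page-view counterpart can be sketched the same way in Python. This is a sketch under assumptions, and the function names are hypothetical. It assumes one beacon message per page view, and one `(person_id, url)` pair per panel-observed page view (panel records cover the second set of resources). It also assumes the factor is applied by adding the projected non-beaconed page views to the beacon count.

```python
from typing import Dict, Iterable, Set, Tuple


def initial_page_views(beacon_urls: Iterable[str], common: Set[str]) -> int:
    """Count beacon messages that identify a common resource (one message per page view)."""
    return sum(1 for url in beacon_urls if url in common)


def non_beaconed_page_views(panel_views: Iterable[Tuple[str, str]],
                            weights: Dict[str, float],
                            third_set: Set[str], first_set: Set[str]) -> float:
    """Projected panel page views of third-set resources that carry no beacon."""
    return sum(weights[p] for p, url in panel_views
               if url in third_set and url not in first_set)
```

Usage, under the additive assumption: `adjusted = initial_page_views(urls, common) + non_beaconed_page_views(views, weights, third_set, first_set)`.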

In another aspect, a method includes accessing a first set of usage data for a first set of resources on a network. A first group of client systems accessed the first set of resources, and the first set of usage data is determined using information received from those client systems as a result of beacon instructions included with the resources. The method also includes accessing a second set of usage data for a second set of resources on the network. The second set of usage data is determined based on information received from monitoring applications installed on a second group of client systems that accessed the second set of resources. The users of the second group of client systems are a representative sample of a larger group of users who use resources on the network. The method further includes the following steps:

- determining initial usage measurement data for a third set of resources on the network based on the first set of usage data;
- determining one or more adjustment factors based on the second set of usage data;
- applying the adjustment factors to the initial usage measurement data to generate adjusted usage measurement data;
- generating one or more reports based on the adjusted usage measurement data.

Implementations may include one or more of the following features. For example, the information received from the first group of client systems could include one or more beacon messages that identify the common resources and include a beacon cookie with a unique client system identifier. Determining the initial usage measurement data can include determining an initial count of unique visitors. This is done by counting the beacon messages received that identify the common resources and include beacon cookies with different unique identifiers.

The one or more adjustment factors may include a cookie-per-person adjustment factor, which reflects the number of beacon cookies per person who accessed the common resources during the time period. The cookie-per-person adjustment factor can be calculated by dividing the projected total number of beacon cookies set on client systems that accessed the common resources during the time period by the projected total number of people who accessed the common resources during that time period.

The one or more adjustment factors could include a person-per-cookie adjustment factor, which reflects the number of people per beacon cookie who accessed the common resources during the time period. The person-per-cookie adjustment factor can be calculated by dividing the projected total number of people who accessed the common resources during the time period by the projected total number of beacon cookies set on client systems that accessed those resources.

The one or more adjustment factors could include a machine overlap adjustment factor. It reflects the number of client systems per person used to access the common resources during the time period. The machine overlap adjustment factor can be determined based on an incremental number of client systems per person who accessed the common resources during the time period. It may also be based on a frequency per person who accessed the common resources during that time period and an average number of accesses of the common resources per day during that time period. The incremental number of client systems per person can be determined from the total number of client systems that accessed the common resources during the time period and the total number of people who accessed them. That quantity is then taken relative to the total number of people who accessed the common resources during that period.

The one or more adjustment factors could include a non-beaconed adjustment factor that represents the number of unique visitors who accessed resources within the third set that are not part of the first set. The non-beaconed adjustment factor can be determined in three steps. First, determine the projected number of unique visitors who accessed the third set of resources. Next, determine the projected number of unique visitors who accessed the common resources. Finally, subtract the second number from the first.

Determining the initial usage measurement data might include determining an initial count of page views for the third set of resources over a period of time by counting the total number of beacon messages that identify the common resources. The one or more adjustment factors could include a non-beaconed adjustment factor, which reflects the number of page views for resources in the third set that are part of the second set of resources but not the first.

Implementations of any of these techniques can include a method or process, an apparatus or device, a mechanism, a system, or instructions stored on a computer-readable storage device. The details of particular implementations are set out below. Other features will also be apparent from the description, the drawings, and the claims.

In general, accesses by client systems to web pages or other resources can be recorded and used to create audience measurement reports. One way to collect data about resource accesses is a panel-based approach. Panel-based approaches generally involve installing a monitoring application on the client systems of a group of users (the panel). The monitoring application then collects information about the web pages or other resources accessed and sends it to a collection server.

A beacon-based approach can also collect data about resource accesses. It involves associating a script or other code with the resource being accessed so that the code is executed when the client system renders or otherwise uses the resource. When executed, the beacon code sends a message to the collection server. The message contains certain information, such as an identifier of the resource.

Panel-based and beacon-based data may each be used on their own for audience measurement reports. However, they can also be combined, which may improve the accuracy of the reports. This article describes using both beacon-based and panel-based methods to collect data about resource accesses. It also describes techniques for combining the data from both approaches to create audience measurement reports.

As shown in FIG. 1, a system 100 includes client systems 112, 114, 116, and 118, one or more web servers 110, a collection server 130, and a database 132. Panel users use the client systems 112, 114, 116, and 118 to access Internet resources, including web pages on the web servers 110. Each client system sends information about its resource accesses to the collection server 130, where the information can be used to analyze the usage patterns of Internet users.

Each of the client systems 112, 114, 116, and 118, as well as the collection server 130 and the web servers 110, may be implemented by a general-purpose computer that can respond to and execute instructions in a defined manner. Examples include a personal computer, a special-purpose computer, a workstation, a server, or a mobile device. These systems may receive instructions from, for example, a software application, a program, a piece of code, a device, a computer, a computer system, or a combination thereof, that directs their operations. The instructions can be stored permanently or temporarily in any machine, component, equipment, or other physical storage medium that can be used by the client systems 112, 114, 116, and 118, the collection server 130, and the web servers 110.

Although FIG. 1 shows the system 100 with client systems 112, 114, 116, and 118, other implementations may have more or fewer client systems. Similarly, although FIG. 1 shows a single collection server 130, other implementations may have multiple collection servers 130. For example, each client system may send data to more than one collection server for redundancy. In other implementations, the client systems 112, 114, 116, and 118 may send data to different collection servers. The data representing the entire panel can then be sent to and aggregated at one central location for later processing. The central location may be one of the collection servers.

The panel users of the client systems 112, 114, 116, and 118 are representative of the larger universe being measured. This could be all Internet users, or all Internet users within a particular geographic area. The behavior of the sample is projected onto the universe being measured in order to understand the universe's overall behavior. For example, the size and/or demographic composition of the universe can be determined using independent measurements and studies. These might be enumeration studies conducted monthly, or at other intervals, using random digit dialing.

Similarly, the client systems 112, 114, 116, and 118 are representative of the wider universe of client systems that access Internet resources. This allows the behavior of the client systems to be projected, on an aggregate basis, onto all client systems that access Internet resources. For example, the size of the total universe of client systems can be determined using independent measurements and studies.

An entity that controls the collection server 130 may recruit users to the panel. The entity may collect demographic information about panel members, such as age, gender, household size and composition, geographic region, household income, and number of client systems. The recruitment methods may be selected or developed to obtain the best possible random sample of the universe, minimize bias, and maximize cooperation rates. After a user has been recruited, a monitoring application is installed on the user's client system. The monitoring application collects information about how the user uses the client system to access Internet resources and sends it to the collection server 130.

For instance, the monitoring application may have access to the network stack of the client system on which it is installed. The monitoring application can monitor network traffic in order to collect and analyze information about the client system's requests for resources and the responses to those requests. For example, the monitoring application might collect and analyze information about HTTP requests and the corresponding HTTP responses.

Thus, in system 100, a monitoring application 112b, 114b, 116b, and 118b (also known as a panel application) is installed on each of the client systems 112, 114, 116, and 118, respectively. A user of a client system 112, 114, 116, or 118 may visit and view web pages with a browser application 112a, 114a, 116a, or 118a. When that happens, the corresponding monitoring application 112b, 114b, 116b, or 118b may collect information and send it to the collection server 130. The monitoring application might collect the URLs of web pages and other resources visited and the times they were accessed. It might also collect an identifier associated with the particular client system, which could be linked to demographic information about the user or users of that system. For example, a unique identifier may be generated and associated with the specific copy of the monitoring application installed on the client system. The monitoring application may also collect and send information about requests for resources and the subsequent responses. For example, it might collect cookies sent in requests and/or received in responses. The collection server 130 receives and records the information from the client systems, aggregates it, and stores the aggregated information in the database 132 as panel-centric data 132a.

The panel-centric data 132a may be used to analyze the habits and visitation patterns of panel users, and this information may then be extrapolated to all Internet users. Information collected during a session can be associated with a particular user of the client system (and/or that user's demographics). For example, the monitoring application might require users to identify themselves. Alternatively, techniques such as those described in U.S. Patent Application No. 2004-0019518 and U.S. Pat. No. 7,260,837, both incorporated herein by reference, may be used. Identifying the individual using the client system allows usage information to be extrapolated per person rather than per machine. Measurements are then attributed to individual users of the client system rather than to machines.

To extrapolate panel member usage to the larger universe being measured, some or all members of the panel are weighted and projected onto that universe. In some cases, only a subset of members is weighted and projected. For example, analysis of the received data might indicate that some panel members' data is unreliable. These members could be excluded from reporting and therefore not weighted and projected.

The users included in the projection are weighted so that the reporting sample reflects the demographic composition of the universe being measured. This weighted sample is then projected to the entire universe. To do this, a projection weight is assigned to each member of the reporting sample and applied to that member's usage. Similarly, a reporting sample of client systems can be projected to all client systems using client system projection weights. The projection weights for client systems are usually different from those for users.
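The text does not specify how the projection weights are computed. One common option is simple cell weighting, in which each demographic cell's universe size is divided by the number of sample members in that cell. The Python sketch below assumes that approach. The cell labels and universe sizes are hypothetical stand-ins for figures from enumeration studies.

```python
from collections import Counter
from typing import Dict


def projection_weights(members: Dict[str, str],
                       universe_by_cell: Dict[str, float]) -> Dict[str, float]:
    """members: member_id -> demographic cell; universe_by_cell: cell -> universe size.

    ASSUMPTION: cell weighting; each member represents an equal share of its cell.
    """
    sample_counts = Counter(members.values())
    return {m: universe_by_cell[cell] / sample_counts[cell] for m, cell in members.items()}


def projected_total(usage: Dict[str, float], weights: Dict[str, float]) -> float:
    """Apply each member's projection weight to that member's usage and sum."""
    return sum(weights[m] * u for m, u in usage.items())
```

Usage: with members `{"p1": "F18-34", "p2": "F18-34", "p3": "M35+"}` and universe sizes `{"F18-34": 1000, "M35+": 2000}`, the weights are 500, 500, and 2000. The weights therefore sum to the 3,000-person universe.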

The usage behavior of the client systems or users in the projected, weighted sample may be taken to represent the behavior of the defined universe (of client systems or users, respectively). That is, behavior patterns seen in the projected, weighted sample can be taken to reflect the patterns in the universe.

This information can be used to estimate the number of visitors and other behaviors. For example, it can be used to determine the number of unique visitors (or client systems) to particular web pages or groups of web pages, or the number of unique visitors from a specific demographic. It can also be used to estimate other metrics, such as frequency of usage per client system, average number of pages viewed per client system user, and average time spent per user.
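Several of these estimates can be computed from weighted panel records as sketched below in Python. The sketch assumes that records are hypothetical `(member_id, url, day)` tuples and that weights come from a projection step like the one described above. It also assumes that "frequency" means average distinct visit days per visitor; the source does not define it.

```python
from typing import Dict, Iterable, Set, Tuple


def audience_metrics(records: Iterable[Tuple[str, str, int]],
                     weights: Dict[str, float],
                     pages: Set[str]) -> Dict[str, float]:
    """Projected unique visitors, page views, and per-visitor averages for `pages`."""
    visitors: Dict[str, dict] = {}
    for member, url, day in records:
        if url in pages:
            entry = visitors.setdefault(member, {"views": 0, "days": set()})
            entry["views"] += 1
            entry["days"].add(day)
    uv = sum(weights[m] for m in visitors)
    views = sum(weights[m] * e["views"] for m, e in visitors.items())
    visit_days = sum(weights[m] * len(e["days"]) for m, e in visitors.items())
    return {
        "unique_visitors": uv,
        "page_views": views,
        "avg_pages_per_visitor": views / uv if uv else 0.0,
        # ASSUMPTION: frequency = average distinct visit days per visitor.
        "avg_days_per_visitor": visit_days / uv if uv else 0.0,
    }
```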

As explained further below, such estimates, or other information determined from the panel-centric data, may be combined with data from a beacon-based approach to generate reports on audience visitation and other activity. Combining the panel-centric data with beacon-based data may make these reports more accurate.

Referring to FIG. 2, a system 200 may be used to implement a beacon-based approach. In a beacon-based approach, beacon code may be included in one or more web pages.

System 200 includes one or more client systems 202, web servers 110, the collection server 130, and the database 132. The client systems 202 may include client systems 112, 114, 116, or 118 that have the panel application installed.

The client systems 202 contain a browser application 204, which retrieves web pages 206 from the web servers 110 and renders them. Some web pages 206 include beacon code 208. Publishers who agree with the entity that operates the collection server 130 can include the beacon code 208 in their web pages. When a web page 206 containing the beacon code 208 is rendered, the code is executed and causes the browser application 204 to send a message to the collection server 130. The message contains certain information, such as the URL of the page in which the beacon code 208 is included. For example, the beacon code could be JavaScript code that accesses the URL of the page in which it is embedded. The code then sends the collection server 130 an HTTP Post message that includes the URL in a query string. Alternatively, the beacon code could place the web page's URL in the "src" attribute of an image tag. Rendering the tag then causes a request for the resource at that "src" URL to be sent to the collection server 130. Because the URL of the web page is included in the "src" attribute, the collection server 130 receives the URL of the web page. The collection server 130 can then return a transparent image.


The collection server 130 records the URLs received in these messages, together with, for example, a time stamp indicating when each message was received and the IP address of the client system that sent it. The collection server 130 then compiles this information and stores it in the database 132 as site-centric data 132b.

The message could also contain a unique identifier for the client system. For example, when a client system first sends a beacon message to the collection server 130, a unique identifier may be generated for that client system and associated with the message. The identifier could then be added to a cookie set on the client system 202. That cookie is included in subsequent beacon messages sent from the client system, so those messages carry the client system's unique identifier. A beacon message may arrive without the cookie, for example because the user deleted the cookies on the client system. In that case, the collection server 130 can generate a new unique identifier and include it in a new cookie set on the client system.

Thus, client systems 202 can access web pages (e.g., on the Internet). When they access web pages that contain the beacon code, messages are sent to the collection server 130. These messages contain the URL of the accessed page and, possibly, a unique identifier for the client system that sent the message. The collection server 130 may create a record for each message it receives. Each record may contain an identifier of the accessed page (e.g., its URL) and the unique identifier of the client system. It may also contain the time the message was received (e.g., a time stamp added by the collection server 130 on receipt) and a network address, such as an IP address, of the client system. The collection server 130 may gather these records and store them in the database 132 as site-centric data 132b.
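The receive-and-record step can be sketched on the collection-server side in Python, using only the standard library. This is a sketch under assumptions, not the described system's implementation. The query parameter name `url`, the cookie name `beacon_id`, and the `handle_beacon` function are all hypothetical. Cookie issuance follows the behavior described in the text: when no identifier arrives, a new one is generated and returned for setting in a cookie.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

COOKIE_NAME = "beacon_id"  # hypothetical cookie name


def handle_beacon(request_url: str, client_ip: str, cookies: Dict[str, str],
                  now: Optional[datetime] = None) -> Tuple[dict, Optional[Dict[str, str]]]:
    """Build one site-centric record from a beacon request.

    Returns (record, cookie_to_set); cookie_to_set is None when the request
    already carried a client-system identifier.
    """
    # The page URL travels in the query string (hypothetical parameter "url").
    page_url = parse_qs(urlsplit(request_url).query).get("url", [None])[0]
    client_id = cookies.get(COOKIE_NAME)
    cookie_to_set = None
    if client_id is None:
        # No cookie (first beacon, or cookies deleted): issue a new unique ID.
        client_id = uuid.uuid4().hex
        cookie_to_set = {COOKIE_NAME: client_id}
    record = {
        "url": page_url,
        "client_id": client_id,
        "received": now or datetime.now(timezone.utc),  # time stamp on receipt
        "ip": client_ip,
    }
    return record, cookie_to_set
```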

Beacon messages are sent regardless of whether the client system has the panel application installed. On client systems that do have it installed, the panel application also records the beacon message and sends it to the collection server 130. For example, the panel application may monitor HTTP traffic. The beacon message may be sent as an HTTP Post message or as a result of an image request. Either way, it is recorded in the HTTP traffic captured by the panel application, along with any cookies included in the beacon message. In this way, the panel-centric data received by the collection server 130 also reflects the beacon message and its cookie.

Because beacon messages are sent regardless of whether the panel application is installed, the site-centric data 132b directly represents accesses by members of the larger universe being measured, not just by members of the panel. The site-centric data 132b can be used to generate audience measurement data for web pages or groups of web pages that contain the beacon code. This initial data could contain inaccuracies for a variety of reasons. For example, one person may use several client systems or accumulate several beacon cookies. The panel-centric data 132a can be used to calculate adjustment factors that may increase the accuracy of the site-centric data.

As shown in FIG. 3, a system 300 includes a reporting server 302. The reporting server 302 may be implemented by a general-purpose or special-purpose computer that can execute instructions in a defined manner. Examples include a personal computer, a workstation, a server, or a mobile device. The reporting server 302 may receive instructions from a software application, program, piece of code, device, computer, computer system, or a combination thereof. The instructions can be stored permanently or temporarily in any machine, component, equipment, or other storage medium that can be used by the reporting server 302.

The reporting server 302 executes instructions to implement a measurement data processor 304 and a report generation module 308. The measurement data processor 304 includes a pre-processing module 304a, an initial measurement module 304b, and a measurement adjustment module 304c. As described with respect to FIG. 4, the measurement data processor 304 uses the panel-centric data 132a and the site-centric data 132b to generate unified and adjusted measurement data 306. The report generation module 308 can use the unified and adjusted measurement data 306 to generate one or more reports 310. These reports may include information about client system accesses to one or more resources.

FIG. 4 shows an example of a process 400. The process 400 is described as performed by the pre-processing module 304a, the initial measurement module 304b, the measurement adjustment module 304c, and the report generation module 308. However, the process 400 can also be performed by other systems and configurations.

The pre-processing module 304a accesses the panel-centric data 132a and the site-centric data 132b (402). The panel-centric data 132a reflects a first set of resources accessed by a first set of client systems (those in the panel). The site-centric data 132b reflects a second set of resources accessed by a second set of client systems. Some of the second set of client systems may be in the panel, while others may not. The second set of resources may also include resources that are in the first set of resources.

The panel-centric data 132a may contain records that reflect the URLs or other identifiers of the web pages or other resources accessed, along with identifiers of the client systems that accessed them. It may also include information about the requests and responses used to access those resources (e.g., cookies sent in requests and/or received in responses). The site-centric data 132b can include records containing four kinds of information:

- the URL or other identifier of the resource accessed by a client system;
- the network address of that client system;
- the time the resource was accessed (for instance, as indicated by a time stamp applied when the collection server 130 received the beacon message);
- a unique identifier for the client system (for example, from a cookie included with the beacon message).

The panel-centric data 132a and site-centric data 132b that are accessed may be aggregated over a specific time period. For example, the accessed data could be the panel-centric data 132a and site-centric data 132b aggregated over the past 30 days.

The pre-processing module 304a performs one or more pre-processing functions on the accessed panel-centric data 132a and site-centric data 132b (404). For example, the pre-processing module 304a may process the raw panel-centric data 132a into state data, in which a single record captures all the facts of a usage event. A record in the state data might indicate that a user visited web page B on a specific date and time using a specific client system. The pre-processing module 304a may also match some or all of the URLs in the state data records to patterns in a dictionary. This allows different URLs to be organized into digital media properties that reflect the way Internet companies run their businesses. Each pattern can be associated with a web entity, which is a collection of web pages logically grouped together. For example, the finance.yahoo.com domain might include a number of web pages that could be logically combined into one web entity, Yahoo Finance. The dictionary could include several hierarchically linked entities to reflect how different Internet media companies arrange their web properties. For example, Yahoo Finance may be considered part of the Yahoo web entity, which may include all the pages on the yahoo.com website. The Yahoo web entity may also include other web entities, such as Yahoo Health (associated with the pages on the health.yahoo.com website). The pre-processing module 304a may associate each state record with the lowest-level web entity associated with the URL in that record.
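The dictionary lookup can be sketched in Python as follows. The matching rule is an assumption: patterns are treated as host names, and a URL matches the most specific (longest) pattern that equals its host or is a parent domain of it. The `DICTIONARY` contents reuse the Yahoo examples from the text, but the data structure and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical dictionary: host pattern -> (web entity, parent web entity)
DICTIONARY: Dict[str, Tuple[str, Optional[str]]] = {
    "yahoo.com": ("Yahoo", None),
    "finance.yahoo.com": ("Yahoo Finance", "Yahoo"),
    "health.yahoo.com": ("Yahoo Health", "Yahoo"),
}


def lowest_entity(url: str, dictionary=DICTIONARY) -> Optional[str]:
    """Return the lowest-level (most specific) web entity whose pattern matches the URL's host."""
    host = (urlsplit(url).hostname or "").lower()
    best = None
    for pattern in dictionary:
        if host == pattern or host.endswith("." + pattern):
            if best is None or len(pattern) > len(best):
                best = pattern
    return dictionary[best][0] if best else None


def entity_chain(entity: Optional[str], dictionary=DICTIONARY) -> List[str]:
    """Walk from an entity up through its parents in the hierarchy."""
    parents = {name: parent for name, parent in dictionary.values()}
    chain = []
    while entity is not None:
        chain.append(entity)
        entity = parents.get(entity)
    return chain
```

A real dictionary would likely also use path patterns. Host matching alone is a simplification chosen to keep the hierarchy visible.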

The pre-processing module 304a can also remove data from the panel-centric data 132a for users who are not included in the reporting sample. For example, there may be rules that must be met to ensure that a complete record of a user's usage during the reporting period has been received. A user who does not meet these rules could be removed from the reporting sample. A user can also be removed from the reporting sample for not meeting other criteria, such as living in a specific area.

In addition, the pre-processing module 304a may remove certain types of records. Examples include records of requests that were not initiated by a human (e.g., requests made automatically while rendering a web page) and records of redirects.

The pre-processing module 304a may process the site-centric data 132b to match some or all of the URLs in its records to patterns in the dictionary. Each record is then associated with a web entity, such as the lowest-level web entity in the hierarchy. Actions 406 through 410 can be performed per web entity to determine the measurement data 306.

In addition, the pre-processing module 304a may delete certain records from the site-centric data 132b. For example, it can remove records of accesses that were not initiated by humans. A list of search index crawlers can be used to remove records attributed to accesses by those robots. Alternatively, records indicating rapid sequential accesses by a client system, to the same web page or to different web pages, may be removed. For example, accesses spaced 3 seconds apart or less could indicate that the accesses after the first one were not made by a human. Removing these records eliminates non-human-initiated records. It also corrects errors in the beacon code that could produce more than one beacon message per access.
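The two cleaning rules can be sketched together in Python. This is a sketch under assumptions; `CRAWLER_TOKENS` is a hypothetical two-entry stand-in for a real list of search index crawlers. Each record is assumed to carry `client_id`, `user_agent`, and `received` fields. A message arriving 3 seconds or less after the previous message from the same client system (for any page) is dropped, which removes the whole tail of a rapid burst.

```python
from datetime import datetime, timedelta
from typing import Dict, List

CRAWLER_TOKENS = ("googlebot", "bingbot")  # hypothetical crawler list


def filter_records(records: List[dict],
                   min_gap: timedelta = timedelta(seconds=3)) -> List[dict]:
    """Drop crawler records and accesses within `min_gap` of the client's previous access."""
    kept = []
    last_seen: Dict[str, datetime] = {}
    for r in sorted(records, key=lambda r: (r["client_id"], r["received"])):
        if any(tok in r["user_agent"].lower() for tok in CRAWLER_TOKENS):
            continue  # robot traffic: never counted, never used as a reference time
        prev = last_seen.get(r["client_id"])
        last_seen[r["client_id"]] = r["received"]
        if prev is not None and r["received"] - prev <= min_gap:
            continue  # too soon after the previous access: likely not human-initiated
        kept.append(r)
    return kept
```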

In some cases, records for certain types of client system devices may be deleted. For example, records for mobile devices may be removed. These records can be identified from the user agent information that was sent with the beacon message and recorded in the record. Records from client systems that are not located in a specific geographic area may also be removed, for example when reports are generated for North America. A reverse lookup of the record's IP address may be used to determine the country and region where the client system is located. The IP address reverse lookup may also be used to detect shared-use client systems (e.g., client systems that are available to the public in libraries).

Pre-processing the panel centric and site centric data may also involve delineating between classes of client systems, because it is sometimes desirable to divide reports according to the class of client system. In one example, reports and the underlying data are divided between home client systems, which are used at home, and work client systems, which are used at work. These subpopulations can be identified and distinguished in the panel centric data 132a because users identified their machines as home or work (or some other class) during registration. The two subpopulations can also be identified and separated in the site centric data 132b. For example, beacon messages received between 8 am and 6 pm local time, Monday through Friday, could be considered work-generated traffic, while the home sample may target all traffic.

Another way to identify and separate these two subpopulations in the site centric data 132b is to use a model based on the work behavior observed in the panel centric data 132a, such as day-of-week and time-of-day usage profiles. All traffic from an IP address that matches the profile of a work machine may then be considered work traffic. For example, the panel data might indicate that a machine should be considered work related if it has more accesses during one time period (a work period) than during another time period (a home period). This panel data may be combined with the site centric data to classify network access providers as work or home, based on whether accesses by users of those network access providers are higher during work hours than during home hours. The IP address of a machine may then be used to determine its network access provider, and the machine can be classified according to that provider's classification. These techniques are described in U.S. Patent Application Ser. No. 61/241,576, filed Sep. 11, 2009, titled "Determining Client Systems Attributes."
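The time-window rule and the provider classification rule described above might be sketched as follows. The 8 am to 6 pm, Monday-to-Friday window comes from the text; treating 6 pm as exclusive and comparing raw access counts in the two periods are simplifying assumptions, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple

WORK_START_HOUR = 8   # 8 am local time, inclusive
WORK_END_HOUR = 18    # 6 pm local time, treated as exclusive (assumption)


def is_work_period(local_time: datetime) -> bool:
    """True for Monday through Friday between 8 am and 6 pm local time."""
    is_weekday = local_time.weekday() < 5  # Monday == 0 ... Friday == 4
    return is_weekday and WORK_START_HOUR <= local_time.hour < WORK_END_HOUR


def classify_providers(accesses: Iterable[Tuple[str, datetime]]) -> Dict[str, str]:
    """Label each network access provider "work" or "home".

    `accesses` holds (provider_id, local_time) pairs observed in panel data.
    A provider is labelled "work" when it has more accesses in the work
    period than in the home period (all other hours); this is a simplified rule.
    """
    work_counts: Counter = Counter()
    home_counts: Counter = Counter()
    for provider, local_time in accesses:
        if is_work_period(local_time):
            work_counts[provider] += 1
        else:
            home_counts[provider] += 1
    providers = set(work_counts) | set(home_counts)
    return {p: "work" if work_counts[p] > home_counts[p] else "home" for p in providers}
```

A production model would likely normalize for the different lengths of the two periods; the raw-count comparison simply mirrors the "more accesses during work hours" wording.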

Actions 406 through 410 may then be performed separately on each subpopulation to generate measurement data for the home and work populations. As described with respect to action 412, reports for these populations can be generated separately or combined. Other implementations may divide the data into more than two subpopulations.

The initial measurement module 304b determines initial usage measurement data based on the pre-processed site centric data (406). For example, the initial measurement module 304b can determine an initial measurement of the number of unique visitors to a web entity, defined as the number of people who requested or viewed a web page of the web entity. To do so, the initial measurement module 304b can count the number of unique cookies received in beacon messages.

As another example, the initial measurement module 304b may determine an initial measurement of page views for a web entity. The number of page views for a web entity may be the number of times its web pages were requested or viewed, regardless of how many distinct individuals made those requests. The initial measurement module 304b could count the number of beacon messages received for the web entity.
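Both initial measurements can be illustrated with a small sketch, assuming each pre-processed beacon message has already been reduced to a (web entity, cookie) pair. Distinct cookies give the initial unique visitor count, and the raw number of beacon messages gives the initial page view count; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def initial_measurements(
    beacons: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[int, int]]:
    """Return {web_entity: (initial_unique_visitors, initial_page_views)}.

    Each element of `beacons` is one (web_entity, cookie) beacon message
    taken from the pre-processed site centric data.
    """
    cookies: Dict[str, Set[str]] = defaultdict(set)
    page_views: Dict[str, int] = defaultdict(int)
    for entity, cookie in beacons:
        cookies[entity].add(cookie)  # distinct cookies approximate visitors
        page_views[entity] += 1      # every beacon message counts as one page view
    return {entity: (len(cookies[entity]), page_views[entity]) for entity in page_views}
```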

The measurement adjustment module 304c determines one or more adjustment factors based on the pre-processed panel centric data (408). Because the initial audience measurement data is based on the pre-processed site centric data, it may be inaccurate for a variety of reasons, and the pre-processed panel centric data can be used to adjust for these inaccuracies.

For instance, if the initial measurement of unique visitors is based solely on beacon measurements, unique visitors could be overcounted or undercounted, because cookies are set on a per-machine, per-browser basis rather than a per-person basis. Even when multiple users use the same client system, only one cookie can be set and counted for each machine and browser, which could lead to an undercount of unique visitors.

In addition, a cookie that was previously stored on a client system could be deleted, resulting in a new cookie with a new identifier for subsequent accesses within the reporting period. Accesses made by the same user could then be mistakenly identified as accesses by different users, leading to an overcount of unique visitors. Similarly, a user who uses multiple browsers on the same computer may have a different cookie set for each browser, which could also lead to an overcount of unique visitors.

A cookie-per-person adjustment factor can be calculated on a per-web entity basis from the panel centric data to account for these inaccuracies. This adjustment factor may reflect the number of cookies set per person who visits beaconed pages (web pages that contain the beacon code), and it can be used to adjust the total number of unique visitors to compensate for multiple cookies per person. For example, the process 500 discussed with respect to FIG. 5 may be used to determine this adjustment factor.

A user can also have multiple client systems at a given place (for instance, at home). This may lead to separate cookies being set on multiple client systems, each counted as a visitor to the web entity, which could lead to an overcount of unique visitors. A machine overlap adjustment factor can be calculated on a per-web entity basis from the pre-processed panel centric data to adjust for a visitor's use of multiple client systems, and hence multiple cookies, to access the web entity. It can reflect the number and type of client systems being used. For example, the process 600 discussed with respect to FIG. 6 may be used to determine this adjustment factor.

A non-beaconed adjustment factor may be calculated on a per-web entity basis from the pre-processed panel centric data to account for inaccuracies in page views or unique visitors that result from the beacon code not being included in all web pages of a web entity. Because the panel applications should capture all web traffic, they also capture and report non-beaconed visits to web pages of a given entity. The panel data can therefore be used to calculate a non-beaconed adjustment factor, based on page views and unique visitors to the web entity's web pages, that does not depend on beacon messages. For example, the process 700 discussed with respect to FIG. 7 may be used to determine this adjustment factor.

The measurement adjustment module 304c applies the adjustment factors to the initial usage measurement data to produce adjusted usage measurement data 306 (410). For instance, in one implementation for audience measurement data that reflects unique visitors for a given web entity, the measurement adjustment module 304c may generate adjusted unique visitors data as follows:

Adj UVs = ((Init UVs / Cookie-Per-Person) * Machine Overlap) + Non-Beaconed

where Adj UVs is the adjusted unique visitors count, Init UVs is the initial count of unique visitors based on the pre-processed site centric data, Cookie-Per-Person is the cookie-per-person adjustment factor, Machine Overlap is the machine overlap adjustment factor, and Non-Beaconed is the non-beaconed adjustment factor. Alternatively, Init UVs can be multiplied by a person-per-cookie adjustment factor (the reciprocal of the cookie-per-person factor) rather than divided by the cookie-per-person factor.

As another example, in one implementation for audience measurement data that reflects the total page views of web pages for a given web entity, the measurement adjustment module 304c may generate adjusted page views data as follows:

Adj PageViews = Init PageViews + Non-Beaconed

where Adj PageViews is the adjusted page views count, Init PageViews is the initial page views count based on the pre-processed site centric data, and Non-Beaconed is the non-beaconed adjustment factor.
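The two adjustment formulas translate directly into code. This sketch assumes the adjustment factors have already been computed for the web entity; the function names are hypothetical, and the non-beaconed factor used for unique visitors and the one used for page views are treated as different quantities, even though both are called Non-Beaconed in the formulas.

```python
def adjusted_unique_visitors(
    init_uvs: float,
    cookie_per_person: float,
    machine_overlap: float,
    non_beaconed_uvs: float,
) -> float:
    """Adj UVs = ((Init UVs / Cookie-Per-Person) * Machine Overlap) + Non-Beaconed."""
    return (init_uvs / cookie_per_person) * machine_overlap + non_beaconed_uvs


def adjusted_unique_visitors_ppc(
    init_uvs: float,
    person_per_cookie: float,
    machine_overlap: float,
    non_beaconed_uvs: float,
) -> float:
    """Same adjustment, multiplying by Person-Per-Cookie instead of dividing."""
    return (init_uvs * person_per_cookie) * machine_overlap + non_beaconed_uvs


def adjusted_page_views(init_page_views: float, non_beaconed_page_views: float) -> float:
    """Adj PageViews = Init PageViews + Non-Beaconed."""
    return init_page_views + non_beaconed_page_views
```

For example, 1000 initial unique visitors with a cookie-per-person factor of 2.0 (equivalently, a person-per-cookie factor of 0.5), a machine overlap factor of 1.1, and 50 non-beaconed visitors give roughly 600 adjusted unique visitors with either form.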

The report generation module 308 generates audience measurement reports using the adjusted audience measurement data (412). For example, the report generation module 308 may generate reports about unique visitors and page views for a particular web entity for the home population, the work population, or both. The report generation module 308 can also generate reports that combine the home and work populations: it can add the page views for the home and work populations to create a combined page view count, and/or combine the unique visitors for both populations to produce a combined unique visitor count.

When producing a combined unique visitor count, the report generation module 308 may take into consideration users who are in both the home and work populations. A user may access the web entity from both a work client system and a home client system; if the count for the home population were simply added to that of the work population, that user would be counted twice. The report generation module 308 may use the panel centric data 132a to calculate the user overlap between these two populations and remove the duplicates. For example, a number of users could install the monitoring application on both their home and work systems and be designated as such. The data from these users can be used to estimate the number of people who visit web pages of the web entity from both work and home client systems, and this estimate can then be used to de-duplicate users in the total count of unique visitors.
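The text does not specify how the overlap estimate is turned into a de-duplicated count. One possible approach, sketched here purely as an assumption, is to measure the projected share of visitors who visited from both locations among panelists running the monitoring application on both home and work systems, and then use the identity home + work = combined + both to back out the combined count. The tuple layout and function names are hypothetical, and the approach assumes the dual-installed panelists are representative of the overlap rate in the full population.

```python
from typing import Iterable, Tuple


def overlap_rate(dual_panelists: Iterable[Tuple[float, bool, bool]]) -> float:
    """Projected share of visitors who visited from both home and work.

    Each tuple is (projection_weight, visited_from_home, visited_from_work)
    for a panelist who runs the monitoring application at home and at work.
    """
    either = 0.0
    both = 0.0
    for weight, from_home, from_work in dual_panelists:
        if from_home or from_work:
            either += weight
        if from_home and from_work:
            both += weight
    return both / either if either else 0.0


def combined_unique_visitors(home_uvs: float, work_uvs: float, rate: float) -> float:
    """De-duplicated visitors U, using home + work = U + both and both = rate * U."""
    return (home_uvs + work_uvs) / (1.0 + rate)
```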

FIG. 5 is a flowchart that illustrates a process 500 for determining a cookie-per-person adjustment factor. Process 500 is described as being performed by the measurement adjustment module 304c; however, other systems and configurations may also perform process 500. As noted above, this adjustment factor can be used to adjust the initial audience measurement data for a web entity. The following describes an implementation of process 500 in which actions 502 through 506 are executed on a per-web entity basis.

The measurement adjustment module 304c calculates, from the pre-processed panel centric data, the total number of unique visitors who visited a beaconed page of a web entity (502). This total can be calculated by adding together the projection weights of the panel members in the pre-processed data who visited a beaconed page of the web entity. A member's projection weight may represent the number of people in the total universe that the member represents, so the sum of these weights estimates the total number of people who visited a beaconed page of the web entity.

The measurement adjustment module 304c determines the total number of beacon cookies for the web entity based on the pre-processed panel centric data (504). For example, the measurement adjustment module 304c can use the pre-processed panel centric data to determine which client systems accessed a beaconed page of the web entity, and then determine the cookies that those client systems sent in beacon messages (also known as "beacon cookies") during the reporting period. The panel applications can record and report beacon messages and the associated beacon cookies for client systems that have the panel application installed. The measurement adjustment module 304c can then generate a projected cookie count for each client system by applying the user's projection weight to the number of beacon cookies sent by that client system during the reporting period, and add the projected cookie counts together to calculate the total number of beacon cookies sent by client systems during the reporting period. When more than one user is associated with a client system, more than one user may be taken into account in determining the projected cookie count.

The measurement adjustment module 304c calculates the cookie-per-person adjustment factor as the ratio of the total beacon cookies to the total unique visitors. In other words, the measurement adjustment module 304c determines Cookie-Per-Person as:

Cookie-Per-Person = Total Cookies / Total Unique Visitors

where Total Cookies is the total number of beacon cookies for the web entity and Total Unique Visitors is the total number of unique visitors for the web entity. The reciprocal of the Cookie-Per-Person adjustment factor, the Person-Per-Cookie factor, may also be used; it is determined as Total Unique Visitors / Total Cookies.
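Process 500 can be summarized in a short sketch. It assumes one pre-processed panel record per member for the web entity, holding the member's projection weight, whether the member visited a beaconed page, and the number of distinct beacon cookies reported from that member's client systems. These field names are hypothetical, and projecting cookie counts by multiplying them by the projection weight is one interpretation of "applying" the weight.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PanelMemberRecord:
    projection_weight: float     # people in the universe this member represents
    visited_beaconed_page: bool  # visited a beaconed page of the web entity
    beacon_cookies: int          # distinct beacon cookies from the member's client systems


def cookie_per_person(records: Iterable[PanelMemberRecord]) -> float:
    """Cookie-Per-Person = Total Cookies / Total Unique Visitors."""
    visitors: List[PanelMemberRecord] = [r for r in records if r.visited_beaconed_page]
    # Action 502: projected unique visitors to beaconed pages.
    total_unique_visitors = sum(r.projection_weight for r in visitors)
    # Action 504: projected beacon cookies for those visitors.
    total_cookies = sum(r.projection_weight * r.beacon_cookies for r in visitors)
    if total_unique_visitors == 0:
        raise ValueError("no panel member visited a beaconed page")
    return total_cookies / total_unique_visitors


def person_per_cookie(records: Iterable[PanelMemberRecord]) -> float:
    """Person-Per-Cookie, the reciprocal of Cookie-Per-Person."""
    ratio = cookie_per_person(records)
    if ratio == 0:
        raise ValueError("no beacon cookies were observed")
    return 1.0 / ratio
```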

FIG. 6 is a flowchart that illustrates a process 600 for determining a machine overlap adjustment factor. Process 600 is described as being performed by the measurement adjustment module 304c; however, process 600 can be performed by other systems and configurations. As noted above, this adjustment factor can be used to adjust the initial audience measurement data for a web entity. The following describes an implementation of process 600 in which actions 602 through 608 are executed on a per-web entity basis.

The measurement adjustment module 304c calculates, based on the pre-processed panel centric data, the client system to person ratio for a given web entity (602). As mentioned above, a user can have multiple client systems at a given place (e.g., at home), so even though only one user is visiting the web entity, multiple cookies can be set on different client systems and counted. A client system to person ratio can be calculated for any given web entity from the pre-processed panel centric data, relative to a defined universe, such as all Internet users and client systems, or the users and client systems in a specific geographic area. The measurement adjustment module 304c can determine the client system to person ratio for a given web entity by determining the total number of client systems and the total number of users in the defined universe that accessed web pages of that web entity.

As described above, projection weights can be used to project users to the total universe of Internet users (or the Internet users within a specific geographic region). Projection weights may also be used to project client systems to the total universe of client systems that access the Internet (or those within a given geographic region). To determine the total number of client systems that accessed web pages of the web entity, the measurement adjustment module 304c may determine which client systems accessed the entity's web pages during the reporting period and add together the projection weights of those client systems. Similarly, to determine the total number of users, the measurement adjustment module 304c may use the pre-processed panel centric data to identify the users who accessed the web entity's web pages during the reporting period and add together their projection weights.
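The client system to person ratio from action 602 is a ratio of two projected totals. In this minimal sketch the inputs are simply the projection weights of the client systems and of the users that accessed the web entity during the reporting period; the function name is hypothetical.

```python
from typing import Iterable


def client_system_to_person_ratio(
    machine_weights: Iterable[float],
    user_weights: Iterable[float],
) -> float:
    """p = projected client systems / projected users for one web entity.

    `machine_weights`: projection weights of the client systems that accessed
    the entity's web pages; `user_weights`: projection weights of the users
    who accessed them, both for the reporting period.
    """
    total_machines = sum(machine_weights)
    total_users = sum(user_weights)
    if total_users <= 0:
        raise ValueError("no projected users accessed the web entity")
    return total_machines / total_users
```

For example, client system weights of 300, 150, and 150 against user weights of 400 and 100 give p = 600 / 500 = 1.2, so each person uses about 0.2 additional client systems.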

Based on the client system to person ratio, the measurement adjustment module 304c calculates the expected reach across all panelists in the pre-processed panel centric data and across all client systems on which those panelists were active (604). Reach is the percentage of users who visited a particular web page during a specified period of time, such as the reporting period; the reach percentage is the number of visitors who visited the web page divided by the total universe of users.

The expected reach across all panelists, over all client systems on which they are active, may be calculated as:

$$\text{Expected reach} = \frac{pRE}{1 + (E-1)\,p^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}} = \frac{(1+q)RE}{1 + (E-1)(1+q)^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}$$

where:

p is the client system to person ratio;

q = p - 1 is the incremental number of client systems used per person, assuming there are no shared-use machines (so that, without extra machines, each person would use only one machine);

T is the reporting period in days (e.g., 30 days);

R is the projected reach for the reporting period T;

E is the frequency of visits per visitor to a web page of the web entity over the period T; and

S is the average number of visits to a web site of the web entity by a user per day over the period T.

The projected reach R for the reporting period may be calculated by using the pre-processed panel centric data to determine the projected number of users who visited a web page of the web entity during the reporting period, and then dividing that number by the estimated total universe of users. The visit frequency E can also be calculated from the pre-processed panel centric data, by determining the total number of visits to the entity's web pages during the reporting period and dividing that number by the projected number of visitors. The average number of visits per day S can likewise be calculated from the pre-processed panel centric data, by determining a daily visit value for each day of the reporting period, adding these values together, and dividing the sum by the number of days in the reporting period.
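The inputs R, E, and S can be derived from projection-weighted panel data roughly as follows. This is a sketch under stated assumptions: each panelist is given as a projection weight plus a list of daily visit counts to the web entity (one entry per day of T), and the daily value averaged into S is taken to be projected visits divided by projected visitors on that day, with days that have no visitors counted as zero. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def reach_frequency_inputs(
    panel: Sequence[Tuple[float, Sequence[int]]],
    universe: float,
) -> Tuple[float, float, float, int]:
    """Return (R, E, S, T) for one web entity.

    `panel` holds (projection_weight, daily_visits) per panelist, where
    daily_visits[d] counts that panelist's visits to the entity on day d.
    """
    if not panel:
        raise ValueError("panel is empty")
    T = len(panel[0][1])
    if any(len(days) != T for _, days in panel):
        raise ValueError("every panelist needs one visit count per day")

    visitors = sum(w for w, days in panel if sum(days) > 0)
    visits = sum(w * sum(days) for w, days in panel)
    if visitors == 0:
        raise ValueError("no panelist visited the web entity")
    R = visitors / universe  # projected reach
    E = visits / visitors    # visits per visitor over the period

    daily_values: List[float] = []
    for d in range(T):
        day_visitors = sum(w for w, days in panel if days[d] > 0)
        day_visits = sum(w * days[d] for w, days in panel)
        daily_values.append(day_visits / day_visitors if day_visitors else 0.0)
    S = sum(daily_values) / T  # average of the daily values over the period
    return R, E, S, T
```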

Based on the client system to person ratio, the measurement adjustment module 304c determines the incremental reach that the panel does not measure because panel members also use client systems that are not included in the panel, in addition to the reach R measured by the panel (606). The expected reach gained from incremental machine activity that is not measured by the panel can be determined using the following formula:

$$\text{Incremental reach} = \frac{qRE}{1 + (E-1)\,q^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}$$

This incremental reach can then be added to the measured reach R.

The measurement adjustment module 304c determines the machine overlap adjustment factor as the ratio of the expected reach across all client systems to the sum of the measured reach and the incremental reach (608). Specifically, the measurement adjustment module 304c can determine the machine overlap adjustment factor as:

$$\text{Machine Overlap} = \frac{\dfrac{(1+q)RE}{1 + (E-1)(1+q)^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}}{R + \dfrac{qRE}{1 + (E-1)\,q^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}}$$

Dividing the numerator and the denominator by R, this simplifies to:

$$\text{Machine Overlap} = \frac{\dfrac{(1+q)E}{1 + (E-1)(1+q)^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}}{1 + \dfrac{qE}{1 + (E-1)\,q^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}}$$

The measurement adjustment module 304c can calculate the machine overlap adjustment factor using the simplified equation. For example, the measurement adjustment module 304c may calculate the client system to person ratio p, determine the incremental number of client systems per person (e.g., by calculating q = p - 1), determine how often people visit web pages of the web entity (E and S), and then calculate the machine overlap adjustment factor using the simplified equation.
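The simplified machine overlap equation can be evaluated directly. This sketch assumes E > 1, S > 1, and T > 1 so that the exponent ln((E - 1)/(S - 1)) / ln(T) is defined, and p >= 1 (no shared-use machines), which keeps q non-negative; when q = 0 the incremental term is taken to be zero, and the factor reduces to 1. The function names are hypothetical.

```python
import math


def frequency_exponent(E: float, S: float, T: float) -> float:
    """ln((E - 1) / (S - 1)) / ln(T), the exponent shared by both reach terms."""
    if E <= 1 or S <= 1 or T <= 1:
        raise ValueError("the model needs E > 1, S > 1 and T > 1")
    return math.log((E - 1) / (S - 1)) / math.log(T)


def machine_overlap_factor(p: float, E: float, S: float, T: float) -> float:
    """Evaluate the simplified machine overlap equation for one web entity."""
    if p < 1:
        raise ValueError("p < 1 implies shared-use machines, outside this model")
    q = p - 1  # incremental client systems per person
    x = frequency_exponent(E, S, T)
    # Expected reach across all client systems, divided by R.
    combined = (1 + q) * E / (1 + (E - 1) * (1 + q) ** x)
    # Incremental reach from machines the panel does not see, divided by R.
    incremental = q * E / (1 + (E - 1) * q ** x) if q > 0 else 0.0
    return combined / (1 + incremental)
```

Because R cancels, the factor depends only on p, E, S, and T; multiplying both parts by any R recovers the unsimplified ratio.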

If the projection weights of the client systems and users in the defined universe could be calculated accurately, the client system to person ratio could be used directly as the machine overlap adjustment factor. However, such precise weighting and estimating may not be possible. For example, there may be a mixture of primary client systems (those most frequently used to access the Internet) and secondary client systems (those used less often), and the exact mix is not always known. Depending on the sample composition and the site, the client system to person ratio may be biased toward primary or secondary usage. Using the client system to person ratio to calculate a machine overlap adjustment factor, rather than using the ratio directly, can compensate for possible errors in weighting and estimating the universe. If the expected combined reach exceeds the sum of the measured reach and the incremental reach, the machine overlap adjustment factor will increase unique visitors. If the expected combined reach for the web entity is lower than that sum, the sample is biased more toward primary use, and the machine overlap adjustment factor will reduce unique visitors to account for incremental secondary usage.

FIG. 7 is a flowchart that illustrates a process 700 for determining a non-beaconed adjustment factor. Process 700 is described as being performed by the measurement adjustment module 304c; however, other systems and configurations may perform process 700. As noted above, this adjustment factor can be used to adjust the initial audience measurement data for a web entity. The following describes an implementation of process 700 in which actions 702 through 706 are executed on a per-web entity basis.

Depending on the audience measurement being adjusted, the measurement adjustment module 304c determines either the total number of unique visitors or the total number of page views for a given web entity using the pre-processed panel centric data (702). Because the panel applications should capture all web traffic, non-beaconed visits to web pages are also recorded and reported by the panel applications. The measurement adjustment module 304c can therefore use the pre-processed panel centric data to calculate the total number of unique visitors and page views for a particular web entity, even if not all of its web pages include the beacon code.

For example, the total number of unique visitors can be calculated by adding together the projection weights of the panel members who visited the web entity's web pages. To determine the total page views, each member's projection weight can be applied to that member's number of page views, and the projected page views then added together.
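The projected totals from action 702, together with the non-beaconed factors described in the summary (unique visitors on all of the entity's pages minus unique visitors on beaconed pages, and page views on non-beaconed pages), can be sketched as follows. It assumes each panel record gives a projection weight and the member's beaconed and non-beaconed page view counts for the web entity; the record layout and function name are hypothetical, and the subtraction follows the summary's description rather than a documented step of process 700.

```python
from typing import Dict, Iterable, Tuple


def non_beaconed_factors(
    panel: Iterable[Tuple[float, int, int]],
) -> Dict[str, float]:
    """Projected totals and non-beaconed factors for one web entity.

    Each tuple is (projection_weight, beaconed_page_views,
    non_beaconed_page_views) for one panel member.
    """
    total_uvs = 0.0
    beaconed_uvs = 0.0
    total_page_views = 0.0
    non_beaconed_page_views = 0.0
    for weight, beaconed, non_beaconed in panel:
        if beaconed + non_beaconed > 0:
            total_uvs += weight      # visited any page of the entity
        if beaconed > 0:
            beaconed_uvs += weight   # visited at least one beaconed page
        total_page_views += weight * (beaconed + non_beaconed)
        non_beaconed_page_views += weight * non_beaconed
    return {
        "total_unique_visitors": total_uvs,
        "total_page_views": total_page_views,
        "non_beaconed_uvs": total_uvs - beaconed_uvs,
        "non_beaconed_page_views": non_beaconed_page_views,
    }
```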

Summary for “Measurements based upon panel and census data”


In one aspect, a system comprises one or more processing devices and one or more storage devices that store instructions. When executed by the one or more processing devices, the instructions cause the one or more processing devices to access a first set of usage data for a first set of resources on a network. The first set of resources was accessed by a first group of client systems, and the first set of usage data is determined based on information received from the client systems in the first group that received beacon instructions. The instructions also cause the one or more processing devices to access a second set of usage data for a second set of resources on the network. The second set of usage data is determined based on information received from monitoring applications installed on a second group of client systems that accessed the second set of resources. The users of the second group of client systems are a representative sample of a larger group of users of resources on the network. The instructions further cause the one or more processing devices to determine initial usage measurement data for a third set of resources on the network based on the first set of usage data. The third set includes one or more common resources that are also included in the second set.

Implementations may include one or more of the following features. For example, the information received from the first group of client systems may include one or more beacon messages that identify the common resources and include a beacon cookie with a unique client system identifier. The instructions may include instructions that, when executed, cause the one or more processing devices to determine an initial count of unique visitors who accessed the third set of resources within a given time period by counting the number of beacon messages that identify the common resources and include beacon cookies with different unique identifiers.

The one or more adjustment factors may include a cookie-per-person adjustment factor that reflects the number of beacon cookies per person who accessed the common resources during the time period. The instructions could include instructions that cause the one or more processing devices to determine the cookie-per-person adjustment factor as the ratio of the projected total number of beacon cookies set on client systems that accessed the common resources during the time period to the projected total number of people who accessed the common resources during that time period.

The one or more adjustment factors may include a person-per-cookie adjustment factor that reflects the number of people per beacon cookie who accessed the common resources during the time period. The instructions may include instructions that cause the one or more processing devices to determine the person-per-cookie adjustment factor as the ratio of the projected total number of people who accessed the common resources during the time period to the projected total number of beacon cookies set on client systems that accessed those resources during that time period.

The one or more adjustment factors could include a machine overlap adjustment factor, which adjusts for the number of client systems used, per person who accessed the common resources, to access the common resources during the time period. The instructions may include instructions that cause the one or more processing devices to determine the machine overlap adjustment factor based, at least in part, on an incremental number of client systems per person who accessed the common resources during the time period, a frequency of access per person who accessed the common resources during that time period, and an average number of accesses of the common resources per day during that time period. The incremental number of client systems per person can be calculated based on the ratio of the total number of client systems that accessed the common resources during the time period to the total number of people who accessed the common resources during that time period.

The one or more adjustment factors can include a non-beaconed adjustment factor that represents a number of unique visitors who accessed one or more resources in the third set that are not in the first set. The instructions may include instructions that cause the one or more processing devices to determine the non-beaconed adjustment factor by determining a projected number of unique visitors who accessed the third set of resources, determining a projected number of unique visitors who accessed the common resources, and subtracting the latter from the former.

The instructions can include instructions that, when executed, cause the one or more processing devices to determine an initial count of page views for the third set of resources during a given time period by counting the total number of beacon messages that identify the common resources. The one or more adjustment factors can include a non-beaconed adjustment factor that reflects a count of page views for resources in the third set that are part of the second set of resources but not the first.

In another aspect, a method includes accessing a first set of usage data for a first set of resources on a network. The first set of resources was accessed by a first group of client systems, and the first set of usage data is determined based on information received from the client systems in the first group that received beacon instructions. The method also includes accessing a second set of usage data for a second set of resources on the network. The second set of usage data is determined based on information received from monitoring applications installed on a second group of client systems that accessed the second set of resources. The users of the second group of client systems are a representative sample of a larger group of users of resources on the network. The method further includes determining initial usage measurement data for a third set of resources on the network based on the first set of usage data, determining one or more adjustment factors based on the second set of usage data, and applying the adjustment factors to the initial usage measurement data to create adjusted usage measurement data. One or more reports can then be generated based on the adjusted usage measurement data.

Implementations may include one or more of the following features. For example, the information received from the first group of client systems could include one or more beacon messages that identify the common resources and include a beacon cookie with a unique client system identifier. Determining the initial usage measurement data can include counting the number of received beacon messages that identify the common resources and include beacon cookies with different unique identifiers.

The one or more adjustment factors may include a cookie-per-person adjustment factor that reflects the number of beacon cookies per person who accessed the common resources during the time period. The cookie-per-person adjustment factor can be calculated by dividing the projected total number of beacon cookies set on client systems that accessed the common resources during the time period by the projected total number of people who accessed the common resources during that time period.

The one or more adjustment factors could include a person-per-cookie adjustment factor that reflects the number of people per beacon cookie who accessed the common resources during the time period. The person-per-cookie adjustment factor can be calculated by dividing the projected total number of people who accessed the common resources during the time period by the projected total number of beacon cookies set on client systems that accessed those resources.

The one or more adjustment factors could include a machine overlap adjustment factor, which adjusts for the number of client systems used, per person who accessed the common resources, to access the common resources during the time period. The machine overlap adjustment factor can be determined based on an incremental number of client systems per person who accessed the common resources during the time period, a frequency of access per person who accessed the common resources during that time period, and an average number of accesses of the common resources per day during that time period. The incremental number of client systems per person can be calculated based on dividing the total number of client systems that accessed the common resources in the time period by the total number of people who accessed the common resources during that time period.

The one or more adjustment factors could include a non-beaconed adjustment factor that represents a number of unique visitors who accessed resources within the third set that are not part of the first set. The non-beaconed adjustment factor can be determined by determining the projected number of unique visitors who accessed the third set of resources, determining the projected number of unique visitors who accessed the common resources, and subtracting the projected number of unique visitors who accessed the common resources from the projected total number of unique visitors who accessed the third set of resources.

Determining the initial usage measurement data might include determining an initial count of page views for the third set of resources over a period of time by determining the total count of beacon messages that identify the common resources. The one or more adjustment factors could include a non-beaconed adjustment factor that reflects the number of page views for resources in the third set that are part of the second set of resources but not the first.

Implementations of any of these techniques can include a method or process, an apparatus, a device, a mechanism, a system, or instructions stored on a computer-readable storage device. The details of particular implementations are set forth below. Other features will become apparent from the description, the drawings, and the claims.

In general, accesses of web pages or other resources by client systems can be recorded and used to generate audience measurement reports. Data about resource accesses can be collected using a panel-based approach, which generally involves installing a monitoring application on the client systems of a group of users (the panel). The monitoring application collects information about the web pages or other resources accessed and sends it to a collection server.

A beacon-based approach can also be used to collect data about resource accesses. In a beacon-based approach, a script or other code is associated with a resource so that the code is executed when a client system renders or otherwise uses the resource. When executed, the beacon code sends a message to the collection server that includes certain information, such as an identifier of the resource.

Panel-based and beacon-based data may each be used on its own to generate audience measurement reports. However, the two can also be combined, which can improve the accuracy of the reports. The following describes techniques for collecting data about resource accesses using both beacon-based and panel-based methods, and for combining the data from both to generate audience measurement reports.

FIG. 1 shows a system 100 that includes client systems 112, 114, 116, and 118, one or more web servers 110, a collection server 130, and a database 132. Panel users use the client systems 112, 114, 116, and 118 to access Internet resources, such as web pages hosted on the web servers 110. Each client system sends information about its resource accesses to the collection server 130, where the information can be used to analyze the usage patterns of the Internet users.

Each of the client systems 112, 114, 116, and 118, the collection server 130, and the web servers 110 may be implemented using a general-purpose or special-purpose computer capable of responding to and executing instructions in a defined manner, such as a personal computer, a workstation, a server, or a mobile device. The client systems 112, 114, 116, and 118, the collection server 130, and the web servers 110 may receive instructions from, for example, a software application, a program, a piece of code, a device, a computer, a computer system, or a combination thereof, which independently or collectively direct their operations. The instructions may be embodied permanently or temporarily in any type of machine, component, equipment, or storage medium capable of delivering instructions to the client systems 112, 114, 116, and 118, the collection server 130, and the web servers 110.

In FIG. 1, the system 100 includes four client systems 112, 114, 116, and 118; other implementations may include more or fewer client systems. Similarly, FIG. 1 shows a single collection server 130, but other implementations may include multiple collection servers 130. For example, each of the client systems 112, 114, 116, and 118 may send data to multiple collection servers for redundancy. In other implementations, the client systems 112, 114, 116, and 118 may send data to different collection servers. In such implementations, the data representing the entire panel may be communicated to and aggregated at a central location for later processing. The central location may be one of the collection servers.

The users of the client systems 112, 114, 116, and 118 are a representative sample of the larger universe being measured, such as the universe of all Internet users or all Internet users in a particular geographic area. To understand the overall behavior of the universe being measured, the behavior of this sample is projected to the universe. The size and/or demographic composition of the universe may be obtained, for example, from independent measurements or studies, such as enumeration studies conducted monthly (or at other intervals) using random digit dialing.

Similarly, the client systems 112, 114, 116, and 118 are representative of the larger universe of client systems that access Internet resources, so the behavior of these client systems can be projected, on an aggregate basis, to all client systems that access Internet resources. The size of the total universe of client systems may likewise be determined, for example, from independent measurements or studies.

Users may be recruited to the panel by an entity that controls the collection server 130. The entity may collect demographic information about the recruited panel members, such as age, gender, household size and composition, geographic location, income, and number of client systems. The recruitment methods may be selected or developed to help ensure that a good random sample of the universe is obtained, that biases are minimized, and that the highest possible cooperation rates are achieved. Once a user is recruited, a monitoring application is installed on the user's client system. The monitoring application collects information about the user's use of the client system to access Internet resources and sends that information to the collection server 130.

For example, the monitoring application may have access to the network stack of the client system on which it is installed, allowing it to monitor network traffic and collect information about requests for resources sent by the client system and the responses to those requests. In particular, the monitoring application may collect information about HTTP requests and the corresponding HTTP responses.

Accordingly, in system 100, monitoring applications 112b, 114b, 116b, and 118b (also referred to as panel applications) are installed on client systems 112, 114, 116, and 118, respectively. When a user of a client system 112, 114, 116, or 118 uses a browser application 112a, 114a, 116a, or 118a to visit and view web pages, the corresponding monitoring application 112b, 114b, 116b, or 118b collects information about those visits and sends it to the collection server 130. For example, the monitoring application may collect the URLs of the web pages or other resources visited, the times at which they were accessed, and an identifier associated with the particular client system (which may be linked to demographic information about the user or users of that system). A unique identifier may, for example, be generated and associated with the particular copy of the monitoring application installed on the client system. The monitoring application may also collect and send information about requests for resources and the subsequent responses, such as cookies sent in the requests and/or received in the responses. The collection server 130 receives and records the information from the client systems, aggregates it, and stores the aggregated information in the database 132 as panel-centric data 132a.

The panel-centric data 132a may be used to determine the visitation habits and other usage behavior of the panel users, which may then be extrapolated to the larger universe of Internet users. The information collected during a session may be associated with the particular user of the client system (and/or that user's demographics). For example, the monitoring application may require the user to identify himself or herself, or techniques such as those described in U.S. Patent Application Publication No. 2004-0019518 and U.S. Pat. No. 7,260,837, both of which are incorporated herein by reference, may be used. Identifying the user of the client system allows usage information to be determined and extrapolated on a per-person basis rather than a per-machine basis, so that measurements are attributed to the individual users of a client system rather than to the machine.

To extrapolate the usage of panel members to the larger universe being measured, some or all of the panel members are weighted and projected to the larger universe. In some implementations, only a subset of the members is weighted and projected. For example, analysis of the received data may indicate that the data for some panel members is unreliable; those members may be excluded from reporting and, therefore, from weighting and projection.

The users included in the weighting and projection are weighted so that the reporting sample reflects the demographic composition of the universe being measured, and this weighted sample is then projected to the universe. This may be done by assigning a projection weight to each member of the reporting sample and applying that projection weight to the member's usage. Similarly, a reporting sample of client systems may be projected to the universe of all client systems using client system projection weights, which are usually different from the user projection weights.
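A minimal Python sketch of applying projection weights, assuming that "applying" a weight means multiplying the member's usage by it and that a projected unique visitor count is the sum of the weights of members with any usage. The `Member` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Member:
    member_id: str
    projection_weight: float  # people in the universe this member represents
    page_views: int           # member's usage during the reporting period


def project(sample: List[Member]) -> Tuple[float, float]:
    """Return (projected unique visitors, projected page views)."""
    projected_visitors = sum(m.projection_weight for m in sample if m.page_views > 0)
    projected_page_views = sum(m.projection_weight * m.page_views for m in sample)
    return projected_visitors, projected_page_views
```

A member weighted at 2,000 who viewed 3 pages thus contributes 2,000 projected visitors and 6,000 projected page views.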

The usage behavior of the weighted, projected sample (of users or of client systems) may then be taken to represent the behavior of the corresponding universe (of users or of client systems). Likewise, behavior patterns observed in the weighted, projected sample may be taken to reflect behavior patterns in the universe.

This information can be used to estimate the number of visitors and other usage behavior. For example, it can be used to estimate the number of unique visitors (or client systems) to particular web pages or groups of web pages, or the number of unique visitors within a particular demographic who visited particular web pages. It can also be used to estimate other metrics, such as frequency of usage per user or client system, average number of pages viewed per user or client system, and average time spent per user or client system.

As explained further below, such estimates or other information determined from the panel-centric data may be combined with data from a beacon-based approach to generate reports on audience visitation and other activity. Combining the panel-centric data with the beacon-based data in this way may make the reports more accurate.

Referring to FIG. 2, a system 200 may be used to implement a beacon-based approach, in which beacon code is included in one or more web pages.

System 200 includes one or more client systems 202, the web servers 110, the collection server 130, and the database 132. The client systems 202 may include client systems, such as client systems 112, 114, 116, or 118, that have the panel application installed.

Each client system 202 includes a browser application 204 that retrieves web pages 206 from the web servers 110 and renders them. Some of the web pages 206 include beacon code 208, which may be included in the web pages by publishers that have agreed with the entity operating the collection server 130 to include it. When the browser application 204 renders a web page 206 that includes the beacon code 208, the beacon code 208 is executed and causes the browser application 204 to send a message to the collection server 130. The message includes certain information, such as the URL of the web page in which the beacon code 208 is included. For example, the beacon code may be JavaScript code that accesses the URL of the web page in which the code is embedded and sends the collection server 130 an HTTP Post message that includes the URL in a query string. Alternatively, the beacon code may be JavaScript code that accesses the URL of the web page in which the code is embedded and includes that URL in the “src” attribute of an <img> tag, which results in a request to the collection server 130 for the resource at the URL in the “src” attribute. Because the URL of the web page is included in the “src” attribute, the collection server 130 receives the URL of the web page. The collection server 130 can then return a transparent image.


The collection server 130 records the URLs received in the messages together with, for example, a time stamp indicating when each message was received and the IP address of the client system from which it was received. The collection server 130 aggregates this information and stores it in the database 132 as site-centric data 132b.
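As an illustration of the recording step, the Python sketch below turns a beacon request into a site-centric record with the page URL, a receipt time stamp, and the client's IP address. It assumes the page URL arrives URL-encoded in a query-string parameter named `url`; that parameter name, the `SiteRecord` type, and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class SiteRecord:
    page_url: str
    received_at: datetime
    client_ip: str
    client_id: Optional[str] = None  # e.g. taken from a beacon cookie


def record_from_beacon(request_target: str, client_ip: str,
                       client_id: Optional[str] = None,
                       now: Optional[datetime] = None) -> SiteRecord:
    query = parse_qs(urlsplit(request_target).query)
    values = query.get("url")  # hypothetical parameter carrying the page URL
    if not values:
        raise ValueError("beacon request has no 'url' parameter")
    # Time stamp the record when the collection server receives the message.
    received_at = now or datetime.now(timezone.utc)
    return SiteRecord(values[0], received_at, client_ip, client_id)
```

`parse_qs` percent-decodes the value, so `/b?url=https%3A%2F%2Fexample.com%2Fnews` is recorded as `https://example.com/news`.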

The message may also include a unique identifier for the client system. For example, when a client system first sends a beacon message to the collection server 130, a unique identifier may be generated for the client system and associated with that beacon message. The unique identifier may then be included in a cookie that is set on the client system 202, and the cookie is included in subsequent beacon messages sent from the client system, so that those messages carry the client system's unique identifier. If a beacon message from a client system does not include the cookie (e.g., because the user has deleted the cookies on the client system), the collection server 130 can generate a new unique identifier and include it in a new cookie set on the client system.
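The identifier logic above can be sketched in Python as follows: reuse the identifier from the beacon cookie if one is present, otherwise generate a new one that the server would then set in a new cookie. The cookie name `beacon_id` is a hypothetical placeholder, and a random UUID is one reasonable choice of unique identifier, not necessarily the one used in practice.

```python
import uuid
from typing import Mapping, Tuple

COOKIE_NAME = "beacon_id"  # hypothetical cookie name


def resolve_client_id(cookies: Mapping[str, str]) -> Tuple[str, bool]:
    """Return (client_id, is_new) for an incoming beacon message."""
    existing = cookies.get(COOKIE_NAME)
    if existing:
        return existing, False
    # No cookie (first visit or cookies deleted): issue a fresh identifier,
    # which the server would set in a new cookie on the client system.
    return uuid.uuid4().hex, True
```

Note that deleting the cookie produces a new identifier for the same machine, which is one source of the unique-visitor overcount discussed later.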

Thus, as client systems 202 access web pages (e.g., on the Internet), the client systems 202 that access web pages containing the beacon code send messages to the collection server 130. These messages contain the URL of the accessed page and possibly a unique identifier for the client system that sent the message. When the collection server 130 receives a message, it may create a record for that message. Each record may include an identifier of the accessed page (e.g., its URL), the unique identifier of the client system, the time the message was received (e.g., a time stamp added by the collection server 130 on receipt), and the network address (e.g., IP address) of the client system that accessed the page. The collection server 130 may aggregate these records and store them in the database 132 as site-centric data 132b.

Beacon messages are sent regardless of whether a client system has the panel application installed. For client systems that do have the panel application installed, the panel application also records the beacon message and reports it to the collection server 130. For example, if the panel application monitors HTTP traffic and the beacon message is sent as an HTTP Post message or as a request resulting from an <img> tag, the beacon message, along with any cookies included in it, is recorded in the HTTP traffic captured by the panel application. In this case, the collection server 130 receives the beacon message directly and is also informed of it by the panel application.

Because beacon messages are sent regardless of whether the panel application is installed, the site-centric data 132b directly represents accesses by members of the larger universe being measured, not just by members of the panel. The site-centric data 132b can therefore be used to generate initial audience measurement data for web pages or groups of web pages that include the beacon code. This initial data may contain inaccuracies for a variety of reasons, and the panel-centric data 132a can be used to calculate adjustment factors that improve its accuracy.

FIG. 3 shows a system 300 that includes a reporting server 302. The reporting server 302 may be implemented using a general-purpose or special-purpose computer capable of responding to and executing instructions in a defined manner, such as a personal computer, a workstation, a server, or a mobile device. The reporting server 302 may receive instructions from, for example, a software application, a program, a piece of code, a device, a computer, a computer system, or a combination thereof, which independently or collectively direct its operations. The instructions may be embodied permanently or temporarily in any type of machine, component, equipment, or storage medium capable of delivering instructions to the reporting server 302.

The reporting server 302 executes instructions that implement a measurement data processor 304 and a report generation module 308. The measurement data processor 304 includes a pre-processing module 304a, an initial measurement module 304b, and a measurement adjustment module 304c. As described further with respect to FIG. 4, the measurement data processor 304 uses the panel-centric data 132a and the site-centric data 132b to generate unified and adjusted measurement data 306. The report generation module 308 can use the unified and adjusted measurement data 306 to generate one or more reports 310, which may include information about client system accesses of one or more resources.

FIG. 4 illustrates a process 400. The following describes process 400 as performed by the pre-processing module 304a, the initial measurement module 304b, the measurement adjustment module 304c, and the report generation module 308; however, the process 400 may also be performed by other systems and configurations.

The pre-processing module 304a accesses the panel-centric data 132a and the site-centric data 132b (402). The panel-centric data 132a reflects a first set of resources accessed by a first set of client systems (those in the panel), and the site-centric data 132b reflects a second set of resources accessed by a second set of client systems. Some of the client systems in the second set may be in the panel, while others may not. The second set of resources may also include resources that are in the first set of resources.

The panel-centric data 132a may include records that indicate the URL or other identifier of a web page or other resource that was accessed and an identifier of the client system that accessed it. The records may also include information about the requests and responses used to access the resource (e.g., cookies sent in the requests and/or received in the responses). The site-centric data 132b may include records that indicate the URL or other identifier of a resource accessed by a client system, the network address of that client system, the time the resource was accessed (e.g., as indicated by a time stamp applied when the collection server 130 received the beacon message), and a unique identifier for the client system (e.g., from a cookie included with the beacon message).

The accessed panel-centric data 132a and site-centric data 132b may be data aggregated over a particular time period. For example, the accessed data may be the panel-centric data 132a and site-centric data 132b aggregated over the past 30 days.

The pre-processing module 304a performs one or more pre-processing functions on the accessed panel-centric data 132a and site-centric data 132b (404). For example, the pre-processing module 304a may process the raw panel-centric data 132a into state data, in which each record captures the facts of a single usage event. For instance, a record in the state data might indicate that a user visited web page B at a particular date and time using a particular client system. The pre-processing module 304a may also match some or all of the URLs in the state data records to patterns in a dictionary, which allows different URLs to be organized into digital media properties that reflect the way Internet companies run their businesses. Each pattern can be associated with a web entity, which may be a collection of web pages that have been logically grouped together. For example, the finance.yahoo.com domain may include a number of web pages that are logically grouped into one web entity, such as Yahoo Finance. To reflect the different ways Internet media companies arrange their web properties, the dictionary may include hierarchically related web entities. For example, Yahoo Finance may be a sub-entity of the Yahoo web entity, which may include all of the pages on the yahoo.com website as well as other web entities, such as the Yahoo Health web entity (associated with the pages on the health.yahoo.com website). The pre-processing module 304a may associate each state data record with the lowest-level web entity associated with the URL in the record.
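A minimal Python sketch of the dictionary lookup, assuming (as a simplification) that each pattern is a host name matched exactly or as a domain suffix, and that the lowest-level entity is the one whose pattern is most specific (longest). The `DICTIONARY` contents mirror the Yahoo example above but are hypothetical, as are the function names; real dictionaries may match on paths and other URL parts.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical dictionary: host pattern -> (web entity, parent web entity).
DICTIONARY: Dict[str, Tuple[str, Optional[str]]] = {
    "yahoo.com": ("Yahoo", None),
    "finance.yahoo.com": ("Yahoo Finance", "Yahoo"),
    "health.yahoo.com": ("Yahoo Health", "Yahoo"),
}


def lowest_entity(url: str) -> Optional[str]:
    """Return the most specific web entity whose pattern matches the URL's host."""
    host = (urlsplit(url).hostname or "").lower()
    matches = [p for p in DICTIONARY if host == p or host.endswith("." + p)]
    if not matches:
        return None
    return DICTIONARY[max(matches, key=len)][0]


def entity_chain(entity: str) -> List[str]:
    """Walk up the hierarchy, e.g. Yahoo Finance -> Yahoo."""
    parents = {name: parent for name, parent in DICTIONARY.values()}
    chain = [entity]
    while parents.get(chain[-1]):
        chain.append(parents[chain[-1]])
    return chain
```

With this layout, a record can be stored against its lowest-level entity and rolled up to parent entities at report time.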

The pre-processing module 304a can also remove from the panel-centric data 132a the data for users who are not included in the reporting sample. For example, there may be rules that must be satisfied to ensure that complete data about a user's usage during the reporting period has been received, and a user who does not satisfy these rules may be removed from the reporting sample. A user may also be removed from the reporting sample for not meeting certain other criteria, such as living in a particular geographic area.

The pre-processing module 304a may also remove certain types of records, such as records of requests that were not initiated by a human (e.g., requests generated automatically while rendering a web page) or records of redirects.

The pre-processing module 304a may also process the site-centric data 132b by matching some or all of the URLs in the site-centric records to patterns in the dictionary, so that each record is associated with a web entity, such as the lowest-level web entity in the hierarchy. Actions 406 through 410 may then be performed on a per-web-entity basis to determine the measurement data 306.

The pre-processing module 304a may also remove certain records from the site-centric data 132b. For example, the pre-processing module 304a can remove records of accesses that were not initiated by a human, such as accesses by robots; a list of known robots, such as search index crawlers, may be used to remove the records attributed to them. Alternatively, records that indicate closely spaced sequential accesses by a client system, to the same web page or to different web pages, may be removed. For example, accesses spaced 3 seconds apart or less may indicate that the accesses following the first one are not human initiated. Removing such records eliminates non-human initiated records and also corrects for errors in the beacon code that could result in more than one beacon message being sent per access.
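The Python sketch below applies both filters described above: it drops records whose user agent matches a robot list, then drops any record that follows the same client's previous record by 3 seconds or less. The `Hit` type, the robot markers, and the choice to compare against the immediately preceding record are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

ROBOT_MARKERS = ("googlebot", "bingbot")  # hypothetical crawler list
MIN_GAP_SECONDS = 3.0                     # accesses this close are treated as non-human


@dataclass
class Hit:
    client_id: str
    timestamp: float  # seconds since the epoch
    user_agent: str


def clean_hits(hits: Iterable[Hit]) -> List[Hit]:
    kept: List[Hit] = []
    last_seen: Dict[str, float] = {}
    for hit in sorted(hits, key=lambda h: (h.client_id, h.timestamp)):
        ua = hit.user_agent.lower()
        if any(marker in ua for marker in ROBOT_MARKERS):
            continue  # known robot: discard
        previous = last_seen.get(hit.client_id)
        last_seen[hit.client_id] = hit.timestamp
        if previous is not None and hit.timestamp - previous <= MIN_GAP_SECONDS:
            continue  # follows the previous access too closely: discard
        kept.append(hit)
    return kept
```

Because each record is compared with the one just before it, a rapid burst keeps only its first record, which also absorbs duplicate beacon messages for a single access.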

In some cases, records may be removed for certain types of client systems. For example, records for mobile devices may be removed; these records can be identified based on the user agent information sent with the beacon message and recorded in the record. Records for client systems located outside a particular geographic area may also be removed, for example, when reports are generated only for North America. A reverse lookup of the IP address in a record may be used to determine the country and region of the client system. The IP address reverse lookup may also be used to detect shared-use client systems (e.g., client systems available to the public in libraries).

Pre-processing both the panel-centric and site-centric data may also involve delineating between classes of client systems, since it is sometimes desirable to divide reports according to the class of client system. In one example, the reports and underlying data are divided between home client systems (those used at home) and work client systems (those used at work). These two subpopulations can be identified and separated in the panel-centric data 132a because users identified their machines as home or work machines (or some other class) during registration. They can also be identified and separated in the site-centric data 132b. For example, beacon messages received between 8 am and 6 pm local time, Monday through Friday, could be considered work-generated traffic, while the Home sample may target all traffic.
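A minimal Python sketch of the time-window rule, assuming the message's local time is already known and treating 6 pm as an exclusive end of the window. Following the text, every message is placed in the Home sample, and messages inside the weekday window are also placed in the Work sample. The constant and function names are hypothetical.

```python
from datetime import datetime
from typing import Tuple

WORK_START_HOUR = 8   # 8 am local time
WORK_END_HOUR = 18    # 6 pm local time (treated as exclusive)


def is_work_traffic(local_time: datetime) -> bool:
    # weekday(): Monday == 0 ... Friday == 4
    return local_time.weekday() < 5 and WORK_START_HOUR <= local_time.hour < WORK_END_HOUR


def samples_for(local_time: datetime) -> Tuple[str, ...]:
    # The Home sample targets all traffic; work-window traffic also feeds Work.
    return ("home", "work") if is_work_traffic(local_time) else ("home",)
```

January 1, 2024 was a Monday, so a message at 10 am that day falls in both samples, while one on Saturday, January 6 falls only in the Home sample.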

Another way to identify and separate these two subpopulations in the site-centric data 132b is to use a model based on the work behavior observed in the panel-centric data 132a, such as time-of-day and day-of-week usage profiles. All traffic from an IP address that matches the profile of a work machine may then be considered work traffic. For example, panel data may indicate that a machine should be considered a work machine if it has more accesses during one time period (a work period) than during another (a home period). This may be combined with the site-centric data to classify network access providers as work or home providers, based on whether accesses through those providers are higher during the work period than during the home period. The IP address of a machine may then be used to identify its network access provider and classify the machine accordingly. These techniques are described in U.S. Patent Application Ser. No. 61/241,576, filed Sep. 11, 2009, and titled “Determining Client Systems Attributes.”
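A naive Python sketch of the provider classification: count each provider's accesses inside and outside a work period and label it "work" if the work-period count is higher. The weekday 8 am to 6 pm work period, the provider names, and the input format are assumptions, and a practical model would likely normalize for the different lengths of the two periods.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def in_work_period(t: datetime) -> bool:
    # Hypothetical work period: weekdays, 8 am to 6 pm local time.
    return t.weekday() < 5 and 8 <= t.hour < 18


def classify_providers(accesses: Iterable[Tuple[str, datetime]]) -> Dict[str, str]:
    """accesses: (network access provider, local access time) pairs."""
    work: Counter = Counter()
    home: Counter = Counter()
    for provider, t in accesses:
        if in_work_period(t):
            work[provider] += 1
        else:
            home[provider] += 1
    providers = set(work) | set(home)
    # Counter returns 0 for providers missing from one of the periods.
    return {p: "work" if work[p] > home[p] else "home" for p in providers}
```

A machine's IP address can then be mapped to its provider, and the provider's label applied to the machine's traffic.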

Actions 406 through 410 may then be performed separately on each subpopulation to generate measurement data for the home and work populations. Reports for these populations can be generated separately or combined, as described with respect to action 412. Other implementations may divide the data into other or additional subpopulations.

The initial measurement module 304b determines initial usage measurement data based on the pre-processed site-centric data (406). For example, the initial measurement module 304b can determine an initial measurement of the number of unique visitors to a web entity, where the number of unique visitors may be defined as the number of people who requested or viewed a web page of the web entity. The initial measurement module 304b can determine this initial measurement by counting the number of unique cookies received in beacon messages for the web entity.

As another example, the initial measurement module 304b may determine an initial measurement of page views for a web entity. The number of page views for a web entity may be the number of times web pages of the web entity were requested and/or viewed, regardless of whether the requests or views came from unique individuals. The initial measurement module 304b can determine this initial measurement by counting the number of beacon messages received for the web entity.
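Both initial measurements can be sketched together in Python, assuming pre-processed beacon records are available as hypothetical `(web_entity, cookie_id)` pairs, one per beacon message, with `None` for a message that carried no cookie. Unique visitors are the distinct cookies per entity; page views are the message counts.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Optional, Set, Tuple


def initial_measurements(records: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, Tuple[int, int]]:
    """Return {web_entity: (initial unique visitors, initial page views)}."""
    cookies: DefaultDict[str, Set[str]] = defaultdict(set)
    page_views: DefaultDict[str, int] = defaultdict(int)
    for entity, cookie_id in records:
        page_views[entity] += 1           # every beacon message is a page view
        if cookie_id is not None:
            cookies[entity].add(cookie_id)  # each distinct cookie is a "visitor"
    return {e: (len(cookies[e]), page_views[e]) for e in page_views}
```

These cookie-based counts are the inputs that the adjustment factors described next are meant to correct.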

The measurement adjustment module 304c determines one or more adjustment factors based on the pre-processed panel-centric data (408). The initial audience measurement data, which is based on the pre-processed site-centric data, may be inaccurate for a variety of reasons, and the pre-processed panel-centric data can be used to adjust for these inaccuracies.

For instance, if the initial measurement of unique visitors is based solely on beacon measurements, unique visitors may be overcounted or undercounted. This is because cookies are set per machine and browser, not per person. Even if multiple users use the same client system and browser, only one cookie is set and counted, which can lead to an undercount of unique visitors.

In addition, a cookie previously stored on a client system may be deleted, which results in a new cookie with a new identifier being set for subsequent accesses within the reporting period. Accesses by the same user may then be mistakenly attributed to different users, leading to an overcount of unique visitors. Similarly, a user who uses multiple browsers on the same computer will have a different cookie set for each browser, which can also lead to an overcount of unique visitors.

A cookie-per-person adjustment factor can be calculated from the panel-centric data to account for these inaccuracies. This adjustment factor can be calculated on a per-web-entity basis and may reflect the number of cookies set per person who visits beaconed pages (web pages that include the beacon code). It can be used to adjust the total number of unique visitors to compensate for multiple cookies per person. For example, the process 500 described with respect to FIG. 5 may be used to determine this adjustment factor.

A user may also have multiple client systems at a given location (for instance, at home). This may result in separate cookies being set on each of those client systems, so that the user is counted once per client system when visiting the web entity, which can lead to an overcount of unique visitors. To account for this, a machine overlap adjustment factor may be calculated from the pre-processed panel-centric data on a per-web-entity basis. This adjustment factor may be used to adjust for visitors to the web entity who are counted multiple times because they use multiple client systems, and may reflect the number of client systems used per person. For example, the process 600 described with respect to FIG. 6 may be used to determine this adjustment factor.

A non-beaconed adjustment factor may be calculated from the pre-processed panel-centric data to account for inaccuracies in page views or unique visitors that arise because the beacon code is not included in all of the web pages of a web entity. This adjustment factor can be calculated on a per-web-entity basis. Because the panel applications capture all web traffic, they also capture and report visits to the non-beaconed web pages of a given web entity. The panel-centric data can therefore be used to calculate a non-beaconed adjustment factor based on the unique visitors and page views for the web entity's web pages, independent of beacon messages. For example, the process 700 described with respect to FIG. 7 may be used to determine this adjustment factor.

The measurement adjustment module 304c applies the adjustment factors to the initial usage measurement data to produce the adjusted usage measurement data 306 (410). For instance, in one implementation, for audience measurement data that reflects unique visitors for a given web entity, the measurement adjustment module 304c may generate adjusted unique visitors data as follows:

Adj UVs = ((Init UVs / Cookie-Per-Person) * Machine Overlap) + Non-Beaconed

where Adj UVs is the adjusted unique visitors count, Init UVs is the initial count of unique visitors based on the pre-processed site-centric data, Cookie-Per-Person is the cookie-per-person adjustment factor, Machine Overlap is the machine overlap adjustment factor, and Non-Beaconed is the non-beaconed adjustment factor. Alternatively, instead of dividing by the cookie-per-person adjustment factor, the initial count may be multiplied by its reciprocal (a person-per-cookie adjustment factor).

As another example, in one implementation, for audience measurement data that reflects the total page views of web pages for a given web entity, the measurement adjustment module 304c may generate adjusted page views data as follows:

Adj PageViews = Init PageViews + Non-Beaconed

where Adj PageViews is the adjusted page views count, Init PageViews is the initial page views count based on the pre-processed site-centric data, and Non-Beaconed is the non-beaconed adjustment factor.
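The two adjustment formulas translate directly into Python. The sketch below is a literal transcription; the argument values used in the example are illustrative only and do not come from the source.

```python
def adjusted_unique_visitors(init_uvs: float, cookie_per_person: float,
                             machine_overlap: float, non_beaconed: float) -> float:
    # Adj UVs = ((Init UVs / Cookie-Per-Person) * Machine Overlap) + Non-Beaconed
    return (init_uvs / cookie_per_person) * machine_overlap + non_beaconed


def adjusted_page_views(init_page_views: float, non_beaconed: float) -> float:
    # Adj PageViews = Init PageViews + Non-Beaconed
    return init_page_views + non_beaconed
```

For example, 12,000 initial unique visitors with a cookie-per-person factor of 1.5, a machine overlap factor of 0.8, and a non-beaconed factor of 500 give 8,000 people, then 6,400, then 6,900 adjusted unique visitors. Dividing by 1.5 is the same as multiplying by the person-per-cookie factor 1/1.5.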

The report generation module 308 generates audience measurement reports using the adjusted audience measurement data (412). For example, the report generation module 308 may generate reports about unique visitors and page views for a particular web entity for the home population, the work population, or both. In such an implementation, the report generation module 308 can also generate reports on unique visitors or page views for a web entity that combine the home and work populations. For example, the report generation module 308 can add the page views for the home and work populations to produce a combined page view count, and/or combine the unique visitors for the home and work populations to produce a combined unique visitor count.

When producing a combined unique visitor count, the report generation module 308 may account for users who are in both the home and work populations. For example, a user may access the web entity from both a work client system and a home client system; if the home count were simply added to the work count, that user would be counted twice. The report generation module 308 may use the panel-centric data 132a to calculate the overlap of users between the two populations and remove the duplicates. For example, some users may install the monitoring application on both their home and work client systems and be designated as such. The data from these users can be used to estimate the number of people who visit the web pages of the web entity from both work and home client systems, and this estimate can then be used to de-duplicate the users in the combined unique visitor count.
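A minimal Python sketch of the de-duplication, under stated assumptions: the overlap is estimated by summing the projection weights of panelists designated as having the monitoring application at both home and work who visited the web entity from both locations, and that estimate is subtracted from the sum of the home and work counts. The estimator, data layout, and function names are hypothetical; the text does not specify how the overlap is computed.

```python
from typing import Dict, Set


def estimated_overlap(visit_locations: Dict[str, Set[str]],
                      weights: Dict[str, float]) -> float:
    """visit_locations: dual-install panelist -> locations ("home"/"work") they visited from."""
    return sum(weights[m] for m, locations in visit_locations.items()
               if {"home", "work"} <= locations)


def combined_unique_visitors(home_uvs: float, work_uvs: float, overlap: float) -> float:
    # Subtract people counted in both populations so each is counted once.
    return home_uvs + work_uvs - overlap
```

This is ordinary inclusion-exclusion: |home ∪ work| = |home| + |work| - |home ∩ work|.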

FIG. 5 is a flowchart that illustrates a process 500 for determining a cookie-per-person adjustment factor. The following describes process 500 as performed by the measurement adjustment module 304c; however, other systems and configurations may also perform the process 500. As noted above, this adjustment factor can be used to adjust the initial audience measurement data for a web entity. In the implementation described below, actions 502 through 506 are performed on a per-web-entity basis.

The measurement adjustment module 304c uses the pre-processed panel data to calculate the number of unique visitors who visited a beaconed page of a web entity (502). The total number of unique visitors can be calculated by summing the projection weights of the panel members in the pre-processed data who visited a beaconed page. A member's projection weight may represent the number of people in the total universe that the member represents. Therefore, summing the projection weights of these members yields the total number of people who visited a beaconed page of the web entity.

Based on the pre-processed panel-centric data, the measurement adjustment module 304c counts the number of beacon cookies received for the web entity (504). For example, the measurement adjustment module 304c can use the pre-processed panel-centric data to determine which client systems accessed a beaconed page of the web entity, and then determine the cookies those client systems sent in beacon messages (the "beacon cookies") during the reporting period. The panel applications can record and report beacon messages and the associated cookies for client systems on which the panel application is installed. The measurement adjustment module 304c can then generate a projected cookie count for each client system by multiplying the projection weight of the client system's user by the number of beacon cookies that client system sent during the reporting period. The measurement adjustment module 304c sums the projected cookie counts to calculate the total number of beacon cookies sent by the client systems during the reporting period. When more than one user is associated with a client system, the projection weights of those users may be taken into account in determining the projected cookie count.
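Steps 502 and 504 can be sketched as two weighted sums. This is an illustration under stated assumptions, not the described implementation: visitors are keyed by a hypothetical panelist ID so that a person using several client systems is counted once, and each client system is assumed to have a single user whose projection weight scales that system's beacon cookie count.

```python
def projected_unique_visitors(visitor_weights: dict[str, float]) -> float:
    """Step 502 (sketch): sum the projection weights of panelists who visited a beaconed page.

    Keyed by a hypothetical panelist ID, so a person who used several client
    systems is still counted once.
    """
    return sum(visitor_weights.values())


def projected_beacon_cookies(clients: list[tuple[float, int]]) -> float:
    """Step 504 (sketch): each tuple is (projection weight of the client system's user,
    beacon cookies that client system sent during the reporting period).

    Assumes one user per client system.
    """
    return sum(weight * cookies for weight, cookies in clients)


if __name__ == "__main__":
    print(projected_unique_visitors({"panelist-1": 100.0, "panelist-2": 250.0}))  # 350.0
    print(projected_beacon_cookies([(100.0, 3), (250.0, 1)]))                     # 550.0
```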

The measurement adjustment module 304c calculates the cookie-per-person adjustment factor as the ratio of total beacon cookies to total unique visitors. In other words, the measurement adjustment module 304c determines Cookie-Per-Person as:

Cookie-Per-Person = Total Cookies / Total Unique Visitors

where Total Cookies is the total number of beacon cookies for the web entity and Total Unique Visitors is the total number of unique visitors to the web entity. The reciprocal of the Cookie-Per-Person adjustment factor, Person-Per-Cookie, may also be used; it is determined as Total Unique Visitors / Total Cookies.
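The two forms of the factor are simple ratios. The sketch below computes both and, as a hypothetical application only, divides a census cookie count by Cookie-Per-Person to estimate people; the function names and that usage are assumptions, since the text here only defines the ratio.

```python
def cookie_per_person(total_cookies: float, total_unique_visitors: float) -> float:
    """Cookie-Per-Person = Total Cookies / Total Unique Visitors."""
    if total_unique_visitors <= 0:
        raise ValueError("total_unique_visitors must be positive")
    return total_cookies / total_unique_visitors


def person_per_cookie(total_cookies: float, total_unique_visitors: float) -> float:
    """Reciprocal factor: Total Unique Visitors / Total Cookies."""
    if total_cookies <= 0:
        raise ValueError("total_cookies must be positive")
    return total_unique_visitors / total_cookies


def estimate_people_from_census_cookies(census_cookie_count: float, cpp: float) -> float:
    # Hypothetical application: convert a census cookie count into an estimate of people.
    return census_cookie_count / cpp


if __name__ == "__main__":
    cpp = cookie_per_person(600.0, 400.0)                          # 1.5
    print(estimate_people_from_census_cookies(1_500_000.0, cpp))   # 1000000.0
```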

FIG. 6 is a flowchart that illustrates a process 600 for determining a machine overlap adjustment factor. Process 600 is described as performed by the measurement adjustment module 304c, although other systems and configurations may also perform it. As noted above, this adjustment factor can be used to adjust initial audience measurement data for a web entity. In the implementation of process 600 described below, actions 602 through 608 are executed on a per-web-entity basis.

Based on the pre-processed panel-centric data, the measurement adjustment module 304c calculates the client system to person ratio for a given web entity (602). As mentioned above, a user can have multiple client systems at a given location (e.g., at home). Even if only one user visits the web entity, cookies may be placed on several different client systems and counted separately. A client system to person ratio can be calculated for a given web entity from the pre-processed panel-centric data over a defined universe, such as all Internet users and client systems, or the users and client systems in a specific geographic area. The measurement adjustment module 304c can determine the client system to person ratio for a given web entity by determining the total number of client systems and the total number of users in the defined universe that accessed web pages of that web entity, and dividing the former by the latter.

As described above, projection weights can be used to project users to the total universe of Internet users (or of Internet users within a specific geographic region). Projection weights may also be used to project client systems to the total universe of client systems that access the Internet (or to the total within a given geographic region). To determine the total number of client systems that accessed web pages of the web entity, the measurement adjustment module 304c may determine which client systems accessed web pages of the web entity during the reporting period and then sum the projection weights of those client systems. Similarly, to determine the total number of users, the measurement adjustment module 304c may use the pre-processed panel data to identify the users who accessed web pages of the web entity during the reporting period and sum their projection weights.
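Step 602 reduces to dividing two weighted sums. In this sketch the dictionaries are keyed by hypothetical client-system and user IDs, and their values are projection weights for the entities that accessed the web entity's pages in the reporting period; the function name is illustrative.

```python
def client_system_to_person_ratio(client_weights: dict[str, float],
                                  user_weights: dict[str, float]) -> float:
    """Ratio p of projected client systems to projected users for one web entity.

    client_weights: projection weights of client systems that accessed the entity's pages.
    user_weights:   projection weights of users who accessed the entity's pages.
    """
    total_clients = sum(client_weights.values())
    total_users = sum(user_weights.values())
    if total_users <= 0:
        raise ValueError("no projected users for this web entity")
    return total_clients / total_users


if __name__ == "__main__":
    p = client_system_to_person_ratio({"c1": 300.0, "c2": 300.0, "c3": 400.0},
                                      {"u1": 500.0, "u2": 300.0})
    print(p)  # 1.25
```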

Based on the client system to person ratio, the measurement adjustment module 304c calculates the expected reach for all panelists in the pre-processed panel data across all client systems on which those panelists were active (604). Reach is the percentage of users who visited a particular web page during a specified period of time, such as the reporting period; that is, the number of visitors to the web page divided by the total number of users in the universe.

The expected reach across all panelists and all client systems on which they are active may be calculated as:

$$\frac{pRE}{1 + (E-1)\,p^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}} \quad\text{or, equivalently,}\quad \frac{(1+q)RE}{1 + (E-1)(1+q)^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}$$

where:

p is the client system to person ratio (the projected number of client systems divided by the projected number of users);

q = p - 1 is the incremental number of client systems used per person, assuming no client systems are shared between people;

T is the reporting period in days (e.g., 30 days);

R is the projected reach for the reporting period T;

E is the frequency of visits per visitor to web pages of the web entity over the period T; and

S is the average number of visits per day to the web site by a user over the period T.
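As a consistency check on the expected-reach expression, consider the case in which panelists use no client systems beyond those the panel measures, so that p = 1 and q = 0. Because 1 raised to any power is 1, the expression collapses to the measured reach R:

$$\frac{1 \cdot RE}{1 + (E-1)\cdot 1^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}} = \frac{RE}{1 + (E-1)} = \frac{RE}{E} = R$$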

The projected reach R for the reporting period may be calculated by using the pre-processed panel-centric data to determine the projected number of users who visited the web page during the reporting period, and then dividing that number by the estimated total universe of users. E, the frequency of visits per visitor, can also be calculated from the pre-processed panel data, for example by determining the projected total number of visits to the web entity's web pages during the reporting period and dividing it by the projected number of visitors. S, the average number of visits per day, can likewise be calculated from the pre-processed panel data by determining the average number of visits per visitor to the web entity's web pages for each day of the reporting period, adding these daily values together, and dividing the sum by the number of days in the reporting period.
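A sketch of deriving R, E, and S from weighted panel data, under assumptions: the hypothetical `Panelist` record holds a projection weight and per-day visit counts of equal length T; E is taken as projected visits per projected visitor; and S averages each day's visits-per-visitor over the T days, with a day that has no visitors contributing 0 (a choice the text does not specify). Note that the reach formula needs E > 1 and S > 1 for its logarithm to be defined.

```python
from dataclasses import dataclass


@dataclass
class Panelist:
    """Hypothetical pre-processed panel record for one panelist."""
    weight: float              # projection weight
    daily_visits: list[int]    # visits to the web entity on each of the T days


def reach_inputs(panel: list[Panelist], universe: float) -> tuple[float, float, float, int]:
    """Return (R, E, S, T) for one web entity from weighted panel data."""
    T = len(panel[0].daily_visits)
    visitors = [m for m in panel if sum(m.daily_visits) > 0]
    projected_visitors = sum(m.weight for m in visitors)
    if projected_visitors == 0:
        raise ValueError("no visitors in the reporting period")

    # R: projected visitors as a share of the user universe.
    R = projected_visitors / universe
    # E: projected visits per projected visitor over the whole period.
    E = sum(m.weight * sum(m.daily_visits) for m in visitors) / projected_visitors

    # S: average over the T days of each day's visits per visitor.
    daily = []
    for d in range(T):
        day_visitors = sum(m.weight for m in panel if m.daily_visits[d] > 0)
        day_visits = sum(m.weight * m.daily_visits[d] for m in panel)
        # Assumption: a day with no visitors contributes 0 to the sum.
        daily.append(day_visits / day_visitors if day_visitors else 0.0)
    S = sum(daily) / T
    return R, E, S, T


if __name__ == "__main__":
    panel = [Panelist(100.0, [2, 1]), Panelist(100.0, [1, 0]), Panelist(200.0, [0, 0])]
    print(reach_inputs(panel, 1000.0))  # (0.2, 2.0, 1.25, 2)
```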

Based on the client system to person ratio, the measurement adjustment module 304c determines the incremental reach that the panel does not measure, because panel members also use client systems that are not included in the panel, in addition to the reach R that the panel did measure (606). The expected reach gain from incremental machine activity not measured by the panel can be determined as:

$$\frac{qRE}{1 + (E-1)\,q^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}$$

This incremental reach can then be added to the measured reach R.

The measurement adjustment module 304c determines the machine overlap adjustment factor as the ratio of the expected reach across all client systems to the sum of the incremental reach and the measured reach (608). That is, the measurement adjustment module 304c can determine the machine overlap adjustment factor as:

$$\frac{\dfrac{(1+q)RE}{1 + (E-1)(1+q)^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}}{R + \dfrac{qRE}{1 + (E-1)\,q^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}}$$

Dividing the numerator and the denominator by R, this simplifies to:

$$\frac{\dfrac{(1+q)E}{1 + (E-1)(1+q)^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}}{1 + \dfrac{qE}{1 + (E-1)\,q^{\ln\left(\frac{E-1}{S-1}\right)/\ln T}}}$$

The measurement adjustment module 304c can calculate the machine overlap adjustment factor using the simplified equation. For example, the measurement adjustment module 304c may determine the client system to person ratio p, determine the incremental number of client systems q (e.g., by computing p - 1), determine how often people visit web pages of the web entity (E and S), and then calculate the machine overlap adjustment factor using the simplified equation.
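The machine overlap factor can be sketched in Python by coding the full ratio and the simplified form side by side, which makes it easy to confirm that R cancels. Assumptions for this sketch: E > 1, S > 1, and T > 1 so the logarithms are defined, and p >= 1 so that q = p - 1 is non-negative (a negative base with a fractional exponent would yield a complex number in Python). The helper `_reach_term` and the function names are hypothetical.

```python
import math


def _reach_term(n: float, E: float, S: float, T: float) -> float:
    """n*E / (1 + (E - 1) * n**(ln((E - 1)/(S - 1)) / ln T)), per unit of measured reach R."""
    if n == 0:
        return 0.0  # no additional client systems, so no additional reach
    exponent = math.log((E - 1) / (S - 1)) / math.log(T)
    return n * E / (1 + (E - 1) * n ** exponent)


def machine_overlap_factor(p: float, R: float, E: float, S: float, T: float) -> float:
    """Expected reach across all client systems / (measured reach + incremental reach)."""
    if p < 1:
        raise ValueError("sketch assumes p >= 1 (q = p - 1 >= 0)")
    q = p - 1
    expected = R * _reach_term(1 + q, E, S, T)
    incremental = R * _reach_term(q, E, S, T)
    return expected / (R + incremental)


def machine_overlap_factor_simplified(p: float, E: float, S: float, T: float) -> float:
    """Same factor after dividing numerator and denominator by R."""
    if p < 1:
        raise ValueError("sketch assumes p >= 1 (q = p - 1 >= 0)")
    q = p - 1
    return _reach_term(1 + q, E, S, T) / (1 + _reach_term(q, E, S, T))
```

With p = 1 both functions return 1, so the factor leaves the unique visitor count unchanged when there are no unmeasured client systems.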

If the projection weights of the client systems and users in the defined universe could be calculated precisely, the client system to person ratio itself could be used as the machine overlap adjustment factor. In practice, such precise weighting and estimation may not be possible. For example, there is a mixture of primary client systems (those used most frequently to access the Internet) and secondary client systems (those used less often), but the exact mix is not always known. Depending on the sample composition and the web site, the client system to person ratio may be biased toward primary or secondary usage. Using the client system to person ratio to calculate a machine overlap adjustment factor, rather than using the ratio directly, can compensate for such errors in weighting and in estimating the universe. If the expected combined reach of the sample exceeds the sum of the incremental reach and the measured reach, the machine overlap adjustment factor increases the count of unique visitors. If the expected combined reach for the web entity is lower than the sum of the incremental reach and the measured reach, the sample is biased more toward primary usage, and the machine overlap adjustment factor reduces the count of unique visitors to account for incremental secondary usage.

FIG. 7 is a flowchart that illustrates a process 700 for determining a non-beaconed adjustment factor. Process 700 is described as performed by the measurement adjustment module 304c, although other systems and configurations may also perform it. As noted above, this adjustment factor can be used to adjust initial audience measurement data for a web entity. In the implementation of process 700 described below, actions 702 through 706 are executed on a per-web-entity basis.

Depending on the audience measurement, the measurement adjustment module 304c uses the pre-processed panel data to determine the total number of unique visitors or page views for a given web entity (702). Because panel applications should capture all web traffic, visits to non-beaconed web pages are also recorded and reported by the panel applications. The measurement adjustment module 304c can therefore use the pre-processed panel data to calculate the total number of unique visitors and page views for a particular web entity, even if not all of its web pages include beacon code.

For example, the total number of unique visitors can be calculated by summing the projection weights of the panel members who visited web pages of the web entity. To determine the total number of page views, each member's projection weight can be applied to (multiplied by) that member's number of page views, and the projected page views can then be summed.
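A short Python sketch of step 702, assuming each panel record is a hypothetical (projection weight, page views) tuple in which page views count every page of the web entity the member viewed, beaconed or not, during the reporting period. The function name is illustrative.

```python
def totals_including_non_beaconed(panel: list[tuple[float, int]]) -> tuple[float, float]:
    """Return (projected unique visitors, projected page views) for one web entity.

    Each tuple is (projection weight, page views on ANY of the entity's pages,
    beaconed or not) for one panel member in the reporting period.
    """
    # A member counts as a visitor if they viewed at least one page.
    unique_visitors = sum(w for w, pv in panel if pv > 0)
    # Each member's page views are scaled by that member's projection weight.
    page_views = sum(w * pv for w, pv in panel)
    return unique_visitors, page_views


if __name__ == "__main__":
    print(totals_including_non_beaconed([(100.0, 4), (50.0, 0), (200.0, 1)]))  # (300.0, 600.0)
```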
