Bruce L. Davis, Tony F. Rodriguez, Digimarc Corp

Abstract for “Channelized audio watermarks”

An audio watermark signal can be associated with an audio host signal but supplied separately from it. The user can then decide whether the watermark signal is rendered along with the audio host signal. This specification describes a wide range of technologies, most relating to audio and/or image content and/or portable devices (e.g., smartphones).

Background for “Channelized audio watermarks”

“The present technology expands, in certain respects, on technology detailed in the above-cited patent applications. Those earlier works can be used in implementations of the present technology, and the present technology can be incorporated into implementations of those works.

Referring to FIG. 1, an illustrative device 14 includes a processor 16, a memory 18, one or more input peripherals 20, and one or more output peripherals 22. System 12 can also include a network connection 24 and remote computers 26.

An illustrative device 14 is a smartphone or tablet computer, although any other consumer-grade electronic device can be used. The processor can include a microprocessor such as an Atom or A4 device. The processor's operation is controlled, in part, by operating system software, application software, data, etc. stored in the memory. The memory can be a hard drive, flash memory, or the like.

“Input peripherals 20 can include a camera and/or a microphone, together with interface circuitry that converts analog signals from the camera/microphone into digital data. A touch screen or keyboard can also serve as an input peripheral. Output peripherals 22 can include a speaker, a display screen, and so on.

“The network connection 24 can be wired (e.g., Ethernet), wireless (WiFi, 4G, Bluetooth, etc.), or both.”

“In an exemplary operation, device 14 receives a set of digital content data, such as through the microphone 20 and its interface, through the network connection 24, or otherwise. Any type of content data can be used; audio is exemplary.

“The system 12 processes the digital content data to generate corresponding identification data. This can be done, for example, by applying a digital watermark decoding process or a fingerprinting algorithm to the content data, or to associated data (e.g., file names, header data, etc.). The identification data serves to distinguish the received content data from other data of the same type (e.g., other audio or other video).

“By referencing this identification data, the system determines what software should be invoked. One way to do this is by indexing a table, database, or other data structure with the identification data to obtain the identity of the appropriate software. An illustrative table is shown conceptually in FIG. 2.”

“In some cases, the data structure may identify a single software program. If so, that software is launched. (It need not already be installed on the device: it may be available as a cloud-based app, or it can be downloaded from an online repository such as the iTunes store, installed, and launched. Alternatively, the device can subscribe to a software-as-a-service version of the app.) Depending on the implementation, the user may be asked for permission before certain of these actions are taken; in other implementations, such actions are carried out without disturbing the user.

“Sometimes the data structure may identify several different software programs. Different programs may be specific to different platforms, in which case device 14 can simply pick the program corresponding to its platform (e.g., Android G2, iPhone 4, etc.). Or the data structure may identify several alternative programs usable on a given platform. In that circumstance, the device may check whether any of them is already installed; if one is found, it can be launched, and if two are found, the device may prompt the user to choose between them (or choose one itself). If none is available, the device can select and download one, again using an algorithm or user input, and then install and launch it. (A sketch of such a selection procedure appears below.)
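
A minimal sketch (in Python) of how a device might choose among candidate programs identified by the data structure, per the selection logic just described. The record fields and helper names are illustrative assumptions, not an actual device API.

```python
# Hypothetical sketch: choosing among candidate programs identified for a content item.
# Record fields and helper functions are illustrative, not an actual device API.

def choose_program(candidates, device_platform, installed_ids, ask_user):
    """Return the program to launch, or None if nothing suitable exists."""
    # Prefer programs that match this device's platform.
    compatible = [c for c in candidates
                  if c.get("platform") in (device_platform, "any")]
    if not compatible:
        return None

    # Prefer a program that is already installed on the device.
    installed = [c for c in compatible if c["id"] in installed_ids]
    if len(installed) == 1:
        return installed[0]
    if len(installed) > 1:
        return ask_user(installed)      # let the user pick among installed options

    # Nothing installed: pick one to download (here, simply the first candidate).
    return compatible[0]

# Example usage with made-up data:
candidates = [
    {"id": "FF245", "platform": "android"},
    {"id": "FF246", "platform": "ios"},
]
pick = choose_program(candidates, "android", installed_ids={"FF245"},
                      ask_user=lambda opts: opts[0])
print(pick["id"])   # -> FF245
```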

“(Sometimes the data structure may identify different programs serving different functions, all related to the content. One app may be for discovering song lyrics; another may relate to a musician's biography; another may be for purchasing the content. Each type of software can include several alternatives.)

“Note that the device may already have installed an application that is technically suitable for the received content (e.g., for rendering an MPEG4 or MP3 file). Many programs may be technically compatible for certain operations. Nonetheless, the content may indicate that only a subset of those possible programs should be used.

Software in the device 14 may strictly enforce the content-identified software selection. Alternatively, the system may treat the software identification as a preference that the user can override. In some implementations the user may be offered an incentive to use the content-identified software; in others, the user may be charged a fee, or face some other impediment, for using software other than that identified by the content.

“Sometimes the system may decline to render certain content on a particular device (e.g., because no suitable app or hardware capability is available), but may invite the user to transfer the content to another device that has the required capability, and may implement such a transfer. (Ansel Adams might have been skeptical of his large-format photographs being used as screen savers on a low-resolution smartphone display.) Rather than attempting to display such images on a small, low-resolution smartphone screen, the software may instead ask the user whether the images should be transferred to a large-format HD display at the user's home.

Instead of declining to render the content entirely, the system may render it in a limited manner. For example, a video might be rendered as a series of still frames, such as those at scene transitions. Again, the system can transfer the content to a place where it can be enjoyed more effectively, or, if hardware considerations permit (e.g., screen resolution is adequate), the appropriate software can be downloaded and installed.

“As shown by the table of FIG. 2 (a data structure that may be stored in memory 18 or in a remote computer system 26), the indication of software can be based on one or more contextual factors, in addition to the content identification data.

“Context” can be defined formally as any information that can be used to characterize the situation of an entity (a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and application themselves).

“Context information can include computing context (network connectivity, memory availability, processor type, CPU contention, etc.), user context (user profile, location, actions, preferences, nearby friends, social network(s), etc.), physical context (e.g., lighting, noise level, traffic, etc.), temporal context (time of day, day, month, season, etc.), history of the above, and so forth.

“In the illustrated table, rows 32 and 34 correspond to the same content (i.e., the same content ID), but they indicate that different software should be used, depending on whether the context is indoors or outdoors. (The software is identified by a five-symbol hex identifier; the content is identified by six hex symbols. Identifiers of longer or shorter length, and of other forms, can of course be used.)

Row 36 shows two software items, both of which are invoked. One includes a further descriptor: an identifier of a YouTube video that is to be loaded by software FF245. This software is indicated for users in a daytime context who are aged between 20 and 25.

Row 38 shows the user's location (zip code) and gender as contextual data. It specifies alternative software that can be used with this content/context (i.e., four identifiers “OR”d together, as opposed to the “AND” of row 36).

“Rows 41 and 42 show that the same content ID can correspond to different codecs, depending on the device processor (Atom or A4).”
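
The FIG. 2 table can be thought of as a keyed lookup from a content identifier plus context constraints to one or more software identifiers. Below is a minimal sketch of such a lookup in Python; all identifiers and field names are made up for illustration.

```python
# Illustrative sketch of a FIG. 2-style table: (content ID, context constraints) -> software ID(s).
# All identifiers and field names here are invented for illustration.

TABLE = [
    {"content_id": "A1B2C3", "context": {"location": "indoors"},  "software": ["FA590"]},
    {"content_id": "A1B2C3", "context": {"location": "outdoors"}, "software": ["FA591"]},
    {"content_id": "D4E5F6", "context": {"time": "day", "age": "20-25"},
     "software": ["FF245", "youtube:v87QQ"]},   # second entry: a video ID handed to FF245
]

def lookup_software(content_id, context):
    """Return software identifiers whose context constraints all match the current context."""
    for row in TABLE:
        if row["content_id"] != content_id:
            continue
        if all(context.get(k) == v for k, v in row["context"].items()):
            return row["software"]
    return []

print(lookup_software("A1B2C3", {"location": "outdoors"}))   # -> ['FA591']
```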

“(By way of comparison, consider how codecs are presently chosen. The user is typically unfamiliar with the technical distinctions between codecs, and the artist has no say at all; codec selection is thus effectively made by parties with little stake in it. Certain media rendering software, such as Windows Media Player, ships with default codecs, and if a default codec cannot handle particular content, the rendering software typically downloads a further codec, again without input from the parties most affected.)

It will be understood that the software identified in table 30 can be a standalone app, or a software component such as a codec, driver, etc. The software can render the content, or it can provide some other function or information related to it. In some implementations, the “software” can simply comprise a URL or other data/parameter (e.g., a YouTube video identifier) that is provided to another software program or an online service.

“Desirably, all of the software identified in the table is chosen by the proprietor of the content (e.g., its creator, artist, or copyright holder). This affords the proprietor a measure of artistic control that is absent in most other digital content systems. (Arguably, the proprietor's choice should be given more deference than that of, e.g., AOL or iTunes, or of the company that provides word processing software.)

“Often the proprietor's choice of software will be based on aesthetics and technical merit. Sometimes, however, commercial considerations come into play. (As the artist Robert Genn noted, being a “starving artist” is acceptable at age 20, but problematic at age 40.)

For example, if a user's phone detects ambient audio by The Decemberists, the artist-specified data in table 30 may indicate that the device should load the Amazon app, or the Amazon web page, from which the music can be purchased, so as to generate sales. If the same device detects audio by the Red Hot Chili Peppers, that group may have specified that the device instead load the band's own web page, or another app, for the same purpose. The proprietor can thus specify the fulfillment service for content-oriented commerce.

In some arrangements, an auction may best address the starving-artist problem. The device 14 (or a remote computer system 26) can announce to an online service (akin to Google AdWords) that a customer's iPod, for which certain demographic profile/context information may be available, has detected the soundtrack of the movie Avatar. A mini auction can then be held for the privilege of presenting a purchase opportunity to the user. The winner (e.g., EBay) deposits the winning bid amount into an account, which may be shared with the artist and the auction service, and the device then launches the EBay app, through which the user can buy a copy of the movie or its soundtrack. Pushing such content-detection events and associated context information to cloud-based services can create a highly competitive marketplace for responses.

“(Auction technology is also detailed in the assignee's previously-cited patent applications, and in Google's published patent applications US2010017298 and US2009198607.)”

Associated software can become popular by riding on the popularity of content. This may encourage other content owners to specify such software for use with their content, since wide deployment of the software increases consumer exposure to those proprietors' content.

“Universal Music Group, for example, might digitally watermark its songs with an identifier designating the FFmpeg MP3 player as the preferred rendering software. Dedicated fans of UMG artists quickly install the software, leading to widespread deployment on large numbers of consumer devices. That widespread deployment then becomes one of the factors other music proprietors weigh when deciding what software to specify in table 30.

“(The software specified in table 30 may change over time, such as over a song's release cycle. When a band is new, the table-specified program may be an app that introduces the band to the public, or a YouTube clip. Later, after the music has become popular and the band better known, a different software selection may be indicated.)

Music discovery and other content-related tasks are commonly performed by application software. Operating systems provide various general services (e.g., I/O) that such content-related software can use, but commercial OS software has not previously provided services specific to content identification or processing.

“According to a further aspect, the present technology provides operating system software to perform one or several services specific to content processing and identification.”

“In one implementation, an OS application programming interface (API) takes content data as input (or a pointer to a location where the content data is stored) and returns fingerprint data corresponding to that content. Another OS service (provided through the same API or a different one) takes the same input and returns watermark information decoded from the content data. An input parameter to the API can specify which of plural watermark or fingerprint processes is to be applied. Alternatively, the service may apply several different watermark and/or fingerprint extraction processes to the input data and return all of the resulting information to the calling program. For watermark extraction, the validity of the resulting data can be checked by reference to error-correction data or the like.
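
A hedged sketch of what such an OS-level content-identification call could look like. The function name, parameters, and return fields below are assumptions for illustration only; the stand-in "fingerprint" is just a hash, where a real service would run a perceptual fingerprint algorithm and a watermark decoder.

```python
# Illustrative sketch of an OS-level content-identification service.
# Function name, parameters, and return fields are assumptions, not a real OS API.
import hashlib

def identify_content(content_bytes, method="all"):
    """Take content data and return identification data (sketch only)."""
    results = {}
    if method in ("fingerprint", "all"):
        # Stand-in "fingerprint": a hash of the content bytes.  A real service
        # would apply a perceptual audio/image fingerprint algorithm here.
        results["fingerprint"] = hashlib.sha256(content_bytes).hexdigest()[:16]
    if method in ("watermark", "all"):
        # Stand-in "watermark" decode: a real service would decode an embedded
        # payload and verify it against error-correction data before returning it.
        results["watermark"] = None
    return results

print(identify_content(b"\x00\x01example audio samples"))
```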

“The same API (or another) can further process the extracted fingerprint/watermark data to obtain XML-based content metadata associated with the content (e.g., text giving the title of the work, the name of the artist, the copyright holder, etc.). To do so, it may consult a remote metadata registry such as Gracenote.

“Such a content-processing API can establish a message queue (e.g., a “listening/hearing queue”) to which the results of the fingerprint/watermark extraction process (either the raw results or the corresponding metadata) are published. One or more applications can monitor the queue. One app might listen for music by The Beatles; another might listen for Disney movie soundtracks. When such content is detected, the monitoring app can launch some activity, such as logging the event, presenting complementary media content, or offering a purchase opportunity.

Such functionality can also be implemented apart from the operating system, for example through a publish/subscribe model in which some apps publish capabilities (e.g., listening for a particular type of audio) and other apps subscribe to them. Such arrangements allow loosely coupled apps to cooperate to create a similar ecosystem; a minimal sketch follows.
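
A minimal publish/subscribe sketch of the "listening queue" idea: detection results are published once, and any subscribed app whose filter matches is notified. The class, filters, and metadata fields are illustrative, not an actual API.

```python
# Minimal publish/subscribe sketch of a "listening/hearing queue".
# All names and metadata fields here are illustrative.

class ListeningQueue:
    def __init__(self):
        self.subscribers = []          # list of (filter_fn, callback)

    def subscribe(self, filter_fn, callback):
        self.subscribers.append((filter_fn, callback))

    def publish(self, detection):      # detection: metadata dict from watermark/fingerprint lookup
        for filter_fn, callback in self.subscribers:
            if filter_fn(detection):
                callback(detection)

queue = ListeningQueue()
queue.subscribe(lambda d: d.get("artist") == "The Beatles",
                lambda d: print("Beatles app: offer purchase of", d["title"]))
queue.subscribe(lambda d: d.get("studio") == "Disney",
                lambda d: print("Disney app: log soundtrack", d["title"]))

queue.publish({"artist": "The Beatles", "title": "Let It Be"})
```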

The present technology can also be used to monitor media to which a user has been exposed. Unlike Shazam and other song identification services, the user need not take any action to initiate a discovery operation to learn a song's identity. (Of course, the user must turn on the device at some point and authorize this background functionality.) Instead, the device listens for extended periods, much longer than the 10-15 second samples used by Shazam-like services, throughout the user's day. As content is encountered, it is processed and recognized, and this information is stored in the device and used to prime certain software to reflect that exposure.

“The device may process ambient audio in this fashion for fifteen minutes, an hour, or a day. When the user next interacts with the device, it may display a listing of the content to which the user has been exposed, and invite the user to touch listings of interest to engage in a discovery process. The software associated with that content then launches.

“In certain implementations, the device primes software programs with information based, at least in part, on the content identification data. For example, the YouTube app may be primed with a thumbnail corresponding to a music video for a song the user has heard, readying it to be selected. A 90-second sample audio clip may be downloaded to the iPod music app, placed in a “Recent Encounters” folder. An email from a band might be added to the email inbox, and a trivia game app might load questions related to the band. Because these data are stored locally, the user need not direct their retrieval (e.g., from a web site); the information is presented prominently the next time the app is used.

Social media apps can serve as platforms for sharing such information. When the user activates a Facebook app, an avatar may greet the user and list the content the user was exposed to, e.g., “Billy Liar” by The Decemberists, “Boys Better” by The Dandy Warhols, and the Nike LeBron James commercial. The app can remind the user of the context in which each item was encountered, such as while walking through downtown Portland on the morning of November 4, 2010 (as determined by GPS and accelerometer sensors). The Facebook app can invite the user to share any of this content with friends. It may also ask whether the user would like discographies of any of these bands, or full digital copies of their works, and inquire whether the user is interested in related content or associated apps.

“The app can also report on the media encounters and related activities of the user’s friends (with appropriate permissions).

“The foregoing shows that, in certain embodiments, the user no longer needs to locate apps suited to particular media content; instead, the media content serves to locate the apps favored for it.

Such embodiments ensure continuity between artistic intent and delivery, helping to optimize the experience the art is meant to create. The artist need not leave delivery of the artistic experience to whatever platform happens to render it.

This technology also encourages competition in app marketplaces, giving artists a larger role in determining which apps become popular. A Darwinian effect may emerge in which app popularity is driven less by marketing budgets and branding, and more by the popularity of the content those apps deliver.

“Other Arrangements”

“Filtering/Highlighting Data Streams by Reference to Object Interactions”

“Users are being presented with ever-increasing amounts of data: hundreds of television channels, email, and RSS/Twitter/social network/blog feeds. Technologies have been developed to filter and highlight this incoming information based on user profile data.

DVR software such as Tivo is a familiar example: it presents a subset of the unabridged electronic program guide based on the user's interests. Tivo software notes which television programs the user has viewed, invites feedback in the form of “thumbs up” or “thumbs down” rankings, and then proposes future programs of potential interest based on that history and those rankings.

Google's “Priority Inbox” for its Gmail service is a more recent example. Each incoming email is evaluated and ranked for importance, based on factors such as which emails the user has previously read, which the user has responded to, and the senders/keywords associated with those messages. Incoming email that scores highly in this assessment is placed at the top of the mail list.

My6sense.com offers a similar service for triaging RSS and Twitter feeds. It tracks the user's past interactions with the data feeds and prioritizes the items most likely to be of interest. In processing a user's Twitter feed, for example, My6sense considers which links the user has clicked, which tweets the user has favorited, which tweets have been retweeted, and the authors/keywords of those tweets.
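
The kind of interaction-weighted scoring described above can be sketched as a simple weighted sum over feed items. The weights and feature names below are invented for illustration; they are not My6sense's (or any vendor's) actual model.

```python
# Illustrative interaction-weighted scoring of feed items (not any vendor's actual model).

WEIGHTS = {"clicked_author": 3.0, "favorited_author": 2.0,
           "retweeted_author": 2.5, "keyword_match": 1.5}

def score_item(item, profile):
    """Score a feed item against a profile built from the user's past interactions."""
    s = 0.0
    if item["author"] in profile["clicked_authors"]:
        s += WEIGHTS["clicked_author"]
    if item["author"] in profile["favorited_authors"]:
        s += WEIGHTS["favorited_author"]
    if item["author"] in profile["retweeted_authors"]:
        s += WEIGHTS["retweeted_author"]
    s += WEIGHTS["keyword_match"] * len(set(item["keywords"]) & profile["keywords"])
    return s

profile = {"clicked_authors": {"@moto_news"}, "favorited_authors": set(),
           "retweeted_authors": {"@moto_news"}, "keywords": {"motorcycling"}}
items = [{"author": "@moto_news", "keywords": ["motorcycling", "boots"]},
         {"author": "@random",    "keywords": ["politics"]}]
print(sorted(items, key=lambda i: score_item(i, profile), reverse=True)[0]["author"])
```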

These principles can be extended to interactions with objects. If a shopper at Nordstrom uses her smartphone to photograph a pair of Jimmy Choo motorcycle boots, that may be taken as indicating possible interests in fashion, motorcycling, footwear, and Jimmy Choo merchandise. If the same person later uses her smartphone to capture imagery of River Road motorcycle saddle bags, the motorcycling interest is reinforced. Each new object captured in imagery reveals a bit more about the person; some hypotheses are confirmed (e.g., motorcycling), while others may be discounted.

“In addition to recognizing objects in imagery, the analysis (which may include human review, e.g., by crowdsourcing) can also discern activities. The location can be noted as well, either from the imagery itself or from associated GPS data.

Image analysis applied to one frame may reveal that it includes a person, a motorcycle, a tent, and a forested background. Another image may show a person in the same attire, and a further image, captured moments later, may show a motorcycle being ridden against a forested backdrop. GPS data may place all of the images in Yellowstone National Park.

“Such historical information, accumulated over time, can reveal recurrent themes and patterns indicating subjects, activities, people, and places of interest to the user. A confidence metric can be assigned to each such conclusion, based on the system's confidence that the attribute accurately characterizes the user's interests. (In the examples above, “motorcycling” would score higher than “Jimmy Choo merchandise.”) These data can then be used to filter or highlight the data feeds (and other data) presented to the user; a sketch of such confidence accumulation follows.
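
A minimal sketch of accumulating recognized-object attributes into an interest profile with a crude confidence score per attribute. The update rule (normalized counts) is an illustrative assumption; a real system could weight by recency, source reliability, and so on.

```python
# Sketch: accumulate recognized-object attributes into interest confidence scores.
# The update rule (simple counts normalized to 0..1) is an illustrative assumption.
from collections import Counter

class InterestProfile:
    def __init__(self):
        self.counts = Counter()

    def observe(self, attributes):
        """attributes: interests inferred from one captured image/audio event."""
        self.counts.update(attributes)

    def confidence(self, attribute):
        """Crude confidence metric: share of all observations supporting this attribute."""
        total = sum(self.counts.values())
        return self.counts[attribute] / total if total else 0.0

p = InterestProfile()
p.observe(["fashion", "footwear", "motorcycling", "Jimmy Choo merchandise"])
p.observe(["motorcycling", "luggage"])
p.observe(["motorcycling", "camping"])
print(round(p.confidence("motorcycling"), 2))             # recurs across events -> higher
print(round(p.confidence("Jimmy Choo merchandise"), 2))   # seen once -> lower
```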

“A history of such device usage can thus be compiled into a rich record of interests, media consumption patterns, and other information that can be used to improve the user's interactions with the world. If the user photographs Sephora fashion accessories, for instance, the parameters controlling the user's junk mail filter may be adjusted to allow delivery of emails from that company that would otherwise have been blocked. The user's web browsers (e.g., Safari on a smartphone, Firefox on a home computer) may add the Sephora web site to a list of “Suggested Favorites,” akin to Tivo's program suggestions.

A user may even choose to establish a Twitter account that is, in essence, owned by the user's object-derived profile. This account follows tweets relating to objects the user has recently sensed; if the profile reflects the user's interest in a Canon SLR camera, for example, it can follow tweets relating to that subject. The account can then retweet those posts into a feed that the user can check from his or her own Twitter account.

Such object-derived profile information can influence not only what content is delivered to smartphones, TVs, and other content-delivery devices, but also how content is composed. Media mashups that incorporate objects the user has interacted with can be created for the user's consumption: Jimmy Choo motorcycle boots may be worn by a central character in a virtual reality game, or a Canon SLR camera may be part of the treasure captured from an opponent.

“Whenever a user interacts with an object, that interaction can be published on Twitter, Facebook, and other social media (subject to sharing parameters and user permissions). Such communications are akin to “check-ins” in the FourSquare sense, except that the check-in is for an object or a media type (music, TV, etc.) rather than for a place.

“Social network analysts will realize that this is a type of social network analysis but with nodes representing physical objects.”

“Social network analysis views relationships in terms of network theory, with the network comprising nodes and ties (also called edges, links, or connections). Nodes are the individual actors within the network, and ties are the relationships between them. The resulting graph-based structures can be complex, and there can be many kinds of ties between nodes. In the example just given, the relationships can be “likes,” “owns,” etc.”

“A 3D graph may place people in one plane and physical objects in another, with links between the planes connecting people to the objects they like or own. (The default relationship may be “likes”; an “owns” relationship may be deduced from other data or inferred from context. For example, a Camaro automobile photographed by a user and geolocated at his home may indicate an “owns” relationship, as may a lookup of the Camaro's license plate in a public database.)

“Such a graph will typically include links between people, as in familiar social network graphs, and between people and objects. It may also include links between physical objects. One such link is physical proximity; this relationship may link, for example, two cars parked next to each other in a parking lot.

The number of links between two objects in such a network can indicate their relative importance, and the length of the network path linking them can indicate their degree of association, with a shorter path indicating a closer association (see the path-length sketch below).
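
A small sketch of the person/object graph idea: nodes for people and physical objects, edges for "likes"/"owns"/proximity relationships, and shortest-path length (via a plain breadth-first search) as a degree-of-association measure. The graph contents are invented.

```python
# Sketch: people and physical objects as nodes; shortest path length as a
# degree-of-association measure (shorter path = stronger association).
from collections import deque

edges = {
    ("alice", "camaro"): "owns",
    ("alice", "jimmy_choo_boots"): "likes",
    ("bob", "camaro_2"): "owns",
    ("camaro", "camaro_2"): "parked_near",     # object-object link (physical proximity)
}

graph = {}
for (a, b), _rel in edges.items():
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def path_length(graph, start, goal):
    """Breadth-first search: number of hops between two nodes, or None if unconnected."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

print(path_length(graph, "alice", "bob"))   # 3 hops, via the two adjacently parked Camaros
```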

“(While the arrangement detailed above identifies physical items by analysis of captured imagery, objects with which the user interacts can also be identified by detection of RFID/NFC chips associated with those objects.)

Analogous principles and embodiments can be applied to analysis of the user's auditory environment, including music and speech recognition. Such information can likewise be used in selecting and composing the data streams presented to the user (e.g., to the user's device) and/or sent from it. Still greater utility can be achieved by considering both the visual and the auditory stimuli captured by the device's sensors.

“Text Entry”

“The smartphone’s front-facing camera can be used to speed up text entry in a gaze tracking mode.”

A basic geometrical reference frame is first established by having the user look, in succession, at three or four known positions on the screen while the smartphone camera 101 monitors the gaze of one or both eyes. In FIG. 3, the user looks successively at points A, B, and C; repeating this cycle can increase accuracy. The principles of gaze tracking systems are well known to the reader and are not repeated here; detailed examples are found in patent publications 20110013007, 20100295774, and 20050175218, and in the references cited therein.

Once the system has established the geometrical framework relating the user's eyes to the device screen, the user can indicate the initial letter of a message by gazing at it on the displayed keyboard 102. (Other keyboard displays, making fuller use of the screen, can alternatively be used.) The user signals selection of the gazed-at letter with a gesture, such as an eye blink, or a tap on the smartphone body (or on the desk on which it is lying). The selected text is added to the message area 103.

“Once an initial letter (e.g., “N”) has been entered, data entry can be speeded up, and gaze tracking made more accurate, by presenting likely next letters in an expanded letter-menu portion 104 of the screen. Each time a letter is entered, a menu of likely next letters is displayed, determined, for example, by frequency analysis of letter pairs in a representative corpus. FIG. 4 shows an example; the menu here is a hexagonal array of tiles, although other configurations can of course be used.

“In this example the user has already entered the text “Now is the time to a_,” and the system awaits the user's selection of the letter that goes where the underscore 106 is indicated. The last-selected letter, “a,” is displayed in greyed-out form on the center tile, surrounded by a variety of letter pairs, including “an,” “at,” “al,” and others among the most common pairings beginning with “a.” Also displayed in the hexagonal array are a “–” selection tile 108 (indicating that the next symbol should be a space) and a keyboard selection tile 110.

To enter the next letter, “l,” the user simply looks at the “al” tile 112 and signals acceptance by a tap or other gesture, as described above. FIG. 5 shows the result: the message has been extended, menu 104 has been updated with the letter pairs most commonly beginning with “l,” and the device awaits entry of the next letter. To enter another “l,” the user looks at the “ll” tile 114 and gestures.

Initial studies suggest that more than half of text entry can be accomplished using the expanded letter menu of next letters plus a space. For other letters, the user simply looks at the keyboard tile 110; a keyboard, such as that shown in FIG. 3 (or another), then appears, and the user makes the selection from it.

“Instead of four letter pairs, a space, and a keyboard icon, alternative embodiments present five letter pairs and a space, as shown in FIG. 6. Such an arrangement relies on a keyboard being displayed on the screen, so that the user can select other letters directly from it.

“Instead of the standard keyboard display 102, a variant keyboard display 102a, shown in FIG. 7, can be used. This layout recognizes that five of the letters need not be prominently displayed on the keyboard, since the five most likely letters already appear in the hexagonal menu. In the illustrated example those five keys are not omitted entirely, but are given extra-small keys, while the remaining 21 letters are given extra-large keys, making letter selection from the keyboard by gaze easier and more accurate.

“Note also that the variant keyboard layout 102a of FIG. 7 omits the space bar, since the enlarged menu tile 116 already provides the space symbol. In the illustrated arrangement the space-bar area is instead given over to common punctuation symbols.

“The artisan will recognize many alternatives and extensions. For example, the last-entered letter need not be displayed in the middle of the hexagonal menu.

“That center space can instead be left empty, or it can show the next letter indicated by the user's current gaze, allowing the user to verify a selection before confirming it with a gesture. (Gazing at the middle tile does not change the gaze-based selection.) A numeric pad can be summoned to the screen by selecting a numeric-pad icon (akin to the keyboard tile 110 of FIG. 4), or a numeric pad and/or keyboard can be displayed on the screen throughout the message-composition operation (as keyboard 102 is in FIG. 6). Some of the hexagonal tiles can present guesses of the full word being entered, again based on analysis of a text corpus.

The corpus used to determine the most common letter pairings and full-word guesses can be customized to the user; for example, it can be a historical archive of all text and/or email messages sent from the device, or authored by the user. Graphical elements and controls from other smartphone functionality (e.g., a text-messaging app) can be added to augment the displayed features.
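
A minimal sketch of deriving "most likely next letters" from a user-specific corpus via letter-pair (bigram) counts, as described above. The corpus text and menu size here are illustrative only.

```python
# Sketch: derive the most likely next letters from letter-pair (bigram) counts
# over a corpus (e.g., the user's own sent messages).  The corpus text is made up.
from collections import Counter, defaultdict

corpus = "now is the time for all good people to come to the aid of their party"

bigrams = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    if a.isalpha() and b.isalpha():
        bigrams[a][b] += 1

def likely_next_letters(prev_letter, k=5):
    """Return the k most common letters that follow prev_letter in the corpus."""
    return [letter for letter, _count in bigrams[prev_letter].most_common(k)]

print(likely_next_letters("t"))   # candidate letters for the expanded menu after 't'
```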

“In other embodiments, the user can select from symbols and words presented on displays other than those just described, such as a page showing a large-scale keyboard, or a complete numeric pad. Such pages can be used alone or together with a letter menu like menu 104, with the smartphone camera again used for gaze tracking and geometrical calibration.

“Sign Language”

“Just as a smartphone can watch a user's eyes and interpret their movements, it can also watch the user's hands and interpret their gestures, serving as a sign language interpreter.

Sign languages (American and British sign languages being the dominant ones) comprise a variety of elements, all of which can be captured by a camera and identified by image analysis software. Signs typically involve handform, orientation, and movement. Similar gestures, known as finger spelling (or the manual alphabet), are used for proper names and other specialized vocabulary.

“An exemplary sign language analysis module segments the smartphone-captured imagery into regions of interest by identifying contiguous sets of pixels whose chrominance falls within a gamut associated with most skin tones. The segmented imagery is then applied to a classification engine that seeks the best match for the depicted hand configuration in a reference database of handforms. Sequences of frames are likewise processed to discern motion vectors indicating movement at different points of the handforms, and changes in orientation over time; these discerned movements are similarly matched against a reference database of movements and changes to find the best match.
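
A simplified sketch of the skin-chrominance segmentation step, using a commonly cited Cb/Cr range as the skin gamut. The exact thresholds, and the region-growing that a real module would perform on the resulting mask, are assumptions for illustration.

```python
# Sketch: flag "skin-like" pixels by chrominance, as a first step toward segmenting
# hand regions.  The Cb/Cr thresholds below are a commonly used approximation, not
# Digimarc's actual gamut; real systems also run connected-component analysis on the mask.

def rgb_to_cbcr(r, g, b):
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def is_skin(pixel):
    cb, cr = rgb_to_cbcr(*pixel)
    return 77 <= cb <= 127 and 133 <= cr <= 173   # approximate skin chrominance gamut

frame = [[(210, 160, 130), (20, 20, 20)],          # tiny 2x2 "image" for illustration
         [(200, 150, 120), (30, 90, 200)]]

mask = [[is_skin(px) for px in row] for row in frame]
print(mask)   # True where a pixel's chrominance falls within the skin gamut
```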

“When matching signs are found in the database, the textual meaning associated with the discerned sign is retrieved from the database record and can be output, as words, phonemes, or letters, to an output device such as the smartphone display screen.”

“Desirably, the best-match data from the database is not output in raw form. Rather, the database preferably identifies, for each sign, a set of candidate matches, each with an associated confidence metric. The software then determines which combination of candidate letters, phonemes, or words is most likely in the given context, referring, for example, to a reference database of word spellings (e.g., a dictionary) and to data identifying frequently signed word pairs. (The artisan will recognize that similar techniques are employed in speech recognition systems to reduce the chance of outputting nonsense sentences.) A minimal sketch of such candidate scoring appears below.
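
A minimal sketch of combining per-sign candidate confidences with word-pair frequencies to pick the most plausible interpretation of consecutive signs. All confidences and pair frequencies below are invented for illustration.

```python
# Sketch: choose among candidate interpretations of consecutive signs by combining
# each candidate's match confidence with a word-pair frequency, similar in spirit
# to how speech recognizers avoid nonsense output.  All numbers here are invented.

candidates = [
    [("thank", 0.6), ("bank", 0.4)],      # candidates for sign 1, with confidences
    [("you", 0.7), ("ewe", 0.3)],         # candidates for sign 2
]
pair_freq = {("thank", "you"): 0.05, ("bank", "you"): 0.001,
             ("thank", "ewe"): 0.0001, ("bank", "ewe"): 0.0001}

best, best_score = None, -1.0
for w1, c1 in candidates[0]:
    for w2, c2 in candidates[1]:
        score = c1 * c2 * pair_freq.get((w1, w2), 1e-6)
        if score > best_score:
            best, best_score = (w1, w2), score

print(best)   # -> ('thank', 'you')
```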

The recognition software can also be trained. If the system misinterprets a sign, the user can so indicate (e.g., by making a distinctive sign) and then repeat the misread sign. The system then offers an alternative interpretation, avoiding the interpretation it infers was incorrect. This cycle can repeat until the system returns the correct interpretation, at which point it can add the sign imagery just captured, together with the correct meaning, to its reference database of signs.

“Similarly, if the system interprets a sign and the user does not challenge the interpretation, data about the captured imagery can be added to the reference database in association with that interpretation. In this way the system's recognition of differently performed signs improves, and the system can be trained over time to recognize user-specific vernacular and other idiosyncrasies.

“To aid machine recognition, standard sign language can also be augmented with gestures that provide reference or calibration information. For example, a user signing to a smartphone might begin by extending the thumb and fingers from an outwardly facing palm (the usual sign for the number 5), and then returning the fingers to a fist. From this the smartphone can identify the user's flesh-tone coloring and determine the scale of the hand and fingers. The same gesture can also be used to separate conceptual units, such as a period at the end of a sentence. (This punctuation is often expressed in American Sign Language simply by a pause; an overt gesture, rather than the absence of a gesture, is a better parsing element for machine-vision-based sign language interpretation.)

“As noted, the interpreted sign language can be output as text on the smartphone display. Other arrangements can also be used: the text can be stored in a file (e.g., a Word or ASCII document), converted to audible speech by a text-to-speech converter, or provided to a translation service or routine (e.g., Google Translate) for conversion into another language.

“The smartphone's proximity sensor can be used to detect the approach of the user's hands and trigger examination of captured camera imagery for skin-tone chrominance, long edges, and other features characteristic of fingers and hands. If the phone concludes that the user has moved his or her hands toward it, it can activate its sign language translator. Similarly, Apple's FaceTime communications software can be adapted to activate the sign language translator when the user positions his or her hands in front of the phone's camera; the phone can then communicate text counterparts of the user's hand gestures to the other party or parties, e.g., by text display, text-to-speech conversion, and so on.

“Streaming Mode Detector”

“According to another aspect of the technology, smartphones are equipped to quickly identify plural objects and make them available for later review.”

“FIG. 8 shows an example. This application presents a large view window that is updated with streaming video from the camera (i.e., the usual viewfinder mode). As the user pans the camera, the system analyzes the imagery and identifies any recognizable objects. In FIG. 8, several objects bearing barcodes are within the camera's field of view.

“In the illustrated system, the processor analyzes the image frame starting from the center and working outward, looking for identifiable features. (Other arrangements, such as a top-down or other image search process, can also be used.) When a feature is identified (e.g., the barcode 118), the phone overlays bracketing 120 around it, or otherwise highlights it, to signal to the user what part of the displayed imagery has caught its attention. The device speaker then emits a “whoosh” sound, and an animated indicium (e.g., a shrinking square-shaped graphic) moves from the bracketed portion of the screen down to a History button 122 at the bottom of the screen. A red-circled counter 124 next to the History button indicates how many items have been detected and placed in the device's history (seven in this instance).

“After processing barcode 118, the system continues analyzing the field of view for other identifiable features, working outward. It next recognizes barcode 126, and counter 124 is incremented to 8. It then notes barcode 128, even though that barcode is partly outside the camera's view (a recognition made possible by the barcode's redundant encoding). This recognition and capture of data from the three barcodes into the device history, with the associated user feedback (sound and animation effects), typically takes only a second or two, and generally less than three seconds.
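
A small sketch of the streaming-mode detection history just described: each newly recognized feature is appended once to a history list, and the counter shown next to the History button is simply the number of stored detections. Class and field names are illustrative.

```python
# Sketch of the streaming-mode detection history: each newly recognized feature
# (e.g., a decoded barcode payload) is stored once; the History-button counter is
# the number of stored detections.  Names and payloads are illustrative.
import time

class DetectionHistory:
    def __init__(self):
        self.items = []
        self.seen = set()

    def add(self, payload, kind="barcode"):
        if payload in self.seen:             # don't double-count a feature still in view
            return False
        self.seen.add(payload)
        self.items.append({"kind": kind, "payload": payload, "time": time.time()})
        return True                          # caller can trigger the bracket/"whoosh" feedback

    @property
    def count(self):
        return len(self.items)

history = DetectionHistory()
for code in ["0123456789012", "5012345678900", "0123456789012", "4006381333931"]:
    history.add(code)
print(history.count)   # -> 3 distinct items, like counter 124 in FIG. 8
```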

Summary for “Channelized audio watermarks

“The present technology expands in certain respects upon technology detailed in the above-detailed patent application. These previous works can be used to implement the present technology and can be incorporated into the current technology.

Referring to FIG. 1. An illustrative device 14 includes a processor 16, memory 18, one or several input peripherals 20 and one or two output peripherals 22. System 12 can also contain a network connection 24 and remote computers 26.

An illustrative device 14 can be a smartphone or tablet computer. However, any other electronic device that is consumer-grade can be used. The processor may include a microprocessor, such as an Atom device or A4 device. The operating system software and application software stored in the memory control part of the processor’s operation. ), data, etc. A hard drive or flash memory could be used as memory.

“Input peripherals 20 could include a camera or a microphone. An interface system that converts analog signals from the camera/microphone into digital data is also possible. You can also use a touch screen or keyboard as an input peripheral. Output peripherals 22 include a speaker, display screen, and so on.

“The 24th network connection can be wired (e.g. Ethernet, etc. ), wireless (WiFi, 4G, Bluetooth, etc. Or both.”

“In an exemplary operation device 14 receives a set digital content data through a microphone 20. The interface can be connected through the network connection 24 or any other means. You can use any content data; audio is an example.

“The system 12 processes digital content data to create corresponding identification data. This can be done by using a digital watermark process or a fingerprinting algorithm. data (e.g., file names, header data, etc.). This data is used to identify the content data received from other data (e.g. other audio or video).

“By referencing this identification data the system determines which software should be invoked. Indexing a table, database or other data structure with this identification data is one way to accomplish this. This will allow you to identify the correct software. FIG. 2 shows an illustration of a table. 2.”

“In certain cases, the data structure might identify a single program. If this is the case, the software will be launched. The software does not need to be installed on the device. Apps that are cloud-based may be available. If the software is not available, it can be downloaded from an online repository such as the iTunes Store, installed, and launched. The device can also subscribe to the software-as-service version of the app. Depending on the implementation, the user may be asked permission to participate in certain actions. In other cases, such actions are carried out without disturbing the user.

“Sometimes, the data structure can identify multiple software programs. Different programs might be specific to certain platforms. In this case, device 12 could simply choose the program that corresponds to that platform (e.g. Android G2, iPhone 4, etc.). The data structure might identify other programs that are compatible with a particular platform. The device might check this situation to see if there are any already installed. It can launch the program if it is found. The device can choose between two programs if it finds them. The device might prompt the user to choose one or both. The device can choose to download an alternative program if none are available. This is done using an algorithm or user input. The application is launched once it has been downloaded and installed.

“(Sometimes, the data structure might identify different programs that serve various functions?all of which are related to the content. One app could be used to find lyrics. An app that relates to the biography of a musician could be another. An app that allows you to purchase the content could be another option. Each type of software can include multiple alternatives.

“Note: The device may have an already installed application that is technically suitable to work with the received content (e.g. to render an MPEG4 file or an MP3 file). There may be many or more programs that are technically compatible with certain operations. The content might indicate that only a small subset of the possible software programs should be used.

Software in the device 14 could enforce content-identified software selection. The system could also treat software identification as a preference that can be overridden by the user. In some cases, the user might be given an incentive to use content-identified software. Alternately, the user might be charged a fee or other impediment to use any software not identified by the content.

“Sometimes, the system might not render certain content on a particular device (e.g. because there is no suitable app or hardware capability), but it may invite the user transfer the content to another device that has the required capability and may implement such transfer. Ansel Adams may have been sceptical of large format photographs being used as screen savers on a low resolution smartphone display. The software might suggest that the user instead of trying to display the images on a small format, low resolution smartphone display, it will ask the user to transfer the images to a larger format HD display at home.

The system might render the content in a limited manner, instead of rendering it completely. A video could be rendered in a series or still frames, such as from scene transitions. The system can also transfer the content to a place where it can be enjoyed more effectively. If hardware considerations allow (e.g. screen resolution is sufficient),?the software can be downloaded and installed.

“As illustrated by the table in FIG. “As shown in FIG. 2 (which data structure might be located in the memory 18 or in a remote computing system 26, the indication of software could be based upon one or more contextual elements?in addition to content identification data. ”

“Context” is a formal definition. “Context” is any information that can be used as a way to describe an entity’s situation (a person, place, or object that is relevant to the interaction between an application and a user), including applications themselves.

“Context information can include computing context (network connectivity and memory availability, processor type and contention, as well as computing context (“context information”)). ), user context (user name, location, actions and preferences; also, friends and social networks. ), physical context (e.g., lighting, noise level, traffic, etc. ), temporal context (time, date, month, season). History of the above, etc.

“In the illustrated tableau, rows 32-34 correspond to the same content (i.e. the same content ID), but they indicate that different software should be used. Depending on whether the context is indoors and outdoors. The software is identified with a five symbol hex identifier, while the content is identified using six hex symbols. Other identifiers, with a longer or shorter length, may be used.

Row 36 shows two software items?both of these are invoked. One includes a second descriptor?an identifier for a YouTube video that will be loaded by software FF245. This software is intended for users in a daytime context and those aged between 20-25.

Row 38 displays the user’s location (zip code) as well as their gender, which is contextual data. The alternative specifies the software used to create this content/context (i.e. four identifiers??OR?d together as opposed with the?AND). ”

“Rows 41 and 42 demonstrate that the same content ID could correspond to different codecs depending on the device processor (Atom, A4).”

“(By way of comparison, think about the process by which codecs are currently chosen. The user may not be familiar with the technical differences between codecs and the artist does not have any control. The codec selection is made by the party least interested in it. Instead, certain media rendering software, such as Windows Media Player, comes with default codecs. The default codecs may not be able to handle certain content. If this happens, the rendering software will typically download a second codec (again without input from the parties most affected).

It will be clear that the software listed in table 30 can be either a standalone app or a component of a software program (codec, driver, etc.). The software can render content or provide additional functionality or information related to it. Some implementations of the “software” may include a URL or other data/parameters that are provided to another software program or online service. In some implementations, the?software? can include a URL or other data/parameter that is provided by another software program or online service (e.g. a YouTube video ID).

“Desirably all such software listed in the table is selected by the proprietor (e.g. creator, artist or copyright-holder of the content with it). This gives the proprietor an artistic control that is not available in other digital content systems. (The owner’s control should be treated with more respect than that of AOL or iTunes. The proprietor’s decision seems to be more important than the company that provides the word processing software.

“Often, the selection of software by the proprietor will be based on technical merit and aesthetics. Sometimes however, commercial considerations are involved. Robert Genn, an artist, noted that “…” ?Starving artist? Acceptable at age 20, but problematic at age 40

For example, if ambient audio is detected by a user’s phone from The Decemberists, the artist-specified data 30 might indicate that the device should either load the Amazon app to purchase the music or load the Amazon web page, in order to generate sales. The Red Hot Chili Peppers may also be detected by the same device. This group might have requested that the device load their web page or another app for the same purpose. Thus, the proprietor can specify the fulfillment service for content-oriented commerce.

An auction arrangement may be the best way to solve the problem of starving artists in some arrangements. The device 14 (or remote computer systems 26) might announce to an online service (akin Google AdWords), that the iPod of a customer?for which certain demographic profiles/context information may exist?has detected the soundtrack from the movie Avatar. The user can then be offered a purchase opportunity by holding a mini auction. The winner, e.g., EBay, then deposits the winning bid amount into an Account, which is shared with the artist and the auction service. The device launches an EBay application, which allows the user to purchase a copy of the movie or its soundtrack. These content detection events and the context information can be pushed to cloud-based services, which can create a highly competitive market for responses.

“(Auction technology can also be detailed in the assignee?s previously-cited patent application and in Google?s published patent applications US2010017298, and US2009198607.”)

Associative software can become popular because of the popularity of content. This could encourage other content owners to use such software with their content. Wide deployment of the software might increase consumer exposure to other proprietor’s content.

“United Music Group might digitally watermark its songs with an identifier that allows the FFmpeg M3 player to be identified by Universal Music Group as the preferred rendering program. The software is quickly installed by dedicated fans of UMG artists. This allows for widespread deployment on large numbers of consumer devices. The widespread use of the FFmpeg M3 software is one factor that music owners consider when making a decision about which software to include in the table 30.

“(The table 30 software may be modified over time such as during a song’s production cycle. A table-specified program may contain an app that introduces a new band to the public, or a YouTube clip. A different software selection might be indicated after the music is more popular or the band has been better known.

Application software is commonly used to perform music discovery and other content-related tasks. Some of these services (e.g. I/O) can be used in content-related software. Commercial OS software has never provided services that are specific to content identification or processing.

“According to a further aspect, the present technology provides operating system software to perform one or several services specific to content processing and identification.”

“In one implementation, an OS program interface (API), takes content data (or a pointer at a location where content data is stored) and returns fingerprint data corresponding to that input. A different OS service, either using the same API or another, takes the same input and returns the watermark information decoded directly from the content data. An input parameter to the API allows you to specify which watermark or fingerprint process is to be used. The service can also apply multiple watermarks and/or fingerprint extract processes to input data and return the resultant information back to the calling program. Watermark extraction can be used to verify the validity of the resultant data by referring to error correction data.

“The same API, or another, can further process the extracted fingerprint/watermark data to obtain XML-based content metadata that is associated with the content (e.g., text giving the title of the work, the name of the artist, the copyright holder, etc.). It may consult a remote metadata registry such as Gracenote to do this.

“Such a content-processing API can establish a message queue (e.g., a ?listening/hearing queue) to which results of the fingerprint/watermark extraction process (either literally, or the corresponding metadata) are published. The queue can be monitored by one or more applications. One app might be alerted to music by The Beatles. One app may be alerted to Disney movie soundtracks. The monitoring app can detect such content and launch activity. This includes logging the event, providing a complementing media content, and offering a purchasing opportunity.

You can also implement such functionality without the need for an operating system. One way to do this is through a publish/subscribe model. This allows apps to publish certain capabilities (e.g. listening for a specific type of audio) and others to subscribe. These arrangements allow loosely-coupled apps to cooperate to create a similar ecosystem.

The present technology can be used to monitor media that a user has been exposed to. The user does not need to take any action to start a discovery operation to find the song’s identity, unlike Shazam and other song identification services. (Of course the user must turn on the device at some point and authorize the background functionality. Instead, the device listens for longer periods of time?much more than the 10-15 seconds Shazam-like services allow, and during the user’s day. When content is detected, it is processed to recognize it. This information is stored in the device and used to prime software to reflect exposure.

“The device might process ambient audio for 15 minutes, an hour or a whole day. The device may display a list of content that the user has been exposed when the user interacts with it again. To engage in a discovery process, the user might be invited to touch content listings. The software associated with the content launches.

“In certain implementations, the device can prime software programs with information that is at least partially based on content identification data. This may cause the YouTube app to display a thumbnail that corresponds to a music video for a song the user has heard?readying it to be selected. A 90-second sample audio clip can be downloaded to the iPod music app. It is located in a ‘Recent Encounters’ folder. folder. A band email might be added to the email InBox. A trivia game app could load questions related to the band. These data are stored locally, so the user doesn’t need to direct it retrieval (e.g. from a website). The information is displayed prominently when the app is used again.

Social media apps can be used as platforms to share such information. An avatar might greet the user when they activate a Facebook app. . . ? Then, list the content that the user was exposed to, e.g., “Billy Liar?” By the Decemberists, “Boys Better?” The Dandy Warhols and the Nike LeBron James commercial. The app can remind users of the context where each item was found, such as while they were walking through downtown Portland on November 4, 2010. This is determined by GPS and accelerometer sensors. The Facebook app allows users to invite their friends to share any content. The app may also ask if the user would like discographies of any of these bands or full digital copies. It might also inquire if the user is interested in additional content or associated apps.

“The app can also report on the media encounters and related activities of the user’s friends (with appropriate permissions).

“The foregoing will show that certain embodiments of the above inventions make it easier for the user to locate apps associated with specific media content. Instead, the media content is used to locate its favored apps.

These embodiments ensure continuity between artistic intent and delivery. They optimize the experience the art is meant to create. The artist does not have to allow a platform to deliver the artistic experience.

This technology encourages competition in app marketplaces, giving artists more control over which apps are most popular. A Darwinian effect could emerge where app popularity is less a result of marketing budgets and branding, but more a result of the popularity of the content delivered.

“Other Arrangements”

“Filtering/Highlighting Data Streams by Reference to Object Interactions”

“Users are being presented with increasing amounts of data. There are hundreds of channels for television, emails, RSS/Twitter/social networks/blog feeds. Technologies have been developed to filter and highlight incoming information based on user profile data.

DVR software such as Tivo is a familiar example; it presents a subset of the unabridged electronic program guide, selected according to user interest. The Tivo software notes which television programs the user has viewed, invites feedback in the form of thumbs-up or thumbs-down rankings, and then proposes future programs of interest based on such past behavior and rankings.

A more recent example is the “Priority Inbox” feature of Google’s Gmail email service. Each incoming email is evaluated and scored for importance. Among the factors Google uses are which emails the user has previously read, which emails the user has responded to, and the senders and keywords associated with those emails. Emails that score highly in this assessment are presented at the top of the mail list.

My6sense.com offers a similar service for triaging RSS and Twitter feeds. The software tracks the user’s past interactions with data feeds and prioritizes the items judged most important. In processing a user’s Twitter feed, My6sense considers which links the user has clicked, which tweets have been favorited, which have been retweeted, and the authors and keywords associated with such tweets.

These principles can also be applied to object interactions. If a shopper uses her smartphone to take photos of Jimmy Choo motorcycle boots in a Nordstrom store, this could be taken as an indication of interest in fashion, motorcycling, footwear, and Jimmy Choo merchandise. If the person later uses her smartphone to capture images of River Road motorcycle saddle bags, the inference that she is interested in motorcycling is strengthened. Each new object captured in imagery reveals more information about the person; some hypotheses are reinforced (e.g., motorcycling), while others may be discounted.

“Aside from recognizing objects in imagery, the analysis (which may include human review via crowdsourcing) can also discern activities. Location can likewise be noted, either from the imagery itself or from GPS data.

Image analysis applied to one frame may reveal a person riding a motorcycle, with a tent and a forested background. One image might show a person riding the motorcycle, while another could show a person in the same attire; a further image, captured just moments later, may show the motorcycle being ridden against a forested background. GPS data may place all of these images in Yellowstone National Park.

“Such historical information, accumulated over time, can reveal recurrent themes and patterns indicating subjects, activities, people, and places of interest to the user. A confidence metric can be assigned to each such conclusion, reflecting the system’s confidence that the attribute accurately describes the user’s interest. (In the examples above, ‘motorcycling’ would score higher than ‘Jimmy Choo merchandise.’) These data can then be used to filter or highlight the data feeds (and other data) presented to the user.
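A simple way to realize such a confidence metric is to accumulate counts for each inferred interest attribute and discount older evidence, so that recurrent themes outrank one-off sightings. The following Python sketch is one possible approach, with hypothetical attribute names; the half-life decay is an assumption, not something specified above.

```python
# Illustrative sketch of an interest profile built from recognized
# objects, with a confidence metric based on how often and how
# recently each inferred interest recurs.  Attribute names and the
# decay model are assumptions made for this example.

import math
import time
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


class InterestProfile:
    """Accumulates interest attributes inferred from recognized objects."""

    def __init__(self, half_life_days: float = 30.0):
        self.half_life_s = half_life_days * 86400.0
        self.counts: Dict[str, float] = defaultdict(float)
        self.last_seen: Dict[str, float] = {}

    def record_sighting(self, attributes: Iterable[str],
                        timestamp: Optional[float] = None) -> None:
        # attributes inferred from one object, e.g.
        # {"motorcycling", "footwear", "Jimmy Choo merchandise"}
        now = timestamp if timestamp is not None else time.time()
        for attr in attributes:
            self.counts[attr] += 1.0
            self.last_seen[attr] = now

    def confidence(self, attr: str, now: Optional[float] = None) -> float:
        # More sightings raise the score; older evidence decays away.
        now = now if now is not None else time.time()
        if attr not in self.counts:
            return 0.0
        age = now - self.last_seen[attr]
        decay = math.exp(-age * math.log(2) / self.half_life_s)
        return self.counts[attr] * decay

    def top_interests(self, n: int = 5) -> List[str]:
        return sorted(self.counts, key=self.confidence, reverse=True)[:n]
```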

“Such a history of device usage can thus be compiled into a profile of interests, media consumption patterns, and other information useful in improving the user’s interactions with the world. If the user photographs fashion accessories at Sephora, the parameters controlling the user’s junk mail filter might be adjusted to allow delivery of emails from that company that would otherwise have been blocked. The user’s web browsers, such as Safari on a smartphone or Firefox on a home computer, might add the Sephora website to a list of “Suggested Favorites,” akin to Tivo’s program suggestions.

A user can choose to create a Twitter account that is, in essence, owned by their object-derived profile. This account can follow tweets relating to objects the user has recently sensed. If the profile reflects the user’s interest in a Canon SLR camera, for example, the account can follow tweets related to that subject and retweet them into a feed the user can check from their own Twitter account.

This object-derived profile information can influence not only the selection of content delivered via smartphones, TVs, and other content-delivery devices, but also the composition of that content. Media mashups incorporating objects the user has interacted with can be created for the user’s consumption: a central character in a virtual reality game may wear Jimmy Choo motorcycle boots, or a Canon SLR camera may be part of the treasure an opponent has captured.

“Whenever a user interacts with an object, that interaction can be published on Twitter, Facebook, and other social media (subject to sharing parameters and user permission). Such communications can be regarded as “check-ins” in the FourSquare sense, but for an object or a media type (music, TV, etc.) rather than for a place.

“Social network analysts will recognize this as a form of social network analysis, but with some nodes representing physical objects.”

“Social network analysis views relationships in terms of network theory. The network comprises nodes and ties (also called edges, links, or connections). Nodes represent the actors in the network, and ties represent the relationships among them. Complex graph-based structures can result, and there can be many kinds of ties between nodes. In the example just given, the tie types can include ‘likes,’ ‘owns,’ etc.”

“A 3D graph can place people in one plane and physical objects in another, with links between the planes connecting people to the objects they like or own. (The default relationship could be ‘likes.’ An ‘owns’ relationship may be deduced from data or inferred from context: a photograph of a Camaro automobile taken by the user and geolocated at his home may indicate an ‘owns’ relationship, as may a lookup of the Camaro’s plate number in a public database.)”

“A graph like this will typically contain person-to-person links, as in familiar social network graphs, together with the person-to-object links just described. It may also include links between physical objects. One such link type is physical proximity, which might connect two cars parked in the same parking lot.

The number of links between two objects in a network can indicate their relative importance. The length of the network path linking two objects can indicate their degree of association. A shorter path indicates a greater degree of association.
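Such a person/object graph, with typed ties and path-length-based association, might be sketched as follows using the networkx library. The nodes, relations, and the use of node degree as an importance proxy are illustrative assumptions.

```python
# Sketch of the person/object graph described above, using networkx.
# Node and edge attributes distinguish people from physical objects
# and record the relationship type ("likes", "owns", "proximity", ...).

import networkx as nx

G = nx.Graph()

# People and objects occupy conceptually separate "planes".
G.add_node("alice", kind="person")
G.add_node("bob", kind="person")
G.add_node("jimmy_choo_boots", kind="object")
G.add_node("river_road_bags", kind="object")
G.add_node("camaro", kind="object")

# Ties between planes: person-to-object relationships.
G.add_edge("alice", "jimmy_choo_boots", relation="likes")
G.add_edge("alice", "river_road_bags", relation="likes")
G.add_edge("bob", "camaro", relation="owns")

# Ties within the object plane, e.g., physical proximity.
G.add_edge("river_road_bags", "camaro", relation="proximity")

# Node degree can stand in for an object's relative importance...
importance = G.degree("camaro")

# ...and shortest-path length for degree of association
# (a shorter path indicating a stronger association).
association = nx.shortest_path_length(G, "alice", "camaro")
print(importance, association)
```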

“(While the detailed arrangement identifies physical items by analysis of captured imagery, it will be recognized that objects with which the user interacts can also be identified by detecting RFID/NFC chips associated with those objects.)

Analogous principles can be applied to the user’s audio environment, including recognition of music and speech. The resulting information can likewise be used in selecting and composing the streams of data presented to the user (e.g., on the user’s device) and/or sent by the user. Still greater utility can be obtained by considering both the visual and auditory stimuli captured by the device’s sensors.

“Text Entry”

“The smartphone’s front-facing camera can be used to speed up text entry in a gaze tracking mode.”

A basic geometrical reference frame is first established by having the user look in turn at three or four known positions on the screen while the smartphone camera 101 monitors the gaze of one or both eyes. FIG. 3 shows the user looking successively at points A, B, and C; repeating this cycle can increase accuracy. The principles of gaze tracking systems are familiar to the artisan, so they are not repeated here. Detailed examples of such systems are found in patent publications 20110013007, 20100295774, and 20050175218, and in references cited therein.
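One plausible form for this calibration is a least-squares affine fit mapping measured pupil coordinates to the known screen coordinates of the calibration targets. The sketch below illustrates the idea; the coordinate values are invented for the example, and the affine model is an assumption (the cited publications describe more elaborate approaches).

```python
# Sketch of the geometric calibration step: the user gazes at known
# screen points (A, B, C) while the camera records pupil positions;
# a least-squares affine map from pupil coordinates to screen
# coordinates is then fit.  All numbers are illustrative.

import numpy as np

# Pupil-center coordinates (camera pixels) observed while the user
# gazed at each calibration target.
pupil = np.array([[412.0, 233.0],
                  [455.0, 231.0],
                  [431.0, 270.0]])

# Known screen coordinates of the calibration targets (points A, B, C).
screen = np.array([[80.0, 120.0],
                   [560.0, 120.0],
                   [320.0, 900.0]])

# Fit an affine map  screen ~ [px, py, 1] @ A  by least squares.
design = np.hstack([pupil, np.ones((pupil.shape[0], 1))])   # (N, 3)
A, _, _, _ = np.linalg.lstsq(design, screen, rcond=None)    # (3, 2)

def gaze_to_screen(px, py):
    """Map a new pupil measurement to an estimated screen location."""
    sx, sy = np.array([px, py, 1.0]) @ A
    return float(sx), float(sy)

print(gaze_to_screen(430.0, 250.0))
```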

Once the system has established the geometrical framework relating the user’s eyes to the device screen, the user can gaze at the displayed keyboard 102 to indicate an initial letter. (Other keyboard displays that make fuller use of the screen can also be used.) The user signals selection of the gazed-at letter with a gesture, such as an eye blink or a tap on the smartphone body (or on the desk on which it is lying). The selected letter is added to the message area 103.

“Once an initial letter (e.g., ‘N’) has been entered, data entry can be speeded up, and gaze tracking made more accurate, by presenting likely next letters in an expanded letter-menu portion 104 of the screen. Each time a letter is entered, a menu of likely next letters is displayed; these can be determined, for example, by frequency analysis of letter pairs in a representative corpus. FIG. 4 shows an example. The illustrated menu is a hexagonal array of tiles, although other configurations are possible.
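The letter-pair analysis can be as simple as a bigram count over a corpus, from which the most frequent successors of the last-entered letter are chosen to populate the menu tiles. A minimal sketch, assuming a plain-text corpus:

```python
# Sketch of deriving "likely next letters" from letter-pair (bigram)
# frequencies in a representative corpus.  The corpus here is a
# stand-in; in practice it could be the user's own messages.

from collections import Counter, defaultdict
from typing import Dict, List

def build_bigram_table(corpus: str) -> Dict[str, Counter]:
    """Count, for each letter, which symbols most often follow it."""
    table: Dict[str, Counter] = defaultdict(Counter)
    text = [c.lower() for c in corpus if c.isalpha() or c == " "]
    for prev, nxt in zip(text, text[1:]):
        if prev != " ":
            table[prev][nxt] += 1
    return table

def likely_next_letters(table: Dict[str, Counter],
                        last_letter: str, n: int = 4) -> List[str]:
    """The n most common successors of last_letter; these would
    populate the hexagonal menu tiles (a space may appear among them)."""
    return [letter for letter, _ in table[last_letter.lower()].most_common(n)]

corpus = "now is the time for all good men to come to the aid of the party"
table = build_bigram_table(corpus)
print(likely_next_letters(table, "t"))
```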

“In this example the user has already entered the text “Now is the time to a_” and the system awaits selection of a letter to go where underscore 106 is indicated. The last letter selected was ‘a’; it is displayed in greyed-out form on the center tile, surrounded by tiles bearing ‘an,’ ‘at,’ ‘al,’ etc., the most common letter pairings beginning with ‘a.’ Also displayed in the hexagonal array are a ‘–’ selection tile 108 (indicating that the next symbol should be a space) and a keyboard selection tile 110.

To enter the next letter, ‘l,’ the user simply looks at the ‘al’ display tile 112 and accepts it by a tap or other gesture, as described above. FIG. 5 shows the result: the message has been extended, and menu 104 has been updated with the most common letter pairs beginning with ‘l.’ The device then awaits the next letter. To enter another ‘l,’ the user looks at the ‘ll’ tile 114 and gestures.

Initial studies suggest that more than half of text entry can be accomplished using the expanded letter-menu of next letters plus a space. When a different letter is needed, the user simply looks at the keyboard tile 110; a keyboard, like the one shown in FIG. 3 or another, then appears, and the user selects from it.

“Instead of four letter pairs, a space, and a keyboard icon, as in the arrangement just described, alternative embodiments present five letter pairs and a space, as shown in FIG. 6. Such an arrangement relies on a keyboard also being displayed on the screen, from which the user can choose letters directly.

“Instead of the standard keyboard display 102, a variant keyboard display 102a, shown in FIG. 7, can also be used. This layout recognizes that five letters need not be given full prominence on the keyboard, since the five most likely letters already appear in the hexagonal menu. In the illustrated example those five keys are not omitted entirely but are given extra-small keys, while the remaining 21 letters are given extra-large keys. This makes selecting letters from the keyboard easier, since less precise gaze tracking suffices.

“It should also be noted that the variant keyboard layout 102a of FIG. 7 omits the space bar; because the space symbol has an enlarged menu tile 116, no space bar is needed on keyboard 102a. In the illustrated arrangement, that area is instead given over to common punctuation symbols.

“The artisan will recognize many alternatives and extensions. For example, the last-entered letter need not be displayed in the middle of the hexagonal menu.

“That space can be left empty, or used to show the letter currently indicated by the user’s gaze, allowing the user to verify the selection before gesturing to confirm. (Gazing at the middle tile does not change the gaze-based selection.) A numeric pad can be summoned to the screen by selecting a numeric-pad icon (akin to the keyboard tile 110 in FIG. 4), or a numeric pad or keyboard can be displayed on screen throughout the message composition operation (as with keyboard 102 in FIG. 6). A few of the hexagonal tiles may offer guesses as to the word being entered, again based on analysis of a text corpus.

The corpus used to determine the most common letter pairings and full-word guesses can be customized to the user; for example, it can be a historical archive of the texts and/or emails sent from the device or otherwise authored by the user. The display features described above can augment the graphical elements and controls of existing smartphone functionality (e.g., a text-messaging app).
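Full-word guesses can be produced in a similar spirit, e.g., by ranking words in the user's own corpus that begin with the letters entered so far. A minimal sketch, with an invented corpus:

```python
# Sketch of full-word guesses derived from a user-personalized corpus:
# the most frequent words beginning with the letters entered so far
# can be offered on a few of the menu tiles.

from collections import Counter
from typing import List

def build_word_counts(corpus: str) -> Counter:
    """Tally word frequencies from the user's own messages/emails."""
    return Counter(w.lower().strip(".,!?") for w in corpus.split())

def word_guesses(counts: Counter, prefix: str, n: int = 3) -> List[str]:
    """Most frequent words that start with the entered prefix."""
    prefix = prefix.lower()
    candidates = [(w, c) for w, c in counts.items() if w.startswith(prefix)]
    candidates.sort(key=lambda wc: wc[1], reverse=True)
    return [w for w, _ in candidates[:n]]

counts = build_word_counts("the time to act is now and the time is short")
print(word_guesses(counts, "ti"))
```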

“In other embodiments, the user can select symbols and words presented somewhere other than on the smartphone’s own display. A large-scale keyboard and a complete numeric pad can, for example, be presented on such a separate page. These pages can be used alone or together with a letter menu like menu 104, with the smartphone camera again used for gaze tracking and geometrical calibration.

“Sign Language”

“Just as a smartphone can watch a user’s eyes and interpret their movements, it can also watch the user’s hands and interpret their gestures, acting as a sign language interpreter.

Sign languages (American and British sign languages being the dominant ones) comprise a variety of elements, all of which can be captured with a camera and identified by image analysis software. Signs typically involve handform, orientation, and movement. Related gestures, known as fingerspelling (or the manual alphabet), are used for proper names and other specialized vocabulary.

“An exemplary sign language analysis module segments the smartphone-captured imagery into regions of interest by identifying contiguous sets of pixels whose chrominances fall within a gamut associated with most skin tones. The segmented imagery is then applied to a classification engine, which matches the depicted hand configuration against a database of reference handforms to find a best match. Sequences of frames are likewise processed to discern motion vectors, indicating the movement of different points of the handform and changes in orientation over time. These discerned movements are similarly applied to a database of reference movements and changes to find best matches.
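The skin-chrominance segmentation step might look something like the following OpenCV sketch. The YCrCb threshold values are commonly used approximations for skin tones, not values taken from this specification, and the minimum-area filter is an illustrative assumption.

```python
# Sketch of the skin-chrominance segmentation step using OpenCV.
# The YCrCb thresholds are widely cited approximations for skin
# tones; they are assumptions for this example.

import cv2
import numpy as np

def skin_regions(frame_bgr: np.ndarray, min_area: int = 2000):
    """Return contours of contiguous skin-colored regions in a frame."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Keep pixels whose chrominance falls within an approximate skin gamut.
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Clean up the mask, then extract contiguous regions of interest.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
    return [c for c in contours if cv2.contourArea(c) >= min_area]
```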

“When matching signs are found in the database, the textual meanings associated with the discerned signs are retrieved from the database records and can be output, as words, phonemes, or letters, to an output device such as the smartphone display screen.”

“Desirably, raw best-match data is not output directly. Instead, the database identifies, for each sign, a set of candidate matches, each assigned a confidence metric. The software then determines which combination of letters, phonemes, or words is most likely in the given context, referring to a reference database detailing word spellings (e.g., a dictionary) and frequently signed word pairs. The artisan will recognize that similar techniques are used in speech recognition systems to reduce the chance of outputting nonsense sentences.
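The candidate-plus-confidence decoding can be implemented along the lines of the following sketch, which combines per-sign confidences with word-pair frequencies in a simple Viterbi-style pass. The candidate words, confidences, and bigram probabilities shown are invented for illustration.

```python
# Sketch of choosing among candidate sign matches by combining the
# recognizer's confidence scores with word-pair (bigram) frequencies,
# in the manner of speech-recognition language models.

from typing import Dict, List, Tuple

def best_sentence(candidates: List[List[Tuple[str, float]]],
                  bigram: Dict[Tuple[str, str], float]) -> List[str]:
    """candidates[i] lists (word, confidence) options for the i-th sign.
    A simple Viterbi pass finds the highest-scoring word sequence."""
    # scores maps each word choice for the current sign to (score, path)
    scores = {w: (c, [w]) for w, c in candidates[0]}
    for options in candidates[1:]:
        new_scores = {}
        for word, conf in options:
            best = max(
                (prev_score * conf * bigram.get((prev_word, word), 0.01),
                 path + [word])
                for prev_word, (prev_score, path) in scores.items())
            new_scores[word] = best
        scores = new_scores
    return max(scores.values())[1]

candidates = [[("I", 0.9), ("eye", 0.6)],
              [("want", 0.7), ("won't", 0.65)],
              [("coffee", 0.8), ("copy", 0.75)]]
bigram = {("I", "want"): 0.3, ("want", "coffee"): 0.2, ("eye", "won't"): 0.001}
print(best_sentence(candidates, bigram))   # ['I', 'want', 'coffee']
```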

The recognition software can also be trained. The user can make a sign indicating that the system has misinterpreted a sign, and then repeat the sign in question. The system then offers an alternative interpretation, avoiding the previous one (which it infers was incorrect). This process can be repeated until the system returns the correct interpretation, whereupon it can add the sign(s) just made, along with the correct meaning, to its reference sign database.

“Similarly, if the system interprets a sign and the user does not challenge the interpretation, data about the captured imagery can be added to the reference database in conjunction with that interpretation. By this arrangement the system can, over time, be trained to recognize user-specific vernacular and other idiosyncrasies of the user’s signing.

“To assist machine recognition, standard sign language can be augmented with gestures that provide reference or calibration information. For example, when beginning to sign to a smartphone, the user might extend the thumb and fingers from an outwardly facing palm (the familiar sign for the number 5) and then return the fingers to a fist. From this gesture the smartphone can identify the user’s fleshtone color and gauge the size of the hand and fingers. The same gesture can also be used to separate concepts, for example serving as the period at the end of a sentence. (In American Sign Language such punctuation is often expressed by a pause; an overt gesture, rather than the absence of a gesture, is a better parsing element for machine vision-based sign language interpretation.)

“As noted, the interpreted sign language may be output as text on the smartphone’s display, but other arrangements are possible. The text can be stored in a file (e.g., a Word or ASCII document), converted to audible speech by a text-to-speech converter, or submitted to a translation routine or service (e.g., Google Translate) for conversion into another language.

“The smartphone’s proximity sensor could be used to detect the approach of the user’s hands and trigger capture of frames of camera imagery. These frames can then be checked for skin-tone chrominance, long edges, or other features characteristic of fingers and hands, and the phone can activate its sign language translator if it concludes that the user has moved his hands toward the phone. For example, Apple’s FaceTime communications software could be adapted to activate the sign language translator when the user places his hands in front of the phone’s camera; the phone can then communicate text counterparts of the user’s hand gestures to the other party or parties, e.g., by text display, text-to-speech conversion, and so on.

“Streaming Mode Detector”

“According to another aspect of the technology, smartphones are equipped to quickly identify plural objects and make them available for later review.”

“FIG. 8 shows an example. The application has a large view window that is updated with streaming video from the camera (i.e., the usual viewfinder mode). As the user pans the camera, the system analyzes the imagery and identifies any recognizable objects. In FIG. 8, several objects bearing barcodes are within the camera’s view.

“In the illustrated system, the processor analyzes the image frame starting from the center, working outward in search of identifiable features. (Other arrangements, such as a top-down or other image search process, may also be used.) When a feature is identified (e.g., barcode 118), the phone overlays bracketing 120 around it, or otherwise highlights it, to signal to the user which part of the displayed imagery caught its attention. The device speaker then emits a “whoosh” sound, and an animated indicium moves from the bracketed portion of the screen toward the History button 122 at the bottom, for example as a square graphic that falls into the History button. A red-circled counter 124 next to the History button indicates how many items have been detected and placed in the device’s History (seven in this instance).

“After processing barcode 118, the system continues analyzing the field of view for other identifiable features. It next recognizes barcode 126, and counter 124 is incremented to 8; the system continues working outward from there. It then notes barcode 128, even though that barcode is partially outside the camera’s view (recognition is possible because of redundant encoding). Recognizing the three barcodes, capturing their data into the device History, and providing the associated feedback (sounds and animation effects) typically takes one or two seconds, and less than three.
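A rough sketch of such a streaming detection loop appears below. It assumes the pyzbar library as one possible barcode decoder, orders detections outward from the frame center, and uses print statements as stand-ins for the bracketing, sound, and animation feedback described above.

```python
# Sketch of the streaming detection loop: each camera frame is scanned
# for barcodes, detections are ordered outward from the frame center,
# and newly seen codes are appended to a History list with a counter.
# pyzbar is one possible decoder; the UI feedback is a placeholder.

import numpy as np
from pyzbar import pyzbar

history: list = []

def process_frame(frame: np.ndarray) -> None:
    h, w = frame.shape[:2]
    center = np.array([w / 2, h / 2])
    detections = pyzbar.decode(frame)
    # Work outward from the center of the field of view.
    detections.sort(key=lambda d: np.linalg.norm(
        np.array([d.rect.left + d.rect.width / 2,
                  d.rect.top + d.rect.height / 2]) - center))
    for d in detections:
        payload = d.data.decode("utf-8")
        if payload not in history:
            history.append(payload)
            # Placeholder UI feedback: bracketing overlay, "whoosh" sound,
            # animation, and an updated count on the History button.
            print(f"detected {payload}; history count = {len(history)}")
```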
