Invented by Dipock Das, Dayanand Pochugari, Neeraj Verma, Nikesh Padakanti, Aungon Nag Radon, Anand Srinivasabagavathar, Adam Oliner, Splunk Inc

Natural language processing (NLP) has become an integral part of daily life. From voice assistants such as Siri and Alexa to chatbots on websites, NLP enables machines to understand and respond to human language. One of its persistent challenges, however, is disambiguation: resolving the meaning of an ambiguous word or phrase in a given context. To tackle this challenge, researchers and developers have been exploring user-specific approaches based on machine learning recommendation models.

The market for user-specific disambiguation based on machine learning recommendation models is growing rapidly. With increasing demand for personalized experiences, companies are investing in technologies that can understand and cater to individual preferences. Disambiguation plays a central role in that personalization, because a system can only return a relevant response once it has correctly interpreted the user's input.

Machine learning recommendation models have proven effective in domains such as e-commerce and content recommendation, where they learn from user behavior and preferences to make personalized recommendations. Applying the same idea to disambiguation lets NLP systems adapt to individual users and interpret their input more accurately. The user-specific approach trains machine learning models on user-specific data, such as past interactions, search history, and stated preferences. From this data, the models learn patterns that let them predict what a particular user most likely means when faced with ambiguous language.

A key advantage of this approach is that it adapts to each user's language usage and preferences. Different users may interpret the same ambiguous word or phrase differently depending on their background, culture, or personal experience, and by considering user-specific data, the models can capture these nuances and disambiguate more accurately.

Implementing user-specific disambiguation requires a robust infrastructure for data collection, storage, and analysis. Companies need to collect and store user data securely and in compliance with privacy regulations, and the models need to be updated and retrained continually as user preferences and language usage change.

The applications are not limited to voice assistants and chatbots. User-specific disambiguation has potential uses in industries such as customer support, healthcare, and legal services. In customer support, for example, it can help agents understand customer queries more accurately and propose relevant solutions; in healthcare, it can help doctors interpret patient symptoms and medical records, supporting better diagnoses and treatment plans.

Challenges remain, however. The main one is the availability of user-specific data: companies need to encourage users to provide feedback and share their preferences so that the models can be trained effectively. Ethical considerations, such as bias in the training data and the potential misuse of user data, also need to be addressed to ensure fairness and privacy.

In conclusion, the market for user-specific disambiguation based on machine learning recommendation models is expanding rapidly. Companies recognize the importance of personalized experiences and are investing in technologies that interpret user input accurately. NLP systems that leverage models trained on user-specific data can disambiguate more accurately and improve user experiences across many domains, provided that the ethical, data-collection, and privacy challenges are addressed so that these models are developed and deployed responsibly.
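As a minimal sketch of the user-specific idea described above, assuming the simplest possible stand-in for a recommendation model: the code counts which sense each user chose for an ambiguous term and blends those per-user counts with a down-weighted global distribution, so that a user with no history still gets the most common sense. `UserSenseRecommender`, `record`, `recommend`, and `prior_weight` are hypothetical names used only for illustration; a production system would use a trained recommendation model rather than raw counts.

```python
from collections import Counter, defaultdict
from typing import Optional


class UserSenseRecommender:
    """Hypothetical toy recommender for word senses, personalized per user."""

    def __init__(self, prior_weight: float = 1.0) -> None:
        # user -> term -> Counter of senses that this user has chosen
        self.user_counts = defaultdict(lambda: defaultdict(Counter))
        # term -> Counter of senses chosen across all users
        self.global_counts = defaultdict(Counter)
        self.prior_weight = prior_weight

    def record(self, user: str, term: str, sense: str) -> None:
        """Log that `user` resolved `term` to `sense` (e.g., by confirming it)."""
        self.user_counts[user][term][sense] += 1
        self.global_counts[term][sense] += 1

    def recommend(self, user: str, term: str) -> Optional[str]:
        """Return the highest-scoring sense, or None if the term was never seen."""
        global_c = self.global_counts.get(term)
        if not global_c:
            return None  # no history at all: the system has to ask the user
        total = sum(global_c.values())
        user_c = self.user_counts.get(user, {}).get(term, Counter())
        # Score = the user's own count + a down-weighted global share in [0, 1].
        scores = {
            sense: user_c[sense] + self.prior_weight * count / total
            for sense, count in global_c.items()
        }
        return max(scores, key=scores.get)


rec = UserSenseRecommender()
for _ in range(2):
    rec.record("alice", "python", "language")
for _ in range(3):
    rec.record("bob", "python", "snake")
print(rec.recommend("alice", "python"))  # language
print(rec.recommend("carol", "python"))  # snake (global majority)
```

Here alice's own history outweighs the global majority, while carol, who has no history, falls back to the sense most users chose. Weighting the global share below one full observation is a design choice that lets even a single personal choice dominate.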

The Splunk Inc invention works as follows

In various embodiments, an NL application implements functionality that enables users to access data storage systems more efficiently based on NL requests. The operations of the NL application are guided, at least in part, by one or more templates and/or machine-learning models. The templates and/or machine-learning models provide a flexible framework that can be easily tailored to reduce the user effort and time associated with processing NL requests and to increase the accuracy of NL application implementations.

Background for Determining an approach to disambiguation that is user-specific based on a machine learning recommendation model for interaction

Field of Invention

The present invention relates generally to computer science and data science and, more specifically, to adaptable methods for interacting with data sources using natural language.

Description of Related Art

Natural language (NL) data applications have been developed to allow users to access and analyze data from a variety of data sources without having to be experts in the domain-specific languages (DSLs) associated with those sources. An NL data application extracts and curates the metadata associated with the different data sources. It then translates an NL request into an appropriate DSL query, uses that query to retrieve data from the domain-specific data source, performs operations on the retrieved data, and displays the results. A common problem in NL data applications is the ambiguity of NL requests. To determine the user's intent, the application must engage the user in arduous, repetitive, and time-consuming interactive interrogation. These interrogation processes are inefficient and can frustrate users.
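The pipeline described above (translate an NL request into a DSL query, or interrogate the user when the request is ambiguous) can be sketched as follows. This is a hypothetical toy, not the patented implementation: the slot vocabulary stands in for metadata curated from data sources, whole-word phrase matching stands in for real NL parsing, and the SQL-like output stands in for whatever DSL the data source actually uses. `SLOT_VOCAB`, `QUESTIONS`, `parse_request`, and `translate` are illustrative names.

```python
import re
from typing import Optional

# Hypothetical vocabulary standing in for metadata curated from data sources.
SLOT_VOCAB = {
    "measure": {"unit sales": "unit_sales", "revenue": "revenue"},
    "region": {"colorado": "CO", "texas": "TX"},
    "period": {"previous week": "last_week", "this month": "this_month"},
}

# Clarifying question to ask for each slot the request leaves unspecified.
QUESTIONS = {
    "measure": "Which type of sales: unit sales or revenue?",
    "region": "Which geographic region?",
    "period": "Which time period?",
}


def parse_request(text: str) -> dict[str, str]:
    """Fill each slot whose phrase appears in the request (whole-word, case-insensitive)."""
    lowered = text.lower()
    slots = {}
    for slot, phrases in SLOT_VOCAB.items():
        for phrase, value in phrases.items():
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                slots[slot] = value
                break
    return slots


def translate(text: str) -> tuple[Optional[str], list[str]]:
    """Return (query, []) if every slot is filled, else (None, clarifying questions)."""
    slots = parse_request(text)
    missing = [slot for slot in SLOT_VOCAB if slot not in slots]
    if missing:
        return None, [QUESTIONS[slot] for slot in missing]
    # Illustrative SQL-like DSL; a real data source would have its own query language.
    query = (
        f"SELECT price, SUM({slots['measure']}) FROM sales "
        f"WHERE region = '{slots['region']}' AND period = '{slots['period']}' "
        "GROUP BY price"
    )
    return query, []


print(translate("sales by price"))
print(translate("unit sales by price for Colorado the previous week"))
```

The fully specified request yields a query, while "sales by price" yields three clarifying questions every time it is asked, because this stateless translator keeps no record of the user's earlier answers.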

For example, suppose that a user requests "sales by price" to obtain unit sales by price for Colorado for the previous week. A typical NL data application would ask the user for several pieces of disambiguating information, including the type of sales, the geographic region, and the time period. Now suppose that the user requests unit sales for Colorado every week. If the user inadvertently requests "sales by price" instead of "unit sales by price for Colorado for the previous week," the NL data application repeats the same requests for disambiguating information.

As the foregoing illustrates, more effective techniques are needed for interfacing with underlying data sources through NL applications.

The present invention includes a method for disambiguating and executing an NL query. The method includes generating an inquiry based on a first ambiguous NL query received from a user and on an interaction model that links the first ambiguous NL query and the user to the inquiry, where the interaction model is generated via a machine-learning algorithm.
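The mapping from (user, ambiguous NL query) to an inquiry can be illustrated with a minimal sketch. The source does not say which machine-learning algorithm generates the interaction model, so this example substitutes plain frequency counting of the interpretations each user has confirmed; it is a stand-in that shows the inputs and outputs, not a learned model. `InteractionModel`, `observe`, `generate_inquiry`, and `min_support` are hypothetical names. Once a user has confirmed the same interpretation of a query often enough, the model proposes a single confirmation question instead of the full set of disambiguation questions.

```python
from collections import Counter, defaultdict


class InteractionModel:
    """Hypothetical stand-in for a learned interaction model.

    Maps (user, ambiguous query) to the interpretations that user has
    previously confirmed; plain counting replaces a real ML algorithm here.
    """

    def __init__(self, min_support: int = 2) -> None:
        self.history = defaultdict(Counter)  # (user, query) -> Counter of interpretations
        self.min_support = min_support  # confirmations needed before suggesting

    @staticmethod
    def _key(user: str, query: str) -> tuple[str, str]:
        # Normalize case and whitespace so trivial variations map to the same key.
        return user, " ".join(query.lower().split())

    def observe(self, user: str, query: str, interpretation: str) -> None:
        """Record that `user` confirmed `interpretation` for the ambiguous `query`."""
        self.history[self._key(user, query)][interpretation] += 1

    def generate_inquiry(self, user: str, query: str, default_questions: list[str]) -> list[str]:
        """Return one confirmation question if history is strong enough, else the defaults."""
        counts = self.history.get(self._key(user, query))
        if counts:
            best, n = counts.most_common(1)[0]
            if n >= self.min_support:
                return [f"Did you mean '{best}'?"]
        return list(default_questions)


DEFAULTS = ["Which type of sales?", "Which geographic region?", "Which time period?"]
model = InteractionModel()
for _ in range(2):
    model.observe("u1", "sales by price", "unit sales by price for Colorado for the previous week")
print(model.generate_inquiry("u1", "Sales by price", DEFAULTS))  # single confirmation
print(model.generate_inquiry("u2", "sales by price", DEFAULTS))  # full default questions
```

In the Colorado sales example from the background, a user who has twice confirmed that "sales by price" means "unit sales by price for Colorado for the previous week" gets one yes/no question rather than three separate questions, while other users still see the full interrogation.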

Further embodiments include, among others, a computer-readable medium and a device configured to implement the above method.

The templates provide a flexible framework that can be easily tailored to reduce the time and effort required to process NL requests and to increase the accuracy of NL application implementations. Notably, the functionality and/or effectiveness of an NL application that implements the disclosed techniques can be improved by updating the templates, without modifying the NL application software.
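The claim that behavior can change by editing templates rather than code can be shown with a small sketch, assuming templates are stored as JSON data that pairs a regular-expression pattern with a query skeleton. The template format, the SQL-like queries, and the function names `load_templates` and `apply_templates` are hypothetical, not the format used by the patented system.

```python
import json
import re
from typing import Optional

# Hypothetical template file: each template pairs a regex with a query skeleton.
TEMPLATES_V1 = r"""
[
  {"pattern": "(?P<measure>\\w+) sales by (?P<dim>\\w+)",
   "query": "SELECT {dim}, SUM({measure}_sales) FROM sales GROUP BY {dim}"}
]
"""

# Version 2 only adds a template (default measure: unit sales); no code changes.
TEMPLATES_V2 = r"""
[
  {"pattern": "(?P<measure>\\w+) sales by (?P<dim>\\w+)",
   "query": "SELECT {dim}, SUM({measure}_sales) FROM sales GROUP BY {dim}"},
  {"pattern": "sales by (?P<dim>\\w+)",
   "query": "SELECT {dim}, SUM(unit_sales) FROM sales GROUP BY {dim}"}
]
"""


def load_templates(text: str) -> list[tuple[re.Pattern, str]]:
    """Parse template data and precompile each pattern (case-insensitive)."""
    return [(re.compile(t["pattern"], re.IGNORECASE), t["query"]) for t in json.loads(text)]


def apply_templates(templates: list[tuple[re.Pattern, str]], request: str) -> Optional[str]:
    """Return the query from the first template matching the whole request, else None."""
    text = " ".join(request.split())  # collapse stray whitespace
    for pattern, query in templates:
        match = pattern.fullmatch(text)
        if match:
            # Named groups in the pattern fill the placeholders in the query skeleton.
            return query.format(**match.groupdict())
    return None


print(apply_templates(load_templates(TEMPLATES_V1), "sales by price"))  # None
print(apply_templates(load_templates(TEMPLATES_V2), "sales by price"))
```

Under `TEMPLATES_V1`, "sales by price" matches nothing; loading `TEMPLATES_V2`, which only adds a template, makes the same request resolve with no change to either Python function.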

The following description sets forth numerous specific details to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without one or more of these specific details.

General Overview

Modern data centers and other computing environments can comprise anywhere from a few to thousands of systems configured to process data, service requests from remote clients, and perform numerous other computational tasks. Many components in these computing environments generate machine data. Machine data is any data produced by a machine or component in an information technology (IT) environment that reflects activity in that environment. It includes raw data generated by components such as servers and sensors, as well as data from mobile devices and Internet of Things (IoT) devices. Examples include system logs, network packet data, sensor data, application program data, error logs, stack traces, and system performance data. In general, machine data includes diagnostic data, performance data, and other data types that can be analyzed to diagnose performance problems, monitor user interactions, and derive other insights.

A number of tools are available for analyzing machine data. To reduce the volume of machine data that must be handled, many of these tools pre-process the data based on anticipated analysis needs. For example, pre-specified data items may be extracted from the machine data and stored so that they are easy to retrieve and analyze, while the rest of the machine data is typically discarded during pre-processing. As storage capacity becomes cheaper and more plentiful, however, there are fewer reasons to discard machine data.

This plentiful storage capacity makes it feasible to store massive quantities of minimally processed machine data for later retrieval and analysis. Storing minimally processed data provides greater flexibility, because an analyst can search all of the machine data instead of only a pre-specified set of data items. This can, for example, enable the analyst to investigate aspects of the data that would otherwise have been unavailable for analysis.
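The difference between discarding data at pre-processing time and keeping minimally processed data for search-time analysis can be sketched as follows. The sample events and their `key=value` layout are invented for illustration, and `preprocess_at_ingest` and `extract_at_search` are hypothetical helpers.

```python
import re
from typing import Optional

# Invented sample events in a simple key=value layout.
RAW_EVENTS = [
    "2024-01-01T00:00:01 GET /cart status=200 bytes=512 region=CO",
    "2024-01-01T00:00:02 GET /checkout status=500 bytes=128 region=TX",
]

KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def preprocess_at_ingest(raw: str, keep: tuple[str, ...] = ("status",)) -> dict[str, str]:
    """Pre-processing approach: keep only anticipated fields; the raw text is discarded."""
    fields = dict(KV_PATTERN.findall(raw))
    return {name: fields[name] for name in keep if name in fields}


def extract_at_search(raw: str, field: str) -> Optional[str]:
    """Search-time approach: the raw event is retained, so any field can be pulled later."""
    match = re.search(rf"\b{re.escape(field)}=(\S+)", raw)
    return match.group(1) if match else None


stored = [preprocess_at_ingest(e) for e in RAW_EVENTS]
print(stored)  # region is gone: it was not anticipated at ingest time
print([extract_at_search(e, "region") for e in RAW_EVENTS])
```

If only `status` is kept at ingest, a later question about `region` cannot be answered from the stored records; with the raw events retained, the same question is answered at search time.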

However, analyzing and searching massive quantities of machine data is difficult. A data center, servers, or network appliances may generate many different types and formats of machine data (e.g., system logs, network packet data (e.g., wire data), sensor data, application program data, error logs, stack traces, system performance data, operating system data, and virtualization data) from thousands of different components, which can collectively be very difficult to analyze. In addition, mobile devices can generate large amounts of information about data accesses and network performance, and such information can be reported by millions of mobile devices.

In the data intake and query system, machine data is collected and stored as "events." An event comprises a portion of machine data and is associated with a specific point in time. That portion of machine data may reflect activity in an IT environment and may be produced by a component of that IT environment. Events can then be searched to provide insight into the IT environment, thereby improving the performance of its components. Events can be derived from "time series data," which comprises a sequence of data points (e.g., performance measurements from a computer system) associated with successive points in time. Each event has a portion of machine data associated with a timestamp. The timestamp can be derived from that portion of machine data, determined by interpolating between temporally proximate events that have known timestamps, or determined by other configurable rules.
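A minimal sketch of turning raw lines into timestamped events follows, assuming ISO 8601 timestamps embedded in each line. For lines without a timestamp, it interpolates linearly between the nearest earlier and later lines that have one, using line position as the interpolation variable; that choice is an assumption standing in for the configurable rules mentioned above, and lines before the first or after the last known timestamp simply reuse the nearest one. `extract_timestamp` and `build_events` are hypothetical names.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed timestamp format: ISO 8601 without time zone, e.g. 2024-01-01T00:00:00.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def extract_timestamp(raw: str) -> Optional[datetime]:
    """Derive the timestamp from the event's own machine data, if present."""
    match = TS_RE.search(raw)
    return datetime.fromisoformat(match.group(0)) if match else None


def build_events(lines: list[str]) -> list[tuple[Optional[datetime], str]]:
    """Pair each raw line with a timestamp, interpolating where none is present."""
    stamps = [extract_timestamp(line) for line in lines]
    known = [i for i, ts in enumerate(stamps) if ts is not None]
    events = []
    for i, (line, ts) in enumerate(zip(lines, stamps)):
        if ts is None and known:
            before = max((k for k in known if k < i), default=None)
            after = min((k for k in known if k > i), default=None)
            if before is not None and after is not None:
                # Linear interpolation between the two nearest known timestamps.
                frac = (i - before) / (after - before)
                ts = stamps[before] + (stamps[after] - stamps[before]) * frac
            else:
                # Only one side is known: reuse the nearest known timestamp.
                ts = stamps[before if before is not None else after]
        events.append((ts, line))
    return events


for ts, line in build_events([
    "2024-01-01T00:00:00 service started",
    "cache warmed",
    "2024-01-01T00:00:10 first request served",
]):
    print(ts, "|", line)
```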

In some instances, machine data has a predefined format, in which data items with specific data formats are stored at predefined locations in the data. For example, machine data can include data associated with fields in a database table. In other instances, machine data does not have a fixed format but still has a repeatable (e.g., non-random) pattern, so the data items of interest can be stored at different locations within the data. For example, when the data source is an operating system log, an event can include one or more lines of the log that contain machine data with different types of performance and diagnostic data associated with a specific point in time.
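The two cases above, a predefined format versus a repeatable pattern without fixed positions, can be contrasted in a short sketch. The comma-separated row layout and the `key=value` convention are assumptions chosen for illustration; `parse_fixed` and `parse_repeatable` are hypothetical helpers.

```python
import re


def parse_fixed(row: str) -> dict:
    """Predefined format: each field always sits at the same comma-separated position."""
    host, status, latency_ms = row.split(",")
    return {"host": host, "status": int(status), "latency_ms": float(latency_ms)}


# Repeatable pattern: key=value pairs, in any order, with free text in between.
KV_RE = re.compile(r"(\w+)=(\S+)")


def parse_repeatable(line: str) -> dict:
    """No fixed positions: locate each data item by its pattern instead."""
    return dict(KV_RE.findall(line))


print(parse_fixed("web01,200,12.5"))
print(parse_repeatable("WARN disk check host=db02 used=91% mount=/var"))
print(parse_repeatable("mount=/var slow disk on host=db02"))
```

Positional parsing breaks as soon as a field moves, whereas pattern-based extraction tolerates the same items appearing at different locations, which is the property the repeatable-pattern case relies on.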

The machine data generated by such data sources can include, for example and without limitation, server log files, activity log files, configuration files, messages, network packet data, performance measurements, sensor measurements, etc.
