Invented by Dipock Das, Dayanand Pochugari, Neeraj Verma, Nikesh Padakanti, Aungon Nag Radon, Anand Srinivasabagavathar, Adam Oliner, Splunk Inc
The Splunk Inc invention works as follows
In various embodiments, an NL application implements functionality that enables users to access data storage systems more efficiently based on NL requests. The operations of the NL application are guided, at least in part, by one or more machine-learning models and/or templates. The templates and/or machine-learning models provide a flexible framework that can be readily tailored to reduce the user time and effort associated with processing NL requests and to increase the accuracy of NL application implementations.
Background for Determining an approach to disambiguation that is user-specific based on a machine learning recommendation model for interaction
Field of Invention
The present invention relates to computer science, data science, and more specifically to adaptable methods for interacting with data sources using natural language.
Description of Related Art
Natural language (NL) data applications were developed to allow users to access and analyze data from a variety of data sources without having to be experts in the domain-specific languages (DSLs) associated with those sources. An NL data application extracts and curates the metadata associated with the different data sources. It then translates an NL request into an appropriate DSL query, uses the DSL query to retrieve data from the domain-specific data source, performs operations on the retrieved data, and displays the results. One problem with NL data applications is that NL requests are often ambiguous. To determine the intent of a user, the application must typically engage the user in an arduous, repetitive, and time-consuming interactive interrogation process. These interrogation processes are inefficient and can be irritating to users.
For instance, suppose that a user requests ‘sales by price’ in order to obtain unit sales by price for Colorado for the prior week. A typical NL application would ask the user for several pieces of disambiguating information, including the type of sales, the geographic region, and the time period. Now suppose that the user wants unit sales for Colorado every week. If the user inadvertently requests ‘sales by price’ instead of ‘unit sales by price for Colorado for the previous week’, the NL data application will repeat the same requests for disambiguating information.
As the above illustrates, what is needed are more effective techniques for interfacing with underlying data sources through natural language applications.
The present invention includes a method of disambiguating and executing an NL request. The method comprises generating an inquiry based on a first ambiguous NL request received from a user and on an interaction model that associates the first ambiguous NL request and the user with the inquiry. The interaction model is generated via a machine-learning algorithm.
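The idea of an interaction model that ties a user and an ambiguous request to an inquiry can be illustrated with a minimal Python sketch. It is not the patented method: a simple frequency count stands in for the machine-learning algorithm, and the class name, method names, and inquiry wording are all hypothetical.

```python
from collections import Counter, defaultdict


class InteractionModel:
    """Toy stand-in for a learned, user-specific interaction model.

    Assumption: the model is a frequency table of how each user has
    previously resolved each ambiguous request. A real system would
    train a machine-learning model instead.
    """

    def __init__(self):
        # (user, ambiguous request) -> Counter of resolved requests
        self._history = defaultdict(Counter)

    def record(self, user, ambiguous_request, resolved_request):
        """Remember how this user clarified this ambiguous request."""
        self._history[(user, ambiguous_request)][resolved_request] += 1

    def generate_inquiry(self, user, ambiguous_request):
        """Return an inquiry tailored to the user and the request."""
        seen = self._history.get((user, ambiguous_request))
        if not seen:
            # No history for this user: fall back to a generic question.
            return (
                f'What do you mean by "{ambiguous_request}"? '
                "Please specify the type, region, and period."
            )
        # Propose the interpretation this user chose most often.
        best, _count = seen.most_common(1)[0]
        return f'Did you mean "{best}"?'


model = InteractionModel()
model.record(
    "alice",
    "sales by price",
    "unit sales by price for Colorado for the previous week",
)
print(model.generate_inquiry("alice", "sales by price"))
print(model.generate_inquiry("bob", "sales by price"))
```

In this sketch the returning user receives a single confirmation question, while a user with no history still receives the generic disambiguation prompt.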
Further embodiments include, among others, a computer-readable medium and a device configured to implement the above method.
The templates provide a flexible framework that can be readily tailored to reduce the time and effort required to process NL requests and to increase the accuracy of NL application implementations. Notably, the functionality and/or effectiveness of an NL application that implements the disclosed techniques can be improved by updating the templates without changing the NL application software.
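One way to picture updating templates without changing the software is to treat templates as plain data that a fixed translation routine consumes. The Python sketch below assumes a hypothetical template layout (a regular expression paired with a query skeleton), uses SQL as the stand-in DSL, and invents a `sales` table; none of these details come from the patent.

```python
import re

# Hypothetical templates: each pairs an NL pattern with a DSL skeleton.
TEMPLATES = [
    {
        "pattern": r"^(?P<measure>[\w ]+) by (?P<dimension>\w+) for (?P<region>\w+)$",
        "dsl": (
            "SELECT {dimension}, SUM({measure}) FROM sales "
            "WHERE region = '{region}' GROUP BY {dimension}"
        ),
    },
]


def translate(nl_request, templates):
    """Return a DSL query for the first matching template, else None."""
    for template in templates:
        match = re.match(
            template["pattern"], nl_request.strip(), flags=re.IGNORECASE
        )
        if match:
            # Captured groups contain only word characters and spaces;
            # spaces become underscores to form column-like names.
            fields = {
                name: value.strip().replace(" ", "_")
                for name, value in match.groupdict().items()
            }
            return template["dsl"].format(**fields)
    return None


# Extending behavior means adding data; translate() itself is unchanged.
EXTENDED = TEMPLATES + [
    {
        "pattern": r"^(?P<measure>[\w ]+) by (?P<dimension>\w+)$",
        "dsl": "SELECT {dimension}, SUM({measure}) FROM sales GROUP BY {dimension}",
    },
]

print(translate("unit sales by price for Colorado", TEMPLATES))
print(translate("sales by price", TEMPLATES))
print(translate("sales by price", EXTENDED))
```

With the original list, the request "sales by price" matches nothing. Once the second template is appended, the same `translate` function handles it.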
In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. It will be apparent to those skilled in the art, however, that the present invention can be practiced without these specific details.
General Overview
Modern data centers and other computing environments can comprise anywhere from a handful to thousands of systems configured to process data, service requests from remote clients, and perform numerous other computational tasks. Many components in these computing environments generate machine data. Machine data is any data that is produced by a machine or component in an information technology (IT) environment and that reflects activity in that environment. It is typically raw data generated by components such as servers and sensors, as well as by mobile devices and Internet of Things devices. Machine data can include system logs, network packet data, sensor data, application program data, error logs, stack traces, system performance data, etc. It can include diagnostic data, performance data, and other types of data that can be analyzed to diagnose performance issues, monitor user interactions, and derive other insights.
A number of tools are available for analyzing machine data. To reduce the volume of machine data they must handle, many of these tools pre-process the data based on anticipated analysis needs. For example, pre-specified data items can be extracted from the machine data and stored to make later retrieval and analysis easier. The rest of the data is typically not saved and is discarded during pre-processing. As storage capacity becomes cheaper and more plentiful, there are fewer reasons to discard machine data.
This plentiful storage capacity makes it feasible to store large quantities of minimally processed machine data for later retrieval and analysis. Storing minimally processed data provides greater flexibility because an analyst can search all of the machine data rather than only a pre-specified set of data items. An analyst may thus be able to examine aspects of the data that pre-processing would otherwise have discarded.
However, analyzing and searching large amounts of machine data is difficult. A data center, server, or network appliance may produce many types and formats of machine data (e.g., system logs, network packet data (e.g., wire data), sensor data, application program data, error logs, stack traces, system performance data, operating system data, virtualization data, etc.) from thousands of different components, which can make the data difficult to analyze. Mobile devices can also generate large amounts of information about data accesses and network performance, and millions of mobile devices may report these types of information.
In the data input and query system, machine data is collected and stored as ‘events’. An event comprises a portion of machine data and is associated with a specific point in time. The portion of machine data can reflect activity in an IT environment and can be produced by a component of that environment. Events can then be searched to gain insight into the IT environment, which can help improve the performance of its components. Events can be derived from ‘time series data’, which consists of a sequence of data points (e.g., performance measurements from a computer system) associated with successive points in time. In general, each event contains a portion of machine data and has a timestamp derived from that portion. The timestamp of an event can also be determined by interpolating between temporally proximate events that have known timestamps, or by other configurable rules.
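The relationship between events, timestamps, and interpolation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Event` class, the ISO-like timestamp pattern, and the choice of linear interpolation by position between the nearest timestamped neighbors are all hypothetical and are not taken from the patent.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Assumed timestamp layout for this sketch, e.g. 2024-01-01T00:00:10
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


@dataclass
class Event:
    raw: str                       # portion of machine data
    timestamp: Optional[datetime]  # None until a timestamp is determined


def to_events(lines: List[str]) -> List[Event]:
    """Create one event per line, deriving the timestamp from its text."""
    events = []
    for line in lines:
        match = TS_PATTERN.search(line)
        ts = (
            datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S")
            if match
            else None
        )
        events.append(Event(raw=line, timestamp=ts))
    return events


def interpolate_timestamps(events: List[Event]) -> List[Event]:
    """Fill missing timestamps from the nearest timestamped neighbors."""
    known = [i for i, e in enumerate(events) if e.timestamp is not None]
    for i, event in enumerate(events):
        if event.timestamp is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None or nxt is None:
            continue  # not bracketed by known timestamps; leave unset
        t0, t1 = events[prev].timestamp, events[nxt].timestamp
        # Linear interpolation by position between the two neighbors.
        event.timestamp = t0 + (t1 - t0) * (i - prev) / (nxt - prev)
    return events


log_lines = [
    "2024-01-01T00:00:00 service started",
    "worker pool initialised",
    "2024-01-01T00:00:10 service ready",
]
for event in interpolate_timestamps(to_events(log_lines)):
    print(event.timestamp, "|", event.raw)
```

Events that are not bracketed by two timestamped neighbors are left without a timestamp here; a real system would presumably apply other configurable rules in that case.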
In some cases, machine data has a predefined format, in which data items with specific data formats are stored at predefined locations within the data. For example, the machine data can include data associated with a table in a database. Machine data can also have a repeatable (e.g., non-random) pattern without a predefined format. This means that the machine data may contain various data items stored at different locations within the data. For example, when the data source is an operating system log, an event can include one or more lines from the log containing machine data that includes different types of performance and diagnostic information associated with a specific point in time.
Such data sources can generate machine data that includes, for example and without limitation, server log files, activity log files, configuration files, messages, network packet data, performance measurements, sensor measurements, etc.