Invented by Angelos D. Keromytis, Salvatore J. Stolfo, Columbia University of New York

Identifying a Market for Function Call Detection

A market is an arena in which buyers and sellers exchange goods or services. It can exist physically or virtually, such as a stock exchange or an online store. Common types of markets include fish markets, farmers' markets, street and food retail outlets, souks, and flea markets.

Function call detection

Function call detection is a critical element of software security. It helps detect malicious code by locating a function's start, or entry point, and following its internal control flow instruction by instruction until its exit points.

Recently, many tools have turned to exception-handling information for function start detection. Unfortunately, these tools often suffer from two issues: they combine this reliable source with unsafe approaches that lack correctness guarantees, and they assume the call frame information is faithful without accounting for errors introduced during scan operations.

The use of exception-handling information in function start detection can have a substantial effect on coverage, but it also leads to false positives due to inaccurate control-flow repair and incomplete control-flow analysis [27]. Furthermore, using exception-handling data often reduces the number of true function starts detected [30].

This paper investigates these two problems, seeking to provide new insights into how best to use exception-handling information in function start detection. We analyze results from ANGR and GHIDRA, two tools that use exception-handling information to detect function starts in non-disassembled regions.

ANGR utilizes prologue matching to detect function starts in non-disassembled code regions, followed by recursive disassembly from each matched function start. This approach yields more precise and complete detection of function starts than prologue matching alone can achieve; nonetheless, it introduces significant error and uncertainty (see Figure 5a).

GHIDRA also offers heuristics (not enabled by default) that detect tail calls and treat their targets as function starts. Our results show that this approach reduces false positives but increases the number of binaries with reduced coverage. To further refine this strategy, we developed a rule that ensures any missed tail-call targets are not referenced anywhere other than by jumps within the current function; this rule can be applied in other tools and helps avoid an excessive number of false positives.

Conclusions

A market is an organized group of buyers and sellers for a product or service. It can exist in either physical or virtual form, but it must have interested buyers; the term may also refer to an industry or business sector. Determining the target market for your goods or services is critical because it focuses your advertising efforts and helps ensure that the right people are purchasing from you.

The Columbia University of New York invention works as follows

Methods and media for detecting an anomalous sequence of function calls are provided. These methods include compressing a sequence of function calls made during the execution of a program using a compression model and then, based on how well the sequence compresses, determining whether it contains an anomalous sequence of function calls. The methods also include running at least one known program, observing at least one sequence of function calls made during its execution, assigning a unique identifier to each type of function call in that sequence, and creating at least part of the compression model by recording at least one sequence of those identifiers.

Background for Methods and media for detecting an abnormal sequence of function calls

Applications can be terminated by any number of software faults, threats, attacks, or other failures. Computer viruses, trojans, hacker attacks, key recovery attacks, malicious executables, and probes are all possible, and computer viruses in particular pose a constant threat to computers connected to public networks (such as the Internet) or private networks (such as corporate computer networks). Many computers run firewalls and antivirus software to protect against these threats, but such preventative measures may not be sufficient. Moreover, many services must remain available in the face of remote attacks, high-volume events such as Slammer and Blaster, or simple application-level denial-of-service (DoS) attacks.

These threats aside, programs often contain errors that surface during operation and are usually caused by programmer mistakes. Such software errors and failures can lead to illegal memory accesses or division-by-zero errors, which in turn can cause an application's execution to halt or crash.

Methods and media for detecting an anomalous sequence of function calls are provided. In some embodiments, the methods include compressing a sequence of function calls made during the execution of a program using a compression model and then, based on how well the sequence compresses, determining whether it contains an anomalous sequence of function calls. Some embodiments of the methods further include: executing at least one known program; observing at least one sequence of function calls made by that program; assigning a unique identifier to each type of function call in the observed sequence; and creating at least part of the compression model by recording at least one sequence of unique identifiers based on the identifiers assigned to each type of function call and the observed sequence of function calls.

In some embodiments, computer-readable media are provided containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for detecting an anomalous sequence of function calls. The method includes compressing a sequence of function calls made during the execution of a program using a compression model and then, based on how well the sequence compresses, determining whether it contains an anomalous sequence of function calls. In some embodiments, the method further includes: executing at least one known program; observing at least one sequence of function calls made by that program; assigning a unique identifier to each type of function call in the observed sequence; and creating at least part of the compression model by recording at least one sequence of unique identifiers using the identifiers assigned to each type of function call and the observed sequence of function calls.

In certain embodiments, systems are provided for detecting an anomalous sequence of function calls. These systems include a memory and a processor in communication with the memory. The processor compresses a sequence of function calls made during the execution of a program using a compression model and, based on how well the sequence compresses, determines whether it contains an anomalous sequence of function calls. The processor also executes at least one known program, observes at least one sequence of function calls made by that program, and assigns a unique identifier to each type of function call in that sequence. Finally, the processor creates at least part of the compression model by recording at least one sequence of unique identifiers using the identifiers assigned to each type of function call and the observed sequence of function calls.

Methods and media are provided for detecting an anomalous sequence of function calls, or more generally for detecting program executions that fall outside the normal range of operation. Some embodiments provide systems and methods that model running programs and application-level computations, and that detect abnormal executions by instrumenting, monitoring, and analyzing application-level program functions and their arguments. This approach can be used to detect abnormal program executions that may be indicative of malicious attacks or program faults.

The algorithm used to detect anomalies may be, for instance, a probabilistic anomaly detection (PAD) algorithm or a one-class support vector machine (OCSVM), both described below, or any other suitable algorithm.

Anomaly detection can be used for file system access anomaly detection and/or process execution anomaly detection. In different embodiments, an anomaly detector can also be used to model program execution state information; for example, it may model program stack information in order to detect abnormal program behavior.

In different embodiments, PAD can be used to model program stack data. Such stack information can be extracted using, for instance, Selective Transactional EMulation (STEM), discussed below, which permits the selective execution of part (or all) of a program within an instruction-level emulator (e.g., the Valgrind emulator); by modifying the program's source code or binary to include indicators of which functions are being called (or any other suitable related information); or by any other suitable technique. This makes it possible to dynamically determine the information needed, such as stack frames and function-call arguments, transparently to the monitored program. Some or all of this information can be extracted from program-stack-specific data.
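
The patent leaves the instrumentation mechanism open (STEM on Valgrind, source or binary modification, or another technique). Purely as a loose illustration of recording a sequence of function calls, the sketch below uses Python's built-in sys.setprofile hook; the function name and record format are hypothetical and are not part of the patented system.

```python
import sys

def record_call_sequence(func, *args, **kwargs):
    """Run `func` and return the names of the Python functions it calls.
    Illustrative only: actual embodiments extract this information via
    emulation (e.g., STEM/Valgrind) or binary instrumentation."""
    calls = []

    def tracer(frame, event, arg):
        # The profiler receives 'call', 'return', 'c_call', etc. events;
        # we only record ordinary Python call events here.
        if event == "call":
            calls.append(frame.f_code.co_name)

    sys.setprofile(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook
    return calls
```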

For example, as illustrated in FIG. 8, an anomaly detector can be applied by extracting data from the stack (e.g., by using an emulator or by modifying the program) and creating a data record that is provided to the anomaly detector for processing at 802. In various embodiments, the anomaly detector models normal program execution stack behavior in a first phase at 804. Once a model has been computed, a detection mode can flag abnormal stack function references at 806 by comparing the references to the model trained at 804.
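
A minimal skeleton of this two-phase flow, assuming the extracted stack data arrives as hashable records (e.g., caller/callee pairs); the class and method names are illustrative, and a real embodiment would use PAD, an OCSVM, or the compression model described below rather than exact set membership.

```python
class StackAnomalyDetector:
    """Toy two-phase detector: learn the stack/function references seen
    during normal runs, then flag unseen references as abnormal."""

    def __init__(self):
        self.normal_refs = set()

    def train(self, records):
        # First phase (804): model normal program execution stack behavior.
        self.normal_refs.update(records)

    def detect(self, records):
        # Detection phase (806): compare new references against the model.
        return [r for r in records if r not in self.normal_refs]
```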

Selective transactional emulation (STEM) and error virtualization can be used, depending on the embodiment, to reverse (undo) the effects of processing malicious input (e.g., changes to file system variables or program files) and to allow program execution to resume gracefully. This makes it possible to pinpoint the exact location in the program that failed or was attacked. An anomaly detector applied to function calls can detect malicious program executions, which in turn makes it possible to mitigate against them (e.g., with patch generation systems or content-filtering signature generation systems). Additionally, if a vulnerability is identified accurately, the performance impact of using STEM to execute a portion (or all) of a program can be reduced.

Anomaly detection can be achieved using detection models, as explained above. These models can be built with unsupervised, automatic learning.

Some embodiments allow such models to be constructed from a training set, which can include a list of function calls and at least some of their arguments. A model can include a compressed set of function calls observed during the execution of known non-anomalous programs. Many compression methods can be used in different embodiments. In some cases, the compression model can be the dictionary created for a compression technique such as Lempel-Ziv-Welch (LZW) compression. For example, each type of function call can be assigned an identifier, such as a two-digit number, a string, a code, or another type of identifier, so that a sequence of function calls can be represented as a sequence of identifiers. A training set can contain various series of identifiers, which can be used to create a table or library of sequences that forms part of the compression model. How well a list of function calls compresses under such a model (e.g., a dictionary, library, or table of sequences) can then help determine whether it is anomalous. If a stream of function calls compresses well using a model created from known non-anomalous applications, both the training and test sets can be considered non-anomalous; if the test set does not compress well, it may contain anomalous function calls. Many techniques can be used to decide how well test data must compress to be considered non-anomalous, including thresholds based on empirical data and on user or administrator settings.
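
The paragraph above leaves the identifier format and dictionary details open. As a rough sketch under those assumptions, the hypothetical helpers below assign an identifier to each function-call type and grow an LZW-style dictionary of identifier sequences from training traces.

```python
def assign_ids(training_traces):
    """Map each distinct function-call type to a unique identifier."""
    ids = {}
    for trace in training_traces:
        for call in trace:
            ids.setdefault(call, len(ids))
    return ids

def build_lzw_dictionary(training_traces, ids):
    """Record sequences of identifiers, LZW-style: start from single
    identifiers and add each previously unseen phrase as it is observed."""
    dictionary = {(i,) for i in ids.values()}
    for trace in training_traces:
        phrase = ()
        for call in trace:
            candidate = phrase + (ids[call],)
            if candidate in dictionary:
                phrase = candidate          # keep extending a known phrase
            else:
                dictionary.add(candidate)   # record the new sequence
                phrase = (ids[call],)
    return dictionary
```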

FIG. 9 shows a method for creating a compression model at 910. To create compression model 920, one or more known non-anomalous programs can be run at 911 and a sequence of the function calls they make can be compressed at 912. At 930, a test program can be run. The sequence of function calls resulting from its execution can then be compressed at 931 using compression model 920 to create a compressed sequence 932. At 933, it is determined whether the compressed sequence 932 is well compressed. This can be judged by, for instance, the percentage of function calls that were not compressed, the lengths of the various runs of uncompressed function calls, the distance between them, the distribution and density of uncompressed function calls, and/or the number of unique uncompressed and/or compressed calls. If the sequence 932 is well compressed, the execution can be considered non-anomalous at 935; if it is not well compressed, it can be considered anomalous at 934. Several recovery options are available if a program execution is deemed anomalous. In some embodiments, the programs executed at 911 can instead be known anomalous programs; in that case, a sequence 932 that is well compressed may indicate the presence of anomalous sequences. Some embodiments allow for the creation of multiple models 920, which can include models of both anomalous and non-anomalous programs; in such embodiments, the sequence compressed at 931 can be compared against each of these models.
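
Continuing with the same hypothetical helpers, a correspondingly rough sketch of the test side of FIG. 9 (steps 930-935) greedily matches a test trace against the dictionary and compares the fraction of calls covered by known multi-call phrases to a threshold; both the coverage metric and the 0.5 threshold are illustrative choices, since the text leaves the "well compressed" criterion open.

```python
def compression_score(trace, ids, dictionary):
    """Greedily parse the trace with the dictionary and return the
    fraction of calls covered by known multi-call phrases."""
    covered, i = 0, 0
    while i < len(trace):
        # Find the longest known phrase starting at position i.
        phrase = (ids.get(trace[i], -1),)
        length = 1
        while i + length < len(trace):
            nxt = phrase + (ids.get(trace[i + length], -1),)
            if nxt in dictionary:
                phrase, length = nxt, length + 1
            else:
                break
        if length > 1 and phrase in dictionary:
            covered += length
        i += length
    return covered / len(trace) if trace else 1.0

def is_anomalous(trace, ids, dictionary, threshold=0.5):
    # Step 933/934: deem the trace anomalous if it does not compress well,
    # i.e., too few calls fall into phrases learned from normal runs.
    return compression_score(trace, ids, dictionary) < threshold
```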

Execution, in some embodiments, can include execution of all or part of a program and can be done natively or in an emulator. Execution at 930 can be stopped if an anomalous sequence is detected at 933, and one or more function calls, or the order of function calls, can be changed to make the sequence non-anomalous. A compressed sequence 932 can be stored in memory, but in some circumstances it may not be created; instead, the output of compression at 931 can feed directly into the determination at 933 of how well the test program's sequence compresses. The determination at 933 can be made at any time during execution at 930, at intervals or breaks in execution at 930, or after execution at 930 is complete. Function calls can be differentiated in various embodiments. Creation at 910, execution at 930, and determination at 933 can be performed on the same digital processing device or on different digital processing devices, and compression model 920 may include multiple models created by running different training data on different digital processing devices.

A probabilistic anomaly detection (PAD) algorithm can be used to train a model for detecting anomalies. This model can be used in various ways, including in combination with a compression model (e.g., model 920). The model can be described as a density estimation: estimating a density function p(x) over the normal data allows anomalies to be defined as elements with low probability. Low-probability data or events are detected by applying consistency checks to the normal data, and a record is considered anomalous if it fails one of these checks.

First and second order consistency checks can be used. A first order consistency check verifies that a value is consistent with the values observed for that feature in the normal data set; it computes the likelihood of observing a given feature value, P(Xi). A second order consistency check computes the conditional probability of observing one feature value given another, P(Xi|Xj), where Xi and Xj are feature variables.

One way to calculate these probabilities is to use a multinomial model that computes the ratio of the counts of an element to the total number of counts. This can yield a biased estimator when there is insufficient data, so an alternative approach is to use an estimator to determine these probability distributions. Let N be the total number of observations, Ni the number of observations of symbol i, α a "pseudo count" that is added to the count of each observed symbol, k0 the number of distinct symbols observed, and L the total number of possible symbols. With these definitions, the probability of an observed element i can be calculated as:

P(X = i) = \frac{N_i + \alpha}{k_0 \alpha + N}\, C \qquad (1)

and the probability for an unobserved element i can be:

P(X = i) = \frac{1}{L - k_0}\,(1 - C) \qquad (2)

where C, the scaling factor, accounts for the likelihood of observing a previously observed element versus an unobserved element. You can compute C as:

C = \left( \sum_{k=k_0}^{L} \frac{k_0 \alpha + N}{k \alpha + N}\, m_k \right) \left( \sum_{k=k_0}^{L} m_k \right)^{-1}, \qquad m_k = P(S = k)\, \frac{k!}{(k - k_0)!}\, \frac{\Gamma(k \alpha)}{\Gamma(k \alpha + N)} \qquad (3)

and P(S = k) is a prior probability associated with the size of the subset of elements in the alphabet that have non-zero probability. Because this computation of C can be time consuming, C can also be calculated by:
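
As a numerical sketch of equations (1)-(3), the hypothetical helper below estimates the PAD probabilities from observed counts, using log-gamma terms for numerical stability; the uniform prior on P(S = k) and the value of alpha are assumptions made for illustration, not values fixed by the patent.

```python
from math import lgamma, exp, log

def pad_probabilities(counts, L, alpha=0.01):
    """Estimator of equations (1)-(3). `counts` maps each observed symbol i
    to Ni; L is the alphabet size. Returns (p_observed dict, p_unobserved).
    The uniform prior on P(S = k) and alpha are illustrative assumptions."""
    N = sum(counts.values())
    k0 = len(counts)

    # log m_k = log P(S=k) + log(k!/(k-k0)!) + log Gamma(k*alpha) - log Gamma(k*alpha + N)
    log_prior = -log(L - k0 + 1)  # uniform prior over k = k0..L (assumption)
    log_mk = {}
    for k in range(k0, L + 1):
        log_mk[k] = (log_prior
                     + lgamma(k + 1) - lgamma(k - k0 + 1)
                     + lgamma(k * alpha) - lgamma(k * alpha + N))

    # Equation (3), with the largest log m_k factored out for stability.
    m = max(log_mk.values())
    num = sum(((k0 * alpha + N) / (k * alpha + N)) * exp(v - m)
              for k, v in log_mk.items())
    den = sum(exp(v - m) for v in log_mk.values())
    C = num / den

    # Equations (1) and (2).
    p_observed = {i: C * (Ni + alpha) / (k0 * alpha + N) for i, Ni in counts.items()}
    p_unobserved = (1.0 - C) / (L - k0) if L > k0 else 0.0
    return p_observed, p_unobserved
```

For example, pad_probabilities({"open": 5, "read": 12, "close": 5}, L=50) returns a probability for each observed call type and a single, lower probability shared by every call type that was never observed in the normal data.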
