Memory Vulnerability Diagnosis for Binary Program

Vulnerability diagnosis is an important part of program security analysis. It is a further step toward understanding a vulnerability after it has been detected, and a preparatory step for repairing or exploiting it. This paper analyses the underlying principles of the major memory vulnerabilities and the threats they pose. It then proposes methods to diagnose several types of memory vulnerabilities in binary programs, a difficult task because the source code is unavailable. The diagnosis methods target buffer overflow, use after free (UAF), and format string vulnerabilities. We carried out tests on the Linux platform to validate the effectiveness of the diagnosis methods. The tests show that the methods can determine the type of a vulnerability given a binary program.


Introduction
Memory vulnerabilities are difficult to detect and diagnose, especially those that do not crash the program. Because of its importance, the vulnerability diagnosis problem has been studied intensively, and researchers have proposed various techniques to diagnose memory vulnerabilities.
Jun Xu proposed a technique that uses the crash as a trigger to initiate an automatic diagnosis algorithm [1]. Moreover, the diagnosis process generates a signature of the attack from data and address values embedded in the malicious input message, and the signature is then used to block future attacks. However, this technique needs an input that triggers a crash, and the validity of the generated signature cannot be verified.
Jiang Zheng combined dynamic and static analysis techniques to solve the automatic buffer overflow vulnerability diagnosis (BOVD) problem for commodity software [2]. However, the solution targets only buffer overflow vulnerabilities and needs an effective exploit as input to complete the diagnosis.
In this paper, we focus mainly on memory vulnerabilities that might not lead to program corruption. Because source code is unavailable, vulnerability analysis of binary programs has long been a hard problem, and the most popular approach is probably fuzz testing. It is well known, however, that the bugs reported by fuzzing are not always vulnerabilities. Vulnerability diagnosis is therefore valuable for fixing or exploiting vulnerabilities and deserves attention.

Buffer Overflow Vulnerability Diagnosis
Buffer overflow vulnerabilities include heap overflows and stack overflows. Their underlying principles are similar, but the methods for diagnosing them are quite different.

Heap Overflow Diagnosis
A heap overflow condition is a buffer overflow in which the buffer that can be overwritten is allocated in the heap portion of memory, generally meaning that the buffer was allocated using a routine such as malloc() [3].
Heap overflows are usually just as dangerous as stack overflows. Besides important user data, a heap overflow can overwrite function pointers that live in memory, redirecting them to the attacker's code.
It is comparatively easy to detect a heap overflow at run time. When an area is allocated in the heap, the memory allocator generally places headers before or after the area (or both). These headers can hold information such as the size of the area and whether it is in use.
Therefore, we can monitor the allocator to determine the base address and size of each allocated area, and then check all STORE and LOAD memory accesses to see which ones fall outside the allocated areas.
In the diagnosis process, we intercept all calls to the malloc function and save the base address and size of each buffer in a std::list. Then, when a STORE or LOAD occurs, we only need to check whether the accessed address is in our list. If it falls outside the recorded areas, there is probably a heap overflow.
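The bookkeeping described above can be sketched in plain C++, independent of any instrumentation framework. This is a minimal illustration, not the tool's actual code: the names HeapBlock, HeapTracker, classify and the 16-byte red zone kRedZone are hypothetical, and the red zone is an added assumption used to separate accesses just past a buffer from accesses to unrelated memory such as the stack. In a real tool, onMalloc, onFree and classify would be driven by instrumentation callbacks.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>

// One live heap buffer: base address and size recorded at the malloc call.
struct HeapBlock {
    std::uintptr_t base;
    std::size_t size;
};

enum class Access { InBlock, PastEnd, Unrelated };

// Hypothetical red-zone width: accesses this close past a block are blamed on it.
constexpr std::size_t kRedZone = 16;

class HeapTracker {
public:
    // Record a successful allocation.
    void onMalloc(std::uintptr_t base, std::size_t size) {
        if (base != 0) blocks_.push_back(HeapBlock{base, size});
    }

    // Forget a buffer once it is freed.
    void onFree(std::uintptr_t base) {
        blocks_.remove_if([base](const HeapBlock& b) { return b.base == base; });
    }

    // Classify the address touched by a LOAD or STORE.
    Access classify(std::uintptr_t addr) const {
        for (const HeapBlock& b : blocks_)
            if (addr >= b.base && addr - b.base < b.size) return Access::InBlock;
        for (const HeapBlock& b : blocks_) {
            const std::uintptr_t end = b.base + b.size;
            if (addr >= end && addr - end < kRedZone) return Access::PastEnd;
        }
        return Access::Unrelated;
    }

private:
    std::list<HeapBlock> blocks_;
};
```

With a 32-byte buffer recorded at 0x1000, an access to 0x101f is classified as InBlock, while an access to 0x1020 is classified as PastEnd and would be reported as a probable heap overflow.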

Stack Overflow Diagnosis
A stack overflow condition is another buffer overflow condition, where the buffer being overwritten is allocated on the stack (i.e., is a local variable or, rarely, a parameter to a function) [4].
Diagnosing a stack overflow is much more complicated than diagnosing a heap overflow, although their principles are nearly the same. A single stack frame holds not one but several variables. The main problem is to detect an overflow between these variables, which will probably not crash the program; this is why such vulnerabilities can be missed without binary analysis. It is also quite difficult to detect an overflow between variables at run time without changing the code execution.
There are four main steps in diagnosing a stack overflow vulnerability. First, isolate the functions. Second, determine how many variables are instantiated in each stack frame; we use a std::list of stack frames, each containing a unique ID and another std::list of all variables instantiated in that frame. Third, detect the loops that contain a STORE instruction to our stack frame; a loop generally ends with a conditional jump. Finally, analyze this STORE to detect a potential overflow: when a STORE instruction executes, we check where the value is stored, and if the destination is in our stack frame, we verify that every STORE in the loop writes to the same area.
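The final step, checking that every STORE in a loop stays inside the variable it first wrote to, might look like the following sketch. It assumes the frame layout (variable addresses and sizes) has already been recovered and that all stores passed in come from the same loop. StackVar, StackFrame and LoopStoreMonitor are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <list>

// A variable in a stack frame: start address and size in bytes.
struct StackVar {
    std::uintptr_t base;
    std::size_t size;
};

// A stack frame: unique ID plus the variables instantiated in it.
struct StackFrame {
    unsigned id;
    std::list<StackVar> vars;
};

// Watches the STOREs issued by one loop against one stack frame.
class LoopStoreMonitor {
public:
    explicit LoopStoreMonitor(const StackFrame& frame) : frame_(frame) {}

    // Record a STORE of `width` bytes at `addr`; returns bytes overflowed so far.
    std::size_t onStore(std::uintptr_t addr, std::size_t width) {
        if (!hasTarget_) {
            // The first store that lands in a frame variable fixes the target.
            for (const StackVar& v : frame_.vars) {
                if (addr >= v.base && addr - v.base < v.size) {
                    target_ = v;
                    hasTarget_ = true;
                    break;
                }
            }
            if (!hasTarget_) return overflow_;  // store does not touch this frame
        }
        // Count the bytes of this store that fall outside the target variable.
        const std::uintptr_t storeEnd = addr + width;
        const std::uintptr_t targetEnd = target_.base + target_.size;
        const std::uintptr_t lo = std::max(addr, target_.base);
        const std::uintptr_t hi = std::min(storeEnd, targetEnd);
        const std::size_t inside = hi > lo ? static_cast<std::size_t>(hi - lo) : 0;
        overflow_ += width - inside;
        return overflow_;
    }

    std::size_t overflowBytes() const { return overflow_; }

private:
    StackFrame frame_;
    StackVar target_{0, 0};
    bool hasTarget_ = false;
    std::size_t overflow_ = 0;
};
```

For a frame holding an 8-byte buffer at 0x7ff0 followed by a 4-byte variable at 0x7ff8, a loop that writes 12 consecutive bytes starting at 0x7ff0 makes overflowBytes() return 4, even though the program would not crash.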

UAF Vulnerability Diagnosis
Use after free errors occur when a program continues to use a pointer after it has been freed [5]. Use after free errors sometimes have no effect and at other times cause a program to crash. While it is technically feasible for the freed memory to be re-allocated and for an attacker to use this reallocation to launch a buffer overflow attack, we are unaware of any exploits based on this type of attack.
To diagnose UAF vulnerabilities, we maintain a free table (TF) and an allocation table (TA), which represent the states of the pointers allocated or freed during execution. When a LOAD or STORE instruction occurs, the tool checks whether the accessed address is referenced in TA or TF. If the access falls in TF, we regard it as a use after free vulnerability, which may also lead to a memory leak.
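A minimal sketch of the two tables follows, assuming each entry is a (base, size) pair and that an address handed out again by the allocator leaves TF. Region, PtrState and UafTracker are hypothetical names, and the re-allocation rule is an assumption added to avoid false reports on recycled memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <list>

// A tracked heap region.
struct Region {
    std::uintptr_t base;
    std::size_t size;
};

enum class PtrState { Allocated, Freed, Untracked };

class UafTracker {
public:
    // malloc returned `base`: add it to TA and drop stale, overlapping TF entries.
    void onMalloc(std::uintptr_t base, std::size_t size) {
        tf_.remove_if([base, size](const Region& r) {
            return r.base < base + size && base < r.base + r.size;
        });
        ta_.push_back(Region{base, size});
    }

    // free(base) was called: move the entry from TA to TF.
    void onFree(std::uintptr_t base) {
        auto it = std::find_if(ta_.begin(), ta_.end(),
                               [base](const Region& r) { return r.base == base; });
        if (it != ta_.end()) tf_.splice(tf_.end(), ta_, it);
    }

    // Classify the address touched by a LOAD or STORE.
    PtrState onAccess(std::uintptr_t addr) const {
        if (contains(ta_, addr)) return PtrState::Allocated;
        if (contains(tf_, addr)) return PtrState::Freed;  // use after free
        return PtrState::Untracked;
    }

private:
    static bool contains(const std::list<Region>& table, std::uintptr_t addr) {
        for (const Region& r : table)
            if (addr >= r.base && addr - r.base < r.size) return true;
        return false;
    }

    std::list<Region> ta_;  // allocation table (TA)
    std::list<Region> tf_;  // free table (TF)
};
```

After onMalloc(0x19c2010, 0x20) followed by onFree(0x19c2010), an access to 0x19c2018 is classified as Freed and would be reported as a use after free.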

Format String Vulnerability Diagnosis
A format function is an ANSI C conversion function, such as printf or fprintf, that converts primitive variables of the programming language into a human-readable string representation. The format string is the argument of the format function; it is an ASCIIZ string that contains text and format parameters, as in printf ("The magic number is: %d\n", 1911). A format string parameter, such as %x or %s, defines the type of conversion the format function performs.
A format string exploit occurs when the submitted data of an input string is evaluated as a command by the application. The attack becomes possible when the application does not properly validate the submitted input. In this case, if a format string parameter such as %x is inserted into the posted data, the string is parsed by the format function and the conversion specified by the parameter is executed. However, the format function then expects more arguments as input, and if these arguments are not supplied, the function may read from or write to the stack.
In this way, the attacker could execute code, read the stack, or cause a segmentation fault in the running application, causing new behaviours that could compromise the security or stability of the system [6].
We apply dynamic taint analysis to help diagnose format string vulnerabilities. First, we taint all program arguments (*argv[]). Then, whenever printf is called, we check whether its first argument (generally passed in the RDI register on x86_64) contains tainted bytes. If RDI points to a memory area that contains tainted bytes, there is a potential vulnerability.
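The taint check can be illustrated with a byte-granular taint set. This is a simplified sketch under stated assumptions: TaintTracker, taintArgs, onCopy and formatIsTainted are hypothetical names, and taint is propagated only through explicit byte copies, whereas a real dynamic taint engine would propagate it through every instruction that moves data.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>

class TaintTracker {
public:
    // Mark n bytes starting at p as attacker-controlled.
    void taintRange(const void* p, std::size_t n) {
        const auto a = reinterpret_cast<std::uintptr_t>(p);
        for (std::size_t i = 0; i < n; ++i) tainted_.insert(a + i);
    }

    // Taint every byte of every command-line argument.
    void taintArgs(int argc, char* const argv[]) {
        for (int i = 0; i < argc; ++i) taintRange(argv[i], std::strlen(argv[i]));
    }

    // Propagate taint for a non-overlapping copy of n bytes from src to dst.
    void onCopy(const void* dst, const void* src, std::size_t n) {
        const auto d = reinterpret_cast<std::uintptr_t>(dst);
        const auto s = reinterpret_cast<std::uintptr_t>(src);
        for (std::size_t i = 0; i < n; ++i) {
            if (tainted_.count(s + i)) tainted_.insert(d + i);
            else tainted_.erase(d + i);
        }
    }

    // Check performed at a printf call: does the format string (the first
    // argument) contain any tainted byte before its terminating NUL?
    bool formatIsTainted(const char* fmt) const {
        for (const char* p = fmt; *p != '\0'; ++p)
            if (tainted_.count(reinterpret_cast<std::uintptr_t>(p))) return true;
        return false;
    }

private:
    std::set<std::uintptr_t> tainted_;  // addresses of tainted bytes
};
```

If an argument such as "%x%x%x" is copied into a buffer that later reaches printf as its format string, formatIsTainted returns true for that buffer and false for a constant format string.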

Experiment Results
Our memory vulnerability diagnosis tools for binary programs are built on Pin. Pin is a dynamic binary instrumentation framework for the IA-32 and x86-64 instruction-set architectures that enables the creation of dynamic program analysis tools. These tools, called Pintools, can be used to perform program analysis on user-space applications on Linux and Windows. We carried out the experiments on 64-bit Ubuntu 12.04. Because Pin is a dynamic binary instrumentation tool, instrumentation is performed at run time on the compiled binary files. Thus, it requires no recompilation of source code and supports instrumenting programs that generate code dynamically.

UAF Diagnosis Result
UAF vulnerabilities tend to lead to memory leaks. As shown in Figure 1, we record the states of the pointers allocated or freed during execution in a free table (TF) and an allocation table (TA). When a LOAD or STORE instruction occurs, the tool checks whether the accessed address is referenced in TA or TF. If the access falls in TF, here the entry TF <- (0x19c2010, 0x20), we regard it as a use after free vulnerability. Checking the free table, we also found that (0x19c2040, 0x20) would lead to a 32-byte memory leak.

Format String Diagnosis Result
As shown in Figure 2, we first taint all program arguments (*argv[]). Then, whenever printf is called, we check whether its first argument, passed in RDI, contains tainted bytes. If RDI points to a memory area that contains tainted bytes, there is a potential vulnerability. Dynamic taint analysis also lets us trace the execution related to the tainted data.

Buffer Overflow Diagnosis Results
As depicted in Figure 3, we intercept all calls to the malloc function and save the base address and size of each buffer in a std::list. Then, when a STORE or LOAD occurs, we only need to check whether the accessed address is in our list. If it falls outside the recorded areas, there is probably a heap overflow.

Conclusion
This paper analyses the underlying principles of the major memory vulnerabilities and the threats they pose. It then proposes methods to diagnose several types of memory vulnerabilities in binary programs, a difficult task because the source code is unavailable. The diagnosis methods target buffer overflow, use after free (UAF), and format string vulnerabilities. We carried out tests on the Linux platform to validate the effectiveness of the diagnosis methods. The tests show that the methods can determine the type of a vulnerability given a binary program.

Figure 3. Heap Overflow Diagnosis Result.
As shown in Figure 4, we use a std::list of stack frames, each containing a unique ID and another std::list of all variables instantiated in that frame. We then analyze each STORE to detect a potential overflow. When a STORE instruction executes, we check where the value is stored. If the destination is in our stack frame, we verify that every STORE writes to the same area. When the count of continuous write operations reaches 4, we regard it as a 4-byte overflow.
Like double free errors and memory leaks, use after free errors [5] have two common and sometimes overlapping causes, one of which is confusion over which part of the program is responsible for freeing the memory. The use of previously freed memory can have any number of adverse consequences, ranging from the corruption of valid data to the execution of arbitrary code, depending on the instantiation and timing of the flaw. The use of previously freed memory can result in a write-what-where condition or a buffer overflow (especially a heap overflow) in several ways.