Staticly Detect Stack Overflow Vulnerabilities with Taint Analysis

Nowadays binary static analysis uses dangerous system library function to detect stack overflow vulnerary in program and there is no effective way to dig out the function which can cause stack overflow issue. List necessarily characteristics of the function which may cause stack overflow vulnerary and define stack overflow dangerous function(SODF). Then introduce static taint analysis to detect SODF include taint introduction, taint propagation and taint checking stragety. Next describe the particular process of detecting SODF in the program with static taint analysis. Finally choose 4 runtime library and 2 binary software, and detect whether the chosen software has SODF and locate the name of SODF with static taint analysis. Testing result shows that the algorithm can detect and locate plenty of SODF in test program which means the algorithm can work efficiently.


Introduction
Stack overflow is a well-known security issue, which can be used to execute unauthorized instructions and illegal operations.The study of stack overflow vulnerability detection is of great significance.
Current method of detecting stack overflow vulnerabilities can be divided into dynamic and static analysis.Dynamic analysis, mainly through checking dataintegrity of an array of borders and return pointer, whose typical tools include StackGuard [1] and RAD [2].Overall, stack overflow is often detected when the stack has been crashed.
Source code analysis and binary analysis provide effective performance in static analysis.Source code analysis detects possible vulnerabilities by comparison with known vulnerabilities after lexical analysis.Wherein, Flawfinder [3] and RATS [4] are typical analysis tools.Binary analysis aims at checking if there is boundary detection when calling library functions to determine whether exists stack overflow vulnerability.
Static analysis can efficiently locate the stack overflow vulnerability.And dynamic analysis can be more effective with the prior knowledge provided by the static analysis.However, it is very difficult to access most of the program's source code.And there is no good way to locate the function which has stack overflow risk and defined by the program itself.
Most static method to detect binary software's stack overflow vulnerability is based on analyzing known vulnerable library functions.However many programs have stack overflow vulnerability because of bad boundary condition of self-defined function.IKEView.exe has stack overflow vulnerability because of a self-define function with R60 version.
In this paper, we establish a model for stack overflow dangerous function, and propose an algorithm to detect stack overflow vulnerabilities.We have implemented our method and applied it to test several binary software.The results show that static taint analysis can achieve higher accuracy.

Basic Definitions
Despite the formats of executable files on different operation system are different, but the basic formulation methods are consistent.
Executable file contains import and export tables, global variables and functions.Function consists of basic blocks, which are made up of basic instructions.Here we give out some basic definitions in binary software.

Some Definitions
Instruction: Instruction is instructing a computer to perform a certain operation command, abbreviate as I , and it is the smallest unit of computer-implemented.It consists of a string of binary number, usually including opcode and operands.Opcodes represents the operation which the instruction is going to operate.Operands represent particular data which is operated by the instruction.
Basic Block: Basic Block is an execution sequence of instructions, abbreviate as B ,which has only one entrance and one exit.The entrance is the first instruction of the basic block while the exit is the last.Therefore, as long as the first instruction is executed in time, then all other instructions within a basic block will be executed once the order.
Basic Block Set: Basic Block Set is composed of several basic blocks that can access to each other, abbreviate as BB .
Function: Function is a basic unit of an executable module, and a combination of basic blocks that accomplish a specific function together, abbreviate as F.

Intermediate Languages
It is very difficult to analysis disassembled assembly code directly due to the wide range of assembly instructions.Normally we lift the disassembled assembly code into intermediate language.
Intermediate language abstracts form native assembly code and therefore makes it easier to develop analysis tools and algorithms.
In this paper, we use the REIL(Reverse Engineering Intermediate Language), a platform-independent intermediate language to represent disassembled assembly code.

Stack Overflow Dangerous Function Model
So far, most static code analysis methods decide whether a program has stack overflow vulnerability by looking for known dangerous functions such as strcpy and memcpy.However, the source of stack overflow is not the function name itself, but their approach to the data.For example, the dangerous function copies the source data into the destination buffer without correctly checking the size of the destination buffer.Parameter src is the address of source buffer and parameter dst is the address of destination buffer of the function.And data transfer from the source buffer to the destination buffer through local variable tmp in the loop.The loop control variable i is independent on the destination buffer.So mymemcpy function is a SODF.The function is an unknown SODF, and many static analysis methods often ignore it.

Characteristic of SODF
Always, data transfer between source buffer and destination buffer in a loop, and the control condition of the loop depends on the source buffer.There are many self-defined functions that have the above characteristics, and they are likely to become internal factors for stack overflow.Through the analysis of a large number of stack overflow vulnerabilities, we summarize the characteristics of Stack Overflow Dangerous Function as follow: (1) Contains a loop (2) Data transfer from the source buffer to the destination buffer in the loop (3) The source buffer address is the parameter of the function (4) The destination buffer is the parameter or a local variable of the function (5) The loop control variable is not dependent on the destination buffer.. Next, we propose our static taint analysis method to locate that kind of function.

Over View
In this section, we leverage static taint analysis to locate stack overflow dangerous function (SODF).And introduce an algorithm based on static taint analysis to analyze program.
The algorithm is divided into two phases: cyclic basic block set's filter and static taint analysis.The cyclic basic block set's filter (step one to step three) aims at locating the possible dangerous function address.Static taint analysis (step four to step five) checks whether data transfers from buffer to buffer in the loop and determines whether the function is a dangerous one.
General idea of the algorithm is as follows: Step one, get the function fwith_args which introduces the parameter in the program.
Step two, locate loop basic block set BBwith_cycle which contains cycle relationship in fwith_args.
Step three, remove the jump instruction Jcc in every BBwith_cycle and form new basic block Bneed_analysis.
Step four, introduce taint for each basic block Bneed_analysis.If the basic block invokes the parameter of the function, then we mark the parameter as taint and on to the next step.
Step five, we translate the B need_analysis into intermediate language basic block and start the taint propagation.If the destination buffer is tainted, then we can say the function is a stack overflow dangerous function.

Suspicious Basic Block
Suspicious basic block is a basic block set.It is a set of all possible basic blocks between loop start and loop end in the loop basic block set.We set the begin of a loop basic block set is the lowest address of the set and the end of a loop basic block set is the highest address of the set.
We adopt a conservative analysis strategy in static taint analysis.So we analysis all possible paths from loop starts to loop ends, and can effectively improve the accuracy of the algorithm through this method.
The mymemcpy function corresponding to that shown in Figure 1, its Suspicious BB is {0x401858, 0x401860, 0x40184F}.

Static Taint State
Static taint analysis does not real execute target program, there is no runtime information.We propose static taint state to introduce taint and taint propagation. Every

Taint Analysis Strategy
Taint analysis strategy includes taint introduction, taint propagation and taint checking.It mainly checks whether to-be-analysis-basic-block-set transfers parameter of the function as the source buffer to destination buffer.
(1) Taint Introduction Taint introduction here mainly concerns about the parameters of the function.According to the stack structure, parameters of the function get indexed by EBP/RBP register.When instruction introduce the parameter in the memory, then we mark the operand of the instruction as taint source, and set the operand state and memory state tainted. (

P r2r (T[t1])
Returns t 1 's operand taint state, and the destination operand must be operand taint state.
(3) Taint Checking Strategy When instruction STM is executed, taint checking begins.Algorithm checks source operand's operand state and memory state.The function is stack overflow dangerous function if the operand's memory state is tainted.
The loop basic block set is extracted from the program in section III.Then it transformed to Suspicious BB.And the Suspicious BB is translated to the REIL basic block.The specific static taint analysis progress with the REIL basic block is shown below.
We can know from the Code II that the REIL basic block read data from memory pointed by parameter of function, and transfer the data to destination buffer.Thus, mymemory is an SODF.
The algorithm adopts conservative strategy, so it will costs a lot of time to analyze functions which has high cyclomatic complexity.State explosion problem will happen when cyclomatic complexity is higher than 50.We ignore those functions when practical analyzing program.

Experiment Design
This chapter chooses 4 runtime libraries and 2 software, which is shown in Table 3.And the detection of SODF is a new technique nowadays ,so no comparison with other techniques.
Static taint analysis algorithm is based on plugin of IDA pro.The plugin runs on Windows 7 SP1 x32 system, processor is i5 2510m@2.5GHz,and memory is 3GB.

Result
The result of the algorithm is shown in  2 shows that the SODF that the algorithm detected.We can see that data stored in the address that pointed by arg_0 transfers to the address pointed by arg_4 in the cycle structure.And loop variable arg_8 is independent with destination buffer.So we can say the function is a SODF.And in practical the function is the cause of stack overflow vulnerability in IKEView.exe.

Conclusions
We proposed static taint analysis algorithm based on stack overflow vulnerabilities' behavior.Experimental result shows that algorithm can effectively locate self-defined SODF in program.However, algorithm will cost lots of time when analyzing functions with high cyclomatic complexity.Next, we will continue optimize the efficiency of the algorithm.Further static taint analysis algorithm can be used on other vulnerabilities' pattern.
The algorithm can be combined with dynamic analysis.By providing a possible position of SODF, making dynamic analysis more effectively to monitoring program.

Figure 1
Figure 1 The Disassembly Code of the Function

Figure 3
Figure 3 Disassembly Code in Ikeview.ExeIt will take quite long time to analysis the function that has high cyclomatic complexity, so the algorithm ignore those functions.However, the static taint analysis algorithm can accurately locate the SODF as shown in the result.

Figure
Figure operand has two static taint state, the operand T and state M each has two states : tainted state and untainted state.Before the propagation starts, all operands' static taint state are set to untainted.

Table 2 .Table 1
Taint Operation Semantics Taint Propagation   Tt indicates that intermediate instruction's operand t 's taint state.  Mt indicates that intermediate the memory's taint state pointed to instruction's operand t .ab indicates that operand a 's taint state updates for operand b 's taint state.  TM a b indicates that operand a 's operand and memory state updates for operand b 's operand and memory state.To express taint propagation process, Table 1 shows the intermediate instructions' operational semantics, where xxx P represents the taint propagation strategy.And taint propagation strategy is shown in Undefined

Table 2
Taint Propagation Strategy

Table 3
Specific Information of Library and Software

Table 4 .
Nk represents the known number of stack overflow dangerous function[NVD, CVE].Nr represents the real number of stack overflow dangerous function.Ni represents the number of functions that algorithm ignore to analysis due to high cyclomatic complexity.Nc represents the number of repetitions that known SODF and detected SODF.Time represents the time that the algorithm performs one time costs.

Table 4
Analysis Result