A Byte

If you're a beginner in reverse engineering, you may be in the right place. I'm going to explain this challenge for anyone who doesn't know x64 assembly: how to read the disassembly, how to convert the binary to C, and then how to find the flag with Python.

I'm using IDA, so when I open this binary, this is what appears:

Ok, but what does that mean?

It means that we have crt0 and main. crt0 is the startup code that runs before main: it sets up the stack, makes sure initialized global and static variables are in place in the data section, zeroes uninitialized global and static variables in the bss section, and then calls the main function. Basically, by the time main runs, the global variables are ready in the data and bss sections (not on the stack), and there is stack space for main to use. Let's focus on main.

byte -> reserved 1 byte or 8 bits in memory;
word -> reserved 2 bytes or 16 bits in memory;
dword -> reserved 4 bytes or 32 bits in memory;
qword -> reserved 8 bytes or 64 bits in memory.
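As a quick sanity check of these sizes, here is a small Python sketch using the standard struct module. The format codes B, H, I and Q are Python's names for unsigned 1-, 2-, 4- and 8-byte integers; they are my choice for the illustration and do not come from the binary.

```python
import struct

# Map each assembly size name to a struct format code.
# "<" selects the standard sizes, independent of the platform.
SIZES = {
    "byte": struct.calcsize("<B"),
    "word": struct.calcsize("<H"),
    "dword": struct.calcsize("<I"),
    "qword": struct.calcsize("<Q"),
}

for name, size in SIZES.items():
    print(f"{name}: {size} bytes, {size * 8} bits")
```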

push rbp ; pushes the value stored in the base pointer register (rbp), which is the caller's frame base, onto the stack so it can be restored later.

mov rbp, rsp ; copies the value of the stack pointer register (rsp) into rbp. Now both rsp and rbp point to the top of the stack, and rbp marks the base of the new stack frame.

sub rsp, 50h ; reserves space for local variables by moving the stack pointer down. The size is 50h in hexadecimal (base 16), which is 80 in decimal (base 10); the same value is also the ASCII code of the character 'P'. The reserved space sits at the top of the stack, below rbp.
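The three ways of reading 50h can be checked in Python. The variable name frame_size is only a label for this illustration.

```python
frame_size = 0x50          # the immediate operand of sub rsp, 50h

print(frame_size)          # the same value in decimal
print(hex(frame_size))     # back to hexadecimal
print(chr(frame_size))     # the ASCII character with that code
```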

mov [rbp+var_44], edi ; there is a calling convention for passing arguments in registers (the System V AMD64 convention used on Linux). In order, the registers are: rdi/edi, rsi/esi, rdx/edx, rcx/ecx, r8 and r9. They carry integer and pointer arguments. So, in this case, the program stores the first argument, held in the 32-bit register edi, at the address rbp+var_44. Here var_44 is an offset: the distance between the base address in rbp and the bytes we’re interested in (IDA names it var_44 because it lies 44h bytes below rbp). This local variable is going to be argc (argument count), which contains the number of arguments passed to the program.

mov [rbp+var_50], rsi ; the program stores the second argument, held in the 64-bit register rsi, at rbp+var_50 (another local variable), which is going to be argv[] (argument vector). The full 64-bit rsi is used instead of esi because argv is a pointer, and pointers are 64 bits wide on x64.

mov rax, fs:28h ; loads the stack canary, a random value that is used to detect a stack buffer overflow before malicious code can take control of execution.

mov [rbp+var_8], rax ; the program stores the canary from the 64-bit register rax at rbp+var_8, just below the saved rbp.

xor eax, eax ; xor is commonly used to clear a register. Any value xor-ed with itself gives 0, because 1 xor 1 = 0 and 0 xor 0 = 0 for every bit. (With different operands the result is not zero: 1101 xor 0010 = 1111.) The xor form also has a shorter encoding than mov eax, 0. Why clear the register here? The canary has already been saved on the stack, so most likely the compiler is wiping the copy left in rax. Another question: why does it clear the 32-bit register eax and not the 64-bit register rax? Because on x64, writing to a 32-bit register also zeroes the upper 32 bits of the full register, so both give the same result, and the 32-bit form needs one byte less to encode.
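Here is a small Python sketch of the xor behaviour. The starting value 0xDEADBEEF is made up for the demo, and the mask only imitates the 32-bit width of eax.

```python
MASK32 = 0xFFFFFFFF          # keep values within 32 bits, like eax

eax = 0xDEADBEEF             # arbitrary leftover value (made up for the demo)
eax = (eax ^ eax) & MASK32   # xor eax, eax
print(eax)                   # a value xor-ed with itself

different = 0b1101 ^ 0b0010  # xor of two different values
print(format(different, "04b"))
```

Only identical operands cancel out completely; different operands leave a 1 in every bit position where they disagree.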

cmp [rbp+var_44], 2 ; compares the integer value argc with the value 2, that is, the program name plus one argument supplied by the user.

jz short loc_77B ; jumps to the label loc_77B if the comparison sets the zero flag (in other words, if the two values are equal). "short" means that the jump is encoded in 2 bytes, with a displacement between -128 and +127 bytes.

After this last instruction, you can see two lines: red and green. Now, let's expand it and substitute var_44 = argc and var_50 = argv:

If we follow the red line:

nop ; does nothing (no operation). We get here when the comparison doesn't result in zero, that is, when argc is not 2.

jmp short loc_765 ; jumps to the label loc_765.

It follows the blue line:

loc_765: ; the label loc_765 (a location inside main, not a separate function).

lea rdi, s ; the lea instruction loads the address of "s" into rdi, the register for the first argument. Here "s" is the string u do not know da wae.

call _puts ; calls the puts function, which displays that message on the screen.

mov eax, 0FFFFFFFFh ; copies 0FFFFFFFFh, which is -1 as a signed 32-bit integer, into the 32-bit register eax. eax holds the return value of main, so the program is going to return -1.
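To see why 0FFFFFFFFh reads as -1, the same four bytes can be reinterpreted as a signed integer. This is a minimal sketch with the standard struct module; the variable names are mine.

```python
import struct

raw = 0xFFFFFFFF                         # the immediate moved into eax

# Pack as an unsigned 32-bit integer, then read the same bytes as signed.
packed = struct.pack("<I", raw)
as_signed = struct.unpack("<i", packed)[0]

print(raw)          # the unsigned reading
print(as_signed)    # the signed (two's complement) reading
```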

jmp loc_891 ; jumps to the label loc_891.

It follows the blue line:

loc_891: ; the label loc_891.

mov rsi, [rbp+var_8] ; the canary value saved at rbp+var_8 is copied to the 64-bit register rsi.

xor rsi, fs:28h ; xors the saved copy with the original canary. The result is zero only if the two are equal, which means the saved canary was not overwritten.

jz short locret_8A5 ; jumps to the label locret_8A5 if the xor result is zero.
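The canary check can be modelled in a few lines of Python. This is only a sketch: the random 64-bit value stands in for fs:28h, and the function name canary_intact is hypothetical.

```python
import secrets


def canary_intact(saved: int, original: int) -> bool:
    """Model of xor rsi, fs:28h followed by jz: zero means 'unchanged'."""
    return (saved ^ original) == 0


original = secrets.randbits(64)     # stand-in for the value at fs:28h
saved = original                    # copy stored at rbp+var_8 on entry

print(canary_intact(saved, original))          # untouched stack

overwritten = saved ^ 0x41                     # simulate an overflow changing a byte
print(canary_intact(overwritten, original))    # modified copy
```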

If it goes to the red line:

call ___stack_chk_fail ; the canary was modified, so this function prints a message that stack smashing has been detected and aborts the program.

Else, if it goes to the green line:

locret_8A5: ; the label locret_8A5, where main returns and, in brief, the program finishes.

Now, let's go back to the end of the first block of main (the top of the image above) and follow the green line:

loc_77B: ; the label loc_77B.

mov rax, [rbp+argv] ; loads the value of argv, which is the address of the array of argument pointers, into the 64-bit register rax.

mov rax, [rax+8] ; loads the pointer stored 8 bytes into that array. Each pointer is 8 bytes, so this is argv[1], the first argument typed by the user.

mov [rbp+s], rax ; the local variable "s" receives the value of rax, so s = argv[1].

mov rax, [rbp+s] ; now, rax receives the value of "s" again (redundant, but typical of unoptimized code).

mov rdi, rax ; the 64-bit register rdi, which carries the first argument of a call, receives the value of rax.

call _strlen ; calls the strlen function, which calculates the length of a given string (in this case, the length of the string "s").

mov [rbp+var_3C], eax ; stores the return value of strlen, held in the 32-bit register eax, at rbp+var_3C.

cmp [rbp+var_3C], 23h ; compares the length stored at rbp+var_3C with 23h (35 in decimal).

jnz short loc_761 ; if the two values are not equal, jumps to loc_761.
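In Python terms, this block takes the first user argument and checks its length. The sketch below assumes argv is a plain list of strings; the function name has_expected_length and the sample arguments are made up.

```python
EXPECTED_LEN = 0x23                      # 23h from the cmp instruction


def has_expected_length(argv: list[str]) -> bool:
    """Model of s = argv[1]; strlen(s) == 23h."""
    s = argv[1]                          # mov rax, [rax+8]
    return len(s) == EXPECTED_LEN        # call _strlen; cmp ..., 23h


print(EXPECTED_LEN)                                      # the length in decimal
print(has_expected_length(["./prog", "A" * 35]))         # right length
print(has_expected_length(["./prog", "too short"]))      # wrong length
```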

If it goes to the red line:

mov [rbp+var_40], 0 ; stores 0 in the local variable var_40, which is going to be the loop counter.

jmp short loc_7CD ; the jmp instruction performs an unconditional jump. It simply jumps to another location, which in this case is the label loc_7CD.

It follows the blue line:

loc_7CD: ; the label loc_7CD.

mov eax, [rbp+var_40] ; the 32-bit register eax receives the value stored in var_40.

cmp eax, [rbp+var_3C] ; compares the 32-bit register eax with the value stored in var_3C (the string length).

jl short loc_7A5 ; if eax is less than that value (signed comparison), jumps to loc_7A5.

If it goes to the green line:

loc_7A5: ; the label loc_7A5.

mov eax, [rbp+var_40] ; the 32-bit register eax receives the value stored in var_40.

movsxd rdx, eax ; movsxd copies a dword into a qword with sign extension. In this case, the 32-bit value in eax is copied into the 64-bit register rdx, and the upper 32 bits of rdx are filled with copies of the sign bit of eax.

mov rax, [rbp+s] ; the 64-bit register rax receives the value of "s", which is the address of the string.

add rax, rdx ; adds rdx to rax, so rax now holds the address of the character at index var_40.

movzx ecx, byte ptr [rax] ; the 32-bit register ecx receives 1 byte (8 bits) read from the address held in rax, and movzx zero-extends it: the remaining 24 (32 - 8) bits are filled with 0.
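The difference between sign extension (movsxd) and zero extension (movzx) can be sketched in Python. The helper names and the sample string are hypothetical; the masks only imitate the register widths.

```python
def movsxd(dword: int) -> int:
    """Sign-extend a 32-bit value to 64 bits, like movsxd rdx, eax."""
    dword &= 0xFFFFFFFF
    if dword & 0x80000000:            # sign bit set: fill the upper half with 1s
        return dword | 0xFFFFFFFF00000000
    return dword                      # sign bit clear: upper half stays 0


def movzx_byte(memory: bytes, address: int) -> int:
    """Zero-extend one byte read from memory, like movzx ecx, byte ptr [rax]."""
    return memory[address]            # an int in 0..255, all upper bits are 0


s = b"hello"                          # hypothetical input string
i = movsxd(1)                         # a small positive counter is unchanged
print(hex(movzx_byte(s, i)))          # the byte at s[1]
print(hex(movsxd(0xFFFFFFFF)))        # -1 as a dword stays -1 as a qword
```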

mov eax, [rbp+var_40] ; the 32-bit register eax receives the value stored in var_40.

mov rax, [rbp+s] ; the 64-bit register rax receives the address of the string "s".

add rax, rdx ; adds rdx to rax, giving the address of the same character again.

xor ecx, 1 ; xors the 32-bit register ecx with 1. This flips the lowest bit of the character and leaves the other bits unchanged.
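A short Python sketch shows what xor with 1 does to a few characters. The sample characters are arbitrary.

```python
def flip_low_bit(ch: str) -> str:
    """Apply xor 1 to the character code, like xor ecx, 1."""
    return chr(ord(ch) ^ 1)


for ch in "ab01":
    print(repr(ch), format(ord(ch), "08b"), "->",
          repr(flip_low_bit(ch)), format(ord(flip_low_bit(ch)), "08b"))
```

Applying the operation twice gives back the original character, which is what makes the transformation easy to undo.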

mov edx, ecx ; the 32-bit register edx receives the value of the 32-bit register ecx.

mov [rax], dl ; the lowest byte of edx, which is the 8-bit register dl, is written to the address held in rax. The character in the string is replaced by its xor-ed value.

add [rbp+var_40], 1 ; adds 1 to the value stored in var_40.

It goes back to loc_7CD.

In brief, we can conclude that loc_7A5 is the body of a for loop: it xors each character of "s" with 1 and adds 1 to the counter var_40 until the counter reaches the string length stored in var_3C.
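The whole loop (loc_7CD plus loc_7A5) can be written as a small Python function. This is a sketch of my reading of the disassembly, not the decompiler output; the function name xor_each_byte is hypothetical.

```python
def xor_each_byte(data: bytes, key: int = 1) -> bytes:
    """Model of the loop: every byte of the string is xor-ed with 1 in place."""
    s = bytearray(data)          # the string "s", modified in place
    length = len(s)              # var_3C, the result of strlen
    i = 0                        # var_40, the loop counter
    while i < length:            # cmp eax, [rbp+var_3C]; jl loc_7A5
        s[i] ^= key              # movzx / xor ecx, 1 / mov [rax], dl
        i += 1                   # add [rbp+var_40], 1
    return bytes(s)


print(xor_each_byte(b"abc"))
```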

After loc_7CD, instead of following the green line, it follows the red line:

There is a huuuuuge run of mov instructions. All of them store byte values, in other words, chars. Basically, the program lays out all these values in memory as a string. And, in the end, it jumps to loc_764 depending on whether the final check returns zero.

Ok, now we have some idea of how the binary works in x64 assembly language. With that in mind, let's see how IDA has organized the pseudocode in C.

And, of course, we can do better:

However, the C code alone is not enough. We need to convert the bytes to a string to get the flag. That's why we use Python now.
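A minimal sketch of that conversion, assuming the byte values stored by the mov instructions are what the xor-ed input gets checked against, so that xor-ing them with 1 again recovers the expected input. The bytes below are placeholder data generated from a made-up string, NOT the real values from the binary; substitute the ones read from the disassembly.

```python
def decode(stored: bytes) -> str:
    """Undo the xor-with-1 loop and return the result as text."""
    return bytes(b ^ 1 for b in stored).decode("ascii")


# Placeholder data: produced from a made-up string, not taken from the binary.
stored = bytes(b ^ 1 for b in b"not_the_real_flag")

print(decode(stored))
```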

We got the flag!

Also, this was just one alternative way (mine) to find the flag. The challenge also gave us Python code that uses the z3 library (an SMT solver from Microsoft Research) to find the flag in another way.

Finally, same flag: