3. Encoding of Shellcode

In the previous module, we introduced NULL-free shellcode. Shellcode is generally encoded because most vulnerabilities place some restriction on the data used in the overflow.

Consider the following snippet:

#include <cstdio>
#include <cstring>

int main(int argc, char *argv[])
{
  // Start from an empty string so strcat has a terminator to append after.
  char StringToPrint[20] = "";
  char string1[] = "\x41\x41\x41";
  char string2[] = "\x42\x42\x42\x43\x43\x43";

  strcat(StringToPrint, string1);
  strcat(StringToPrint, string2);
  printf("%s", StringToPrint);

  return 0;
}

The code simply concatenates the two variables string1 and string2 into StringToPrint.

If everything works fine, when printf gets executed the program should print the string "AAABBBCCC".

C string functions process data only until they reach a NULL (0x00) byte. If the string2 variable contained the NULL character \x00, the strcat function would copy only the data before it. Let's try to edit string2 by adding a NULL character between \x42 and \x43.

Our code should look like this:

char string2[] = "\x42\x42\x42\x00\x43\x43\x43";

If we compile and execute the program, we will see that only part of the string is printed: AAABBB instead of AAABBBCCC.

As you can see, if our shellcode contains a NULL character and the vulnerable program copies it with a string function such as strcat, everything after the NULL byte is dropped and the shellcode won't work.
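The truncation can also be measured directly. The following is a minimal sketch, not part of the original module: it compares how many bytes are stored in the array with how many bytes a C string function would see. The names kPayload, bytes_stored and bytes_seen_by_string_functions are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Same bytes as the modified string2: a NULL sits between 0x42 and 0x43.
static const char kPayload[] = "\x42\x42\x42\x00\x43\x43\x43";

// Bytes actually stored in the array, excluding the terminator that the
// compiler appends to every string literal.
std::size_t bytes_stored() {
    return sizeof(kPayload) - 1;
}

// Bytes that strlen (and therefore strcat or strcpy) would process:
// counting stops at the first 0x00 byte.
std::size_t bytes_seen_by_string_functions() {
    return std::strlen(kPayload);
}
```

Here bytes_stored() returns 7, while bytes_seen_by_string_functions() returns 3, so the four bytes from the NULL onwards never reach the destination buffer.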

Shellcode should be NULL-free to guarantee that it is copied in full and can execute. There are several types of shellcode encoding:

  • Null-free Encoding

  • Alphanumeric and printable encoding

Encoding a shellcode that contains NULL bytes means replacing the machine instructions that contain zero bytes with instructions that contain none but achieve the same result.
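Before rewriting instructions, it helps to know where the zero bytes are. The following is a small sketch of such a check, under the assumption that the machine code is available as a byte vector; the function name first_null_offset is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the offset of the first 0x00 byte in the buffer,
// or -1 if the buffer contains no NULL bytes at all.
long first_null_offset(const std::vector<std::uint8_t>& code) {
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i] == 0x00) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

For the five bytes B8 00 00 00 00 the function returns 1, while for the two bytes 33 C0 it returns -1, meaning the second sequence is NULL-free.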

Let's see an example. Let's say you want to initialize a register to zero. We have different alternatives:

Machine Code     Assembly               Comment
B8 00000000      MOV EAX,0              Set EAX to 0
33 C0            XOR EAX,EAX            Set EAX to zero
B8 78563412      MOV EAX,0x12345678     This also sets EAX to 0
2D 78563412      SUB EAX,0x12345678     (together with the MOV above)

From this, you should notice that the first instruction (MOV EAX, 0) should be avoided because it has 00 within its machine code representation.
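The equivalence of these alternatives can be checked with ordinary 32-bit arithmetic. The sketch below only models the register value in C++ and does not execute any machine code; the function names xor_self, mov_then_sub and immediate_is_null_free are hypothetical.

```cpp
#include <cstdint>

// Models "XOR EAX,EAX": any value XORed with itself gives 0.
std::uint32_t xor_self(std::uint32_t eax) {
    return eax ^ eax;
}

// Models "MOV EAX,imm" followed by "SUB EAX,imm".
std::uint32_t mov_then_sub(std::uint32_t imm) {
    std::uint32_t eax = imm;  // MOV EAX,imm
    eax -= imm;               // SUB EAX,imm
    return eax;
}

// True if none of the four bytes of a 32-bit immediate is 0x00,
// i.e. the immediate adds no NULL byte to the machine code.
bool immediate_is_null_free(std::uint32_t imm) {
    for (int shift = 0; shift < 32; shift += 8) {
        if (((imm >> shift) & 0xFFu) == 0) {
            return false;
        }
    }
    return true;
}
```

Both xor_self and mov_then_sub(0x12345678) return 0. The immediate 0x12345678 passes immediate_is_null_free, whereas the immediate 0 used by MOV EAX,0 does not.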

Sometimes, the target process filters out all non-alphanumeric bytes from the data. In such cases, alphanumeric shellcode is used; however, the set of usable instructions then becomes very limited. To work around this limitation, Self-modifying Code (SMC) is used.

In this case, the encoded shellcode is prepended with a small decoder (which must itself be valid alphanumeric shellcode) that, on execution, decodes and then executes the main body of the shellcode.
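To make the alphanumeric restriction concrete, here is a minimal sketch of the kind of check such a filter might apply. It assumes "alphanumeric" means the ASCII ranges 0-9, A-Z and a-z, which is an assumption since a real filter may accept a different set; the names is_alnum_byte and is_alphanumeric are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// True if the byte is an ASCII digit or letter (0-9, A-Z, a-z).
bool is_alnum_byte(std::uint8_t b) {
    return (b >= '0' && b <= '9') ||
           (b >= 'A' && b <= 'Z') ||
           (b >= 'a' && b <= 'z');
}

// True only if every byte in the buffer would pass the filter.
bool is_alphanumeric(const std::vector<std::uint8_t>& data) {
    for (std::uint8_t b : data) {
        if (!is_alnum_byte(b)) {
            return false;
        }
    }
    return true;
}
```

Under this assumption, the bytes 41 42 31 ("AB1") pass, while 33 C0 (XOR EAX,EAX) is rejected because C0 is outside the allowed ranges. This shows how few instruction encodings remain available.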
