# Encoding of Shellcode

### 3. Encoding of Shellcode

In the previous module, we introduced the concept of NULL-free shellcode. Shellcode is generally encoded because most vulnerabilities impose some restriction on the data that overflows the buffer.

Consider the following snippet:

```
#include <cstdio>
#include <cstring>

int main(int argc, char *argv[])
{
  // Start with an empty string: strcat needs a NULL-terminated destination.
  char StringToPrint[20] = "";
  char string1[] = "\x41\x41\x41";             // "AAA"
  char string2[] = "\x42\x42\x42\x43\x43\x43"; // "BBBCCC"

  strcat(StringToPrint, string1);
  strcat(StringToPrint, string2);
  printf("%s", StringToPrint);

  return 0;
}
```

The code simply concatenates the two variables **string1** and **string2** into **StringToPrint**.

If everything works as expected, the program prints the string `AAABBBCCC` when **printf** gets executed.

C string functions process data until they find a **NULL** (**0**) byte. If the **string2** variable contained the **NULL** character **\x00**, the **strcat** function would copy only the data before it. Let's edit **string2** by adding a **NULL** character between **\x42** and **\x43**.

Our code should look like this:

```
char string2[] =  "\x42\x42\x42\x00\x43\x43\x43";
```

If we compile and execute the program, we will see that only part of the string is printed: `AAABBB` instead of `AAABBBCCC`.

As you can see, if our shellcode contains a **NULL** character, it will not work when the vulnerable program copies it with a function such as **strcat**: everything after the **NULL** byte is dropped.
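The truncation can also be measured directly. The following is a minimal sketch, not part of the original program: the helper name `bytes_copied_by_string_function` is hypothetical, and it only reports how many bytes of a buffer come before the first **NULL** byte, which is the part a C string function would copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the original snippet): returns how many bytes
// of `data` a C string function such as strcat would copy, that is, the number
// of bytes located before the first 0x00 byte.
std::size_t bytes_copied_by_string_function(const char *data, std::size_t size)
{
    assert(data != nullptr);

    // Unlike strlen, memchr takes an explicit size, so the caller decides
    // how far the search goes instead of relying on a terminator.
    const void *first_null = std::memchr(data, 0x00, size);
    if (first_null == nullptr) {
        return size; // No NULL byte: the whole buffer would be copied.
    }
    return static_cast<std::size_t>(static_cast<const char *>(first_null) - data);
}
```

Called on the 7 bytes of the modified **string2**, the helper returns 3, matching the three `B` characters that reach the output; called on the 6 bytes of the original **string2**, it returns 6.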

**Shellcode should be NULL-free to guarantee its execution**. There are several types of shellcode encoding:

* Null-free Encoding
* Alphanumeric and printable encoding

Encoding a shellcode that contains **NULL** bytes means replacing machine instructions that contain zero bytes with instructions that do not contain them but achieve the same task.

Let's see an example. Let's say you want to initialize a register to zero. We have different alternatives:

| Machine Code | Assembly            | Comment                                        |
| ------------ | ------------------- | ---------------------------------------------- |
| B8 00000000  | MOV EAX,0           | Set EAX to 0                                   |
| 33 C0        | XOR EAX,EAX         | Set EAX to 0                                   |
| B8 78563412  | MOV EAX,0x12345678  | Load a value into EAX (first of two steps)     |
| 2D 78563412  | SUB EAX,0x12345678  | Subtract the same value, which sets EAX to 0   |

From this, you should notice that the first instruction (**MOV EAX, 0**) should be avoided because it has **00** within its machine code representation.
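The same check can be done mechanically. Below is a minimal sketch that scans the byte sequences from the table for **00** bytes; the helper name `is_null_free` and the variable names are hypothetical and chosen only for this illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true when no byte in the sequence is 0x00.
bool is_null_free(const std::vector<std::uint8_t> &code)
{
    for (std::uint8_t byte : code) {
        if (byte == 0x00) {
            return false;
        }
    }
    return true;
}

// Byte sequences taken from the table above.
const std::vector<std::uint8_t> mov_eax_zero = {0xB8, 0x00, 0x00, 0x00, 0x00}; // MOV EAX,0
const std::vector<std::uint8_t> xor_eax_eax  = {0x33, 0xC0};                   // XOR EAX,EAX
const std::vector<std::uint8_t> mov_then_sub = {
    0xB8, 0x78, 0x56, 0x34, 0x12, // MOV EAX,0x12345678
    0x2D, 0x78, 0x56, 0x34, 0x12  // SUB EAX,0x12345678
};
```

With these definitions, `is_null_free(mov_eax_zero)` is false, while `is_null_free(xor_eax_eax)` and `is_null_free(mov_then_sub)` are both true.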

Sometimes, the target process filters out all non-alphanumeric bytes from the data. In such cases, alphanumeric shellcode is used; however, the set of usable instructions then becomes very limited. To work around this limitation, **Self-modifying Code (SMC)** is used.

In this case, the encoded shellcode is prepended with a small decoder (which itself has to be valid alphanumeric shellcode) that, on execution, decodes and then executes the main body of the shellcode.
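The encode-then-decode idea can be illustrated on plain data. The sketch below is an assumption-laden simplification: the text does not name a specific encoding scheme, so a single-byte XOR is used here only because it is easy to follow, and it targets the NULL-free restriction rather than the alphanumeric one. The names `Bytes`, `xor_transform`, and `find_null_free_key` are hypothetical, and the code operates on an ordinary byte buffer without executing anything.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// XOR every byte with the key. Applying the same transform twice restores
// the input, so this one function serves as both encoder and decoder.
Bytes xor_transform(const Bytes &input, std::uint8_t key)
{
    Bytes output;
    output.reserve(input.size());
    for (std::uint8_t byte : input) {
        output.push_back(static_cast<std::uint8_t>(byte ^ key));
    }
    return output;
}

// Finds a key in 1..255 for which the encoded output contains no 0x00 byte.
// Since (byte ^ key) == 0 only when byte == key, any nonzero value that does
// not occur in the input is suitable. Returns -1 when no such key exists.
int find_null_free_key(const Bytes &input)
{
    bool present[256] = {false};
    for (std::uint8_t byte : input) {
        present[byte] = true;
    }
    for (int key = 1; key < 256; ++key) {
        if (!present[key]) {
            return key;
        }
    }
    return -1;
}
```

For the bytes `B8 00 00 00 00` of **MOV EAX, 0**, the first suitable key is `0x01`, giving the encoded bytes `B9 01 01 01 01`, which contain no **NULL** byte; applying `xor_transform` again with the same key restores the original bytes. In a real self-modifying shellcode, the decoding step is performed by the prepended decoder at run time.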
