Manually Creating An ELF Executable
Robin Hoksbergen
[email protected]
http://www.robinhoksbergen.com
January 28, 2014
1 Introduction
Hello class, and welcome to X86 Masochism 101. Here, you'll learn how to use opcodes
directly to create an executable without ever actually touching a compiler, assembler
or linker. We'll be using only an editor capable of modifying binary files (i.e. a hex editor)
and chmod to make the file executable.[1]
If that doesn't turn you on, I don't know what will.
On a more serious note, this is one of those things that I personally think are a lot of fun.
Obviously, you're not going to be using this to create serious million-line programs. However,
it can give you an enormous amount of satisfaction to know that you actually understand how
this kind of thing really works on a low level. It's also cool to be able to say you wrote an
executable without ever touching a compiler or interpreter. Beyond that, there are applications
in kernel programming, reverse engineering and (perhaps unsurprisingly) compiler creation.
First of all, let's take a very quick look at how executing an ELF file actually works. A lot of
details will be left out. What's important is getting a good idea of what your PC does when
you tell it to execute an ELF binary file.
When you tell the computer to execute an ELF binary, the first thing it'll look for is the
ELF header. This header contains all sorts of important information about CPU
architecture, sections and segments of the file, and much more - we'll talk more about that
later. The header also contains information that helps the computer identify the file as ELF.
Most importantly, the ELF header contains information about the program header table in the
case of an executable, and the virtual address to which the computer transfers control upon
execution.
The program header table, in turn, defines several segments in program headers. If you've
ever programmed in assembly, you can think of some of the sections such as text and data as
segments in an executable. The program headers also define where the data of these segments
are in the actual file, and what virtual memory address to assign to them.
[1] Theoretically, you don't even need to use chmod or a similar command if your configuration
files contain stupid umask values. Don't do that, though.
If everything's been done correctly, the computer loads all segments into virtual memory based
on the data in the program headers, then transfers control to the virtual memory address
assigned in the ELF header, and starts executing instructions.
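
The loading sequence described above can be sketched in a few lines of Python. This is only an illustration of the order in which the fields are read, not how any real kernel is written; the function name `describe_elf` and the struct layouts are assumptions made for this sketch (32-bit, little-endian ELF only).

```python
import struct

# 32-bit little-endian ELF header: e_ident(16) followed by 13 fixed-size fields.
ELF_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")
# 32-bit program header: eight 4-byte fields.
PROGRAM_HEADER = struct.Struct("<8I")
PT_LOAD = 1


def describe_elf(image: bytes):
    """Return (entry point, [(file offset, virtual address, file size), ...])."""
    (ident, _type, _machine, _version, entry, phoff, _shoff, _flags,
     _ehsize, phentsize, phnum, _shentsize, _shnum, _shstrndx) = \
        ELF_HEADER.unpack_from(image, 0)
    # Step 1: identify the file as ELF.
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Step 2: walk the program header table and collect loadable segments.
    segments = []
    for index in range(phnum):
        (p_type, p_offset, p_vaddr, _paddr, p_filesz, _memsz,
         _pflags, _align) = PROGRAM_HEADER.unpack_from(image, phoff + index * phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz))
    # Step 3: a real loader would map the segments and jump to `entry`.
    return entry, segments
```

Calling `describe_elf(open(path, "rb").read())` on a 32-bit little-endian executable returns the address execution starts at and the list of segments that would be mapped.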
Before we begin with the practical stuff, please make sure you've got an actual hex editor
on your computer, and that you can execute ELF binaries and are on an x86 machine. Most
hex editors should work as long as they actually allow you to edit and save your work - I
personally like Bless. If you're on Linux, you should be fine as far as ELF binaries are concerned.
Some other Unix-like operating systems might work, too, but different OSes implement things
in slightly different ways, so I cannot be sure. I also use system calls extensively, which further
limits compatibility. If you're on Windows, you're out of luck. Likewise if your CPU
architecture is anything other than x86 (though x86_64 should work), since I simply cannot provide
opcodes for each and every architecture out there.
There are three phases to creating an ELF executable. First, we'll construct the actual payload
using opcodes. Second, we'll build the ELF and program headers to turn this payload into a
working program. Finally, we'll make sure all offsets and virtual addresses are correct and fill
in the final blanks.
A word of warning: constructing an ELF executable by hand can be extremely frustrating.
I've provided an example binary myself which you can use to compare your work to, but keep in
mind that there is no compiler or linker to tell you what you've done wrong. If (read: when) you
screw up, all your computer will tell you is "I/O Error" or "Segmentation Fault", which makes
these programs extremely hard to debug. No debugging symbols for you!
2 Constructing the Payload
Let's try to keep the payload simple but sufficiently challenging to be interesting. Our payload
should put "Hello World!" on the screen, then exit with code 93. This is harder than it looks.
We'll need both a text segment (containing executable instructions) and a data segment
(containing the "Hello World!" string and some other minor data). Let's take a look at the assembly
code we need to achieve this:
(text segment)
mov ebx, 1
mov eax, 4
mov ecx, HWADDR
mov edx, HWLEN
int 0x80
mov eax, 1
mov ebx, 0x5D
int 0x80
The code above isn't too hard, even if you've never done much assembly. Interrupt 0x80 is used
to make system calls. The value in EAX tells the kernel which call it is (4 is write, 1 is exit),
and EBX, ECX and EDX carry its arguments: for write, the file descriptor (1 is standard output),
the address of the string and its length; for exit, the exit code. You can get a more
comprehensive reference of the system calls and their values in assembly at
http://syscalls.kernelgrok.com.
For our payload, we'll need to convert these instructions to hexadecimal opcodes. Luckily,
there are good online references that help us do just that. Try to find one for the x86 family,
and see if you can figure out how to go from the above code to the hex codes below:
0xBB 0x01 0x00 0x00 0x00
0xB8 0x04 0x00 0x00 0x00
0xB9 0x** 0x** 0x** 0x**
0xBA 0x0D 0x00 0x00 0x00
0xCD 0x80
0xB8 0x01 0x00 0x00 0x00
0xBB 0x5D 0x00 0x00 0x00
0xCD 0x80
(The *s denote virtual addresses. We don't know these yet, so we'll leave them blank for now.[2])
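
If you'd like to check your hand-assembled bytes, the two instruction forms used here are easy to encode programmatically: `mov r32, imm32` is the opcode byte 0xB8 plus the register number, followed by the 32-bit value in little-endian order, and `int imm8` is 0xCD followed by the interrupt number. The following is a minimal sketch; the helper names (`mov_imm32`, `int_imm8`, `text_segment`) are made up for this example, and 0xDEADBEEF stands in for the address we don't know yet.

```python
import struct

# Register numbers as used by the "mov r32, imm32" encoding (opcode 0xB8 + number).
REGISTERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}


def mov_imm32(register: str, value: int) -> bytes:
    """Encode `mov <register>, <value>`: one opcode byte, then a little-endian imm32."""
    return bytes([0xB8 + REGISTERS[register]]) + struct.pack("<I", value)


def int_imm8(vector: int) -> bytes:
    """Encode `int <vector>`: opcode 0xCD followed by the vector byte."""
    return bytes([0xCD, vector])


def text_segment(hwaddr: int, hwlen: int = 0x0D) -> bytes:
    """Assemble the payload's text segment for a given string address and length."""
    return b"".join([
        mov_imm32("ebx", 1),       # file descriptor 1 (standard output)
        mov_imm32("eax", 4),       # system call 4 (write)
        mov_imm32("ecx", hwaddr),  # address of the string
        mov_imm32("edx", hwlen),   # length of the string
        int_imm8(0x80),
        mov_imm32("eax", 1),       # system call 1 (exit)
        mov_imm32("ebx", 0x5D),    # exit code 93
        int_imm8(0x80),
    ])


# Placeholder address until the real one is known.
print(" ".join(f"0x{byte:02X}" for byte in text_segment(0xDEADBEEF)))
```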
The second part of the payload consists of the data segment, which is actually just the string
"Hello World!\n". Use a nice ASCII conversion table (man ascii, anyone?) to convert these values
to hex, and you'll see that we'll get the following data:
(data segment)
0x48 0x65 0x6C 0x6C 0x6F 0x20 0x57 0x6F 0x72 0x6C 0x64 0x21 0x0A
And there's our final payload!
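
As a quick cross-check of the ASCII conversion, here is a small sketch that prints the string in the same notation as the listing above. The helper name `hex_listing` is an assumption for this example.

```python
def hex_listing(data: bytes) -> str:
    """Format bytes as space-separated 0xNN values, as used in this guide."""
    return " ".join(f"0x{byte:02X}" for byte in data)


DATA_SEGMENT = "Hello World!\n".encode("ascii")

print(hex_listing(DATA_SEGMENT))
print(len(DATA_SEGMENT), "bytes")  # this length is the HWLEN value loaded into EDX
```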
3 Building the Headers
This is where it can get very complicated very quickly. I'll explain some of the more important
parameters in the process of building the headers, but you'll probably want to take a good
look at the ELF reference at http://www.skyfree.org/linux/references/ELF_Format.pdf
if you're ever going to build ELF headers completely by yourself.
An ELF header has the following structure, byte size between parentheses:
e_ident(16), e_type(2), e_machine(2), e_version(4), e_entry(4), e_phoff(4),
e_shoff(4), e_flags(4), e_ehsize(2), e_phentsize(2), e_phnum(2), e_shentsize(2),
e_shnum(2), e_shstrndx(2)
Now we'll fill in the structure, and I'll explain a bit more about these parameters where
appropriate. You can always check the reference I linked to before if you want to find out more.
e_ident(16) This parameter contains the first 16 bytes of information that identifies the file
as an ELF file. The first four bytes always hold 0x7F, 'E', 'L', 'F'. Bytes five to seven
all contain 0x01 for 32-bit binaries on little-endian machines: they select the 32-bit class,
little-endian data encoding and ELF version 1, respectively. The remaining bytes are
padding. Bytes eight to fifteen can be 0x00; this guide puts 0x10 (the length of this block,
16) in the sixteenth byte, which is harmless, although the format reserves that byte as
padding too and readers ignore it.
e_type(2) Set it to 0x02 0x00. This basically tells the computer that it's an executable ELF
file.
[2] Your hex editor may not allow adding garbage data that isn't actually in hex to your files.
If so, it's preferable to use magic hex numbers to denote "This should be changed later", such
as 0xDEADBEEF or 0xFEEDFACE.
e_machine(2) Set it to 0x03 0x00, which tells the computer that the ELF file has been created
to run on i386 type processors.
e_version(4) Set it to 0x01 0x00 0x00 0x00.
e_entry(4) Transfer control to this virtual address on execution. We haven't determined this
yet, so it's 0x** 0x** 0x** 0x** for now.
e_phoff(4) Offset from the start of the file to the program header table. We put it right after
the ELF header, so that's the size of the ELF header in bytes: 0x34 0x00 0x00 0x00.
e_shoff(4) Offset from the start of the file to the section header table. We don't need this.
0x00 0x00 0x00 0x00 it is.
e_flags(4) We don't need flags, either. 0x00 0x00 0x00 0x00 again.
e_ehsize(2) Size of the ELF header, so holds 0x34 0x00.
e_phentsize(2) Size of a program header. Technically, we don't know this yet, but I can
already tell you that it should hold 0x20 0x00. Scroll down to check, if you like.
e_phnum(2) Number of program headers, which directly corresponds to the number of
segments in the file. We want a text and a data segment, so this should be 0x02 0x00.
e_shentsize(2), e_shnum(2), e_shstrndx(2) None of these are really relevant if we're not
implementing section headers (which we aren't), so you can simply set them to 0x00 0x00
0x00 0x00 0x00 0x00.
And that's the ELF header! It's the first thing in the file, and if you've done everything correctly
the final header should look like this in hex:
0x7F 0x45 0x4C 0x46 0x01 0x01 0x01 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x10 0x02 0x00 0x03 0x00 0x01 0x00 0x00 0x00
0x** 0x** 0x** 0x** 0x34 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x00 0x00 0x00 0x00 0x34 0x00 0x20 0x00 0x02 0x00 0x00 0x00
0x00 0x00 0x00 0x00
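
To double-check the field sizes and byte order, the same header can be packed with Python's `struct` module. This is only a sketch for verification: the function name `elf_header` is made up, and 0xDEADBEEF is a stand-in for the entry point we haven't chosen yet.

```python
import struct

# e_ident as used in this guide: magic, three 0x01 bytes, eight zero bytes, then 0x10.
E_IDENT = b"\x7fELF" + bytes([1, 1, 1]) + bytes(8) + bytes([0x10])

# "<" means little-endian without padding; s = raw bytes, H = 2 bytes, I = 4 bytes.
ELF_HEADER_FORMAT = "<16sHHIIIIIHHHHHH"


def elf_header(entry: int) -> bytes:
    """Pack the 52-byte ELF header described above for a given entry point."""
    return struct.pack(
        ELF_HEADER_FORMAT,
        E_IDENT,
        2,        # e_type: executable
        3,        # e_machine: i386
        1,        # e_version
        entry,    # e_entry
        0x34,     # e_phoff: program headers follow the ELF header
        0,        # e_shoff: no section header table
        0,        # e_flags
        0x34,     # e_ehsize
        0x20,     # e_phentsize
        2,        # e_phnum: text and data
        0, 0, 0,  # e_shentsize, e_shnum, e_shstrndx
    )


header = elf_header(0xDEADBEEF)
print(len(header))  # 52, i.e. 0x34
print(header.hex(" "))
```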
We're not done with the headers, though. We now need to build the program header table, too.
A program header has the following entries:
p_type(4), p_offset(4), p_vaddr(4), p_paddr(4), p_filesz(4), p_memsz(4),
p_flags(4), p_align(4)
Again, I'll fill in the structure (twice, this time: once for the text segment, once for the data
segment) and explain a number of things on the way:
p_type(4) Tells the computer about the type of segment. Both text and data use PT_LOAD
(=0x01 0x00 0x00 0x00) here.
p_offset(4) Offset of the segment from the beginning of the file. These values depend on how
big the headers and segments are, since we don't want any overlap there. Keep at
0x** 0x** 0x** 0x** for now.
p_vaddr(4) What virtual address to assign to the segment. Keep at 0x** 0x** 0x** 0x** for
now, we'll talk more about it next.
p_paddr(4) Physical addressing is irrelevant, so you may put 0x00 0x00 0x00 0x00 here.
p_filesz(4) Number of bytes in the file image of the segment; must be larger than or equal to
the size of the payload in the segment. Again, set it to 0x** 0x** 0x** 0x**. We'll change this later.
p_memsz(4) Number of bytes in the memory image of the segment. Note that this doesn't necessarily
equal p_filesz, but it may as well in this case. Keep it at 0x** 0x** 0x** 0x** for now,
but remember that we can later set it to the same value we assign to p_filesz.
p_flags(4) These flags can be tricky if you're not used to working with them. What you need
to remember is that READ permission is 0x04, WRITE permission is 0x02, and EXEC
permission is 0x01. For the text segment we want READ+EXEC, so 0x05 0x00 0x00
0x00, and for the data segment we prefer READ+WRITE+EXEC, so 0x07 0x00 0x00
0x00.
p_align(4) Handles alignment to memory pages. Pages are generally 4 KiB in size, so the value
should be 0x1000. Remember, x86 is little-endian, so the final value is 0x00 0x10 0x00
0x00.
Whew. We've certainly done a lot now. We haven't yet filled in many of the fields in the
program headers, and we're missing a few bytes in the ELF header, too, but we're getting there.
If everything went as planned, your program header table (which you can paste directly behind
the ELF header, by the way - remember our offset in that header?) should look something like
this:
0x01 0x00 0x00 0x00 0x** 0x** 0x** 0x** 0x** 0x** 0x** 0x**
0x00 0x00 0x00 0x00 0x** 0x** 0x** 0x** 0x** 0x** 0x** 0x**
0x05 0x00 0x00 0x00 0x00 0x10 0x00 0x00
0x01 0x00 0x00 0x00 0x** 0x** 0x** 0x** 0x** 0x** 0x** 0x**
0x00 0x00 0x00 0x00 0x** 0x** 0x** 0x** 0x** 0x** 0x** 0x**
0x07 0x00 0x00 0x00 0x00 0x10 0x00 0x00
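
The permission flags are plain bit masks that get OR-ed together, and each program header is eight 4-byte little-endian fields. Here is a minimal sketch of both ideas; the names `PF_R`, `PF_W` and `PF_X` follow the usual ELF naming, while `program_header` is a helper invented for this example and the zero offsets and sizes are placeholders.

```python
import struct

PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # EXEC, WRITE, READ
PAGE_SIZE = 0x1000                # 4 KiB


def program_header(offset: int, vaddr: int, size: int, flags: int) -> bytes:
    """Pack one 32-byte program header; p_memsz is set equal to p_filesz."""
    return struct.pack(
        "<8I",
        PT_LOAD,    # p_type
        offset,     # p_offset
        vaddr,      # p_vaddr
        0,          # p_paddr (irrelevant here)
        size,       # p_filesz
        size,       # p_memsz
        flags,      # p_flags
        PAGE_SIZE,  # p_align
    )


text_flags = PF_R | PF_X         # 0x05
data_flags = PF_R | PF_W | PF_X  # 0x07

# Offsets, addresses and sizes are still unknown, so zeros stand in for them.
table = program_header(0, 0, 0, text_flags) + program_header(0, 0, 0, data_flags)
print(len(table))  # 64 bytes: two headers of 0x20 bytes each
```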
4 Filling in the Blanks
Although we've finished most of the hard work by now, there are still some tricky things we
need to do. We've got an ELF header and program table we can place at the beginning of our
file, and we've got the payload for our actual program, but we still need to put something in the
table that tells the computer where to find this payload, and we need to position our payload
in the file so it can actually be found.
First, we'll want to calculate the size of our headers and payload before we can determine
any offsets. Simply add the sizes of all fields in the headers together and that's the minimal
offset for any of the segments. There are 116 bytes in the ELF header + 2 program headers
(52 + 2 * 32), and 116 = 0x74, so the minimum offset is 0x74. To stay on the safe side, let's put
the initial offset at 0x80. Fill 0x74 to 0x7F with 0x00, then put the text segment at 0x80 in the file.
The text segment itself is 34 = 0x22 bytes, which means the minimal offset for the data segment
is 0x80 + 0x22 = 0xA2. Let's put the data segment at 0xA4 and fill 0xA2 and 0xA3 with 0x00.
If you've been doing all the above in your hex editor, you will now have a binary file that
contains the ELF and program headers from 0x00 to 0x73, 0x74 to 0x7F will be filled with
zeroes, the text segment is placed from 0x80 to 0xA1, 0xA2 and 0xA3 are zeroes again, and
the data segment (13 bytes) goes from 0xA4 to 0xB0. If you're following these instructions and
that's not what you've got, now would be a good time to see what went wrong.
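
The offset arithmetic above is easy to get wrong by one, so here is a short sketch that recomputes it. Rounding the text offset up to a multiple of 16 and the data offset up to a multiple of 4 is an assumption made here to reproduce the "safe side" choices of 0x80 and 0xA4; the `align_up` helper is likewise made up for this example.

```python
ELF_HEADER_SIZE = 0x34      # 52 bytes
PROGRAM_HEADER_SIZE = 0x20  # 32 bytes each, two of them
TEXT_SIZE = 34              # bytes of opcodes
DATA_SIZE = 13              # bytes in "Hello World!\n"


def align_up(value: int, boundary: int) -> int:
    """Round value up to the next multiple of boundary."""
    return (value + boundary - 1) // boundary * boundary


headers_end = ELF_HEADER_SIZE + 2 * PROGRAM_HEADER_SIZE  # 0x74
text_offset = align_up(headers_end, 16)                  # 0x80
text_end = text_offset + TEXT_SIZE                       # 0xA2
data_offset = align_up(text_end, 4)                      # 0xA4
file_size = data_offset + DATA_SIZE                      # 0xB1, last byte at 0xB0

for name, value in [("headers_end", headers_end), ("text_offset", text_offset),
                    ("text_end", text_end), ("data_offset", data_offset),
                    ("file_size", file_size)]:
    print(f"{name:12} = {value:3d} = 0x{value:02X}")
```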
Assuming everything's now in the right place in the file, it's time to change some of our previous
*s into actual values. I'm simply going to give you the values for each parameter first, and then
explain why we're using those particular values.
e_entry(4) - 0x80 0x80 0x04 0x08. We'll choose 0x8048080 as our entry point in virtual memory.
There are some rules about what you can and cannot choose as an entry point, but the most
important thing to remember is that a starting virtual memory address modulo page size must
be equal to the offset in the file modulo page size. You can check the ELF reference and some
other good books for more information, but if it seems too complicated, just forget about it and
use these values.
p_offset(4) - 0x80 0x00 0x00 0x00 for text, 0xA4 0x00 0x00 0x00 for data. This is for the
obvious reason that that's where these segments are in the file.
p_vaddr(4) - 0x80 0x80 0x04 0x08 for text, 0xA4 0x80 0x04 0x08 for data. We want the text
segment to be the entry point for the program, and we're placing the data segment in memory in
such a way that its address is directly congruent to its offset in the file.
p_filesz(4) - 0x24 0x00 0x00 0x00 for text, 0x20 0x00 0x00 0x00 for data. These only need to be
at least as large as the payloads in the file (0x22 bytes of text, 0x0D bytes of data), so they
are rounded up a little. In this case, p_memsz = p_filesz, so use those same values there.
HWADDR (the four bytes after 0xB9 in the text segment) - 0xA4 0x80 0x04 0x08. This is the
virtual address of the string, i.e. the p_vaddr of the data segment.
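
The rule about addresses and offsets can be checked mechanically. The sketch below tests the congruence for both segments and prints the addresses in the little-endian byte order you type into the hex editor; the helper names `is_congruent` and `le32` are assumptions for this example.

```python
import struct

PAGE_SIZE = 0x1000  # 4 KiB, the same value used for p_align


def is_congruent(vaddr: int, offset: int, page_size: int = PAGE_SIZE) -> bool:
    """True when virtual address and file offset are equal modulo the page size."""
    return vaddr % page_size == offset % page_size


def le32(value: int) -> str:
    """Format a 32-bit value as the little-endian byte listing used in this guide."""
    return " ".join(f"0x{byte:02X}" for byte in struct.pack("<I", value))


SEGMENTS = {
    "text": (0x08048080, 0x80),  # (p_vaddr, p_offset)
    "data": (0x080480A4, 0xA4),
}

for name, (vaddr, offset) in SEGMENTS.items():
    print(name, le32(vaddr), le32(offset), is_congruent(vaddr, offset))
```

An address such as 0x08048000 paired with file offset 0x80 would fail this check, because the two differ modulo 0x1000.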
5 The Result
Assuming you followed everything to the letter, this is what you would get if you dumped out
everything in hex:
7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 10 02 00 03 00
01 00 00 00 80 80 04 08 34 00 00 00 00 00 00 00 00 00 00 00
34 00 20 00 02 00 00 00 00 00 00 00 01 00 00 00 80 00 00 00
80 80 04 08 00 00 00 00 24 00 00 00 24 00 00 00 05 00 00 00
00 10 00 00 01 00 00 00 A4 00 00 00 A4 80 04 08 00 00 00 00
20 00 00 00 20 00 00 00 07 00 00 00 00 10 00 00 00 00 00 00
00 00 00 00 00 00 00 00 BB 01 00 00 00 B8 04 00 00 00 B9 A4
80 04 08 BA 0D 00 00 00 CD 80 B8 01 00 00 00 BB 5D 00 00 00
CD 80 00 00 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 0A
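
If you want something to compare your hand-typed file against, the whole layout can also be generated. The following is a sketch under the assumptions of this guide (32-bit x86 Linux, exit code 0x5D as in the assembly of section 2); `build_image` and `write_executable` are names made up for this example, and using a script like this of course defeats the point of doing it by hand.

```python
import os
import stat
import struct

BASE = 0x08048000                  # virtual address of file offset 0
TEXT_OFFSET, DATA_OFFSET = 0x80, 0xA4
DATA = b"Hello World!\n"


def build_image() -> bytes:
    """Return the complete file: headers, padding, text segment, padding, data."""
    text = (
        b"\xBB" + struct.pack("<I", 1)                     # mov ebx, 1
        + b"\xB8" + struct.pack("<I", 4)                   # mov eax, 4
        + b"\xB9" + struct.pack("<I", BASE + DATA_OFFSET)  # mov ecx, HWADDR
        + b"\xBA" + struct.pack("<I", len(DATA))           # mov edx, HWLEN
        + b"\xCD\x80"                                      # int 0x80
        + b"\xB8" + struct.pack("<I", 1)                   # mov eax, 1
        + b"\xBB" + struct.pack("<I", 0x5D)                # mov ebx, 0x5D
        + b"\xCD\x80"                                      # int 0x80
    )
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(8) + bytes([0x10])
    header = struct.pack("<16sHHIIIIIHHHHHH", ident, 2, 3, 1,
                         BASE + TEXT_OFFSET, 0x34, 0, 0, 0x34, 0x20, 2, 0, 0, 0)
    text_ph = struct.pack("<8I", 1, TEXT_OFFSET, BASE + TEXT_OFFSET, 0,
                          0x24, 0x24, 0x05, 0x1000)
    data_ph = struct.pack("<8I", 1, DATA_OFFSET, BASE + DATA_OFFSET, 0,
                          0x20, 0x20, 0x07, 0x1000)
    image = header + text_ph + data_ph
    image = image.ljust(TEXT_OFFSET, b"\x00") + text  # zero-fill 0x74..0x7F
    image = image.ljust(DATA_OFFSET, b"\x00") + DATA  # zero-fill 0xA2..0xA3
    return image


def write_executable(path: str) -> None:
    """Write the image to path and set the owner's execute bit (like chmod u+x)."""
    with open(path, "wb") as handle:
        handle.write(build_image())
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)


print(build_image().hex(" "))
```

Calling `write_executable("hello")` writes the file; whether it runs depends on your system being able to execute 32-bit x86 Linux binaries.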
That's it. Run chmod +x on the binary file and then execute it. Hello World in 177 bytes.[3] I
hope you enjoyed writing it. :-) If you thought this HOWTO was useful or interesting, let me
know! I always appreciate getting an email. Tips, comments and/or constructive criticisms are
always welcome, too.
[3] You could of course go much smaller than that, but that's something for another post.