As you may know, we recently brought Rolf Rolles on board the team here at Exodus. We all met at our Austin office and Rolf spent a week working alongside us. Our interview process doesn’t consist of contrived questions intended to observe the interviewee’s capacity for mental acrobatics. Traditionally, when we bring someone in for consideration we are already familiar with their past work and skillset. What we are more interested in is evaluating their capacity to work as part of our team. So, Rolf spent his time auditing code and writing some instrumentation tools for some of the problems we were facing at the time. It went very well, and we’re thrilled that he decided to join us.
One night during that week we were chatting with Rolf about random programming problems and he recalled the story of a past interview whereby he was asked to implement a strlen() function in C that, when compiled, would not contain any conditional branches. He didn’t pose the problem as a challenge but Brandon, Zef, and I all found it intriguing and took a shot at solving it. Leave it to Rolf Rolles to reverse the interview process itself…
Spoiler alert: what follows are our independently created solutions.
Brandon’s Solution:
#define f(b) ((-b)>>31)&1
typedef unsigned int (*funcptr)(unsigned int x);
funcptr functable[2];
unsigned char *p;
unsigned int done(unsigned int x)
{
return x;
}
unsigned int counter(unsigned int x)
{
return(functable[f(*(p+x+1))](x+1));
}
int main(int argc, char *argv[])
unsigned int len;
p = (unsigned char *)argv[argc-1];
functable[0] = (funcptr)&done;
functable[1] = counter;
len = functable[f(*p)](0);
printf("len is %un", len);
return 0;
}
Zef’s Solution:
*
* strlen without conditional branch
* compiles with -Wall -ansi
*/
#include <stdio.h>
int _gtfo(char *s);
int _str_len(char *s);
int (*f[])(char *s) = {_gtfo, _str_len};
int _gtfo(char *s)
{
return -1; /* set to '0' to include trailing null */
}
int _str_len(char *s){
char c = *s;
return f[((c & 0x01))|
((c & 0x02) >> 1)|
((c & 0x04) >> 2)|
((c & 0x08) >> 3)|
((c & 0x10) >> 4)|
((c & 0x20) >> 5)|
((c & 0x40) >> 6)|
((c & 0x80) >> 7)](++s) +1 ;
}
int main(int argc, char *argv[])
{
if(argc > 1 ) printf("strlen("%s") = %dn", argv[1], _str_len(argv[1]));
return 0;
}
Zef’s description:
“So, my immediate thought was to use function pointers to ‘conditionally’ execute code without a conditional branch. There are two possible states for each member of a string when performing a ‘strlen’-type operation. ‘Terminator’ and ‘Not Terminator’. In this case the ‘Terminator’ for a C-string is ‘NULL’ (0x00). This of course is the only value with 0 bits set; by masking each bit in the 8 bit value and shifting to the lsb then combining the values with a ‘|’ operation, a binary state is created allowing for the indepedent execution of the two defined states ‘Terminator’ and ‘Not Terminator'”.
Aaron’s Solution:
As I admittedly suck at C, I approached the problem in straight assembly (I know, that’s cheating. And yes, this could be achieved with a rep scasb, but that’s just too easy). However, I was able to solve the problem in 27 bytes:
section .text
global _start
_start:
pop eax
pop eax
xor eax, eax
xor ebx, ebx
pop esi
_continue:
mov al, [esi]
add al, 0xFF
salc
inc al
lea ecx, [0x8048097+eax*4]
jmp ecx
inc ebx
inc esi
jmp _continue
int 0x80
The three pops that occur within _start are to get access to argv[1] (the string to be measured, provided on the command line). The last pop esi puts a pointer to the string into the esi register.
The mov al, [esi] grabs a single byte off the string. Then, the add al, 0xFF is used to determine whether the byte is NULL or not. If the value is non-NULL, the add to the 8-bit register al will set the Carry flag. If it is NULL, it will not set the CF.
The next instruction is actually considered undocumented (even objdump shows the mnemonic as ‘bad’). What the salc instruction does is sets the al register to 0xFF if the Carry flag is set, otherwise it sets it to 0x00. This is the method I used to implement a binary state to determine if the character is NULL or not.
The inc al instruction then increments al, which was either 0xFF or 0x00. After the inc it will either be 0x00 or 0x01.
The lea ecx, [0x8048097+eax*4] instruction loads into ecx either the address 0x8048097 or 0x804809b. These addresses are significant and can be observed by objdump’ing the assembled binary:
strlen_no_conditionals: file format elf32-i386
Disassembly of section .text:
08048080 :
8048080: 58 pop eax
8048081: 58 pop eax
8048082: 31 c0 xor eax,eax
8048084: 31 db xor ebx,ebx
8048086: 5e pop esi
08048087 :
8048087: 8a 06 mov al,BYTE PTR [esi]
8048089: 04 ff add al,0xff
804808b: d6 (bad)
804808c: fe c0 inc al
804808e: 8d 0c 85 97 80 04 08 lea ecx,[eax*4+0x8048097]
8048095: ff e1 jmp ecx
8048097: 43 inc ebx
8048098: 46 inc esi
8048099: eb ec jmp 8048087
804809b: cd 80 int 0x80
$
So, if the character is not NULL, the code will jmp ecx to 0x8048097 which increments the string length counter (ebx) and increments the string pointer (esi) and then branches unconditionally to _continue.
If the value was NULL, the jmp ecx will land directly at the int 0x80. As the size of the inc ebx and inc esi and jmp _continue is exactly 4 bytes, the lea instruction very conveniently can load either the address of the inc ebx or directly at the int 0x80, thus removing the need for any NOP-like instructions.
The last convenient optimization to note is that the int 0x80 will execute the syscall specified by the eax register. Well, because the result of the add/salc/inc condition will set eax to 1 only when a NULL is found, the int 0x80 will execute syscall #1 which on Linux is exit(). Additionally, the exit code is specified by the ebx register. That is why I used the ebx register as my counter to hold the string length. So, upon execution of the interrupt, the exit code will contain the length of the string as can be observed by running the assembled binary and inspecting the return value:
$ ld -o strlen_no_conditionals a.o
$ ./strlen_no_conditionals "ExodusIntel" ; echo $?
11
$ ./strlen_no_conditionals "should return 16" ; echo $?
16
$
Rolf’s Solution:
“Basically, the fundamental problem to overcome with this challenge is to ‘make a decision’ — that is to say, decide when to terminate the iteration upon reaching a NULL character — without using an explicit jcc-style conditional branch. A few minutes’ reflection upon this problem yields that we could use recursion into a function pointer table with 256 entries, where 255 of the entries increased some counter variable, and the entry at 0 terminates the procedure and returns the counter. In doing so, we have replaced all conditional jumps with one indexed, switch jump. Some further reflection provides the reduction of the table size from 256 entries down to two.”
int func(char *);
int func_x(char *c) { return 1+func(c); }
int func_0(char *c) { return 0; }
ctr table[2] = { &func_0, &func_x };
int func(char *c) { return table[!!*c](c+1); }
If you’ve come up with an interesting approach, we’d love to see it. Feel free to leave a comment or some such.
—
Aaron Portnoy
@aaronportnoy