x86 to x64 Binary Translator

This feature is enabled with the build flag BOXEDWINE_VM, and requires BOXEDWINE_64BIT_MMU (see hard MMU).

This is a big change from how the normal CPU emulation works. For one, it is multithreaded: each emulated thread gets its own host thread. This isn’t even possible with the normal CPU emulator, because I would then have to implement the lock prefix in software, which would make everything slower. On x64 I am able to use the hardware lock prefix directly.

When translating x86 instructions to x64, some of them are really simple and require no change. For example,

mov eax, ecx

would require no change; the machine code is exactly the same.

Some instructions have been removed. For example, “inc eax” can be encoded two ways in x86: as “40” and as “ff c0”. The one-byte form was removed in x64 to make room for a new prefix (the REX prefix) that tells the CPU whether it is dealing with a 32-bit or a 64-bit register.
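
A minimal sketch of that rewrite, assuming the translator sees one opcode byte at a time: the one-byte forms 40-47 (inc r32) and 48-4f (dec r32) are re-emitted as the two-byte FF /0 and FF /1 forms, which mean the same thing in both modes. The function name translate_inc_dec is hypothetical and not taken from Boxedwine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rewrites a one-byte x86 "inc r32"/"dec r32"
 * (opcodes 0x40-0x4F) into the two-byte FF /0 (inc) or FF /1 (dec) form.
 * Returns the number of bytes written to out, or 0 if op is not one of
 * those opcodes (the caller would handle it some other way). */
size_t translate_inc_dec(uint8_t op, uint8_t out[2])
{
    if (op < 0x40 || op > 0x4F)
        return 0;
    uint8_t reg = op & 0x07;          /* eax=0, ecx=1, edx=2, ... edi=7 */
    int is_dec = (op & 0x08) != 0;    /* 0x48-0x4F are dec */
    out[0] = 0xFF;
    /* ModRM byte: mod=11 (register operand), reg field=/0 or /1, rm=register */
    out[1] = (uint8_t)(0xC0 | (is_dec ? 0x08 : 0x00) | reg);
    return 2;
}
```

For example, 40 (inc eax) becomes ff c0 and 49 (dec ecx) becomes ff c9. Without a REX.W prefix the FF form still operates on the 32-bit register, so the behavior is unchanged.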

Any instruction that accesses memory needs to be rewritten to take the emulated MMU into account. With the hard MMU (which this requires), that just means adding a simple offset. So, for example,

mov eax, [DS:eax]

would have to be changed to

lea r13d, [r15+rax]
mov eax, [r10+r13]

where

  • r13 is a tmp register
  • r15 holds the DS segment address
  • r10 holds the memory offset

So in the best case, each memory access instruction becomes two instructions, which slows things down a little.

In the above scenario I used “lea” instead of “add” so that it would not affect the CPU flags.
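
The same address arithmetic written in C may make the two steps easier to follow. This is a minimal sketch, not Boxedwine's code: guest_to_host and read32 are hypothetical names, mem_base stands in for r10 and seg_base for r15, and it assumes the whole 32-bit guest address space is mapped contiguously starting at mem_base, which is what lets a single offset do the job.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: mem_base plays the role of r10 (where emulated
 * memory starts in host memory), seg_base the role of r15 (DS address). */
static uint8_t *guest_to_host(uint8_t *mem_base, uint32_t seg_base, uint32_t offset)
{
    /* Like "lea r13d, [r15+rax]": the sum is truncated to 32 bits,
     * so it wraps modulo 2^32, and no CPU flags are involved. */
    uint32_t linear = seg_base + offset;
    return mem_base + linear;            /* like "[r10+r13]" */
}

/* Hypothetical equivalent of "mov eax, [DS:eax]". */
static uint32_t read32(uint8_t *mem_base, uint32_t seg_base, uint32_t eax)
{
    uint32_t value;
    memcpy(&value, guest_to_host(mem_base, seg_base, eax), sizeof value);
    return value;
}
```

The 32-bit wraparound matters: a segment base near the top of the address space plus a small offset lands back near the start of emulated memory, just as it would on a real x86.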

x64 has 8 extra registers (r8 to r15). I use them like this:

  • r8, r12, r13 as tmp
  • r9 holds a pointer to the CPU structure
  • r10 holds the memory offset
  • r11 holds the stack pointer
  • r14 holds SS segment address
  • r15 holds DS segment address

Notice that I use my own stack register instead of rsp. This is because I need to push and pop 2- and 4-byte values, whereas x64 expects rsp to be 8-byte aligned.

So what happens if we need to push a register onto the stack?

push ebp

This will become

pushfq                               // save flags: the and/or below change them, but push must not
lea         r8d,[r11-4]              // calculate where we will write the value (ESP-4)
and         r8d,dword ptr [r9+2Ch]   // and that value with the stack mask (r9 is cpu, 2Ch is offset to stack mask)
lea         r12d,[r8+r14]            // add stack segment to address (r8d keeps the masked ESP for the or below)
mov         dword ptr [r12+r10],ebp  // put ebp on the stack (might cause exception, which is why esp isn't updated yet)
and         r11d,dword ptr [r9+30h]  // and the original esp value with the not stack mask
or          r11d,r8d                 // or the new masked bits into the old esp
popfq                                // restore the guest's flags

In C code, this looks like:

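void push32(struct CPU* cpu, U32 value) {
    // keep the bits outside the stack mask, decrement only the bits inside it
    U32 new_esp = (ESP & cpu->stackNotMask) | ((ESP - 4) & cpu->stackMask);
    // the write may raise an exception, so ESP is only updated after it succeeds
    writed(cpu->thread, cpu->segAddress[SS] + (new_esp & cpu->stackMask), value);
    ESP = new_esp;
}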
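
To see what the stack mask does, here is a self-contained sketch that can be compiled on its own. It follows the same order of operations as push32 above, but MiniCPU, push_bytes, mini_push32 and mini_push16 are hypothetical names, and a plain byte array stands in for emulated memory. Because the translator keeps its own stack pointer, the 2-byte push is handled exactly like the 4-byte one.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, stripped-down CPU state for illustration only. */
struct MiniCPU {
    uint32_t esp;
    uint32_t stackMask;     /* 0xFFFF for a 16-bit stack, 0xFFFFFFFF for 32-bit */
    uint32_t stackNotMask;  /* ~stackMask */
    uint32_t ssBase;        /* SS segment address (the r14 role) */
    uint8_t *mem;           /* start of emulated memory (the r10 role) */
};

static void push_bytes(struct MiniCPU *cpu, const void *value, uint32_t size)
{
    /* Only the bits covered by stackMask are decremented; the rest of ESP is kept. */
    uint32_t new_esp = (cpu->esp & cpu->stackNotMask) | ((cpu->esp - size) & cpu->stackMask);
    /* Write using the masked offset plus the segment base first... */
    memcpy(cpu->mem + (cpu->ssBase + (new_esp & cpu->stackMask)), value, size);
    /* ...then commit ESP (in a real emulator a faulting write would skip this). */
    cpu->esp = new_esp;
}

void mini_push32(struct MiniCPU *cpu, uint32_t value) { push_bytes(cpu, &value, 4); }
void mini_push16(struct MiniCPU *cpu, uint16_t value) { push_bytes(cpu, &value, 2); }
```

With a 16-bit stack (stackMask = 0xFFFF), a 4-byte push from ESP = 0x12340000 wraps the low 16 bits to 0xFFFC while the upper half stays 0x1234, giving ESP = 0x1234FFFC. That is the value the and/or pair in the x64 listing computes.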