PCSX2 Documentation/PCSX2 EE Recompiler

From PCSX2 Wiki

WORK IN PROGRESS I think it is about time that I start to contribute to this project ;)

This page is a summary of my understanding of the EE recompiler. The idea is to collect the key information about it in one place. It is interesting as general information, and it would be useful for porting or improving the recompiler one day.

External Reference

Global Overview of the EE recompiler

Other useful documentation

  1. Introduction to Dynamic Recompilation
  2. Recompilers: All 'dems buzzwords?


The 3 recompiler phases

  1. The recompilation phase:
    The purpose is to compile a list of EE instructions into a list of x86 instructions (also known as an instruction block). Instructions are stored in a buffer called x86Ptr. It can be seen as an instruction cache.
  2. The execution phase:
    The x86 instruction block is executed.
  3. The pause phase:
    The purpose is to emulate the other hardware blocks (VU, GIF, DMA, etc.). In particular, EE interrupts are handled here.
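
The three phases can be pictured as a loop. The sketch below is a toy model and not PCSX2 code: ToyCpu, compileBlock, serviceHardware and runLoop are hypothetical names, a "compiled block" is a std::function instead of real x86 bytes, and the pause phase is assumed to run after every block purely for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Toy model of the recompile / execute / pause cycle (all names are hypothetical).
struct ToyCpu {
    uint32_t pc = 0;         // address of the next EE instruction
    int compiledBlocks = 0;  // number of blocks that went through phase 1
    int executedBlocks = 0;  // number of blocks that went through phase 2
    int pauses = 0;          // number of times phase 3 ran
};

using CompiledBlock = std::function<void(ToyCpu&)>;

// Phase 1: "compile" a block. A real recompiler would emit x86 code; here the
// block only advances the PC by `instructions` 4-byte instructions.
CompiledBlock compileBlock(ToyCpu& cpu, uint32_t instructions) {
    assert(instructions > 0);
    ++cpu.compiledBlocks;
    return [instructions](ToyCpu& c) {
        c.pc += 4 * instructions;
        ++c.executedBlocks;
    };
}

// Phase 3: stand-in for emulating the other hardware blocks and interrupts.
void serviceHardware(ToyCpu& cpu) { ++cpu.pauses; }

// Run `iterations` rounds of the three phases, caching blocks by start PC.
void runLoop(ToyCpu& cpu, int iterations, uint32_t blockLength) {
    std::unordered_map<uint32_t, CompiledBlock> cache;
    for (int i = 0; i < iterations; ++i) {
        auto it = cache.find(cpu.pc);
        if (it == cache.end())
            it = cache.emplace(cpu.pc, compileBlock(cpu, blockLength)).first;  // phase 1
        it->second(cpu);       // phase 2
        serviceHardware(cpu);  // phase 3
    }
}
```

The cache keyed by start PC plays the role of the instruction cache mentioned above: a block is only compiled the first time its start address is reached.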

Memory And Buffer

An important part of the recompiler is the management of the various blocks (x86, EE, etc.). The schematic below shows the links between them.

Recompiler block.png


PS2 Virtual space

The PS2 virtual space is composed of EE instructions. An instruction is always 4 bytes long. An instruction can be anywhere in the 4GB address space of the PS2 (yes, 4GB). Of course those 4GB are physically mapped onto 32MB of RAM or 4MB of ROM. The address of the current instruction is stored in a hardware register named PC (which stands for Program Counter).


Recompiler lookup table

The recLUT (recompiler lookup table) gives the BASEBLOCK related to a PC. The LUT maps the virtual address space in 64kB pages. There is an easy macro to retrieve the current BASEBLOCK from a PC.

  • C++ macro: PC_GETBLOCK(PC)
  • C++ array: __aligned16 uptr recLUT[_64kb]
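
To make the 64kB-page lookup concrete, here is a minimal sketch of the idea. It is not the real PC_GETBLOCK macro: ToyBlock, ToyRecLut, mapPage and getBlock are hypothetical names, and the layout (one slot per 4-byte instruction, one table entry per 64kB page) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BASEBLOCK: one slot per 4-byte EE instruction.
struct ToyBlock {
    void (*fnptr)() = nullptr;
};

constexpr uint32_t kPageBits = 16;                       // 64kB pages
constexpr uint32_t kPageCount = 1u << (32 - kPageBits);  // 65536 pages cover 4GB
constexpr uint32_t kSlotsPerPage = (1u << kPageBits) / 4;

// Toy lookup table: one pointer per 64kB page of the virtual address space.
struct ToyRecLut {
    std::vector<ToyBlock*> pages = std::vector<ToyBlock*>(kPageCount, nullptr);

    // Map one 64kB virtual page onto an array of kSlotsPerPage block slots.
    void mapPage(uint32_t virtualPage, ToyBlock* slots) {
        assert(virtualPage < kPageCount);
        pages[virtualPage] = slots;
    }

    // Same spirit as PC_GETBLOCK(pc): the top 16 bits select the page,
    // the instruction offset inside the page selects the slot.
    ToyBlock* getBlock(uint32_t pc) const {
        ToyBlock* page = pages[pc >> kPageBits];
        if (page == nullptr) return nullptr;  // unmapped page
        return page + ((pc & 0xFFFFu) / 4);
    }
};
```

A table of 65536 entries is enough to cover the whole 4GB space, which is why the arrays above are sized _64kb.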

The hwLUT (hardware lookup table) gives the physical PS2 address for a PS2 virtual address. Again, a nice macro is available.

  • C++ macro: HWADDR(mem_address)
  • C++ array: __aligned16 uptr hwLUT[_64kb]
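
The same page-indexed idea applies to the address translation. The sketch below is an assumption-laden illustration and not the real HWADDR macro: ToyHwLut, mapPage and toPhysical are hypothetical names, and the two mappings used in the usage note are only example values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy virtual-to-physical table with 64kB granularity (hypothetical layout).
// Each entry stores the physical base address of one virtual page.
struct ToyHwLut {
    std::vector<uint32_t> pageBase = std::vector<uint32_t>(65536, 0);

    // Record that a 64kB virtual page is backed by a 64kB physical page.
    void mapPage(uint32_t virtualPage, uint32_t physicalPage) {
        assert(virtualPage < 65536 && physicalPage < 65536);
        pageBase[virtualPage] = physicalPage << 16;
    }

    // Same spirit as HWADDR(mem_address): look up the page base,
    // then keep the offset inside the page unchanged.
    uint32_t toPhysical(uint32_t vaddr) const {
        return pageBase[vaddr >> 16] | (vaddr & 0xFFFFu);
    }
};
```

For example, after mapPage(0x8010, 0x0010), the virtual address 0x80101234 translates to 0x00101234.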

Important note: those LUTs are based on the TLB mapping at the boot of the EE kernel. They don't take TLB updates into consideration. They won't work if your game/program (Linux/homebrew) relies on TLB management. In this situation your best bet is to use the interpreter.


Base Block

The recompiler maps the EE memory into 3 different BASEBLOCK arrays.

  • C++ array: BASEBLOCK *recRAM
  • C++ array: BASEBLOCK *recROM
  • C++ array: BASEBLOCK *recROM1


A BASEBLOCK is merely a function pointer. It can be one of:

  • a pointer to the x86 instruction buffer
  • a pointer to the JITCompile dispatcher, which will:
    1. Call recRecompile(cpuRegs.pc) to recompile the current block
    2. Jump to the recompiled block PC_GETBLOCK(cpuRegs.pc)->m_pFnptr()
  • a pointer to the JITCompileInBlock dispatcher
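
The "slot starts as a compile stub, then gets patched" behaviour can be sketched as follows. This is a toy model under stated assumptions: ToyRec, BlockFn, compiledCode and the counters are hypothetical, and plain C++ function pointers stand in for pointers into generated x86 code. Only the names jitCompile and dispatcherReg echo the dispatchers described on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ToyRec;
using BlockFn = void (*)(ToyRec&);  // stand-in for BASEBLOCK's function pointer
void jitCompile(ToyRec& rec);       // defined below

struct ToyRec {
    std::vector<BlockFn> slots;  // one slot per 4-byte EE instruction
    uint32_t pc = 0;
    int compileCount = 0;
    int runCount = 0;

    // Every slot initially points to the compile stub.
    explicit ToyRec(std::size_t instructionCount)
        : slots(instructionCount, &jitCompile) {}

    BlockFn& slotFor(uint32_t addr) {
        assert(addr % 4 == 0 && addr / 4 < slots.size());
        return slots[addr / 4];
    }
};

// Stand-in for a block of generated x86 code.
void compiledCode(ToyRec& rec) { ++rec.runCount; }

// Look up the slot for the current PC and call whatever it points to.
void dispatcherReg(ToyRec& rec) { rec.slotFor(rec.pc)(rec); }

// Compile stub: "recompile" the block at the current PC by patching its slot,
// then dispatch again so the freshly compiled block runs.
void jitCompile(ToyRec& rec) {
    ++rec.compileCount;
    rec.slotFor(rec.pc) = &compiledCode;
    dispatcherReg(rec);
}
```

The first dispatch at a given PC goes through the stub and compiles; later dispatches at the same PC call the compiled code directly.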


Lazy Allocation

Might (very likely) be removed in the future. Postponed


Memory Protection

The instruction cache buffer is a cache of the EE program stored in the EE memory. Therefore the cache must be kept coherent with the EE memory: a write to EE memory that holds compiled code must discard the corresponding cache content, which is likely followed by a recompilation.

This situation can occur because of self-modifying code or because of library linking (memory pointers changed in RAM).

A naive implementation would be to instrument every write to detect the corresponding block. However, it would add a big penalty to each memory write. Another option would be to check the content of the instruction block at each execution. Again, slow. A more complex implementation uses the page fault signal handler mechanism to detect invalid writes. Guess what, we chose the latter.

  • C++ function: void mmap_MarkCountedRamPage( u32 paddr )
  • C++ function: int mmap_GetRamPageInfo( u32 paddr )
  • C++ function: void mmap_ClearCpuBlock( uint offset )
  • C++ function: void dyna_page_reset(u32 start, u32 sz)
  • C++ function: void dyna_block_discard(u32 start, u32 sz)
  • C++ function: void recClear(u32 addr, u32 size)
  • C++ function: void mmap_PageFaultHandler::OnPageFaultEvent( const PageFaultInfo& info, bool& handled )
  • C++ array: u16 manual_page[Ps2MemSize::MainRam >> 12]
  • C++ array: u8 manual_counter[Ps2MemSize::MainRam >> 12]
  • C++ array: vtlb_PageProtectionInfo m_PageProtectInfo[Ps2MemSize::MainRam >> 12]

The Automatic Protection

The EE memory is memory mapped as 4K Read/Write pages. A protection status is attached to each page. If the protection is manual, you need to handle it manually (easy, isn't it?). This case is discussed below. Otherwise the page is marked as Read-Only.

The Write Interception

Now that the EE memory page is Read-Only, any write to it will trigger an error. On Linux it will be a SIGSEGV (segmentation fault) signal. PCSX2 replaces the default handler to handle it. The signal is dispatched to the handler of the buffer concerned, which will:

  • Remount the page as Read/Write
  • Mark the memory protection as manual
  • Clear the recLUT cache
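
The transitions described in the two sections above can be modelled without any real page fault. The sketch below is a simulation only: ToyProtectedRam, ToyPage, Access, Mode and every method name are hypothetical, and the "fault" is an ordinary function call instead of a SIGSEGV raised by the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Access { ReadOnly, ReadWrite };
enum class Mode { Automatic, Manual };  // hypothetical names for the protection status

struct ToyPage {
    Access access = Access::ReadWrite;
    Mode mode = Mode::Automatic;
    bool hasCompiledBlocks = false;
};

struct ToyProtectedRam {
    static constexpr uint32_t kPageShift = 12;  // 4K pages
    std::vector<ToyPage> pages;
    int faults = 0;

    explicit ToyProtectedRam(std::size_t pageCount) : pages(pageCount) {}

    ToyPage& pageAt(uint32_t paddr) {
        assert((paddr >> kPageShift) < pages.size());
        return pages[paddr >> kPageShift];
    }

    // Automatic protection: once code from a page is compiled,
    // the page becomes Read-Only unless it is already handled manually.
    void protectAfterCompile(uint32_t paddr) {
        ToyPage& p = pageAt(paddr);
        p.hasCompiledBlocks = true;
        if (p.mode == Mode::Automatic) p.access = Access::ReadOnly;
    }

    // Write interception: a write to a Read-Only page takes the fault path
    // first; the write itself would then proceed.
    void write(uint32_t paddr) {
        ToyPage& p = pageAt(paddr);
        if (p.access == Access::ReadOnly) onPageFault(p);
    }

    // The three steps listed above: remount Read/Write, switch to manual
    // protection, drop the compiled blocks of the page.
    void onPageFault(ToyPage& p) {
        ++faults;
        p.access = Access::ReadWrite;
        p.mode = Mode::Manual;
        p.hasCompiledBlocks = false;
    }
};
```

Only the first write to a protected page pays the cost of the fault; later writes to the same page go straight through because the page is Read/Write again.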

The Manual Protection

Some pages will be marked as manually protected. All the blocks in those pages are recompiled with manual protection code. The protection is an integrity check at the start of the block. Basically it compares the current EE instructions with the instructions that were present when the block was compiled. If they differ, the block is cleared with the help of the dyna_block_discard function.
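
A minimal sketch of such an integrity check, assuming the block simply keeps a copy of the instruction words it was compiled from. ToyManualBlock, compileWithCheck and stillValid are hypothetical names; the real check is emitted as x86 code at the start of the block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy manually protected block: remembers where it starts in EE memory
// and what the instruction words looked like at compile time.
struct ToyManualBlock {
    uint32_t startIndex;             // index of the first instruction word
    std::vector<uint32_t> snapshot;  // copy of the instructions at compile time
};

// "Compile" a block of `count` instructions and record the snapshot.
ToyManualBlock compileWithCheck(const std::vector<uint32_t>& eeMem,
                                uint32_t start, uint32_t count) {
    assert(count > 0 && start + count <= eeMem.size());
    return ToyManualBlock{
        start,
        std::vector<uint32_t>(eeMem.begin() + start, eeMem.begin() + start + count)};
}

// Integrity check run before executing the block: true if the EE
// instructions are unchanged, false if the block must be discarded.
bool stillValid(const ToyManualBlock& block, const std::vector<uint32_t>& eeMem) {
    return std::equal(block.snapshot.begin(), block.snapshot.end(),
                      eeMem.begin() + block.startIndex);
}
```

A write next to the block (data sharing the page with code) leaves the check passing, which is exactly the case where manual protection avoids a needless discard.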

The Automatic Re-Protection

A few writes can happen at the startup of the game to initialize a few library pointers. It would be costly to handle all those pages manually. Therefore a counter was added to the current manual block. It will trigger a recompilation in the future. To avoid a "WRITE => SIGSEGV => MANUAL RECOMP => RECOMP" loop, the automatic re-protection feature is dropped after several attempts.
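
One way to picture the counter is the small state machine below. It is only a guess at the shape of the mechanism: ToyReprotect, both thresholds and the method names are hypothetical, and the actual values and conditions used by PCSX2 are not given here.

```cpp
#include <cassert>

// Toy re-protection policy for one page (all names and limits are assumptions).
struct ToyReprotect {
    static constexpr int kChecksBeforeReprotect = 3;  // hypothetical threshold
    static constexpr int kMaxReprotectAttempts = 2;   // hypothetical limit

    bool manual = false;  // true while the page is under manual protection
    int cleanChecks = 0;  // manual checks passed since the last write fault
    int attempts = 0;     // how many times the page went back to automatic

    // A write fault puts the page under manual protection.
    void onWriteFault() {
        manual = true;
        cleanChecks = 0;
    }

    // Called each time a manually checked block runs without detecting a change.
    // Returns true when the page should return to automatic protection.
    bool onCleanCheck() {
        assert(attempts <= kMaxReprotectAttempts);
        if (!manual || attempts >= kMaxReprotectAttempts) return false;
        if (++cleanChecks < kChecksBeforeReprotect) return false;
        manual = false;
        cleanChecks = 0;
        ++attempts;
        return true;
    }
};
```

Once the attempt limit is reached the page stays under manual protection, which breaks the loop quoted above.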

Limitation

The biggest limitation is the mix of data and code in the same page. The data could just be the global variables of the program, which often sit right after the code. It could also be the thread data stack, or the kernel area used to save the register context.

Code Generation

Dispatcher

PCSX2 contains various dispatchers for the EE recompiler but also for the other recompilers. Dispatchers are small pieces of ASM code generated by hand. They serve 2 purposes:

  1. Act as a wrapper to call a C++ function from the recompiler
  2. Add some glue between recompiler blocks

DispatcherReg

  • Generated by _DynGen_DispatcherReg
  • High-Level Description
  1. Compute the address of BASEBLOCK related to the current PC. It is equivalent to "bb = PC_GETBLOCK(cpuRegs.pc)"
  2. Jump to the related function pointer. It is equivalent to "bb->m_pFnptr()"

JITCompile

  • Generated by _DynGen_JITCompile
  • High-Level Description
  1. Realign the stack
  2. Recompile the current block. It is equivalent to "recRecompile(cpuRegs.pc)"
  3. Execute the just compiled block. It is equivalent to "DispatcherReg"

JITCompileInBlock

  • Generated by _DynGen_JITCompileInBlock
  • High-Level Description
  1. Realign the stack
  2. Execute JITCompile
  • Note: the purpose is to detect that EE instructions were already compiled inside another block.

EnterRecompiledCode

  • Generated by _DynGen_EnterRecompiledCode

DispatchBlockDiscard

  • Generated by _DynGen_DispatchBlockDiscard
  • High-Level Description
  1. Execute "dyna_block_discard"
  2. Exit the recompiler

DispatchPageReset

  • Generated by _DynGen_DispatchPageReset
  • High-Level Description
  1. Execute "dyna_page_reset"
  2. Exit the recompiler

_DynGen_DispatcherReg

  • Generated by _DynGen_DispatcherReg

EventTest

  • xCALL( recEventTest ) (TODO clean the code to add a function)

Compilation

EE Memory Emulation

DRAFT/JUNK OF OLD DATA



All those blocks are managed by the BaseBlocks class.

  • C++ variable: static BaseBlocks recBlocks


_DynGen_* functions generate dispatcher functions and return a function pointer to the function. Full initialization is done in _DynGen_Dispatchers.


  • JITCompile (generated by _DynGen_JITCompile) will
  1. Call recRecompile(cpuRegs.pc) to recompile the current block
  2. Jump to the recompiled block PC_GETBLOCK(cpuRegs.pc)->m_pFnptr()

Basically every BASEBLOCK contains JITCompile as its initial address.


  • JITCompileInBlock (generated by _DynGen_JITCompileInBlock) will
  1. Jump to JITCompile

Basically, after the compilation of a block of N instructions, the first BASEBLOCK contains the x86 address. The remaining N-1 BASEBLOCKs contain JITCompileInBlock.
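
The slot layout after a compilation can be sketched as follows. SlotState and markCompiledBlock are hypothetical names used for illustration; real slots hold addresses, not an enum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What a toy BASEBLOCK slot points to (hypothetical representation).
enum class SlotState { JitCompile, Compiled, JitCompileInBlock };

// After compiling a block of `n` instructions starting at slot `first`:
// the first slot points to the x86 code, the other n-1 slots point to
// JITCompileInBlock.
void markCompiledBlock(std::vector<SlotState>& slots, std::size_t first, std::size_t n) {
    assert(n > 0 && first + n <= slots.size());
    slots[first] = SlotState::Compiled;
    for (std::size_t i = first + 1; i < first + n; ++i)
        slots[i] = SlotState::JitCompileInBlock;
}
```

Slots outside the compiled range keep their initial JitCompile value, so a jump into the middle of an existing block can be told apart from a jump to code that was never compiled.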


  • DispatcherReg (generated by _DynGen_DispatcherReg) will
  1. Jump to the current block (Note: the stack won't be realigned)


  • EnterRecompiledCode (generated by _DynGen_EnterRecompiledCode) will
  1. Set up the base frame pointer
  2. Align the stack pointer
  3. Save edi/esi/ebx on the stack
  4. Simulate a function call? (potentially to help the debugger unwind the stack)
  5. Simulate the stack frame preparation "push ebp, mov ebp, esp"
  6. Save esp, ebp into static variables (for debug checks). This code can surely be removed.
  7. Jump to DispatcherReg
  8. Handle the return of DispatcherReg (leave and restore edi/esi/ebx)
  9. Handle the return of the current function (leave and ret)

  • ExitRecompiledCode is the return address of DispatcherReg (end of EnterRecompiledCode)
  • DispatchBlockDiscard (generated by _DynGen_DispatchBlockDiscard) is a wrapper to the C++ function dyna_block_discard
  • DispatchPageReset (generated by _DynGen_DispatchPageReset) is a wrapper to the C++ function dyna_page_reset



The details of the "recompilation" stage

  • Input: the PC (first instruction address)
  • Output: x86 code in a buffer that is ready to be executed