PCSX2 Documentation/PCSX2 EE Recompiler: Difference between revisions



=== Lazy Allocation ===
Recompiler buffers are lazily allocated to reduce the physical memory footprint. During initialization, contiguous ranges of virtual memory are reserved to hold the various buffers. However, this memory can't be accessed yet (no read or write permission). A write to a buffer triggers a segmentation fault that is caught by a dedicated handler. The handler enables access permission, and the memory is thereby allocated. To limit the number of segmentation faults to handle, buffers are allocated in big blocks (typically 1/4 of the maximum size).
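The reserve-then-commit idea described above can be sketched as follows. This is a minimal illustration for a POSIX system, not the PCSX2 implementation: the <code>LazyBuffer</code> type and its member names are hypothetical, and <code>commit_for</code> only shows what a fault handler would do once it knows the faulting address.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not PCSX2 code): the address range is reserved up
// front, but access is only enabled one chunk at a time.
struct LazyBuffer {
    std::uint8_t* base = nullptr;
    std::size_t max_size = 0;
    std::size_t chunk = 0; // commit granularity: 1/4 of max_size

    // Reserve the whole range with no read/write permission.
    bool reserve(std::size_t size) {
        long page = sysconf(_SC_PAGESIZE);
        // Assumption of this sketch: each quarter is a whole number of pages.
        assert(page > 0 && size % (4 * static_cast<std::size_t>(page)) == 0);
        void* p = mmap(nullptr, size, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return false;
        base = static_cast<std::uint8_t*>(p);
        max_size = size;
        chunk = size / 4;
        return true;
    }

    // What the fault handler would do for a faulting address: enable
    // access on the whole chunk that contains it.
    bool commit_for(const void* fault_addr) {
        auto addr = reinterpret_cast<std::uintptr_t>(fault_addr);
        auto start = reinterpret_cast<std::uintptr_t>(base);
        if (addr < start || addr >= start + max_size) return false; // not ours
        std::size_t index = (addr - start) / chunk;
        return mprotect(base + index * chunk, chunk,
                        PROT_READ | PROT_WRITE) == 0;
    }

    void release() {
        if (base) munmap(base, max_size);
        base = nullptr;
    }
};
```

Committing a quarter of the buffer per fault keeps the number of faults at four at most, at the cost of enabling more memory than the first write strictly needs.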


In my humble opinion, this mechanism is useless nowadays because:
* the kernel might already do lazy allocation under the hood
* unused memory will be pushed to swap
* 32-bit programs on recent machines are limited by the virtual address space, not by physical resources anymore. The former is 2-4GB whereas a single DDR4 module is 4GB.
Potentially this feature could be removed in the future, but it is a very low priority.


=== Memory Protection ===
The situation can occur because of self-modifying code or due to library linking (changing a memory pointer in RAM).


A naive implementation would be to instrument every write to detect the corresponding block. However, it would add a big penalty to each memory write. Another option would be to check the content of the instruction block at each execution. Again, slow. A more complex implementation uses the page fault signal handler mechanism to detect invalid writes. Guess what, we chose the latter.
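The page fault approach can be sketched as follows for a POSIX system. This is only an illustration under stated assumptions, not the PCSX2 code: it watches a single page, and all names (<code>g_code_page</code>, <code>on_write_fault</code>, <code>watch_code_page</code>) are hypothetical. Reads of the protected page cost nothing; only the first write pays for a fault.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical globals for the sketch (not PCSX2 names).
volatile std::uint8_t* g_code_page = nullptr; // page holding "EE code"
std::size_t g_page_size = 0;
volatile sig_atomic_t g_blocks_invalidated = 0;

// Fault handler: a write hit the protected page, so any block recompiled
// from it is stale. Flag it, lift the protection and let the write retry.
void on_write_fault(int, siginfo_t* info, void*) {
    auto addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
    auto page = reinterpret_cast<std::uintptr_t>(g_code_page);
    if (g_code_page && addr >= page && addr < page + g_page_size) {
        g_blocks_invalidated = 1;
        mprotect(const_cast<std::uint8_t*>(g_code_page), g_page_size,
                 PROT_READ | PROT_WRITE);
        return;
    }
    signal(SIGSEGV, SIG_DFL); // unrelated fault: restore the default action
}

// Allocate one page, pretend it was recompiled, and write-protect it.
bool watch_code_page() {
    assert(g_code_page == nullptr && "this sketch watches a single page");
    g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* p = mmap(nullptr, g_page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    g_code_page = static_cast<std::uint8_t*>(p);

    struct sigaction sa {};
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, nullptr) != 0) return false;

    return mprotect(p, g_page_size, PROT_READ) == 0; // reads stay free
}
```

After <code>watch_code_page()</code> succeeds, the first store to <code>g_code_page</code> sets <code>g_blocks_invalidated</code>; later stores run at full speed until the page is protected again.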


* '''<code>C++ function: void mmap_MarkCountedRamPage( u32 paddr )</code>'''


The biggest limitation is the mix of data and code in the same page. Data could just be the global variables of the program, often placed after the code. It could also be the thread data stack, or the kernel area used to save the register context.
== Blocks Management ==
=== Extended Base Block ===
The recompiler handles code by block. The standard base block was already discussed above *TODO ADD A LINK*. The standard base block is rather primitive and only allows a translation from the EE's PC to the x86 buffer address. That is the minimum needed to run blocks, but it isn't enough to manage them, so they are extended to '''BASEBLOCKEX'''. This new structure contains both the EE's PC and the x86 buffer address, with their respective code segment sizes. Note: EE instructions have a constant size of 4 bytes, so the number of instructions can be stored directly. However, x86 instructions have a variable length, so only a byte length is available. An extended base block is created every time a new block is recompiled.
All extended base blocks are aggregated into a map/array class named '''BaseBlockArray'''. It has the following properties:
# Entries are indexed with a contiguous index, i.e. if you have 10 elements, they are placed from index 0 to 9.
#* It means that every addition/removal shifts the following elements (similar in purpose to a self-balancing tree).
# Entries are sorted in ascending order of the EE's PC.
#* Be aware that blocks can overlap, so the Nth entry can end before the preceding (N-1)th entry does.
# Lookups are implemented as a binary search, which is easy to implement thanks to the 2 previous properties.
* '''<code>C++ class: BaseBlockArray</code>'''
* '''<code>C++ class: BaseBlocks</code>'''
* '''<code>C++ object: static BaseBlocks recBlocks</code>'''
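The extended block and the sorted array described above can be sketched as follows. This is a minimal model, not the PCSX2 source: <code>BlockEx</code> and <code>SortedBlocks</code> are hypothetical stand-ins for BASEBLOCKEX and BaseBlockArray, and the field names and widths are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BlockEx {               // stand-in for BASEBLOCKEX (fields assumed)
    std::uint32_t start_pc;    // EE address of the first instruction
    std::uintptr_t x86_addr;   // start of the generated x86 code
    std::uint16_t ee_count;    // EE size, in 4-byte instructions
    std::uint16_t x86_bytes;   // x86 size, in bytes (variable-length ISA)
};

class SortedBlocks {           // stand-in for BaseBlockArray
public:
    // Insert while keeping ascending start_pc order; later entries shift up.
    std::size_t insert(const BlockEx& b) {
        auto it = std::upper_bound(blocks_.begin(), blocks_.end(),
                                   b.start_pc, by_pc);
        std::size_t index = static_cast<std::size_t>(it - blocks_.begin());
        blocks_.insert(it, b);
        return index;
    }

    // Remove one entry; later entries shift down so indices stay contiguous.
    void erase(std::size_t index) {
        assert(index < blocks_.size());
        blocks_.erase(blocks_.begin() + static_cast<std::ptrdiff_t>(index));
    }

    // Binary search: index of the last entry with start_pc <= pc, or -1.
    int last_index(std::uint32_t pc) const {
        auto it = std::upper_bound(blocks_.begin(), blocks_.end(), pc, by_pc);
        return static_cast<int>(it - blocks_.begin()) - 1;
    }

    const BlockEx& operator[](std::size_t i) const { return blocks_[i]; }
    std::size_t size() const { return blocks_.size(); }

private:
    static bool by_pc(std::uint32_t pc, const BlockEx& e) {
        return pc < e.start_pc;
    }
    std::vector<BlockEx> blocks_;
};
```

Because blocks can overlap, a caller that wants every block covering a given PC would still have to walk backwards from <code>last_index(pc)</code> and check each entry's range.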
=== Block Link ===
At the end of the execution of a block you have 2 possibilities:
* Block was rather big so you will
*# update the event
*# look up the next block
*# then execute it
* Block was small so you will
*# look up the next block
*# then execute it
Often, you know the next block at compile time. So you can do the lookup at compilation and jump directly from block1 to block2. It saves a couple of instructions. Small loops really benefit from this optimization.
This optimization requires that you keep a record of the x86 pointer at the end of the block and of the destination PC. Several blocks can jump to the same PC, so you need a multimap.
There are 2 ways to handle link creation:
# You want to link to a block that already exists (let's call it a direct link).
## You get the x86 function pointer of the destination block.
## You update the jump instruction of the current block to jump to the destination block.
## You save the link to handle the removal of the destination block.
# You want to link to a block that doesn't exist yet (let's call it a deferred link).
## You get the x86 function pointer of the destination block.
## You update the jump instruction of the current block to jump to the '''recompiler dispatcher'''.
## You save the link '''to update the jump instruction later when a new block is created'''. As previously, you keep the link information to handle the removal of the destination block.
Every time you create a new block, you check whether the new block is the destination of any existing block. If there are hits, the links are established. This handles all deferred links.
Every time you delete a block, you again check whether the deleted block is the destination of any existing block. If there are hits, the links are deleted. It is important to delete dangling links, otherwise you might jump into the void.
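The link bookkeeping described above can be sketched as follows. This is a minimal model under stated assumptions, not the PCSX2 code: <code>LinkTable</code>, <code>CodePtr</code> and <code>kDispatcher</code> are hypothetical, and a jump instruction is modelled as a plain variable holding its target address. On deletion the sketch redirects the jump to the dispatcher and keeps the record, which turns the link back into a deferred one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using CodePtr = std::uintptr_t;          // address of generated x86 code
constexpr CodePtr kDispatcher = 0xD15;   // placeholder dispatcher address

class LinkTable {
public:
    // Record that the jump stored in *jump_slot wants to reach dest_pc.
    // Direct link if the destination exists, deferred link otherwise.
    void add_link(std::uint32_t dest_pc, CodePtr* jump_slot) {
        assert(jump_slot != nullptr);
        auto it = compiled_.find(dest_pc);
        *jump_slot = (it != compiled_.end()) ? it->second : kDispatcher;
        links_.emplace(dest_pc, jump_slot);
    }

    // A new block exists at pc: resolve every link waiting for it.
    void on_block_created(std::uint32_t pc, CodePtr x86_addr) {
        compiled_[pc] = x86_addr;
        auto range = links_.equal_range(pc);
        for (auto it = range.first; it != range.second; ++it)
            *it->second = x86_addr;
    }

    // The block at pc is gone: send its callers back to the dispatcher
    // so that nothing jumps into freed code.
    void on_block_deleted(std::uint32_t pc) {
        compiled_.erase(pc);
        auto range = links_.equal_range(pc);
        for (auto it = range.first; it != range.second; ++it)
            *it->second = kDispatcher;
    }

private:
    std::map<std::uint32_t, CodePtr> compiled_;      // PC -> x86 entry point
    std::multimap<std::uint32_t, CodePtr*> links_;   // dest PC -> jump sites
};
```

The multimap is keyed by destination PC because that is the key used by both events: block creation and block deletion. A complete version would also drop the records owned by a deleted source block, which this sketch leaves out.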
=== The Block Manager ===
TO DO !


== Code Generation ==
# Wrapper to call a C++ function from the recompiler
# Add some glue between recompiler blocks
Note: '''"ExitRecompiledCode"''' is the exit address/label of the recompiler.
==== EventTest ====
* Generated by '''_DynGen_DispatcherEvent'''
* High-Level Description
# Execute '''"recEventTest"'''


==== DispatcherReg ====
# Realign the stack
# Execute JITCompile
* Note: the purpose is to detect that EE instructions were already compiled inside another block.
* Note2: it might be removed soon


==== EnterRecompiledCode ====


* Generated by '''_DynGen_EnterRecompiledCode'''
* High-Level Description
# Setup the base frame pointer
# Align the stack pointer
# Save edi/esi/ebx on the stack
# Simulate a function call by pushing the call address and EBP onto the stack
# Simulate the stack frame preparation ("push ebp, mov ebp, esp")
# Execute DispatcherReg
# Handle the return of DispatcherReg call ("leave")
# Restore edi/esi/ebx
# Destroy the stack ("leave")
# Return to C++ world ("ret")
* Note: I think the stack is simulated to ease stack unwinding for exceptions and debuggers.


==== DispatchBlockDiscard ====


* Generated by  '''_DynGen_DispatchBlockDiscard'''
* High-Level Description
# Execute '''"dyna_block_discard"'''
# Exit the recompiler


==== DispatchPageReset ====


* Generated by  '''_DynGen_DispatchPageReset'''
 
* High-Level Description
# Execute '''"dyna_page_reset"'''
# Exit the recompiler


=== Compilation ===


== EE Memory Emulation ==
''' DRAFT/JUNK OF OLD DATA '''
All those blocks are managed by the BaseBlocks class.
[code]
static BaseBlocks recBlocks;
[/code]
_DynGen_* functions generate dispatcher functions and return a function pointer to the function. Full initialization is done in _DynGen_Dispatchers.
* JITCompile (generated by _DynGen_JITCompile) will
1/ Call recRecompile(cpuRegs.pc) to recompile the current block
2/ Jump to the recompiled block PC_GETBLOCK(cpuRegs.pc)->m_pFnptr()
Basically, every BASEBLOCK contains JITCompile as its initial address.
* JITCompileInBlock (generated by _DynGen_JITCompileInBlock)
1/ Jump to JITCompile
Basically, after the compilation of a block of size N, the first BASEBLOCK contains the x86 address. The remaining N-1 BASEBLOCK entries contain JITCompileInBlock.
* DispatcherReg (generated by _DynGen_DispatcherReg) will
1/ Jump to the current block (Note. Stack won't be realigned)
* EnterRecompiledCode (generated by _DynGen_EnterRecompiledCode) will
1/ Setup the base frame pointer
2/ Align the stack pointer
3/ Save edi/esi/ebx on the stack
4/ Simulate a function call? (potentially to help debugger to unwind the stack)
5/ Simulate the stack frame preparation "push ebp, mov ebp, esp"
6/ Save esp, ebp into static variable (for debug check). Code can surely be removed.
7/ Jump to DispatcherReg
8/ Handle the return of DispatcherReg (Leave and restore edi/esi/ebx)
9/ Handle the return of current function (leave and ret)
* ExitRecompiledCode is the return address of DispatcherReg (end of EnterRecompiledCode)
* DispatchBlockDiscard (generated by _DynGen_DispatchBlockDiscard) is a wrapper to the C++ function dyna_block_discard
* DispatchPageReset (generated by _DynGen_DispatchPageReset) is a wrapper to the C++ function dyna_page_reset
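The "JITCompile as init address" idea from the list above can be sketched as follows. This is a simplified model, not the generated x86 code: plain C++ function pointers stand in for the BASEBLOCK entries, and every name (<code>g_lookup</code>, <code>jit_compile</code>, <code>compiled_block</code>, <code>dispatch</code>) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using BlockFn = void (*)();

// Hypothetical stand-ins for the CPU state and for some statistics.
std::uint32_t g_pc = 0;
int g_compile_count = 0;
int g_run_count = 0;

void jit_compile();                       // plays the role of JITCompile
void compiled_block() { ++g_run_count; }  // plays the role of generated code

// One entry per 4-byte EE instruction slot; every entry starts at jit_compile.
std::vector<BlockFn> g_lookup(16, &jit_compile);

void jit_compile() {
    ++g_compile_count;                     // "recompile the current block"
    g_lookup[g_pc / 4] = &compiled_block;  // publish the new entry point
    g_lookup[g_pc / 4]();                  // then jump to the recompiled block
}

// Plays the role of DispatcherReg: jump to whatever the table holds.
void dispatch(std::uint32_t pc) {
    assert(pc / 4 < g_lookup.size());
    g_pc = pc;
    g_lookup[pc / 4]();
}
```

The first <code>dispatch</code> of a given PC pays for the compilation; later ones reach the compiled code directly, with no "is it compiled yet?" test on the fast path.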
[hr]
The details of the "recompilation" stage
Input: the PC (first instruction address)
Output: x86 code in a buffer that is ready to be executed.