PCSX2 Documentation/64-bit Recompilation

{{DocTabs|Section=3}}
''Originally written by ZeroFrog''


Before going into technical details, I want to cover the current Pcsx2 recompilation model.


==Pcsx2 Recompilation==


Each instruction set being emulated needs either an interpreter or a recompiler to run on the PC. Both are important in emulation. Interpreters are written in ordinary high-level languages and are platform independent. They are easy to program and easy to debug, but slow. They are also extremely important for testing and debugging. For example, interpreting a simple 32-bit EE MIPS instruction (held in the variable code) might look like:


<source lang="cpp">
switch (code >> 26) {   // primary opcode: the top 6 bits of the instruction
case 0x02:              // J - jump to target address
   // ...
   break;
// ... (other opcodes)
}
</source>
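
A fuller, compilable version of the dispatch above might look like the sketch below. It is not the Pcsx2 interpreter. The Cpu struct and step function are hypothetical, and the sketch makes several simplifying assumptions:
* only J (0x02) and ADDIU (0x09) are handled;
* only the low 32 bits of each register are modeled;
* branch delay slots are ignored.

<source lang="cpp">
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified EE state: the program counter and the low
// 32 bits of each general-purpose register (real EE GPRs are 128 bits wide).
struct Cpu {
    uint32_t pc = 0;
    uint32_t gpr[32] = {};
};

// Decode and execute one instruction word. The decode work is repeated on
// every call, which is the overhead a recompiler avoids.
// Branch delay slots are ignored to keep the sketch short.
void step(Cpu& cpu, uint32_t code) {
    uint32_t next_pc = cpu.pc + 4;
    switch (code >> 26) {                    // primary opcode: top 6 bits
    case 0x02:                               // J - jump to target address
        // 26-bit index shifted left by 2, combined with the top 4 bits of PC+4
        next_pc = (next_pc & 0xF0000000u) | ((code & 0x03FFFFFFu) << 2);
        break;
    case 0x09: {                             // ADDIU rt, rs, imm
        uint32_t rs = (code >> 21) & 31;
        uint32_t rt = (code >> 16) & 31;
        // sign-extend the 16-bit immediate
        uint32_t imm = static_cast<uint32_t>(
            static_cast<int32_t>(static_cast<int16_t>(code & 0xFFFF)));
        if (rt != 0)                         // writes to r0 are discarded
            cpu.gpr[rt] = cpu.gpr[rs] + imm;
        break;
    }
    default:                                 // other opcodes omitted in this sketch
        break;
    }
    cpu.pc = next_pc;
    assert(cpu.gpr[0] == 0);                 // r0 is hardwired to zero
}
</source>

Every call to step repeats the shift, mask and switch, even when the same instruction runs millions of times. That repeated decode is the cost recompilers avoid.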
 
Recompilers, on the other hand, try to cut as many corners as possible. Suppose we know the instruction at address 0x1000 will never change. Then there is no reason for the CPU to execute the switch statement and decode that instruction every single time it runs. Instead, a recompiler translates the instruction once into the minimal amount of native assembly needed to emulate it, and then reuses that code. Because the output is assembly, recompilation is a very platform-dependent process.
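
The caching idea can be sketched as follows. A real recompiler emits x86 machine code. Here, purely as an assumption to keep the example portable, a C++ closure (std::function) stands in for the emitted code. The Recompiler class, its run and translate methods, and the single ADDIU opcode it handles are all hypothetical.

<source lang="cpp">
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical guest state: the low 32 bits of each EE general-purpose register.
struct Regs {
    uint32_t gpr[32] = {};
};

// Stand-in for a chunk of emitted host code.
using HostCode = std::function<void(Regs&)>;

class Recompiler {
public:
    // Execute the instruction at guest address pc. code is only decoded the
    // first time pc is seen; later calls reuse the cached translation, which
    // is valid only as long as the guest code at pc does not change.
    void run(uint32_t pc, uint32_t code, Regs& regs) {
        auto it = cache_.find(pc);
        if (it == cache_.end())
            it = cache_.emplace(pc, translate(code)).first;
        it->second(regs);
    }

    int translations() const { return translations_; }

private:
    // Decode once and bake the decoded fields into the returned closure.
    HostCode translate(uint32_t code) {
        ++translations_;
        if ((code >> 26) == 0x09) {                   // ADDIU rt, rs, imm
            uint32_t rs = (code >> 21) & 31;
            uint32_t rt = (code >> 16) & 31;
            uint32_t imm = static_cast<uint32_t>(
                static_cast<int32_t>(static_cast<int16_t>(code & 0xFFFF)));
            if (rt == 0)
                return [](Regs&) {};                  // writes to r0 are discarded
            return [rs, rt, imm](Regs& r) { r.gpr[rt] = r.gpr[rs] + imm; };
        }
        return [](Regs&) {};                          // other opcodes omitted
    }

    std::unordered_map<uint32_t, HostCode> cache_;
    int translations_ = 0;
};
</source>

Running the same ADDIU at address 0x1000 three times executes it three times but decodes it only once, so translations() stays at 1.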


More complex recompilers divide the code into basic blocks (straight-line code with no jumps or branches inside). Within a block, they try to keep the emulated CPU's registers in real host CPU registers across instructions, rather than reloading them from memory for every instruction. There are many register allocation algorithms, many of them based on graph coloring. Such recompilers might also perform constant propagation. A common pattern in Emotion Engine MIPS code is something like:


<source lang="asm">
lui s0, 0x1000
lw s0, 0x2000(s0)
</source>


If we propagate the constant that lui places in s0 into the lw, the recompiler knows at recompile time that the read address is 0x10002000. It no longer has to compute that address at run time.
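
Constant propagation over a basic block can be sketched as below. This is a minimal illustration, not Pcsx2 code, with these assumptions:
* the Insn struct and resolve_load_addresses function are hypothetical;
* only LUI and LW are handled;
* only the low 32 bits of each register are tracked.

A recompiler could use a resolved address to emit a direct memory access instead of computing the address at run time.

<source lang="cpp">
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical pre-decoded instruction covering only the two opcodes above.
enum class Op { LUI, LW };
struct Insn {
    Op op;
    int rt;       // destination register
    int rs;       // base register (used by LW only)
    int16_t imm;  // LUI immediate or LW signed offset
};

// Walk one basic block and track which guest registers hold values known at
// recompile time. For each LW, return its effective address if it is known.
std::vector<std::optional<uint32_t>> resolve_load_addresses(const std::vector<Insn>& block) {
    std::optional<uint32_t> known[32];  // empty = unknown until run time
    known[0] = 0;                       // r0 is always zero
    std::vector<std::optional<uint32_t>> result;
    for (const Insn& i : block) {
        assert(i.rt >= 0 && i.rt < 32 && i.rs >= 0 && i.rs < 32);
        if (i.op == Op::LUI) {
            if (i.rt != 0)
                known[i.rt] = static_cast<uint32_t>(static_cast<uint16_t>(i.imm)) << 16;
        } else {                        // Op::LW
            std::optional<uint32_t> addr;
            if (known[i.rs])            // base is a constant: fold base + offset now
                addr = *known[i.rs] + static_cast<uint32_t>(static_cast<int32_t>(i.imm));
            result.push_back(addr);
            if (i.rt != 0)
                known[i.rt].reset();    // the loaded value is only known at run time
        }
    }
    return result;
}
</source>

Feeding it lui s0, 0x1000 followed by lw s0, 0x2000(s0) resolves the load address to 0x10002000. A further load through s0 stays unresolved, because s0 now holds a value read from memory.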

Anyone who remembers what it was like in the 0.8.1 days can appreciate how powerful the 0.9.1 Pcsx2 optimizations are.
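
The graph-coloring register allocation mentioned earlier works like this:
* each guest value is a node;
* an edge joins two values that are live at the same time;
* each color is a host register.

The greedy version below is an assumption for illustration, not the allocator Pcsx2 uses. Production allocators such as Chaitin's or Briggs' choose the coloring order and spill candidates far more carefully. The allocate_registers function is hypothetical.

<source lang="cpp">
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// interferes[v] lists the values that are live at the same time as value v
// (the graph is assumed symmetric). Returns a host register index per value,
// or -1 if the value could not be given a register and must be spilled.
std::vector<int> allocate_registers(const std::vector<std::set<int>>& interferes,
                                    int host_regs) {
    assert(host_regs > 0);
    const std::size_t n = interferes.size();
    std::vector<int> color(n, -1);
    for (std::size_t v = 0; v < n; ++v) {         // greedy: color in index order
        std::vector<bool> taken(host_regs, false);
        for (int u : interferes[v])
            if (color[u] >= 0) taken[color[u]] = true;  // neighbour holds that register
        for (int r = 0; r < host_regs; ++r)
            if (!taken[r]) { color[v] = r; break; }
        // If every host register is taken, color[v] stays -1: spill to memory.
    }
    return color;
}
</source>

Consider three mutually interfering values with only two host registers. The third value gets -1 and must live in memory. With a third register, all three fit. The fewer host registers there are, the more values end up spilled.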


==x86-64==


So why isn't x86-32 enough? For starters, the PlayStation 2 EE has:
* 32 128-bit general-purpose registers;
* 32 32-bit floating-point registers;
* some COP0 registers.

Most instructions operate on 64 bits, and the MMI instructions operate on the full 128 bits.

The x86 CPU, on the other hand, has:
* 8 32-bit general-purpose registers (one of which is the stack pointer);
* 8 64-bit MMX registers;
* 8 128-bit SSE registers.

The three register sets can't be combined easily. For example, you can't add an x86 register to an SSE register without first transferring one of them into the other's register set. So there is a very big difference in register count and size between the two CPUs.

Because there are so few x86 registers, the recompiler does a lot of register thrashing: registers are spilled to memory very frequently. Each memory read or write is slow compared with a register access, so the more thrashing, the slower the recompiled code becomes. Also, x86-32 is inherently 32-bit. A 64-bit add therefore needs two 32-bit instructions and 4 regular x86 registers for the source and result (2 if the source is read from memory). The EE recompiler tries to relieve this register pressure by using the 64-bit arithmetic capabilities of MMX. However, MMX has a fairly limited ISA, and transfers between register sets kill performance.
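
The sketch below mirrors in C++ what a recompiler must emit for a 64-bit add on x86-32. It adds the low halves, then adds the high halves together with the carry (the add/adc pair). The Pair32 type and add64 function are hypothetical. On x86-64 the same operation is a single 64-bit add.

<source lang="cpp">
#include <cstdint>

// A 64-bit EE value held as two 32-bit halves, as it would be in a pair of
// x86-32 general-purpose registers.
struct Pair32 {
    uint32_t lo;
    uint32_t hi;
};

// Equivalent of the two-instruction x86-32 sequence
//   add lo_dst, lo_src
//   adc hi_dst, hi_src
// The carry out of the low half must be fed into the high half.
Pair32 add64(Pair32 a, Pair32 b) {
    Pair32 r;
    r.lo = a.lo + b.lo;            // "add": wraps modulo 2^32
    uint32_t carry = r.lo < a.lo;  // unsigned overflow happened iff result < operand
    r.hi = a.hi + b.hi + carry;    // "adc": add with carry
    return r;
}
</source>

Both halves of both operands must be in registers (or one operand read from memory), which is where the 4-register cost comes from.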

Moral of the blog: most of the recompiler theory discussed here comes straight from compiler theory. Compilers will always be necessary as long as engineers keep coming up with new instruction set architectures (ISAs), so learn how a compiler works. I recommend ''Compilers: Principles, Techniques, and Tools'' by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman.


{{PCSX2 Developers Blog Navbox}}
{{PCSX2 Documentation Navbox}}