PCSX2 Documentation/Introduction to Dynamic Recompilation: Difference between revisions
Revision as of 07:13, 8 January 2022
''Originally written by cottonvibes''
This blog post is an introduction to dynamic recompilers (dynarecs), and hopes to provide some insight into how they work and why PCSX2 uses them to speed up emulation. To understand why dynarecs are useful, you must first be familiar with a basic interpreter emulator.
Assume we are emulating a very simple processor. Processors have instruction sets, which are the sets of instructions they can execute.
Let's assume the processor we are emulating is a made-up chip I'll call SL3 (super lame 3), which has only these 3 instructions (each with a fixed width of 4 bytes):
<source lang="asm">
MOV dest_reg, src1_reg // Move source register to destination register
ADD dest_reg, src1_reg, src2_reg // Add source1 and source2 registers, and store the result in destination register
BR relative_address // Branch (jump) to relative address (PC += relative_address * 4)
</source>
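To make the fixed 4-byte width concrete, here is a minimal C++ sketch of one possible encoding. The byte layout (opcode first, then up to three operand bytes), the opcode values, and the names `Sl3Op`, `Sl3Instr`, and `decode` are all assumptions made for illustration; the article itself does not define an encoding.

```cpp
#include <array>
#include <cstdint>

// Hypothetical opcode values; the article does not specify them.
enum class Sl3Op : std::uint8_t { MOV = 0, ADD = 1, BR = 2 };

// One decoded SL3 instruction. Operand bytes an instruction does not use are padding.
struct Sl3Instr {
    Sl3Op op;
    std::uint8_t a; // dest_reg (or relative_address for BR)
    std::uint8_t b; // src1_reg
    std::uint8_t c; // src2_reg (unused by MOV and BR)
};

// Decode the 4 bytes of one instruction, assuming the layout {op, a, b, c}.
inline Sl3Instr decode(const std::array<std::uint8_t, 4>& bytes) {
    return Sl3Instr{static_cast<Sl3Op>(bytes[0]), bytes[1], bytes[2], bytes[3]};
}
```

Under this assumed layout, MOV carries one padding byte and BR carries two, which is why every instruction still occupies exactly 4 bytes.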
Processors generally have what we call registers, which hold data, and the processor's instructions operate on these registers.
Now, to program for this processor, we can write the following code:
<source lang="asm">
MOV reg1, reg0
ADD reg4, reg2, reg3
BR 5
</source>
What this code does is:
#It moves register 0 to register 1 (so now register 1 holds a copy of register 0's data).
#It adds register 2 and register 3 together, and stores the result in register 4.
#It branches 5 instructions ahead (so it jumps to some code further down, not shown in the above example).
So that is how we can program for the SL3 processor in assembly code. But how do we emulate it?
To actually emulate this processor we can use an interpreter. An interpreter simply fetches each instruction opcode and executes it accordingly (e.g. by calling an emulated method for each instruction). The rest of the emulator (emulating the other processors/peripherals of our system) can then be updated in between instructions or after a group of CPU instructions has run. Interpreters are a simple and complete way to emulate a system.
[http://forums.pcsx2.net/Thread-blog-Introduction-to-Dynamic-Recompilation?pid=102002#pid102002 Click here to see a C++ code example of a simple interpreter]
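For readers who want something inline, the fetch-and-execute loop described above can be sketched roughly as follows. This is not the linked forum example. The encoding ({opcode, a, b, c} bytes), the opcode values, the 8-register file, and the names `Sl3Cpu` and `interpret` are assumptions for illustration, and BR is taken relative to the branch instruction's own address, following the PC += relative_address * 4 formula.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding: each instruction is 4 bytes {opcode, a, b, c}.
enum : std::uint8_t { OP_MOV = 0, OP_ADD = 1, OP_BR = 2 };

struct Sl3Cpu {
    std::uint32_t regs[8] = {}; // register count is an assumption
    std::size_t pc = 0;         // byte offset of the next instruction
};

// Fetch, decode, and execute up to maxSteps instructions, one at a time.
inline void interpret(Sl3Cpu& cpu, const std::vector<std::uint8_t>& mem, int maxSteps) {
    for (int step = 0; step < maxSteps; ++step) {
        // Stop if a full 4-byte instruction no longer fits at pc.
        if (mem.size() < 4 || cpu.pc > mem.size() - 4) return;
        const std::uint8_t* in = &mem[cpu.pc];
        switch (in[0]) {
        case OP_MOV: // regs[a] = regs[b]; indices masked to stay inside regs[8]
            cpu.regs[in[1] & 7] = cpu.regs[in[2] & 7];
            cpu.pc += 4;
            break;
        case OP_ADD: // regs[a] = regs[b] + regs[c]
            cpu.regs[in[1] & 7] = cpu.regs[in[2] & 7] + cpu.regs[in[3] & 7];
            cpu.pc += 4;
            break;
        case OP_BR: // PC += relative_address * 4 (signed offset)
            cpu.pc += static_cast<std::int8_t>(in[1]) * 4;
            break;
        default: // unknown opcode: stop
            return;
        }
    }
}
```

Every instruction pays for the bounds check, the fetch, and the `switch` each time it runs, even when the same code executes thousands of times.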
With an interpreter we constantly have to fetch and execute instructions one by one. This carries a lot of overhead and leaves minimal room for optimization, since most special-case optimizations bring the overhead of checking for them (for example, extra if-statements and conditionals, which reduce the gain from the optimization). But there's a faster way to do processor emulation that doesn't have these drawbacks: dynamic recompilation!
So, for instance, remember our SL3 program from above:
<source lang="asm">
MOV reg1, reg0
ADD reg4, reg2, reg3
BR 5
</source>
Let's assume this code is part of some function that gets called hundreds of times a second (this may sound crazy, but games/applications commonly call the same code hundreds or thousands of times a second).
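Why repeated execution matters can be sketched with a small block cache: translate a block of SL3 code once, store the result keyed by its start address, and reuse it on every later visit. Note that this sketch uses C++ closures (`std::function`) in place of emitted x86 code, so it is not a real recompiler; it only shows the translate-once, run-many structure. The encoding, the register count, and the names `BlockCache`, `translate`, and `run` are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

enum : std::uint8_t { OP_MOV = 0, OP_ADD = 1, OP_BR = 2 }; // hypothetical encoding

struct Sl3Cpu {
    std::uint32_t regs[8] = {}; // register count is an assumption
    std::size_t pc = 0;
};

using Block = std::function<void(Sl3Cpu&)>;

struct BlockCache {
    std::unordered_map<std::size_t, Block> blocks; // keyed by block start PC
    int translations = 0;                          // number of blocks built so far

    // Translate instructions from startPc up to and including the first BR.
    Block translate(const std::vector<std::uint8_t>& mem, std::size_t startPc) {
        std::vector<Block> ops;
        std::size_t nextPc = startPc;
        while (mem.size() >= 4 && nextPc <= mem.size() - 4) {
            const std::uint8_t op = mem[nextPc];
            const std::uint8_t a = mem[nextPc + 1] & 7;
            const std::uint8_t b = mem[nextPc + 2] & 7;
            const std::uint8_t c = mem[nextPc + 3] & 7;
            if (op == OP_BR) { // branch ends the block; target is known at translation time
                nextPc += static_cast<std::int8_t>(mem[nextPc + 1]) * 4;
                break;
            }
            if (op == OP_MOV) {
                ops.push_back([a, b](Sl3Cpu& cpu) { cpu.regs[a] = cpu.regs[b]; });
            } else if (op == OP_ADD) {
                ops.push_back([a, b, c](Sl3Cpu& cpu) { cpu.regs[a] = cpu.regs[b] + cpu.regs[c]; });
            } else {
                break; // unknown opcode: end the block here
            }
            nextPc += 4;
        }
        ++translations;
        return [ops, nextPc](Sl3Cpu& cpu) {
            for (const auto& f : ops) f(cpu); // no fetch or decode at run time
            cpu.pc = nextPc;
        };
    }

    // Run up to maxBlocks blocks, translating only on a cache miss.
    void run(Sl3Cpu& cpu, const std::vector<std::uint8_t>& mem, int maxBlocks) {
        for (int i = 0; i < maxBlocks; ++i) {
            if (mem.size() < 4 || cpu.pc > mem.size() - 4) return;
            auto it = blocks.find(cpu.pc);
            if (it == blocks.end())
                it = blocks.emplace(cpu.pc, translate(mem, cpu.pc)).first;
            it->second(cpu);
        }
    }
};
```

In this sketch, a two-instruction loop that runs 100 times is decoded once and then replayed from the cache 100 times; a real dynarec gets the same effect by caching native code instead of closures.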
PCSX2 has a very cool emitter that looks very similar to x86-32 assembly, except the instructions have an 'x' before them.
So for example:
<source lang="asm">
mov eax, ecx;
</source>
is
<source lang="asm">
xMOV(eax, ecx);
</source>
with the PCSX2 emitter.
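To give a feel for what an emitter does under the hood, here is a toy sketch that appends x86 machine-code bytes to a buffer. It is not PCSX2's emitter: `ToyEmitter`, `Reg32`, and the method names are hypothetical, and only the register-to-register forms of MOV and ADD are covered. The byte encodings (opcode 0x89 for MOV r/m32, r32 and 0x01 for ADD r/m32, r32, each followed by a ModRM byte) come from the x86 instruction set.

```cpp
#include <cstdint>
#include <vector>

// x86 32-bit register numbers as they appear in the ModRM byte.
enum Reg32 : std::uint8_t { EAX = 0, ECX = 1, EDX = 2, EBX = 3 };

// Hypothetical toy emitter: each call appends the bytes of one instruction.
struct ToyEmitter {
    std::vector<std::uint8_t> code; // emitted machine code

    // Build a ModRM byte for register-to-register forms: mod=11, reg=src, rm=dst.
    static std::uint8_t modrm(Reg32 dst, Reg32 src) {
        return static_cast<std::uint8_t>(0xC0 | (src << 3) | dst);
    }

    // Emit `mov dst, src`.
    void mov(Reg32 dst, Reg32 src) {
        code.push_back(0x89); // MOV r/m32, r32
        code.push_back(modrm(dst, src));
    }

    // Emit `add dst, src`.
    void add(Reg32 dst, Reg32 src) {
        code.push_back(0x01); // ADD r/m32, r32
        code.push_back(modrm(dst, src));
    }
};
```

Calling `mov(EAX, ECX)` on this toy emitter appends the two bytes 0x89 0xC8, which is one valid encoding of `mov eax, ecx`. A real emitter also has to handle memory operands, immediates, and writing into executable memory.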
The code for actually recompiling these blocks looks something like this:
<source lang="cpp">
// This is our emulated MOV instruction
void MOV() {
// ...
    u8 reg1 = fetch(); // Get source 1 register number
    xMOV(eax, ptr[&cpuRegs[reg1]]); // Move reg1's data to eax
    xMOV(ptr[&cpuRegs[dest]], eax); // Move eax to dest register
    fetch(); // This fetch is needed because every instruction in our SL3 processor is 4 bytes
// ...
    u8 reg2 = fetch(); // Get source 2 register number
    xMOV(eax, ptr[&cpuRegs[reg1]]); // Move reg1's data to eax
    xADD(eax, ptr[&cpuRegs[reg2]]); // Add eax with reg2's data
    xMOV(ptr[&cpuRegs[dest]], eax); // Move eax to dest register
}
// ...
    }
}
</source>
Note that the above code doesn't have any logic to exit once it starts executing recompiled blocks. I left this out to avoid complicating things, so assume that execution somehow ends and we can get back to running the other parts of the emulator.
I should also add that PCSX2 currently has the following dynarecs:
*EE recompiler (for MIPS R5900 ee-core processor)
*IOP recompiler (for MIPS R3000A i/o processor)
*microVU recompiler (for VU0/VU1, and COP2 instructions)
*Super VU recompiler (can be used instead of microVU for VU0/VU1)
*newVIF unpack recompiler (recompiles vif unpack routines)
*r3000air (not yet finished, but should one day supersede the IOP recompiler)
{{PCSX2 Documentation Navbox}}