Cycle-level simulation of a CPU inside a Plugin
- KVRian
- Topic Starter
- 1154 posts since 17 Feb, 2010
Hello,
Let's say I'm making a VST plugin effect. It has a certain sample rate (which can change depending on the host), as always happens in plugins; nothing strange so far.
Now let's say I want to simulate a CPU/microcontroller/coprocessor INSIDE THIS VST PLUGIN. I mean a low-level, cycle-exact simulation.
For example, take the 'Cyclone' virtual instrument. It uses this approach to simulate a Motorola 68k and other coprocessors/stuff.
A Motorola 68k, for example, was clocked at about 8 MHz. So for my purposes, I have to simulate 8 million cycles of that CPU per second (fixed), and sample its output inside the main plugin processing class (at the sample rate).
For example, using WDL-OL: I declare a method
CPU_SIMULATION() and put in it all the instructions to be performed 8 million times per second.
Then, inside the main WDL-OL processing method, ProcessDoubleReplacing(), I sample the current CPU_SIMULATION() output on every plugin iteration, for example 44,100 times per second.
Question 1: Is this general approach correct? Other ways may be possible, but right now I'm interested in this one in particular.
Question 2: How can I force CPU_SIMULATION() to perform its instructions EXACTLY 8 million times per second? I mean, how can I make it 'run' at a fixed frequency of 8 MHz?
Thank you in advance for any hint,
-
- KVRian
- 1379 posts since 26 Apr, 2004 from UK
First, a CPU emulation cannot be done in real time, even a "slow" one. But you can try; there is nothing preventing you from doing it the way you are trying to.
As for the second question, you need to know a little bit more about the binary it's running and its connections to the outside (you will also need to emulate those, and they are actually what produces the sound!).
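A minimal sketch of what emulating "the connections to the outside" can mean: the emulated CPU's stores go through a bus, and one address is wired to a DAC whose held value is what the plugin finally samples. The address `kDacAddress`, the unsigned 8-bit DAC format and the `Bus`/`Dac` classes are all hypothetical; the real memory map and sound hardware have to come from the documentation of the machine being emulated.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical memory map: plain RAM everywhere, except one address that is
// wired to an 8-bit DAC. Real machines decode addresses very differently.
constexpr std::uint16_t kDacAddress = 0xC000;

struct Dac {
    std::uint8_t latched = 0x80;  // assumed unsigned 8-bit format, 0x80 = silence

    // What the plugin samples: the DAC simply holds the last value written.
    float output() const { return (static_cast<int>(latched) - 128) / 128.0f; }
};

class Bus {
public:
    explicit Bus(Dac& dac) : dac_(dac) {}

    // Every store performed by the emulated CPU goes through here.
    void write(std::uint16_t address, std::uint8_t value) {
        if (address == kDacAddress) {
            dac_.latched = value;  // side effect on the "outside world"
        } else {
            ram_[address] = value;
        }
    }

    std::uint8_t read(std::uint16_t address) const {
        return address == kDacAddress ? dac_.latched : ram_[address];
    }

private:
    Dac& dac_;
    std::array<std::uint8_t, 0x10000> ram_{};  // 64 KiB address space
};
```

In a plugin, the audio callback would read `dac.output()` at the appropriate moments. Because the CPU may write the DAC at irregular times, the timing of those writes (in emulated cycles) is what shapes the waveform, which is why cycle counting matters here.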
-
- KVRian
- 573 posts since 1 Jan, 2013 from Denmark
Check out this Finnish guy; he uploads a lot of complete low-level processor emulations, for instance:
https://www.youtube.com/watch?v=y71lli8MS8s
It might give you an overview of the project at hand.
- KVRian
- Topic Starter
- 1154 posts since 17 Feb, 2010
Mayae wrote: Check out this Finnish guy; he uploads a lot of complete low-level processor emulations, for instance:
It might give you an overview of the project at hand.
Wow... maybe it would be better to go through that code at a calmer pace than that video does.
Thank you for the link,
-
- KVRAF
- 6323 posts since 30 Dec, 2004 from London uk
deleted
- KVRian
- Topic Starter
- 1154 posts since 17 Feb, 2010
Miles1981 wrote: First, a CPU emulation cannot be done in real time, even a "slow" one. But you can try; there is nothing preventing you from doing it the way you are trying to.
As for the second question, you need to know a little bit more about the binary it's running and its connections to the outside (you will also need to emulate those, and they are actually what produces the sound!).
Why not? There are a lot of real-time, low-level, cycle-exact emulators out there (Higan, for example, and the promising CEN64). They emulate every single cycle of several CPUs/coprocessors/etc. inside a single system and synchronize them.
By the way, I have no "performance doubts". I'm just at the "would it be possible, and how?" stage.
I'm just wondering how it could be done from an architectural viewpoint, since I have never made anything like this. In pro-audio plugin frameworks, the main processing class/method is tied to the sample rate parameter. So I was asking how to build a method that is performed not at the sample rate, but at a different rate, independent of the sample rate.
Maybe by "counting" physical CPU cycles? Is that possible?
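The kind of synchronization mentioned here can be illustrated with a simple "step whoever is furthest behind" scheduler. This is a sketch under assumptions, not how Higan or CEN64 are actually implemented: every component keeps its local time in ticks of a shared master clock (for example, with a 24 MHz master clock an 8 MHz CPU advances 3 ticks per cycle and a 3 MHz sound chip 8 ticks), and the scheduler always steps the component with the smallest local time. `Component` and `runUntil` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A component (CPU, sound chip, timer...) with its own clock. Time is kept in
// ticks of a common master clock so that different rates can be compared.
struct Component {
    std::uint64_t ticksPerStep;  // master-clock ticks consumed by one step
    std::uint64_t now = 0;       // local time, in master-clock ticks
    std::uint64_t steps = 0;     // how many steps have been emulated
    void step() { ++steps; now += ticksPerStep; }  // real emulation would go here
};

// Advance all components to `target` master ticks, always stepping the one that
// is furthest behind. No component ever gets more than one of its own steps
// ahead of the slowest one. (One simple strategy among several; real emulators
// often let a CPU run ahead until it touches shared hardware.)
inline void runUntil(std::vector<Component>& parts, std::uint64_t target) {
    for (;;) {
        Component* behind = nullptr;
        for (auto& c : parts) {
            assert(c.ticksPerStep > 0);  // a zero-cost step would never advance
            if (c.now < target && (!behind || c.now < behind->now)) behind = &c;
        }
        if (!behind) return;  // every component has reached the target
        behind->step();
    }
}
```

Calling `runUntil(parts, 24)` on a CPU with `ticksPerStep = 3` and a chip with `ticksPerStep = 8` runs 8 CPU steps and 3 chip steps, i.e. one microsecond of emulated time at those assumed rates.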
-
- KVRAF
- 6323 posts since 30 Dec, 2004 from London uk
I don't think you'd want to get it to run at just 8 MHz. As fast as possible is better and easier. Each CPU operation has a known execution time. You can slow it down with delay loops. Projects like this are done by obsessive programmers with lots of free time. Better find yourself a few. You might be better off asking on emulation forums.
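UltraJv's point that each CPU operation has a known execution time is what makes cycle counting possible without any real-time clock: the emulator adds up the cycle cost of each instruction and stops once a budget is spent, running as fast as the host allows. A minimal sketch with a made-up three-instruction CPU; the opcodes, their cycle costs and the `ToyCpu` name are invented for illustration (a real 68000 core would take its timings from the processor manual):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy instruction set, purely hypothetical.
enum class Op : std::uint8_t { Nop, Inc, Jump0 };

struct ToyCpu {
    std::vector<Op> program;
    std::size_t pc = 0;
    std::uint32_t acc = 0;
    long long totalCycles = 0;  // emulated time, in CPU clock cycles

    // Cost of each opcode in clock cycles (invented numbers for this toy CPU).
    static int cost(Op op) {
        switch (op) {
            case Op::Nop:   return 4;
            case Op::Inc:   return 6;
            case Op::Jump0: return 10;
        }
        return 4;  // unreachable for valid opcodes
    }

    // Run whole instructions until at least `budget` cycles have elapsed.
    // Instructions are indivisible, so the result may exceed the budget;
    // the caller should carry the overshoot into the next request.
    int emulate(int budget) {
        assert(!program.empty());
        int done = 0;
        while (done < budget) {
            assert(pc < program.size());
            const Op op = program[pc];
            switch (op) {
                case Op::Nop:   ++pc;        break;
                case Op::Inc:   ++acc; ++pc; break;
                case Op::Jump0: pc = 0;      break;  // jump back to the start
            }
            done += cost(op);
        }
        totalCycles += done;
        return done;
    }
};
```

Returning the number of cycles actually executed, rather than assuming the budget was hit exactly, is the same idea as `didCycles = emulator->emulate(cycleCount)` in the Cyclone loop posted in this thread.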
-
- KVRian
- 1379 posts since 26 Apr, 2004 from UK
xhunaudio wrote: I'm just wondering how it could be done from an architectural viewpoint, since I have never made anything like this. In pro-audio plugin frameworks, the main processing class/method is tied to the sample rate parameter. So I was asking how to build a method that is performed not at the sample rate, but at a different rate, independent of the sample rate.
Maybe by "counting" physical CPU cycles? Is that possible?
You have to count the number of samples, and this is determined by the hardware around the CPU. You have to emulate everything, not just the CPU, and the number of cycles to run is the number needed to get the proper number of samples out of that hardware.
- KVRian
- Topic Starter
- 1154 posts since 17 Feb, 2010
UltraJv wrote: Projects like this are done by obsessive programmers with lots of free time. Better find yourself a few. You might be better off asking on emulation forums.
Sure, that was my plan B.
Projects like this do exist, but maybe it would be simpler for me to learn it from someone who also works with pro-audio frameworks. And since Cyclone (Sonic Charge) is built around this concept, maybe Mr. Sonic Charge (or someone with similar knowledge) will join this discussion; any hint from him would be great...
- KVRian
- Topic Starter
- 1154 posts since 17 Feb, 2010
By the way, my aim is not to build a full system simulation, just to LEARN how to execute a method in a plugin at a speed/rate that is totally INDEPENDENT OF the sample rate. For example, several oversampling methods rely on the concept of "2x or 4x the sample rate", which is relatively simple to implement, but it's not what I'm looking for.
I'm looking for a "parallel" (or, to avoid further misunderstanding, "independent") method() that runs at a rate totally INDEPENDENT of the sample rate.
No in-depth details, just a general overview of how to proceed.
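One way to run a method at a rate that has nothing to do with the host sample rate, non-integer ratios included, is a Bresenham-style integer accumulator: for each host sample, add the internal clock rate to an accumulator, run one internal tick per whole host-sample period it contains, and keep the remainder. A minimal sketch, assuming a hypothetical `tick` callback that advances the simulated device by one clock; reading the device output once per host sample afterwards is a plain zero-order hold with no anti-aliasing filter.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Drives a process at `internalHz` from a host callback running at `hostHz`.
// Pure integer arithmetic, so there is no long-term drift: after n host samples
// the total tick count is exactly floor(n * internalHz / hostHz).
class RateBridge {
public:
    RateBridge(std::uint64_t internalHz, std::uint64_t hostHz)
        : internalHz_(internalHz), hostHz_(hostHz) {
        assert(internalHz > 0 && hostHz > 0);
    }

    // Call once per host sample. Returns how many internal ticks were run.
    std::uint64_t advanceOneHostSample(const std::function<void()>& tick) {
        acc_ += internalHz_;
        const std::uint64_t ticks = acc_ / hostHz_;
        acc_ %= hostHz_;  // keep only the fractional remainder
        for (std::uint64_t i = 0; i < ticks; ++i) tick();
        return ticks;
    }

private:
    std::uint64_t internalHz_;
    std::uint64_t hostHz_;
    std::uint64_t acc_ = 0;  // remainder, always < hostHz_
};
```

At 8 MHz and 44.1 kHz this runs 181 or 182 ticks per host sample (8,000,000 / 44,100 ≈ 181.4) and exactly 8,000,000 ticks per 44,100 samples. Calling the bridge once per frame is the simplest form; doing the same bookkeeping per block amortizes the per-sample overhead.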
-
- KVRian
- 1379 posts since 26 Apr, 2004 from UK
If you execute your emulator until you get the samples you need, then it's fully asynchronous. Otherwise, you can look at ATK to learn how to do such a thing (there, the sample counts in the different stages are always a multiple of the final sample size, but it's always the same principle).
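"Execute until you get the samples you need" can be sketched as a pull model: the host asks for a number of frames, and the emulator is advanced, one sample of its own sound hardware at a time, only until there is enough native-rate data to produce them. The `emulateOneSample` hook (run the machine until its sound hardware emits one more sample, however many CPU cycles that takes), the `PullBridge` name and the use of linear interpolation are assumptions for illustration, not ATK's API; linear interpolation is cheap but aliases, so a band-limited resampler would be preferable in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Pull-style bridge between an emulated machine's native sample rate and the
// host rate. The emulator only runs when more native samples are needed.
class PullBridge {
public:
    PullBridge(double nativeRate, double hostRate, std::function<float()> emulateOneSample)
        : step_(nativeRate / hostRate), emulate_(std::move(emulateOneSample)) {
        assert(nativeRate > 0.0 && hostRate > 0.0);
    }

    void render(float* out, std::size_t frames) {
        for (std::size_t i = 0; i < frames; ++i) {
            // Ensure the two native samples around the read position exist,
            // running the emulator as far as needed (and no further).
            while (queue_.size() < 2) queue_.push_back(emulate_());
            const float a = queue_[0];
            const float b = queue_[1];
            out[i] = a + static_cast<float>(frac_) * (b - a);  // linear interpolation
            frac_ += step_;
            // Drop native samples the read position has moved past. They still
            // have to be emulated, because emulated time must keep advancing.
            while (frac_ >= 1.0) {
                frac_ -= 1.0;
                if (queue_.empty()) queue_.push_back(emulate_());
                queue_.pop_front();
            }
        }
    }

private:
    double step_;          // native samples consumed per host frame
    double frac_ = 0.0;    // fractional read position between queue_[0] and queue_[1]
    std::deque<float> queue_;
    std::function<float()> emulate_;
};
```

With a native rate of 22,050 Hz and a host rate of 44,100 Hz, each emulated sample spans two host frames with an interpolated value in between; with the rates reversed, every other emulated sample is computed but skipped.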
- KVRian
- Topic Starter
- 1154 posts since 17 Feb, 2010
Miles1981 wrote: If you execute your emulator until you get the samples you need, then it's fully asynchronous. Otherwise, you can look at ATK to learn how to do such a thing (there, the sample counts in the different stages are always a multiple of the final sample size, but it's always the same principle).
Yes, asynchronous is what I'm looking for.
Thank you for the ATK link; I'll take a look at it in any case.
-
- KVRist
- 43 posts since 11 Dec, 2003
You simply calculate the number of cycles that the hardware CPU would execute in the same time span as the audio buffer, i.e. frameCount / sampleRate * cpuFrequency. Then you emulate only this number of cycles and stop until the next process call.
Naturally you can only process full clock cycles, and since the ratio between host sample rate and CPU frequency isn't necessarily an integer, you might want to compensate for the rounding error by running an extra cycle occasionally. This can be implemented by accumulating the rounding error (the "fractional clock cycle") and adjusting the integer cycle count accordingly.
In Cyclone I also synchronize various other events, like MIDI, button clicks, etc., by sub-dividing the process call into smaller spans depending on where the events occur on the "timeline".
Here is the inner loop from Cyclone.
/ Magnus
int framesDone = 0;
while (framesDone < sampleFrames) {
    int n = sampleFrames - framesDone;

    // Timer
    int timerCyclesLeft = emulator->getTimerCyclesLeft();
    int timerSamplesLeft = static_cast<int>(ceil(timerCyclesLeft * (1.0 / TX16W_CPU_CLOCK_FREQUENCY) * sr));
    n = min(n, timerSamplesLeft);

    // MIDI
    if (framesDone >= earliestMidiByteTime && consumedMidiBytes < midiBytesCount
            && framesDone >= midiBytes[consumedMidiBytes].when) {
        emulator->sendMidiByte(midiBytes[consumedMidiBytes].byte);
        ++consumedMidiBytes;
        earliestMidiByteTime = framesDone + minSamplesBetweenMidiBytes;
    }
    if (consumedMidiBytes < midiBytesCount) {
        n = min(n, max(midiBytes[consumedMidiBytes].when, earliestMidiByteTime) - framesDone);
    }

    // Buttons
    int thisMS = static_cast<int>(msAtFrameZero + msPerSample * framesDone);
    int timeToNextButtonChange = max(nextButtonChange.msTime, earliestButtonChangeTimeMS) - thisMS;
    while (gotNextButtonChange && timeToNextButtonChange <= 0) {
        int buttonIndex = static_cast<int>(nextButtonChange.button);
        assert(0 <= buttonIndex && buttonIndex < TX16WEmulator::BUTTON_COUNT);
        pressedButtons[buttonIndex] = nextButtonChange.pressed;
        emulator->setButtons(pressedButtons);
        earliestButtonChangeTimeMS = nextButtonChange.msTime + STICKY_BUTTONS_MS;
        gotNextButtonChange = buttonChangeQueue.pop(nextButtonChange);
        timeToNextButtonChange = max(nextButtonChange.msTime, earliestButtonChangeTimeMS) - thisMS;
    }
    if (gotNextButtonChange) {
        int samplesTillNextButton = static_cast<int>(ceil(timeToNextButtonChange * sr / 1000.0));
        n = min(n, samplesTillNextButton);
    }
    n = max(n, 0);

    // Audio: render n frames into the host buffers at the current offset
    if (n > 0) {
        float* inputPointers[TX16W_INPUT_COUNT];
        if (inputsEnabled) {
            for (int i = 0; i < TX16W_INPUT_COUNT; ++i) {
                inputPointers[i] = (inputs[i] == 0 ? 0 : inputs[i] + framesDone);
            }
            audioEngine->setInputsEnabled(areInputsConnected && inputGain >= 0.00001f);
            audioEngine->setInputGain(inputMicAmpEnabled, inputGain);
        } else {
            for (int i = 0; i < TX16W_INPUT_COUNT; ++i) {
                inputPointers[i] = 0;
            }
        }
        float* outputPointers[TX16W_OUTPUT_COUNT];
        for (int i = 0; i < TX16W_OUTPUT_COUNT; ++i) {
            outputPointers[i] = (outputs[i] == 0 ? 0 : outputs[i] + framesDone);
        }
        audioEngine->render(n, inputPointers, outputPointers, accumulating);
        framesDone += n;
    }

    // Convert the span length (n frames) to CPU cycles, carrying the fractional remainder
    double cycleCountDouble = cycleCountError + static_cast<double>(n) / sr * TX16W_CPU_CLOCK_FREQUENCY;
    int cycleCount = max(static_cast<int>(cycleCountDouble), 0);
    cycleCountError = cycleCountDouble - cycleCount;
    int didCycles;
    {
        NuXThreads::Snapshot<Floppy>::Lock floppyAccess(currentFloppy);
        Floppy& floppy = floppyAccess.get();
        unsigned int newIdentity = floppy.getIdentity();
        emulator->setFloppyImage(floppy.getImageDataPointer(), lastMountedFloppyIdentity != newIdentity, floppy.isReadOnly());
        lastMountedFloppyIdentity = newIdentity;
        didCycles = emulator->emulate(cycleCount);
        floppy.updateDirtyCounter(emulator->getFloppyImageDirtyCounter());
    }
    // Carry any difference between requested and executed cycles into the next span
    cycleCountError += (cycleCount - didCycles);
}
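Stripped of the Cyclone-specific details, the bookkeeping described above reduces to: split the host block at event positions, convert each span of n frames into wanted = carry + n × cpuFrequency / sampleRate cycles, run floor(wanted) of them, and keep carry = wanted − executed (the fraction plus any overshoot) for the next span. For example, a 512-frame buffer at 44,100 Hz with an 8 MHz CPU corresponds to 512 / 44100 × 8,000,000 ≈ 92,879.8 cycles: run 92,879 and carry the 0.8. A minimal sketch of just that logic; `Event`, `Machine` and `BlockDriver` are hypothetical stand-ins, not Cyclone's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Event { int frame; int value; };  // hypothetical timestamped event (MIDI byte, button...)

// Stand-in for an emulated machine. A real core may overshoot the budget
// (instructions are indivisible); this stub executes it exactly.
struct Machine {
    long long cycles = 0;
    std::vector<std::pair<long long, int>> received;  // (cycle, value)
    int emulate(int budget) { cycles += budget; return budget; }
    void deliver(int value) { received.emplace_back(cycles, value); }
};

class BlockDriver {
public:
    BlockDriver(double cpuHz, double sampleRate) : cyclesPerFrame_(cpuHz / sampleRate) {}

    // Process one host block. `events` must be sorted by frame; events at or
    // beyond `frames` are not delivered here (a real host would re-queue them).
    void process(Machine& m, int frames, const std::vector<Event>& events) {
        assert(frames >= 0);
        int done = 0;
        std::size_t next = 0;
        while (done < frames) {
            // Deliver every event scheduled at (or before) the current frame.
            while (next < events.size() && events[next].frame <= done)
                m.deliver(events[next++].value);
            // Run up to the next event or the end of the block, whichever comes first.
            const int end = next < events.size() ? std::min(events[next].frame, frames) : frames;
            const int n = end - done;
            const double wanted = carry_ + n * cyclesPerFrame_;
            const int budget = std::max(static_cast<int>(wanted), 0);
            const int executed = m.emulate(budget);
            carry_ = wanted - executed;  // fractional part, minus any overshoot
            done = end;
        }
    }

private:
    double cyclesPerFrame_;
    double carry_ = 0.0;
};
```

With a 1 MHz CPU and a 400 kHz host rate (2.5 cycles per frame), two 3-frame blocks run 7 and then 8 cycles, so no cycle is lost to rounding; an event at frame 1 of the first block is delivered after exactly 2 cycles.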
- KVRian
- Topic Starter
- 1154 posts since 17 Feb, 2010
malström wrote: You simply calculate the number of cycles that the hardware CPU would execute in the same time span as the audio buffer, i.e. frameCount / sampleRate * cpuFrequency. Then you emulate only this number of cycles and stop until the next process call.
Naturally you can only process full clock cycles, and since the ratio between host sample rate and CPU frequency isn't necessarily an integer, you might want to compensate for the rounding error by running an extra cycle occasionally. This can be implemented by accumulating the rounding error (the "fractional clock cycle") and adjusting the integer cycle count accordingly.
In Cyclone I also synchronize various other events, like MIDI, button clicks, etc., by sub-dividing the process call into smaller spans depending on where the events occur on the "timeline".
Here is the inner loop from Cyclone.
/ Magnus
Hi Magnus,
thank you so much for joining this discussion and for the code example!
Sure, I hadn't thought of "using" the buffer size... (it improves cycle-timing accuracy compared to "cpuFrequency/sampleRate"... I assume this is the reason for choosing it over the simpler "cpuFrequency/sampleRate", right?).
Since mine is just a question about the "theory": have you ever heard of an emulator like this that uses a fully asynchronous loop, independent of the main audio system's rate?
For example: a method that doesn't refer to the main audio system's sample rate, buffer size, etc. It runs independently (maybe in a separate thread?) and its output is then sampled at a regular rate (the sample rate) inside the main audio processing method of the audio framework.
Congrats on Cyclone and all your other software!
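For the fully asynchronous variant asked about here, a common structure (a sketch under assumptions, not something taken from Cyclone) is a producer thread that runs the emulator freely and pushes samples into a lock-free single-producer/single-consumer ring buffer, while the audio callback only pops. `SpscRing` and `pullFrames` are hypothetical names. In practice the producer must also be throttled to keep the buffer near a target fill level, and the consumer must handle underruns (here it outputs silence); that added latency and jitter is the trade-off the synchronous, cycle-budgeted approach avoids.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer/single-consumer ring buffer. One slot is kept empty to tell
// "full" apart from "empty". Only one thread may push and only one may pop.
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) { assert(capacity > 0); }

    // Producer side (emulator thread). Returns false when the buffer is full.
    bool push(float v) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % buf_.size();
        if (next == read_.load(std::memory_order_acquire)) return false;  // full
        buf_[w] = v;
        write_.store(next, std::memory_order_release);  // publish the sample
        return true;
    }

    // Consumer side (audio callback). Returns false when the buffer is empty.
    bool pop(float& v) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return false;    // empty
        v = buf_[r];
        read_.store((r + 1) % buf_.size(), std::memory_order_release);    // free the slot
        return true;
    }

private:
    std::vector<float> buf_;
    std::atomic<std::size_t> read_{0};
    std::atomic<std::size_t> write_{0};
};

// Audio-callback helper: fill `out`, substituting silence on underrun.
// Returns how many frames had to be filled with silence.
inline std::size_t pullFrames(SpscRing& ring, float* out, std::size_t frames) {
    std::size_t underruns = 0;
    for (std::size_t i = 0; i < frames; ++i) {
        if (!ring.pop(out[i])) { out[i] = 0.0f; ++underruns; }
    }
    return underruns;
}
```

The audio thread never blocks or allocates here, which matters in a plugin; the emulator thread would typically sleep or yield whenever `push` fails.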