Cycle-level simulation of a CPU inside a Plugin

DSP, Plugin and Host development discussion.

Post

Hello,

let's say I'm making a VST plugin effect. It has a certain sampling rate (which can change depending on the host), as always happens in plugins; nothing strange so far.

Let's say I want to simulate a CPU/microcontroller/coprocessor INSIDE THIS VST PLUGIN. I mean a low-level cycle-exact simulation.

For example, take the 'Cyclone' virtual instrument. It uses this approach to simulate a Motorola 68k and other coprocessors/stuff.

A Motorola 68k, for example, was clocked at about 8 MHz. So for my purposes, I have to step the simulation of that CPU 8 million times per second (fixed), and sample its output inside the main plugin processing class (at samplerate).

Example, using WDL-OL: I declare a method CPU_SIMULATION() and write into it all the instructions to be performed 8 million times per second.

Then, inside the main WDL method called ProcessDoubleReplacing(), I have to sample the current CPU_SIMULATION() output at every plugin iteration, for example 44100 times per second.

Question number 1: Is this general approach correct? Maybe other ways are possible, but currently I'm interested in this one in particular.

Question number 2: How can I force/let CPU_SIMULATION() to perform its instructions EXACTLY 8 million times per second? I mean, how can I make it 'run' at a fixed frequency of 8 MHz?

Thank you in advance for any hint,
bruno @ Xhun Audio || www.xhun-audio.com || Twitter || Instagram

Post

First, a CPU emulation cannot be done in real time, even a "slow" one. But you can try; nothing prevents you from doing it the way you describe.
As for the second question, you need to know a little bit more about the binary it's running and its connections to the outside (you will also need to emulate those, and they are what actually produces the sound!).

Post

Check out this Finnish guy; he uploads a lot of complete low-level processor emulations, for instance:
https://www.youtube.com/watch?v=y71lli8MS8s

It might give you some overview about the project at hand.

Post

Mayae wrote:Check out this finnish guy, he uploads a lot of complete low-level processor emulations, for instance:

It might give you some overview about the project at hand.
Wow...

maybe it would be better to take a look at that code with less haste than that video :D

Thank you for the link,

Post

Miles1981 wrote:First, a CPU emulation cannot be done in real time. Even a "slow" one. But you can try, there is nothing impeding you for doing it the way you are trying to achieve it.
As for the second question, you need to know a little bit more about the binary it's running and the connections to the outside (as you will also need to emulate them, and they are actually producing the sound!).
Why not? There are a lot of real-time, low-level, cycle-exact emulators out there (Higan for example, ...and the promising CEN64). They emulate every single cycle of several CPUs/coprocessors/etc. inside a single system and synchronize them.

By the way I have no "performance doubts". I'm just in a "would it be possible, and how?" stage. :)

I'm just guessing how it could be done from an architectural viewpoint, since I have never made things like this. In pro-audio plugin frameworks the main processing class/method is tied to the Samplerate parameter. So I was asking how to build a method that has to be performed not at Samplerate, but at a different rate, independently of Samplerate.

Maybe "counting" physical CPU cycles? Is it possible?

Post

I don't think you'd want to get it to run at just 8 MHz. As fast as possible is better and easier. Each CPU operation has a known execution time. You can slow it down with delay loops. Projects like this are done by obsessive programmers with lots of free time. Better find yourself a few. You might be better off asking on emulation forums.
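
To illustrate the point that each CPU operation has a known execution time, here is a minimal sketch of an interpreter loop that charges a fixed cycle cost per instruction and stops once a cycle budget is used up. Everything in it is an assumption made for illustration: the three-opcode instruction set, the cycle costs, and the names `ToyCpu`, `Instr` and `cycleCost` are invented and do not correspond to the 68k or to any existing emulator.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy instruction set invented for this sketch; the cycle costs are made up
// and do not correspond to the 68k or any other real CPU.
enum class Op { Nop, Add, Jump };

struct Instr {
    Op op;
    int arg;  // Add: value to add; Jump: target index; unused for Nop
};

inline int cycleCost(Op op) {
    switch (op) {
        case Op::Nop:  return 2;
        case Op::Add:  return 4;
        case Op::Jump: return 6;
    }
    return 0;  // not reached for valid opcodes
}

class ToyCpu {
public:
    explicit ToyCpu(std::vector<Instr> program) : program_(std::move(program)) {
        assert(!program_.empty());
    }

    // Executes whole instructions until at least `cycleBudget` cycles have
    // been consumed. Returns the cycles actually consumed, which can
    // overshoot the budget because an instruction cannot be split.
    long emulate(long cycleBudget) {
        long done = 0;
        while (done < cycleBudget) {
            const Instr in = program_[pc_];
            switch (in.op) {
                case Op::Nop:
                    advance();
                    break;
                case Op::Add:
                    acc_ += in.arg;
                    advance();
                    break;
                case Op::Jump:
                    assert(in.arg >= 0);
                    pc_ = static_cast<std::size_t>(in.arg) % program_.size();
                    break;
            }
            done += cycleCost(in.op);
        }
        return done;
    }

    int accumulator() const { return acc_; }

private:
    void advance() { pc_ = (pc_ + 1) % program_.size(); }

    std::vector<Instr> program_;
    std::size_t pc_ = 0;
    int acc_ = 0;
};
```

Because the loop counts emulated cycles rather than wall-clock time, it can run as fast as the host allows; the caller decides how many cycles correspond to a given stretch of audio and should account for any overshoot that `emulate` reports.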

Post

xhunaudio wrote: I'm just guessing how it could be done from an architectural viewpoint, since I have never made things like this. In pro-audio plugin frameworks the main processing class/method is tied to the Samplerate parameter. So I was asking how to build a method that has to be performed not at Samplerate, but at a different rate, independently of Samplerate.

Maybe "counting" physical CPU cycles? Is it possible?
You have to count the number of samples, and this is given by the hardware around the CPU. You have to emulate everything, not just the CPU, and the number of cycles to run is however many it takes to get the required number of samples out of that hardware.

Post

UltraJv wrote:Projects like this are done by obsessive programmers with lots of free time. Better find yourself a few. You might be better off asking on emulation forums.
Sure, that was my "B" plan. :)

Projects like this do exist, but maybe it would be simpler for me to understand it from someone who is also used to dealing with pro-audio frameworks. And since Cyclone (Sonic Charge) is built around this concept, maybe people like Mr. Sonic Charge (or someone with similar knowledge) will join this discussion, and any hint from him would be great... :tu:

Post

By the way, my aim is not to build a full system simulation, just to LEARN how to execute a method in a plugin at a speed/rate that is totally INDEPENDENT FROM Samplerate. For example, several oversampling methods rely on the concept of "2x or 4x samplerate", which is relatively simple to perform, but it's not what I'm looking for.

I'm looking for a "parallel" (or better, "independent", to avoid further misunderstanding :D ) method() that runs at a rate that is totally INDEPENDENT from Samplerate.

No in-depth details, just a general overview on how to proceed :)

Post

If you execute your emulator until you get the samples you need, then it's fully asynchronous. Otherwise, you can look at ATK to learn such a thing (the samples in the different are always a multiple of the final sample size, but it's always the same principle).
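
The "execute your emulator until you get the samples you need" idea can be sketched as a pull loop. This is only an illustration under strong assumptions: `ToyDevice` and `renderBlock` are hypothetical names, the device emits one placeholder sample every fixed number of clock cycles, and its native output rate is assumed to match the host sample rate already (real hardware would normally need resampling on top of this).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical emulated device: emits one output sample every
// `cyclesPerSample` clock cycles into a FIFO.
class ToyDevice {
public:
    explicit ToyDevice(long cyclesPerSample) : cyclesPerSample_(cyclesPerSample) {
        assert(cyclesPerSample > 0);
    }

    // Advances the emulated hardware by one clock cycle.
    void step() {
        ++cycle_;
        if (cycle_ % cyclesPerSample_ == 0) {
            // Placeholder signal: alternates 1, 0, 1, 0, ...
            const long index = cycle_ / cyclesPerSample_;
            fifo_.push_back(static_cast<float>(index % 2));
        }
    }

    bool hasSample() const { return !fifo_.empty(); }

    float popSample() {
        assert(!fifo_.empty());
        const float s = fifo_.front();
        fifo_.pop_front();
        return s;
    }

private:
    long cyclesPerSample_;
    long cycle_ = 0;
    // std::deque may allocate; a real plugin would use a preallocated ring buffer.
    std::deque<float> fifo_;
};

// Fills `out` with `frameCount` samples by stepping the device only as far
// as needed. Returns the number of clock cycles that were emulated.
long renderBlock(ToyDevice& device, float* out, std::size_t frameCount) {
    long cyclesRun = 0;
    for (std::size_t i = 0; i < frameCount; ++i) {
        while (!device.hasSample()) {
            device.step();
            ++cyclesRun;
        }
        out[i] = device.popSample();
    }
    return cyclesRun;
}
```

With this arrangement the emulated clock is never tied to wall-clock time: the audio callback asks for a block, and the device is stepped just far enough to deliver it.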

Post

Miles1981 wrote: If you execute your emulator until you get the samples you need, then it's fully asynchronous. Otherwise, you can look at ATK to learn such a thing (the samples in the different are always a multiple of the final sample size, but it's always the same principle).
Yes, asynchronous is what I'm looking for.

Thank you for the ATK link, I'll take a look at it in any case :)

Post

You simply calculate the number of cycles that the hardware CPU would make in the same time span as the audio buffer. I.e. frameCount / sampleRate * cpuFrequency. Then you emulate only this number of cycles and stop until the next process call.

Naturally you can only process full clock cycles, and since the ratio between host sample rate and CPU frequency isn't necessarily an integer, you might want to compensate for the rounding error by running an extra cycle occasionally. This can be implemented by accumulating the rounding error (the "fractional clock cycle") and adjusting the integer cycle count accordingly.
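
The cycle-count arithmetic described above can be shown in isolation with a small helper. This is a minimal sketch, not code from Cyclone: `CycleBudget`, `cyclesFor` and `reportExecuted` are names invented here, and the only idea taken from the post is that the fractional remainder is carried over to the next call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from Cyclone): converts audio frames into whole
// emulated clock cycles and carries the fractional remainder forward.
class CycleBudget {
public:
    CycleBudget(double cpuFrequencyHz, double sampleRateHz)
        : cyclesPerFrame_(cpuFrequencyHz / sampleRateHz) {
        assert(cpuFrequencyHz > 0.0 && sampleRateHz > 0.0);
    }

    // Returns the number of whole cycles to emulate for `frameCount` frames.
    std::int64_t cyclesFor(int frameCount) {
        assert(frameCount >= 0);
        const double exact = carry_ + frameCount * cyclesPerFrame_;
        // Never request a negative count; a negative carry is paid back later.
        const std::int64_t whole =
            exact > 0.0 ? static_cast<std::int64_t>(exact) : 0;
        carry_ = exact - static_cast<double>(whole);
        return whole;
    }

    // Call this when the emulator ran more or fewer cycles than requested,
    // for example because it can only stop on instruction boundaries.
    void reportExecuted(std::int64_t requested, std::int64_t executed) {
        carry_ += static_cast<double>(requested - executed);
    }

private:
    double cyclesPerFrame_;
    double carry_ = 0.0;  // fractional (or owed) cycles carried between calls
};
```

For example, with a 3 Hz clock and a 2 Hz sample rate, consecutive one-frame calls return 1 and then 2 cycles, so the average stays at 1.5 cycles per frame. In a plugin, one such object would live alongside the emulator and `cyclesFor` would be called once per processed span.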

In Cyclone I also synchronize various other events, like MIDI, button clicks etc by sub-dividing the process call into smaller spans depending on where the events occur on the "timeline".

Here is the inner loop from Cyclone.

int framesDone = 0;
	while (framesDone < sampleFrames) {
		int n = sampleFrames - framesDone;
		
		// Timer
		
		int timerCyclesLeft = emulator->getTimerCyclesLeft();
		int timerSamplesLeft = static_cast<int>(ceil(timerCyclesLeft * (1.0 / TX16W_CPU_CLOCK_FREQUENCY) * sr));
		n = min(n, timerSamplesLeft);

		// MIDI
		
		if (framesDone >= earliestMidiByteTime && consumedMidiBytes < midiBytesCount
				&& framesDone >= midiBytes[consumedMidiBytes].when) {
			emulator->sendMidiByte(midiBytes[consumedMidiBytes].byte);
			++consumedMidiBytes;
			earliestMidiByteTime = framesDone + minSamplesBetweenMidiBytes;
		}
		if (consumedMidiBytes < midiBytesCount) {
			n = min(n, max(midiBytes[consumedMidiBytes].when, earliestMidiByteTime) - framesDone);
		}
		
		// Buttons
		
		int thisMS = static_cast<int>(msAtFrameZero + msPerSample * framesDone);
		int timeToNextButtonChange = max(nextButtonChange.msTime, earliestButtonChangeTimeMS) - thisMS;
		while (gotNextButtonChange && timeToNextButtonChange <= 0) {
			int buttonIndex = static_cast<int>(nextButtonChange.button);
			assert(0 <= buttonIndex && buttonIndex < TX16WEmulator::BUTTON_COUNT);
			pressedButtons[buttonIndex] = nextButtonChange.pressed;
			emulator->setButtons(pressedButtons);
			earliestButtonChangeTimeMS = nextButtonChange.msTime + STICKY_BUTTONS_MS;
			gotNextButtonChange = buttonChangeQueue.pop(nextButtonChange);
			timeToNextButtonChange = max(nextButtonChange.msTime, earliestButtonChangeTimeMS) - thisMS;
		}
		if (gotNextButtonChange) {
			int samplesTillNextButton = static_cast<int>(ceil(timeToNextButtonChange * sr / 1000.0));
			n = min(n, samplesTillNextButton);
		}
		
		n = max(n, 0);
		if (n > 0) {
			float* inputPointers[TX16W_INPUT_COUNT];
			if (inputsEnabled) {
				for (int i = 0; i < TX16W_INPUT_COUNT; ++i) {
					inputPointers[i] = (inputs[i] == 0 ? 0 : inputs[i] + framesDone);
				}
				audioEngine->setInputsEnabled(areInputsConnected && inputGain >= 0.00001f);
				audioEngine->setInputGain(inputMicAmpEnabled, inputGain);
			} else {
				for (int i = 0; i < TX16W_INPUT_COUNT; ++i) {
					inputPointers[i] = 0;
				}
			}
			float* outputPointers[TX16W_OUTPUT_COUNT];
			for (int i = 0; i < TX16W_OUTPUT_COUNT; ++i) {
				outputPointers[i] = (outputs[i] == 0 ? 0 : outputs[i] + framesDone);
			}
			audioEngine->render(n, inputPointers, outputPointers, accumulating);
			framesDone += n;
		}
		
		double cycleCountDouble = cycleCountError + static_cast<double>(n) / sr * TX16W_CPU_CLOCK_FREQUENCY;
		int cycleCount = max(static_cast<int>(cycleCountDouble), 0);
		cycleCountError = cycleCountDouble - cycleCount;
		int didCycles;

		{
			NuXThreads::Snapshot<Floppy>::Lock floppyAccess(currentFloppy);
			Floppy& floppy = floppyAccess.get();
			
			unsigned int newIdentity = floppy.getIdentity();
			emulator->setFloppyImage(floppy.getImageDataPointer(), lastMountedFloppyIdentity != newIdentity, floppy.isReadOnly());
			lastMountedFloppyIdentity = newIdentity;
						
			didCycles = emulator->emulate(cycleCount);
			
			floppy.updateDirtyCounter(emulator->getFloppyImageDirtyCounter());
		}
		
		cycleCountError += (cycleCount - didCycles);
	}
/ Magnus

Post

malström wrote:You simply calculate the number of cycles that the hardware CPU would make in the same time span as the audio buffer. I.e. frameCount / sampleRate * cpuFrequency. Then you emulate only this number of cycles and stop until the next process call.

Hi Magnus,

thank you so much for joining this discussion and for the code example!

Of course, I hadn't thought of using the buffer size... (I assume you chose it over a simpler per-sample "cpuFrequency/sampleRate" because it improves cycle-timing accuracy, right?)

Since my question is really about the theory: have you ever heard of an emulator like this that uses a fully asynchronous loop, independent of the main audio system rate?

For example: a method that doesn't refer to the main audio system's sample rate, buffer size, etc. It runs independently (maybe in a separate thread?!) and its output is then sampled at the regular sample rate inside the audio framework's main processing method.

Congrats on Cyclone and all your other software :)
