The perfect Sleep() function
In my first blog post back in 2020 (here) I tried to design an accurate Sleep function for use in games and other real-time applications. The preciseSleep function I ended up with was fine, but I’ve come up with simpler and better ones since then.
For reference, here’s the old function again:
#include <chrono>
#include <thread>
#include <math.h>
using namespace std;
using namespace chrono;
void preciseSleep(double seconds) {
    static double estimate = 5e-3;
    static double mean = 5e-3;
    static double m2 = 0;
    static int64_t count = 1;
    while (seconds > estimate) {
        auto start = high_resolution_clock::now();
        this_thread::sleep_for(milliseconds(1));
        auto end = high_resolution_clock::now();
        double observed = (end - start).count() / 1e9;
        seconds -= observed;
        ++count;
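        // Welford's online algorithm for the running mean and variance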
        double delta = observed - mean;
        mean += delta / count;
        m2 += delta * (observed - mean);
        double stddev = sqrt(m2 / (count - 1));
        estimate = mean + stddev;
    }
    // spin for the remaining time
    auto start = high_resolution_clock::now();
    auto spinNs = (int64_t)(seconds * 1e9);
    auto delay = nanoseconds(spinNs);
    while (high_resolution_clock::now() - start < delay);
}
It models the inherent sleep delay as a normally distributed random variable. Using this model, it tries to guess how long a 1 millisecond sleep will actually last, and will only sleep if it’s safe. This works pretty well. It’s precise and power efficient. But there are a few problems with it.
First off, this function isn’t reliable. If you get unlucky, the first sleep can randomly take longer than average, but since that’s the only data point, the model will conclude that sleeping is never safe and that we should always spin. Spinning unnecessarily is bad. The chances of getting unlucky like this might seem small, but it can definitely happen if the computer is under heavy load.
There are ways to patch this up, but why bother when the core idea is flawed?
The OS scheduler isn’t random#
At least on Windows, the thread scheduler runs on a periodic interrupt signal. If you put your thread to sleep, it won’t wake up until the next interrupt. This is what causes the random looking sleep delays. Assuming the scheduler runs once per millisecond, here’s a possible timeline when you call Sleep(1ms).
The thread wake-up times are always rounded up to the next scheduler tick. So a thread that calls sleep 0.75 milliseconds into a tick will wake up in 1.25 milliseconds. The sleep delays aren’t random at all; they’re determined by where in the scheduler tick you called sleep. If you call Sleep(T) at random points in the tick, your actual sleep times will end up uniformly distributed between T and T + SCHEDULER_PERIOD.
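To make this concrete, here’s a tiny sketch of that rounding model (modelSleep is my own toy function, not measured data): round the wake-up deadline up to the next tick boundary and see how long the sleep actually takes.
#include <math.h>
// All times in milliseconds. `phase` is how far into the current
// scheduler tick Sleep was called, P is the scheduler period.
double modelSleep(double T, double phase, double P) {
    double wake = ceil((phase + T) / P) * P; // next tick boundary after the deadline
    return wake - phase; // the actual sleep time, somewhere in [T, T + P)
}
// e.g. modelSleep(1.0, 0.75, 1.0) == 1.25, matching the example above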
On Windows, we can directly control the scheduler period using timeBeginPeriod. The default period is something around 15.6ms, but calling timeBeginPeriod(1) at the start of your main function will change it to 1ms for your program. I’ve already shown (here) how a lower scheduler period greatly improves sleeping precision, and now it should be obvious why.
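For reference, here’s how the period is set and restored (a minimal sketch; note that the documentation wants every timeBeginPeriod matched by a timeEndPeriod, which the snippets below omit for brevity):
#include <windows.h>
#pragma comment(lib, "winmm.lib")
int main() {
    timeBeginPeriod(1); // request a 1ms scheduler period
    // ... run your program ...
    timeEndPeriod(1);   // matching call cancels the 1ms request
}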
Robust sleep
The previous preciseSleep tries to guess the scheduler period, but if you call timeBeginPeriod(1) at the start of your program, then you no longer have to guess. You can just assume the worst case, which is that your sleep will take exactly 1 scheduler period longer than you asked. This greatly simplifies the code.
#include <thread>
#include <chrono>
#include <windows.h>
using namespace std;
using namespace chrono;
#define PERIOD 1       // scheduler period in milliseconds
#define TOLERANCE 0.02 // safety margin in milliseconds
void robustSleep(double seconds) {
    auto t0 = high_resolution_clock::now();
    auto target = t0 + nanoseconds(int64_t(seconds * 1e9));
    // sleep
    double ms = seconds * 1000 - (PERIOD + TOLERANCE);
    int ticks = (int)(ms / PERIOD);
    if (ticks > 0)
        this_thread::sleep_for(milliseconds(ticks * PERIOD));
    // spin
    while (high_resolution_clock::now() < target)
        YieldProcessor();
}
int main() {
    #pragma comment(lib, "winmm.lib") // for timeBeginPeriod
    timeBeginPeriod(PERIOD);
    // ...
}
Rounding the sleep time down to the nearest scheduler tick cancels out the rounding up that the scheduler does, so you won’t ever overshoot your sleep. For example, with a 1ms period, a 16.7ms request sleeps for 15 whole ticks; even in the worst case the scheduler wakes you after 16ms, and the spin loop covers the rest.
This sleep is as precise as a spin loop. And because it always sleeps for the longest possible safe time, the CPU usage is also optimal. I’ve been using it in my games for over a year, and the frame pacing has been rock solid. Before, when I was using preciseSleep, there would still be occasional hiccups.
I don’t think you can make a better sleep function using only the system Sleep. But in recent versions of Windows 10, we got a new tool for precise sleeping.
High resolution timer
The CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag was added to CreateWaitableTimerEx in Windows 10, version 1803. Passing this flag creates a timer that ticks at a different rate from the scheduler’s timer. Empirically, this timer seems to have a period of about 1ms. So how is this any better than calling timeBeginPeriod(1) and then sleeping?
Calling timeBeginPeriod changes the global timer interrupt for all programs. So in exchange for making your program more power efficient by allowing more precise sleeps, it makes every other program less power efficient. This can still end up a net positive, but a high resolution timer doesn’t have this problem since it is independent from the global timer.
Another benefit of the high resolution timer is that the requested sleep time is not rounded to milliseconds. You can set the timer for 0.7 ms, or 12.1 μs. So even if you do call timeBeginPeriod, the timer can be more power efficient because of this increased resolution.
The downside of the high resolution timer is that it’s new. Its behavior is still mostly undocumented. For example, the actual resolution of the timer isn’t stated anywhere as far as I know. Also, it has a quirk: if you request a sleep period longer than the system timer period, the precision of the timer plummets.
So it’s better to split one long sleep into multiple sleeps that are shorter than the scheduler period. That’s a bit annoying, because now you need to know what the scheduler period is. Or you can just give up some power efficiency and set it with timeBeginPeriod(1).
#include <thread>
#include <chrono>
#include <windows.h>
using namespace std;
using namespace chrono;
#define PERIOD 1            // scheduler period in milliseconds
#define TOLERANCE 1'020'000 // safety margin in nanoseconds
void timerSleep(double seconds) {
    auto t = high_resolution_clock::now();
    auto target = t + nanoseconds(int64_t(seconds * 1e9));
    static HANDLE timer;
    if (!timer)
        timer = CreateWaitableTimerExW(NULL, NULL, 
            CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, 
            TIMER_ALL_ACCESS);
    // cap each sleep at 95% of the scheduler period (in 100 ns units)
    int64_t maxTicks = PERIOD * 9'500;
    for (;;) {
        int64_t remaining = (target - t).count();
        int64_t ticks = (remaining - TOLERANCE) / 100; // ns -> 100 ns units
        if (ticks <= 0)
            break;
        if (ticks > maxTicks)
            ticks = maxTicks;
        LARGE_INTEGER due;
        due.QuadPart = -ticks; // negative due time means relative
        SetWaitableTimerEx(timer, &due, 0, NULL, NULL, NULL, 0);
        WaitForSingleObject(timer, INFINITE);
        t = high_resolution_clock::now();
    }
    // spin
    while (high_resolution_clock::now() < target)
        YieldProcessor();
}
int main() {
    #pragma comment(lib, "winmm.lib") // for timeBeginPeriod
    timeBeginPeriod(PERIOD);
    // ...
}
Here the maxTicks value clamps the amount of time to sleep to 95% of the scheduler period to avoid the quirk.
timerSleep is definitely more complicated than robustSleep. But let’s see how they both compare to the old preciseSleep.
Results: accuracy
Here’s a plot of the average sleep error over 10000 calls for all of the relevant sleep functions. The scheduler period was set to 1ms for this test. I specifically tested sleep times that would be useful for games and other real-time applications (1/60 sec, 1/144 sec, …).
As you can see, all of the functions have a negligible error. The timer sleep seems to spike up at some point, but that error is still only around 6 μs. You can see that for longer sleeps, preciseSleep error rises slowly but surely. Although with a small scheduler period this is perfectly tolerable. robustSleep completely overlaps with a spin loop that I included as a control - that’s how robust it is.
Let’s see what happens when the scheduler period is increased to 10ms.
Now these results are a lot more interesting. You can see that the old preciseSleep is completely unusable at this larger scheduler period, with an average error of 100 μs for longer sleep times. robustSleep still overlaps the spin loop almost completely, but it has an error spike around 1/30 second sleeps. I’m not quite sure why, but that can be fixed by tweaking the TOLERANCE parameter. timerSleep has errors of less than 1 μs regardless of sleep times.
Of course, the reason preciseSleep and robustSleep are able to attain such high precision even at higher scheduler periods is because they spend most of their time spinning. Let’s take a look at CPU usage next.
Results: CPU usage
For these tests, I simply measured what proportion of time each sleep function spends in a spin loop as opposed to actually sleeping. I couldn’t use GetProcessTimes like last time because it kept reporting 0% CPU usage for some reason, even when the functions spent half of their time spinning.
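Roughly, the measurement looked like this (a sketch, not my exact harness; spinSeconds is a hypothetical counter that each sleep function increments inside its spin loop):
#include <chrono>
using namespace std;
using namespace chrono;
double spinSeconds; // hypothetical: accumulated inside each function's spin loop
double spinFraction(void (*sleepFn)(double), double seconds, int calls) {
    spinSeconds = 0;
    auto t0 = high_resolution_clock::now();
    for (int i = 0; i < calls; ++i)
        sleepFn(seconds);
    double total = duration<double>(high_resolution_clock::now() - t0).count();
    return spinSeconds / total; // proportion of time spent spinning
}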
Either way, here is the percentage of time spent spinning when the scheduler period was set to 1 ms:
All functions spend a decent amount of time actually sleeping, even at lower sleep times. preciseSleep and robustSleep have very similar CPU usage, with preciseSleep having a slight edge. timerSleep clearly has the lowest CPU usage, usually about half that of the other two.
And now let’s look at what happens with a 10ms scheduler period.
Both preciseSleep and robustSleep mostly just spin, even for higher sleep times. robustSleep is especially bad here, because it’s still trying to be perfectly precise, and it’s not safe to sleep at all for any time less than 20ms. timerSleep on the other hand, has basically the exact same power usage curve that it had with a 1ms scheduler period in the previous graph. It even manages to sleep 50% of the time for sleeps of only 1ms.
So the high resolution timer is really good even when the scheduler period is low, but it completely destroys the other sleep methods when the scheduler period is high.
You can download my testing code and raw data here.
Conclusion
If you can afford to target only more recent versions of Windows 10, you should probably use the new high resolution timer for sleeping in your main loop. It’s very precise, and extremely power efficient. You could also try to avoid using timeBeginPeriod to set the scheduler period, or at the very least try not to lower it below 8ms. The people working on Chrome found that this greatly improves power usage, and anyone running your software will be grateful for the reduced heat and fan noise.
If you want the most robust, precise sleeping method possible, or if you still want to support older versions of Windows, or you just want to keep things simple, then use the robustSleep function along with timeBeginPeriod(1). I don’t think anything better than that is possible.
It is also possible to combine the two approaches and get the best of both worlds. First try to create a high resolution timer with the new flag; if that fails, you know the OS doesn’t support it, and you can fall back to robustSleep. I have a copy-pastable snippet of code on GitHub that does exactly this.
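Sketched out, it could look something like this (assuming the robustSleep and timerSleep functions from above are in scope; the wrapper name is my own, not the exact snippet from GitHub):
#ifndef CREATE_WAITABLE_TIMER_HIGH_RESOLUTION
#define CREATE_WAITABLE_TIMER_HIGH_RESOLUTION 0x00000002 // missing from older SDKs
#endif
void bestSleep(double seconds) {
    // probe once: the flag is rejected before Windows 10, version 1803,
    // so on older systems the timer simply fails to create
    static bool highResAvailable = [] {
        HANDLE t = CreateWaitableTimerExW(NULL, NULL,
            CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, TIMER_ALL_ACCESS);
        if (t) CloseHandle(t);
        return t != NULL;
    }();
    if (highResAvailable)
        timerSleep(seconds);  // high resolution timer path
    else
        robustSleep(seconds); // scheduler-based fallback
}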