As many probably know, WinAPI's Sleep function takes the number of milliseconds we want to sleep. So the minimum we can request is to sleep for 1 millisecond. But what if we want to sleep even less? Those interested in how to do this (with pictures) are welcome under the cut.
First, let me remind you that Windows (like any non-real-time OS) does not guarantee that a thread will sleep exactly the requested time. Starting with Vista, the OS logic is simple: there is a certain quantum of time allocated to a thread for execution (yes, the very 20 ms everyone heard about in the 2000/XP days and still hears about on server editions), and Windows reschedules threads (stops some, starts others) only after this quantum expires. I.e. if the OS quantum is 20 ms (the default in XP was exactly that, for example), then even if we requested Sleep(1), in the worst case control returns to us only after those 20 ms. There are multimedia functions for managing this quantum, in particular timeBeginPeriod / timeEndPeriod.
Second, a brief digression on why such accuracy may be needed at all. Microsoft says only multimedia applications need it. Say you are writing a new WinAMP with blackjack, and it is crucial to hand the next chunk of audio data to the system on time. My area was different: we had an H.264 stream decompressor built on ffmpeg, with a synchronous interface (Frame* decompressor.Decompress(Frame* compressedFrame)). Everything was fine until we moved decompression onto the hardware in Intel processors. For reasons I no longer remember, I had to work with it not through the native Intel Media SDK but through the DXVA2 interface, which is asynchronous. So the workflow looked like this:
- Copy the data to video memory
- Sleep, so the frame has time to be decompressed
- Poll whether decompression has finished, and if so, fetch the decompressed frame from video memory
The problem was in the second step. If you believe GPUView, frames were actually decompressed in 50-200 microseconds. But with Sleep(1) you get at most 1000 * 4 (cores) = 4000 frames per second on a Core i5. Assuming the usual 25 fps, that means only 4000 / 25 = 160 video streams decompressed simultaneously, while the goal was 200. So there were two options: either rework everything for asynchronous operation with the hardware decompressor, or reduce the Sleep time.
First measurements
To roughly estimate the current runtime quantum of a thread, let's write a simple program:
void test()
{
    std::cout << "Starting test" << std::endl;
    std::int64_t total = 0;
    for (unsigned i = 0; i < 5; ++i)
    {
        auto t1 = std::chrono::high_resolution_clock::now();
        ::Sleep(1);
        auto t2 = std::chrono::high_resolution_clock::now();
        auto elapsedMicrosec = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
        total += elapsedMicrosec;
        std::cout << i << ": Elapsed " << elapsedMicrosec << std::endl;
    }
    std::cout << "Finished. average time:" << (total / 5) << std::endl;
}

int main()
{
    test();
    return 0;
}
Here is a typical output on Win 8.1:

Starting test
0: Elapsed 1977
1: Elapsed 1377
2: Elapsed 1409
3: Elapsed 1396
4: Elapsed 1432
Finished. average time: 1518
A word of warning right away: if you are on, say, MSVS 2012, std::chrono::high_resolution_clock will not measure anything useful for you, since its implementation there is not actually high-resolution. And in general, recall that the surest way to measure the duration of something on Windows is the performance counter. Let's rewrite our code a bit to make sure we measure the times correctly. First, a helper class.
I ran the tests this time on MSVS 2015, where high_resolution_clock is already implemented correctly via performance counters; I keep this step for anyone who wants to repeat the tests on an older compiler.

PreciseTimer.h

#pragma once

class PreciseTimer
{
public:
    PreciseTimer()
    {
        ::QueryPerformanceFrequency(&m_freq);
    }

    std::int64_t Microsec() const
    {
        LARGE_INTEGER counter;
        ::QueryPerformanceCounter(&counter);
        return counter.QuadPart * 1000000 / m_freq.QuadPart;
    }

private:
    LARGE_INTEGER m_freq;
};
Modified test function:

void test()
{
    PreciseTimer timer;
    std::cout << "Starting test" << std::endl;
    std::int64_t total = 0;
    for (unsigned i = 0; i < 5; ++i)
    {
        auto t1 = timer.Microsec();
        ::Sleep(1);
        auto t2 = timer.Microsec();
        auto elapsedMicrosec = t2 - t1;
        total += elapsedMicrosec;
        std::cout << i << ": Elapsed " << elapsedMicrosec << std::endl;
    }
    std::cout << "Finished. average time:" << (total / 5) << std::endl;
}
And the typical output of our program on Windows Server 2008 R2:

Starting test
0: Elapsed 10578
1: Elapsed 14519
2: Elapsed 14592
3: Elapsed 14625
4: Elapsed 14354
Finished. average time: 13733
Trying to solve the problem head-on
Let's rewrite our program a little and try the obvious:
std::this_thread::sleep_for(std::chrono::microseconds(500))

void test(const std::string& description, const std::function<void(void)>& f)
{
    PreciseTimer timer;
    std::cout << "Starting test: " << description << std::endl;
    std::int64_t total = 0;
    for (unsigned i = 0; i < 5; ++i)
    {
        auto t1 = timer.Microsec();
        f();
        auto t2 = timer.Microsec();
        auto elapsedMicrosec = t2 - t1;
        total += elapsedMicrosec;
        std::cout << i << ": Elapsed " << elapsedMicrosec << std::endl;
    }
    std::cout << "Finished. average time:" << (total / 5) << std::endl;
}

int main()
{
    test("Sleep(1)", [] { ::Sleep(1); });
    test("sleep_for(microseconds(500))", [] { std::this_thread::sleep_for(std::chrono::microseconds(500)); });
    return 0;
}
Typical output on Windows 8.1:

Starting test: Sleep(1)
0: Elapsed 1187
1: Elapsed 1315
2: Elapsed 1427
3: Elapsed 1432
4: Elapsed 1449
Finished. average time: 1362
Starting test: sleep_for(microseconds(500))
0: Elapsed 1297
1: Elapsed 1434
2: Elapsed 1280
3: Elapsed 1451
4: Elapsed 1459
Finished. average time: 1384
As we can see, there is no gain out of the box. Take a closer look at this_thread::sleep_for and notice that it is implemented via this_thread::sleep_until, i.e. unlike Sleep it is not even immune to system clock changes, for example. Let's try to find a better alternative.
The sleep that could
Searching MSDN and stackoverflow points us towards waitable timers as the only alternative. Well, let's write another helper class.
WaitableTimer.h

#pragma once

class WaitableTimer
{
public:
    WaitableTimer()
    {
        m_timer = ::CreateWaitableTimer(NULL, FALSE, NULL);
        if (!m_timer)
            throw std::runtime_error("Failed to create waitable timer (CreateWaitableTimer), error: " + std::to_string(::GetLastError()));
    }

    ~WaitableTimer()
    {
        ::CloseHandle(m_timer);
        m_timer = NULL;
    }

    void SetAndWait(unsigned relativeTime100Ns)
    {
        LARGE_INTEGER dueTime = { 0 };
        dueTime.QuadPart = static_cast<LONGLONG>(relativeTime100Ns) * -1;

        BOOL res = ::SetWaitableTimer(m_timer, &dueTime, 0, NULL, NULL, FALSE);
        if (!res)
            throw std::runtime_error("SetAndWait: failed to set waitable timer (SetWaitableTimer), error: " + std::to_string(::GetLastError()));

        DWORD waitRes = ::WaitForSingleObject(m_timer, INFINITE);
        if (waitRes == WAIT_FAILED)
            throw std::runtime_error("SetAndWait: failed to wait for waitable timer (WaitForSingleObject), error: " + std::to_string(::GetLastError()));
    }

private:
    HANDLE m_timer;
};
And supplement our tests with a new one:
int main()
{
    test("Sleep(1)", [] { ::Sleep(1); });
    test("sleep_for(microseconds(500))", [] { std::this_thread::sleep_for(std::chrono::microseconds(500)); });

    WaitableTimer timer;
    test("WaitableTimer", [&timer] { timer.SetAndWait(5000); });
    return 0;
}
Let's see what has changed.
Typical output on Windows Server 2008 R2:

Starting test: Sleep(1)
0: Elapsed 10413
1: Elapsed 8467
2: Elapsed 14365
3: Elapsed 14563
4: Elapsed 14389
Finished. average time: 12439
Starting test: sleep_for(microseconds(500))
0: Elapsed 11771
1: Elapsed 14247
2: Elapsed 14323
3: Elapsed 14426
4: Elapsed 14757
Finished. average time: 13904
Starting test: WaitableTimer
0: Elapsed 12654
1: Elapsed 14700
2: Elapsed 14259
3: Elapsed 14505
4: Elapsed 14493
Finished. average time: 14122
As we can see, nothing changed out of the box on the server OS, since the default timer interval there is usually huge. I won't go hunting for virtual machines with XP and Windows 7, but most likely XP shows the very same picture, while Windows 7 seems to use a default interval of 1 ms, i.e. the new test there should perform the same as the previous tests did on Windows 8.1.
Now let's take a look at the output of our program on Windows 8.1:

Starting test: Sleep(1)
0: Elapsed 1699
1: Elapsed 1444
2: Elapsed 1493
3: Elapsed 1482
4: Elapsed 1403
Finished. average time: 1504
Starting test: sleep_for(microseconds(500))
0: Elapsed 1259
1: Elapsed 1088
2: Elapsed 1497
3: Elapsed 1497
4: Elapsed 1528
Finished. average time: 1373
Starting test: WaitableTimer
0: Elapsed 643
1: Elapsed 481
2: Elapsed 424
3: Elapsed 330
4: Elapsed 468
Finished. average time: 469
What do we see? That's right: our new sleep could! I.e. on Windows 8.1 the task is already solved. Why did this happen? Because in Windows 8.1 the timer interval was made exactly 500 microseconds. Yes, threads run for 500 microseconds (on my system the default resolution is 500.8 microseconds and cannot be set any lower, unlike XP/Win7, where exactly 500 microseconds could be set), then get rescheduled according to their priorities and run again.
Conclusion 1: To get Sleep(0.5), the right sleep primitive is necessary but not sufficient. Always use waitable timers for it.
Conclusion 2: If you write only for Win 8.1 / Win 10 and are guaranteed never to run on other operating systems, you can stop right here: waitable timers alone cover the task.
Removing the dependence on circumstances, or how to raise the system timer resolution
I have already mentioned the multimedia function timeBeginPeriod. The documentation states that it lets you request the desired timer resolution. Let's check. Once again we modify our program.
v3 program:

#include "stdafx.h"
#include "PreciseTimer.h"
#include "WaitableTimer.h"

#pragma comment (lib, "Winmm.lib")

void test(const std::string& description, const std::function<void(void)>& f)
{
    PreciseTimer timer;
    std::cout << "Starting test: " << description << std::endl;
    std::int64_t total = 0;
    for (unsigned i = 0; i < 5; ++i)
    {
        auto t1 = timer.Microsec();
        f();
        auto t2 = timer.Microsec();
        auto elapsedMicrosec = t2 - t1;
        total += elapsedMicrosec;
        std::cout << i << ": Elapsed " << elapsedMicrosec << std::endl;
    }
    std::cout << "Finished. average time:" << (total / 5) << std::endl;
}

void runTestPack()
{
    test("Sleep(1)", [] { ::Sleep(1); });
    test("sleep_for(microseconds(500))", [] { std::this_thread::sleep_for(std::chrono::microseconds(500)); });

    WaitableTimer timer;
    test("WaitableTimer", [&timer] { timer.SetAndWait(5000); });
}

int main()
{
    runTestPack();

    std::cout << "Timer resolution is set to 1 ms" << std::endl;
    // Strictly speaking, timeGetDevCaps should be queried first
    // for the supported range of values.
    timeBeginPeriod(1);
    ::Sleep(1); // let the new resolution take effect
    ::Sleep(1);

    runTestPack();

    timeEndPeriod(1);
    return 0;
}
Traditionally, the typical output of our program.
On Windows 8.1:

Starting test: Sleep(1)
0: Elapsed 2006
1: Elapsed 1398
2: Elapsed 1390
3: Elapsed 1424
4: Elapsed 1424
Finished. average time: 1528
Starting test: sleep_for(microseconds(500))
0: Elapsed 1348
1: Elapsed 1418
2: Elapsed 1459
3: Elapsed 1475
4: Elapsed 1503
Finished. average time: 1440
Starting test: WaitableTimer
0: Elapsed 200
1: Elapsed 469
2: Elapsed 442
3: Elapsed 456
4: Elapsed 462
Finished. average time: 405
Timer resolution is set to 1 ms
Starting test: Sleep(1)
0: Elapsed 1705
1: Elapsed 1412
2: Elapsed 1411
3: Elapsed 1441
4: Elapsed 1408
Finished. average time: 1475
Starting test: sleep_for(microseconds(500))
0: Elapsed 1916
1: Elapsed 1451
2: Elapsed 1415
3: Elapsed 1429
4: Elapsed 1223
Finished. average time: 1486
Starting test: WaitableTimer
0: Elapsed 602
1: Elapsed 445
2: Elapsed 994
3: Elapsed 347
4: Elapsed 345
Finished. average time: 546
And on Windows Server 2008 R2:

Starting test: Sleep(1)
0: Elapsed 10306
1: Elapsed 13799
2: Elapsed 13867
3: Elapsed 13877
4: Elapsed 13869
Finished. average time: 13143
Starting test: sleep_for(microseconds(500))
0: Elapsed 10847
1: Elapsed 13986
2: Elapsed 14000
3: Elapsed 13898
4: Elapsed 13834
Finished. average time: 13313
Starting test: WaitableTimer
0: Elapsed 11454
1: Elapsed 13821
2: Elapsed 14014
3: Elapsed 13852
4: Elapsed 13837
Finished. average time: 13395
Timer resolution is set to 1 ms
Starting test: Sleep(1)
0: Elapsed 940
1: Elapsed 218
2: Elapsed 276
3: Elapsed 352
4: Elapsed 384
Finished. average time: 434
Starting test: sleep_for(microseconds(500))
0: Elapsed 797
1: Elapsed 386
2: Elapsed 371
3: Elapsed 389
4: Elapsed 371
Finished. average time: 462
Starting test: WaitableTimer
0: Elapsed 323
1: Elapsed 338
2: Elapsed 309
3: Elapsed 359
4: Elapsed 391
Finished. average time: 344
Let's analyze the interesting facts that are visible from the results:
- On Windows 8.1, nothing changed. We conclude that timeBeginPeriod is smart: if N applications have requested different system timer resolutions, the effective resolution is never made coarser than the finest one requested. On Windows 7 we would not notice any changes either, since the timer resolution there is already 1 ms.
- On the server OS, timeBeginPeriod(1) behaved unexpectedly: it set the system timer resolution to the highest possible value. I.e. somewhere inside such systems a workaround of roughly this form is clearly hard-coded:

void timeBeginPeriod(UINT uPeriod)
{
    if (uPeriod == 1)
    {
        setMaxTimerResolution();
        return;
    }
    ...
}
Note that Windows Server 2003 R2 did not do this yet; it is an innovation of Server 2008.
- On server OSes, Sleep(1) also behaved unexpectedly: starting with Server 2008 it is interpreted not as "pause for 1 millisecond" but as "make the minimum possible pause". Later we will see a case where this statement stops being true.
We continue our conclusions:
Conclusion 3: If you write only for Win Server 2008/2012/2016 and are guaranteed never to run on other operating systems, you need not bother at all: timeBeginPeriod(1) with a subsequent Sleep(1) will do everything you need.
Conclusion 4: timeBeginPeriod by itself serves our purpose only on server OSes, but paired with waitable timers it covers the task on Win Server 2008/2012/2016 and on Windows 8.1 / Windows 10.
What if we want everything at once?
Let's think about what to do if we need Sleep(0.5) to work on Win XP / Win Vista / Win 7 / Win Server 2003 as well.
Here only the native API comes to the rescue, the undocumented API accessible from user space via ntdll.dll. It contains the interesting pair NtQueryTimerResolution / NtSetTimerResolution.
Let's write the AdjustSystemTimerResolutionTo500mcs function.

ULONG AdjustSystemTimerResolutionTo500mcs()
{
    static const ULONG resolution = 5000; // in 100-ns units, i.e. 500 microseconds

    ULONG maxRes = 0, minRes = 0, origRes = 0;
    NTSTATUS status = ::NtQueryTimerResolution(&maxRes, &minRes, &origRes);
    if (NT_ERROR(status))
        throw std::runtime_error("Failed to query timer resolution (NtQueryTimerResolution), status: " + std::to_string(status));

    ULONG currentRes = 0;
    status = ::NtSetTimerResolution(resolution, TRUE, &currentRes);
    if (NT_ERROR(status))
        throw std::runtime_error("Failed to set timer resolution (NtSetTimerResolution), status: " + std::to_string(status));

    return origRes; // the previous resolution, so the caller can restore it later
}
To make the code compile, add the declarations of the necessary functions.

#include <winnt.h>

#ifndef NT_ERROR
#define NT_ERROR(Status) ((((ULONG)(Status)) >> 30) == 3)
#endif

extern "C"
{
    NTSYSAPI NTSTATUS NTAPI NtSetTimerResolution(
        _In_ ULONG DesiredResolution,
        _In_ BOOLEAN SetResolution,
        _Out_ PULONG CurrentResolution);

    NTSYSAPI NTSTATUS NTAPI NtQueryTimerResolution(
        _Out_ PULONG MaximumResolution,
        _Out_ PULONG MinimumResolution,
        _Out_ PULONG CurrentResolution);
}

#pragma comment (lib, "ntdll.lib")
Typical output from Windows 8.1:

Starting test: Sleep(1)
0: Elapsed 13916
1: Elapsed 14995
2: Elapsed 3041
3: Elapsed 2247
4: Elapsed 15141
Finished. average time: 9868
Starting test: sleep_for(microseconds(500))
0: Elapsed 12359
1: Elapsed 14607
2: Elapsed 15019
3: Elapsed 14957
4: Elapsed 14888
Finished. average time: 14366
Starting test: WaitableTimer
0: Elapsed 12783
1: Elapsed 14848
2: Elapsed 14647
3: Elapsed 14550
4: Elapsed 14888
Finished. average time: 14343
Timer resolution is set to 1 ms
Starting test: Sleep(1)
0: Elapsed 1175
1: Elapsed 1501
2: Elapsed 1473
3: Elapsed 1147
4: Elapsed 1462
Finished. average time: 1351
Starting test: sleep_for(microseconds(500))
0: Elapsed 1030
1: Elapsed 1376
2: Elapsed 1452
3: Elapsed 1335
4: Elapsed 1467
Finished. average time: 1332
Starting test: WaitableTimer
0: Elapsed 105
1: Elapsed 394
2: Elapsed 429
3: Elapsed 927
4: Elapsed 505
Finished. average time: 472
Typical output from Windows Server 2008 R2:

Starting test: Sleep(1)
0: Elapsed 7364
1: Elapsed 14056
2: Elapsed 14188
3: Elapsed 13910
4: Elapsed 14178
Finished. average time: 12739
Starting test: sleep_for(microseconds(500))
0: Elapsed 11404
1: Elapsed 13745
2: Elapsed 13975
3: Elapsed 14006
4: Elapsed 14037
Finished. average time: 13433
Starting test: WaitableTimer
0: Elapsed 11697
1: Elapsed 14174
2: Elapsed 13808
3: Elapsed 14010
4: Elapsed 14054
Finished. average time: 13548
Timer resolution is set to 1 ms
Starting test: Sleep(1)
0: Elapsed 10690
1: Elapsed 14308
2: Elapsed 768
3: Elapsed 823
4: Elapsed 803
Finished. average time: 5478
Starting test: sleep_for(microseconds(500))
0: Elapsed 983
1: Elapsed 955
2: Elapsed 946
3: Elapsed 937
4: Elapsed 946
Finished. average time: 953
Starting test: WaitableTimer
0: Elapsed 259
1: Elapsed 456
2: Elapsed 453
3: Elapsed 456
4: Elapsed 460
Finished. average time: 416
It remains to make observations and conclusions.
Observations:
- On Win 8.1, after the first run of the program, the system timer resolution was reset to a coarse value. I.e. our Conclusion 2 turns out to be wrong.
- After setting the resolution manually, the spread of actual sleep times in the WaitableTimer case grew, although the average sleep is still about 500 microseconds.
- On the server OS, Sleep(1) (and this_thread::sleep_for with it) very unexpectedly stopped behaving as it did in the timeBeginPeriod case: Sleep(1) began to work literally, in the sense of "pause for 1 millisecond".
Final conclusions
- Conclusion 1 remains unchanged: to get Sleep(0.5), the right sleep primitive is necessary but not sufficient. Always use waitable timers for it.
- Conclusion 2: The system timer resolution on Windows depends on the Windows edition, the Windows version, the currently running processes, and the processes that ran before. Nothing can be assumed or guaranteed! If you need guarantees, always request/set the required resolution yourself. For values below 1 millisecond you have to use the native API; for coarser values timeBeginPeriod is the better choice.
- Conclusion 3: Whenever possible, test the code not only on your own working Win 10 machine but also on the OS specified by the customer. Remember that server operating systems can differ greatly from desktop ones.