📜 ⬆️ ⬇️

Worst API ever created

A detailed look at logging context switching using the Event Tracing API for Windows.


In response to the post last week, I received the following email:
Maybe I’m a little late with a question about your recent post, but just in case, I’ll ask: do you have any methods (strategies) of working with an external library that you can’t get rid of and that violates some (or all ) principles of writing API design (most likely, refer to the principles, recommendations, which are described in the previous article of the author. Approx. transl. )? Maybe there are some stories? This is a vague question, but I’m just asking about any experience (as a user) of using an API that I really remember.

- Michael Bartnett

It reminded me that, really, I always wanted to describe in detail the steps necessary to use a bad API - just to highlight how bad it can be for a developer. I don’t think that the people who develop the API really understand how important it is to do it correctly and how much unnecessary work hundreds, thousands and sometimes millions of other developers do when incorrect, erroneous API development. So, I felt it was important to write an article that would show how much unnecessary work a written API can call.

It would probably be a good article in a weekly review of bad APIs. But, since I don’t have time for something like this and I can only make out one API, the most important question arises: which API should I choose?
')

Event Tracing API for Windows




Now is a great time in the history of computing for writing an article about a bad API (on the other hand, it can be said that this is the worst time to make a living using programming). There are so many bad APIs now that I can randomly select one API and most likely I will find enough problems to write an article of 3000 words. But if I was going to choose one, separate operation in one API, then my choice seems reasonable among all APIs that I have ever used.

Currently, there are a lot of APIs that make a lot of effort to get into the top-ranking "worst APIs". For example, CSS, with the appearance of a new version, can take half the places in the top 10 ranking for the year. In times of its popularity, DirectShow certainly dominated the ranking of its era. In the new generation, newcomers such as the Android SDK, along with the development tools, demonstrate real potential for their intricacies, so the quality of the API, when called from C ++ code, is the last thing you're worried about.

But when I thought long and hard about who was the winner in the “hard category of bad API” - there was one real one - Event Tracing API for Windows.

The Event Tracing API for Windows is an API that does something very simple: it allows any component of the system (including regular applications) to notify about "events" that can be received ("absorbed") by any other component. This is a logging system that is used to record the performance and debugging information of any component, starting with the system core.

Now, usually, there is no reason for game developers to use Event Tracing API for Windows directly. You can use utilities such as PerfMon to view information about your game, such as how many working sets (working set) it uses or how hard it works with the disk. But there is one specific thing that only provides direct access to the Event Tracing API: the ability to track the context switching time.

Yes, if you have a fairly recent version of Windows (for example, 7 or 8), the OS kernel will log all context switches, including CPU timestamp. You can actually relate them to your own profiling in the game. This is incredibly useful information (from the category of information that can only be obtained directly from the "iron"). This is the reason why such utilities as RAD , Telemetry can show you when the streams you started were interrupted and have to wait until the system itself does its work; something that can be critically important for debugging weird performance issues.

It sounds very good. I mean that the context switching time is very significant information and even if the API is not of the best quality, it’s still very cool, isn't it?

Is not it?

First of all - write an example of using the API



Before we take a look at the real Event Tracing API for Windows, I want to do step by step what I said at the lecture last week: first write a usage example. Whenever you evaluate the quality of an API, or create a new API, you always, always, ALWAYS have to start by writing some code as if you are a user who is trying to do the things for which your API is intended. If there are no restrictions, this is the only way to get a good and clean look at the future of how the API will work. It would be "magical." And then, when you have some examples of use, You can move forward and start thinking about practical problems and about the best way for you to implement.

So, if I were a developer without any knowledge of Event Tracing API for Windows, how would I want to get a list of context switches? Well, 2 ways come to mind.

The simplest approach would be like:

//    etw_event_trace Trace = ETWBeginTrace(); ETWAddEventType(Trace, ETWType_ContextSwitch); //    event EventBuffer[4096]; int EventCount; while(EventCount = ETWGetEvents(Trace, sizeof(EventBuffer), EventBuffer)) { {for(int EventIndex = 0; EventIndex < EventCount; ++EventIndex) { assert(EventBuffer[EventIndex].Type == ETWType_ContextSwitch); //  EventBuffer[EventIndex].ContextSwitch }} } //    ETWEndTrace(Trace); 

which will lead to an API that will look, for example, like this:

 enum etw_event_type { ETWType_None, ETWType_ContextSwitch, ... ETWType_Count, }; struct etw_event_context_switch { int64_t TimeStamp; uint32_t ProcessID; uint32_t FromThreadID; uint32_t ToThreadID; }; struct etw_event { uint32_t Type; // event_type union { etw_event_context_switch ContextSwitch; ... }; }; struct etw_event_trace { void *Internal; }; event_trace ETWBeginTrace(void); void ETWAddEventType(event_trace Trace, event_type); int ETWGetEvents(event_trace Trace, size_t BufferSize, void *Buffer); void ETWEndTrace(event_trace Trace); 


This is one way to do this. Very simple, easy to understand. It's hard enough to make a mistake. If someone went through a debugger, he would see exactly what was going on and you simply could say what went wrong.

However, I can imagine a situation where the critical performance code would not want to pay for copying from the kernel buffer to your buffer - what this API requires ( ETWGetEvents should copy events from some internal OS buffer, since they need to be taken from somewhere) . The version will be a little more difficult to take some displayed memory using the API, which you use as a buffer for reading:

 //    etw_event_trace Trace = ETWBeginTrace(4096*sizeof(etw_event)); ETWAddEventType(Trace, ETWType_ContextSwitch); //    etw_event_range Range; while(ETWBeginEventRead(Trace, &Range)) { {for(etw_event *Event = Range.First; Event != Range.OnePastLast; ++Event) { assert(Event->Type == ETWType_ContextSwitch); //  Event->ContextSwitch }} ETWEndEventRead(Trace, &Range); } //    ETWEndTrace(Trace); 


All I did here was to change the return mechanism: instead of copying, a pointer to a certain block (a “ranged pointer” means a pointer to a certain range. In general, a pointer that knows where the data it points to ends. Note . ). In ETWBeginTrace , the user transmits the maximum number of events, which affects the buffer size and the kernel allocates a memory area (in the user address space) that is sufficient for the specified number of events. Then the system, if it can, writes directly to the allocated buffer and, thus, avoids unnecessary copying. When the user calls ETWBeginEventRead() , pointers are returned to the beginning and end of some part of the memory for the events. Since the buffer will be processed as a ring buffer, the user expects to be able to loop through all the received events in the event that there are more than one event. I added the “end of reading” call, as some implementations may require the kernel to know what part of the buffer the user is viewing, thus avoiding writing to a memory that is actively being read. I don’t really know if such things are needed at all, but if you want to get basic information and give the kernel maximum flexibility to implement, this version definitely supports more possible implementations than the version with ETWGetEvents() .

The API will be updated, for example, like this:

 struct etw_event_range { etw_event *First; etw_event *OnePastLast; }; event_trace ETWBeginTrace(size_t BufferSize); int ETWBeginEventRead(event_trace Trace, etw_event_range *Range); void ETWEndEventRead(event_trace Trace, etw_event_range *Range); 


If you really want to, you can even support both versions of reading events using the same API - you just need to allow the ETWGetEvents() call. Also, to supplement the API with error messages, it would be nice to have something like this:

 bool ETWLastGetEventsOverflowed(event_trace Trace); 

so that after each call to ETWGetEvents() be able to check, and not too many events have occurred since the last check?

To each his own, but I think that most developers will not have any problems with the API that I just proposed. Everyone has their own tastes and I am sure that everyone will notice something that he does not like, but I doubt that someone will say that the API is terrible. The API is pretty simple and I think most developers will be able to easily embed this API into their code without too much thought.

The reason why the API is so simple is not because I used a lot of API development experience to subtly show my vision of a good API. On the contrary. The API is simple, because the problem it is designed to solve is elementary. How to move data from one place to another is, in fact, the simplest problem for the API that may arise in the system. This is a celebrated memcpy() .

But it is the simplicity of the task that makes the Event Tracing API for Windows shine. Even if all that needs to be done is to move the memory from one place to another, using this API causes all sorts of difficulties that you can imagine.

“Run” tracking



I don't know how anyone wants to start teaching how to use the Event Tracing API for Windows. Maybe there are good examples teaching this that I just never met. I had to put together pieces of code taken from various scraps of documentation during many hours of experiments. Every time I figured out one more step in the overall process, I thought: “Wait, seriously ??”. And every time Microsoft implicitly answered: "Seriously."

If I tell you how to use the API, then you lose the opportunity to experience a screaming experience, so I will say that if you want to feel everything in full, interrupt the reading and try to get the context switching time yourself. I can guarantee that you will receive hours of unbridled fun and anxiety. Those of you who prefer to save time, avoiding heaps of obscure moments - just read on.

OK, let's start. The equivalent of my function ETWBeginTrace() is the call proposed by Microsoft, StartTrace() . At first glance, it seems rather harmless:
 ULONG StartTrace(TRACEHANDLE *SessionHandle, char const *SessionName, EVENT_TRACE_PROPERTIES *Properties); 


However, when you look at what needs to be transferred to the place of the Properties parameter, things get a little more complicated. The structure EVENT_TRACE_PROPERTIES , defined by Microsoft, looks like this:

 struct EVENT_TRACE_PROPERTIES { WNODE_HEADER Wnode; ULONG BufferSize; ULONG MinimumBuffers; ULONG MaximumBuffers; ULONG MaximumFileSize; ULONG LogFileMode; ULONG FlushTimer; ULONG EnableFlags; LONG AgeLimit; ULONG NumberOfBuffers; ULONG FreeBuffers; ULONG EventsLost; ULONG BuffersWritten; ULONG LogBuffersLost; ULONG RealTimeBuffersLost; HANDLE LoggerThreadId; ULONG LogFileNameOffset; ULONG LoggerNameOffset; }; 


where, in turn, the first data member is also a structure that expands to the following:

 struct WNODE_HEADER { ULONG BufferSize; ULONG ProviderId; union { ULONG64 HistoricalContext; struct { ULONG Version; ULONG Linkage; }; }; union { HANDLE KernelHandle; LARGE_INTEGER TimeStamp; }; GUID Guid; ULONG ClientContext; ULONG Flags; }; 

A quick glance at this mass of strange data raises only questions: why are there such members as "EventsLost" and "BuffersWritten" (“number of "BuffersWritten" events” and “number of recorded buffers”, respectively, from the documentation. Approx. Transl. ) ? The reason is as follows: instead of making different data structures for different operations that can be used to track events, Microsoft has grouped API functions into several groups, and all functions in each group share a combined structure for parameters. Therefore, instead of the user getting a clear idea of ​​what is being fed into the input and returning from a function, simply looking at the parameters of the function - it should depend entirely on the MSDN documentation for each API, and hope that the documentation correctly lists which members of the giant Parameter structures are used for each call and when they are intended to transfer input and output parameters.

Of course, since there are so many different ways to use this structure, Microsoft requires that you completely nullify this huge beast before using:

 EVENT_TRACE_PROPERTIES SessionProperties = {0}; 

For the StartTrace() function, if you just want to receive the data directly and will not log them to a file, you need to fill in some members. These two have some meaning:

 SessionProperties.EnableFlags = EVENT_TRACE_FLAG_CSWITCH; SessionProperties.LogFileMode = EVENT_TRACE_REAL_TIME_MODE; 

The member EnableFlags says we want to get it. We want a context switch, so we set this flag. And now let's see what happens when there are more than 32 events from one provider? I do not know, but I assume that Microsoft was not particularly concerned about this opportunity. But I was, that's why I used enum in my sentence, since it supports billions of types of events. But wait, "32 types of events should be enough for everyone" - so Microsoft offered a 32-bit flag field. There are no problems, but this is definitely a sign of a close-minded thinking, which leads to things like duplication of functions with an Ex prefix at the end of the name (“Ex” from “Extended” is an extended version of the function. Approx. Transl. ).

LogFileMode determines whether we want to receive events directly or we want the kernel to write them to disk. Since these are completely different operations, I would like to break them into calls for different functions, but, wait a minute, we have one big structure for everything - we drop everything there.

Things get a little weird with this field:

 SessionProperties.Wnode.ClientContext = 1; 

What does this record mean? Well, terribly called "ClientContext" ("Context of the "ClientContext" . Approx. Translation ), actually refers to the way in which you want to get the time of events. "TimestampType" ("TileTimeType". Approx. Transl. ) Would be a little clearer, but not important. Real fun is a simple "1" on the right.

It turns out that there is a set of values ​​that ClientContext can accept, but Microsoft does not always give them names. So you just have to read the documentation and remember that 1 means that time comes from QueryPerformanceCounter , 2 means “system time”, and 3 means the number of CPU cycles.

In case this is not obvious, there are many reasons why a publicly available API should never do that. In the hidden part of the implementation, from time to time, I will act in a similar way, in situations where, say, you need to use -1 and -2 for some intricate indexing scheme. But for an API that is literally used by millions of developers, you should always give names to your constants.

First, it makes the code more readable. No one knows what a ClientContext with a value of "1" , but the value of USE_QUERY_PERFORMANCE_COUNTER_TIMESTAMPS will be crystal clear. Secondly, it makes the code searchable. No one can normally search for “1” in the code base, but USE_QUERY_PERFORMANCE_COUNTER_TIMESTAMPS is easily searched. You might think: “well, no problem, I’ll look for ClientContext = 1 ”, but remember that more complex use of this API may include the use of variables, for example: ". . .ClientContext = TimestampType;" . Third, the code will not compile for later versions of the SDK, where some things have changed. For example, if you decide to disable the use of USE_QUERY_PERFORMANCE_COUNTER_TIMESTAMPS , they can remove the definition ( #define ) of this constant and make USE_QUERY_PERFORMANCE_COUNTER_TIMESTAMPS_DEPRECATED . After such changes, the old code will not compile with the new version of the SDK and the developer will look at the new documentation and, thus, will see what he should use in return.

Etc., etc., etc.

Perhaps the most annoying field we have to fill:

 SessionProperties.Wnode.Guid = SystemTraceControlGuid; 


GUID tells who is trying to track events. In our case, we are trying to take data from the kernel log and SystemTraceControlGuid - a globally defined GUID that points specifically to this log. I’m sure that a better name could be given for this GUID , but this is an insignificant problem compared to the fact that if you try to compile this line of code, you will see that the linker cannot find SystemTraceControlGuid .

This happens, of course, because the GUID so large that Microsoft could not have found a way to embed them in the header files (I can count a few possible ways, but I suppose they didn’t like any of them) So Microsoft forces you to select one file in your project, in which, the Windows header files will embed GUID definitions. In order to do this, you must write something like this:

 #define INITGUID //   SystemTraceControlGuid  evntrace.h. #include <windows.h> #include <strsafe.h> #include <wmistr.h> #include <evntrace.h> #include <evntcons.h> 

So, now you have to carefully choose where you do it - perhaps create a new file in your project, where all GUID will be located - everyone can refer to them (or some kind of nonsense there). In general, you can not identify them twice.

But whatever it is, we are almost done with filling out the structure. All we need is to deal with the SessionName parameter, which we have to pass as a string, right? Since this is just a session name, I guess I can just do the following:

 ULONG Status = StartTrace(&SessionHandle, "CaseyTown!!!", &SessionProperties); 

because it's a cool session title, don't you think so?

But alas, things don't work that way. It turns out that despite the fact that you have already specified a GUID for SessionProperties , which indicates that the kernel is the source of events - you also need to specify the predefined constant KERNEL_LOGGER_NAME as the name of the session. Why? Well, this is because there is a small surprise that I will save for you so that you can enjoy the intrigue of everything that happens.

Ok so start:

 ULONG Status = StartTrace(&SessionHandle, KERNEL_LOGGER_NAME, &SessionProperties); 

It looks good, right? WRONG.

It turns out that, despite the fact that the SessionName string SessionName passed as the second parameter, this is just a “convenient” feature. In fact, SessionName should be implemented directly in the SessionProperties , but since Microsoft does not want to limit the maximum length of the string name, it was decided to just go ahead and package this string after the EVENT_TRACE_PROPERTIES structure. Which means that, in fact, you can NOT do this:

 EVENT_TRACE_PROPERTIES SessionProperties = {0}; 

And should do so:

 ULONG BufferSize = sizeof(EVENT_TRACE_PROPERTIES) + sizeof(KERNEL_LOGGER_NAME); EVENT_TRACE_PROPERTIES *SessionProperties =(EVENT_TRACE_PROPERTIES*) malloc(BufferSize); ZeroMemory(SessionProperties, BufferSize); SessionProperties->LoggerNameOffset = sizeof(EVENT_TRACE_PROPERTIES); 


Yes, everything is correct - every user of the Event Tracing API for Windows must do arithmetic calculations and manually arrange the structure in a packaged format. I cannot imagine why the name should be packaged in this way, but I am sure that if you need everyone to do this, you must provide an auxiliary macro or function that will do the right things for the user and save him from understanding your weird data packaging logic.

But wait, at least you don't need to copy the name yourself! Microsoft decided that the StartTrace() function of this API would copy the name into the structure for you, since in the end, the name is passed as the second parameter.

Well, this is a beautiful gesture, but not so accepted in practice. It turns out that the forced transfer of KERNEL_LOGGER_NAME as a SessionName — along with the GUID — is not, after all, superfluous, and this is the kind of surprise I mentioned. The real reason why the SessionName parameter should be set to KERNEL_LOGGER_NAME is that Windows allows you to have only one session in the system — in general, a session that reads events from the SystemTraceControlGuid . Other GUID can have multiple sessions, but not SystemTraceControlGuid . So, in fact, when you transfer KERNEL_LOGGER_NAME - you say that you want to have one, unique session that can exist in the system at any time with SystemTraceControlGuid GUID . If someone has already started this session - your attempt to start it - will fail.

Huh, everything is getting better. The session name is global for the entire operating system and does not close automatically upon abnormal termination of the process that started the session. So, if you wrote the code that calls StartTrace() , but somewhere a bug went through - no matter how - and your program crashed - KERNEL_LOGGER_NAME - the session will still work! And, when you try to start your program again, possibly after fixing the bug, your attempt to call StartTrace() will fail with ERROR_ALREADY_EXISTS .

So, in principle, the StartTrace() call, which will helpfully copy SessionName into a structure for you - in rare cases is the first call you want to do, in any case. You will most likely make the following call:

 ControlTrace(0, KERNEL_LOGGER_NAME, SessionProperties, EVENT_TRACE_CONTROL_STOP); 

This call terminates any existing session and the subsequent StartTrace() call will succeed. But, of course, ControlTrace() does not copy the session name as StartTrace() does, which means that, in practice, you have to do it yourself, since the StartTrace() call comes after the ControlTrace() call!

 StringCbCopy((LPSTR)((char*)pSessionProperties + pSessionProperties->LoggerNameOffset), sizeof(KERNEL_LOGGER_NAME), KERNEL_LOGGER_NAME); 


This is insane, but the consequences of all this are even more insane. If you think about what it means to have only one possible tracking process that is attached to the kernel log, you will quickly realize that security issues come into play. And what if some other process called StartTrace() to track the kernel log - how does the system know that our process has the ability to take and stop tracking the log and start another with our own settings?

The answer is funny - no way. In fact, kernel log tracking is fully accessible to all - let the best process win! Whoever called StartTrace() , well, he gets the opportunity to configure the log tracking for himself.

Well, not quite. Obviously, you don’t want any process to steal kernel kernel tracking from another process. Therefore, Microsoft has decided that the best solution to the problem is simply to deny access to the kernel log to all processes, except those that have administrator rights.

Yes, that's right - I'm not exaggerating. If you just want to get a list of context switches, even if for your own process, then it should be run with admin privileges. All the delights of "clicking-right-button-launch-from-name-administrator". If you have not done so and just started your process in the usual way, then your call to StartTrace will end with an error, because the process does not have enough privileges. (In theory, you have the opportunity to add a user to the “Performance Log Users” group and thus avoid running the process as an administrator, but I just say it now - in fact, I can't remember whether it will work for connections to kernel log or just for all other types of tracking ...)

Amazing right? , , 2 ( ETWBeginTrace() / ETWAddEventType() ), , , , , , , , GUID -, #define #include - , , .

, !

«»



, . , « » , ? ! , Event Tracing Windows. ! ! -, , , , … OpenTrace() :

 TRACEHANDLE OpenTrace(EVENT_TRACE_LOGFILE *Logfile); 

? , , «» — , . , , . , OpenTrace() .

, , OpenTrace() , EVENT_TRACE_LOGFILE . , , -, , - «» — . , StartTrace() , OpenTrace() — , -, , , , — .

EVENT_TRACE_LOGFILE :

 struct EVENT_TRACE_LOGFILE { LPTSTR LogFileName; LPTSTR LoggerName; LONGLONG CurrentTime; ULONG BuffersRead; union { ULONG LogFileMode; ULONG ProcessTraceMode; }; EVENT_TRACE CurrentEvent; TRACE_LOGFILE_HEADER LogfileHeader; PEVENT_TRACE_BUFFER_CALLBACK BufferCallback; ULONG BufferSize; ULONG Filled; ULONG EventsLost; union { EVENT_CALLBACK *EventCallback; EVENT_RECORD_CALLBACK *EventRecordCallback; }; ULONG IsKernelTrace; PVOID Context; }; 

, " Callback " ( . . . ) — . — . .

, , EVENT_TRACE_LOGFILE - Microsoft :

 EVENT_TRACE_LOGFILE LogFile = {0}; 

OpenTrace() , - , «» . , , :

 LogFile.LoggerName = KERNEL_LOGGER_NAME; 

, . Why? ! — , . Microsoft , .

, , , , :

 LogFile.LoggerName = KERNEL_LOGGER_NAME; LogFile.ProcessTraceMode = (PROCESS_TRACE_MODE_REAL_TIME | PROCESS_TRACE_MODE_EVENT_RECORD | PROCESS_TRACE_MODE_RAW_TIMESTAMP); 

PROCESS_TRACE_MODE_REAL_TIME , , — , -, , . PROCESS_TRACE_MODE_EVENT_RECORD — , Windows, EventRecordCallback , EventCallback (, , API !). PROCESS_TRACE_MODE_RAW_TIMESTAMP — Windows ClientContext , StartTrace() . , : , , , " 2 " " 2 " — " 2 ", . " 1 " " 3 " — … .

-, API , :

 LogFile.EventRecordCallback = CaseyEventRecordCallback; 

, -, :

 TRACEHANDLE ConsumerHandle = OpenTrace(&LogFile); 


, ( !), , StartTrace() OpenTrace() . 2 API, . TRACEHANDLE . . . . !

StartTrace() ULONG , . OpenTrace() , INVALID_HANDLE_VALUE , - . StartTrace() ( -) . OpenTrace() , - .

«»



— , -, , ? , , , , OpenTrace() :
 static void WINAPI CaseyEventRecordCallback(EVENT_RECORD *EventRecord) { EVENT_HEADER &Header = EventRecord->EventHeader; UCHAR ProcessorNumber = EventRecord->BufferContext.ProcessorNumber; ULONG ThreadID = Header.ThreadId; int64 CycleTime = Header.TimeStamp.QuadPart; // Process event here. } 


, . , , - , — . ?

, Windows , «» , , API:
 ULONG ProcessTrace(TRACEHANDLE *HandleArray, ULONG HandleCount, LPFILETIME StartTime, LPFILETIME EndTime); 


, Windows , , . , ? , , ( - , , ) ?

, , , Event Tracing API Windows : ProcessTrace .

, . ProcessTrace() , , , , , . CloseTrace() . , — , , ProcessTrace() !

, , . , ProcessTrace() :
 static DWORD WINAPI Win32TracingThread(LPVOID Parameter) { ProcessTrace(&ConsumerHandle, 1, 0, 0); return(0); } 


, , OpenTrace() — , :
 DWORD ThreadID; HANDLE ThreadHandle = CreateThread(0, 0, Win32TracingThread, 0, 0, &ThreadID); CloseHandle(ThreadHandle); 


, , , , Event Tracing API Windows.

API



, , : API, , , . . --- API , — , Event Tracing API Windows — API .

API, , ( , 10 ). , , — « ». , , . , , , , SessionName KERNEL_LOGGER_NAME , GUID — SystemTraceControlGuid .

, , , API, — « — API». , API . — , API — , , , Microsoft- . 2 (. « — API». . . ), , , , Microsoft.

:
— // ( ).

Windows, , API Windows Linux, API Windows, Linux API — — . . Linux API?

Source: https://habr.com/ru/post/254795/


All Articles