Once, not so long ago, I was recommended a program for recording streaming video in the background called Jaksta, which lets you save streaming video to disk while you watch YouTube, Facebook video, Google Video, and so on. As a result of installing it, I got a solid BSOD on every Windows boot. After switching to Safe Mode I wiped the beast out, but questions remained.
A brief study of the software showed that it installs an NDIS Miniport driver, which on my particular system started crashing at load time. “Why such complications?”, I thought, and decided to experiment with intercepting streaming video from the browser without any drivers.
Foreword
This opus assumes some knowledge of Windows, WinAPI and a bit of C++, so if something that is obvious to me needs a more detailed explanation, just ask. Let me say right away that there is no ready-made video interception program built on the principles outlined in this post (at least, I have not written one). There are only some drafts and theoretical musings, mostly motivated as an alternative to the solution with the NDIS Miniport driver and its blue screens.
So, if we hypothetically assume that we have a module capable of intercepting HTTP or TCP/IP packets from the browser, how exactly will we catch the video? There are two options:
- Analyze the request URLs.
  To do this, you need to intercept outgoing packets that contain an HTTP GET and look at where exactly this GET is sent. The solution is rather dubious, because it requires specific knowledge of each particular site. On the other hand, addresses matching a template like “www.youtube.com/watch?v=o78nFVB1tJA” let you filter requests for streaming video BEFORE the stream itself arrives.
- Check the replies from the server.
  This requires intercepting incoming packets and checking the Content-Type in their HTTP headers. Obviously, for video it will be specific to the particular format. For example, for the aforementioned Flash Video link from YouTube, where I play the saxophone, the server's response contains the header “Content-Type: video/x-flv”, which tells us unambiguously that Flash Video follows the headers. In the case of MPEG4, the header contains video/mp4, and so on.
The final solution will probably require a combination of 1) and 2) to work effectively, but in this post we will first focus on the interception of packets.
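As a rough illustration of the two checks, here is a minimal sketch in portable C++. It assumes the intercepted packet is already available as a string; the helper names (looksLikeWatchRequest, isVideoResponse, exampleFromArticle) are made up for this example and are not part of the original sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Lower-case copy: HTTP header names are case-insensitive.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Option 1: does an outgoing request look like a YouTube watch page?
// The "/watch?v=" pattern is site-specific knowledge (an assumption).
bool looksLikeWatchRequest(const std::string &request) {
    if (request.compare(0, 4, "GET ") != 0) return false;
    std::string firstLine = request.substr(0, request.find("\r\n"));
    return firstLine.find("/watch?v=") != std::string::npos;
}

// Option 2: does an incoming response declare a video/* Content-Type?
bool isVideoResponse(const std::string &response) {
    // Only the header block matters; it ends at the first empty line.
    std::string headers = toLower(response.substr(0, response.find("\r\n\r\n")));
    std::string::size_type pos = headers.find("\r\ncontent-type:");
    if (pos == std::string::npos) return false;
    pos += 15; // length of "\r\ncontent-type:"
    while (pos < headers.size() && (headers[pos] == ' ' || headers[pos] == '\t')) ++pos;
    return headers.compare(pos, 6, "video/") == 0;
}

// Usage with the header quoted in the text above.
void exampleFromArticle() {
    assert(isVideoResponse("HTTP/1.1 200 OK\r\nContent-Type: video/x-flv\r\n\r\n"));
}
```

A real interceptor would also have to cope with headers split across several packets, which this sketch ignores.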
Hooks and DLL injection
The first thing that comes to mind when writing an interceptor is to inject a DLL into the browser process. Some will stop reading right here, because everything from this point on is clear. Those who have already understood everything can
download the source code here and the compiled version
here (yes, only 3 KB). If you decide to try how it all works, I strongly recommend taking a 32-bit browser and shutting down all software that uses similar tricks, such as
AdMuncher (hello Murray & shannow!), because this example makes no attempt whatsoever to coexist correctly with such software (this is fixable). Look for the results as .log files in %TEMP%.
For everyone else, a few things need to be clarified, even though system hooks and DLL injection into Windows processes are a rather hackneyed topic. The end result of injecting your DLL into the browser process will look something like this:

That is, inside each browser process will sit our home-made DLL, which will intercept the packets we need. Two questions arise immediately:
- How do we inject the DLL?
- How do we catch the packets?
Let's try to answer them below...
Injecting a DLL into someone else's process
At the first stage, it is obvious that we need to write two modules: the main application and the DLL. The main application does the injecting, and the DLL, accordingly, is what gets injected. Without further ado, we launch Visual Studio and immediately write the main application (Injector.cpp):
#include <windows.h>

// Make WinMain the raw entry point and drop the default libraries.
#pragma comment(linker, "/entry:WinMain /nodefaultlib")

void APIENTRY WinMain()
{
    HMODULE interceptor = LoadLibrary(TEXT("Interceptor.dll"));
    if (interceptor != NULL) {
        // The hook procedure is exported from the DLL under ordinal 1.
        HOOKPROC cbtHook = (HOOKPROC) GetProcAddress(interceptor, (LPCSTR) 1);
        // Thread id 0 makes the hook global.
        HHOOK hHook = SetWindowsHookEx(WH_CBT, cbtHook, interceptor, 0);
        if (hHook != NULL) {
            MessageBox(NULL, TEXT("Press OK to terminate."), TEXT("Interceptor is working."), MB_OK);
            UnhookWindowsHookEx(hHook);
        }
        FreeLibrary(interceptor);
    }
}
What does the above code do? In the first line, to keep the code compact, we point the application's entry point straight at WinMain(), without any preludes. In the sources I cut out MSVCRT altogether as unnecessary.
Next, we load our interceptor, find the function it exports under number 1 (import by ordinal) and install a global hook of type CBT with the parameters we have just obtained. Then we simply show a modal message asking the user to press "OK" to finish, and exit. That's all. This is enough to inject the DLL into every process that in one way or another uses the User32 WinAPI to work with windows.
CBT is short for Computer-Based Training. The WH_CBT hook is good in general because “... the system calls this hook procedure before activating, creating, destroying, minimizing, maximizing, moving or sizing a window, as well as before completing a system command, before removing a keyboard or mouse event from the message queue, before setting the input focus and before synchronizing with the system message queue ...” This is a loose rendering of MSDN. In practice, it means the hook will fire for 99% of applications written in accordance with the standard window architecture.
The beauty of this method is that we do not actually have to fiddle with the Windows hook system as such.
Starting to write the DLL
Since we are writing an interceptor and not something else, it is enough for it that:
- The application initializes the DLL when it is loaded into the process.
- The DLL stays in the address space of the process until the process terminates.
- On termination or on an external event, the application deinitializes the DLL before unloading it.
It is worth saying that this technique requires the modules to have matching bitness. This means that a 64-bit browser needs a 64-bit interceptor DLL, and the same goes for 32-bit applications. You should not expect a 32-bit interceptor to be loaded into a 64-bit application, let alone the other way round.
So, let's write the skeleton of the future interceptor (Interceptor.cpp):
HINSTANCE g_hDllInstance;
In addition, we need to specify the export under number 1. To do this, we write a standard DEF file (Interceptor.def) and do not forget to feed it to the linker via the /DEF parameter:
LIBRARY Interceptor
EXPORTS
    CBT_Hook @1
That's all. Now the DLL attaches itself to processes and sits in them until they terminate. So that we are not injected into processes we do not need, and so that we behave correctly inside the main application (yes, yes, it also loads and initializes the DLL), we add an extra check:
const char *appsToIntercept[] = {
    "chrome.exe", "iexplore.exe", "opera.exe", "firefox.exe", "safari.exe", 0
};
char thisProcessPath[MAX_PATH], *thisProcessName;
char thisDllPath[MAX_PATH], *thisDllName;

BOOL onLoad()
{
    BOOL rv = FALSE;
Thus, if the application is unknown to us, we do not load into it. Now let's move on to intercepting the WinSock functions themselves.
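The whitelist check described above could look roughly like the sketch below. It is written in portable C++ so that it can be tried outside of a DLL; in the real interceptor the path would presumably come from GetModuleFileName(NULL, ...). The helper names (baseName, equalsIgnoreCase, shouldIntercept) are assumptions for this example and are not taken from the sources.

```cpp
#include <cassert>
#include <cctype>

// File-name part of a Windows path: the text after the last '\\' or '/'.
const char *baseName(const char *path) {
    const char *name = path;
    for (const char *p = path; *p; ++p)
        if (*p == '\\' || *p == '/') name = p + 1;
    return name;
}

// Case-insensitive comparison, because Windows file names ignore case.
bool equalsIgnoreCase(const char *a, const char *b) {
    while (*a && *b) {
        if (std::tolower(static_cast<unsigned char>(*a)) !=
            std::tolower(static_cast<unsigned char>(*b)))
            return false;
        ++a;
        ++b;
    }
    return *a == *b; // both strings must end at the same place
}

// True if the process image name is in the zero-terminated whitelist.
bool shouldIntercept(const char *processPath, const char *const *whitelist) {
    assert(processPath != nullptr && whitelist != nullptr);
    const char *name = baseName(processPath);
    for (; *whitelist; ++whitelist)
        if (equalsIgnoreCase(name, *whitelist)) return true;
    return false;
}
```

If shouldIntercept() returns false, the DLL simply does not install any hooks in that process.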
Function interception mechanism
First, some obvious things need to be spelled out. Since those to whom everything is already clear will not read this far anyway, I will allow myself a digression. To my amazement, I discovered that some programmers, even very smart and advanced ones, do not always clearly understand how system libraries work inside Windows processes. In this regard, I strongly recommend looking at another post on Habr:
"Step-by-step guide to executable files (EXE) Windows".
It is important to understand the following:

All DLLs, including system libraries and those listed in the PE header as imports, are loaded directly into the address space of the application that uses them. From a logical point of view, each running application has its own individual set of copies of system and other DLLs.
Thus, the easiest way to intercept packets in the browser is to intercept calls to certain functions of the system library responsible for sending and receiving packets. At this point some will stop reading again, because everything is again clear and nothing is new, with which I fully agree. For everyone else, I will continue.
Interception can be done at different levels: WinHTTP, WinINet, WinSock. To my mind, the most universal option is to intercept the WinSock functions from the WS2_32.DLL library. It has its drawbacks, especially when working with HTTPS, where the packets are encrypted. For HTTPS, from my point of view, the best solution would be to intercept the functions of WinHTTP and/or the OpenSSL libraries. But let's start with the simple case.
So, let's outline the main points of what we need to do:
- Determine the address of the function being intercepted.
- Rewrite the function's code at the entry point so that our own handler is called.
- In our own handler, perform certain actions before calling the original function.
- Call the original function.
- Save the result.
- In our own handler, perform certain actions after calling the original function.
- Return the result to the caller.
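The order of these steps can be modelled in plain C++ without touching any machine code. In the sketch below the redirection is done by swapping a function pointer rather than by patching the entry point, which is a deliberate simplification; originalSend, mySend and installHook are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

using SendFn = int (*)(const char *buf, int len);

// Stand-in for a system function such as send(); it just reports len bytes sent.
int originalSend(const char *buf, int len) { (void)buf; return len; }

SendFn g_entryPoint = originalSend; // what the "application" calls
SendFn g_original = nullptr;        // address remembered by the hook
std::vector<std::string> g_log;     // what the handler observed

int mySend(const char *buf, int len) {
    assert(g_original != nullptr);
    g_log.push_back("before: " + std::string(buf, len)); // act before the call
    int rv = g_original(buf, len);                       // call the original, save the result
    g_log.push_back("after: " + std::to_string(rv));     // act after the call
    return rv;                                           // return the result to the caller
}

void installHook() {
    g_original = g_entryPoint; // determine the address of the function
    g_entryPoint = mySend;     // redirect calls to our handler
}
```

After installHook(), a call through g_entryPoint passes through mySend() first, and the caller still receives the original return value.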
According to the ancient traditions of backward compatibility in Windows, there are several ways to do the same thing by calling different functions, so we will try to catch them all. For our task, it should be enough to intercept the following functions from WS2_32:
- send()
- WSASend()
- recv()
- WSARecv()
- WSAGetOverlappedResult()
- connect()
- WSAConnect()
- closesocket()
The last three are needed solely to create and destroy the context tied to a particular connection, if one is needed at all. In this example I will try to do without a context altogether. In reality, however, it will most likely be needed to correctly assemble HTTP request / HTTP response pairs. Even then, intercepting connect() and WSAConnect() is not strictly necessary, since a context for a new socket can simply be created the first time something is written to it.
So, let's define in our DLL a structure for rewriting and restoring the entry point of the WinSock functions:
What HOOK_CODE_SIZE is and what it depends on is explained further on.
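For orientation, such a structure could be sketched as follows. This is a guess at the layout based only on the names mentioned in the text (PAPIHOOK, oldCode, HOOK_CODE_SIZE, hookEnable, hookDisable); the remaining fields and the hookSave helper are assumptions, and the value 5 assumes a 32-bit JMP patch. The sketch works on any writable byte buffer; on a real code page the memory would first have to be made writable, for example with VirtualProtect().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const std::size_t HOOK_CODE_SIZE = 5; // assumed: size of a 32-bit JMP rel32

struct APIHOOK {
    void   *original;                // entry point of the intercepted function
    void   *handler;                 // our replacement handler
    uint8_t oldCode[HOOK_CODE_SIZE]; // bytes saved from the entry point
    uint8_t newCode[HOOK_CODE_SIZE]; // patch that jumps to the handler
};
typedef APIHOOK *PAPIHOOK;

// Remember the original bytes before the first patch.
void hookSave(PAPIHOOK h) {
    assert(h != nullptr && h->original != nullptr);
    std::memcpy(h->oldCode, h->original, HOOK_CODE_SIZE);
}

// Turn the hook on: overwrite the entry point with the jump.
void hookEnable(PAPIHOOK h) {
    std::memcpy(h->original, h->newCode, HOOK_CODE_SIZE);
}

// Turn the hook off: restore the saved bytes.
void hookDisable(PAPIHOOK h) {
    std::memcpy(h->original, h->oldCode, HOOK_CODE_SIZE);
}
```

Keeping both byte arrays in the structure makes switching the hook on and off a single memcpy() in either direction.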
Some assembly language
To intercept a function call at its entry point, we have to patch the code. The simplest algorithm is:
- Define your handler.
  The calling convention (cdecl or stdcall) and all the input parameters must be exactly the same as in the original function, otherwise we will corrupt the stack.
- Determine the entry point of the function of interest.
  This is simple: call GetProcAddress() from kernel32.dll.
- Save the code at the function's entry point.
  This is simple too: copy it byte by byte to a safe place.
- Patch the entry point.
  Roughly speaking, it all comes down to rewriting the code at the entry point so that calling the function transfers control to our handler.

From a functional point of view, there are several different interception methods. The simplest is to keep rewriting the code at the very beginning of the entry point, from the original to our own and back (while the original function is being called). There are more complex methods in which the code does not have to be rewritten constantly, but they require writing an instruction analyzer in order to splice the interceptor correctly into the middle of the original function. Let's stick with the simplest one: rewriting the code at the beginning.
Again, there are several ways to get to your handler. Without going into details, I will single out two of them: an unconditional jump and a return through the call stack. In the first case, the idea is:
MyFuncHandler:
    <blablablablabla>

OriginalFunction:
    JMP MyFuncHandler
This is extremely simple and, in 32-bit code, takes 5 bytes: one for the opcode of the unconditional JMP instruction and four for the relative address. Why relative is explained below. In the second case, the idea is slightly different:
MyFuncHandler:
    <blablablablabla>

OriginalFunction:
    PUSH MyFuncHandler
    RETN
This takes 6 bytes: two bytes for the opcodes of PUSH <32-bit DWORD> and RETN, and four for the absolute address. Yes, yes: in the first case the address is computed relative to the current address of the executing code (strictly, relative to the address of the instruction that follows the JMP), while in the second it is a constant counted from the beginning of the address space. I will go with the first method.
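To make the byte counts concrete, here is a small sketch that builds both patches for 32-bit x86. The opcodes (E9 for JMP rel32, 68 for PUSH imm32, C3 for RETN) are the standard encodings; the function names and the sample addresses are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// JMP rel32: opcode E9 followed by a 32-bit displacement, measured from
// the address of the instruction after the jump (patch address + 5).
std::array<uint8_t, 5> makeJmpRel32(uint32_t from, uint32_t to) {
    std::array<uint8_t, 5> code{};
    uint32_t rel = to - (from + 5); // wraps modulo 2^32 for backward jumps
    code[0] = 0xE9;
    for (int i = 0; i < 4; ++i)     // little-endian byte order
        code[1 + i] = static_cast<uint8_t>((rel >> (8 * i)) & 0xFF);
    return code;
}

// PUSH imm32 ; RETN: opcode 68 + absolute address, then opcode C3.
std::array<uint8_t, 6> makePushRet(uint32_t to) {
    std::array<uint8_t, 6> code{};
    code[0] = 0x68;
    for (int i = 0; i < 4; ++i)
        code[1 + i] = static_cast<uint8_t>((to >> (8 * i)) & 0xFF);
    code[5] = 0xC3;
    return code;
}

// Usage: jumping from 0x00401000 to 0x00402000 gives E9 FB 0F 00 00.
void exampleJmp() {
    std::array<uint8_t, 5> jmp = makeJmpRel32(0x00401000u, 0x00402000u);
    assert(jmp[0] == 0xE9 && jmp[1] == 0xFB && jmp[2] == 0x0F);
}
```

Note that the JMP patch depends on where it is written, while the PUSH/RETN patch depends only on the handler address.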
Let's write the hook installer:
I do not think the above code needs additional explanation. It is worth noting that we first build the patched code in the structure that holds the information about the intercepted function, and apply the patch itself with macros via memcpy(). Fans of aesthetics can add a lock there, but in my opinion it is unnecessary. Guess why?
To enable the hook, we copy in the new code, which contains nothing but the jump to the address of our own handler. To disable the hook, we restore the 5 original bytes saved in the oldCode array.
Since we have written the code that sets a hook on a function, we should also write the code that restores the function to its initial state:
Now that we can inject ourselves into the process and install, remove, enable and disable hooks, it is time to deal with our own handlers.
Our own handlers for WinSock functions
So, first, let's define an array of prototypes of the intercepted functions. As stated above, we will try to intercept only the most necessary functions:
As can be seen from the macro, each system function with some abstract name name has its own handler named my_name. Now we need to define the handlers listed in the array. Let's do it using send() as an example:
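// The signature must match send() exactly, including const on the buffer.
int WSAAPI my_send(SOCKET s, const char *buf, int len, int flags)
{
    PAPIHOOK thisHook = hookFind(my_send);
    if (NULL == thisHook)
        return (int) 0;

    hookDisable(thisHook);            // restore the original entry point
    int rv;
    rv = send(s, buf, len, flags);    // call the original function
    hookEnable(thisHook);             // put the jump back
    return rv;
}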
This is what an empty wrapper around a system function looks like: it does nothing except call the original code. Since the wrapping is essentially the same for all of our handlers, it makes sense to write a couple of macros to make defining them more convenient:
and for leaving the function as well:
Next, we use these macros to define the remaining handlers from the array.
Installing the hooks
The last line of our onLoad() calls a certain magic function, InstallHooks(). Since we now have all the components of the solution, let's write the batch installation of all the hooks defined above:
That concise. And the batch removal of the hooks is no less concise:
HTTP packet interception
Well, we have smoothly arrived at the most interesting part. So, we have two kinds of packets: HTTP requests and HTTP responses. Accordingly, the former are sent by functions like send(), and the latter are received by functions like recv(). The send functions must be intercepted BEFORE the original code is called, while the send buffer is still pristine. The receive functions, respectively, must be intercepted AFTER the original code has executed, otherwise we will not see what exactly was received.
Then there are the asynchronous functions. The idea is simple. When WSASend() or WSARecv() is called, a WSAOVERLAPPED structure is passed in, with an Event stored in it. Asynchronous functions return immediately, and on return they yield SOCKET_ERROR with WSAGetLastError() reporting WSA_IO_PENDING. The main application then waits for the Event in whatever way it likes, for example with WaitForSingleObject(), and as soon as the event is signaled it collects the result through WSAGetOverlappedResult().
While pulling data out of the synchronous functions is easy, the asynchronous ones require a bit of tinkering. At the beginning of the post I mentioned that it would not be possible to get rid of contexts completely, and asynchronous operations are exactly the reason why. In more detail: a call to WSAGetOverlappedResult() carries no information about the send or receive buffer. Therefore, it is obvious that we need to create a context and store a pointer to the buffer there.
There is one more reason why a context is needed. Since our task of intercepting streaming video also requires both HTTP requests and responses, the most logical solution is to assemble the scattered send() and recv() calls into pairs. So let's set up a context structure that is suitable both for collecting pairs of HTTP requests and responses and for working with asynchronous functions:
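typedef struct REQUEST *PREQUEST;

struct REQUEST {
    SOCKET   socket;   // socket the request was sent on
    char    *request;  // saved HTTP request
    LPWSABUF wsaBuf;   // buffer of a pending asynchronous operation
    PREQUEST next;     // next context in the list
};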
What is all of the above needed for? By the socket number in socket we will match a request to its response. That is, the main application sends the request and receives the response on the same TCP socket; it cannot be otherwise. The request pointer refers to the HTTP request. The LPWSABUF pointer is used for the asynchronous functions. That is, when WSASend() / WSARecv() is called we save the pointer to the buffer, and when WSAGetOverlappedResult() completes we take it from there. Again, the match is determined by the socket number.
Looking ahead, I will say that asynchronous calls to WSASend() are not used by any of the browsers I managed to test while writing this post and the interceptor drafts.
What is next used for? To organize a singly linked list. It is logical that the contexts for request-response pairs have to be kept somewhere so that they do not get lost. In order not to inflate the size of the program and not to pull in something like STL templates, it was easiest for me to solve this school-olympiad-level exercise and write a singly linked list by hand. Do whatever suits you best.
Without going into details, here are the functions for working with the linked list of request-response contexts (see the sources for details):
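A hand-written list of this kind might look like the sketch below. It is a portable approximation, not the code from the sources: SOCKET_T and the void pointer stand in for WinSock's SOCKET and LPWSABUF, and the function names requestFind, requestStore and requestRemove are invented for this example. It also ignores thread safety, which a multi-threaded browser would presumably require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

typedef std::uintptr_t SOCKET_T; // stand-in for WinSock's SOCKET

struct REQUEST {
    SOCKET_T socket;  // key: the socket the request went out on
    char    *request; // heap copy of the HTTP request
    void    *wsaBuf;  // stand-in for LPWSABUF of a pending operation
    REQUEST *next;
};

static REQUEST *g_requests = nullptr; // head of the list

REQUEST *requestFind(SOCKET_T s) {
    for (REQUEST *r = g_requests; r; r = r->next)
        if (r->socket == s) return r;
    return nullptr;
}

// Create the context for a socket if needed and store a copy of the request.
REQUEST *requestStore(SOCKET_T s, const char *data, std::size_t len) {
    assert(data != nullptr);
    REQUEST *r = requestFind(s);
    if (!r) {
        r = static_cast<REQUEST *>(std::calloc(1, sizeof(REQUEST)));
        if (!r) return nullptr;
        r->socket = s;
        r->next = g_requests; // push to the front of the list
        g_requests = r;
    }
    char *copy = static_cast<char *>(std::malloc(len + 1));
    if (!copy) return r;      // out of memory: keep the previous request
    std::memcpy(copy, data, len);
    copy[len] = '\0';
    std::free(r->request);    // drop the previous request, if any
    r->request = copy;
    return r;
}

// Unlink and free the context, e.g. when closesocket() is intercepted.
void requestRemove(SOCKET_T s) {
    for (REQUEST **link = &g_requests; *link; link = &(*link)->next) {
        if ((*link)->socket == s) {
            REQUEST *dead = *link;
            *link = dead->next;
            std::free(dead->request);
            std::free(dead);
            return;
        }
    }
}
```

With something like this in place, a send handler would call requestStore() before the original function, and a receive handler would call requestFind() with the same socket to pair the response with its request.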
Next, we write a general handler for all the send() functions:
And a general handler for all the recv() functions:
. . 'GET' . HTTP GET HTTP . 'HTTP' , GET . . %TEMP%\< .exe>-< >.log
. , WSARecv():
, . read():
closesocket() , . , , , , YouTube…
Google Chrome
www.youtube.com/watch?v=o78nFVB1tJA ( ):
[22:28:48] [SOCKET = 0EB0, REQUEST = 1327 bytes, RESPONSE = 329 bytes]
->GET /videoplayback?algorithm=throttle-factor&burst=40&cp=U0hTS1RRU19OTUNOM19MS1dBOlR1eGNSd1JHRkdy&expire=1346465093&factor=1.25&fexp=926900%2C910103%2C922401%2C920704%2C912806%2C924412%2C913558%2C912706&gcr=fi&id=a3bf27155075b490&ip=91.155.190.10&ipbits=8&itag=34&keepalive=yes&key=yt1&ms=au&mt=1346441292&mv=m&range=13-1781759&signature=7415093589702691B2E46681B2EF24EC370C2F1F.D6D55168E2211687994A3F47D8919AC5470C567D&source=youtube&sparams=algorithm%2Cburst%2Ccp%2Cfactor%2Cgcr%2Cid%2Cip%2Cipbits%2Citag%2Csource%2Cupn%2Cexpire&sver=3&upn=GlJDbjcQ-2w HTTP/1.1
Host: oo---preferred---elia-hel1---v11---lscache1.c.youtube.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.26 Safari/537.4
Accept: *
HTTP .
- , ? , .
Conclusions
. IPC -. :
- URL .
- . IPC, , .
- .
. YouTube / Flash Video. . 90% ”Content-type”.
:
- .
, . , . - , . - HTTPS
, HTTPS NDIS Miniport . WinHTTP OpenSSL.
? :
- . HTTP , . TCP/IP.
- WinSock. . .
- .
UPD: .
Respectfully,
//st