
Studying the Windows 10 source tree: from telemetry to open source

image

No matter how closed Microsoft's software is, it reveals plenty about its internal structure. For example, functions exported from a library by name give an idea of its interfaces. Debugging symbols are also freely available and widely used to diagnose OS errors. Still, all we have on hand are compiled binary modules. That raises a question: what did they look like before compilation? Let's try to find out more about the source code without doing anything illegal.

The idea, of course, is not new. Mark Russinovich and Alex Ionescu both did the same in their time. I only wanted to get fresh data and to extend and refine the work already done by others. The experiment needs the debugging symbol packages, which are freely available. I took the packages for the latest release of Windows 10 (64-bit) and decided to examine both the release build (free build) and the debug build (checked build).

Debugging symbols are a set of files with the pdb extension (program database) that contain information for debugging the OS's binary modules, including the names of globals, functions and data structures, sometimes along with their contents.
In addition to the symbols, you can take the debug build of Windows 10, which is available with some restrictions. Such a build is rich in assertions, and these reveal not only variable names that are undocumented and absent from the symbol files, but also the line number in the file where the assertion is written.

image

In the example, you can see not only the file name and its extension, but also the directory structure leading to it, which is very useful even without the root.

We run the strings utility from Sysinternals over the symbol files and get about 13 GB of raw data. Feeding it every file from the debug build distribution, however, is a poor idea: there would be too much unnecessary data. We limit ourselves to a set of extensions: exe (executables), sys (drivers), dll (libraries), ocx (ActiveX components), cpl (Control Panel components), efi (EFI applications, in particular the boot loader). The raw data from the distribution came to 5.3 GB.
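
For readers who want to reproduce this step without the Sysinternals tool, here is a minimal Python sketch of the same idea. It is not the utility used in the article: it only looks for ASCII runs (the real strings tool also finds Unicode strings), and the minimum length MIN_LEN and the function names are assumptions made for illustration.

```python
import re
from pathlib import Path

# Extensions named in the text; everything else in the distribution is skipped.
BINARY_EXTENSIONS = {".exe", ".sys", ".dll", ".ocx", ".cpl", ".efi"}

# Runs of at least MIN_LEN printable ASCII bytes (MIN_LEN is an assumed threshold).
MIN_LEN = 6
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list:
    """Return the printable ASCII runs found in a blob of bytes."""
    return [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]


def dump_strings(root: Path, out_path: Path) -> None:
    """Walk root and write the strings of every selected binary to out_path."""
    with out_path.open("w", encoding="ascii") as out:
        for path in root.rglob("*"):
            if path.is_file() and path.suffix.lower() in BINARY_EXTENSIONS:
                for s in extract_strings(path.read_bytes()):
                    out.write(s + "\n")
```

Reading each file fully into memory is acceptable for individual binaries; only the combined output reaches gigabytes, and it is written line by line.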

To my surprise, few programs can even open files of around ten gigabytes, and fewer still support searching inside them. In this experiment I used 010 Editor to view the raw and intermediate data manually. Filtering the data with Python scripts was cheap and effective.

Filtering data from symbol files


Among other things, the symbol files contain linker information. That is, a symbol file lists the object files that were used to build the corresponding binary, and the linker records the full path to each object file.

image


  • Filter 1: look for lines matching the mask ":\\".


We get absolute paths, sort them and remove duplicates. There was not much garbage, by the way, and it was removed by hand.
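
The first filter can be sketched in a few lines of Python. The function name is hypothetical, and the sketch keeps the whole matching line as is; the manual cleanup of garbage mentioned above is not reproduced.

```python
def filter_absolute_paths(lines):
    """Keep strings that contain the drive marker, sorted and without duplicates."""
    # ":\\" in Python source is the two-character sequence colon + backslash.
    return sorted({line.strip() for line in lines if ":\\" in line})
```

Using a set before sorting removes duplicates in one pass, which matters when the input is several gigabytes of repeated linker records.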

Inspecting the resulting data revealed the approximate structure of the source tree. The root is "d:\th", which apparently stands for Threshold, after the name of the November version of Windows 10, Threshold 1. However, few files had the root "d:\th". This is because the linker consumes files that have already been compiled, and the object files are built in the folder "d:\th.obj.amd64fre" for the release build and "d:\th.obj.amd64chk" for the debug build.

  • Filter 2: we assume that source files are laid out the same way as the object files built from them, and "roll back" the object file paths to source file paths. Caution: this step may distort the structure of some folders, because the build settings of the sources are not reliably known.


For example:
d:\th.obj.amd64fre\shell\osshell\games\freecell\objfre\amd64\freecellgame.obj
was originally
d:\th\shell\osshell\games\freecell\freecellgame.c??

Regarding the file extension: an object file can be produced from many different kinds of source file: "c", "cpp", "cxx", "asm" and so on. At this stage it is not clear which kind of source file was used, so we leave the extension as "c??".
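
The rollback can be illustrated with a short Python sketch. The exact rule is an assumption derived from the single freecell example above: the object root is replaced with "th", the "objfre" or "objchk" folder is dropped together with the architecture folder that follows it, and the extension becomes "c??". The names OBJ_ROOTS, OBJ_DIRS and obj_to_source are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Optional

# Object roots seen in the data and the source root they are assumed to map to.
OBJ_ROOTS = ("th.obj.amd64fre", "th.obj.amd64chk")
SRC_ROOT = "th"
# Assumed build-output folders; each is followed by one architecture folder.
OBJ_DIRS = ("objfre", "objchk")


def obj_to_source(obj_path: str) -> Optional[str]:
    """Map an object file path to a guessed source path, or None if not applicable."""
    p = PureWindowsPath(obj_path)
    parts = p.parts  # ('d:\\', 'th.obj.amd64fre', 'shell', ..., 'freecellgame.obj')
    if len(parts) < 3 or parts[1].lower() not in OBJ_ROOTS:
        return None
    if p.suffix.lower() != ".obj":
        return None
    dirs = []
    skip_next = False
    for d in parts[2:-1]:
        if skip_next:  # architecture folder right after objfre/objchk
            skip_next = False
            continue
        if d.lower() in OBJ_DIRS:
            skip_next = True
            continue
        dirs.append(d)
    drive = parts[0].rstrip("\\")
    return "\\".join([drive, SRC_ROOT, *dirs, p.stem + ".c??"])
```

Dropping only the folder that directly follows "objfre" or "objchk" keeps legitimate source folders that happen to be named after an architecture.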

Besides "d:\th" there are many other roots, for example "d:\th.public.chk" and "d:\th.public.fre". We will skip these folders because they contain the public part of the SDK, which is of little interest to us. It is also worth noting the various paths of driver projects, which were apparently built somewhere on developers' workstations:

c:\users\joseph-liu\desktop\sources\rtl819xp_src\common\objfre_win7_amd64\amd64\eeprom.obj
C:\ALLPROJECTS\SW_MODEM\pcm\amd64\pcm.lib
C:\Palau\palau_10.4.292.0\sw\host\drivers\becndis\inbox\WS10\sandbox\Debug\x64\eth_tx.obj
C:\Users\avarde\Desktop\inbox\working\Contents\Sources\wl\sys\amd64\bcmwl63a\bcmwl63a\x64\Windows8Debug\nicpci.obj

In other words, drivers for standards-compliant devices, such as USB XHCI, are part of the OS source tree, while all the vendor-specific drivers are built somewhere else.

  • Filter 3: remove binary files, since we are only interested in sources. We remove "pdb", "lib", "exp" and so on. "res" files are rolled back to "rc", the source of the resource file.
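
A possible Python sketch of this filter follows. The set of binary extensions contains only the three named above; the "and so on" part is left for the reader to extend. The function name is hypothetical.

```python
import ntpath

# Only the extensions named in the text; extend the set as needed.
BINARY_SUFFIXES = {".pdb", ".lib", ".exp"}


def drop_binaries(paths):
    """Remove binary artifacts and roll compiled resources back to .rc."""
    result = []
    for path in paths:
        stem, ext = ntpath.splitext(path)
        ext = ext.lower()
        if ext in BINARY_SUFFIXES:
            continue
        if ext == ".res":
            path = stem + ".rc"  # compiled resource -> resource script
        result.append(path)
    return result
```

The ntpath module is used instead of os.path so that Windows-style paths are split correctly on any platform.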


image


The output is looking better and better! However, at this stage almost no additional data can be extracted. On to the next raw data set.

Filtering data from executable files


Since there turned out to be few absolute paths in this raw data, we filter the strings by file extension instead.

After filtering, it becomes clear that although the resulting paths have no root, their directory structure is laid out relative to it. In other words, it is enough to prepend the root "d:\th" to every path.
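
A rough Python sketch of this step is shown below. The list of extensions is an assumption, since the exact set used is not given here, and lowercasing the paths is also an assumption made so that they can later be compared with the paths recovered from the symbols.

```python
import ntpath

# Assumed set of source and header extensions; the exact list is not known.
SOURCE_EXTENSIONS = {".c", ".cpp", ".cxx", ".h", ".hpp", ".hxx", ".asm"}
ROOT = "d:\\th"


def rooted_source_paths(strings):
    """Keep rootless source paths and prepend the assumed tree root."""
    paths = set()
    for s in strings:
        s = s.strip().lower()
        # Need at least one directory, and no drive letter of its own.
        if "\\" not in s or ":\\" in s:
            continue
        if ntpath.splitext(s)[1] in SOURCE_EXTENSIONS:
            paths.add(ROOT + "\\" + s.lstrip("\\"))
    return sorted(paths)
```

Real assertion strings may embed the path inside a longer message; handling that case would need an additional regular expression and is left out of this sketch.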

At this stage there are several problems with the data obtained from the symbols. The first problem: we cannot be sure that the build path from source file to object file was rolled back correctly.

  • Filter 4: check whether the paths rolled back from the object files match the paths to source files found in the executables.


And they do! That is, for most directories we can say that their structure was restored correctly. There are still questionable directories, of course, but I think the error is acceptable. Along the way, we can safely replace the "c??" extension with the extension of the source file that matched the path.
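
The cross-check and the extension replacement can be sketched together in Python. The function name and the set of compilable extensions are assumptions; both input lists are expected to be full paths with the "d:\th" root.

```python
import ntpath

# Extensions that can produce an object file (assumed list).
COMPILABLE = {".c", ".cpp", ".cxx", ".asm"}


def resolve_extensions(rolled_back, found_sources):
    """Replace "c??" with a real extension where a source path matches.

    Returns the updated list and the number of matches.
    """
    known = {}
    for path in found_sources:
        stem, ext = ntpath.splitext(path.lower())
        if ext in COMPILABLE:
            known[stem] = ext
    resolved, matched = [], 0
    for path in rolled_back:
        stem, ext = ntpath.splitext(path.lower())
        if ext == ".c??" and stem in known:
            resolved.append(stem + known[stem])
            matched += 1
        else:
            resolved.append(path)
    return resolved, matched
```

The match count relative to the number of rolled-back paths gives a rough measure of how well the rollback assumption holds.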

The second problem is header files. They are an important part of the sources, but no object file is produced from a header, so headers cannot be recovered from information about object files. We have to make do with what little we have, namely the headers found in the raw data from the binaries.

The third problem: we still do not know most of the source file extensions.

  • Filter 5: we assume that source files of the same type are stored within the same folder.


That is, if a folder already contains a file with the "cpp" extension, most likely all of its neighbors have the same extension.
image
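
The same-folder heuristic can be sketched in Python as follows. Picking the most common known extension when a folder contains several is an assumption of this sketch, as are the names used; headers are ignored when voting.

```python
import ntpath
from collections import Counter, defaultdict

# Extensions that count as evidence for a folder (assumed list).
COMPILED = {".c", ".cpp", ".cxx"}


def guess_by_neighbors(paths):
    """Give "c??" files the most common known extension of their folder."""
    by_folder = defaultdict(Counter)
    for path in paths:
        folder, name = ntpath.split(path)
        ext = ntpath.splitext(name)[1]
        if ext in COMPILED:
            by_folder[folder][ext] += 1
    result = []
    for path in paths:
        stem, ext = ntpath.splitext(path)
        votes = by_folder.get(ntpath.dirname(path))
        if ext == ".c??" and votes:
            ext = votes.most_common(1)[0][0]
        result.append(stem + ext)
    return result
```

Folders with no known neighbor keep the "c??" placeholder, which is why it still appears in the listings further down.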

And what about the assembler sources? For the finishing touch, you can turn to the Windows Research Kernel (kernel source code from the Windows XP era) and manually rename some of the files to assembler sources.

Studying the data


Telemetry


For a while I studied the issue of telemetry in Windows 10. Unfortunately, a quick analysis revealed nothing noteworthy. I found no keyloggers, no leaks of sensitive data, nothing to dig into. So the first keyword to search for among the source files was "telemetry". The result exceeded my expectations: 424 matches. The most interesting ones are listed below.

Telemetry in source files
d:\th\admin\enterprisemgmt\enterprisecsps\v2\certificatecore\certificatestoretelemetry.cpp
d:\th\base\appcompat\appraiser\heads\telemetry\telemetryappraiser.cpp
d:\th\base\appmodel\search\common\telemetry\telemetry.cpp
d:\th\base\diagnosis\siuf\libs\telemetry\siufdatacustom.c??
d:\th\base\diagnosis\pdui\de\wizard\wizardtelemetryprovider.c??
d:\th\base\enterpriseclientsync\settingsync\azure\lib\azuresettingsyncprovidertelemetry.cpp
d:\th\base\fs\exfat\telemetry.c
d:\th\base\fs\fastfat\telemetry.c
d:\th\base\fs\udfs\telemetry.c
d:\th\base\power\energy\platformtelemetry.c??
d:\th\base\power\energy\sleepstudytelemetry.c??
d:\th\base\stor\vds\diskpart\diskparttelemetry.c??
d:\th\base\stor\vds\diskraid\diskraidtelemetry.cpp
d:\th\base\win32\winnls\els\advancedservices\spelling\platformspecific\current\spellingtelemetry.c??
d:\th\drivers\input\hid\hidcore\hidclass\telemetry.h
d:\th\drivers\mobilepc\location\product\core\crowdsource\locationoriontelemetry.cpp
d:\th\drivers\mobilepc\sensors\common\helpers\sensorstelemetry.cpp
d:\th\drivers\wdm\bluetooth\user\bthtelemetry\bthtelemetry.c??
d:\th\drivers\wdm\bluetooth\user\bthtelemetry\fingerprintcollector.c??
d:\th\drivers\wdm\bluetooth\user\bthtelemetry\localradiocollector.c??
d:\th\drivers\wdm\usb\telemetry\registry.c??
d:\th\drivers\wdm\usb\telemetry\telemetry.c??
d:\th\ds\dns\server\server\dnsexe\dnstelemetry.c??
d:\th\ds\ext\live\identity\lib\tracing\lite\microsoftaccounttelemetry.c??
d:\th\ds\security\base\lsa\server\cfiles\telemetry.c
d:\th\ds\security\protocols\msv_sspi\dll\ntlmtelemetry.c??
d:\th\ds\security\protocols\ssl\telemetry\telemetry.c??
d:\th\ds\security\protocols\sspcommon\ssptelemetry.c??
d:\th\enduser\windowsupdate\client\installagent\common\commontelemetry.cpp
d:\th\enduser\winstore\licensemanager\lib\telemetry.cpp
d:\th\minio\ndis\sys\mp\ndistelemetry.c??
d:\th\minio\security\base\lsa\security\driver\telemetry.cxx
d:\th\minkernel\fs\cdfs\telemetry.c
d:\th\minkernel\fs\ntfs\mp\telemetry.c??
d:\th\minkernel\fs\refs\mp\telemetry.c??
d:\th\net\netio\iphlpsvc\service\teredo_telemetry.c
d:\th\net\peernetng\torino\telemetry\notelemetry\peerdistnotelemetry.c??
d:\th\net\rras\ip\nathlp\dhcp\telemetryutils.c??
d:\th\net\winrt\networking\src\sockets\socketstelemetry.h
d:\th\shell\cortana\cortanaui\src\telemetrymanager.cpp
d:\th\shell\explorer\traynotificationareatelemetry.h
d:\th\shell\explorerframe\dll\ribbontelemetry.c??
d:\th\shell\fileexplorer\product\fileexplorertelemetry.c??
d:\th\shell\osshell\control\scrnsave\default\screensavertelemetryc.c??
d:\th\windows\moderncore\inputv2\inputprocessors\devices\keyboard\lib\keyboardprocessortelemetry.c??
d:\th\windows\published\main\touchtelemetry.h
d:\th\xbox\onecore\connectedstorage\service\lib\connectedstoragetelemetryevents.cpp
d:\th\xbox\shellui\common\xbox.shell.data\telemetryutil.c??

Comments are probably unnecessary, since nothing is known for certain anyway. Still, this data can serve as a good starting point for more detailed research.

Kernel Patch Protection


The next find is everyone's favorite, PatchGuard. True, the OS source tree contains only a single file of an unclear, most likely binary, type.
d:\th\minkernel\ntos\ke\patchgd.wmp
Searching the unfiltered data for matches, I discovered that Kernel Patch Protection is in fact a separate project.
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen00.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen01.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen02.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen03.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen04.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen05.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen06.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen07.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen08.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp \ xcptgen09.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp_noltcg \ patchgd.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp_noltcg \ patchgda.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp_noltcg \ patchgda2.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp_noltcg \ patchgda3.c ??
d: \ bnb_kpg \ minkernel \ oem \ src \ kernel \ patchgd \ mp_noltcg \ patchgda4.c ??

Suspicious files


Unable to think of anything else that interested me, I started searching for anything and everything, and I was not disappointed!

d:\th\windows\core\ntgdi\fondrv\otfd\atmdrvr\umlib\backdoor.c??
In the font driver?

d:\th\inetcore\edgehtml\src\site\webaudio\opensource\wtf\wtfvector.h
WTF here is just the Web Template Framework, an unfortunate abbreviation. But wait a minute...

Open source?


d:\th\printscan\print\drivers\renderfilters\msxpsfilters\util\opensource\libjpeg\jaricom.c??
d:\th\printscan\print\drivers\renderfilters\msxpsfilters\util\opensource\libpng\png.c??
d:\th\printscan\print\drivers\renderfilters\msxpsfilters\util\opensource\libtiff\tif_compress.c??
d:\th\printscan\print\drivers\renderfilters\msxpsfilters\util\opensource\zlib\deflate.c??
I think this find is a good place to wrap up.

An archive with a text file listing the sources is available here. Share your findings in the comments!

Source: https://habr.com/ru/post/279215/

