Writing a viewer for the MS Exchange mail database (part 1)

In one big, very big country there lived small, very small people. And everything was fine until a deep, deep hole appeared right in the center of this big, or rather very big, country. Well, it must be said that the hole did not appear alone and, of course, not all at once, but nobody remembers that anymore and nobody cares.

The main thing was the hole, and it was big, even very big, or rather infinite; more precisely, nobody knew. But the government of the very big country knew that something had to be done about the pit, and it was decided to study it and fill it in. For this purpose the very best were summoned: the most dexterous and courageous, the clever and the cleverest men and women this beautiful state had.

And the work began in earnest. They studied and filled, filled and studied, studied, filled, filled, studied, and this was repeated not a few but many times.

It so happened that our glorious little heroes left, or rather had to leave, but the pit remained. Then new heroes came, just as small but just as glorious, and they began to do the same thing.
In general, this went on for as long as it went on, until only one, the very best of them, remained. And one day the pit disappeared, truly, and appeared somewhere in a completely different place, where they began to study it and fill it in all over again...

All coincidences are accidental, everything is fictional, and nothing like this has ever existed or could exist! True, true, well, or almost true.

One such “pit” is discussed below. But because of privacy policies, we will touch only a small piece of it: the part that can be reached using only open sources and the documented API.

The material is almost exclusive: the information was gathered bit by bit.

Have you ever wondered where MS Exchange Server stores all your mail, or how it works with it at the lowest level? I am going to write a little about that here.

Warning: do not try to dig deep into this topic; your whole life will not be enough. You have been warned.

Introduction

MS Exchange Server (hereinafter simply Exchange) is one of the flagships of the Microsoft product line. You can read about its main features on Wikipedia or on the official website. In short, it is a kind of all-in-one "combine" for working with mail, calendars, and other user data, with extensive integration capabilities with various MS products (SharePoint, TFS, etc.).

In this article, however, we are not interested in what Exchange provides to the end user. We want to know where it takes this data from and which API it uses to do so. We will try to read a mailbox database on an Exchange 2010 server with the Mailbox role (Mailbox Server) on our own.

Exchange has several entry points (CAS servers) through which users can access their data. It also supports several protocols they can use for this, for example, OWA, RPC (Outlook), and POP3/IMAP4.

Regardless of how access is obtained, Exchange forwards all requests to the Mailbox role (before Exchange 2007, this was the only role). Among other things, this role holds the user mailbox databases that interest us today. Physically, these databases are stored on disk in *.edb files. You can find them in the Mailbox\<database name> folder under the directory where Exchange was installed. Transaction logs and other files related to the database life cycle are also placed there, but we will not need them. The most important thing for us is the *.edb file.

If you dig a little, you will find that Exchange uses the Extensible Storage Engine (ESE) to access the contents of its databases. Dig further, and it becomes clear that ESE is implemented in the ese.dll library (or esent.dll). This is the core of all operations Exchange performs. ESE provides an extensive set of tools for working with a database. Descriptions of its functions, constants, structures, and everything else you might need can be found in the ESE documentation on MSDN. Unfortunately, this documentation has not been updated for a long time, so it lacks a number of functions that appeared in Exchange 2010, but we will not need them for this topic. You can find ese.dll in the Bin folder inside the main Exchange directory.

After reading the ESE description, it becomes clear that the data is stored in tabular form, i.e. as a set of tables with columns and rows. Table cells can be of different types. On top of that come all the usual database features: indexes, search, and so on.

So, we know that Exchange stores its databases on the Mailbox role as files with the .edb extension, and that it accesses them through ESE (ese.dll). That is enough for us to start coding.

We will list the tables in the database, as well as all their columns and rows. Of course, we will not learn what they mean; only MS knows that, plus a few other people. These columns should contain almost all the information related to the mailboxes, from the user's folder names to the messages themselves. Figuring out what is stored where, however, is a reverse-engineering task and will not be covered here.

Programming

Preparation

First we need:

Visual Studio 2008-2010
MS Exchange Server 2010 (any other version will do, but this article covers 2010)
Knowledge of C/C++

In Exchange, create the database we will try to read. You can do this with the Exchange Management Console (EMC). I will not describe the procedure, because there is plenty of information on this topic on the Internet. Create a single user in this database (through the same EMC) so that it contains some content. Then log in to this user's mailbox, for example through OWA, to check that everything has been done correctly. After that, go to the Mailbox directory, find the folder named after our database, and locate the EDB file in it. Before copying the database, dismount it through EMC. That's it, we have a database for experiments. Copy it somewhere, for example into the directory of the future project.

From the Bin folder, copy ese.dll, which we will use to work with the database.

In Visual Studio, create a C++ console project. There is an important nuance here. Exchange 2010 (unlike all previous versions) exists only in a 64-bit version, so we have to create a project with x64 support; otherwise we simply cannot load ese.dll into our address space. This also means you need a 64-bit OS to test the application. You can of course test it on the Exchange server itself, but I use my Windows 7 workstation for this purpose. We will also use the Unicode version of the API, so it is better to make Unicode the project's default character set.
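A 32-bit process cannot load the 64-bit ese.dll, so it may be worth making a wrong platform setting fail at build time rather than at the LoadLibrary call. Below is a minimal sketch, assuming a C++11 compiler. PointerBits is a hypothetical helper and is not part of the author's code:

```cpp
#include <cstddef>

// Compile-time guard: ese.dll from Exchange 2010 is 64-bit only, so a 32-bit
// build of the viewer could never load it. With this line, a wrong
// platform setting fails at build time instead of at LoadLibrary time.
static_assert(sizeof(void*) == 8,
              "Build the viewer as x64: Exchange 2010 ese.dll is 64-bit only");

// Hypothetical helper: the pointer width of the current build, handy for logging.
constexpr std::size_t PointerBits()
{
    return sizeof(void*) * 8;
}
```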

So, in the newly created project, make sure that x64 support is present and Unicode is enabled (General > Character Set > Use Unicode Character Set). Now include the main ESE header:

#include <esent.h>

This header ships with the SDK that comes with Visual Studio starting with VS 2008.
In stdafx.h, before esent.h is included, we add two defines. One sets the JET (ESE) version, and the other states that we want the Unicode version of the API:

#define JET_UNICODE
#define JET_VERSION 0x0600

Now we need to decide what we want to get out of the database. ESE is a database with tables, columns, and rows, and that is exactly what we will try to extract from it. For this, we prepare the following structures:

#include <string>
#include <vector>

// One column: its name and the values read from its rows
typedef struct tagDBColumnsInfo
{
    std::wstring sColumnName;
    std::vector<std::wstring> sColumnValues;
} SDBColumnInfo;

// One table: its name and its columns
typedef struct tagDBTableInfo
{
    std::wstring sTableName;
    std::vector<SDBColumnInfo> sColumnInfo;
} SDBTableInfo;

// The whole database: its path and its tables
typedef struct tagDBTablesInfo
{
    std::wstring sDBName;
    std::vector<SDBTableInfo> sTablesInfo;
} SDBTablesInfo;
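The three structures nest as follows: the database holds tables, each table holds columns, and each column holds its values. Row i of a table is therefore the i-th element of every column's sColumnValues. The small sketch below uses made-up sample data; the table and column names and the helpers CountCells and MakeSample are hypothetical. The structures are repeated as plain structs so the block compiles on its own:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same shapes as the structures above, repeated so the example compiles alone.
struct SDBColumnInfo { std::wstring sColumnName; std::vector<std::wstring> sColumnValues; };
struct SDBTableInfo  { std::wstring sTableName;  std::vector<SDBColumnInfo> sColumnInfo; };
struct SDBTablesInfo { std::wstring sDBName;     std::vector<SDBTableInfo> sTablesInfo; };

// Hypothetical helper: total number of cells collected for the whole database.
std::size_t CountCells(const SDBTablesInfo& db)
{
    std::size_t n = 0;
    for (const SDBTableInfo& t : db.sTablesInfo)
        for (const SDBColumnInfo& c : t.sColumnInfo)
            n += c.sColumnValues.size();
    return n;
}

// Hypothetical sample: one table with two columns and two rows.
SDBTablesInfo MakeSample()
{
    SDBColumnInfo name{L"Name", {L"first", L"second"}};
    SDBColumnInfo id{L"Id", {L"1", L"2"}};
    SDBTableInfo table{L"SampleTable", {name, id}};
    return SDBTablesInfo{L"demo.edb", {table}};
}
```

With this column-major layout, the column list can be filled in one pass and the values appended later, column by column.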

First of all, we need to load the DLL itself. As always, this is done via ::LoadLibrary(...).
We will load the functions from ese.dll dynamically, and we need the following ones:

JetInit
JetCreateInstanceW
JetBeginSessionW
JetAttachDatabaseW
JetOpenDatabaseW
JetCloseDatabase
JetDetachDatabaseW
JetTerm
JetSetSystemParameterW
JetOpenTableW
JetGetColumnInfoW
JetRetrieveColumns
JetMove
JetGetTableColumnInfoW
JetCloseTable
JetGetSystemParameter
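Before using any of the pointers, it helps to verify that every export in the list above was actually found. On Windows, each lookup would be a ::GetProcAddress call on the module handle returned by ::LoadLibrary. In the sketch below, the lookup is passed in as a callable so the check can run without ese.dll. FindMissingExports and RequiredEseExports are hypothetical helpers, and the names are copied from the list above:

```cpp
#include <functional>
#include <string>
#include <vector>

// Export names from the list above (W suffix = Unicode variant).
const std::vector<std::string>& RequiredEseExports()
{
    static const std::vector<std::string> names = {
        "JetInit", "JetCreateInstanceW", "JetBeginSessionW", "JetAttachDatabaseW",
        "JetOpenDatabaseW", "JetCloseDatabase", "JetDetachDatabaseW", "JetTerm",
        "JetSetSystemParameterW", "JetOpenTableW", "JetGetColumnInfoW",
        "JetRetrieveColumns", "JetMove", "JetGetTableColumnInfoW",
        "JetCloseTable", "JetGetSystemParameter"};
    return names;
}

// Returns the names the resolver could not find. On Windows the resolver
// would wrap ::GetProcAddress(hEse, name); here it is any callable.
std::vector<std::string> FindMissingExports(
    const std::function<const void*(const char*)>& resolve)
{
    std::vector<std::string> missing;
    for (const std::string& name : RequiredEseExports())
        if (resolve(name.c_str()) == nullptr)
            missing.push_back(name);
    return missing;
}
```

Reporting every missing name at once makes a version mismatch between ese.dll and the expected export list easier to diagnose than failing on the first null pointer.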

Opening the base

After successfully loading the functions we need, we start reading the database itself. According to MSDN, you must specify the database page size via the JET_paramDatabasePageSize parameter (esent.h). This is where the difficulty appears. The value is not easy to obtain when all you have is the EDB file, yet it must be specified exactly, otherwise the database will not open. It can be found out with eseutil (bundled with Exchange), but I took a slightly different route. I found that this value is constant for the same Exchange version and is always a multiple of 4096. Experimentally, it turned out to be 32768 for Exchange 2010.

Okay, first of all we set the page size:

JET_ERR jRes = _JetSetSystemParameter(NULL, NULL, JET_paramDatabasePageSize, 32768, NULL);

JET_ERR is just a long containing an error code. You can turn this code into a text description with the JetGetSystemParameter function (similar to ::FormatMessage(...)):

JetGetSystemParameter(m_instance, m_sesid, JET_paramErrorToString,
    reinterpret_cast<JET_API_PTR*>(&jeterror), cBuff, MAX_BUFFER_SIZE);

To make error handling more convenient, I use the following macro (m_cLog is my internal logging class):

#define WRITE_TO_LOG_AND_RETURN_IF_ERROR(jeterror) \
    if (jeterror) { \
        char cBuff[MAX_BUFFER_SIZE] = {0}; \
        if (m_instance) _JetGetSystemParameter(m_instance, m_sesid, \
            JET_paramErrorToString, reinterpret_cast<JET_API_PTR*>(&jeterror), cBuff, MAX_BUFFER_SIZE); \
        m_cLog.write(m_sEDBPath, cBuff, jeterror, __FILE__, __LINE__); \
        return jeterror; }
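The macro treats any non-zero code as a failure. In ESE, however, negative codes are errors and positive codes are warnings; JetRetrieveColumns, for example, can report JET_wrnBufferTruncated. Below is a sketch of a stricter check, assuming you want warnings logged but not treated as fatal. JetErrCode, IsJetError, IsJetWarning, and DescribeJetErr are hypothetical helpers. The numeric codes are the esent.h values for the names shown, and the short table is only a fallback for when JET_paramErrorToString cannot be queried:

```cpp
#include <string>

typedef long JetErrCode; // stand-in for JET_ERR (a long), so this compiles without esent.h

// ESE convention: 0 is success, negative values are errors,
// positive values are warnings.
inline bool IsJetError(JetErrCode err)   { return err < 0; }
inline bool IsJetWarning(JetErrCode err) { return err > 0; }

// Fallback names for a few codes relevant to this article.
std::string DescribeJetErr(JetErrCode err)
{
    switch (err)
    {
    case 0:     return "JET_errSuccess";
    case -550:  return "JET_errDatabaseDirtyShutdown";
    case -1603: return "JET_errNoCurrentRecord";
    case -1811: return "JET_errFileNotFound";
    case 1004:  return "JET_wrnColumnNull";
    case 1006:  return "JET_wrnBufferTruncated";
    default:    return (err < 0 ? "JET error " : "JET warning ") + std::to_string(err);
    }
}
```

JET_errDatabaseDirtyShutdown is the kind of result you might expect if the EDB file was copied without being dismounted first.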

Now we need to disable callbacks, including the Exchange-specific ones, because we know nothing about them:

jRes = _JetSetSystemParameter(NULL, NULL, JET_paramDisableCallbacks, true, NULL);

Next, create a new instance (JET_INSTANCE m_instance) to work with the database:

jRes = _JetCreateInstance(&m_instance, NULL);

Initialize the newly created instance so we can start working with the database:

jRes = _JetInit(&m_instance);

Start a new session (JET_SESID m_sesid):

jRes = _JetBeginSession(m_instance, &m_sesid, NULL, NULL);

Attach our EDB file:

jRes = _JetAttachDatabase(m_sesid, L"demo.edb", JET_bitDbReadOnly);

And open it:

jRes = _JetOpenDatabase(m_sesid, L"demo.edb", NULL, &m_dbid, JET_bitDbReadOnly);

In short, if all these functions returned JET_errSuccess, the database is open and we can start reading its contents.
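Each of these steps has to be undone in reverse order when we are done, or when a later step fails. Using the functions from the list, JetCloseDatabase undoes the open, JetDetachDatabase undoes the attach, and JetTerm shuts down the instance. The sketch below is portable and uses lambdas as stand-ins for the _Jet* pointers. OpenStep, RunOpenSequence, and RunCleanup are hypothetical names, and the exact pairing of open and close calls is my assumption rather than the author's code:

```cpp
#include <functional>
#include <string>
#include <vector>

typedef long JetErrCode; // stand-in for JET_ERR

// One step of the opening sequence and the call that undoes it.
struct OpenStep
{
    std::string name;
    std::function<JetErrCode()> open;
    std::function<void()> close; // may be empty if there is nothing to undo
};

// Undoes everything recorded so far, last step first.
void RunCleanup(std::vector<std::function<void()>>& cleanup)
{
    for (auto it = cleanup.rbegin(); it != cleanup.rend(); ++it)
        (*it)();
    cleanup.clear();
}

// Runs the steps in order. If one fails (negative code), the steps that
// already succeeded are undone in reverse order and the error is returned.
// On success the caller keeps the cleanup stack and runs it when finished.
JetErrCode RunOpenSequence(const std::vector<OpenStep>& steps,
                           std::vector<std::function<void()>>& cleanup)
{
    for (const OpenStep& s : steps)
    {
        JetErrCode err = s.open();
        if (err < 0)
        {
            RunCleanup(cleanup);
            return err;
        }
        if (s.close)
            cleanup.push_back(s.close);
    }
    return 0;
}
```

In the real reader, the steps would be instance creation and JetInit, session, attach, and open, each paired with its undo call.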

Next comes some code. I am including it because you will hardly find any code on this subject elsewhere.

Listing the tables

To list them, we write the following function:

JET_ERR CJetDBReaderCore::EnumRootTables(SDBTablesInfo& sDBTablesInfo)
{
    sDBTablesInfo.sDBName = m_sEDBPath;
    JET_ERR jRes = OpenTable(ROOT_TABLE);
    if (jRes == JET_errSuccess)
    {
        JET_COLUMNBASE sNameInfo,
                       sTypeInfo;
        if (!ReadFromTable(ROOT_TABLE, NAME_COLUMN, sNameInfo) &&
            !ReadFromTable(ROOT_TABLE, TYPE_COLUMN, sTypeInfo))
        {
            CHAR szName[MAX_BUFFER_SIZE];
            WORD wType = 0;
            JET_RETRIEVECOLUMN sJetRC[2] = {}; // zero unused fields such as ibLongValue

            // "Name": ASCII text; keep one byte free for the terminating zero
            sJetRC[0].columnid = sNameInfo.columnid;
            sJetRC[0].cbData = sizeof(szName) - 1;
            sJetRC[0].itagSequence = 1;
            sJetRC[0].pvData = szName;

            // "Type": 16-bit object type
            sJetRC[1].columnid = sTypeInfo.columnid;
            sJetRC[1].cbData = sizeof(wType);
            sJetRC[1].itagSequence = 1;
            sJetRC[1].pvData = &wType;

            do
            {
                jRes = GetColumns(ROOT_TABLE, sJetRC, 2);
                if (jRes != JET_errSuccess) return jRes;
                if (wType == 1) // 1: the object is a table
                {
                    // cbActual may exceed cbData if the name was truncated
                    ULONG cbName = sJetRC[0].cbActual < sJetRC[0].cbData
                                 ? sJetRC[0].cbActual : sJetRC[0].cbData;
                    szName[cbName] = 0;

                    SDBTableInfo sTableInfo;
                    std::string tmp(szName);
                    sTableInfo.sTableName.assign(tmp.begin(), tmp.end());

                    sDBTablesInfo.sTablesInfo.push_back(sTableInfo);
                }
            } while (!TableEnd(ROOT_TABLE));
        }

        jRes = CloseTable(ROOT_TABLE);
    }

    return jRes;
}

Where:

ROOT_TABLE - “MSysObjects”. Let's call it the root table, because it contains a list of all the other tables in the database.
NAME_COLUMN - “Name”, the column containing the object names, including the names of all the tables.
TYPE_COLUMN - “Type”, the column containing the object type. The code above keeps only rows with type 1, i.e. tables.
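Stripped of the ESE calls, the body of the loop in EnumRootTables is a filter. MSysObjects also holds rows for other kinds of objects (columns, indexes, and so on), and only rows with Type 1 are kept. Their ASCII names are widened to std::wstring character by character. The portable sketch below uses hypothetical rows and names (SysObjectRow, ExtractTableNames) rather than real EDB contents:

```cpp
#include <string>
#include <vector>

// One row of MSysObjects as read by EnumRootTables:
// the ASCII "Name" value and the 16-bit "Type" value.
struct SysObjectRow
{
    std::string name;
    unsigned short type;
};

// Type value that EnumRootTables treats as "this object is a table".
const unsigned short kSysObjTable = 1;

// Same filtering as in EnumRootTables: keep rows whose Type is 1 and
// widen the ASCII name to std::wstring character by character.
std::vector<std::wstring> ExtractTableNames(const std::vector<SysObjectRow>& rows)
{
    std::vector<std::wstring> tables;
    for (const SysObjectRow& row : rows)
    {
        if (row.type != kSysObjTable)
            continue;
        tables.emplace_back(row.name.begin(), row.name.end());
    }
    return tables;
}
```

Widening char by char is only correct for ASCII names, which is what the original code assumes as well.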

As the code shows, we first open the root table. This is done with the JetOpenTable function:

JET_ERR CJetDBReaderCore::OpenTable(std::wstring sTableName)
{
    // Each table is opened once; its id is cached by name in m_tables
    std::map<std::wstring, JET_TABLEID>::const_iterator iter = m_tables.find(sTableName);
    if (iter == m_tables.end())
    {
        JET_TABLEID tableid(0);
        JET_ERR jRes = _JetOpenTable(m_sesid, m_dbid, sTableName.c_str(), NULL,
            0, JET_bitTableReadOnly, &tableid);
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)

        m_tables[sTableName] = tableid;
    }

    return JET_errSuccess;
}

Next, inside ReadFromTable, we get information about the column, because we need its id to retrieve its contents:

JET_ERR CJetDBReaderCore::ReadFromTable(
    std::wstring sTableName,
    std::wstring sColumnName,
    JET_COLUMNBASE& sColumnBase)
{
    std::map<std::wstring, JET_TABLEID>::const_iterator iter = m_tables.find(sTableName);
    if (iter != m_tables.end())
    {
        // Fills sColumnBase, including the columnid and cbMax we need later
        JET_ERR jRes = _JetGetColumnInfo(m_sesid, m_dbid, sTableName.c_str(),
            sColumnName.c_str(), &sColumnBase, sizeof(JET_COLUMNBASE), JET_ColInfoBase);
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)
    }

    return JET_errSuccess;
}

Once we have the column id, we fill in the JET_RETRIEVECOLUMN structure and pass it to JetRetrieveColumns inside GetColumns to get the table name:

JET_ERR CJetDBReaderCore::GetColumns(
    std::wstring sTableName,
    JET_RETRIEVECOLUMN* sJetRC,
    int nCount)
{
    std::map<std::wstring, JET_TABLEID>::const_iterator iter = m_tables.find(sTableName);
    if (iter != m_tables.end())
    {
        // Reads nCount columns of the current record in one call
        JET_ERR jRes = _JetRetrieveColumns(m_sesid, iter->second, sJetRC, nCount);
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)
    }

    return JET_errSuccess;
}
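Two details matter when reading cbActual after JetRetrieveColumns. It reports the full size of the value even when the value did not fit into cbData, and for Unicode text it is a byte count, which is why EnumColumns below divides it by two. The sketch below decodes such a buffer safely. It uses char16_t instead of wchar_t so it behaves the same where wchar_t is 32-bit, and it assumes little-endian byte order as on x64 Windows. DecodeRetrievedUtf16 is a hypothetical helper:

```cpp
#include <cstddef>
#include <string>

// Turns the bytes returned for a Unicode text column into a UTF-16 string.
// cbActual is the full length of the value in bytes; if it is larger than
// the buffer (the value was truncated), only the bytes that fit are used.
std::u16string DecodeRetrievedUtf16(const unsigned char* buf,
                                    std::size_t cbBuf,
                                    unsigned long cbActual)
{
    std::size_t cb = cbActual < cbBuf ? cbActual : cbBuf;
    std::u16string out;
    out.reserve(cb / 2);
    for (std::size_t i = 0; i + 1 < cb; i += 2)
    {
        // x64 is little-endian: low byte first, then high byte
        out.push_back(static_cast<char16_t>(buf[i] | (buf[i + 1] << 8)));
    }
    return out;
}
```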

Now that we have the list of tables, we move on to their columns. For each table, we get its list of columns and store this information in our structures as we go.

Enumerating columns

We write the following function:

JET_ERR CJetDBReaderCore::EnumColumns(
    SDBTableInfo& sTableInfo,
    std::list<SColumnInfo>& sColumnsInfo)
{
    if (!OpenTable(sTableInfo.sTableName))
    {
        JET_COLUMNLIST sColumnInfo;
        // TRUE: replace the table's cursor with the column-list temp table,
        // so MoveToFirst/GetColumns below walk one row per column
        GetTableColumnInfo(sTableInfo.sTableName, &sColumnInfo, TRUE);
        MoveToFirst(sTableInfo.sTableName);

        wchar_t szNameBuff[MAX_BUFFER_SIZE / sizeof(wchar_t)];
        do
        {
            SColumnInfo ci;
            JET_RETRIEVECOLUMN sJetRC[4] = {}; // zero unused fields such as ibLongValue

            // Column name (Unicode, since we use the W API)
            sJetRC[0].columnid = sColumnInfo.columnidcolumnname;
            sJetRC[0].cbData = sizeof(szNameBuff);
            sJetRC[0].itagSequence = 1;
            sJetRC[0].pvData = szNameBuff;

            // Column id
            sJetRC[1].columnid = sColumnInfo.columnidcolumnid;
            sJetRC[1].cbData = sizeof(DWORD);
            sJetRC[1].itagSequence = 1;
            sJetRC[1].pvData = &ci.dwId;

            // Column type (a JET_COLTYP value)
            sJetRC[2].columnid = sColumnInfo.columnidcoltyp;
            sJetRC[2].cbData = sizeof(DWORD);
            sJetRC[2].itagSequence = 1;
            sJetRC[2].pvData = &ci.dwType;

            // Maximum column size
            sJetRC[3].columnid = sColumnInfo.columnidcbMax;
            sJetRC[3].cbData = sizeof(DWORD);
            sJetRC[3].itagSequence = 1;
            sJetRC[3].pvData = &ci.dwMaxSize;

            GetColumns(sTableInfo.sTableName, sJetRC, 4);

            // cbActual is in bytes and may exceed the buffer if the name was truncated
            size_t cbName = sJetRC[0].cbActual < sizeof(szNameBuff)
                          ? sJetRC[0].cbActual : sizeof(szNameBuff);
            ci.sName.assign(szNameBuff, cbName / sizeof(wchar_t));

            SDBColumnInfo sDBColumnInfo;
            sDBColumnInfo.sColumnName = ci.sName;

            sColumnsInfo.push_back(ci);
            sTableInfo.sColumnInfo.push_back(sDBColumnInfo);
        }
        while (!TableEnd(sTableInfo.sTableName));

        CloseTable(sTableInfo.sTableName);
    }

    return JET_errSuccess;
}

Here we again open a table, but this time not the root one: we open the one we found in the previous step.
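The dwType that EnumColumns retrieves for each column is a JET_COLTYP value. To print it in a readable form, a lookup like the one below can help. ColtypName is a hypothetical helper, and the numbers are the JET_coltyp* constants from esent.h. Value 13, an obsolete type, is deliberately left to the default branch:

```cpp
#include <string>

// Names of the JET_COLTYP values from esent.h (JET_VERSION >= 0x0600),
// for printing the type EnumColumns collects for each column.
std::string ColtypName(unsigned long coltyp)
{
    switch (coltyp)
    {
    case 0:  return "Nil";
    case 1:  return "Bit";
    case 2:  return "UnsignedByte";
    case 3:  return "Short";
    case 4:  return "Long";
    case 5:  return "Currency";
    case 6:  return "IEEESingle";
    case 7:  return "IEEEDouble";
    case 8:  return "DateTime";
    case 9:  return "Binary";
    case 10: return "Text";
    case 11: return "LongBinary";
    case 12: return "LongText";
    case 14: return "UnsignedLong";
    case 15: return "LongLong";
    case 16: return "GUID";
    case 17: return "UnsignedShort";
    default: return "Unknown(" + std::to_string(coltyp) + ")";
    }
}
```

Knowing the type is a prerequisite for the next step, reading row values, because Text and LongText columns must be decoded differently from Binary or DateTime ones.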

Next, we need information about all the columns. JetGetTableColumnInfo with JET_ColInfoList returns a temporary table with one row per column. We position the cursor on its first row and walk to the last one, row by row:

JET_ERR CJetDBReaderCore::MoveToFirst(std::wstring sTableName)
{
    std::map<std::wstring, JET_TABLEID>::const_iterator iter = m_tables.find(sTableName);
    if (iter != m_tables.end()) // only if the table is already open
    {
        JET_ERR jRes = _JetMove(m_sesid, iter->second, JET_MoveFirst, 0);
        BOOL bIsEmpty = (jRes == JET_errNoCurrentRecord);
        if (bIsEmpty) return jRes; // an empty table is not an error, so do not log it
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes);
    }

    return NO_ERROR;
}

JET_ERR CJetDBReaderCore::GetTableColumnInfo(
    std::wstring sTableName,
    JET_COLUMNLIST* pCl,
    BOOL bReplaceOld)
{
    JET_ERR jRes = JET_errSuccess;
    std::map<std::wstring, JET_TABLEID>::iterator iter = m_tables.find(sTableName);
    if (iter != m_tables.end())
    {
        // Returns a temporary table (pCl->tableid) with one row per column
        jRes = _JetGetTableColumnInfo(m_sesid, iter->second, NULL, pCl,
            sizeof(JET_COLUMNLIST), JET_ColInfoList);
        WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)

        if (bReplaceOld) // keep the temp table open in place of the original table
        {
            jRes = CloseTable(sTableName);
            m_tables[sTableName] = pCl->tableid;
        }
        else // the temp table is not needed, close it
        {
            jRes = _JetCloseTable(m_sesid, pCl->tableid);
            WRITE_TO_LOG_AND_RETURN_IF_ERROR_2(jRes)
        }
    }

    return jRes;
}
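MoveToFirst, together with the TableEnd helper (whose body is not shown), forms the usual ESE cursor walk: position on the first record, then step forward until the engine reports that there is no current record. My assumption, not the author's code, is that TableEnd calls JetMove with JET_MoveNext and reports the end on JET_errNoCurrentRecord. The portable sketch below uses a mock cursor so it runs without ese.dll. MockCursor and ReadAllRows are hypothetical. Unlike the do/while loops above, this loop form also handles an empty table without reading a nonexistent row:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

typedef long JetErrCode;                      // stand-in for JET_ERR
const JetErrCode kErrNoCurrentRecord = -1603; // value of JET_errNoCurrentRecord

// Mock of an ESE cursor over a list of rows: MoveFirst/MoveNext mimic the
// return codes of JetMove(JET_MoveFirst) and JetMove(JET_MoveNext).
class MockCursor
{
public:
    explicit MockCursor(std::vector<int> rows) : m_rows(std::move(rows)) {}
    JetErrCode MoveFirst() { m_pos = 0; return m_rows.empty() ? kErrNoCurrentRecord : 0; }
    JetErrCode MoveNext()  { ++m_pos; return m_pos < m_rows.size() ? 0 : kErrNoCurrentRecord; }
    int Current() const    { return m_rows[m_pos]; }
private:
    std::vector<int> m_rows;
    std::size_t m_pos = 0;
};

// The walk: move to the first record, then read and step until the
// cursor reports that there is no current record.
std::vector<int> ReadAllRows(MockCursor& cursor)
{
    std::vector<int> out;
    for (JetErrCode err = cursor.MoveFirst(); err == 0; err = cursor.MoveNext())
        out.push_back(cursor.Current());
    return out;
}
```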

Sorry, the first version of the post was cut off, and I did not have time to react. Apparently there is a limit on article size, so the article will be split into two parts.
To be continued...

In the next part, we will finish reading the database, draw conclusions, and look at an example of the data that can be “ripped out” of it.

P.S. This code is an adapted and shortened version for the post, so it has some rough edges, or rather stubs where functionality was cut. Please ignore them. This is not production code, but I wanted to show working examples. The code does work, and it is written so that it can be posted online without eating up all the space on the page. Thank you for understanding.

P.P.S. I understand that, given how specialized this information is, it is unlikely to be useful to a wide audience. But if it helps even one person, I will be glad, and the time spent on this post will have paid off.