
An Amateur and Reverse Engineering. Part 2: The Skeleton



Last time I described the beginning of my relationship with reverse engineering. Some time has passed, and here is, to some extent, the result of my research.

I am trying to restore the source code from a .dll library and its .pdb database. Using IDA certainly brought some results, but not satisfactory ones. Maybe I'm just not diligent enough. So I started from the other end: restoring the skeleton of the library project. Since I have the .pdb database, I can do that. In theory. Only in theory, because what gets written to the database comes from the preprocessed files, not from the original sources. So there is still work to do.

Filling


I will begin with some theory. Structurally, a .pdb database is a set of symbols (any variable, structure, function, enumeration or type is a symbol) that are related to each other. Symbols are divided by type, and depending on the type they have different properties. By reading these properties you can get descriptions of structures, functions, overrides, enumerations and constants, including the relationships between them, the names of the files and .obj modules in which functions are located, and much more. Symbols are accessed through the DIA SDK (Debug Interface Access), which is well documented and not very difficult to deal with. The only "problem" is that out of the box DIA is available only for C/C++. If you want to use it from .NET, you either have to convert the interface into a .NET .dll yourself, which is another story, or simply find a ready-made module. Personally, I chose the second option and found Dia2Lib.dll, but some functions in it were translated incorrectly (a parameter that should be an array was declared as a plain variable).

Perhaps there is a ready-made solution for generating code from a .pdb database, but I did not find one. So now I am writing my own. I write in C#: less hassle with memory, although at the cost of convenience when working with files. To begin with, I needed classes to describe the symbols. The standard ones (those from Dia2Lib) are a bit awkward. More precisely, if you want to turn the data around freely in every direction, they are simply not up to it.
Classes for processing symbol data
class Member
{
    public string name;
    public int offcet;      // offset within the structure
    public ulong length;    // size in bytes
    public string type;     // type name
    public string access;   // access specifier
    public uint id;         // symbol id in the .pdb database
}

class BaseClass
{
    public string type;
    public int offcet;      // offset of the base class within the structure
    public ulong length;
    public uint id;
}

class Function
{
    public string name;
    public string type;
    public string access;
    public string filename; // source file in which the function is located
    public uint id;
}

class Typedef
{
    public string name;
    public string type;
    public uint id;
}

class Enum
{
    public string name;
    public uint id;
    public SubEnum[] values;
}

class SubEnum
{
    public string name;
    public dynamic value;
    public uint id;
}

class VTable
{
    public ulong count;     // number of entries
    public string type;
    public uint id;
}

class SubStructure
{
    public string name;
    public uint id;
}

class Structure
{
    public string name;
    public uint id;
    public Member[] members;
    public BaseClass[] baseclass;
    public Function[] functions;
    public Typedef[] typedefs;
    public Enum[] enums;
    public VTable[] vtables;
    public SubStructure[] substructures;
}

A simple enumeration of the symbols is enough to fill arrays of these structures and get the basis for the skeleton. Then the problems begin. The first problem, already mentioned, is that all the structures from the preprocessed files are recorded in the database. Like this one:
First example of a not very necessary structure
struct /*id:2*/ _iobuf
{
    /*off 0x00000000 size:0004 id:5*/ public: char * _ptr;
    /*off 0x00000004 size:0004 id:8*/ public: signed int _cnt;
    /*off 0x00000008 size:0004 id:5*/ public: char * _base;
    /*off 0x00000012 size:0004 id:8*/ public: signed int _flag;
    /*off 0x00000016 size:0004 id:8*/ public: signed int _file;
    /*off 0x00000020 size:0004 id:8*/ public: signed int _charbuf;
    /*off 0x00000024 size:0004 id:8*/ public: signed int _bufsiz;
    /*off 0x00000028 size:0004 id:5*/ public: char * _tmpfname;
};

Hardly anyone needs a structure from the standard library in the restored sources. And while such structures can still be tracked down somehow, there is a worse example.
Second example of a not very necessary structure
 struct /*id:24371*/ std::allocator<std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node>:/*0x0 id:24351*/ std::_Allocator_base<std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node> { // /*id:24362*/ public: __thiscall const std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node * address (const std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node &); // /*id:24364*/ public: __thiscall std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node * address (std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node &); // /*id:24367*/ public: __thiscall void allocator<std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node> (const 
std::allocator<std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node> &); // /*id:24372*/ public: __thiscall void allocator<std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node> (); //:d:\program files\microsoft visual studio .net 2003\vc7\include\xmemory /*id:24374 */public: void __thiscall std::allocator<struct std::_Tree_nod<class std::_Tmap_traits<int,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,struct std::less<int>,class std::allocator<struct std::pair<int const ,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >,0> >::_Node>::deallocate(struct std::_Tree_nod<class std::_Tmap_traits<int,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,struct std::less<int>,class std::allocator<struct std::pair<int const ,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >,0> >::_Node *,unsigned int); // /*id:24376*/ public: __thiscall std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node * allocate (unsigned int ,const void *); //:d:\program files\microsoft visual studio .net 2003\vc7\include\xmemory /*id:24378 */public: struct std::_Tree_nod<class std::_Tmap_traits<int,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,struct std::less<int>,class std::allocator<struct std::pair<int const ,class std::basic_string<char,struct 
std::char_traits<char>,class std::allocator<char> > > >,0> >::_Node * __thiscall std::allocator<struct std::_Tree_nod<class std::_Tmap_traits<int,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,struct std::less<int>,class std::allocator<struct std::pair<int const ,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >,0> >::_Node>::allocate(unsigned int); // /*id:24380*/ public: __thiscall void construct (std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node *,const std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node &); //:d:\program files\microsoft visual studio .net 2003\vc7\include\xmemory /*id:24384 */public: void __thiscall std::allocator<struct std::_Tree_nod<class std::_Tmap_traits<int,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,struct std::less<int>,class std::allocator<struct std::pair<int const ,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >,0> >::_Node>::destroy(struct std::_Tree_nod<class std::_Tmap_traits<int,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,struct std::less<int>,class std::allocator<struct std::pair<int const ,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > >,0> >::_Node *); // /*id:24386*/ public: __thiscall unsigned int max_size (); structure /*id:24353*/ value_type; typedef /*id:24352*/std::_Allocator_base<std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> 
>,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node> _Mybase; typedef /*id:24354*/std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node * pointer; typedef /*id:24355*/std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node & reference; typedef /*id:24357*/const std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node * const_pointer; typedef /*id:24359*/const std::_Tree_nod<std::_Tmap_traits<int,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<int>,std::allocator<std::pair<int const ,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >,0> >::_Node & const_reference; typedef /*id:24360*/unsigned int size_type; typedef /*id:24361*/signed int difference_type; }; 

And even if you filter out the standard template structures, there remains a bunch of language features that get expanded or changed during compilation. Custom templates are one example.
Example of a template expansion
struct /*id:16851*/ S_BVECTOR<D3DXVECTOR2>
{
    /*off 0x00000000 size:0016 id:9357*/ private: std::vector<D3DXVECTOR2,std::allocator<D3DXVECTOR2> > m_VECPath;
    /*off 0x00000016 size:0004 id:8*/ private: signed int m_nCount;
    /*off 0x00000020 size:0004 id:8*/ private: signed int m_nPos;
    /*id:9360 */public: __thiscall S_BVECTOR<struct D3DXVECTOR2>::S_BVECTOR<struct D3DXVECTOR2>(class S_BVECTOR<struct D3DXVECTOR2> const &);
    /*id:9362 */public: __thiscall S_BVECTOR<struct D3DXVECTOR2>::S_BVECTOR<struct D3DXVECTOR2>(void);
    /*id:9364 */public: void __thiscall S_BVECTOR<struct D3DXVECTOR2>::resize(unsigned short);
    /*id:9366*/ public: __thiscall void addsize (unsigned short );
    /*id:9368 */public: void __thiscall S_BVECTOR<struct D3DXVECTOR2>::setsize(unsigned short);
    /*id:9369*/ public: __thiscall void setsizeNew (unsigned short );
    /*id:9370 */public: void __thiscall S_BVECTOR<struct D3DXVECTOR2>::clear(void);
    /*id:9371 */public: void __thiscall S_BVECTOR<struct D3DXVECTOR2>::push_back(struct D3DXVECTOR2 &);
    /*id:9373*/ public: __thiscall void pop_front ();
    /*id:9374*/ public: __thiscall void pop_back ();
    /*id:9375 */public: int __thiscall S_BVECTOR<struct D3DXVECTOR2>::size(void);
    /*id:9377 */public: bool __thiscall S_BVECTOR<struct D3DXVECTOR2>::empty(void);
    /*id:9379*/ public: __thiscall D3DXVECTOR2 * front ();
    /*id:9381*/ public: __thiscall D3DXVECTOR2 * next ();
    /*id:9382*/ public: __thiscall D3DXVECTOR2 * end ();
    /*id:9383 */public: struct D3DXVECTOR2 * __thiscall S_BVECTOR<struct D3DXVECTOR2>::operator[](int);
    /*id:9385*/ public: __thiscall void remove (signed int );
    /*id:9387 */public: __thiscall S_BVECTOR<struct D3DXVECTOR2>::~S_BVECTOR<struct D3DXVECTOR2>(void);
    /*id:9388*/ public: __thiscall void * __vecDelDtor (unsigned int );
};

Of course, all of this can be returned to its original form fairly easily. But there may be quite a lot of situations where manual processing is needed. For example, the database for the library I want to use contains 2673 structures. Of these, only about 250 are really needed; the rest are expanded std templates and other "standard" things. It remains only to hope that everything goes without problems. Well, suppose the blanks for the structures are ready. Next they need to be written to files.
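
To make the idea of such a filter concrete, here is a minimal sketch in C++ (the language of the library being restored, not of the author's C# generator). The heuristic is an assumption made for illustration: a structure counts as "standard" if it lives in namespace std or its name starts with an underscore, as _iobuf does. The function names are hypothetical, and a real filter would probably need more rules (for example for D3DX types).

```cpp
#include <cassert>
#include <string>
#include <vector>

// "S_BVECTOR<D3DXVECTOR2>" -> "S_BVECTOR"; names without '<' are returned unchanged.
std::string templateBaseName(const std::string& name) {
    return name.substr(0, name.find('<'));
}

// Heuristic assumed for this sketch: a structure is "standard" if it lives in
// namespace std or its name starts with an underscore (for example _iobuf).
bool isStandardStructure(const std::string& name) {
    const std::string base = templateBaseName(name);
    if (base.compare(0, 5, "std::") == 0) return true;
    return !base.empty() && base[0] == '_';
}

// Keeps only the names that the heuristic does not classify as standard.
std::vector<std::string> keepProjectStructures(const std::vector<std::string>& names) {
    std::vector<std::string> kept;
    for (const std::string& name : names) {
        if (!isStandardStructure(name)) kept.push_back(name);
    }
    return kept;
}
```

Grouping the surviving names by templateBaseName would also be one way to spot instantiations such as S_BVECTOR<D3DXVECTOR2> that have to be folded back into a single template by hand.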

Generation


First you need the files themselves. A bit of theory. During compilation, each source file, after preprocessing, is translated by the compiler into machine code. Each source file yields an .obj or .o file, depending on the compiler. With the DIA SDK you can get the list of all files behind each .obj module (in short, everything pulled in through #include). How to get the list of files was described in the previous article (well, "described"... there is code there). In amateur terms: from each .obj module you can get the name of the source file that the module once was (they have the same name) and the list of included files (all files except .cpp, although there are exceptions). After creating this general structure and linking the parts together, you can begin writing out the structures.
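
The split of a module's file list into "the source" and "the includes" could look roughly like this. It is a sketch under assumptions, not the author's code: the file list is taken as already extracted from the .pdb, the source is recognized by having the same base name as the .obj module plus a .cpp or .c extension, and every other file is treated as an include. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

struct ModuleFiles {
    std::string source;                 // the file the module was compiled from
    std::vector<std::string> includes;  // every other file seen by the module
};

// Returns the lower-cased {base name, extension} of a path; the extension
// (with its dot) is empty for files such as "xmemory".
std::pair<std::string, std::string> stemAndExtension(const std::string& path) {
    const std::size_t slash = path.find_last_of("/\\");
    std::string file = (slash == std::string::npos) ? path : path.substr(slash + 1);
    for (char& c : file) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    const std::size_t dot = file.find_last_of('.');
    if (dot == std::string::npos) return std::make_pair(file, std::string());
    return std::make_pair(file.substr(0, dot), file.substr(dot));
}

// Assumption: the source shares its base name with the .obj module.
ModuleFiles splitModuleFiles(const std::string& objPath,
                             const std::vector<std::string>& files) {
    const std::string moduleStem = stemAndExtension(objPath).first;
    ModuleFiles result;
    for (const std::string& file : files) {
        const std::pair<std::string, std::string> parts = stemAndExtension(file);
        const bool isSource = parts.first == moduleStem &&
                              (parts.second == ".cpp" || parts.second == ".c");
        if (isSource && result.source.empty()) {
            result.source = file;
        } else {
            result.includes.push_back(file);
        }
    }
    return result;
}
```

A .cpp file that was itself pulled in through #include ends up among the includes here, which is one way to accommodate the exceptions mentioned above.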

As far as I know, it is impossible to directly get the name of the file in which a structure was declared when it still existed as source code. But you can find out across which files the implementations of the structure's methods were scattered. So I propose simply collecting all the files that contain the methods, selecting the one that will be the header, writing the declaration there, and associating the rest of the files with the header. However, getting the name of the source file that contains a method can run into either a bug or an error in the file. To get the name, you first find the list of source lines by RVA (relative virtual address), and then find the file that contains those lines. But sometimes the number of lines corresponding to a method is zero and a file name is still returned. And usually it is the wrong name. This mostly shows up when analyzing constructors. Maybe constructors simply aren't counted...
An example of a structure with a wrongly located constructor
//e:\????\kop\project\mindpower\sdk\src\mpfont.cpp
//e:\????\kop\project\mindpower\sdk\src\i_effect.cpp
//e:\????\kop\project\mindpower\sdk\include\i_effect.h
struct /*id:9920*/ CTexList
{
    /*off 0x00000000 size:0002 id:1138*/ public: unsigned short m_wTexCount;
    /*off 0x00000004 size:0004 id:1778*/ public: float m_fFrameTime;
    /*off 0x00000008 size:0016 id:9726*/ public: std::vector<std::vector<D3DXVECTOR2,std::allocator<D3DXVECTOR2> >,std::allocator<std::vector<D3DXVECTOR2,std::allocator<D3DXVECTOR2> > > > m_vecTexList;
    /*off 0x00000024 size:0028 id:98*/ public: std::basic_string<char,std::char_traits<char>,std::allocator<char> > m_vecTexName;
    /*off 0x00000052 size:0004 id:8384*/ public: IDirect3DTexture8 * m_lpCurTex;
    /*off 0x00000056 size:0004 id:8130*/ public: MindPower::lwITex * m_pTex;
    //:e:\????\kop\project\mindpower\sdk\src\mpfont.cpp[0]
    /*id:9921*/ public: __thiscall void CTexList::CTexList (const CTexList &);
    //:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[3]
    /*id:9927*/ public: __thiscall void CTexList::CTexList ();
    //:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[2]
    /*id:9929*/ public: __thiscall void CTexList::~CTexList ();
    //:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[3]
    /*id:9930*/ public: __thiscall void CTexList::SetTextureName (const std::basic_string<char,std::char_traits<char>,std::allocator<char> > &);
    //:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[16]
    /*id:9932*/ public: __thiscall void CTexList::GetTextureFromModel (CEffectModel *);
    //:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[25]
    /*id:9934*/ public: __thiscall void CTexList::CreateSpliteTexture (signed int ,signed int );
    //:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[16]
    /*id:9936*/ public: __thiscall void CTexList::GetCurTexture (S_BVECTOR<D3DXVECTOR2> &,unsigned short &,float &,float );
    //:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[2]
    /*id:9938*/ public: __thiscall void CTexList::Reset ();
    //:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[7]
    /*id:9939*/ public: __thiscall void CTexList::Clear ();
    //:e:\????\kop\project\mindpower\sdk\src\i_effect.cpp[6]
    /*id:9940*/ public: __thiscall void CTexList::Remove ();
    //:e:\????\kop\project\mindpower\sdk\include\i_effect.h[12]
    /*id:9941*/ public: __thiscall void CTexList::Copy (CTexList *);
};
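
One possible way to work around the zero-line entries is to simply not trust them. The sketch below assumes that the file name and the line count (the number in square brackets in the listing above) have already been read for every method; it then collects the distinct file names and skips every method whose line count is zero. MethodLocation and filesOfStructure are hypothetical names, not part of DIA or of the author's generator.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct MethodLocation {
    std::string file;    // file name reported for the method
    unsigned lineCount;  // number of source lines reported for the method
};

// Returns the distinct files of a structure in first-seen order, ignoring
// methods with zero lines because their file name is usually wrong.
std::vector<std::string> filesOfStructure(const std::vector<MethodLocation>& methods) {
    std::vector<std::string> files;
    std::set<std::string> seen;
    for (const MethodLocation& method : methods) {
        if (method.lineCount == 0) continue;  // unreliable entry, skip it
        if (seen.insert(method.file).second) files.push_back(method.file);
    }
    return files;
}
```

Applied to CTexList, this would drop mpfont.cpp, which is reported only for the copy constructor with zero lines, and keep i_effect.cpp and i_effect.h.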

Usually, which is not surprising, a structure lives in two files, header.h and code.cpp, but there are other variants. For example, the structure has only a header, or the code file has the extension .inl, or, according to the .pdb database, the structure is not written anywhere at all. I used the following algorithm. If the list of files associated with the structure contains a header, we write the structure into the header and include it from the code file, if one exists. We then go through the structure and make a list of all the types it uses. If a type is a structure and there is a file list for it, we include that structure's header; otherwise we write that structure at the beginning of the file.

There is one more unpleasant point: structures love to be duplicated. I have not the slightest idea why many of them occur several times, one right after another (actually not right after another, there are many standard templates between them, but with the filter turned on they become adjacent). The members and methods of such duplicates coincide; they differ only in their id. Personally, I just sorted the array of structures by name, and while iterating over the elements compared the name of the current one with the name of the previous one. And it all worked.
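
The header choice and the duplicate removal described above can be sketched as follows. This is an illustration in C++, not the author's C# code: the Structure type is cut down to a name and an id, file names are assumed to be lower-case as in the listings, and treating .hpp as a header extension is an extra assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Structure {
    std::string name;
    unsigned id;  // symbol id, the only thing that differs between duplicates
};

bool endsWith(const std::string& text, const std::string& suffix) {
    return text.size() >= suffix.size() &&
           text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns the first file that looks like a header, or "" if there is none.
std::string pickHeader(const std::vector<std::string>& files) {
    for (const std::string& file : files) {
        if (endsWith(file, ".h") || endsWith(file, ".hpp")) return file;
    }
    return std::string();
}

// Sorts by name, then keeps an element only if its name differs from the
// name of the previously kept element.
std::vector<Structure> removeDuplicates(std::vector<Structure> items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const Structure& a, const Structure& b) { return a.name < b.name; });
    std::vector<Structure> unique;
    for (const Structure& item : items) {
        if (unique.empty() || unique.back().name != item.name) unique.push_back(item);
    }
    return unique;
}
```

Using a stable sort means that, among duplicates, the one that came first in the database is the one that survives; whether that matters depends on whether the duplicates are really identical.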

Result


Although everything worked, it is of course not what I would like. It did create a bunch of files that, I hope, broadly reflect the structure of the original project, but what a mess is inside...
One of the generated files, lwitem.h
#ifndef __MINDPOWER::LWITEM__
#define __MINDPOWER::LWITEM__

#ifndef _MINDPOWER::LWIRESOURCEMGR_
#define _MINDPOWER::LWIRESOURCEMGR_
struct MindPower::lwIResourceMgr:MindPower::lwInterface
{
//57
};
#endif

#ifndef _MINDPOWER::LWISCENEMGR_
#define _MINDPOWER::LWISCENEMGR_
struct MindPower::lwISceneMgr:MindPower::lwInterface
{
//15
};
#endif

#ifndef _MINDPOWER::LWLINKCTRL_
#define _MINDPOWER::LWLINKCTRL_
struct MindPower::lwLinkCtrl
{
//3
};
#endif

#include lwitypes2.h

#ifndef _STD::ALLOCATOR<STD::BASIC_STRING<CHAR,STD::CHAR_TRAITS<CHAR>,STD::ALLOCATOR<CHAR> > >::REBIND<T>_
#define _STD::ALLOCATOR<STD::BASIC_STRING<CHAR,STD::CHAR_TRAITS<CHAR>,STD::ALLOCATOR<CHAR> > >::REBIND<T>_
struct std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >::rebind<T>
{
typedef std::allocator<std::_List_nod<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node *> other;
};
#endif

#ifndef _MINDPOWER::LWIPRIMITIVE_
#define _MINDPOWER::LWIPRIMITIVE_
struct MindPower::lwIPrimitive:MindPower::lwInterface
{
//46
};
#endif

#include d3dx8math.h

#ifndef _STD::_NUM_FLOAT_BASE_
#define _STD::_NUM_FLOAT_BASE_
struct std::_Num_float_base:std::_Num_base
{
//16 -
};
#endif

#ifndef _MINDPOWER::LWIITEM_
#define _MINDPOWER::LWIITEM_
struct MindPower::lwIItem:MindPower::lwInterface
{
//26
};
#endif

#ifndef _MINDPOWER::LWITEM_
#define _MINDPOWER::LWITEM_
struct MindPower::lwItem:MindPower::lwIItem
{
//12
//34
};
#endif

#endif
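
In the file above, qualified names such as MindPower::lwItem are emitted verbatim, both as structure names and inside the guard macros, where "::" cannot appear because a macro name has to be a plain identifier. A minimal sketch of one possible fix follows: split the qualified name into namespace and plain name, and build the guard from letters, digits and single underscores only. The function names and the "_H" suffix are hypothetical choices for this example, and names containing operators such as operator< are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Splits "MindPower::lwItem" into {"MindPower", "lwItem"}. Only a "::" outside
// of template angle brackets counts; the namespace part may be empty.
std::pair<std::string, std::string> splitQualifiedName(const std::string& qualified) {
    std::size_t last = std::string::npos;
    int depth = 0;
    for (std::size_t i = 0; i < qualified.size(); ++i) {
        const char c = qualified[i];
        if (c == '<') {
            ++depth;
        } else if (c == '>') {
            --depth;
        } else if (depth == 0 && c == ':' && i + 1 < qualified.size() && qualified[i + 1] == ':') {
            last = i;
            ++i;  // skip the second ':'
        }
    }
    if (last == std::string::npos) return std::make_pair(std::string(), qualified);
    return std::make_pair(qualified.substr(0, last), qualified.substr(last + 2));
}

// Builds a guard macro such as "MINDPOWER_LWITEM_H": letters and digits are
// upper-cased, every run of other characters becomes a single underscore.
std::string guardName(const std::string& qualified) {
    assert(!qualified.empty());
    std::string guard;
    for (char c : qualified) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (std::isalnum(u)) {
            guard += static_cast<char>(std::toupper(u));
        } else if (!guard.empty() && guard.back() != '_') {
            guard += '_';
        }
    }
    if (!guard.empty() && guard.back() != '_') guard += '_';
    return guard + "H";
}

// Emits a guarded declaration, wrapped in a namespace block when there is one.
// Nested namespaces ("A::B") would need one block per level in older compilers.
std::string emitStructure(const std::string& qualified, const std::string& body) {
    const std::pair<std::string, std::string> parts = splitQualifiedName(qualified);
    const std::string guard = guardName(qualified);
    std::string out = "#ifndef " + guard + "\n#define " + guard + "\n";
    if (!parts.first.empty()) out += "namespace " + parts.first + " {\n";
    out += "struct " + parts.second + "\n{\n" + body + "};\n";
    if (!parts.first.empty()) out += "}\n";
    return out + "#endif\n";
}
```

Collapsing runs of underscores and not starting the guard with one keeps the macro out of the identifiers that C++ reserves for the implementation.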

The main faults: there are no namespaces, there is no filter that would detect standard templates and replace them with library includes, the files have no internal structure, the author is a poor coder and so the generator produces junk code, and much more. In short, there is still a mountain of work. That's all for now; I am attaching the generator code in case someone needs it.
Amateur code generator on GitHub

So it goes.

Source: https://habr.com/ru/post/240831/

