Hello to all,
It has been quite a while since the first article, but I have decided to keep writing, if only a little at a time, about modifying and improving IDA Pro.
In this article we will talk about how to fix the flaws in processor modules whose source code you do not have, when those flaws make your life miserable. Unfortunately, not all of the problems listed below can be called bugs, so the developers are unlikely to address them.
Note: from here on, I will be looking at problems in the Motorola M68000 module (my favorite, and one I use very often).
So, flaw number one: PC-relative addressing. The problem is that the disassembly listing for such instructions is not always correct. Take a look at the screenshot:
At first glance nothing is wrong here. What is more, the problem does not get in the way of analysis. But the opcode is disassembled incorrectly. Let's look at the disassembly in some online disassembler:
We can see that the addressing should be relative to the PC register, since the target address of the reference falls within the signed short range.
Flaw number two: "mirrors" of RAM and of some other regions. Since addressing on the m68k is 24-bit, all accesses to the upper (or, conversely, lower) regions should be redirected to the same range, and so should the cross-references.
Flaw number three (or rather, not so much a flaw as missing functionality): the so-called lineA (1010) and lineF (1111) emulators. These are opcodes for which the main instruction set had no room, so they have to be processed in a special way through interrupt vectors. The size of these opcodes depends only on how the handler implements them. I have only ever seen a two-byte implementation. We will add them.
Flaw number four: trap #N instructions do not produce any crefs to the trap handlers themselves.
Flaw number five: the movea.w instruction should create a full xref to the address built from its word-sized reference, but all we get is a word-sized number.
To understand how to fix a specific processor module, you first need to understand what options we have in principle, and what the "fix" actually is.
The "patch" is in fact a regular plugin. It could be written in Python, but I did everything in C++. The only thing that suffers is portability, and if someone takes on rewriting the plugin in Python, I will be very grateful.
To begin, create an empty DLL project in Visual Studio: File -> New -> Project -> Windows Desktop Wizard -> Dynamic link library (.dll), ticking the Empty Project checkbox and clearing all the others:
Unpack the IDA SDK and register its path as a Visual Studio macro (I will be using Visual Studio 2017) so that you can easily refer to it later. While we are at it, we will also add a macro for the path to IDA Pro.
Go to View -> Other Windows -> Property Manager:
Since we are working with SDK version 7.0, the build will use the x64 compiler. So choose Debug | x64 -> Microsoft.Cpp.x64.user -> Properties:
Click the Add Macro button in the User Macros section and create an IDA_SDK macro holding the path where you unpacked the SDK:
Do the same for IDA_DIR (the path to your IDA Pro):
Note that IDA installs into %Program Files% by default, and writing there requires administrative rights.
Let's also remove the Win32 configuration (in this article I will not affect the compilation for the x86 system), leaving only the x64 option.
Create an empty ida_plugin.cpp file. We do not add the code yet.
Now we can set the character encoding and the other C++ settings:
Add the include directories:
And the libraries from the SDK:
Now add the code template:
    #include <ida.hpp>
    #include <idp.hpp>
    #include <ua.hpp>
    #include <bytes.hpp>
    #include <loader.hpp>
    #include <offset.hpp>

    #define NAME "M68000 proc-fixer plugin"
    #define VERSION "1.0"

    static bool plugin_inited;
    static bool my_dbg;

    //--------------------------------------------------------------------------
    static void print_version() {
        static const char format[] = NAME " v%s\n";
        info(format, VERSION);
        msg(format, VERSION);
    }

    //--------------------------------------------------------------------------
    // Load only when the active processor module is the 68K one.
    static bool init_plugin(void) {
        if (ph.id != PLFM_68K)
            return false;

        return true;
    }

    #ifdef _DEBUG
    static const char* const optype_names[] = {
        "o_void", "o_reg", "o_mem", "o_phrase", "o_displ", "o_imm", "o_far",
        "o_near", "o_idpspec0", "o_idpspec1", "o_idpspec2", "o_idpspec3",
        "o_idpspec4", "o_idpspec5",
    };

    static const char* const dtyp_names[] = {
        "dt_byte", "dt_word", "dt_dword", "dt_float", "dt_double", "dt_tbyte",
        "dt_packreal", "dt_qword", "dt_byte16", "dt_code", "dt_void", "dt_fword",
        "dt_bitfild", "dt_string", "dt_unicode", "dt_3byte", "dt_ldbl",
        "dt_byte32", "dt_byte64",
    };

    // Dumps the instruction-level fields filled in by the processor module.
    static void print_insn(const insn_t *insn) {
        if (my_dbg) {
            msg("cs=%x, ", insn->cs);
            msg("ip=%x, ", insn->ip);
            msg("ea=%x, ", insn->ea);
            msg("itype=%x, ", insn->itype);
            msg("size=%x, ", insn->size);
            msg("auxpref=%x, ", insn->auxpref);
            msg("segpref=%x, ", insn->segpref);
            msg("insnpref=%x, ", insn->insnpref);

            msg("flags[");
            if (insn->flags & INSN_MACRO)
                msg("INSN_MACRO|");
            if (insn->flags & INSN_MODMAC)
                msg("INSN_MODMAC");
            msg("]\n");
        }
    }

    // Dumps one operand: type, flags, data type, values and reference info.
    static void print_op(ea_t ea, const op_t *op) {
        if (my_dbg) {
            msg("type[%s], ", optype_names[op->type]);

            msg("flags[");
            if (op->flags & OF_NO_BASE_DISP)
                msg("OF_NO_BASE_DISP|");
            if (op->flags & OF_OUTER_DISP)
                msg("OF_OUTER_DISP|");
            if (op->flags & PACK_FORM_DEF)
                msg("PACK_FORM_DEF|");
            if (op->flags & OF_NUMBER)
                msg("OF_NUMBER|");
            if (op->flags & OF_SHOW)
                msg("OF_SHOW");
            msg("], ");

            msg("dtyp[%s], ", dtyp_names[op->dtype]);

            if (op->type == o_reg)
                msg("reg=%x, ", op->reg);
            else if (op->type == o_displ || op->type == o_phrase)
                msg("phrase=%x, ", op->phrase);
            else
                msg("reg_phrase=%x, ", op->phrase);

            msg("addr=%x, ", op->addr);
            msg("value=%x, ", op->value);
            msg("specval=%x, ", op->specval);
            msg("specflag1=%x, ", op->specflag1);
            msg("specflag2=%x, ", op->specflag2);
            msg("specflag3=%x, ", op->specflag3);
            msg("specflag4=%x, ", op->specflag4);

            msg("refinfo[");
            opinfo_t buf;
            if (get_opinfo(&buf, ea, op->n, op->flags)) {
                msg("target=%x, ", buf.ri.target);
                msg("base=%x, ", buf.ri.base);
                msg("tdelta=%x, ", buf.ri.tdelta);

                msg("flags[");
                if (buf.ri.flags & REFINFO_TYPE)
                    msg("REFINFO_TYPE|");
                if (buf.ri.flags & REFINFO_RVAOFF)
                    msg("REFINFO_RVAOFF|");
                if (buf.ri.flags & REFINFO_PASTEND)
                    msg("REFINFO_PASTEND|");
                if (buf.ri.flags & REFINFO_CUSTOM)
                    msg("REFINFO_CUSTOM|");
                if (buf.ri.flags & REFINFO_NOBASE)
                    msg("REFINFO_NOBASE|");
                if (buf.ri.flags & REFINFO_SUBTRACT)
                    msg("REFINFO_SUBTRACT|");
                if (buf.ri.flags & REFINFO_SIGNEDOP)
                    msg("REFINFO_SIGNEDOP");
                msg("]");
            }
            msg("]\n");
        }
    }
    #endif

    // Guards against re-entering our own ev_ana_insn handler.
    static bool ana_addr = 0;

    static ssize_t idaapi hook_idp(void *user_data, int notification_code, va_list va) {
        switch (notification_code) {
        case processor_t::ev_ana_insn: {
            insn_t *out = va_arg(va, insn_t*);

            if (ana_addr)
                break;

            ana_addr = 1;

            // Let the original module analyze the instruction first.
            if (ph.ana_insn(out) <= 0) {
                ana_addr = 0;
                break;
            }

            ana_addr = 0;

    #ifdef _DEBUG
            print_insn(out);
    #endif

            for (int i = 0; i < UA_MAXOP; ++i) {
                op_t &op = out->ops[i];

    #ifdef _DEBUG
                print_op(out->ea, &op);
    #endif
            }

            return out->size;
        } break;
        case processor_t::ev_emu_insn: {
            const insn_t *insn = va_arg(va, const insn_t*);
        } break;
        case processor_t::ev_out_mnem: {
            outctx_t *outbuffer = va_arg(va, outctx_t *);

            //outbuffer->out_custom_mnem(mnem);
            //return 1;
        } break;
        default: {
    #ifdef _DEBUG
            if (my_dbg) {
                msg("msg = %d\n", notification_code);
            }
    #endif
        } break;
        }
        return 0;
    }

    //--------------------------------------------------------------------------
    static int idaapi init(void) {
        if (init_plugin()) {
            plugin_inited = true;
            my_dbg = false;

            hook_to_notification_point(HT_IDP, hook_idp, NULL);

            print_version();
            return PLUGIN_KEEP;
        }
        return PLUGIN_SKIP;
    }

    //--------------------------------------------------------------------------
    static void idaapi term(void) {
        if (plugin_inited) {
            unhook_from_notification_point(HT_IDP, hook_idp);
            plugin_inited = false;
        }
    }

    //--------------------------------------------------------------------------
    static bool idaapi run(size_t /*arg*/) {
        return false;
    }

    //--------------------------------------------------------------------------
    const char comment[] = NAME;
    const char help[] = NAME;

    //--------------------------------------------------------------------------
    //
    //      PLUGIN DESCRIPTION BLOCK
    //
    //--------------------------------------------------------------------------
    plugin_t PLUGIN = {
        IDP_INTERFACE_VERSION,
        PLUGIN_PROC | PLUGIN_MOD, // plugin flags
        init,                     // initialize
        term,                     // terminate. this pointer may be NULL.
        run,                      // invoke plugin
        comment,                  // long comment about the plugin
                                  // it could appear in the status line
                                  // or as a hint
        help,                     // multiline help about the plugin
        NAME,                     // the preferred short name of the plugin
        ""                        // the preferred hotkey to run the plugin
    };
The print_op() and print_insn() functions are there to show which flags the current processor module sets for particular instructions. We need this when we want to find out the flags of existing opcodes so that we can reuse them in our fixes.
The body of our "patch" is the hook_idp() function. For our purposes we need to implement three callbacks in it:
processor_t::ev_ana_insn: needed when the processor module has no implementation for some opcodes.
processor_t::ev_emu_insn: here you can create cross-references to the data or code that new opcodes refer to (or that old ones fail to refer to).
processor_t::ev_out_mnem: new opcodes have to be displayed somehow, and this is where that happens.
The init_plugin() function keeps our patch from loading under other processor modules.
And, most importantly, we hook our callback onto the events of the processor module:
hook_to_notification_point(HT_IDP, hook_idp, NULL);
The trick with the global variable ana_addr is needed so that ana_insn does not recurse when we ask the original module for information about instructions that we do not parse ourselves. Yes, alas, this "crutch" has been around for a very long time, ever since the old versions.
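To make the recursion guard easier to see, here is a small standalone model of it. It is only a sketch and uses no IDA SDK calls: `default_analyze` and `hook` are hypothetical stand-ins for the module's own analyzer and for our callback, and `ReentryGuard` is an RAII helper that I use here in place of setting and clearing the flag by hand.

```cpp
// Hypothetical stand-ins: in the real plugin these roles are played by
// ph.ana_insn() and hook_idp(). Nothing below is IDA SDK API.
static bool in_hook = false;  // plays the role of ana_addr
static int hook_calls = 0;    // every time the hook was invoked
static int hook_entries = 0;  // times the hook body actually ran

// Sets the flag on construction and clears it on every exit path.
struct ReentryGuard {
    bool &flag;
    explicit ReentryGuard(bool &f) : flag(f) { flag = true; }
    ~ReentryGuard() { flag = false; }
};

static int hook(int opcode);

// The "original" analyzer notifies the hooks first, which is exactly the
// situation that would recurse forever without the guard.
static int default_analyze(int opcode) {
    int handled = hook(opcode);
    if (handled > 0)
        return handled;
    return 2;  // pretend every instruction is two bytes long
}

static int hook(int opcode) {
    ++hook_calls;
    if (in_hook)
        return 0;  // re-entered: step aside and let the original do the work
    ReentryGuard guard(in_hook);
    ++hook_entries;
    int size = default_analyze(opcode);
    // ...operands would be post-processed here...
    return size;
}
```

Calling `hook(0x4E71)` enters the hook body once, the nested notification is ignored, and the size reported by the stand-in analyzer comes back to the caller. The RAII variant has one practical advantage over the manual flag: it cannot be left set by an early `break` or `return`.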
To solve the first problem (PC-relative addressing) properly, I had to tinker a lot with the debug output, which I implemented precisely for this task. I knew that in some cases IDA does display PC-relative references correctly (in instructions that jump through an offset table located close to the current instruction, plus an index register), but for the lea instruction the correct display of this addressing mode is not implemented. In the end I found such a jump instruction and worked out which flags have to be set for the PC to be displayed in parentheses:
    case processor_t::ev_ana_insn: {
        insn_t *out = va_arg(va, insn_t*);

        if (ana_addr)
            break;

        ana_addr = 1;

        if (ph.ana_insn(out) <= 0) {
            ana_addr = 0;
            break;
        }

        ana_addr = 0;

        for (int i = 0; i < UA_MAXOP; ++i) {
            op_t &op = out->ops[i];

            switch (op.type) {
            case o_near:
            case o_mem: {
                if (out->itype != 0x76 || op.n != 0 ||
                    (op.phrase != 0x09 && op.phrase != 0x0A) ||
                    (op.addr == 0 || op.addr >= (1 << 23)) ||
                    op.specflag1 != 2) // lea table(pc),Ax
                    break;

                // Keep the difference in a full-width signed type: stored in
                // a `short`, it would always pass the range check below.
                sval_t diff = (sval_t)op.addr - (sval_t)out->ea;
                if (diff >= SHRT_MIN && diff <= SHRT_MAX) {
                    out->Op1.type = o_displ;
                    out->Op1.offb = 2;
                    out->Op1.dtype = dt_dword;
                    out->Op1.phrase = 0x5B;
                    out->Op1.specflag1 = 0x10;
                }
            } break;
            }
        }

        return out->size;
    } break;
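The range test at the heart of this fix is easy to get wrong, so here is a standalone sketch of it. The function name `fits_signed_word` is mine, not from the plugin. Like the plugin code, it measures the distance from the start of the instruction; as far as I know the real CPU measures it from the extension word (instruction address + 2), so treat the exact boundaries as an assumption.

```cpp
#include <cstdint>

// Returns true when `target` is reachable from `insn_ea` with a signed
// 16-bit displacement. The subtraction is done in a 64-bit signed type so
// that the result is not truncated before it is compared.
constexpr bool fits_signed_word(std::uint32_t insn_ea, std::uint32_t target) {
    return static_cast<std::int64_t>(target) - static_cast<std::int64_t>(insn_ea) >= INT16_MIN
        && static_cast<std::int64_t>(target) - static_cast<std::int64_t>(insn_ea) <= INT16_MAX;
}
```

A table 0x100 bytes ahead of the instruction passes the test, while a reference from ROM at 0x200 to RAM at 0xFF0000 does not, so such an operand would keep its absolute form.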
The second problem (mirrors) is simple. Just mask the addresses onto specific ranges: 0xFF0000-0xFFFFFF (for RAM) and 0xC00000-0xC000FF (for VDP video memory). The main thing here is to filter by operand type: o_near and o_mem.
    case processor_t::ev_ana_insn: {
        insn_t *out = va_arg(va, insn_t*);

        if (ana_addr)
            break;

        ana_addr = 1;

        if (ph.ana_insn(out) <= 0) {
            ana_addr = 0;
            break;
        }

        ana_addr = 0;

        for (int i = 0; i < UA_MAXOP; ++i) {
            op_t &op = out->ops[i];

            switch (op.type) {
            case o_near:
            case o_mem: {
                op.addr &= 0xFFFFFF; // for any mirrors

                if ((op.addr & 0xE00000) == 0xE00000) // RAM mirrors
                    op.addr |= 0x1F0000;

                if ((op.addr >= 0xC00000 && op.addr <= 0xC0001F) ||
                    (op.addr >= 0xC00020 && op.addr <= 0xC0003F)) // VDP mirrors
                    op.addr &= 0xC000FF;
            } break;
            }
        }

        return out->size;
    } break;
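To see what the RAM part of this masking does to concrete addresses, here is the same arithmetic as a standalone function. The name `fold_mirror` is hypothetical, and the sketch covers only the 24-bit mask and the RAM mirrors, not the VDP range.

```cpp
#include <cstdint>

// Folds an operand address onto its canonical location, using the same
// masks as the fix above.
constexpr std::uint32_t fold_mirror(std::uint32_t addr) {
    addr &= 0xFFFFFFu;                    // the address bus is 24 bits wide
    if ((addr & 0xE00000u) == 0xE00000u)  // anything in 0xE00000-0xFFFFFF...
        addr |= 0x1F0000u;                // ...lands in 0xFF0000-0xFFFFFF
    return addr;
}
```

For example, 0xFFFF8000 and 0x00E08000 both fold to 0xFF8000, so references to either one end up on the same RAM location, while a ROM address such as 0x1234 is left alone.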
For the third problem (lineA and lineF), adding the missing opcodes takes the following steps:
1. Define the new instruction types, numbering them from CUSTOM_INSN_ITYPE:
    enum m68k_insn_type_t { M68K_linea = CUSTOM_INSN_ITYPE, M68K_linef, };
2. Read the handler addresses from the interrupt vector table (vector 0x0A for lineA, vector 0x0B for lineF):
    value = get_dword(0x0A * sizeof(uint32)); // ...
    value = get_dword(0x0B * sizeof(uint32));
3. In ev_emu_insn, add crefs to the handlers and to the following instruction so that the code flow is not interrupted:
    insn->add_cref(insn->Op1.addr, 0, fl_CN); // code ref
    insn->add_cref(insn->ea + insn->size, insn->Op1.offb, fl_F); // flow ref
4. In ev_out_mnem, output our custom mnemonic:
    const char *mnem = (outbuffer->insn.itype == M68K_linef) ? "line_f" : "line_a";
    outbuffer->out_custom_mnem(mnem);
    enum m68k_insn_type_t {
        M68K_linea = CUSTOM_INSN_ITYPE,
        M68K_linef,
    };

    /* after includes */

    case processor_t::ev_ana_insn: {
        insn_t *out = va_arg(va, insn_t*);

        if (ana_addr)
            break;

        uint16 itype = 0;
        ea_t value = out->ea;
        uchar b = get_byte(out->ea);

        if (b == 0xA0 || b == 0xF0) {
            switch (b) {
            case 0xA0:
                itype = M68K_linea;
                value = get_dword(0x0A * sizeof(uint32));
                break;
            case 0xF0:
                itype = M68K_linef;
                value = get_dword(0x0B * sizeof(uint32));
                break;
            }

            out->itype = itype;
            out->size = 2;

            // Op1: address of the handler taken from the vector table
            out->Op1.type = o_near;
            out->Op1.offb = 1;
            out->Op1.dtype = dt_dword;
            out->Op1.addr = value;
            out->Op1.phrase = 0x0A;
            out->Op1.specflag1 = 2;

            // Op2: the second byte of the opcode
            out->Op2.type = o_imm;
            out->Op2.offb = 1;
            out->Op2.dtype = dt_byte;
            out->Op2.value = get_byte(out->ea + 1);
        }

        return out->size;
    } break;
    case processor_t::ev_emu_insn: {
        const insn_t *insn = va_arg(va, const insn_t*);

        if (insn->itype == M68K_linea || insn->itype == M68K_linef) {
            insn->add_cref(insn->Op1.addr, 0, fl_CN);
            insn->add_cref(insn->ea + insn->size, insn->Op1.offb, fl_F);

            return 1;
        }
    } break;
    case processor_t::ev_out_mnem: {
        outctx_t *outbuffer = va_arg(va, outctx_t *);

        if (outbuffer->insn.itype != M68K_linea && outbuffer->insn.itype != M68K_linef)
            break;

        const char *mnem = (outbuffer->insn.itype == M68K_linef) ? "line_f" : "line_a";

        outbuffer->out_custom_mnem(mnem);
        return 1;
    } break;
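The decoding step can be tried outside IDA with a small sketch like the one below. `LineKind`, `classify_line` and `line_vector_address` are hypothetical names of mine. Note one deliberate difference: the plugin code above only matches a first byte of exactly 0xA0 or 0xF0, whereas this sketch classifies by the whole top nibble (1010 or 1111), which is my assumption of the more general rule.

```cpp
#include <cstdint>

enum class LineKind { None, LineA, LineF };

// Classifies a 16-bit opcode word by its top nibble.
constexpr LineKind classify_line(std::uint16_t opcode) {
    switch (opcode >> 12) {
    case 0xA: return LineKind::LineA;
    case 0xF: return LineKind::LineF;
    default:  return LineKind::None;
    }
}

// Address of the vector table entry that holds the handler pointer:
// vector 0x0A for line A, vector 0x0B for line F, four bytes per entry.
constexpr std::uint32_t line_vector_address(LineKind kind) {
    return kind == LineKind::LineA ? 0x0Au * 4u
         : kind == LineKind::LineF ? 0x0Bu * 4u
         : 0u;
}
```

So a lineA opcode is dispatched through the pointer stored at 0x28 and a lineF opcode through the one at 0x2C, which are the two addresses the plugin passes to get_dword().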
The fourth problem (trap handlers) is solved like this: we find the opcode of the trap instruction, take the index from the instruction, and read the handler address from the vector at that index. The result looks something like this:
    case processor_t::ev_emu_insn: {
        const insn_t *insn = va_arg(va, const insn_t*);

        if (insn->itype == 0xB6) { // trap #X
            qstring name;
            // trap vectors start at vector number 0x20
            ea_t trap_addr = get_dword((0x20 + (insn->Op1.value & 0xF)) * sizeof(uint32));
            get_func_name(&name, trap_addr);
            set_cmt(insn->ea, name.c_str(), false);
            insn->add_cref(trap_addr, insn->Op1.offb, fl_CN);
            return 1;
        }
    } break;
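The vector arithmetic in this handler can be checked on its own. Below is a minimal sketch under the assumption that the vector table sits at address 0 of a big-endian memory image; `trap_vector_address` and `read_be32` are hypothetical helpers, the latter standing in for IDA's get_dword().

```cpp
#include <cstddef>
#include <cstdint>

// Address of the vector table entry used by `trap #n` (n = 0..15).
// Trap vectors start at vector number 0x20, four bytes per entry.
constexpr std::uint32_t trap_vector_address(unsigned n) {
    return (0x20u + (n & 0xFu)) * 4u;
}

// Reads a big-endian 32-bit value from a raw memory image.
inline std::uint32_t read_be32(const std::uint8_t *image, std::size_t offset) {
    return (static_cast<std::uint32_t>(image[offset]) << 24)
         | (static_cast<std::uint32_t>(image[offset + 1]) << 16)
         | (static_cast<std::uint32_t>(image[offset + 2]) << 8)
         |  static_cast<std::uint32_t>(image[offset + 3]);
}
```

With these, `trap #0` uses the entry at 0x80 and `trap #15` the one at 0xBC, and `read_be32(image, trap_vector_address(n))` yields the handler address that the cref should point to.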
The fifth problem is simple as well: first, we filter on the movea.w operation. Then, if the operand is word-sized and refers to RAM, we turn it into a proper reference relative to the base 0xFF0000. It looks like this:
    case processor_t::ev_ana_insn: {
        insn_t *out = va_arg(va, insn_t*);

        if (ana_addr)
            break;

        ana_addr = 1;

        if (ph.ana_insn(out) <= 0) {
            ana_addr = 0;
            break;
        }

        ana_addr = 0;

        for (int i = 0; i < UA_MAXOP; ++i) {
            op_t &op = out->ops[i];

            switch (op.type) {
            case o_imm: {
                if (out->itype != 0x7F || op.n != 0) // movea
                    break;

                // keep only the low word of the value
                if ((op.value & 0xFF0000) && op.dtype == dt_word) {
                    op.value &= 0xFFFF;
                }
            } break;
            }
        }

        return out->size;
    } break;
    case processor_t::ev_emu_insn: {
        const insn_t *insn = va_arg(va, const insn_t*);

        for (int i = 0; i < UA_MAXOP; ++i) {
            const op_t &op = insn->ops[i];

            switch (op.type) {
            case o_imm: {
                if (insn->itype != 0x7F || op.n != 0 || op.dtype != dt_word) // movea
                    break;

                // turn the word into an offset relative to the RAM base
                op_offset(insn->ea, op.n, REF_OFF32, BADADDR, 0xFF0000);
            } break;
            }
        }
    } break;
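To show which address such a word ends up pointing at, here is a standalone sketch with two hypothetical helpers. The first models what I understand the CPU to do with movea.w (sign-extend the word, then only 24 bits reach the bus); the second models the base-plus-word target that an offset with base 0xFF0000 resolves to. Both are illustrations, not plugin code.

```cpp
#include <cstdint>

// CPU view: the 16-bit source is sign-extended to 32 bits, and only the
// low 24 bits matter for addressing.
constexpr std::uint32_t movea_w_effective(std::uint16_t word) {
    return static_cast<std::uint32_t>(
               static_cast<std::int32_t>(static_cast<std::int16_t>(word)))
           & 0xFFFFFFu;
}

// Listing view: an offset operand with base 0xFF0000 resolves to base + word.
constexpr std::uint32_t ram_offset_target(std::uint16_t word) {
    return 0xFF0000u + word;
}
```

For words of 0x8000 and above the two agree: 0x8000 gives 0xFF8000 either way. For smaller words they differ (0x1234 sign-extends to 0x001234, while the base gives 0xFF1234), so whether the RAM base is the right choice for such values depends on the code being analyzed.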
In fact, fixing existing modules is far from simple once it goes beyond implementing unknown opcodes and into something more complicated.
It takes hours of debugging the existing implementation and working out what is going on inside it (sometimes even reverse-engineering the processor module). But the result is worth it.
Link to source: https://github.com/lab313ru/m68k_fixer
Source: https://habr.com/ru/post/424263/