How the local version of the Intuit.Ru site was hacked

Establishing connection ...

Looking through the (rather disappointing) results of my latest INTUIT exam, I asked myself for the hundredth time: “Why is this answer wrong again?! I was 100% sure of it ...”. In a moment of weakness, I decided to look for the correct answers to the tests and was greatly surprised to find that they were not publicly available. Something was out there, of course, but it was mostly bait of the form "10 random answers for free, and for the rest, please send us some money." The sums were small, but I did not want to pay for answers, so I decided to go another way.

The first crazy thought was to hunt for bugs on the site, but I quickly dropped it because a better one came along. I remembered that in the old days INTUIT sold discs with a local version of its site, which let you study, and therefore take exams, without an Internet connection. The results could then be synchronized once you were back online. A logical chain formed in my head: “You can take tests offline => the checking is also done offline => somewhere there is a file with the answers”.

Some 4.5 GB separated me from happiness and the solution to all my dilemmas! The disc image turned up fairly quickly on a tracker, and I started the download. Out of boredom I began reading the comments. Among the scattered thank-yous, someone had asked: “can I see the answers in there too?”. Nobody ever answered him, but at the time that did not alarm me at all. Eventually the image was downloaded and installed (the installation took quite a while). Let's see what we have here.

Dive

The internal structure of the root directory looked like this:

 html
 lib
 local_web_server
 intu32.ico
 INTUIT.exe
 intuit.ini
 trayicon.ico
 uninstall.exe

Somehow I was immediately drawn to the lib folder, and rightly so. It held a bunch of *.pm files and a couple of directories with the interesting names "course" and "test". After looking into the test folder and examining its contents, I realized that I understood nothing. It contained 3 .pm files which, as you may have guessed, turned out to be Perl packages. I must say that I am by no means a Perl guru. I once wrote something in it as an experiment, but that was long ago. The system that had fallen into my hands was written entirely in Perl. That was no reason to give up, though, and the excavation continued.

The course directory holds many files whose names look a lot like abbreviated course titles. How was I to find the one I needed? It seemed time to launch the desktop shortcut and see what was on offer. Some time after the launch, the default browser opened and loaded the page at localhost:3232. Interesting. So somewhere inside this wigwam there lives a full-fledged Apache, or at least some Denver. But more on that later.

Having logged in as admin:admin, I decided to take the exam for the course I needed in order to see how the whole system works. After marking my answers and clicking the submit button, I saw the following text: “You scored {n}% points. You correctly solved {m} tasks out of {k}.” Then my eye fell on the browser's address bar, which showed the path to the course, not in the lib directory but in the html directory. It also contained an abbreviated course name. I had a feeling I had already seen a file like that ... I ran to the lib/course directory, opened oopbase.pm and saw several procedures and a hash called %lecture, but further down things got much more interesting. There it was, the beauty! A %test hash with all the questions and their answer options. "Problem solved!" I thought at that moment, but the longer I studied this hash, the more insistently the beautiful pink bird Oblomingo knocked on my window ... There were no answers. The hash is very large, so I quote only the piece that is of interest:

 830 => ['830','50','5',"16ab48786a5d520fe6eeea7f1a6e140b",
   [['5708','830','1','10',"7ce21606425e2b20e566f422696b92de",
     [['16692','5708','1','1',"  –  ","e9b0b5d2f98bf71489891a48f80cb868", "387b3b67d9c1248131c136eae3e6cab9",
       [['349301','16692','1',"",],
        ['349302','16692','2',"  ,     ",],
        ['349303','16692','3',"    ",],
        ['349304','16692','4',"     ",],],],

The key 830 is one of the values stored in the %lecture hash. Here is part of that hash as well:

 my %lecture = ( 11 => 830, 7 => 826, 17 => 836, 2 => 821, 1 => 820, … );

Now it is clear that the questions under key 830 belong to the 11th lecture. And that was the only thing that was clear at the time. Since I had no smarter ideas about this file, I decided to take the roundabout way, as all normal heroes do. My path led to further study of the file structure: maybe a table of answers was sitting there somewhere. I even imagined what it might look like. No table was found, but a careful examination of the lib directory led me along the chain test.pm => etest.pm to a rather amusing file, ext.pm, which contained these procedures:

 sub check_answer_exam
 sub check_answer_exam_extern
 sub check_answer

These in turn called the procedures of the same name from the “er” package. The package was found quite quickly, and an analysis of its contents showed that it uses XSLoader to load the er.dll library lying next to it (and next to that lay exactly the same library for *nix, with the .so extension). The picture was beginning to take shape, so I went to look at the library's export table.

We need to go deeper!

The export table killed every sense of beauty in me. It contained a single function, boot_er. So where were check_answer and the other goodies? Things began to smell of kerosene, and countermeasures were called for: good tea and IDA Pro!

It turned out that the library is not as simple as it seems: it contains a whole bunch of procedures and, on top of that, an interesting import table. Besides the familiar imports from KERNEL32 and msvcrt, I saw a whole bunch of functions from perl58.dll. That would probably not have been so interesting if these functions could be easily found on Google. But they could not.

Maybe it all comes down to my lack of Perl experience and I simply do not know how to search for such functions, but I did not find a comprehensive description anywhere, only scraps of information on forums. Moving on. A bold press of Shift+F12 revealed a lot of potentially interesting strings.

In particular, the string Usage: er::check_answer(a, local_user_id, t, answer) gave me the idea of calling this library from my own Perl script. The only catch was the format of the input. Digging through the scripts that run the tests, I noticed several print calls that interested me. The values were written not to a log file but straight to the console. It was logical to assume that this was the web server's console, so only a small matter remained: finding that console.

An inspection of the local_web_server directory showed that there was no trace of Apache there, or even of Denver. The server was written in Perl and shipped in two versions, “server_unix.pl” and “server_win32.pl”, while the main launcher “intuit.exe”, located one directory up, simply started the whole thing, as strings extracted from that file with IDA showed.

Great, so all of this can be started manually to see what gets written to STDOUT. I picked the first test in the course I needed, ticked the correct answers, pressed the submit button and looked at the console output.

Here are the first goodies! Let us examine the values that are of practical interest:

820 - the test ID that corresponds to the first lecture (according to the %lecture hash)
tasks => 4 - the number of correctly solved tasks
answer => [348966, 349080, 63232, 63234, 53240] - the array of answer options I had marked

The numbers in the answer array are the answer IDs discussed earlier in connection with the %test hash.

It remained to obtain the parameters of the er::check_answer() call. I added a print call to etest.pm that uses Data::Dumper to write the contents of the param array, which is passed to the function, to a file. Here is the piece of code:

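 if ($type eq 'lecture') {
     # Debug output: append the arguments to a file before they are checked.
     # Single quotes keep the backslash in the Windows path literal.
     open(F, '>>D:\param.txt');
     print F Dumper(@param);
     close(F);
     ($mtime,$csa,$tasks,$points,$mark) = test::ext::check_answer(@param);
 }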

Then, based on the data obtained, I sketched the following script:

 #!/usr/bin/perl -w
 require test::ext;

 @var1 = [
     1388075531,
     '8647ee8932669c9e0a00827bb82957d2',
     [
         [ 5658, 16542 ]
         # 4 more arrays of this form
     ]
 ];
 $var2 = '57d0c0e0304de48376b064b86cd36bc1';
 @var3 = [ … ]; # value copied from %test
 $var4 = ['348973', '348975'];

 my @param = (@var1, $var2, @var3, $var4);
 my ($mtime,$csa,$tasks,$points,$mark);
 ($mtime,$csa,$tasks,$points,$mark) = test::ext::check_answer(@param);
 print "Tasks: " . $tasks;

Let us analyze what is happening here. @var1 contains the questions that were asked in the current test; there are 5 arrays in total, each of the form [section int id, question int id]. The $var2 variable holds the hash identifier of the user taking the test. It is most likely needed to save the test results. @var3 holds an exact copy of the value from the %test hash, looked up by the test ID of the lecture. Finally, $var4 holds the answers I selected. In the script above, I left only the answers to the first question to make testing easier.

Running the script rewarded me with the line "Tasks: 1", which meant I was on the right track. The thought immediately arose of writing the simplest possible brute force, one that would generate different variants of the $var4 array and check whether $tasks equals one. The story would have ended there, had it not been for the curiosity that filled me and the desire to find out how deep the rabbit hole goes.

Wake up, Neo ... The Matrix has you ...

After examining the import table in IDA, I decided to start with the strncmp function. I launched my test script (the code given above) under the debugger, traced the program up to the point where er.dll is loaded, opened the list of functions it uses and set a breakpoint on the strncmp import. After I restarted the script, the debugger happily reported “Break point at msvcrt.strncmp”, with the strings being compared sitting on the stack.

They were the hash from the @var1 array, and I decided to see where this function is called from. Pressing Ctrl+F9 and then F8, I went on tracing the program. In the course of the trace I found myself inside strncmp again, this time with two strings of the form "57f260a2606af344753ffc00ad834581" passed to it. This hash looked vaguely familiar, and a look at the script code convinced me that it was a hash belonging to one of the questions. Getting warmer! After pressing F9 and entering the comparison function once more, I was a little puzzled by the parameters I saw on the stack.

The string “e8c178abd4f1114837d00771871b6379” was the hash for another test question, while the second string was unfamiliar to me. Continuing the runs, I compiled a list of the 6 calls to the comparison function. Here they are:

Call	Hash #1	Hash #2
1	8647ee8932669c9e0a00827bb82957d2	8647ee8932669c9e0a00827bb82957d2
2	57f260a2606af344753ffc00ad834581	57f260a2606af344753ffc00ad834581
3	e8c178abd4f1114837d00771871b6379	0f4632529bff4a4ebab04b5794c1518a
4	9ffcabf80ad0aea2ec7b8c7b89051c29	46916f5f972e01efa665f6cf2245f071
5	3f2fd6dc285372ee847ee9837718f0df	a95478bfb5af215aa268d21b498b493c
6	7f29edf5e4fad4e5782ffd36512cc6b7	a2401e9ad6086aa57ea59b61ec0d55d2

In calls 2 through 6, the first argument was one of the hashes that I was calling "question identifiers" at the time. For the one question I had answered correctly, the two hashes matched. So there was some kind of dependency here. Changing the $var4 array in the script finally convinced me of this: instead of “57f260a2606af344753ffc00ad834581” I saw a completely different hash, “859c6288692d7037035a011ba54597aa”. Now I had to understand where these hashes come from.

Going to the call site, I set a breakpoint where the arguments start being passed, removed all the other breakpoints and restarted the script.

The disassembly showed that the address of the hash is held in EAX, which the memory dump confirmed.

Only a small matter remained: finding out who put it there.

Further study of the algorithm showed that for each question a string of the form
asdc*a*<question_id>*a*<answer_id_1>...*a*<answer_id_N> is built, hashed with MD5 and compared with the hash that I had earlier mistakenly called the “question identifier”. In fact, it turned out to be the hash identifying the correct answer. Check!
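The string format described above can be sketched in a few lines of Python. This is only an illustration, not code from the article: the function names are made up, ASCII encoding is an assumption, and the IDs in the example are taken from the %test excerpt purely as sample input (nothing here claims they are the correct answers).

```python
import hashlib

PREFIX = "asdc"      # constant prefix from the string format in the article
SEPARATOR = "*a*"    # separator before the question ID and before each answer ID


def build_answer_string(question_id, answer_ids):
    """Join the prefix, the question ID and the chosen answer IDs into one string."""
    parts = [PREFIX, str(question_id)] + [str(a) for a in answer_ids]
    return SEPARATOR.join(parts)


def answer_hash(question_id, answer_ids):
    """Return the MD5 hex digest of the answer string (assumes ASCII encoding)."""
    text = build_answer_string(question_id, answer_ids)
    return hashlib.md5(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Sample IDs only; the order of answer IDs is assumed to follow the course file.
    print(build_answer_string(16692, [349301, 349303]))
    print(answer_hash(16692, [349301, 349303]))
```

Changing a single answer ID changes the whole digest, which fits what the debugger showed when $var4 was edited.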

And mate!

It turned out that the idea of brute-forcing the answers was not as bad as it had seemed at first. And although I came back to it in the end, this brute force is of a completely different quality. I will not give the full code and will cover only its main points.

So as not to offend the feelings of Perl believers (I think my miracle script was quite enough for them), I will implement the brute force in C#, which, as we know, will endure anything. I decided to parse the course files with regular expressions. To the critics of this approach I want to say right away: can you do better? Then do it. This way is more convenient for me.

The following data blocks in the course file are of interest:

the hash of lecture numbers and their identifiers
the question lists for each lecture
the answer options for the questions

The first hash can be extracted with a fairly simple regular expression:

 (?<lnum>\d+)\s=>\s(?<lid>\d+)
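For illustration, here is how the same pattern behaves in Python, applied to the %lecture excerpt quoted earlier. This is a sketch, not the author's C# code: Python writes named groups as (?P<name>...) rather than (?<name>...), the helper name is hypothetical, and the pattern is assumed to be run only over the text of the %lecture block.

```python
import re

# Same pattern as above, with Python's (?P<name>...) syntax for named groups.
LECTURE_RE = re.compile(r"(?P<lnum>\d+)\s=>\s(?P<lid>\d+)")


def parse_lectures(source):
    """Map lecture number -> test ID for every 'N => M' pair found in the text."""
    return {int(m.group("lnum")): int(m.group("lid"))
            for m in LECTURE_RE.finditer(source)}


# The excerpt from the article, without the elided tail.
sample = "my %lecture = ( 11 => 830, 7 => 826, 17 => 836, 2 => 821, 1 => 820 );"
lectures = parse_lectures(sample)
print(lectures[11])  # test ID for lecture 11
```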

Next comes extracting the questions from the course. For this I used the following regular expression:

 \['(?<tid>\d+)','\d+'(?:,'\d'){2},"(?<ttext>.*?)","(?<ahash>[a-f0-9]*)","[a-f0-9]*",(?<avars>\[{2}.*?(?:,\]){2})

It may look complicated at first glance, but there is nothing supernatural in it. The named groups contain:

tid - question identifier
ttext - question text
ahash - the correct answer hash
avars - the raw, unparsed answer options
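A small Python sketch shows what the four groups capture. It is an illustration under stated assumptions: the sample record is invented in the shape of the %test excerpt (placeholder texts, no spaces after commas, which is assumed to be how the real course file is laid out), and the helper name is hypothetical.

```python
import re

# The article's pattern with Python-style named groups, split over three lines.
QUESTION_RE = re.compile(
    r'''\['(?P<tid>\d+)','\d+'(?:,'\d'){2},'''
    r'''"(?P<ttext>.*?)","(?P<ahash>[a-f0-9]*)","[a-f0-9]*",'''
    r'''(?P<avars>\[{2}.*?(?:,\]){2})'''
)


def parse_questions(source):
    """Return one dict (tid, ttext, ahash, avars) per question record found."""
    return [m.groupdict() for m in QUESTION_RE.finditer(source)]


# Invented record: IDs and hashes copied from the excerpt, texts are placeholders.
sample = (
    '''['16692','5708','1','1',"Question text",'''
    '''"e9b0b5d2f98bf71489891a48f80cb868","387b3b67d9c1248131c136eae3e6cab9",'''
    '''[['349301','16692','1',"Option A",],['349302','16692','2',"Option B",],],],'''
)

for question in parse_questions(sample):
    print(question["tid"], question["ahash"])
    print(question["avars"])
```

The avars group stops at the first ",]" pair that closes the list of options, so it still has to be split into individual answers.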

The raw answer options are parsed with the following expression:

 '(?<aid>\d+)'.*?"(?<atext>.*?)"

Named groups contain:

aid - answer ID
atext - answer text
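Applied to a raw avars string, the expression yields one (ID, text) pair per option. The Python sketch below is illustrative only: the sample string is invented in the shape of the %test excerpt, the helper name is hypothetical, and answer texts are assumed not to contain double quotes (the lazy match would cut them short).

```python
import re

ANSWER_RE = re.compile(
    r"'(?P<aid>\d+)'.*?"      # first quoted number of an entry: the answer ID
    r'"(?P<atext>.*?)"'       # first double-quoted string after it: the answer text
)


def parse_answers(avars):
    """Return a list of (answer ID, answer text) pairs from a raw avars string."""
    return [(int(m.group("aid")), m.group("atext"))
            for m in ANSWER_RE.finditer(avars)]


# Invented sample with placeholder texts.
sample = '''[['349301','16692','1',"Option A",],['349302','16692','2',"Option B",],]'''
print(parse_answers(sample))
```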

The brute force itself is, in this case, the combinatorial problem of enumerating all subsets of a given set. The subsets were generated from the bits of a binary counter, as the following algorithm shows:

 // Number of subsets of the answer options: 2^N.
 int SetPower = (int)Math.Pow(2, answers.Count);
 // Bit j of i decides whether answer j is in the subset; i = 0 (the empty set) is skipped.
 for (int i = 1; i < SetPower; i++)
 {
     string aStr = "";
     answers.ForEach(x => x.Valid = false);
     for (int j = 0; j < answers.Count; j++)
     {
         if ((i & (1 << j)) != 0)
         {
             aStr += "*a*" + answers[j].AID;
             answers[j].Valid = true;
         }
     }
     string answerString = "asdc*a*" + task.TID + aStr;
     if (GetMD5Hash(answerString) == task.TrueAnswerHash)
     {
         return answers.Where(w => w.Valid == true).ToList();
     }
 }

On each pass, the algorithm forms a subset of the answer options and builds from it the string to be hashed. The resulting hash is compared with the reference one, and if they match, the set of correct answers is returned.
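The same bitmask enumeration can be tried end to end with a self-contained Python sketch. It is not the author's tool: the function names are hypothetical, ASCII encoding is assumed, and the target hash in the demo is computed from a made-up "correct" subset so that the search has something to find.

```python
import hashlib


def md5_hex(text):
    """MD5 hex digest of an ASCII string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def find_correct_answers(question_id, answer_ids, target_hash):
    """Try every non-empty subset of answer_ids (kept in list order) against the hash."""
    n = len(answer_ids)
    for mask in range(1, 1 << n):            # 1 .. 2^n - 1, the empty set is skipped
        chosen = [aid for j, aid in enumerate(answer_ids) if mask & (1 << j)]
        candidate = ("asdc*a*" + str(question_id)
                     + "".join("*a*" + str(aid) for aid in chosen))
        if md5_hex(candidate) == target_hash:
            return chosen
    return None                               # no subset produced the target hash


# Demo with invented data: pretend options 2 and 4 are the correct ones.
ids = [349301, 349302, 349303, 349304]
target = md5_hex("asdc*a*16692*a*349302*a*349304")
print(find_correct_answers(16692, ids, target))
```

With N options the loop computes at most 2^N - 1 hashes, so a question with four options costs no more than 15 MD5 calls.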

Connection terminated ...

And so this story of the struggle for knowledge came to an end. Knowledge is something that is gained only through work, and any knowledge obtained for free (in the broad sense of the word) is worth nothing. I thank the INTUIT LEU for their resource in general and, in particular, for the pleasant moments their magic library er.dll gave me.

Additional materials to the article

Perl script I used for debugging
Console answer brute-forcer

Source: https://habr.com/ru/post/208874/
