
Development → Kazakhstan: How I helped file 100 tax reporting forms. Continued: form 300

*That's no moon. It's a space station.
- Obi-Wan Kenobi*

Greetings, community!


Article 1 → The beginning: form 200


Continued ...


The next step in solving my customer's problems was the VAT returns (form 300). Interestingly, the Taxpayer's Cabinet could export only the small 300 forms as XML. The remaining forms could be exported only through the SONO program, and those came out archived.


But not everything is as simple as it seems at first glance.




The most interesting part is how the programmers at the company that maintains the online tax filing service "encrypted" these same forms ...


Part one. The torment of finding an algorithm for reading large forms.
This is the SONO program. Kazakhstan's accountants use it to file tax returns online.




What I started with.


I exported all the form 300 declarations from SONO.




Then I tried to open the archives. No such luck.




The archiver reported an error: "file is corrupted". After spending 5 hours learning the basics of tar, I realized that I still didn't understand anything ...


Studying open sources, that is, plain googling, did not help either. Then I came across a small discussion on an accounting forum where the admins openly mocked the algorithm used to "encrypt" the large forms.


It turns out that, without thinking twice and without inventing any cryptographic protection, the authors of the large-form "encryption" simply removed the first two characters, "BZ", from the beginning of the file.
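To make the trick concrete: a bzip2 stream normally begins with the ASCII bytes "BZh" followed by a block-size digit, so a file with the first two bytes cut off starts with "h". Below is a minimal Node.js sketch of detecting and undoing that. The function name restoreBz2Header is made up for illustration and is not part of SONO or of the PHP code in this article, and the sketch assumes the missing prefix is the only damage.

```javascript
// Hypothetical helper: put back the two bytes missing from an exported file.
// Assumption: the only damage is the removed "BZ" prefix described above.
function restoreBz2Header(data) {
  // An intact bzip2 stream starts with "BZh"; return it unchanged.
  if (data.length >= 3 && data.subarray(0, 3).toString("ascii") === "BZh") {
    return data;
  }
  return Buffer.concat([Buffer.from("BZ", "ascii"), data]);
}

// "h91AY&SY" is what the start of a bzip2 stream looks like without "BZ".
const stripped = Buffer.from("h91AY&SY", "ascii");
const restored = restoreBz2Header(stripped);
console.log(restored.subarray(0, 4).toString("ascii")); // BZh9
```

Checking for "BZh" first makes the helper safe to call twice on the same data, which is handy when the archive is nested.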


By inserting this mega key at the beginning of the file




I opened the archive. Inside it was another archive, damaged in the same way (apparently the coders who decided to hide the data inside one more archive count the matryoshka as their favorite toy). Wiser from experience, I repeated the mega crack by adding BZ to the beginning of the file




and finally got access to the data.




Just figuring out how to read the information from the archive took 6 hours of my life. “Abildet”, as my uncle says.


The finished PHP function that "decrypts" the large forms:


if ($_POST["action"] == "getBz2") {
    $name = $_FILES["bz2"]["tmp_name"];
    $homepage = file_get_contents($name);
    // Small forms arrive as plain .xml; only the large ones need the "crack".
    if (strripos($_FILES["bz2"]["name"], ".xml") === false) {
        // Put back the two bytes missing from the start of the file.
        file_put_contents($name, "BZ" . $homepage);
        $baseDir = "/tmp/21";
        exec("rm -f " . $baseDir . "/dir/*");
        @mkdir($baseDir, 0777, true);
        // Outer layer: a bzip2-compressed tar.
        exec("tar -jxvf " . escapeshellarg($name) . " -C $baseDir");
        // Remove any .xml so the glob below finds only the fresh result.
        exec("rm -rf $baseDir/*.xml");
        // Inner layer: another .bz2 with the same missing header.
        $files = glob("$baseDir/*.bz2");
        file_put_contents($files[0], "BZ" . file_get_contents($files[0]));
        exec("bunzip2 " . escapeshellarg($files[0]));
        $files = glob("$baseDir/*.xml");
        $homepage = file_get_contents($files[0]);
    }
    echo $homepage;
    die();
}

Of course, all of this can easily be improved, but this is only the part of the PHP code that is responsible for the "crack".


Part two. Reading the XML structure and collecting the data required by the spec


A quick visual inspection of the final XML files showed that the forms originally exported as XML have fno formatVersion = 1, while the mega-encrypted ones have fno formatVersion = 2.
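The two kinds of export can also be told apart programmatically. Here is a small JavaScript sketch. It assumes formatVersion is an attribute on the root fno element, which is how the text above reads. The function name getFormatVersion and the sample strings are made up for illustration and are not real SONO exports.

```javascript
// Hypothetical helper: read the formatVersion attribute of the <fno> element.
// Returns the version as a number, or null when the attribute is not found.
function getFormatVersion(xml) {
  const match = /<fno\b[^>]*\bformatVersion\s*=\s*["'](\d+)["']/.exec(xml);
  return match ? Number(match[1]) : null;
}

// Made-up minimal samples, not real exports.
const smallForm = '<fno formatVersion="1"></fno>';
const largeForm = '<fno formatVersion="2"></fno>';
console.log(getFormatVersion(smallForm)); // 1
console.log(getFormatVersion(largeForm)); // 2
```

A regular expression is enough here because only one attribute is needed; for anything deeper a real XML parser would be the safer choice.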






The main data structure is identical in both versions.


// Read one field of form 300.00 from the document loaded in the frame.
function getTitle(a) {
    try {
        return frame.contentWindow.document.querySelector("form[name='form_300_00'] field[name='" + a + "']").innerHTML;
    } catch (ex) {
        // Missing field: fall back to an empty string.
        return "";
    }
}

var fno = {};
fno["dt"] = {};
fno["dt"]["dt_main"] = getTitle("dt_main");
fno["dt"]["dt_regular"] = getTitle("dt_regular");
fno["dt"]["dt_additional"] = getTitle("dt_additional");
fno["dt"]["dt_notice"] = getTitle("dt_notice");
fno["dt"]["dt_final"] = getTitle("dt_final");
fno["dt"]["notice_date"] = getTitle("notice_date");
fno["dt"]["notice_number"] = getTitle("notice_number");
// getFaktures() is not shown in this excerpt.
fno["p7"] = getFaktures(7);
fno["p8"] = getFaktures(8);
fno["period_year"] = getTitle("period_year");
fno["period_quarter"] = getTitle("period_quarter");
fno["submit_date"] = getTitle("submit_date");
fno["field_300_00_001_A"] = getTitle("field_300_00_001_A");
fno["field_300_00_001_B"] = getTitle("field_300_00_001_B");
fno["field_300_00_013_A"] = getTitle("field_300_00_013_A");
fno["field_300_00_013_B"] = getTitle("field_300_00_013_B");
fno["field_300_00_015"] = getTitle("field_300_00_015");
fno["field_300_00_021"] = getTitle("field_300_00_021");
fno["field_300_00_023"] = getTitle("field_300_00_023");
fno["iin"] = getTitle("iin");
fno["rnn"] = getTitle("rnn");
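Since every flat field in the listing above follows the same pattern, the list could also be driven by an array. This is a sketch only: collectFields and the stub lookup are illustrative names that are not part of the original code, and the lookup is passed in as a parameter so the example runs outside the browser. The value "2016" is a placeholder.

```javascript
// Flat field names taken from the listing above.
const FIELD_NAMES = [
  "period_year", "period_quarter", "submit_date",
  "field_300_00_001_A", "field_300_00_001_B",
  "field_300_00_013_A", "field_300_00_013_B",
  "field_300_00_015", "field_300_00_021", "field_300_00_023",
  "iin", "rnn",
];

// Hypothetical helper: build a flat object by calling getValue(name) per field.
function collectFields(names, getValue) {
  const result = {};
  for (const name of names) {
    result[name] = getValue(name);
  }
  return result;
}

// Stub lookup standing in for getTitle(), which needs the form loaded in a frame.
const stubLookup = (name) => (name === "period_year" ? "2016" : "");
const collected = collectFields(FIELD_NAMES, stubLookup);
console.log(collected.period_year); // 2016
console.log(Object.keys(collected).length); // 12
```

In the browser, getTitle itself could be passed as the second argument, so adding a field becomes a one-word change to the array.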

In principle, I worked out the logical structure of the document in 20 minutes. Without the mega crack, though, everything would have stalled.


Result

List of form 300 declarations:




Loading forms:




Viewing forms:



P.S. The customer was so happy he stopped writing.
P.P.S. For those interested: on the application side, none of this looks "sexy."
P.P.P.S. If anyone wants to try it, I will put everything on GitHub soon. The system is not finished yet, so I would rather not show it for now ...



Source: https://habr.com/ru/post/325310/

