The largest document leak in Internet history: 2.6 terabytes
The International Consortium of Investigative Journalists (ICIJ) has made the “Panama Archive” freely available: the largest database of offshore companies, taken by unidentified persons from the computers of the Panamanian law firm Mossack Fonseca.
An anonymous source (John Doe) handed reporters at the German newspaper Süddeutsche Zeitung 2.6 terabytes of files: spreadsheets, letters, PDF, TIFF and other formats, including old ones that are no longer in use. Realizing the scale of the work, the reporters asked the ICIJ to organize a joint international project.
Millions of scanned images were run through the Tesseract character recognition program on 40 temporary servers in the Amazon cloud. Apache Solr was used to index the text, and Apache Tika was used to process documents in different formats.
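The split between image files that need character recognition and documents whose text can be extracted directly can be sketched roughly as follows. This is only an illustration in plain Python, not ICIJ's actual code: the list of image extensions, the round-robin distribution and the names `needs_ocr`, `partition` and `plan` are all assumptions, and no real Tesseract, Tika or Solr calls are made.

```python
from pathlib import PurePath

# Assumption: extensions treated as scanned images that need OCR.
# (Scanned PDFs would also need OCR; this sketch ignores that case.)
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def needs_ocr(filename: str) -> bool:
    """Return True if the file looks like an image by its extension."""
    return PurePath(filename).suffix.lower() in IMAGE_EXTENSIONS


def partition(files, workers):
    """Distribute files across a fixed number of workers, round-robin."""
    batches = [[] for _ in range(workers)]
    for i, name in enumerate(files):
        batches[i % workers].append(name)
    return batches


def plan(files, ocr_workers=40):
    """Split files into OCR batches (one per worker) and a direct-extraction list."""
    ocr_queue = [f for f in files if needs_ocr(f)]
    direct = [f for f in files if not needs_ocr(f)]
    return {"ocr_batches": partition(ocr_queue, ocr_workers), "direct": direct}
```

With 40 workers each batch would go to one temporary server; the extracted text from both paths would then be passed to the indexer.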
Over the course of a year, the archive was studied by 370 reporters from 80 countries. To make the journalists' work easier, the developers at ICIJ attached a graphical interface from the library software Project Blacklight. To present the information graphically and show links between objects, they had to use the proprietary program Linkurious, and the contents of Mossack Fonseca's relational SQL database were converted to Neo4j format using the Talend tool.
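The idea behind moving relational data into a graph can be shown with a small sketch: rows of entity tables become nodes, and rows of a join table become edges. The tables, column names, labels and the `OFFICER_OF` relationship type below are hypothetical, since the article does not describe the real Mossack Fonseca schema, and the sketch uses plain Python structures rather than Talend or a Neo4j driver.

```python
# Hypothetical relational tables (illustrative data only).
companies = [{"id": 1, "name": "Example Holdings Ltd", "jurisdiction": "BVI"}]
officers = [{"id": 10, "name": "Jane Roe"}]
company_officers = [{"company_id": 1, "officer_id": 10, "role": "director"}]


def to_graph(companies, officers, links):
    """Turn entity rows into nodes and join-table rows into edges."""
    nodes = {}
    for c in companies:
        nodes[("Company", c["id"])] = {
            "name": c["name"],
            "jurisdiction": c["jurisdiction"],
        }
    for o in officers:
        nodes[("Officer", o["id"])] = {"name": o["name"]}

    # Each join row becomes one directed edge carrying its own properties.
    edges = [
        (
            ("Officer", link["officer_id"]),
            "OFFICER_OF",
            ("Company", link["company_id"]),
            {"role": link["role"]},
        )
        for link in links
    ]
    return nodes, edges


nodes, edges = to_graph(companies, officers, company_officers)
```

In this form a question such as "which companies is this person linked to" becomes a traversal along edges instead of a chain of SQL joins, which is what makes a graph viewer practical for this kind of data.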
Now anyone can evaluate the results of the developers' work over the Internet.
The ICIJ database in a structured form is available at:
https://www.occrp.org/en/panamapapers/database.html
You can download a copy to your computer (a 35.7 MB archive).
The database contains information on almost 214,000 offshore firms in 21 offshore jurisdictions.
Interactive client map of offshore companies
The database lists 11,516 firms that belong to 6,285 Russian citizens. Among them are relatives and friends of high-ranking officials. Such a massive leak of documents could lead to a number of high-profile resignations and criminal cases, although offshore firms often operate in a legal gray area without actually breaking the law.
Only a fraction of the information in the 11.5 million files obtained from the computers of the law firm Mossack Fonseca, one of the world's largest creators of shell companies, has been published.
The International Consortium of Investigative Journalists is not publishing everything it has: there are no source documents, no large database of personal information, no company bank accounts, and no contents of e-mail correspondence or financial transactions. This was done so as not to expose the personal data of many people who are not involved in financial crimes.
Only the names of companies, their jurisdictions, postal addresses and names of offshore company managers are made public. The data covers the period from 1977 to 2015.
The Panama Archive database is published under the free Creative Commons Attribution-ShareAlike license. The International Consortium of Investigative Journalists invites the whole community to help study and classify the published information.
UPD.
The first find by the Geektimes community
UPD2.
The second find by the Geektimes community
