Many Skype users on Linux have noticed how inconvenient it is to work with chat history. There is no proper search, messages covering a long period take a very long time to load, and there is no way to export to other formats or clients.
Skype for Linux stores message history in an undocumented binary format. Although enthusiasts have been picking at it for a long time, much remains unknown.
A cursory search for a ready-made tool for exporting message history turned up nothing, so I gathered all the available information and wrote my own.
Skype API
My first thought was to use the Skype API via the D-Bus interface. There is a “SEARCH CHATS” command that looks suitable, but for some reason I could not get it to return all the chats. I suspect it is not intended for this. The Skype API is sufficient for working with current events, but accessing the history requires another solution. Support also writes that export through the API is not planned:
https://jira.skype.com/browse/SPA-596 .
DBB files
So the only way out is to extract messages from the profile files. I do not know why, but for data storage the Skype engineers reinvented a rather strange wheel. Starting with version 4 of the client they came to their senses and switched to SQLite, but on Linux only version 2 is available, and it uses the old format.
The data is stored in the profile folder in nameXXX.dbb files. Each file contains fixed-size records of “power of two” + 8 bytes. A record's size is rounded up to the nearest power of two (at least 256), and the record is written to the corresponding file.
Thus, the chatmsg512.dbb file consists of blocks of 512 + 8 = 520 bytes, which contain records between 256 and 512 bytes in length.
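The naming rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text; the helper names `slot_for` and `block_size` are made up for this example and are not part of any Skype tool.

```python
import re

MIN_SLOT = 256   # smallest power-of-two slot mentioned in the text
OVERHEAD = 8     # extra bytes added to every block


def slot_for(record_len):
    """Round a record length up to the nearest power of two, at least 256."""
    slot = MIN_SLOT
    while slot < record_len:
        slot *= 2
    return slot


def block_size(filename):
    """Derive the on-disk block size from a name such as 'chatmsg512.dbb'."""
    match = re.search(r"(\d+)\.dbb$", filename)
    if match is None:
        raise ValueError("not a nameXXX.dbb file: %r" % filename)
    return int(match.group(1)) + OVERHEAD


print(slot_for(300))                 # 512: a 300-byte record lands in chatmsg512.dbb
print(block_size("chatmsg512.dbb"))  # 520
```

Given a block size, a file can then be read in chunks of exactly that many bytes, one record per chunk.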
I can hardly imagine a task for which such a format would be efficient. Rounding leaves unused gaps, which inflates the files for no good reason. For example, in my database of 21,500 messages, 23.5% of the space was lost to rounding, an average of 73 bytes per message. Given such liberal use of space, the seven-bit encoding of numbers looks a bit odd (see below). Finally, since the messages are spread across different files, they must be combined and sorted before any meaningful use, which does not help speed either.
Record format
A record consists of a header and a body. The header is 17 bytes long:
4 bytes - magic value "l33l"
4 bytes - 32-bit int, record size
4 bytes - 32-bit int, identifier
5 bytes - unknown
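A minimal sketch of reading this header with Python's `struct` module. The text does not say which byte order the integers use, so little-endian is assumed here; `parse_header` and the sample bytes are hypothetical and exist only for illustration.

```python
import struct

# Assumption: the two 32-bit integers are little-endian (not stated in the text).
# Layout: 4-byte magic, record size, identifier, 5 bytes of unknown purpose.
HEADER = struct.Struct("<4sII5s")
MAGIC = b"l33l"


def parse_header(block):
    """Split the 17-byte header off the front of a record block."""
    magic, size, ident, unknown = HEADER.unpack_from(block, 0)
    if magic != MAGIC:
        raise ValueError("bad magic: %r" % magic)
    return {"size": size, "id": ident, "unknown": unknown,
            "body_offset": HEADER.size}


# Hand-built sample block, not data from a real profile.
sample = b"l33l" + struct.pack("<II", 300, 42) + b"\x00" * 5 + b"body"
print(HEADER.size)           # 17
print(parse_header(sample))
```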
The header is followed by fields of three data types: 0x00 (a number in seven-bit encoding), 0x03 (a string), and 0x04 (a block of binary data). After the data-type byte, each field carries its field type, also in seven-bit encoding.
Field types
0x00 - data type (number)
7bit number - field type
7bit number - field value

0x03 - data type (string)
7bit number - field type
null-terminated string - field value

0x04 - data type (blob)
7bit number - field type
7bit number - blob size
binary blob - field value
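Under the layout above, a record body can be walked field by field. The sketch below is an illustration, not the code from any existing tool: the function names are made up, the sample body is hand-built, and the number decoding follows the seven-bit scheme described in the next section. It expects only the bytes that belong to the record, without the padding up to the block size.

```python
def read_7bit(buf, pos):
    """Decode a seven-bit number starting at buf[pos]; return (value, next_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)   # group order as given in the article
        if not byte & 0x80:                    # high bit clear: this was the last byte
            return value, pos


def parse_fields(body):
    """Yield (data_type, field_type, value) for each field in a record body."""
    pos = 0
    while pos < len(body):
        data_type = body[pos]
        pos += 1
        field_type, pos = read_7bit(body, pos)
        if data_type == 0x00:                  # number
            value, pos = read_7bit(body, pos)
        elif data_type == 0x03:                # null-terminated string
            end = body.index(b"\x00", pos)
            value, pos = body[pos:end], end + 1
        elif data_type == 0x04:                # blob with a seven-bit length prefix
            size, pos = read_7bit(body, pos)
            value, pos = body[pos:pos + size], pos + size
        else:
            break                              # unknown data type: stop rather than guess
        yield data_type, field_type, value


# Hand-built body: one number, one string, one blob.
sample = bytes([0x00, 0x05, 0x81, 0x00]) + b"\x03\x07hi\x00" + b"\x04\x09\x02\xaa\xbb"
for field in parse_fields(sample):
    print(field)
```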
Seven-bit variable length coding
In each byte, the most significant bit indicates whether more bytes follow (1: more bytes follow, 0: this is the last byte). The remaining 7 bits carry data. To get the number, concatenate the 7-bit groups in big-endian order.
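As a worked example, here is a sketch of the encoding and its inverse, following the description above (most significant group first). The function names are hypothetical, and the group order is taken from the text as-is; if numbers decoded from real files look implausible, combining the groups in the opposite order is worth trying.

```python
def encode_7bit(value):
    """Encode a non-negative integer, most significant 7-bit group first."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()
    # Every byte except the last gets the "more bytes follow" bit.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_7bit(data):
    """Decode one seven-bit number from the start of a bytes object."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:          # high bit clear: last byte of the number
            return value
    raise ValueError("truncated number")


# 300 = 0b10_0101100: groups 2 and 44, so the bytes are 0x82 0x2c.
print(encode_7bit(300).hex())        # 822c
print(decode_7bit(b"\x82\x2c"))      # 300
```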
A Python script/module that reads DBB files is available on GitHub:
https://github.com/Vayu/skypelog .
The module contains currently known information about the field names of various types of records: SkypeMsg, SkypeAcc, SkypeContact.
Running the script directly exports the contents of the chatmsgXXX.dbb files to JSON or simple HTML:
- JSON is intended for subsequent processing by external programs and saves the entire account history in one file. There are two options: “full” exports all known fields and “compact” exports the minimum set of “date, name, message”.
- HTML creates message history files for each account-contact pair. For example, vasya-petya.html and vasya-masha.html. Unfortunately, the structure of the records for group chats is still not completely clear.
An example of using skypelog.py as a module:
#!/usr/bin/env python
from skypelog import *

# Open one DBB file from the profile folder and print every record in it.
data = SkypeDBB("/home/user/.Skype/account/call256.dbb")
for r in data.records():
    print(r)
A longer example is apiuse.py on GitHub.
Conclusion
As you can see from the example above, skypelog.py greatly simplifies studying the DBB format. So far, field names are known (guessed) for only a few record types:
chatmsgXXX.dbb - chat messages, class SkypeMsg
profileXXX.dbb - accounts, class SkypeAcc
userXXX.dbb - contacts, class SkypeContact
Those who wish are invited to guess the still unknown field values in the following files:
alertXXX.dbb - system messages
chatXXX.dbb - chat list
chatmemberXXX.dbb - chat participants list (?)
transferXXX.dbb - the list of transferred files
callXXX.dbb - call log
callmemberXXX.dbb - call participants list (?)
voicemailXXX.dbb - voice mail
A more detailed discussion of the format (in English):
Neal Krawetz blog - Skype Logs
Neal Krawetz blog - Skype Logs discussion
PS1: Judging by the comments, “SEARCH CHATS” used to work; it may depend on the Skype version. Those interested can try the following code (requires the dbus module for Python):
#!/usr/bin/env python
import sys
import dbus

# Try the system bus first, then fall back to the session bus.
try:
    skype = dbus.SystemBus().get_object('com.Skype.API', '/com/Skype')
except dbus.DBusException:
    try:
        skype = dbus.SessionBus().get_object('com.Skype.API', '/com/Skype')
    except dbus.DBusException:
        print("Can't find Skype API")
        sys.exit(1)

# Register with Skype, negotiate the protocol version, then ask for the chats.
print(skype.Invoke("NAME python"))
print(skype.Invoke("PROTOCOL 9999"))
print(skype.Invoke("SEARCH CHATS"))