Mobile developers from Applidium have worked out the communication protocol used by Siri, so this speech recognition engine can now, in theory, be used from any device, including Android, provided you have a valid iPhone 4S identifier and Apple has not blacklisted it.
The key to Siri is how the program communicates with the server (Siri only works with Internet access). The traffic goes over TCP, port 443, to the server 17.174.4.4. If you connect to https://17.174.4.4/ from a desktop machine, you will see that the server presents a certificate issued to guzzoni.apple.com (Didier Guzzoni of SRI is one of the creators of this technology).
Since the traffic is protected by HTTPS, it cannot be read with a sniffer. The Applidium developers decided that the easiest approach was to set up a fake HTTPS server and a fake DNS server and look at the requests Siri sends. Of course, it is not possible to forge the real guzzoni.apple.com certificate, but you can create your own certificate authority and use it to issue a certificate for guzzoni.apple.com to the fake HTTPS server. Since the iPhone lets you add certificates from arbitrary certificate authorities, this method worked, and Siri then successfully exchanged commands with the developers' own server.
After that, the developers could study the Siri protocol at leisure. It uses unusual methods that do not comply with the HTTP standard. The request header looks like this:
ACE /ace HTTP/1.0
Host: guzzoni.apple.com
User-Agent: Assistant(iPhone/iPhone4,1; iPhone OS/5.0/9A334) Ace/1.0
Content-Length: 2000000000
X-Ace-Host: 4620a9aa-88f4-4ac1-a49d-e2012910921
As you can see, the ACE method is used instead of the usual GET, “/ace” is requested as the URL, and the Content-Length field is set to almost 2 GB. The X-Ace-Host field is somehow tied to the device ID (GUID).
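To make the structure of that header concrete, here is a minimal Python sketch that splits it into a request line and header fields. The function name `parse_ace_request_head` and the sample constant are hypothetical names chosen for illustration; this is not Applidium's code, only a plain text parser applied to the header quoted above.

```python
def parse_ace_request_head(head: str):
    """Split a request head into (method, target, version, headers)."""
    # Accept both CRLF and LF line endings and skip blank lines.
    lines = [ln for ln in head.replace("\r\n", "\n").split("\n") if ln.strip()]
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# The header exactly as quoted in the article.
SAMPLE_HEAD = (
    "ACE /ace HTTP/1.0\r\n"
    "Host: guzzoni.apple.com\r\n"
    "User-Agent: Assistant(iPhone/iPhone4,1; iPhone OS/5.0/9A334) Ace/1.0\r\n"
    "Content-Length: 2000000000\r\n"
    "X-Ace-Host: 4620a9aa-88f4-4ac1-a49d-e2012910921\r\n"
    "\r\n"
)

method, target, version, headers = parse_ace_request_head(SAMPLE_HEAD)
print(method, target, version)
print("declared body size in bytes:", int(headers["content-length"]))
```

A standard HTTP library would likely reject the non-standard ACE method, which is why the sketch parses the text by hand.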
The request body itself is sent in binary form and starts with 0xAACCEE. The developers guessed that what follows is compressed data, and they were right: zlib successfully decompressed the binary stream (starting from the fifth byte, just past the AACCEE header).
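The decompression step can be sketched in Python with the standard `zlib` module. This is a sketch under assumptions: the names `ACE_MAGIC`, `HEADER_LEN`, and `inflate_ace_body` are hypothetical, the offset of 4 bytes reflects the "fifth byte" mentioned above, and the sample body is synthetic rather than captured Siri traffic.

```python
import zlib

ACE_MAGIC = bytes.fromhex("aaccee")
HEADER_LEN = 4  # assumption: the zlib stream starts at the fifth byte of the body


def inflate_ace_body(body: bytes, header_len: int = HEADER_LEN) -> bytes:
    """Check the 0xAACCEE marker and inflate whatever follows the header."""
    if not body.startswith(ACE_MAGIC):
        raise ValueError("body does not start with 0xAACCEE")
    # decompressobj() accepts a stream that is not finished yet, which suits
    # a long-lived connection where more compressed data keeps arriving.
    inflater = zlib.decompressobj()
    return inflater.decompress(body[header_len:])


# Synthetic body for illustration only; the fourth header byte is a placeholder.
payload = b"example payload, not real Siri traffic"
body = ACE_MAGIC + b"\x00" + zlib.compress(payload)
restored = inflate_ace_body(body)
print(restored)
```

For a real capture, the same decompressor object would need to be kept alive and fed each new piece of the body as it arrives, since the data is one continuous stream.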
The unpacked data was binary again, but with fragments of text in it. Among the individual words, bplist00 caught the developers' attention: it is the signature of a binary plist. After studying the data in more detail, the developers identified several kinds of fragments:
- Fragments starting with 0x020000xxxx are “plist” packets; xxxx is the size of the binary plist data that follows the header.
- Fragments starting with 0x030000xxxx are “ping” packets that the iPhone sends to the Siri server to keep the connection alive; xxxx is the ping sequence number.
- Fragments starting with 0x040000xxxx are “pong” packets that the Siri server sends to the iPhone in response to pings; as you might guess, xxxx is the sequence number of the packet.
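The three fragment types listed above can be walked with a small Python loop. This is a sketch, not the Applidium implementation: it assumes each fragment has a 5-byte header made of one type byte followed by a 4-byte big-endian value (the size for plist packets, the sequence number for ping and pong), and that ping and pong carry no payload. The names `KINDS`, `HEADER`, and `iter_chunks` are hypothetical.

```python
import struct
from typing import Iterator, Tuple

KINDS = {0x02: "plist", 0x03: "ping", 0x04: "pong"}
# Assumed layout: 1 type byte + 4-byte big-endian value, 5 bytes in total.
HEADER = struct.Struct(">BI")


def iter_chunks(data: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (kind, value, payload) for each fragment in the inflated data."""
    offset = 0
    while offset + HEADER.size <= len(data):
        type_byte, value = HEADER.unpack_from(data, offset)
        kind = KINDS.get(type_byte)
        if kind is None:
            raise ValueError(f"unknown fragment type 0x{type_byte:02x} at offset {offset}")
        offset += HEADER.size
        payload = b""
        if kind == "plist":
            # For plist packets the value is the length of the data that follows.
            if offset + value > len(data):
                raise ValueError("truncated plist fragment")
            payload = data[offset:offset + value]
            offset += value
        yield kind, value, payload


# Synthetic stream: ping #1, a 5-byte "plist" placeholder, pong #1.
stream = (
    bytes.fromhex("0300000001")
    + bytes.fromhex("0200000005") + b"hello"
    + bytes.fromhex("0400000001")
)
chunks = list(iter_chunks(stream))
print(chunks)
```

Raising on an unknown type byte is a deliberate choice for a reverse-engineering tool: an unrecognized fragment is more useful as a loud error than as silently skipped data.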
Decoding the binary plist content is not difficult; you can do it yourself on Mac OS X with the plutil command-line tool, or on other platforms with CFPropertyList.
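As one more option besides the two tools named above, Python's standard `plistlib` module can also read binary plists. The sketch below builds a small binary plist and decodes it again; the dictionary keys are invented for illustration and do not come from real Siri traffic.

```python
import plistlib

# Hypothetical content; these keys are made up and are not part of the Siri protocol.
sample = {"example_key": "example value", "count": 3}

# Serialize to the binary plist format, the one that begins with "bplist00".
blob = plistlib.dumps(sample, fmt=plistlib.FMT_BINARY)
print(blob[:8])

# loads() detects the binary format on its own and returns ordinary Python objects.
decoded = plistlib.loads(blob)
print(decoded)
```

The payload of a “plist” fragment extracted from the inflated stream could be passed to `plistlib.loads` in the same way.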
The developers found that Siri sends audio compressed with the Speex codec to the server and includes the iPhone 4S device identifier everywhere. The program and the server exchange a huge amount of information on the slightest occasion. For example, while speech recognition is running, Apple's servers send timestamps and a confidence score for each word.
To experiment with the Siri protocol yourself, you can use the toolkit created by the Applidium programmers.