I did some packet sniffing to look at how Google Talk works.
1. Google Talk is Jabber (XMPP) compliant. Here is what it sends for authentication:
<stream:stream to="gmail.com" version="1.0" xmlns:stream="http://etherx.jabber.org/streams" xmlns="jabber:client"> <stream:features> <starttls xmlns="urn:ietf:params:xml:ns:xmpp-tls"/> <mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl"> <mechanism>X-GOOGLE-TOKEN</mechanism> </mechanisms> </stream:features> <auth xmlns="urn:ietf:params:xml:ns:xmpp-sasl" mechanism="X-GOOGLE-TOKEN">(some hash, presumably the Google token)</auth>
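To make the capture above easier to read, here is a minimal Python sketch that pulls the advertised SASL mechanisms and the STARTTLS offer out of a `<stream:features>` element. It only uses the standard library; the `FEATURES` string is a hand-copied fragment of the capture, and the helper names `offered_mechanisms` and `offers_starttls` are made up for illustration, not part of any Google or Jabber library.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in the captured stream.
SASL_NS = "urn:ietf:params:xml:ns:xmpp-sasl"
TLS_NS = "urn:ietf:params:xml:ns:xmpp-tls"

# Fragment of the capture; the stream prefix is declared here so the
# fragment can be parsed on its own (an assumption for this sketch).
FEATURES = """
<stream:features xmlns:stream="http://etherx.jabber.org/streams">
  <starttls xmlns="urn:ietf:params:xml:ns:xmpp-tls"/>
  <mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl">
    <mechanism>X-GOOGLE-TOKEN</mechanism>
  </mechanisms>
</stream:features>
""".strip()


def offered_mechanisms(features_xml):
    """Return the SASL mechanism names listed in a features element."""
    root = ET.fromstring(features_xml)
    return [m.text for m in root.iter(f"{{{SASL_NS}}}mechanism")]


def offers_starttls(features_xml):
    """Return True if the features element advertises STARTTLS."""
    root = ET.fromstring(features_xml)
    return root.find(f"{{{TLS_NS}}}starttls") is not None


mechanisms = offered_mechanisms(FEATURES)
tls_offered = offers_starttls(FEATURES)
print(mechanisms, tls_offered)
```

In this capture the only mechanism on offer is X-GOOGLE-TOKEN, which is why a generic Jabber client would need to know about that mechanism to log in this way.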
This is followed by the standard presence information and so on. Interestingly, the presence exchange indicates that Google Talk assumes everyone in your Gmail address book is your friend…Hmm.
IMs are also standards-compliant :-)
<message to="xxxx@gmail.com" type="chat"> <body>testing now..pls accept then we hang up =)</body> <active xmlns="http://jabber.org/protocol/chatstates"/> </message>
Note: talk.google.com does not seem to support s2s (server-to-server). This means you can't use the Gmail Jabber server to talk to friends who are on other Jabber systems :P
2. Now let's get to the juicy part: how voice is handled. It started with this:
<iq to="xxxx@gmail.com/Talk.v64E5E900DB" type="set" id="17"> <session xmlns="http://www.google.com/session" type="initiate" id="357570531" initiator="james.seng@gmail.com/Talk.v64E5E917A2"> <description xmlns="http://www.google.com/session/phone"> <payload-type xmlns="http://www.google.com/session/phone" id="103" name="ISAC"/> <payload-type xmlns="http://www.google.com/session/phone" id="97" name="IPCMWB"/> <payload-type xmlns="http://www.google.com/session/phone" id="102" name="iLBC"/> <payload-type xmlns="http://www.google.com/session/phone" id="4" name="G723"/> <payload-type xmlns="http://www.google.com/session/phone" id="100" name="EG711U"/> <payload-type xmlns="http://www.google.com/session/phone" id="101" name="EG711A"/> <payload-type xmlns="http://www.google.com/session/phone" id="0" name="PCMU"/> <payload-type xmlns="http://www.google.com/session/phone" id="8" name="PCMA"/> <payload-type xmlns="http://www.google.com/session/phone" id="13" name="CN"/> </description></session></iq>
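For anyone who wants to poke at this stanza programmatically, the following is a small sketch that extracts the payload types from the `<description>` element using Python's standard library. The `DESCRIPTION` string is a re-indented copy of the capture (the namespace is declared once on the parent instead of on every child, which is equivalent XML), and `payload_types` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by the captured session description.
PHONE_NS = "http://www.google.com/session/phone"

# Copy of the <description> element from the capture above.
DESCRIPTION = """
<description xmlns="http://www.google.com/session/phone">
  <payload-type id="103" name="ISAC"/>
  <payload-type id="97" name="IPCMWB"/>
  <payload-type id="102" name="iLBC"/>
  <payload-type id="4" name="G723"/>
  <payload-type id="100" name="EG711U"/>
  <payload-type id="101" name="EG711A"/>
  <payload-type id="0" name="PCMU"/>
  <payload-type id="8" name="PCMA"/>
  <payload-type id="13" name="CN"/>
</description>
""".strip()


def payload_types(description_xml):
    """Return (payload id, codec name) pairs in the order they were offered."""
    root = ET.fromstring(description_xml)
    return [
        (int(pt.get("id")), pt.get("name"))
        for pt in root.iter(f"{{{PHONE_NS}}}payload-type")
    ]


codecs = payload_types(DESCRIPTION)
for payload_id, name in codecs:
    print(payload_id, name)
```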
Okay, so confirmed: Google Talk is not using SIP *sigh*. Google is using its own proprietary extension to XMPP, and the above is an exchange of the clients' codec capabilities, all of which are documented on GT’s developer site except for “CN”, which I am not sure about.
From the subsequent exchange, it looks like the clients do the following things:
1. Negotiate the codec to use.
2. Set up a one-time ID and password for the RTP stream.
3. Via some means, determine the appropriate STUN server to use.
4. Proceed to set up RTP between the two clients for the voice call.
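The capture does not reveal the exact rule the clients use in step 1, so the following is only a guess at what such a negotiation could look like: walk the initiator's offer in order and take the first codec the other side also supports. The function name `choose_codec` and the responder's supported set are hypothetical; only the offer list comes from the captured stanza.

```python
# Offer as seen in the captured session initiate, in the order it was sent.
OFFERED = [
    (103, "ISAC"), (97, "IPCMWB"), (102, "iLBC"), (4, "G723"),
    (100, "EG711U"), (101, "EG711A"), (0, "PCMU"), (8, "PCMA"), (13, "CN"),
]


def choose_codec(offered, supported_names):
    """Pick the first offered codec whose name the responder supports.

    This "first common codec" rule is an assumption for illustration,
    not something confirmed by the packet capture.
    """
    for payload_id, name in offered:
        if name in supported_names:
            return payload_id, name
    return None  # No codec in common; the call could not proceed.


# Hypothetical responder that only knows iLBC and plain G.711 u-law.
chosen = choose_codec(OFFERED, {"iLBC", "PCMU"})
nothing = choose_codec(OFFERED, {"SPEEX"})
print(chosen, nothing)
```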
(3) needs some elaboration, because I found out that the STUN server specified does not belong to Google; the IP address belongs to someone in Taiwan, likely another Google Talk user. Further investigation shows that Google Talk apparently comes with a STUN server. In other words, like Skype, Google Talk turns every client into a possible server to help relay voice calls between two users. A very smart thing to do, technically speaking, but let me go read the Google Talk UAT again. (Hmm…What happened to the ‘Don’t be Evil’ plan?)
Here is the raw XML if you are interested.