BitTorrent's decision to shift its communications from TCP to UDP seems esoteric, but it's been widely criticised as a threat to the bandwidth of other Internet users.
So why is what looks like such a small change such a big deal? To understand this, we have to delve a little into how different parts of the Internet protocol suite handle traffic.
The Pessimistic Protocol
When your computer needs to communicate over the Internet, it has to know something about the application's requirements: does the application need speed or reliability?
If reliability is important, it will use the Transmission Control Protocol (the “TCP” in TCP/IP).
In TCP, the two machines set up a communication session over their Internet connection with the following components:
- Handshaking – each end of the communication makes sure the other end exists and can handle a session;
- Acknowledgment – when your computer sends a packet to the remote host, it receives an acknowledgment that the packet has been received;
- Timeout – the two ends assume that they might be on an unreliable connection, so the sender sets a time limit for each acknowledgment. If it waits too long without receiving one, it assumes the packet wasn't received, and retransmits it.
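The acknowledgment and timeout rules above can be sketched as a small simulation. This is a minimal sketch under simplifying assumptions, not TCP itself: it uses a "stop-and-wait" scheme (one packet in flight at a time) rather than TCP's sliding window, skips the handshake, and models the lossy network with a seeded random number generator. The function name send_reliably and its parameters are hypothetical.

```python
import random

def send_reliably(packets, loss_rate, seed=1, max_tries=20):
    """Simulate acknowledgment, timeout and retransmission (stop-and-wait).

    Each transmission is lost with probability loss_rate, and so is each
    acknowledgment. If the sender gets no acknowledgment, its timer expires
    and it retransmits. Returns (data received in order, transmissions used).
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    received = []                      # receiver's copy, one entry per sequence number
    transmissions = 0
    for seq, data in enumerate(packets):
        for _ in range(max_tries):
            transmissions += 1
            packet_arrived = rng.random() >= loss_rate
            if packet_arrived and len(received) == seq:
                received.append(data)  # first copy of this packet: keep it
            # (a duplicate caused by a lost acknowledgment is simply ignored)
            ack_arrived = packet_arrived and rng.random() >= loss_rate
            if ack_arrived:
                break                  # acknowledged before the timeout: next packet
            # no acknowledgment: the timeout expires and we loop to retransmit
        else:
            # too many failures: give up, as TCP eventually drops a hopeless session
            raise ConnectionError(f"no acknowledgment for packet {seq}")
    return received, transmissions
```

With loss_rate=0.2 every chunk still arrives, in order, at the cost of extra transmissions; with loss_rate=1.0 the sender gives up and raises ConnectionError.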
This is very good for the reliability of a connection. Imagine, for example, that you're trying to send an e-mail with a large document attached. If the document isn't complete, Microsoft Word probably won't be able to open it. TCP ensures that in spite of the occasional lost packet in the network, both ends of the communication know what's going on: they either maintain the session until all of the data has passed over the network or, in the face of insurmountable problems in the network, drop the session so that the application can try again later.
All of this reflects one of the fundamental assumptions of the Internet. Since the systems connected to the Internet are intelligent, the network doesn't have to be. TCP is an end-to-end model: the two machines connected to the network are directly aware of each other, and can therefore handle the unreliability of the networks that existed when the Internet was first designed.
And it's ideal for file-oriented communications, where getting all of the data is more important than how quickly you get the data, and where you're sending a relatively small number of large packets.
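TCP, however, has a couple of problems. First, every packet has to be acknowledged, so if you're transmitting lots of small packets, the overhead is comparatively large. Second, because TCP delivers data in order and retransmits anything that goes missing, a single lost packet holds up everything behind it until the retransmission arrives. That makes TCP a poor fit for “real time” communications, where late data is as useless as lost data.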
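To put rough numbers on the overhead point, the sketch below compares the share of bytes on the wire that are headers rather than data. It assumes minimum-size IPv4 (20-byte), TCP (20-byte) and UDP (8-byte) headers with no options, and it assumes one bare acknowledgment packet per TCP data packet; real TCP stacks often acknowledge every second packet, so treat the TCP figures as an upper-end illustration.

```python
IPV4_HEADER = 20  # bytes: minimum IPv4 header, no options
TCP_HEADER = 20   # bytes: minimum TCP header, no options
UDP_HEADER = 8    # bytes: the UDP header is always 8 bytes

def tcp_overhead(payload, acks_per_packet=1.0):
    """Fraction of bytes sent that are not payload, counting the ACKs coming back.

    acks_per_packet is an assumption: 1.0 means one bare ACK per data packet,
    0.5 would model a receiver that acknowledges every second packet.
    """
    data_packet = IPV4_HEADER + TCP_HEADER + payload
    ack_packet = IPV4_HEADER + TCP_HEADER          # an ACK carries no data
    total = data_packet + acks_per_packet * ack_packet
    return (total - payload) / total

def udp_overhead(payload):
    """Fraction of bytes sent that are not payload; UDP sends no ACKs at all."""
    total = IPV4_HEADER + UDP_HEADER + payload
    return (total - payload) / total

for payload in (160, 1460):  # a small voice-sized packet vs a full-size data packet
    print(f"{payload:>5}-byte payload: TCP {tcp_overhead(payload):.1%}, "
          f"UDP {udp_overhead(payload):.1%}")
```

Under these assumptions, a 160-byte payload (20 ms of 64 kbit/s telephone audio) spends about a third of the traffic on headers and acknowledgments over TCP, against about 15% over UDP; at 1,460 bytes the gap shrinks to roughly 5% versus 2%.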
As a result, real-time applications like VoIP calls and Internet videoconferencing don't use TCP. They use a different protocol, UDP – the User Datagram Protocol.
The Optimistic Protocol
UDP changes two of TCP's assumptions about the network and communications. First, it assumes that the network is reliable enough that it can strip away some of TCP's back-and-forth error correction; second, it assumes that the communications can tolerate some loss without catastrophe. If you lose part of a file – or, for that matter, part of a secured banking transaction – then you have a problem. If, on the other hand, you lose a tenth of a second of a telephone conversation, then you may not even notice.
So, like taking the back seats out of a car so it can go faster, UDP strips away the “session-based” aspects of TCP. The application takes the data, puts it into an addressed packet, hands it off to the network, and thereafter pays no further attention to what's happening.
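In code, the fire-and-forget behaviour described above looks like the minimal Python sketch below, which uses the standard socket module. The function name send_datagram is hypothetical; the point is that sendto returns as soon as the datagram has been handed to the local network stack, with no handshake beforehand and no acknowledgment afterwards.

```python
import socket

def send_datagram(data: bytes, addr: tuple[str, int]) -> int:
    """Hand one datagram to the network and return immediately.

    There is no handshake, no acknowledgment and no retransmission: the
    return value only says how many bytes were passed to the local stack,
    not whether anything arrived.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(data, addr)
```

The return value is only the number of bytes handed over; it says nothing about whether anyone received them. Sending to a port where nothing is listening typically returns just as normally.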
If the network loses a packet on the way, UDP doesn't have to care. It assumes that someone else – for example, the application or the user – can decide whether or not the lost packet was important, or if the network is performing well enough to handle the communication.
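UDP leaves loss detection to the application. One common approach, used for example by RTP in voice and video applications, is for the sender to number each datagram so that the receiver can spot gaps. The sketch below is a hypothetical, simplified receiver (the class name and its counters are illustrative, not from any particular library); a real-time application would typically just skip a missing packet rather than ask for it again.

```python
class LossTracker:
    """Receiver-side bookkeeping an application might layer on top of UDP.

    Assumes the sender numbers its datagrams 0, 1, 2, ... UDP itself
    carries no sequence numbers and reports no losses.
    """

    def __init__(self):
        self.expected = 0  # next sequence number we are waiting for
        self.lost = 0      # datagrams skipped over (never arrived in time)
        self.late = 0      # datagrams that turned up after we had moved on

    def on_datagram(self, seq: int) -> str:
        if seq == self.expected:            # the normal case
            self.expected += 1
            return "ok"
        if seq > self.expected:             # a gap: count the missing ones, move on
            self.lost += seq - self.expected
            self.expected = seq + 1
            return "gap"
        self.late += 1                      # too late to play; a real-time app drops it
        return "late"

    def loss_rate(self) -> float:
        return self.lost / self.expected if self.expected else 0.0
```

Feeding it the sequence 0, 1, 2, 5, 6, 3 reports a gap at 5 (datagrams 3 and 4 missing) and then treats the straggling 3 as late. Whether a loss rate of 2 in 7 is acceptable is a decision for the application, not for UDP.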
This makes UDP the protocol of choice for real-time streaming applications. The network is usually good enough to deliver almost all of the packets, and when an application sends lots of small packets, UDP's lower overhead (no acknowledgments) makes it more efficient and gives it lower latency.
The end user, of course, doesn't even know that a choice has been made. It's the software writer who decides whether an application is going to use TCP or UDP for its communication sessions.
Hence in creating an application such as BitTorrent, which deals with large files and non-real-time data, the designers originally worked with TCP.
Enter the Traffic Managers
When applications like BitTorrent started to dominate the Internet, ISPs started to worry, particularly in America, where Internet access is generally sold without download limits. Whether it's distributing legal files (like Linux distributions and software) or illegal files (like unlicensed copies of music), BitTorrent is a bandwidth hog.
With their pipes allegedly filling up with P2P traffic, US ISPs triggered the 'net neutrality' debate by looking for ways to throttle back that traffic. Whether this reflected a wish to share capacity fairly among users (the pro-throttling stance) or to interfere with users' control over their communications (the neutrality stance), the outcome is that with the right technology in place, the ISP can slow down P2P traffic such as BitTorrent.
The list of traffic management technologies is too long to recite here, but one relatively simple technique is worth discussing: the TCP reset. Put simply, this involves the ISP looking for connections that have the characteristics of P2P traffic (for example, a very long session between two users that carries lots of large packets). When it detects such a session, the ISP can send a forged “reset” packet, made to look as if it came from the other end, to one (or both) of the computers in that session. A reset is the signal a computer's own TCP software sends when it needs to abort a connection abruptly.
The application then believes that something has gone wrong with the communication; it will first attempt to restart the session and, if the resets keep coming, eventually give up.
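The detection step of the reset technique needs only information found in packet headers: who is talking to whom, for how long, and how much. The Python sketch below illustrates that kind of heuristic. The thresholds and the Flow record are hypothetical, chosen only to match the example of a long session carrying lots of large packets; real traffic-management products use their own rules, and the sketch does not forge any reset packets.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    """Per-session counters an ISP can keep from packet headers alone."""
    src: str
    dst: str
    duration_s: float
    packets: int
    total_bytes: int

# Hypothetical thresholds, for illustration only.
MIN_DURATION_S = 600     # "a very long session": ten minutes or more
MIN_PACKETS = 10_000     # "lots of ... packets"
MIN_AVG_BYTES = 1_000    # "... large packets": average size of 1,000 bytes or more

def looks_like_p2p(flow: Flow) -> bool:
    avg = flow.total_bytes / flow.packets if flow.packets else 0.0
    return (flow.duration_s >= MIN_DURATION_S
            and flow.packets >= MIN_PACKETS
            and avg >= MIN_AVG_BYTES)

def sessions_to_reset(flows):
    """The sessions a reset-based traffic manager would target."""
    return [f for f in flows if looks_like_p2p(f)]
```

A short web download fails every test, while an hour-long transfer averaging 1,400-byte packets is flagged. The rules never look at what the packets contain, which is what keeps this approach cheap.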
Another simple and widely-used traffic management technique is “random packet discard”. The ISP can simply drop random packets, triggering retransmissions from the hosts. This doesn't much upset the applications, since TCP detects the loss and retransmits the missing packet. It does, however, slow the traffic down: TCP treats packet loss as a sign of congestion and reduces its sending rate in response.
The reset trick depends on the application using TCP, because UDP has no sessions and therefore no “reset” to forge.
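Why random discard slows TCP traffic but not UDP traffic can be illustrated with a toy model. The sketch below assumes a much-simplified version of TCP congestion control, additive increase, multiplicative decrease (AIMD): each round trip the sender transmits a window of packets, grows the window by one if nothing was dropped, and halves it if anything was. Real TCP adds slow start, fast retransmit and more. The UDP sender in the model simply keeps its pace. The function names, the 64-packet cap and the round count are all assumptions chosen for illustration.

```python
import random

def tcp_like_rate(drop_prob, rounds=200, max_window=64, seed=7):
    """Average packets sent per round trip by a simplified AIMD sender."""
    rng = random.Random(seed)
    window, sent = 1, 0
    for _ in range(rounds):
        sent += window
        # The ISP drops each packet independently with probability drop_prob.
        dropped = any(rng.random() < drop_prob for _ in range(window))
        if dropped:
            window = max(1, window // 2)            # multiplicative decrease
        else:
            window = min(max_window, window + 1)    # additive increase, capped
    return sent / rounds

def udp_like_rate(drop_prob, rate=64, rounds=200, seed=7):
    """A plain UDP sender keeps sending at `rate` however many packets vanish.
    Returns (packets sent per round, packets delivered per round)."""
    rng = random.Random(seed)
    delivered = sum(1 for _ in range(rounds * rate) if rng.random() >= drop_prob)
    return rate, delivered / rounds

for p in (0.0, 0.01, 0.05):
    sent, delivered = udp_like_rate(p)
    print(f"drop {p:.0%}: TCP-like sends {tcp_like_rate(p):.1f}/round, "
          f"UDP-like sends {sent}/round ({delivered:.1f} delivered)")
```

With no discard the TCP-like sender climbs to its cap; with a few percent of packets dropped it spends most of its time well below that, while the UDP-like sender carries on at full speed and simply loses whatever is dropped.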
That makes it harder and much more intrusive to throttle UDP traffic. To interfere with UDP traffic, the ISP has to look inside the packet (so-called 'deep packet inspection'), and this is a public relations minefield. Deep packet inspection means the ISP is looking at the content of the traffic (as with filtering), but without the sanction of a government requirement to look for illegal content. It may even be that in some jurisdictions, such behaviour by an ISP breaks wire-tap laws.
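By contrast with watching session characteristics, picking BitTorrent out of a stream of UDP datagrams means reading the payload. A deliberately simplified sketch follows. It relies on one real detail, that the BitTorrent peer-wire handshake begins with the byte 19 followed by the text "BitTorrent protocol", and assumes that finding this string anywhere in a datagram is good enough; real deep packet inspection uses many signatures and statistical tests, and a client that encrypts its traffic would slip straight past this one. The function names are hypothetical.

```python
# The BitTorrent peer-wire handshake starts with a length byte (19) and the
# 19-character string "BitTorrent protocol".
BT_HANDSHAKE = bytes([19]) + b"BitTorrent protocol"

def inspect_datagram(payload: bytes) -> bool:
    """Simplified deep packet inspection: decide from the packet's contents.

    Unlike session tracking, this has to touch the bytes of every datagram.
    """
    return BT_HANDSHAKE in payload

def count_flagged(datagrams):
    """How many datagrams in a batch match the (simplified) signature."""
    return sum(1 for d in datagrams if inspect_datagram(d))
```

Every datagram has to be scanned, whether or not it turns out to be BitTorrent, which is where the cost of this approach comes from.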
And deep packet inspection is more expensive than the simpler business of watching TCP sessions. The difference is somewhat like sitting beside a highway: it's easier to count the number of semi-trailers that pass by than it is to count the number of occupants in all the cars that pass. Where there might be hundreds of thousands of sessions to watch, there will be millions upon millions of packets – which demands big iron and sophisticated software if the packet inspection is to survive the onslaught of traffic without itself becoming the bottleneck that the ISP is trying to avoid.
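The scale problem can be made concrete with some arithmetic. Assuming a fully loaded 10 Gbit/s link and packets averaging either 1,500 bytes (the usual Ethernet maximum) or 500 bytes (an assumed mix with smaller packets), anything that inspects every packet has to process hundreds of thousands to millions of packets each second. The function names are illustrative.

```python
def packets_per_second(link_bps: float, avg_packet_bytes: float) -> float:
    """Packet rate of a fully loaded link with the given average packet size."""
    return link_bps / (avg_packet_bytes * 8)

def per_packet_budget_ns(link_bps: float, avg_packet_bytes: float) -> float:
    """Time available to inspect each packet before falling behind, in nanoseconds."""
    return 1e9 / packets_per_second(link_bps, avg_packet_bytes)

LINK_BPS = 10e9  # assumed: a 10 Gbit/s link running flat out
for size in (1500, 500):
    pps = packets_per_second(LINK_BPS, size)
    print(f"{size}-byte packets: {pps:,.0f} packets/s, "
          f"{per_packet_budget_ns(LINK_BPS, size):,.0f} ns per packet")
```

That works out to about 833,000 packets per second (1.2 µs each) at 1,500 bytes and 2.5 million per second (400 ns each) at 500 bytes, for every second the link is busy.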
The BitTorrent designers have spotted this and have decided that instead of using the file-oriented (or rather stateful) TCP, they should make a small change to the application, and use UDP by default – as a way to defeat ISPs throttling BitTorrent traffic.
The End as We Know It?
Whatever the eventual outcome, it's pretty clear that the BitTorrent designers have approached Internet protocols with something of a sense of entitlement. Even though TCP should be the preferred protocol for a non-real-time, large file transfer application (from the point of view of traffic fairness), UDP is much harder and costlier to throttle – so they've decided that UDP is better for their purposes than TCP, and made it the new default protocol selection.
And, because most users – even P2P users – are largely ignorant of how the Internet works and simply accept the default, this will very quickly change the nature of BitTorrent traffic: it will travel over the same protocol as VoIP and video.
But why would this matter? If an ISP has a 10 Gbps backbone, and 5 Gbps is BitTorrent, why would the choice of protocol make a difference?
The problem goes back to the difference between TCP and UDP. Since the TCP reset approach doesn't work on UDP traffic, the only cheap and simple throttling mechanism available is random packet discard – but if that starts happening to VoIP or video users on a large scale, they will most certainly notice. If, on the other hand, ISPs decide that they need deep packet inspection to deal with P2P over UDP, it will probably impact everybody's speed, and will bring both the net neutrality and Internet filtering debates to a boil.