Overview
"Because, sometimes, the Internet doesn't quite work..."
The MIT RON (Resilient Overlay Networks) project is a DARPA-funded effort to improve the robustness and availability of Internet paths between hosts by an order of magnitude over today's wide-area Internet routing infrastructure. The key design goal in RON is to develop techniques that allow end-hosts and applications to cooperatively gain improved reliability and performance from the Internet. In brief, RON nodes monitor the condition of the Internet paths between themselves and the other nodes and, based on what they observe, decide whether to send packets directly to other nodes or indirectly via other RON nodes. In this way, a group of cooperating systems can mutually provide a more available and better-performing routing service than vanilla Internet routing can provide.

RON is an architecture that allows a small group of distributed Internet applications to detect and recover from path outages and periods of degraded performance within several seconds, improving over today's wide-area routing protocols that take at least several minutes to recover. A RON is an application-layer overlay on top of the existing Internet routing substrate. The RON nodes monitor the functioning and quality of the Internet paths among themselves, and use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics.
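The direct-versus-indirect decision described above can be sketched in a few lines. This is a minimal illustration, not RON's implementation. It assumes that each node holds a table of latency and loss estimates for every virtual link, considers only paths with at most one intermediate RON node, combines per-hop loss rates as if losses were independent, and takes the application-specific metric as a function argument. The names `LinkStats`, `compose`, and `best_path` and the sample measurements are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class LinkStats:
    latency_ms: float  # smoothed latency estimate for one virtual link
    loss: float        # estimated packet loss probability, in [0, 1]


# Hypothetical measurement table: stats[(a, b)] is the latest estimate for a -> b.
# A missing entry means the link is currently considered down.
Table = Dict[Tuple[str, str], LinkStats]


def compose(first: LinkStats, second: LinkStats) -> LinkStats:
    """Combine two hops: latencies add; losses combine assuming independence."""
    return LinkStats(
        latency_ms=first.latency_ms + second.latency_ms,
        loss=1.0 - (1.0 - first.loss) * (1.0 - second.loss),
    )


def best_path(
    src: str,
    dst: str,
    nodes: Iterable[str],
    stats: Table,
    metric: Callable[[LinkStats], float],
) -> Tuple[Optional[str], Optional[LinkStats]]:
    """Pick the direct path or a one-hop detour, whichever scores lower.

    Returns (intermediate, path_stats). intermediate is None for the direct
    path; path_stats is None if no usable path is known at all.
    """
    best_via: Optional[str] = None
    best = stats.get((src, dst))
    best_score = metric(best) if best is not None else math.inf
    for via in nodes:
        if via in (src, dst):
            continue
        first, second = stats.get((src, via)), stats.get((via, dst))
        if first is None or second is None:
            continue  # one of the two hops is down or unmeasured
        candidate = compose(first, second)
        score = metric(candidate)
        # Strict comparison: the direct path wins ties.
        if score < best_score:
            best_via, best, best_score = via, candidate, score
    return best_via, best


# Hypothetical measurements among four nodes.
example = {
    ("A", "B"): LinkStats(120.0, 0.10),  # slow, lossy direct path
    ("A", "C"): LinkStats(30.0, 0.00),
    ("C", "B"): LinkStats(40.0, 0.01),
    ("A", "D"): LinkStats(200.0, 0.00),
    ("D", "B"): LinkStats(10.0, 0.00),
}
by_latency = best_path("A", "B", "ABCD", example, lambda s: s.latency_ms)  # detours via "C"
by_loss = best_path("A", "B", "ABCD", example, lambda s: s.loss)           # detours via "D"
```

Passing the metric in as a function mirrors the idea that a RON optimizes application-specific metrics: a latency-sensitive application and a loss-sensitive one can pick different intermediates from the same measurement table.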
The RON project has several components, including:
- Overlay configuration and maintenance
- Probing and outage detection
- Routing around outages and performance failures
- Application-controlled routing
- Policy routing
- Multi-path routing; QoS routing
- Data forwarding
- API and RON libraries
- Applications (e.g., resilient VPN, resilient conferencing, etc.)
- Data analysis and understanding wide-area routing and fault-tolerance behavior; BGP interactions
- Simulations of RON behavior
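Probing and outage detection, the input to the routing decision, can be illustrated with a small per-link estimator. This sketch assumes a node probes each virtual link periodically, keeps a sliding window of recent probe outcomes to estimate the loss rate, smooths latency with an exponentially weighted moving average, and declares an outage after a run of consecutive lost probes. The class name `LinkMonitor` and the window size, smoothing weight, and outage threshold are hypothetical parameters, not values taken from RON.

```python
from collections import deque
from typing import Deque, Optional


class LinkMonitor:
    """Hypothetical per-link probe bookkeeping for one virtual link."""

    def __init__(self, window: int = 100, alpha: float = 0.1, outage_after: int = 4):
        self.outcomes: Deque[bool] = deque(maxlen=window)  # True = probe answered
        self.alpha = alpha                # EWMA weight given to the newest sample
        self.outage_after = outage_after  # consecutive losses that signal an outage
        self.consecutive_losses = 0
        self.latency_ms: Optional[float] = None

    def record(self, rtt_ms: Optional[float]) -> None:
        """Record one probe result; rtt_ms is None if the probe was lost."""
        if rtt_ms is None:
            self.outcomes.append(False)
            self.consecutive_losses += 1
            return
        self.outcomes.append(True)
        self.consecutive_losses = 0
        if self.latency_ms is None:
            self.latency_ms = rtt_ms
        else:
            self.latency_ms = (1 - self.alpha) * self.latency_ms + self.alpha * rtt_ms

    @property
    def loss_rate(self) -> float:
        """Fraction of lost probes within the sliding window."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    @property
    def in_outage(self) -> bool:
        return self.consecutive_losses >= self.outage_after


monitor = LinkMonitor(window=4, alpha=0.5, outage_after=3)
for rtt in [100.0, 200.0, None, None, None]:
    monitor.record(rtt)
```

Counting consecutive losses lets the node react within a few probe intervals, while the windowed loss rate and smoothed latency change more slowly and are better suited to comparing paths for performance.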
RON is part of a larger research agenda on large-scale, robust, Internet-based distributed systems, which spans areas ranging from resilient routing (as in RON) to emerging peer-to-peer systems. Our work on peer-to-peer systems is based on Chord, a scalable p2p lookup service.
RON is also closely related to other current projects at LCS in the area of robust Internet infrastructures and uses some of their ideas: CM, the Internet Congestion Manager, and Click-SMP, a modular PC-based router.
RON data, Internet experiments
RON deployment sites
Since early 2001, we have run a real-life RON that now spans 17 sites around the Internet, and the deployment is international. We have also collected and analyzed extensive data sets, which will soon be made publicly available on this page.
Papers
- Scaling All-Pairs Overlay Routing
David Sontag, Yang Zhang, Amar Phanishayee, David G. Andersen, David Karger
CoNEXT, Rome, Italy, December 2009.
- Measuring the Effects of Internet Path Faults on Reactive Routing
Nick Feamster, David Andersen, Hari Balakrishnan, and Frans Kaashoek
ACM SIGMETRICS 2003, San Diego, CA, June 2003.
Presentation
- Mayday: Distributed Filtering for Internet Services
David G. Andersen
4th USENIX Symposium on Internet Technologies and Systems, Seattle, Washington, March 2003.
Presentation: [Postscript (390k)] [PDF (110k)]
- Topology Inference from BGP Routing Dynamics
David G. Andersen, Nick Feamster, Steve Bauer, and Hari Balakrishnan
2nd SIGCOMM Internet Measurement Workshop, Marseille, France, November 2002.
- Resilient Overlay Networks
David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek, and Robert Morris
Proc. 18th ACM SOSP, Banff, Canada, October 2001.
Presentation (PDF) (292 KB)
- DNS Performance and the Effectiveness of Caching
Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris
Proc. 1st ACM SIGCOMM Internet Measurement Workshop, San Francisco, CA, November 2001.
- Resilient Overlay Networks
David G. Andersen, SM Thesis, Massachusetts Institute of Technology, May 2001.
[Postscript (8.9 MB)] [ps.gz (1.2 MB)] [PDF (2.2 MB)] (86 pages)
- The Case for Resilient Overlay Networks
David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek, and Robert Morris
Proc. HotOS VIII, Schloss Elmau, Germany, May 2001. (best student paper award)
Presentation: [Slides (ps)] [Slides (pdf)] [Notes (ps)] [Notes (pdf)]
- Fine-Grained Failover Using Connection Migration
Alex C. Snoeren, David G. Andersen, and Hari Balakrishnan
Proc. 3rd USENIX USITS, San Francisco, CA, March 2001.
(Also MIT-LCS-TR-812, September 2000.)
Talks
- Topology Inference from BGP Routing Dynamics. 2002 Internet Measurement Workshop. [Postscript (400k)] [PDF (150k)]
- RON: Choosing Resiliency. 2002 Opensig workshop, Lexington, KY. [Postscript (780k)] [PDF (240k)]
- Resilient Overlay Networks. 18th SOSP, Lake Louise, Alberta, Canada, October 2001.
- Resilient Overlay Networks. MIT LCS Annual Retreat, Cape Cod, June 2001.
- Resilient Overlay Networks. DARPA PI Meeting, Colorado Springs, CO, July 2001.
- Slides from an old presentation comparing existing link probing mechanisms.
Resources
- RIPE NCC stores data about BGP routing table updates.
Projects
- The Detour Project at the University of Washington. The Detour researchers developed "sting", which uses TCP to measure forward and reverse path packet loss rates. Some of David Wetherall's students also carried out a small follow-on project to test Detour, in which they simulated several algorithms for forming the routing topology: [Orig ps] [Local Mirror]. The projects list is also available.
There are some important differences between RON and Detour. First, RON seeks to prevent disruptions in end-to-end communication in the face of failures: it takes advantage of underlying Internet path redundancy on time-scales of a few seconds, reacting quickly to path outages and performance failures. Second, RON is designed as an application-controlled routing overlay; because each RON is more closely tied to the application using it, RON more readily integrates application-specific path metrics and path selection policies. Third, we present and analyze experimental results from a real-world RON deployment that demonstrate fast recovery from failures and improved latency and loss rates, even over short time-scales.
- The Berkeley SPAND project. The Shared Passive Network Performance Discovery (SPAND) toolkit lets applications measure and share performance information with other local clients so that they can make better guesses about, for example, which mirror site to use. The SPAND paper contains more information [ps] [local ps], as does Mark Stemm's thesis [html] [ps] [local ps].
- RAMP: Reliable Adaptive Multipath Routing, from UCSD.
Funding
We gratefully acknowledge funding for RON from DARPA under the Fault-Tolerant Networking (FTN) program of the ATO; it is being supported by DARPA and the Space and Naval Warfare Systems Center (SPAWAR), San Diego, under contract N66001-00-1-8933.