Q. What is SABUL?
SABUL is a protocol for moving data very efficiently over long haul, high performance networks. SABUL is also the name of an open source library that implements the protocol.
Q. How fast is SABUL?
On a single computer with a 1 Gb/s NIC connected over Gigabit Ethernet (GigE), SABUL can move data at over 950 Mb/s. SABUL can also be used on a cluster of such computers. For example, using two three-node clusters, SABUL has moved data between Chicago and Amsterdam at over 2.8 Gb/s.
Q. Can you give me some details of why SABUL is faster than competing protocols?
It has long been recognized that TCP does not provide good performance for applications on networks with a high bandwidth-delay product.
One approach to improving TCP performance for data intensive applications is to set the TCP window size to the product of the bandwidth and the round-trip time (RTT) of the network. This approach requires tuning the network and in practice can be quite difficult.
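As a rough illustration of the window-size rule above, the following Python sketch computes the bandwidth-delay product and the throughput ceiling that a fixed window imposes. The 1 Gb/s bandwidth, 100 ms RTT, and 64 KB window are assumed example values, not measurements from this FAQ, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product in bytes: the data that must be in flight to fill the path."""
    return bandwidth_bits_per_s * rtt_s / 8


def max_throughput_bits_per_s(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window can be sent per round trip."""
    return window_bytes * 8 / rtt_s


# Assumed example path: 1 Gb/s bandwidth, 100 ms round-trip time.
window = bdp_bytes(1e9, 0.100)
print(f"window needed: {window / 1e6:.1f} MB")  # window needed: 12.5 MB

# Assumed untuned window of 65535 bytes on the same path.
ceiling = max_throughput_bits_per_s(65535, 0.100)
print(f"ceiling with 64 KB window: {ceiling / 1e6:.1f} Mb/s")  # ceiling with 64 KB window: 5.2 Mb/s
```

Under these assumed numbers, an untuned window limits a single connection to a small fraction of the link, which is why the window has to grow with both bandwidth and RTT.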
Another approach to overcoming the limitations of TCP is to stripe the data over several standard TCP network connections. In contrast to the first approach, this can be done at the data middleware or application level. The performance of striped TCP begins to level off as the number of sockets increases, sometimes after only 25-50 sockets, effectively limiting its usefulness to OC-3 (155 Mb/s), OC-12 (622 Mb/s), and lower bandwidth networks.
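To make the idea of application-level striping concrete, here is a minimal in-memory sketch in Python. It only shows how a buffer might be split into numbered chunks, assigned round-robin to several streams, and put back in order; it opens no sockets, and the function names and chunk size are hypothetical choices for illustration, not part of any particular striping library.

```python
def stripe(data, n_streams, chunk_size):
    """Split data into numbered chunks and deal them round-robin across n_streams."""
    streams = [[] for _ in range(n_streams)]
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        streams[seq % n_streams].append((seq, data[offset:offset + chunk_size]))
    return streams


def reassemble(streams):
    """Merge the chunks from all streams back into the original byte order."""
    chunks = sorted((chunk for stream in streams for chunk in stream),
                    key=lambda chunk: chunk[0])
    return b"".join(payload for _, payload in chunks)


payload = bytes(range(256)) * 10          # 2560 bytes of sample data
streams = stripe(payload, n_streams=4, chunk_size=100)
assert reassemble(streams) == payload     # order is restored from sequence numbers
```

In a real deployment each stream would be its own TCP connection, and the sequence numbers are what let the receiver restore order when the connections progress at different rates.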
The approach we took was to use one UDP-based channel to send data at high rates and a separate TCP-based channel used to resend blocks (for reliability), to perform rate control (so that the protocol is friendly to other traffic), and to adjust the protocol in response to congestion. The library that implements this is called SABUL. This is our third version of SABUL over the past three years. Over this time we have improved the implementation so that we can usually achieve over 900 Mb/s per node using a 1 Gb/s NIC. By using clusters, with each node directly connected to the router by a 1 Gb/s link, we can scale this to several Gb/s, even over long haul networks.
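The two-channel idea can be sketched on a single machine as follows. This is not SABUL's wire format or API: the 4-byte sequence header, the JSON loss report, the `transfer` function, and the `drop_first_time` parameter are all assumptions made for illustration. The sketch also assumes that lost blocks are reported over TCP and resent over UDP, and it leaves out rate control and congestion handling entirely.

```python
import json
import socket
import struct

HEADER = struct.Struct("!I")  # hypothetical 4-byte sequence number before each block


def transfer(blocks, drop_first_time=frozenset()):
    """Move blocks over UDP on loopback, using a TCP channel for loss reports.

    drop_first_time: sequence numbers the receiver discards once, to imitate loss.
    Returns the reassembled bytes and the number of send rounds used.
    """
    # Control channel (TCP): carries the receiver's list of missing blocks.
    listener = socket.create_server(("127.0.0.1", 0))
    ctrl_sender = socket.create_connection(listener.getsockname())
    ctrl_receiver, _ = listener.accept()
    ctrl_reader = ctrl_sender.makefile("r")

    # Data channel (UDP): carries the numbered blocks.
    data_receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    data_receiver.bind(("127.0.0.1", 0))
    data_receiver.settimeout(0.2)
    data_sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    dest = data_receiver.getsockname()

    received, to_drop = {}, set(drop_first_time)
    pending, rounds = list(range(len(blocks))), 0
    try:
        while pending:
            rounds += 1
            # Sender: push every pending block over UDP.
            for seq in pending:
                data_sender.sendto(HEADER.pack(seq) + blocks[seq], dest)
            # Receiver: read up to that many datagrams, stopping early on a timeout.
            for _ in range(len(pending)):
                try:
                    packet, _ = data_receiver.recvfrom(65535)
                except socket.timeout:
                    break
                (seq,) = HEADER.unpack_from(packet)
                if seq in to_drop:
                    to_drop.discard(seq)  # simulated loss: ignore this copy
                    continue
                received[seq] = packet[HEADER.size:]
            # Receiver: report missing blocks over TCP (the total count is assumed known).
            missing = [s for s in range(len(blocks)) if s not in received]
            ctrl_receiver.sendall(json.dumps(missing).encode() + b"\n")
            # Sender: read the report; whatever is missing is resent next round.
            pending = json.loads(ctrl_reader.readline())
    finally:
        ctrl_reader.close()
        for sock in (ctrl_sender, ctrl_receiver, listener, data_sender, data_receiver):
            sock.close()
    return b"".join(received[s] for s in range(len(blocks))), rounds


if __name__ == "__main__":
    sample = [bytes([i]) * 100 for i in range(20)]
    data, rounds = transfer(sample, drop_first_time={3, 7})
    print(data == b"".join(sample), rounds)
```

The point of the split is that the bulk data never waits on TCP's window, while the small control messages still get TCP's ordered, reliable delivery.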
Q. What are Photonic Data Services?
Photonic data services (PDS) are a layered series of protocols designed to work with data over photonic networks. PDS consists of the following three layers:
- Photonic path services layer: these are services to set up, tear down, and check the status of photonic paths, providing high performance photonic circuits. These services were developed by iCAIR at Northwestern University. With photonic path services, applications can request specialized photonic paths as they are needed.
- Network protocol layer: this is the layer where SABUL lives; it is described in detail above. Briefly, SABUL uses UDP to send data efficiently. On its own, UDP is an unreliable protocol. SABUL combines a UDP data channel with a TCP control channel so that the protocol is reliable and friendly to other traffic. SABUL was developed by the Laboratory for Advanced Computing and National Center for Data Mining at UIC.
- Data services layer: in prior work, we developed a specialized protocol to create data webs called the Data Web Transfer Protocol or DWTP. DWTP is also known as the Data Space Transport Protocol or DSTP. DSTP is compatible with web services, but also has specialized functionality to work with data (it supports keys, metadata and data), can sample data, select rows and columns of data, etc. DSTP directly supports functionality for remote data analysis and distributed data mining. DSTP was developed by the Laboratory for Advanced Computing and National Center for Data Mining at UIC.
The speed and bandwidth come from the network protocol layer. The functionality for data comes from the data services layer. The flexibility and the ability to do this on a per application basis come from the photonic path services layer. The goal of photonic data services is to take the lambdas to the data.
Q. Why use Photonic Data Services?
DSTP is designed so that working with remote data over the web is as easy as working with remote documents. With photonic path services, DSTP can now be used for the first time on gigabyte and terabyte size data sets.
To say it differently, with Photonic Data Services, applications can now work with remote gigabyte size data sets as if they were local.
Q. What is the status of these protocols?
The current release of SABUL is version 2.3. The current release of DSTP is version 3.0. Both are available from the SourceForge project dataspace at www.sourceforge.net/projects/dataspace.
The path services are currently integrated with SABUL for a single administrative domain in an experimental release of SABUL, not generally available. A general release is planned for sometime in 2003 or early 2004.
Q. How can I get more technical information?
Technical information about SABUL is available from the following publications:
- H. Sivakumar, R. L. Grossman, M. Mazzucco, Y. Pan, Q. Zhang, Simple Available Bandwidth Utilization Library for High-Speed Wide Area Networks, Journal of Supercomputing, 2003, to appear.
- Robert L. Grossman, Yunhong Gu, Dave Hanley, Xinwei Hong, Dave Lillethun, Jorge Levera, Joe Mambretti, Marco Mazzucco, and Jeremy Weinberger, Experimental Studies Using Photonic Data Services at IGrid 2002, FGCS, 2003, to appear.
- Yunhong Gu, Xinwei Hong, Marco Mazzucco, and Robert L. Grossman, SABUL: A High Performance Data Transfer Protocol, submitted for publication.
Technical information about DSTP and DSTP applications can be found elsewhere on this web site.
Q. Who funded the research?
Early versions of SABUL were funded by NSF and DOE. During the past two years, the work has been supported by the NSF.