1.0 Introduction
The DataSpace Transfer Protocol (DSTP) provides a simple means for publishing and querying remote and distributed data. With DSTP clients and servers, it is easy to create a data web, similar to the document web created with HTTP clients and servers.
In this document, we describe the DSTP protocol and give a simple example of its use.
2.0 DSTP Specification
DSTP specifies a protocol for the distribution, inquiry, and retrieval of data columns for the purpose of a correlation study. The protocol is designed so that data columns from different servers can be joined on specific Universal Correlation Keys (UCKs). The role of a UCK in DSTP is analogous to that of a primary key attribute in a database: UCK values are unique in DSTP, just as primary key values are unique in a database, so a join can be performed on either. DSTP is modeled on NNTP (Network News Transfer Protocol, Phil Lapsley, February 1986), which uses a similar protocol for the retrieval of news articles, and the commands and responses of a DSTP server are similar to those of NNTP. The DSTP server uses a stream connection and SMTP-like commands and responses. It is designed to accept connections from hosts and to provide a simple interface to the data columns on the server. The server acts only as an interface between the client programs and the data.
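To make the join on a UCK concrete, here is a minimal Python sketch. It assumes that the data columns from each server have already been retrieved and parsed into (UCK value, column value) pairs. The function name join_on_uck and the sample values are hypothetical and are not part of DSTP.

```python
def join_on_uck(left_rows, right_rows):
    """Inner-join two lists of (uck_value, column_value) pairs on the UCK value.

    Assumes UCK values are unique within each list, as the text states.
    """
    right_by_key = dict(right_rows)  # UCK value -> column value
    return [
        (key, left_value, right_by_key[key])
        for key, left_value in left_rows
        if key in right_by_key
    ]


# Hypothetical columns from two different servers, both keyed by a ZIPCODE UCK.
income = [("60607", 41000), ("60612", 38500), ("60616", 45200)]
asthma_rate = [("60612", 0.11), ("60616", 0.09), ("60622", 0.07)]

joined = join_on_uck(income, asthma_rate)
```

Only the key values present on both sides survive the join, which is the behavior a correlation study would typically want.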
3.0 Commands
Commands consist of a command word followed by optional parameters. The command word and its parameters are separated from one another by a single blank space and are not case sensitive. Only one command is allowed per command line, and each command is terminated by a CR-LF pair. There are two kinds of responses: text responses and status responses.
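The framing rules above can be sketched in a few lines of Python. The helper name format_command is hypothetical, and the ASCII encoding is an assumption, since the text does not name a character set.

```python
def format_command(word, *params):
    """Build one DSTP command line as bytes, ready to send on the stream."""
    parts = [str(word)] + [str(p) for p in params]
    for part in parts:
        # A blank or a line break inside a token would break the
        # one-space, one-command-per-line framing described above.
        if not part or any(ch in part for ch in " \r\n"):
            raise ValueError(f"invalid token: {part!r}")
    # Tokens are joined by single blanks and terminated by a CR-LF pair.
    return (" ".join(parts) + "\r\n").encode("ascii")  # ASCII is an assumption


line = format_command("SET", "DATAFILE", "guru1.dat")
```

Because commands are not case sensitive, the sketch sends the tokens as given and does not normalize case.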
Text responses are sent only after a numeric status response has already been sent. This status response indicates that text will follow. Text is sent as a series of lines, each terminated by a CR-LF pair. A single line containing only a period is sent to indicate the end of the text. Status responses are sent as a result of the last command received from the client. These status response lines begin with a 3-digit numeric code. The first digit of the code broadly indicates the success or failure of the previous command.
1xx Informative message
2xx Command ok
4xx Command was correct, but couldn't be performed for some reason.
5xx Command not implemented, or incorrect.
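As an illustration of the two response kinds, the following Python sketch parses a status line and then collects a text response up to the terminating period line. The names parse_status, FIRST_DIGIT, and read_text are hypothetical. The sketch assumes the code is separated from the message by one blank, and it does not attempt any escaping of text lines that begin with a period, since the text does not describe one.

```python
import io
import re

# Meaning of the first digit, taken from the table above.
FIRST_DIGIT = {
    "1": "informative message",
    "2": "command ok",
    "4": "command correct, but could not be performed",
    "5": "command not implemented, or incorrect",
}


def parse_status(line):
    """Split a status line into (code, message); code is a 3-digit int."""
    match = re.match(r"([0-9]{3})(?: (.*))?$", line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a status line: {line!r}")
    return int(match.group(1)), match.group(2) or ""


def read_text(stream):
    """Collect text lines up to, but not including, the lone '.' line."""
    lines = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == ".":
            return lines
        lines.append(line)
    raise EOFError("stream ended before the terminating '.' line")


# A canned reply stands in for the network stream.
reply = io.StringIO(
    '215 List of UCKs follows\r\n<UCK ID="1" NAME="AGE" NUMATTR="6"/>\r\n.\r\n'
)
code, message = parse_status(reply.readline())
body = read_text(reply)
```

The caller decides from the status code whether to call read_text at all, because text is sent only after a status response that announces it.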
The second digit in the code indicates the function response category.
x0x Connection, setup and miscellaneous messages
x1x Universal Correlation Key (UCK) selection
x2x Column selection
x3x Datafile selection
x4x Data
x5x MetaData
x7x Lines message
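A client can look up the category of a reply directly from the table above. The following Python sketch is one way to do so; the names CATEGORY and category_of are hypothetical, and digits the table does not list are reported as "unassigned".

```python
# Function response categories, keyed by the second digit of the status code.
CATEGORY = {
    "0": "connection, setup and miscellaneous messages",
    "1": "Universal Correlation Key (UCK) selection",
    "2": "column selection",
    "3": "datafile selection",
    "4": "data",
    "5": "metadata",
    "7": "lines message",
}


def category_of(code):
    """Return the response category for a 3-digit status code."""
    digits = str(code)
    if len(digits) != 3 or not digits.isascii() or not digits.isdigit():
        raise ValueError(f"not a 3-digit status code: {code!r}")
    return CATEGORY.get(digits[1], "unassigned")


goodbye_category = category_of(205)
```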
Here are the essential commands:
LIST [SELECTED] (UCK UCKID | UCKName) | (DATAFILE DatafileName) | (ATTRIBUTE Column# | AttributeName)
SET (UCK UCKID | UCKName) | (DATAFILE DatafileName) | (LINE (Integer#1 Integer#2 | -Integer#2 -Integer#1))
DELETE (UCK UCKID | UCKName) | (DATAFILE DatafileName)
STAT
METADATA
DATA (Col# | AttributeName) [(Col# | AttributeName) (Col# | AttributeName) ...]
DECIMATE (Integer %)
RANDOM (LINE Integer#1) | Integer#2 {0<Integer#2<100}
STOP
QUIT
Here is a complete list of commands:
LIST UCK
LIST DATAFILE
LIST ATTRIBUTE
SET UCK (UCKID | UCKName)
LIST SELECTED UCK
LIST UCK ATTRIBUTE
LIST UCK DATAFILE
SET DATAFILE (DatafileName)
LIST SELECTED DATAFILE
SET LINE (Integer#1 Integer#2) | (-Integer#2 -Integer#1) {where Integer#2 > Integer#1}
DATA (Col# | AttributeName) [(Col# | AttributeName) (Col# | AttributeName) ...]
DECIMATE Integer {where 0<Integer<100}
RANDOM Integer {0<Integer<100}
RANDOM LINE Integer { Integer < #data rows}
STOP
QUIT
4.0 Command Descriptions
LIST UCK
List the UCKs available.
LIST DATAFILE
List the Datafiles available.
LIST ATTRIBUTE
List the attributes.
SET UCK (UCKID | UCKName)
Select a UCK by UCKID or by Name.
LIST SELECTED UCK
List the currently selected UCKs.
LIST UCK ATTRIBUTE
Lists the attributes associated with the selected UCKs.
SET DATAFILE (DatafileName)
Selects a Datafile.
LIST SELECTED DATAFILE
Lists the selected Datafiles.
LIST UCK DATAFILE
Lists the DSML metadata for the Datafiles associated with the selected UCK.
SET LINE (Integer#1 Integer#2) | (-Integer#2 -Integer#1) {where Integer#2 > Integer#1}
Sets which lines of data should be returned, from Integer#1 to Integer#2.
DATA (Col# | AttributeName) [(Col# | AttributeName) (Col# | AttributeName) ...]
Returns data in comma-delimited format. Each row contains the value of each selected UCK, followed by each column listed after the DATA command. For example, DATA 2 3 returns uck,Col2,Col3.
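A client might turn the comma-delimited rows into named fields as in the Python sketch below. The function name parse_data_rows and the sample values are hypothetical. The sketch relies on the default quoting rules of Python's csv module, which is an assumption, since the text does not say how embedded commas are handled.

```python
import csv


def parse_data_rows(lines, uck_names, column_names):
    """Map each comma-delimited row to a dict: selected UCKs first, then columns."""
    fields = list(uck_names) + list(column_names)
    rows = []
    for record in csv.reader(lines):
        if len(record) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {record!r}")
        rows.append(dict(zip(fields, record)))
    return rows


# Hypothetical text body of a reply to "DATA 2 3" with the AGE UCK selected.
body = ["34,5.0,120.5", "41,3.0,98.2"]
rows = parse_data_rows(body, ["AGE"], ["Col2", "Col3"])
```

Values are left as strings here; converting them would need the data types reported in the attribute metadata.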
METADATA
Returns the selected UCK and attribute metadata in DSML format.
DECIMATE Integer% {where 0<Integer<100}
Throw away Integer% of the rows of data.
RANDOM Integer% {0<Integer<100}
Return a random Integer% of the rows of data.
RANDOM LINE Integer {Integer < #data rows}
Return Integer random rows of data.
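The row counts implied by DECIMATE and the two RANDOM forms can be illustrated with the Python sketch below. It is not the server's algorithm: the text does not say how rows are chosen, whether order is preserved, or how a fractional row count is rounded, so the random choice, the order-preserving DECIMATE, and the use of round() are all assumptions, as are the function names.

```python
import random


def _rows_for_percent(total, percent):
    """Number of rows that percent% of total corresponds to (rounding assumed)."""
    if not 0 < percent < 100:
        raise ValueError("percent must satisfy 0 < percent < 100")
    return round(total * percent / 100)


def decimate(rows, percent, rng=None):
    """DECIMATE: throw away percent% of the rows, keeping the rest in order."""
    rng = rng or random.Random()
    dropped = set(rng.sample(range(len(rows)), _rows_for_percent(len(rows), percent)))
    return [row for index, row in enumerate(rows) if index not in dropped]


def random_percent(rows, percent, rng=None):
    """RANDOM Integer%: return a random percent% of the rows."""
    rng = rng or random.Random()
    return rng.sample(rows, _rows_for_percent(len(rows), percent))


def random_lines(rows, count, rng=None):
    """RANDOM LINE Integer: return count random rows, with count < #data rows."""
    if not 0 < count < len(rows):
        raise ValueError("count must be positive and less than the number of rows")
    rng = rng or random.Random()
    return rng.sample(rows, count)


rows = list(range(300))  # stand-in for a 300-row datafile
kept = decimate(rows, 10)
sampled = random_percent(rows, 10)
picked = random_lines(rows, 5)
```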
DELETE UCK (UCKID | UCKName)
Deselect a UCK by UCKID or by Name.
STOP
Stops the data from being sent. This command may be sent even while data is arriving.
QUIT
Closes the connection to the DSTP server.
5.0 Example
Here is a simple example. In what follows, C: indicates commands sent to the DSTP server from the client program; S: indicates responses received from the server by the client.
S: (listens at TCP port 5040)
C: (requests connection on the specified port)
S: 200 www.lac.uic.edu DSTP server INN 0.1 19-Jan-1999 ready (posting ok)
C: LIST UCK
S: 215 List of UCKs follows
S: <UCK ID="1" NAME="AGE" NUMATTR="6"/>
S: <UCK ID="2" NAME="ZIPCODE" NUMATTR="5"/>
S: <UCK ID="1" NAME="AGE" NUMATTR="11"/>
S: .
C: LIST DATAFILE
S: 215 List of Datafiles follows
S: <DATAFILE NAME="guru1.dat" NUMRECORDS="300" DESCRIPTION="This data is from the space" DSFILENAME="file1.ds"/>
S: <DATAFILE NAME="guru2.dat" NUMRECORDS="3" DESCRIPTION="This data is from the hospital" DSFILENAME="file2.ds"/>
S: <DATAFILE NAME="guru3.dat" NUMRECORDS="500" DESCRIPTION="This data is from the census" DSFILENAME="file3.ds"/>
S: .
C: LIST UCK ATTRIBUTE
S: 214 17 1 age
S: <ATTRIBUTE-DESCRIPTOR NUMER="1" NAME="ADRG" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="4" NAME="DRG" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="11" NAME="GGP" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="7" NAME="GHT" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="2" NAME="LOS" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="6" NAME="PPT" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="15" NAME="au" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="14" NAME="cp" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="16" NAME="ih" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="13" NAME="ku" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="22" NAME="lf" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="17" NAME="lo" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="21" NAME="mu" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="20" NAME="nb" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="18" NAME="ps" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="19" NAME="qw" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: <ATTRIBUTE-DESCRIPTOR NUMER="12" NAME="tr" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: .
C: UCK 2 DATAFILE LIST
S: 215 List of Datafiles follows
S: guru2.dat
S: .
C: DATA 1
S: 412 No Datafile has been selected
C: SET DATAFILE guru1.dat
S: guru1.dat
S: .
C: METADATA
S: 221 1 pair retrieved - metadata follows
S: <PAIR SIZE="300">
S: <UCK ID="1" NAME="AGE" NUMATTR="6">
S: <ATTRIBUTE-DESCRIPTOR NUMER="1" NAME="ADRG" DATA-TYPE="real" USE-AS="continuous" UNIT="kg"/>
S: </UCK>
S: </PAIR>
S: .
C: QUIT
S: 205 GoodBye
S: .
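The listing lines in this session are self-contained XML elements, so a client could read their attributes with a standard XML parser. The Python sketch below shows one way, assuming the "S: " prefix has already been dropped (it is transcript notation, not part of the reply) and that each element arrives on a single line. The function name parse_listing is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_listing(lines):
    """Parse one-element-per-line XML into (tag, attributes) pairs."""
    parsed = []
    for line in lines:
        element = ET.fromstring(line)  # raises ParseError on malformed input
        parsed.append((element.tag, dict(element.attrib)))
    return parsed


# Two lines taken from the LIST UCK reply in the session above.
listing = [
    '<UCK ID="1" NAME="AGE" NUMATTR="6"/>',
    '<UCK ID="2" NAME="ZIPCODE" NUMATTR="5"/>',
]
ucks = parse_listing(listing)

# Look up the ID to pass to SET UCK for a given name.
zipcode_id = next(attrs["ID"] for tag, attrs in ucks if attrs["NAME"] == "ZIPCODE")
```

A multi-line reply such as the METADATA response, where elements are nested, would instead need its lines joined and parsed as one document.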