ICEDB is a continuous query processor for streaming data and a central part of the CarTel project. ICEDB differs from traditional stream processing applications in how query results are sent to the querying application (running on the portal). Because network connectivity is variable and intermittent, with ICEDB:
Queries specify what sensor data must be acquired and at what rate, how the data should be sub-sampled, filtered, and summarized on the mobile node, and in what (dynamic) priority order results should be sent back to the portal.
Query results stream in across an intermittently connected network, and populate a relational database at the portal.
Applications issue SQL queries on the portal’s relational database to retrieve data they need for further analysis, visualization, etc. These are snapshot queries that run on whatever data is currently available. Applications do not wait synchronously for the results of continuous queries.
Thus, applications can think of the data distributed across the mobile network as being stored locally in a standard SQL relational database, which simplifies how they are written. The programming model is familiar, essentially the same as what web developers today use. ICEDB deals with the underlying complexity of distributing queries to the mobile nodes (where they run in situ), coping with the network’s vagaries, and ensuring that the results are available locally.
ICEDB handles heterogeneous sensor data, allowing the set of sensors to be expanded without requiring major software changes on the remote nodes. Each sensor has an adapter running on the node that handles the details of configuring and extracting information from that sensor and converting it into a normalized form. To ease management and deployment, when a new sensor is added, or when the functions of an adapter need to be modified, only the adapter module needs to change.
Both the ICEDB node and ICEDB portal require the following software. We have tested with the indicated versions.
The ICEDB portal additionally requires the following software.
To deploy ICEDB onto nodes that consist of a master and a slave in the same NFS-based setup as the one used in CarTel, you may find the utilities in tools/deploy/ to be helpful.
Download ICEDB v0.1 from http://cartel.csail.mit.edu/icedb/icedb-0.1.tgz.
To install ICEDB, set your ICEDB_DATA_DIR environment variable to the location of ICEDB’s “data cluster”, i.e. the directory where ICEDB will store all of its data (database, catalog, logs, configurations, etc.). This variable must be set before running any part of ICEDB. Also ensure (either before or after the following installation step) that the directory gives write access to whatever user(s) ICEDB will be running under. Note that ICEDB cannot run as the root user, for security reasons.
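The requirements above can be checked with a small script before starting any ICEDB tool. This is only a sketch, not part of ICEDB: the function name check_data_dir is hypothetical, and it assumes the data directory already exists (so it is best run after installation).

```python
import os


def check_data_dir(environ, euid):
    """Return a list of problems that would stop ICEDB from using its data dir."""
    problems = []
    if euid == 0:
        # The documentation states that ICEDB cannot run as the root user.
        problems.append("ICEDB cannot run as root")
    data_dir = environ.get("ICEDB_DATA_DIR")
    if not data_dir:
        problems.append("ICEDB_DATA_DIR is not set")
    elif not os.path.isdir(data_dir):
        problems.append("ICEDB_DATA_DIR is not a directory: " + data_dir)
    elif not os.access(data_dir, os.W_OK):
        problems.append("no write access to " + data_dir)
    return problems


if __name__ == "__main__":
    # Check the environment of the user who will run ICEDB.
    for problem in check_data_dir(os.environ, os.geteuid()):
        print("problem:", problem)
```

An empty list means the three conditions described above hold for the current user.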
Then run setup.bash, which is a standard SimpleSetup installer. This should be the only step that requires root access, if you’re installing to a root-only location.
Once ICEDB is installed, modify the localconf.py configuration file under $ICEDB_DATA_DIR/central and $ICEDB_DATA_DIR/device (i.e. on both the central and device nodes). This file is automatically executed before many of the ICEDB tools are run, so you can add arbitrary Python code to it. It is intended primarily as a way to override the default values specified in the icedb.conf module; refer to that module for more examples of what to change.
In particular, one commonly changed value is the location of the central node. This is where data is collected from the device nodes. Set this by setting the central_host variable.
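A minimal localconf.py might look like the following. It assumes that a plain module-level assignment is how an override is picked up; the hostname is a placeholder, not an ICEDB default.

```python
# localconf.py (sketch): override a default from the icedb.conf module.
# "central.example.org" is a placeholder; use the hostname of your central node.
central_host = "central.example.org"
```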
ICEDB expects to receive the POSIX signals SIGUSR1 and SIGUSR2 as notifications of network connection and disconnection, respectively. In the CarTel system, the software for doing this is called OpenWifi.
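If you are not using OpenWifi, a small helper can deliver the same notifications. The sketch below only illustrates the signal mapping described above. The function names are hypothetical, and how to find the process ID of the ICEDB daemon is not covered here, so the caller must supply it.

```python
import os
import signal


def signal_for(connected):
    """Map a connectivity change to the signal ICEDB expects."""
    # SIGUSR1 announces a connection, SIGUSR2 a disconnection.
    return signal.SIGUSR1 if connected else signal.SIGUSR2


def notify_icedb(pid, connected):
    """Send the connectivity notification to the process with the given pid."""
    # Assumption: pid is the ICEDB daemon that should be notified.
    os.kill(pid, signal_for(connected))
```

A network monitor would call notify_icedb(pid, True) when a link comes up and notify_icedb(pid, False) when it goes down.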
Now initialize the data cluster on the central host. This also wipes out any previous ICEDB data cluster, but it does not affect any other PostgreSQL clusters or databases on your system:
central-host $ icedb-setup central cluster
Do the same on the device:
device-host $ icedb-setup device cluster
You should only ever need to do this once. This also starts up the backend PostgreSQL daemons.
Initialize the ICEDB catalogs/users/databases on the nodes by running:
central-host $ icedb-setup central db
device-host $ icedb-setup device db
NOTE: The PostgreSQL daemons need to be running for these commands to work, so the cluster setup command must have been run first. In the future, this database setup step will automatically start PostgreSQL if it’s not already running.
icedb-ctl is the ICEDB daemon controller. To start ICEDB on the central and device nodes, run:
central-host $ icedb-ctl central start
device-host $ icedb-ctl device start
Besides start, you can also issue stop, status, and restart. The actual ICEDB executables are named icedb-central and icedb-device, but you should not need to invoke them directly, since they expect the environment to have been prepared properly.
To manage adapters or queries, use the icedb-client tool, which currently must be run on the central host:
central-host $ icedb-client add-adapter my_simple_gps \
'latitude double precision, longitude double precision' \
--path /path/to/simple-gps-adapter.exe
central-host $ icedb-client add-query myfirstquery \
'select * from my_simple_gps
where time > now - 5 every 5 seconds'
For help regarding icedb-client’s command line syntax, see icedb-client --help.
Adapters are the data sources from which ICEDB collects data. We refer to the executable binaries or scripts that insert the data into ICEDB as “packages”. ICEDB downloads adapters and saves the packages to a configuration-specified location (by default, $ICEDB_DATA_DIR/commands/). ICEDB is not responsible for starting or stopping these adapter packages; the user must do this manually.
One handy tool for this is icedb-adapter, which exports ICEDB configuration options as environment variables for an adapter package to consume. The usage is simple:
device-host $ icedb-adapter my-adapter-package
This in turn uses the bash-conf tool, which developers may find useful to get configuration variables.
To write an adapter package, refer to the icedb-dummy-pusher Perl script as a simple example. gps2db is a GPS adapter that is an example of a complete adapter package. The IcedbPusher Perl module is provided for your convenience. ICEDB expects the data to be sent in CSV format over a socket on icedb_data_port, but with the name of the adapter as the first line.
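The wire format described above can also be sketched in Python for adapters not written in Perl. This is not the IcedbPusher module. The function names are hypothetical, and the sketch assumes a TCP socket, newline-terminated lines, and UTF-8 encoding, none of which are specified here. The port must come from the icedb_data_port configuration value.

```python
import csv
import io
import socket


def build_payload(adapter_name, rows):
    """Return the adapter name on the first line, then one CSV line per row."""
    buf = io.StringIO()
    buf.write(adapter_name + "\n")
    # Assumption: lines are terminated with a single newline.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def push(host, port, payload):
    """Open a TCP connection, send the whole payload, and close."""
    # Assumption: ICEDB listens on a TCP socket at icedb_data_port.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
```

For example, build_payload("my_simple_gps", [(42.36, -71.09)]) produces the adapter name followed by one line of comma-separated values, ready to pass to push.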
Aside from data sources, continuous queries can also be deployed. These are simply normal SQL queries with an optional final clause, either:
RATE _n_, where _n_ specifies the rate at which the query is to be executed over the database in Hz, or
EVERY _n_, where _n_ is the interval in seconds between window slides, and is the reciprocal of RATE.
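Since the two clauses are reciprocals, converting between them is a single division. The helper names below are illustrative only.

```python
def every_to_rate(seconds):
    """Convert an EVERY interval in seconds to the equivalent RATE in Hz."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return 1.0 / seconds


def rate_to_every(hz):
    """Convert a RATE in Hz to the equivalent EVERY interval in seconds."""
    if hz <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / hz
```

For instance, a query ending in EVERY 5 runs at the same frequency as one ending in RATE 0.2.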
The results from these queries, which are executed by the ICEDB device daemon over the historical data stored in the PostgreSQL backend, are buffered in separate PostgreSQL tables and sent back to the central node.
Data from adapters is stored in tables on the device node’s local database. The table schema for each adapter has the attributes specified in the adapter definition, along with the following fields:
time: the time at which the tuple was inserted into the device’s database
rec_id: a unique identifier for the tuple
For instance, the gps adapter’s attributes string might be:
lat double precision not null, long double precision not null
This will result in the following DDL on the device:
create table gps (
rec_id serial4 not null,
time timestamp not null,
lat double precision not null,
long double precision not null
)
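The mapping from an attributes string to the DDL above can be reproduced with a few lines of Python. This is a reconstruction from the example, not ICEDB's own code. The function name is hypothetical, and the naive comma split would break on types that contain commas, such as numeric(10,2).

```python
def device_ddl(adapter_name, attributes):
    """Build a CREATE TABLE statement from an adapter's attribute string."""
    # The two fields added on the device come first, as in the example DDL.
    columns = ["rec_id serial4 not null", "time timestamp not null"]
    # Naive split: assumes no attribute type contains a comma.
    columns += [attr.strip() for attr in attributes.split(",") if attr.strip()]
    return "create table %s (\n%s\n)" % (adapter_name, ",\n".join(columns))


if __name__ == "__main__":
    print(device_ddl(
        "gps",
        "lat double precision not null, long double precision not null",
    ))
```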
These tuples are forwarded to the central node when connectivity is available. The central node essentially builds tables with the same schema, but with the following additional fields:
time_received: the time when the tuple was received at the central node.
unit_id: the ID of the device node from which the tuple originated.
Query results are also buffered in tables on the device node’s local database. These tables are similarly mirrored onto the central node.
Applications are provided direct access to these tables, which are created and populated by ICEDB.
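To illustrate what a snapshot query over such a table looks like, the sketch below uses Python's built-in sqlite3 module as a stand-in for the central node's PostgreSQL database. The column order, the column types, and all sample rows are made up for the example; a real application would connect to the database that ICEDB populates instead.

```python
import sqlite3

# In-memory stand-in for the central database; ICEDB itself uses PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table gps (rec_id integer, time text, lat real, long real,"
    " time_received text, unit_id integer)"
)
# Made-up sample tuples from two device nodes (unit_id 7 and 9).
conn.executemany(
    "insert into gps values (?, ?, ?, ?, ?, ?)",
    [
        (1, "2008-01-01 10:00:00", 42.36, -71.09, "2008-01-01 10:05:00", 7),
        (2, "2008-01-01 10:00:05", 42.37, -71.10, "2008-01-01 10:05:00", 7),
        (1, "2008-01-01 10:00:02", 42.40, -71.20, "2008-01-01 10:07:00", 9),
    ],
)


def latest_position(conn, unit_id):
    """Snapshot query: most recent (lat, long) currently available for a unit."""
    return conn.execute(
        "select lat, long from gps where unit_id = ? order by time desc limit 1",
        (unit_id,),
    ).fetchone()
```

The query returns whatever has arrived so far; if no tuples from a unit have been received yet, the result is None rather than a blocking wait.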
ICEDB is released under the GNU General Public License.
Support for ICEDB was provided by the NSF CAREER Program under grant number 0448124.
http://www.mit.edu/~y_z/
http://cartel.casil.mit.edu/icedb/
http://cartel.csail.mit.edu/