nilmdb.nilmdb is the NILM database interface. It tracks a PyTables database that holds the actual rows of data, and a SQL database that tracks metadata and ranges.
Access to the nilmdb must be single-threaded. This is handled with the nilmdb.serializer class.
nilmdb.server is an HTTP server that provides an interface to talk, through the serialization layer, to the nilmdb object.
nilmdb.client is an HTTP client that connects to this server.
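The single-threading requirement could be met by funneling all calls through a queue serviced by one dedicated thread. A minimal sketch of that pattern (hypothetical names, not the actual nilmdb.serializer interface):

```python
import queue
import threading

class Serializer:
    """Run all method calls on a wrapped object from one dedicated
    thread, so the object itself never sees concurrent access."""
    def __init__(self, obj):
        self._obj = obj
        self._calls = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Dedicated thread: execute queued calls one at a time.
        while True:
            func, args, result = self._calls.get()
            if func is None:          # shutdown sentinel
                break
            try:
                result.put((True, func(*args)))
            except Exception as e:    # forward exceptions to the caller
                result.put((False, e))

    def call(self, name, *args):
        # Any thread may call this; it blocks until the dedicated
        # thread has run the method and posted the result.
        result = queue.Queue()
        self._calls.put((getattr(self._obj, name), args, result))
        ok, value = result.get()
        if not ok:
            raise value
        return value

    def close(self):
        self._calls.put((None, None, None))
        self._thread.join()
```

Callers would then use `s.call("insert", ...)` instead of touching the database object directly.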
Committing a transaction in the default sync mode (PRAGMA synchronous=FULL) takes about 125 ms. sqlite3 commits transactions at three points:
1: explicit con.commit()
2: between a series of DML commands and non-DML commands, e.g. after a series of INSERT, SELECT, but before a CREATE TABLE or PRAGMA.
3: at the end of an explicit transaction, e.g. “with self.con as con:”
To speed up testing, or if this transaction speed becomes an issue, the sync=False option to NilmDB will set PRAGMA synchronous=OFF.
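The commit behavior is easy to observe with the stdlib sqlite3 module. A quick sketch (standalone illustration, not nilmdb code; the table name is made up):

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "test.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE ranges (start INTEGER, end INTEGER)")

# Equivalent of the sync=False option: trade durability for speed.
con.execute("PRAGMA synchronous=OFF")

t0 = time.time()
for i in range(100):
    with con:   # each "with" block commits one transaction
        con.execute("INSERT INTO ranges VALUES (?, ?)", (i, i + 1))
print("100 commits in %.3f s" % (time.time() - t0))
con.close()
```

Rerunning with `PRAGMA synchronous=FULL` shows the per-commit cost described above.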
We need to send the contents of “data” as POST. Do we need chunked transfer?
Before timestamps are added:
Raw data is about 440 kB/s (9 channels)
Prep data is about 12.5 kB/s (1 phase)
How do we know how much data to send?
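Chunked transfer sidesteps that question: each chunk carries its own length prefix, and a zero-length chunk terminates the stream, so the total size never needs to be known up front. A minimal sketch of the HTTP/1.1 chunked framing (illustration only, not nilmdb code):

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings using HTTP/1.1 chunked
    transfer encoding: hex length, CRLF, data, CRLF for each chunk,
    then a zero-length chunk to mark the end of the body."""
    out = b""
    for chunk in chunks:
        if chunk:  # zero-length chunks are reserved for the terminator
            out += b"%x\r\n%s\r\n" % (len(chunk), chunk)
    return out + b"0\r\n\r\n"
```

So the sender can stream prep or raw data as it is produced, emitting a chunk per buffer.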
Converting from ASCII to PyTables:
Maybe:

    # threaded side creates this object
    parser = nilmdb.layout.Parser("layout_name")
    # threaded side parses and fills it with data
    parser.parse(textdata)
    # serialized side pulls out rows
    for n in xrange(parser.nrows):
        parser.fill_row(rowinstance, n)
        table.append()
stream_get_ranges(path)
  -> return IntervalSet?

First approach was quadratic. Adding four hours of data:
$ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-110000 /bpnilm/1/raw
real    24m31.093s
$ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-120001 /bpnilm/1/raw
real    43m44.528s
$ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-130002 /bpnilm/1/raw
real    93m29.713s
$ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-140003 /bpnilm/1/raw
real    166m53.007s
Disabling pytables indexing didn’t help:
real    31m21.492s
real    52m51.963s
real    102m8.151s
real    176m12.469s
Server RAM usage is constant.
Speed problems were due to IntervalSet: parsing all of the intervals out of the database and re-adding the new one on every insertion.
First optimization is to cache the result of nilmdb:_get_intervals, which gives the best speedup.
Also switched to using bxInterval (from the bx-python package) internally.
Speed of tests/test_interval:TestIntervalSpeed is pretty decent and seems to be growing logarithmically now: about 85 μs per insertion when inserting 131k entries.
Storing the interval data in SQL might be better, with a scheme like: http://www.logarithmic.net/pfh/blog/01235197474
Next slowdown target is nilmdb.layout.Parser.parse().
Initial implementation was pretty slow, even with binary search in a sorted list. Replaced with bxInterval; an insertion (__iadd__) now takes about O(log n) time. Tried blist too; it was worse than bxInterval.
Might be algorithmic improvements to be made in Interval.py, like in __and__.
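The sorted-list-with-binary-search approach mentioned above can be sketched with the stdlib bisect module. This is a simplified stand-in for illustration, not the real Interval.py or bxInterval interface: lookup of the insertion point is O(log n), though Python's list insert itself is still O(n), which is one reason a tree-backed structure like bxInterval wins.

```python
import bisect

class SimpleIntervalSet:
    """Sorted list of non-overlapping (start, end) intervals,
    using binary search to place and query intervals."""
    def __init__(self):
        self._starts = []      # kept parallel to self._intervals
        self._intervals = []

    def __iadd__(self, interval):
        start, end = interval
        i = bisect.bisect_left(self._starts, start)
        # Reject anything that overlaps a neighbor.
        if i > 0 and self._intervals[i - 1][1] > start:
            raise ValueError("overlaps previous interval")
        if i < len(self._intervals) and self._intervals[i][0] < end:
            raise ValueError("overlaps next interval")
        self._starts.insert(i, start)
        self._intervals.insert(i, (start, end))
        return self

    def __contains__(self, t):
        # Find the last interval starting at or before t.
        i = bisect.bisect_right(self._starts, t)
        return (i > 0 and
                self._intervals[i - 1][0] <= t < self._intervals[i - 1][1])
```

Usage: `s = SimpleIntervalSet(); s += (start, end); t in s`.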
Current/old design has specific layouts: RawData, PrepData, RawNotchedData. Let's get rid of this entirely and switch to simpler data types that are just collections and counts of a single type. We'll still use strings to describe them, with the format:
type_count
where type is “uint16”, “float32”, or “float64”, and count is an integer.
nilmdb.layout.named() will parse these strings into the appropriate handlers. For compatibility:
"RawData" == "uint16_6"
"RawNotchedData" == "uint16_9"
"PrepData" == "float32_8"