|
|
Sqlite performance
------------------
|
|
|
Committing a transaction in the default sync mode (PRAGMA synchronous=FULL) |
|
|
|
takes about 125 msec. The Python sqlite3 module commits transactions at three points:
|
|
|
|
|
|
|
1. explicit con.commit()

2. between a series of DML commands and non-DML commands, e.g.
   after a series of INSERT, SELECT, but before a CREATE TABLE or
   PRAGMA.

3. at the end of an explicit transaction, e.g. "with self.con as con:"
|
|
|
|
|
|
|
To speed up testing, or if this transaction speed becomes an issue, |
|
|
|
the sync=False option to NilmDB will set PRAGMA synchronous=OFF. |
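
A minimal sketch of what this looks like from Python (the filename is
hypothetical, and exactly when the implicit commit in case 2 happens
depends on the Python version):

    import sqlite3

    con = sqlite3.connect("test.db")

    # What a sync=False option would do: trade durability for speed.
    con.execute("PRAGMA synchronous=OFF")
    con.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT, v TEXT)")

    # Case 1: explicit commit.
    con.execute("INSERT INTO kv VALUES ('a', '1')")
    con.commit()

    # Case 2: older sqlite3 modules implicitly commit pending DML
    # before a non-DML statement like CREATE TABLE or PRAGMA.
    con.execute("INSERT INTO kv VALUES ('b', '2')")
    con.execute("CREATE TABLE IF NOT EXISTS other (x TEXT)")

    # Case 3: the connection as a context manager commits on success
    # and rolls back on exception.
    with con:
        con.execute("INSERT INTO kv VALUES ('c', '3')")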
|
|
|
|
|
Everything still gets buffered; it's just a tradeoff of buffer size.
|
|
|
|
|
|
|
Before timestamps are added: |
|
|
|
|
|
|
|
- Raw data is about 440 kB/s (9 channels)
- Prep data is about 12.5 kB/s (1 phase)
- How do we know how much data to send?
|
|
|
|
|
|
|
- Remember that we can only do maybe 8-50 transactions per second on
  the sqlite database.  So if one block of inserted data is one
  transaction, we'd need the raw case to be around 64kB per request,
  ideally more.
- Maybe use a range, based on how long it's taking to read the data
  (see the sketch after this list):
  - If no more data, send it
  - If data > 1 MB, send it
  - If more than 10 seconds have elapsed, send it
- Should those numbers come from the server?
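
A rough sketch of those rules on the client side (`stream_blocks` and
its input are hypothetical; the thresholds are just the guesses above):

    import time

    MAX_BYTES = 1024 * 1024   # "if data > 1 MB, send it"
    MAX_SECONDS = 10          # "if more than 10 seconds have elapsed"

    def stream_blocks(lines):
        """Group input lines into blocks sized for one server request."""
        buf = []
        size = 0
        start = time.time()
        for line in lines:
            buf.append(line)
            size += len(line)
            if size >= MAX_BYTES or time.time() - start >= MAX_SECONDS:
                yield "".join(buf)
                buf, size, start = [], 0, time.time()
        if buf:                # no more data, send it
            yield "".join(buf)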
|
|
|
|
|
|
|
|
|
|
Converting from ASCII to PyTables: |
|
|
|
|
|
|
|
- For each row getting added, we need to set attributes on a PyTables
  Row object and call table.append().  This means that there isn't a
  particularly efficient way of converting from ASCII.
  - Could create a function like nilmdb.layout.Layout("foo").fillRow(asciiline)
    - But this means we're doing parsing on the serialized side
  - Let's keep parsing on the threaded server side so we can detect
    errors better, and not block the serialized nilmdb for a slow
    parsing process.
|
|
|
|
|
|
- Client sends ASCII data
- Server converts this ASCII data to a list of values
- Maybe:
|
|
|
|
|
|
|
      # threaded side creates this object
      parser = nilmdb.layout.Parser("layout_name")
      # threaded side parses and fills it with data
      parser.parse(textdata)
      # serialized side pulls out rows
      for n in xrange(parser.nrows):
          parser.fill_row(rowinstance, n)
          table.append()
|
|
|
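A hypothetical fleshing-out of that sketch.  The method names come
from the pseudocode above; the layout (a timestamp column plus a float
array) and the file/table names are invented.  Note that in PyTables
itself, the per-row append is row.append() on the Row instance,
followed by table.flush():

    import tables

    class Parser(object):
        """Hypothetical parser: ASCII text in, parsed rows out."""
        def __init__(self, layout_name):
            self.layout_name = layout_name
            self.rows = []

        def parse(self, textdata):
            # Threaded side: parse "timestamp val val ..." lines.
            self.rows = [[float(f) for f in line.split()]
                         for line in textdata.splitlines() if line.strip()]

        @property
        def nrows(self):
            return len(self.rows)

        def fill_row(self, rowinstance, n):
            rowinstance['timestamp'] = self.rows[n][0]
            rowinstance['data'] = self.rows[n][1:]

    # Threaded side parses...
    parser = Parser("RawData")
    parser.parse("1234567890.0 1.0 2.0\n1234567891.0 3.0 4.0\n")

    # ...serialized side appends.  Assumes data.h5 already contains a
    # table /rawdata with 'timestamp' and 'data' columns.
    h5 = tables.open_file("data.h5", "a")
    table = h5.root.rawdata
    row = table.row
    for n in range(parser.nrows):
        parser.fill_row(row, n)
        row.append()
    table.flush()
    h5.close()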
|
|
|
|
|
|
|
|
|
|
|
Inserting streams, inside nilmdb |
|
|
|
-------------------------------- |
|
|
|
|
|
|
|
- First check that the new stream doesn't overlap (see the sketch
  after this list):
  - Get minimum timestamp, maximum timestamp from data parser.
    - (extend parser to verify monotonicity and track extents)
  - Get all intervals for this stream in the database
  - See if new interval overlaps any existing ones
    - If so, bail
  - Question: should we cache intervals inside NilmDB?
    - Assume database is fast for now, and always rebuild from DB.
    - Can add a caching layer later if we need to.
  - `stream_get_ranges(path)` -> return IntervalSet?
|
|
|
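A sketch of the overlap check.  min_timestamp/max_timestamp are the
assumed "track extents" additions to the parser, and intervals are
treated as half-open (start, end) pairs:

    def check_new_interval(parser, existing):
        """Bail if the parsed data's extent overlaps an existing interval.

        parser   -- provides min_timestamp and max_timestamp (assumed)
        existing -- iterable of (start, end) pairs from the database
        """
        new_start = parser.min_timestamp
        new_end = parser.max_timestamp
        for (start, end) in existing:
            # [a, b) and [c, d) overlap iff a < d and c < b
            if new_start < end and start < new_end:
                raise ValueError("new data overlaps [%s, %s)" % (start, end))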
|
|
|
|
|
|
|
Speed |
|
|
|
----- |
|
|
|
|
|
- First approach was quadratic. Adding four hours of data: |
|
|
|
|
|
|
|
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-110000 /bpnilm/1/raw
    real 24m31.093s
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-120001 /bpnilm/1/raw
    real 43m44.528s
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-130002 /bpnilm/1/raw
    real 93m29.713s
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-140003 /bpnilm/1/raw
    real 166m53.007s
|
|
|
|
|
|
|
|
|
|
- Disabling pytables indexing didn't help: |
|
|
|
|
|
|
|
    real 31m21.492s
    real 52m51.963s
    real 102m8.151s
    real 176m12.469s
|
|
|
|
|
|
|
|
|
|
- Server RAM usage is constant. |
|
|
|
|
|
|
|
- Speed problems were due to IntervalSet: every insert re-parsed all
  of the stream's intervals from the database and then added the new one.
|
|
|
|
|
|
|
- First optimization is to cache the result of `nilmdb:_get_intervals`,
  which gives the best speedup (see the sketch below).
|
|
|
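A sketch of that cache.  The method names mirror _get_intervals above;
the rebuild and SQL-insert helpers are hypothetical:

    class NilmDB(object):              # fragment, for illustration
        def __init__(self):
            self._interval_cache = {}  # stream_id -> IntervalSet

        def _get_intervals(self, stream_id):
            try:
                return self._interval_cache[stream_id]
            except KeyError:
                # Slow path: rebuild from the database once, then reuse.
                iset = self._rebuild_intervals_from_db(stream_id)
                self._interval_cache[stream_id] = iset
                return iset

        def _add_interval(self, stream_id, interval):
            # Keep the cached set coherent with the database.
            self._get_intervals(stream_id).add(interval)
            self._sql_insert_interval(stream_id, interval)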
|
|
|
|
|
|
|
- Also switched to internally using bxInterval from bx-python package. |
|
|
|
Speed of `tests/test_interval:TestIntervalSpeed` is pretty decent |
|
|
|
and seems to be growing logarithmically now. About 85μs per insertion |
|
|
|
for inserting 131k entries. |
|
|
|
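For reference, the bx-python intersection API looks roughly like this
(module layout may differ between bx-python versions):

    from bx.intervals.intersection import Intersecter, Interval

    tree = Intersecter()
    tree.add_interval(Interval(0, 10))
    tree.add_interval(Interval(20, 30))

    # intervals overlapping [5, 25) -- returns both of the above
    hits = tree.find(5, 25)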
|
|
|
|
|
|
|
- Storing the interval data in SQL might be better, with a scheme like: |
|
|
|
http://www.logarithmic.net/pfh/blog/01235197474 |
|
|
|
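For comparison, the naive SQL layout (one row per interval; this is
the baseline such schemes improve on, not the scheme from that link)
would be something like:

    import sqlite3

    con = sqlite3.connect("intervals.db")   # hypothetical database
    con.execute("CREATE TABLE IF NOT EXISTS intervals"
                " (stream_id INTEGER, start_time REAL, end_time REAL)")
    con.execute("CREATE INDEX IF NOT EXISTS intervals_start"
                " ON intervals (stream_id, start_time)")

    def overlaps(con, stream_id, start, end):
        # Standard half-open overlap test.  A plain B-tree index can
        # bound only one endpoint, which is what fancier schemes fix.
        cur = con.execute("SELECT 1 FROM intervals WHERE stream_id = ?"
                          " AND start_time < ? AND end_time > ? LIMIT 1",
                          (stream_id, end, start))
        return cur.fetchone() is not None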
|
|
|
|
|
|
|
- Next slowdown target is nilmdb.layout.Parser.parse(). |
|
|
|
- Rewrote parsers using cython and sscanf |
|
|
|
- Stats (rev 10831), with _add_interval disabled |
|
|
|
|
|
|
|
|
|
|
    layout.pyx.Parser.parse:128         6303 sec, 262k calls
    layout.pyx.parse:63                13913 sec, 5.1g calls
    numpy:records.py.fromrecords:569    7410 sec, 262k calls
|
|
|
|
|
|
|
|
|
|
- Probably OK for now. |
|
|
|
|
|
|
created stream. These locations are called tables. For example,
|
|
|
tables might be located at |
|
|
|
|
|
|
|
    nilmdb/data/newton/raw/
    nilmdb/data/newton/prep/
    nilmdb/data/cottage/raw/
|
|
|
|
|
|
|
|
|
|
Each table contains: |
|
|
|
|
|
|
|