|
|
Speed
-----
|
|
|
|
|
|
|
- First approach was quadratic. Adding four hours of data: |
|
|
|
|
|
|
|
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-110000 /bpnilm/1/raw
    real    24m31.093s

    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-120001 /bpnilm/1/raw
    real    43m44.528s

    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-130002 /bpnilm/1/raw
    real    93m29.713s

    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-140003 /bpnilm/1/raw
    real    166m53.007s
|
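A quick sanity check on those timings (transcribed to minutes) confirms the quadratic behavior: if total cost after n hours grows as c*n**2, then hour n alone costs about c*(2n - 1), so each successive per-hour time should roughly double, which is what the ratios show.

```python
# Sanity check on the insert timings above: under quadratic total cost
# c*n**2, hour n alone costs ~c*(2*n - 1), so successive per-hour
# times should roughly double.  Times in minutes, from the runs above.
times_min = [24.52, 43.74, 93.50, 166.88]
ratios = [later / earlier for earlier, later in zip(times_min, times_min[1:])]
print([round(r, 2) for r in ratios])  # each hour takes ~1.8-2.1x the previous
```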
|
|
|
- Disabling pytables indexing didn't help: |
|
|
|
|
|
|
|
    real    31m21.492s
    real    52m51.963s
    real    102m8.151s
    real    176m12.469s
|
|
|
|
|
|
|
- Server RAM usage is constant. |
|
|
|
|
|
|
|
|
|
- Next slowdown target is nilmdb.layout.Parser.parse(). |
|
|
|
- Rewrote parsers using cython and sscanf |
|
|
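For reference, a pure-Python sketch of the job the cython/sscanf rewrite speeds up (the function name and row format here are illustrative, not the actual `nilmdb.layout` API): split each ASCII row into a timestamp plus data values.

```python
# Illustrative pure-Python version of the hot loop that the
# cython + sscanf rewrite replaces: parse whitespace-separated ASCII
# rows of "timestamp value value ..." into tuples of floats.
# (Function name and row format are assumptions, not the real API.)
def parse_rows(text):
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            rows.append(tuple(float(f) for f in fields))
    return rows

rows = parse_rows("1305298800.0 1.0 2.0\n1305298800.0001 3.0 4.0\n")
print(len(rows))  # 2
```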
|
- Stats (rev 10831), with _add_interval disabled |
|
|
|
    layout.pyx.Parser.parse:128         6303 sec, 262k calls
    layout.pyx.parse:63                13913 sec, 5.1g calls
    numpy:records.py.fromrecords:569    7410 sec, 262k calls

- Probably OK for now.
|
|
|
|
|
|
|
- After all updates, now takes about 8.5 minutes to insert an hour of
  data, constant after adding 171 hours (4.9 billion data points).
|
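Back-of-the-envelope arithmetic on those figures (treating them as exact):

```python
# Rough throughput implied by the numbers above.
points = 4.9e9          # total data points inserted
hours = 171             # hours of data
per_hour = points / hours            # ~28.7M points per hour of data
rate = per_hour / (8.5 * 60)         # points/sec during an 8.5-minute insert
print(round(per_hour / 1e6, 1), "M points/hour,", round(rate), "points/sec")
```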
|
IntervalSet speed
-----------------
|
|
|
- Original implementation used binary search within a sorted list,
  but insertion into the list itself still took linear time
|
|
|
|
|
|
|
- Replaced with bxInterval; now takes about log n time for an insertion |
|
|
|
- TestIntervalSpeed with range(17,18) and profiling:
  - 85 μs each
  - 131072 calls to `__iadd__`
  - 131072 to bx.insert_interval
  - 131072 to bx.insert:395
  - 2355835 to bx.insert:106 (18x as many?)
|
|
|
|
|
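One plausible answer to the "18x as many?" question: each insertion into a balanced tree of 2**17 intervals descends roughly log2(2**17) = 17 levels, so ~18 calls to the per-node helper per top-level insert is exactly what a tree descent would produce.

```python
# The "18x as many" calls line up with tree depth: inserting into a
# balanced tree holding 2**17 nodes visits ~log2(2**17) = 17 of them.
import math

inner, outer = 2355835, 131072   # call counts from the profile above
print(inner / outer)             # ~17.97 helper calls per insert
print(math.log2(outer))          # 17.0
```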
|
|
- Tried blist too, worse than bxinterval. |
|
|
|
|
|
|
|
|
|
Times below are per insert for 2**17 insertions, followed by total wall time and RAM
|
|
|
usage for running "make test" with `test_rbtree` and `test_interval` |
|
|
|
with range(5,20): |
|
|
|
- old values with bxinterval:
  20.2 μs, total 20 s, 177 MB RAM
- rbtree, plain python:
  97 μs, total 105 s, 846 MB RAM
- rbtree converted to cython:
  26 μs, total 29 s, 320 MB RAM
- rbtree and interval converted to cython:
  8.4 μs, total 12 s, 134 MB RAM
|
|
|
|
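Expressed as speedups relative to the old bxinterval numbers (values below 1x are slower than the baseline):

```python
# Per-insert speedups relative to the bxinterval baseline above.
baseline = 20.2  # μs per insert with bxinterval
for name, us in [("rbtree, plain python", 97.0),
                 ("rbtree in cython", 26.0),
                 ("rbtree + interval in cython", 8.4)]:
    print("%-28s %.1fx" % (name, baseline / us))
```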
|
|
|
Layouts |
|
|
|
------- |
|
|
Each table contains:

- A `_format` file that contains the
  parameters of how the data is broken up, like files per directory,
  rows per file, and the binary data format
|
|
|
|
|
|
|
- A changing `_nrows` file (Python pickle format) that contains the |
|
|
|
number of the next row that will be inserted into the database. |
|
|
|
This number only increases, even if rows are deleted, and is |
|
|
|
overwritten atomically. (Note that it may not really be atomic on |
|
|
|
all OSes, and it may not be fully durable on power loss or other |
|
|
|
failures.) |
|
|
|
|
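The "overwritten atomically" part is usually the classic write-temp-then-rename pattern; a sketch (helper name assumed, not the actual nilmdb code):

```python
# Sketch of atomically overwriting a small pickle file like `_nrows`:
# write to a temp file in the same directory, fsync, then rename over
# the old file.  rename() is atomic on POSIX, but as noted above this
# may not hold on every OS, and durability across power loss also
# depends on directory fsync behavior.
import os, pickle, tempfile

def write_pickle_atomic(path, value):
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(value, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic overwrite of the old file
    except BaseException:
        os.unlink(tmp)
        raise

demo = os.path.join(tempfile.mkdtemp(), "_nrows")
write_pickle_atomic(demo, 12345)
with open(demo, "rb") as f:
    print(pickle.load(f))  # 12345
```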
|
|
|
- Hex-named subdirectories ("%04x", although more than 65536 can exist)
|
|
|
|
|
|
|
- Hex-named files within those subdirectories, like:
|
|
|
|
|
|
|
/nilmdb/data/newton/raw/000b/010a |
|
|
|
|
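A hypothetical sketch of how a row number could map onto that hex-named "subdir/file" layout; `rows_per_file` and `files_per_dir` stand in for the real parameters stored in the `_format` file (names assumed):

```python
# Hypothetical mapping from a row number to a hex-named subdir/file
# path like the one above.  Parameter names and values are assumed,
# not taken from the real `_format` contents.
def row_to_path(row, rows_per_file=10000, files_per_dir=32768):
    filenum = row // rows_per_file
    return "%04x/%04x" % (filenum // files_per_dir,
                          filenum % files_per_dir)

print(row_to_path(0))          # 0000/0000
print(row_to_path(123450000))  # 0000/3039
```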
|
|
|
The data format of these files is raw binary, interpreted by the |
|
|
|
Python `struct` module according to the format string in the |
|
|
|
`_format` file. |
|
|
|
|
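For illustration, packing and unpacking one row with the `struct` module; the `"<dhh"` format string here is invented, since the real one is read from the table's `_format` file:

```python
# Example of the raw-binary row encoding via the struct module.  The
# "<dhh" format string is made up for illustration (little-endian
# double timestamp + two int16 values); the real format string comes
# from the `_format` file.
import struct

fmt = "<dhh"
packed = struct.pack(fmt, 1305298800.0, 123, -456)
print(struct.calcsize(fmt))        # 12 bytes per row
print(struct.unpack(fmt, packed))  # (1305298800.0, 123, -456)
```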
|
|
|
- Same as above, with `.removed` suffix, is an optional file (Python |
|
|
|
pickle format) containing a list of row numbers that have been |
|
|
|
logically removed from the file. If this range covers the entire |
|
|
|
file, the entire file will be removed.
|
|
|
|
|
|
|
- Note that the `bulkdata.nrows` variable is calculated once in |
|
|
|
`BulkData.__init__()`, and only ever incremented during use. Thus, |
|
|
|
even if all data is removed, `nrows` can remain high. However, if |
|
|
|
the server is restarted, the newly calculated `nrows` may be lower |
|
|
|
than in a previous run due to deleted data. To be specific, this |
|
|
|
sequence of events: |
|
|
|
|
|
|
|
- insert data |
|
|
|
- remove all data |
|
|
|
- insert data |
|
|
|
|
|
|
|
will result in having different row numbers in the database, and |
|
|
|
differently numbered files on the filesystem, than the sequence: |
|
|
|
|
|
|
|
- insert data |
|
|
|
- remove all data |
|
|
|
- restart server |
|
|
|
- insert data |
|
|
|
|
|
|
|
This is okay! Everything should remain consistent in both the
|
|
|
`BulkData` and `NilmDB`. Not attempting to readjust `nrows` during |
|
|
|
deletion makes the code quite a bit simpler. |
|
|
|
|
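The two sequences can be mimicked with a toy model of the nrows rule (illustrative only, not the real `BulkData` code):

```python
# Toy model of the nrows behavior described above: nrows is computed
# once at startup and only ever incremented during use, even across
# removals.  (Illustrative only -- not the real BulkData class.)
class TableModel:
    def __init__(self, rows_on_disk=0):
        self.nrows = rows_on_disk        # recalculated only at "startup"
    def insert(self, count):
        first = self.nrows
        self.nrows += count
        return (first, self.nrows - 1)   # row numbers assigned

t = TableModel()
t.insert(100)                    # rows 0-99
# remove all data: nrows stays at 100
print(t.insert(100))             # (100, 199): numbering keeps counting up

t2 = TableModel(rows_on_disk=0)  # restart after removal: nrows back to 0
print(t2.insert(100))            # (0, 99): numbering starts over
```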
|
|
|
- Similarly, data files are never truncated shorter. Removing data |
|
|
|
from the end of the file will not shorten it; it will only be |
|
|
|
deleted when it has been fully filled and all of the data has been |
|
|
|
subsequently removed. |