
More speed tests, some whitespace cleanups

tags/before-insert-rework
Jim Paris, 11 years ago
parent commit 437e1b425a
1 changed file with 17 additions and 13 deletions: design.md

@@ -103,13 +103,13 @@ Speed

- First approach was quadratic. Adding four hours of data:

    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-110000 /bpnilm/1/raw
    real    24m31.093s
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-120001 /bpnilm/1/raw
    real    43m44.528s
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-130002 /bpnilm/1/raw
    real    93m29.713s
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-140003 /bpnilm/1/raw
    real    166m53.007s
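
The near-doubling of each run is consistent with quadratic total cost:
if every insert re-reads the entire stored interval list, the k-th
insert does O(k) work. A minimal sketch of that pattern (hypothetical
names, not the actual nilmdb code):

    # Toy illustration of the quadratic pattern: each insert re-parses
    # every interval already in the database, so N inserts cost O(N^2).
    class SlowIntervalStore:
        def __init__(self):
            self.db_rows = []     # stand-in for intervals stored in the database

        def _get_intervals(self):
            # Re-parse every stored interval on each call -- the O(k) step.
            return [tuple(row) for row in self.db_rows]

        def insert(self, start, end):
            intervals = self._get_intervals()   # repeated on every insert
            # ... overlap checks against `intervals` would go here ...
            self.db_rows.append([start, end])

    store = SlowIntervalStore()
    for i in range(5000):
        store.insert(i, i + 1)    # total work grows as O(N^2)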

- Disabling pytables indexing didn't help:
@@ -122,19 +122,19 @@ Speed
- Server RAM usage is constant.

- Speed problems were due to IntervalSet speed: parsing all of the
  intervals from the database and adding the new one, on every insert.

- First optimization is to cache the result of `nilmdb:_get_intervals`,
  which gives the best speedup (see the caching sketch after this list).
- Also switched to internally using bxInterval from the bx-python
  package. Speed of `tests/test_interval:TestIntervalSpeed` is pretty
  decent and seems to be growing logarithmically now. About 85 μs per
  insertion for inserting 131k entries.
- Storing the interval data in SQL might be better, with a scheme like
  the one at http://www.logarithmic.net/pfh/blog/01235197474 (a rough
  sketch follows this list).
- Next slowdown target is `nilmdb.layout.Parser.parse()`.
- Rewrote parsers using cython and sscanf.
- Stats (rev 10831), with `_add_interval` disabled:
@@ -142,7 +142,7 @@ Speed
    layout.pyx.parse:63                13913 sec, 5.1g calls
    numpy:records.py.fromrecords:569    7410 sec, 262k calls
- Probably OK for now.
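
A minimal sketch of the `_get_intervals` caching idea from the list
above (structure and names are illustrative; the real nilmdb code
differs):

    # Cache the parsed interval list per stream and update it in place
    # on insert, instead of re-parsing from the database every time.
    class CachedIntervalStore:
        def __init__(self):
            self.db_rows = {}    # stream path -> list of [start, end] rows
            self._cache = {}     # stream path -> parsed intervals

        def _get_intervals(self, path):
            # Parse from the database only on a cache miss.
            if path not in self._cache:
                rows = self.db_rows.get(path, [])
                self._cache[path] = [tuple(r) for r in rows]
            return self._cache[path]

        def insert(self, path, start, end):
            intervals = self._get_intervals(path)
            # ... overlap checks against `intervals` would go here ...
            self.db_rows.setdefault(path, []).append([start, end])
            intervals.append((start, end))   # keep the cache current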
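
And a rough sketch of the SQL storage option, using sqlite from Python
(the schema and overlap query are my guesses, not anything taken from
the linked post or from nilmdb):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE intervals"
                 " (stream_id INTEGER, start_time INTEGER, end_time INTEGER)")
    conn.execute("CREATE INDEX idx_start ON intervals (stream_id, start_time)")
    conn.execute("INSERT INTO intervals VALUES (1, 100, 200)")

    # Intervals overlapping [150, 300): start < 300 AND end > 150.
    # A plain B-tree index only narrows one side of this query, which
    # is why schemes like the one linked above exist.
    rows = conn.execute("SELECT start_time, end_time FROM intervals"
                        " WHERE stream_id = 1"
                        " AND start_time < 300 AND end_time > 150").fetchall()
    print(rows)   # [(100, 200)]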
IntervalSet speed
-----------------
- Initial implementation was pretty slow, even with binary search in
@@ -163,12 +163,16 @@ IntervalSet speed

- Replaced again with rbtree. Seems decent. Numbers are time per
insert for 2**17 insertions, followed by total wall time and RAM
usage for running "make test" with `test_rbtree` and `test_interval`
with range(5,20):
  - old values with bxinterval:
    20.2 μs, total 20 s, 177 MB RAM
  - rbtree, plain python:
    97 μs, total 105 s, 846 MB RAM
  - rbtree converted to cython:
    26 μs, total 29 s, 320 MB RAM
  - rbtree and interval converted to cython:
    8.4 μs, total 12 s, 134 MB RAM
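
For comparison, a toy version of that per-insert measurement, using a
plain sorted list as a stand-in for the rbtree-backed interval set:

    import bisect, random, time

    # Measure average time per insert for 2**17 insertions, mirroring
    # the numbers above (the real test inserts Interval objects).
    N = 2**17
    starts = []
    t0 = time.perf_counter()
    for _ in range(N):
        bisect.insort(starts, random.random())
    elapsed = time.perf_counter() - t0
    print("%.1f us per insert" % (elapsed / N * 1e6))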

Layouts
-------
@@ -178,12 +182,12 @@ just collections and counts of a single type. We'll still use strings
to describe them, with format:

    type_count

where `type` is "uint16", "float32", or "float64", and `count` is an
integer.

`nilmdb.layout.named()` will parse these strings into the appropriate
handlers. For compatibility:

    "RawData" == "uint16_6"
    "RawNotchedData" == "uint16_9"
    "PrepData" == "float32_8"
