
notes

git-svn-id: https://bucket.mit.edu/svn/nilm/nilmdb@10832 ddd99763-3ecb-0310-9145-efcb8ce7c51f
tags/bxinterval-last
Jim Paris, 12 years ago
commit 7eef39d5fd
5 changed files with 43 additions and 31 deletions:

  1. design.md                +37  -29
  2. nilmdb/layout.pyx         +2   -0
  3. nilmdb/nilmdb.py          +2   -0
  4. runserver.py              +1   -1
  5. tests/test_interval.py    +1   -1

design.md  (+37 -29)

@@ -119,36 +119,44 @@ Speed
real 102m8.151s
real 176m12.469s

(Maybe even worse? Probably just more load on the system)
- Server RAM usage isn't growing, but maybe there's some other bug there.

- Turns out it's because setting up the intervals takes so long --
  7 seconds for 5000 intervals.  I replaced the Interval internals
  with one based on the quicksect stuff from bx-python, which seems
  much nicer.  But it still has the problem of reading in the entire
  SQL database and building the in-memory IntervalSet for each
  request.  The first improvement would be to do this once and cache
  the results; a second improvement might be to always store the
  interval data in SQL, as suggested at
  http://www.logarithmic.net/pfh/blog/01235197474
- Next slowdown target is nilmdb.layout.Parser.parse().
  - Consider cython.
  - Or could just split strings, and let pytables's table.append()
    convert from ASCII itself?  Would require changing the timestamp
    format on the client side, though.  Cython is probably better -- a
    customized routine to parse each layout type directly from a string
    into a typed array?
- Server RAM usage is constant.

- Speed problems were due to IntervalSet speed: parsing intervals
  from the database and adding the new one each time.

  - The first optimization is to cache the result of
    `nilmdb:_get_intervals`, which gives the best speedup (sketched
    below).

  - Also switched to internally using bxInterval from the bx-python
    package.  Speed of `tests/test_interval:TestIntervalSpeed` is
    pretty decent and seems to be growing logarithmically now.  About
    85 μs per insertion when inserting 131k entries.
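
A minimal sketch of that caching idea (not the actual nilmdb code:
the `Interval`/`IntervalSet` import path and the
`_sql_select_intervals` helper are assumptions for illustration):

    # Build the in-memory IntervalSet once per stream and reuse it,
    # instead of re-reading the whole SQL table on every request.
    from nilmdb.interval import Interval, IntervalSet

    class IntervalCache(object):
        def __init__(self, db):
            self.db = db
            self.cache = {}        # stream_id -> IntervalSet

        def get(self, stream_id):
            if stream_id not in self.cache:
                iset = IntervalSet()
                for (start, end) in self.db._sql_select_intervals(stream_id):
                    iset += Interval(start, end)
                self.cache[stream_id] = iset
            return self.cache[stream_id]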

Interval speed
--------------

- Replaced with bxInterval.
- Storing the interval data in SQL might be better, with a scheme like:
  http://www.logarithmic.net/pfh/blog/01235197474
- Next slowdown target is nilmdb.layout.Parser.parse().
  - Rewrote parsers using cython and sscanf (illustrated below).
  - Stats (rev 10831), with _add_interval disabled:

        layout.pyx.Parser.parse:128          6303 sec,  262k calls
        layout.pyx.parse:63                 13913 sec,  5.1g calls
        numpy:records.py.fromrecords:569     7410 sec,  262k calls

  - Probably OK for now.
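
As a plain-Python illustration of what those parsers do (the actual
code is Cython with sscanf; the "float32_8" layout and its field
order here are assumptions):

    # Parse one ASCII line of a float32_8-style layout: a timestamp
    # column followed by eight float data columns.
    def parse_line(line):
        fields = line.split()
        if len(fields) != 9:
            raise ValueError("wrong number of fields")
        timestamp = float(fields[0])
        values = [float(f) for f in fields[1:]]
        return (timestamp, values)

    print(parse_line("1234567890.000000 1 2 3 4 5 6 7 8"))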
IntervalSet speed
-----------------

- The initial implementation was pretty slow, even with binary search
  in a sorted list.

- Replaced with bxInterval; insertion now takes about log n time
  (see the timing sketch after this list).
  - TestIntervalSpeed with range(17,18) and profiling:
    - 85 μs each
    - 131072 calls to `__iadd__`
    - 131072 to bx.insert_interval
    - 131072 to bx.insert:395
    - 2355835 to bx.insert:106 (18x as many?)

- Tried blist too; it was worse than bxinterval.

- There might be algorithmic improvements to be made in Interval.py,
  like in `__and__`.
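
A rough way to reproduce that measurement (a sketch assuming
bx-python's `IntervalTree.insert(start, end, value)` API; this is
not the actual test code):

    import time
    import random
    from bx.intervals.intersection import IntervalTree

    tree = IntervalTree()
    n = 1 << 17                       # 131072 insertions, as in the test
    starts = random.sample(range(10 * n), n)
    t0 = time.time()
    for s in starts:
        tree.insert(s, s + 1, None)   # unit-length intervals
    print("%.1f us per insertion" % ((time.time() - t0) / n * 1e6))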

nilmdb/layout.pyx  (+2 -0)

@@ -1,3 +1,5 @@
# cython: profile=True

import tables
import time
import sys
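
(The added `# cython: profile=True` directive makes Cython-compiled
functions visible to Python profilers, which is presumably what
enabled the per-function rev-10831 stats in design.md above.)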


nilmdb/nilmdb.py  (+2 -0)

@@ -175,6 +175,8 @@ class NilmDB(object):
        Add interval to the internal interval cache, and to the database.
        Note: arguments must be ints (not numpy.int64, etc)
        """
        # XXX SPEED TEST
        return
        # Ensure this stream's intervals are cached, and add the new
        # interval to that cache.
        iset = self._get_intervals(stream_id)


runserver.py  (+1 -1)

@@ -9,7 +9,7 @@ server = nilmdb.Server(db, host = "127.0.0.1",
                       port = 12380,
                       embedded = False)

if 1:
if 0:
    server.start(blocking = True)
else:
    try:


tests/test_interval.py  (+1 -1)

@@ -231,4 +231,4 @@ class TestIntervalSpeed:
            speeds[j] = speed
        aplotter.plot(speeds.keys(), speeds.values(), plot_slope=True)
        yappi.stop()
        #yappi.print_stats(sort_type=yappi.SORTTYPE_TTOT, limit=10)
        yappi.print_stats(sort_type=yappi.SORTTYPE_TTOT, limit=10)
