Compare commits
23 Commits
bxinterval ... insert-rew

Commits (SHA1):
9082cc9f44
bf64a40472
32dbeebc09
66ddc79b15
7a8bd0bf41
ee552de740
6d1fb61573
f094529e66
5fecec2a4c
85bb46f45c
17c329fd6d
437e1b425a
c0f87db3c1
a9c5c19e30
f39567b2bc
99ec0f4946
f5c60f68dc
bdef0986d6
c396c4dac8
0b443f510b
66fa6f3824
875fbe969f
e35e85886e
.gitignore (vendored, new file, 4 lines)
@@ -0,0 +1,4 @@
db/
tests/*testdb/
.coverage
*.pyc
@@ -1,2 +1,4 @@
sudo apt-get install python-nose python-coverage
sudo apt-get install python-tables cython python-cherrypy3
sudo apt-get install python-tables python-cherrypy3
sudo apt-get install cython  # 0.17.1-1 or newer
TODO (4 lines changed)
@@ -1,5 +1 @@
- Merge adjacent intervals on insert (maybe with client help?)

- Better testing:
  - see about getting coverage on layout.pyx
  - layout.pyx performance tests, before and after generalization
design.md (34 lines changed)

@@ -103,13 +103,13 @@ Speed

- First approach was quadratic.  Adding four hours of data:

    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-110000 /bpnilm/1/raw
    real    24m31.093s
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-120001 /bpnilm/1/raw
    real    43m44.528s
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-130002 /bpnilm/1/raw
    real    93m29.713s
    $ time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-140003 /bpnilm/1/raw
    real    166m53.007s

- Disabling pytables indexing didn't help:

@@ -122,19 +122,19 @@ Speed

- Server RAM usage is constant.

- Speed problems were due to IntervalSet speed, of parsing intervals
  from the database and adding the new one each time.

- First optimization is to cache result of `nilmdb:_get_intervals`,
  which gives the best speedup.

- Also switched to internally using bxInterval from bx-python package.
  Speed of `tests/test_interval:TestIntervalSpeed` is pretty decent
  and seems to be growing logarithmically now.  About 85μs per insertion
  for inserting 131k entries.

- Storing the interval data in SQL might be better, with a scheme like:
  http://www.logarithmic.net/pfh/blog/01235197474

- Next slowdown target is nilmdb.layout.Parser.parse().
  - Rewrote parsers using cython and sscanf
  - Stats (rev 10831), with _add_interval disabled

@@ -142,7 +142,7 @@ Speed

    layout.pyx.parse:63                 13913 sec, 5.1g calls
    numpy:records.py.fromrecords:569     7410 sec, 262k calls
- Probably OK for now.

IntervalSet speed
-----------------
- Initial implementation was pretty slow, even with binary search in

@@ -161,6 +161,18 @@ IntervalSet speed

- Might be algorithmic improvements to be made in Interval.py,
  like in `__and__`

- Replaced again with rbtree.  Seems decent.  Numbers are time per
  insert for 2**17 insertions, followed by total wall time and RAM
  usage for running "make test" with `test_rbtree` and `test_interval`
  with range(5,20):
  - old values with bxinterval:
    20.2 μS, total 20 s, 177 MB RAM
  - rbtree, plain python:
    97 μS, total 105 s, 846 MB RAM
  - rbtree converted to cython:
    26 μS, total 29 s, 320 MB RAM
  - rbtree and interval converted to cython:
    8.4 μS, total 12 s, 134 MB RAM

Layouts
-------

@@ -170,12 +182,12 @@ just collections and counts of a single type. We'll still use strings

to describe them, with format:

    type_count

where type is "uint16", "float32", or "float64", and count is an integer.

nilmdb.layout.named() will parse these strings into the appropriate
handlers.  For compatibility:

    "RawData" == "uint16_6"
    "RawNotchedData" == "uint16_9"
    "PrepData" == "float32_8"
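The "type_count" convention in design.md above is simple enough to sketch. The following is a minimal, hypothetical parser for such layout strings; the name `parse_layout` and the alias table are illustrative only and not part of nilmdb's actual nilmdb.layout module:

    import re

    # Compatibility aliases listed in design.md above.
    ALIASES = {
        "RawData": "uint16_6",
        "RawNotchedData": "uint16_9",
        "PrepData": "float32_8",
    }

    def parse_layout(name):
        # Resolve an alias, then split "type_count" into (type, count).
        name = ALIASES.get(name, name)
        m = re.match(r"^(uint16|float32|float64)_(\d+)$", name)
        if m is None:
            raise ValueError("unknown layout: " + name)
        return m.group(1), int(m.group(2))

    print(parse_layout("PrepData"))   # ('float32', 8)
    print(parse_layout("uint16_9"))   # ('uint16', 9)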
@@ -1,3 +1,5 @@
# -*- coding: utf-8 -*-

"""Class for performing HTTP client requests via libcurl"""

from __future__ import absolute_import

@@ -8,6 +10,7 @@ import sys
import re
import os
import simplejson as json
import itertools

import nilmdb.httpclient

@@ -16,6 +19,10 @@ from nilmdb.httpclient import ClientError, ServerError, Error
version = "1.0"

def float_to_string(f):
# Use repr to maintain full precision in the string output.
return repr(float(f))

class Client(object):
"""Main client interface to the Nilm database."""

@@ -84,33 +91,82 @@ class Client(object):
"layout" : layout }
return self.http.get("stream/create", params)

def stream_insert(self, path, data):
def stream_destroy(self, path):
"""Delete stream and its contents"""
params = { "path": path }
return self.http.get("stream/destroy", params)

def stream_insert(self, path, data, start = None, end = None):
"""Insert data into a stream.  data should be a file-like object
that provides ASCII data that matches the database layout for path."""
that provides ASCII data that matches the database layout for path.

start and end are the starting and ending timestamp of this
stream; all timestamps t in the data must satisfy 'start <= t
< end'.  If left unspecified, 'start' is the timestamp of the
first line of data, and 'end' is the timestamp on the last line
of data, plus a small delta of 1μs.
"""
params = { "path": path }

# See design.md for a discussion of how much data to send.
# These are soft limits -- actual data might be rounded up.
max_data = 1048576
max_time = 30
end_epsilon = 1e-6

def pairwise(iterable):
"s -> (s0,s1), (s1,s2), ..., (sn,None)"
a, b = itertools.tee(iterable)
next(b, None)
return itertools.izip_longest(a, b)

def extract_timestamp(line):
return float(line.split()[0])

def sendit():
result = self.http.put("stream/insert", send_data, params)
params["old_timestamp"] = result[1]
return result
# If we have more data after this, use the timestamp of
# the next line as the end.  Otherwise, use the given
# overall end time, or add end_epsilon to the last data
# point.
if nextline:
block_end = extract_timestamp(nextline)
if end and block_end > end:
# This is unexpected, but we'll defer to the server
# to return an error in this case.
block_end = end
elif end:
block_end = end
else:
block_end = extract_timestamp(line) + end_epsilon

# Send it
params["start"] = float_to_string(block_start)
params["end"] = float_to_string(block_end)
return self.http.put("stream/insert", block_data, params)

clock_start = time.time()
block_data = ""
block_start = start
result = None
start = time.time()
send_data = ""
for line in data:
elapsed = time.time() - start
send_data += line
for (line, nextline) in pairwise(data):
# If we don't have a starting time, extract it from the first line
if block_start is None:
block_start = extract_timestamp(line)

if (len(send_data) > max_data) or (elapsed > max_time):
clock_elapsed = time.time() - clock_start
block_data += line

# If we have enough data, or enough time has elapsed,
# send this block to the server, and empty things out
# for the next block.
if (len(block_data) > max_data) or (clock_elapsed > max_time):
result = sendit()
send_data = ""
start = time.time()
if len(send_data):
block_start = None
block_data = ""
clock_start = time.time()

# One last block?
if len(block_data):
result = sendit()

# Return the most recent JSON result we got back, or None if

@@ -125,9 +181,9 @@ class Client(object):
"path": path
}
if start is not None:
params["start"] = repr(start) # use repr to keep precision
params["start"] = float_to_string(start)
if end is not None:
params["end"] = repr(end)
params["end"] = float_to_string(end)
return self.http.get_gen("stream/intervals", params, retjson = True)

def stream_extract(self, path, start = None, end = None, count = False):

@@ -143,9 +199,9 @@ class Client(object):
"path": path,
}
if start is not None:
params["start"] = repr(start) # use repr to keep precision
params["start"] = float_to_string(start)
if end is not None:
params["end"] = repr(end)
params["end"] = float_to_string(end)
if count:
params["count"] = 1
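The chunking loop above leans on the pairwise() helper so each line can see the timestamp of the line that follows it; only the final line has no successor, which is when the client falls back to the given end or adds end_epsilon. A standalone sketch of that idea, written for Python 3 (so itertools.zip_longest stands in for the izip_longest used in the diff):

    import itertools

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), ..., (sn,None)"
        a, b = itertools.tee(iterable)
        next(b, None)                       # advance the second copy by one
        return itertools.zip_longest(a, b)  # pads the last pair with None

    lines = ["1.0 10\n", "2.0 20\n", "3.0 30\n"]
    for line, nextline in pairwise(lines):
        # nextline is None on the final line.
        print(repr(line), repr(nextline))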
@@ -15,7 +15,8 @@ version = "0.1"
# Valid subcommands.  Defined in separate files just to break
# things up -- they're still called with Cmdline as self.
subcommands = [ "info", "create", "list", "metadata", "insert", "extract" ]
subcommands = [ "info", "create", "list", "metadata", "insert", "extract",
"destroy" ]

# Import the subcommand modules.  Equivalent way of doing this would be
# from . import info as cmd_info
nilmdb/cmdline/destroy.py (new file, 25 lines)
@@ -0,0 +1,25 @@
from __future__ import absolute_import
from nilmdb.printf import *
import nilmdb.client

from argparse import ArgumentDefaultsHelpFormatter as def_form

def setup(self, sub):
cmd = sub.add_parser("destroy", help="Delete a stream and all data",
formatter_class = def_form,
description="""
Destroy the stream at the specified path.  All
data and metadata related to the stream is
permanently deleted.
""")
cmd.set_defaults(handler = cmd_destroy)
group = cmd.add_argument_group("Required arguments")
group.add_argument("path",
help="Path of the stream to delete, e.g. /foo/bar")

def cmd_destroy(self):
"""Destroy stream"""
try:
self.client.stream_destroy(self.args.path)
except nilmdb.client.ClientError as e:
self.die("Error deleting stream: %s", str(e))
@@ -1,7 +1,6 @@
from __future__ import absolute_import
from nilmdb.printf import *
import nilmdb.client
import nilmdb.layout
import sys

def setup(self, sub):
@@ -1,7 +1,6 @@
from __future__ import absolute_import
from nilmdb.printf import *
import nilmdb.client
import nilmdb.layout
import nilmdb.timestamper

import sys
@@ -1,8 +1,9 @@
"""Interval and IntervalSet
"""Interval, IntervalSet

Represents an interval of time, and a set of such intervals.

Intervals are closed, ie. they include timestamps [start, end]
Intervals are half-open, ie. they include data points with timestamps
[start, end)
"""

# First implementation kept a sorted list of intervals and used
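A quick plain-Python illustration of the half-open convention introduced above: with [start, end) intervals, two intervals that share only an endpoint do not intersect, which is what allows adjacent intervals to sit side by side (and later be merged). This is a sketch of the predicate used by intersects() below, not the Cython class itself:

    def intersects(a_start, a_end, b_start, b_end):
        # Same test as Interval.intersects(): half-open [start, end)
        # intervals overlap unless one ends at or before the other starts.
        return not (a_end <= b_start or a_start >= b_end)

    print(intersects(0.0, 2.0, 2.0, 4.0))   # False: [0,2) and [2,4) only touch
    print(intersects(0.0, 2.5, 2.0, 4.0))   # True: they share [2.0, 2.5)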
@@ -18,16 +19,20 @@ Intervals are closed, ie. they include timestamps [start, end]
# Fourth version is an optimized rb-tree that stores interval starts
# and ends directly in the tree, like bxinterval did.

import rbtree
cimport rbtree
cdef extern from "stdint.h":
ctypedef unsigned long long uint64_t

class IntervalError(Exception):
"""Error due to interval overlap, etc"""
pass

class Interval(object):
cdef class Interval:
"""Represents an interval of time."""

def __init__(self, start, end):
cdef public double start, end

def __init__(self, double start, double end):
"""
'start' and 'end' are arbitrary floats that represent time
"""

@@ -41,9 +46,9 @@ class Interval(object):
return self.__class__.__name__ + "(" + s + ")"

def __str__(self):
return "[" + str(self.start) + " -> " + str(self.end) + "]"
return "[" + repr(self.start) + " -> " + repr(self.end) + ")"

def __cmp__(self, other):
def __cmp__(self, Interval other):
"""Compare two intervals. If non-equal, order by start then end"""
if not isinstance(other, Interval):
raise TypeError("bad type")

@@ -57,20 +62,20 @@ class Interval(object):
return -1
return 1

def intersects(self, other):
cpdef intersects(self, Interval other):
"""Return True if two Interval objects intersect"""
if (self.end <= other.start or self.start >= other.end):
return False
return True

def subset(self, start, end):
cpdef subset(self, double start, double end):
"""Return a new Interval that is a subset of this one"""
# A subclass that tracks additional data might override this.
if start < self.start or end > self.end:
raise IntervalError("not a subset")
return Interval(start, end)

class DBInterval(Interval):
cdef class DBInterval(Interval):
"""
Like Interval, but also tracks corresponding start/end times and
positions within the database.  These are not currently modified

@@ -85,6 +90,9 @@ class DBInterval(Interval):
db_end = 200, db_endpos = 20000
"""

cpdef public double db_start, db_end
cpdef public uint64_t db_startpos, db_endpos

def __init__(self, start, end,
db_start, db_end,
db_startpos, db_endpos):

@@ -109,7 +117,7 @@ class DBInterval(Interval):
s += ", " + repr(self.db_startpos) + ", " + repr(self.db_endpos)
return self.__class__.__name__ + "(" + s + ")"

def subset(self, start, end):
cpdef subset(self, double start, double end):
"""
Return a new DBInterval that is a subset of this one
"""

@@ -119,11 +127,13 @@ class DBInterval(Interval):
self.db_start, self.db_end,
self.db_startpos, self.db_endpos)

class IntervalSet(object):
cdef class IntervalSet:
"""
A non-intersecting set of intervals.
"""

cdef public rbtree.RBTree tree

def __init__(self, source=None):
"""
'source' is an Interval or IntervalSet to add.

@@ -148,7 +158,7 @@ class IntervalSet(object):
descs = [ str(x) for x in self ]
return "[" + ", ".join(descs) + "]"

def __eq__(self, other):
def __match__(self, other):
# This isn't particularly efficient, but it shouldn't get used in the
# general case.
"""Test equality of two IntervalSets.

@@ -167,8 +177,8 @@ class IntervalSet(object):
else:
return False

this = [ x for x in self ]
that = [ x for x in other ]
this = list(self)
that = list(other)

try:
while True:

@@ -199,10 +209,20 @@ class IntervalSet(object):
except IndexError:
return False

def __ne__(self, other):
return not self.__eq__(other)
# Use __richcmp__ instead of __eq__, __ne__ for Cython.
def __richcmp__(self, other, int op):
if op == 2: # ==
return self.__match__(other)
elif op == 3: # !=
return not self.__match__(other)
return False
#def __eq__(self, other):
#    return self.__match__(other)
#
#def __ne__(self, other):
#    return not self.__match__(other)

def __iadd__(self, other):
def __iadd__(self, object other not None):
"""Inplace add -- modifies self

This throws an exception if the regions being added intersect."""

@@ -210,13 +230,19 @@ class IntervalSet(object):
if self.intersects(other):
raise IntervalError("Tried to add overlapping interval "
"to this set")
self.tree.insert(rbtree.RBNode(other))
self.tree.insert(rbtree.RBNode(other.start, other.end, other))
else:
for x in other:
self.__iadd__(x)
return self

def __isub__(self, other):
def iadd_nocheck(self, Interval other not None):
"""Inplace add -- modifies self.
'Optimized' version that doesn't check for intersection and
only inserts the new interval into the tree."""
self.tree.insert(rbtree.RBNode(other.start, other.end, other))

def __isub__(self, Interval other not None):
"""Inplace subtract -- modifies self

Removes an interval from the set.  Must exist exactly

@@ -227,13 +253,13 @@ class IntervalSet(object):
self.tree.delete(i)
return self

def __add__(self, other):
def __add__(self, other not None):
"""Add -- returns a new object"""
new = IntervalSet(self)
new += IntervalSet(other)
return new

def __and__(self, other):
def __and__(self, other not None):
"""
Compute a new IntervalSet from the intersection of two others

@@ -244,15 +270,15 @@ class IntervalSet(object):

if not isinstance(other, IntervalSet):
for i in self.intersection(other):
out.tree.insert(rbtree.RBNode(i))
out.tree.insert(rbtree.RBNode(i.start, i.end, i))
else:
for x in other:
for i in self.intersection(x):
out.tree.insert(rbtree.RBNode(i))
out.tree.insert(rbtree.RBNode(i.start, i.end, i))

return out

def intersection(self, interval):
def intersection(self, Interval interval not None):
"""
Compute a sequence of intervals that correspond to the
intersection between `self` and the provided interval.

@@ -269,23 +295,24 @@ class IntervalSet(object):
if i:
if i.start >= interval.start and i.end <= interval.end:
yield i
elif i.start > interval.end:
break
else:
subset = i.subset(max(i.start, interval.start),
min(i.end, interval.end))
yield subset

def intersects(self, other):
### PROBABLY WRONG
cpdef intersects(self, Interval other):
"""Return True if this IntervalSet intersects another interval"""
node = self.tree.find_left(other.start, other.end)
if node is None:
return False
for n in self.tree.inorder(node):
if n.obj:
if n.obj.intersects(other):
return True
if n.obj > other:
break
for n in self.tree.intersect(other.start, other.end):
if n.obj.intersects(other):
return True
return False

def find_end(self, double t):
"""
Return an Interval from this tree that ends at time t, or
None if it doesn't exist.
"""
n = self.tree.find_left_end(t)
if n and n.obj.end == t:
return n.obj
return None
nilmdb/interval.pyxdep (new file, 1 line)
@@ -0,0 +1 @@
rbtree.pxd
nilmdb/nilmdb.py (134 lines changed)

@@ -192,36 +192,58 @@ class NilmDB(object):
# Return cached value
return self._cached_iset[stream_id]

# TODO: Split add_interval into two pieces, one to add
# and one to flush to disk?
# Need to think about this.  Basic problem is that we can't
# mess with intervals once they're in the IntervalSet,
# without mucking with bxinterval internals.

# Maybe add a separate optimization step?
# Join intervals that have a fairly small gap between them

def _add_interval(self, stream_id, interval, start_pos, end_pos):
"""
Add interval to the internal interval cache, and to the database.
Note: arguments must be ints (not numpy.int64, etc)
"""
# Ensure this stream's intervals are cached, and add the new
# interval to that cache.
# Ensure this stream's intervals are cached
iset = self._get_intervals(stream_id)
try:
iset += DBInterval(interval.start, interval.end,
interval.start, interval.end,
start_pos, end_pos)
except IntervalError as e: # pragma: no cover

# Check for overlap
if iset.intersects(interval): # pragma: no cover (gets caught earlier)
raise NilmDBError("new interval overlaps existing data")

# Check for adjacency.  If there's a stream in the database
# that ends exactly when this one starts, and the database
# rows match up, we can make one interval that covers the
# time range [adjacent.start -> interval.end)
# and database rows [ adjacent.start_pos -> end_pos ].
# Only do this if the resulting interval isn't too large.
max_merged_rows = 30000000 # a bit more than 1 hour at 8 KHz
adjacent = iset.find_end(interval.start)
if (adjacent is not None and
start_pos == adjacent.db_endpos and
(end_pos - adjacent.db_startpos) < max_merged_rows):
# First delete the old one, both from our cache and the
# database
iset -= adjacent
self.con.execute("DELETE FROM ranges WHERE "
"stream_id=? AND start_time=? AND "
"end_time=? AND start_pos=? AND "
"end_pos=?", (stream_id,
adjacent.db_start,
adjacent.db_end,
adjacent.db_startpos,
adjacent.db_endpos))

# Now update our interval so the fallthrough add is
# correct.
interval.start = adjacent.start
start_pos = adjacent.db_startpos

# Add the new interval to the cache
iset.iadd_nocheck(DBInterval(interval.start, interval.end,
interval.start, interval.end,
start_pos, end_pos))

# Insert into the database
self.con.execute("INSERT INTO ranges "
"(stream_id,start_time,end_time,start_pos,end_pos) "
"VALUES (?,?,?,?,?)",
(stream_id, interval.start, interval.end,
int(start_pos), int(end_pos)))

self.con.commit()

def stream_list(self, path = None, layout = None):
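The adjacency check above boils down to a small decision: extend the previous interval only when the times touch, the table rows are contiguous, and the merged result stays under max_merged_rows. A standalone sketch of just that decision, with hypothetical names and no database involved:

    MAX_MERGED_ROWS = 30000000   # same limit as above: a bit more than 1 hour at 8 kHz

    def can_merge(adjacent_end, adjacent_startpos, adjacent_endpos,
                  new_start, new_startpos, new_endpos):
        # Merge only if the old interval ends exactly where the new one
        # starts, the table rows are contiguous, and the combined interval
        # would not cover too many rows.
        return (adjacent_end == new_start and
                adjacent_endpos == new_startpos and
                (new_endpos - adjacent_startpos) < MAX_MERGED_ROWS)

    # Example: old interval ends at t=100 covering rows 0..1000, and new
    # data starting at t=100 begins at row 1000, so the ranges can merge.
    print(can_merge(100.0, 0, 1000, 100.0, 1000, 5000))   # True
    print(can_merge(100.0, 0, 1000, 101.0, 1000, 5000))   # False (time gap)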
@@ -291,16 +313,6 @@ class NilmDB(object):
if group == '':
raise ValueError("invalid path")

# Make the group structure, one element at a time
group_path = group.lstrip('/').split("/")
for i in range(len(group_path)):
parent = "/" + "/".join(group_path[0:i])
child = group_path[i]
try:
self.h5file.createGroup(parent, child)
except tables.NodeError:
pass

# Get description
try:
desc = nilmdb.layout.get_named(layout_name).description()

@@ -312,9 +324,15 @@ class NilmDB(object):
exp_rows = 8000 * 60*60*24*30*3

# Create the table
table = self.h5file.createTable(group, node,
description = desc,
expectedrows = exp_rows)
try:
table = self.h5file.createTable(group, node,
description = desc,
expectedrows = exp_rows,
createparents = True)
except AttributeError:
# Trying to create e.g. /foo/bar/baz when /foo/bar is already
# a table raises this error.
raise ValueError("error creating table at that path")

# Insert into SQL database once the PyTables is happy
with self.con as con:

@@ -337,8 +355,7 @@ class NilmDB(object):
"""
stream_id = self._stream_id(path)
with self.con as con:
con.execute("DELETE FROM metadata "
"WHERE stream_id=?", (stream_id,))
con.execute("DELETE FROM metadata WHERE stream_id=?", (stream_id,))
for key in data:
if data[key] != '':
con.execute("INSERT INTO metadata VALUES (?, ?, ?)",

@@ -361,38 +378,53 @@ class NilmDB(object):
data.update(newdata)
self.stream_set_metadata(path, data)

def stream_insert(self, path, parser, old_timestamp = None):
def stream_destroy(self, path):
"""Fully remove a table and all of its data from the database.
No way to undo it!  The group structure is removed, if there
are no other tables in it.  Metadata is removed."""
stream_id = self._stream_id(path)

# Delete the cached interval data
if stream_id in self._cached_iset:
del self._cached_iset[stream_id]

# Delete the data node, and all parent nodes (if they have no
# remaining children)
split_path = path.lstrip('/').split("/")
while split_path:
name = split_path.pop()
where = "/" + "/".join(split_path)
try:
self.h5file.removeNode(where, name, recursive = False)
except tables.NodeError:
break

# Delete metadata, stream, intervals
with self.con as con:
con.execute("DELETE FROM metadata WHERE stream_id=?", (stream_id,))
con.execute("DELETE FROM ranges WHERE stream_id=?", (stream_id,))
con.execute("DELETE FROM streams WHERE id=?", (stream_id,))

def stream_insert(self, path, start, end, data):
"""Insert new data into the database.
path: Path at which to add the data
parser: nilmdb.layout.Parser instance full of data to insert
start: Starting timestamp
end: Ending timestamp
data: Rows of data, to be passed to PyTable's table.append
method.  E.g. nilmdb.layout.Parser.data
"""
if (not parser.min_timestamp or not parser.max_timestamp or
not len(parser.data)):
raise StreamError("no data provided")

# If we were provided with an old timestamp, the expectation
# is that the client has a contiguous block of time it is sending,
# but it's doing it over multiple calls to stream_insert.
# old_timestamp is the max_timestamp of the previous insert.
# To make things continuous, use that as our starting timestamp
# instead of what the parser found.
if old_timestamp:
min_timestamp = old_timestamp
else:
min_timestamp = parser.min_timestamp

# First check for basic overlap using timestamp info given.
stream_id = self._stream_id(path)
iset = self._get_intervals(stream_id)
interval = Interval(min_timestamp, parser.max_timestamp)
interval = Interval(start, end)
if iset.intersects(interval):
raise OverlapError("new data overlaps existing data: "
raise OverlapError("new data overlaps existing data at range: "
+ str(iset & interval))

# Insert the data into pytables
table = self.h5file.getNode(path)
row_start = table.nrows
table.append(parser.data)
table.append(data)
row_end = table.nrows
table.flush()
nilmdb/rbtree.pxd (new file, 23 lines)
@@ -0,0 +1,23 @@
cdef class RBNode:
cdef public object obj
cdef public double start, end
cdef public int red
cdef public RBNode left, right, parent

cdef class RBTree:
cdef public RBNode nil, root

cpdef getroot(RBTree self)
cdef void __rotate_left(RBTree self, RBNode x)
cdef void __rotate_right(RBTree self, RBNode y)
cdef RBNode __successor(RBTree self, RBNode x)
cpdef RBNode successor(RBTree self, RBNode x)
cdef RBNode __predecessor(RBTree self, RBNode x)
cpdef RBNode predecessor(RBTree self, RBNode x)
cpdef insert(RBTree self, RBNode z)
cdef void __insert_fixup(RBTree self, RBNode x)
cpdef delete(RBTree self, RBNode z)
cdef inline void __delete_fixup(RBTree self, RBNode x)
cpdef RBNode find(RBTree self, double start, double end)
cpdef RBNode find_left_end(RBTree self, double t)
cpdef RBNode find_right_start(RBTree self, double t)
@@ -1,20 +1,27 @@
"""Red-black tree, where keys are stored as start/end timestamps."""
# cython: profile=False
# cython: cdivision=True

"""
Jim Paris <jim@jtan.com>

Red-black tree, where keys are stored as start/end timestamps.
This is a basic interval tree that holds half-open intervals:
[start, end)
Intervals must not overlap.  Fixing that would involve making this
into an augmented interval tree as described in CLRS 14.3.

Code that assumes non-overlapping intervals is marked with the
string 'non-overlapping'.
"""

import sys
cimport rbtree

class RBNode(object):
"""One node of the Red/Black tree.  obj points to any object,
'start' and 'end' are timestamps that represent the key."""
def __init__(self, obj = None, start = None, end = None):
"""If given an object but no start/end times, get the
start/end times from the object.

If given start/end times, obj can be anything, including None."""
cdef class RBNode:
"""One node of the Red/Black tree, containing a key (start, end)
and value (obj)"""
def __init__(self, double start, double end, object obj = None):
self.obj = obj
if start is None:
start = obj.start
if end is None:
end = obj.end
self.start = start
self.end = end
self.red = False

@@ -26,21 +33,23 @@ class RBNode(object):
color = "R"
else:
color = "B"
return ("[node "
if self.start == sys.float_info.min:
return "[node nil]"
return ("[node ("
+ str(self.obj) + ") "
+ str(self.start) + " -> " + str(self.end) + " "
+ color + "]")

class RBTree(object):
cdef class RBTree:
"""Red/Black tree"""

# Init
def __init__(self):
self.nil = RBNode(start = sys.float_info.min,
end = sys.float_info.min)
self.nil.left = self.nil
self.nil.right = self.nil
self.nil.parent = self.nil
self.nil.nil = True

self.root = RBNode(start = sys.float_info.max,
end = sys.float_info.max)

@@ -48,9 +57,21 @@ class RBTree(object):
self.root.right = self.nil
self.root.parent = self.nil

# We have a dummy root node to simplify operations, so from an
# external point of view, its left child is the real root.
cpdef getroot(self):
return self.root.left

# Rotations and basic operations
def __rotate_left(self, x):
y = x.right
cdef void __rotate_left(self, RBNode x):
"""Rotate left:
#   x            y
#  / \    -->   / \
# z   y        x   w
#    / \      / \
#   v   w    z   v
"""
cdef RBNode y = x.right
x.right = y.left
if y.left is not self.nil:
y.left.parent = x

@@ -62,8 +83,15 @@ class RBTree(object):
y.left = x
x.parent = y

def __rotate_right(self, y):
x = y.left
cdef void __rotate_right(self, RBNode y):
"""Rotate right:
#     y          x
#    / \   -->  / \
#   x   w      z   y
#  / \            / \
# z   v          v   w
"""
cdef RBNode x = y.left
y.left = x.right
if x.right is not self.nil:
x.right.parent = y

@@ -75,9 +103,9 @@ class RBTree(object):
x.right = y
y.parent = x

def __successor(self, x):
cdef RBNode __successor(self, RBNode x):
"""Returns the successor of RBNode x"""
y = x.right
cdef RBNode y = x.right
if y is not self.nil:
while y.left is not self.nil:
y = y.left

@@ -89,10 +117,14 @@ class RBTree(object):
if y is self.root:
return self.nil
return y
cpdef RBNode successor(self, RBNode x):
"""Returns the successor of RBNode x, or None"""
cdef RBNode y = self.__successor(x)
return y if y is not self.nil else None

def _predecessor(self, x):
cdef RBNode __predecessor(self, RBNode x):
"""Returns the predecessor of RBNode x"""
y = x.left
cdef RBNode y = x.left
if y is not self.nil:
while y.right is not self.nil:
y = y.right

@@ -105,14 +137,18 @@ class RBTree(object):
x = y
y = y.parent
return y
cpdef RBNode predecessor(self, RBNode x):
"""Returns the predecessor of RBNode x, or None"""
cdef RBNode y = self.__predecessor(x)
return y if y is not self.nil else None

# Insertion
def insert(self, z):
cpdef insert(self, RBNode z):
"""Insert RBNode z into RBTree and rebalance as necessary"""
z.left = self.nil
z.right = self.nil
y = self.root
x = self.root.left
cdef RBNode y = self.root
cdef RBNode x = self.root.left
while x is not self.nil:
y = x
if (x.start > z.start or (x.start == z.start and x.end > z.end)):

@@ -128,7 +164,7 @@ class RBTree(object):
# relabel/rebalance
self.__insert_fixup(z)

def __insert_fixup(self, x):
cdef void __insert_fixup(self, RBNode x):
"""Rebalance/fix RBTree after a simple insertion of RBNode x"""
x.red = True
while x.parent.red:

@@ -163,10 +199,11 @@ class RBTree(object):
self.root.left.red = False

# Deletion
def delete(self, z):
cpdef delete(self, RBNode z):
if z.left is None or z.right is None:
raise AttributeError("you can only delete a node object "
+ "from the tree; use find() to get one")
cdef RBNode x, y
if z.left is self.nil or z.right is self.nil:
y = z
else:

@@ -203,10 +240,10 @@ class RBTree(object):
if not y.red:
self.__delete_fixup(x)

def __delete_fixup(self, x):
cdef void __delete_fixup(self, RBNode x):
"""Rebalance/fix RBTree after a deletion.  RBNode x is the
child of the spliced out node."""
rootLeft = self.root.left
cdef RBNode rootLeft = self.root.left
while not x.red and x is not rootLeft:
if x is x.parent.left:
w = x.parent.right

@@ -252,141 +289,89 @@ class RBTree(object):
x = rootLeft # exit loop
x.red = False

# Rendering
def __render_dot_node(self, node, max_depth = 20):
from printf import sprintf
"""Render a single node and its children into a dot graph fragment"""
if max_depth == 0:
return ""
if node is self.nil:
return ""
def c(red):
if red:
return 'color="#ff0000", style=filled, fillcolor="#ffc0c0"'
else:
return 'color="#000000", style=filled, fillcolor="#c0c0c0"'
s = sprintf("%d [label=\"%g\\n%g\", %s];\n",
id(node),
node.start, node.end,
c(node.red))

if node.left is self.nil:
s += sprintf("L%d [label=\"-\", %s];\n", id(node), c(False))
s += sprintf("%d -> L%d [label=L];\n", id(node), id(node))
else:
s += sprintf("%d -> %d [label=L];\n", id(node), id(node.left))
if node.right is self.nil:
s += sprintf("R%d [label=\"-\", %s];\n", id(node), c(False))
s += sprintf("%d -> R%d [label=R];\n", id(node), id(node))
else:
s += sprintf("%d -> %d [label=R];\n", id(node), id(node.right))
s += self.__render_dot_node(node.left, max_depth-1)
s += self.__render_dot_node(node.right, max_depth-1)
return s

def render_dot(self, title = "RBTree"):
"""Render the entire RBTree as a dot graph"""
return ("digraph rbtree {\n"
+ self.__render_dot_node(self.root.left)
+ "}\n");

def render_dot_live(self, title = "RBTree"):
"""Render the entire RBTree as a dot graph, live GTK view"""
import gtk
import gtk.gdk
sys.path.append("/usr/share/xdot")
import xdot
xdot.Pen.highlighted = lambda pen: pen
s = ("digraph rbtree {\n"
+ self.__render_dot_node(self.root)
+ "}\n");
window = xdot.DotWindow()
window.set_dotcode(s)
window.set_title(title + " - any key to close")
window.connect('destroy', gtk.main_quit)
def quit(widget, event):
if not event.is_modifier:
window.destroy()
gtk.main_quit()
window.widget.connect('key-press-event', quit)
gtk.main()

# Walking, searching
def __iter__(self):
return self.inorder(self.root.left)
return self.inorder()

def inorder(self, x = None):
def inorder(self, RBNode x = None):
"""Generator that performs an inorder walk for the tree
starting at RBNode x"""
rooted at RBNode x"""
if x is None:
x = self.root.left
x = self.getroot()
while x.left is not self.nil:
x = x.left
while x is not self.nil:
yield x
x = self.__successor(x)

def __find_all(self, start, end, x):
"""Find node with the specified (start,end) key.
Also returns the largest node less than or equal to key,
and the smallest node greater or equal to than key."""
if x is None:
x = self.root.left
largest = self.nil
smallest = self.nil
cpdef RBNode find(self, double start, double end):
"""Return the node with exactly the given start and end."""
cdef RBNode x = self.getroot()
while x is not self.nil:
if start < x.start:
smallest = x
x = x.left # start <
x = x.left
elif start == x.start:
if end < x.end:
smallest = x
x = x.left # start =, end <
elif end == x.end: # found it
smallest = x
largest = x
break
if end == x.end:
break # found it
elif end < x.end:
x = x.left
else:
largest = x
x = x.right # start =, end >
x = x.right
else:
largest = x
x = x.right # start >
return (x, smallest, largest)
x = x.right
return x if x is not self.nil else None

def find(self, start, end, x = None):
"""Find node with the key == (start,end), or None"""
y = self.__find_all(start, end, x)[1]
return y if y is not self.nil else None
cpdef RBNode find_left_end(self, double t):
"""Find the leftmode node with end >= t.  With non-overlapping
intervals, this is the first node that might overlap time t.

def find_right(self, start, end, x = None):
"""Find node with the smallest key >= (start,end), or None"""
y = self.__find_all(start, end, x)[1]
return y if y is not self.nil else None
Note that this relies on non-overlapping intervals, since
it assumes that we can use the endpoints to traverse the
tree even though it was created using the start points."""
cdef RBNode x = self.getroot()
while x is not self.nil:
if t < x.end:
if x.left is self.nil:
break
x = x.left
elif t == x.end:
break
else:
if x.right is self.nil:
x = self.__successor(x)
break
x = x.right
return x if x is not self.nil else None

def find_left(self, start, end, x = None):
"""Find node with the largest key <= (start,end), or None"""
y = self.__find_all(start, end, x)[2]
return y if y is not self.nil else None
cpdef RBNode find_right_start(self, double t):
"""Find the rightmode node with start <= t.  With non-overlapping
intervals, this is the last node that might overlap time t."""
cdef RBNode x = self.getroot()
while x is not self.nil:
if t < x.start:
if x.left is self.nil:
x = self.__predecessor(x)
break
x = x.left
elif t == x.start:
break
else:
if x.right is self.nil:
break
x = x.right
return x if x is not self.nil else None

# Intersections
def intersect(self, start, end):
def intersect(self, double start, double end):
"""Generator that returns nodes that overlap the given
(start,end) range, for the tree rooted at RBNode x.

NOTE: this assumes non-overlapping intervals."""
# Start with the leftmost node before the starting point
n = self.find_left(start, start)
# If we didn't find one, look for the leftmode node before the
# ending point instead.
if n is None:
n = self.find_left(end, end)
# If we still didn't find it, there are no intervals that intersect.
if n is None:
return none

# Now yield this node and all successors until their endpoints

if False:
yield
return
(start,end) range.  Assumes non-overlapping intervals."""
# Start with the leftmode node that ends after start
cdef RBNode n = self.find_left_end(start)
while n is not None:
if n.start >= end:
# this node starts after the requested end; we're done
break
if start < n.end:
# this node overlaps our requested area
yield n
n = self.successor(n)
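The rewritten intersect() above starts at find_left_end(start) and walks successors until a node starts at or past end. The same traversal can be mimicked over a plain sorted list of non-overlapping (start, end) pairs, which may be handy for cross-checking the tree's output in tests; this is only an illustrative sketch, not part of nilmdb:

    import bisect

    def intersect(intervals, start, end):
        """Yield the sorted, non-overlapping (s, e) pairs that overlap
        the half-open range [start, end)."""
        # Skip intervals that end at or before 'start'; they cannot overlap.
        ends = [e for (s, e) in intervals]
        i = bisect.bisect_right(ends, start)   # first interval with e > start
        while i < len(intervals):
            s, e = intervals[i]
            if s >= end:
                break            # starts after the requested range; done
            if start < e:
                yield (s, e)     # overlaps the requested range
            i += 1

    ivals = [(0, 10), (10, 20), (30, 40)]
    print(list(intersect(ivals, 5, 35)))    # [(0, 10), (10, 20), (30, 40)]
    print(list(intersect(ivals, 20, 30)))   # [] (half-open ranges only touch)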
nilmdb/rbtree.pyxdep (new file, 1 line)
@@ -0,0 +1 @@
rbtree.pxd
@@ -88,6 +88,17 @@ class Stream(NilmApp):
message = sprintf("%s: %s", type(e).__name__, e.message)
raise cherrypy.HTTPError("400 Bad Request", message)

# /stream/destroy?path=/newton/prep
@cherrypy.expose
@cherrypy.tools.json_out()
def destroy(self, path):
"""Delete a stream and its associated data."""
try:
return self.db.stream_destroy(path)
except Exception as e:
message = sprintf("%s: %s", type(e).__name__, e.message)
raise cherrypy.HTTPError("400 Bad Request", message)

# /stream/get_metadata?path=/newton/prep
# /stream/get_metadata?path=/newton/prep&key=foo&key=bar
@cherrypy.expose

@@ -145,19 +156,11 @@ class Stream(NilmApp):
@cherrypy.expose
@cherrypy.tools.json_out()
#@cherrypy.tools.disable_prb()
def insert(self, path, old_timestamp = None):
def insert(self, path, start, end):
"""
Insert new data into the database.  Provide textual data
(matching the path's layout) as a HTTP PUT.

old_timestamp is used when making multiple, split-up insertions
for a larger contiguous block of data.  The first insert
will return the maximum timestamp that it saw, and the second
insert should provide this timestamp as an argument.  This is
used to extend the previous database interval rather than
start a new one.
"""

# Important that we always read the input before throwing any
# errors, to keep lengths happy for persistent connections.
# However, CherryPy 3.2.2 has a bug where this fails for GET

@@ -182,18 +185,31 @@ class Stream(NilmApp):
"Error parsing input data: " +
e.message)

if (not parser.min_timestamp or not parser.max_timestamp or
not len(parser.data)):
raise cherrypy.HTTPError("400 Bad Request",
"no data provided")

# Check limits
start = float(start)
end = float(end)
if parser.min_timestamp < start:
raise cherrypy.HTTPError("400 Bad Request", "Data timestamp " +
repr(parser.min_timestamp) +
" < start time " + repr(start))
if parser.max_timestamp >= end:
raise cherrypy.HTTPError("400 Bad Request", "Data timestamp " +
repr(parser.max_timestamp) +
" >= end time " + repr(end))

# Now do the nilmdb insert, passing it the parser full of data.
try:
if old_timestamp:
old_timestamp = float(old_timestamp)
result = self.db.stream_insert(path, parser, old_timestamp)
result = self.db.stream_insert(path, start, end, parser.data)
except nilmdb.nilmdb.NilmDBError as e:
raise cherrypy.HTTPError("400 Bad Request", e.message)

# Return the maximum timestamp that we saw.  The client will
# return this back to us as the old_timestamp parameter, if
# it has more data to send.
return ("ok", parser.max_timestamp)
# Done
return "ok"

# /stream/intervals?path=/newton/prep
# /stream/intervals?path=/newton/prep&start=1234567890.0&end=1234567899.0
@@ -13,7 +13,9 @@ verbosity=2
#tests=tests/test_cmdline.py
#tests=tests/test_layout.py
#tests=tests/test_rbtree.py
tests=tests/test_interval.py
#tests=tests/test_interval.py
#tests=tests/test_rbtree.py,tests/test_interval.py
#tests=tests/test_interval.py
#tests=tests/test_client.py
#tests=tests/test_timestamper.py
#tests=tests/test_serializer.py
tests/renderdot.py (new file, 90 lines)
@@ -0,0 +1,90 @@
import sys

class Renderer(object):

def __init__(self, getleft, getright,
getred, getstart, getend, nil):
self.getleft = getleft
self.getright = getright
self.getred = getred
self.getstart = getstart
self.getend = getend
self.nil = nil

# Rendering
def __render_dot_node(self, node, max_depth = 20):
from nilmdb.printf import sprintf
"""Render a single node and its children into a dot graph fragment"""
if max_depth == 0:
return ""
if node is self.nil:
return ""
def c(red):
if red:
return 'color="#ff0000", style=filled, fillcolor="#ffc0c0"'
else:
return 'color="#000000", style=filled, fillcolor="#c0c0c0"'
s = sprintf("%d [label=\"%g\\n%g\", %s];\n",
id(node),
self.getstart(node), self.getend(node),
c(self.getred(node)))

if self.getleft(node) is self.nil:
s += sprintf("L%d [label=\"-\", %s];\n", id(node), c(False))
s += sprintf("%d -> L%d [label=L];\n", id(node), id(node))
else:
s += sprintf("%d -> %d [label=L];\n",
id(node),id(self.getleft(node)))
if self.getright(node) is self.nil:
s += sprintf("R%d [label=\"-\", %s];\n", id(node), c(False))
s += sprintf("%d -> R%d [label=R];\n", id(node), id(node))
else:
s += sprintf("%d -> %d [label=R];\n",
id(node), id(self.getright(node)))
s += self.__render_dot_node(self.getleft(node), max_depth-1)
s += self.__render_dot_node(self.getright(node), max_depth-1)
return s

def render_dot(self, rootnode, title = "Tree"):
"""Render the entire tree as a dot graph"""
return ("digraph rbtree {\n"
+ self.__render_dot_node(rootnode)
+ "}\n");

def render_dot_live(self, rootnode, title = "Tree"):
"""Render the entiretree as a dot graph, live GTK view"""
import gtk
import gtk.gdk
sys.path.append("/usr/share/xdot")
import xdot
xdot.Pen.highlighted = lambda pen: pen
s = ("digraph rbtree {\n"
+ self.__render_dot_node(rootnode)
+ "}\n");
window = xdot.DotWindow()
window.set_dotcode(s)
window.set_title(title + " - any key to close")
window.connect('destroy', gtk.main_quit)
def quit(widget, event):
if not event.is_modifier:
window.destroy()
gtk.main_quit()
window.widget.connect('key-press-event', quit)
gtk.main()

class RBTreeRenderer(Renderer):
def __init__(self, tree):
Renderer.__init__(self,
lambda node: node.left,
lambda node: node.right,
lambda node: node.red,
lambda node: node.start,
lambda node: node.end,
tree.nil)
self.tree = tree

def render(self, title = "RBTree", live = True):
if live:
return Renderer.render_dot_live(self, self.tree.getroot(), title)
else:
return Renderer.render_dot(self, self.tree.getroot(), title)
@@ -131,6 +131,7 @@ class TestClient(object):

testfile = "tests/data/prep-20120323T1000"
start = datetime_tz.datetime_tz.smartparse("20120323T1000")
start = start.totimestamp()
rate = 120

# First try a nonexistent path

@@ -155,14 +156,41 @@ class TestClient(object):

# Try forcing a server request with empty data
with assert_raises(ClientError) as e:
client.http.put("stream/insert", "", { "path": "/newton/prep" })
client.http.put("stream/insert", "", { "path": "/newton/prep",
"start": 0, "end": 0 })
in_("400 Bad Request", str(e.exception))
in_("no data provided", str(e.exception))

# Specify start/end (starts too late)
data = nilmdb.timestamper.TimestamperRate(testfile, start, 120)
with assert_raises(ClientError) as e:
result = client.stream_insert("/newton/prep", data,
start + 5, start + 120)
in_("400 Bad Request", str(e.exception))
in_("Data timestamp 1332511200.0 < start time 1332511205.0",
str(e.exception))

# Specify start/end (ends too early)
data = nilmdb.timestamper.TimestamperRate(testfile, start, 120)
with assert_raises(ClientError) as e:
result = client.stream_insert("/newton/prep", data,
start, start + 1)
in_("400 Bad Request", str(e.exception))
# Client chunks the input, so the exact timestamp here might change
# if the chunk positions change.
in_("Data timestamp 1332511271.016667 >= end time 1332511201.0",
str(e.exception))

# Now do the real load
data = nilmdb.timestamper.TimestamperRate(testfile, start, 120)
result = client.stream_insert("/newton/prep", data)
eq_(result[0], "ok")
result = client.stream_insert("/newton/prep", data,
start, start + 119.999777)
eq_(result, "ok")

# Verify the intervals.  Should be just one, even if the data
# was inserted in chunks, due to nilmdb interval concatenation.
intervals = list(client.stream_intervals("/newton/prep"))
eq_(intervals, [[start, start + 119.999777]])

# Try some overlapping data -- just insert it again
data = nilmdb.timestamper.TimestamperRate(testfile, start, 120)

@@ -215,7 +243,8 @@ class TestClient(object):
# Check PUT with generator out
with assert_raises(ClientError) as e:
client.http.put_gen("stream/insert", "",
{ "path": "/newton/prep" }).next()
{ "path": "/newton/prep",
"start": 0, "end": 0 }).next()
in_("400 Bad Request", str(e.exception))
in_("no data provided", str(e.exception))

@@ -238,7 +267,7 @@ class TestClient(object):
# still disable chunked responses for debugging.
x = client.http.get("stream/intervals", { "path": "/newton/prep" },
retjson=False)
eq_(x.count('\n'), 2)
lines_(x, 1)
if "transfer-encoding: chunked" not in client.http._headers.lower():
warnings.warn("Non-chunked HTTP response for /stream/intervals")
||||
|
@@ -192,11 +192,17 @@ class TestCmdline(object):
|
||||
self.contain("no such layout")
|
||||
|
||||
# Create a few streams
|
||||
self.ok("create /newton/zzz/rawnotch RawNotchedData")
|
||||
self.ok("create /newton/prep PrepData")
|
||||
self.ok("create /newton/raw RawData")
|
||||
self.ok("create /newton/zzz/rawnotch RawNotchedData")
|
||||
|
||||
# Verify we got those 3 streams
|
||||
# Should not be able to create a stream with another stream as
|
||||
# its parent
|
||||
self.fail("create /newton/prep/blah PrepData")
|
||||
self.contain("error creating table at that path")
|
||||
|
||||
# Verify we got those 3 streams and they're returned in
|
||||
# alphabetical order.
|
||||
self.ok("list")
|
||||
self.match("/newton/prep PrepData\n"
|
||||
"/newton/raw RawData\n"
|
||||
@@ -362,36 +368,36 @@ class TestCmdline(object):
|
||||
def test_cmdline_07_detail(self):
|
||||
# Just count the number of lines, it's probably fine
|
||||
self.ok("list --detail")
|
||||
eq_(self.captured.count('\n'), 11)
|
||||
lines_(self.captured, 8)
|
||||
|
||||
self.ok("list --detail --path *prep")
|
||||
eq_(self.captured.count('\n'), 7)
|
||||
lines_(self.captured, 4)
|
||||
|
||||
self.ok("list --detail --path *prep --start='23 Mar 2012 10:02'")
|
||||
eq_(self.captured.count('\n'), 5)
|
||||
lines_(self.captured, 3)
|
||||
|
||||
self.ok("list --detail --path *prep --start='23 Mar 2012 10:05'")
|
||||
eq_(self.captured.count('\n'), 3)
|
||||
lines_(self.captured, 2)
|
||||
|
||||
self.ok("list --detail --path *prep --start='23 Mar 2012 10:05:15'")
|
||||
eq_(self.captured.count('\n'), 2)
|
||||
lines_(self.captured, 2)
|
||||
self.contain("10:05:15.000")
|
||||
|
||||
self.ok("list --detail --path *prep --start='23 Mar 2012 10:05:15.50'")
|
||||
eq_(self.captured.count('\n'), 2)
|
||||
lines_(self.captured, 2)
|
||||
self.contain("10:05:15.500")
|
||||
|
||||
self.ok("list --detail --path *prep --start='23 Mar 2012 19:05:15.50'")
|
||||
eq_(self.captured.count('\n'), 2)
|
||||
lines_(self.captured, 2)
|
||||
self.contain("no intervals")
|
||||
|
||||
self.ok("list --detail --path *prep --start='23 Mar 2012 10:05:15.50'"
|
||||
+ " --end='23 Mar 2012 10:05:15.50'")
|
||||
eq_(self.captured.count('\n'), 2)
|
||||
lines_(self.captured, 2)
|
||||
self.contain("10:05:15.500")
|
||||
|
||||
self.ok("list --detail")
|
||||
eq_(self.captured.count('\n'), 11)
|
||||
lines_(self.captured, 8)
|
||||
|
||||
def test_cmdline_08_extract(self):
|
||||
# nonexistent stream
|
||||
@@ -444,7 +450,7 @@ class TestCmdline(object):
|
||||
|
||||
# all data put in by tests
|
||||
self.ok("extract -a /newton/prep --start 2000-01-01 --end 2020-01-01")
|
||||
eq_(self.captured.count('\n'), 43204)
|
||||
lines_(self.captured, 43204)
|
||||
self.ok("extract -c /newton/prep --start 2000-01-01 --end 2020-01-01")
|
||||
self.match("43200\n")
|
||||
|
||||
@@ -453,6 +459,57 @@ class TestCmdline(object):
|
||||
server_stop()
|
||||
server_start(max_results = 2)
|
||||
self.ok("list --detail")
|
||||
eq_(self.captured.count('\n'), 11)
|
||||
lines_(self.captured, 8)
|
||||
server_stop()
|
||||
server_start()
|
||||
|
||||
def test_cmdline_10_destroy(self):
|
||||
# Delete records
|
||||
self.ok("destroy --help")
|
||||
|
||||
self.fail("destroy")
|
||||
self.contain("too few arguments")
|
||||
|
||||
self.fail("destroy /no/such/stream")
|
||||
self.contain("No stream at path")
|
||||
|
||||
self.fail("destroy asdfasdf")
|
||||
self.contain("No stream at path")
|
||||
|
||||
# From previous tests, we have:
|
||||
self.ok("list")
|
||||
self.match("/newton/prep PrepData\n"
|
||||
"/newton/raw RawData\n"
|
||||
"/newton/zzz/rawnotch RawNotchedData\n")
|
||||
|
||||
# Notice how they're not empty
|
||||
self.ok("list --detail")
|
||||
lines_(self.captured, 8)
|
||||
|
||||
# Delete some
|
||||
self.ok("destroy /newton/prep")
|
||||
self.ok("list")
|
||||
self.match("/newton/raw RawData\n"
|
||||
"/newton/zzz/rawnotch RawNotchedData\n")
|
||||
|
||||
self.ok("destroy /newton/zzz/rawnotch")
|
||||
self.ok("list")
|
||||
self.match("/newton/raw RawData\n")
|
||||
|
||||
self.ok("destroy /newton/raw")
|
||||
self.ok("create /newton/raw RawData")
|
||||
self.ok("destroy /newton/raw")
|
||||
self.ok("list")
|
||||
self.match("")
|
||||
|
||||
# Re-create a previously deleted location, and some new ones
|
||||
rebuild = [ "/newton/prep", "/newton/zzz",
|
||||
"/newton/raw", "/newton/asdf/qwer" ]
|
||||
for path in rebuild:
|
||||
# Create the path
|
||||
self.ok("create " + path + " PrepData")
|
||||
self.ok("list")
|
||||
self.contain(path)
|
||||
# Make sure it was created empty
|
||||
self.ok("list --detail --path " + path)
|
||||
self.contain("(no intervals)")
|
||||
|
@@ -20,6 +20,12 @@ def ne_(a, b):
if not a != b:
raise AssertionError("unexpected %s == %s" % (myrepr(a), myrepr(b)))

def lines_(a, n):
l = a.count('\n')
if not l == n:
raise AssertionError("wanted %d lines, got %d in output: '%s'"
% (n, l, a))

def recursive_unlink(path):
try:
shutil.rmtree(path)
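
The new `lines_()` helper above replaces the repeated `eq_(self.captured.count('\n'), N)` pattern in the command-line tests; because it counts newline characters, captured output needs a trailing newline for the count to equal the number of printed lines. A standalone usage sketch of the helper exactly as defined above:

    # Standalone illustration of the lines_() helper defined in the hunk above.
    def lines_(a, n):
        l = a.count('\n')
        if not l == n:
            raise AssertionError("wanted %d lines, got %d in output: '%s'"
                                 % (n, l, a))

    lines_("/newton/raw RawData\n/newton/prep PrepData\n", 2)  # passes
    try:
        lines_("no trailing newline", 1)  # counts 0 newlines, so this raises
    except AssertionError as e:
        print(e)
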
@@ -13,12 +13,19 @@ from nilmdb.interval import Interval, DBInterval, IntervalSet, IntervalError
from test_helpers import *
import unittest

# set to False to skip live renders
do_live_renders = False
def render(iset, description = "", live = True):
import renderdot
r = renderdot.RBTreeRenderer(iset.tree)
return r.render(description, live and do_live_renders)

def makeset(string):
"""Build an IntervalSet from a string, for testing purposes

Each character is 1 second
[ = interval start
| = interval end + adjacent start
| = interval end + next start
] = interval end
. = zero-width interval (identical start and end)
anything else is ignored
@@ -31,7 +38,7 @@ def makeset(string):
elif (c == "|"):
iset += Interval(start, day)
start = day
elif (c == "]"):
elif (c == ")"):
iset += Interval(start, day)
del start
elif (c == "."):
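
makeset() turns an ASCII picture into an IntervalSet, one character per unit of time, and the hunks above adjust its notation so that `)` closes an interval and `|` ends one interval while starting the next. A minimal standalone parser in the same spirit, not nilmdb's makeset, and using plain character offsets rather than whatever time base the real helper uses:

    # Minimal sketch of the makeset() notation: '[' opens an interval, '|'
    # closes the current interval and opens the next, ')' closes it.
    def tiny_makeset(s):
        intervals, start = [], None
        for pos, c in enumerate(s):
            if c == "[":
                start = pos
            elif c == "|":
                intervals.append((start, pos))
                start = pos
            elif c == ")":
                intervals.append((start, pos))
                start = None
        return intervals

    print(tiny_makeset("  [--|--)  "))   # [(2, 5), (5, 8)]
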
@@ -71,24 +78,24 @@ class TestInterval:
assert(Interval(d1, d3) < Interval(d2, d3))
assert(Interval(d2, d2) > Interval(d1, d3))
assert(Interval(d3, d3) == Interval(d3, d3))
with assert_raises(TypeError): # was AttributeError, that's wrong
x = (i == 123)
#with assert_raises(TypeError): # was AttributeError, that's wrong
# x = (i == 123)

# subset
assert(Interval(d1, d3).subset(d1, d2) == Interval(d1, d2))
eq_(Interval(d1, d3).subset(d1, d2), Interval(d1, d2))
with assert_raises(IntervalError):
x = Interval(d2, d3).subset(d1, d2)

# big integers and floats
x = Interval(5000111222, 6000111222)
eq_(str(x), "[5000111222.0 -> 6000111222.0]")
eq_(str(x), "[5000111222.0 -> 6000111222.0)")
x = Interval(123.45, 234.56)
eq_(str(x), "[123.45 -> 234.56]")
eq_(str(x), "[123.45 -> 234.56)")

# misc
i = Interval(d1, d2)
eq_(repr(i), repr(eval(repr(i))))
eq_(str(i), "[1332561600.0 -> 1332648000.0]")
eq_(str(i), "[1332561600.0 -> 1332648000.0)")

def test_interval_intersect(self):
# Test Interval intersections
@@ -109,7 +116,7 @@ class TestInterval:
except IntervalError:
assert(i not in should_intersect[True] and
i not in should_intersect[False])
with assert_raises(AttributeError):
with assert_raises(TypeError):
x = i1.intersects(1234)

def test_intervalset_construct(self):
@@ -130,6 +137,15 @@ class TestInterval:
x = iseta != 3
ne_(IntervalSet(a), IntervalSet(b))

# Note that assignment makes a new reference (not a copy)
isetd = IntervalSet(isetb)
isete = isetd
eq_(isetd, isetb)
eq_(isetd, isete)
isetd -= a
ne_(isetd, isetb)
eq_(isetd, isete)

# test iterator
for interval in iseta:
pass
@@ -151,11 +167,18 @@ class TestInterval:
iset = IntervalSet(a)
iset += IntervalSet(b)
eq_(iset, IntervalSet([a, b]))

iset = IntervalSet(a)
iset += b
eq_(iset, IntervalSet([a, b]))

iset = IntervalSet(a)
iset.iadd_nocheck(b)
eq_(iset, IntervalSet([a, b]))

iset = IntervalSet(a) + IntervalSet(b)
eq_(iset, IntervalSet([a, b]))

iset = IntervalSet(b) + a
eq_(iset, IntervalSet([a, b]))

@@ -168,61 +191,79 @@ class TestInterval:

# misc
eq_(repr(iset), repr(eval(repr(iset))))
eq_(str(iset), "[[100.0 -> 200.0], [200.0 -> 300.0]]")
eq_(str(iset), "[[100.0 -> 200.0), [200.0 -> 300.0)]")

def test_intervalset_geniset(self):
# Test basic iset construction
assert(makeset(" [----] ") ==
makeset(" [-|--] "))
eq_(makeset(" [----) "),
makeset(" [-|--) "))

assert(makeset("[] [--] ") +
makeset(" [] [--]") ==
makeset("[|] [-----]"))
eq_(makeset("[) [--) ") +
makeset(" [) [--)"),
makeset("[|) [-----)"))

assert(makeset(" [-------]") ==
makeset(" [-|-----|"))
eq_(makeset(" [-------)"),
makeset(" [-|-----|"))


def test_intervalset_intersect(self):
# Test intersection (&)
with assert_raises(TypeError): # was AttributeError
x = makeset("[--]") & 1234
x = makeset("[--)") & 1234

assert(makeset("[---------]") &
makeset(" [---] ") ==
makeset(" [---] "))
# Intersection with interval
eq_(makeset("[---|---)[)") &
list(makeset(" [------) "))[0],
makeset(" [-----) "))

assert(makeset(" [---] ") &
makeset("[---------]") ==
makeset(" [---] "))
# Intersection with sets
eq_(makeset("[---------)") &
makeset(" [---) "),
makeset(" [---) "))

assert(makeset(" [-----]") &
makeset(" [-----] ") ==
makeset(" [--] "))
eq_(makeset(" [---) ") &
makeset("[---------)"),
makeset(" [---) "))

assert(makeset(" [--] [--]") &
makeset(" [------] ") ==
makeset(" [-] [-] "))
eq_(makeset(" [-----)") &
makeset(" [-----) "),
makeset(" [--) "))

assert(makeset(" [---]") &
makeset(" [--] ") ==
makeset(" "))
eq_(makeset(" [--) [--)") &
makeset(" [------) "),
makeset(" [-) [-) "))

assert(makeset(" [---]") &
makeset(" [----] ") ==
makeset(" . "))
eq_(makeset(" [---)") &
makeset(" [--) "),
makeset(" "))

assert(makeset(" [-|---]") &
makeset(" [-----|-] ") ==
makeset(" [----] "))
eq_(makeset(" [-|---)") &
makeset(" [-----|-) "),
makeset(" [----) "))

assert(makeset(" [-|-] ") &
makeset(" [-|--|--] ") ==
makeset(" [---] "))
eq_(makeset(" [-|-) ") &
makeset(" [-|--|--) "),
makeset(" [---) "))

assert(makeset(" [----][--]") &
makeset("[-] [--] []") ==
makeset(" [] [-]. []"))
# Border cases -- will give different results if intervals are
# half open or fully closed. Right now, they are half open,
# although that's a little messy since the database intervals
# often contain a data point at the endpoint.
half_open = True
if half_open:
eq_(makeset(" [---)") &
makeset(" [----) "),
makeset(" "))
eq_(makeset(" [----)[--)") &
makeset("[-) [--) [)"),
makeset(" [) [-) [)"))
else:
eq_(makeset(" [---)") &
makeset(" [----) "),
makeset(" . "))
eq_(makeset(" [----)[--)") &
makeset("[-) [--) [)"),
makeset(" [) [-). [)"))

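The border cases above hinge on the intervals being half-open, [start, end): two intervals that merely touch at a shared endpoint have an empty intersection rather than a zero-width one. A small illustrative helper (not nilmdb's Interval class) shows the behavior the `half_open` branch expects:

    # Illustrative only, not nilmdb's Interval class.
    def intersect(a, b):
        """Intersect two half-open (start, end) intervals; None if empty."""
        start = max(a[0], b[0])
        end = min(a[1], b[1])
        return (start, end) if start < end else None

    print(intersect((0, 4), (4, 9)))   # None: the intervals only touch at 4
    print(intersect((0, 4), (2, 9)))   # (2, 4)

With fully closed intervals the touching case would instead yield the zero-width `.` interval, which is what the `else:` branch encodes.
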
class TestIntervalDB:
def test_dbinterval(self):
@@ -273,12 +314,13 @@ class TestIntervalTree:
import random
random.seed(1234)

# make a set of 500 intervals
# make a set of 100 intervals
iset = IntervalSet()
j = 500
j = 100
for i in random.sample(xrange(j),j):
interval = Interval(i, i+1)
iset += interval
render(iset, "Random Insertion")

# remove about half of them
for i in random.sample(xrange(j),j):
@@ -288,10 +330,15 @@ class TestIntervalTree:
# try removing an interval that doesn't exist
with assert_raises(IntervalError):
iset -= Interval(1234,5678)
render(iset, "Random Insertion, deletion")

# show the graph
if False:
iset.tree.render_dot_live()
# make a set of 100 intervals, inserted in order
iset = IntervalSet()
j = 100
for i in xrange(j):
interval = Interval(i, i+1)
iset += interval
render(iset, "In-order insertion")

class TestIntervalSpeed:
@unittest.skip("this is slow")
@@ -300,18 +347,23 @@ class TestIntervalSpeed:
import time
import aplotter
import random
import math

print
yappi.start()
speeds = {}
for j in [ 2**x for x in range(5,18) ]:
for j in [ 2**x for x in range(5,20) ]:
start = time.time()
iset = IntervalSet()
for i in random.sample(xrange(j),j):
interval = Interval(i, i+1)
iset += interval
speed = (time.time() - start) * 1000000.0
printf("%d: %g μs (%g μs each)\n", j, speed, speed/j)
printf("%d: %g μs (%g μs each, O(n log n) ratio %g)\n",
j,
speed,
speed/j,
speed / (j*math.log(j))) # should be constant
speeds[j] = speed
aplotter.plot(speeds.keys(), speeds.values(), plot_slope=True)
yappi.stop()
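
The extra value printed above, `speed / (j*math.log(j))`, is a quick scaling diagnostic: if total insertion time grows as O(n log n), the ratio levels off to a roughly constant value as j doubles, while quadratic growth would make it keep climbing. A self-contained illustration of the same check against Python's built-in sort, which is O(n log n) on random input (the numbers are machine-dependent and unrelated to nilmdb's IntervalSet):

    # Standalone scaling check: time an O(n log n) operation and watch
    # time/(n*log(n)) settle toward a constant as n grows.
    import math
    import random
    import time

    for n in (2**12, 2**15, 2**18):
        data = [random.random() for _ in range(n)]
        t0 = time.time()
        sorted(data)
        us = (time.time() - t0) * 1e6
        print("%8d: %10.0f us total, ratio %.4f"
              % (n, us, us / (n * math.log(n))))
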
@@ -196,6 +196,6 @@ class TestServer(object):
# GET instead of POST (no body)
# (actual POST test is done by client code)
with assert_raises(HTTPError) as e:
getjson("/stream/insert?path=/newton/prep")
getjson("/stream/insert?path=/newton/prep&start=0&end=0")
eq_(e.exception.code, 400)

@@ -11,65 +11,149 @@ from nilmdb.rbtree import RBTree, RBNode
from test_helpers import *
import unittest

render = False
# set to False to skip live renders
do_live_renders = False
def render(tree, description = "", live = True):
import renderdot
r = renderdot.RBTreeRenderer(tree)
return r.render(description, live and do_live_renders)

class TestRBTree:
def test_rbtree(self):
rb = RBTree()
rb.insert(RBNode(None, 10000, 10001))
rb.insert(RBNode(None, 10004, 10007))
rb.insert(RBNode(None, 10001, 10002))
s = rb.render_dot()
rb.insert(RBNode(10000, 10001))
rb.insert(RBNode(10004, 10007))
rb.insert(RBNode(10001, 10002))
# There was a typo that gave the RBTree a loop in this case.
# Verify that the dot isn't too big.
s = render(rb, live = False)
assert(len(s.splitlines()) < 30)

def test_rbtree_big(self):
import random
random.seed(1234)

# make a set of 500 intervals, inserted in order
# make a set of 100 intervals, inserted in order
rb = RBTree()
j = 500
j = 100
for i in xrange(j):
rb.insert(RBNode(None, i, i+1))

# show the graph
if render:
rb.render_dot_live("in-order insert")
rb.insert(RBNode(i, i+1))
render(rb, "in-order insert")

# remove about half of them
for i in random.sample(xrange(j),j):
if random.randint(0,1):
rb.delete(rb.find(i, i+1))
render(rb, "in-order insert, random delete")

# show the graph
if render:
rb.render_dot_live("in-order insert, random delete")

# make a set of 500 intervals, inserted at random
# make a set of 100 intervals, inserted at random
rb = RBTree()
j = 500
j = 100
for i in random.sample(xrange(j),j):
rb.insert(RBNode(None, i, i+1))

# show the graph
if render:
rb.render_dot_live("random insert")
rb.insert(RBNode(i, i+1))
render(rb, "random insert")

# remove about half of them
for i in random.sample(xrange(j),j):
if random.randint(0,1):
rb.delete(rb.find(i, i+1))
render(rb, "random insert, random delete")

# show the graph
if render:
rb.render_dot_live("random insert, random delete")
# in-order insert of 50 more
for i in xrange(50):
rb.insert(RBNode(i+500, i+501))
render(rb, "random insert, random delete, in-order insert")

# in-order insert of 250 more
for i in xrange(250):
rb.insert(RBNode(None, i+500, i+501))
def test_rbtree_basics(self):
rb = RBTree()
vals = [ 7, 14, 1, 2, 8, 11, 5, 15, 4]
for n in vals:
rb.insert(RBNode(n, n))

# show the graph
if render:
rb.render_dot_live("random insert, random delete, in-order insert")
# stringify
s = ""
for node in rb:
s += str(node)
in_("[node (None) 1", s)
eq_(str(rb.nil), "[node nil]")

# inorder traversal, successor and predecessor
last = 0
for node in rb:
assert(node.start > last)
last = node.start
successor = rb.successor(node)
if successor:
assert(rb.predecessor(successor) is node)
predecessor = rb.predecessor(node)
if predecessor:
assert(rb.successor(predecessor) is node)

# Delete node not in the tree
with assert_raises(AttributeError):
rb.delete(RBNode(1,2))

# Delete all nodes!
for node in rb:
rb.delete(node)

# Build it up again, make sure it matches
for n in vals:
rb.insert(RBNode(n, n))
s2 = ""
for node in rb:
s2 += str(node)
assert(s == s2)

def test_rbtree_find(self):
# Get a little bit of coverage for some overlapping cases,
# even though the class doesn't fully support it.
rb = RBTree()
nodes = [ RBNode(1, 5), RBNode(1, 10), RBNode(1, 15) ]
for n in nodes:
rb.insert(n)
assert(rb.find(1, 5) is nodes[0])
assert(rb.find(1, 10) is nodes[1])
assert(rb.find(1, 15) is nodes[2])

def test_rbtree_find_leftright(self):
# Now let's get some ranges in there
rb = RBTree()
vals = [ 7, 14, 1, 2, 8, 11, 5, 15, 4]
for n in vals:
rb.insert(RBNode(n*10, n*10+5))

# Check find_end_left, find_right_start
for i in range(160):
left = rb.find_left_end(i)
right = rb.find_right_start(i)
if left:
# endpoint should be more than i
assert(left.end >= i)
# all earlier nodes should have a lower endpoint
for node in rb:
if node is left:
break
assert(node.end < i)
if right:
# startpoint should be less than i
assert(right.start <= i)
# all later nodes should have a higher startpoint
for node in reversed(list(rb)):
if node is right:
break
assert(node.start > i)

def test_rbtree_intersect(self):
# Fill with some ranges
rb = RBTree()
rb.insert(RBNode(10,20))
rb.insert(RBNode(20,25))
rb.insert(RBNode(30,40))
# Just a quick test; test_interval will do better.
eq_(len(list(rb.intersect(1,100))), 3)
eq_(len(list(rb.intersect(10,20))), 1)
eq_(len(list(rb.intersect(5,15))), 1)
eq_(len(list(rb.intersect(15,15))), 1)
eq_(len(list(rb.intersect(20,21))), 1)
eq_(len(list(rb.intersect(19,21))), 2)
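
The `find_left_end` / `find_right_start` loop above pins down those queries by brute force: the first returns the earliest node (in key order) whose end is >= i, the second the latest node whose start is <= i. A plain-list reference version of the same idea, a sketch of the semantics the assertions check rather than nilmdb's RBTree implementation:

    # Reference semantics inferred from the assertions above; operates on
    # plain (start, end) tuples instead of RBNode objects.
    def find_left_end(intervals, i):
        for iv in sorted(intervals):                # ascending by start
            if iv[1] >= i:
                return iv
        return None

    def find_right_start(intervals, i):
        for iv in sorted(intervals, reverse=True):  # descending by start
            if iv[0] <= i:
                return iv
        return None

    ivs = [(10, 15), (40, 45), (70, 75)]
    print(find_left_end(ivs, 50))     # (70, 75): first interval ending >= 50
    print(find_right_start(ivs, 50))  # (40, 45): last interval starting <= 50
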
@@ -1,8 +1,9 @@
./nilmtool.py destroy /bpnilm/2/raw
./nilmtool.py create /bpnilm/2/raw RawData

if true; then
time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-110000 /bpnilm/2/raw
time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-120001 /bpnilm/2/raw
time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-110000 -r 8000 /bpnilm/2/raw
time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s 20110513-120001 -r 8000 /bpnilm/2/raw
else
for i in $(seq 2000 2050); do
time zcat /home/jim/bpnilm-data/snapshot-1-20110513-110002.raw.gz | ./nilmtool.py insert -s ${i}0101-010001 /bpnilm/2/raw