Random feedback #4
A tab I found open: https://groups.google.com/forum/#!topic/datomic/c9ZGWHLqTMY (Datomic already supports at least Couchbase as backing store? But of course a JS implementation of what is striking me as a cool-but-compact query language is a sweet project!)
So the more I think about it, and read the datomic docs, the more I think this ultimately should become both a PouchDB and CouchDB plugin the way geo is! At that level, the T stuff could be handled properly and invisibly, i.e. when you catch up your datalog index(es) with recent primary storage changes, you would just keep the old emits instead of removing them like the MapReduce view engine does. The best part then is that you don't need any special write policy and the resulting main set of documents can still be used for other indexes too (MapReduce/spatial/fulltext…)
@natevw ohai! Could you throw the most interesting ideas you'd found to CouchDB dev@ ML?
@natevw Thank you so much for all your thoughts! It's gonna take me a while to digest all of it, but my (light) first pass yielded two immediate questions:
With the caveat that I still know very little about Datalog, here's some thoughts on going whole hog while not needing tons of index space:
According to https://github.com/tonsky/datascript#project-status you need:
I understand these to be permutations of the Entity Attribute Value Transaction concepts from reading e.g. http://docs.datomic.com/query.html#sec-2. In our case "entity" is basically doc._id; "attribute" is the name of a property within that doc (i.e. key); "value" is the value of said property, and I'm beginning to suspect that "transaction" is just doc._local_seq.
[Within a single instance of a database, doc._local_seq stores a monotonically increasing number: the database "_changes" sequence at that point. So for the same _id/_rev in a replica it may be different, but locally it's kinda what you need. Although I bet your trouble's gonna be that the view is only going to include emits from the most recent/winning version…so if you go to use it your data could actually disappear mid-execution, which is obviously the opposite of the point!]
Code review
So anyway, putting aside transactions for a bit (have a few more thoughts, maybe lower down ;-) let's go to reviewing the code at https://github.com/dahjelle/pouch-datalog/blob/c9ae7f421c93d3f5b5ac64681fca5fe07b567912/index.js#L21:
Clean up emits
For starters, why not just drop the emit(tuple, tuple) silliness and allow just emit(tuple) by a simple change here:
Right?
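The snippet that went with this suggestion isn't reproduced here, so what follows is only a rough sketch of the idea, not the actual diff. It assumes the query side currently reads each row's value; `mapEAV`, `rowsToTuples` and `runMap` are made-up names, and `emit` is passed in as a parameter purely so the sketch runs outside a view server (in a real map function it's a global). The point is that every view row already carries its key, so the tuple doesn't need to be stored a second time as the value.

```javascript
// Hypothetical map function: emit each tuple as the key only.
// Calling emit(key) with no second argument stores no value for the row.
function mapEAV(doc, emit) {
  Object.keys(doc).forEach(function (k) {
    if (k.charAt(0) === "_") return; // skip _id, _rev and friends
    emit([doc._id, k, doc[k]]);
  });
}

// Hypothetical query-side change: read the tuple from row.key
// instead of row.value.
function rowsToTuples(rows) {
  return rows.map(function (row) { return row.key; });
}

// Tiny stand-in for the view engine, just enough to exercise the above.
function runMap(docs) {
  var rows = [];
  docs.forEach(function (doc) {
    mapEAV(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value === undefined ? null : value });
    });
  });
  return rows;
}

var sampleRows = runMap([{ _id: "sally", _rev: "1-abc", age: 21 }]);
var tuples = rowsToTuples(sampleRows);
console.log(JSON.stringify(tuples));
```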
Avoid redundant _id
For your AVE view you really just need to emit([k,v]). This will save a bit of redundant space usage. Your query connector can fix it back up without much trouble:
Note that [since we've left transactions out, this is actually the AVET index!] this doesn't change the sort order or anything. Internally when you emit(k,v) CouchDB sort of stores/sorts that as [k,id] -> v; for array keys it's like you have one more item at the end that you can hop to via ?startkey_docid.
Getting rid of views, pt. 1
I'm of a mixed mind on this one, but for completeness: the EAV[T] index is sort of redundant with the built-in _all_docs one, assuming it always just gets called for a single E+A lookup. What I mean is that you would do something like this pseudocode in the scope above the code we've been reviewing:
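The pseudocode didn't make it into this copy, so here is a minimal sketch of what I mean. It assumes a PouchDB-style `db.get(id)` that returns a promise for the document; `lookupEAV` and `fakeDb` are hypothetical names, and the in-memory `fakeDb` is only there so the sketch runs without a real database.

```javascript
// Hypothetical helper: answer a single E+A lookup straight from the
// document, with no EAV view involved. `db` is anything with a
// PouchDB-style promise-returning get(id).
function lookupEAV(db, entity, attribute) {
  return db.get(entity).then(function (doc) {
    if (!Object.prototype.hasOwnProperty.call(doc, attribute)) return [];
    return [[doc._id, attribute, doc[attribute]]];
  });
}

// In-memory stand-in for a database, only for trying the helper out.
var fakeDb = {
  docs: { sally: { _id: "sally", age: 21, likes: "pizza" } },
  get: function (id) {
    var doc = this.docs[id];
    return doc ? Promise.resolve(doc) : Promise.reject(new Error("not_found"));
  }
};

var eavResult = lookupEAV(fakeDb, "sally", "age");
eavResult.then(function (tuples) {
  console.log(JSON.stringify(tuples));
});
```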
That is, an index by E is basically just what CouchDB has underlying db.get (aka db.allDocs if you do need a range of E, which seems unlikely in this case?) and then picking out the relevant keys.
Now I'm of a mixed mind, because if your documents are large and you only want to fetch a single attribute across each, having this index will save I/O and parsing overhead. So it's a tradeoff between storing an additional copy of your database in the form of this index, versus better query performance. Keeping it is in the Couch tradition of optimizing for disk usage last.
Getting rid of views, alternate universe
So what is interesting about this from a user of this library perspective is that really my job is just to emit whatever [A,V[, T]] tuples I think are relevant from a document!
There's a couple tricks that might be useful in this vein:
require() logic in your views
So you could do things like:
where (unbeknownst to the user) the helper would be something like:
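Neither snippet survived in this copy, so here's a sketch of how the two halves might fit together. CouchDB does let map functions require() CommonJS modules stored under views/lib in the design doc; everything else here is an assumption on my part: the module name `datalog`, the `emitTuple` signature, the `ave` view name, and the little `runView` stand-in that only exists so this runs locally. I pass `emit` into the helper explicitly so it doesn't matter whether the module can see the global. Whether PouchDB's own view engine honors require() is something I haven't checked.

```javascript
// Hypothetical design document.
var ddoc = {
  _id: "_design/datalog",
  views: {
    lib: {
      // Assumed helper: turns one attribute/value pair into an index row.
      datalog: "exports.emitTuple = function (emit, k, v) { emit([k, v]); };"
    },
    ave: {
      // What the user writes: just pick the attribute/value pairs that matter.
      map: "function (doc) {\n" +
           "  var emitTuple = require('views/lib/datalog').emitTuple;\n" +
           "  if (doc.age !== undefined) emitTuple(emit, 'age', doc.age);\n" +
           "}"
    }
  }
};

// Minimal local stand-in for the view server, only to show the pieces connect.
function runView(designDoc, viewName, docs) {
  var rows = [];
  function emit(key, value) {
    rows.push({ key: key, value: value === undefined ? null : value });
  }
  function requireFromDdoc(path) {
    // Resolve e.g. "views/lib/datalog" to the module source string.
    var src = path.split("/").reduce(function (o, p) { return o[p]; }, designDoc);
    var exports = {};
    new Function("exports", src)(exports);
    return exports;
  }
  var map = new Function("emit", "require",
    "return (" + designDoc.views[viewName].map + ");")(emit, requireFromDdoc);
  docs.forEach(function (doc) { map(doc); });
  return rows;
}

var viewRows = runView(ddoc, "ave", [{ _id: "sally", age: 21 }, { _id: "bob" }]);
console.log(JSON.stringify(viewRows));
```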
Or I can almost imagine even an inverted version where pouch-datalog provides the "real" map and the user provides the "tuple" one, but basically splitting the actual CouchDB details from the Datalog tuple concept.
There's kind of a tradeoff here, where you do insulate the user from the internal indexing details but now they have to copy-paste the right version of your code into their ddoc before using this plugin (or is Kanso not actually dead yet?)
Although! You could just add an initialization phase to your plugin? Something like:
…and it would basically write/overwrite "_design/datalog" with something generated from, roughly:
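Since the snippets are missing here, a rough sketch of that init phase follows. All the names (`buildDesignDoc`, `initDatalog`, the `ave` view) are hypothetical; the only real API it leans on is PouchDB-style `db.get` / `db.put` returning promises, with a 404 status on a missing doc. One caveat the sketch makes obvious: the user's function gets stringified, so it can't close over anything outside itself.

```javascript
// Hypothetical: wrap the user's tuple function in a real map function.
// tupleFn(doc) is assumed to return an array of tuples (or nothing).
function buildDesignDoc(tupleFn) {
  var map = "function (doc) {\n" +
            "  var tuples = (" + tupleFn.toString() + ")(doc) || [];\n" +
            "  tuples.forEach(function (t) { emit(t); });\n" +
            "}";
  return { _id: "_design/datalog", views: { ave: { map: map } } };
}

// Hypothetical init step: write or overwrite _design/datalog.
function initDatalog(db, tupleFn) {
  var ddoc = buildDesignDoc(tupleFn);
  return db.get(ddoc._id).then(function (existing) {
    ddoc._rev = existing._rev;          // overwriting needs the current revision
  }, function (err) {
    if (err.status !== 404) throw err;  // missing is fine, anything else is not
  }).then(function () {
    return db.put(ddoc);
  });
}

// In-memory stand-in for a database, only so the sketch runs.
var memStore = {};
var memDb = {
  get: function (id) {
    return memStore[id] ? Promise.resolve(memStore[id]) : Promise.reject({ status: 404 });
  },
  put: function (doc) {
    memStore[doc._id] = doc;
    return Promise.resolve({ ok: true, id: doc._id });
  }
};

var initDone = initDatalog(memDb, function (doc) {
  return doc.age === undefined ? [] : [["age", doc.age]];
});
initDone.then(function () {
  console.log(memStore["_design/datalog"].views.ave.map);
});
```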
Okay, in the time I spent explaining this I could have experimented with this for real, but at least now you know what I think you should do ;-)
One more thing
Oh, on the transaction thing, which I think is kind of cool although I also agree it makes sense to at least leave optional: basically you tell users (or give them a helper method…) that instead of saving over top of the old doc, they just post a new doc each time it changes. So instead of having a replacing sequence (pretend it's a changes feed):
you store:
You'll still get conflict detection since the ids are deterministically based on the same MVCC token, but now your emitTuple can use doc._id.split("@after@")[0] as E and doc._local_seq as T (and the provided k/v as A and V, natch) and voilà your user gets Datalog when their ops team gave them CouchDB!
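To make that concrete, a small sketch. The exact id format is my guess at one way to do it (entity, then "@after@", then the rev of the version being superseded), and `versionedId` / `toEAVT` are hypothetical names. As far as I know, CouchDB only hands `_local_seq` to map functions when the design doc sets options.local_seq to true, so here it's just written into the sample docs by hand.

```javascript
// Hypothetical id scheme: "<entity>@after@<rev of the version being replaced>".
// Two writers updating the same version compute the same id, so the second
// write conflicts, which is where the conflict detection comes from.
function versionedId(entity, previousRev) {
  return entity + "@after@" + (previousRev || "");
}

// E is recovered from the id, T from _local_seq, A and V are passed in.
function toEAVT(doc, k, v) {
  return [doc._id.split("@after@")[0], k, v, doc._local_seq];
}

// Sample docs, with _local_seq filled in by hand for illustration.
var v1 = { _id: versionedId("sally", null), age: 21, _local_seq: 1 };
var v2 = { _id: versionedId("sally", "1-abc"), age: 22, _local_seq: 5 };

var history = [toEAVT(v1, "age", v1.age), toEAVT(v2, "age", v2.age)];
console.log(JSON.stringify(history));
```

Both versions stay in the database, so the index keeps the old tuple alongside the new one instead of losing it when the doc is updated.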