DQL roots

A few years ago, three colleagues and I were drafted for a skunkworks effort, a throwaway project: prove the concept, save the relics and go back to your regular job. We were interested in taking a quite functional REST API that was serviced by much more expensive technology and having it use native Domino services instead. We worked for a few months, over the Christmas holiday season, to show a cheaper way to give the API what it needed to function.

Part of that work involved data transformation. JSON is the format of all REST payloads, so it was something we needed to supply and consume. Fortunately, for the most part we had built-in libraries for that problem. But another part was query solving, which meant pulling together Domino services to satisfy the different query terms. And it worked! We delivered a demonstrable, cheap prototype that inspired later work.

I have a long history in query processing, going back to my 1986 work on the mainframe database Model 204®, now owned by Rocket Software. Its language, unceremoniously called “User Language”, thrives by using two kinds of indexes and direct data access in a way that anticipated, by decades, Lucene, Hadoop sharding and map-reduce engines. Its Boolean processing is both stingy, avoiding I/O via partitioning, and optimal in its low-level operations, using the machine-level instruction set to AND, OR and NOT bitmaps.

Later, the same technology was ported to the C language and Unix/Windows and I was part of an effort to support the full SQL 1992 language. It ported well, and specialized in the same area – high speed complex Boolean processing.

I also have a long history with SQL. I appreciate its strengths and its standardized publication of the very well defined relational algebra. But, working on Notes/Domino and diving deeply into the unique and valuable properties and capabilities of semi-structured and unstructured data, I have observed that the mapping to the SQL standard has always been a forced one and the success of each attempt varied at best. Enter NoSQL and its pundits. Indeed, enter the internet, where relational data plays a subsidiary role in the extensive unstructured data corpus.

Earlier this year (2018) we began working in earnest on providing NoSQL capabilities using Node.js to access Domino. We surveyed the landscape and found it populated by engines that had invested heavily in JSON as their native data format. Now, one of the most beautiful attributes of Domino has always been its malleability to support any number of front (and truth be told, back) ends. Node.js and JSON are no exception, though there is work to do. And they comprise what can only be described as a new standard.

The challenge for us in developing this new front end is to map, and make valuable, the data, processing and everything else possible in Domino in the new (well, new to Domino) format. I pledge to write a LOT about the work, in a way that both seeks input and advertises the incumbent power of the underlying engine. In the meantime, one early deliverable was quickly identified: a query capability.

Domino has had the underlying structures to support a general query facility for a long time. It is NOT a relational engine, which is a very good thing for a NoSQL database. And its deep underpinnings in unstructured, relationally denormalized data are formative in this work.

Now, much of the Domino engine was built in support of the Notes® client and its browser-based descendants. That is not a liability; there is very rich and useful functionality at our collective disposal. But with Node.js and a query facility, the usage of the indexes and document data has a different footprint. For instance, a call to render 100 index entries at a time while scrolling an inbox or view does only a small fraction of the work needed to find the results for 5000 entries across the same view. And we need to take care not to overwhelm one kind of processing with the other.

But using the indexes of the Notes Indexing Facility (NIF, the part of Domino that comprises views) was an obvious approach in the aforementioned skunkworks, and it has borne fruit in the current effort. Given the semantics of a database-level query and the Domino data model, certain restrictions on view and view column design have been needed to have a working engine.

Set-based terms connected with Boolean primitives are the building blocks of any query engine. In that skunkworks we also identified the Domino IDTable functions as the avenue of choice for Boolean processing. Their speed is tremendous. The one restriction they bring is that NoteIDs are not portable to other replicas, but that restriction affected no early user story or requirement and is worth living with for the performance benefit. IDTables are the currency of the query engine and, as such, all data manipulation will be done via efficient post-processing, at least for now.
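To make the idea of set-based terms joined by Boolean primitives concrete, here is a minimal sketch in Python. It does not use the Domino IDTable API at all: plain Python sets of integers stand in for ID tables, and the term names, the ID values and the helper functions `id_and`, `id_or` and `id_not` are all made up for illustration.

```python
# Hypothetical stand-in for ID tables: each query term resolves to a set
# of integer note IDs. None of these names or values come from Domino.

def id_and(a, b):
    """IDs present in both term results."""
    return a & b

def id_or(a, b):
    """IDs present in either term result."""
    return a | b

def id_not(a, universe):
    """IDs in the database (universe) that are absent from the term result."""
    return universe - a

# Made-up data: ten documents, three terms already resolved to ID sets.
all_ids = set(range(1, 11))
term_subject = {2, 4, 6, 8, 10}   # e.g. documents whose subject matched
term_author = {4, 5, 6, 7}        # e.g. documents whose author matched
term_archived = {6, 9}            # e.g. documents flagged as archived

# (subject AND author) AND NOT archived
result = id_and(id_and(term_subject, term_author),
                id_not(term_archived, all_ids))

print(sorted(result))  # [4]
```

Note that NOT needs the full set of IDs in the database to subtract from, which is one reason an engine built this way keeps everything in the same ID currency until the very end.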

We also needed to define the language. Early on we identified the existing engines in the document-based NoSQL world. They were MongoDB® and CouchDB®, both well established and adopted in the field. They each had JSON query interfaces that have users building Boolean trees. So that was the first interface we built, DQL 1.0 if you like. But when we looked at it, and read developer reviews of those interfaces, we concluded it was not the way to go. That decision forged DQL in its current, shipping form. We didn’t focus on the language so much as the engine. So we called it DGQF (Domino General Query Facility), because Domino is a collection of facilities working together. But the language acronym, DQL, won the day to the praises of many (if it’s ok, we’ll still call it DGQF internally).
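For readers who have not met the JSON Boolean-tree style of query interface mentioned above, the following Python sketch shows the general shape. The tree layout (`op`, `args`, `field`, `eq`) is invented for this example; it is not MongoDB's syntax, not CouchDB's, and not what DQL 1.0 looked like. The documents are made up as well.

```python
import json

# Hypothetical query: status is open AND (priority is high OR NOT owner is "none").
query_text = """
{"op": "and", "args": [
    {"field": "status", "eq": "open"},
    {"op": "or", "args": [
        {"field": "priority", "eq": "high"},
        {"op": "not", "args": [{"field": "owner", "eq": "none"}]}
    ]}
]}
"""

def matches(node, doc):
    """Recursively evaluate a Boolean tree node against one document."""
    op = node.get("op")
    if op is None:
        # Leaf node: a simple field equality test.
        return doc.get(node["field"]) == node["eq"]
    results = [matches(child, doc) for child in node["args"]]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    if op == "not":
        return not results[0]
    raise ValueError("unknown operator: " + op)

query = json.loads(query_text)

docs = [
    {"id": 1, "status": "open", "priority": "high", "owner": "none"},
    {"id": 2, "status": "open", "priority": "low", "owner": "none"},
    {"id": 3, "status": "closed", "priority": "high", "owner": "ana"},
    {"id": 4, "status": "open", "priority": "low", "owner": "ana"},
]

matched = [d["id"] for d in docs if matches(query, d)]
print(matched)  # [1, 4]
```

Even this three-term query takes three levels of nesting to write down, which gives some feel for why a developer might prefer a different kind of language.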

There isn’t space here to go into all the variants and power of the language. The formal documentation is undergoing its final edits and I will provide pointers once it’s available. The approach we took is sound and will yield newfound power in the hands of application developers, even into a new generation. We did our best, and will continue to do so, to bring existing capabilities into innovative use and to expose components, such as the IDTables that exist in views and folders, in the syntax. We think it hangs together pretty well.

So .. enjoy. And here’s to Domino V11. You ain’t seen nothin’ yet!
