Concerning the Design Catalog …

Efficiency in programming languages comes from avoiding bad practices such as excessive looping, too many I/O or network operations, and unnecessary data movement. Modern processors have raised the threshold at which CPU-consuming inefficiencies become unacceptable, but of course you can still loop yourself into a hole. Please don’t take that as a challenge.

With languages that have the power to execute very extensive (and therefore expensive) operations to achieve their ends, the same bottlenecks exist. I/O (even with SSD) and network access can still dominate cost and time, and those two resources are related but not the same. Query and other high-level languages like Pig Latin, Hive, SQL, and the MongoDB® query language all work well by optimizing (minimizing and cutting the cost of) the underlying data and network access required to satisfy the requests they process.

Classic query optimization

Therefore it becomes very important to plan how to do the work. The first order of business is to make sure the request makes sense, that the participating objects exist and are configured to perform what they have been asked to do. To use an example most people are somewhat familiar with, consider the SQL statement:

SELECT order_number, order_origin FROM orders WHERE part_count > 250 AND back_order = 1

Before this statement can run, the orders table must be checked to see that it exists and that the columns order_number, order_origin, part_count and back_order exist in it.

You may know that relational databases all have system catalogs. These are sets of tables that can be queried like any other. The actual design of those tables varies by database, but most have something like a TABLES table, where every table ever CREATEd has a row or rows. Now, it is cost-prohibitive for those engines to perform queries in order to compile queries, so they use a memory-resident, highly optimized copy of the TABLES table (and all other catalog tables) to do that work.
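As a rough illustration of that validation step, here is a minimal sketch in Python. The CATALOG dictionary and the validate function are made up for this example; they stand in for a memory-resident copy of catalog data and do not reflect how any particular engine stores it.

```python
# Hypothetical in-memory catalog: table name -> set of column names.
CATALOG = {
    "orders": {"order_number", "order_origin", "part_count", "back_order"},
}


def validate(table, columns, catalog=CATALOG):
    """Return a list of problems; an empty list means the request makes sense."""
    if table not in catalog:
        return [f"unknown table: {table}"]
    known = catalog[table]
    return [f"unknown column: {table}.{c}" for c in columns if c not in known]


# Columns referenced by the SELECT list and WHERE clause of the example statement.
referenced = ["order_number", "order_origin", "part_count", "back_order"]

print(validate("orders", referenced))      # no problems found
print(validate("orders", ["ship_date"]))   # one unknown column
print(validate("shipments", referenced))   # unknown table
```

The point is only that the lookup is a dictionary probe in memory rather than a query against the catalog tables themselves.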

Concerning Domino®, the internal knowledge about design elements resides in intricate and related field values on design documents. Since the same problem of runtime access to even validate queries exists, we have to create different, optimized instances of the design data. We call it the “Design Catalog”.

The second order of business in planning a query/request is to find any helpful optimizing strategies to solve the problem at hand. These are combinations of data structures like indexes and fast-path execution means like pre-seeded query terms or classic approaches like nested loop or sort-merge joins.

But the first decision to be made is something that seems simple yet is remarkably complex: how to order the work. In general, equality terms are cheaper than range ones. Index-satisfiable terms are cheaper than those requiring direct data access. And for sharded and distributed databases, getting results for single terms on single nodes is the first order of work for map-reduce processing.
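Those two rules of thumb can be sketched as a simple sort key. This is an illustration in Python, not how any real optimizer is implemented: the Term class, the order_date field, and the choice to rank "index-satisfiable" ahead of "equality" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    field: str
    op: str         # "=", "<", ">", "<=", ">="
    indexed: bool   # True if some index can satisfy the term


def cost_rank(term):
    # Lower tuples sort first: index-satisfiable before direct data access,
    # then equality before range. The relative priority is an assumption.
    return (0 if term.indexed else 1, 0 if term.op == "=" else 1)


def order_terms(terms):
    """Return the terms in the order they should be evaluated, cheapest first."""
    return sorted(terms, key=cost_rank)


terms = [
    Term("part_count", ">", indexed=False),
    Term("back_order", "=", indexed=False),
    Term("order_date", ">=", indexed=True),
    Term("order_origin", "=", indexed=True),
]

for t in order_terms(terms):
    print(t.field, t.op, "index" if t.indexed else "data access")
```

A real planner weighs estimated costs rather than fixed ranks, but the ordering principle is the same.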

In relational engine system catalogs there is virtually always an INDEX table to be consulted for this part of the problem. And to finish the calculations to perform optimization, system catalogs contain COLUMN tables with gathered and sampled numbers of values (aka cardinality) and other statistical data.
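To show why cardinality matters, here is a minimal sketch of the textbook estimate for an equality term: rows divided by distinct values, assuming values are spread evenly. The row count and the cardinalities below are invented numbers, and real engines use richer statistics than this.

```python
def estimated_matches(row_count, distinct_values):
    """Expected rows matching `column = value`, assuming an even spread of values."""
    if distinct_values <= 0:
        return row_count  # no statistics: assume every row could match
    return row_count / distinct_values


# Hypothetical statistics for two columns of a one-million-row table.
rows = 1_000_000
stats = {"back_order": 2, "order_origin": 400}

# The term expected to match the fewest rows is the best one to evaluate first.
best = min(stats, key=lambda col: estimated_matches(rows, stats[col]))

for col, distinct in stats.items():
    print(col, estimated_matches(rows, distinct))
print("evaluate first:", best)
```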

What about Domino’s indexes and DQL optimization?

Across its history, Domino’s indexes have been foundational to its market value. The Notes Indexing Facility (NIF) is a many-splendored thing with its trees of trees and optimized ordinal retrieval capabilities (“get me document 129093 in order by a given key” requires index walking in most engines). Domino’s indexes also house persistently-indexed computed values. Though there may be other engines with something akin to this power there are certainly none more robust. And available today.

So the Design Catalog needed to have quickly-available descriptions of available indexes in a database, meaning that design data needed to be extracted from its normal residence and itself indexed for quick lookup and use in optimization. However, this is complicated business.

For one thing, Domino’s industry-best security model allows for privileges to be applied to design elements. Not all views (or their indexes) are available to all users. For V10.0.0 of Domino we have had to punt on that, and remove all views or folders with readers fields on their design documents from consideration in DQL.

Secondly, views have implicit document restrictions. Given the Pending view’s selection criteria:

select form = “order” & order_state = “pending”

any use of those indexes would apply those selection criteria on top of the criteria in the “free form” query term (vs specifying the view to be used like below). So

order_origin = ‘Los Angeles’

using the Pending view would actually mean the following three terms:

form = ‘order’ and order_state = ‘pending’ and order_origin = ‘Los Angeles’

and that is not what the user intended. So in that general case we need to NOT use views with anything except “Select @all” selection criteria. For application developers who do want to use the Pending view, we opened up the syntax

‘Pending’.order_origin = ‘Los Angeles’

which is much more efficient than spelling out all three terms, since the index persists.
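To make the difference concrete, here is a small simulation in Python (not DQL). The four documents and the helper names are made up; the view index is modeled as nothing more than the documents its selection criteria admit.

```python
# Four hypothetical documents in a database.
docs = [
    {"form": "order", "order_state": "pending", "order_origin": "Los Angeles"},
    {"form": "order", "order_state": "shipped", "order_origin": "Los Angeles"},
    {"form": "invoice", "order_state": "pending", "order_origin": "Los Angeles"},
    {"form": "order", "order_state": "pending", "order_origin": "Boston"},
]


def pending_selection(doc):
    # Mirrors the Pending view's selection criteria.
    return doc["form"] == "order" and doc["order_state"] == "pending"


def origin_term(doc):
    # Mirrors the free form term order_origin = 'Los Angeles'.
    return doc["order_origin"] == "Los Angeles"


# The view's index only ever contains documents that pass its selection.
pending_index = [d for d in docs if pending_selection(d)]

free_form = [d for d in docs if origin_term(d)]           # what the user asked for
via_view = [d for d in pending_index if origin_term(d)]   # what the Pending index can answer

print(len(free_form), "documents match the free form term")
print(len(via_view), "documents match when the Pending view is used")
```

The two counts differ, which is why a view with restrictive selection criteria cannot silently stand in for a free form term, and why naming the view is an explicit choice by the developer.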

Further considerations

Given the multiply-occurring value data model in Domino, we also restricted free form query terms to only use indexes that exploded those multiply-occurring values into individual index entries. And we had to restrict to using non-Categorized indexes as well.
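Taken together, the restrictions described so far amount to an eligibility filter over the views in a database. The sketch below is in Python and is only an illustration: ViewInfo and its field names are hypothetical and are not the actual layout of the Design Catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewInfo:
    name: str
    selection: str              # the view's selection formula
    has_readers: bool           # readers fields on the design document
    categorized: bool
    explodes_multivalues: bool  # multi-value items become separate index entries


def usable_for_free_form(view):
    """True if the view's index may be used for a free form query term."""
    return (
        not view.has_readers
        and view.selection.strip().lower() == "select @all"
        and not view.categorized
        and view.explodes_multivalues
    )


views = [
    ViewInfo("AllByOrigin", "Select @all", False, False, True),
    ViewInfo("Pending", 'select form = "order" & order_state = "pending"', False, False, True),
    ViewInfo("ByCategory", "Select @all", False, True, True),
    ViewInfo("Private", "Select @all", True, False, True),
    ViewInfo("Compact", "Select @all", False, False, False),
]

eligible = [v.name for v in views if usable_for_free_form(v)]
print(eligible)
```

Only the first view passes every check; each of the others fails exactly one of the rules above.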

Query-ability

So in comparison with the relational model above, what of the query-ability of the Design Catalog? Well, we have put the system catalog data into a non-replicated database called GQFDsgn.cat. By doing so, we have removed the database context of the design elements, and that is a liability. So at this writing I cannot guarantee the forward existence of GQFDsgn.cat; at this point it is a stopgap. That means any querying of its contents is very risky if attempted. No doubt people will do it anyway and that’s fine.

For now, the Design Catalog gets the job done. Further instructions on its use will appear in its formal documentation.
