TDR – Notes Indexing Facility

Notes Indexing Facility (NIF) is original equipment for Notes/Domino.  It is the technology that enables folders and views, core Domino objects used universally to display and order sets of document summaries.  Those objects are programmable down to the summary contents and selection in the document sets. 

Historically, the facility’s first job was to service the e-mail inbox.  Notes was one of the first e-mail clients, predating other instantiations like Outlook or Google (search results, then Gmail) by at least 10 years.  But in Domino that was only the beginning.  Multiple sort orders, wildly open document selection criteria, categorization and support for response hierarchies rounded out the functionality. 

I have come to think about views as persistent query results.  Both the query language of views and their project list (think SQL’s SELECT clause) support full expression handling using Formula Language. 
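To make the "persistent query result" idea concrete, here is a minimal Python sketch. It is not Domino code: the `View` class, its `select` predicate and its `columns` functions are hypothetical stand-ins for a view's selection formula and column formulas, and the stored `entries` list plays the role of the persisted index.

```python
from dataclasses import dataclass, field
from typing import Callable

Doc = dict  # assumption: a document is modelled as a plain dict of items


@dataclass
class View:
    select: Callable[[Doc], bool]            # stand-in for the selection formula
    columns: list[Callable[[Doc], object]]   # stand-ins for the column formulas
    entries: list[tuple] = field(default_factory=list)  # the persisted result

    def rebuild(self, docs: list[Doc]) -> None:
        """Evaluate selection and columns once and keep the sorted result."""
        rows = [tuple(col(d) for col in self.columns) for d in docs if self.select(d)]
        self.entries = sorted(rows)


docs = [
    {"Form": "Memo", "Subject": "Budget", "Size": 12},
    {"Form": "Task", "Subject": "Audit", "Size": 3},
    {"Form": "Memo", "Subject": "Agenda", "Size": 7},
]
memos = View(
    select=lambda d: d["Form"] == "Memo",
    columns=[lambda d: d["Subject"].upper(), lambda d: d["Size"]],
)
memos.rebuild(docs)
```

Readers of `memos.entries` get the stored rows without re-running the selection, which is the sense in which the result is persistent.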

Since it has always been so easy to create views and folders and edit them to create yet more, a few trends developed in shops running Domino:

  1. Views propagated.  They can number in the hundreds with indexes (one per sort order) numbering in the thousands.
  2. Updating views, by necessity, always involved a lag.  Unlike transaction-oriented systems like relational DBMSs, where indexes are updated with each data update, given the thousands of existing indexes in Domino, updating was deferred to a scheduled operation or user access.  Lazy updating can be a liability for some applications and operations, but the sheer number of views mandated it.
  3. Even so, keeping views updated (or, euphemistically speaking, “refreshed”) took more and more resources to accomplish, disproportionate to the amount of access.
  4. Consolidation and removal/disablement of sort orders was required to provide good service and keep from constant hardware upgrades. 

Accessing Views – the Tumblers

Finding data using an index is classic database data processing.  So is traversing the entries in an index.  Both are part of what NIF offers.  But there’s something else.

Since virtually all clients that traverse indexes render only a single page or window of entries, a cursor is maintained to keep one’s place in the list.  If a scroll bar is positioned halfway, or 5/8 of the way or 15/1024 of the way into the list, you need a way to quickly get to those spots in the list.  Walking entry by entry won’t do once the number of entries gets large.  For that, Domino NIF has Tumblers – 32 nested offsets into a view’s entries.  So, if a view has 389209 entries, and the user scrolls to entry 23423, there is actually a secondary key (well, promoted and nested page counts) in the index structure that provides instant access to exactly that entry.  Maybe this exists somewhere else, but it’s the only index where I’ve ever seen it.

A drawback is that if Tumblers are held persistently, they get stale.  So don’t do that.
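The idea of reaching a position through page counts can be sketched in Python. This is only an illustration under simplifying assumptions: it keeps a single level of running page totals rather than NIF's nested offsets, and the `PagedIndex` class and its method names are invented for the example.

```python
from bisect import bisect_right
from itertools import accumulate


class PagedIndex:
    """Sorted entries split into fixed-size pages, with running counts on top."""

    def __init__(self, entries, page_size=4):
        ordered = sorted(entries)
        self.pages = [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]
        # running_totals[i] = number of entries held in pages 0..i
        self.running_totals = list(accumulate(len(p) for p in self.pages))

    def __len__(self):
        return self.running_totals[-1] if self.running_totals else 0

    def entry_at(self, position):
        """Return the entry at a 0-based position without walking entry by entry."""
        if not 0 <= position < len(self):
            raise IndexError(position)
        page_no = bisect_right(self.running_totals, position)
        before = self.running_totals[page_no - 1] if page_no else 0
        return self.pages[page_no][position - before]

    def entry_at_fraction(self, numerator, denominator):
        """Map a scroll-bar position such as 5/8 to an entry."""
        position = min(len(self) - 1, len(self) * numerator // denominator)
        return self.entry_at(position)


index = PagedIndex(range(100, 200), page_size=8)
halfway = index.entry_at_fraction(1, 2)
```

The running totals are computed once at build time. Any later insert or delete would leave a saved position pointing at the wrong entry, which is the staleness problem described above.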

Trees of trees

To effect nesting and full feature support within it, NIF doesn’t just keep a single B-Tree structure for an entire view.  Of course, there is a separate one for each column indexed, but if you ask for categories under any other order, NIF will build separate trees to accommodate you.  This allows independence in traversal: walking the outer layers can be much cheaper if the inner layers have more entries, and the Tumblers at each level can advance or go back without affecting other levels.
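A rough Python sketch of the nesting idea follows. Sorted lists stand in for B-Trees, and `CategorizedIndex` with its methods is a hypothetical name, so treat it as an illustration of independent outer and inner traversal rather than NIF's actual layout.

```python
from bisect import insort


class CategorizedIndex:
    def __init__(self):
        self.categories = []   # outer level: sorted category keys
        self.inner = {}        # category -> its own sorted list of entries

    def add(self, category, entry):
        if category not in self.inner:
            insort(self.categories, category)
            self.inner[category] = []
        insort(self.inner[category], entry)

    def walk_categories(self):
        """Outer traversal only: yields (category, count) without visiting entries."""
        for category in self.categories:
            yield category, len(self.inner[category])

    def walk_entries(self, category):
        """Inner traversal of a single category, independent of the others."""
        return iter(self.inner[category])


idx = CategorizedIndex()
for cat, subject in [("Sales", "Q2"), ("Legal", "NDA"), ("Sales", "Q1")]:
    idx.add(cat, subject)
```

Walking the categories costs one step per category no matter how many entries each one holds.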

Typelessness and permutation

Domino’s three main datatypes – text, number and timestamp – can all be seamlessly collated in a single index.  The order of the types is by the published class values (this is from the C API – inc\nsfdata.h in the SDK pack):

#define CLASS_NUMBER    (3 << 8)
#define CLASS_TIME      (4 << 8)
#define CLASS_TEXT      (5 << 8)

So, all numbers are sorted before all timestamps before all text values.  Opinions vary about how to use the different datatypes for a given field (with some eschewing the very idea), but people do it, and views created against those fields function fine across the types.
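The class-first ordering can be imitated in Python with a two-part sort key. The three constants mirror the values quoted above; the `collation_key` function is a hypothetical helper, and the case folding of text is an assumption of this sketch, not a statement about NIF's collation rules.

```python
from datetime import datetime

CLASS_NUMBER = 3 << 8
CLASS_TIME = 4 << 8
CLASS_TEXT = 5 << 8


def collation_key(value):
    """Order first by type class, then by the value within its class."""
    if isinstance(value, (int, float)):
        return (CLASS_NUMBER, value)
    if isinstance(value, datetime):
        return (CLASS_TIME, value)
    if isinstance(value, str):
        return (CLASS_TEXT, value.casefold())  # assumption: case-insensitive text
    raise TypeError(f"unsupported value type: {type(value).__name__}")


mixed = ["pear", 42, datetime(2019, 3, 1), "Apple", 3.5, datetime(2018, 7, 4)]
ordered = sorted(mixed, key=collation_key)
```

Because the class value is compared first, a number is never compared directly with a timestamp or a string, so the mixed list sorts without type errors.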

Domino also supports multiply occurring fields.  In views, the default is that only the first value in a list is indexed, but permutation happens when the column property “Show multiple values as separate entries” is checked.

This has the effect of normalizing the entries as if there were a separate relational table for the field, while maintaining optimal access to the whole set of values that are collocated on a single document. 
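A small Python sketch of the difference between indexing the first value only and permuting all values. The `view_entries` function, the `separate_entries` switch and the document layout are hypothetical and serve only to illustrate the effect.

```python
def view_entries(docs, field, separate_entries):
    """Build sorted (key, document id) entries for one multi-value field."""
    entries = []
    for doc in docs:
        values = doc[field]
        if not isinstance(values, list):
            values = [values]
        # default: only the first value is indexed; permuted: one entry per value
        keys = values if separate_entries else values[:1]
        for key in keys:
            entries.append((key, doc["id"]))
    return sorted(entries)


docs = [
    {"id": "doc1", "Tags": ["views", "indexing"]},
    {"id": "doc2", "Tags": ["admin"]},
]
first_only = view_entries(docs, "Tags", separate_entries=False)
permuted = view_entries(docs, "Tags", separate_entries=True)
```

With permutation on, `doc1` appears under both of its tags, much like rows in a separate child table, while the document itself still holds the whole list.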

Now, some goodies

Much of this may be old hat to you; many people have known about views for a long time.  And in that sense, this high-level description of NIF might have been less than informative.  But let me reward you for reading to this point.

I. Optimization – turn off responses.

It’s unpublished as far as I know, but turning off the view property “Show response documents in a hierarchy” can have a great effect on view traversal speed, affecting both concurrency and throughput.

It is on by default for historical reasons, but if you don’t want or need responses, turning it off is advised.  The internal code calls the view “flat” at that point and goes into hyperdrive.

II. Update performance/throughput concurrency using updall -inline

I mentioned that views were updated lazily.  Some have found that due to this and Domino’s somewhat globular locking model, there can be a bottleneck when views need to be “refreshed” (the internal code aptly calls it “updated”).  This graphic shows the phenomenon that can result:

In V10, we provided 2 solutions to this issue:

  • Dynamic indexing of highly used views

Available in 9.01FP3 as an option that can be specified per view, this feature dynamically sets up a dedicated view update thread to streamline and prioritize view updating.  While this doesn’t guarantee refreshed views for every thread, it drastically cuts down on stale ones.

  • updall -inline

To truly update views as documents are updated, you should use the -inline option with updall.  It is recommended that you try a view at a time.  This mode of view updating works like a transactional, relational database, applying updates to indexes as they are applied to documents.  And, all “refreshing” threads are made read-only, since the view is guaranteed to be updated. 

Personally, I have generated loads with heavy view contention and seen a 20X speed up for the read threads.
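The contrast between lazy and inline updating can be shown with a toy Python model. It says nothing about how updall is implemented: `IndexedStore`, its `inline` switch and its methods are invented for this sketch, and a sorted list stands in for a view index.

```python
from bisect import insort


class IndexedStore:
    def __init__(self, inline):
        self.inline = inline
        self.docs = []
        self.index = []      # sorted keys, stand-in for one view index
        self.pending = []    # saved but not yet indexed (lazy mode only)

    def save(self, subject):
        self.docs.append(subject)
        if self.inline:
            insort(self.index, subject)    # index updated together with the document
        else:
            self.pending.append(subject)   # deferred until a refresh

    def refresh(self):
        """Lazy mode: apply deferred updates. Inline mode: nothing is pending."""
        for subject in self.pending:
            insort(self.index, subject)
        self.pending.clear()

    def read(self):
        """Read-only access to whatever the index currently holds."""
        return list(self.index)


lazy = IndexedStore(inline=False)
inline = IndexedStore(inline=True)
for store in (lazy, inline):
    store.save("b")
    store.save("a")
stale_read = lazy.read()     # nothing indexed yet in lazy mode
fresh_read = inline.read()   # already current in inline mode
lazy.refresh()
```

In the inline case a reader never has to trigger `refresh`, so readers can stay read-only. That matches the behaviour described above.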

III. A peek into the future use of NIF

Again, as I wrote above, views have always functioned as persistent query results.  At this writing, I will set expectations for that to continue and expand to encompass DQL results.  The engine to extract, sort and build indexes is extremely robust and optimized and its use in results production, sorting and joining is a natural fit.

Also, views are far too fixed in their place within databases.  A facility that archives view definitions for seasonal use and/or future research likely already exists; I would expect that the flow of the creation and destruction of views in a Domino database would become a normal part of database administration, tied of course to application requirements, or human interaction.
