On second thought

If nothing else in life I want to be painfully honest. Of course honesty can be painful not just for the person who practices it but for those who receive its implications. And I want to be able to communicate not-so-good news as well as great news, so people can actually trust me, though it tests my friendships with people in marketing.

On January 24, I gave a webinar on the use of DQL in which I recommended the following approach to save results and reuse them in later queries (Java code):

DocumentCollection doccol;
Database db;
db = Sess.getDatabase("", m_dbname);
DominoQuery dq = db.createDominoQuery();
doccol = dq.execute("date_origin >= @dt('2017-01-01T00:00:00.000') AND date_origin < @dt('2018-01-01T00:00:00.000')");
doccol.putAllInFolder("2017_Sales", true);

Then, after storing the results in the 2017_Sales folder, one could reference that folder in any number of ways in DQL, which all functions fine. For instance:

doccol = dq.execute("in ('2017_Sales') and sales_person = 'Trudi Ayton');

Now to be fair to my recommendation, I had timed such code using a smallish database I have on hand to test DQL syntax. It worked and performed great.

But I had not scaled that test. And when I did, the results were not so good.

My testing was against a 2M+ document database of the same construction. These are laptop numbers, so grant some leeway and know servers will do much better. But still.

The query:

doccol = dq.execute("date_origin >= @dt('2017-01-01T00:00:00.000') AND date_origin < @dt('2018-01-01T00:00:00.000')");

took about 55 seconds to return, traversing through 400K view entries. 40K entries would take one tenth of that on my laptop. That operation is not free, and it’s not claimed to be. I’m sure servers will do much better.

But the operation to put the results into the folder:

doccol.putAllInFolder("2017_Sales", true);

took upwards of 10 minutes

The reason is that Domino uses single-document folder operations to perform that, which is the the atomic operation that gets transaction logged and replicated. And on my laptop, they run about 1.4 msec per document. Again I’m sure a server will do much better.

The real driver of cost was my 400K result. If it had been 40K it would have been much faster, and that operation is a one-time expense. So my advice was sound; it just didn’t describe the scale where this would give an advantage.

Finally, the IN clause in the query that was the goal of this work:

doccol = dq.execute("in ('2017_Sales') and sales_person = 'Trudi Ayton');

was found to be only about 2x faster (~30 seconds) than the original range term with the date values.

Once again, that gets a lot better when the results stored are smaller.

The measurement of cost of the 3 parts (initial query, folder insert, second query using folder) shows that if the first query took a long time to find a smallish result (< 5000 documents say) then putAllInFolder (~7-10 seconds) and the subsequent query are both much faster.

And in any case, we’re working to speed that up and I’ll offer some more tips in further blogs.

Ok, so gross apologies to anyone trying this with a large database and discovering the same thing. Know that I am suitably embarrassed.

We are moving on quickly to hit DQL performance on several fronts and talk is cheap so let me stop here and I will vastly improved results as I have them.

Share this:

Related

Leave a comment Cancel reply