Why saving Notes documents can be dangerous and how scanEZ can help

The flexibility and complexity of the IBM Notes and Domino platform are sometimes reflected in the fickle behavior of Notes documents. As administrators or developers, we need to be very careful about how we handle them before they cause major havoc. The wide variety of problems caused by needlessly saving documents en masse comes up frequently in conversations with customers, so we deemed it worthy of an article. We’ve even covered some of these in a replication webcast that we presented with TLCC.

Anatomy of a Notes Document

Before looking at the pros and cons regarding saving and modifying Notes documents, let’s examine some of the most important identifiers and standard items responsible for the back end document’s state (Fig. 1).

Fig. 1 – Important identifiers responsible for the back-end document’s state, as displayed in scanEZ.

UniversalID (UNID)

This 16-byte identifier uniquely identifies a document across replicas. We can access it with the @DocumentUniqueID formula, or in LotusScript with the UniversalID property of the NotesDocument class. Note that @DocumentUniqueID does not return a text value as such, so use @Text(@DocumentUniqueID) if you need one. The UniversalID is invaluable when investigating replication problems, because it lets you search for a given document in any replica.
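To make the lookup idea concrete, here is a small Python sketch. It assumes the text form of a UNID (what @Text(@DocumentUniqueID) gives you) is 32 hexadecimal characters, and it models replicas as hypothetical in-memory dictionaries keyed by UNID rather than calling any Notes API; the function names are ours.

```python
from __future__ import annotations

import re

# Assumed text form of a UNID: 16 bytes written as 32 hexadecimal characters.
UNID_PATTERN = re.compile(r"[0-9A-F]{32}")


def normalize_unid(text: str) -> str:
    """Upper-case and validate a UNID string; raise ValueError if malformed."""
    unid = text.strip().upper()
    if not UNID_PATTERN.fullmatch(unid):
        raise ValueError(f"not a 32-character hex UNID: {text!r}")
    return unid


def find_in_replicas(unid: str, replicas: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Return {replica name: document} for every replica that holds this UNID.

    `replicas` is a hypothetical stand-in: replica name -> {UNID -> items}.
    """
    key = normalize_unid(unid)
    return {name: docs[key] for name, docs in replicas.items() if key in docs}


# Hypothetical data: the same UNID lives in two replicas.
replicas = {
    "hub": {"A" * 32: {"Subject": "Budget"}},
    "spoke": {"A" * 32: {"Subject": "Budget (edited)"}, "B" * 32: {"Subject": "Other"}},
}
print(sorted(find_in_replicas("a" * 32, replicas)))  # ['hub', 'spoke']
```

Because the UNID is the same in every replica, the one key is enough to pull up each copy side by side, which is exactly what makes it useful when comparing replicas.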

Tip: Use scanEZ Lite (FREE) to easily copy the Universal ID, and use the “Search by UNID” feature to locate the note in another replica.

Sequence Number

Sequence numbers are increased each time the document is saved. They are one of the most important elements that are considered during replication.

CAUTION: There are two kinds of sequence numbers. The first is associated with the document and indicates how many times it has been saved. The second is found on each back-end item and tells you how many times that item has been modified. In optimal circumstances, the highest item-level sequence number equals the document-level sequence number. Note that item-level sequence numbers reset after 255, while the document-level sequence number keeps increasing. For example, a document can have a sequence number of 260 while its highest item-level sequence number is only 5, because that counter has been reset.
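The difference between the two counters is easier to see in a toy model. This Python sketch follows the description above; the wrap rule (back to 1 after 255) is our assumption, chosen because it reproduces the 260/5 example, and NoteModel is a hypothetical class, not part of any Notes API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: an item-level counter wraps back to 1 after 255, which matches
# the example of a document at 260 whose item counter reads 5.
ITEM_SEQ_MAX = 255


@dataclass
class NoteModel:
    """Toy model: one document-level counter plus one counter per item."""
    doc_seq: int = 0
    item_seq: dict[str, int] = field(default_factory=dict)

    def save(self, modified_items: list[str]) -> None:
        # Every save bumps the document-level sequence number, without limit.
        self.doc_seq += 1
        # Only the items changed by this save bump their own (wrapping) counters.
        for name in modified_items:
            current = self.item_seq.get(name, 0)
            self.item_seq[name] = 1 if current >= ITEM_SEQ_MAX else current + 1


note = NoteModel()
for _ in range(260):
    note.save(["Body"])  # the same item changes on every save
print(note.doc_seq, max(note.item_seq.values()))  # 260 5
```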

Tip: Sequence numbers are shown in the document properties dialog box, or they can be retrieved with an API call. Again, you can use scanEZ Lite (FREE) to access both the document-level and item-level sequence numbers.

Last Modified Date

There are two different “Last Modified” dates on a document or design element (Fig. 2):

Last Modified (Initially)

The official date the document was modified, whether in this replica or in another one. It plays a significant role in deciding whether the current document wins or a conflict document is created. This is the date returned by the @Modified formula. Note that scanEZ displays it as “Modified Initially”.

Last Modified (In this file)

The most recent date when the document was modified in the current database. It determines whether the document is included in a replication: any document whose Last Modified date is older than the Replication History entry for the corresponding target server won’t be included. If the changes were replicated from another database, this date will differ from the initial modification date. This is the date provided by the LastModified property of the NotesDocument class in LotusScript and of the Document class in Java. Note that scanEZ displays it as “Modified in This File”.
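A minimal Python sketch of how the two dates behave, assuming a document is a replication candidate exactly when its local date is later than the replication history entry for the target. LocalCopy and both functions are hypothetical names used only for illustration, and treating "strictly later" as the boundary is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class LocalCopy:
    modified_initially: datetime     # what @Modified returns; travels with the document
    modified_in_this_file: datetime  # what LastModified returns; local to this replica


def include_in_replication(doc: LocalCopy, history_entry: datetime | None) -> bool:
    """Candidate only if it changed here after the history entry (assumed rule)."""
    if history_entry is None:  # no history for this target: everything is compared
        return True
    return doc.modified_in_this_file > history_entry


def receive_from_other_replica(doc: LocalCopy, arrived_at: datetime) -> LocalCopy:
    """A replicated change keeps its initial date; only the local date moves."""
    return replace(doc, modified_in_this_file=arrived_at)


edited_elsewhere = LocalCopy(datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 0))
here = receive_from_other_replica(edited_elsewhere, datetime(2024, 3, 1, 12, 0))
print(here.modified_initially == here.modified_in_this_file)  # False
print(include_in_replication(here, datetime(2024, 3, 1, 10, 0)))  # True
```

The second print shows why the local date matters: a change that arrived at 12:00 is passed on to a target last synchronized at 10:00, even though its official modification date (9:00) is older than that.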

How identifiers are used during replication

All the previously mentioned identifiers are used during replication, which is part of the reason documents should NOT be modified unless absolutely necessary. IBM Notes/Domino replication is a large subject, and the Notes Replicator follows a complex roadmap. Below is a simplified list of the steps involved when two databases replicate with each other.

First, the server builds a list of notes (documents and designs) in both replicas.

Documents & Profiles:

– A comparison takes place for each document:

> Documents that do not have a note with their UNID in the other replica are created in the other replica.

> Documents that have a deletion stub with their UNID in the other replica are deleted, unless the document’s sequence number is larger than that of the deletion stub.

– For documents that exist in both databases, the sequence number is examined. If both the sequence numbers and the initial modification dates are equal, nothing is replicated.

– If either the sequence numbers or the initial modification dates differ, the $Revisions item is used to find the last sync date and determine whether a simple update takes place or a conflict is created.

Designs & Deletion Stubs

– Designs or deletion stubs that do not exist in the other database are created during replication. In the case of deletion stubs, this is a “silent creation”.

– For designs or deletion stubs that exist in both databases, the sequence number is examined, and whichever has a smaller sequence number or initial modification date will be updated.

This ensures that no conflict is ever created, and it also explains why design notes have no $Revisions item. So if two developers work on the same template in different replicas at the same time, the version with the smaller sequence number loses its changes. Usually, whoever hits Ctrl+S (saves) most often wins!
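The decision rules above can be condensed into a toy model. This Python sketch is not the Replicator’s actual algorithm: Note is a hypothetical class, $Revisions is reduced to a list of earlier modification dates, and the way we use it to detect the last sync (whichever copy lists the other’s current date in its history was edited after that sync) is our own simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Note:
    seq: int
    modified: datetime  # initial ("official") modification date
    revisions: list[datetime] = field(default_factory=list)  # simplified $Revisions
    is_stub: bool = False  # True for a deletion stub


def reconcile_designs(a: Note, b: Note) -> str:
    """Designs and stubs: the copy with the larger (seq, date) overwrites the other.
    Using the date only to break a sequence-number tie is an assumption."""
    key_a, key_b = (a.seq, a.modified), (b.seq, b.modified)
    if key_a == key_b:
        return "nothing"
    return "a overwrites b" if key_a > key_b else "b overwrites a"


def reconcile_documents(a: Note | None, b: Note | None) -> str:
    """Simplified outcome when replicas A and B hold copies of one document."""
    if a is None or b is None:
        return "nothing" if a is b else "create in other replica"
    if a.is_stub and b.is_stub:
        return reconcile_designs(a, b)  # stubs follow the design rule
    if a.is_stub != b.is_stub:
        doc, stub = (b, a) if a.is_stub else (a, b)
        # The stub deletes the document unless the document's seq is larger.
        return "document wins, stub removed" if doc.seq > stub.seq else "document deleted"
    if (a.seq, a.modified) == (b.seq, b.modified):
        return "nothing"
    # Assumption: a copy whose $Revisions contains the other copy's current date
    # was edited after the last sync, so it simply updates the other copy.
    if b.modified in a.revisions:
        return "a updates b"
    if a.modified in b.revisions:
        return "b updates a"
    return "conflict"  # both copies changed since the last sync


t = [datetime(2024, 1, day) for day in range(1, 6)]
base = Note(2, t[1], [t[0]])
edited = Note(3, t[2], [t[0], t[1]])          # edited after syncing with `base`
other_edit = Note(3, t[3], [t[0], t[1]])      # edited independently elsewhere
print(reconcile_documents(edited, base))       # a updates b
print(reconcile_documents(edited, other_edit)) # conflict
```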

For a deeper analysis and for more information on what exactly happens when documents or designs are compared, check out this in-depth webinar recording co-hosted by TLCC and Ytria.

What happens when you save a large number of documents?

When it comes to mass-modifying documents, it’s more important than ever to save them only when absolutely necessary. Keep in mind that some code saves every document it processes, even when nothing has actually changed. Moreover, the impact of changes made on a large scale grows in direct proportion to the number of documents. Every document you save gets a higher sequence number, new values for both Last Modified dates, and a new entry in its $Revisions item.

This means they’ll ALL get replicated as soon as replication occurs, and you may run into the following problems:

Replication conflicts will be created

For example, if you modify all documents in a frequently used application with hundreds of replicas, then ALL the documents in the database will need to replicate to every server that holds a replica of the application. The more replicas and users you have, the greater the risk that someone is editing some of these documents on another server while you are making your changes. This will result in replication conflicts (Fig. 3). Of course, whether they are actually created depends on the form-level conflict handling settings.

Fig. 3 – Replication conflicts displayed in the Replication Auditor panel of scanEZ.

Deleted documents may come back

If you modify all documents in a database and some of those documents have been deleted in another replica, they may end up with higher sequence numbers than their deletion stubs in the other replicas. In this case, the documents take precedence during replication and the deletion stubs are removed, so the deleted documents come back!
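To estimate this risk before a mass update, you could compare sequence numbers in advance. This hypothetical Python check assumes you already have the local documents’ sequence numbers and those of the deletion stubs in another replica (both keyed by UNID), and applies the rule that a document beats a stub only when its sequence number is larger. The function name and inputs are illustrative only.

```python
from __future__ import annotations


def resurrected_by_mass_save(doc_seqs: dict[str, int], stub_seqs: dict[str, int],
                             extra_saves: int = 1) -> list[str]:
    """UNIDs whose local document would newly beat a deletion stub held by another
    replica after every local document is saved `extra_saves` more times."""
    hits = []
    for unid, stub_seq in stub_seqs.items():
        doc_seq = doc_seqs.get(unid)
        if doc_seq is None:
            continue  # no local copy, so nothing can come back from here
        lost_before = doc_seq <= stub_seq             # today the stub wins
        wins_after = doc_seq + extra_saves > stub_seq  # after the mass save
        if lost_before and wins_after:
            hits.append(unid)
    return sorted(hits)


local = {"A": 3, "B": 6, "C": 1}   # hypothetical local sequence numbers
stubs = {"A": 4, "C": 9, "D": 2}   # hypothetical stub sequence numbers elsewhere
print(resurrected_by_mass_save(local, stubs))                 # []
print(resurrected_by_mass_save(local, stubs, extra_saves=2))  # ['A']
```

The second call shows how repeated saves (for example, an agent run twice) can push a document past its stub even when a single save would not.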

Documents will be marked as changed

Since @Modified will change, users may see the new modification date wherever it is displayed in views or elsewhere.

Documents will be marked as unread for all

If the database tracks unread marks, the documents will be marked as unread for all users.

@Author results may be affected

The official last modifier of the documents will change to the name of the user who launched the agent, or to the name of the agent signer if the agent runs on a server. This name is added to the $UpdatedBy special item, which is used by the @Author formula. Therefore, what users see may change wherever this function is used in the user interface or in a view.
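All of these side effects stem from the save itself, so the simplest safeguard in agent code is to compare before assigning and to save only the documents that really changed. The Python sketch below shows the pattern with a hypothetical FakeDocument stand-in; the same idea applies in LotusScript or Java, but this is not the Notes API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a Notes document; counts calls to save()."""
    items: dict[str, object]
    save_count: int = 0

    def save(self) -> None:
        self.save_count += 1


def set_if_changed(doc: FakeDocument, name: str, value: object) -> bool:
    """Assign the item only when the value differs; return True if it changed."""
    if doc.items.get(name) == value:
        return False
    doc.items[name] = value
    return True


def mass_update(docs: list[FakeDocument], name: str, value: object) -> int:
    """Save only the documents that actually changed; return how many were saved."""
    saved = 0
    for doc in docs:
        if set_if_changed(doc, name, value):
            # In a real database, this is the call that bumps the sequence number,
            # both Last Modified dates, $Revisions and $UpdatedBy.
            doc.save()
            saved += 1
    return saved


docs = [FakeDocument({"Status": "Closed"}), FakeDocument({"Status": "Open"}), FakeDocument({})]
print(mass_update(docs, "Status", "Closed"))  # 2
print(mass_update(docs, "Status", "Closed"))  # 0, so rerunning the agent is harmless
```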

How do we modify documents, then? The scanEZ way

In most cases the documents you really need to change make up only a small portion of all documents in the database, and scanEZ offers great ways to search for, analyze and narrow down subsets of documents. Use “My Selection” virtual folders to separate and inspect documents. You can add to, remove from, or intersect with other My Selection folders, making it easy and fast to get to the actual documents to be analyzed or changed.
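Conceptually, refining a selection this way is just set arithmetic on UNIDs. The Python sketch below models add, remove and intersect on hypothetical UNID sets; it illustrates the logic only and is not how scanEZ implements My Selection folders.

```python
from __future__ import annotations

from collections.abc import Set


def narrow(selection: Set[str], *, add: Set[str] = frozenset(),
           remove: Set[str] = frozenset(), keep_only: Set[str] | None = None) -> set[str]:
    """Add to, remove from, and optionally intersect a set of UNIDs.
    The input sets are never modified; a new set is returned."""
    result = (set(selection) | set(add)) - set(remove)
    if keep_only is not None:
        result &= set(keep_only)
    return result


# Hypothetical selections, with short labels standing in for real UNIDs.
overdue = {"U1", "U2", "U3"}
flagged = {"U3", "U4"}
print(sorted(narrow(overdue, add={"U5"}, remove={"U1"})))  # ['U2', 'U3', 'U5']
print(sorted(narrow(overdue, keep_only=flagged)))          # ['U3']
```

Only the documents left at the end of such a chain would need to be touched, which keeps the saved subset as small as possible.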

Let’s look at a couple of scenarios encountered at customer sites:

Example 1: Dealing with ghost documents

Customer X ended up with ghost documents in a large application and wanted to get rid of them. To do so, he needed the “Added in this file” date stamp to work out how long each document took to replicate, and to compare this with the Deletion Stub lifetime setting. Since this property can’t be returned by formulas, he ended up using an API call to save an additional item on all documents, just so he could create a view on this item value to find the ghosts.
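The customer’s idea can be expressed without writing anything to the documents. This Python sketch assumes the replication delay is measured from the initial modification date to the “Added in this file” date, and that a delay longer than the deletion stub lifetime makes a document a ghost suspect; Arrival and suspected_ghosts are hypothetical names, and the dates would have to come from an API call or a tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Arrival:
    unid: str
    modified_initially: datetime   # @Modified
    added_in_this_file: datetime   # not available to formulas


def suspected_ghosts(docs: list[Arrival], stub_lifetime: timedelta) -> list[str]:
    """Flag documents that took longer to arrive than a deletion stub lives:
    under our assumption, a stub deleting them elsewhere may already be purged."""
    return [d.unid for d in docs
            if d.added_in_this_file - d.modified_initially > stub_lifetime]


start = datetime(2024, 1, 1)
arrivals = [
    Arrival("A", start, start + timedelta(days=2)),    # normal replication delay
    Arrival("B", start, start + timedelta(days=120)),  # arrived very late
]
print(suspected_ghosts(arrivals, timedelta(days=90)))  # ['B']
```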

Result: All documents were modified, and countless conflicts were created.

The scanEZ way: The Post Replication Auditor in scanEZ is a powerful “ghost finder” tool that gathers the “Added in this file” date dynamically. With this tool, you can find and remove ghost documents within minutes without modifying or saving every document in your application (Fig. 4).

Fig. 4 – Use scanEZ’s Post Replication Auditor to find and remove ghost documents within minutes.

Example 2: Investigating production databases

Customer Y has a web application on both a development and a production server, with design changes published only weekly. The developers needed to find troublesome documents, which they could only do in the production database. Since they didn’t have any tools for this, they had to create a view to reveal these documents.

Result: It took the developers 3 tries to get the view selection formula and columns right, and because design changes were published only weekly, it ended up taking 3 weeks to find the problematic documents.

The scanEZ way: With scanEZ’s Diff > Values function and the Display Titles functionality, you can work with formulas without making any changes to the production database design, so these documents could have been found within minutes (Fig. 5). You can remove them instantly or analyze them further by adding them to a My Selection folder via the right-click menu.

Fig. 5 – Use scanEZ’s Diff > Values function to work with formulas without making any permanent changes in the production database design.

In conclusion, before processing any Notes documents through code or simply by saving changes, make sure you understand the implications at the level of the Notes document identifiers mentioned earlier. You can avoid many problems by using formulas to their full extent; in many cases they can do the job within a Notes view or in scanEZ.