By Brian Arnold
If you have used Teamstudio Adviser you will know that there is a lot of data that is collected from the Notes databases that are on your Domino servers. A typical Adviser environment will have on average at least 200,000 documents within its database. In some of our client environments there is in excess of a million documents contained across multiple Adviser databases. “How do you capture and store that data in a way that is efficient and fast?” is a question that has been asked many times during demonstrations and deployments.
So, how do we store all of that data and present it in a manner that is both fast and does not make the Notes database so large that it is unusable?
We use a technique called Data Normalization. Database Normalization is a technique used to organize a database into tables of data. The idea is that a table should be about a specific data type and that only supporting data is included in that table. By limiting a table to one purpose you reduce the amount of duplicate data contained within the database. This eliminates issues stemming from data modifications when multiple processes are running on the same set of data.
This is a very common technique in relational databases. In fact, the idea of data normalization was first proposed by British IBMer Edgar F. Codd in 1970. Data normalization is not, however, generally used in a Notes database. Because Notes is based on a hierarchical document-oriented data structure, most developers will duplicate their data across multiple documents to allow them to use the data within views. This is in fact a very common way to store and display data in a Notes database, but the downside is that if a value changes on Document 1 it must be transferred to Documents 2, 3, 4, etc. This process causes an excess of data commits in the Notes database which, in turn, can drastically decrease the performance of your processes and increase the amount of disk space needed to store the data.
Since all of the data in Teamstudio Adviser is presented to the user via a web browser interface we do not have the issue with having to synchronize data across multiple documents like in a traditional Notes database. Instead we perform a process of data joining, which combines the normalized data for display purposes by using a common “key identifier”. This makes both the loading of the interface fast and the consumption of disk space low.
For example, within Teamstudio Adviser we capture the following information:
Domino Catalog information for each database;
Usage statistics from the Domino Activity Logging task;
Database Design Complexity based on the number and content of the design elements in all Notes database on each Domino server being scanned.
Now, let’s say the information that identifies a Notes database (Title, Domino Server, Location, Replica ID, etc.) has a total size of 1 KB of data. That doesn’t seem like a lot, until you start calculating how much data would need to be transferred to other documents:
Data Type: Catalog
Number of Documents: 100,000
Size of Shared Data: 100MB
If we were also to duplicate this data on every other document that was associated to the Catalog documents, we would essentially triple that data size to 300MB; 100MB on the Usage documents and another 100MB on the Complexity documents. And if there were a lot more Catalog documents, then the required disk space needed keeps growing and growing. Not only will the database grow in size, but the time required to pass around 100MB of data multiple times would result in a huge performance deterioration.
So, we can see how the data can get large quickly. This is why we use the Data Normalization technique.
But what if you wanted/needed to perform a data join in your Notes environment? The process is actually quite simple once you know the components needed:
Main Document: Catalog
Found In: View “cvDbsById”
Key Identifier: “ID” Field value
Additional Document: Usage
Found In: View “uvDbsById”
Additional Document: Complexity
Found In: View “xvDbsById”
You can create a very simple LotusScript routine that can combine the above information all into a single temporary document that you can then use for any number of purposes, such as:
Reporting on all data at once;
Creating combined data for display in Notes Views.
Below is an example of some LotusScript that can combine all of the data from the three document types above into a single document that is in memory only. I use the first Catalog document in the Catalog View. With the document I also create another document that is emailed to the current user with a full list of Fields from all three documents:
Dim session As New NotesSession
Dim db As NotesDatabase
Dim catView As NotesView
Dim useView As NotesView
Dim comView As NotesView
Dim catDoc As NotesDocument
Dim useDoc As NotesDocument
Dim comDoc As NotesDocument
Dim key As String
Set db=session.Currentdatabase
Dim combined As New NotesDocument(db)
Dim email As New NotesDocument(db)
Dim body As New NotesRichTextItem(email,"Body")
Set catView=db.Getview("cvDbsById")
Set useView=db.Getview("uvDbsById")
Set comView=db.Getview("xvDbsById")
Set catDoc=catView.Getfirstdocument()
key=catDoc.Getfirstitem("ID").Text
Set useDoc=useView.Getdocumentbykey(key)
Set comDoc=comView.Getdocumentbykey(key)
Call catDoc.Copyallitems(combined,True)
If Not useDoc Is Nothing Then Call useDoc.Copyallitems(combined,False)
If Not comDoc Is Nothing Then Call comDoc.Copyallitems(combined,False)
Dim item As NotesItem
ForAll i In combined.Items
Set item=i
Call body.Appendtext(item.Name+":"+Chr(9))
Call body.Addnewline(1,False)
Call body.Appendtext(item.Text)
Call body.Addnewline(2,True)
End ForAll
email.Subject="Combined Adviser Data Example"
Call email.Send(False,session.Username)
We use a very similar method within our Teamstudio Adviser Data Extraction Utility to join the data in Teamstudio Adviser and then create a .CSV file which can combine all of the data from the various documents into a single row of information. This can then be very easily used in external data reporting applications such as Microsoft Excel or Power BI.
Hopefully I’ve shed some light on how we store the various data types in Teamstudio Adviser and how we used that data to present the web interface and also extract the data for reporting purposes. You can even use these same methods for making your Notes databases smaller and more performance tuned!