Importing a million records daily in ServiceNow (Pre-Quebec)

Published On: 2020-08-23, Reading Time: 5 minutes

Note: Make sure to check out part 2 of this post, as this is no longer the most performant solution.

I had a client ask me to create an integration with a service that will create/update 1 million records a day and store that information in ServiceNow.

I am not used to putting my performance hat on when working with ServiceNow, because most of my previous projects were focused on client-side optimization (utilizing JavaScript functions or reducing the number of calls to the backend).

So I took this as an opportunity to jump into the deep end.

ServiceNow is great about documentation, but it can be hard to find official best practices. This post is dedicated to my lessons learned and experiences. I hope it helps someone else in the future.

Architecture

Let’s start by deciding on the architecture. There are a couple of questions we have to ask ourselves:

  1. How is the data getting into ServiceNow? Is it going to be a push or pull model?
  2. What technology are we going to use to import the records?
  3. What performance considerations do we need to keep in mind?
  4. What scope are we doing this in?

In this example, we are going to answer them as the following:

  1. The external application is going to push data into ServiceNow. The external application wants to keep a standard format (JSON) for pushing to multiple applications, so a sample payload is sketched right after this list. This narrows down our options:
    1. We won’t be using Data Sources to pull the data.
    2. We can’t utilize ServiceNow’s standard import set APIs.
    3. We can explore some of the event management APIs, but we want to consider licensing / modules. In this case they didn’t have ITOM.
    4. Data streaming would be a nice way to do this (and probably another blog), but the client isn’t on Paris.
    5. This leaves us with a Scripted REST endpoint, which is the solution I chose.
  2. Since we now know how data will be received, we have a few options on how to import the data.
    1. We could just use a standard GlideRecord, inspect the payload, and set the fields for each record sent.
      1. I chose not to do this because I don’t want a ServiceNow admin to have to write a lot of code if the payload changes in the future.
      2. Another reason could be performance: opening 1 million GlideRecords one by one could be painful for ServiceNow to handle.
    2. I went with the Import Set JavaScript API instead.
  3. The first performance question that comes to mind is cleaning up the data. Since we are using a staging table and a transform map, once we successfully import the data we need to clean up the staged rows; we don’t want a table growing by a million records every day. The second performance question that comes to mind is: will import sets survive having 1 million records in a single import set?
    1. So what is the most performant way to clean up the million records?
      1. Let’s go to our friend Google. I tried: “mass delete records in ServiceNow”. After reading a few KB articles, I stumbled upon the “Mass Deletion and Excess Data Management” article.
        1. This article was extremely helpful in understanding the pros and cons of each type of data cleanup. The summary was: “all methods are slow”, but some are better than others. I chose the Table Cleanup Policy (Auto-flush -> sys_auto_flush_list.do) in this case over a Scheduled Job (a sketch of the Scheduled Job alternative appears after the auto-flush configuration at the end of this post). You could probably use an onAfter transform script and delete while you are importing, but I was worried that it would cause too much stress on the system.
    2. So what is the best structure for large import sets?
      1. Thankfully, I googled “increase performance scripted rest endpoint servicenow 1 million records” and the 3rd result (Troubleshoot import set performance) gave me some ideas. If you scroll to the bottom of that article, there is a section on importing large data sets that recommends splitting them into multiple import sets. For example, 100 import sets of 10,000 records each is better than 1 import set with 1 million records.
  4. Scope is always a tricky one, and it relates back to question 2. In this example, the client wanted it in the HR-related scope. If yours lives in a different scope, just find the correct Import Set API utilities for that scope.
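
For context, here is what the pushed payload looks like. The endpoint below expects a JSON object with a single data array (the {data:[]} spec it validates), and the field names are just placeholders that line up with the staging-table columns used in the script later in this post:

    {
        "data": [
            { "field1": "value A1", "field2": "value A2", "field3": "value A3" },
            { "field1": "value B1", "field2": "value B2", "field3": "value B3" }
        ]
    }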

Now that we understand the architecture, we can draw it and start building it. Since this blogging platform isn’t tailored to creating UML-based diagrams, I will just import a picture.

The Magic

Alright. Enough talk. Show me the magic.

Scripted REST Endpoint

(function process( /*RESTAPIRequest*/ request, /*RESTAPIResponse*/ response) {

    var staging_table = 'some_table';

    var requestBody = request.body.data;
    var responseBody = {};
    var data = requestBody.data;
    var sID, impSet;
    //Make sure the payload exists in the request and is a non-empty array
    if (data && Array.isArray(data) && data.length > 0) {
        var size = 10000; //10 for testing;
        var i;
        var len = data.length;

        var gr = new GlideRecord(staging_table);
        //Check whether the payload needs to be split across multiple import sets
        if (data.length > size) {
            //multiple import sets
            while (data.length) {
                var arr = data.splice(0, size);

                for (i = 0; i < arr.length; i++) {
                    gr.initialize();
                    gr.u_field1 = arr[i].field1;
                    gr.u_field2 = arr[i].field2;
                    gr.u_field3 = arr[i].field3;
                    sID = gr.insert();
                    //retrieve Import Set sys_id from the last record inserted in the batch
                    if (i == arr.length - 1) {
                        impSet = getImportSet(sID);
                    }
                }
                //close the active import set
                closeImportSet(impSet);
            }
        } else {
            //just 1 import set
            for (i = 0; i < data.length; i++) {
                gr.initialize();
                gr.u_field1 = data[i].field1;
                gr.u_field2 = data[i].field2;
                gr.u_field3 = data[i].field3;
                sID = gr.insert();

                //retrieve Import Set sys_id from the last record inserted in the batch
                if (i == data.length - 1) {
                    impSet = getImportSet(sID);
                }
            }
            //close the active import set
            closeImportSet(impSet);
        }
        response.setStatus(201);
        responseBody.size = len;
        response.setBody(responseBody);
    } else {
        response.setStatus(400);
        responseBody.error = 'Please provide format in the following spec: {data:[]}.';
        response.setBody(responseBody);
    }
    
    //Return the import set sys id as a string
    function getImportSet(sID) {
        var grStaging = new GlideRecord(staging_table);
        if (grStaging.get(sID)) {
            return String(grStaging.sys_import_set);
        }
    }
    // Close the import set, so we can create a new one
    function closeImportSet(impSet) {
        var util = new global.HRIntegrationsUtils();
        util.setImportSetState(impSet, 'processed');
    }

})(request, response);
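
A quick note on closeImportSet(): global.HRIntegrationsUtils is a utility class from the client’s HR scope, so you most likely don’t have it. All it does for us is flip the import set’s state so the next insert starts a new set. If you need a stand-in, something like the sketch below should get you close (the helper name and the direct update of sys_import_set are my assumptions, not an out-of-box API):

    // Hypothetical replacement for HRIntegrationsUtils.setImportSetState().
    // Assumes closing the set is just a state change on the sys_import_set record.
    function setImportSetState(importSetSysId, state) {
        var impSetGr = new GlideRecord('sys_import_set');
        if (impSetGr.get(importSetSysId)) {
            impSetGr.state = state; // e.g. 'processed' so new inserts create a fresh set
            impSetGr.update();
        }
    }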

Auto-flush Record (make sure it is in the same scope as the staging table)

Tablename: <the name of the staging table>
Matchfield: sys_created_on
Age in seconds: 86,400 (1 day)
Active: true
Cascade delete: false
Conditions: sys_import_stateINprocess,cancelled^EQ
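
If you would rather run the cleanup as a Scheduled Job (the alternative I weighed against the Table Cleanup Policy above), a minimal sketch looks like the script below. The table name, the one-day window, and the state filter are assumptions on my part; line them up with your staging table and whatever condition your cleanup policy uses:

    // Scheduled Script Execution: delete staged rows older than one day
    // that the transform has already handled.
    var staged = new GlideRecord('some_table'); // your staging table
    staged.addQuery('sys_created_on', '<', gs.daysAgoStart(1));
    staged.addQuery('sys_import_state', '!=', 'pending'); // skip rows not yet transformed
    staged.setWorkflow(false); // skip business rules/engines for faster deletes
    staged.deleteMultiple();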