Kernel by Subtraction

VistA

    The Veterans Health Information Systems and Technology Architecture (VistA) is a massive suite of more than 100 health-care-related software applications. [1] From a technical perspective, VistA consists of approximately 30,000 routines (programs) and 2,500 data files encompassing more than 60,000 discrete data elements.

    The development of VistA was publicly funded and therefore VistA is considered public domain in the United States.  For convenience, the Department of Veterans Affairs (DVA) releases VistA updates on a periodic basis, under the US Freedom of Information Act (FOIA).  This FOIA VistA adaptation is nearly the same as the version that is used internally by the DVA, except for redaction of proprietary (add-on) elements, and components that relate to internal security. As of this writing, the most recently updated VistA database may be downloaded from the Open Source Electronic Health Record Alliance (OSEHRA) FOIA VistA download site as a roughly 900 MB zip file.  When extracted, the Intersystems Caché database is nearly 4 GB in size.

Infrastructure

    A collection of special applications resides at the heart of VistA. These uniquely important components provide common services to VistA applications, including communication services, user and programmer interfaces to the MUMPS database, security services, task scheduling and other system management functions, as well as programming and application sharing tools. This infrastructure, called the Kernel, provides a rich foundation for VistA, as illustrated by the long-standing and frequently cited VistA onion diagram.

    Perhaps as significantly, the Kernel may also be regarded as a general-purpose database management environment, well suited for organizing and managing any large-scale application, whether or not the application relates to healthcare. It would be difficult to overstate the power of the VistA Kernel as a database application development environment. While the MUMPS File Manager (Fileman) has found use in a great many non-healthcare applications throughout the world, and is therefore relatively well known, other components of the VistA Kernel remain less well known or appreciated outside the context of VistA itself.
CPRS - An integrated application
    It may be possible to generate the VistA Kernel constructively by installing the base package and patches, along with Fileman, Mailman, HL7, the RPC Broker, Vistalink, and so forth. That is how the work described on this page would have been done in the 1990s. To indicate the scale of such an undertaking, it should be noted that Kernel 8.0 alone has more than 600 patches at the time of this writing. One might therefore inquire whether it is possible to generate the Kernel by alternative means.

    Because VistA applications are highly integrated, attempting to disentangle them once they have been configured into the VistA whole is generally regarded as a fool’s errand. However, if it may be agreed that a small number of residual elements that are not strictly needed by the Kernel can be tolerated without causing harm, then perhaps it is not unthinkable to create a Kernel development environment by whittling away those parts of VistA that are not Kernel, which is to say nearly all of VistA. The exercise to be described on this page begins from the assumption that such a deconstruction will be possible, unless proven otherwise.

    The motivation for undertaking this effort in the first place was to create a small footprint database suitable for non-VistA web application development using EWD.js. Links to the results of the present exercise will be included in the next section, including links to download the Kernel database, as well as some of the tools used to produce the reduction.

End Point

    The remainder of this page will deal with methods that were employed to reduce FOIA VistA to its generic core components—the Kernel and associated tools. However, it is possible that the result itself may be of greater interest than the methods of achieving it. Download links found immediately below this paragraph point to the end product, a reduced database that is based on the July 2015 FOIA cache.dat. This reduction was obtained by running a suite of reduction tools on a recent FOIA database installed under Intersystems Caché. A summary of the run that produced the reduced database is included at the bottom of this page.

Reduced VistA cache.dat (Kernel+)
K0715.zip (routines and globals for GT.M)
Routine SISBNP (analysis and reduction tools)
Routine SISZWR (database export utility)


Starting Point

    This project begins with the FOIA VistA database (cache.dat) file (available for download from OSEHRA).  An intuitive place to start an examination of the VistA database would be the PACKAGE file.  Other obvious candidates for supplying potentially useful information about the contents of the VistA database are the BUILD and INSTALL files.  Then there are, of course, the Fileman dictionary of files (File number 1) and the data dictionary (^DD global).  However, intuition is not always the most practical guide.

    In any case, the initial objective will be to perform some sort of preliminary analyses, with a view to devising a practical plan for the database reduction—not necessarily rigorous or sharp-edged, but effective in the sense of producing a significant size reduction, and being as nearly bug-free as possible (not leaving troublesome stray pointers).  Thus the plan will be conservative, erring on the side of not removing something if it is unclear whether or not it belongs in the reduced database.

    It may be helpful to consider the problem in terms of the components of a VistA application. These are components of the same types as are itemized in BUILD definitions: files, sub-files or fields with or without data, and a page-long list of other types including templates, options, routines, parameters, protocols, remote procedures, and so forth. Templates and forms may as well be included with files, for a reason that will become clear. In general, files and routines make up the largest part of an application. Other components, including options and protocols and everything else, contribute a minor fraction to the size of applications.

    Applications to be preserved include: Fileman, Kernel device management, Mailman, KIDS, Taskman, the RPC Broker and VistA foundations (Vistalink and web services), the Kernel option driver, List manager, User management support, HL7/HLO, and Kernel parameters and alerts—as well as a few generically useful demographic files (state, county, zip)—perhaps the term Kernel+ would be appropriate for the latter.

Step 1 - Identify files to delete or preserve (include associated templates and forms). After deletion, remove empty globals.
Step 2 - Identify routines to delete or preserve.  After deletion, remove ROUTINE file entry.
Step 3 - Remove remote procedures that refer to non-existent routines.
Step 4 - Remove options that refer to non-existent routines.  Then remove references to non-existent options, etc. (recursively).
Step 5 - Repeat step 4 for options that refer to non-existent files.
Step 6 - Remove unwanted protocols by application namespace.
Step 7 - Identify parameter definitions to delete or preserve.  Remove unwanted Kernel parameters and parameter definitions.
Step 8 - Miscellaneous cleanup.  Zero-out the AUDIT file.

    The step-by-step outline in the table above may be considered a high-level plan for reducing the database to a reasonably clean state.  Of course, this outline does not specify how to identify components for removal. However, it will be seen that by first identifying files and routines to be deleted or preserved, other components such as options and remote procedures that depend on files or routines can be handled automatically. For the sake of this exposition, details for each step will be described in the order listed. This order is also in part a logical order for performing the steps, although deviations are possible.

Step 1


    VistA files are Fileman files and Fileman files are documented in the FILE file, and of course in the Fileman data dictionary global ^DD.   Low numbered files belong to Fileman itself, as illustrated by the following partial listing:

Low numbered files

    File number 2 is the VistA PATIENT file, possibly the most important file in the entire VistA suite, although obviously any file that participates in an application is important. Additional Kernel files follow immediately after the VistA PATIENT file in the 3.x number range, including the DEVICE and TERMINAL TYPE files, and many others. Clearly, files supporting device management must be preserved. A little further along (in numeric order) are the STATE (#5) and POSTAL CODE (#5.12) files. While these are application files, they are of potential generic usefulness, and should also be preserved for the Kernel+ database.

    Continuing the examination of files in ascending numeric order one soon arrives at a number space where many successively numbered files belong not to the Kernel but to VistA applications. In the VistA architecture, applications generally have assigned number spaces, such that files belonging to the same application are found in the same sub-space of numbers. However, it is not true that after a certain middling point the remainder of the files are all applications.  For example, the PROTOCOL file is number 101 while HL7 pops up in the 770-780 range, and Kernel parameters are near the stratosphere at 8989.x.

    By and large, though, files-to-keep and files-to-be-discarded fall into contiguous groups, and not a great many such groups exist. Therefore, even without the aid of such resources as the PACKAGE file or BUILD file, it is possible to classify groups of files quickly, producing a list either of files-to-keep or files-to-discard.

Files (Keep List)

    Following this general strategy one generates a files-to-keep list. A few iterations of editing and generating the list, followed by examination of results, suffice to reach a stable conclusion. The discard or cull list is, of course, the complementary set, consisting of files that are not in the keep list (computed at tag DFILES in routine ^SISBNP). The files-to-delete list includes the largest file in VistA, the Lexicon EXPRESSIONS file, which in the July 2015 FOIA database numbers 1,594,604 entries.
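
    The complement relationship between the keep list and the cull list can be illustrated with a minimal Python sketch. This is not the code at DFILES^SISBNP; the function name, the sample file numbers, and the keep ranges are assumptions chosen only to mirror the file numbers mentioned in the text.

```python
# Illustrative sketch only: the file numbers and keep ranges below are
# examples, not the actual keep list used by ^SISBNP.

def cull_list(all_files, keep_ranges):
    """Return the file numbers that fall outside every (low, high) keep range."""
    def kept(number):
        return any(low <= number <= high for low, high in keep_ranges)
    return sorted(n for n in all_files if not kept(n))

# Hypothetical sample of file numbers.
all_files = [0.4, 1, 2, 3.5, 5, 5.12, 101, 771, 8989.5, 9999999.64]

# Hypothetical contiguous groups of files to keep.
keep_ranges = [(0, 1.9999), (3, 5.9999), (101, 101), (770, 780), (8989, 8989.9999)]

print(cull_list(all_files, keep_ranges))
```

    Expressing the keep list as a handful of contiguous ranges reflects the observation that files-to-keep fall into a small number of number-space groups.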

    When Fileman deletes a file (UTILITY FUNCTIONS, EDIT FILE), it optionally deletes the file’s data, templates and forms.  Therefore the present step may be greatly simplified by constructing a wrapper that calls the File Manager’s delete code at 61^DIU0.  In the following excerpt, tag 6 is the Fileman delete wrapper and tag XFILES traverses the list of files to be deleted.

Code to delete files

    Some VistA files reside on global subscripts, whereas others, such as the PATIENT file, take up an entire global.  Therefore, after deleting files it is a good idea to remove globals that no longer contain any files (any data). The convenience code reproduced below relies on the ^$GLOBAL structured system variable, and is therefore Caché-specific.  However, it is easy also to traverse the global directory in GT.M, so the following code could be readily adapted to GT.M.

Kill empty globals

    It would probably not be interesting to describe every data reduction step in the same detail as has been given for the file deletion step. Instead the following paragraphs will highlight only what is unusual or of specific significance about each remaining step. Additional details can be understood through examining the implementing code.

Step 2

   
    The procedure for choosing which routines to keep resembles the process of selecting files to keep, in that it relies on a human decision process and is not automatic. At first, it seemed that some computational procedure based on the PACKAGE file might be feasible. However, VistA is not well encapsulated; its applications embody interdependencies. It is not trivial to construct a practical selection of routines based on readily accessible sources of documentation. For this second step, as well as the first, it is simplest to construct a custom list of routine namespaces to keep.

Routines to Keep

    While both files and routines are identified manually by applying knowledge of VistA application namespaces and number spaces, file deletions and routine deletions are the most crucial steps in the process of reducing VistA to the target Kernel+ database. Tag DRTNS^SISBNP takes two optional parameters. The first is a $NAME specification for returning a list of routines to be deleted, subscripted 1, 2, 3 ... If no name is specified, the list is compiled in ^UTILITY($J). The second parameter is a flag that requests actual deletion of the routines, using ^%ZOSF("DEL").

    Subroutine DRTNS relies on the Caché-specific ^ROUTINE global for a list of all routines.  Therefore this subroutine cannot be used in GT.M unless modified to obtain the routine list in a different way. While routine selection relies on a platform-specific construct, the actual deletion is platform-independent, as it calls whatever method is specified in ^%ZOSF("DEL"). Once unwanted routines have been deleted, the subsequent existence of a routine serves to validate other components such as options and remote procedures that depend on the given routine.  Similarly, options that reference files (edit, print, inquire options) can be validated by the existence of the corresponding file. That is why it is useful to remove files and routines first.

Step 3


    VistA includes a security feature known as ‘RPC Broker Context’, which creates a particular type of option dependency (‘b’ type option) on remote procedures. Therefore, it is necessary to address cleanup of the REMOTE PROCEDURE file before proceeding to the OPTION file. This is a particularly easy step because the REMOTE PROCEDURE file has a ROUTINE field. If the named routine exists, keep the remote procedure; otherwise delete it.

Delete RPCs


Steps 4 - 5

    OPTION file cleanup is possibly the most complex step in the process of reducing the dataset. Some options require routines; others require files. In addition, options may themselves be named as items on menu-type options, or similarly, broker-type options may point to deprecated remote procedures. Thus, reducing the OPTION file becomes an iterative process. First, simple options that refer to non-existent routines or files are removed. Then menu items that refer to non-existent options are removed from menu-type options. If a menu has no remaining items after defunct items have been removed, then the menu option itself is deleted. But then the deleted menu may have been a sub-menu on another menu option, and so on, so the process must be repeated until an internally consistent collection of options remains. At the end, only simple options and menus of resolvable items should exist.
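
    The iterative character of this cleanup can be modeled as a loop that repeats until a pass makes no change. The Python sketch below is only an illustration of that idea, not the code at XOPT^SISBNP. It assumes a simplified option record with just two types ("run" and "menu"), and the option and routine names are hypothetical.

```python
def prune_options(options, existing_routines):
    """Repeat the cleanup passes until no pass changes anything.

    options maps an option name to a dict with a "type" of "run" or "menu".
    Run options carry a "routine"; menu options carry a list of "items".
    """
    options = {name: dict(opt) for name, opt in options.items()}
    changed = True
    while changed:
        changed = False
        for name, opt in list(options.items()):
            if opt["type"] == "run":
                # Simple option: drop it when its routine no longer exists.
                if opt["routine"] not in existing_routines:
                    del options[name]
                    changed = True
            elif opt["type"] == "menu":
                # Menu: drop items that no longer resolve to an option.
                items = [item for item in opt["items"] if item in options]
                if not items:
                    del options[name]  # an empty menu is itself deleted
                    changed = True
                elif items != opt["items"]:
                    opt["items"] = items
                    changed = True
    return options


# Hypothetical sample: only routine XUS survives the routine deletion step.
sample = {
    "TOP MENU": {"type": "menu", "items": ["LR MENU", "XU RUN"]},
    "LR MENU": {"type": "menu", "items": ["LR RUN"]},
    "LR RUN": {"type": "run", "routine": "LRX"},
    "XU RUN": {"type": "run", "routine": "XUS"},
}
result = prune_options(sample, {"XUS"})
print(sorted(result))
```

    In this sample the sub-menu becomes empty only after its run option is removed, and the top menu loses an item only after the sub-menu is removed, so several passes are needed before a pass reports no changes.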

    This procedure for cleaning up the OPTION file is not rigorous. Although it is trivial to pull a routine name from the ROUTINE field, it is more challenging to detect a routine name in ENTRY ACTION or EXIT ACTION code. A non-obvious routine reference (e.g., indirect reference) in one of these fields could easily be missed. The program resorts to a sort of guessing.
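
    To give a sense of what such guessing might involve, the following Python sketch scans a line of M code for routine names that follow a DO or GOTO command or a $$ extrinsic call. It is an assumption about one plausible heuristic, not a transcription of the logic in ^SISBNP, and the routine names in the examples are hypothetical. It handles only uppercase commands, sees only the first argument of a command, and, as the text warns, cannot see indirect references.

```python
import re

# Heuristic pattern: a routine name is whatever follows the caret after a
# DO/GOTO command (optionally postconditioned) or after a $$ extrinsic call.
ROUTINE_REF = re.compile(
    r"(?:\b(?:DO|D|GOTO|G)(?::\S+)?\s+|\$\$)"  # command word or $$
    r"[%A-Za-z0-9]*"                            # optional line label (tag)
    r"\^(%?[A-Za-z][A-Za-z0-9]*)"               # routine name after the caret
)

def guess_routines(action_code):
    """Return the sorted, de-duplicated routine names guessed from M code."""
    return sorted(set(ROUTINE_REF.findall(action_code)))

print(guess_routines("D EN^XUSCLEAN"))
print(guess_routines("S X=$$NOW^XLFDT D:X>0 ^LRUPD"))
print(guess_routines("K ^TMP($J) S Y=^DIC(19,0)"))  # globals, not routines
print(guess_routines("D @X"))                        # indirection is missed
```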

Heuristic to guess routine

    It should also be noted that some option types such as protocol and server types and other less used types are ignored. Nevertheless, after removing options by applying the iterative procedures outlined above, the result will be a much smaller set than the entire VistA set (more than 10,000 options in July 2015), and will contain very few unwanted entries.

    The subsidiary entry point for option cleanup is XOPT^SISBNP. Running this procedure produces a report such as the following:

Iterative Option Cleanup

    Nothing but zeros in the final run indicates that OPTION cleanup is complete. This procedure removes about 90% of OPTION file entries. As previously noted, implementing code for this process will reveal additional details.

Step 6

    The PROTOCOL file could be culled by a method similar to the OPTION file reduction. However, tagging protocols for removal based on their content seems more daunting than the same chore for options. Therefore, rather than attempting to reproduce the complexity of the OPTION reduction method, it was decided to eliminate unwanted protocols by namespace. Most protocol names begin with the 2-, 3-, or 4-character application namespace. Comparing these prefixes to the list of routine namespaces produces a result that seems reasonable. This method reduced more than 4000 PROTOCOL file entries to just over 200. It is possible, of course, as with any of the reductions, that some removed entries should have been kept (or the converse).
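
    The namespace comparison amounts to a prefix test, which the following Python sketch illustrates. The function name, the set of kept namespaces, and the sample protocol names are assumptions for the sake of the example; the actual lists live in ^SISBNP.

```python
def protocols_to_delete(protocol_names, keep_namespaces):
    """Return the names whose 2-, 3-, and 4-character prefixes are all unknown."""
    def kept(name):
        return any(name[:length] in keep_namespaces for length in (2, 3, 4))
    return [name for name in protocol_names if not kept(name)]


# Hypothetical namespaces to keep and sample protocol names.
keep_namespaces = {"XU", "XQ", "HL", "VALM"}
protocols = [
    "XUMF MENU",
    "VALM NEXT SCREEN",
    "OR CPRS MENU",
    "HL SERVER",
    "LR7O ORDER",
]

print(protocols_to_delete(protocols, keep_namespaces))
```

    A prefix test of this kind is coarse: a protocol whose name merely happens to begin with a kept namespace is retained, which is consistent with the conservative stance of the overall plan.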

Step 7

    How far to go with the cleanup of files is a somewhat arbitrary decision.  There is little harm in having some junk left over in a functional Kernel environment. Indeed the final product of the present exercise includes data that reference non-existent entities.  For example the BUILD and INSTALL files are not modified.  It would be possible to continue refining the reduction process by adding utilities for presently unhandled references, and then including calls to these utilities in the main wrapper at EN^SISBNP.

    Step 7 addresses parameters and parameter definitions.  The PARAMETERS file is interesting because its entries correspond to individual instances, rather than to all instances of a parameter.  The #.01 field of the PARAMETERS file is a variable pointer named ENTITY that points to multiple files. But I digress. The following code excerpt addresses the PARAMETERS file, again using application namespace as the selection criterion for deletion. A similar pair of entry points XPARD and XXPARD identifies and deletes unneeded entries in the PARAMETER DEFINITION file.

Parameters File Reduction

Step 8

    The last numbered step that was included in the project plan table near the top of this page addresses loose ends. VistA has a ROUTINE file that documents application routines. Names must exist in this file in order for the named routines to be transported in a KIDS build, for example. In this last step, routines that no longer exist are deleted from the ROUTINE file.

    At this writing, the FOIA database AUDIT file contains uninteresting data (as originally distributed). It is more logical to start auditing from scratch in a reduced database. Therefore the current step zeros out the AUDIT file.

    Finally, it is noted that additional cleanup utilities can be appended to the reduction procedure, as they are developed and tested.

Example Run - Reducing the July 2015 FOIA VistA

    The listing below shows the specific run that produced the reduced July 2015 database listed for download in the section titled “End Point” near the top of this page. Ellipses have been substituted where output data are repetitive.

K0715>ZL SISBNP D EN

This procedure will delete many files and routines and other components
in a way that cannot be undone.  Before continuing make sure that this
is the intended namespace for reduction, and that a backup exists.

Do you wish to continue? NO// YES

Deleting files ...
   >>File# 2  (PATIENT)
   Deleting the DATA DICTIONARY...
   Deleting the INPUT TEMPLATES......................
   Deleting the PRINT TEMPLATES..............
   Deleting the SORT TEMPLATES..........
   Deleting the FORMS...
   Deleting the BLOCKS...
   >>Removing File# 2 from FILE file (if not already removed)

   >>File# 7  (PROVIDER CLASS)
   Deleting the DATA DICTIONARY...
   Deleting the INPUT TEMPLATES...
   Deleting the PRINT TEMPLATES...
   Deleting the SORT TEMPLATES...
   Deleting the FORMS...
   Deleting the BLOCKS...
   >>Removing File# 7 from FILE file (if not already removed)

. . .
. . .

   >>File# 9999999.41  (IMMUNIZATION LOT)
   Deleting the DATA DICTIONARY...
   Deleting the INPUT TEMPLATES...
   Deleting the PRINT TEMPLATES...
   Deleting the SORT TEMPLATES...
   Deleting the FORMS...
   Deleting the BLOCKS...
   >>Removing File# 9999999.41 from FILE file (if not already removed)

   >>File# 9999999.64  (HEALTH FACTORS)
   Deleting the DATA DICTIONARY...
   Deleting the INPUT TEMPLATES....
   Deleting the PRINT TEMPLATES.....
   Deleting the SORT TEMPLATES......
   Deleting the FORMS...
   Deleting the BLOCKS...
   >>Removing File# 9999999.64 from FILE file (if not already removed)

Removing empty globals
...SORRY, HOLD ON...
Global cleanup complete.

Deleting routines ...
...HMMM, HOLD ON...
27467 routines were deleted.

Deleting remote procedures ...
3098 REMOTE PROCEDURE file entries deleted.

The OPTION file will be cleaned up iteratively ...

6264 deprecated RUN ROUTINE options deleted.
6889 deprecated MENU items removed.
652 empty MENU options deleted.
1152 unresolved ENTRY or EXIT actions. Corresponding options deleted.
398 references to nonexistent files - Corresponding options deleted.
1646 deprecated MENU items removed.
394 empty MENU options deleted.
3038 deprecated RPC broker items removed.
53 empty RPC broker options deleted.

. . .
. . .

0 deprecated RUN ROUTINE options deleted.
0 deprecated MENU items removed.
0 empty MENU options deleted.
0 unresolved ENTRY or EXIT actions.
0 references to nonexistent files
0 deprecated MENU items removed.
0 empty MENU options deleted.
0 deprecated RPC broker items removed.
0 empty RPC broker options deleted.

OPTION file reduction complete.

Removing protocols ...
3977 PROTOCOL file entries deleted.

Removing deprecated Kernel parameters ...
1635 PARAMETERS file entries deleted.

Removing parameter definitions ...
820 PARAMETER DEFINITION file entries deleted.

Final cleanup - Clearing audit data ...
AUDIT file has been cleared.

Removing ROUTINE file entries that refer to non-existent routines ...
...SORRY, HOLD ON...
24712 ROUTINE file entries deleted.

Database reduction is complete.


    The preceding illustration shows the output of running the reduction program, which is equivalent to performing each of the 8 steps in numeric order. However, the process of producing the reduced VistA database also involves meta-steps. These steps depend in part on the platform that is used to extract the Kernel from the FOIA database. The FOIA database itself is distributed in Intersystems Caché format, that is, as a cache.dat file. If the reduction will be carried out under GT.M, then it is necessary first to port the cache.dat file to GT.M. In a sense, though, it is simpler to reduce the database before porting it. The zip file that was created in Windows and referenced for download in the “End Point” section of this page contains the reduced database in a format suitable for import to GT.M. Convert the routine export file K0715.RO to Linux format (DOS-to-UNIX end-of-lines) and import the routines using the GT.M ^%RI utility. The .zwr global files that are included in the zip are already in Linux/UNIX format and may be loaded into the GT.M database using the mupip utility. The procedure is the same as it would be for porting the original FOIA database to GT.M. All this is documented elsewhere.
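
    The end-of-line conversion mentioned above can be done with any standard tool. As one possibility, a minimal Python sketch follows; the function names and the output file name in the usage comment are assumptions, not part of the original procedure.

```python
from pathlib import Path

def crlf_to_lf(data):
    """Replace DOS (CRLF) line endings with UNIX (LF) line endings."""
    return data.replace(b"\r\n", b"\n")

def convert_file(source, target):
    """Read source as bytes, convert its line endings, and write target."""
    Path(target).write_bytes(crlf_to_lf(Path(source).read_bytes()))

# Example usage (output file name is hypothetical):
#   convert_file("K0715.RO", "K0715.unix.RO")
```

    Working on bytes rather than decoded text avoids any assumption about the character encoding of the routine export file.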

Meta Steps

    Because meta-steps depend on the platform and possibly other contextual resources, it is not possible to present a universal recipe for the overall reduction process. The following Caché-based step-by-step guide should provide sufficient information to permit reproducing equivalent procedures in either Caché or GT.M. It is assumed that a recent FOIA database has been downloaded and installed. This database is referred to as FOIA in the notes below.

Kernel by Subtraction - Caché meta-steps

1. Create a stub database for Kernel - Called KERNEL in notes below.

2. Dismount FOIA and KERNEL databases.

3. Copy FOIA database to KERNEL (Original size is multi-Gigabyte).

4. Mount both databases.

5. Load or copy routines ^SISBNP and ^SISZWR to KERNEL namespace.

6. Run the reduction in the Kernel namespace - ZL SISBNP D EN

7. Create target HFS folders .\r and .\g (for export to GT.M).

8. Save all routines.

K0715>D ^%RO

Routine output (please use %ROMF for object code output)
Routine(s): %DT*
Routine(s): %RCR
Routine(s): %Z*
Routine(s): *
Routine(s):

Description: Kernel by subtraction (AKA Reduced VistA) <date>

9. Export globals in GT.M format -

    D FOIA^SISZWR("<path>\Kmmyy\g",$C(10),1)

Shrink Database

10. Create sub-folder .\Cache_Global_Export.

11. Export globals in block format using System Management Portal.

    Export finished successfully (less than 1 second)
    [If Cache displays names of globals that have been deleted, ignore.]

12. Dismount KERNEL.

13. Create SCRATCH 1 MB database (not mounted) and copy to KERNEL.

14. Mount KERNEL.

15. Use System Management Portal to import globals.

    Import runs in the background but completes in a second or two.

16. Load routines from routine save set Kmmyy.RO.

17. Run global directory, then spot-check Fileman and Kernel.


Kernel on Raspberry Pi

    I recently learned about a lesser-known MUMPS platform that is compatible with Raspberry Pi. It is called simply MUMPS (http://mv1.mumps.org/, http://mumps.sourceforge.net/). I wondered whether it would be possible to port the VistA Kernel to this ‘minimalist’ platform. To test the idea, I applied the reduction process of the preceding paragraphs to the April 2017 FOIA VistA distribution (http://foia-vista.osehra.org/DBA_VistA_FOIA_System_Files/). This 5-minute demo video features a particular (i.e., selected) positive outcome of the exercise.

    As one might guess, not all parts of the VistA Kernel could be demonstrated in this way. In short, getting Taskman to run and to execute a scheduled task may be little more than a curiosity. This demonstration does not imply that the Pi + VistA Kernel platform is ready for serious development use at this time. On the other hand, with effort, other Kernel features could be adapted to the platform, perhaps as a foundation for advanced database applications that exploit Pi’s small footprint, or its uniquely convenient interfacing options.

    One further caution: I read somewhere that Raspberry Pi should not be used in ‘mission critical’ applications. VistA Kernel-based healthcare applications would surely be considered such.