In this post I will discuss a framework that I designed and implemented to store and manage “Request Data” separately from the golden source, or actual, database.
The root concept and benefit is that the Request Framework, together with the Request Data Model, lets a system or application keep in-flight data, which may or may not be associated with a workflow process, separate from the live production data.
Because of this, we sometimes refer to this as In-Flight or Staging Area data.
Another common use for this framework is in large-scale, complex web applications, such as a Tax Forms application or some other web application with potentially hundreds of fields.
We refer to the Data Record a user is working on as the “Request” itself; it includes every field that a Request Type, or set of screens within the application, supports editing. While the user is working on it, the system is free to save the data at any time without affecting the live production data, which we refer to as the Golden Source Record or Copy of the data.
A common web application scenario for this is when, for scalability purposes, you don’t want to store the Request object in the HTTP Session; instead you store just the Request Id, which associates the user with a record in the Request Data Model. In a “Wizard-like” application, every time the user clicks the Next button to proceed to the next section of the forms, the app can save the request updates to the database, again without affecting the Golden Source until the full set of form pages in the Wizard’s guided path is completed.
In a workflow-based system, you can use this concept to store the data in a persistent data store such as a database while the Workflow request itself travels from step to step or queue to queue along the Process path. Complex business workflows sometimes take days or even weeks to complete a single request (say you are opening an account for a client of a bank and are waiting for them to send you signed documents), so keeping the request data in a persisted state while in-flight is invaluable in a workflow-based system.
Some workflow engines support this concept out of the box by storing user-defined fields in the workflow database tables themselves; however, I have found this to be inflexible. And if we refer back to my Adapter-Factory model for Vendor Product Integration, you want to minimize the use of “extended” non-core product functions for the sake of portability.
What is the Request Framework exactly?
The Request Framework is a combination of three components.
- Request APIs
- Store
- Stores the Name-Value Pair set, transformed from the in-memory Request Object Model by the Object Codec, in a target abstract data store (aka the Request Database). This could be a database, a file, or any other persistent data storage mechanism; I have also used the transformed name-value pairs to serialize an object over sockets.
- Load
- This is simply the opposite I/O operation of the Store API. It loads the Name-Value Pair form from the data store, and using the Object Codec transforms the data back into an in-memory Request Object Model object.
- Archive
- I use this API to move Requests that have completed their workflow process to duplicate Request Data Map and Narrows Map tables which I call the archive version of these tables.
- This ensures that the performance of loading and storing requests which are still active in the workflow process is maintained over the lifetime of the application. As Request Counts grow, we don’t want completed requests, which will not be loaded often, to slow down the main tables. The tables, which are described below, are very narrow but become very tall due to the highly normalized form of name-value pair storage.
- I have put a check in my implementations of the load API to detect if a Request is in the Active or Archive tables, and load the request no matter where it is. This is useful when an auditor comes and wants to see a request from N-number of years ago.
- Clone
- This API again is self-explanatory. Often users want to “copy” a request they already submitted and then change just the few fields they need to create the new request. This is one of the key user activities that benefits from this API. However, some internal system operations can also benefit from it.
- It can also be used to clone a request from a production environment to a UAT environment, for production support testing and debugging of a production issue with a particular request.
- Delete
- Depending on the nature of the business, you may need to differentiate between physically removing a request from the database and simply marking it as deleted, which is often referred to as a Logical Delete.
- You can put a flag in your request transfer object to this API so the implementation can support both physical and logical deletes.
- Logical deletes are used very often over physical deletes in highly regulated industries, due to auditing requirements.
- The Object Codec
- The Object Codec implementation that I prefer to use in my own systems will be saved for the next article I post. For now, all you need to know is that you need a way to “Serialize” a Request Object to some text-based format for fast and easy storage in a Persistent Data Store such as a Database; that’s the Encode half of the Codec. The Decode half of the Codec takes the text-based form of the Request Object, once retrieved from the Data Store, and “De-Serializes” it back to the in-memory Request Object. The actual Data Store functions are separate from the Object Codec by design, so that many different types of Data Storage implementations can be used without bloating the code of the Object Codec. The only job of the Object Codec should be to Serialize and De-Serialize the Request Object.
- The Request Data Model
- This is the final piece of the puzzle. The Request Data Model is designed to store and load any single Request extremely quickly (in the case of my systems, sub-second). In my experience we usually test the performance of the Data Model with a Request Object payload of around 500 to 1000 fields per request.
- The data model must be designed to accommodate the Serialized Form produced by your Object Codec Implementation.
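To make the five Request APIs listed above a little more concrete, here is a minimal Java sketch. It is not the implementation described in this post: the class name InMemoryRequestStore, the method names, and the use of in-memory maps as stand-ins for the active and archive tables are all assumptions made for illustration. It also assumes the Object Codec has already encoded the request into name-value pairs.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the Request APIs over an in-memory stand-in for the Request Database. */
class InMemoryRequestStore {
    // Stand-ins for the active and archive tables: requestId -> name-value pairs.
    private final Map<Long, Map<String, String>> active = new HashMap<>();
    private final Map<Long, Map<String, String>> archived = new HashMap<>();
    // Stand-in for a "deleted" flag column used by logical deletes.
    private final Map<Long, Boolean> deleted = new HashMap<>();
    private long nextId = 1;

    /** Store: persist the already-encoded name-value pairs under a request id. */
    void store(long requestId, Map<String, String> pairs) {
        active.put(requestId, new LinkedHashMap<>(pairs));
        deleted.putIfAbsent(requestId, false);
        nextId = Math.max(nextId, requestId + 1);
    }

    /** Load: check the active tables first, then fall back to the archive tables. */
    Map<String, String> load(long requestId) {
        Map<String, String> pairs = active.get(requestId);
        if (pairs == null) {
            pairs = archived.get(requestId);
        }
        return pairs == null ? null : new LinkedHashMap<>(pairs);
    }

    /** Archive: move a completed request out of the active tables. */
    void archive(long requestId) {
        Map<String, String> pairs = active.remove(requestId);
        if (pairs != null) {
            archived.put(requestId, pairs);
        }
    }

    /** Clone: copy an existing request's pairs under a newly allocated request id. */
    long cloneRequest(long sourceId) {
        Map<String, String> copy = load(sourceId);
        if (copy == null) {
            throw new IllegalArgumentException("Unknown request: " + sourceId);
        }
        long newId = nextId;
        copy.put("requestId", String.valueOf(newId));
        store(newId, copy);
        return newId;
    }

    /** Delete: a flag selects between a logical delete (mark only) and a physical delete. */
    void delete(long requestId, boolean logical) {
        if (logical) {
            if (deleted.containsKey(requestId)) {
                deleted.put(requestId, true);
            }
        } else {
            active.remove(requestId);
            archived.remove(requestId);
            deleted.remove(requestId);
        }
    }

    boolean isDeleted(long requestId) {
        return deleted.getOrDefault(requestId, false);
    }
}
```

In this sketch a logically deleted request can still be loaded, which matches the audit use case; whether load should hide such requests by default is a design choice the sketch leaves open.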
The Request Framework
The Request Framework is the set of APIs that wrap the calls to the Object Codec and the Data Store Persistence layer to interact with the Request Data Model; in my systems the persistence layer is usually JDBC. I prefer direct JDBC over ORM Frameworks, both for speed and for fine-grained control over the SQL, to keep to the sub-second store and load times usually required by my application users.
Solution Overview:
- Request Objects Flexibility
- Developers can design any complex Java-Bean Compliant Object as a request object, without having to take into consideration the database model.
- Request Objects should encapsulate all fields related to the Golden Source Data Model as Object Model Objects within a root Request object class.
- If it’s a workflow-driven system, the Workflow Process Keys should also be contained within the Request Object.
- Request Processing, Golden Source Writes, and Workflow Actions can eventually be handled in a layer I refer to as Smart Persistence, which we will discuss in a separate article.
- If the Golden Source Data Model contains distinct data entities, then there should be one Request Class for each Data Model Entity.
- Also if required by business requirements, there can be combination Request Types; requests that combine multiple entity types from the Data Model.
- However in my experience you should always start with a single Request Object for each Data Model Root Entity. (Examples: Account Request, Client Request, Product Request)
Serialized Form:
I prefer to serialize or “transform” an Object in-memory to text based Name-Value Pairs. The Name or Key of the pair is the fully-qualified Variable or Field Name using the “.” (period/dot) object notation and “[ ]” array notation for array elements.
Name-value pairs exist only for “scalar”, non-user-defined objects. Therefore only built-in types, plus Strings, Dates, Enums, and other basic types, can be stored as a name-value pair. But since all user-defined data types are simple Objects which contain native or built-in types for the actual data elements, user-defined objects are stored as multiple name-value pairs, one pair for each variable within the user-defined type.
Expanding upon this, we can store the data of objects nested N levels deep by using the dot object dereferencing notation to create the fully-qualified names.
Examples of Names:
Note: Root Object Name is: AccountRequest (this will NOT be included in the fully-qualified name).
- addresses[0].line1
- addresses[1].type
- ratings.sAndP.ratingValue
- requestorName
- requestId
The values of the name-value pairs are the String representation of the field or variable’s actual value. For a String, this is the value itself; numbers (int, float, double, long, short) are easily converted to text representations. Other built-in types such as Date objects, which most modern languages support, can be converted either to a parsable date-timestamp string, which the Decoder/Deserializer can convert back into the date object, or to a long integer holding the number of milliseconds elapsed since some epoch. The value can be any text representation of the variable’s value that can be efficiently parsed back into the native in-memory data type once the name-value pair is processed by the Deserializer/Decoder of the ObjectCodec.
Examples of name-value pairs:
- addresses[0].line1 = 123 Main Street
- addresses[1].type = Mailing Address
- ratings.sAndP.ratingValue = AAA
- requestorName = John Smith
- requestId = 6474721
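The naming scheme above can be sketched in a few lines of Java. This is not the Object Codec itself, which is left for a later article; as a simplifying assumption, the request is modeled here as nested Map and List values rather than a Java Bean, and the class name NameValueEncoder is hypothetical. Dates are written as epoch milliseconds, one of the two options mentioned above, and null fields are assumed to produce no pair.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical encode-half sketch: flattens nested maps and lists into dotted name-value pairs. */
class NameValueEncoder {

    /** The root object's own name is not included in the fully-qualified names. */
    static Map<String, String> encode(Map<String, Object> root) {
        Map<String, String> out = new LinkedHashMap<>();
        flatten("", root, out);
        return out;
    }

    private static void flatten(String path, Object value, Map<String, String> out) {
        if (value == null) {
            return; // assumption: a null field simply produces no pair
        }
        if (value instanceof Map) {
            // Nested object: extend the path with ".fieldName".
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                String child = path.isEmpty()
                        ? String.valueOf(e.getKey())
                        : path + "." + e.getKey();
                flatten(child, e.getValue(), out);
            }
        } else if (value instanceof List) {
            // Array-like field: extend the path with "[index]".
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flatten(path + "[" + i + "]", list.get(i), out);
            }
        } else if (value instanceof Date) {
            // Dates stored as milliseconds since the epoch.
            out.put(path, String.valueOf(((Date) value).getTime()));
        } else {
            // Scalars (String, numbers, enums, ...) use their text representation.
            out.put(path, String.valueOf(value));
        }
    }
}
```

Encoding a map with an "addresses" list whose first element has a "line1" entry yields the name addresses[0].line1, matching the examples above. A bean-based codec would walk getters instead of map entries but could produce the same names.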
The Request Data Model
The Request Data Model can be reduced to a Conceptual Model of only THREE basic entities or tables. The diagram below shows these tables and their cardinality.
Conceptual Model:
Logical Model:
The Tables:
- Request
- This is the “main” table of the request data model.
- Contained within it is the basic data about a request, otherwise called the “header”.
- For each unique Request Id there is one and only one row in this table.
- Table Structure:
- Data Map
- The data map table stores the Name-Value Pairs of the requests.
- For a single unique Request Id, there may be N-number of rows of Name-Value Pairs within the Data Map table.
- There is at least ONE row in this table for every field of a primitive/native built-in data type, or of another ObjectCodec-supported data type, within the Java Bean compliant Request Object model.
- The value field is NOT defined as a CLOB/BLOB; instead, for efficiency, it’s defined as a VARCHAR.
- For elements whose data length is longer than the length of the VARCHAR field defined in the database table, we introduce a sequence number field, and the name-value pair is split across the multiple rows.
- When the Request Data Map is loaded back from the database, the name-value pairs which have been split into multiple rows are concatenated back into a single name-value pair, using the sequence number to ensure the proper ordering when reassembling the string representation of the variable’s value.
- To get the number of rows a name-value pair needs to be split into, divide the LENGTH of the VALUE by the MAX LENGTH of the VARCHAR field defined in the database, using integer division. If it doesn’t divide evenly, add 1 row. You can check this either with modulus, or by multiplying the integer quotient by the length of the VARCHAR field and subtracting that from the actual data length; if the result is greater than ZERO, add 1 row.
- Table Structure:
- Narrows Map
- This table is only used when a variable or field within the Request Object Model is a base or abstract type (basically we are using Polymorphism), and the field references some sub-class or concrete type.
- The concrete data type information, mainly the fully-qualified class name, is stored in this table, associated with the object notation path of the field that references it.
- This is so the ObjectCodec can properly decode complex Request Objects where the code that originally created the Request Object leverages the language’s support for Polymorphism.
- This is something of an extended feature. In your own projects, if you want to use this name-value pair design for storing request data, you can leave this part out and instead adopt a coding convention that restricts the use of polymorphism within your request object model.
- Request Xref
- Xref of course is short for Cross Reference, a commonly defined table in many relational database schemas.
- The Request Cross Reference in this case, is used to store Unique ID or Keys other than the Request ID itself, that are related to the Request.
- These can be IDs for the workflow engine to use.
- They can also be application specific IDs, such as a Golden Source primary key, so that we can track which requests have been associated with that Golden Source record for reporting and audit trail purposes. (Although there are many other ways to achieve this, depending on your data model).
- It can also be used to relate this request to a request within another system, in the case when you have programmatic inter-system integration. (An external system can raise or update data on a request within your system / Enterprise Application Integration).
- Table Structure:
- Workflow States
- This may be a set of tables, depending on your workflow audit trail requirements.
- These tables are defined to store Workflow Step Audit information, such as the usernames and actions the user took at each step within a workflow process for a particular Request.
- Now, the workflow engine itself stores this information; however, in my systems I duplicate it outside of the workflow’s native data store, to maintain loose coupling between my system and the vendor-supplied workflow engines. Again, see my Adapter-Factory model for Vendor Product Integration for more information on this.
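Returning to the Data Map table’s rule for values longer than the VARCHAR column, the row-count arithmetic and the split and reassembly steps can be sketched as follows. The class and method names are hypothetical, and the sketch makes two assumptions worth stating: an empty value still occupies one row, and length is measured in Java chars, which may not match how a given database measures VARCHAR length (bytes versus characters).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting a long value across sequence-numbered rows. */
class ValueChunker {

    /** Integer division rounded up; an empty value still takes one row (assumption). */
    static int rowCount(int valueLength, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        return Math.max(1, (valueLength + maxLen - 1) / maxLen);
    }

    /** Splits the value into chunks; a chunk's list index is its sequence number. */
    static List<String> split(String value, int maxLen) {
        int rows = rowCount(value.length(), maxLen);
        List<String> chunks = new ArrayList<>(rows);
        for (int seq = 0; seq < rows; seq++) {
            int start = seq * maxLen;
            int end = Math.min(start + maxLen, value.length());
            chunks.add(value.substring(start, end));
        }
        return chunks;
    }

    /** Reassembles the value; the chunks must already be ordered by sequence number. */
    static String join(List<String> chunksInSequenceOrder) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunksInSequenceOrder) {
            sb.append(chunk);
        }
        return sb.toString();
    }
}
```

Adding maxLen - 1 before the integer division is equivalent to the "divide, then add 1 if there is a remainder" rule, just expressed in a single step. On load, ordering the rows by request id, name, and sequence number in the query would let join run without any extra sorting in code.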
The Request Framework Advantage:
I hope that from the above description of my Request Framework and Data Model you can see real-world applications where this would be extremely useful in your own work. On both my professional projects and my personal programming projects, I have seen this framework and data model grow into the most useful tool in my arsenal for tackling complex Golden Source and In-Flight data separation issues, and for meeting the business requirement of changing the Request Model quickly for short time-to-market releases to production. The framework and data model above are a good fit for agile development. In an upcoming article I will dive deeper into the Object Codec utility which I use in conjunction with the request framework.
Just Another Stream of Random Bits… – Robert C. Ilardi