June 11, 2014
ESB: Validation Is Not Required
XML validation requires a lot of CPU and memory. The connected systems are tested in DIT, SIT and UAT and they are unlikely to produce invalid XML. The backend systems perform validation anyway. Consider validating in test environments only, and running without schema validation in production. Use business validation instead of schema validation.
An ideal ESB layer is invisible. It should not add any overhead to the round trip time.
With businesses aiming at sub-second responses, any extra delay in the ESB can break that promise.
XML validation is a common performance killer in an ESB. Most designs, without a second thought, run schema validation on incoming requests, and often validate the responses too.
The result is an enormous increase in ESB delay. Let’s see why.
Validation Requires Parsing, Which Kills Streaming
Oracle OSB is a sophisticated system, and it includes a few optimizations for transferring a payload faster.
One of them is so-called streaming. In a nutshell, OSB doesn’t parse the XML unless it has to, and when it doesn’t have to, the XML is streamed directly from the entry proxy to the output business service.
Streaming can take less than a millisecond.
However, a proxy using payload validation effectively disables the streaming, making even a direct proxy much slower.
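To make the streaming idea concrete, here is a minimal Java sketch of pass-through forwarding. It is not how OSB is implemented internally; it only shows that forwarding without parsing amounts to copying bytes through one fixed-size buffer. The class name and the 4096-byte buffer size are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PassThroughStream {

    // Assumed buffer size; the point is that it stays fixed regardless of payload size.
    static final int BUFFER_SIZE = 4096;

    /** Forwards the payload chunk by chunk, without parsing it, and returns the byte count. */
    static long forward(InputStream in, OutputStream out) {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // bytes pass through untouched
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] payload = "<order><id>42</id></order>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long forwarded = forward(new ByteArrayInputStream(payload), out);
        System.out.println(forwarded + " bytes forwarded, XML never parsed");
    }
}
```

Once validation is switched on, this cheap copy is no longer possible: the payload has to be turned into a parsed representation before it can be checked.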
Parsing Is Relatively Slow
Parsing builds an in-memory data structure that represents the XML.
Parsing itself is a relatively straightforward procedure. It is done in one pass, and generally doesn’t require any reference to elements outside of the current one.
Parsing takes from 0.3 ms (2 KB request) to 250 ms (5 MB response).
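A rough way to see the parsing cost yourself is to build a DOM tree with the JDK's standard javax.xml.parsers API and time it. The sketch below uses a hypothetical generated payload. A single System.nanoTime() measurement includes JIT warm-up, so treat the number as indicative only and use a proper benchmark harness (such as JMH) for real figures.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseTiming {

    /** Builds a hypothetical test payload with the given number of repeated items. */
    static String samplePayload(int items) {
        StringBuilder sb = new StringBuilder("<order>");
        for (int i = 0; i < items; i++) {
            sb.append("<item><sku>SKU-").append(i).append("</sku><qty>1</qty></item>");
        }
        return sb.append("</order>").toString();
    }

    /** Parses the XML into a DOM tree, i.e. builds the full in-memory model. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse payload", e);
        }
    }

    public static void main(String[] args) {
        String xml = samplePayload(50_000);
        long start = System.nanoTime();
        Document doc = parse(xml);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.printf("Parsed %d items (%d chars) in %d us%n",
                doc.getElementsByTagName("item").getLength(), xml.length(), micros);
    }
}
```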
Validation Is Much Slower Than Just Parsing
Validation, on the other hand, adds heavy processing on top of the parsing. For each element defined in the schema, the validation has to look up the validation rule and perform a set of operations to check the element’s validity.
In some cases, validating a single element can be more expensive than parsing the whole document(!): think of restrictions defined as regular expressions, for example.
When we enable validation, the delay grows to between 5 ms (2 KB request) and 800 ms (5 MB response).
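Schema validation with the JDK's javax.xml.validation API looks roughly like the sketch below. The schema is hypothetical and includes an xs:pattern restriction, the kind of regular-expression rule mentioned above, so the validator has to parse the document and then match every sku value against that pattern.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaValidation {

    // Hypothetical schema: every sku must match a regular expression.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='sku' maxOccurs='unbounded'>"
        + "<xs:simpleType><xs:restriction base='xs:string'>"
        + "<xs:pattern value='[A-Z]{3}-[0-9]{4,8}'/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compiling the schema is expensive, so do it once; Schema objects are thread-safe.
    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema does not compile", e);
        }
    }

    /** Parses the document and checks each element against the schema rules. */
    static boolean isValid(String xml) {
        Validator validator = SCHEMA.newValidator(); // Validators are not thread-safe
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // structural error, pattern mismatch, or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><sku>ABC-1234</sku></order>"));
        System.out.println(isValid("<order><sku>abc-12</sku></order>"));
    }
}
```

Compiling the Schema once and creating a Validator per request is the usual pattern, since Schema is thread-safe and Validator is not. Even so, every call still pays for a full parse plus the rule checks.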
Parsing Requires Much More Memory
Parsing not only consumes CPU directly, but also memory. Saxon’s author, Michael Kay, notes that the in-memory data size can be 3 to 8 times bigger than the request size.
Note also that in streaming mode only one data buffer, possibly 4 KB or so, is kept in memory at any time, while parsing requires the whole document model to be held in memory. For large documents, this can easily be 100 times more.
Under heavy load, memory rather than CPU can become the bottleneck, causing throughput to plateau.
Remember also that the garbage collection needed to reclaim the parsed payloads adds further CPU load.
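For contrast with building a DOM tree, a pull parser such as the JDK's StAX API (javax.xml.stream) reads the document one event at a time and never builds a tree, so its working memory does not grow with the document the way a DOM does. The sketch below counts elements by name; the class and method names are hypothetical, and in a real service the reader would wrap the incoming InputStream rather than a String that is already fully in memory.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingScan {

    /** Counts elements with the given local name without building a DOM tree. */
    static int countElements(String xml, String localName) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            int count = 0;
            try {
                while (reader.hasNext()) {
                    // Only the current event is exposed; no tree of earlier elements is kept.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(localName)) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed payload", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<order><item/><item/></order>", "item"));
    }
}
```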
XML Validation Is (Usually) Not Required
I believe that in most cases XML validation is not required. In fact, it is often harmful.
How can we decide, then, when to use it and when not? Here are a few guidelines I use.
Do Not Repeat The Work
Why do we validate the schema on the ESB, anyway?
Think about it: the service deployed on OSB forwards requests to a backend service. That service most likely performs its own validation. Why duplicate the effort?
For the 1% or 0.1% of requests that turn out to be invalid, the consumer will get an error message directly from the backend. Why penalize the remaining 99%?
Do Not Validate What Can’t Be Broken
How could invalid XML appear in a production environment? What system would send it?
ESBs are generally used within corporate networks. There are no public endpoints exposed to untested consumers. All internal consumers are thoroughly tested through multiple cycles of DIT, SIT and UAT. The chances that any of them will generate invalid XML are minuscule.
There is no practical reason to validate something that has already been tested and is known to work.
But Validate The Public Services
One notable exception is when the ESB provides access to external vendors and partners. It is hard to synchronize release (and thus testing) schedules between two or more companies, so the risk of invalid requests becomes very real.
Partner links must be validated.
Perform Full Validation In DEV/QA
It is a good idea to perform full validation in test environments.
The backends used in DIT and SIT are often mocked or only partially implemented, so they may not perform any validation. By validating on its own, the ESB shortens the cycle for detecting and fixing issues.
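The guidelines so far boil down to a simple decision: always validate partner-facing endpoints, validate everything in test environments, and skip schema validation for internal traffic in production. In OSB that decision would be made in the proxy configuration; the Java sketch below only expresses the logic, and the environment names and the ESB_ENV variable are assumptions, not OSB settings.

```java
import java.util.Locale;

public class ValidationPolicy {

    // Hypothetical environment names matching the test cycles mentioned in the text.
    enum Environment { DEV, DIT, SIT, UAT, PROD }

    /** Decides whether a request on a given endpoint should be schema-validated. */
    static boolean shouldValidate(Environment env, boolean partnerEndpoint) {
        if (partnerEndpoint) {
            return true;                    // partner links are always validated
        }
        return env != Environment.PROD;     // internal traffic: validate only in test environments
    }

    /**
     * Reads the environment from a hypothetical ESB_ENV variable, defaulting to DEV
     * (i.e. validating) when it is unset. Unknown values raise IllegalArgumentException.
     */
    static Environment currentEnvironment() {
        String value = System.getenv("ESB_ENV");
        return value == null
                ? Environment.DEV
                : Environment.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Environment env = currentEnvironment();
        System.out.println("internal: " + shouldValidate(env, false)
                + ", partner: " + shouldValidate(env, true));
    }
}
```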
Perform Business Validation Only
Schema validation alone is not enough to verify that a message makes business sense. A request that is perfectly valid from a structural point of view may contain incompatible or meaningless data.
Such data errors easily slip past QA tests, and they can occur in PROD when the data in a database is incorrect.
Consider performing business validation only. It is much faster than schema validation (since it only requires parsing), and its value to the business is much higher.
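As an illustration of business validation, the sketch below parses a request with the JDK's DOM API and checks two rules: a cross-field rule that an XSD 1.0 schema cannot express (the end date must not precede the start date) and a simple range check. The element names (startDate, endDate, quantity) and the rules themselves are hypothetical; real rules come from the business domain.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BusinessValidation {

    /** Returns the business-rule violations found; an empty list means the request passes. */
    static List<String> validate(String xml) {
        Element root = parse(xml).getDocumentElement();
        List<String> errors = new ArrayList<>();

        // Rule 1 (hypothetical): a booking cannot end before it starts.
        try {
            LocalDate start = LocalDate.parse(text(root, "startDate"));
            LocalDate end = LocalDate.parse(text(root, "endDate"));
            if (end.isBefore(start)) {
                errors.add("endDate is before startDate");
            }
        } catch (DateTimeParseException e) {
            errors.add("startDate and endDate must be ISO dates (yyyy-MM-dd)");
        }

        // Rule 2 (hypothetical): the quantity must be a positive whole number.
        try {
            if (Integer.parseInt(text(root, "quantity")) <= 0) {
                errors.add("quantity must be positive");
            }
        } catch (NumberFormatException e) {
            errors.add("quantity must be a whole number");
        }
        return errors;
    }

    /** Text of the first element with the given name, or "" if it is absent. */
    private static String text(Element root, String name) {
        NodeList nodes = root.getElementsByTagName(name);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed payload", e);
        }
    }

    public static void main(String[] args) {
        String booking = "<booking><startDate>2014-06-05</startDate>"
                + "<endDate>2014-06-01</endDate><quantity>2</quantity></booking>";
        System.out.println(validate(booking));
    }
}
```

The request in main is structurally fine, so a schema would accept it, yet it fails the date rule.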
I have been building SOA enterprise systems for clients large and small for almost 20 years. Most of that time I've worked with the BEA (later Oracle) WebLogic platform, including OSB and other SOA systems.
Feel free to contact me if you have a SOA project to design and implement. See my profile on LinkedIn.
I live in Toronto, Ontario, Canada. Email me at firstname.lastname@example.org