January 9, 2014

Why GZip Doesn't Help Direct Proxy's Performance

For direct OSB proxies, passing XML content gzipped will not improve end-to-end performance, because of the extra time required for gzipping and un-gzipping.

There is a service I have to deal with which is a constant headache.

The service returns a huge XML response, 1 MB or more, with all kinds of information.

The web application that calls the service experiences a very noticeable delay, which affects the user experience. The web developers measured an average delay of about 500ms, and the delay grows in proportion to the response size.

The size is obviously the culprit, and I gotta do something about it!

GZip The Response! It Must Help!

The Internet solved this problem a long time ago. When a web page is big, the server compresses the payload with gzip, which often reduces the size by a factor of 10.
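To get a feel for how well XML compresses, here is a minimal Python sketch. The payload is synthetic (the element names and the record count are invented for illustration), so the ratio it prints says nothing about any real service; repetitive XML like this tends to compress especially well.

```python
import gzip


def make_sample_xml(records: int = 2000) -> bytes:
    """Build a synthetic, repetitive XML document (hypothetical schema)."""
    rows = "".join(
        f"<item><id>{i}</id><name>Item {i}</name><status>ACTIVE</status></item>"
        for i in range(records)
    )
    return f"<?xml version='1.0'?><items>{rows}</items>".encode("utf-8")


def compression_ratio(payload: bytes) -> float:
    """Return the original size divided by the gzipped size."""
    return len(payload) / len(gzip.compress(payload))


if __name__ == "__main__":
    xml = make_sample_xml()
    print(f"original: {len(xml)} bytes, ratio: {compression_ratio(xml):.1f}x")
```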

Since SOAP runs on top of HTTP, the gzip solution is available to it as well. I just need to make OSB pass the gzipped content through without trying to decompress it. In fact, it does so by default, but it strips the “Content-Encoding: gzip” header. The missing header breaks the caller, which no longer knows the body is compressed, but this is easy to fix.

Here’s what I did:

  • I made sure I do not touch the payload, even for logging, so OSB will not try to parse the gzipped stream

  • I sent “Accept-Encoding: gzip” header to the backend service

  • I passed all headers from the backend service back to the caller
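The caller's side of this exchange can be sketched in Python as follows. This is only an illustration of the standard HTTP headers involved ("Accept-Encoding" on the request, "Content-Encoding" on the response); the function names are hypothetical and it is not how OSB or the actual caller is implemented.

```python
import gzip


def request_headers() -> dict:
    """Headers a caller might send to advertise gzip support."""
    return {"Accept-Encoding": "gzip"}


def decode_body(headers: dict, body: bytes) -> bytes:
    """Decompress the body only if the response says it is gzipped."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    encoding = normalized.get("content-encoding", "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    # No Content-Encoding header: the body is handed over untouched.
    return body


if __name__ == "__main__":
    gzipped = gzip.compress(b"<response>ok</response>")
    print(decode_body({"Content-Encoding": "gzip"}, gzipped))
    # With the header stripped, the caller receives raw gzip bytes
    # and an XML parser would choke on them.
    print(decode_body({}, gzipped)[:2])
```

The second call shows why the stripped header matters: without it the caller has no signal to decompress, and ends up feeding binary data to its XML parser.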

Since the backend service supports gzip, it should serve me the gzipped content now. And it did!

Yay! The response size was 50 times smaller than the original. I was expecting at least a 10 times improvement in the call time.

And… Nothing

The performance test did show an improvement … of about 1%. Huh?

Looking Outside Of OSB Box

Now it was obvious my mental model of where the time is spent was wrong.

I painstakingly measured all the steps the calling application and the backend had to perform to communicate. They did quite a lot of work besides simply sending the data!

The sender had to build a DOM of the message first, then serialize it into an XML string, and only then write it to the wire.

The receiver had to parse the XML string into a DOM.

When GZip was used, the sender had to compress the XML string before sending it, and the receiver had to decompress it before parsing.
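The steps above can be walked through in a small Python sketch that times each one. It uses Python's `xml.etree.ElementTree` and `gzip` modules as stand-ins for whatever XML and compression stack the real caller and backend use, and the schema is invented, so the absolute numbers it prints will not match the measurements in this post. It only shows how one might measure the steps separately.

```python
import gzip
import time
import xml.etree.ElementTree as ET


def timed(label, func, timings):
    """Run func(), record elapsed milliseconds under label, return its result."""
    start = time.perf_counter()
    result = func()
    timings[label] = (time.perf_counter() - start) * 1000.0
    return result


def build_dom(records):
    """Sender side: build a DOM for a synthetic message (hypothetical schema)."""
    root = ET.Element("items")
    for i in range(records):
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "id").text = str(i)
        ET.SubElement(item, "name").text = f"Item {i}"
    return root


def round_trip(records=5000):
    """Walk the sender and receiver steps in order, timing each one."""
    timings = {}
    dom = timed("build DOM", lambda: build_dom(records), timings)
    xml = timed("serialize", lambda: ET.tostring(dom, encoding="utf-8"), timings)
    packed = timed("gzip", lambda: gzip.compress(xml), timings)
    unpacked = timed("gunzip", lambda: gzip.decompress(packed), timings)
    parsed = timed("parse", lambda: ET.fromstring(unpacked), timings)
    return timings, len(xml), len(packed), parsed


if __name__ == "__main__":
    timings, raw_size, gz_size, _ = round_trip()
    for step, ms in timings.items():
        print(f"{step:>10}: {ms:7.2f} ms")
    print(f"{raw_size} bytes raw, {gz_size} bytes gzipped")
```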

Here’s how the request/response cycle looked before implementing GZip:

Where         | Step                              | Time
Caller        | Building request DOM              | 6ms
Caller        | Serializing request DOM into XML  | 15ms
In-Flight+OSB | Sending request via OSB           | 5ms
Backend       | Parsing request into DOM          | 5ms
Backend       | Querying database                 | 55ms
Backend       | Building response DOM             | 82ms
Backend       | Serializing response DOM into XML | 237ms
In-Flight+OSB | Sending response via OSB          | 31ms
Caller        | Parsing response into DOM         | 166ms
TOTAL         |                                   | 602ms

Note that the largest time hog is not the network, but building, serializing and parsing XML! Using gzip does not affect these steps, and so has little effect on the overall delay:

Where         | Step                              | Time
Caller        | Building request DOM              | 6ms
Caller        | Serializing request DOM into XML  | 15ms
In-Flight+OSB | Sending request via OSB           | 5ms
Backend       | Parsing request into DOM          | 5ms
Backend       | Querying database                 | 55ms
Backend       | Building response DOM             | 82ms
Backend       | Serializing response DOM into XML | 237ms
Backend       | GZipping the response             | 18ms
In-Flight+OSB | Sending response via OSB          | 5ms
Caller        | Un-GZipping the response          | 2ms
Caller        | Parsing response into DOM         | 166ms
TOTAL         |                                   | 596ms

Gzip did cut the response transfer time about 6 times (from 31ms to 5ms), but the backend had to spend 18ms compressing the payload and the caller another 2ms decompressing it, which killed almost all the savings.

The most troublesome steps though, the XML processing, were left unaffected.
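The arithmetic behind these conclusions can be checked with a few lines of Python. The timings are copied from the two tables above; the grouping of steps into "xml", "network", "database" and "gzip" categories is my own labeling, added only for this illustration.

```python
# (category, step, milliseconds) for the cycle without gzip.
BASELINE = [
    ("xml", "Building request DOM", 6),
    ("xml", "Serializing request DOM into XML", 15),
    ("network", "Sending request via OSB", 5),
    ("xml", "Parsing request into DOM", 5),
    ("database", "Querying database", 55),
    ("xml", "Building response DOM", 82),
    ("xml", "Serializing response DOM into XML", 237),
    ("network", "Sending response via OSB", 31),
    ("xml", "Parsing response into DOM", 166),
]

# With gzip the response transfer drops to 5 ms, but two new steps appear.
WITH_GZIP = [
    (kind, step, 5 if step == "Sending response via OSB" else ms)
    for kind, step, ms in BASELINE
] + [
    ("gzip", "GZipping the response", 18),
    ("gzip", "Un-GZipping the response", 2),
]


def total(steps, kind=None):
    """Sum the milliseconds of all steps, or only those of one category."""
    return sum(ms for k, _, ms in steps if kind is None or k == kind)


def improvement_pct():
    """Overall end-to-end improvement from gzip, in percent."""
    before = total(BASELINE)
    return (before - total(WITH_GZIP)) / before * 100.0


if __name__ == "__main__":
    saved = total(BASELINE, "network") - total(WITH_GZIP, "network")
    print(f"transfer time saved: {saved} ms")
    print(f"gzip overhead added: {total(WITH_GZIP, 'gzip')} ms")
    print(f"overall improvement: {improvement_pct():.1f}%")
    xml_share = total(BASELINE, "xml") / total(BASELINE) * 100.0
    print(f"XML processing share of baseline: {xml_share:.0f}%")
```

The 26ms saved on the wire is offset by 20ms of compression work, leaving a net gain of 6ms out of 602ms, while XML processing accounts for 511ms of the baseline.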

Conclusion

The size was indeed the root cause, but its effect was felt outside of OSB, in building, serializing and parsing the XML. The optimization efforts should have been focused on making the data itself smaller, not just its representation on the wire.

Vladimir Dyuzhev, author of GenericParallel

About Me

My name is Vladimir Dyuzhev, and I'm the author of GenericParallel, an OSB proxy service for making parallel calls effortlessly, and MockMotor, a powerful mock server.

I have been building SOA enterprise systems for clients large and small for almost 20 years. Most of that time I have worked with the BEA (later Oracle) WebLogic platform, including OSB and other SOA systems.

Feel free to contact me if you have a SOA project to design and implement. See my profile on LinkedIn.

I live in Toronto, Ontario, Canada. Email me at info@genericparallel.com