Dynamic Service Level Agreements

Web service messages in an SOA environment can be measured for compliance with service level agreements (SLAs). There are many tools and applications on the market that measure and enforce these types of SLAs, but what happens when you have something out of the ordinary? For example, I recently did some consulting for an organization that wanted dynamic service level agreements based on their services’ dynamic response times. My initial thought was “Huh? Why? That goes against the mantra of SLAs.” Then I thought about it for a minute, because the idea intrigued me.

The problem was that the client needed an SLA to adjust dynamically based on the number of certain elements within a SOAP message. The client knew that each occurrence of a particular element in a SOAP message took the web service X seconds to process. This is especially important if the service is waiting for a database or a transaction to process that message. (Why they did not use an entity bean with asynchronous messaging is another story altogether.) However, how would this be handled in a post-processed scenario?

I set out to solve this problem. First I gathered the profile characteristics of the service and its potential messages. Under normal processing, the service had a baseline response time of 300 ms. Each element to be metered (for lack of a better term) would add 400 ms. The dynamic total response time would therefore be the baseline response time plus the number of elements multiplied by their expected 400 ms transaction time. In simpler terms, I came up with the following algorithm for dynamic SLA response times:

Total Response Time = Baseline Response Time + (# of Elements * Element Transaction Time)

For example, with a baseline response time of 300 ms and 10 elements at 400 ms each, I would come up with the following result:

Total Response Time = 300 ms + (10 * 400 ms)

Total Response Time = 300 ms + 4000 ms

Total Response Time = 4300 ms, or 4.3 seconds
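
As a quick sanity check, the calculation above can be sketched in a few lines of Python. The function and parameter names here are my own illustration, not anything from the client's tooling:

```python
def dynamic_sla_ms(baseline_ms, element_count, per_element_ms):
    """Allowed response time: baseline plus a fixed cost per metered element."""
    return baseline_ms + element_count * per_element_ms


# The worked example: 300 ms baseline, 10 elements at 400 ms each.
allowed = dynamic_sla_ms(300, 10, 400)
print(allowed, "ms")           # 4300 ms
print(allowed / 1000, "s")     # 4.3 s
```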

I thought to myself, cool, got that one solved. Then the client threw another wrench in my gears: “Well, those elements might have different transaction profiles, so element A will have a transaction time of 300 ms, element B 400 ms, and element C 500 ms.” I figured this would be no problem, since it is a simple modification of the algorithm above.

Total Response Time = Baseline Response Time + (# of A Elements * Element A Transaction Time) + (# of B Elements * Element B Transaction Time) + (# of C Elements * Element C Transaction Time)

For example, a message with 10 A elements, 7 B elements and 5 C elements gives:

Total Response Time = 300 ms + (10 * 300 ms) + (7 * 400 ms) + (5 * 500 ms)

Total Response Time = 300 ms + 3000 ms + 2800 ms + 2500 ms

Total Response Time = 8600 ms, or 8.6 seconds
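
The per-type version generalizes naturally to a lookup table of costs. Here is a minimal sketch, assuming the element types are simply labeled "A", "B" and "C" and that the counts have already been extracted from the message; the names are hypothetical:

```python
def dynamic_sla_ms_by_type(baseline_ms, counts, cost_ms):
    """Allowed response time: baseline plus count * cost for each element type."""
    return baseline_ms + sum(n * cost_ms[name] for name, n in counts.items())


# Per-element transaction times from the client's example, in milliseconds.
COST_MS = {"A": 300, "B": 400, "C": 500}

# A message with 10 A elements, 7 B elements and 5 C elements.
allowed = dynamic_sla_ms_by_type(300, {"A": 10, "B": 7, "C": 5}, COST_MS)
print(allowed, "ms")  # 8600 ms
```

Keeping the costs in a table means that adding a fourth element type is a data change rather than a change to the formula.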

After thinking about it a lot, I concluded that this was a pretty good idea. I was able to implement it with simple XPath checks on each message: counting the elements, doing the appropriate calculation, and throwing an alert when a message transaction did not comply with the dynamic SLA. It is not something I see every day, but it definitely solved one of their business logic issues.
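
To give a flavor of what the XPath counting step could look like, here is a rough sketch using Python's standard library. It is not the client's actual implementation: the element names (`ElementA`, `ElementB`, `ElementC`), the `urn:example:orders` namespace and the sample message are all made up for illustration, and the `{*}` namespace wildcard requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

BASELINE_MS = 300
# Hypothetical element names mapped to their expected transaction time (ms).
ELEMENT_COST_MS = {"ElementA": 300, "ElementB": 400, "ElementC": 500}


def allowed_response_ms(soap_xml):
    """Count each metered element in the message and compute the dynamic SLA."""
    root = ET.fromstring(soap_xml)
    total = BASELINE_MS
    for name, cost in ELEMENT_COST_MS.items():
        # ".//{*}name" finds the element at any depth, in any namespace.
        total += len(root.findall(f".//{{*}}{name}")) * cost
    return total


def violates_sla(soap_xml, observed_ms):
    """True when the observed response time exceeds the dynamic SLA."""
    return observed_ms > allowed_response_ms(soap_xml)


# Made-up sample message: two ElementA and one ElementB.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Order xmlns="urn:example:orders">
      <ElementA/><ElementA/>
      <ElementB/>
    </Order>
  </soap:Body>
</soap:Envelope>"""

print(allowed_response_ms(SAMPLE))   # 300 + 2*300 + 1*400 = 1300
if violates_sla(SAMPLE, observed_ms=1500):
    print("ALERT: response time exceeded the dynamic SLA")
```

In a real deployment the observed response time would come from whatever monitoring tool sits in the message path, and the alert would go to that tool's notification mechanism rather than to standard output.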