
OAuth signatures: why we need 'em

Before I get started, let me say that this should be taken with a very large grain of salt. I am not a technical expert in these areas. I've only done a fair amount of reading and following along with standards efforts. Conclusions here should be questioned thoroughly and corrected as necessary, but since no one else seems to be willing to systematically look into the performance argument for signatures over SSL/TLS, I thought I'd just sit down for a couple of hours and pound it out. The following is the result of that work.

Resource access without SSL/TLS - Signatures vs. Bearer-tokens

The crux of the trade-off comes down to this: In the case of signed authorization information (with a nonce), a Man in the Middle (MITM) attacker owns the individual message. The MITM can do the following:

  1. Decline to deliver the message
  2. Hold the message and deliver it later (subject to constraints on timestamp validity - so this is a limited attack)
  3. Deliver the message with altered contents (only if the contents themselves are not signed)
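To make the limits on attack #2 concrete, here is a minimal sketch of how a provider might check a signed request. It is not the OAuth 1.0a signature base string; the `sign` and `verify` functions, the five-minute window, and the in-memory nonce set are all assumptions made for illustration.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300   # hypothetical validity window for timestamps
_seen_nonces = set()     # illustration only; a real provider needs a shared store with expiry


def sign(secret: bytes, method: str, url: str, nonce: str, timestamp: int) -> str:
    """HMAC-SHA1 over a simplified base string (not the real OAuth encoding)."""
    base = "&".join([method.upper(), url, nonce, str(timestamp)])
    return hmac.new(secret, base.encode("utf-8"), hashlib.sha1).hexdigest()


def verify(secret: bytes, method: str, url: str, nonce: str,
           timestamp: int, signature: str, now: float = None) -> bool:
    """Accept a request only if it is fresh, unseen, and correctly signed."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False     # message was held past the validity window
    if nonce in _seen_nonces:
        return False     # message was already delivered once (replay)
    expected = sign(secret, method, url, nonce, timestamp)
    if not hmac.compare_digest(expected, signature):
        return False     # signed fields were altered, or wrong secret
    _seen_nonces.add(nonce)
    return True
```

With a check like this, a held message can be delivered late only once and only inside the window, and the MITM cannot mint new messages without the secret.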

In the case of bearer-token authorization, a MITM owns the entire connection. The MITM can do the following:

  1. Decline to deliver the message
  2. Hold the message and deliver it later
  3. Change the timestamp (if one is used at all)
  4. Change the contents of the message
  5. Create an arbitrary number of new messages with arbitrary contents using the bearer-token and deliver those (subject to bearer-token invalidation)

In my opinion, #5 makes bearer-tokens without SSL/TLS unusable in nearly all use cases. Any use case where spam is a threat, for example, is ruled out by this attack, because any MITM can send an arbitrary number of spam messages that the provider application will show as coming directly from the user. If you think Twitter direct-message spam is bad now...

Because of this and other possible attacks (successfully impersonating a provider gets you the bearer-token, for example), I think that bearer-tokens without SSL/TLS are not worth considering further for the vast majority of use cases.

So what about SSL/TLS? Can't we just use that for everything and forget about signatures?

Yes, but there are tradeoffs.

The most important and unavoidable tradeoff is performance. The performance tradeoff comes in two flavors:

  1. Connection latency to bootstrap the connection (from the asymmetric/public-key encryption operations)
  2. Processing to encrypt requests (from the symmetric/secret-key encryption operations)

Connection latency is incurred due to the public-key cryptography operations used to exchange a shared secret, which is then used in the much more efficient symmetric-key cryptography that encrypts the individual messages. Research (reference 1) indicates that, on the web server, SSL introduces approximately a 70% overhead over basic HTTP for small requests, and that almost 90% of this overhead is incurred in the public-key operations.

As request size increases, the public-key overhead remains constant while the symmetric-key work to encrypt the request scales more or less linearly with request size. As a result, the actual act of encrypting the request becomes the limiting factor relatively quickly for large requests.
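The shape of that trade-off can be sketched as a fixed cost plus a per-byte cost. The function names and the numbers in the usage note are made up for illustration; they are not taken from the referenced papers, and only the constant-plus-linear shape comes from the argument above.

```python
def tls_overhead(request_bytes: int, handshake_cost: float, cost_per_byte: float) -> float:
    """Total added cost: fixed public-key handshake plus linear bulk encryption."""
    return handshake_cost + cost_per_byte * request_bytes


def handshake_share(request_bytes: int, handshake_cost: float, cost_per_byte: float) -> float:
    """Fraction of the added cost that comes from the handshake alone."""
    return handshake_cost / tls_overhead(request_bytes, handshake_cost, cost_per_byte)
```

With a hypothetical handshake cost of 9.0 units and 0.001 units per byte, the handshake accounts for about 90% of the overhead on a 1,000-byte request and under 1% on a 1,000,000-byte request.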

Keep in mind, this is additional latency on the web server, not on the client. Other research (reference 2) indicates that clients may be even more resource-bound for these types of operations, resulting in even more severe latency problems.

It's important to note that these two types of performance impact become problematic under different use cases.

When SSL makes sense and when it doesn't

SSL has performance costs that will translate into a real user experience penalty in certain situations. Service providers can use specialized hardware to significantly accelerate the necessary cryptography operations, but any client smaller than a medium-sized web application provider will not be able to afford such hardware. As such, these clients will suffer a user experience impact from the required use of SSL/TLS connections in some situations.

What are these situations?

Connection latency is an issue when new connections must be established for most operations. This is the case for intermittent API requests. Some use cases where this could be an issue (keeping in mind that even an extra fraction of a second of wait-time is a user-experience penalty that many applications may be unhappy paying):

  1. Display statistics in a rich/desktop app in response to a user request (Ego app on iPhone requesting statistics from Google Analytics, for instance)
  2. Check for new messages in a rich/desktop app in response to a user request (Dashboard widget, for example)
  3. Request data within a web application from a little-used third-party provider in response to a user request (web app like youcalc.com with ability to connect to arbitrary OAuth-enabled datastores)

Meanwhile, encryption of the request becomes an issue for large requests, uploads and downloads especially. Some examples:

  1. Video and media file access requests, including streaming
  2. Google Docs document list API upload/download requests

In both of the situations outlined above, SSL/TLS imposes a performance overhead that should theoretically have user experience implications. As I've been thinking about this issue, the following matrix has proven helpful in visualizing the problem space and helping to show that SSL/TLS is a clear winner in only a limited (but admittedly very common) portion of the problem space.

                   | Sensitive to connection latency | Not sensitive to connection latency
Large requests     | SSL/TLS not a good option       | SSL/TLS may be an option
No large requests  | SSL/TLS may be an option        | SSL/TLS is a good option

How does signing address these use cases?

The answer to this question depends on the form that signing ends up taking. If only the authentication parameters are signed, then we are in a situation similar to OAuth 1.0a, in which latency is minimal because the hash operation used in the signing is cheap and is executed on a very small amount of data. Further, performance is unrelated to request size because only the authentication parameters are signed and the length of these parameters is unrelated to the request payload.

If full message signing is required, then we start to run into a penalty similar to the encryption penalty, as the effort to calculate the hash of the entire payload should scale more or less linearly with the size of the payload. However, connection latency should remain minimal for small requests.
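The difference between the two signing forms can be sketched as follows. This is a simplified illustration, not the OAuth 1.0a algorithm: parameter encoding is skipped, and the function names and the `body_hash` parameter name are assumptions.

```python
import hashlib
import hmac


def sign_auth_params(secret: bytes, params: dict) -> str:
    """Sign only the (small) authentication parameters; cost ignores payload size."""
    base = "&".join(f"{key}={params[key]}" for key in sorted(params))
    return hmac.new(secret, base.encode("utf-8"), hashlib.sha1).hexdigest()


def sign_full_message(secret: bytes, params: dict, body: bytes) -> str:
    """Also cover the payload by hashing it first; that hash is linear in body size."""
    body_hash = hashlib.sha1(body).hexdigest()
    return sign_auth_params(secret, {**params, "body_hash": body_hash})
```

In the first form an altered payload goes undetected, which is attack #3 in the list near the top. In the second form any change to the body changes the signature, at the price of reading every byte of the payload.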

Of course, using signatures without transport-layer security brings with it a whole new matrix of known security considerations. As mentioned at the beginning of this little essay, from a security perspective, signing appears to be at least as good in all respects as bearer tokens sent in the clear.

Conclusion

There are theoretical and currently deployed use-cases in which dispensing with SSL/TLS and signing at least the authentication parameters of a request makes sense. These are use cases in which SSL/TLS has an undesirable user experience impact and in which leaving the request payload in the clear does not have significant negative security impacts. For this reason, and because the signing approach is the predominant deployed OAuth method (even though SSL/TLS with a bearer token is an option), we should maintain a signature method in the current draft.

References

  1. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.62.8589&rep=rep1&type=pdf - Anatomy and Performance of SSL Processing - Li Zhao, Ravi Iyer, Srihari Makineni, Laxmi Bhuyan - p. 4 is the meat
  2. http://iweb.tntech.edu/hexb/publications/https-STAR-03122003.pdf - A Performance Analysis of Secure HTTP Protocol - Xubin He

Automated testing in SAP systems

This is another in my series of notes to myself that might somehow come in handy to others, so I'll post them to the blog. This is more of a brainstorm than anything else, but some thought has gone into this area. In terms of actual testing going on in the SAP world, it probably displays my ignorance more than anything, but I'd love to be corrected. If you aren't interested in SAP or in testing methodology, then I recommend you stop reading now!

Based on what I've seen in my admittedly shortish time in the SAP world, the state of automated testing is sadly lacking. This is not to say that the facilities for testing don't exist, but perhaps that they are not adequately ergonomic or well understood. The result is that most implementations don't have a mature testing automation strategy, which is a key component of a modern approach to software development.

Unit testing

In the unit test arena we're doing alright as we have most of the standard Java capabilities on the Java development side and we have ABAP Unit on the ABAP side of the house. Though, I'll note that you've got to be a bit careful in designing your ABAP Unit tests to handle more than one failure per execution. There also isn't much of a facility for handling alternative testing strategies such as specification derived tests or behavioral testing. Part of this is due to the limitations of ABAP as a (mostly) statically typed, strictly object-oriented language. Part of this inflexibility is simply due to the fact that the necessary facilities have not been developed.

I'll also note that while Java isn't my cup of tea, apparently setting up automated unit testing using JUnit on the Java side of the SAP world isn't straightforward. The SDN blog "Test-driven development with Composite Application Framework (CE 7.1)" looks like a decent overview of the process of getting JUnit testing set up.

With SAP's focus on positioning itself as a full-featured development platform for the enterprise and the growth of agile, modern development methodologies in the enterprise, I'm surprised that unit testing and test-driven development aren't getting more attention from SAP as an integral part of the application lifecycle.

Performance

In the performance testing area a lot of work has been done to work out transactional systems under load. However, I'd argue that we need to understand the limits of this sort of testing. Usually it is focused on a single transaction or report, which is fine for testing the maximum performance of that piece of the system, but the tester must have a significant knowledge of the underlying architecture of the system in order to really test for table write contention or the performance of specialized processing logic under load.

Performance testing gets really worthwhile only when the level of knowledge of the system architecture reaches a level where a significant number of these trouble spots can be identified and tested under load in a test environment. As it happens, we're mostly doing this type of testing in production in the SAP world, which is the situation we are ostensibly trying to avoid by doing "performance testing" in a pre-production test system.

Jonathan Kohl writes knowledgeably of some of the complexities inherent in performance testing. I think we would be wise to admit the limits of performance testing in an environment as complex as most SAP implementations. This is not to say that we shouldn't do it, but simple load testing isn't getting us very far and we might be better served by focusing on replicating real system use or focusing on a more exploratory version of performance testing.

Functional/Behavior

Up until this point we've had cursory automated testing occurring in most implementations. But when we get to functional testing things start to fall apart. There is a lot of potential contention over the right place to interface into the application to do functional testing. Most MVC frameworks provide an interface into the controller level to mock view requests and model contents. Some believe that the view/interface level is a good place for this kind of testing, which makes a certain amount of sense if the framework doesn't have a clear delineation between the user interface and the application logic or won't commit to the delineation that exists.

The options for functional testing in delivered SAP systems are pretty much eCATT or an external testing tool that hooks in at the UI or API level. eCATT is not an ideal functional testing tool on some reads of "proper" functional testing but should be usable. I'll be looking into some eCATT good practices and testing strategies in a future article. There are also some options from vendors like Mercury (now HP).

It should be possible to hack together a working external automated functional testing system for your organization's SAP landscape and chosen development method, but it would be expensive to maintain across significant upgrades. For example, the recent NetWeaver 7.0 enhancement package 1 upgrade would probably break most browser-driven testing of Web Dynpro ABAP/Java applications because of the shift from full server round-trips to a more AJAX-y user interface programming approach. Multiple levels of abstraction in the test harness would be necessary to limit breakage to predictable and manageable layers of the testing stack.
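As a rough sketch of what such layering might look like, the example below separates the driver, the screen description, and the test itself. Everything here is hypothetical: `FakeBrowser` stands in for a real browser driver and does not mirror any actual tool's API, and the locators and screen are invented.

```python
class FakeBrowser:
    """Driver layer: stand-in for a real browser driver (hypothetical API)."""

    def __init__(self):
        self.fields = {}
        self.clicked = []

    def type_into(self, locator: str, text: str) -> None:
        self.fields[locator] = text

    def click(self, locator: str) -> None:
        self.clicked.append(locator)


class OrderEntryScreen:
    """Screen layer: the only place that knows UI locators."""

    CUSTOMER_FIELD = "id=customer"   # invented locator
    SAVE_BUTTON = "id=save"          # invented locator

    def __init__(self, browser):
        self.browser = browser

    def create_order(self, customer: str) -> None:
        self.browser.type_into(self.CUSTOMER_FIELD, customer)
        self.browser.click(self.SAVE_BUTTON)


def run_create_order_test(browser) -> bool:
    """Test layer: speaks in business terms and never touches locators."""
    OrderEntryScreen(browser).create_order("ACME")
    return (browser.fields.get(OrderEntryScreen.CUSTOMER_FIELD) == "ACME"
            and OrderEntryScreen.SAVE_BUTTON in browser.clicked)
```

If an upgrade changes how the UI is rendered, the hope is that only the screen layer (or the driver beneath it) needs repair, while the tests written against `create_order` stay as they are.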

I still think it is worthwhile to pursue this strategy if the built-in support for functional testing from SAP does not satisfy the needs of an organization. The automated approach done well can provide for better coverage, accuracy, and speed, while saving a bunch of un-fulfilling person-hours spent manually exercising a system while struggling to stay awake. There may be room for a vendor to build up and maintain standard APIs across versions of SAP systems and I'm sure that vendors like Mercury are working on this, at least in transactional systems. Heck, there may be room for an open source project to do the same using a browser driver like WATIR. Hmmmm...

BI and performance management applications still seem to be pretty much an open field here.

Why automation

The real power of automated functional testing starts to show up when we consider doing things that we basically don't do in SAP projects. Coupled with an automated system build process (yes, we probably need to talk about this too), automated test suites allow for

  • Full nightly functional regression tests
  • Running the full test suite on a contemplated change or note application
  • Full regression testing after upgrading
  • Behavior driven development (BDD) of business applications, where the specs are the business requirements
  • Continuous integration for every level of the development and maintenance landscape

The potential gains in productivity and reductions in risk that can be gleaned from engaging in a disciplined testing methodology (be it focused on coverage, TDD, BDD, etc) appear large. Certainly worth more thought and study.

Workflow change costs estimation

Originally posted as a comment by esjewett on The Inquisitr using Disqus.

These are excellent points, but it only gets at half of the equation. As I see it, the two main questions when working on my personal workflow are effectiveness and flexibility. My impression is that when people are thinking about their workflows there is far too much emphasis on the stability of the platform (which is your point, and a good one) because of an inflated estimate of the costs associated with a workflow change.

The basic question is this: Is a change to my workflow that takes X hours to execute worthwhile?

The way a lot of people think about this question is by asking "Do I have X hours to spend on this or would I rather spend it on something else?"

The question people should be asking contains another variable - the number of hours they will save overall through this change. Let's call this number of hours "Y". The question is: "Do I have X-Y hours to spend on this or would I rather spend it on something else?" When the number of hours saved (Y) becomes greater than the switching cost (X), the cost associated with a workflow change becomes negative.

I actually switch tools quite often, so my estimate of the total productivity gain for a given tool is necessarily limited by a short time horizon and my tolerance for switching costs should be correspondingly lower, but I still find that a switch of tools is justified for a pretty small daily productivity gain (even 5 minutes saved per day makes a tool a clear winner).
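The arithmetic is simple enough to write down. In this sketch the function names are mine, and the 2-hour switching cost in the usage note is an assumed figure; only X, Y, and the 5 minutes per day come from the text.

```python
def net_cost_hours(switch_hours: float, hours_saved: float) -> float:
    """X - Y: negative means the change pays for itself."""
    return switch_hours - hours_saved


def break_even_days(switch_hours: float, minutes_saved_per_day: float) -> float:
    """Days of use after which the time saved equals the switching cost."""
    if minutes_saved_per_day <= 0:
        raise ValueError("the tool must save some time to ever break even")
    return switch_hours * 60 / minutes_saved_per_day
```

For example, a switch that takes 2 hours and saves 5 minutes per day breaks even after 24 days of use, so even a fairly short time horizon can justify it.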

If Google Reader shut down tomorrow and I had to switch to a different feed reader, would I come out ahead in productivity over the last year or two as compared to the next best solution? I have no doubt in my mind that I would. As such, I have no problem "entrusting" this part of my workflow to the web, as long as the switching costs (X) are less than the productivity gain.

I find that the "web-iness" of a tool doesn't factor much in the estimate of switching costs. The OPML export from Google Reader happens to make the switching cost very low, which makes my decision to use Google Reader very easy.

Original article: http://www.inquisitr.com/15368/so-you-want-to-trust-your-workflow-to-the-web-good-luck-with-than-plan/.

FriendFeed slightly over enthusiastic

FriendFeed is the hot new thing in social activity aggregation. The basic idea is that it aggregates all of your activity from various online services into one big activity stream. You can find mine here: http://www.friendfeed.com/esjewett.

Doing this type of aggregation means querying the services where content lives on a regular basis for each user. I signed up this blog, so this blog's RSS feed gets pinged on a regular basis. Perhaps too regular a basis.

[Image: Feedburner stats for FriendFeed bot]

Those are the stats for a single day. In case anyone didn't notice, I don't post here all that much, so checking ~65 times per day (every 20 minutes or so) might be overdoing it a tad!

Some other feed-catchers do checks at increasing intervals when no new content is found, with a reasonable maximum interval of an hour or so. Such an algorithm, tweaked for FriendFeed's particular case, might save some serious bandwidth. It might also open up bandwidth and processing time to allow it to stay even more ridiculously up to date with frequently updated content.
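A minimal sketch of that kind of polling schedule follows. The 20-minute starting interval matches what I'm seeing from FriendFeed and the one-hour cap is the maximum suggested above; the doubling factor and the function names are assumptions, not anything FriendFeed or other feed-catchers are known to use.

```python
MIN_INTERVAL = 20 * 60   # seconds; the polling rate observed above
MAX_INTERVAL = 60 * 60   # seconds; "an hour or so"


def next_interval(current: int, found_new_content: bool, factor: int = 2) -> int:
    """Reset to the fastest rate on new content, otherwise back off up to the cap."""
    if found_new_content:
        return MIN_INTERVAL
    return min(current * factor, MAX_INTERVAL)


def polls_per_quiet_day() -> int:
    """Count polls in 24 hours for a feed that never has new content."""
    elapsed, interval, polls = 0, MIN_INTERVAL, 0
    while elapsed + interval <= 24 * 60 * 60:
        elapsed += interval
        polls += 1
        interval = next_interval(interval, False)
    return polls
```

Under these assumptions a quiet feed like mine would be polled 25 times a day rather than every 20 minutes around the clock, while a busy feed would stay at the fastest rate.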

Twitter, Rails, and scaling - An uninformed commentary

This post originally appeared as a comment on Shel Israel's Global Neighborhoods post, "An Open Letter to the Twitter Guys". I've copied it here to make it available to me in the future, and I've made some edits to connect links and bring it more into line with a blog post than a comment. In the process, it's been removed from the context of its conversation, so click back through the above links and read the thread. It's a good blog post and there are some excellent comments.

----

The only problem with the speculation on the technical underpinnings of Twitter's scaling issues going on here is that we might be (probably are?) wrong. That said, allow me to engage in some speculation!

I think in this case Shel and his technical contacts are probably on the wrong track. This conversation has already occurred once, with the base accusation that Rails can't talk to multiple databases. Once aired, the problem was solved relatively quickly and easily via a code contribution from the community: http://drnicwilliams.com/2007/04/12/magic-multi-connections-a-facility-in-rails-to-talk-to-more-than-one-database-at-a-time/

So even at the beginning, the problem was not the Rails application server architecture, but an issue of database contention. David HH's response to the criticism may have seemed a bit defensive, but at root he was correct that the best way to engage the community is to air your issues in community forums rather than try to work through your problems silently and then accuse the product of a shortcoming out of frustration during an interview. It certainly seems that the whole thing could have been handled better.

DHH also seems to be correct that at the application server level, Rails scales easily (though expensively) by simply throwing hardware at the problem. However, Twitter wasn't dealing with an application server problem and therefore wasn't dealing with a specifically "Rails" problem.

Moving on to the current discussion, it seems the root issue is still database contention, possibly in conjunction with a client polling architecture that may be reaching the limit of scale. See http://www.highscalability.com/scaling-twitter-making-twitter-10000-percent-faster and http://www.readwriteweb.com/archives/xmpp_web.php for decent overviews.

Having used Rails, Ruby and PHP (though not having scaled anything), I can say that as far as I see there is nothing inherent in either that solves or significantly exacerbates either of these issues. On the database side, applications built using either language/framework will generate SQL statements that query the database. In pure PHP, you usually write the SQL yourself. Rails does a lot of the SQL work for you, but it is certainly possible to override that assist and do it yourself if you've developed a situation where further optimization is required.

In short, all factors seem to point to inherent architectural issues that Twitter is struggling with. There are always ways to approach these issues, but it's far more complex than switching from one language or framework to another.
