Why in-memory doesn't matter (and why it does)

(This post is a bit of a blast from the past. It was originally published in 2009 and was lost during a migration in 2011. The software landscape it references is obviously dated and many links are likely broken, but the analysis is still relevant.)

Well, that title was going to be perfect flame-bait, but then I went all moderate and decided to write a blog that actually matters. So here's the low-down:

There's a lot of talk lately about in-memory and how it's the awesome. This is especially true in the SAP-o-sphere, primarily due to SAP's marketing might getting thrown behind Business Warehouse Accelerator (BWA) and the in-memory analytics baked into Business ByDesign.

I'm here today to throw some cold water on that pronouncement. Yes, in-memory is a great idea in a lot of situations, but it has its downsides, and it won't address a lot of the issues that people are saying it addresses. In the SAP space, I blame some of the marketing around BWA. In the rest of the internet, I'm not sure if this is even an issue.

Since I've actually done a fair amount of thinking about these issues (and as a result I troll people on Twitter about it), I thought maybe it'd be helpful if I wrote it down.

So let's get down to brass tacks:

How in-memory helps

In short: it speeds everything up.

How much? Well, let's do the math: Your high-end server hard drive has a seek time of around 2 ms. That's 2*10^-3 seconds (thanks Google). Yes, I'm ignoring rotational latency to keep it simple.

Meanwhile, fast RAM has a latency measured in nanoseconds. Let's say 10ns to keep it simple. That's 10^-8 seconds.

So, if I remember my arithmetic (and I don't), RAM is about 2*10^5, or 200,000 times faster than hard disk access.
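For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It uses only the two round numbers quoted above (a 2 ms seek and a 10 ns RAM latency); the constant and function names are mine and purely illustrative.

```python
# Round figures from the text, not measurements.
DISK_SEEK_SECONDS = 2e-3     # ~2 ms seek time for a high-end server disk
RAM_LATENCY_SECONDS = 10e-9  # ~10 ns latency for fast RAM


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast access is than the slow one."""
    return slow_seconds / fast_seconds


ratio = speedup(DISK_SEEK_SECONDS, RAM_LATENCY_SECONDS)
print(f"RAM is roughly {ratio:,.0f}x faster per access")
# prints: RAM is roughly 200,000x faster per access
```

This is a per-access latency ratio only. It says nothing about transfer rates or about how a real workload behaves.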

Keep in mind that RAM is actually faster because the CPU-memory interface usually supports faster transfer rates than the disk-CPU interface. But then, hard disks are actually faster because there are ways to drastically improve overall access performance and transfer rates (RAID, iSCSI? - not really my area). Point is, RAM helps your data access go a lot faster.

But ... er ... wait a second (or several thousand)

So here I am thinking, "Well, we're all fine and dandy then. I just put my job in RAM and it goes somewhere between 100,000 and 1,000,000 times as fast. Awesome!".

But then I remember that RAM isn't a viable backing store for some applications, like ERPs (no matter what Hasso Plattner seems to be saying) or any other application where you can't lose data, period. Yes, it can act as a cache, but your writes (at least) are going to have to be transactional and will be constrained by the speed of your actual backing store, which will probably be a database on disk.
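To make the cache point concrete, here is a rough sketch of a write-through cache in Python. Everything in it is hypothetical: `SlowStore` is a stand-in for a durable on-disk database that just counts operations, and `WriteThroughCache` is the in-memory layer in front of it. It is not how any particular product works. It only illustrates that reads can be served from memory while every write still has to reach the backing store.

```python
class SlowStore:
    """Hypothetical stand-in for a durable, disk-backed database."""

    def __init__(self):
        self._data = {}
        self.writes = 0  # how many times we had to touch "disk" to write
        self.reads = 0   # how many times we had to touch "disk" to read

    def write(self, key, value):
        self.writes += 1
        self._data[key] = value

    def read(self, key):
        self.reads += 1
        return self._data[key]


class WriteThroughCache:
    """In-memory cache that never acknowledges a write before the store has it."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def put(self, key, value):
        self._store.write(key, value)  # durable write first: bound by store speed
        self._cache[key] = value

    def get(self, key):
        if key not in self._cache:     # only a miss goes to the store
            self._cache[key] = self._store.read(key)
        return self._cache[key]


store = SlowStore()
cache = WriteThroughCache(store)

for i in range(100):
    cache.put(i, i * i)      # 100 writes, every one hits the store

for _ in range(10):
    for i in range(100):
        cache.get(i)         # 1,000 reads, all served from memory

print(store.writes, store.reads)  # prints: 100 0
```

The reads get the in-memory speedup, but the writes gain nothing, which is the constraint described above.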

And then I see actual benchmarks meant to reflect the real world like this. For those who won't click the link, the numbers are a bit hard to read, but I'm seeing RAM doing about 10,000 database operations in the amount of time it takes a hard disk store to do about 100. That's only a 100x speedup.

Ok, now I'm back down to earth and I'm thinking, "I just put my job in RAM and I'll get maybe a 50-100x speedup but at the cost of significant volatility". (I'm also thinking that SAP's claimed performance improvements of 10x - 100x sound just about like what we'd expect.)

This is still really really good. It makes some things possible that were not possible before and it makes some things easy that used to be hard.

And finally, why in-memory doesn't matter

But really, what is the proportion of well-optimized workloads in the world? How often are people going to use in-memory as an excuse to be lazy about solving the actual underlying problems? In my experience, a lot. Already we are hearing things along the lines of, "The massive BW query on a DSO is slow? Throw the DSO into the BWA index." [Editor's note: A DSO is essentially a flat table. Also, the current version of BWA doesn't support direct indexing of DSOs, but it probably will soon, along with directly indexing ERP tables.]

Now's the part where we who know what we're doing tear these people to shreds and tell them to implement a real Information Lifecycle Management system and build an Inmon-approved data warehouse using their BW system (BW makes it relatively easy). Then that complex query on a flat table that used to take two days of runtime will run in 30 seconds.

Well, that would be one approach, but frankly most people and companies don't have the time or the organizational maturity in their IT function to pull this off. In this world, where people have neither the time nor the business process for this sort of thing, it starts to make sense to spend money on the problem instead, and something like BWA is a great thing in this context.

But it's not great because it's in-memory. It's great because it takes your data - that data you haven't had the time to properly build into a data warehouse with a layered and scalable architecture, highly optimized ROLAP stores, painstakingly configured caching, and carefully crafted delta processes - and it compresses it, partitions it, and denormalizes it (where appropriate). Then, as the icing on the cake, it caches the heck out of it in memory.

Let's be clear: BW already has in-memory capabilities. Livecache is used with APO, and the OLAP cache resides in memory. The reason BWA matters is not that it is in-memory. It matters because it does the hard work for you behind the scenes, and partially because of this it is able to use architectural paradigms like column-based stores, compression, and partitioning that deliver performance improvements for certain types of queries regardless of the backing store.
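As a toy illustration of why a column layout and compression help regardless of where the data lives, here is a small Python sketch. The sample rows, column names, and helper functions are all made up for this example, and this is not BWA's actual implementation. It simply shows that an aggregate only needs to touch one column, and that a column with long runs of repeated values compresses well with something as simple as run-length encoding.

```python
from itertools import groupby

# Hypothetical sales rows, stored row-wise as a flat table would be.
ROWS = [
    ("EMEA", "widget", 120),
    ("EMEA", "gadget", 80),
    ("EMEA", "widget", 50),
    ("APAC", "widget", 200),
    ("APAC", "gadget", 30),
]
COLUMN_NAMES = ("region", "product", "revenue")


def to_columns(rows, names):
    """Pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a list into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]


columns = to_columns(ROWS, COLUMN_NAMES)

# An aggregate reads a single column instead of every field of every row.
total_revenue = sum(columns["revenue"])

# A sorted, low-cardinality column collapses into a handful of runs.
region_runs = rle_encode(columns["region"])

print(total_revenue)  # prints: 480
print(region_runs)    # prints: [('EMEA', 3), ('APAC', 2)]
```

Neither trick depends on RAM. Both reduce how much data a query has to read, which helps on disk just as much as in memory.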

In-memory is great, and fast, and should be used. But in most ways that are really important, it doesn't matter all that much.

How Google+ spams people using the Gmail widget

So, there's this new-ish widget in Gmail that shows up on the right side gutter and gives information about the person you are receiving email from. It's actually pretty useful. In addition to basic contact information, it shows thumbnails of recent pictures in emails from that person, and gives chat options.

Here, I have sent an email to my real Gmail account from a throwaway account called ethanjewett@gmail.com

Recently, the option to add a person to your Google+ circles has been added to this widget in Gmail. Circles have also been integrated into Gmail Contacts, so a reasonable person might well think that this widget is a way to add people to contact groups.

The new G+ button

Clicking on this button pops up a list of your circles, allowing you to choose how you want to categorize this person.

I'm my own friend

At this point, my "friend" has been added to my "Friends" circle.

What I might not expect at this point is that Google has sent my new "Friend" a very excited email falsely stating that I am inviting him to join Google+

I'm a little concerned by this for the following reasons:

  • Did I invite this person to join Google+? No, I added him to a circle in Gmail.
  • Am I recommending Google+ for this person? No. In fact, I only found out about this because I added people to my circles in this manner who were confused by the invitation. One of these people wrote back to me asking what it was. I would never have invited some of these people to Google+ because I know they have no interest in it.
  • Did I want this person to know that I had added them to a circle/group in my Gmail contacts? Not necessarily. I really believe that this function is easy to confuse with contact groups, and organizing contacts is a very private activity that I do not want my contacts to know about. What if this is someone I want to avoid, someone I don't want to think about me? What if I added this person to a circle called "Stalkers"?
  • Google (and everyone else for that matter) should never ever ever email my contacts without my express consent.

I trust Google with my contact information and my email. This type of behavior makes quite clear that Google is not worthy of that trust. Am I going to do anything about it? Not at the moment, but it is one more straw added to the camel's back.

SAP BI OnDemand and Hana

It's been some time now since the press releases and SAP TechEd Bangalore keynote proclaiming that SAP's BI OnDemand product now runs on HANA as its underlying database. The press releases have gone out. The product is here. The BI OnDemand website has been updated with a shiny new "Powered by SAP Hana" logo.

There is only one problem. It seems that the BI OnDemand that most people can see is not actually powered by Hana.

I discovered this for myself when discussing the topic with Courtney Bjorlin, who was working on an article about the announcement. SAP confirms in the article that only the "Advanced Edition" of BI OnDemand is available on the HANA database. At SAP's TechEd in Madrid, I was able to ask around on the show floor and hallways and find out more about the situation.

How do I get BI OnDemand running on HANA?

You have to buy the "Advanced Edition" of BI OnDemand. This involves a sales process and is a hosted version of the BI OnDemand platform. It seems that it's not exactly SaaS or "OnDemand", but more on that below.

The fact that the logo at https://bi.ondemand.com says "Powered by SAP Hana" is apparently an inaccuracy. Hopefully that will be corrected soon.

What are these different "editions" of BI OnDemand?

There are three "editions" of BI OnDemand: Personal, Essential, and Advanced. Based on my discussions, it seems that Personal and Essential editions are SaaS applications hosted by SAP, while the Advanced edition is hosted by partners. All editions seem to include the same web interface as seen on bi.ondemand.com, but the Essential version includes customization and branding options as well as more storage. The Advanced edition features even more storage and customization options, plus access to a hosted version of the BusinessObjects Data Services, which can be used to manage contents of DataSets. This integration of Data Services can allow for incremental updates to DataSets, which is a key feature and is not possible in the Personal or Essential editions.

As far as I can tell, none of this is documented anywhere on SAP's standard sites. My thanks to Richard Hirsch for finding this presentation outlining some of these points (see page 17): link to PDF (link has been removed because SAP has let the domain lapse - original URL was http://sap-partnersummit2011.com/doc/post_event/FKOM2011_BA&T%20track_Day2/BAtrack_2_BA-Solutions&Innovation.pdf).

So if I have the Advanced edition, I'm now on Hana?

No, not quite.

First of all, based on discussions at TechEd Madrid, it seems that only new customers can currently get onto the Hana-based BI OnDemand platform. Apparently there are contingencies for existing customers to migrate eventually, but right now it is only for new customers.

Further complicating the issue, it seems that not all hosting partners for the Advanced edition provide HANA as the underlying platform. I was told by SAP employees on the show floor that only one partner is currently providing BI OnDemand on HANA, and that partner is only in North America. Other partners are providing BI OnDemand on the older Microsoft SQL Server-based platform. I have yet to confirm this; it is based on only the one source, so take it with a grain of salt. But there is clearly confusion around the availability of BI OnDemand using HANA, even if you are purchasing the Advanced Edition.

If capabilities provided only by HANA are required for your implementation, be sure you are actually getting HANA when you buy the BI OnDemand Advanced Edition.

Is it Hana or HANA?

I have no idea. I did learn at TechEd that HANA (or Hana) is not an acronym, so I'm leaning towards Hana, but old habits die hard.

Ok, enough with the Q&A. What does this mean?

In my view, this means that SAP still has a lot of work to do getting its message across clearly. It is not particularly bad or good that HANA is not available for the Personal or Essential editions of BI OnDemand. These editions are limited to data set sizes that are simply too small for HANA to make much of a difference.

The greater concern here is one of communication. For any company, it is extremely important to say what you mean and mean what you say. It would have been much better if SAP had been clear about the roll-out of HANA for BI OnDemand from the beginning. As it stands now, many people will try out the Personal edition and think that they are using "Hana", but they're not.

Looking to the wider view, I worry about what this partial roll-out means for SAP's BI cloud play. The BI SaaS market is still very immature and SAP has the opportunity to play a leading role in this emerging market. However, the BI OnDemand product doesn't seem to have received the sort of development attention required for this role, and the deployment options seem to be severely lacking.

Companies and departments looking to buy powerful SaaS BI capabilities are not interested in figuring out what database the product is using and the impact this has on their reporting needs. SaaS should work as defined in SLAs, and it should keep getting faster and better in a way that is non-disruptive.

After talking with some of the BI OnDemand development team in February, I know that they have a good understanding of the BI SaaS space and have some great ideas for the BI OnDemand platform. I'd love to see SAP deliver on its potential in this area and I think they have the people and the vision to do so, but we haven't seen it in the product yet.

Hopefully SAP can get both the BI OnDemand message and the platform straightened out quickly. The BI SaaS market is still extremely young and SAP could be leading the way.

Disclosure: SAP provided my travel and badge for the TechEd + Sapphire 2011 conference in Madrid.

On Layers

Something I saw somewhere about cutting layers out from a software stack reminded me of something else that has been niggling me for a while. Most likely what I saw was someone talking about the Vishal Sikka keynote at SAP's TechEd conference, talking about SAP's HANA in-memory application server removing layers from the software stack.

SAP has been doing this layer-collapsing move a fair amount lately and I think this is attributable to their innovation push over the last few years. I think that's all good, though we'll see where it brings us. More on that later. But what niggles me is this idea that removing layers is always a good thing. Them there layers are actually doing something you know!

So what are layers? On the server side of a client-server (two layers right there) architecture we might be talking about database servers, application servers, and front-end rendering servers. In the application space maybe we talk about model-view-controller architectures. Every system of any complexity has some sort of layering concept.

Why do these layers exist? What are they doing? Primarily lowering costs and allowing people to get their work done. Layers serve as an abstraction, as a way to get work done within a layer without having to worry too much about what is going on outside of the layer. They are a protective bubble that allows me to do things like write this post or tweet without experiencing (increasingly) existential angst about how the bits are stored.

Yes, layers look really complex when viewed from a systems perspective, but that is usually because they are codifying an incredibly complex system into a form that we can actually comprehend. Layers force structure onto a system in order to make spaces where complexity is hidden and people can live and work.

Which brings us to the point that it's not really about what is in the layers so much as it is about the transitions between the layers - those shields that create the spaces where actual work and life gets done. Transitions between layers are interfaces of some sort. They are negotiated ways for layers to talk to each other. In programs, these are called APIs. In real life, this is called bureaucracy.
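Here is a tiny Python sketch of what such a negotiated interface can look like in a program. All of the names (`Storage`, `InMemoryStorage`, `BytesStorage`, `publish_post`) are invented for this illustration and don't come from any real system. The point is only that the code in the upper layer is written against the interface, so it never has to care how the bits are stored underneath.

```python
from typing import Protocol


class Storage(Protocol):
    """The negotiated interface: the only thing the upper layer may rely on."""

    def save(self, key: str, text: str) -> None: ...

    def load(self, key: str) -> str: ...


class InMemoryStorage:
    """One lower layer: keeps text as Python strings in a dict."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def save(self, key: str, text: str) -> None:
        self._items[key] = text

    def load(self, key: str) -> str:
        return self._items[key]


class BytesStorage:
    """Another lower layer: stores UTF-8 encoded bytes instead."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def save(self, key: str, text: str) -> None:
        self._blobs[key] = text.encode("utf-8")

    def load(self, key: str) -> str:
        return self._blobs[key].decode("utf-8")


def publish_post(storage: Storage, title: str, body: str) -> str:
    """Upper-layer code: works with any Storage, knows nothing about the bits."""
    storage.save(title, body)
    return storage.load(title)


for backend in (InMemoryStorage(), BytesStorage()):
    print(publish_post(backend, "On Layers", "Layers are doing something."))
```

Both backends print the same line. Swapping one for the other changes nothing for the code above the interface, which is exactly the protection a layer boundary buys you.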

APIs and bureaucracy share a few things in common:

  • They have lots of rules about who you have to talk to and how you have to talk to them
  • Everyone thinks they are terribly designed
  • They're usually pretty slow compared to going "straight to the source"
  • If they didn't exist and everyone went "straight to the source" then there would be total chaos

In short, everyone hates them but nothing would get done without them.

Sometimes, layers outlive their usefulness and those who hate them overwhelm the inherent inertia of mass acceptance. When this happens in technology, it's a "technical revolution". When it happens to companies, it's a "reorganization". And when it happens to governments, it's a real "revolution".

Here's the thing: Revolutions are pretty nasty affairs.

In the political sphere, lots of people die. In the industrial sphere, people lose their jobs. And in technology, people experience angst about bits.

Seriously, there have been a ton of questions about how HANA persists data. People are literally experiencing angst over how their bits are stored.

So, I know that tearing apart layers is sometimes necessary in order to get past some institutionalized road-blocks that have outlived their usefulness. Just remember that those layers you just ripped through actually were doing something useful.