<?xml version="1.0" encoding="UTF-8"?>
<!--Generated by Squarespace Site Server v5.11.81 (http://www.squarespace.com/) on Thu, 23 Feb 2012 05:03:01 GMT--><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rss="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:cc="http://web.resource.org/cc/"><rss:channel rdf:about="http://www.esjewett.com/blog/"><rss:title>Blog</rss:title><rss:link>http://www.esjewett.com/blog/</rss:link><rss:description></rss:description><dc:language>en-US</dc:language><dc:date>2012-02-23T05:03:01Z</dc:date><admin:generatorAgent rdf:resource="http://www.squarespace.com/">Squarespace Site Server v5.11.81 (http://www.squarespace.com/)</admin:generatorAgent><rss:items><rdf:Seq><rdf:li rdf:resource="http://www.esjewett.com/blog/sap-bi-ondemand-and-hana.html"/><rdf:li rdf:resource="http://www.esjewett.com/blog/2011/9/15/on-layers.html"/><rdf:li rdf:resource="http://www.esjewett.com/blog/2011/6/12/downloads-for-developers.html"/><rdf:li rdf:resource="http://www.esjewett.com/blog/2011/5/22/saps-hana-and-the-overall-confusion.html"/><rdf:li rdf:resource="http://www.esjewett.com/blog/2011/4/9/what-is-a-commodity.html"/><rdf:li rdf:resource="http://www.esjewett.com/blog/musing-about-semantics-in-bi.html"/><rdf:li rdf:resource="http://www.esjewett.com/blog/2010/10/31/thoughts-on-whats-next-for-apache-esme.html"/><rdf:li rdf:resource="http://www.esjewett.com/blog/2010/10/27/what-does-sap-mean-by-in-memory.html"/><rdf:li rdf:resource="http://www.esjewett.com/blog/elastic-lists-using-protovis.html"/></rdf:Seq></rss:items></rss:channel><rss:item rdf:about="http://www.esjewett.com/blog/sap-bi-ondemand-and-hana.html"><rss:title>SAP BI OnDemand and Hana</rss:title><rss:link>http://www.esjewett.com/blog/sap-bi-ondemand-and-hana.html</rss:link><dc:creator>Ethan 
Jewett</dc:creator><dc:date>2011-11-21T18:58:45Z</dc:date><dc:subject>bi cloud hana saas sap</dc:subject><content:encoded><![CDATA[<p>It's been some time now since the <a href="http://www.sap.com/about-sap/newsroom/press-releases/press.epx?pressid=17611">press releases</a> and <a href="http://www.virtualsapteched.com/index.aspx?url=vxStR2fZ/TV0dPFNfHjdKOdehNYIKn4G5smAnwX+xs7Qacz6Mg8rtg1/UJEb/fR3eEtgdyljO8HZ~sXnO9ZhLKTN6wL/y4UkM/lE0P2b/9A0=#yQCsCCTcXPyBL0BPjarl8jydsv/f7NR2FBrotkbn72g=">SAP TechEd Bangalore keynote</a>&nbsp;proclaiming that SAP's BI OnDemand product now runs on HANA as its underlying database. The press releases have gone out. The product is here. The BI OnDemand website has been updated with a shiny new "Powered by SAP Hana" logo.</p>
<p>There is only one problem. It seems that the BI OnDemand that most people can see is not actually powered by Hana.</p>
<p>I discovered this for myself when discussing the topic with <a href="https://twitter.com/cbjorlin">Courtney Bjorlin</a>, who was working on an <a href="http://www.asugnews.com/2011/10/20/bi-ondemand-running-on-sap-hana-database%E2%80%94but-not-for-all-users/">article</a> about the announcement. SAP confirms in the article that only the "Advanced Edition" of BI OnDemand is available on the HANA database. At SAP's TechEd in Madrid, I was able to ask around on the show floor and hallways and find out more about the situation.</p>
<p><strong>How do I get BI OnDemand running on HANA?</strong></p>
<p>You have to buy the "Advanced Edition" of BI OnDemand. This involves a sales process and is a hosted version of the BI OnDemand platform. It seems that it's not exactly SaaS or "OnDemand", but more on that below.</p>
<p>The fact that the logo at <a href="https://bi.ondemand.com">https://bi.ondemand.com</a> says "Powered by SAP Hana" is apparently an inaccuracy. Hopefully that will be corrected soon.</p>
<p><strong>What are these different "editions" of BI OnDemand?</strong></p>
<p>There are three "editions" of BI OnDemand: Personal, Essential, and Advanced. Based on my discussions, it seems that Personal and Essential editions are SaaS applications hosted by SAP, while the Advanced edition is hosted by partners. All editions seem to include the same web interface as seen on bi.ondemand.com, but the Essential version includes customization and branding options as well as more storage. The Advanced edition features even more storage and customization options, plus access to a hosted version of the BusinessObjects Data Services, which can be used to manage contents of DataSets. This integration of Data Services can allow for incremental updates to DataSets, which is a key feature and is not possible in the Personal or Essential editions.</p>
<p>As far as I can tell, none of this is documented anywhere on SAP's standard sites. My thanks to <a href="https://twitter.com/rhirsch">Richard Hirsch</a> for finding this presentation outlining some of these points (see page 17): <a href="http://sap-partnersummit2011.com/doc/post_event/FKOM2011_BA&amp;T%20track_Day2/BAtrack_2_BA-Solutions&amp;Innovation.pdf">link to PDF</a>.</p>
<p><strong>So if I have the Advanced edition, I'm now on Hana?</strong></p>
<p>No, not quite.</p>
<p>First of all, based on discussions at TechEd Madrid, it seems that only new customers can currently get onto the Hana-based BI OnDemand platform. Apparently there are contingencies for existing customers to migrate eventually, but right now it is only for new customers.</p>
<p>Further complicating the issue, it seems that not all hosting partners for the Advanced edition provide HANA as the underlying platform. I was told by SAP employees on the show floor that only one partner is currently providing BI OnDemand on HANA, and that partner is only in North America. Other partners are providing BI OnDemand on the older Microsoft SQL Server-based platform. I have yet to confirm this; it is based on only the one source, so take it with a grain of salt. But there is clearly confusion around the availability of BI OnDemand using HANA, even if you are purchasing the Advanced Edition.</p>
<p>If capabilities provided only by HANA are required for your implementation, be sure you are actually getting HANA when you buy the BI OnDemand Advanced Edition.</p>
<p><strong>Is it Hana or HANA?</strong></p>
<p>I have no idea. I did learn at TechEd that HANA (or Hana) is not an acronym, so I'm leaning towards Hana, but old habits die hard.</p>
<p><strong>Ok, enough with the Q&amp;A. What does this mean?</strong></p>
<p>In my view, this means that SAP still has a lot of work to do getting its message across clearly. It is not particularly bad or good that HANA is not available for the Personal or Essential editions of BI OnDemand. These editions are limited to data set sizes that are simply too small for HANA to make much of a difference.</p>
<p>The greater concern here is one of communication. For any company, it is extremely important to say what you mean and mean what you say. It would have been much better if SAP had been clear about the roll-out of HANA for BI OnDemand from the beginning. As it stands now, many people will try out the Personal edition and think that they are using "Hana", but they're not.</p>
<p>Looking to the wider view, I worry about what this partial roll-out means for SAP's BI cloud play. The BI SaaS market is still very immature and SAP has the opportunity to play a leading role in this emerging market. However, the BI OnDemand product doesn't seem to have received the sort of development attention required for this role, and the deployment options seem to be severely lacking.</p>
<p>Companies and departments looking to buy powerful SaaS BI capabilities are not interested in figuring out what database the product is using and the impact this has on their reporting needs. SaaS should work as defined in SLAs, and it should keep getting faster and better in a way that is non-disruptive.</p>
<p>After talking with some of the BI OnDemand development team in February, I know that they have a good understanding of the BI SaaS space and have some great ideas for the BI OnDemand platform. I'd love to see SAP deliver on its potential in this area and I think they have the people and the vision to do so, but we haven't seen it in the product yet.</p>
<p>Hopefully SAP can get both the BI OnDemand message and the platform straightened out quickly. The BI SaaS market is still extremely young and SAP could be leading the way.</p>
<p>Disclosure: SAP provided my travel and badge for the TechEd + Sapphire 2011 conference in Madrid.</p>]]></content:encoded></rss:item><rss:item rdf:about="http://www.esjewett.com/blog/2011/9/15/on-layers.html"><rss:title>On Layers</rss:title><rss:link>http://www.esjewett.com/blog/2011/9/15/on-layers.html</rss:link><dc:creator>Ethan Jewett</dc:creator><dc:date>2011-09-15T17:12:00Z</dc:date><dc:subject>architecture layering sap</dc:subject><content:encoded><![CDATA[<p>Something I saw somewhere about cutting layers out from a software stack reminded me of something else that has been niggling me for a while. Most likely what I saw was someone talking about the&nbsp;<a href="http://www.sapvirtualevents.com/teched/sessiondetails.aspx?sId=38">Vishal Sikka keynote at SAP's TechEd conference</a>, talking about SAP's HANA in-memory application server removing layers from the software stack.</p>
<p>SAP has been doing this layer-collapsing move a fair amount lately and I think this is attributable to their innovation push over the last few years. I think that's all good, though we'll see where it brings us. More on that later. But what niggles me is this idea that removing layers is always a good thing. Them there layers are actually doing something you know!</p>
<p>So what are layers? On the server side of a client-server (two layers right there) architecture we might be talking about database servers, application servers, and front-end rendering servers. In the application space maybe we talk about model, view, controller architectures. Every system of any complexity has some sort of layering concept.</p>
<p>Why do these layers exist? What are they doing? Primarily lowering costs and allowing people to get their work done. Layers serve as an abstraction, as a way to get work done within a layer without having to worry too much about what is going on outside of the layer. They are a protective bubble that allows me to do things like write this post or tweet without experiencing (increasingly) existential angst about how the bits are stored.</p>
<p>Yes, layers look really complex when viewed from a systems perspective, but that is usually because they are codifying an incredibly complex system into a form that we can actually comprehend. Layers force structure onto a system in order to make spaces where complexity is hidden and people can live and work.</p>
<p>Which brings us to the point that it's not really about what is in the layers so much as it is about the transitions between the layers - those shields that create the spaces where actual work and life gets done. Transitions between layers are interfaces of some sort. They are negotiated ways for layers to talk to each other. In programs, these are called APIs. In real life, this is called bureaucracy.</p>
<p>APIs and bureaucracy share a few things in common:</p>
<ul>
<li>They have lots of rules about who you have to talk to and how you have to talk to them</li>
<li>Everyone thinks they are terribly designed</li>
<li>They're usually pretty slow compared to going "straight to the source"</li>
<li>If they didn't exist and everyone went "straight to the source" then there would be total chaos</li>
</ul>
<p>In short, everyone hates them but nothing would get done without them.</p>
<p>Sometimes, layers outlive their usefulness and those who hate them overwhelm the inherent inertia of mass acceptance. When this happens in technology, it's a "technical revolution". When it happens to companies, it's a "reorganization". And when it happens to governments, it's a real "revolution".</p>
<p>Here's the thing: Revolutions are pretty nasty affairs.</p>
<p>In the political sphere, lots of people&nbsp;<strong>die</strong>. In the industrial sphere, people lose their jobs. And in technology people experience angst about bits.</p>
<p>Seriously, there have been a ton of questions about how HANA persists data. People are literally experiencing angst over how their bits are stored.</p>
<p>So, I know that tearing apart layers is sometimes necessary in order to get past some institutionalized road-blocks that have outlived their usefulness. Just remember that those layers you just ripped through actually&nbsp;<strong>were</strong>&nbsp;doing something useful.</p>]]></content:encoded></rss:item><rss:item rdf:about="http://www.esjewett.com/blog/2011/6/12/downloads-for-developers.html"><rss:title>Downloads for Developers</rss:title><rss:link>http://www.esjewett.com/blog/2011/6/12/downloads-for-developers.html</rss:link><dc:creator>Ethan Jewett</dc:creator><dc:date>2011-06-12T17:13:00Z</dc:date><dc:subject>competition enterprise software</dc:subject><content:encoded><![CDATA[<blockquote>
<p>The news here isn&rsquo;t that the 'new king-makers', as Savio put it, look a lot like the old kingmakers: developers. The news is that management may finally be realizing it.</p>
</blockquote>
<p>-&nbsp;<a href="http://redmonk.com/sogrady/2010/09/09/the-new-kingmakers">Stephen O'Grady, Redmonk</a></p>
<blockquote>
<p>Developers, developers, [...] developers</p>
</blockquote>
<p>- Steve Ballmer</p>
<p>Most software platform companies at least partially get it these days - developers drive adoption and quality among technical groups. The software that has quality developers on its side will look better to both business and technical interest groups than the same software that is dragged down by developer indifference or animosity.</p>
<p>These points can be debated to a certain extent.</p>
<p>There are plenty of non-technical power-centers that drive adoption of software platforms in the enterprise. Preferred-vendor arrangements are common and historically were often negotiated at the CIO or higher level, with little developer involvement. Further, application vendors attempt (often with good success) to sell into business units rather than IT groups.</p>
<p>But in both cases developers still drive quality and adoption. Business units that buy applications directly often find themselves in need of connections to other systems or extensions to the application. This means developers are involved, either via an IT group, or as outside consultants.</p>
<p>Meanwhile, preferred-vendor agreements are constantly undermined and even when they are successful they may very well promote homogeneity and management ease at the expense of long-term quality in people and software. In order to make good software development and vendor management decisions, one must be well aware of the world beyond a single vendor bubble.</p>
<p>In order to bring developers on board with a vendor's offerings, to increase general awareness, and to drive sales, vendors need to get their software into the hands of developers. In the case of open source vendors, this is mostly an issue of getting the word out, as the software itself is only a download away. But in the case of more traditional enterprise vendors this can be a complicated proposition. Most enterprise vendors now provide downloads of some version of most of their platform software. Some vendors provide downloads of much of their application software as well.</p>
<p>Some example download sites:</p>
<ul>
<li>Oracle:&nbsp;<a href="http://www.oracle.com/technetwork/indexes/downloads/index.html">http://www.oracle.com/technetwork/indexes/downloads/index.html</a></li>
<li>IBM:&nbsp;<a href="http://www.ibm.com/developerworks/downloads/evalmethod.html">http://www.ibm.com/developerworks/downloads/evalmethod.html</a></li>
<li>SAP:&nbsp;<a href="http://www.sdn.sap.com/irj/scn/downloads">http://www.sdn.sap.com/irj/scn/downloads</a></li>
</ul>
<p>These downloads are usually made under a fairly restrictive license and are usually not available for all parts of the application or platform software. Given vendors' business models, providing these downloads is somewhat costly because they appear in a format that is not the standard distribution format for the vendor's software. There are also legal costs associated with writing and maintaining the developer licenses that are applied to these downloads.</p>
<p>I believe that vendors have a tendency to see these sorts of downloads as an overhead cost. They are not. They are a key step in driving both sales and developer adoption, which are closely linked.</p>
<p>Here's how:</p>
<ul>
<li>Ability to prototype before purchasing is a key part of the software selection process for responsible companies.</li>
<li>Today's developers will guide tomorrow's purchasing decisions.</li>
<li>A healthy developer ecosystem is a necessary condition for a strong third-party application ecosystem.</li>
<li>A skilled, and preferably large, pool of developers is necessary for good project success rates.</li>
</ul>
<p>In his article&nbsp;<a href="http://www.sandhill.com/opinion/daily_blog.php?id=64&amp;post=438">"The CIO is the last to know"</a>, Billy Marshall talks about the CIO of a financial services company who is surprised to find that his operations people are running Red Hat Linux. This CIO was handed a decision via bottom-up fiat. It is a story that is played out again and again in the enterprise space.</p>
<p>The point is not that CIOs aren't doing their jobs. It's that the decision is inevitably influenced from a different level: the level of those actually carrying out development and operations. Maybe these people aren't actually making the purchasing decisions, but they talk to the people who are. And if someone makes a purchasing decision that development and operations disagree with or are unable to execute, that person is going to hear it. And they'll probably feel it when their group's productivity falls off a cliff.</p>
<p>CIOs either are or should be listening to their developers' opinions. It would be wise for enterprise vendors to divert some sales attention into making sure that those developers have good opinions of their software.</p>
<p>The first step is getting that software into developers' hands quickly and with a minimum of developer effort.</p>]]></content:encoded></rss:item><rss:item rdf:about="http://www.esjewett.com/blog/2011/5/22/saps-hana-and-the-overall-confusion.html"><rss:title>SAP's HANA and "the Overall Confusion"</rss:title><rss:link>http://www.esjewett.com/blog/2011/5/22/saps-hana-and-the-overall-confusion.html</rss:link><dc:creator>Ethan Jewett</dc:creator><dc:date>2011-05-22T08:40:00Z</dc:date><dc:subject>bi datawarehouse enterprise hana in-memory sap</dc:subject><content:encoded><![CDATA[<p>I threw together a very long response to a very long question on the SCN forums, regarding SAP's HANA application and its impact on business intelligence and datawarehousing activities. The original thread is <a href="http://forums.sdn.sap.com/thread.jspa?threadID=1959085&amp;tstart=0">here</a> and I'm sure it will continue to grow. But since my response was pretty thorough and contains a ton of relevant links, I thought I would reformat it and post it here as well.  In order to get a good overview of the HANA situation, I strongly recommend that anyone interested check out the following blogs and articles by several people, myself included:</p>
<ul>
<li><a href="http://www.saptour.ch/landingPagesFr/manager/uploads/6/3_zurek.pdf">The Impact of In-Memory Technology on Data Warehousing and Business Intelligence - by Thomas Zurek (pdf)</a></li>
<li><a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/24672">Infocubes and Data Store Objects ... and HANA - by Thomas Zurek</a></li>
<li><a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/21575">The BW - HANA Relationship - by Thomas Zurek</a></li>
<li><a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/13952">Comparing SAP BW and an Oracle DW - by Thomas Zurek (this is an old blog, but great points about using a DW toolkit vs. loosely coupled tools)</a></li>
<li><a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/21037">BW path to the Data Warehouse plus the role of New Technologies - by Vitaliy Rudnytskiy</a></li>
<li><a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/22486">SAP HANA: What it means for business and for your career - Vitaliy Rudnytskiy</a></li>
<li><a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/24738">SAP In-memory - what is in the bag? - by Vitaliy Rudnytskiy</a></li>
<li><a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/21826">What does SAP mean by "In-Memory"? - by Ethan Jewett</a></li>
<li><a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/22427">Thoughts and questions about the HANA announcement - by Ethan Jewett</a></li>
<li><a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/23069">The latest hot tech vs. fundamental datawarehousing tradeoffs - Remember BW? - by Ethan Jewett</a></li>
</ul>
<p>Some of these blogs are using out of date terminology, which is hard to avoid since SAP seems to change its product names every 6 months. But hopefully if you read them they will give you some insight into the overall situation unfolding around HANA.  With regards to DW/BI and HANA, these blogs address many of those issues as well.  Now, to try answering the questions:</p>
<blockquote>1. Does SAP HANA replace BI?</blockquote>
<p>It's worth noting that HANA is actually a bundle of a few technologies on a specific hardware platform. It includes ETL (Sybase Replication Server and BusinessObjects Data Services), a database and database-level modeling tools (ICE, or whatever it's called today), and reporting interfaces (SQL, MDX, and possibly bundled BusinessObjects BI reporting tools). So, if your question is "does anything change as far as needing to do ETL, modeling, and reporting work to develop BI solutions?", then the answer is no.  If you are asking about SAP's overall strategy regarding BW, then this is open to change and I think the blogs above will give you some answers. The short answer is that I see SAP supporting both the scenario of using BW as a DW toolkit (running on top of BWA or HANA) as well as the scenario of using loosely coupled tools (HANA alone, or the database of your choice with BusinessObjects tools) for the foreseeable future. At least I hope this is the case, as I think it would be a mistake to do otherwise.</p>
<blockquote>2. Will SAP continue 5-10 years down the road to support "Traditional BI"?</blockquote>
<p>I hope so. If you read my last blog listed above you will see that HANA actually solves none of the traditional BI problems, and addresses only a few of them. So we still need "traditional" (read "good old hard work") approaches to address these problems.</p>
<blockquote>3. What does this mean for our RDBMS, meaning Oracle?</blockquote>
<p>Very interesting question. For a long time, SAP has supported competitive products to Oracle offerings. In my view, this was to give SAP and its customers options other than the major database vendors, and to give itself an out in the event that contract negotiations with a major vendor went south. So in a sense, HANA can be seen as maintaining this alternative offering.  Of course, SAP says HANA is more than that, and I think they are right. Analytic DBMSes have been relatively slow catching on and as SAP's business slants more and more towards BI, the fact is that the continued use of traditional RDBMSes in BI and DW contexts has done a lot of damage by making it difficult to achieve good performance. It's a lot easier to sell fast reports than slow reports :-) So that is another driver.  Personally, I don't agree with SAP's rhetoric about HANA being revolutionary or changing the industry. The technologies and approaches used in the ICE are not new, as far as I have seen. As far as changing the industry from a performance or TCO perspective, I'm reserving judgement on that until SAP releases some repeatable benchmarks against competing products. I doubt that HANA will significantly outperform competitive columnar in-memory databases like Exasol and ParAccel.  If you are Oracle, you have a rejuvenated, and perhaps slightly more frightening competitor. I don't think anyone really thought that MaxDB was a danger to Oracle, but HANA holds more potential as a competitor to Exadata. Licensing discussions could get interesting.</p>
<blockquote>4. Is HANA going to be adopted and implemented more quickly on the ECC side than BI side first?</blockquote>
<p>Everything I have seen has indicated that SAP will be driving adoption in BI/Analytic scenarios first and then in the ECC/Business Suite scenario once everyone is satisfied with the stability of the solution. Keep in mind, the first version of HANA is still in ramp-up. SAP is usually very conservative in certifying databases to run Business Suite applications.</p>]]></content:encoded></rss:item><rss:item rdf:about="http://www.esjewett.com/blog/2011/4/9/what-is-a-commodity.html"><rss:title>What is a commodity?</rss:title><rss:link>http://www.esjewett.com/blog/2011/4/9/what-is-a-commodity.html</rss:link><dc:creator>Ethan Jewett</dc:creator><dc:date>2011-04-09T17:15:00Z</dc:date><dc:subject>enterprise software</dc:subject><content:encoded><![CDATA[<p>Lately I've seen several discussions of commoditization in the enterprise software space. Or perhaps "commoditisation", depending on which side of the Great English-language Divide you happened to spend your school years on. Specifically, the claim has been made that applications (ERP, BI, analytics) or skills (programming, project management, for example) are becoming commoditized.</p>
<p>I'm not going to link to any of these claims because my claim is that the word is being subjected to rampant misuse and I'd rather not call out anyone specific. The misuse is widespread and I don't think it would be fair to name individuals.</p>
<p>This misuse is a shame. There is a lot of really useful economic and social theory around commoditization. Incorrectly labeling a market trend as "commoditization" creates the incorrect impression that these bodies of theory are applicable, and this can result in incorrect analysis.</p>
<p>See, for example, the insightful discussion of commoditization as a competitive strategy in&nbsp;<a href="http://www.marco.org/4458987911">this blog about Facebook's Open Compute Project by Marco Arment</a>. When we use the term incorrectly, we poison the well from which this sort of analysis is drawn.</p>
<p>So, what is commoditization? Wikipedia, as usual, has&nbsp;<a href="http://en.wikipedia.org/wiki/Commodity">a fairly good definition</a>. To break it down, a commodity is:</p>
<ol>
<li>Undifferentiated - a commodity from supplier X is basically the same as the same product from supplier Y</li>
<li>Fungible - an instance of a commodity can easily be switched for another instance of the same commodity without significant impact on the user of the commodity - in other words, a commodity has low switching costs</li>
</ol>
<p>When a product is a commodity it will usually have a price that is determined by a market of exchange. Markets are not always efficient, and very few products are truly commodities, but there are some products that come fairly close. Wikipedia gives several examples, but here are a couple for your consideration:</p>
<ul>
<li>Salt - All salt is basically the same, and switching from one brand of salt to another has no noticeable impact on the user (notwithstanding "premium" salts, for example sea-salt).</li>
<li>Unskilled labor - The initial pay of grocery store shelf-stockers, for example, is determined primarily by market forces or minimum wage as the labor pool is considered fairly undifferentiated and the cost of hiring a different employee is fairly low in some labor markets.</li>
</ul>
<p>So what about enterprise software? I can't think of anything that's a commodity in enterprise software. Maybe servers, as the advent of virtualization and cloud computing begins to lower switching costs (improving fungibility).</p>
<p>What is&nbsp;<strong>not</strong>&nbsp;a commodity in enterprise software? Lots of stuff. Here are some examples:</p>
<ul>
<li>ERP and BI software - Different offerings are still quite differentiated, and switching costs are astronomical because of the amount of customization required of all solutions. Additionally, cloud vendors are now creating data lock-in scenarios that can make it very difficult to migrate old data to a new solution.</li>
<li>Switching from "build" to "buy" does not commoditize a market - An IT department switching from a "build" to a "buy" approach, or a vendor pushing solutions that require less customization, does not result in commoditization of a market. This is because different solutions are still differentiated based on features, performance, or ease-of-use, and because switching costs remain high. Switching costs should be lower when going from a custom solution to an "off-the-shelf" solution than the other way around, perhaps making the product more commodity-like, but this is a stretch. Implementation costs are still going to be quite high.</li>
<li>Developers and consultants - There is lots of suspect research talking about how the best developers are some multiplier (usually 6-20X) more productive than the average developer. In fact, it's probably worse because this research tends to focus on long-term employment. Because of the time taken for on-boarding (switching costs) and the increased administration costs that come with a larger team, hiring a middling developer or consultant for your project can often make the project progress even more slowly than hiring no one at all!</li>
<li>Tool-kits - Development tool-kits have a real impact on the performance of custom development. The choice to use language A versus language B for a given development project is not academic. The differentiation equation depends on your existing skill-set as well as features of the tool-kit and switching costs are high due to the need for retraining and reorientation.</li>
</ul>
<p>Commodity theory does not apply to any of these areas. The goods are not fungible and the products are differentiated.</p>
<p>Any vendor who says otherwise is probably peddling a subpar product (or labor). Any IT department that believes this is probably making some bad purchasing decisions. And every purchasing department likely talks to their suppliers about how this is a commodity market ... because they are trying to negotiate a better price.</p>]]></content:encoded></rss:item><rss:item rdf:about="http://www.esjewett.com/blog/musing-about-semantics-in-bi.html"><rss:title>Musing about semantics in BI</rss:title><rss:link>http://www.esjewett.com/blog/musing-about-semantics-in-bi.html</rss:link><dc:creator>Ethan Jewett</dc:creator><dc:date>2011-02-28T16:19:00Z</dc:date><dc:subject>bi datawarehouse hana semantic</dc:subject><content:encoded><![CDATA[<p>Recently I've been blogging mostly about SAP's new HANA product and the general in-memory approach. My deeper professional focus is a little further from the metal, in datawarehousing, business intelligence, and planning processes and architectures. Some recent emails, tweets, and discussions have prompted me to get back to my roots ... but roots are hidden and hard to conceptualize. So I brought diagrams!</p>
<p>One of the hard problems in datawarehousing and business intelligence is semantics, or meaning. We need to integrate the semantics in user requirements with the semantics of the underlying systems. We need to integrate the semantics of underlying systems with each other. And we need to integrate the semantics of a system with itself!</p>
<p>That wasn't very clear. Here's an example: Revenue.</p>
<p>Simple right? Not so fast!</p>
<p>Our users want a revenue report. When our finance users say revenue, they might mean the price on the invoice, without any discounts. But our ERP system may display revenue as a number that includes certain types of discounts. (This is the problem of integrating users' semantics with system semantics.) And our other ERP system may include a different mix of discounts in the revenue number. (The problem of integrating the semantics of underlying systems with each other.) Meanwhile, a single SAP ERP system will record revenue from a sale in several different places: on the invoice, in the G/L, maybe in a CO-PA document. Each of these records is going to have different semantics, and it's quite possible that it is difficult to derive the number the system displays to us from the data in the underlying tables. (The challenge of integrating the semantics of systems with themselves.)</p>
<p>Wow! That's just the first line of the P&amp;L statement!</p>
<p>This example is a little contrived, but it's not too far from the truth. At this point, I just want to recognize that this is a tough problem and we really don't have a very good solution to it aside from the application of large amounts of effort. The interesting question to me right now is where this effort is already embedded into our systems (so we don't have to expend as much effort in our implementations) and what effect SAP's new analytics architectures might have in this area.</p>
<p>I promised diagrams and musing, so here we go. I want to talk a little bit about layering semantic representations on top of ERP data models, which tend to be highly optimized for performance and therefore quite semantically opaque. In order to think more clearly about the different ways of doing this and the trade-offs involved, I cooked up some pictures. We'll start simple and move on to more complex architectures.</p>
<p><span class="full-image-block ssNonEditable"><span><img src="http://www.esjewett.com/storage/Semantics in the datawarehouse 1.png?__SQUARESPACE_CACHEVERSION=1321814444816" alt="" /></span></span></p>
<p>This is a naive model of an ERP system. It's got a lot of tables: 5 (multiply by at least 1000 for a real ERP system). These tables have a lot of semantic relationships among themselves that the ERP system keeps track of. It knows which tables hold document headers and which tables hold the line items for those documents. It knows about all the customers, and the current addresses of those customers, and it knows how to do the temporal join to figure out what the addresses of all our customers were in the middle of last year. I don't have much more to say about this. It just is how it is: complicated.</p>
<p><span class="full-image-block ssNonEditable"><span><img src="http://www.esjewett.com/storage/Semantics in the datawarehouse 2.png?__SQUARESPACE_CACHEVERSION=1321814466501" alt="" /></span></span></p>
<p>This is an ERP system that has semantic views built into it. These views turn the underlying tables into something that makes sense to us - we might call them views of business objects. Maybe the first view is all of those customers with start and end dates for each address. And the second view might be our G/L entries with line item information properly joined to document header information.</p>
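<p>To make the idea of a semantic view a bit more concrete, here is a minimal Python sketch of the temporal join behind a "customers with their address on a given date" view. The table layouts, field names, and data are invented for illustration and do not correspond to real ERP tables.</p>

```python
from datetime import date

# Hypothetical, heavily simplified tables; real ERP tables look nothing like this.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
addresses = [
    {"customer_id": 1, "city": "Chicago", "valid_from": date(2009, 1, 1), "valid_to": date(2010, 3, 31)},
    {"customer_id": 1, "city": "Boston", "valid_from": date(2010, 4, 1), "valid_to": date(9999, 12, 31)},
    {"customer_id": 2, "city": "Berlin", "valid_from": date(2008, 6, 1), "valid_to": date(9999, 12, 31)},
]

def customer_address_view(as_of):
    """Semantic view: each customer with the address valid on the given date."""
    rows = []
    for customer in customers:
        for address in addresses:
            same_customer = address["customer_id"] == customer["customer_id"]
            # Temporal join condition: as_of falls inside the validity interval.
            valid = as_of >= address["valid_from"] and address["valid_to"] >= as_of
            if same_customer and valid:
                rows.append({"name": customer["name"], "city": address["city"]})
    return rows

print(customer_address_view(date(2010, 1, 15)))
print(customer_address_view(date(2010, 6, 30)))
```

<p>The consumer of the view only asks for "customers as of a date"; the knowledge of which tables to join and how validity intervals work stays inside the view.</p>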
<p>Interestingly, creating semantic views like this is almost exactly what BW business content extractors do. These extractors have been built up over more than a decade of development. They were built by the application teams, so if anyone knows how the application tables are supposed to fit together, it's the people who built these extractors. There is a lot not to like about various business content extractors but we can't deny the huge amount of semantic knowledge and integration work embedded in these tools.</p>
<p>Other tools, like the BusinessObjects Rapidmart solutions, also know how to create semantic views of underlying ERP tables, though Rapidmarts accomplish this in a slightly different way. There is a lot of knowledge and work embedded in these solutions as well.</p>
<p><span class="full-image-block ssNonEditable"><span><img src="http://www.esjewett.com/storage/Semantics in the datawarehouse 3.png?__SQUARESPACE_CACHEVERSION=1321814483669" alt="" /></span></span></p>
<p>When we use the business content extractors with BW, we move the semantic view that the ERP system creates into a structure in the datawarehouse. As long as you use the business content extractors you don't need to worry much about the ERP data models. This diagram shows a fairly traditional datawarehousing approach. The same sort of thing happens with other solutions based on semantic representations of ERP data.</p>
<p><span class="full-image-block ssNonEditable"><span><img src="http://www.esjewett.com/storage/Semantics in the datawarehouse 4.png?__SQUARESPACE_CACHEVERSION=1321814503956" alt="" /></span></span></p>
<p>Another option is to directly replicate our ERP tables into an analytic layer. This is what happens in the case of SAP HANA if you are using Sybase Replication Server to load data into HANA. Notice the virtual semantic views that are created in the datawarehouse system. This work must be done for most ERP data structures, because as we've already discussed, these ERP data structures don't necessarily make any sense on their own. Creating these views is one thing we have been hearing from&nbsp;<a href="http://weblogs.sdn.sap.com/cs/user/view/cs_msg/70574">Vitaliy Rudnytskiy</a>&nbsp;that IC Studio will be used for. Ingo Hilgefort touches on some of the same points in his&nbsp;<a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/22448">blog on the HANA architecture</a>. And Brian Wood also briefly touches on his role in developing semantic views for ERP data in HANA in his&nbsp;<a href="http://www.virtualsapteched.com/index.aspx?url=4pgTM2VA2O2Mnh5V0s%2F4QO+h3CQPnURyqLyxhzipQEz7btPhgHgYF7gG8afP3XDGrpRQd+LSleIF*%7E*8qctBwlvB11Uu8+DdWRKSvmcBG8HJ8Q%3D">TechEd 2010 presentation</a>.</p>
<p>I find that there are two interesting things about this approach, and these are things to watch out for if you are implementing a system like this:</p>
<p>First, whereas the semantic views in the previous diagram are materialized (meaning pre-calculated), these views are not, meaning that they need to be calculated at query run-time. Even on a system as blazing fast as HANA, I can see the possibility of this turning into a problem for certain types of joins. No matter how fast you are going, some things just take time. Vitaliy, again, does a great job of discussing this in his comment on Arun's blog musing on the disruption that HANA may cause to the datawarehousing space:&nbsp;<a href="http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/22570">http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/22570</a>.</p>
<p>The second musing I have is that until SAP or partners start releasing semantic integration content, each customer or systems integrator is going to need to come up with their own strategy for building these semantic views. In some cases this is trivial and it's going to be tough to get wrong, but in a lot of cases the semantics of ERP tables are extremely complex and there will be lots of mistakes made. It is going to take a while for semantic content to reach a usable level, and it will take years and years for it to reach the level of the current business content extractors. Customers who are used to using these extractors with their BW installations should take note of this additional effort.</p>
<p><span class="full-image-block ssNonEditable"><span><img src="http://www.esjewett.com/storage/Semantics in the datawarehouse 5.png?__SQUARESPACE_CACHEVERSION=1321814522089" alt="" /></span></span></p>
<p>The solution to semantic views that are too processing intensive to run in the context of a query is to materialize the view. It is unclear to me whether or not you can use IC Studio to do this in HANA. At worst you can use BusinessObjects Data Integrator to stage data into a materialized semantic view, then query on this view in HANA. Of course, now we are storing data twice in HANA, and these blades aren't exactly cheap!</p>
<p>When we do this, using the tools currently available to us in HANA, we also lose the concept of real time. This is because our ETL process is no longer only a push process using Sybase Replication Server; now there is also a batch ETL process that populates the materialized view. We are back in the same trade-off between load-time complexity and query-time complexity that we face and struggle with in any BI system.</p>
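<p>The trade-off between a virtual and a materialized view can be sketched in a few lines of Python. This is a toy model with invented data, not a description of how HANA or Data Integrator work internally.</p>

```python
# Hypothetical replicated base table: (document_id, amount) line items.
line_items = [("A", 100), ("A", 50), ("B", 70)]

def virtual_view():
    """Virtual view: recomputed from the base table on every query."""
    totals = {}
    for document_id, amount in line_items:
        totals[document_id] = totals.get(document_id, 0) + amount
    return totals

# Materialized view: computed once by a batch job and stored.
materialized_view = virtual_view()

# A new line item is replicated into the base table in (near) real time.
line_items.append(("B", 30))

print(virtual_view())     # reflects the new row, at the cost of recomputation
print(materialized_view)  # stale until the next batch refresh
```

<p>The virtual view pays the calculation cost on every query but is always current; the materialized copy is cheap to read but only as fresh as its last load.</p>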
<p><span class="full-image-block ssNonEditable"><span><img src="http://www.esjewett.com/storage/Semantics in the datawarehouse 6.png?__SQUARESPACE_CACHEVERSION=1321814539952" alt="" /></span></span></p>
<p>One possible solution to the second problem mentioned above (the difficulty of building semantic views on very complex and heterogeneous data models) is for SAP and partners to deliver semantic integration logic in a specialized semantic unification layer. We might call this layer the Semantic Layer, which Jon Reed, Vijay Vijayasankar, and Greg Myers discuss very insightfully in this podcast:&nbsp;<a href="http://www.jonerp.com/content/view/380/33/">http://www.jonerp.com/content/view/380/33/</a>. I suspect that this layer will be a central piece in the strategy to address the semantic integration problem that is introduced when we bypass the business content extractors or source datawarehouse structures from non-SAP systems.</p>
<p>This is even possible across source systems in BusinessObjects 4.0 with the use of Universes that support multiple sources, a feature that is new to this release. It is a very powerful idea and I really look forward to seeing what SAP, customers, and partners build on this new platform.</p>
<p>But I'm a little worried about this approach in the context of higher-volume data, and the reason is those striped arrows crossing the gap between the datawarehouse system and the semantic layer system. If you look back at the previous diagrams, the initial semantic view is always in the same physical system as the tables that the semantic view is based on. Except in the last diagram. In this diagram the semantic view is built on a different platform from the one where the data is stored.</p>
<p>What does this mean? It means for certain types of view logic, we are going to be in one of two situations: Either we are going to need to transfer the entire contents of all tables that feed the view into the semantic layer, or we are going to need to do large numbers of round-trip queries between the semantic layer and the datawarehouse layer as the semantic layer works to incrementally build up the view requested by the query. Either of these integration patterns is very difficult to manage from a performance perspective, especially when the integration is over a network between two separate systems.</p>
<p>There are ways around this, including (re)introducing the ability to easily move semantically integrated data from an ERP system into a hypothetical future HANA datawarehouse, or tight integration of the semantic layer and the datawarehouse layer that allows the logic in the semantic layer to be pushed down into the datawarehouse layer.</p>
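<p>The three integration patterns discussed above (bulk transfer, many round trips, and pushing logic down) can be compared with a small Python simulation. The <code>Warehouse</code> class, its methods, and the data volumes are hypothetical stand-ins that only count network round trips and rows sent; they do not model any real product.</p>

```python
class Warehouse:
    """Stand-in for the remote datawarehouse; counts round trips and rows sent."""

    def __init__(self, headers, items):
        self.headers = headers  # {document_id: customer}
        self.items = items      # list of (document_id, amount)
        self.round_trips = 0
        self.rows_sent = 0

    def _send(self, rows):
        self.round_trips += 1
        self.rows_sent += len(rows)
        return rows

    def all_headers(self):
        return self._send(list(self.headers.items()))

    def all_items(self):
        return self._send(list(self.items))

    def items_for(self, document_id):
        return self._send([i for i in self.items if i[0] == document_id])

    def revenue_by_customer(self):
        """Pushed-down logic: the join and aggregation run next to the data."""
        totals = {}
        for document_id, amount in self.items:
            customer = self.headers[document_id]
            totals[customer] = totals.get(customer, 0) + amount
        return self._send(list(totals.items()))

def bulk_transfer(wh):
    # Pattern 1: copy both tables across, then join in the semantic layer.
    headers = dict(wh.all_headers())
    totals = {}
    for document_id, amount in wh.all_items():
        customer = headers[document_id]
        totals[customer] = totals.get(customer, 0) + amount
    return totals

def round_trip_per_document(wh):
    # Pattern 2: build the view incrementally, one query per document.
    totals = {}
    for document_id, customer in wh.all_headers():
        for _, amount in wh.items_for(document_id):
            totals[customer] = totals.get(customer, 0) + amount
    return totals

def pushed_down(wh):
    # Pattern 3: ask the warehouse to evaluate the view logic itself.
    return dict(wh.revenue_by_customer())

def make_warehouse():
    headers = {n: "C%d" % (n % 3) for n in range(100)}
    items = [(n % 100, 10) for n in range(1000)]
    return Warehouse(headers, items)

for strategy in (bulk_transfer, round_trip_per_document, pushed_down):
    wh = make_warehouse()
    result = strategy(wh)
    print(strategy.__name__, wh.round_trips, wh.rows_sent, result)
```

<p>All three strategies return the same totals. What differs is how many rows and round trips cross the gap between the two systems, which is exactly the cost that gets painful over a network.</p>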
<p>I wonder if we'll see one or both of these approaches soon. Or maybe something different and even better!</p>]]></content:encoded></rss:item><rss:item rdf:about="http://www.esjewett.com/blog/2010/10/31/thoughts-on-whats-next-for-apache-esme.html"><rss:title>Thoughts on what's next for Apache ESME</rss:title><rss:link>http://www.esjewett.com/blog/2010/10/31/thoughts-on-whats-next-for-apache-esme.html</rss:link><dc:creator>Ethan Jewett</dc:creator><dc:date>2010-10-31T17:25:00Z</dc:date><dc:subject>enterprise esme social twitter</dc:subject><content:encoded><![CDATA[<p>I'm a committer on the Apache Enterprise Social Messaging Environment (Apache ESME). At least I think that's what it stands for today. We sort of looked at SAP's approach where the acronym for a product changes every year or so and we maybe went a little too far in the opposite direction, refusing to change the acronym even when we need to change the name - teen vampire romances be damned!</p>
<p>In any case, ESME takes a lot of its cues from Twitter, but with a focus on the needs of the enterprise. To that end, we built it in Scala (which runs on the JVM and provides easy integration with Java code), using David Pollak's super-scalable Lift framework and an actor model designed by David. Then we added features like message pools (which allow groups of people to exchange non-public messages), the ability to post more than 140 characters, and the ability to follow not just people, but also tags and conversations.</p>
<p>Maybe that gives you the idea, maybe not. You can always go try it for yourself at <a href="http://esmecloudserverapache.dickhirsch.staxapps.net/">http://esmecloudserverapache.dickhirsch.staxapps.net/</a>&nbsp;or&nbsp;<a href="http://incubator.apache.org/esme/#Index-GettingStarted">help out with the project</a>&nbsp;- we can use help in many areas.</p>
<p>We've managed to do lots of other cool stuff in the context of ESME, but what I want to write a little bit about is what I think we still have ahead of us.</p>
<h3>Distributed Twitter and federation</h3>
<p>Talk of a non-centralized version of Twitter sprang up in earnest a couple of years ago with&nbsp;<a href="http://www.scripting.com/stories/2008/01/16/adecentralizedtwitter.html">a post by Dave Winer</a>, the inventor of RSS. The initial context was Twitter's regular downtime as it struggled with scaling, but the larger context quickly became the concern that we can't trust a single company to properly steward an enormous piece of communication infrastructure. The concern is basically about the Facebook-ification of Twitter. GigaOm has a pretty <a href="http://gigaom.com/2010/06/17/what-would-a-more-open-twitter-look-like/">decent overview</a>&nbsp;of the current state of the discussion.</p>
<p>From an enterprise perspective, this concern is even more motivated. Most enterprises still tend towards on-premise software by default and it is unclear if a messaging service is well-suited to a SaaS deployment option. Some companies, like Yammer (pure SaaS) and Status.net (open source - SaaS and on-premise options), are working on delivering a Twitter-like solution for the enterprise, but we aren't there yet.</p>
<p>Key requirements for a distributed Twitter service:</p>
<ol>
<li>Inter-operable Federation - Status.net has worked to introduce the&nbsp;<a href="http://ostatus.org/about">OStatus standard</a>, and this is an excellent start. However, the inter-operability of this protocol is relatively untested. I'd like to see if we can make this work for ESME, but it is going to take some additions to the protocol to manage pooled messages, for example.</li>
<li>Follow any feed - Friendfeed had this capability, and ESME provides it in a bit of a different manner through our actions (though actions -&nbsp;<a href="http://twitter.com/vdichev">Vassil's</a>&nbsp;brainchild - do far more than this). I sometimes think of this capability as light-weight, or one-way, federation.</li>
<li>Real-time updates from federated data source - Not only do we need to be able to follow feeds, we need to get updates from those feeds nearly instantly. PubSubHubbub (PuSH) is probably the most wide-spread solution here, and it is the solution that OStatus uses. But PuSH has weaknesses around authorization of subscriptions to private feeds, and here it would need to be rolled up with another standard like OAuth.</li>
<li>Updates available as (protected) feeds.</li>
</ol>
<p>Status.net seems to be the most on-the-ball with regard to these requirements, but there is a need for variety and for a tool like ESME that was built with business users in mind.</p>
<h3>Real activities and objects as social objects</h3>
<p>One important ability for a socially-oriented messaging system is making business objects into first-class social objects. This is what John Tropea is getting at when&nbsp;<a href="http://libraryclips.blogsome.com/2010/07/16/enterprise-microblogging-needs-a-facelift-to-rival-email">he talks about the ability to follow conversations and tags</a>. These objects should be first-class members of the messaging environment, supporting following and real-time updates.</p>
<p>We should also be able to integrate real business objects and business activities into the system as first-class objects. "I want to follow this customer", should be a desire that we support. Currently we offer a couple of ways to do this:</p>
<ol>
<li>Bring an activity into the ESME system as a message, either via the API or through an action that pulls an RSS feed. This message (or rather, the conversation around it) is a first-class messaging object in ESME, so if people want to see responses to an action, then they can follow the action. For example, I have actions set up on <a href="http://esmecloudserverapache.dickhirsch.staxapps.net/">http://esmecloudserverapache.dickhirsch.staxapps.net/</a>&nbsp;that pull my new Twitter messages and newly created ESME Jira tickets into my timeline.</li>
<li>Bring an object into the ESME system as a tag, again using the API or an action. The tag then acts as the object that we can follow. We currently allow this as well, and it was used heavily by Sig Rinde in his quite awesome&nbsp;<a href="http://blog.thingamy.com/sigs_blog/2010/01/process-engine-social-media-thingamy-and-esme.html">prototype integration of the object-oriented business process engine (OOBPE?) Thingamy with ESME</a>. But we could stand to have some more functionality for extracting tags and metadata from RSS feeds, allowing us to use this tag-as-object approach in a richer way.</li>
</ol>
<p>We're currently thinking about ways to make our system more extensible and further enable the representation of business activities and objects as first-class objects in the messaging environment. We'd certainly like all the help we can get thinking about this topic.</p>
<h3>Easier integration into other software products and environments</h3>
<p>The capability that actions give us to pull in RSS and Atom feeds is really important. It means that ESME can integrate with systems that were not designed with ESME, or even social messaging in general, in mind. In turn, we need to improve our APIs to allow easier integration of ESME into other tools. Part of this involves doing things like providing RSS and Atom representations of timelines, probably via open standards like the&nbsp;<a href="http://activitystrea.ms/">Activity Streams standard</a>.</p>
<p>On the somewhat more complex side, it will probably involve supporting other existing and emerging standards: allowing use of LDAP for authorization, LDAP groups for automatic pool creation, OpenSocial, PubSubHubbub for push-based feeds, using OAuth in our API, providing more semantic and linked information about data via our API, and supporting actions pulling Atom from OAuth-protected resources.</p>
<p>When I put it like that, it sounds like a lot, but it also sounds really exciting!</p>
<p>So, I'm sure I've missed a lot here, but these are just my thoughts about directions I'd like to see ESME move over the next few releases. Got ideas about where the project should go? We'd love to hear them :-)</p>]]></content:encoded></rss:item><rss:item rdf:about="http://www.esjewett.com/blog/2010/10/27/what-does-sap-mean-by-in-memory.html"><rss:title>What does SAP mean by "In-memory"?</rss:title><rss:link>http://www.esjewett.com/blog/2010/10/27/what-does-sap-mean-by-in-memory.html</rss:link><dc:creator>Ethan Jewett</dc:creator><dc:date>2010-10-27T13:00:00Z</dc:date><dc:subject>bi bwa hana in-memory sap</dc:subject><content:encoded><![CDATA[<p>It's been a bit more than 2 years since SAP introduced the "In Memory" marketing push, starting with Hasso Plattner's speech at Sapphire ... or was it TechEd ... my memory fails me ;-)</p>
<p>It has been two years and I have yet to see a good understanding emerge in the SAP community about what SAP actually means when it talks about "In Memory". I put the phrase "In Memory" into quotes, because I want to emphasize that it has a meaning entirely different from the standard English meaning of the two words "in" and "memory". This is a classic case, best summed up by a quote from one of the favorite movies of my childhood:</p>
<blockquote><strong>Vizzini</strong>: HE DIDN'T FALL? INCONCEIVABLE.  <strong>Inigo Montoya</strong>: You keep using that word. I do not think it means what you think it means.</blockquote>
<p>- <a href="http://www.imdb.com/title/tt0093779/quotes?qt0482717">IMDB</a></p>
<p>The only reasonably specific explanation of the "In Memory" term that I have seen from SAP is in <a href="http://www.saptour.ch/landingPagesFr/manager/uploads/6/3_zurek.pdf">this presentation by Thomas Zurek - on page 11</a>.</p>
<p>If you want a coherent, official stance from SAP on "In Memory" and the impact of HANA on BW, I highly recommend reading and understanding this presentation. I think I can add a little more detail and ask some important questions, so here is my take:</p>
<h3>Fact (I think...)</h3>
<p>SAP is talking about at least 4 separate but complementary technologies when it says "In Memory":</p>
<p><strong>1. Cache data in RAM</strong></p>
<p>This is the easy one, and is what most people assume the phrase means. But as we will see below, this is only part of the story.</p>
<p>By itself, caching data in RAM is no big deal. Yes, with cheaper RAM and 64-bit servers, we can cache more data in RAM than ever before, but this doesn't give us persistence, nor does working on data in RAM guarantee a large speedup in processing for all data-structures. Often, more RAM is a very expensive way to achieve a very small performance gain.</p>
<p><strong>2. Column-based storage</strong>&nbsp;</p>
<p>Columnar storage has been around for a long time, but it was introduced to the SAP eco-system in the BWA (formerly BIA, now BAE under HANA - gotta respect the acronyms) product under the guise of "In Memory" technology. The introduction of a column-based data model for use in analytic applications was probably the single biggest performance win for BWA and followed in the footsteps of pioneering analytical databases like Sybase IQ, but it was largely ignored.</p>
<p>Interestingly, Sybase IQ is a disk-based database, and yet displays many of the same performance characteristics for analytical queries that BWA boasts. Further evidence that not all of BWA's magic is enabled by storing data in RAM.</p>
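<p>As a rough illustration of why the column-based layout suits analytic queries, here is the same tiny table in both layouts in Python. The table and its values are invented, and plain Python lists are only a stand-in for how a real engine lays data out in memory or on disk.</p>

```python
# The same hypothetical sales table in two layouts.
row_store = [
    {"region": "EMEA", "product": "A", "revenue": 100},
    {"region": "APJ", "product": "B", "revenue": 250},
    {"region": "EMEA", "product": "B", "revenue": 75},
]

column_store = {
    "region": ["EMEA", "APJ", "EMEA"],
    "product": ["A", "B", "B"],
    "revenue": [100, 250, 75],
}

def total_revenue_rows(rows):
    # Has to visit every row record, even though only one field is needed.
    return sum(row["revenue"] for row in rows)

def total_revenue_columns(columns):
    # Reads only the single column the query needs.
    return sum(columns["revenue"])

print(total_revenue_rows(row_store), total_revenue_columns(column_store))
```

<p>An aggregate over one measure touches a single contiguous column in the second layout, regardless of how wide the table is, and that holds whether the column lives in RAM or on disk.</p>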
<p><strong>3. Compression</strong></p>
<p>So how do we fit all of that data into RAM? Well, in the case of BWA the answer is that we don't - it stores a lot of data on disk and then caches as much as possible in RAM. But we can fit a lot more data into RAM if it is compressed. BWA and HANA implement compression algorithms to shrink data volume by up to 90% (or so we are told).</p>
<p>Compression and columnar storage go hand-in-hand for two reasons:</p>
<p>a. Column-based storage usually sorts columns by value, typically at the byte level. This places similar values next to each other, a layout that compresses very efficiently with standard algorithms that exploit similarities in adjacent data. Wikipedia has the scoop here: <a href="http://en.wikipedia.org/wiki/Column-oriented_DBMS#Compression">http://en.wikipedia.org/wiki/Column-oriented_DBMS#Compression</a>&nbsp;</p>
<p>b. When queries are executed on a column-oriented store it is often possible to execute the query directly on the <em>compressed</em> data. That's right - for some types of queries on columnar databases you don't need to decompress the data in order to retrieve the correct records. This is because knowledge of the compression scheme can be built into the query engine, so query values can be converted into their compressed equivalents. If you choose a compression scheme that maintains the ordering of your keys (like Run Length Encoding on a sorted column), you can even do range queries on compressed data. This paper is a good discussion of some of the advantages of executing queries on compressed data: <a href="http://db.csail.mit.edu/projects/cstore/abadisigmod06.pdf">http://db.csail.mit.edu/projects/cstore/abadisigmod06.pdf</a></p>
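<p>Both points can be sketched with run-length encoding in Python. This is a minimal illustration on an invented column of country codes. It is not the compression scheme used by BWA or HANA, which SAP has not described in this level of detail.</p>

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def count_equal(runs, target):
    """Count rows equal to target without decompressing the column."""
    return sum(length for value, length in runs if value == target)

def count_in_range(runs, low, high):
    """Count rows between low and high (inclusive), again on the runs only."""
    return sum(length for value, length in runs if value >= low and high >= value)

unsorted_column = ["DE", "US", "DE", "FR", "US", "US", "DE", "FR"]
sorted_column = sorted(unsorted_column)

# Sorting puts equal values next to each other, so far fewer runs are needed.
print(len(run_length_encode(unsorted_column)), "runs unsorted")
runs = run_length_encode(sorted_column)
print(len(runs), "runs sorted:", runs)

# Equality and range predicates evaluated directly on the compressed form.
print(count_equal(runs, "US"))
print(count_in_range(runs, "DE", "FR"))
```

<p>The queries only ever look at one entry per run, so the work scales with the number of distinct runs rather than the number of rows.</p>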
<p><strong>4. Move processing to the data</strong></p>
<p>Lastly, the BWA and HANA systems make heavy use of the technique of moving processing closer to the data, rather than moving data to the processing. In essence, the idea is that it is very costly to move large volumes of data across a network from a database server to an application server. Instead, it is often more efficient to have the database server execute as much processing as possible and then send a smaller result set back to the application server for further processing. This processing trade-off has been known for a long time, but the move-processing-to-the-data approach was popularized relatively recently as a core principle of the Map-Reduce algorithm pioneered by Google: <a href="http://labs.google.com/papers/mapreduce.html">http://labs.google.com/papers/mapreduce.html</a>&nbsp;</p>
<p>This approach is especially useful when an analytical database server (which tends to have high data volumes) implements columnar-storage and parallelization with compression and heavy RAM-caching, so that it is capable of executing processing without becoming a bottle-neck.</p>
<h3>Speculation</h3>
<p>There are also a few technologies that I suspect SAP has rolled into HANA, but since they don't share the detailed technical architecture of the product, I don't know for sure.</p>
<p><strong>1. Parallel query evaluation</strong>&nbsp;</p>
<p>Parallel query execution (sometimes referred to as MPP, or massively-parallel-processing, which is a more generic term) involves breaking up, or sometimes duplicating, a dataset across more than one hardware node and then implementing a query execution engine that is highly aware of the data layout and is capable of splitting queries up across hardware. Often this results in more processing (because it turns one query into many, with an accompanying duplication of effort) but faster query response times (because each of the smaller sub-queries executes faster and in parallel). MPP is another concept that has been around for a long time but was popularized recently by the Map-Reduce paradigm. Several distributed DBMSes implement parallel query execution, including Vertica, Teradata, and HBase.</p>
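<p>The split-then-merge shape of a parallel query can be sketched in Python. The partitions and data are invented, and threads stand in for separate hardware nodes here, so this shows the structure of the technique rather than any real speedup or anything specific to HANA.</p>

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact rows: (region, revenue), spread across three "nodes".
partitions = [
    [("EMEA", 100), ("APJ", 50)],
    [("EMEA", 25), ("AMER", 300)],
    [("APJ", 75), ("AMER", 10)],
]

def partial_totals(partition):
    """Sub-query run on one node: aggregate only the local rows."""
    totals = {}
    for region, revenue in partition:
        totals[region] = totals.get(region, 0) + revenue
    return totals

def merge(partials):
    """Coordinator step: combine the per-node results into the final answer."""
    merged = {}
    for totals in partials:
        for region, revenue in totals.items():
            merged[region] = merged.get(region, 0) + revenue
    return merged

def parallel_query(parts):
    # One worker per partition; each runs the same sub-query on its own slice.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return merge(pool.map(partial_totals, parts))

print(parallel_query(partitions))
```

<p>Note the extra merge step: this is the "more processing overall, faster response" trade-off described above.</p>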
<p><strong>2. Write-persistence-mechanism</strong>&nbsp;</p>
<p>Since HANA is billed as ANSI SQL-compliant and ACID-compliant, it clearly delivers full write-persistence. What is not clear is what method is used to achieve fast and persistent writes along with a column-based data model. Does it use a write-ahead-log with recovery? Maybe a method involving a log combined with point-in-time snapshots? Some other method? Each approach has different trade-offs with regard to memory consumption and the ability to maintain performance under a sustained onslaught of write operations.</p>
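<p>For readers unfamiliar with the first option, here is a minimal Python sketch of a write-ahead log with recovery for an in-memory key/value store. The <code>TinyStore</code> class and its log format are made up for this example. As noted above, I don't know which mechanism HANA actually uses, so treat this purely as an illustration of the general technique.</p>

```python
import json
import os
import tempfile

class TinyStore:
    """In-memory key/value store that logs every write before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _recover(self):
        # Rebuild the in-memory state by replaying the log from the start.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Write-ahead: append to the log and force it to disk first...
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # ...and only then change the in-memory state.
        self.data[key] = value

log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyStore(log_path)
store.put("revenue", 100)
store.put("revenue", 120)

# Simulate a crash: start a fresh instance and recover from the log alone.
recovered = TinyStore(log_path)
print(recovered.data)
```

<p>The trade-offs mentioned above show up even here: every write pays for a disk sync, and recovery time grows with the length of the log unless it is periodically replaced by a snapshot.</p>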
<h3>Conclusion</h3>
<p>So, there are still a lot of questions about what exactly SAP means (or thinks it means) when it talks about "In Memory", but hopefully this helps to clarify the concept, and maybe prompt some more clarity from SAP about its technology innovations. There is no denying that BWA was and HANA will be a fairly innovative product, but for people using this technology it is important to get past the facade of an innovative black-box and understand the technologies underneath and how the approach applies to the business, data, or technical problem we are trying to solve.</p>]]></content:encoded></rss:item><rss:item rdf:about="http://www.esjewett.com/blog/elastic-lists-using-protovis.html"><rss:title>Elastic lists using Protovis</rss:title><rss:link>http://www.esjewett.com/blog/elastic-lists-using-protovis.html</rss:link><dc:creator>Ethan Jewett</dc:creator><dc:date>2010-07-03T15:23:00Z</dc:date><dc:subject>bi javascript visualization</dc:subject><content:encoded><![CDATA[<p>I've been seeing more and more list-based visualizations used for data selection showing up in BI software. These types of selection interfaces are especially prominent in&nbsp;<a href="http://www.qlikview.com/">Qlikview</a>&nbsp;and&nbsp;<a href="http://www.sap.com/solutions/sapbusinessobjects/large/business-intelligence/search-navigation/explorer/index.epx">SAP BusinessObjects Explorer</a>&nbsp;(which you can try on the <a href="https://goexplore.ondemand.com/">web</a>).</p>
<p>Ever since seeing Moritz Stefaner's implementation of&nbsp;<a href="http://moritz.stefaner.eu/projects/elastic-lists/">Elastic Lists</a>, I've been a bit dissatisfied with the implementations in enterprise BI tools, including the ones listed above. "Elastic" lists leverage the list format to visualize characteristics of the data by tying the size of the bar representing a column value in the selection list to a metadata metric - in this case the probability that a given column value will occur in a dataset.</p>
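<p>The core of the "elastic" idea, sizing each list entry by how likely its value is in the current dataset, can be sketched in Python as follows. The pixel heights, the minimum-height rule, and the sample data are assumptions made for this sketch and are not taken from Moritz Stefaner's implementation.</p>

```python
from collections import Counter

def elastic_bar_heights(values, total_height=300, min_height=12):
    """Map each distinct value to a bar height proportional to its share of the rows."""
    counts = Counter(values)
    total = sum(counts.values())
    heights = {}
    for value, count in counts.items():
        proportional = round(count * total_height / total)
        # Assumption of this sketch: keep rare values clickable with a minimum height.
        heights[value] = max(min_height, proportional)
    return heights

# Hypothetical column of cylinder counts from a cars dataset.
cylinders = [4, 4, 4, 4, 6, 6, 8, 8, 8, 8, 8, 3]
print(elastic_bar_heights(cylinders))
```

<p>Re-running the same calculation on the filtered rows after each selection is what makes the list "elastic": the bars grow and shrink as the distribution of the remaining data changes.</p>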
<p>In order to help myself understand the strengths and weaknesses of this type of visualization more thoroughly, I started to experiment with list-based visualizations in&nbsp;<a href="http://vis.stanford.edu/protovis/">Protovis (a JavaScript-based visualization library using SVG for rendering)</a>. Eventually, I added in elasticity and gave the list selection the power to drive a second visualization. It uses the cars dataset and visualization from the&nbsp;<a href="http://gitorious.org/protovis/protovis/trees/master/examples/cars">Protovis examples</a>&nbsp;to demonstrate driving a second visualization with the list selection. (Note: the coordinates on the second visualization are reversed for reasons that I haven't looked into yet.)</p>
<p>That experiment is now working well enough that I thought I'd publish it so that others can comment, use the code (but really, it's a bit of a mess, so be wary), and experiment with the concept. If you want to add some capability, go right ahead and&nbsp;<a href="http://github.com/esjewett/protovis_elastic_lists">fork the project on Github</a>.</p>
<p>For my part, I will likely do a more thorough analysis of list-based visualization in BI tools eventually, but for now I think I can safely say that anywhere a list appears, there is little excuse for lack of "elasticity" in the visualization.</p>
<p>Note: This visualization will only work in browsers that support the SVG standard. It does not work in IE6, 7, or 8. Pretty much any other browser (Firefox, Chrome, Safari, etc.) should work fine.</p>
<p>You can view a static image of the visualization below.</p>
<p><span class="full-image-block ssNonEditable"><span><img src="http://www.esjewett.com/storage/protovis elastic lists 1.png?__SQUARESPACE_CACHEVERSION=1321814599667" alt="" /></span></span></p>]]></content:encoded></rss:item></rdf:RDF>
