Another SAP data visualization head scratcher

Maybe I need to make a series out of these.

This time via a tweet from Andrew Fox: The worst implementation of a pie chart I have ever seen, courtesy of SAP BusinessObjects Visual Intelligence.

Is there something between Jackets and Sweat-T-Shirts? And what is between Accessories and Dresses? What happened to Overcoats, City Trousers, and Outerwear? Fell through the cracks, I guess.

SAP BusinessObjects Visual Intelligence is supposed to be SAP's answer to industry-leading data visualization products like Tableau and Qlikview. Seeing this, it looks more like an advertisement for these competing (I use the term loosely) products.

The full demo video is here.

It's difficult to pay for good information these days

Stephen Few recently published a blog post that takes Forrester to task over the value of its publications on the topic of visual business intelligence, and by extension on all topics that Forrester weighs in on. Meanwhile, over on ZDNet, we've got Ed Bott asking why the IT industry continues to listen to Gartner.

I'm not particularly comfortable with the personal tone that Stephen Few takes in some of his diatribes, but both of these posts are fundamentally correct that the big IT analyst firms are strikingly bad at predicting the future and at evaluating the current state of the market. I'll venture that most people think these are exactly the two skills that analyst firms are selling in their reports, and so it is rather surprising that analyst firms like Forrester and Gartner are no better (and often worse) than the rest of us in these areas.

Of course, this isn't actually what these firms are doing with these reports. As far as I can tell, there are three main customers of large IT analyst firm reports:

  1. Large enterprises (i.e. enterprise software customers)
  2. Consulting firms
  3. Vendors

Notably missing from the list are experts in the fields that these firms are covering. Yes, some of these experts work at consulting companies, vendors, and customers, but they make up a surprisingly small percentage of the employees of these companies. I don't think these are the people buying the reports.

My take: Software customers buy these reports to justify buying decisions. This is the 21st century version of "no one gets fired for buying IBM", but now it reads "no one gets fired for buying a leader on the Gartner Magic Quadrant". Consulting firms buy these reports so that they appear knowledgeable and know what products they can recommend to their clients who are reading the same reports. Vendors buy distribution rights to the reports that list them as industry leaders so they can use the reports to reassure their nervous customers during the extensive sales process.

The emperor has no clothes and everyone already knows it, including the emperor. It's just convenient for everyone to keep their mouths shut about the topic. Stephen Few and Ed Bott are hardly the naive child in this scenario. They have not suddenly realized that the emperor has no clothes, but pointing out this fact every once in a while is a good way to advertise that they are selling something different.

Meanwhile, IT analyst firms churn out reports that are designed to be incredibly conservative, informed by research and sales processes that inherently support the status quo. Of course, they pretend to be cutting edge with an eye on disruptive innovation. They've got to, because the idea of disruptive innovation has become the status quo. But in reality it is almost impossible for one of these reports to recognize and recommend a disruptive technology because the very methodology of the reporting process precludes the analysis of truly disruptive vendors.

It's a sad situation, especially as I know some really bright minds work for these firms. These are people who do understand the industry and do have a feeling for disruptions and innovations. I'm quite glad that Gartner and Forrester now have many of their analysts blogging on these topics, because this is a platform where these people can actually give us a more honest idea of what's going on. But it is telling that when independent analysts join these firms, their blogging tends to drop off as more of their time is plowed into the production of reports and client inquiries.

As it is, I now primarily follow independent analysts (like Curt Monash or Horace Dediu) and commentators for informative reports on the BI, mobile, and data spaces. And I end up doing a lot of my own research. It seems to me that it's unfortunately difficult to pay for good information these days, so you're often better off not paying.

Toward an analysis of datawarehouse and business intelligence challenges - part 2

(This post is a bit of a blast from the past. It was originally published in 2010 and was lost during a migration in 2011.)

This is the second half of an analysis and, together with my first post on the topic, constitutes my first swipe at listing the current fundamental challenges of the datawarehousing and business intelligence fields. The list is in no particular order and will surely change in the future. It is conceived as the beginning of a framework from which to evaluate new or maturing technologies and architectures from the perspective of applicability to the field.

Aggregating silo-ed data sources

Silos silos silos. Anyone trying to do data analysis has run into this problem: the data exists, but we can't get at it. The technical aspects of this challenge are many (bandwidth, interfaces, and ETL), but it's worth noting that they are usually dwarfed by the cultural and organizational obstacles (default against sharing, departmental rivalries), many of which are in place for good reason (security and permissions concerns, privacy laws).

Representing data in a meaningful way

Historically this feels like one of the least-addressed challenges, but we are finally seeing some serious attention paid to this problem. Challenges in representation of data range from visualization (and the related topic of responsible visualization - as visualization is too often untruthful), to analytical views and tools, through search and guided data exploration.

As we stand, the data in datawarehouses and business intelligence datamarts is too often opaque and misunderstood by most users. Even the most impressive and advanced visualizations and analysis tools (Gapminder, BusinessObjects Explorer, and Qlikview, for example) are still highly guided constructs that are often only applicable to predetermined datasets. We have come a long way (finally) over the last decade, but we have a long way yet to go.

Representing reporting structures

Reporting structures are now fairly well understood, but representing them efficiently in our datawarehouses or BI tools remains a challenge. Some examples of such structures: reporting hierarchies, time-dependency, calculated measures, and derived or dependent characteristics. Challenges revolve around rollup and calculation performance, reorganization due to reporting structure changes, and accessibility to potential users.
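To make the rollup and reorganization problem a little more concrete, here is a minimal sketch of a reporting hierarchy stored as child-to-parent links, with leaf values summed up to every ancestor. The node names, the figures, and the `rollup` function are all invented for illustration; real datawarehouses typically precompute or index these aggregates rather than walking the tree on each query.

```python
from collections import defaultdict

# Hypothetical hierarchy: child -> parent (None marks the root).
PARENT = {
    "Total": None,
    "Apparel": "Total",
    "Accessories": "Total",
    "Jackets": "Apparel",
    "Dresses": "Apparel",
}

# Hypothetical leaf-level sales figures.
LEAF_SALES = {"Jackets": 120.0, "Dresses": 80.0, "Accessories": 50.0}


def rollup(parent, leaf_values):
    """Add each leaf value to the leaf itself and to every ancestor."""
    totals = defaultdict(float)
    for node, value in leaf_values.items():
        current = node
        while current is not None:
            totals[current] += value
            current = parent[current]
    return dict(totals)


totals = rollup(PARENT, LEAF_SALES)
```

Moving a node (say, re-parenting "Dresses" under a new group) changes every total above it, which is one reason reporting structure changes tend to force the reorganization of stored aggregates.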

Performance

Traditionally this is the "big one" and it is still very much an unsolved problem. Bound by the CAP tradeoff, we are more or less forced to give up either consistency, availability, or partition-tolerance in order to improve performance under constant resources. Two approaches prevail: architectures that give up one or more of the three in exchange for performance, and architectures that attempt to better optimize for the problem-space in order to improve performance while maintaining all three CAP axes. Both are perfectly legitimate approaches, but it will be important to recognize which architectural approach is being pursued in any given product or technology. As a wise person once said, "there is no such thing as a free lunch".

Further complicating matters, there are multiple performance aspects of datawarehouse and business intelligence applications, and we need to be clear which ones we attempt to optimize for. These aspects include query performance (keeping in mind the random access vs. batch/bulk/scan access difference), data loading (ETL) and reorganization, and (in some systems) writeback or changing of data.

Security

Security models pose more of a management problem than a technical problem for datawarehouse and BI applications. Nonetheless, I think they're worth mentioning as a core challenge to keep in mind, just in case someone comes up with a way to make reasoning about security in analytical-processing-oriented datasets less painful.

Data loading

Last but certainly not least, data loading is a perennial headache in datawarehouse and BI systems. The three basic types of data loading (batch, real-time/streaming, and write-back/input) all to some extent conflict with each other. Add to that the complexity of managing a profusion of delta mechanisms (many of which exist for good reason, others of which exist because of careless design) and different interface formats and we've got ourselves a real party. Standardized interfaces and design practices are the key to conquering this challenge, but as with many of these challenges, this is more of a human problem than a technical problem.
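As one illustration of what a delta mechanism looks like, here is a rough sketch of a timestamp-watermark extract: each load picks up only the rows changed since the previous load. This is just one of many possible mechanisms, and the `changed_at` field, the sample rows, and the `extract_delta` function are assumptions made up for the example.

```python
from datetime import datetime


def extract_delta(rows, watermark):
    """Return rows changed after the watermark, plus the new watermark."""
    changed = [row for row in rows if row["changed_at"] > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((row["changed_at"] for row in changed), default=watermark)
    return changed, new_watermark


# Hypothetical source rows with a last-changed timestamp.
source_rows = [
    {"id": 1, "amount": 10.0, "changed_at": datetime(2010, 5, 1, 8, 0)},
    {"id": 2, "amount": 25.0, "changed_at": datetime(2010, 5, 2, 9, 30)},
    {"id": 3, "amount": 40.0, "changed_at": datetime(2010, 5, 3, 14, 15)},
]

last_load = datetime(2010, 5, 2, 0, 0)
delta, last_load = extract_delta(source_rows, last_load)
```

Even this simple scheme shows where conflicts creep in: it silently misses deleted rows and any source that does not maintain a reliable change timestamp, which is part of why so many different delta mechanisms end up coexisting.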

Conclusion - technical vs. design challenges

If we take one thing away from this enumeration of the challenges of the datawarehouse and business intelligence spaces, I hope it is the fact that most of these challenges are more human in nature than they are technical. They tend to derive from the difficulty in making tradeoff decisions, standardizing interfaces and architectures, identifying and focusing on the problem space, and understanding how people may actually use these systems to greatest effect. Because of this, these challenges are often at least as susceptible to design solutions as they are to pure technical solutions. There is a tendency in the industry to focus on technical answers to these challenges over design answers, perhaps because technical solutions are often more impressive and in some sense physical. I think that's unfortunate.

Chin scratcher in SAP BusinessObjects Mobile

Here's an interesting one.

What is the point of the slider control at the bottom of this chart? This type of control is only useful for time series or similar arrangements, but one would almost always use a line chart, not a bar chart, to display this type of information. Yet it appears to be the default configuration for bar charts in SAP BusinessObjects Mobile, and it is used as seen above in the demo application that SAP provides.

I'm stumped.