soa

Tarpipe with YQL

I've been complaining for months that there is apparently no way to get a Yahoo! Pipes -> Tarpipe connection. What I'm really looking to do is drive a Tarpipe workflow using an RSS feed, so when a new item shows up in the RSS or Atom feed it will kick off a Tarpipe workflow for that item.

Unfortunately Tarpipe doesn't natively support feeds to kick off a workflow. It allows for the use of email or a specifically formatted request to one of its two APIs. Yahoo! Pipes meanwhile can consume RSS feeds second to none, but it is limited in the after-effects that it can trigger. Pipes does provide a web service operator that can send a request to an arbitrary URL, but the person building the pipe has very little control over the format of the request, so this is not a satisfactory way to interact with arbitrary APIs, and it doesn't work with Tarpipe's API.

For a couple of months I've been wondering if the relatively new Yahoo! Query Language (YQL) might be the missing link. In perusing the YQL documentation yesterday I noticed that YQL now supports INSERT, UPDATE, and DELETE operations in addition to its SELECT operation. Given the built-in ability to make state-changing requests to APIs specified in YQL's Open Data Table (ODT) format, I decided to give it a shot.

I'll be darned, but it works.

I threw up an Open Data Table definition for the Tarpipe 1.0 API on GitHub, based on the provided YQL examples and the very understandable YQL documentation. Open Data Table definitions are XML files that specify the possible operations on a "table" as well as optional JavaScript to be executed when the operation is called.

The complete Open Data Table for the Tarpipe 1.0 API (with the exception of the "image" field) is now available on my GitHub fork of spullara's fork of the yql-tables repository. Whew. Gotta love GitHub for creating a simple interface to a tangle of forks.

The stable raw XML file is here.

This definition specifies an ODT that responds to a query of the format

INSERT into tarpipe.process_1_0 (workflow_key,title,body) VALUES ("workflow_key_here","This is my title","This is the body")

Given a valid Tarpipe workflow key, which is found in the definition of a Tarpipe workflow based on a REST receptor, this call will kick off the workflow with the provided title and body parameters.

Since this ODT is not currently included in the standard environment, we also have to specify the ODT definition to use, so the complete query we need to send to YQL looks like

USE "http://www.esjewett.com/files/tarpipe.process_1_0.xml"; INSERT into tarpipe.process_1_0 (workflow_key,title,body) VALUES ("workflow_key_here","This is my title","This is the body")

And that's it. I (or rather GitHub) have already provided the hosted ODT definition, so all you need to do is run the YQL query above in order to kick off a Tarpipe workflow. The query can be run in any of the manners allowed by the YQL system, including the console available here. Simply replace the text 'workflow_key_here' (keep the double-quotes) with your Tarpipe workflow key and you should be good to go.
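If you want to generate the query from a script rather than typing it into the console, something like the following could work. This is only a sketch: the helper names `quote` and `build_query` are made up for illustration, and it only assembles the query text shown above. It does not send anything to YQL. Since YQL's string-escaping rules aren't covered in this post, the sketch simply refuses values that contain a double quote.

```python
# Values taken from the query shown above.
ODT_URL = "http://www.esjewett.com/files/tarpipe.process_1_0.xml"
TABLE = "tarpipe.process_1_0"


def quote(value: str) -> str:
    """Wrap a value in double quotes, as in the query above.

    Escaping rules for YQL string literals are not covered here, so
    values containing a double quote are rejected rather than guessed at.
    """
    if '"' in value:
        raise ValueError("double quotes are not supported in this sketch")
    return f'"{value}"'


def build_query(workflow_key: str, title: str, body: str) -> str:
    """Return the USE + INSERT statement for one Tarpipe workflow call."""
    values = ",".join(quote(v) for v in (workflow_key, title, body))
    return (
        f'USE "{ODT_URL}"; '
        f"INSERT into {TABLE} (workflow_key,title,body) VALUES ({values})"
    )


if __name__ == "__main__":
    print(build_query("workflow_key_here", "This is my title", "This is the body"))
```

Called with the placeholder values, `build_query` produces the same text as the complete query above, ready to paste into the console.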

Next up, I'm hoping we'll be able to go through how to use this technique to wire Yahoo! Pipes directly into Tarpipe workflows. I'm not quite there yet as it's tough to get Pipes to ignore old entries, but I think we can figure something out...
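To make the "ignore old entries" problem concrete, here is a rough sketch of the bookkeeping involved, written as a standalone script rather than as anything Pipes itself can do. It assumes a plain RSS 2.0 feed (no namespaces, so not Atom) and a hypothetical `new_items` helper that remembers which item identifiers it has already seen. Each item it returns would be a candidate for one workflow call.

```python
import xml.etree.ElementTree as ET


def new_items(rss_xml: str, seen: set) -> list:
    """Return (guid, title) pairs for items not seen before; update `seen`."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Fall back to the link when the feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link")
        if guid is None or guid in seen:
            continue
        seen.add(guid)
        fresh.append((guid, item.findtext("title", default="")))
    return fresh
```

The `seen` set would have to be stored somewhere between runs, which is exactly the piece of state that is hard to keep inside a pipe.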

UPDATE (7 Aug, 2009) - Due to an obscure incompatibility between Ruby on Rails versions higher than 2.1 and the Yahoo! infrastructure, the github.com versions of the open data table XML files no longer work. I am now hosting the XML file necessary for this blog at http://www.esjewett.com/files/tarpipe.process_1_0.xml. I've changed the code and links above to point to this file.

Pluggability redux - Some thoughts on enterprise system trust

For the second post in my pluggability series (series? seriously?), I'll return to the issue of trust, specifically what I was talking about when I said

Trust (between systems) in the enterprise has been all or nothing. With the emergence of SaaS and the integration of consumer services, this approach doesn't work anymore. Access delegation capabilities are becoming essential. OAuth is part of the answer to this issue. Standards, architectures, and UIs that can handle multiple delegation are probably the end-state solution.

Let's take this apart a bit, with the caveat that I just fiddle with security schemes and my real current interest is in the data user interface area, roughly.

There are several different types of trust between systems and I'm not going to capture them all in the discussion below, but the basic types are trusted systems (where we ask the question, "Do I, system X, trust the integrity of system Y?"), identity verification and authentication ("Do I trust that this user/system is who it says it is?"), and authorization ("Given that this user/system is identified, what access do I trust it with?").

This is all relatively straightforward in the classic situation of a user trying to log onto a system. Yes, the user is trusted. Identity is verified via a username/password pair and we protect against man-in-the-middle attacks using a VPN. Authorization is provided for on a user-by-user basis.

It's also straightforward in the circumstance where a server is attempting to communicate with another server. Again, integrity is assumed because the server is known, authentication is carried out under one of a number of schemes, and authorization is on a per-system basis.

But we're now in a world of agent systems acting on behalf of users, and these schemes start to fall apart. How do I treat a system that claims it is acting on behalf of a user? Do we grant authorization appropriate to the user, or just punt on the question and let the trusted system figure it out? How can we be sure that the system is acting on behalf of the user that it says it is acting on behalf of? What if the system requesting access is one we've never heard of before? Do we trust it?

This is the problem of access delegation. (To be clear, this is not the related problem of single-sign-on.) There are a couple of general approaches to access delegation in the enterprise, neither of which is really access delegation:

  • Punt and let the trusted system figure out the requisite authorizations - this is clearly not an acceptable approach for much sensitive data unless the authorization scheme is designed as a whole across systems, in which case we're not really talking about two separate systems and we certainly aren't looking at a pluggable architecture that might at some point become user-definable.
  • Implement trust along with a single-sign-on mechanism that assumes the trusted system will vouch that user X is requesting access (usually by sending along the ID and password of the user requesting access). Authorizations assigned to user X can then be afforded to requests coming from the trusted system on behalf of that user. This is not delegation, this is impersonation - the approach is better from an authorizations standpoint and allows systems to be somewhat "pluggable" at the developer level, but it doesn't address two concerns:
    • There is no capability inherent in such an architecture for user-driven pluggability because trust is still all or nothing with IT deciding which systems are to be trusted for impersonation (as it should).
    • This type of system simply treats the requesting system the same way it treats the user itself. In many situations we may want to give a requesting system less access or different access from the user. We may want to give different requesting systems different access, even when they are requesting on behalf of the same user. For example, we might have a trusted, stand-alone internal reporting system and a less trusted prototype web-based dashboard system. Certain classes of data may simply be deemed too sensitive for prototype systems, even if the user is trusted with the general class of access. Such an architecture leaves little room for this sort of granularity except on an implementation-by-implementation basis.

So what would have to happen to allow for user-driven pluggability?

First, authorization schemes that allow for layered access controls by user and system. In some organizations, access from anonymous systems and applications on behalf of users would be allowed, and this access would be granted at a default minimal level (display only, no access to sensitive data, etc.).
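As a toy illustration of what "layered" could mean here, the sketch below computes the access a system gets on behalf of a user as the overlap between what the user may do and what the system has been certified for. Everything in it is invented for the example: the permission names, the `AccessPolicy` class, and the user and system names. Uncertified systems fall back to a display-only default, as described above.

```python
from dataclasses import dataclass

# Hypothetical permission names, invented for this example.
DISPLAY = "display"
EDIT = "edit"
SENSITIVE = "sensitive"

# Default minimal grant for systems nobody has certified: display only.
DEFAULT_SYSTEM_GRANT = frozenset({DISPLAY})


@dataclass
class AccessPolicy:
    user_grants: dict    # user name -> frozenset of permissions
    system_grants: dict  # certified system name -> frozenset of permissions

    def effective(self, user: str, system: str) -> frozenset:
        """Permissions for `system` acting on behalf of `user`."""
        user_perms = self.user_grants.get(user, frozenset())
        system_perms = self.system_grants.get(system, DEFAULT_SYSTEM_GRANT)
        # The system never gets more than the user has, nor more than
        # the system itself has been certified for.
        return user_perms & system_perms


policy = AccessPolicy(
    user_grants={"alice": frozenset({DISPLAY, EDIT, SENSITIVE})},
    system_grants={
        "internal-reporting": frozenset({DISPLAY, SENSITIVE}),
        "prototype-dashboard": frozenset({DISPLAY}),
    },
)
```

With this policy, the internal reporting system acting for alice can display sensitive data, while the prototype dashboard and any unknown application are held to display only, even though alice herself is trusted with more.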

Second, in all organizations a path to request and receive certification of a given application at a particular access level should be provided for. I don't have high hopes for this in a world where a large number of companies still block Gmail at their corporate firewalls, but a boy can dream.

Third, standards for access delegation must become available so as to allow for automatic interoperability of delegated access systems. We still may need an open loop in an enterprise environment, as in step 2, but it shouldn't be technically required. OAuth is probably the best bet here, as all enterprise schemes appear to be pretty much vendor-specific. Hopefully the IETF push will start to propel OAuth into some enterprise vendors' products, though I'm not holding my breath.

Fourth, working out the user interface is an ongoing major issue. Single delegation is hard enough right now, but my impression is that the OAuth community is getting pretty close to nailing it down. Multiple delegation is going to be a real headache, but people are starting to think about it, and there's no question in my mind that it's where we need to get for pluggability to be really workable while maintaining (Question: why do I bother writing this stuff when someone at Google has already done it?)

This all begs for a bit of an example. Google's example (link) is a good combined consumer-web/enterprise thought experiment, but it's a target for a few years out, certainly. Simpler examples done well (reporting systems signing onto data-warehousing or operational systems, for example) are hard enough to come by in the enterprise as is. Hopefully that'll be content for another post.

Pluggability examples in the consumer web

As I was reading about Google's new Google Visualization open-wire protocol API (a mouthful to be sure) and new research from Accenture indicating that "millennials" route around enterprise IT departments (don't I know it), my thoughts turned to how IT can architect its systems so that they stop behaving as roadblocks for information workers and start behaving as enablers.

I still believe that the basic problem is one of control, but the technologies are emerging to allow for architectures that satisfy demands for control without hamstringing capability. I'm thinking of standards like OAuth and standard APIs like OpenSocial or the Google Visualization API here.

A couple thoughts on this topic:

  • Trust (between systems) in the enterprise has been all or nothing. With the emergence of SaaS and the integration of consumer services, this approach doesn't work anymore. Access delegation capabilities are becoming essential. OAuth is part of the answer to this issue. Standards, architectures, and UIs that can handle multiple delegation are probably the end-state solution.
  • System plug-ability or mash-ability is a key component. No single vendor can deliver adequate functionality for information workers, thinkers, or process innovators across industries. Areas like these are simply not well enough understood or structured to provide a delivered solution. Real SOA and mashups, allowing users to define the architecture by dynamically plugging and reorganizing information and process flows between systems, are the answer here. This type of self-service SOA for data and process is the next step after self-service reporting on data that the user can't really do anything about. Since we haven't totally figured out how to do the reporting yet, I'm not optimistic that we are getting close here, but it's a goal worthy of some work. Thingamy is one view of this world on the process side. Yahoo! Pipes and Tarpipe are examples of this on the information side.

This post has turned into more of a note to myself than anything else, but there's a long tradition of using a blog for this type of activity, and since I can't seem to post in any other way, I might as well throw this up!
