yahoo

Make a webhook out of anything

(or ... "How I learned to stop worrying about doing it right, and just make the damn thing work")

I've had this problem for a while:

I use a great service called Instapaper (try it, seriously) for keeping track of my reading list. Which is great. But I want stuff to happen to certain items once I'm done reading something in Instapaper. They should be posted to Twitter, stored in Evernote, or squirreled away in Diigo, del.icio.us, or Pinboard.

This just isn't very achievable. While Instapaper is totally awesome, it does not provide automatic posting to all (or any) of these sites. It doesn't even provide Webhooks, which might provide the ... hook ... to allow for this sort of posting via some service like Yahoo! Query Language (YQL) or Tarpipe. What Instapaper provides is an RSS feed of shared items.

I've previously reviewed how to use YQL to post to Tarpipe, which solves part of the problem. It is possible to use this technique to consume a feed in Yahoo! Pipes and then post each item to Tarpipe. But this isn't really what I want to do, because it will post each item to Tarpipe each time the feed is read. Which is going to result in a lot of duplicate Tweets, Evernote notes, or whatever else I'm having Tarpipe do.

What we need is a Yahoo! Pipe that will only call the special Tarpipe (or other) YQL when there is a new item in the feed. Pipes isn't very good at this, but in steps Google Reader. Pretty much all that Google Reader does is query a feed occasionally and keep track of when a new item appears.

My strategy (and it works, bless Google and Yahoo!'s hearts) is to use Google Reader to check a Pipe and cache the items it has seen in a publicly accessible label. This is a little circular, but the very Pipe that Google Reader is checking pulls the feed I want to webhookify and compares the contents of the feed to the contents of the Google Reader label. If the Pipe sees any items in the feed that aren't yet stored in the Google Reader label, it does its magic on only those items.
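To make the comparison concrete, here is a minimal Python sketch of the idea. It is not the Pipe itself: the function name, the dictionary shape of the items, and the choice to compare items by their link are all assumptions made for illustration.

```python
def new_items(feed_items, seen_links):
    """Return the feed items whose link is not among the already-seen links."""
    seen = set(seen_links)
    return [item for item in feed_items if item["link"] not in seen]


# Hypothetical feed contents.
feed = [
    {"title": "Article A", "link": "http://example.com/a"},
    {"title": "Article B", "link": "http://example.com/b"},
    {"title": "Article C", "link": "http://example.com/c"},
]
# Hypothetical links already cached in the Google Reader label.
cached = ["http://example.com/a", "http://example.com/b"]

for item in new_items(feed, cached):
    print(item["title"])  # only "Article C" is new
```

Only the items returned by new_items would be handed on to the webhook step, which is what keeps the duplicates out.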

I've made the Pipe that does this public at http://pipes.yahoo.com/esjewett/feed_to_webhook_using_google_reader but getting it working takes a little doing, which is described below with screenshots. I'll demonstrate using a public feed (Daring Fireball's main article feed, because just about every single article is worth bookmarking), but you could use it on any feed that Yahoo! Pipes can access. I use it on my Instapaper starred items feed, among other things.

Step 1

Determine your Google Reader User ID by selecting your Shared Items feed in Google Reader:

Once you've selected this feed, take note of the URL in the address bar of your browser. It includes a string that is your Google Reader User ID. The ID contains only numbers. That "F" in front of it is not part of the ID and the "%" after it is not part of the ID. Copy this ID down somewhere as you'll need it later.
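If you would rather not pick the ID out by eye, something like the following Python sketch could do it. It assumes the slashes in the address are URL-encoded as "%2F" (which would explain the stray "F" and "%" around the ID) and that the ID sits in a "user/<digits>/" segment. The example URL and the function name are made up for illustration.

```python
import re
from urllib.parse import unquote


def reader_user_id(url):
    """Extract the numeric user ID from a Google Reader shared-items URL."""
    decoded = unquote(url)  # turns "%2F" back into "/"
    match = re.search(r"user/(\d+)/", decoded)
    if match is None:
        raise ValueError("no numeric user ID found in URL")
    return match.group(1)


# Made-up URL of the assumed shape; the ID is not a real account.
example = (
    "http://www.google.com/reader/view/"
    "user%2F01234567890123456789%2Fstate%2Fcom.google%2Fbroadcast"
)
print(reader_user_id(example))
```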

Step 2

Create your Pipe by cloning http://pipes.yahoo.com/esjewett/feed_to_webhook_using_google_reader

Step 3

Populate the user input fields of the Yahoo! Pipe with the feed you want to use ("http://daringfireball.net/feeds/articles" in our case), the Google Reader User ID from step 1, and the label you are going to make publicly accessible in Google Reader in a later step. The label you choose is important. It needs to be a label that is used for only this purpose and only this feed. Make it unique and call it something that will remind you of its purpose. I'll call mine "Daring Fireball Articles".

Once you've done all this, click the "Run Pipe" button.

Step 4

You should at this point get a list of the latest Daring Fireball articles, or whatever is in the feed that you've chosen to use. Click the "Add to Google Reader" button to add the Pipe results to Google Reader.

Step 5

Now you should be in Google Reader staring at your newly subscribed feed. Click the "Feed settings" button.

Then choose "New Folder..." from the bottom of the list of options, and name the folder whatever you put in as the "Google Reader Label" above. In our case, it is "Daring Fireball Articles".

You should see the feed on the left sidebar, under the folder you just created.

Step 6

Now we have all the infrastructure in place to actually do something with this feed. But we have not yet defined the action that our webhook Pipe will execute. So we need to tweak this pipe slightly. Go back to the pipe. You'll find your cloned version (and you did clone it, didn't you?) at the top of http://pipes.yahoo.com/pipes/person.info.

Edit the pipe.

At the lower-right corner of the edit screen is a loop operator with no module or pipe defined.

Drag any pipe or valid module into this loop. This action will be called exactly once for every new item in the feed you have just defined. If you want to post to Tarpipe, I recommend taking a look at the Pipe http://pipes.yahoo.com/esjewett/post_to_tarpipe_1_0_api (you'll have to clone it as well), which will post each item to tarpipe, using fields you specify as the title and body of the post. But you could call an arbitrary pipe that makes a call to a web service or even YQL.

I am using this Pipe to call a Tarpipe that posts every Daring Fireball article into Evernote automatically (the Tarpipe workflow key is fake, so don't get any ideas :-) ).

That's it.

With this method of setup, the pipe will not process entries that already exist in the feed, but it will send any new entries through the pipe you have assigned to the loop.

One limitation of this particular pipe is that it will not work reliably for feeds that are updated often. This is simply because Google Reader doesn't poll often enough. I have observed that Google Reader polls this feed every 4-8 hours. If more than 8 items are added between polls, older items will not be picked up by Google Reader and will not be processed by the pipe.
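As a rough back-of-the-envelope check, the numbers above (a window of 8 items and a poll every 4-8 hours) translate into a maximum posting rate. The Python sketch below assumes items arrive at a steady rate, which real feeds rarely do, so a burst of posts can still lose items. The function names are made up for illustration.

```python
def max_items_per_hour(window_size, poll_interval_hours):
    """Highest steady posting rate at which no item is lost between two polls."""
    return window_size / poll_interval_hours


def feed_is_safe(items_per_hour, window_size=8, worst_poll_interval_hours=8):
    """True if a steady posting rate fits in the window at the slowest polling."""
    return items_per_hour * worst_poll_interval_hours <= window_size


print(max_items_per_hour(8, 8))  # slowest observed polling: 1.0
print(max_items_per_hour(8, 4))  # fastest observed polling: 2.0
print(feed_is_safe(0.5))         # True
print(feed_is_safe(3))           # False
```

In other words, a feed that averages more than about one new item an hour is a poor fit for this approach.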

Ok, that's it. For real this time.

Tarpipe with YQL

I've been complaining for months that there is apparently no way to get a Yahoo! Pipes -> Tarpipe connection. What I'm really looking to do is drive a Tarpipe workflow using an RSS feed, so when a new item shows up in the RSS or Atom feed it will kick off a Tarpipe workflow for that item.

Unfortunately Tarpipe doesn't natively support feeds to kick off a workflow. It allows for the use of email or a specifically formatted request to one of its two APIs. Yahoo! Pipes meanwhile can consume RSS feeds second to none, but it is limited in the after-effects that it can trigger. Pipes does provide a web service operator that can send a request to an arbitrary URL, but the person building the pipe has very little control over the format of the request, so this is not a satisfactory way to interact with arbitrary APIs, and it doesn't work with Tarpipe's API.

For a couple of months I've been wondering if the relatively new Yahoo! Query Language (YQL) might be the missing link. In perusing the YQL documentation yesterday I noticed that YQL now supports INSERT, UPDATE, and DELETE operations in addition to its SELECT operation. Given the built in ability to make state-changing requests to APIs specified in YQL's Open Data Table (ODT) format, I decided to give it a shot.

I'll be darned, but it works.

I threw up an Open Data Table definition for the Tarpipe 1.0 API on Github, based on the provided YQL examples and the very understandable YQL documentation. Open Data Table definitions are XML files that specify the possible operations on a "table" as well as optional Javascript to be executed when the operation is called.

The complete Open Data Table for the Tarpipe 1.0 API (with the exception of the "image" field) is now available on my Github fork of spullara's fork of the yql-tables repository. Whew. Gotta love Github for creating a simple interface to a tangle of forks.

The stable raw XML file is here.

This definition specifies an ODT that responds to a query of the format

INSERT into tarpipe.process_1_0 (workflow_key,title,body) VALUES ("workflow_key_here","This is my title","This is the body")

Given a valid Tarpipe workflow key, which is found in the definition of a Tarpipe workflow based on a REST receptor, this call will kick off the workflow with the provided title and body parameters.

Since this ODT is not currently included in the standard environment, we also have to specify the ODT definition to use, so the complete query we need to send to YQL looks like

USE "http://www.esjewett.com/files/tarpipe.process_1_0.xml"; INSERT into tarpipe.process_1_0 (workflow_key,title,body) VALUES ("workflow_key_here","This is my title","This is the body")

And that's it. I (or rather Github) have already provided the hosted ODT definition, so all you need to do is run the YQL query above in order to kick off a Tarpipe workflow. The query can be run in any of the manners allowed by the YQL system, including the console available here. Simply replace the text 'workflow_key_here' (keep the double-quotes) with your Tarpipe workflow key and you should be good to go.
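If you want to generate the query text from a script, a small Python helper along these lines might do. It only builds the string and does not send anything to YQL. The function name is made up, and because I'm not certain how YQL expects double quotes inside values to be escaped, this sketch simply refuses values that contain one.

```python
TABLE_URL = "http://www.esjewett.com/files/tarpipe.process_1_0.xml"


def build_tarpipe_query(workflow_key, title, body, table_url=TABLE_URL):
    """Build the USE + INSERT statement for the tarpipe.process_1_0 table."""
    fields = (("workflow_key", workflow_key), ("title", title), ("body", body))
    for name, value in fields:
        # Escaping rules are not assumed here; reject embedded double quotes.
        if '"' in value:
            raise ValueError(f"{name} must not contain a double quote")
    return (
        f'USE "{table_url}"; '
        "INSERT into tarpipe.process_1_0 (workflow_key,title,body) "
        f'VALUES ("{workflow_key}","{title}","{body}")'
    )


print(build_tarpipe_query("workflow_key_here", "This is my title", "This is the body"))
```

The printed statement can then be pasted into the YQL console or sent by whatever client you prefer.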

Next up, I'm hoping we'll be able to go through how to use this technique to wire Yahoo! Pipes directly into Tarpipe workflows. I'm not quite there yet as it's tough to get Pipes to ignore old entries, but I think we can figure something out...

UPDATE (7 Aug, 2009) - Due to an obscure incompatibility between Ruby on Rails versions higher than 2.1 and the Yahoo! infrastructure, the github.com versions of the open data table XML files no longer work. I am now hosting the XML file necessary for this blog at http://www.esjewett.com/files/tarpipe.process_1_0.xml. I've changed the code and links above to point to this file.

Twitter -> Pipes -> ESME

This is a quick how-to, showing how to route your Twitter feed through Yahoo! Pipes to filter and cleanse it, then consume it using ESME actions.

I've created a parametrized Yahoo! Pipe here where you can input your Twitter username and get back a feed of all mentions of "ESME", "esme", or "Esme" from your Twitter timeline.

Pipe with input for Twitter username

(Pipe address: http://pipes.yahoo.com/pipes/pipe.info?_id=fJZ29qrr3RGjLPJI6icw5g)

Once the "Run Pipe" button is pressed, the filtered time-line displays. I've also removed the username from the beginning of the Tweets, for better display in ESME.

Pipe after running

Feel free to use this pipe with your username, or "Clone" it and modify it as you like. The structure of the Pipe as it is now is available as an image here.
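For readers who think better in code than in Pipes modules, the filtering can be sketched in Python roughly as follows. This is an approximation and not a translation of the Pipe: it assumes each item starts with a "username: " prefix, it strips that prefix before matching, and the function name and sample tweets are made up.

```python
ESME_VARIANTS = ("ESME", "esme", "Esme")


def filter_esme(tweets, username):
    """Strip the leading "username: " prefix and keep tweets that mention ESME."""
    prefix = f"{username}: "
    kept = []
    for text in tweets:
        if text.startswith(prefix):
            text = text[len(prefix):]
        # Keep the tweet only if one of the three spellings appears in it.
        if any(variant in text for variant in ESME_VARIANTS):
            kept.append(text)
    return kept


# Made-up timeline for a hypothetical user.
timeline = [
    "someuser: Trying out ESME actions today",
    "someuser: Lunch was great",
    "someuser: New #esme build is up",
]
for tweet in filter_esme(timeline, "someuser"):
    print(tweet)
```

Stripping the prefix first means a username that happens to contain "esme" does not cause every tweet to match.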

Once the pipe has been run, it is possible to retrieve the output as an RSS feed. Copy the link for the RSS feed for this pipe, as we will need to consume it using an ESME action.

Copying the link for the RSS feed of the pipe

Now navigate to your ESME server. We need to use a relatively recent version of ESME, specifically a version including the recently merged actions branch. I'm using http://esmecloudserverapache.dickhirsch.staxapps.net/ in this example.

Log in to the ESME server and create a new action. Fill in the action name as desired. For the test, use the "every N mins" test. I've chosen 5 minutes as my interval, which means that the pipe feed will be queried every 5 mins for new entries.

For the Action, use the rss: action type, followed directly (no space) by the URL of the Pipes RSS feed that we copied earlier. Your action form should look like this.

ESME action form filled in with Yahoo! Pipe information
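The action string itself is just the rss: prefix glued onto the feed URL. A tiny Python sketch of that rule, with a made-up function name and a placeholder URL, could look like this:

```python
def rss_action(feed_url):
    """Build an ESME action string: "rss:" followed directly by the feed URL."""
    url = feed_url.strip()
    if not url or any(ch.isspace() for ch in url):
        raise ValueError("feed URL must be non-empty and contain no whitespace")
    return "rss:" + url


# Placeholder URL; use the RSS link copied from your own pipe.
print(rss_action("http://example.com/feed.rss"))
```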

Click the "Add" button and your recent Tweets containing the word "ESME" will start showing up on the ESME server. One neat aspect of this is that any hashtags (#esme for example) in your Twitter updates will be converted to ESME tags automatically.

This only works for public Twitter updates, as Yahoo! Pipes doesn't support authentication for feeds. You could query your private Twitter feed directly from the ESME server, embedding your Twitter username and password in the URL, but I strongly recommend against this as your username and password will be stored in the ESME server database, which you do not control.

Pluggability examples in the consumer web

As I was reading about Google's new Google Visualization open-wire protocol API (a mouthful to be sure) and new research from Accenture indicating that "millennials" route around enterprise IT departments (don't I know it), my thoughts turned to how IT can architect its systems so that they stop behaving as roadblocks for information workers and start behaving as enablers.

I still believe that the basic problem is one of control, but the technologies are emerging to allow for architectures that satisfy demands for control without hamstringing capability. I'm thinking of standards like OAuth and standard APIs like OpenSocial or the Google Visualization API here.

A couple thoughts on this topic:

  • Trust (between systems) in the enterprise has been all or nothing. With the emergence of SaaS and the integration of consumer services, this approach doesn't work anymore. Access delegation capabilities are becoming essential. OAuth is part of the answer to this issue. Standards, architectures, and UIs that can handle multiple delegation are probably the end-state solution.
  • System plug-ability or mash-ability is a key component. No single vendor can deliver adequate functionality for information workers, thinkers, or process innovators across industries. Areas like these are simply not well enough understood or structured to provide a delivered solution. Real SOA and mashups, allowing users to define the architecture by dynamically plugging and reorganizing information and process flows between systems, are the answer here. This type of self-service SOA for data and process is the next step after self-service reporting on data that the user can't really do anything about. Since we haven't totally figured out how to do the reporting yet, I'm not optimistic that we are getting close here, but it's a goal worthy of some work. Thingamy is one view of this world on the process side. Yahoo! Pipes and Tarpipe are examples of this on the information side.

This post has turned into more of a note to myself than anything else, but there's a long tradition of using a blog for this type of activity, and since I can't seem to post in any other way, I might as well throw this up!
