tarpipe

Make a webhook out of anything

(or ... "How I learned to stop worrying about doing it right, and just make the damn thing work")

I've had this problem for a while:

I use a great service called Instapaper (try it, seriously) for keeping track of my reading list. Which is great. But I want stuff to happen to certain items once I'm done reading something in Instapaper. They should be posted to Twitter, stored in Evernote, or squirreled away in Diigo, del.icio.us, or Pinboard.

This just isn't very achievable. While Instapaper is totally awesome, it does not provide automatic posting to all (or any) of these sites. It doesn't even provide Webhooks, which might provide the ... hook ... to allow for this sort of posting via some service like Yahoo! Query Language (YQL) or Tarpipe. What Instapaper provides is an RSS feed of shared items.

I've previously reviewed how to use YQL to post to Tarpipe, which solves part of the problem. It is possible to use this technique to consume a feed in Yahoo! Pipes and then post each item to Tarpipe. But this isn't really what I want to do, because it will post each item to Tarpipe each time the feed is read. Which is going to result in a lot of duplicate Tweets, Evernote notes, or whatever else I'm having Tarpipe do.

What we need is a Yahoo! Pipe that will only call the special Tarpipe (or other) YQL when there is a new item in the feed. Pipes isn't very good at this, but in steps Google Reader. Pretty much all that Google Reader does is query a feed occasionally and keep track of when a new item appears.

My strategy (and it works, bless Google and Yahoo!'s hearts) is to use Google Reader to check a Pipe and cache the items it has seen in a publicly accessible label. This is a little circular, but the very Pipe that Google Reader is checking pulls the feed I want to webhookify and compares the contents of the feed to the contents of the Google Reader label. If the Pipe sees any items in the feed that aren't yet stored in the Google Reader label, it does its magic on only those items.
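The comparison at the heart of this can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Pipe does internally: the function names are made up, and it assumes each item can be identified by its link.

```python
def new_items(feed_items, seen_items):
    """Return the feed items whose link is not among the already-seen items."""
    # Assumption: the link uniquely identifies an item.
    seen_links = {item["link"] for item in seen_items}
    return [item for item in feed_items if item["link"] not in seen_links]


def run_webhook(feed_items, seen_items, action):
    """Call action exactly once for each unseen item and return those items."""
    fresh = new_items(feed_items, seen_items)
    for item in fresh:
        action(item)
    return fresh
```

Here feed_items stands in for the feed being webhookified, seen_items for the contents of the Google Reader label, and action for whatever you drop into the loop later on.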

I've made the Pipe that does this public at http://pipes.yahoo.com/esjewett/feed_to_webhook_using_google_reader but getting it working takes a little doing, which is described below with screenshots. I'll demonstrate using a public feed (Daring Fireball's main article feed, because just about every single article is worth bookmarking), but you could use it on any feed that Yahoo! Pipes can access. I use it on my Instapaper starred items feed, among other things.

Step 1

Determine your Google Reader User ID by selecting your Shared Items feed in Google Reader:

Once you've selected this feed, take note of the URL in the address bar of your browser. It includes a string that is your Google Reader User ID. The ID contains only numbers. That "F" in front of it is not part of the ID and the "%" after it is not part of the ID. Copy this ID down somewhere as you'll need it later.
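If you'd rather not pick the ID out by eye, a small regular expression does it. This is a sketch: the function name is made up and the example URL is invented for illustration, so the only thing it relies on is the description above (a run of digits with an "F" in front and a "%" after).

```python
import re


def extract_reader_user_id(url):
    """Return the run of digits that sits between an 'F' and a '%', or None."""
    match = re.search(r"F(\d+)%", url)
    return match.group(1) if match else None


# Hypothetical URL, made up for this example.
example = "http://www.google.com/reader/view/user%2F01234567890123456789%2Fstate"
user_id = extract_reader_user_id(example)
```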

Step 2

Create your Pipe by cloning http://pipes.yahoo.com/esjewett/feed_to_webhook_using_google_reader

Step 3

Populate the user input fields of the Yahoo! Pipe with the feed you want to use ("http://daringfireball.net/feeds/articles" in our case), the Google Reader User ID from step 1, and the label you are going to make publicly accessible in Google Reader in a later step. The label you choose is important. It needs to be a label that is used for only this purpose and only this feed. Make it unique and call it something that will remind you of its purpose. I'll call mine "Daring Fireball Articles".

Once you've done all this, click the "Run Pipe" button.

Step 4

You should at this point get a list of the latest Daring Fireball articles, or whatever is in the feed that you've chosen to use. Click the button to add the Pipe results to Google Reader.

Click the "Add to Google Reader" button.

Step 5

Now you should be in Google Reader staring at your newly subscribed feed. Click the "Feed settings" button.

Then choose "New Folder..." from the bottom of the list of options, and name the folder whatever you put in as the "Google Reader Label" above. In our case, it is "Daring Fireball Articles".

You should see the feed on the left sidebar, under the folder you just created.

Step 6

Now we have all the infrastructure in place to actually do something with this feed. But we have not yet defined the action that our webhook Pipe will execute. So we need to tweak this pipe slightly. Go back to the pipe. You'll find your cloned version (you did clone it, didn't you?) at the top of http://pipes.yahoo.com/pipes/person.info.

Edit the pipe.

At the lower-right corner of the edit screen is a loop operator with no module or pipe defined.

Drag any pipe or valid module into this loop. This action will be called exactly once for every new item in the feed you have just defined. If you want to post to Tarpipe, I recommend taking a look at the Pipe http://pipes.yahoo.com/esjewett/post_to_tarpipe_1_0_api (you'll have to clone it as well), which will post each item to tarpipe, using fields you specify as the title and body of the post. But you could call an arbitrary pipe that makes a call to a web service or even YQL.

I am using this Pipe to call a Tarpipe that posts every Daring Fireball article into Evernote automatically (the Tarpipe workflow key is fake, so don't get any ideas :-) ).

That's it.

With this setup, the pipe will not process entries that already exist in the feed, but it will run any new entries through the pipe you have assigned to the loop.

One limitation of this particular pipe is that it will not work reliably for feeds that are updated often. This is simply because Google Reader doesn't poll often enough. I have observed that Google Reader polls this feed every 4-8 hours. If more than 8 items are added between polls, older items will not be picked up by Google Reader and will not be processed by the pipe.
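The arithmetic behind this limitation is simple enough to write down. The sketch below takes the window of 8 items and the 8-hour worst-case polling interval from the observations above; the function names are made up, and the real numbers may differ for your feed.

```python
def missed_items(added_between_polls, feed_window=8):
    """Items that scroll out of the feed before the next poll sees them."""
    return max(0, added_between_polls - feed_window)


def max_safe_rate(feed_window=8, poll_interval_hours=8):
    """Highest posting rate (items per hour) at which nothing is missed."""
    return feed_window / poll_interval_hours
```

With these defaults, a feed that averages more than about one new item an hour is likely to lose items when Google Reader is at its slowest.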

Ok, that's it. For real this time.

Tarpipe with YQL

I've been complaining for months that there is apparently no way to get a Yahoo! Pipes -> Tarpipe connection. What I'm really looking to do is drive a Tarpipe workflow using an RSS feed, so when a new item shows up in the RSS or Atom feed it will kick off a Tarpipe workflow for that item.

Unfortunately Tarpipe doesn't natively support feeds to kick off a workflow. It allows for the use of email or a specifically formatted request to one of its two APIs. Yahoo! Pipes meanwhile can consume RSS feeds second to none, but it is limited in the after-effects that it can trigger. Pipes does provide a web service operator that can send a request to an arbitrary URL, but the person building the pipe has very little control over the format of the request, so this is not a satisfactory way to interact with arbitrary APIs, and it doesn't work with Tarpipe's API.

For a couple of months I've been wondering if the relatively new Yahoo! Query Language (YQL) might be the missing link. In perusing the YQL documentation yesterday I noticed that YQL now supports INSERT, UPDATE, and DELETE operations in addition to its SELECT operation. Given the built in ability to make state-changing requests to APIs specified in YQL's Open Data Table (ODT) format, I decided to give it a shot.

I'll be darned, but it works.

I threw up an Open Data Table definition for the Tarpipe 1.0 API on Github, based on the provided YQL examples and the very understandable YQL documentation. Open Data Table definitions are XML files that specify the possible operations on a "table" as well as optional Javascript to be executed when the operation is called.

The complete Open Data Table for the Tarpipe 1.0 API (with the exception of the "image" field) is now available on my Github fork of spullara's fork of the yql-tables repository. Whew. Gotta love Github for creating a simple interface to a tangle of forks.

The stable raw XML file is here.

This definition specifies an ODT that responds to a query of the format

INSERT into tarpipe.process_1_0 (workflow_key,title,body) VALUES ("workflow_key_here","This is my title","This is the body")

Given a valid Tarpipe workflow key, which is found in the definition of a Tarpipe workflow based on a REST receptor, this call will kick off the workflow with the provided title and body parameters.

Since this ODT is not currently included in the standard environment, we also have to specify the ODT definition to use, so the complete query we need to send to YQL looks like

USE "http://www.esjewett.com/files/tarpipe.process_1_0.xml"; INSERT into tarpipe.process_1_0 (workflow_key,title,body) VALUES ("workflow_key_here","This is my title","This is the body")
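If you are generating this query from a program, it can be assembled with a small helper. This is a sketch only: the function names are made up, it builds the query text without sending it anywhere, and it simply refuses values containing double quotes rather than guessing at YQL's escaping rules.

```python
ODT_URL = "http://www.esjewett.com/files/tarpipe.process_1_0.xml"


def quoted(value):
    """Wrap a value in double quotes; escaping is out of scope for this sketch."""
    if '"' in value:
        raise ValueError("double quotes are not handled in this sketch")
    return '"' + value + '"'


def build_tarpipe_query(workflow_key, title, body, odt_url=ODT_URL):
    """Build the USE ...; INSERT ... statement for the tarpipe.process_1_0 table."""
    values = ",".join(quoted(v) for v in (workflow_key, title, body))
    return (
        f"USE {quoted(odt_url)}; "
        f"INSERT into tarpipe.process_1_0 (workflow_key,title,body) "
        f"VALUES ({values})"
    )


query = build_tarpipe_query("workflow_key_here", "This is my title", "This is the body")
```

The resulting string is the same query shown above and can be pasted into the YQL console or sent however you normally run YQL.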

And that's it. I (or rather Github) have already provided the hosted ODT definition, so all you need to do is run the YQL query above in order to kick off a Tarpipe workflow. The query can be run in any of the manners allowed by the YQL system, including the console available here. Simply replace the text 'workflow_key_here' (keep the double-quotes) with your Tarpipe workflow key and you should be good to go.

Next up, I'm hoping we'll be able to go through how to use this technique to wire Yahoo! Pipes directly into Tarpipe workflows. I'm not quite there yet as it's tough to get Pipes to ignore old entries, but I think we can figure something out...

UPDATE (7 Aug, 2009) - Due to an obscure incompatibility between Ruby on Rails versions higher than 2.1 and the Yahoo! infrastructure, the github.com versions of the open data table XML files no longer work. I am now hosting the XML file necessary for this blog at http://www.esjewett.com/files/tarpipe.process_1_0.xml. I've changed the code and links above to point to this file.

Introducing jsglue

Last weekend I pushed out the very first version of something I'm calling jsglue to Github. It now lives here: http://github.com/esjewett/jsglue/tree/master

jsglue is in essence a framework for implementing web-connective applications a la Yahoo! Pipes and Tarpipe. Currently it is at best a complement to those programs and at worst totally useless. In the future I would like to see it or something like it become an alternative to these tools, for a few reasons that I'll eventually get into in later posts.

jsglue does three things:

  • It allows you to register a handler to a path.
    • The handler consists of a path and two pieces of javascript - one that constructs a response to a request sent to that path, and one that constructs one request (and in the future multiple requests, optionally) to another URL.
  • It accepts HTTP requests to paths with registered handlers.
    • When this happens, it creates a response using the handler javascript for this purpose, and it adds a job to a stack that will be processed later.
  • It provides a program that can be run periodically to process the stack of jobs that has built up, sending off new requests as specified by the javascript in the handler associated with each job.

That's it.
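To make the three pieces concrete, here is a toy model in Python. It is not jsglue's code (jsglue is Ruby with JavaScript handlers): the class and method names are invented, plain Python callables stand in for the two pieces of handler JavaScript, and the job stack lives in memory instead of a database.

```python
class Glue:
    """Toy model: register handlers, accept requests, process jobs later."""

    def __init__(self):
        self.handlers = {}  # path -> (respond, build_request)
        self.jobs = []      # stack of (path, request) pairs awaiting processing

    def register(self, path, respond, build_request):
        """Attach a response builder and an outgoing-request builder to a path."""
        self.handlers[path] = (respond, build_request)

    def receive(self, path, request):
        """Answer an incoming request and queue a job; None if no handler."""
        if path not in self.handlers:
            return None
        respond, _ = self.handlers[path]
        self.jobs.append((path, request))
        return respond(request)

    def process_jobs(self, send):
        """Run periodically: build and send one new request per queued job."""
        while self.jobs:
            path, request = self.jobs.pop()
            _, build_request = self.handlers[path]
            send(build_request(request))
```

A caller would register a handler once, let receive answer incoming requests as they arrive, and run process_jobs on a schedule with a send function that actually makes the outgoing HTTP call.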

Why do I care?

Well, hopefully that will become clear of its own accord. But the key is that the full contents of the original request are exposed to the javascript processing script that constructs the new request. As such, you can do pretty much any kind of processing you like within this handler code, which is user-defined.

So why do you care? Let me count the ways:

1. Receive a request in JSON and spit it back out multi-part form-encoded (in fact, right now this is pretty much the only thing you can do). Ever tried to connect up Yahoo! Pipes with Tarpipe? It doesn't work. With this, it can.

2. Webhooks are great. Webhooks are the facility to have a web application issue an HTTP request to an arbitrary URL when some event happens in the web application. That sounds boring, but it's actually awesome. Webhooks are great, except that no one speaks the same language so every webhook-based solution is bespoke. Bespoke is great in a suit or a coffee mug, but it's bad in web infrastructure. Yahoo! Pipes can't understand webhook calls. Tarpipe usually can't understand them. Most other web applications can't understand them. There needs to be some sort of middle-person.

3. If you're going to run a ton of your personal data through some middleman web application, you should have the option of running that web application yourself. I'm not saying you will, but I think it would be nice if you could.

Okay, 3 ways is enough for now. We'll get to more later.
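For a feel of what the first item involves, here is a rough Python sketch of turning a JSON body into a multipart form-encoded one. It is not jsglue's handler code: the function name is made up, and it assumes a flat JSON object whose field names and values are simple text.

```python
import json
import uuid


def json_to_multipart(json_text, boundary=None):
    """Re-encode a flat JSON object as a multipart/form-data body.

    Returns (content_type_header_value, body_bytes).
    """
    fields = json.loads(json_text)
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(str(value))
    lines.append("--" + boundary + "--")  # closing boundary
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return "multipart/form-data; boundary=" + boundary, body
```

The returned content type goes in the outgoing request's Content-Type header and the bytes go in its body.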

Why do you *not* care?

Well, there are lots of reasons for that too.

1. This is dorky. No, there is not a UI. No, it doesn't do much of interest. It's an infrastructure prototype more than anything else. The idea is really that we need infrastructure for building applications that can do this sort of thing. I don't have a lot of time to spare, so I'm willing to just put a framework out there, and maybe a REST-only web-application if I can get the components running on a hosting service (harder than it sounds). I'll leave it to someone else to put the UI on top of it. I'm not convinced that the "pipe" metaphor is correct (I'm partial to "tubes" myself), but I don't have a better idea, so someone else will have to have that idea.

2. This code sucks. Yes it does. I urge you to fork it, improve it, or throw up your hands in disgust and start over. I just want something that does this. I don't really care if it's written by me.

3. There's no way this execution model will fly on a public site, and no one is going to run this on their own server. This is sort of a feature of the design: it allows your users to execute arbitrary javascript on your server. As such, I think this will primarily find use on private servers, or as a back-end engine for a public site where the inputs are carefully cleansed. Not a recipe for ultra-popularity, I'll grant. But that's not really the point either.

So what's it made out of?

Currently, there are only four main ingredients:

  • Ruby is the implementation language. Its role in jsglue is to serve as duct-tape for the other components.
  • Datamapper is the database interface, allowing you to use pretty much any supported database (I'm using SQLite at the moment).
  • Sinatra for the HTTP web-service interfaces. These interfaces are pretty much a direct mapping onto the database. (REST-ful? Maybe.) (Incidentally, how is it that a minuscule Ruby web-framework beats out FRANK SINATRA in the Google rankings?)
  • Johnson for the Javascript processing.

That's it. It's a couple hundred lines of code. I haven't really counted, or put it on Ohloh.com for that matter, but it can't be more than that. It's got some unit tests. It's going to be changing quickly as I make it more multi-purpose.

I'll document and post examples as they become available.
