pipes

Make a webhook out of anything

(or ... "How I learned to stop worrying about doing it right, and just make the damn thing work")

I've had this problem for a while:

I use a great service called Instapaper (try it, seriously) for keeping track of my reading list. Which is great. But once I'm done reading something in Instapaper, I want stuff to happen to it. Certain items should be posted to Twitter, stored in Evernote, or squirreled away in Diigo, del.icio.us, or Pinboard.

This just isn't very achievable. While Instapaper is totally awesome, it does not provide automatic posting to all (or any) of these sites. It doesn't even provide Webhooks, which might provide the ... hook ... to allow for this sort of posting via some service like Yahoo! Query Language (YQL) or Tarpipe. What Instapaper provides is an RSS feed of shared items.

I've previously reviewed how to use YQL to post to Tarpipe, which solves part of the problem. It is possible to use this technique to consume a feed in Yahoo! Pipes and then post each item to Tarpipe. But this isn't really what I want to do, because it will post each item to Tarpipe each time the feed is read. Which is going to result in a lot of duplicate Tweets, Evernote notes, or whatever else I'm having Tarpipe do.

What we need is a Yahoo! Pipe that will only call the special Tarpipe (or other) YQL when there is a new item in the feed. Pipes isn't very good at this, but in steps Google Reader. Pretty much all that Google Reader does is query a feed occasionally and keep track of when a new item appears.

My strategy (and it works, bless Google and Yahoo!'s hearts) is to use Google Reader to check a Pipe and cache the items it has seen in a publicly accessible label. This is a little circular, but the very Pipe that Google Reader is checking pulls the feed I want to webhookify and compares the contents of the feed to the contents of the Google Reader label. If the Pipe sees any items in the feed that aren't yet stored in the Google Reader label, it does its magic on only those items.
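At its core, the comparison the Pipe performs is a set difference: keep only the feed items that the Google Reader label has not stored yet. The following is a minimal Python sketch of that logic, not the Pipe itself. It assumes items are plain dicts and that the `link` field identifies an item (a GUID would work just as well). The names `new_items`, `feed`, and `cached` are hypothetical.

```python
def new_items(feed_items, seen_items, key="link"):
    """Return the feed items whose `key` is not among the already-seen items.

    feed_items: items currently in the feed we want to webhookify.
    seen_items: items Google Reader has already cached under the label.
    key: field that identifies an item (an assumption; a GUID also works).
    """
    seen = {item[key] for item in seen_items}
    # Preserve feed order so the action fires in the order items appear.
    return [item for item in feed_items if item[key] not in seen]


feed = [
    {"title": "Article C", "link": "http://example.com/c"},
    {"title": "Article B", "link": "http://example.com/b"},
    {"title": "Article A", "link": "http://example.com/a"},
]
cached = [
    {"title": "Article B", "link": "http://example.com/b"},
    {"title": "Article A", "link": "http://example.com/a"},
]

for item in new_items(feed, cached):
    print(item["title"])  # prints "Article C", the only unseen item
```

Because the cache is whatever Google Reader has already stored, an item drops out of the "new" set as soon as Reader has polled it. This is what stops the duplicate Tweets and notes.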

I've made the Pipe that does this public at http://pipes.yahoo.com/esjewett/feed_to_webhook_using_google_reader but getting it working takes a little doing, which is described below with screenshots. I'll demonstrate using a public feed (Daring Fireball's main article feed, because just about every single article is worth bookmarking), but you could use it on any feed that Yahoo! Pipes can access. I use it on my Instapaper starred items feed, among other things.

Step 1

Determine your Google Reader User ID by selecting your Shared Items feed in Google Reader:

Once you've selected this feed, take note of the URL in your browser's address bar. It includes a string that is your Google Reader User ID. The ID contains only digits. The "F" in front of it is not part of the ID (it is the tail end of "%2F", a URL-encoded slash), and neither is the "%" after it. Copy this ID down somewhere, as you'll need it later.
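If you'd rather not pick the digits out by eye, the extraction can be done mechanically. The following is a small Python sketch. It assumes the address contains a URL-encoded stream name of the form `user%2F<digits>%2F...`, which is consistent with the "F" and "%" described above. The example URL and its 20-digit ID are made up, and `reader_user_id` is a hypothetical helper.

```python
import re
from urllib.parse import unquote


def reader_user_id(url):
    """Extract the numeric Google Reader user ID from a Shared Items URL.

    Assumes the URL contains something like
    'user%2F<digits>%2Fstate%2Fcom.google%2Fbroadcast'.
    """
    # Decoding turns each '%2F' back into '/', so the ID sits between 'user/' and '/'.
    match = re.search(r"user/(\d+)/", unquote(url))
    if match is None:
        raise ValueError("no Google Reader user ID found in URL")
    return match.group(1)


# Hypothetical address-bar contents; the ID is invented for illustration.
url = ("http://www.google.com/reader/view/#stream/"
       "user%2F01234567890123456789%2Fstate%2Fcom.google%2Fbroadcast")
print(reader_user_id(url))  # 01234567890123456789
```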

Step 2

Create your Pipe by cloning http://pipes.yahoo.com/esjewett/feed_to_webhook_using_google_reader

Step 3

Populate the user input fields of the Yahoo! Pipe with the feed you want to use ("http://daringfireball.net/feeds/articles" in our case), the Google Reader User ID from step 1, and the label you are going to make publicly accessible in Google Reader in a later step. The label you choose is important. It needs to be a label that is used for only this purpose and only this feed. Make it unique and call it something that will remind you of its purpose. I'll call mine "Daring Fireball Articles".

Once you've done all this, click the "Run Pipe" button.

Step 4

You should at this point get a list of the latest Daring Fireball articles, or whatever is in the feed that you've chosen to use. Click the "Add to Google Reader" button to add the Pipe results to Google Reader.

Step 5

Now you should be in Google Reader staring at your newly subscribed feed. Click the "Feed settings" button.

Then choose "New Folder..." from the bottom of the list of options, and give the folder the same name you entered as the "Google Reader Label" above. In our case, it is "Daring Fireball Articles".

You should see the feed on the left sidebar, under the folder you just created.

Step 6

Now we have all the infrastructure in place to actually do something with this feed. But we have not yet defined the action that our webhook Pipe will execute, so we need to tweak the Pipe slightly. Go back to it. You'll find your cloned version (you did clone it, didn't you?) at the top of http://pipes.yahoo.com/pipes/person.info.

Edit the pipe.

At the lower-right corner of the edit screen is a loop operator with no module or pipe defined.

Drag any pipe or valid module into this loop. This action will be called exactly once for every new item in the feed you have just defined. If you want to post to Tarpipe, I recommend taking a look at the Pipe http://pipes.yahoo.com/esjewett/post_to_tarpipe_1_0_api (you'll have to clone it as well), which will post each item to tarpipe, using fields you specify as the title and body of the post. But you could call an arbitrary pipe that makes a call to a web service or even YQL.
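The loop's contract is "call the action exactly once per new item, mapping chosen fields onto the post." The following is a rough Python analogue of that contract. The field names `title` and `description` and the helpers `run_loop` and `print_action` are assumptions for illustration, and `print_action` only stands in for a real web-service call such as a Tarpipe post.

```python
def run_loop(items, action, title_field="title", body_field="description"):
    """Call `action` once per item, like the Pipes loop operator.

    title_field / body_field choose which item fields become the post's
    title and body (hypothetical names, mirroring the Tarpipe pipe's inputs).
    """
    results = []
    for item in items:
        post = {"title": item.get(title_field, ""), "body": item.get(body_field, "")}
        results.append(action(post))
    return results


def print_action(post):
    # Stand-in for an outgoing web-service call; it just reports what it would send.
    print(f"POST title={post['title']!r}")
    return post["title"]


items = [{"title": "Article C", "description": "Body of C"}]
run_loop(items, print_action)  # prints: POST title='Article C'
```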

I am using this Pipe to call a Tarpipe that posts every Daring Fireball article into Evernote automatically (the Tarpipe workflow key is fake, so don't get any ideas :-)

That's it.

With this setup, the pipe will not process entries that are already in the feed, but it will run any new entries through the pipe you have assigned to the loop.

One limitation of this particular pipe is that it will not work reliably for feeds that are updated often. This is simply because Google Reader doesn't poll often enough. I have observed that Google Reader polls this feed every 4-8 hours. If more than 8 items are added between polls, older items will not be picked up by Google Reader and will not be processed by the pipe.
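The arithmetic behind this limitation is that an item is lost if it scrolls out of the visible feed before any poll sees it. The following small Python simulation illustrates this. It assumes the feed only ever exposes its newest `window` items. The 8-item window is taken from the observation above, not from any documented Google Reader behavior, and `missed_items` is a hypothetical helper.

```python
def missed_items(published, polls, window):
    """Return item IDs that are never visible at any poll.

    published: (hour, item_id) pairs in publication order.
    polls: hours at which the feed is polled.
    window: how many of the newest items the feed shows at once (assumed).
    """
    seen = set()
    for t in polls:
        # Everything published so far, newest last; only the last `window` are visible.
        visible = [item for hour, item in published if hour <= t][-window:]
        seen.update(visible)
    last_poll = max(polls)
    return [item for hour, item in published if hour <= last_poll and item not in seen]


# Ten items published one per hour between polls at hour 0 and hour 10.
published = [(h, f"item{h}") for h in range(1, 11)]
print(missed_items(published, polls=[0, 10], window=8))  # ['item1', 'item2']
```

Adding a poll at hour 5 (`polls=[0, 5, 10]`) makes the result an empty list. In short, the only fix is polling more often than items arrive, and that is outside this pipe's control.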

Ok, that's it. For real this time.

Introducing jsglue

Last weekend I pushed out the very first version of something I'm calling jsglue to GitHub. It now lives here: http://github.com/esjewett/jsglue/tree/master

jsglue is in essence a framework for implementing web-connective applications a la Yahoo! Pipes and Tarpipe. Currently it is at best a complement to those programs and at worst totally useless. In the future I would like to see it or something like it become an alternative to these tools, for a few reasons that I'll eventually get into in later posts.

jsglue does three things:

  • It allows you to register a handler to a path.
    • The handler consists of a path and two pieces of javascript: one that constructs a response to a request sent to that path, and one that constructs a request (and, in the future, optionally multiple requests) to another URL.
  • It accepts HTTP requests to paths with registered handlers.
    • When this happens, it creates a response using the handler javascript for this purpose, and it adds a job to a stack that will be processed later.
  • It provides a program that can be run periodically to process the stack of jobs that has built up, sending off new requests as specified by the javascript in the handler associated with each job.

That's it.
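To make the three duties concrete, the following is a toy Python model of them. It is not jsglue's code, since jsglue is written in Ruby and uses Johnson to run the JavaScript. In this sketch, plain Python callables stand in for the two pieces of handler javascript, and the names `Glue`, `register`, `accept`, and `process_jobs` are hypothetical.

```python
from collections import deque


class Glue:
    """Toy model of jsglue's three duties; not its actual API."""

    def __init__(self):
        self.handlers = {}   # path -> (respond, forward)
        self.jobs = deque()  # requests waiting to be forwarded

    def register(self, path, respond, forward):
        # respond(request) -> response body
        # forward(request) -> (url, payload) for the outgoing request
        self.handlers[path] = (respond, forward)

    def accept(self, path, request):
        # Raises KeyError if no handler is registered for this path.
        respond, _ = self.handlers[path]
        self.jobs.append((path, request))  # queued; nothing is sent yet
        return respond(request)

    def process_jobs(self, send):
        """Run periodically: forward every queued job via send(url, payload)."""
        sent = 0
        while self.jobs:
            path, request = self.jobs.popleft()
            _, forward = self.handlers[path]
            send(*forward(request))
            sent += 1
        return sent


glue = Glue()
glue.register(
    "/tarpipe",
    respond=lambda req: "queued",
    forward=lambda req: ("http://example.com/hook", {"title": req["title"]}),
)
print(glue.accept("/tarpipe", {"title": "Hello"}))  # queued
outbox = []
glue.process_jobs(lambda url, payload: outbox.append((url, payload)))
print(outbox)  # [('http://example.com/hook', {'title': 'Hello'})]
```

This sketch drains jobs first-in, first-out. The post calls the job store a stack, and it doesn't say which order jsglue itself uses.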

Why do I care?

Well, hopefully that will become clear of its own accord. But the key is that the full contents of the original request are exposed to the javascript processing script that constructs the new request. As such, you can do pretty much any kind of processing you like within this handler code, which is user-defined.

So why do you care? Let me count the ways:

1. Receive a request in JSON and spit it back out as multipart form-encoded data (in fact, right now this is pretty much the only thing you can do). Ever tried to connect up Yahoo! Pipes with Tarpipe? It doesn't work. With this, it can.

2. Webhooks are great. Webhooks are the facility to have a web application issue an HTTP request to an arbitrary URL when some event happens in the web application. That sounds boring, but it's actually awesome. Webhooks are great, except that no one speaks the same language so every webhook-based solution is bespoke. Bespoke is great in a suit or a coffee mug, but it's bad in web infrastructure. Yahoo! Pipes can't understand webhook calls. Tarpipe usually can't understand them. Most other web applications can't understand them. There needs to be some sort of middle-person.

3. If I'm going to run a ton of my personal data through some middleman web application, you should have the option of running that web application yourself. I'm not saying you will, but I think it would be nice if you could.

Okay, 3 ways is enough for now. We'll get to more later.
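The first use case, JSON in and multipart form-encoded data out, is mostly a re-encoding job. The following is a minimal Python sketch of that translation step. It is independent of jsglue, which does this in user-supplied javascript. The function name is hypothetical, and the sketch assumes a flat JSON object whose values can be turned into strings, so nested structures and file uploads are not handled.

```python
import json
import uuid


def json_to_multipart(json_text, boundary=None):
    """Re-encode a flat JSON object as a multipart/form-data body.

    Returns (content_type, body_bytes). Values are stringified with str().
    """
    fields = json.loads(json_text)
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(str(value))
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = json_to_multipart('{"title": "Hi"}', boundary="XYZ")
print(content_type)  # multipart/form-data; boundary=XYZ
```

A fixed boundary is passed here only to make the output predictable. In real use, the random default avoids collisions with the field contents.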

Why do you *not* care?

Well, there are lots of reasons for that too.

1. This is dorky. No, there is not a UI. No, it doesn't do much of interest. It's an infrastructure prototype more than anything else. The idea is really that we need infrastructure for building applications that can do this sort of thing. I don't have a lot of time to spare, so I'm willing to just put a framework out there, and maybe a REST-only web-application if I can get the components running on a hosting service (harder than it sounds). I'll leave it to someone else to put the UI on top of it. I'm not convinced that the "pipe" metaphor is correct (I'm partial to "tubes" myself), but I don't have a better idea, so someone else will have to have that idea.

2. This code sucks. Yes it does. I urge you to fork it, improve it, or throw up your hands in disgust and start over. I just want something that does this. I don't really care if it's written by me.

3. There's no way this execution model will fly on a public site, and no one is going to run this on their own server. This is sort of a feature of any design that allows your users to execute arbitrary javascript on your server. As such, I think this will primarily find use on private servers, or as a back-end engine for a public site where the inputs are carefully cleansed. Not a recipe for ultra-popularity, I'll grant. But that's not really the point either.

So what's it made out of?

Currently, there are only four main ingredients:

  • Ruby is the implementation language. Its role in jsglue is to serve as duct tape for the other components.
  • Datamapper is the database interface, allowing you to use pretty much any supported database (I'm using SQLite at the moment).
  • Sinatra for the HTTP web-service interfaces. These interfaces are pretty much a direct mapping onto the database. (REST-ful? Maybe.) (Incidentally, how is it that a minuscule Ruby web-framework beats out FRANK SINATRA in the Google rankings?)
  • Johnson for the Javascript processing.

That's it. It's a couple hundred lines of code. I haven't really counted, or put it on Ohloh.com for that matter, but it can't be more than that. It's got some unit tests. It's going to be changing quickly as I make it more multi-purpose.

I'll document and post examples as they become available.

Twitter -> Pipes -> ESME

This is a quick how-to, showing how to route your Twitter feed through Yahoo! Pipes to filter and cleanse it, then consume it using ESME actions.

I've created a parametrized Yahoo! Pipe here where you can input your Twitter username and get back a feed of all mentions of "ESME", "esme", or "Esme" from your Twitter timeline.

Pipe with input for Twitter username

(Pipe address: http://pipes.yahoo.com/pipes/pipe.info?_id=fJZ29qrr3RGjLPJI6icw5g)

Once the "Run Pipe" button is pressed, the filtered time-line displays. I've also removed the username from the beginning of the Tweets, for better display in ESME.

Pipe after running

Feel free to use this pipe with your username, or "Clone" it and modify it as you like. The structure of the Pipe as it is now is available as an image here.
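The Pipe's two transformations amount to stripping the username from the front of each Tweet and keeping only those that mention ESME. The following is a rough Python equivalent of those two steps, for readers who would like to see them spelled out. It assumes each feed title looks like "username: text", which is how Twitter's RSS feeds formatted entries. The username and Tweets below are invented.

```python
import re

# The three spellings the Pipe filters on.
MENTION = re.compile(r"ESME|esme|Esme")


def esme_filter(titles, username):
    """Drop the leading 'username: ' and keep only titles that mention ESME."""
    prefix = f"{username}: "
    kept = []
    for title in titles:
        # Strip first, so a username that happens to contain "esme" can't match.
        text = title[len(prefix):] if title.startswith(prefix) else title
        if MENTION.search(text):
            kept.append(text)
    return kept


titles = [
    "someuser: Trying the new #esme actions",
    "someuser: lunch time",
    "someuser: ESME on Stax",
]
print(esme_filter(titles, "someuser"))  # ['Trying the new #esme actions', 'ESME on Stax']
```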

Once the pipe has been run, it is possible to retrieve the output as an RSS feed. Copy the link for the RSS feed for this pipe, as we will need to consume it using an ESME action.

Copying the link for the RSS feed of the pipe

Now navigate to your ESME server. We need to use a relatively recent version of ESME, specifically a version including the recently merged actions branch. I'm using http://esmecloudserverapache.dickhirsch.staxapps.net/ in this example.

Log in to the ESME server and create a new action. Fill in the action name as desired. For the test, use the "every N mins" test. I've chosen 5 minutes as my interval, which means that the pipe feed will be queried every 5 minutes for new entries.

For the Action, use the rss: action type, followed directly (no space) by the URL of the Pipes RSS feed that we copied earlier. Your action form should look like this.

ESME action form filled in with Yahoo! Pipe information

Click the "Add" button and your recent Tweets containing the word "ESME" will start showing up on the ESME server. One neat aspect of this is that any hashtags (#esme for example) in your Twitter updates will be converted to ESME tags automatically.
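ESME performs the hashtag-to-tag conversion itself. The following sketch only illustrates the idea and is not ESME's code: pulling hashtags out of an update takes a single regular expression. The pattern `#(\w+)` is an assumption and ignores edge cases such as tags adjacent to URLs.

```python
import re


def hashtags(text):
    """Return the hashtags in an update, without the leading '#'."""
    return re.findall(r"#(\w+)", text)


print(hashtags("Trying the new #esme actions with #pipes"))  # ['esme', 'pipes']
```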

This only works for public Twitter updates, as Yahoo! Pipes doesn't support authentication for feeds. You could query your private Twitter feed directly from the ESME server, embedding your Twitter username and password in the URL, but I strongly recommend against this as your username and password will be stored in the ESME server database, which you do not control.

Pluggability examples in the consumer web

As I was reading about Google's new Google Visualization open-wire protocol API (a mouthful to be sure) and new research from Accenture indicating that "millennials" route around enterprise IT departments (don't I know it), my thoughts turned to how IT can architect its systems so that they stop behaving as roadblocks for information workers and start behaving as enablers.

I still believe that the basic problem is one of control, but the technologies are emerging to allow for architectures that satisfy demands for control without hamstringing capability. I'm thinking of standards like OAuth and standard APIs like OpenSocial or the Google Visualization API here.

A couple thoughts on this topic:

  • Trust (between systems) in the enterprise has been all or nothing. With the emergence of SaaS and the integration of consumer services, this approach doesn't work anymore. Access delegation capabilities are becoming essential. OAuth is part of the answer to this issue. Standards, architectures, and UIs that can handle multiple delegation are probably the end-state solution.
  • System plug-ability or mash-ability is a key component. No single vendor can deliver adequate functionality for information workers, thinkers, or process innovators across industries. Areas like these are simply not well enough understood or structured to provide a delivered solution. Real SOA and mashups, allowing users to define the architecture by dynamically plugging and reorganizing information and process flows between systems, are the answer here. This type of self-service SOA for data and process is the next step after self-service reporting on data that the user can't really do anything about. Since we haven't totally figured out how to do the reporting yet, I'm not optimistic that we are getting close here, but it's a goal worthy of some work. Thingamy is one view of this world on the process side. Yahoo! Pipes and Tarpipe are examples of this on the information side.

This post has turned into more of a note to myself than anything else, but there's a long tradition of using a blog for this type of activity, and since I can't seem to post in any other way, I might as well throw this up!
