Working with Pipes #2: A DIY personalized community with Del.icio.us, Flickr and Google Blog Search

It’s not necessary to develop your own Web 2.0 software infrastructure to
create an independent Web 2.0-powered community online. It’s far
simpler to set a tagging standard for your community to use on existing
networks and then use Yahoo Pipes to pull it all together.

I decided on about a dozen categories to use with my DIY blog aggregator (QuakerQuaker).
I only want to pull in posts that community members have tagged for my
site, so we use a community identifier: a unique prefix that isn’t
likely to be used by others.

This post will show you how to pull in tagged feeds from three sources: the Del.icio.us social bookmarking system, the Flickr photo sharing site and Google Blog Search.

Step 1: Pick a community designator

I’ve been using the community name followed by a dot. The prefix
goes in front of a category description to make a set of unique tags for
the aggregator. When someone wants to add something to the site, they
tag it with this “community.category” tag. In my example, when someone
wants to list a new Quaker blog they use “quaker.blog”, “quaker” being
the community name and “blog” being the category name for the “New Blogs”
page.

Step 2: Collect the community prefix and category name in Pipes


You begin by going into Pipes and pulling over two text inputs: one for
the community prefix, the other for the specific category.

Step 3: Construct these into tags


Now use the “String Concatenation” module to turn these inputs into the
“community.category” model. The community input goes into the top slot,
a dot goes into the second slot and the category input goes into the last slot.

Now, when you have a tag in Flickr with a dot in it, Flickr automatically removes the dot in the resulting RSS feed.
So with Flickr you want your tag to be “communitycategory” without a
dot. Simple enough: just pull another “String Concatenation” module
onto your Pipes workspace. It should look the same except that it
won’t have the middle slot with the dot.
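For readers who think more easily in code, the two concatenations can be sketched in Python. This is only an illustration of the tagging convention, not anything Pipes itself runs; the function names `community_tag` and `flickr_tag` are made up for this example.

```python
def community_tag(community: str, category: str) -> str:
    """Dotted tag, as used for Del.icio.us and Google Blog Search."""
    return f"{community}.{category}"


def flickr_tag(community: str, category: str) -> str:
    """Dot-less variant, since Flickr drops the dot in its RSS feed."""
    return f"{community}{category}"


print(community_tag("quaker", "blog"))  # quaker.blog
print(flickr_tag("quaker", "blog"))     # quakerblog
```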

Step 4: Turn these tags into RSS URLs

Pull three “URLBuilder” modules into Pipes, one for each of the
services we’re going to query. For the Base, use the non-tag-specific
part of the URL that each service uses for its RSS feeds. Here they are:

Del.icio.us: http://del.icio.us/rss/tag
Flickr: http://api.flickr.com/services/feeds
Google Blog Search: http://blogsearch.google.com

Under path elements, put the correct tag: for Del.icio.us and Google it should be the community.category tag, for Flickr the dot-less communitycategory tag.
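As a rough sketch of what a URLBuilder step does with a Base and a path element, here is a small Python version. The `build_url` helper and the `BASES` keys are hypothetical names for this example. It only joins the bases listed above with a tag as described; any extra path segments or query parameters a given service might need are not covered in this post and are not modeled here.

```python
from urllib.parse import quote

# Bases copied from the list above; the dictionary keys are illustrative.
BASES = {
    "delicious": "http://del.icio.us/rss/tag",
    "flickr": "http://api.flickr.com/services/feeds",
    "google": "http://blogsearch.google.com",
}


def build_url(base: str, *path_elements: str) -> str:
    """Join a base URL and percent-encoded path elements with slashes."""
    parts = [base.rstrip("/")]
    parts.extend(quote(element, safe="") for element in path_elements)
    return "/".join(parts)


print(build_url(BASES["delicious"], "quaker.blog"))
print(build_url(BASES["flickr"], "quakerblog"))
```

Percent-encoding each element keeps a tag with spaces or slashes from breaking the URL; a plain tag like "quaker.blog" passes through unchanged.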

Step 5: Fetch and Dedupe

Fetch is the Pipes module that pulls in URLs and outputs RSS feeds. It can also combine them. Send each URLBuilder output into the same Fetch module.

Since it’s possible that you might have duplicate posts, use the “Unique” module to deduplicate entries by URL.
Through a little trial and error I’ve determined that in cases of
duplicates, feeds lower in the Fetch list trump those higher. In the
actual Pipe powering my aggregator I pull a second Del.icio.us feed: my
own. I have that as the last entry in the Fetch list so that I can
personally override every other input.
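The override behavior described above can be modeled in a few lines of Python. This is a sketch of the observed result (later feeds win when URLs collide), not a claim about how Pipes implements Fetch or Unique internally. The `merge_unique` name and the `link` and `title` fields are assumptions for the example.

```python
def merge_unique(feeds):
    """Combine feeds in order, keeping one item per link.

    When the same link appears more than once, the item from the
    feed listed later replaces the earlier one.
    """
    by_link = {}
    for feed in feeds:
        for item in feed:
            by_link[item["link"]] = item
    return list(by_link.values())


community_feed = [
    {"link": "http://example.com/post", "title": "Community description"},
    {"link": "http://example.com/other", "title": "Another post"},
]
my_feed = [
    {"link": "http://example.com/post", "title": "My own description"},
]

# my_feed is listed last, so its version of the shared link wins.
for item in merge_unique([community_feed, my_feed]):
    print(item["title"])
```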

Step 6: Sort by Date

From my experiments, Pipes seems to order the output entries by
descending date, which is probably what you want. But I want to show
how Pipes can work with “dc” data, the “Dublin Core” model that allows
you to extend standard RSS feeds (see yesterday’s post for more on this).

Google Blog Search and Del.icio.us feeds use the “dc:date” field to
record the time when the post was made. Flickr uses “dc:date.Taken” to
pass on the photograph’s metadata about when it was taken. Pipes’
“Rename” module lets you copy both fields into one you create (I’ve
simply used “date”), which you can then run through its “Sort” module.
Again, it’s a moot point since Pipes seems to do this automatically.
But it’s good to know how to manipulate and rename “dc” data if only
because many PHP parsers have trouble laying it out on a webpage.
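Here is a minimal Python sketch of the same copy-then-sort idea, for anyone handling these feeds outside Pipes. It assumes each item is a dictionary carrying either “dc:date” or “dc:date.Taken” as an ISO-style timestamp without a timezone suffix; the function names and sample data are invented for illustration.

```python
from datetime import datetime

DATE_FIELDS = ("dc:date", "dc:date.Taken")


def unify_date(item):
    """Return a copy of item with the first available dc field copied to 'date'."""
    unified = dict(item)
    for field in DATE_FIELDS:
        if field in item:
            unified["date"] = item[field]
            break
    return unified


def sort_by_date(items):
    """Newest first. Assumes every item has one of the two dc fields."""
    dated = [unify_date(item) for item in items]
    return sorted(
        dated,
        key=lambda item: datetime.fromisoformat(item["date"]),
        reverse=True,
    )


items = [
    {"title": "Blog post", "dc:date": "2007-03-01T10:00:00"},
    {"title": "Photo", "dc:date.Taken": "2007-03-02T09:00:00"},
]
for item in sort_by_date(items):
    print(item["date"], item["title"])
```

Copying rather than renaming leaves the original “dc” fields in place, so nothing downstream that expects them breaks.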

Update: it’s all moot. According to a ZDNet blog, “Pipes now automatically appends a pubDate tag to any RSS feed that has any of the other allowable date tags.” This is nice: no need to hack the date every time you want to make a Pipe!

Step 7: Output

The final step for any Pipe is the “Pipe Output” module.

In action

You can see this published Pipe here, and copy and play with it yourself. The result lets you build an RSS feed based on the two inputs.