Paul Donnelly: Using Yahoo! Pipes and the YQL Module


>>PAUL DONNELLY: What I love about Pipes
is that it’s fun to use. I think the initial attraction to Pipes, for most people, is the
editor. The editor is this visual tool where you drag on these modules,
and then you drag these wires from one module to another to basically build your
data flow. It’s kind of like building blocks. As you use Pipes more and more, it actually
becomes really fast to build your custom data feed. Sometimes it’s even faster than if you were
to code it out yourself. Not only is Pipes fun to use, it’s also very powerful. You can
create new data, you can augment data, decrement data, combine data, and filter on data. There are some limitations with Pipes, and
some of these limitations are solved by using YQL within Pipes. I’ll talk about that a little
bit later. How many people here have used Pipes before?
OK, awesome. For those of you that haven’t used it, you’ll learn how to create a Pipe
here, and those of you that have used Pipes before, hopefully you’ll learn a little bit
more about it. How many people have used YQL before? OK, a lot more people. Excellent.
You can create more powerful Pipes by using YQL within Pipes, and you’ll just have a little
bit more of an advantage because you know how to use YQL. This is the overview, what I’m going to talk
to you about today. I’ll give you a little introduction to Pipes — what is Pipes, why
you should use Pipes. Creating a Pipe — I’m going to start out with a simple Pipe creation.
Extending the functionality with Pipes by using YQL — there are three different ways
that you could go about doing that, and I’ll show you how to actually use YQL within Pipes
using those modules. And finally, using the Pipes data. So what is Pipes? Well, if you haven’t been
there before, you can go to pipes.yahoo.com to actually check it out. Pipes is a visual
programming tool that allows you to create and transform data from the internet. I think
the best way to describe that to you is to show you what a Pipe looks like. Here’s
an example of the editor with a simple Pipe. It’s using the Fetch Feed module and it’s
bringing in two RSS feeds, and you simply drag a wire and you connect it to the Pipe
output to get your single master feed. This is an example of a simple Pipe. It can get
complex, and really complex. And they can grow quite large as well. So what is Pipes again? Pipes is a visual
programming tool that allows you to create and transform data from the internet. People
usually ask me, “what is Pipes?”, and I give them that answer. The next question usually
is “what can I do with Pipes, how will it help me?” and “what do Pipes users currently
do?” Here are some of the popular use cases for Pipes. The number one use case for Pipes
is feed aggregation. This is where you bring in a whole bunch of RSS feeds or data feeds
from REST endpoints and you can filter on it. The second one is two source mashups. An example
of this is if you bring in, let’s say, a Craigslist RSS feed and you want to see a price point
against Amazon. This is another popular use case. Data transformation and geocoding. This is
where you might have a news feed and you want to augment it; you’re basically making that
data more rich so you can actually apply a geocoding filter to it. If you apply it to
a map case you can actually see pinpoints of where a particular article might be. Complex mashups using REST APIs and YQL. An
example of that is, let’s say you want to create a custom search Pipe, you can enter
let’s say a city and this Pipe will actually go out and fetch the weather for a particular
city, Twitter feeds, news APIs, what have you. These are the four popular use cases
for Pipes right now. I’m going to just quickly go through how to
actually get to the Pipes editor. You go to pipes.yahoo.com. You need to be logged in
to Yahoo!. To access the editor, you simply click on the Create a Pipe link.
This is the Pipe editor. This area is where
you drag the modules onto this workspace area, and this dark gray area is where you see the
output of your Pipe. On the left hand corner here is where you can grab your external data,
either RSS, a REST API, YQL data, or Flickr data. You could create your own data using the Item
Builder module. There are a lot of things that you can access data from. CSV data as
well. These are the User Inputs modules. This is
where users can enter data into your Pipe if you needed to. They can enter text, URLs,
that kind of thing. These are the Operator modules. This is the set of
modules you use the most; these modules will be used to massage the data into the format
that you need. You have a URL Builder module — it helps you construct URLs. Some String
modules that help you with string manipulation. You can apply the Translate module to that; that
would translate your feed or data. There’s a Term Extractor module which will actually
extract special keywords from text. We have some Date modules that will help you build
date values and format dates, and we have a location builder that will help you build
geolocated data, and a simple math function. Here’s the list of favorite Pipes that I’ve
tagged throughout the Pipe site, and here’s a list of my Pipes that I’ve created. To drag a module onto the workspace you just
click and drag it. It’s really easy to do. You can drag more than one module on. I’m
just going to demonstrate how you actually wire up a module. It’s really easy to do.
The receptor will light up, showing you where you can actually connect a particular module.
If it doesn’t light up that means you can’t connect to it. This is just a basic framework
of what a Pipe might look like. There’s no data on here, it’s just an empty Pipe with
some modules. So that’s the Pipes editor. After you save a Pipe, every Pipe gets its
own landing page. A landing page is basically where you can see your Pipe output.
By default every Pipe gets a list view; this is a list rendering of what your Pipe might
have in HTML. If there happens to be image data, you’ll get an image tab and it’ll actually
show you a slideshow of what images are in your feed. It also shows you on the left hand
column here some meta-data of the Pipe. It shows you when it was published, how many
clones were made from this Pipe. You can share your Pipe through various social networks.
You could tag your Pipe so it’s easier to find. It shows what sources are used within
the Pipe and what modules were used to build the Pipe, and a little thumbnail of what it
actually looks like. This is also where you can delete your Pipe,
edit your Pipe, publish your Pipe, and clone your Pipe. This is also the area where you
grab the links that you will use in your application. Let’s say if I want to import this Pipe into
Google Reader, there’s a convenient Google button there, as well as Get as RSS if you
want to import that into your blog or website. There’s also a MyYahoo! button there that will
import this Pipe into your MyYahoo! modules. Get as JSON is the link that you’ll use if
you want to use this data in a JavaScript application or any kind of application that
you might need. If there’s geolocated data, you’ll get a… Actually, sorry, this is the
image view of what this Pipe looks like, and here is a map view of geolocated data. So
again, the Pipe’s landing page is basically an easy place to see, share, and change your
data for Pipes. Why would I use Pipes? Well, it’s easy to
use. There’s no new language to learn; you just drag and connect modules together. It’s
a Yahoo! hosted service, so you don’t have to rely on your own server. It’s hosted at
Yahoo!, and you’re leveraging a large infrastructure. We have multiple servers around the world;
if you were to access your data, it’s just a lot quicker than if you were to host it
on your ISP which might just be here in California. And Yahoo! has a large internet Pipe, so it’s
easy for us to access data faster than if you were to access data on your ISP, which
might be bogged down by other services. We also provide easy to use REST style URLs,
so it’s easy to get the data that you need. A lot of data out on the internet is good,
but there are some pieces of data out there that might be malicious, and the Pipes engine
actually sanitizes that code, it cleans it, and if a feed happens to be malformed with
the wrong encoding type, the Pipes engine will actually repair it into UTF-8 correctly. After a Pipe is run it’s cached for performance,
so you don’t have to wait for it to render. It’s really quick. A lot of us here learned HTML by viewing source
in a browser, and we’d take this same analogy to Pipes — you can view Pipe source. This
is the way developers learn. If you happen to see a Pipe that you like, you can actually
clone it or copy it and further tweak it to provide the data that you need. Pipes has a graphical editor that makes it
quick to create Pipes. You just use a browser, you don’t have to download software or install
a plugin. You just kind of go. It uses HTML5, the Canvas tag, to draw the actual wiring.
This was done before HTML5 was a buzzword. Pipes also parses and consumes popular formats
such as XML, JSON, CSV, and iCal, that a lot of REST APIs out there provide. We had some recent Pipe enhancements. We have
a new engine which was launched this past August. Previously it was a Perl-based engine
which didn’t scale very well; it was kind of slow. It was completely rewritten using
Java, and it’s the same engine as YQL. We also migrated from MySQL to the cloud data
storage NoSQL system. It’s called Sherpa in Yahoo!. Before, if you created a Pipe and then you
saved it, you might see some lag when you went to your landing page. This is because
there was some replication lag between the MySQL servers. With Sherpa it’s now pretty
instant — you can save a Pipe and almost immediately run it. It’s also quicker to load
your Pipes in the editor. We also upgraded the Pipes Badges to YUI3,
just to make it more current and more stable. Pipes Badges are these snippets where you
can put the list map or image views onto your website or blog. It’s just an easy convenient
way of doing that. So that was a brief introduction on Pipes.
There are many, many talks about Pipes if you want to dig further. Let’s go about creating a Pipe. For this example
I’m going to create a custom news feed featuring iPhone-only articles.
I’m going to aggregate from four different RSS sources, I’m going to filter the titles
only showing iPhone in the title. I’m then going to sort it by date, showing the newest
articles first, and I’m going to unique the titles so I don’t show duplicates. Again, here’s the Pipes homepage. Let’s create
a Pipe. You need to be logged in again, and click on Create a Pipe. I’m going to drag
on the Fetch Feed module and enter in Engadget. Hit refresh, see the output in the debugger.
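The plan just described (aggregate, filter on the title, sort newest first, drop duplicate titles) can also be sketched in ordinary code. This is only an illustration in Python, not how the Pipes engine works internally; the function name, the `title` and `published` keys, and the sample items are all assumptions made for the example.

```python
import re
from datetime import datetime


def build_iphone_feed(feeds):
    """Aggregate feeds, keep iPhone titles, sort newest first, dedupe by title."""
    # Fetch Feed step: merge every source into one list of items.
    items = [item for feed in feeds for item in feed]
    # Filter step: (?i) is an inline modifier that makes the match case insensitive.
    pattern = re.compile(r"(?i)iphone")
    items = [item for item in items if pattern.search(item["title"])]
    # Sort step: descending on the publication date, newest to oldest.
    items.sort(key=lambda item: item["published"], reverse=True)
    # Unique step: keep only the first item seen for each title.
    seen = set()
    unique = []
    for item in items:
        if item["title"] not in seen:
            seen.add(item["title"])
            unique.append(item)
    return unique


# Hypothetical sample data standing in for two fetched feeds.
feed_a = [
    {"title": "New iPhone case review", "published": datetime(2011, 11, 1)},
    {"title": "Android tablet roundup", "published": datetime(2011, 11, 2)},
]
feed_b = [
    {"title": "IPHONE battery tips", "published": datetime(2011, 11, 3)},
    {"title": "New iPhone case review", "published": datetime(2011, 11, 2)},
]
result = build_iphone_feed([feed_a, feed_b])
titles = [item["title"] for item in result]
```

Each step in the function corresponds to one module in the editor, which is why the order of the steps matters: the sort runs before the dedupe, so the newest copy of a repeated title is the one that is kept.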
To add more feeds to the Fetch Feed module, I simply click on the plus button and it adds
another field. I’m going to add Mashable in there. Currently I have 40 items. Hit refresh,
70. Keep on clicking URL. Add CrunchGear. I have 90 items now, and I’m going to add
Ubergizmo. I have my 4 feeds that I now am aggregating
from, and I want to filter these items. I’ll use the Filter module, and I want to permit
items that match all the following — not any, but all. Here’s a rule set. I can add
multiple rules, but for this example I’m going to use one. I want to make sure that the title
contains iPhone. I’m going to use the Regex matching because it’s a little more powerful
than contains. I’ll use an inline modifier that indicates case insensitivity. Out of
the 120, now I have 3. Now I want to sort it by date, and I notice
that there’s a Y:published node here, so I’m going to sort on that using the Sort module.
Type in Y:published, and I want it in descending order, newest to oldest. I check the pub date
here: November 3rd, November 2nd, and November 1st, so it looks like my sort worked. Then I want to make sure that the titles are
unique. All of these are unique now, but I don’t know that in the future because they’re
coming from different sources, and I want them unique based on title. So I drag on the
Unique module. To finalize the Pipe I connect it to the Pipe
output module, hit refresh. It looks like that’s the data that I want. I can press the
layout button so that actually formats the modules a little bit nicer, and hit the save
button. I’ll call it iPhone News Feed. I’ll hit Run Pipe, which takes me to the landing
page. Because there are images in this Pipe I’ll see an image tab, and here’s the default
list tab. If I want to use the RSS, I can see the raw RSS feed from this Pipe, put that
into my news reader of choice. I could also edit this landing page to a custom name to
make it easier to share. I’ll call it iPhone News Feed. That’s how you create a simple
Pipe. Again, this is the URL if you want to check
out that Pipe to see how it was built. For this Pipe I used the Filter module, which
uses Regex and inline modifiers. It’s more powerful. It’s a little harder to use than
just contains, which is case sensitive. But using the Regex option, you could add the
inline modifier (?i) to make sure it’s case insensitive for a particular keyword. If you
see Y:published in your output, you know to sort on that because some normalization happened
with the dates. That means you pulled in either an RSS feed or an Atom feed, and the Pipes
engine basically normalized that date to Y:published, and you would sort on that. If you’re only
using RSS feeds then you could sort on pubDate, or something like that. Here are the various Pipes outputs that are
available to you. RSS, JSON, and Serialized PHP are always available. CSV, iCal, and KML
are optional depending on if that Pipe supports that output. The cool thing with KML is that
you can actually paste the KML Pipes output into the search bar in Google Maps. Here’s
an example of that. Here’s a Pipe map, and I’ll go to more options, copy the link, Get
as KML, I’ll go to Google Maps, paste in the KML link there, and hit search. Now Google
Maps is being powered by Yahoo! Pipes to show the data and have the same pinpoints. You
can zoom in and see the callouts for the data. Pipes does have some limitations. It can’t
authenticate OAuth based systems, and there’s no script module, meaning you can’t use a
traditional programming language to process simple utility type functions. These limitations
can be solved by using YQL within Pipes. This is a really brief intro on YQL. What
is YQL? You can go to that URL to check out the YQL console. YQL is this webservice that
allows developers to query, filter, and combine data. It’s very similar to Pipes; it’s basically
the inverse. It’s not graphical based but text based. YQL uses an SQL-like syntax. It
was made this way so it’s familiar to developers, so you didn’t have to learn a whole brand
new language to use YQL. YQL allows server-side JavaScript executions, so we can actually
perform our utilities there. It’s able to use OAuth for authentication where Pipes can’t
currently. These are the three modules that you can use
in Pipes to bring in YQL data. It’s the Fetch Data module, the YQL module, and the Webservice
module. I’ll go through each module to show you how it works. Extending Pipes with YQL, using the Fetch
Data module — why would I use this? Well, this module supports https, so if YQL gives
you https endpoints, this is the only module that will actually accept it. The other two
don’t support https. It’s easy to paste in a YQL endpoint; you just copy the REST
endpoint from the YQL console into the Fetch Data module. It has this really cool path
to item list convenience, so you can drill down to the data that you actually want. For this example I’m going to use Twitter.
It’s going to list tweets from people that I follow. For this example I need to go to
dev.twitter.com first to get four items: I need to get my consumer key, consumer secret,
my access token, and my access token secret. You have to create an application in Twitter
to get these keys. I’m going to copy these four keys, go to my YQL console. I’m going
to use the twitter.statustimeline.friends table, and I’m going to use the Query Builder
which is this tab on the right hand side to give me a form like interface to actually
input the keys that I got from Twitter. Then I’ll hit the test button and it’ll actually
create the query for me so I don’t have to type it in. After it forms the query I’ll
hit test, and I’ll see that the data is returned. It looks correct to me. Then at the bottom here is the REST query
endpoint that I’m going to use within my Fetch Data module in Pipes. I’m going to copy this
link, and I’m actually going to show a little demo on how to do that. Here’s the console.
I can see that I’m getting the correct data. I go down to the REST query, I’m going to
copy it, go to the Pipes editor, drag on the Fetch Data module, paste it in, and I can
see in my debugger that I got some output. I want to expand on that. I have results,
statuses, status. What I really want is just the status nodes, so I’m going to use the
path to item list to drill down on that data, because I don’t need the data that’s on top. Now I have the list that I want to operate
on. Because this isn’t RSS I’m going to need to format it into RSS. The text node here
is the tweet, and the user node has the description of the username that I also want to bubble
up in my RSS feed. I’m going to drag on the Rename module, connect that together, and
I’m going to construct my title and description tags. I know I want my title to be the tweet,
so I’m going to enter item.text. I want to rename that to title since we’re using RSS.
I want the user’s description to be in my description node, so I’m going to rename user.description
to just description. I’m also going to want a placeholder for my
friend’s name, and create a temporary variable node called YouName. These
are the title and description tags that were just created: the tweet and the description.
Now I want to find the name of the user, and I’m going to put that into a temporary node
called YouName. That’s how I’m going to actually prepend the title with the user’s name. In
order to prepend the user’s name in front of the title I have to use the Regex module,
connect that together, and in the title I want to prepend with the user’s name. I can
use a variable substitution ${} to actually bring in that variable. Here we have the user’s
name : and their tweets. To finalize the Pipe you have to drag it to
the Pipe output and hit save. I’ll name it Twitter Friends, and I’ll run the Pipe. Here
we have the list view. We have the people that I follow, their username, their tweet,
and a description of who they are. I could use the Get as RSS links and Get as JSON to
use that data within my application or news reader. That’s how you go about using the
Fetch Data module. This is the link to that example. Again, because
we’re not dealing with RSS data we need to create the title and description nodes. You
could use the ${node_here} variable substitution to find those nodes and kind
of construct the data that you want using the Regex module. Pipes actually caches data
for 30 minutes, so it’s not exactly real time but it’s still pretty current. The next one is the YQL module. Why would
I use this one? Well, it’s the easiest one to use. This is what the YQL module looks
like. I could just type in my query. For this example, I’m going to actually go to IPO Home,
using a select star from HTML where the URL equals the URL that I want, and I’m going
to use XPath to drill down to the data that I actually want for my application. Pipes
currently doesn’t support XPath, but you can use YQL to get that data for you. Here’s a Pipe with the YQL module. I’ll start
the demo. You just drag it on, and here’s the query. Basically you’re extracting specific
information from this website — I basically want to find all upcoming IPOs. Again, I need
to use a Regex module to construct the RSS feed. When I use the YQL module it actually
imports the HTML as nodes, and I can use the variable substitution to actually drill down
to that and construct my title, link, and description nodes for RSS. From IPO Home I’m actually seeing upcoming
IPOs: Groupon, which launch day, and all IPOs that might have launched this week. I don’t
believe that IPO Home actually offers an RSS feed that features this set of data, so I
went out and created it myself. Here’s the list view. You can use this within your RSS
reader. You could see what IPOs are coming up so you can buy them or not, and that’s
through RSS. Again, here’s a link to that example: you
can go to pjdonnellywork/ipos. And again, we use the YQL module and the Regex module,
using the ${} for the variable substitution. The final way to use YQL within Pipes is using
the Webservice module. The Webservice module basically allows you to export
Pipes data to another service that’s not on Yahoo! or within Yahoo! to extend the functionality
of the Pipe. Why would I use the Webservice module? Well, it’s good for when you need
to extend Pipe’s utility functions such as randomizing a feed, or getting a string or
word count from a title, etc. This is probably for more experienced Pipes and YQL users. For this example I’m going to count the words
in a title. You should probably be familiar with YQL table creation. Here’s the YQL table
based template that I’m going to use to actually further process my Pipe. You’ll use this template
every time when using the Webservice module with Pipes, so you could basically copy and
paste this when you create a new table. You want to make sure that the key ID is equal
to data, as this is basically where Pipes puts in the data that YQL will process. To
use the server side JavaScript, you’re going to put all your JavaScript within the execute
block. That’s the only part that you really want to worry about. OK, so here’s a quick demo of how to do that.
Here I have a pre-defined Pipes template in my tables. I’m going to just copy and paste
that since it’s always the same. I’m going to create a new YQL table and paste that template
in. Again, I want to make sure that the key node has ID equals data, and I’m going to
rename this table to word count. You put all of your server side JavaScript
within the execute block. I’m importing a JavaScript parsing library, and I’m going
to parse that JSON data. I make sure that it’s using data there to parse that variable
name. The data that comes in for Pipes is an array, so I’m just going to do a simple
for loop and iterate over that. It’s always going to come in with the array name items,
so we’re going to make sure we iterate over that. This is where I’m going to augment the data;
I’m going to assign and create a new node called word count. To get the word count I’m
just going to use simple JavaScript. I’m going to find the title, and then I’m going to use
the JavaScript split function to get the word count from the title. And we’re done. It’s
as easy as that. I’m going to add in response.object so YQL
sends back the information to Pipes, and use pData. Again, I’m importing the JavaScript
parser library, I’m parsing the JSON, I’m iterating over it, I’m augmenting the data,
and I’m finding out how many words are in the title. I’m going to save this table, refresh
it. I see that the new table is there, and I’m going to change my query to select star
from word count, where data equals @data, and this is how YQL would get the Pipes data.
I’m going to further shorten this by using a query alias; I’ll just call it word count,
so I don’t have this long REST query at the bottom. Now it’s shortened to just word count.
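The word count logic itself is small. The talk does it in server side JavaScript inside the YQL execute block; the sketch below shows the same idea in Python so it can be run anywhere. The function name, the `wordcount` key, and the sample titles are assumptions for illustration; only the `items` array name comes from the talk.

```python
import json


def add_word_counts(payload):
    """Parse a JSON payload with an "items" array and add a word count to each item."""
    data = json.loads(payload)
    for item in data["items"]:
        # split() with no arguments splits on any run of whitespace.
        item["wordcount"] = len(item.get("title", "").split())
    return data


# Hypothetical payload shaped like the data Pipes is described as sending.
sample = json.dumps(
    {"items": [{"title": "Six words are in this title"}, {"title": "Short one"}]}
)
augmented = add_word_counts(sample)
```

The function returns the whole structure with the new key added to every item, which mirrors the augment-and-send-back pattern described for the Webservice module.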
I’m going to copy this REST query and go back to Pipes. I want to find how many words are in each
title from, let’s say, Digg. I’m going to fetch a feed, bring on the Webservice module,
just paste in that endpoint that I created from YQL, hit refresh. You can see that there’s
data there. Again, I want to just work with the items node, so I’m going to drill down
to that result set using path to item list. Hit refresh. Now I’m just dealing with the
items. I can see that the word count is there now, which is provided by YQL, and I have
one, two, three, four, five, six words, and that’s correct. That’ll be for each article.
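Drilling down with path to item list amounts to following a dotted path into the nested response and keeping only the list found at the end. A rough Python sketch of that idea follows; the function name and the shape of the sample response are hypothetical, and wrapping a single item in a list is a design choice of this sketch rather than documented Pipes behavior.

```python
def drill_down(document, path):
    """Follow a dotted path into nested dicts and return the list found there."""
    node = document
    for key in path.split("."):
        node = node[key]
    # A single item is wrapped so the caller always gets a list back.
    return node if isinstance(node, list) else [node]


# Hypothetical response: the wrapper data on top is not needed, only the items.
response = {
    "query": {"count": 2, "results": {"items": [{"title": "a"}, {"title": "b"}]}}
}
items = drill_down(response, "query.results.items")
```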
That meta data is now there. Just see if my Pipe… This is a simple example of how to
use YQL as a Webservice module. To get to this example you can go to that
URL on top: pjdonnellywork/wordcount, and to see the actual XML that was used in the
YQL table creation, just go to my GitHub location. Also at this location, github.com/hapdaniel/pipes,
there are many YQL execute examples for the Pipes Webservice module. Hapdaniel is a developer
who doesn’t work for Yahoo!, but he’s a huge contributor to Pipes and the YQL community.
He’s really good. So how do we use Pipes data? I’ll talk about
the Pipes URL anatomy and a really easy example with YUI and Pipes. Again, this is the landing
page for Pipes. If you want to use this Pipes data in your RSS feed or reader, you would
just click on Get as RSS, but if you want to use Pipes data in a JavaScript application
or any other type of application that accepts JSON, you can copy and paste the Get as JSON
link. When I copy and paste it, it looks like this. _id has the ID of the Pipe. It’s unique.
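Because a Pipes URL is just a base address plus query parameters, it can be put together in code. The Python sketch below assumes the base address in `PIPE_RUN_URL`, a placeholder Pipe ID, and a hypothetical user input called `tickers`; `_id`, `_render`, and `_callback` are the parameters the talk describes.

```python
from urllib.parse import urlencode

# Assumed base address for running a Pipe; treat it as a placeholder.
PIPE_RUN_URL = "http://pipes.yahoo.com/pipes/pipe.run"


def build_pipe_url(pipe_id, render="json", callback=None, **user_inputs):
    """Assemble a Pipe URL from its ID, output format, and any user inputs."""
    params = {"_id": pipe_id, "_render": render}
    # User inputs are the custom parameters a particular Pipe accepts.
    params.update(user_inputs)
    if callback is not None:
        # _callback names the JavaScript function used for JSONP.
        params["_callback"] = callback
    return PIPE_RUN_URL + "?" + urlencode(params)


url = build_pipe_url("PIPE_ID_HERE", callback="handleData", tickers="YHOO")
rss_url = build_pipe_url("PIPE_ID_HERE", render="rss")
```

Changing only the `render` argument switches the output format, which matches the substitution described next.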
_render=json means that I’m going to get JSON output. If I substitute that with RSS or KML,
that’s the type of output I would get. These are the customized parameters that users
can input to your Pipe. This Pipe actually accepts stock tickers. This can be changed
to display stocks that you actually want to see. We provide this JSONP callback. You have
to use _callback=whatever your JavaScript function is going to be. That’s the Pipes URL. If you want to use YUI
with Pipes, you can use the JSONP module. You just use your simple YUI use construct
and then you use the JSONP library. If you go to the yellow code it’s just basically
one line: you do Y.jsonp. The first argument is the Pipes URL, and you want to notice that
_callback is {callback}. This YUI library actually dynamically creates the JSONP callback
name, and then the second argument is the function that will handle the incoming JSON.
This one is just a simple function that’s going to output the data into Y.log or console.log. The other example that’s on the YUI library
right now is using YUI IO. They have a great example on how to use Pipes with that particular
module. That’s the end of my talk. Thank you. [applause]
