1. The Architecture and Software behind Pipes

    Now that Pipes has received an open source release, I’d like to present this dive into the architecture of the software. I hope it will be helpful for anyone trying to build a similar system, and especially for developers wanting to customize Pipes CE.

    Software overview

    Pipes’ goal is to provide infrastructure for users to work with data from the internet. It provides a visual interface where data flows from block to block, with each block performing one operation. Internally, each data item is an item in an RSS feed that gets passed from block object to block object. External data input is usually an RSS feed as well, though it is also possible to fetch data from webpages and to send XML or JSON data to webhook blocks. A typical example of what users do with Pipes is combining multiple feeds into one, filtering them, or creating feeds for sites that do not provide them. It clearly stands in the tradition of Yahoo! Pipes, hence the name.

    Pipes example

    Pipes is a Ruby/Sinatra application that uses SQLite as its database engine and Raphaël for the main part of the frontend, the editor. Ruby works well here: it is a very expressive language with a healthy ecosystem around XML (oga, nokogiri), some useful modules for RSS/ATOM normalization (feedparser, feedjira), and feed creation (included with Ruby). Sinatra provides the webserver parts of the system and the HTML views; this part is rather small. SQLite is the data store for everything, including the pipes serialized to JSON and cache data. And finally, Raphaël abstracts creating the SVG part of the javascript powered editor, painting the blocks and manipulating them.

    Portier is used as the auth system. It enables logins without having to store passwords. Users log in with their email address and only have to confirm that they control that address, by either clicking on a link sent to them or, for Gmail, logging in with their Google account. That work is done by an external broker, Pipes just has to check the token sent back by that system.
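    Sketched in Ruby, the check on our side could look roughly like this. This is a deliberately simplified illustration: the function name is made up, and real code must additionally verify the token’s signature against the broker’s published keys and check the nonce.

```ruby
require 'base64'
require 'json'

# Decode the payload of the id_token the Portier broker sends back and
# check its claims. NOTE: a real deployment must also verify the token's
# signature against the broker's published keys and check the nonce; this
# sketch only shows the claim checks. The audience is this site's own URL.
def check_portier_token(id_token, audience)
  payload = JSON.parse(Base64.urlsafe_decode64(id_token.split('.')[1]))
  return nil unless payload['aud'] == audience           # token meant for us?
  return nil unless payload['exp'].to_i > Time.now.to_i  # not expired?
  payload['email']  # the verified address the user logged in with
end
```

    If the function returns an email address, the broker has vouched for it and the user can be logged in under that identity.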

    Backend

    I will start with a description of the backend, because defining the data structure is also how Pipes started.

    It began with the realization that a pipe can be thought of as a tree. The pipe’s output object is the root of the tree, the blocks that connect to its input are its children, and so on, until you reach the blocks that have no inputs, the leaves. A leaf will usually be a Feed or a Download block, though that later expanded to also allow other pipes (represented by a single block), Twitter, or the Webhook block. Running a pipe means asking the root object of the tree for output, which asks its children for input, which recurses through the whole tree until blocks that provide input are encountered. This data then flows back up the tree to the Output block, with each block applying its own operations to the transported data.
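    A minimal Ruby sketch of that flow (the class names and the FeedBlock leaf here are illustrative, not the actual Pipes CE code):

```ruby
# Each block asks its children for data, then applies its own operation.
class Block
  attr_accessor :inputs

  def initialize
    @inputs = []   # child blocks that feed data into this one
  end

  # Pull data through the tree: recurse into the children first,
  # then process whatever they returned.
  def run
    process(@inputs.map(&:run))
  end

  def process(input_feeds)
    input_feeds.flatten  # default: pass all items through unchanged
  end
end

# A leaf block provides data instead of transforming it.
class FeedBlock < Block
  def initialize(items)
    super()
    @items = items
  end

  def process(_)
    @items
  end
end
```

    Calling run on the root of such a tree returns the combined, transformed items of all leaves.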

    So when you look at this pipe, where two Feed blocks fetch some data, one feed is filtered, then both are combined into one feed and that combined feed is set as the pipe’s output:

    pipes example in editor

    You are actually looking at this execution tree:

    pipes example tree

    The code that enables this structure is minimal. There is a Pipe object, defined in pipe.rb. There is also an abstract Block object, defined in block.rb, the parent class for all blocks. Those two classes together define the pipe execution flow. An instantiated subclass of Block has an array of inputs, containing other blocks. When the system runs a stored pipe, this happens:

    1. A Pipe object is created.
    2. That object fetches the JSON structure as stored in the database and sets the output block as its root.
    3. Then it creates the block objects that are the inputs of the current block, by recursively calling createInputs on its children and their children.
    4. To finally get the output, it calls Pipe.run(), which calls Block.run(), which will go through the tree as described above.
    5. The actual data manipulation work is done in a process function each individual block has to define on its own. Look for example at the process function of the FilterBlock.
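    The steps above can be sketched in Ruby like this, assuming a simplified serialization in which every block definition lists the ids of its input blocks; the real Pipes CE JSON format and class names differ in detail:

```ruby
require 'json'

class MiniBlock
  attr_reader :inputs

  def initialize(definition, all_defs)
    @items = definition['items'] || []
    @inputs = []
    create_inputs(definition, all_defs)
  end

  # Step 3: recursively instantiate the blocks this block reads from.
  def create_inputs(definition, all_defs)
    definition['inputs'].each do |id|
      child_def = all_defs.find { |d| d['id'] == id }
      @inputs << MiniBlock.new(child_def, all_defs)
    end
  end

  # Step 4: pull the data through the tree.
  def run
    return @items if @inputs.empty?   # leaf: provide stored items
    @inputs.flat_map(&:run)           # step 5 would transform the data here
  end
end

class MiniPipe
  def initialize(json)
    defs = JSON.parse(json)                              # step 2: stored JSON
    root_def = defs.find { |d| d['type'] == 'output' }   # the output block
    @root = MiniBlock.new(root_def, defs)
  end

  def run
    @root.run
  end
end
```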

    Defining pipe execution like this has a hidden advantage: when you want to see the output of a pipe at a specific node of the tree, at one specific block, all you have to do is set that block as the root element of a new Pipe.

    This describes almost all classes and files of the backend. Additionally, there is server.rb, the entrypoint of the Sinatra application, where all routes as well as some helper functions are defined. twitterclient.rb is a small wrapper around the twitter gem, used by the twitter block. downloader.rb is central for most pipes: it is used by the Feed and the Download blocks to fetch data, and the central cache that prevents requesting data from other servers too often is set here. The Gemfile describes the requirements. config.ru makes the application runnable by the usual ruby commands and also initializes the Moneta session store, which prevents users from being logged out when the server process restarts. HTML templates are stored under views/ and use ERB. The individual blocks are stored under blocks/. Images, Javascript, CSS etc. are in public/, accessible to users.
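    As an illustration of what such a central download cache does, here is a minimal Ruby sketch. The class name, the 10 minute cooldown and the injectable clock are inventions for this example, not the actual downloader.rb code:

```ruby
# Cache downloads per url: the same url is only fetched again after a
# cooldown, protecting the servers pipes pull data from.
class DownloadCache
  def initialize(max_age = 600, clock: -> { Time.now.to_i })
    @max_age = max_age
    @clock = clock      # injectable for testing
    @store = {}         # url => [timestamp, body]
  end

  def fetch(url)
    time, body = @store[url]
    return body if time && @clock.call - time < @max_age  # still fresh
    body = yield                        # the actual HTTP download happens here
    @store[url] = [@clock.call, body]
    body
  end
end
```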

    As you probably noted, this is all bundled into one application. The way a pipe works, it would be easy enough to split the Sinatra part of the application from the Pipes part, into what could also be called a server frontend and a worker backend. That would provide a way to scale horizontally, over multiple processes and even multiple servers. So far this just has not been necessary; upgrading the server has been enough to keep pipes.digital running. But keep that option in mind if you run into performance limitations in deployments with a large number of individual pipes.

    Frontend

    The website uses server side rendering, as provided by Sinatra. But that adequately describes only the starting point and the less important parts of the frontend. The heart is the editor: it gets initialized by an HTML template (views/editor.erb), but then fetches and sends data via AJAX, more akin to a Single Page Application. It is a big SVG canvas powered by Raphaël, with absolutely positioned HTML input elements mixed into it to make the blocks configurable. Its code lives in public/pipes-ui.js.

    The main objects here are blocks and connectors. Connectors can be Inputs or Outputs, and there exist TextInputs and TextOutputs for the text fields that double as parameters when calling a pipe. Let’s look at the Filter block as an example.

    Filter Block

    In simplified form it is defined like this:

    function FilterBlock(x, y) {
        // 1 input, 1 output, at position (x, y), titled 'Filter', 200x150 pixels
        Block.call(this, 1, 1, x, y, 'Filter', 200, 150);
        
        // configure the text box the Block constructor already created
        this.userinputs[0].style.width = '150px';
        this.userinputs[0].required = true;
        this.userinputs[0].placeholder = 'keyword';
        
        // green circle that allows connecting a TextBlock to this parameter
        var textinput = new TextInput(x + 29, y + 37, this);
        this.textinputs.push(textinput);
        
        // additional select element to pick which field to filter on
        var field = document.createElement('select');
        var all = document.createElement('option');
        all.value = 'all';
        all.text = 'all';
        var content = document.createElement('option');
        content.value = 'content';
        content.text = 'item.content';
        
        field.appendChild(all);
        field.appendChild(content);
        
        // position the select absolutely on the canvas, relative to the block
        field.style.position = 'absolute';
        field.setAttribute('data-xoffset', 30);
        field.setAttribute('data-yoffset', 75);
        field.style.left = x + 30 + 'px';
        field.style.top = y + 75 + 'px';
        field.style.width = '150px';
        field.name = 'field';
        document.querySelector('#program').appendChild(field);
        this.userinputs.push(field);
    }
    

    What happens here? First, Block.call creates a Block object. The constructor of the Block object (that’s just the function Block) creates the needed SVG elements. The two 1s define the number of Inputs and Outputs the block will have, x and y are the position on the canvas, 'Filter' the title as shown in the title bar of the block, and finally the width and height are set to 200 and 150.

    The Block constructor also creates a userinputs array and adds one HTML input element to it, a text box. That array is where all the inputs of a block have to be stored; that way, other code like the drag’n drop handlers knows which absolutely positioned inputs belong to which block. This first userinput element is then configured: a width is set, it is marked as required and it gets a placeholder.

    Next, it also gets a TextInput, allowing connections to TextBlocks. That’s the green circle at the left of the text input shown above.

    The code that follows shows how to add additional userinput elements, in this case an HTML select element with two options. Note how elements like this are added to the DOM (since they are absolutely positioned, regular HTML elements) and also to the userinputs array, to link them to the block.

    Not shown here, but working similarly to the userinputs and textinputs arrays, is the deco array, which does the same thing for Raphaël SVG elements, used for example to add text and lines to blocks.

    Another main class, though one you probably won’t need to change, is Connector, the parent of Input and Output, the half-circles at the left and right of blocks. Connectors manage the block connections.

    When a pipe is saved these steps happen:

    1. The function save() is called.
    2. It calls serialize().
    3. serialize() iterates through the blocks array and creates a JSON representation of their current state, including which blocks they connect to.
    4. This JSON array gets sent to the backend.
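    On the backend side, the received payload should at least be checked to be valid JSON before it is written to the database. The helper below is an illustrative sketch, not the actual server.rb code; in server.rb a Sinatra route receives the payload and stores it in SQLite:

```ruby
require 'json'

# Returns true only if the serialized pipe parses as a JSON array
# (the editor serializes the blocks array, so an array is expected).
def valid_pipe_json?(serialized)
  parsed = JSON.parse(serialized)
  parsed.is_a?(Array)
rescue JSON::ParserError
  false
end
```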

    While I’m happy with the resulting user interface, this approach to building it involves a lot of manual coding work and is not modern. It would be quite interesting to see an implementation of this interface based for example on Vue, as long as it involves no javascript build process. Not bloating the development process with such an abomination was a high priority goal when starting the project.

    Database

    SQLite is used to store all data in a file. The file database.rb creates all tables and the functions for accessing them.

    Only two of them matter for understanding how Pipes works: users and pipes. Let’s look at users first:

    CREATE TABLE IF NOT EXISTS users(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT UNIQUE
    );
    

    Every user gets their own id and a field for the email address. Note how the address is not the id, to make it easier to change it later.

    The data for each pipe is stored in the second table:

    CREATE TABLE IF NOT EXISTS pipes(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        pipe TEXT,
        title TEXT,
        description TEXT,
        user INTEGER,
        preview TEXT,
        public BOOLEAN DEFAULT 0,
        publicdate INTEGER DEFAULT 0,
        date INTEGER DEFAULT CURRENT_TIMESTAMP,
        FOREIGN KEY(user) REFERENCES users(id)
    );
    

    The purpose of most of these fields should be clear. Each pipe gets its own id, can have a title and a description and is linked to a user id. For shared pipes public is set to 1. Most important is the pipe column. It contains a JSON representation of a pipe, as created in the frontend when serializing the editor content. That JSON structure is later transformed into the tree, as described above.

    Outlook: Open development questions

    The above describes Pipes as it is today. But what could future development look like?

    Much of the development so far was user driven. Exchanges by email or in the issue tracker led to improvements big and small, like being able to run javascript before crawling sites, to create CSS selectors for fetching data from sites by clicking on the target elements in an overlay, or to keep media enclosures. I’m sure more will follow, but that development is unpredictable to me.

    One thought I return to quite often is the focus on RSS feeds. Right now that focus is baked hard into the software: RSS is the data exchange format between blocks. That, plus normalization, is a big part of what makes the current approach as powerful as it is. But I often wonder whether there could be a better approach, something that still allows manipulating RSS feeds and creating them as pipe output, but makes it more straightforward to work with raw text, XML, JSON or other structured data instead. That was also the starting point of the usecase focused blog series describing how Pipes can be used today. Finding a better concept could allow supporting more usecases (what would they look like?) and also remove the need for normalizing RSS/ATOM/JSON-Feeds before manipulating them, which could become an optional step of the feed block. One advantage there would be avoiding the issue of sometimes removing parts of the original feed not covered by that normalization.

    Splitting the backend into a server part and a multiprocess program running the pipes would be an optimization useful for big deployments. It could also be nice when deploying to multicore servers with weak cores, like the ARM servers Scaleway used to offer or a Raspberry Pi at home. Resque could work well for that.
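    A hypothetical sketch of such a Resque-based split: the Sinatra process would only enqueue a job, and a pool of workers would run the pipes. The RunPipe class and the run_stored_pipe helper are inventions for this example, not existing Pipes CE code:

```ruby
def run_stored_pipe(pipe_id)
  # placeholder: here the worker would load the pipe's JSON from SQLite,
  # build the block tree and run it
  "output of pipe #{pipe_id}"
end

class RunPipe
  @queue = :pipes   # Resque reads the target queue from this variable

  # Resque calls this in a worker process with the enqueued arguments
  def self.perform(pipe_id)
    run_stored_pipe(pipe_id)
  end
end

# In server.rb, instead of running the pipe inline, one would then call:
#   Resque.enqueue(RunPipe, pipe_id)
```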

    Porting the user interface to a more modern javascript library could be worthwhile if it would result in new capabilities for the user, maybe even supporting new usecases that way.


    Questions, suggestions? Join the pipes gitter channel or open a github issue.

    onli,
  2. Pipes CE: A free version of Pipes

    If you ever wanted to run Pipes locally or look at the source code, now you can. Pipes CE is an AGPL licensed version of Pipes. Right now it is almost equal to what is run on pipes.digital, minus stuff like billing code and with a slightly different color scheme.

    /blog/upload/pipesce.png

    That’s a big step! It moves Pipes away from being a closed project towards being free software, software that can survive even without the involvement of this site. If you are a developer, have a look at the Github repository, and if you want to see some new code merged into the core you now have the option to send in a pull request.

    onli,
  3. How to get notified of new YouTube videos in your feed reader

    This is part 6 in a series explaining how pipes works and what you can do with it. The article before this showed how to embed tweets into RSS feeds created by pipes.

    Some sites have working RSS feeds that could easily be fed into pipes, but they hide them. Regular feed discovery can’t work when the feed is not linked in the head of a page, and that is exactly what happens here. The biggest offender is probably YouTube. But since the feeds do exist and their url follows a known scheme, pipes can access them anyway.

    Let’s say I want to see all new videos added to the T90Official Age of Empires 2 channel. Load that page:

    /blog/upload/youtube_t90.jpg

    All we need is the url of the channel, https://www.youtube.com/channel/UCZUT79WUUpZlZ-XMF7l4CFg. Some channels also have a nicer-looking user url, like https://www.youtube.com/user/GamersNexus. That would also work.
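    For the curious, the hidden feed urls follow a fixed, publicly known scheme, so you could also construct them by hand; presumably pipes does the equivalent for you when you enter the channel url. A small Ruby sketch of that derivation (the helper is hypothetical, and YouTube could change the scheme at any time):

```ruby
require 'uri'

# Derive the hidden feed url from a channel or playlist page url.
def youtube_feed_url(page_url)
  uri = URI(page_url)
  if uri.path.start_with?('/channel/')
    id = uri.path.split('/')[2]
    "https://www.youtube.com/feeds/videos.xml?channel_id=#{id}"
  elsif uri.path == '/playlist'
    id = URI.decode_www_form(uri.query).to_h['list']
    "https://www.youtube.com/feeds/videos.xml?playlist_id=#{id}"
  end
end
```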

    Create a new pipe, drag a feed block from the left toolbar into the editor area and connect it to the pipe output to the right:

    /blog/upload/youtube_feedblock.png

    Now the created feed contains items with links to all newly uploaded videos:

    /blog/upload/youtube_t90_feed.png

    The second YouTube element that can be accessed is playlists. For example, the game rating discussions by GamersGlobal:

    /blog/upload/youtube_gg_playlist.png

    Set the url https://www.youtube.com/playlist?list=PLK9Cx_P99XLIBMkdW48UMA8OCnwBXUEVZ as the target of a feed block, and the feed will now contain the videos in the playlist:

    /blog/upload/youtube_playlist_feed.png

    Both types of feeds get updated as one would expect when new videos are added.

    Having those feeds is useful to filter out videos you don’t like, to combine multiple channels into one, or just to be notified about new videos of your favorite channel not on the YouTube website or app, but in your feed reader.

  4. How to convert tweets to RSS feeds

    This is part 5 in a series explaining how pipes works and what you can do with it. Read part 1 to learn how to filter feeds, part 2 to see how to combine feeds, part 3 to create feeds for sites that have none and part 4 to convert a shortened feed to a full text RSS feed.

    Maybe you do not want to read tweets on Twitter, or you want to embed a bunch of them somewhere else. It might be useful to access them as an RSS feed, and in the process be able to sort and combine them into one bigger feed. That’s something pipes can help you with.

    Like always we will first create a new pipe.

    Now, in the editor, start by dragging a Tweets block from the left toolbar into the editor area. Pipes supports three modes for accessing tweets. Let’s start with the search: just enter a keyword. Note how the first tweets appear in the inspector:

    /blog/upload/tweets_search.jpg

    The second method is accessing a hashtag by prepending #. I also added a truncate block; that way we can limit how big the final feed will be:

    /blog/upload/tweets_hashtag.jpg

    And the third option is accessing the public tweets of a user account, by entering @username:

    /blog/upload/tweets_username.jpg

    You can combine all three: place one tweets block for every type of tweet you want to access and lead them to a combine block. Remember that the combine block adds the feeds together one after the other, so you almost always want to add a sort block behind it:

    /blog/upload/tweets_combined.jpg

    A further small improvement is possible: if, like in our example, the tweet blocks might output the same tweets, those duplicates can be filtered out by adding a unique block to the mix. Here I also changed the sort order to the more Twitter-like descending:

    /blog/upload/tweets_combined_unique.jpg
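    Conceptually, the combine, unique and sort blocks do something like the following to the items (sketched in Ruby on plain hashes; real items are RSS feed entries and the actual block implementations differ):

```ruby
# Append the feeds, drop duplicate items, then sort newest first.
def combine_sort_unique(*feeds)
  feeds.flatten
       .uniq { |item| item[:id] }       # the unique block
       .sort_by { |item| item[:date] }  # the sort block
       .reverse                         # descending, like on Twitter
end
```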

    And that’s it. Adding tweets to your pipe is as simple as adding a tweets block. The result is a working RSS feed (with a rather strange mix of tweets in this example) you could use anywhere you want:

    /blog/upload/tweets_preview.jpg

    You can see the pipe I created for this article here. You could fork it to access the tweets for hashtags, accounts and search keywords you would prefer.

  5. How to make a full text RSS feed out of a shortened feed

    This is part 4 in a series explaining how pipes works and what you can do with it. Read part 1 to learn how to filter feeds, part 2 to see how to combine feeds and part 3 to create feeds for sites that have none.

    Some sites prefer not to put the whole article into their RSS feed. For them, having an RSS feed is nice to inform users that new articles are available, but they want you to visit the site, to see the article in the full design, and often to have a chance to show ads. As a reader you might want to respect that. But if not, that’s okay as well. Pipes can often transform such a feed into a full text feed, and here I will show how that works.

    The prior article explaining how to create a feed for a site will be useful for this, because that’s basically what we will do to get the full text feed content, but with the original feed as a starting point.

    Like always we will first create a new pipe.

    First drag a feed block into the editor area and enter the feed url. In this example we will also truncate it to get only the two newest feed items: if we crawl the site anyway, it is only fair to keep the load low. But that’s optional.

    /blog/upload/fullfeed_truncated.jpg

    From the shortened feed we need the links to the articles. This time I decided to take them from the guid element; the link element would have worked as well. Use an extract block to get the text stored in those elements:

    /blog/upload/fullfeed_extract_link.jpg

    Next comes downloading these articles. But since we have multiple links, we need a foreach block:

    /blog/upload/fullfeed_foreach.jpg

    Place it, then add a download block next to it, and then drag the download block onto the foreach block (the foreach block should turn yellow when you hover it with the download block). This means: for each url given to this block, download it:

    /blog/upload/fullfeed_foreach_download.jpg
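    Sketched in Ruby, the foreach + download combination does roughly this: every incoming link becomes one feed item whose content is the downloaded page. fetch_page is a placeholder for the real download block’s HTTP code, and the item hash is a simplification of a real feed item:

```ruby
def fetch_page(url)
  "<html>page at #{url}</html>"   # placeholder for a real HTTP GET
end

# For each url, download the page and wrap it as a feed item.
def foreach_download(urls)
  urls.map { |url| { link: url, content: fetch_page(url) } }
end
```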

    Since we need multiple elements from the downloaded articles, at this point duplicate the output of the foreach block with a duplicate block:

    /blog/upload/fullfeed_foreach_download_duplicated.jpg

    The next step is extracting the elements we need for the full text feed from the article pages. Again it’s the extract block that does the work, and which selectors to pick depends on the HTML structure of the site and will vary a lot. But this time, unlike when using a download block on its own, the downloaded site is wrapped inside the items of an RSS feed; that’s the output of the foreach block. Thus activate the option Start at item.content on each extract block when entering the CSS selectors. For the example site they look like this:

    /blog/upload/fullfeed_full_extract.jpg

    A feed builder block is the only regular part of the pipe still missing. Place it near the extract blocks and connect them, set a title, and connect its output to the pipe output at the right (which I forgot to do before taking this screenshot):

    /blog/upload/fullfeed_createfeed.jpg

    One last thing is not perfect yet: images. Some sites, like the example site, load them via Javascript only when they become visible. That will not work in a feed, since most readers will not execute the Javascript even if it were inlined into the article text. But we can fix that: the real image link is normally stored in a data attribute, here data-src. With one replace block remove the current src that just holds a placeholder; with a second, transform data-src to src:

    /blog/upload/fullfeed_imagefix.jpg
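    The effect of the two replace blocks on each item’s content can be sketched as plain string replacement (the attribute names mirror this example site; other sites will differ):

```ruby
# 1. drop the placeholder src attribute
# 2. promote data-src to src so feed readers load the real image
def fix_lazy_images(html)
  html.gsub(/\ssrc="[^"]*"/, '')
      .gsub('data-src="', 'src="')
end
```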

    Now the images load. The whole feed looks good:

    /blog/upload/fullfeed_preview.jpg

    And that’s everything you need to create a full text RSS feed when the site you want to follow just delivers a shortened RSS feed.

    You can see the pipe I created for this article here. You can fork it to use it as a starting point to transform the shortened feed of your favorite site into a full text feed.