Pipelines is a language and runtime for crafting massively parallel pipelines. Unlike other languages for defining data flow, the Pipeline language requires component implementations to be defined separately, in Python. This separates the details of an implementation from the structure of the pipeline, while providing access to thousands of actively maintained libraries for machine learning, data analysis, and processing. Skip to Getting Started to install the Pipeline compiler.
As an introductory example, a simple pipeline for Fizz Buzz on even numbers could be written as follows -
from fizzbuzz import numbers
from fizzbuzz import even
from fizzbuzz import fizzbuzz
from fizzbuzz import printer
numbers
/> even
|> fizzbuzz where (number=*, fizz="Fizz", buzz="Buzz")
|> printer
Meanwhile, the implementation of the components would be written in Python -
def numbers():
    for number in range(1, 100):
        yield number

def even(number):
    return number % 2 == 0

def fizzbuzz(number, fizz, buzz):
    if number % 15 == 0: return fizz + buzz
    elif number % 3 == 0: return fizz
    elif number % 5 == 0: return buzz
    else: return number

def printer(number):
    print(number)
Running the Pipeline document would safely execute each component of the pipeline in parallel and output the expected result.
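Conceptually, the document above is equivalent to chaining plain Python generators. The following is a sequential sketch of the data flow only (the actual runtime executes the stages in parallel), reusing the component definitions from above:

```python
def numbers():
    for number in range(1, 100):
        yield number

def even(number):
    return number % 2 == 0

def fizzbuzz(number, fizz, buzz):
    if number % 15 == 0: return fizz + buzz
    elif number % 3 == 0: return fizz
    elif number % 5 == 0: return buzz
    else: return number

# numbers /> even |> fizzbuzz |> printer, written as generator chaining:
stream = (n for n in numbers() if even(n))               # /> filters the stream
results = [fizzbuzz(n, "Fizz", "Buzz") for n in stream]  # |> maps each element
for result in results:                                   # final stage: printer
    print(result)
```

The first few outputs are 2, 4, "Fizz", 8, "Buzz" — only even numbers reach the fizzbuzz stage because `/>` filters the stream rather than mapping over it.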
Components are scripted in Python and linked into a pipeline using imports. The syntax for an import has 3 parts - (1) the path to the module, (2) the name of the function, and (3) the alias for the component. Here's an example -
from parser import parse_fasta as parse
That's really all there is to imports. Once a component is imported it can be referenced anywhere in the document with the alias.
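Since the import syntax mirrors Python's, an import line could plausibly be resolved with Python's own import machinery. A hedged sketch of that idea — the `resolve_component` helper is hypothetical, and the stdlib names (`math`, `sqrt`) are used purely for illustration:

```python
import importlib

# Hypothetical resolver: (module path, function name, alias) -> registry entry.
def resolve_component(module_path, function_name, alias, registry):
    module = importlib.import_module(module_path)
    registry[alias] = getattr(module, function_name)

registry = {}
# analogous to a document line: from math import sqrt as root
resolve_component("math", "sqrt", "root", registry)
print(registry["root"](9.0))  # the component is now callable via its alias
```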
Every pipeline operates on a stream of data. The stream is created by a Python generator. The following generator produces a stream of the numbers 0 through 999.
def numbers():
    for number in range(0, 1000):
        yield number
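Because generators are lazy, nothing runs until a downstream component pulls a value; each element is produced on demand. A quick way to see this from plain Python is to slice the stream:

```python
from itertools import islice

def numbers():
    for number in range(0, 1000):
        yield number

# Only the first five elements are ever computed here.
first_five = list(islice(numbers(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```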
Here's a generator that reads entries from a file -
def customers():
    with open("customers.csv", "r") as file:
        for line in file:
            yield line
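Each yielded line is still raw CSV text, so a downstream component typically parses it into fields. A minimal sketch of such a component using the standard csv module — the function name and the sample row are made up for illustration:

```python
import csv

# Hypothetical parsing component: splits one raw CSV line into fields.
def parse_line(line):
    return next(csv.reader([line]))

sample = "Ada Lovelace,ada@example.com,1815\n"  # made-up row
fields = parse_line(sample)
print(fields)  # ['Ada Lovelace', 'ada@example.com', '1815']
```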
The first component in a pipeline is always the generator. It runs in parallel with the other components, and each element it yields is passed through the rest of the pipeline.
from utils import customers as customers # a generator function in the utils module
from utils import parse_row as parser
from utils import get_recommendations as recommender
from utils import print_recommendations as printer
customers |> parser |> recommender |> printer
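The parallel execution described above can be sketched in plain Python with one thread per stage and queues carrying elements between stages. This is an illustration of the execution model, not the actual Pipelines runtime, and the two stand-in components at the bottom are hypothetical:

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream

# Each stage consumes from its inbox and produces into its outbox.
def run_stage(func, inbox, outbox):
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream downstream
            break
        outbox.put(func(item))

def run_pipeline(generator, stages):
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in generator():       # the generator feeds the first queue
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []                   # drain the final queue
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

# hypothetical stand-ins for the components above
results = run_pipeline(lambda: iter(["a", "b"]), [str.upper, lambda s: s + "!"])
print(results)  # ['A!', 'B!']
```

Order is preserved because each stage is a single thread reading from a FIFO queue; the stages still overlap in time, since a stage can work on one element while its upstream neighbor produces the next.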