getl

getl, pronounced "getle" as in Go ETL, is a framework for building pipelines for data integration and data transformation. As in a water pipeline, data is streamed from a source to a sink and transformed in between.

Installation

go get github.com/zarulzakuan/getl

package main

import (
	"github.com/zarulzakuan/getl"
)
...

How it works

Nodes are the data processors. Each node does one of three jobs: extracting data (Source), transforming data (Transform), or writing data out (Sink). Two auxiliary nodes, Tee and Union, split one data flow into multiple flows and merge multiple flows into a single flow, respectively.

 ___________                 _____________                 __________
|           |               |             |               |         |
|  Source   | ====(pipe)====|  Transform  | ====(pipe)====|  Sink   |
| (runner)  |               |  (runner)   |               | (runner)|
|___________|               |_____________|               |_________|

Data pipes are built automatically and will break if any of the pipe inlets is closed. Each node (except the auxiliary nodes) must have a runner: a user-defined function that extracts, processes, or writes the data. A runner must accept a data writer and a data reader as its parameters.
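To make the pipe mechanics concrete, here is a minimal sketch that wires a source and a transform together by hand using io.Pipe from the standard library. It does not call getl at all; it only illustrates the kind of plumbing the framework is described as building automatically. The function names (source, upper, runPipeline) are hypothetical, and passing nil as the source's reader is an assumption made for this sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// source is a hypothetical source-style runner. It ignores its reader
// (assumed to be unused for a source) and emits three lines.
func source(w *io.PipeWriter, _ *io.PipeReader) {
	defer w.Close() // closing the inlet tells the next stage the stream is over
	for _, s := range []string{"alpha", "beta", "gamma"} {
		fmt.Fprintln(w, s)
	}
}

// upper is a hypothetical transform-style runner. It reads lines from r
// and writes them upper-cased to w.
func upper(w *io.PipeWriter, r *io.PipeReader) {
	defer w.Close()
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		fmt.Fprintln(w, strings.ToUpper(sc.Text()))
	}
}

// runPipeline connects source -> upper with two pipes and collects
// everything that comes out of the last pipe.
func runPipeline() string {
	r1, w1 := io.Pipe()
	r2, w2 := io.Pipe()
	go source(w1, nil)
	go upper(w2, r1)
	out, _ := io.ReadAll(r2)
	return string(out)
}

func main() {
	fmt.Print(runPipeline())
}
```

Each stage runs in its own goroutine because writes to an io.Pipe block until the other end reads them. Closing a writer is what ends the stream for the stage downstream.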

Example:

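// writeTerminal is a sink runner: it prints whatever arrives on input.
// The writer parameter is not used here.
func writeTerminal(writer *io.PipeWriter, input *io.PipeReader) {
	buff := make([]byte, 50) // allocate once and reuse for every read
	for {
		n, err := input.Read(buff)
		if n != 0 {
			println(string(buff[:n]))
		}
		if err != nil {
			break // io.EOF or a pipe error ends the runner
		}
	}
}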

sink := new(getl.SinkNode)
sink.Name = "Write to terminal"
sink.Runner = writeTerminal
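A runner with this parameter shape can be exercised without the framework by feeding it from an io.Pipe directly. The sketch below assumes a sink-style runner named collect (hypothetical) that stores its input in a buffer instead of printing it, and a helper named feed (also hypothetical). The README does not say what getl passes as a sink's writer; nil is used here as an assumption.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// out receives everything the sink reads. It is a test helper for this
// sketch only and is not part of getl.
var out strings.Builder

// collect has the same parameter shape as writeTerminal but appends to
// out so the result can be inspected afterwards.
func collect(writer *io.PipeWriter, input *io.PipeReader) {
	buff := make([]byte, 50)
	for {
		n, err := input.Read(buff)
		if n != 0 {
			out.Write(buff[:n])
		}
		if err != nil {
			break // io.EOF once the writing end is closed
		}
	}
}

// feed drives the runner by hand: one goroutine writes data into a pipe
// and closes it, while collect drains the reading end.
func feed(data string) string {
	out.Reset()
	r, w := io.Pipe()
	go func() {
		io.WriteString(w, data)
		w.Close()
	}()
	collect(nil, r) // assumption: a sink needs no writer
	return out.String()
}

func main() {
	fmt.Println(feed("hello, pipes"))
}
```

Input longer than the 50-byte buffer is still delivered in full, because a pipe write blocks until the reader has consumed all of it across as many reads as needed.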

Once all the nodes are defined, chain them together as in the following examples:

Example 1:

getl.RunNow("0 */1 * * *", time.Local, false).Source(source).Transform(filter).Sink(sink) // run every hour and start immediately

Example 2 - Split into multiple data flows:

ta := getl.TeeAdapter()
getl.RunAt(300, time.Local, true).Source(source1).Tee(ta).Transform(filter).Sink(sink) // run every 5 minutes (300s)
ta.Transform(sort).Sink(sink2)
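The idea of splitting one flow into several can be sketched with the standard library alone. The following is not getl's Tee implementation, which the README does not show; it is an illustrative version built on io.MultiWriter, with hypothetical names tee and split.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// tee copies everything from src into every branch, then closes all
// branches so their readers see the end of the stream.
func tee(src io.Reader, branches ...*io.PipeWriter) {
	ws := make([]io.Writer, len(branches))
	for i, b := range branches {
		ws[i] = b
	}
	_, err := io.Copy(io.MultiWriter(ws...), src)
	for _, b := range branches {
		b.CloseWithError(err) // a nil err is reported to readers as io.EOF
	}
}

// split sends input down two branches and returns what each one received.
func split(input string) (string, string) {
	r1, w1 := io.Pipe()
	r2, w2 := io.Pipe()
	go tee(strings.NewReader(input), w1, w2)

	var a, b []byte
	var wg sync.WaitGroup
	wg.Add(2)
	// Both branches must be drained concurrently, otherwise a write to
	// one pipe would block the copy to the other.
	go func() { defer wg.Done(); a, _ = io.ReadAll(r1) }()
	go func() { defer wg.Done(); b, _ = io.ReadAll(r2) }()
	wg.Wait()
	return string(a), string(b)
}

func main() {
	a, b := split("x\ny\n")
	fmt.Printf("%q %q\n", a, b)
}
```

In this sketch every branch receives an identical copy of the data, and a slow branch holds back the others because pipe writes are unbuffered.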

Example 3 - Join multiple data flows:

dataflow1 := getl.RunNow().Source(source1).Tee(ta).Transform(filter).Sink(sink) // run every 30 minutes
getl.RunNow().Source(source2).Union(dataflow1).Sink(sink)
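Merging several flows into one can be sketched the same way. This is an illustration of the concept using only the standard library, not getl's Union code; the names union and mergedSorted are hypothetical, and treating the data as newline-delimited records is an assumption of the sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"sort"
	"strings"
	"sync"
)

// union copies every source, line by line, into a single pipe and closes
// the pipe once all sources are exhausted.
func union(srcs ...io.Reader) *io.PipeReader {
	r, w := io.Pipe()
	var wg sync.WaitGroup
	for _, s := range srcs {
		wg.Add(1)
		go func(s io.Reader) {
			defer wg.Done()
			sc := bufio.NewScanner(s)
			for sc.Scan() {
				// One Write call per line: concurrent pipe writes are
				// serialized, so lines from different sources stay whole.
				fmt.Fprintln(w, sc.Text())
			}
		}(s)
	}
	go func() {
		wg.Wait()
		w.Close()
	}()
	return r
}

// mergedSorted merges two inputs and returns the lines sorted and joined,
// because the arrival order across sources is not deterministic.
func mergedSorted(a, b string) string {
	r := union(strings.NewReader(a), strings.NewReader(b))
	var lines []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	sort.Strings(lines)
	return strings.Join(lines, ",")
}

func main() {
	fmt.Println(mergedSorted("b\na\n", "c\n")) // prints "a,b,c"
}
```

Lines from the same source keep their relative order, but lines from different sources may interleave differently on each run.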

Readme created from Go doc with goreadme
