akteke/rumble: ⛈️ RumbleDB 1.17.0 "Cacao tree" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

    RumbleDB

    With RumbleDB, you can easily query nested, heterogeneous data in many formats, such as JSON, CSV, Parquet, Avro, LibSVM, and text.
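
    As a minimal sketch of such a query, the JSONiq example below reads a JSON Lines file and navigates a nested array that is present in some records and missing in others. The file name products.json and the fields name and tags are hypothetical. json-file() is the RumbleDB input function for JSON Lines; other formats have their own input functions (for example parquet-file() or text-file()).

    ```jsoniq
    (: Hypothetical input: products.json, one JSON object per line. :)
    (: Some products have an array of tags, others have no tags field at all. :)
    for $product in json-file("products.json")
    (: Unboxing with [] returns the array members, or the empty sequence if "tags" is absent. :)
    let $tags := $product.tags[]
    (: Products without any tags are filtered out. :)
    where exists($tags)
    return {
      "name" : $product.name,
      "tag-count" : count($tags),
      "first-tag" : $tags[1]
    }
    ```

    No schema has to be declared up front: records that lack the field simply produce an empty sequence instead of an error.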

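    RumbleDB exposes a query language, JSONiq, rather than a DataFrame API. This brings more flexibility and productivity, and it matters because a lot of data simply does not fit in DataFrames, for example when the same field holds values of different types from one record to the next.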
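
    To illustrate data that does not fit a fixed DataFrame schema, the sketch below assumes a hypothetical people.json in which "email" is sometimes a single string, sometimes an array of strings, and sometimes missing. The query normalizes the value per record with a typeswitch before grouping. The file and field names are assumptions for illustration only.

    ```jsoniq
    (: Hypothetical input: people.json, where "email" may be a string, :)
    (: an array of strings, or absent. :)
    for $person in json-file("people.json")
    (: Normalize every variant to a flat sequence of strings. :)
    let $emails :=
      typeswitch ($person.email)
        case array return $person.email[]
        case string return $person.email
        default return ()
    group by $country := $person.country
    (: After grouping, $person and $emails hold the items of every record in the group. :)
    return {
      "country" : $country,
      "people" : count($person),
      "emails" : [ $emails ]
    }
    ```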

    You can query data in place, on a local file system or in a data lake (Azure Blob Storage, Amazon S3, HDFS, etc.).
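
    A minimal sketch of querying data in place: the input functions take a URI instead of a local path, so the query itself stays the same when the data lives on a cluster or in an object store. The bucket and directory names below are hypothetical, and which URI schemes work (for example hdfs:// or s3a://) depends on how the underlying Hadoop and Spark connectors and credentials are configured.

    ```jsoniq
    (: Hypothetical location: a directory of Parquet files in an S3 bucket. :)
    (: Reading it assumes the Hadoop S3A connector and credentials are configured. :)
    count(
      for $event in parquet-file("s3a://example-bucket/events/")
      (: Records without a "status" field yield an empty comparison and are skipped. :)
      where $event.status eq "error"
      return $event
    )
    ```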

    You can prepare, clean up, and validate your data, then feed it directly into machine learning pipelines with RumbleDB ML.

    Getting started: a Jupyter notebook that introduces the JSONiq language on top of RumbleDB is available here. You can also run it locally if you prefer.

    The documentation also contains an introduction specific to RumbleDB, including how to read input datasets, but we have not converted it to Jupyter notebooks yet (this will follow).

    The documentation of the latest official release is available here.

    The documentation of the current master (for the adventurous and curious) is available here.


    License

    The repository contains LICENSE.txt and license files for third-party components. Where GitHub identified the license type, it is given in parentheses:

    • LICENSE.txt
    • LICENSE-ANTLR.txt
    • LICENSE-Apache-Commons-IO.txt
    • LICENSE-Apache-Commons-Lang.txt
    • LICENSE-Apache-Commons-Text.txt
    • LICENSE-Apache-HttpClient.txt (Apache-2.0)
    • LICENSE-JLine.txt
    • LICENSE-Joda-time.txt (Apache-2.0)
    • LICENSE-Kryo.md (BSD-3-Clause)
    • LICENSE-Laurelin.txt (BSD-3-Clause)
    • LICENSE-Spark.txt (Apache-2.0)
    • LICENSE-gson.txt (Apache-2.0)

