nektcom/nekt-dev-container

Nekt - Notebook for Data Transformation

Welcome to Nekt's Jupyter Notebook template for data transformation. This guide walks you through setting up your environment and using the notebook to load multiple tables from your Lakehouse, perform the necessary data operations or transformations, and validate your transformation before running it in production at Nekt.

This repo gives you instructions to:

  • Run locally using Dev Container: Use Dev Containers to create an isolated environment with all the dependencies required to run PySpark transformations. No additional setup is required.
  • Run on GitHub Codespaces: GitHub Codespaces gets you up and coding faster with fully configured, secure cloud development environments native to GitHub.

Run locally using Dev Container

Prerequisites

Setup

  1. Open the Docker Desktop app and wait until the Docker engine is up and running.

  2. Open the root folder of this repository in VS Code.

  3. Open the VS Code Command Palette (Ctrl+Shift+P on Windows or Cmd+Shift+P on macOS).

  4. Run Dev Containers: Reopen in Container.

  5. The project will reopen in a new VS Code window. You can click show log in the bottom right corner to follow the progress. Wait until the environment is set up and all dependencies are installed.

Run on GitHub Codespaces

Setup

  1. Click on Code.

  2. Click on Create codespace on main.

  3. A new tab will open with a web version of VS Code. Wait until the environment is set up and all dependencies are installed.

  4. Once everything is ready, this README will show up in VS Code.

Using the Notebook

The template notebook is pre-filled to help you load tables directly from your Lakehouse, perform the required data transformations, and guide you on the next steps to deploy your transformation code on our platform. Here's a quick overview of how to use it:

  • Load Data: Follow the steps in the notebook to connect to your data sources and load the data into the notebook environment.
  • Transform Data: Utilize the pre-built functions or add your own transformations as needed.
  • Deploy: Instructions are provided within the notebook on how to deploy your code once your data operations are complete.

Deployment

Once you have finished testing your data transformations in the Jupyter notebook, you are ready to deploy your code to Nekt's platform and apply the transformation in a production pipeline. Follow these steps:

  1. Test and validate your Jupyter notebook script to ensure it works as intended.
  2. Use the 'Final result' section of your Jupyter notebook file to populate the function user_transformation(delta_dict). This is the function you will use to create your transformation in our platform.
     2.1. Read the comments in that section carefully to populate the function correctly. They describe a few points that need attention.
  3. Go to Add transformation, select your input tables, give your new table a name, and paste the user_transformation(delta_dict) function in the code section.
  4. Once configured, you will be able to see the details on the Transformations tab of the Nekt app.
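
To make the shape of the function in steps 2 and 3 more concrete, here is a minimal sketch of what a populated user_transformation(delta_dict) might look like. It rests on assumptions that this README does not confirm: that delta_dict maps table names directly to Spark DataFrames, and that the function returns the final DataFrame. The table names (orders, customers) and column names are hypothetical. The comments in the 'Final result' section of the notebook remain the authoritative description of the expected structure.

```python
from typing import Any, Mapping


def user_transformation(delta_dict: Mapping[str, Any]) -> Any:
    """Build the output table from the input tables.

    ASSUMPTION: delta_dict maps table names to Spark DataFrames.
    Check the notebook's 'Final result' section for the real structure.
    """
    # Hypothetical input tables selected in the Add transformation form.
    orders = delta_dict["orders"]
    customers = delta_dict["customers"]

    # Keep only paid orders. A SQL expression string avoids extra imports.
    paid_orders = orders.filter("status = 'paid'")

    # Attach customer attributes to each remaining order.
    enriched = paid_orders.join(customers, on="customer_id", how="left")

    # Return only the columns the new table should contain.
    return enriched.select("order_id", "customer_id", "customer_name", "amount")
```

Because the function only reads from delta_dict and returns a DataFrame, the same body could be run in the notebook against the tables loaded there before being pasted into the code section.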

We are working hard to allow users to directly add transformations to their workspaces. We will be sure to keep you posted.

Best Practices

  • Isolate Environments: Use pip to manage dependencies. Add new dependencies by updating devcontainer.json and the Dockerfile.
  • Version Control: Consider using version control (e.g., git) to manage and track changes to your notebook and related files. As your data evolves, you have to make sure your data transformations follow along.
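
After adding a dependency and rebuilding the container, it can help to confirm that the package is actually installed before running the notebook. The helper below is a small sketch of such a check using only the Python standard library. The function name missing_packages and the package list are illustrative and not part of this repository.

```python
from importlib import metadata


def missing_packages(required):
    """Return the names in `required` that are not installed in this environment."""
    missing = []
    for name in required:
        try:
            # Looks up the installed distribution; raises if it is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list: replace with the packages your notebook imports.
    required = ["pyspark", "pandas"]
    print("Missing packages:", missing_packages(required) or "none")
```

Running this in a notebook cell or in the container terminal lists any package that still needs to be added to the container configuration.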

Support

If you encounter any issues or have questions about using the notebook, please message us on Slack or email support@nekt.ai.

We're here to assist you in leveraging our data platform to its fullest potential.
