ASRCsoft/nwpdownload


nwpdownload is an extension of the Herbie Python package, focused on downloading large datasets for forecast calibration and long-term forecast evaluation.

This package is in an early stage of development, so expect bugs and breaking changes.

Example

import pandas as pd
from nwpdownload import NwpCollection
from dask.distributed import Client

# set up the dask workers
client = Client(processes=False, threads_per_worker=4,
                n_workers=1, memory_limit='2GB')
client.dashboard_link # view info about the tasks and workers

# describe the GEFS data to get
search_0p25 = '|'.join([
    ':TMP:2 m above ground:',
    ':DPT:2 m above ground:',
    ':PRES:surface:'
])
# define the spatial extent for subsetting
nyc_extent = (285.5, 286.5, 40, 41.5)
runs = pd.date_range(start='2021-04-01 12:00', periods=4, freq='D')
fxx = range(3, 24 * 8, 3) # 63 forecast hours (3 to 189 in 3-hour steps), just under 8 days

gefs_0p25 = NwpCollection(runs, fxx, 'gefs', 'atmos.25', search_0p25,
                          members=['avg'], save_dir='/path/to/nwp/data',
                          extent=nyc_extent)
gefs_0p25.collection_size() # estimate the complete download size
gefs_0p25.get_status() # summary of existing files
gefs_0p25.download() # download files in parallel with dask
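
The search string and the run/forecast-hour grid in the example can be explored without downloading anything. The sketch below assumes the search string is applied as a regular expression to the lines of a GRIB index file, as Herbie does; the index lines are made up for illustration and are not real file contents. It also counts the run, forecast hour, and member combinations the example describes, using only the standard library.

```python
import re
from datetime import datetime, timedelta
from itertools import product

# Same search pattern as in the example: three alternatives joined with "|".
search_0p25 = '|'.join([
    ':TMP:2 m above ground:',
    ':DPT:2 m above ground:',
    ':PRES:surface:',
])

# Illustrative index lines (hypothetical, written only for this sketch).
sample_idx_lines = [
    '1:0:d=2021040112:TMP:2 m above ground:3 hour fcst:ens mean',
    '2:50000:d=2021040112:DPT:2 m above ground:3 hour fcst:ens mean',
    '3:100000:d=2021040112:PRES:surface:3 hour fcst:ens mean',
    '4:150000:d=2021040112:UGRD:10 m above ground:3 hour fcst:ens mean',
    '5:200000:d=2021040112:TMP:850 mb:3 hour fcst:ens mean',
]

# Keep only the lines that match one of the three alternatives.
matched = [line for line in sample_idx_lines if re.search(search_0p25, line)]

# Rebuild the example's runs (4 daily runs at 12:00) and forecast hours.
first_run = datetime(2021, 4, 1, 12)
runs = [first_run + timedelta(days=d) for d in range(4)]
fxx = range(3, 24 * 8, 3)
members = ['avg']

# Every (run, forecast hour, member) combination covered by the collection.
combinations = list(product(runs, fxx, members))

print(len(matched), 'of', len(sample_idx_lines), 'sample lines match')
print(len(combinations), 'run / forecast hour / member combinations')
```

Note that the 850 mb temperature line is not matched, because the pattern includes the level as well as the variable name.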

Installation

Installation requires git, because pip installs the package directly from the GitHub repository.

Running locally

wgrib2 is required. It is available from conda-forge and spack, and as an RPM package. There is no Windows package, but it may work in WSL.

pip install git+https://github.com/ASRCsoft/nwpdownload
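
Before installing, it can help to confirm that the required command-line tools are visible to Python. This is a minimal sketch using the standard library; the helper name missing_tools is hypothetical and not part of nwpdownload.

```python
import shutil

def missing_tools(tools):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]

# git is needed for the pip install, wgrib2 for running locally.
missing = missing_tools(['git', 'wgrib2'])
if missing:
    print('Not found on PATH:', ', '.join(missing))
else:
    print('git and wgrib2 are both available')
```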

Kubernetes cluster

There is no wgrib2 requirement, because it will be installed automatically in the worker containers.

pip install nwpdownload[k8s]@git+https://github.com/ASRCsoft/nwpdownload

Clusters

The package contains functions to set up computing clusters with settings appropriate for parallel downloads (currently only Kubernetes).

Kubernetes

Requirements:

  • kubectl on the system running Python, configured to connect to Kubernetes
  • the dask-kubernetes-operator helm chart must be installed on Kubernetes
  • permission to create Kubernetes pods in the selected namespace
  • a directory that can be mounted by Kubernetes, to store the files

from nwpdownload.kubernetes import k8s_download_cluster
from dask.distributed import Client

# Create a cluster named nwp-demo and store files at
# /path/to/nwp/data. Use port forwarding if running python outside of
# kubernetes.
cluster = k8s_download_cluster('nwp-demo', '/path/to/nwp/data', n_workers=1,
                               threads_per_worker=6,
                               port_forward_cluster_ip=True)
client = Client(cluster)

When creating the NwpCollection, set save_dir="/mnt/nwpdownload". This is where the data directory will be mounted within the worker containers.

gefs_0p25 = NwpCollection(runs, fxx, 'gefs', 'atmos.25', search_0p25,
                          members=['avg'], save_dir='/mnt/nwpdownload',
                          extent=nyc_extent)
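
Because the workers write under /mnt/nwpdownload while the same directory is /path/to/nwp/data outside the containers, a path reported by a worker has to be translated to find the file on the host. The sketch below shows one way to do that, assuming the mount exposes the host directory unchanged; the helper host_path and the file name gefs/example.grib2 are hypothetical and not part of nwpdownload.

```python
from pathlib import PurePosixPath

def host_path(container_path,
              host_dir='/path/to/nwp/data',
              mount_dir='/mnt/nwpdownload'):
    """Translate a path inside a worker container to the host directory.

    Raises ValueError if container_path is not under mount_dir.
    """
    relative = PurePosixPath(container_path).relative_to(mount_dir)
    return PurePosixPath(host_dir) / relative

# Hypothetical file name, used only to show the translation.
print(host_path('/mnt/nwpdownload/gefs/example.grib2'))
```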

Benchmarks

On a Kubernetes cluster with a high-speed internet connection, running 600 threads, I was able to download GEFS data from AWS at an average of 700 MB/s (5.6 Gb/s).
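
The unit conversion behind these figures, and what they imply per thread, can be checked with a few lines of arithmetic. The 1 TB dataset size below is a hypothetical value chosen for illustration, not a measurement, and decimal units are assumed (1 TB = 1,000,000 MB).

```python
rate_mb_per_s = 700   # average download rate reported above, in megabytes/s
threads = 600         # number of threads reported above

# 8 bits per byte, 1000 megabits per gigabit.
rate_gbit_per_s = rate_mb_per_s * 8 / 1000

# Average share of the total rate handled by each thread.
per_thread_mb_per_s = rate_mb_per_s / threads

def hours_to_download(size_tb, rate_mb_per_s):
    """Hours needed to transfer size_tb terabytes at a steady rate."""
    seconds = size_tb * 1_000_000 / rate_mb_per_s
    return seconds / 3600

print(f'{rate_gbit_per_s:.1f} Gb/s in total')
print(f'{per_thread_mb_per_s:.2f} MB/s per thread on average')
# Hypothetical 1 TB dataset, for scale only.
print(f'{hours_to_download(1, rate_mb_per_s):.2f} hours for 1 TB')
```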
