[Core][Object Store] Object Store to manage files in the cluster

Description

The Object Store can transfer specified files between nodes.
The Object Store can also manage the metadata of those files, including:
- reference count: when the reference count drops to zero, all copies of the file on the cluster are deleted.
- locations: so that a raylet can pull the specified file from a node that holds it.
- fate-sharing: the local file fate-shares with its metadata in the local plasma store. One option is to register a callback that plasma invokes when it deletes the metadata, and have that callback delete the corresponding local file.
When disk space is low (that is, when an attempt to pull a remote file fails), try to evict file metadata from the Object Store.
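
The reference-count and callback idea above can be sketched in plain Python. This is only an illustration under stated assumptions: `FileMetadataStore`, `delete_local_file`, and `demo` are hypothetical names, not Ray or plasma APIs, and the store is an in-memory dict on a single node rather than a cluster-wide table.

```python
import os
import tempfile
from typing import Callable, Dict


class FileMetadataStore:
    """Hypothetical stand-in for a per-node file metadata table (not a Ray API)."""

    def __init__(self, on_delete: Callable[[str], None]) -> None:
        self._ref_counts: Dict[str, int] = {}
        # Callback invoked when a metadata entry is dropped (assumed hook).
        self._on_delete = on_delete

    def add_ref(self, path: str) -> int:
        """Increment the reference count for `path` and return the new count."""
        self._ref_counts[path] = self._ref_counts.get(path, 0) + 1
        return self._ref_counts[path]

    def remove_ref(self, path: str) -> int:
        """Decrement the count; at zero, drop the entry and fire the callback."""
        remaining = self._ref_counts[path] - 1
        if remaining == 0:
            del self._ref_counts[path]
            self._on_delete(path)
        else:
            self._ref_counts[path] = remaining
        return remaining


def delete_local_file(path: str) -> None:
    """Fate-sharing: remove the local file once its metadata is gone."""
    if os.path.exists(path):
        os.remove(path)


def demo() -> bool:
    """Return True if the file survives until the last reference is dropped."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    store = FileMetadataStore(on_delete=delete_local_file)
    store.add_ref(path)
    store.add_ref(path)
    store.remove_ref(path)
    still_there = os.path.exists(path)  # one reference remains
    store.remove_ref(path)              # last reference: callback deletes the file
    return still_there and not os.path.exists(path)
```

In a real implementation the callback would presumably be registered with plasma itself, so that eviction under disk pressure and reference-count deletion both go through the same path.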

API

# In CoreWorker A:

# Ray provides a special class, RayFile, which contains only the metadata of the file.
file = RayFile(
    # The user must ensure that the file exists. We can support s3, local files, ...
    path="/path/to/an/existing/file",
    # Whether to copy the file into the temporary directory that Ray creates for each job
    # (this directory fate-shares with the job). Default is True.
    enable_copy=False,
)
file_ref = ray.put(file)

# In CoreWorker B:

# This is a synchronous operation. When it completes, the file has arrived locally and is available.
# This RayFile instance holds a plasma pointer (as a numpy.ndarray does), ensuring that as long as
# this instance exists, the metadata of this file stays in the local plasma store and the local file is not deleted.
file = ray.get(file_ref)
# The file is read-only.
with open(file.path, "r") as f:
    ...
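
The `enable_copy` option could behave roughly as in the sketch below. This is a plain-Python illustration, not the proposed implementation: `FileHandle` and `stage_file` are hypothetical names standing in for `RayFile` and whatever `ray.put` would do internally, the per-job directory is passed in explicitly, and only local paths are handled (no s3).

```python
import os
import shutil
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    """Hypothetical stand-in for RayFile: carries only file metadata."""
    path: str


def stage_file(path: str, job_dir: str, enable_copy: bool = True) -> FileHandle:
    """Return a handle to `path`, optionally copied into the per-job directory."""
    # The caller is responsible for the file existing; fail early if it does not.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    if not enable_copy:
        # Reference the user's file in place; its lifetime is the user's concern.
        return FileHandle(path=path)
    # Copy into the job directory so the copy is cleaned up with the job.
    os.makedirs(job_dir, exist_ok=True)
    staged = os.path.join(job_dir, os.path.basename(path))
    shutil.copy2(path, staged)
    return FileHandle(path=staged)
```

With `enable_copy=False` the handle points at the original file, so deleting or rewriting that file would affect readers. Copying trades disk space for isolation from such changes.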

Use case

Given scenario

Ray cluster description:
1. Node Group A: no GPUs, responsible for processing and generating training data. These files are large, generally no smaller than 1GB.
2. Node Group B: with GPUs, responsible for training the model. The training processes need to read data from Node Group A.
The files generated by A are consumed by B. Currently, there are two main solutions:
1. Transfer through a storage service (s3, NFS, NAS, ...)
  - cons: additional cost, and the overall system becomes more complex.
2. Use RPC to pull remote files
  - cons: users have to maintain the metadata themselves, and the code complexity is high. We would also like users on Ray to rely on Ray itself as much as possible.
