Description
Consider the following case: a user wants to insert 1000 rows into a sharded space. For performance reasons, they don't want to call `vshard.router.callrw` 1000 times. For this, vshard has `map_callrw`, which even supports partial map-reduce (see the `bucket_ids` option). However, it accepts only one function and one set of arguments, and these are passed unchanged to every instance it calls. As a result, the user function has to figure out on its own which part of the data it must insert. On top of that, the whole batch has to be sent to every master, which is very inefficient.
It's proposed to introduce a new `map_callrw` option, `sharded_arguments` (name subject to change). It would allow the user to pass arguments in the following form:
```lua
args = {
    [<bucket_id>] = {
        <some data>
    },
    <other buckets>
}
```
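For illustration, the client could build a table of this form from a flat batch of rows. The sketch below is minimal and written in plain Lua. `group_by_bucket` and the `get_bucket_id` callback are hypothetical names. The callback stands in for whatever computes a row's bucket id, for example a wrapper around `vshard.router.bucket_id_strcrc32`.

```lua
-- Group a flat list of rows into the proposed { [bucket_id] = { rows... } } form.
-- `get_bucket_id` is a hypothetical user-supplied function mapping a row to
-- its bucket id (e.g. a wrapper around vshard.router.bucket_id_strcrc32).
local function group_by_bucket(rows, get_bucket_id)
    local args = {}
    for _, row in ipairs(rows) do
        local bucket_id = get_bucket_id(row)
        local batch = args[bucket_id]
        if batch == nil then
            -- First row for this bucket: start a new batch.
            batch = {}
            args[bucket_id] = batch
        end
        table.insert(batch, row)
    end
    return args
end
```

Under the proposal, the resulting table would be passed as-is. For example: `vshard.router.map_callrw('insert_rows', args, {sharded_arguments = true})`. Here `insert_rows` is a hypothetical storage function, and the option name is the one proposed above, not an existing vshard option.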
If `sharded_arguments` is `true`, the call is treated as a partial map-reduce, and `bucket_ids` are deduced from the keys of the sharded arguments. Vshard then sends each master only the data for its own buckets.