Allow making several requests when a connection hangs

@Gerold103

Currently it's impossible to make several requests to an instance whose connection has hung. The router should do at least some balancing between replicas even within a single callro; we'll call this stateless balancing.

The first problem is that a request uses all of the remaining time while waiting for an answer from an instance:

vshard/vshard/router/init.lua

Lines 624 to 627 in 8c6dd62

    
    opts.timeout = tend - fiber_clock()
    local storage_call_status, call_status, call_error =
        replicaset[call](replicaset, 'vshard.storage.call',
                         {bucket_id, mode, func, args}, opts)

However, if the connection is not alive but is still shown as connected, the request fails with a timeout error and we cannot make another one, because no time remains.

I propose introducing a new option for router calls, named something like request_timeout. It specifies how much time a single request has, and it must always be <= timeout. The option will be exposed to crud, so users will be able to control its value and avoid failed requests, which is important in mission-critical projects.
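To make the proposal concrete, here is a minimal sketch of the retry loop it implies. It is not vshard code: `call_with_request_timeout`, `do_call`, and the injected `clock` are hypothetical names, and the sketch assumes that only timeout errors are worth retrying.

```lua
-- Hypothetical sketch, not the vshard API.
-- Retries `do_call` until the total deadline expires, giving each attempt
-- at most `request_timeout` seconds.
local TIMEOUT = 'timeout'

local function call_with_request_timeout(do_call, opts, clock)
    local timeout = opts.timeout
    -- Without the option, a single request may use the whole timeout,
    -- which matches the current behaviour.
    local request_timeout = opts.request_timeout or timeout
    assert(request_timeout <= timeout, 'request_timeout must be <= timeout')
    local tend = clock() + timeout
    while true do
        local remaining = tend - clock()
        if remaining <= 0 then
            return nil, TIMEOUT
        end
        -- An attempt never gets more time than is left until the deadline.
        local attempt_timeout = math.min(request_timeout, remaining)
        local res, err = do_call(attempt_timeout)
        if res ~= nil then
            return res
        end
        if err ~= TIMEOUT then
            -- Assumption: non-timeout errors are returned to the caller as is.
            return nil, err
        end
    end
end
```

With timeout = 3 and request_timeout = 1, a hung connection costs one second instead of the whole budget, leaving time for two more attempts.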

The question is whether we should make the default value something like request_timeout = timeout / <number of replicas in rs>, or leave the behavior as it is now (request_timeout = timeout). @Gerold103
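The two candidate defaults can be written down side by side. This is only an illustration: `default_request_timeout` and its `split` flag are hypothetical names, not existing vshard options.

```lua
-- Hypothetical helper, not part of vshard.
-- split = true:  timeout / <number of replicas in rs>
-- split = false: the current behaviour, one request may use all the time.
local function default_request_timeout(timeout, replica_count, split)
    if split and replica_count > 0 then
        return timeout / replica_count
    end
    return timeout
end
```

Splitting the budget gives every replica one chance within the overall timeout, at the cost of less time for a replica that is alive but slow.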

The second problem is that the replicaset module itself doesn't change the priority of replicas. So even if request_timeout is less than timeout, we'll just make several requests to the same dead replica (unless the failover fiber wakes up between requests).

vshard/vshard/router/init.lua

Lines 710 to 714 in 8c6dd62

    
    if err then
        return nil, err
    else
        return nil, lerror.timeout()
    end

I suppose that if a request fails, we should unconditionally lower the priority of the replica it was sent to. This will cause constant priority changes on dead replicasets, e.g. when privileges are configured incorrectly (although this case is covered by the backoff procedure), but it will make requests much more robust: they'll fail far less often. @Gerold103, what is your opinion on this?
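A rough sketch of what unconditional demotion could look like follows. The model is deliberately simplified and hypothetical: a replicaset is just a list ordered by priority, and `demote_replica` and `call_with_demotion` are not functions of the real vshard.replicaset module.

```lua
-- Hypothetical model: index 1 of `priority_list` is the replica tried first.
-- Moves a failed replica to the end of the list, so the next attempt
-- goes to a different one.
local function demote_replica(priority_list, replica)
    for i, r in ipairs(priority_list) do
        if r == replica then
            table.remove(priority_list, i)
            table.insert(priority_list, replica)
            return true
        end
    end
    return false
end

-- Tries the highest-priority replica and demotes it on any failure.
local function call_with_demotion(priority_list, do_call, max_attempts)
    local last_err
    for _ = 1, max_attempts do
        local replica = priority_list[1]
        if replica == nil then
            break
        end
        local res, err = do_call(replica)
        if res ~= nil then
            return res
        end
        last_err = err
        demote_replica(priority_list, replica)
    end
    return nil, last_err
end
```

Because the demotion happens in the request path, the next attempt does not depend on the failover fiber waking up in between.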
