Availability: Backoff and Retry Policy


The retry and timeout mechanism for a request is controlled by a BackOffer object; one is created per RawKVClient method. The BackOffer decides how long the next sleep before a retry should last, and whether to time out the request when not enough time is left for another retry. When a back off sleep is needed, we call backOffer.doBackOff(funcType, exception), and the current thread sleeps for the decided duration. If the operation would time out after the sleep, doBackOff simply throws an exception to abort the operation.
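The sleep-or-abort decision described above can be sketched as follows. This is a minimal illustration, not the client's actual implementation: the class name SketchBackOffer, the doubling sleep policy, and the use of a plain RuntimeException are all assumptions, and the real doBackOff also takes a function type argument.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the client's BackOffer; names and policy are assumptions.
final class SketchBackOffer {
    private final long deadlineNanos; // absolute deadline for the whole operation
    private long nextSleepMs;         // sleep planned for the next back off
    private int attempts = 0;         // number of completed back off sleeps

    SketchBackOffer(long totalBudgetMs, long baseSleepMs) {
        this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalBudgetMs);
        this.nextSleepMs = baseSleepMs;
    }

    int attempts() {
        return attempts;
    }

    // Sleeps before the next retry, or aborts if the sleep would overrun the deadline.
    void doBackOff(Exception cause) {
        long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime());
        if (nextSleepMs >= remainingMs) {
            // Not enough time left to sleep and retry: abort the operation.
            throw new RuntimeException("retry budget exhausted after " + attempts + " back offs", cause);
        }
        try {
            Thread.sleep(nextSleepMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted during back off", e);
        }
        attempts++;
        nextSleepMs *= 2; // assumed policy: exponential growth without jitter
    }
}
```

The point of the sketch is that the timeout check happens before the sleep, so a request never sleeps past its deadline only to fail afterwards.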


RegionStoreClient.callWithRetry is inherited from AbstractGRPCClient.callWithRetry. The concrete logic lives in RetryPolicy.callWithRetry, which implements the retry loop, while the specific retry strategy is determined by the ErrorHandler. The ErrorHandler’s handle{Request, Response}Error functions return a boolean value indicating whether to retry inside callWithRetry. The control flow for callWithRetry is as follows:


The error handler is chosen according to the following table:

| gRPC request | the result | handler |
| --- | --- | --- |
| throws exception | - | handleRequestError |
| no exception | is null | handleRequestError |
| no exception | is error | handleResponseError |
| no exception | normal | normal return |
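The handler selection can be illustrated with a small retry loop. This is a sketch under stated assumptions, not the code in RetryPolicy: the interface SketchErrorHandler, its isError helper, the maxAttempts bound, and the choice to rethrow or return when the handler declines a retry are all hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical handler interface mirroring handleRequestError / handleResponseError.
interface SketchErrorHandler<R> {
    boolean handleRequestError(Exception e); // true = retry inside callWithRetry
    boolean handleResponseError(R resp);     // true = retry inside callWithRetry
    boolean isError(R resp);                 // assumed helper: does resp carry an error?
}

final class SketchRetryPolicy {
    // Retries `call` while the chosen handler asks for it, up to maxAttempts (assumed bound).
    static <R> R callWithRetry(Supplier<R> call, SketchErrorHandler<R> handler, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            R resp;
            try {
                resp = call.get();
            } catch (RuntimeException e) {
                // Row 1: the request threw, so handleRequestError decides.
                if (handler.handleRequestError(e)) continue;
                throw e; // assumption: surface the failure to the caller
            }
            if (resp == null) {
                // Row 2: no exception but a null result, again handleRequestError.
                if (handler.handleRequestError(null)) continue;
                return null;
            }
            if (handler.isError(resp)) {
                // Row 3: the response carries an error, so handleResponseError decides.
                if (handler.handleResponseError(resp)) continue;
                return resp; // assumption: the caller inspects the error
            }
            return resp; // Row 4: normal return
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

Returning false from a handler does not necessarily mean the request is abandoned; it only means the retry does not happen inside this loop.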

The handleRequestError function copes with the following situations:

| situation | retry within callWithRetry | note |
| --- | --- | --- |
| invalid store in region manager | true | refresh ClientStub |
| region has not got multiple copies | false | |
| successfully switched to new leader | true | |
| seekProxyStore | true if success | only when tikv.enable_grpc_forward is set |

The handleResponseError function copes with the following gRPC errors:

| error | retry within callWithRetry |
| --- | --- |
| NotLeader | true if leader unchanged |
| EpochNotMatch | true if region epoch in ctx is ahead of TiKV's |
| Raft ProposalDropped | true |
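The table above can be read as a small decision function. The sketch below is illustrative only: the enum SketchRegionError and the two boolean parameters are hypothetical stand-ins for what the real handler derives from the response and the region cache, and treating any error outside the table as not retried here is an assumption.

```java
// Hypothetical error kinds; the names follow the table, not a real protobuf type.
enum SketchRegionError { NOT_LEADER, EPOCH_NOT_MATCH, RAFT_PROPOSAL_DROPPED, OTHER }

final class SketchResponseErrorRules {
    // Returns true when the request should be retried inside callWithRetry.
    static boolean shouldRetry(SketchRegionError err, boolean leaderUnchanged, boolean ctxEpochAhead) {
        switch (err) {
            case NOT_LEADER:
                return leaderUnchanged;   // retry only if the leader is unchanged
            case EPOCH_NOT_MATCH:
                return ctxEpochAhead;     // retry only if the epoch in ctx is ahead of TiKV's
            case RAFT_PROPOSAL_DROPPED:
                return true;              // always retried in place
            default:
                return false;             // assumption: not covered by the table
        }
    }
}
```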