-
Notifications
You must be signed in to change notification settings - Fork 882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry handover error during graceful failover #7028
base: main
Are you sure you want to change the base?
Conversation
err := ni.checkReplicationState(nsEntry, info.FullMethod) | ||
if err != nil { | ||
var nsErr error | ||
nsEntry, nsErr = ni.namespaceRegistry.RefreshNamespaceById(nsEntry.ID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm this will keep refreshing the namespace? I mean all requests will perform a refresh upon retry.
also if the handover error is from history service then the refresh here is unnecessary
return err | ||
} | ||
return ni.checkReplicationState(namespaceEntry, fullMethod) | ||
// Only retry on ErrNamespaceHandover, other errors will be handled by the retry interceptor | ||
err := backoff.ThrottleRetryContext(ctx, op, handoverRetryPolicy, common.IsNamespaceHandoverError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also worry this will cause server api latency alert to fire during the handover, we should probably exclude this backoff from nouserlatency metric.
return err | ||
} | ||
return ni.checkReplicationState(namespaceEntry, fullMethod) | ||
// Only retry on ErrNamespaceHandover, other errors will be handled by the retry interceptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we just let retry interceptor handle the error? and this interceptor just block until ns is no longer in handover state?
What changed?
Retry handover error during graceful failover
Why?
Namespace handover error is transient during graceful failover we can retry on this error.
This is the best effort to reduce the returned error. If the retry exhausted, it still return the handover to caller.
The backoff retry handles the context deadline duration and retry.
How did you test it?
Unit tests.
Potential risks
Documentation
Is hotfix candidate?