fix: take_over_counters #4890
base: main
Signed-off-by: kostas <kostas@dragonflydb.io>
tests/dragonfly/replication_test.py
Outdated
- assert await c_blocking.execute_command("BLPOP BLOCKING_KEY1 BLOCKING_KEY2 100") is None
+ try:
+     assert await c_blocking.execute_command("BLPOP BLOCKING_KEY1 BLOCKING_KEY2 100") is None
+ except redis.exceptions.ConnectionError as e:
But why do we get a connection error?
Is it because the takeover has already started? If so, isn't the solution to slightly increase the delay before the takeover executes?
@adiholden Oh, I thought I wrote it in the description. The client fails to connect with an OS error. The redis lib converts the OSError to a ConnectionError.
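A minimal sketch of the conversion described above, using only the standard library. `ClientConnectionError`, `send`, and `broken_write` are hypothetical names introduced for illustration; they stand in for `redis.exceptions.ConnectionError` and the client's write path, and this is not the redis library's actual code.

```python
class ClientConnectionError(Exception):
    """Hypothetical stand-in for redis.exceptions.ConnectionError."""


def send(write, payload: bytes) -> None:
    """Call write(payload) and re-raise any OSError as ClientConnectionError."""
    try:
        write(payload)
    except OSError as e:
        # Chain the original error so the OS-level cause stays visible.
        raise ClientConnectionError(f"Error while writing to socket: {e}") from e


def broken_write(payload: bytes) -> None:
    # Simulates a socket whose peer has gone away (BrokenPipeError is an OSError).
    raise BrokenPipeError("peer closed the connection")
```

Calling `send(broken_write, b"PING")` raises `ClientConnectionError`, and the original `BrokenPipeError` remains available as `__cause__`, which is why the test only ever sees the library-level exception.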
> If so, isn't the solution to slightly increase the delay
Or just handle the ConnectionError. It doesn't even happen that often 🤷‍♂️
The problem with this kind of fix is that today it does not happen often, but tomorrow someone may change the test and remove the takeover delay, or something else in the test will change, and we will not know that this functionality is no longer tested at all, because the test might always take the ConnectionError path.
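One way to make that concern visible is to record which path the blocking call took instead of silently swallowing the exception. The sketch below assumes a hypothetical helper `run_blocking` and a stand-in exception `ClientConnectionError` (in place of `redis.exceptions.ConnectionError`); it is not the code in this PR.

```python
import asyncio


class ClientConnectionError(Exception):
    """Hypothetical stand-in for redis.exceptions.ConnectionError."""


async def run_blocking(command) -> str:
    """Await command() and report which path was taken."""
    try:
        result = await command()
    except ClientConnectionError:
        # The connection dropped before or during the blocking call.
        return "connection_error"
    # The path the test is meant to cover: the call returns None.
    assert result is None
    return "completed"


async def demo() -> list:
    async def returns_none():
        return None

    async def connection_dropped():
        raise ClientConnectionError("Connection lost.")

    return [await run_blocking(returns_none), await run_blocking(connection_dropped)]


outcomes = asyncio.run(demo())
```

A test built this way could log or assert on the returned outcome, so a change that makes every run end in `"connection_error"` would show up rather than pass unnoticed.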
Yep, but it's also flaky if you just increase the takeover delay, because you don't know when BLPOP will be run by Python's event loop.
I ended up increasing the takeover delay to 1s. Let's see how that goes in practice 😄
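The scheduling point can be illustrated with plain `asyncio`, assuming the blocking command is launched as a task (an assumption; the test's actual structure is not shown here). A task created with `asyncio.create_task` does not start running until the current coroutine yields control, so code that follows the call can run first. The names `blocking_command` and the strings in `order` are illustrative only.

```python
import asyncio


async def main() -> list:
    order = []

    async def blocking_command():
        # Stands in for the point where BLPOP would actually be sent.
        order.append("blpop started")

    # Scheduling the task does not run it yet.
    task = asyncio.create_task(blocking_command())

    # This line executes before the task gets its first chance to run.
    order.append("takeover requested")

    await task
    return order


order = asyncio.run(main())
```

Here `order` ends up as `["takeover requested", "blpop started"]`, which is why a longer takeover delay reduces the race but does not by itself guarantee that BLPOP was issued first.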
The issue: the client fails to connect, and the redis lib converts the OSError to a ConnectionError and rethrows it. This happens rarely, and the symptom can be treated by catching the exception.
redis.exceptions.ConnectionError: Error UNKNOWN while writing to socket. Connection lost.
Fixes #4533