Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: take_over_counters #4890

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

fix: take_over_counters #4890

wants to merge 2 commits into from

Conversation

kostasrim
Copy link
Contributor

@kostasrim kostasrim commented Apr 4, 2025

The issue: client fails to connect and redis lib converts OsError to ConnectionError and rethrows. This happens rarely and the symptom can be treated by catching the exception.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dragonfly/replication_test.py:1249: in block_during_takeover
    assert await c_blocking.execute_command("BLPOP BLOCKING_KEY1 BLOCKING_KEY2 100") is None
/usr/local/lib/python3.8/dist-packages/redis/asyncio/client.py:611: in execute_command
    conn = self.connection or await pool.get_connection(command_name, **options)
/usr/local/lib/python3.8/dist-packages/redis/asyncio/connection.py:1064: in get_connection
    await self.ensure_connection(connection)
/usr/local/lib/python3.8/dist-packages/redis/asyncio/connection.py:1097: in ensure_connection
    await connection.connect()
/usr/local/lib/python3.8/dist-packages/redis/asyncio/connection.py:296: in connect
    await self.on_connect()
/usr/local/lib/python3.8/dist-packages/redis/asyncio/connection.py:401: in on_connect
    await self.send_command("CLIENT", "SETINFO", "LIB-VER", self.lib_version)
/usr/local/lib/python3.8/dist-packages/redis/asyncio/connection.py:505: in send_command
    await self.send_packed_command(

and
redis.exceptions.ConnectionError: Error UNKNOWN while writing to socket. Connection lost.

Fixes # #4533

Signed-off-by: kostas <kostas@dragonflydb.io>
assert await c_blocking.execute_command("BLPOP BLOCKING_KEY1 BLOCKING_KEY2 100") is None
try:
assert await c_blocking.execute_command("BLPOP BLOCKING_KEY1 BLOCKING_KEY2 100") is None
except redis.exceptions.ConnectionError as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why do we have connection error?
is it because takeover already started? if so isnt the solution is to increase a little bit the delay in takeover execution ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adiholden Oh I thought I wrote it in the description. The client fails to connect with an os error. The redis lib converts the OsError to connection error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if so isnt the solution is to increase a little bit the delay

Or just handle the ConnectionError. It doesn't eve happen that often 🤷‍♂️

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this kind of a fix that now you see it does not happen that often, but tomorrow someone will change the test and will remove the delay in takeover or something else in the test will change and we will not know this functionality is not tested at all as it might always get to the path of ConnectionError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep but it's also flaky if you just increase the takeover because you don't know when BLPOP will run by the python's event loop.

I ended up increasing the takeover delay to 1s. Let's see how that goes in practice 😄

@kostasrim kostasrim requested a review from adiholden April 7, 2025 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants