Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bbr: account for bytes buffered at lower layers #1999

Merged
merged 1 commit into from
Apr 9, 2025
Merged

Conversation

ghedo
Copy link
Member

@ghedo ghedo commented Apr 4, 2025

For example, due to pacing offload some packets might be buffered in the fq qdisc queue and not yet sent on the network.

This is more or less meant to be equivalent to Linux' bbr_packets_in_net_at_edt() function.


This is an alternative to #1636 and instead of walking the list of sent packets which is quite expensive, we just sort of estimate the amount of bytes by looking at BW and pacing rate.

@ghedo ghedo requested a review from a team as a code owner April 4, 2025 17:18
@antoniovicente antoniovicente self-requested a review April 4, 2025 17:35
fn bbr_bytes_in_net(r: &Congestion, in_flight: usize, now: Instant) -> usize {
let edt = r.pacer.next_time().max(now);
let interval = edt.saturating_duration_since(now);
let interval_delivered = interval.as_secs_f64() * r.bbr_state.btlbw as f64;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cant tell but is r.bbr_state.btlbw equivalent to bbr_bw. If its different should we call out the difference in a comment?

Copy link
Contributor

@antoniovicente antoniovicente Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that r.bbr_state.btlbw is equivalent to lt_bw in the kernel code, so bbr_bw would match btlbw when lt_use_bw is true.

I assume that we have a valid btlbw at this point, but it wouldn't hurt to double check (btlbw starts as 0, so worst case we won't adjust down the in_flight bytes)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell btlbw is the closest thing to bw in BBR1 state.

The RFC only seems to mention BBR.BtlBw.

fn bbr_bytes_in_net(r: &Congestion, in_flight: usize, now: Instant) -> usize {
let edt = r.pacer.next_time().max(now);
let interval = edt.saturating_duration_since(now);
let interval_delivered = interval.as_secs_f64() * r.bbr_state.btlbw as f64;
Copy link
Contributor

@antoniovicente antoniovicente Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that r.bbr_state.btlbw is equivalent to lt_bw in the kernel code, so bbr_bw would match btlbw when lt_use_bw is true.

I assume that we have a valid btlbw at this point, but it wouldn't hurt to double check (btlbw starts as 0, so worst case we won't adjust down the in_flight bytes)

let mut in_flight_at_edt = in_flight;

// pacing_gain > 1.0
if (r.bbr_state.pacing_gain - 1.0).abs() > f64::EPSILON {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this .abs() call correct?

If pacing gain were 0.8, we'ld add the send_quantum. This doesn't seem right.

This check is effectively pacing_gain != 1.0, not pacing_gain > 1.0

I see use of .abs() in other places, we may want to audit those too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See existing impl of pacing_gain == 1.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I misunderstood the abs() comparison, and indeed we already do a > 1.0 check a few lines below that one

so I changed this to a straight comparison.

Somewhat suspiciously the bbr_drain() test started failing again after that change, and to fix it I had to increase the number of sent packets even more (to 7, compared to the original 5). The test itself doesn't take pacing rate into account which I suspect is what causes the mismatch (since the new code expects pacing to be done)... I also realized it had a seemingly stray comment ("Stop at right before filled_pipe=true") that doesn't seem to make sense since it's actually asserting filled_pipe=true afterwards so I removed it...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing against > epsilon or > 1.0 + epsilon is more correct, but it I don't know how much error we can end up accumulating here. Is gain of 1.0 something that can only happen when we assign 1.0, or can it also happen after (pacing_gain * N) followed by * (1/N) or pacing_gain + N followed by - N?

@ghedo ghedo force-pushed the bbr_bytes_in_net branch from 8ccf134 to 053d496 Compare April 7, 2025 12:54
For example, due to pacing offload some packets might be buffered in the
fq qdisc queue and not yet sent on the network.

This is more or less meant to be equivalent to Linux'
`bbr_packets_in_net_at_edt()` function.
@ghedo ghedo force-pushed the bbr_bytes_in_net branch from 053d496 to 951e1fc Compare April 7, 2025 13:53
@ghedo ghedo merged commit 216ad7a into master Apr 9, 2025
36 checks passed
@ghedo ghedo deleted the bbr_bytes_in_net branch April 9, 2025 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants