Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument the OTel trace API's auto SDK #2001

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

MrAlias
Copy link
Contributor

@MrAlias MrAlias commented Mar 17, 2025

@MrAlias MrAlias added this to the v0.22.0 milestone Mar 17, 2025
@MrAlias MrAlias force-pushed the nonrecording-trace-api-sdk branch 2 times, most recently from 2f5c92a to d650cf5 Compare March 17, 2025 21:03
@MrAlias MrAlias marked this pull request as ready for review March 17, 2025 21:13
@MrAlias MrAlias requested a review from a team as a code owner March 17, 2025 21:13
@MrAlias MrAlias force-pushed the nonrecording-trace-api-sdk branch 3 times, most recently from 5a0200f to 2ea1657 Compare March 19, 2025 22:38
@MrAlias MrAlias force-pushed the nonrecording-trace-api-sdk branch from 2ea1657 to 0933d79 Compare March 20, 2025 15:46
Copy link
Contributor

@RonFed RonFed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, looks great.
wonder if we can share a lot of this between the auto/sdk probe and this new one in:

  • C code - we can have a common header with the common functionality.
  • probe.go also has common logic of parsing the perf events.
  • e2e tests can be unified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very similar to the autosdk I think we should consider to unify them to one test where we could check both functionalities.

Copy link
Contributor Author

@MrAlias MrAlias Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to combine these tests, it will take a lot of careful redesign of our testing system and (bats) utilities. These are testing independent workflows for independent code-paths that originate based on different mechanism. If we want to unify this effort, I would say we add this complexity after this PR is merged.

@@ -87,6 +88,7 @@ func NewInstrumentation(ctx context.Context, opts ...InstrumentationOption) (*In
kafkaProducer.New(c.logger, Version()),
kafkaConsumer.New(c.logger, Version()),
autosdk.New(c.logger),
otelTrace.New(c.logger),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want this to be opt-in same as we have below the check for

if c.globalImpl {
		p = append(p, otelTraceGlobal.New(c.logger))
	}

I think it should be the same opt-in flag - or do we want to remove that flag entirely and have this on by default (both the global and the non-recording)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the flag. It was originally intended to set user expectations about our initial partial support for the OTel tracing API. At this point we provide full coverage of the API (for recent versions of otel-go).

I can track the removal of this opt-in flag in an issue an tackle it in a follow-on PR.

Comment on lines +92 to +95
if (wrote_flag) {
// Already wrote flag value.
return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of an invalid state - if we already wrote we should have been unloaded I guess.
It also can take some time for the unload to happen so this covers this, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also can take some time for the unload to happen so this covers this, right?

Right, this is my thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this is how the auto/sdk probe also handles this.

Comment on lines +14 to +23
// TODO: tune this. This is a just a guess, but we should be able to determine
// the maximum size of a span (based on limits) and set this. Ideally, we could
// also look into a tiered allocation strategy so we do not over allocate
// space (i.e. small, medium, large data sizes).
#define MAX_SIZE 2048

// Injected const.
volatile const u64 span_context_trace_id_pos;
volatile const u64 span_context_span_id_pos;
volatile const u64 span_context_trace_flags_pos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could share this in a header with auto/sdk probe that uses the exact same consts.

Copy link
Contributor Author

@MrAlias MrAlias Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board if you would like to refactor this after this PR is merged. I would prefer to keep the complexity of this PR to a minimum and making changes to other probes to support a new header is not something I want to include here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIG meeting feedback. Refactor this PR to handle dedup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants