stats/opentelemetry: separate out interceptors for tracing and metrics #8063

Open
wants to merge 62 commits into
base: master
from

janardhanvissa
Contributor

@janardhanvissa janardhanvissa commented Feb 3, 2025

RELEASE NOTES: None

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 74.48980% with 50 lines in your changes missing coverage. Please review.

Project coverage is 82.01%. Comparing base (ce35fd4) to head (cc12a84).

Files with missing lines                 Patch %   Lines
stats/opentelemetry/client_tracing.go    56.71%    23 Missing and 6 partials ⚠️
stats/opentelemetry/server_tracing.go    57.69%    7 Missing and 4 partials ⚠️
stats/opentelemetry/client_metrics.go    85.45%    5 Missing and 3 partials ⚠️
stats/opentelemetry/server_metrics.go    90.90%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8063      +/-   ##
==========================================
- Coverage   82.16%   82.01%   -0.15%     
==========================================
  Files         410      410              
  Lines       40248    40342      +94     
==========================================
+ Hits        33068    33087      +19     
- Misses       5830     5881      +51     
- Partials     1350     1374      +24     
Files with missing lines                 Coverage Δ
stats/opentelemetry/opentelemetry.go     77.90% <100.00%> (+2.90%) ⬆️
stats/opentelemetry/server_metrics.go    86.14% <90.90%> (-3.68%) ⬇️
stats/opentelemetry/client_metrics.go    82.35% <85.45%> (-7.05%) ⬇️
stats/opentelemetry/server_tracing.go    69.44% <57.69%> (-30.56%) ⬇️
stats/opentelemetry/client_tracing.go    66.27% <56.71%> (-17.73%) ⬇️

... and 21 files with indirect coverage changes


@janardhanvissa force-pushed the refactor-tracing-metrics branch from 69df069 to 71804b4 on February 3, 2025 11:48
@purnesh42H
Contributor

@janardhanvissa it's not clear what the intention of this refactor is. The follow-up from the OpenTelemetry tracing API PR was to create separate interceptors for metrics and traces. Right now, a single interceptor handles both the trace and metrics options. Once we have separate unary and stream interceptors for tracing and for metrics, we don't have to check whether each option is enabled or disabled every time.

Contributor

@purnesh42H purnesh42H left a comment

@janardhanvissa janardhanvissa removed their assignment Mar 20, 2025
Contributor

@purnesh42H purnesh42H left a comment

LGTM

@purnesh42H
Contributor

@dfawley for second review

Member

@dfawley dfawley left a comment

Thanks for the cleanup! This definitely looks better, but I think it can be improved even more.

	if !o.isTracingEnabled() {
		return do
	}
	tracingHandler := &clientTracingHandler{options: o}
Member

Can you call this cth, or make the above metricsHandler instead?

Contributor

@purnesh42H purnesh42H Apr 1, 2025

I looked at the current code, and with this refactor it's fine to call the clientStatsHandler metricsHandler. @janardhanvissa

Contributor Author

done

@@ -117,10 +113,19 @@ type MetricsOptions struct {
// MeterProvider. If the passed in Meter Provider does not have the view
// configured for an individual metric turned on, the API call in this component
// will create a default view for that metric.
//
// For the traces supported by this instrumentation code, provide an
// implementation of a TextMapPropagator and OpenTelemetry TracerProvider.
func DialOption(o Options) grpc.DialOption {
	csh := &clientStatsHandler{options: o}
Member

Isn't it possible to have metrics disabled and tracing enabled?

Contributor

So, right now, if metrics are disabled, the clientStatsHandler behaves as a no-op for any type of metrics work, but it's still always added. @janardhanvissa could you try not adding this interceptor if metrics are disabled? If there is no issue, then that is probably the right thing to do.

Contributor Author

done
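
The exchange above amounts to registering each handler only when its feature is configured, so that a tracing-only setup never installs a metrics handler. A minimal, standard-library-only sketch of that selection logic follows. `Options`, `handlersFor`, and the string handler names are hypothetical stand-ins for illustration, not the grpc-go API, and treating a nil provider as "disabled" is an assumption.

```go
package main

import "fmt"

// Options is a hypothetical stand-in for the package's Options type.
// Assumption: a nil provider means the corresponding feature is disabled.
type Options struct {
	MeterProvider  any
	TracerProvider any
}

func (o Options) metricsEnabled() bool { return o.MeterProvider != nil }
func (o Options) tracingEnabled() bool { return o.TracerProvider != nil }

// handlersFor returns the names of the handlers that would be registered.
// A disabled feature contributes nothing, rather than a no-op handler.
func handlersFor(o Options) []string {
	var handlers []string
	if o.metricsEnabled() {
		handlers = append(handlers, "metrics")
	}
	if o.tracingEnabled() {
		handlers = append(handlers, "tracing")
	}
	return handlers
}

func main() {
	// Tracing only: no metrics handler is registered at all.
	fmt.Println(handlersFor(Options{TracerProvider: struct{}{}}))
	// Both enabled: metrics is registered first, then tracing.
	fmt.Println(handlersFor(Options{MeterProvider: struct{}{}, TracerProvider: struct{}{}}))
}
```

Deciding once at construction time is what lets each handler drop its per-call "is this enabled?" checks.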

estats "google.golang.org/grpc/experimental/stats"
istats "google.golang.org/grpc/internal/stats"
"google.golang.org/grpc/metadata"
"google.golang.org/grpc/stats"
"google.golang.org/grpc/status"

otelattribute "go.opentelemetry.io/otel/attribute"
otelmetric "go.opentelemetry.io/otel/metric"
)

type clientStatsHandler struct {
Member

Should we call this clientMetricsHandler instead? "stats handler" is a specific thing that makes both metrics and tracing work. Since this only handles metrics, it probably shouldn't use "stats".

Contributor Author

done

Comment on lines +71 to 81
	ci := getCallInfo(ctx)
	if ci == nil {
		if logger.V(2) {
			logger.Info("Creating new CallInfo since its not present in context in clientStatsHandler unaryInterceptor")
		}
		ci = &callInfo{
			target: cc.CanonicalTarget(),
			method: determineMethod(method, opts...),
		}
		ctx = setCallInfo(ctx, ci)
	}
Member

Why are we doing this? I would expect every unary interceptor start to be for a new call attempt, so there should never be anything in the context already. Am I missing something?

Contributor

With this refactor, we now have two stats handlers: (1) metrics and (2) traces. The stats handlers are executed in the order in which they are added. So, if the metrics handler executes first, we need to make sure that the tracing handler doesn't create a new call info but instead uses the existing one and adds the tracing data there. The reverse also holds if we change the order of the interceptors in the future. Basically, each interceptor needs to check whether a call info already exists, add its info to the existing one if present, and otherwise create a new one.
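
The "reuse if present, otherwise create" rule described in this reply can be sketched with plain context values. The names `getCallInfo`, `setCallInfo`, and `callInfo` come from the quoted diff, but the bodies below are assumptions; `ensureCallInfo` and `simulateCall` are hypothetical helpers added only for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// callInfo mirrors the two fields visible in the quoted diff.
type callInfo struct {
	target string
	method string
}

// callInfoKey is an unexported key type, so no other package can collide with it.
type callInfoKey struct{}

// getCallInfo returns the callInfo stored in ctx, or nil if there is none.
func getCallInfo(ctx context.Context) *callInfo {
	ci, _ := ctx.Value(callInfoKey{}).(*callInfo)
	return ci
}

// setCallInfo returns a child context that carries ci.
func setCallInfo(ctx context.Context, ci *callInfo) context.Context {
	return context.WithValue(ctx, callInfoKey{}, ci)
}

// ensureCallInfo (hypothetical) reuses an existing callInfo or creates one.
func ensureCallInfo(ctx context.Context, target, method string) (context.Context, *callInfo) {
	if ci := getCallInfo(ctx); ci != nil {
		return ctx, ci
	}
	ci := &callInfo{target: target, method: method}
	return setCallInfo(ctx, ci), ci
}

// simulateCall runs two interceptors back to back and reports whether the
// second one saw the callInfo created by the first.
func simulateCall() (shared bool, method string) {
	ctx := context.Background()
	ctx, fromMetrics := ensureCallInfo(ctx, "dns:///example.test", "/pkg.Service/Method")
	_, fromTracing := ensureCallInfo(ctx, "ignored", "ignored")
	return fromMetrics == fromTracing, fromTracing.method
}

func main() {
	shared, method := simulateCall()
	fmt.Println(shared, method)
}
```

Because the check is symmetric, the result is the same whichever interceptor runs first, which is the order-independence the reply asks for.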

Comment on lines +111 to 121
	ci := getCallInfo(ctx)
	if ci == nil {
		if logger.V(2) {
			logger.Info("Creating new CallInfo since its not present in context in clientStatsHandler streamInterceptor")
		}
		ci = &callInfo{
			target: cc.CanonicalTarget(),
			method: determineMethod(method, opts...),
		}
		ctx = setCallInfo(ctx, ci)
	}
Member

As above.

Contributor

Same reason.


func (h *clientTracingHandler) unaryInterceptor(ctx context.Context, method string, req, reply any, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
	ci := getCallInfo(ctx)
	if ci == nil {
Member

As above.


func (h *clientTracingHandler) streamInterceptor(ctx context.Context, desc *grpc.StreamDesc, cc *grpc.ClientConn, method string, streamer grpc.Streamer, opts ...grpc.CallOption) (grpc.ClientStream, error) {
	ci := getCallInfo(ctx)
	if ci == nil {
Member

As above.


// perCallTraces sets the span status based on the RPC result and ends the span.
// It is used to finalize tracing for both unary and streaming calls.
func (h *clientTracingHandler) perCallTraces(err error, ts trace.Span) {
Member

This won't have all of the per-call trace events, right? When we add the resolver delay in #8074 anyway.

So probably finishTrace() or endCall or something indicating it handles the end of the RPC?

Contributor Author

renamed as finishTrace()
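
For readers without the diff open, the behavior the doc comment describes (set the span status from the RPC result, then end the span) might look roughly like the sketch below. The `span` interface and `recordingSpan` type are hypothetical stand-ins so the example needs only the standard library. The PR's method takes an OpenTelemetry `trace.Span`, and how it maps errors to status codes is not shown in this thread, so the mapping here is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// span is a hypothetical, minimal stand-in for an OpenTelemetry span.
type span interface {
	SetStatus(ok bool, description string)
	End()
}

// recordingSpan remembers what was done to it, for demonstration.
type recordingSpan struct {
	ok          bool
	description string
	ended       bool
}

func (s *recordingSpan) SetStatus(ok bool, description string) {
	s.ok, s.description = ok, description
}

func (s *recordingSpan) End() { s.ended = true }

// errUnavailable is a sample RPC failure used by the demo.
var errUnavailable = errors.New("unavailable")

// finishTrace records the outcome of the RPC on the span and ends it.
// It handles only the end of the call, which is why the review preferred
// this name over perCallTraces.
func finishTrace(err error, s span) {
	if err != nil {
		s.SetStatus(false, err.Error())
	} else {
		s.SetStatus(true, "OK")
	}
	s.End()
}

func main() {
	failed := &recordingSpan{}
	finishTrace(errUnavailable, failed)
	fmt.Printf("%+v\n", *failed)
}
```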

func (h *clientTracingHandler) TagRPC(ctx context.Context, _ *stats.RPCTagInfo) context.Context {
	ri := getRPCInfo(ctx)
	var ai *attemptInfo
	if ri == nil {
Member

As above.

Comment on lines 41 to 47
func (h *serverTracingHandler) unaryInterceptor(ctx context.Context, req any, _ *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
	return handler(ctx, req)
}

func (h *serverTracingHandler) streamInterceptor(srv any, ss grpc.ServerStream, _ *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
	return handler(srv, ss)
}
Member

These just shouldn't exist, and don't register an interceptor.

Contributor Author

removed

@dfawley dfawley assigned janardhanvissa and unassigned dfawley Mar 28, 2025
@@ -130,7 +130,7 @@ func (h *clientTracingHandler) HandleConn(context.Context, stats.ConnStats) {}
 func (h *clientTracingHandler) TagRPC(ctx context.Context, _ *stats.RPCTagInfo) context.Context {
 	ri := getRPCInfo(ctx)
 	var ai *attemptInfo
-	if ri == nil {
+	if ri.ai == nil {
Contributor

I think here also we can't assume ri is not nil if the order of stats handlers changes.

Contributor Author

done
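
The concern here is that `ri.ai` dereferences `ri`, which panics if no earlier handler has stored an RPC info in the context. One way to guard both levels is sketched below. The `rpcInfo` and `attemptInfo` types are reduced to the fields needed for the example, `ensureAttempt` is a hypothetical helper, and the thread does not show what fix the author actually applied.

```go
package main

import "fmt"

// attemptInfo and rpcInfo are reduced, hypothetical versions of the PR's types.
type attemptInfo struct {
	method string
}

type rpcInfo struct {
	ai *attemptInfo
}

// ensureAttempt returns an rpcInfo whose ai field is non-nil. It checks ri
// before touching ri.ai, so it is safe whether or not another stats handler
// has already populated the context.
func ensureAttempt(ri *rpcInfo) *rpcInfo {
	if ri == nil {
		ri = &rpcInfo{}
	}
	if ri.ai == nil {
		ri.ai = &attemptInfo{}
	}
	return ri
}

func main() {
	// No earlier handler ran: both levels are created.
	fresh := ensureAttempt(nil)
	fmt.Println(fresh != nil, fresh.ai != nil)

	// An earlier handler already stored attempt info: it is kept as-is.
	existing := &rpcInfo{ai: &attemptInfo{method: "/pkg.Service/Method"}}
	fmt.Println(ensureAttempt(existing).ai.method)
}
```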

metricsHandler := &clientMetricsHandler{options: o}
metricsHandler.initializeMetrics()
unaryInterceptors = append(unaryInterceptors, metricsHandler.unaryInterceptor)
streamInterceptors = append(streamInterceptors, metricsHandler.streamInterceptor)
Contributor

This changes the current order. Let's avoid that. Keep only two variables, metricsInterceptors and tracesInterceptors, and add metricsInterceptors before traces.

Contributor Author

done
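
The ordering request can be illustrated with a toy interceptor chain that runs in slice order, which is also how gRPC's chained interceptors behave (the first one registered is the outermost). Everything in the sketch is a simplified stand-in: the `interceptor` type, `named`, `chain`, and `callOrder` are hypothetical, and only the variable names `metricsInterceptors` and `tracesInterceptors` come from the review comment.

```go
package main

import "fmt"

// interceptor is a simplified stand-in: it records its name, then calls next.
type interceptor func(log *[]string, next func())

// named builds an interceptor that appends name to the log before continuing.
func named(name string) interceptor {
	return func(log *[]string, next func()) {
		*log = append(*log, name)
		next()
	}
}

// chain runs the interceptors in slice order and finally invokes the RPC.
func chain(ics []interceptor, log *[]string, invoke func()) {
	if len(ics) == 0 {
		invoke()
		return
	}
	ics[0](log, func() { chain(ics[1:], log, invoke) })
}

// callOrder assembles the chain with metrics ahead of traces and returns the
// order in which everything ran.
func callOrder() []string {
	metricsInterceptors := []interceptor{named("metrics")}
	tracesInterceptors := []interceptor{named("tracing")}
	all := append(metricsInterceptors, tracesInterceptors...)

	var log []string
	chain(all, &log, func() { log = append(log, "rpc") })
	return log
}

func main() {
	fmt.Println(callOrder())
}
```

Keeping one slice per concern and concatenating them in a fixed order makes the execution order explicit at the single place where the dial options are assembled.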

		streamInterceptors = append(streamInterceptors, tracingHandler.streamInterceptor)
		do = append(do, grpc.WithStatsHandler(tracingHandler))
	}
	if len(unaryInterceptors) > 0 {
Contributor

these ifs will change to metrics and traces

Contributor Author

done

 // TagRPC implements per RPC attempt context management for traces.
 func (h *serverTracingHandler) TagRPC(ctx context.Context, _ *stats.RPCTagInfo) context.Context {
 	ri := getRPCInfo(ctx)
 	var ai *attemptInfo
-	if ri == nil {
+	if ri.ai == nil {
Contributor

same here. We can't assume ri to be not nil here.

Contributor Author

done

@janardhanvissa janardhanvissa removed their assignment Apr 4, 2025
@janardhanvissa janardhanvissa requested a review from dfawley April 4, 2025 23:51
Labels
Area: Observability (includes Stats, Tracing, Channelz, Healthz, Binlog, Reflection, Admin, GCP Observability)
Type: Internal Cleanup (refactors, etc.)
6 participants