NR-171475 Prevent AsyncApiImpl from overwriting suspended transactions that are… #1555

Merged
merged 6 commits into from
Nov 22, 2023

Conversation

@jasonjkeller jasonjkeller commented Oct 18, 2023

… already mapped to an AsyncContext


Resolves #1554

With the changes in this PR, the Jetty/Karaf/CamelCXF transactions report correct times, and transactions are no longer suspended without ever being resumed (which is what was causing the memory leak).

[Screenshot, 2023-10-18 5:36:46 PM]

A workaround has been put in place to prevent this specific memory leak condition. It applies to the legacy async API itself rather than being Jetty-specific, so it could address the same issue across the wide range of app servers that are instrumented. To enable the workaround, set the following config option to true (the default is false):

Config Option 1: Agent config file (this will update dynamically if the config file is changed)

common: &default_settings
  legacy_async_api_skip_suspend: true

Config Option 2: System Property

-Dnewrelic.config.legacy_async_api_skip_suspend=true

Config Option 3: Environment Variable

NEW_RELIC_LEGACY_ASYNC_API_SKIP_SUSPEND=true
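
For illustration only, here is a minimal sketch of how a single boolean could be resolved from the three sources listed above. The class `SkipSuspendFlagSketch` and its `resolve` method are hypothetical and are not part of the agent. The precedence shown (environment variable, then system property, then config file value, then the default of false) is an assumption, not something stated in this PR. Only the property and variable names are taken from the options above.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: not the agent's actual config code.
final class SkipSuspendFlagSketch {
    // Names copied from the config options above.
    static final String SYSTEM_PROPERTY = "newrelic.config.legacy_async_api_skip_suspend";
    static final String ENV_VAR = "NEW_RELIC_LEGACY_ASYNC_API_SKIP_SUSPEND";

    /**
     * Resolves the flag from the given sources.
     * Assumed precedence: env var, then system property, then file value, then false.
     * fileValue stands in for the value parsed from the agent config file (null if absent).
     */
    static boolean resolve(Map<String, String> env, Properties sysProps, Boolean fileValue) {
        String fromEnv = env.get(ENV_VAR);
        if (fromEnv != null) {
            return Boolean.parseBoolean(fromEnv.trim());
        }
        String fromProp = sysProps.getProperty(SYSTEM_PROPERTY);
        if (fromProp != null) {
            return Boolean.parseBoolean(fromProp.trim());
        }
        return fileValue != null && fileValue;
    }
}
```

Calling `SkipSuspendFlagSketch.resolve(System.getenv(), System.getProperties(), null)` would read the live process environment. Passing the sources in as parameters keeps the sketch easy to exercise with fixed values.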

Discovered a nuance to this while running the SpringTest.test_spring_webflux_tomcat_webclient_timeout AIT, which uses embedded Tomcat and the legacy async API. The same transaction instance can be suspended and resumed multiple times; if that is prevented, this AIT fails (because existing agent behavior is changed).

In this particular example, the transaction was initially suspended via a call to CoyoteAdapter.service, which the agent instruments:

"http-nio-8080-exec-1@10768" daemon prio=5 tid=0x34 nid=NA runnable
java.lang.Thread.State: RUNNABLE
	at com.newrelic.agent.AsyncApiImpl.suspendTransaction(AsyncApiImpl.java:73)
	at com.newrelic.agent.AsyncApiImpl.suspendAsync(AsyncApiImpl.java:45)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:67)

	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	- locked <0x2c1c> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:829)

As execution continued, the same transaction was resumed and eventually suspended again, this time via a call to CoyoteAdapter.asyncDispatch, which the agent also instruments:

"http-nio-8080-exec-1@10768" daemon prio=5 tid=0x34 nid=NA runnable
java.lang.Thread.State: RUNNABLE
	at com.newrelic.agent.AsyncApiImpl.suspendTransaction(AsyncApiImpl.java:73)
	at com.newrelic.agent.AsyncApiImpl.suspendAsync(AsyncApiImpl.java:52)
	at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:67)

	at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:238)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:52)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	- locked <0x2c1c> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:829)
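
The two stack traces show one transaction being suspended, resumed, and suspended again. That suggests a guard against overwriting should check whether the async context currently holds a suspended transaction, not whether the transaction has ever been suspended before. The following is a minimal sketch of that idea under stated assumptions: `SuspendGuardSketch` and its methods are hypothetical, contexts and transactions are plain `Object`s, and none of this is the agent's actual `AsyncApiImpl` code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: not the agent's AsyncApiImpl.
final class SuspendGuardSketch {
    // Suspended transactions keyed by the async context they were suspended under.
    private final Map<Object, Object> suspendedByContext = new ConcurrentHashMap<>();

    /**
     * Stores the transaction for this context unless the context already
     * holds a suspended transaction. Returns true if it was stored.
     */
    boolean suspend(Object asyncContext, Object transaction) {
        // putIfAbsent never overwrites an existing mapping.
        return suspendedByContext.putIfAbsent(asyncContext, transaction) == null;
    }

    /** Removes and returns the transaction suspended under this context, or null. */
    Object resume(Object asyncContext) {
        return suspendedByContext.remove(asyncContext);
    }

    /** Number of transactions currently held as suspended. */
    int suspendedCount() {
        return suspendedByContext.size();
    }
}
```

In this sketch, a suspend, resume, suspend sequence on the same context succeeds each time because `resume` clears the mapping. A second `suspend` on a context that has not been resumed returns false and leaves the first transaction in place, so it can still be resumed later.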

codecov-commenter commented Oct 18, 2023

Codecov Report

Merging #1555 (73d81e4) into main (01aa79a) will increase coverage by 0.00%.
The diff coverage is 50.00%.

❗ Current head 73d81e4 differs from pull request most recent head 6631b45. Consider uploading reports for the commit 6631b45 to get more accurate results

@@            Coverage Diff            @@
##               main    #1555   +/-   ##
=========================================
  Coverage     70.58%   70.59%           
+ Complexity     9791     9790    -1     
=========================================
  Files           817      817           
  Lines         39489    39491    +2     
  Branches       5995     5995           
=========================================
+ Hits          27875    27879    +4     
+ Misses         8905     8902    -3     
- Partials       2709     2710    +1     
Files Coverage Δ
.../src/main/java/com/newrelic/agent/MetricNames.java 60.00% <ø> (ø)
...ava/com/newrelic/agent/config/AgentConfigImpl.java 86.97% <50.00%> (-0.17%) ⬇️

... and 1 file with indirect coverage changes

@jasonjkeller jasonjkeller changed the title Prevent AsyncApiImpl from overwriting suspended transactions that are… NR-171475 Prevent AsyncApiImpl from overwriting suspended transactions that are… Oct 20, 2023
@jasonjkeller jasonjkeller merged commit 68f1efb into main Nov 22, 2023
@jasonjkeller jasonjkeller deleted the NR-171475-legacy-async-api-memory-leak branch November 22, 2023 18:42
Successfully merging this pull request may close these issues.

Edge case: rare memory leak when using legacy Async API