
fixing OSS filter_fuzz_test OOM issue when thread_count is 800 million. #23782

Merged


yanjunxiang-google
Contributor

Signed-off-by: Yanjun Xiang [email protected]

Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @htuch
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #23782 was opened by yanjunxiang-google.

see: more, trace.

@yanjunxiang-google
Contributor Author

When running the OSS filter_fuzz_test, we see an OOM in the file system buffer filter extension when thread_count is configured to an extremely large value, like 800 million. The test logs are below:

/bazel-bin/test/extensions/filters/http/common/fuzz/filter_fuzz_test test/extensions/filters/http/common/fuzz/filter_corpus/clusterfuzz-testcase-minimized-filter_fuzz_test-5022310482706432 -l trace > ~/yanjunxiang/test.log
[2022-11-01 21:21:25.197][3683384][info][misc] [test/fuzz/main.cc:47] Corpus file: test/extensions/filters/http/common/fuzz/filter_corpus/clusterfuzz-testcase-minimized-filter_fuzz_test-5022310482706432
[2022-11-01 21:21:25.206][3683384][debug][misc] [test/extensions/filters/http/common/fuzz/filter_fuzz_test.cc:75] Filter configuration: name: "envoy.filters.http.match_delegate"
typed_config {
  [type.googleapis.com/envoy.extensions.common.matching.v3.ExtensionWithMatcher] {
    extension_config {
      name: "."
      typed_config {
        [type.googleapis.com/envoy.extensions.filters.http.file_system_buffer.v3.FileSystemBufferFilterConfig] {
          manager_config {
            thread_pool {
              thread_count: 811335680
            }
          }
        }
      }
    }
  }
}

[2022-11-01 21:21:25.330][3683384][info][misc] [test/test_common/utility.cc:63] TestRandomGenerator running with seed 1991829417
[2022-11-01 21:21:25.382][3683384][info][misc] [test/test_common/utility.cc:63] TestRandomGenerator running with seed 1991829417
[2022-11-01 21:21:25.383][3683384][info][config] [./envoy/registry/registry.h:413] Factory 'envoy.network.dns_resolver.cares' (type '{"envoy.extensions.network.dns_resolver.cares.v3.CaresDnsResolverConfig"}') registered for tests
[2022-11-01 21:21:25.384][3683384][info][config] [./envoy/registry/registry.h:456] Removed test factory 'envoy.network.dns_resolver.cares' (type '{"envoy.extensions.network.dns_resolver.cares.v3.CaresDnsResolverConfig"}')
[2022-11-01 21:21:25.384][3683384][info][misc] [test/extensions/filters/http/common/fuzz/uber_filter.cc:69] filter name envoy.filters.http.match_delegate
[2022-11-01 21:21:25.384][3683384][warning][misc] [source/common/protobuf/message_validator_impl.cc:35] message 'envoy.extensions.common.matching.v3.ExtensionWithMatcher' is marked as work-in-progress. API features marked as work-in-progress are not considered stable, are not covered by the threat model, are not supported by the security team, and are subject to breaking changes. Do not use this feature without understanding each of the previous points.
[2022-11-01 21:21:25.398][3683384][warning][misc] [source/common/protobuf/message_validator_impl.cc:35] message 'envoy.extensions.filters.http.file_system_buffer.v3.FileSystemBufferFilterConfig' is contained in proto file 'envoy/extensions/filters/http/file_system_buffer/v3/file_system_buffer.proto' marked as work-in-progress. API features marked as work-in-progress are not considered stable, are not covered by the threat model, are not supported by the security team, and are subject to breaking changes. Do not use this feature without understanding each of the previous points.
[2022-11-01 21:21:25.398][3683384][warning][misc] [source/common/protobuf/message_validator_impl.cc:35] message 'envoy.extensions.common.async_files.v3.AsyncFileManagerConfig' is contained in proto file 'envoy/extensions/common/async_files/v3/async_file_manager.proto' marked as work-in-progress. API features marked as work-in-progress are not considered stable, are not covered by the threat model, are not supported by the security team, and are subject to breaking changes. Do not use this feature without understanding each of the previous points.
[2022-11-01 21:21:25.398][3683384][warning][misc] [source/common/protobuf/message_validator_impl.cc:35] message 'envoy.extensions.common.async_files.v3.AsyncFileManagerConfig.ThreadPool' is contained in proto file 'envoy/extensions/common/async_files/v3/async_file_manager.proto' marked as work-in-progress. API features marked as work-in-progress are not considered stable, are not covered by the threat model, are not supported by the security team, and are subject to breaking changes. Do not use this feature without understanding each of the previous points.
[2022-11-01 21:21:25.398][3683384][info][main] [source/extensions/common/async_files/async_file_manager_thread_pool.cc:41] AsyncFileManagerThreadPool created with id '', with 811335680 threads
terminate called without an active exception

@yanjunxiang-google
Contributor Author

The decoded stack trace is below:

+----------------------------------------Release Build Stacktrace----------------------------------------+
  | Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_envoy_13d8ff3fd8b6e12ff5bbd32d951c40c9e1c6513f/revisions/filter_fuzz_test -rss_limit_mb=2560 -timeout=60 -runs=100 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/oom-71797470bea5299a678f6b9eef8d6f405b8113c3
  | Time ran: 3.469438076019287
  |  
  | INFO: found LLVMFuzzerCustomMutator (0x205a650). Disabling -len_control by default.
  | INFO: Running with entropic power schedule (0xFF, 100).
  | INFO: Seed: 1495360748
  | INFO: Loaded 1 modules (577515 inline 8-bit counters): 577515 [0x7d6c188, 0x7df9173),
  | INFO: Loaded 1 PC tables (577515 PCs): 577515 [0x7df9178,0x86c9028),
  | /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_envoy_13d8ff3fd8b6e12ff5bbd32d951c40c9e1c6513f/revisions/filter_fuzz_test: Running 1 inputs 100 time(s) each.
  | Running: /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/oom-71797470bea5299a678f6b9eef8d6f405b8113c3
  | ==7650== ERROR: libFuzzer: out-of-memory (malloc(6490685440))
  | To change the out-of-memory limit use -rss_limit_mb=
  |  
  | #0 0x20298e1 in __sanitizer_print_stack_trace /src/llvm-project/compiler-rt/lib/asan/asan_stack.cpp:87:3
  | #1 0x1f48218 in fuzzer::PrintStackTrace() /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerUtil.cpp:210:5
  | #2 0x1f2c1b5 in fuzzer::Fuzzer::HandleMalloc(unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:131:3
  | #3 0x1f2c0ca in fuzzer::MallocHook(void const volatile*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:100:6
  | #4 0x2030d67 in __sanitizer::RunMallocHooks(void const*, unsigned long) /src/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common.cpp:318:5
  | #5 0x1fa0f14 in __asan::Allocator::Allocate(unsigned long, unsigned long, __sanitizer::BufferedStackTrace*, __asan::AllocType, bool) /src/llvm-project/compiler-rt/lib/asan/asan_allocator.cpp:600:5
  | #6 0x1fa0853 in __asan::asan_malloc(unsigned long, __sanitizer::BufferedStackTrace*) /src/llvm-project/compiler-rt/lib/asan/asan_allocator.cpp:953:34
  | #7 0x201f865 in __interceptor_malloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:70:10
  | #8 0x2103f44 in operator new(unsigned long)
  | #9 0x4e16850 in __libcpp_operator_new /usr/local/include/c++/v1/new:245:10
  | #10 0x4e16850 in __libcpp_allocate /usr/local/include/c++/v1/new:271:10
  | #11 0x4e16850 in allocate /usr/local/include/c++/v1/__memory/allocator.h:105:38
  | #12 0x4e16850 in allocate /usr/local/include/c++/v1/__memory/allocator_traits.h:262:20
  | #13 0x4e16850 in __split_buffer /usr/local/include/c++/v1/__split_buffer:306:29
  | #14 0x4e16850 in std::__1::vector<std::__1::thread, std::__1::allocator<std::__1::thread> >::reserve(unsigned long) /usr/local/include/c++/v1/vector:1489:53
  | #15 0x4e156f1 in Envoy::Extensions::Common::AsyncFiles::AsyncFileManagerThreadPool::AsyncFileManagerThreadPool(envoy::extensions::common::async_files::v3::AsyncFileManagerConfig const&, Envoy::Api::OsSysCalls&) /proc/self/cwd/source/extensions/common/async_files/async_file_manager_thread_pool.cc:42:16
  | #16 0x4df9f70 in __shared_ptr_emplace<const envoy::extensions::common::async_files::v3::AsyncFileManagerConfig &, Envoy::Api::OsSysCalls &> /usr/local/include/c++/v1/__memory/shared_ptr.h:294:37
  | #17 0x4df9f70 in allocate_shared<Envoy::Extensions::Common::AsyncFiles::AsyncFileManagerThreadPool, std::__1::allocator<Envoy::Extensions::Common::AsyncFiles::AsyncFileManagerThreadPool>, const envoy::extensions::common::async_files::v3::AsyncFileManagerConfig &, Envoy::Api::OsSysCalls &, void> /usr/local/include/c++/v1/__memory/shared_ptr.h:953:55
  | #18 0x4df9f70 in make_shared<Envoy::Extensions::Common::AsyncFiles::AsyncFileManagerThreadPool, const envoy::extensions::common::async_files::v3::AsyncFileManagerConfig &, Envoy::Api::OsSysCalls &, void> /usr/local/include/c++/v1/__memory/shared_ptr.h:962:12
  | #19 0x4df9f70 in Envoy::Extensions::Common::AsyncFiles::AsyncFileManagerFactoryImpl::getAsyncFileManager(envoy::extensions::common::async_files::v3::AsyncFileManagerConfig const&, Envoy::Api::OsSysCalls*) /proc/self/cwd/source/extensions/common/async_files/async_file_manager_factory.cc:61:29
  | #20 0x4dbb63c in Envoy::Extensions::HttpFilters::FileSystemBuffer::FileSystemBufferFilterFactory::createFilterFactoryFromProtoTyped(envoy::extensions::filters::http::file_system_buffer::v3::FileSystemBufferFilterConfig const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::Configuration::FactoryContext&) /proc/self/cwd/source/extensions/filters/http/file_system_buffer/config.cc:26:57
  | #21 0x4dbdca7 in createFilterFactoryFromProto /proc/self/cwd/source/extensions/filters/http/common/factory_base.h:71:12
  | #22 0x4dbdca7 in non-virtual thunk to Envoy::Extensions::HttpFilters::Common::FactoryBase<envoy::extensions::filters::http::file_system_buffer::v3::FileSystemBufferFilterConfig, envoy::extensions::filters::http::file_system_buffer::v3::FileSystemBufferFilterConfig>::createFilterFactoryFromProto(google::protobuf::Message const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::Configuration::FactoryContext&) /proc/self/cwd/source/extensions/filters/http/common/factory_base.h:0
  | #23 0x3886108 in Envoy::Common::Http::MatchDelegate::MatchDelegateConfig::createFilterFactoryFromProtoTyped(envoy::extensions::common::matching::v3::ExtensionWithMatcher const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::Configuration::FactoryContext&) /proc/self/cwd/source/common/http/match_delegate/config.cc:257:33
  | #24 0x388c437 in createFilterFactoryFromProto /proc/self/cwd/source/extensions/filters/http/common/factory_base.h:71:12
  | #25 0x388c437 in non-virtual thunk to Envoy::Extensions::HttpFilters::Common::FactoryBase<envoy::extensions::common::matching::v3::ExtensionWithMatcher, envoy::extensions::common::matching::v3::ExtensionWithMatcher>::createFilterFactoryFromProto(google::protobuf::Message const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::Configuration::FactoryContext&) /proc/self/cwd/source/extensions/filters/http/common/factory_base.h:0
  | #26 0x2176dff in Envoy::Extensions::HttpFilters::UberFilterFuzzer::fuzz(envoy::extensions::filters::network::http_connection_manager::v3::HttpFilter const&, test::fuzz::HttpData const&, test::fuzz::HttpData const&) /proc/self/cwd/test/extensions/filters/http/common/fuzz/uber_filter.cc:76:19
  | #27 0x205b5fc in TestOneProtoInput /proc/self/cwd/test/extensions/filters/http/common/fuzz/filter_fuzz_test.cc:78:12
  | #28 0x205b5fc in LLVMFuzzerTestOneInput /proc/self/cwd/test/extensions/filters/http/common/fuzz/filter_fuzz_test.cc:13:1
  | #29 0x1f2e493 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
  | #30 0x1f18fa2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
  | #31 0x1f1e84c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
  | #32 0x1f489d2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
  | #33 0x7f16350e60b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/libc-start.c:308:16
  | #34 0x1f0f16d in _start
  |  
  | SUMMARY: libFuzzer: out-of-memory
 


@yanjunxiang-google
Contributor Author

/assign @ravenblackx

@yanjunxiang-google
Contributor Author

yanjunxiang-google commented Nov 1, 2022

@ravenblackx I think we need to add a PGV rule to cap the maximum value of thread_count, but I'm not sure what number we should use. If this is not the right fix, please feel free to create a new PR and we can close this one.
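A PGV rule of this kind rejects an oversized thread_count during config validation, before AsyncFileManagerThreadPool ever calls reserve(). As a rough sketch of what such a rule checks, the plain C++ below assumes thread_count is an unsigned 32-bit field and uses the 1024 cap discussed further down in this thread; kMaxThreadCount, isValidThreadCount and threadCountError are hypothetical names, not Envoy or PGV APIs.

```cpp
#include <cstdint>

// Hypothetical cap: the upper bound (1024) discussed in this PR thread.
constexpr uint32_t kMaxThreadCount = 1024;

// Hypothetical stand-in for a PGV upper-bound rule on thread_count. The real
// check is generated from the .proto annotation and runs when the config is
// validated, so an oversized value never reaches the thread pool constructor.
constexpr bool isValidThreadCount(uint32_t thread_count) {
  return thread_count <= kMaxThreadCount;
}

// Hypothetical reason string for a rejected value; nullptr means "accepted".
constexpr const char* threadCountError(uint32_t thread_count) {
  return isValidThreadCount(thread_count)
             ? nullptr
             : "thread_count exceeds the maximum of 1024";
}
```

With this check, the fuzzer's value of 811335680 is rejected, while anything up to 1024 passes, including 0, which the proto comment says falls back to a default.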

Member

@htuch htuch left a comment


Bounding it to something like this seems reasonable to me.

Contributor

@ravenblackx ravenblackx left a comment


Thanks!

@KBaichoo
Contributor

KBaichoo commented Nov 2, 2022

@htuch this is waiting for an API shepherd LGTM.

Contributor

@adisuissa adisuissa left a comment


Can you add a minimized testcase to validate that the fuzz issue is solved?
/wait

@KBaichoo KBaichoo merged commit 9db77e6 into envoyproxy:main Nov 2, 2022
yanjunxiang-google added a commit to yanjunxiang-google/envoy that referenced this pull request Nov 2, 2022
@yanjunxiang-google
Contributor Author

yanjunxiang-google commented Nov 2, 2022

Can you add a minimized testcase to validate that the fuzz issue is solved? /wait
Thanks for the comments!

Yes, I confirmed the fix by running the bazel test with the same test case file: the PGV rule rejects such configurations, so the test passes. That reminds me that we should also commit the fuzz test case file. This is the PR for that leftover part: #23806.

Regarding your offline question: why is the upper bound 1024 rather than something larger like 4096 or smaller like 256, and how much memory does each case use?

  1. The code that does the malloc is:
    thread_pool_.reserve(thread_pool_size);

And thread_pool_ is defined as:
std::vector<std::thread> thread_pool_;

Each std::thread object is 8 bytes.
Thus 1024 threads take 8192 bytes, 256 threads take 2048 bytes, 4096 threads take 32 KiB, and so on.
811335680 threads take 6490685440 bytes (about 6.5 GB), which is exactly the malloc size printed in the stack trace above.

  2. IIUC, the fuzzer bounds memory to 2560 MB (-rss_limit_mb=2560). To stay under that, we only need to cap thread_count somewhere below 2560 MB / 8 bytes, roughly 320 million.

However, thread_count is meant to reflect the number of concurrent threads the hardware supports:

// The number of threads to use. If unset or zero, will default to the number
.

In that sense, 1024 is a sufficiently large number, and at the same time it doesn't cause an OOM (just 8 KiB here).
Should we keep this upper bound smaller, like 256 or 512? That's a good question, but I don't think we need to make it that tight.
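The memory figures above can be reproduced with a small constexpr sketch. The 8-byte element size is the figure quoted in this thread (typical of sizeof(std::thread) on 64-bit Linux, where it wraps a pthread_t); treat it as a toolchain assumption. reserveBytes and maxThreadCountForRssMb are hypothetical helper names, and the budget calculation ignores every allocation other than the reserve() call.

```cpp
#include <cstdint>

// Assumed size of one std::thread object, as quoted in this thread.
// Check sizeof(std::thread) on your own toolchain.
constexpr uint64_t kThreadObjectBytes = 8;

// Bytes requested by thread_pool_.reserve(thread_count).
constexpr uint64_t reserveBytes(uint64_t thread_count) {
  return thread_count * kThreadObjectBytes;
}

// Largest thread_count whose reserve() alone fits in a memory budget given in
// MB, treating 1 MB as 2^20 bytes. Other allocations are ignored, so the real
// headroom is smaller than this.
constexpr uint64_t maxThreadCountForRssMb(uint64_t rss_limit_mb) {
  return rss_limit_mb * 1024 * 1024 / kThreadObjectBytes;
}
```

Here reserveBytes(811335680) is 6490685440, matching the malloc size in the trace, and maxThreadCountForRssMb(2560) is 335544320; using decimal megabytes (10^6 bytes) instead gives the 320 million quoted above. Either way, the 1024 cap sits many orders of magnitude below the budget.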

5 participants