dns: coalesce identical concurrent lookup() requests #62599
orgads wants to merge 1 commit into nodejs:main
When multiple callers issue dns.lookup() for the same (hostname, family,
hints, order) concurrently, only one getaddrinfo call is now dispatched
to the libuv threadpool. All callers share the result.
getaddrinfo is a blocking call that runs on the libuv threadpool (capped
at 4 threads by default, with a slow I/O concurrency limit of 2). When
DNS resolution is slow - e.g. ~10-20 s per call due to a misbehaving
resolver - identical requests queue behind each other, causing timeouts
that grow linearly with the number of concurrent callers:
Before: 100 parallel lookup('host') -> 50 batches × 10 s = 500+ s
After: 100 parallel lookup('host') -> 1 getaddrinfo call = ~10 s
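The Before/After figures can be reproduced with a small back-of-the-envelope model. This is only a sketch: the function name `totalWaitSeconds` is hypothetical, and it assumes every getaddrinfo call takes exactly the same time and that at most two of them run at once (the slow I/O limit mentioned above).

```javascript
'use strict';

// Rough model of how long the last caller waits.
// Assumptions: uniform call duration, `slowIoLimit` calls run concurrently.
function totalWaitSeconds(callers, slowIoLimit, secondsPerCall, coalesced) {
  // With coalescing, identical lookups collapse into a single call.
  const calls = coalesced ? 1 : callers;
  const batches = Math.ceil(calls / slowIoLimit);
  return batches * secondsPerCall;
}

const before = totalWaitSeconds(100, 2, 10, false);
const after = totalWaitSeconds(100, 2, 10, true);
console.log(before, after); // 500 10
```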
This is particularly severe on WSL, where the DNS relay rewrites QNAMEs
in responses (appending the search domain), causing glibc to discard
them as non-matching and wait for a 5s timeout per retry.
The coalescing is keyed on (hostname, family, hints, order) so lookups
with different options still get separate getaddrinfo calls. Each
caller independently post-processes the shared raw result (applying the
'all' flag, constructing address objects, etc.).
Signed-off-by: Orgad Shaneh <orgad.shaneh@audiocodes.com>
PR-URL: nodejs#62599
Fixes: nodejs#62503
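The coalescing idea described in the commit message can be sketched in plain JavaScript. This is not the code from the PR: `coalesce`, `fakeGetaddrinfo`, and this simplified `lookup` are hypothetical stand-ins, and the resolver is simulated with a timer. It shows an in-flight map keyed on (hostname, family, hints, order), with each caller applying its own `all` option to the shared raw result.

```javascript
'use strict';

// In-flight operations, keyed by the lookup parameters.
const inflight = new Map();

// Hypothetical helper: run start() once per key while a call is pending.
function coalesce(key, start) {
  let pending = inflight.get(key);
  if (pending === undefined) {
    // The executor runs synchronously, so start() is invoked right away;
    // a synchronous throw becomes a rejection shared by all callers.
    pending = new Promise((resolve) => resolve(start()))
      .finally(() => inflight.delete(key));
    inflight.set(key, pending);
  }
  return pending;
}

// Simulated slow resolver (stand-in for the real getaddrinfo call).
let rawCalls = 0;
function fakeGetaddrinfo(hostname) {
  rawCalls++;
  return new Promise((resolve) => {
    setTimeout(() => resolve(['192.0.2.1', '2001:db8::1']), 20);
  });
}

// Simplified lookup: `all` is deliberately not part of the key, because
// it only affects per-caller post-processing of the shared result.
function lookup(hostname, options = {}) {
  const { family = 0, hints = 0, order = 'verbatim', all = false } = options;
  const key = JSON.stringify([hostname, family, hints, order]);
  return coalesce(key, () => fakeGetaddrinfo(hostname))
    .then((addresses) => (all ? addresses.slice() : addresses[0]));
}

const demo = Promise.all([
  lookup('example.test'),                // starts a raw call
  lookup('example.test', { all: true }), // shares the call above
  lookup('example.test', { family: 4 }), // different key, separate call
]);
demo.then((results) => console.log(rawCalls, results));
```

Entries are removed from the map as soon as the call settles, so this only merges concurrent requests and does not act as a cache.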
36f0f63 to c754d12
c754d12 to b213b0f
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##             main    #62599       +/-   ##
============================================
- Coverage   91.53%   89.72%    -1.81%
============================================
  Files         352      695      +343
  Lines      147833   214543    +66710
  Branches    23148    41076    +17928
============================================
+ Hits       135321   192503    +57182
- Misses      12255    14095     +1840
- Partials      257     7945     +7688
mcollina left a comment
I think this might break async_hooks continuation, could you verify? Adding an AsyncResource on coalescing would be enough.
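To illustrate the concern, here is a minimal sketch of how an AsyncResource per waiting caller can keep each callback in its own async context when several callers share one underlying operation. It is not the PR's implementation: `sharedLookup`, the `waiters` array, and the resource type name `CoalescedLookup` are hypothetical, and the shared operation is simulated with a timer.

```javascript
'use strict';
const { AsyncLocalStorage, AsyncResource } = require('node:async_hooks');

const als = new AsyncLocalStorage();
const waiters = []; // callbacks waiting on the single shared operation

function sharedLookup(callback) {
  // Capture the caller's async context at call time.
  const resource = new AsyncResource('CoalescedLookup');
  waiters.push((result) => resource.runInAsyncScope(callback, null, result));
  if (waiters.length === 1) {
    // Only the first caller starts the (simulated) operation, so without
    // the AsyncResource every callback would run in that caller's context.
    setTimeout(() => {
      for (const waiter of waiters.splice(0)) waiter('192.0.2.1');
    }, 10);
  }
}

const seen = [];
als.run('request-A', () => sharedLookup(() => seen.push(als.getStore())));
als.run('request-B', () => sharedLookup(() => seen.push(als.getStore())));
setTimeout(() => console.log(seen), 50); // [ 'request-A', 'request-B' ]
```

If the callbacks were pushed onto `waiters` directly, both would observe `request-A`, because the timer was created while the first caller's store was active.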
b213b0f to cee78bf
Great observation! Should be fixed now.
cee78bf to 610a0c3
I have a feeling that the failing test on GHA is relevant, can you take a look?
It doesn't look related. Similar failures happened here and here. Also, this test calls The test is flaky, since it relies on external systems which may refuse or time out.