Description
The hdfsRead() call requests more data from the datanode than needed, so the TCP socket input buffer overflows.
Call chain: hdfsRead() -> InputStreamImpl::read() -> InputStreamImpl::readInternal() -> InputStreamImpl::readOneBlock() -> InputStreamImpl::setupBlockReader(bool) -> RemoteBlockReader constructor with a wrong 'len' argument.
The length is calculated here regardless of the requested data length (libhdfs3/src/client/InputStreamImpl.cpp, line 389 in ceb428c):

len = curBlock->getNumBytes() - offset;
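A minimal sketch of what a fix could look like: clamp the block-reader request to the caller's buffer size instead of the remaining block length. The function and parameter names below are illustrative, not the actual libhdfs3 identifiers.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper mirroring the values used around
// InputStreamImpl::setupBlockReader(). Current behavior requests
// everything left in the block; the proposed clamp limits the request
// to what the caller actually asked for.
int64_t readerRequestLen(int64_t blockNumBytes, int64_t blockOffset,
                         int64_t requestedLen) {
    int64_t remainingInBlock = blockNumBytes - blockOffset;  // current 'len'
    return std::min(remainingInBlock, requestedLen);         // proposed clamp
}
```

With this clamp, a 64 KiB hdfsRead() at the start of a 128 MiB block would ask the datanode for 64 KiB instead of the full 128 MiB.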
So RemoteBlockReader requests more data from the datanode, which continues to transmit until the overflow message is received. During this time the client can receive a lot of data (up to hundreds of megabytes across several connections) that occupies system memory while the socket is open.
If running in a container, this can be seen in the 'sock' field of /sys/fs/cgroup/memory.stat.
For example: reading row batches from a Parquet file in parallel, i.e. reading a part of an HDFS block rather than the whole block in one call.
hdfsRead() is used in ClickHouse here:
https://github.com/ClickHouse/ClickHouse/blob/34074c00b11245eebb45cdac98d4959107351b0d/src/Storages/ObjectStorage/HDFS/ReadBufferFromHDFS.cpp#L119
There is an "enable_hdfs_pread" config parameter (default: true). It enables the hdfsPread() call, which does not have this bug, but I'm not sure hdfsRead() is never called in that case.
Workaround: use hdfsPread() instead of hdfsRead().
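A sketch of the workaround pattern: loop on a positional read until the requested range is filled. In real code the read would be hdfsPread(fs, file, position, buffer, length) from the libhdfs3 C API; here the read is injected as a callable so the loop is self-contained, and all names are illustrative.

```cpp
#include <cstdint>
#include <functional>

// Stand-in for a positional read such as hdfsPread(); returns the number
// of bytes read, 0 on EOF, negative on error.
using PreadFn = std::function<int32_t(int64_t pos, char* buf, int32_t len)>;

// Read exactly 'length' bytes starting at 'position', retrying on short
// reads (positional reads may return fewer bytes than requested).
// Returns the number of bytes actually read.
int64_t preadFully(const PreadFn& pread, int64_t position,
                   char* buffer, int64_t length) {
    int64_t done = 0;
    while (done < length) {
        int32_t n = pread(position + done, buffer + done,
                          static_cast<int32_t>(length - done));
        if (n <= 0) break;  // EOF or error: return what we have
        done += n;
    }
    return done;
}
```

Unlike a seek + hdfsRead() sequence, each positional read carries the exact length the caller wants, so the block reader has no reason to over-request from the datanode.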