Problem

For a while now, writes from the online clickhouse_sinker to ClickHouse have been failing for no obvious reason. Errors on the clickhouse_sinker side:

Application log

2021/03/27 15:21:29.057217 clickhouse.go:123: task: indigo_show_user_event_62_050101999 reconnect with dsn tcp://169.136.179.53:9000?database=ops&username=default&password=1LgyVkXU&block_size=
2097152&connection_open_strategy=in_order&alt_hosts=169.136.182.223:9000

clickhouse-go log

[clickhouse][connect=21159][connect] write error: write tcp 164.90.66.159:46164->169.136.179.53:9000: i/o timeout

Error on the ClickHouse side:

2021.03.27 15:21:41.418854 [ 60813 ] {8e76bcab-aadf-433d-b4e5-7c12677a2842} <Error> executeQuery: Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 51704. Bytes expected: 62921. (version 20.3.8.53) (from 164.90.66.174:60256) (in query: INSERT INTO imo.indigo_show_user_event_62_050101999 (itime,uid,deviceId,os,os_version,imei,imsi,client_version,session_id,tz,locale,country,resolution,dpi,isp,channel,model,vendor,sdk_version,appkey,guid,hdid,mac,debug,gaid,idfa,appsflyerId,events_time,events_lng,events_lat,events_net,events_event_info_abi,events_event_info_is_emulator,events_event_info_is_catched_exception,events_event_info_process,events_event_info_crash_type,events_event_info_crash_thread_name,events_event_info_crash_exception_name,events_event_info_crash_message,events_event_info_max_mem,events_event_info_cur_activity,events_event_info_is_bg,events_event_info_crash_thread_tag,events_event_info,events_event_info_boot_flag,abextra_abi,abextra_imo_flags,abextra_imo_type,abextra_imo_uid,abextra_abflags_v2,events_event_info_xcrash,events_event_info_native_md5,events_event_info_native_tag,events_event_info_last_webview_url,events_event_info_backtrace,events_event_info_package_name,events_event_info_crash_lib,events_event_info_format_thread_name) VALUES ), Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x107f5263 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x8b0203d in /usr/bin/clickhouse
2. void DB::deserializeBinarySSE2<1>(DB::PODArray<char8_t, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 15ul, 16ul>&, DB::ReadBuffer&, unsigned long) (.cold) @ 0xcc4d6db in /usr/bin/clickhouse
3. DB::DataTypeString::deserializeBinaryBulk(DB::IColumn&, DB::ReadBuffer&, unsigned long, double) const @ 0xcc4bdc5 in /usr/bin/clickhouse
4. DB::NativeBlockInputStream::readData(DB::IDataType const&, DB::IColumn&, DB::ReadBuffer&, unsigned long, double) @ 0xcb5dc18 in /usr/bin/clickhouse
5. DB::NativeBlockInputStream::readImpl() @ 0xcb5e9c1 in /usr/bin/clickhouse
6. DB::IBlockInputStream::read() @ 0xcb42883 in /usr/bin/clickhouse
7. DB::TCPHandler::receiveData(bool) @ 0x8bead3e in /usr/bin/clickhouse
8. DB::TCPHandler::receivePacket() @ 0x8beb437 in /usr/bin/clickhouse
9. DB::TCPHandler::readDataNext(unsigned long const&, int const&) @ 0x8beb61d in /usr/bin/clickhouse
10. DB::TCPHandler::processInsertQuery(DB::Settings const&) @ 0x8bebc56 in /usr/bin/clickhouse
11. DB::TCPHandler::runImpl() @ 0x8bed66f in /usr/bin/clickhouse
12. DB::TCPHandler::run() @ 0x8bedc60 in /usr/bin/clickhouse
13. Poco::Net::TCPServerConnection::start() @ 0xe128c4b in /usr/bin/clickhouse
14. Poco::Net::TCPServerDispatcher::run() @ 0xe1290e1 in /usr/bin/clickhouse
15. Poco::PooledThread::run() @ 0x10884187 in /usr/bin/clickhouse
16. Poco::ThreadImpl::runnableEntry(void*) @ 0x1087fdec in /usr/bin/clickhouse
17. void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(void*), Poco::ThreadImpl*> >(void*) @ 0x1088185d in /usr/bin/clickhouse
18. start_thread @ 0x76ba in /lib/x86_64-linux-gnu/libpthread-2.23.so
19. /build/glibc-Cl5G7W/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:111: clone @ 0x10741d in /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so

Analysis

The clickhouse_sinker side

The clickhouse_sinker error tells us that the sinker timed out while writing to ClickHouse.

The relevant code:

		err = tx.Commit()
		if err != nil {
			logger.ErrLogger.Println("task:", c.Name, "commit transaction failed metrics:", len(metrics), err.Error())

			// "Too many parts" means this ClickHouse instance is overloaded:
			// switch to another instance instead of retrying the same one.
			if strings.Contains(err.Error(), "Too many parts") {
				c.conn = c.getAnotherConn()
				return
			}

			// For other retriable errors, reconnect to the same instance.
			if shouldReconnect(err) {
				logger.ErrLogger.Println("task:", c.Name, "reconnect with dsn", conn.GetDsn())
				conn.ReConnect()
			}
			return
		}

The error is returned from tx.Commit(); shouldReconnect then reconnects to the same instance, and the same batch of data is inserted again on retry.
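The shouldReconnect helper is not shown in this post; a minimal, hypothetical sketch of what such a check could look like (the matched error strings are assumptions, not the sinker's actual list):

package main

import (
	"database/sql/driver"
	"errors"
	"strings"
)

// shouldReconnect reports whether an insert error looks like a
// connection-level failure worth a reconnect-and-retry, as opposed
// to a data or SQL problem.
func shouldReconnect(err error) bool {
	if err == nil {
		return false
	}
	// clickhouse-go surfaces failed writes as driver.ErrBadConn (see the driver's Write below).
	if errors.Is(err, driver.ErrBadConn) {
		return true
	}
	msg := err.Error()
	return strings.Contains(msg, "i/o timeout") ||
		strings.Contains(msg, "connection refused") ||
		strings.Contains(msg, "broken pipe")
}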

func (conn *connect) Write(b []byte) (int, error) {
	var (
		n      int
		err    error
		total  int
		srcLen = len(b)
	)
	// Refresh the write deadline only when more than 1/4 of writeTimeout has
	// elapsed since it was last set, to avoid resetting it on every write.
	if currentTime := now(); conn.writeTimeout != 0 && currentTime.Sub(conn.lastWriteDeadlineTime) > (conn.writeTimeout>>2) {
		conn.SetWriteDeadline(time.Now().Add(conn.writeTimeout))
		conn.lastWriteDeadlineTime = currentTime
	}
	for total < srcLen {
		if n, err = conn.Conn.Write(b[total:]); err != nil {
			// Any write failure (including an i/o timeout once the kernel send
			// buffer is full) closes the connection and surfaces ErrBadConn.
			conn.logf("[connect] write error: %v", err)
			conn.Close()
			return n, driver.ErrBadConn
		}
		total += n
	}
	return n, nil
}

So in the driver layer, clickhouse-go timed out while writing data into the TCP send buffer.
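To make this failure mode concrete, here is a small self-contained Go sketch (not clickhouse-go code): the peer accepts the connection but never reads, so once the kernel send and receive buffers fill up, a Write with a deadline fails with the same "i/o timeout".

package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

func main() {
	// A peer that accepts the connection but never reads from it,
	// standing in for a stalled receiver / congested link.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	go func() {
		c, _ := ln.Accept()
		defer c.Close()
		select {} // block forever without reading
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 1<<20) // 1 MiB per write
	for i := 0; ; i++ {
		// Refresh the deadline before each write, like the driver's Write above.
		conn.SetWriteDeadline(time.Now().Add(2 * time.Second))
		if _, err := conn.Write(buf); err != nil {
			// Once the buffers are full, Write blocks past the deadline and
			// fails with "write tcp ...: i/o timeout".
			fmt.Printf("write #%d failed: %v\n", i, err)
			return
		}
	}
}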

The ClickHouse server side

When reading from the TCP buffer, the server first reads the length len of a value and then reads len bytes.

template <int UNROLL_TIMES>
static NO_INLINE void deserializeBinarySSE2(ColumnString::Chars & data, ColumnString::Offsets & offsets, ReadBuffer & istr, size_t limit)
{
    size_t offset = data.size();
    for (size_t i = 0; i < limit; ++i)
    {
        if (istr.eof())
            break;

        UInt64 size;
        readVarUInt(size, istr);

        offset += size + 1;
        offsets.push_back(offset);

        data.resize(offset);

        if (size)
        {
#ifdef __SSE2__
			...
#endif
            {
                istr.readStrict(reinterpret_cast<char*>(&data[offset - size - 1]), size);
            }
        }
        data[offset - 1] = 0;
    }
}

In the readStrict implementation, if read_bytes is less than n, an exception is thrown:

    /** Reads n bytes, if there are less - throws an exception. */
    void readStrict(char * to, size_t n)
    {
        auto read_bytes = read(to, n);
        if (n != read_bytes)
            throw Exception("Cannot read all data. Bytes read: " + std::to_string(read_bytes) + ". Bytes expected: " + std::to_string(n) + ".", ErrorCodes::CANNOT_READ_ALL_DATA);
    }
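So on the wire a String value is simply a varint length followed by that many raw bytes, and readStrict throws once the stream ends before all the promised bytes arrive. Below is a minimal Go sketch of the same read logic, using the standard library as an analogy (this is not ClickHouse or clickhouse-go code); io.ReadFull plays the role of readStrict.

package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readString reads a varint length (like readVarUInt) and then exactly
// that many bytes (like readStrict). A truncated stream yields
// io.ErrUnexpectedEOF, the analogue of CANNOT_READ_ALL_DATA.
func readString(r *bufio.Reader) (string, error) {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return "", err
	}
	buf := make([]byte, size)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	// Encode "hello world" as <varint length><bytes>, then truncate it to
	// simulate a sender that timed out halfway through a block.
	var enc bytes.Buffer
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len("hello world")))
	enc.Write(lenBuf[:n])
	enc.WriteString("hello world")

	truncated := enc.Bytes()[:5]
	_, err := readString(bufio.NewReader(bytes.NewReader(truncated)))
	fmt.Println(err) // unexpected EOF
}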

Comparing the logs from both sides: the sinker reports its error first, and ClickHouse only reports its error tens of seconds later. The guess is a network problem: the sinker's write timed out, so ClickHouse received only part of the block and then threw the exception above.

Investigation

First we looked at the socket queues with netstat and found that Send-Q stayed large, which is abnormal: it means the data sitting in the TCP send buffer is never ACKed by the peer and therefore cannot be released.

netstat -antp | grep "169.136.179.53:9000"

Then we captured packets on the sinker machine:

sudo tcpdump -i bond0.504 -s96 host 169.136.179.31 and port 9000 -w w.txt

Opening w.txt in Wireshark, we can see a large number of TCP Dup ACK, TCP Retransmission and TCP Out-Of-Order packets.

Briefly: a Dup ACK is the receiver acknowledging the same sequence number again to signal a gap in the data it has received; a Retransmission is the sender resending a segment that was never acknowledged (after a timeout, or after enough Dup ACKs for fast retransmit); Out-Of-Order means a segment arrived with a higher sequence number than expected, typically because an earlier segment was lost or delayed.

(Screenshot: suspicious packets in the tcpdump capture)

The capture shows that there is packet loss and retransmission between the sinker and ClickHouse.

Check the interface statistics with ifconfig:

bond0.503 Link encap:Ethernet  HWaddr f8:f2:1e:3c:11:1c  
          inet addr:164.90.66.159  Bcast:164.90.66.255  Mask:255.255.255.0
          inet6 addr: fe80::faf2:1eff:fe3c:111c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:51099705427 errors:0 dropped:0 overruns:0 frame:0
          TX packets:35197660242 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:8000 
          RX bytes:72584273137501 (72.5 TB)  TX bytes:65194214206548 (65.1 TB)

The interface counters show no errors, drops or overruns, so the local NIC is not the problem. Next, run netstat -s a few minutes apart to compare the transport-layer statistics.

(Screenshot: netstat -s output)

We also checked the TCP retransmission-rate dashboard.

(Screenshot: TCP retransmission rate)
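For reference, the retransmission rate shown on such a dashboard can be derived from the same counters netstat -s prints: the delta of RetransSegs divided by the delta of OutSegs over a sampling window. A minimal Go sketch that samples /proc/net/snmp (Linux-only; the 10-second window is an arbitrary choice):

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
	"time"
)

// readTCPCounters parses /proc/net/snmp (the source of the netstat -s numbers)
// and returns the cumulative OutSegs and RetransSegs counters.
func readTCPCounters() (outSegs, retransSegs uint64, err error) {
	f, err := os.Open("/proc/net/snmp")
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()

	var header, values []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "Tcp:") {
			continue
		}
		if header == nil {
			header = strings.Fields(line) // first Tcp: line holds the field names
		} else {
			values = strings.Fields(line) // second Tcp: line holds the counters
			break
		}
	}
	if len(header) == 0 || len(values) != len(header) {
		return 0, 0, fmt.Errorf("unexpected /proc/net/snmp format")
	}
	for i, name := range header {
		switch name {
		case "OutSegs":
			outSegs, _ = strconv.ParseUint(values[i], 10, 64)
		case "RetransSegs":
			retransSegs, _ = strconv.ParseUint(values[i], 10, 64)
		}
	}
	return outSegs, retransSegs, nil
}

func main() {
	out1, retrans1, err := readTCPCounters()
	if err != nil {
		log.Fatal(err)
	}
	time.Sleep(10 * time.Second) // sampling window

	out2, retrans2, err := readTCPCounters()
	if err != nil {
		log.Fatal(err)
	}
	if out2 == out1 {
		fmt.Println("no segments sent during the window")
		return
	}
	// Retransmission rate = retransmitted segments / segments sent in the window.
	rate := float64(retrans2-retrans1) / float64(out2-out1) * 100
	fmt.Printf("TCP retransmission rate over 10s: %.2f%%\n", rate)
}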

Recalling that the sinker and ClickHouse sit in different data centers and the traffic between them goes over a dedicated line, could the dedicated line's bandwidth have been saturated, causing congestion and packet loss?

(Screenshot: dedicated-line bandwidth monitoring)

Sure enough, it was.

Solution

With the root cause identified, the fix was straightforward: add the machines running ClickHouse to the high-priority whitelist, so that their traffic is prioritized on the dedicated line.