Re: PCP Network Latency PMDA

To: William Cohen <wcohen@xxxxxxxxxx>
Subject: Re: PCP Network Latency PMDA
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Wed, 25 Jun 2014 12:07:14 -0400
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <53A9E126.2040000@xxxxxxxxxx> (William Cohen's message of "Tue, 24 Jun 2014 16:35:50 -0400")
References: <53A34A47.3060008@xxxxxxxxxx> <53A9E126.2040000@xxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
Hi, Will -

wcohen wrote:


> [...] The receive is a bit more complicated because of the async
> nature of interrupts for incoming packets and the syscall reading
> the data, but I think that I understand enough of that to code
> something similar for receives. [...]

Looking forward to that part.  It sounds significantly trickier.


> [...]
> $ cat /proc/systemtap/stap_df43d0122ca9ec5271896401487121a6_9305/net_latency 
> #dev: tx_queue_n tx_queue_avg tx_queue_sum tx_xmit_n tx_xmit_avg tx_xmit_sum tx_free_n tx_free_avg tx_free_sum
> em1: 89 3277 291671 89 3284 292340 83 3536 293549
> lo: 9236 7 69453 9257 1948 18033760 140 96020 13442850

OK, that confirms the suspicion that a sampled-metric type of PMDA
approach suits this better than a timestamped-line-of-trace-data one.
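
To illustrate what "sampled-metric" could mean here, the following is a
minimal Python sketch, not the actual PMDA. It assumes the procfs file
keeps the nine-column layout quoted above (n/avg/sum for each of
tx_queue, tx_xmit and tx_free) and that the _n and _sum columns are
cumulative counters. The function names and the second sample are
hypothetical; units are presumably microseconds, given the
gettimeofday_us() calls in the script.

```python
# Assumed column order, taken from the quoted net_latency output.
COLUMNS = [
    "tx_queue_n", "tx_queue_avg", "tx_queue_sum",
    "tx_xmit_n", "tx_xmit_avg", "tx_xmit_sum",
    "tx_free_n", "tx_free_avg", "tx_free_sum",
]


def parse_net_latency(text):
    """Parse the procfs table into {device: {column: int}}."""
    samples = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "#dev:" header, and anything that does
        # not start with a "name:" device label.
        if not fields or fields[0].startswith("#") or not fields[0].endswith(":"):
            continue
        values = [int(v) for v in fields[1:]]
        if len(values) != len(COLUMNS):
            continue  # unexpected layout; ignore rather than guess
        samples[fields[0][:-1]] = dict(zip(COLUMNS, values))
    return samples


def interval_avg(prev, curr, stage):
    """Average latency for one stage between two samples of one device.

    Returns None if no packets were counted in the interval.
    """
    dn = curr[stage + "_n"] - prev[stage + "_n"]
    ds = curr[stage + "_sum"] - prev[stage + "_sum"]
    return ds / dn if dn > 0 else None


# First sample: the em1 and lo lines quoted above.
FIRST = """#dev: (header ignored by the parser)
em1: 89 3277 291671 89 3284 292340 83 3536 293549
lo: 9236 7 69453 9257 1948 18033760 140 96020 13442850
"""

# Second sample: hypothetical values, invented only for the example.
SECOND = """em1: 99 3249 321671 99 3255 322340 93 3478 323549
"""

first = parse_net_latency(FIRST)
second = parse_net_latency(SECOND)
em1_queue_avg = interval_avg(first["em1"], second["em1"], "tx_queue")
```

A client sampling at its own interval gets a per-interval average from
the counter deltas, without any per-packet trace lines having to be
shipped out of the kernel.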


> [...]
> probe kernel.trace("sys_enter") { sys_t[tid()] = gettimeofday_us() }
>
> probe kernel.trace("sys_exit") { delete sys_t[tid()] }

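Note that not all syscalls return, so their sys_t entries never get
deleted; you'll want sys_t to be %-marked (declared as "global sys_t%",
so that the array wraps and recycles old entries) or something similar
to avoid overflow over time.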

You might also consider tracking the skb-free-to-sys_exit latency, as
the time required to return to the application after the outgoing
packet has made it down to the network device.

Do we have any visibility into network drivers as to when they finally
send the packets out on the wire?

How does the script handle socket writes that don't result in an
outgoing transmission right away, due e.g. to !TCP_NODELAY or TCP_CORK?


- FChE
