

To: xfs@xxxxxxxxxxx
Subject: realtime section bugs still around
From: Jason Newton <nevion@xxxxxxxxx>
Date: Fri, 27 Jul 2012 01:14:17 -0700

I think the following bug is still around:


I get the same stack trace.  There's another report out there somewhere with a similar stack trace.  I know the realtime code is not heavily maintained, but it seems a waste to let it fall out of maintenance when it's the only thing on Linux that fills the realtime I/O niche.

So this email is mainly about the null pointer deref on the spinlock in _xfs_buf_find on realtime files, but I figure I might also ask a few more questions.

What kind of differences should one expect between GRIO and realtime files?

What kind of write latencies should one expect for realtime files vs normal files?

My use case is diagnostic tracing on an embedded system as well as saving raw video to disk (3 high-res 10-bit video streams, 5.7MB per frame, at 20Hz, so effectively 60fps total).  I use 2 512GB OCZ Vertex 4 SSDs, which support ~450MB/s each.  I've soft-raided them together (raid 0) with a 4k chunk size, and I get about 900MB/s average in a benchmark program I wrote to simulate my video stream logging needs.  I save only one file per video stream (only 1 stream is modeled in the simulation), which I append to in a loop, one write call per frame, over and over while keeping track of timing.  The frame is in memory and filled with a nonzero, interesting pattern to defeat compression if it's in the pipeline anywhere.  I get 180-300MB/s with O_DIRECT, so better performance without O_DIRECT (maybe because it's soft-raid?).

The problem is that I occasionally get hiccups in latency, even though nothing else is using the disk (embedded system, no other PIDs running, and root is mounted read-only).  I use the deadline I/O scheduler on both SSDs.

I only have 50 milliseconds per frame and latencies exceeding this would result in dropped frames (bad).

Benchmarks (all time values in milliseconds for the write call to complete), with a 4k chunk size for raid-0 (85-95% CPU):
[04:42:08.450483000] [6] min: 4 max: 375 avg: 6.6336148 std: 4.6589185 count = 163333, transferred 900.33G
[07:52:21.204783000] [6] min: 4 max: 438 avg: 6.4564963 std: 3.9554192 count = 34854, transferred 192.12G (total time=226.65sec, ~154fps)

O_DIRECT (60-80% CPU):
[07:46:08.912902000] [6] min: 13 max: 541 avg: 25.9286739 std: 10.3084094 count = 17527, transferred 96.61G

Some benchmarks from last night with a 32k chunk size for raid-0:
vectorized write (prior to d_mem alignment, tightly packed frames):
[05:46:02.481997000] [6] min: 4 max: 50 avg: 6.3724173 std: 3.1656021 count = 3523, transferred 19.42G
[06:14:19.416474000] [6] min: 4 max: 906 avg: 6.6565749 std: 9.2845644 count = 22538, transferred 124.23G
[06:15:58.029818000] [6] min: 4 max: 485 avg: 6.4346011 std: 5.6314630 count = 12180, transferred 67.14G
[06:33:24.125104000] [6] min: 4 max: 1640 avg: 6.7820190 std: 9.9053959 count = 40862, transferred 225.24G
[06:47:00.812176000] [6] min: 4 max: 503 avg: 6.7217849 std: 5.8866980 count = 13099, transferred 72.20G
[07:03:55.334832000] [6] min: 4 max: 505 avg: 6.5297441 std: 8.0027016 count = 14636, transferred 80.68G

non-vectorized (many write calls):
[05:46:55.839896000] [6] min: 5 max: 341 avg: 7.1133700 std: 7.3144947 count = 2878, transferred 15.86G
[06:03:00.353392000] [6] min: 5 max: 464 avg: 7.8846180 std: 5.5350027 count = 27966, transferred 154.16G

[07:51:45.467037000] [6] min: 9 max: 486 avg: 11.6206933 std: 6.9021786 count = 9603, transferred 52.93G
[07:59:04.404820000] [6] min: 9 max: 490 avg: 11.8425485 std: 6.6553718 count = 32172, transferred 177.34G

xfs_info of my video raid:
meta-data="" isize=256    agcount=32, agsize=7380047 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=236161504, imaxpct=25
         =                       sunit=1      swidth=2 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=115313, version=2
         =                       sectsz=512   sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

I'm using 3.2.22 with the rt34 patchset.

If it's desired I can post my benchmark code.  I intend to rework it a little so it's capped at 60fps, since that's my real workload.

If anyone has any tips for reducing the latency of the write calls, or the CPU usage, I'd definitely be interested.

Apologies for the long email!  I figured I had an interesting use case with lots of numbers at my disposal.
