|To:||Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx|
|Subject:||realtime section bugs still around|
|From:||Jason Newton <nevion@xxxxxxxxx>|
|Date:||Tue, 31 Jul 2012 16:01:13 -0700|
|References:||<CAGou9MgezsS=2+SngGWBJv5Npsuqacx1VPJwvMuf0FS+XnXt8A@xxxxxxxxxxxxxx> <20120730030333.GE2877@dastard> <CAGou9MheeBWxajd65szNfDB2L+VVoZ7SypEdUKj7np3L0H8fHA@xxxxxxxxxxxxxx>|
On Sun, Jul 29, 2012 at 8:03 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
This wasn't expected, thanks for the clarifications. What was the original point of RT files?
Well, I had meant with, say, one block of IO.
I went to the Intel built-in RAID0 and found that chunk sizes of 4k, 64k, and 128k don't actually affect latency or throughput much with the simulation application I've written, nor CPU usage. Even streaming directly to the RAID partition still gobbles 40% CPU (single thread, single stream @ 60fps, with higher average latency than XFS). XFS on any of these chunk sizes is at 60-70% CPU with 3 streams, 1 per thread. For XFS with a single thread and a single stream @ 60fps it looked about the same as direct, maybe getting up to 45% and occasionally 50% CPU. All these numbers seem to depend on the mood of the SSD, as does how often there were latency overruns (sometimes none for 45 minutes, sometimes every second; perhaps there's a pattern to the behavior). I'd be interested in trying block sizes larger than 4k (I don't mean the RAID0 chunk size), but that doesn't seem possible with x86_64 and Linux...
Note that you are also writing hundreds of GB to the SSDs, which
500ms does look to be in the neighborhood of the garbage collection time for these drives, maybe 400-450ms on average. This neighborhood is an obvious outlier in some tests.
Ah, that is interesting. I used to save TIFFs, but I figured that would be more variable in latency and CPU usage since it's opening and closing files constantly. However, you have a definite point: since it's not serialized to one stream, there's some extra concurrency to exploit. I'll have to benchmark with multiple files again.
Indeed, if you use file per frame, and a RAID0 chunk size of 3MB
Yes, I don't really want to complicate the main program with AIO; it's complex enough as is.
Using buffered IO means the write(2) operates at memory speed, but
Interesting, what constitutes a proper Direct IO implementation? AIO plus recording structures whose size is a multiple of, in this case, 4k?
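To make the alignment part of that question concrete, here is a minimal sketch of what a padded, aligned direct write might look like. It is not taken from this thread. It assumes Linux, a 4k logical block size (the figure mentioned above), and that the system page size is at least 4k so that an anonymous mmap is suitably aligned. O_DIRECT generally wants the buffer address, the transfer length, and the file offset all aligned to the logical block size. The names `padded_size`, `aligned_buffer`, and `write_direct` are hypothetical, and AIO is deliberately left out.

```python
import mmap
import os

BLOCK = 4096  # assumed logical block size; the thread mentions 4k


def padded_size(nbytes, block=BLOCK):
    """Round nbytes up to the next multiple of block."""
    return (nbytes + block - 1) // block * block


def aligned_buffer(payload, block=BLOCK):
    """Copy a non-empty payload into a zero-padded anonymous mmap.

    An anonymous mmap starts on a page boundary, which satisfies the
    address alignment as long as the page size is >= block.
    """
    buf = mmap.mmap(-1, padded_size(len(payload), block))
    buf[:len(payload)] = payload
    return buf


def write_direct(path, payload, offset=0):
    """Write one padded record with O_DIRECT (Linux only).

    The offset must itself be block-aligned. Returns the number of
    bytes written, which includes the zero padding.
    """
    if offset % BLOCK:
        raise ValueError("offset must be a multiple of the block size")
    buf = aligned_buffer(payload)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()
```

A reader of such a file needs some way to know the real payload length, since each record is padded out to a block multiple. Note that some filesystems (tmpfs, for example) reject O_DIRECT at open time.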
Sorry, the topic quickly moved from something of a bug report / query to involved benchmarking and testing. This xfs_info was not from when I had the realtime section; it was just for the 4k chunk size RAID0. After a few crashes on the realtime section I moved on to other testing, since I doubted there was much that could be done. I've since performed a lot of testing (to be discussed hopefully in the next week; I'm getting to be pretty short on time) and rewrote the frame logging component of the application with average bandwidth in mind, decoupling the saving of frame data from the framegrabber threads. Basically I just have a configurable circular buffer of up to 2 seconds of frames. I think that is the best answer for now since, from my naive point of view, the problem is some combination of Linux (the FS path was never RT) and the SSD (garbage collection was unplanned... who knows what else the firmware is doing).
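Roughly, the decoupling described above could be sketched like this. It is an illustration only, not the actual application code: the class name `FrameLogger`, the `sink` callback, and the policy of dropping a frame when the buffer is full are all assumptions. The buffer is sized as frames per second times seconds, so 60fps and 2 seconds gives 120 slots.

```python
import queue
import threading


class FrameLogger:
    """Bounded frame buffer between grabber threads and one writer thread."""

    def __init__(self, sink, fps=60, seconds=2.0):
        # Capacity is the number of frames that fit in the time window.
        self.frames = queue.Queue(maxsize=max(1, int(fps * seconds)))
        self.sink = sink      # callable that persists one frame
        self.dropped = 0      # frames rejected because the buffer was full
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def submit(self, frame):
        """Called by a grabber thread; never blocks on storage."""
        try:
            self.frames.put_nowait(frame)
            return True
        except queue.Full:
            # Assumed policy: count the overrun and drop the frame.
            self.dropped += 1
            return False

    def _drain(self):
        # Writer thread: absorbs storage latency spikes on its own.
        while True:
            frame = self.frames.get()
            if frame is None:  # sentinel pushed by close()
                break
            self.sink(frame)

    def close(self):
        """Flush everything queued so far, then stop the writer."""
        self.frames.put(None)
        self.worker.join()
```

With this shape a storage stall only costs frames once it outlasts the buffered window, which is why the buffer is sized in seconds rather than in bytes.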
I'm still interested in finding out why streaming a few hundred MB to disk has so much overhead in comparison to the calculations I do in userspace, though. Straight copies of frames (in the real program, copied because of limitations of the framegrabber driver's DMA engine) don't use as much CPU as writing to a single SSD. It takes a little over a millisecond to copy a frame. As for hardware: while it's an embedded system, it's got a 2.2GHz 2-core i7 in it, and the southbridge is BD82QM67-PCH.