NEW READERS: IT’S NOT ABOUT THE FILESYSTEM ANYMORE BUT IT’S STILL BROKEN: SEE UPDATES AT BOTTOM OF POST. Addressing filesystem performance only partly fixed it. Thanks!
Since always, I’ve had latency issues on my digital audio workstation, which is running Ubuntu Linux (currently 12.04 LTS) against a Gigabyte motherboard with 4G of RAM and a suitably symmetric four-core processor. CPUs run 20%-ish in use most of the time (and all the time for these purposes), and I never have to swap.
In this configuration, I should be able to get down to around 7ms of buffer time and not get XRUNs (audio dropouts from buffer underruns or overruns) in my audio chain. 14ms if I want to be safe.
In reality, I can’t make it reliably at 74ms, and that has hitches I just have to live with. To get no XRUNs or close to it I have to go up to like 260ms, which is insane. I even tried getting a dedicated root-device USB card – I’ve long assumed it was some sort of USB issue. But no.
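For a sense of where buffer figures like these come from, here is a minimal sketch of the usual arithmetic: frames per period, times the number of periods, divided by the sample rate. The 48 kHz rate, the period counts, and the function name are assumptions for illustration; the post doesn't state the actual audio settings.

```python
def buffer_latency_ms(frames_per_period, periods, sample_rate_hz):
    """Nominal buffer latency in milliseconds for a period-based audio buffer."""
    return 1000.0 * frames_per_period * periods / sample_rate_hz


if __name__ == "__main__":
    # Assumed settings: 48 kHz sample rate, 2 periods per buffer.
    for frames in (128, 256, 1024, 4096):
        latency = buffer_latency_ms(frames, 2, 48000)
        print(f"{frames:5d} frames/period -> {latency:6.1f} ms")
```

Every doubling of the period size doubles the nominal latency, which is why needing hundreds of milliseconds to stay XRUN-free is so far out of line.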
With some new tools (latencytop in particular) I have found it. It’s the file system. Specifically, it’s in ext3’s internal transaction logging. To wit:
EXT3: committing transaction 302.9ms log_wait_commit 120.3ms
If I turn off access-time (atime) updating on reads, which I tried last night, I get rid of 90% of the XRUNs, because the file system does about 90% less transaction logging to update all those inodes with new times.
But any attempt to write – well, you can guess. Even the pure realtime kernel doesn’t help; I compiled and installed a custom build of one today, but apparently this is still atomic: I get exactly the same behaviour. I may be able to live with that to some degree, because it’s a start-and-stop-of-writes thing, and as long as it doesn’t trigger during writes, I can get by.
But it’s bullshit, and it pisses me off.
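Turning off read-time updates normally means mounting with the `noatime` option (an assumption about how it was done here; the post doesn't say). A rough sketch for checking whether a given mount really has it, by parsing `/proc/mounts`-style text; the helper names are made up for illustration:

```python
import os


def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text."""
    options = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # a later entry wins
    return options


def has_noatime(mounts_text, mount_point):
    """True if the mount point is listed with the noatime option."""
    return "noatime" in mount_options(mounts_text, mount_point)


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            text = f.read()
        print("/ mounted noatime:", has_noatime(text, "/"))
```

Note that `relatime`, which many distributions use by default, still updates access times in some cases, so it is not the same thing as `noatime`.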
I’m currently in the process of upgrading ext3 to ext4. I’d like to think that would solve it, given ext4’s dramatically better performance, but I have no such assurances at this point. I genuinely thought the realtime kernel might do it.
DO YOU HAVE ANYTHING YOU CAN TELL ME, DEAR INTERNETS? Particularly about filesystem tuning. Because this shouldn’t be happening; it just shouldn’t. Honestly, three tenths of a second to commit a transaction? I’ve been places where that kind of number was reasonable; it was called 1983, and I don’t live there anymore.
THINGS IT IS NOT:
- Shared interrupt
- This particular hard drive (the previous drive did it too; this one is faster)
- ondemand CPU frequency governor (I’m running in performance)
- this particular USB port or a USB hub or extension cord or any of the sort
- bluetooth or other random services (including search)
- Corrupt HD
- Old technology (it’s SATA; the drive is like six months old)
- lack of RT kernel. I built this RT kernel today.
- Going to be solved by installing a different operating system. Please don’t.
ETA: I got the ext3 filesystem upgraded to ext4, which made all those above numbers get dramatically smaller, but no further XRUN improvement. So I then disabled journaling, a configuration which outperforms raw ext2 in benchmarks I saw, and the machine is screamingly fast despite the RT kernel…
…and it hasn’t made one goddamn whit of difference in the remaining XRUNs. WTF, computer? WTF.
ETA2 (23:51 18 August): Okay, while screwing with the filesystem did solve many XRUN problems, there are still other XRUNs which are apparently unrelated, most notably, the master-record-enable XRUN. Even moving the project to a tmpfs RAM disk and running from there produced identical results, so I’m concluding this is an entirely separate problem.
I’ve already done pretty much everything there is to do on the LinuxMusicians configuration consultation page, and my setup actually passes their evaluation script. I should be golden, but I’m not. Help?
ETA3 (0:26 19 August): Every two minutes, right now, with the system mostly idle, I’m getting a burst of XRUNs. On an idle machine. But it is exactly every two minutes. And while Ardour remains on top of Top even when idle (at 10% of CPU and 13.5% of RAM), Xorg pops up just underneath it, and its CPU use spikes.
What does Xorg do every two minutes? Anybody? Seriously I have no idea.
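One way to confirm an "exactly every two minutes" pattern rather than eyeballing it: collect XRUN timestamps, collapse them into bursts, and look at the spacing between burst starts. A minimal sketch; the 5-second burst gap and the sample timestamps are invented for illustration, not measurements from this machine.

```python
def burst_starts(timestamps, gap_s=5.0):
    """Collapse XRUN timestamps (in seconds) into burst start times.

    A new burst begins whenever more than gap_s seconds have passed
    since the previous XRUN.
    """
    starts = []
    previous = None
    for t in sorted(timestamps):
        if previous is None or t - previous > gap_s:
            starts.append(t)
        previous = t
    return starts


def burst_intervals(timestamps, gap_s=5.0):
    """Seconds between the starts of consecutive bursts."""
    starts = burst_starts(timestamps, gap_s)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


if __name__ == "__main__":
    # Made-up timestamps: three bursts roughly 120 seconds apart.
    xruns = [0.0, 0.3, 0.9, 120.1, 120.4, 240.0, 240.2, 241.0]
    print([round(i, 1) for i in burst_intervals(xruns)])
```

Intervals that cluster tightly around one value point at a timer or scheduled job rather than random load.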
ETA4 (13:19 19 August): ARDOUR 3 TRIGGERS SESSION SAVE EVERY TWO MINUTES BY DEFAULT. Disabling that STOPS the two-minute failures entirely. We’re back to file system adventures. Holy hell. THIS HAPPENS EVEN ON RAMDISK so it’s not filesystem or media specific. What the hell is going on here?