Thursday, September 26, 2024

Bisect was a cache

I had been having trouble playing minetest: sporadic lag, slowdowns, and stuttering.  The xterm where I start minetest showed debug output and other messages that seemed to point to a bad network connection.  Eventually I mentioned this on a NodeCore minetest Discord server and described what I was seeing, and I was told there that my lag looked network related.

The debug text repeated many, many times:

"2024-09-24 20:42:06: WARNING[Main]: collisionMoveSimple: maximum step interval exceeded, lost movement details"

Within the next day or so, I finally decided to try to upgrade the ZFS log and cache SSDs on my main box, which I use to play minetest.  I had set up two SSDs for this quite a long time ago and couldn't remember which one was the SLOG (the separate log device that holds the ZIL) and which was the cache (L2ARC), so I ended up disabling both by pulling them out.  I managed to restore the SLOG but did not restore the cache device at that moment.

What I discovered once the cache (L2ARC) device was no longer in use, physically removed from my box, was that minetest no longer had those lag issues.  Before this, I had planned to track down where in the series of minetest commits things went wrong.  I had backtracked to a much earlier version of minetest, hoping to find a time before this sporadic but slowly worsening lag began.  I was set to learn how to bisect the commits to discover when the issue appeared, but removing the cache SSD cured the lag in my minetest client and, as I noticed shortly afterward, some sluggishness in my GUI (FVWM3) as well, so the need to bisect was averted.  Had I gone through with it, this blog post would have documented my experience bisecting the minetest client.
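For anyone else who has not bisected before, here is a minimal sketch of the idea in Python. It is not what git does internally and not something I ran against minetest; `first_bad`, `commits` and `is_bad` are hypothetical names for illustration. It assumes the commits are ordered oldest to newest, the first one is known good, the last one is known bad, and everything after the first bad commit is also bad. The `git bisect` command automates the same kind of search on a real repository.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # the problem appeared at mid or earlier
        else:
            good = mid   # the problem appeared after mid
    return commits[bad]
```

In practice `is_bad` would mean building that commit and playing long enough to see whether the lag shows up. Each round halves the remaining range, so even a few thousand commits need only a dozen or so builds.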

Instead of telling a tale of learning something new (bisect), I learned that my cache device must have had some kind of problem that was dragging my entire system down.  This is one more thing to check: an unexplained lag warning, or a similar message that seems to indicate network performance issues, may have nothing to do with actual networking and may instead mean that disk I/O is interfering with network transmission.  If I had known off the top of my head how to triage disk I/O, or how to prove that the network was not actually at fault, this entire conversation would likely have been much shorter.
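One way such a check could look, as a rough sketch rather than a proven diagnostic: on Linux, `/proc/diskstats` keeps running counters per block device, and comparing two snapshots gives the average time each I/O took and how busy each device was in between. This assumes a Linux system (other platforms expose these numbers differently), and the function names `parse_diskstats`, `compare` and `sample` are my own for illustration. A device with a far higher ms-per-I/O figure than its neighbours would be a candidate for the kind of faulty SSD described above.

```python
import time


def parse_diskstats(text):
    """Return {device: (ios_completed, ms_waiting, ms_busy)} from diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        name = fields[2]
        reads, read_ms = int(fields[3]), int(fields[6])
        writes, write_ms = int(fields[7]), int(fields[10])
        busy_ms = int(fields[12])  # time the device spent doing I/O
        stats[name] = (reads + writes, read_ms + write_ms, busy_ms)
    return stats


def compare(before, after, interval_s):
    """Return {device: (avg_ms_per_io, busy_percent)} between two snapshots."""
    report = {}
    for name, (ios1, wait1, busy1) in after.items():
        if name not in before:
            continue
        ios0, wait0, busy0 = before[name]
        ios = ios1 - ios0
        avg_ms = (wait1 - wait0) / ios if ios > 0 else 0.0
        busy_pct = 100.0 * (busy1 - busy0) / (interval_s * 1000.0)
        report[name] = (avg_ms, busy_pct)
    return report


def sample(interval_s=5.0, path="/proc/diskstats"):
    """Take two snapshots interval_s seconds apart and compare them (Linux only)."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return compare(before, after, interval_s)
```

Calling `sample()` while the game is stuttering and printing the result per device would show whether one disk stands out. It does not prove the network is innocent, but a single SSD with very slow I/O is a strong hint to look at storage first.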

Since the cache (L2ARC) device had been removed, I took a few moments to attach two 500GB SSDs as a striped cache.  Perhaps over time it will speed things up, but at least now it should only be a positive addition, not something that slows the system and a game's network packets.  If you have cache devices attached to your system, especially as a ZFS cache, remember that they can be the cause of general lag when an SSD becomes faulty in one way or another.
