Chaos Project

General => Electronic and Computer Section => Programming / Scripting / Web => Topic started by: Blizzard on November 25, 2016, 01:39:10 pm

Title: One of the most bizarre bugs I ever "fixed"
Post by: Blizzard on November 25, 2016, 01:39:10 pm
So, today I "fixed" a very bizarre bug.

We're using etcpak, an open source tool for converting images to the ETC1 format (all other implementations are much slower). It only supported PNG through libpng, so I integrated libjpeg as well since we kinda need it (JPEG is especially useful since ETC1 is only RGB anyway). We made a container format called ETCX that holds 2 RGB images, where the second (optional) image carries the alpha channel, and then we merge them together in the shader. We need this because our newest game needs texture compression, but ETC1 has no alpha channel and not a lot of devices support ETC2 (which does have alpha) yet.
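The merge step can be illustrated on the CPU. This is a minimal sketch, not the actual ETCX shader: it assumes the alpha image stores alpha as grayscale (so its red channel is taken as the alpha value) and that a missing alpha image means fully opaque. `merge_etcx` is a hypothetical name; in the real pipeline the equivalent happens per texel in the shader after both ETC1 textures are sampled.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]         # 8-bit RGB texel decoded from an ETC1 image
RGBA = Tuple[int, int, int, int]   # combined texel with alpha

def merge_etcx(color: List[RGB], alpha: Optional[List[RGB]] = None) -> List[RGBA]:
    """Combine the color image and the optional alpha image of an ETCX pair.

    Assumption: the alpha image is grayscale stored as RGB, so its red
    channel is used as alpha. Without an alpha image, every texel is opaque.
    """
    if alpha is None:
        return [(r, g, b, 255) for (r, g, b) in color]
    if len(alpha) != len(color):
        raise ValueError("color and alpha images must have the same size")
    # Take RGB from the first image and alpha from the second image's red channel
    return [(r, g, b, a[0]) for (r, g, b), a in zip(color, alpha)]
```

Keeping alpha in a second ETC1 image trades an extra texture sample for compatibility with devices that only support ETC1.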

I was batch converting images today and it kept crashing on JPEG images. After I found the bug and fixed it (a race condition on data that libjpeg uses), it seemed to work fine. Well, except for those how_to_play_X.jpg images (X goes from 0 to 9, so 10 images). Those kept crashing and crashing.

So I pull out the source again, compile, nothing. No crash. O_o I try the release configuration, no crash. O_o But etcpak.exe crashes every single time when called from 2 layers of Python (the main script runs a utility script as a separate process, which in turn calls etcpak for the conversion before merging the 2 ETC1 images into an ETCX file).
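From the scripts' side, the crash only shows up as the child process's exit code. A minimal sketch of how the inner utility script could launch the converter and report how it exited, assuming it uses `subprocess.run`; `describe_exit` and `run_converter` are hypothetical helpers, and the actual etcpak command-line arguments are left to the caller rather than guessed.

```python
import subprocess
import sys
from typing import List

def describe_exit(returncode: int) -> str:
    """Classify a child process exit code (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX: a negative value means the child was killed by signal -returncode
        return f"killed by signal {-returncode}"
    if returncode >= 0xC0000000:
        # Windows: NTSTATUS error codes, e.g. 0xC0000005 is an access violation
        return f"crashed (NTSTATUS 0x{returncode:08X})"
    return f"failed with exit code {returncode}"

def run_converter(cmd: List[str]) -> str:
    """Run a converter (e.g. etcpak.exe plus its arguments) and report how it exited."""
    result = subprocess.run(cmd, capture_output=True)
    return describe_exit(result.returncode)
```

Because the outer script only sees what the utility script passes up, logging `describe_exit` at the innermost level makes it clear whether etcpak itself crashed or merely returned an error.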

I start removing the libjpeg code; it still keeps crashing. In the end I remove every trace of it, and it still crashes. Then I remove the fopen() and fclose() calls and suddenly it stops. There is no threading problem here: the file is opened and closed in the main thread. So I start suspecting the JPEG files themselves. There has to be some nasty corruption going on, but the images look fine.
I open them in GIMP and just resave them, nothing else, and it works. It stops crashing. O_o
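Since resaving in GIMP made the crash go away, one way to see what actually changed between an original file and its resaved copy is to compare their marker structure. This is a hypothetical diagnostic, not part of the original debugging session: it checks for the SOI marker (FF D8), lists the header segments up to the start of scan (FF DA), and counts any bytes after the last EOI marker (FF D9). It assumes there are no fill bytes between markers, and it cannot explain why fopen()/fclose() alone would trigger a crash; it only shows whether the two files differ structurally.

```python
def jpeg_summary(data: bytes) -> dict:
    """Summarize the marker structure of a JPEG file (diagnostic sketch)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (not a JPEG?)")
    markers = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # Segment length is big-endian and includes the 2 length bytes themselves
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        markers.append(f"FF{marker:02X}")
        pos += 2 + length
        if marker == 0xDA:  # SOS: entropy-coded image data follows
            break
    eoi = data.rfind(b"\xff\xd9")
    trailing = len(data) - (eoi + 2) if eoi != -1 else None
    return {"markers": markers, "has_eoi": eoi != -1, "trailing_bytes": trailing}
```

Running it on an original how_to_play_X.jpg and on the GIMP copy, then diffing the two summaries, would show whether the resave rewrote the header segments (APPn metadata, for instance) or dropped trailing data.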

Long story short: calling fopen() and fclose() on these "corrupted" JPEG files would cause the exe to crash when it finished, regardless of whether it had any JPEG-handling code at all. Merely accessing the file doomed the exe to crash. And I was able to reproduce it consistently on another PC that pulled the JPEG files via an SVN client, so it was definitely the files.

O_o
Title: Re: One of the most bizarre bugs I ever "fixed"
Post by: Ryex on December 06, 2016, 05:40:16 am
That seems indicative of a problem with how fopen was handled, but the very concept of such a bug existing is hard to comprehend. fopen is implemented at the C library level on top of the kernel's file APIs, and both layers are heavily used, tested, and vetted, and have been for years. What's more interesting is that the contents of a file have absolutely no bearing on the operation of fopen. That knowledge would lead me to believe that it was file system corruption, except you say this was reproducible after pulling the file from SVN on another computer. It's a bit baffling. Could it be that file system corruption that existed when the files were originally committed caused SVN to set some weird state flags that, when checked out, reproduced that file system corruption? Like a misaligned file size or start/end position? Something like that would cause the kernel to say "hey, you're not supposed to be reading there!" and halt the process.
Title: Re: One of the most bizarre bugs I ever "fixed"
Post by: Blizzard on December 06, 2016, 06:21:41 am
God knows. xD