Problems with blt/stretch_blt

Started by Blizzard, August 20, 2011, 01:00:11 pm

Blizzard

August 20, 2011, 01:00:11 pm Last Edit: August 20, 2011, 01:02:22 pm by Blizzard
Since the RGSS portion of ARC's engine runs with 3D acceleration through DirectX 9 rather than the slow DirectDraw that Enterbrain used, we've had quite a few problems with blt and stretch_blt operations. We are not using the integrated DirectDraw methods that their Bitmap classes use; we simply render the textures through render targets, because normal blt and stretch_blt operations would be a lot slower than rendering to render targets. I was able to fix the problem with alpha blending during rendering so that a proper resulting alpha value is used, but there is still a problem with rendering the color properly. Don't get me wrong, this is the ACTUAL proper way to render the color. The problem is that Enterbrain used the Blt and StretchBlt functions, which have a weird way of calculating the resulting color. After a whole day of analyzing the data and function curves, I simply couldn't figure out the formula used or how to make DirectX use that formula to blend the colors like Blt and StretchBlt do. The closest I came was this:

c = (cs * as + cd * ad) / (as + ad)

It worked in a good portion of the cases (probably 60% accuracy I'd say), but it didn't work completely. Besides, it was useless because I couldn't define this kind of calculation for the color in DirectX.

What does all of this mean? Bitmap#blt and Bitmap#stretch_blt will not work the same in ARC as they worked in RMXP. The color formula is this one:

c = (cs * as + cd * (255 - as)) / 255

This is the formula usually used for alpha blending colors. So screw Enterbrain, Microsoft and DirectDraw.
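
For comparison, here is a small Python sketch of the two color formulas from this post, evaluated per channel on 0-255 values. The function names are made up for illustration, and the guard for as + ad = 0 is an assumption, since the post does not say what should happen in that case.

```python
def weighted_average_blend(cs, a_s, cd, a_d):
    """Approximation from the post: c = (cs * as + cd * ad) / (as + ad)."""
    total = a_s + a_d
    if total == 0:
        # Assumption: both pixels are fully transparent, keep the destination color.
        return cd
    return (cs * a_s + cd * a_d) / total


def standard_alpha_blend(cs, a_s, cd):
    """Standard alpha blending: c = (cs * as + cd * (255 - as)) / 255."""
    return (cs * a_s + cd * (255 - a_s)) / 255


if __name__ == "__main__":
    # Half-transparent white source over an opaque black destination.
    print(weighted_average_blend(255, 128, 0, 255))
    print(standard_alpha_blend(255, 128, 0))
```

Note that the standard formula ignores the destination alpha completely, while the weighted average depends on it. That is one visible reason why the two cannot agree in general.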
Check out Daygames and our games:

King of Booze 2      King of Booze: Never Ever
Drinking Game for Android      Never have I ever for Android
Drinking Game for iOS      Never have I ever for iOS


Quote from: winkioI do not speak to bricks, either as individuals or in wall form.

Quote from: Barney StinsonWhen I get sad, I stop being sad and be awesome instead. True story.

ForeverZer0

I'm gonna admit that how to calculate the formula for this is over my head, but I would say whatever works, use it. 
Like you said, screw them, make your own way.  :)
I am done scripting for RMXP. I will likely not offer support for even my own scripts anymore, but feel free to ask on the forum, there are plenty of other talented scripters that can help you.

G_G

And screw the Wright brothers! :V

As we progress with ARC, my hate for Enterbrain increases. They are all lazy japanese fucks >:U *look who's talking*

Blizzard

Lol! No, using DirectDraw for a 2D game was not a stupid or lazy idea. But by the time they made RMXP, nobody still used DirectDraw. I really don't understand why they didn't go with DirectX. :/ Just the use of 3D acceleration should have eradicated most of the performance problems RMXP has. :/
Even the Blt and StretchBlt color calculations are fine, they make sense. But I can't find any proper documentation on it and I can't figure out the calculation formula for the colors.

Ancurio

August 28, 2013, 08:00:04 am #4 Last Edit: August 28, 2013, 08:33:29 pm by Ancurio
Hey,

so I hit this really stupid problem with text rendering: If text is rendered on a bitmap,
what happens in my engine is that I first render it in software (using SDL_ttf), upload it to VRAM
and then simply render it on top of the Bitmap like any other blending operation.
The problems start when drawing text with opacities <255. What I did until now was
render the text as usual, then draw it as a quad with the specified opacity. However, what happens
when I do this on top of a cleared bitmap (as window contents always tend to be) is that the text
color is rendered at this opacity, and the resulting pixel itself also carries the opacity! This means
that what I see on the screen is actually at opacity^2, because the same opacity is reapplied during
the final render of the window contents sprite. (I'm sorry if this is hard to understand, I'm really bad
at explaining stuff..).
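
The "opacity squared" effect described above can be reproduced numerically. The following Python sketch is only an illustration: the function names are made up, and it assumes the common fixed-function setup where color is blended with (SRC_ALPHA, ONE_MINUS_SRC_ALPHA) and alpha with (ONE, ONE_MINUS_SRC_ALPHA). The actual blend state in the engine may differ.

```python
def blend_quad(src_rgb, src_alpha, dst_rgb, dst_alpha):
    """One blending step on a single channel, values in 0.0-1.0.

    Assumed blend state: color uses (SRC_ALPHA, ONE_MINUS_SRC_ALPHA),
    alpha uses (ONE, ONE_MINUS_SRC_ALPHA).
    """
    out_rgb = src_rgb * src_alpha + dst_rgb * (1.0 - src_alpha)
    out_alpha = src_alpha + dst_alpha * (1.0 - src_alpha)
    return out_rgb, out_alpha


def text_contribution_on_screen(opacity, text_rgb=1.0):
    """How much of the text color reaches an opaque black screen."""
    # Step 1: draw the text quad onto a cleared bitmap (rgb 0, alpha 0).
    bitmap_rgb, bitmap_alpha = blend_quad(text_rgb, opacity, 0.0, 0.0)
    # Step 2: draw the window contents sprite onto the screen.
    screen_rgb, _ = blend_quad(bitmap_rgb, bitmap_alpha, 0.0, 1.0)
    return screen_rgb


if __name__ == "__main__":
    print(text_contribution_on_screen(0.5))
```

With an opacity of 0.5 the bitmap stores the text color already darkened to 0.5, and the second blend multiplies by the stored alpha of 0.5 again, so the text ends up at a quarter of its intended intensity instead of half.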
Anyway, this prompted me to look into how exactly the text drawing works in RMXP. Turns out it's
mostly the same process, except the text is blended using the same algorithm that blt and stretch_blt
employ (so probably the same DirectDraw functionality). The way this algorithm works makes the
"double opacity" problem never come up. Ok, enough blablah, here's what I got:

[Removed wrong algorithm]

(Would be cool if someone played with it a bit more and verified my results! :D)

And here's my giant problem with it: While the above is fairly easy to implement in software
(which I'm pretty sure is what DirectDraw does), it's really damn hard to do it in hardware because
at least with current gen GPUs, the blending stage is wired in fixed function hardware, and
it doesn't even look like that's going to change in the coming future. (On another note,
programmable blending is implemented on some mobile GPUs, but that's irrelevant).

This is a bit of a dilemma. I don't really want to keep around a shadow texture for each and
every Bitmap (to do the destination pixel reading from), so I'm thinking about how this could
be realized using other hacks. I know you guys use Direct3D, but we both use the same hardware
in the end so I'm pretty sure we're in the same boat here. The biggest problem is that I have
to somehow supply a "third" (uniform) alpha value (ab), but I can't use the source pixel
alpha because that would screw up the remaining color blending.
At least with text rendering I think I can salvage the situation because SDL_ttf provides me
with a software surface anyway, so I'm thinking about simply duplicating it, setting rgb to ab
and basically blending alpha and color components in two separate passes.

But as for the (stretch_/)blt situation, I have no fucking clue... I think I'll leave it aside for now, as for
most games wrong blending doesn't seem to have much impact (most blts are done with full opacity).

Blizzard

Don't worry about it being implemented in ARC. Yes, you can't just implement it with the default functions, but you can when using a shader, which ARC is already using.

Ancurio

Quote from: Blizzard on August 28, 2013, 08:49:31 am
Don't worry about it being implemented in ARC. Yes, you can't just implement it with the default functions, but you can when using a shader, which ARC is already using.


=O How would you implement this in a shader? Does Direct3D allow readback from the framebuffer
you're writing to in its shaders?

Blizzard

August 28, 2013, 09:30:40 am #7 Last Edit: August 28, 2013, 09:32:13 am by Blizzard
Yeah, otherwise it wouldn't work well, because we additionally need the opacity parameter.

Vertex shader (HLSL):
float4x4 WorldViewProjection : WORLDVIEWPROJ;

struct VS_INPUT
{
    float4 position : POSITION;
    float4 color : COLOR0;
};

struct VS_OUTPUT
{
    float4 position : POSITION;
    float4 color : COLOR0;
};

VS_OUTPUT main(VS_INPUT vertex)
{
    VS_OUTPUT v;
    v.position = mul(vertex.position, WorldViewProjection);
    v.color = vertex.color;
    return v;
}


Pixel shader (HLSL):
float4 alpha : COLOR0;
float4 color : COLOR1;
float4 tone : COLOR2;
sampler2D tex0;
// Luma weights used for the grayscale conversion
float3 rgb2hsl = {0.299, 0.587, 0.114};

float4 main(float2 texcoord0 : TEXCOORD0) : COLOR0
{
    float4 c = tex2D(tex0, texcoord0);
    if (c.a > 0)
    {
        // Apply the opacity parameter
        c.a = c.a * alpha.a;
        if (c.a > 0)
        {
            if (color.a > 0)
            {
                // LERP towards the blend color
                c.rgb = c.rgb * (1.0 - color.a) + color.rgb * color.a;
            }
            if (tone.a == 1)
            {
                // No desaturation, only the tone offset
                c.rgb = c.rgb + (tone.rgb - 0.5) * 2;
            }
            else
            {
                // Desaturate towards gray, then add the tone offset
                float gray = dot(c.rgb, rgb2hsl);
                c.rgb = (c.rgb - gray) * tone.a + gray + (tone.rgb - 0.5) * 2;
            }
        }
    }
    return c;
}



Though this can still be optimized because if-branches in pixel shaders are bad practice and reduce performance.
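
One possible way to remove the inner branches, sketched here as a CPU reference in Python rather than HLSL: the LERP with color.a = 0 returns the input unchanged, and the desaturation formula with tone.a = 1 reduces to the plain tone offset, so both conditionals can be dropped without changing the math for visible pixels. The function names are made up, clamping on write to the render target is not modeled, and whether this is actually faster on a given GPU is not measured here.

```python
LUMA = (0.299, 0.587, 0.114)  # same weights as rgb2hsl in the shader


def shade_branching(c, opacity, color, tone):
    """Direct port of the pixel shader; c, color, tone are (r, g, b, a) tuples."""
    rgb, a = list(c[:3]), c[3]
    if a > 0:
        a = a * opacity
        if a > 0:
            if color[3] > 0:
                rgb = [x * (1.0 - color[3]) + k * color[3]
                       for x, k in zip(rgb, color)]
            if tone[3] == 1:
                rgb = [x + (t - 0.5) * 2 for x, t in zip(rgb, tone)]
            else:
                gray = sum(x * w for x, w in zip(rgb, LUMA))
                rgb = [(x - gray) * tone[3] + gray + (t - 0.5) * 2
                       for x, t in zip(rgb, tone)]
    return (rgb[0], rgb[1], rgb[2], a)


def shade_branchless(c, opacity, color, tone):
    """Same math with the color and tone branches folded away."""
    rgb, a = list(c[:3]), c[3] * opacity
    # color.a == 0 makes this LERP a no-op.
    rgb = [x * (1.0 - color[3]) + k * color[3] for x, k in zip(rgb, color)]
    # tone.a == 1 makes the desaturation a no-op (up to rounding).
    gray = sum(x * w for x, w in zip(rgb, LUMA))
    rgb = [(x - gray) * tone[3] + gray + (t - 0.5) * 2
           for x, t in zip(rgb, tone)]
    return (rgb[0], rgb[1], rgb[2], a)


if __name__ == "__main__":
    pixel = (0.2, 0.4, 0.6, 0.8)
    no_color = (1.0, 0.0, 0.0, 0.0)
    no_tone = (0.5, 0.5, 0.5, 1.0)
    print(shade_branching(pixel, 0.5, no_color, no_tone))
    print(shade_branchless(pixel, 0.5, no_color, no_tone))
```

The two versions differ only for fully transparent pixels, where the branchless one still modifies rgb. With ordinary alpha blending that color should never become visible, but that is an assumption about how the output is used.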

Ancurio

Um, okay, but where in your sprite shader are you reading values from the destination framebuffer??

Blizzard

August 28, 2013, 10:00:13 am #9 Last Edit: August 28, 2013, 10:02:53 am by Blizzard
That's not how shaders work. They calculate the final value of the pixel on the screen depending on the texture UV, modulation color (actually LERPing color in RGSS) and tone. They are calculating the value of the pixel on the screen, which is the frame buffer value, so it can be altered in any way before it is actually sent to the frame buffer.

Ancurio

August 28, 2013, 10:12:48 am #10 Last Edit: August 28, 2013, 10:32:17 am by Ancurio
Quote from: Blizzard on August 28, 2013, 09:30:40 am
Yeah, otherwise it wouldn't work well,


Quote from: Blizzard on August 28, 2013, 10:00:13 am
That's not how shaders work.


You're contradicting yourself ^^

Anyway, I know how shaders work (I sure as hell wouldn't have been able to implement
sprites in my engine otherwise =P), and I'm also sure that current graphics hardware is
unable to do destination framebuffer readbacks, so using shaders doesn't really solve anything..

Edit: Remember that you need ac to calculate the final color, which depends on ar,
which in turn depends on ad. So you need the destination alpha to calculate the correct color.

Blizzard

Yes, I didn't express myself properly. What I meant is that you don't need a frame buffer value for that. It's possible to use separate blending for alpha and for color values, but I'm not sure if that would be enough now that I think about it.

Ancurio

Quote from: Blizzard on August 28, 2013, 10:34:00 am
Yes, I didn't express myself properly. What I meant is that you don't need a frame buffer value for that. It's possible to use separate blending for alpha and for color values, but I'm not sure if that would be enough now that I think about it.


Okay. Yeah, I don't think it's enough (read the Edit to my previous post).

Blizzard

Did you try tinkering with SetTextureStageState() with D3DTA_TFACTOR?

Ancurio

Quote from: Blizzard on August 28, 2013, 10:44:34 am
Did you try tinkering with SetTextureStageState() with D3DTA_TFACTOR?


Sorry, I'm not familiar with Direct3D at all =/ (I work with OpenGL), but from the bit of googling
I just did this seems to be part of the fixed function pipeline that was deprecated in favor
of programmable shaders, so I don't think it could help us.

Blizzard

August 28, 2013, 11:17:59 am #15 Last Edit: August 28, 2013, 11:45:34 am by Blizzard
Right, I forgot that you can't use those if you use shaders.

EDIT: Have you thought about using 2 passes and premultiplied alpha? If your first pass would calculate only the final alpha for the source with the same color and then use the result as source for a second operation, you could keep the alpha and use that alpha in the second pass.
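
As background for the premultiplied alpha idea, here is a hedged Python sketch of textbook "over" compositing in straight and in premultiplied form. This is not the RMXP formula, and the function names are only illustrative. It shows that the straight-alpha form has to divide by the resulting alpha, while the premultiplied form blends with multiplications and additions only, which is the kind of operation fixed-function blending can express.

```python
def over_straight(cs, a_s, cd, a_d):
    """'Over' with straight (non-premultiplied) colors, values in 0.0-1.0."""
    a_r = a_s + a_d * (1.0 - a_s)
    if a_r == 0.0:
        # Nothing visible; the color is undefined, return 0 by convention.
        return 0.0, 0.0
    c_r = (cs * a_s + cd * a_d * (1.0 - a_s)) / a_r
    return c_r, a_r


def over_premultiplied(cs_pm, a_s, cd_pm, a_d):
    """'Over' with premultiplied colors (color already multiplied by alpha)."""
    c_r = cs_pm + cd_pm * (1.0 - a_s)
    a_r = a_s + a_d * (1.0 - a_s)
    return c_r, a_r


if __name__ == "__main__":
    cs, a_s, cd, a_d = 0.8, 0.5, 0.2, 0.5
    straight_c, straight_a = over_straight(cs, a_s, cd, a_d)
    pm_c, pm_a = over_premultiplied(cs * a_s, a_s, cd * a_d, a_d)
    # Converting the premultiplied result back needs the division only once.
    print(straight_c, pm_c / pm_a)
```

The division does not disappear entirely. It is deferred to the point where a straight-alpha color is needed again, so this only helps if the bitmaps can stay premultiplied throughout.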

Ancurio

Quote from: Blizzard on August 28, 2013, 11:17:59 am
EDIT: Have you thought about using 2 passes and premultiplied alpha? If your first pass would calculate only the final alpha for the source with the same color and then use the result as source for a second operation, you could keep the alpha and use that alpha in the second pass.


I was doing something similar to what you described (rendering alpha and color separately in two passes),
but after finding the proper blend algorithm it turned out that this won't cut it.
The problem is the division (2nd equation). ar is unfortunately the divisor
meaning I can't even do cute tricks like substituting a multiplication with the inverse.
I'm not sure how familiar you are with OpenGL, but none of the offered blend functions
offer anything that would make this doable (and I'm sure OpenGL already offers everything
the hardware offers).

I'm currently thinking about just biting the bullet and doing a copy of the destination buffer
to do the blending myself. As long as I reuse the same auxiliary buffer for it and don't do
any copies to RAM, maybe it's not the end of the world after all.. gotta profile stuff as always ;D

Blizzard

August 28, 2013, 12:46:11 pm #17 Last Edit: August 28, 2013, 12:48:04 pm by Blizzard
I'm as familiar with OpenGL as I am with DirectX. Essentially they are very similar, except that some of the logic is different (e.g. setting the current modelview matrix or left-handed vs. right-handed vertex rendering).

I know that you can't accurately replicate it, but getting close is better than nothing. Lol, good that I gave up back then. I would have gone insane if I had kept working on it.

Ancurio

Quote from: Blizzard on August 28, 2013, 12:46:11 pm
I'm as familiar with OpenGL as I am with DirectX. Essentially they are very similar, except that some of the logic is different (e.g. setting the current modelview matrix or left-handed vs. right-handed vertex rendering).


Yeah, and the whole C++ object oriented stuff ^^ I also heard handling of offscreen
rendertargets is heaps easier compared to OGL.

Quote from: Blizzard on August 28, 2013, 12:46:11 pm
I know that you can't accurately replicate it, but getting close is better than nothing. Lol, good that I gave up back then. I would have gone insane if I had kept working on it.


Actually, going this (performance-wise) hard route means I get to replicate it pixel-perfectly ^^

Ancurio

Sorry for the delay.

While implementing the blend mode in glsl, I noticed that I had the calculation wrong after all.
So I started from scratch. Here's the important snippet from the final code:


vec4 resFrag;

float ab = opacity; // blend opacity
const float as = srcFrag.a;
const float ad = dstFrag.a;

const float at = ab*as;
resFrag.a = at + ad - ad*at;

resFrag.rgb = mix(dstFrag.rgb, srcFrag.rgb, ab*as);
resFrag.rgb = mix(srcFrag.rgb, resFrag.rgb, ad*resFrag.a);

gl_FragColor = resFrag;


It's unfortunately not 100% pixel perfect like I had hoped for. It is only accurate for ad = 0 and ad = 255,
the values in between are linearly interpolated but in RMXP they change in a more "square" way.
But that's good enough for my needs I guess.
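
For anyone who wants to check the two boundary cases without a GPU, here is a Python port of the snippet above. It is a sketch with made-up function names, working on values in 0.0-1.0 as the shader does, and it only reproduces the GLSL math, not RMXP's real behavior for intermediate destination alpha.

```python
def mix(x, y, a):
    """GLSL mix(): linear interpolation x * (1 - a) + y * a."""
    return x * (1.0 - a) + y * a


def blt_blend(src_rgb, src_a, dst_rgb, dst_a, opacity):
    """CPU port of the GLSL blend snippet for one pixel."""
    at = opacity * src_a
    res_a = at + dst_a - dst_a * at
    # First pass: ordinary alpha blend of source over destination.
    rgb = [mix(d, s, at) for d, s in zip(dst_rgb, src_rgb)]
    # Second pass: pull the result back towards the pure source color
    # the more transparent the destination is.
    rgb = [mix(s, r, dst_a * res_a) for s, r in zip(src_rgb, rgb)]
    return tuple(rgb), res_a


if __name__ == "__main__":
    src = (1.0, 0.5, 0.0)
    black = (0.0, 0.0, 0.0)
    # Cleared destination: source color is kept, only alpha carries the opacity.
    print(blt_blend(src, 1.0, black, 0.0, 0.5))
    # Opaque destination: behaves like a standard alpha blend.
    print(blt_blend(src, 1.0, black, 1.0, 0.5))
```

On a cleared destination the color comes out undarkened and only the alpha is scaled, so the "double opacity" problem from the text rendering case does not occur.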