Let me put it like this: I set the opacity, color and tone in the form of "constants" that are used throughout the whole next render procedure, and then I just render 4 vertices with the texture. It's like BAM, and all the problems I had earlier were suddenly solved. :O
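float4 alpha : COLOR0; // opacity, only .a is used
float4 color : COLOR1; // blend color, .a is the blend strength
float4 tone : COLOR2;  // .rgb is the tone shift (0.5 = no shift), .a is the saturation
sampler2D tex0;
// Despite the name, these are the luma weights used to turn RGB into a gray value.
// "static const" bakes them into the shader so they don't have to be set from the application.
static const float3 rgb2hsl = {0.299, 0.587, 0.114};

float4 main(float2 texcoord0 : TEXCOORD0) : COLOR0
{
    float4 c = tex2D(tex0, texcoord0);
    // Fully transparent pixels are returned untouched.
    if (c.a > 0)
    {
        c.a = c.a * alpha.a;
        if (c.a > 0)
        {
            // Blend the pixel toward the color by color.a.
            if (color.a > 0)
            {
                c.rgb = c.rgb * (1.0 - color.a) + color.rgb * color.a;
            }
            if (tone.a == 1)
            {
                // Full saturation: only shift the channels (0..1 is mapped to -1..1).
                c.rgb = c.rgb + (tone.rgb - 0.5) * 2;
            }
            else
            {
                // Pull the pixel toward its gray value first, then shift the channels.
                float gray = dot(c.rgb, rgb2hsl);
                c.rgb = (c.rgb - gray) * tone.a + gray + (tone.rgb - 0.5) * 2;
            }
        }
    }
    return c;
}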
On top of that, the shading is done in a very efficient way: multiplying two float values takes the same amount of time as multiplying two float vectors of 4 values each. <3
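To make the vector point a bit more concrete, here is a rough sketch of how the color and tone math could be written without branches, so that every step is a single operation on a whole 3-component vector. This is only an illustration and not the shader I actually use: applyColorAndTone and lumaWeights are made-up names, and the sketch assumes the caller still does the alpha handling itself. Results should match the branched version up to float rounding, because a color.a of 0 leaves the pixel alone and a tone.a of 1 keeps the full saturation.

```hlsl
// Hypothetical helper, not part of the shader above: the same blend and tone
// math written branch-free with vector intrinsics.
static const float3 lumaWeights = {0.299, 0.587, 0.114};

float3 applyColorAndTone(float3 rgb, float4 color, float4 tone)
{
    // lerp(a, b, t) computes a * (1 - t) + b * t, which is the color blend.
    rgb = lerp(rgb, color.rgb, color.a);
    // Weighted gray value of the blended pixel.
    float gray = dot(rgb, lumaWeights);
    // tone.a = 0 gives pure gray, tone.a = 1 keeps the pixel as it is.
    rgb = lerp(gray.xxx, rgb, tone.a);
    // Shift each channel by the tone, mapped from 0..1 to -1..1.
    return rgb + (tone.rgb - 0.5) * 2.0;
}
```

Inside main it would be called as c.rgb = applyColorAndTone(c.rgb, color, tone); after the alpha has been multiplied. Whether this is actually faster than the branched version depends on the hardware and the compiler, so treat it as something to measure rather than a guaranteed win.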