String Encoding problems

Started by Blizzard, August 01, 2012, 02:52:19 pm


Blizzard

August 01, 2012, 02:52:19 pm Last Edit: August 01, 2012, 03:01:20 pm by Blizzard
Ruby 1.9.x uses US-ASCII (a 7-bit encoding) as the default source file encoding instead of ASCII-8BIT. While this generally does not cause problems, something else does. If a source file is UTF-8 encoded, all string literals automatically become UTF-8 encoded strings, and that causes problems. For example, you can't really use Win32API properly anymore, because the strings that you create have to be converted first. That conversion is not available in Ruby 1.8.x, which in turn means that your code is no longer backwards compatible.
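
To make the compatibility issue concrete, here is a minimal sketch of the kind of guard that code would need in order to run on both versions. The helper name to_binary is hypothetical and not part of ARC; it only assumes that String#force_encoding exists on 1.9 and later and is missing on 1.8.

```ruby
# Hypothetical helper: returns a binary (ASCII-8BIT) copy of the string on
# Ruby 1.9+, and the string itself on Ruby 1.8, where strings carry no
# encoding and String#force_encoding does not exist.
def to_binary(str)
  return str unless str.respond_to?(:force_encoding)
  # dup first so the caller's string keeps its original encoding
  str.dup.force_encoding('ASCII-8BIT')
end
```

Only the encoding label changes; the bytes stay the same. Having to wrap every string that goes into Win32API this way is exactly the extra work that old scripts don't do.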

There are three ways to avoid this problem. I have tried all three. Two of them would be easily integrated into ARC if they actually worked. :/ The last one is a bit hackish, but it's the only one that I was able to get working.

1. Use the -E switch when starting Ruby to force the "internal" default encoding. There's also an "external" encoding, but that one is for strings that are read from external sources using IO. Regardless, it's not working.

2. Set Encoding.default_internal (and Encoding.default_external) to the wanted encoding. Doesn't work either.

3. Add a so-called magic comment like "# encoding: ascii-8bit" as the first line of the script. This one works, but it's really hackish. When I read "magic comment", I immediately thought of "magic numbers". Every half-assed programmer knows that magic numbers are bad practice. :/
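
The effect of the magic comment from option 3 can be checked with a small sketch. It writes a throwaway script to a temporary file, loads it, and reports the encoding its string literal ended up with. The method name literal_encoding_for and the global $probe are made up for this illustration.

```ruby
require 'tempfile'

# Writes a tiny script (optional header line + one string literal) to a
# temporary .rb file, loads it, and returns the encoding of that literal.
# $probe is a hypothetical global used only to carry the result back.
def literal_encoding_for(header)
  file = Tempfile.new(['probe', '.rb'])
  file.write(header + "$probe = 'abc'.encoding\n")
  file.close
  load file.path
  $probe
ensure
  file.unlink if file
end

binary_literal = literal_encoding_for("# encoding: ascii-8bit\n")
utf8_literal   = literal_encoding_for("# encoding: utf-8\n")
puts binary_literal  # encoding chosen by the first header
puts utf8_literal    # encoding chosen by the second header
```

Only the first line of the loaded file differs between the two calls, so the comment alone decides what encoding new literals get.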

Why is this important and how does it affect the source code? Well, apparently not at all. A file can be UTF-8 encoded, but if we force the ASCII-8BIT encoding on created strings, they will all be binary strings, and hence the scripts will be backwards compatible with Ruby 1.8.x. The only way I see to get this hack working is to have the script editor strip the first line of the script and add it back on saving, consisting of the BOM (the 3 bytes at the beginning of the file that mark it as UTF-8 encoded) followed by the comment "# encoding: ascii-8bit". Does anybody have any other ideas?

We don't have to worry about using ASCII-8BIT, because it's basically binary encoding. It will work with everything without problems.
Here are a few links if you want to read up on encodings in Ruby 1.9.x, how it affects everything and ways to work with it.
http://www.ruby-doc.org/core-1.9.3/String.html#method-i-encode
http://www.ruby-doc.org/core-1.9.3/Encoding.html
http://nuclearsquid.com/writings/ruby-1-9-encodings
http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings
http://www.ruby-forum.com/topic/141711

EDIT: I'd like to avoid aliasing String#new. We're running Ruby 1.9.x, we should keep its features working.

EDIT: I also tried the -K switch already, but we should avoid using that one as it's deprecated. And it didn't seem to work either.

EDIT: And please no recompilation of Ruby. #_# That one might actually be the best solution here, but it could be quite nasty to find the proper pieces of code that need to be changed.
Check out Daygames and our games:

King of Booze 2      King of Booze: Never Ever
Drinking Game for Android      Never have I ever for Android
Drinking Game for iOS      Never have I ever for iOS


Quote from: winkio
I do not speak to bricks, either as individuals or in wall form.

Quote from: Barney Stinson
When I get sad, I stop being sad and be awesome instead. True story.

ForeverZer0

Have you tried setting the default encoding with $KCODE or RUBYOPT, or used the -Ku command-line switch? -Ku is supposed to set the encoding internally and for script parsing.
I am done scripting for RMXP. I will likely not offer support for even my own scripts anymore, but feel free to ask on the forum, there are plenty of other talented scripters that can help you.

Blizzard

August 01, 2012, 03:19:14 pm #2 Last Edit: August 01, 2012, 03:37:27 pm by Blizzard
$KCODE doesn't seem to work either. For example, when I try "p $KCODE", I get nil. As I said in my 2nd edit, I tried the -K switch, but nothing happens. I just tried setting the environment variable "LANG=en_US.ASCII-8BIT", but it didn't work either. I haven't tried RUBYOPT yet.

EDIT: Nothing works except the magic comment.

ForeverZer0

Very strange.
"-Ku" is supposed to set the internal encoding and script parsing to the given encoding. I wonder why these don't work...

Blizzard

August 01, 2012, 03:53:29 pm #4 Last Edit: August 01, 2012, 04:36:39 pm by Blizzard
Both -K and $KCODE have been deprecated and removed in Ruby 1.9.x. If I try to set $KCODE, I get a warning that tells me that the action was ignored.

EDIT: Encoding.default_(in|ex)ternal isn't the solution for our problem: http://www.ruby-forum.com/topic/169062#741901
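
A rough way to see this for yourself is the sketch below. It changes both defaults, loads a script that has no magic comment, and looks at what encoding a literal in that script got. The method name and the global $probe_default are invented for the example, and the defaults are restored afterwards.

```ruby
require 'tempfile'

# Sets both default encodings to +encoding+, loads a script without a magic
# comment, and returns the encoding of a string literal inside that script.
# $probe_default is a hypothetical global used only to carry the result back.
def literal_encoding_under_defaults(encoding)
  old_external = Encoding.default_external
  old_internal = Encoding.default_internal
  old_verbose  = $VERBOSE
  $VERBOSE = nil # silence the "setting Encoding.default_*" warnings
  Encoding.default_external = encoding
  Encoding.default_internal = encoding

  file = Tempfile.new(['probe', '.rb'])
  file.write("$probe_default = 'abc'.encoding\n")
  file.close
  load file.path
  $probe_default
ensure
  file.unlink if file
  Encoding.default_external = old_external
  Encoding.default_internal = old_internal
  $VERBOSE = old_verbose
end

puts literal_encoding_under_defaults(Encoding::ASCII_8BIT)
```

The literal keeps the source encoding of the loaded file (US-ASCII on 1.9, as far as I know UTF-8 on newer versions), because the two defaults only apply to data passing through IO.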

EDIT: It seems the only solutions are editing the Ruby source code and recompiling, manually calling some crazy Ruby functions before loading files (or at the initialization of the interpreter if possible), or magic comments set by the editor.

Ryex

I think it's best not to think of it as a magic comment. It's a comment that is parsed when the interpreter parses the file, and it is read as a directive to the VM to treat the file as having a specified encoding. I don't feel it's hackish at all, as it's a fairly standard way to declare an encoding in a file. It's used everywhere from web pages to established languages like Python.

I say have the editor place the encoding line after the UTF-8 binary header and then write the script. We could have a drop-down list of encodings the editor could enforce on a file-by-file basis and have it change the encoding line.
I no longer keep up with posts in the forum very well. If you have a question or comment about my work, or in general, I welcome PMs. If you make a post in one of my threads and I don't reply within a day or two, feel free to PM me and point it out to me.

DropBox, the best free file syncing service there is.

Blizzard

No, leave the file encoding UTF-8, that's just fine. I'm only talking about the encoding for literal strings and new strings here. Their encoding will default to the same encoding as the file's encoding which causes problems. Ideally all strings should have the binary (ASCII-8BIT) encoding as default regardless of the file's encoding.

In those links that I posted, people are actually discussing the problem that using \u in a string gives the string UTF-8 encoding automatically, while using \x is ambiguous. If \x strings simply defaulted to ASCII-8BIT encoding, the problem would be basically solved, as coders would only have to make sure to use a string with \x to designate binary encoding. Sadly, the earliest discussion I found was from 2007, and since it hasn't been changed so far, I don't expect it to change in the future either.
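
The difference between the two escapes can be sketched like this. The method name escape_encodings is made up for the example, and the code assumes it sits in a file whose source encoding is UTF-8.

```ruby
# Returns the encodings that literals with \u and \x escapes end up with.
# Assumes the surrounding file is parsed as UTF-8.
def escape_encodings
  unicode_escape = "\u00e9" # \u always yields a UTF-8 string
  hex_escape     = "\xE9"   # \x gives no hint; the file's encoding decides
  {
    :unicode   => unicode_escape.encoding,
    :hex       => hex_escape.encoding,
    :hex_valid => hex_escape.valid_encoding?,
    # explicit relabeling is the only way to get a binary string here
    :forced    => hex_escape.dup.force_encoding(Encoding::ASCII_8BIT).encoding
  }
end

p escape_encodings
```

In a UTF-8 file the \x string is labeled UTF-8 even though the single byte 0xE9 is not valid UTF-8 on its own, which is why it has to be relabeled by hand.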

I am still for hiding the magic comment in the editor though.

Ryex

Can we not use // to designate literal strings in binary form?

ForeverZer0

I have no problem with the magic comment, and I wouldn't even hide it. As Ryex said, I don't think it should be viewed as a "magic number", because it really isn't. It's actually convention to have these comments within scripts, and I know I have always used them in Ruby scripts I have written that are unrelated to RMXP. I am actually all for allowing and even "enforcing" a bit more real-world Ruby and less "RGSS" mentality with scripters. Too often I have seen scripters, even good ones, be stuck within the "RMXP box". Although they are talented scripters, they don't have much knowledge of Ruby outside of the basics found in the help manual. I'm not even just talking about Ruby classes, but even methods of basic classes such as Array, etc. If a method's documentation is for some odd reason missing from the help manual, how often do you ever see it in an RMXP script? Rarely. I am all for supplying the actual Ruby documentation (or at least a link to the online documentation) with ARC rather than a watered-down version.

I know this is slightly off-topic, but the point is, I don't think we should be worrying about the comments. Let that be the "fix", and let's embrace it.

Blizzard

August 03, 2012, 02:23:32 am #9 Last Edit: August 03, 2012, 02:24:49 am by Blizzard
That's not my main concern. My main concern is backwards compatibility with RMXP scripts. What good is ARC Legacy Edition if you can't just insert your RMXP script and have it work? Whether we like it or not, we have to obey some "RMXP box" rules if we want ARC to work seamlessly.

ForeverZer0

You could have the converter add it. That wouldn't solve copy-paste additions, but it would take care of the majority of it.

I dunno, I see your point, though. Could have the editor check if it's there, and insert it if not.

Blizzard

I was thinking something along those lines. The converter adds them either way and the editor makes sure that new scripts or edited scripts have them, too. If somebody goes ahead and deletes them, they deserve that the code breaks on them. :>
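
A rough sketch of what the converter or editor could run on each script before saving it is below. The names ensure_magic_comment, MAGIC_COMMENT and UTF8_BOM are hypothetical and not actual ARC code. The sketch assumes that an existing UTF-8 BOM should be kept in front of the comment and that any encoding comment already on the first line is left alone.

```ruby
# Hypothetical constants for this sketch.
MAGIC_COMMENT = "# encoding: ascii-8bit"
UTF8_BOM = "\xEF\xBB\xBF".dup.force_encoding(Encoding::ASCII_8BIT)

# Returns the script source (as a binary string) with the magic comment as
# its first line. Leaves the source alone if the first line already
# declares an encoding. A leading UTF-8 BOM, if present, stays in front.
def ensure_magic_comment(source)
  bytes = source.dup.force_encoding(Encoding::ASCII_8BIT)
  has_bom = bytes.start_with?(UTF8_BOM)
  body = has_bom ? bytes[UTF8_BOM.bytesize..-1] : bytes
  first_line = body.lines.first.to_s
  # matches e.g. "# encoding: utf-8" or "# -*- coding: utf-8 -*-"
  return bytes if first_line =~ /\A#.*coding[:=]\s*[\w.-]+/
  prefix = has_bom ? UTF8_BOM : ""
  prefix + MAGIC_COMMENT + "\n" + body
end

puts ensure_magic_comment("puts 'Hello'\n")
```

Running it twice on the same script changes nothing the second time, so the converter and the editor can both call it without stacking up comments.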

Ryex

Is this still going to be a problem with Ruby 2?

Blizzard

I think not. Ruby 2 uses UTF-8 as the default source encoding, and that's great because UTF-8 is backwards compatible with ASCII. We just need to test it once to make sure, and that's basically it.