[C#] Regexp Help

Started by G_G, February 18, 2011, 07:41:49 pm


G_G

February 18, 2011, 07:41:49 pm Last Edit: February 18, 2011, 08:10:19 pm by game_guy
Alright, so I've started to get the hang of regular expressions. I never really understood them before, but it's coming along. I've got a list of files, and I want to be able to reference them using something like this.
$FILES[file_index]

So here's my pattern
string pattern = "\\$FILES\\[\\d{1}\\]$";

Every time I test the pattern it works just fine. But what I want to be able to do is get the digit in between the brackets and then replace the entire thing with a file from the array. Something like this.
string input = "<img src=\"$FILES[0]\" />";
string pattern = "\\$FILES\\[\\d{1}\\]$";
string output = Regex.Replace(input, pattern, files[digit variable here]);


How do I get the digit? Thanks guys!

EDIT: Resolved by myself. I'm pretty sure there is an easier way but this is what I did.
string pattern = "\\$FILES\\[\\d{1}\\]$";
string output = "";
string input = "$FILES[0]";
string html = "";
if (Regex.IsMatch(input, pattern))
{
    pattern = "\\d{1}";
    Match match = Regex.Match(input, pattern);
    output = files[Convert.ToInt32(match.Value)];
    html += String.Format("<img src=\"{0}\" />", output);
}


Works just fine. If there is a shorter/easier way lemme know. Not saying that my way was hard at all.

Ryex

I was looking for the simple regexp functionality Ruby has when I used it in C# too. It doesn't exist. What you have there is about as simple as it gets.
I no longer keep up with posts in the forum very well. If you have a question or comment about my work, or in general, I welcome PMs. If you make a post in one of my threads and I don't reply within a day or two, feel free to PM me and point it out to me.
DropBox, the best free file syncing service there is.

G_G

Yea. The only bad part though with mine is what if input happens to be
input = "Ninja 1 - $FILES[0]";

Then yea... the second \d{1} search would pick up the 1 in "Ninja 1" instead of the index in the brackets.

Blizzard

February 19, 2011, 04:21:12 am #3 Last Edit: February 19, 2011, 04:25:11 am by Blizzard
You can also use gsub! (for multiple matches) or sub! (for one match) and then use $1 and $2. Otherwise, yes, this is as simple as it gets.

In Cateia we use this kind of format for localization files:

Key1
{
Value1
}

Key2
{
Part1 of Value2
Part2 of Value2
Part3 of Value2
}


This is a Ruby script that parses such files and gives some statistics:

def parseFile(filename)
 data = {}
 begin
   f = File.open(filename, 'r')
   # the first line of a UTF file can be "screwed up"
   skip = 0
   line = f.readline()
   line.each_byte {|i|
     break if i < 128
     skip += 1
   }
   f.seek(skip)
   # read data
   string = f.read().gsub("\r", '')
   f.close()
   # regular expressions are awesome
   re = /(.+)\n\{\n((?:.|\n)+?)\n\}/
   while m = re.match(string)
     data[m[1]] = m[2]
     string.sub!(re, "")
   end
 rescue
   puts "exception: #{$!.message}"
   puts $!.backtrace.join("\n")
 end
 
 return data
end

entries = []
Dir.entries('.').each {|i|
 if File.file?(i) && i != '.' && i != '..'
   entries.push(i)
 end
}

size = 0
count = 0
longest = 0
entries.each {|filename|
 data = parseFile(filename)
 size += data.size
 # check for multi-byte characters properly
 re = /(\w+)/
 data.each_key {|key|
   length = 0
   string = data[key]
   while m = re.match(string)
     string.sub!(re, "")
     length += 1
   end
   count += length
   longest = length if length > longest
 }
}

puts "Statistics:"
puts "-- Entries: #{size}"
puts "-- Words: #{count}"
puts "-- Longest entry: #{longest}"
gets


As you can see, I have to replace the matched string with an empty string if I want to do multiple matches like this. :/ Python did have a different way of doing it.

import sys
import re

def parseFile(filename):
    data = {}
    try:
        f = open(filename, 'r')
        # the first line of a UTF file can be "screwed up"
        skip = 0
        line = f.readline()
        for i in line:
            if ord(i) < 128:
                break
            skip += 1
        f.seek(skip)
        # read data
        string = f.read().replace('\r', '')
        f.close()
        # regular expressions are awesome
        matches = re.findall('(.+)\n\{\n((?:.|\n)+?)\n\}', string)
        for match in matches:
            data[match[0]] = match[1]
   
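    except Exception, e:
        print 'exception: ' + str(e)
        # trace_exception() is not defined in this script; print the traceback with the stdlib instead
        import traceback
        traceback.print_exc()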
   
    return data
   
filename = 'English.lang'
if len(sys.argv) > 1:
    filename = sys.argv[1]
data = parseFile(filename)
count = 0
longest = 0
for key in data.keys():
    length = len(re.findall(r'\w+', data[key]))
    count += length
    longest = max(longest, length)

print "Statistics for %s:" % filename
print "-- Entries: %d" % len(data)
print "-- Words: %d" % count
print "-- Longest entry: %d" % longest


Also, you should use \d+ for numbers. I think that \d{1} will only match a single digit, so just the numbers from 0 to 9.
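
To make the difference between the two quantifiers concrete, here is a minimal C# sketch. The class name `DigitQuantifiers` and the helper `CaptureIndex` are made up for illustration, and the parentheses around the digits are an addition to the pattern from the first post so the number can be read back.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitQuantifiers
{
    // Returns the text captured by group 1, or null when the input does not match.
    public static string CaptureIndex(string pattern, string input)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        string single = @"\$FILES\[(\d{1})\]$"; // exactly one digit
        string many = @"\$FILES\[(\d+)\]$";     // one or more digits

        Console.WriteLine(CaptureIndex(single, "$FILES[7]"));          // 7
        Console.WriteLine(CaptureIndex(single, "$FILES[12]") == null); // True
        Console.WriteLine(CaptureIndex(many, "$FILES[12]"));           // 12
    }
}
```

With \d{1} a two-digit index is rejected outright rather than truncated, which is useful if the limit of ten files is intentional.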
Check out Daygames and our games:

King of Booze 2      King of Booze: Never Ever
Drinking Game for Android      Never have I ever for Android
Drinking Game for iOS      Never have I ever for iOS


Quote from winkio: I do not speak to bricks, either as individuals or in wall form.

Quote from Barney Stinson: When I get sad, I stop being sad and be awesome instead. True story.

Zeriab

The key to extracting part of the matched expression is, just like in Ruby, to use capture groups (parentheses).
.NET also lets you name the capture groups, which often makes the code easier to read.

Here is a snippet showing the idea: (I did change the regular expression to match any number rather than just 0-9)
string input = "Ninja 1 - $FILES[0]";
Regex regex = new Regex("\\$FILES\\[(?<file_number>\\d+)\\]$");

Match match = regex.Match(input);
if (match.Success)
{
    Console.WriteLine(match.Groups["file_number"].Value);
}
else
{
    // Handle failure
}
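
The original question was how to replace the whole placeholder with an entry from the array. One way to do that, sketched here under a few assumptions, is the `Regex.Replace` overload that takes a `MatchEvaluator`, so the replacement can be computed from the captured group. The class name `FilePlaceholders`, the method `Expand`, and the sample file names are made up; the trailing `$` is dropped so the placeholder can sit anywhere in the input, and leaving out-of-range indexes untouched is just one possible policy.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FilePlaceholders
{
    // No trailing $ here, so the placeholder may appear anywhere in the input.
    private static readonly Regex Placeholder =
        new Regex(@"\$FILES\[(?<file_number>\d+)\]");

    // Replaces every $FILES[n] with files[n]; anything that cannot be resolved is left as-is.
    public static string Expand(string input, string[] files)
    {
        return Placeholder.Replace(input, match =>
        {
            int index;
            bool parsed = int.TryParse(match.Groups["file_number"].Value, out index);
            return parsed && index < files.Length ? files[index] : match.Value;
        });
    }

    public static void Main()
    {
        string[] files = { "ninja.png", "pirate.png" }; // made-up sample data
        Console.WriteLine(Expand("<img src=\"$FILES[0]\" />", files));
        // prints: <img src="ninja.png" />
    }
}
```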


@Blizz:
Wrong language  :haha:
I did incorporate your \d+ note

*hugs*

Blizzard

I know it's for C#, I'm just saying. xD .NET might have some way to get all matches at once.

Ryex

Say, you can look at the source for the RMX-OS GUI. The code I use to get the data from the RMX-OS config file uses Regex and match groups.

Zeriab

Oh yeah, I thought you might find the regular expression easier to read if you use verbatim strings.
Regex regex = new Regex(@"\$FILES\[(?<file_number>\d+)\]$");

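You may not always be able to do that. I can imagine it being awkward if you want to search for a newline or a tab, since C# escapes like \n and \t aren't processed inside a verbatim string. (The regex engine understands \n and \t by itself, so those two still work in a pattern; a double quote has to be doubled as "".)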
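
As a small sketch of how verbatim strings behave with regex patterns (the class name `VerbatimPatterns` and its members are made up for illustration): the regular and verbatim spellings produce the same pattern text, `\n` in a verbatim pattern is still read as a newline by the regex engine, and a double quote is written as `""`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimPatterns
{
    // The same pattern written as a regular string and as a verbatim string.
    public const string Regular = "\\$FILES\\[(\\d+)\\]";
    public const string Verbatim = @"\$FILES\[(\d+)\]";

    // In a verbatim string \n stays as backslash + n; the regex engine reads that as a newline.
    public static bool MatchesAcrossNewline(string input)
    {
        return Regex.IsMatch(input, @"one\nline");
    }

    // A double quote inside a verbatim string is written as "".
    public static string QuotedPlaceholder(string input)
    {
        return Regex.Match(input, @"src=""(\$FILES\[\d+\])""").Groups[1].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim);                            // True
        Console.WriteLine(MatchesAcrossNewline("line one\nline two"));     // True
        Console.WriteLine(QuotedPlaceholder("<img src=\"$FILES[0]\" />")); // $FILES[0]
    }
}
```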

@Blizzard:
Yes, you can use regex.Matches(input) to retrieve a MatchCollection.
Using regex.Match(input) is imo a nicer solution in this case, as you'll find at most 1 match. (Notice the $ token at the end, which means end of string unless the Multiline option is selected.)
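
For inputs that can contain several placeholders, a rough sketch of the `Matches` approach could look like this. The class name `PlaceholderScanner` and the method `FindIndexes` are made up, the `$` anchor is removed so more than one match is possible, and `[0-9]` is used to keep the single-digit limit from the first post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderScanner
{
    // Collects the index of every $FILES[n] placeholder, in order of appearance.
    public static List<int> FindIndexes(string input)
    {
        var indexes = new List<int>();
        foreach (Match match in Regex.Matches(input, @"\$FILES\[(?<file_number>[0-9])\]"))
        {
            indexes.Add(int.Parse(match.Groups["file_number"].Value));
        }
        return indexes;
    }

    public static void Main()
    {
        List<int> found = FindIndexes("<img src=\"$FILES[0]\" /> <img src=\"$FILES[3]\" />");
        Console.WriteLine(string.Join(", ", found)); // 0, 3
    }
}
```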

*hugs*

Blizzard

There are regex options to allow it to be multiline, though.
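
A minimal sketch of what that option changes, assuming the input uses plain \n line endings (the class name `MultilineAnchors` and the method `CountMatches` are made up): without `RegexOptions.Multiline` the `$` only matches at the end of the whole input, with it `$` also matches at the end of each line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineAnchors
{
    // Counts placeholders that sit at a position where $ is allowed to match.
    public static int CountMatches(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"\$FILES\[\d+\]$", options).Count;
    }

    public static void Main()
    {
        string input = "a $FILES[0]\nb $FILES[1]";
        Console.WriteLine(CountMatches(input, RegexOptions.None));      // 1
        Console.WriteLine(CountMatches(input, RegexOptions.Multiline)); // 2
    }
}
```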

Ryex


string p1 = @"(\w+)\s*=\s*(.*)\s*";
Regex r = new Regex(p1);
string cfg = rmxoscfg.ReadLine();
Match m = r.Match(cfg);
if (m.Success)
{
    Group g = m.Groups[1];
    CaptureCollection cc = g.Captures;
    Capture c = cc[0];
    value = c.ToString();
}


That is a compressed snippet from the config app. Basically, you get a Match object by calling the Match() method of the Regex object. Then you can get a Group object that refers to one of the sets of () in the pattern. The index starts at 1 for the (), and if I remember right, 0 gets a group that refers to the entire match. From the Group object you can get a CaptureCollection, which stores all the strings that matched that group. From the collection you can get a Capture, and calling ToString() on it gives you the string that matched.
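
When each group matches only once, as in that pattern, the CaptureCollection step can usually be skipped by reading `Group.Value` directly. The following is a sketch of that shortcut, not the actual code from the config app: the class name `ConfigLine`, the method `TryParse`, and the sample line are all made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConfigLine
{
    // Same pattern as in the snippet above: group 1 is the key, group 2 is the value.
    private static readonly Regex KeyValue = new Regex(@"(\w+)\s*=\s*(.*)\s*");

    // Splits a "KEY = value" line; returns false when the line does not match.
    public static bool TryParse(string line, out string key, out string value)
    {
        Match m = KeyValue.Match(line);
        key = m.Success ? m.Groups[1].Value : null;
        value = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string key, value;
        if (TryParse("PORT = 54269", out key, out value)) // made-up sample line
        {
            Console.WriteLine(key + " -> " + value); // PORT -> 54269
        }
    }
}
```

Note that the greedy (.*) also swallows any trailing whitespace on the line, so the value may need a Trim() in practice.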

G_G

Ah! Thanks Zeriab, that will definitely be of help. Blizz, you mentioned changing the \d{1}; I'm forcing it on purpose so users can only have 10 files.