[regex] when would you prefer capture groups or String tokenizers?
by Michael Uplawski from LinuxQuestions.org on (#5E5JX)
Good afternoon.
I am revisiting Jeffrey Friedl's great book on Regular Expressions. Each time, I wonder whether I should learn Perl, just for the fun of it. And each time, I conclude that I have done enough programming without Perl, and I fear the steep learning curve.
But remembering my past solutions in code, when strings had to be matched, split up into pieces, or analyzed in any way, I have to admit that I avoided Regular Expressions whenever I could do the same thing with a simple tokenizer. When you can define a delimiter, many string functions and methods let you split up strings and compare them piece by piece, and you do not need to know much about Regular Expressions, even though these functions and methods often accept a Regular Expression as a parameter.
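As a small illustration of the delimiter point, here is a minimal Ruby sketch. The sample record and the variable names are invented for this example. `String#split` takes a plain string as the delimiter, and the same method also accepts a Regexp when the separators are less regular.

```ruby
# Hypothetical sample data, invented for this illustration.
record = "alice;30;Berlin"

# Plain-string delimiter: no regex knowledge needed.
fields = record.split(";")
name, age, city = fields

# Same method with a Regexp delimiter, for input that has stray
# whitespace around the separators.
messy = "alice ; 30;  Berlin"
clean_fields = messy.split(/\s*;\s*/)

puts fields.inspect
puts clean_fields.inspect
```

Both calls return the same three fields. The only difference is how much the delimiter itself has to describe.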
Could you formulate a rule, or just share an experience, about when to give one precedence over the other?
I shall provide code-examples...
Code:
hulk@hogan:~$ irb
irb(main):009:0> "hey, there is a 50-note lying on the table".scan(/.*,/)
=> ["hey,"]
irb(main):010:0> "hey, there is a 50-note lying on the table".scan(/\d+/)
=> ["50"]
irb(main):031:0> str = "a223abb233b".match(/(\d+)a/)[1]
=> "223"
irb(main):022:0> /a+(\d+)a.*(\1)b/.match "aaa233a234a233b"
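=> #<MatchData "aaa233a234a233b" 1:"233" 2:"233">
Code:
#include <string.h>
#include <stdio.h>

int main(int argc, char** argv) {
    /* string literals are read-only, so point at them through const char* */
    const char* str = "Hey, there is nothing lying on the table";
    /* strstr returns a pointer to the first occurrence, or NULL if absent */
    const char* where = strstr(str, "nothing");
    if (where != NULL) {
        printf("%s\n", where);
    }
    return 0;
}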
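To put the two approaches side by side on the same input, here is a hedged Ruby sketch that pulls the amount out of the sentence used in the irb session, once with a tokenizer and once with a capture group. The variable names are made up for illustration, and this is only one of several ways to write either version.

```ruby
sentence = "hey, there is a 50-note lying on the table"

# Tokenizer approach: split on whitespace, pick the token that ends in
# "-note", then split that token again on the hyphen.
token = sentence.split(" ").find { |word| word.end_with?("-note") }
amount_by_split = token ? token.split("-").first : nil

# Capture-group approach: one pattern describes both the context
# ("-note") and the part to extract (the digits).
match = sentence.match(/(\d+)-note/)
amount_by_regex = match ? match[1] : nil

puts amount_by_split
puts amount_by_regex
```

Both variants yield "50" here. One difference worth noting: the tokenizer version never checks that the part before the hyphen is numeric, so a word such as "bank-note" would give "bank", whereas the capture group would simply not match.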

