Perl command performance issue in Linux
by satyarankireddy from LinuxQuestions.org on (#5GDKE)
I'm using the following Perl command to strip the thousands-separator commas from the number columns of a CSV file. It produces the expected output, but the problem is performance. A sample of the data and the command I used are shown below.
Quote:
Note: the field delimiter in my file is a comma (",").
Input file:
Organization,Amount,Revenue,Balance,Desc
Congos,"4,233.78","3,233.78","1,233.78",Payment
Toyoto,590.2,390.2,190.2,Payment
lenives,"5,234.89","2,234.89","1,234.89",Payment
Expected output:
Organization,Amount,Revenue,Balance,Desc
Congos,4233.78,3233.78,1233.78,Payment
Toyoto,590.2,390.2,190.2,Payment
lenives,5234.89,2234.89,1234.89,Payment
Command : perl -p -e 's/,(?=[\d,.]*\d")//g and s/"(\d[\d,.]*)"/$1/g' test.csv >> newfile.csv
File size: roughly 11 million rows.
Issue: the command takes almost 10 minutes to strip the thousands-separator commas.
Is there a better solution to improve performance?
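One direction that may help, as a rough sketch: handle each quoted field in a single substitution per line, deleting its internal commas with tr/// and dropping the surrounding quotes in the same pass, instead of running two separate regexes with a lookahead. This assumes the only double-quoted fields are thousands-separated numbers and that quoted fields never contain escaped quotes or embedded newlines (true of the sample data above).

perl -pe 's/"([^"]*)"/(my $f = $1) =~ tr{,}{}d; $f/ge' test.csv > newfile.csv

Here tr{,}{}d removes the commas inside each captured field and the field is written back without its surrounding quotes; whether this actually beats the original command on 11 million rows would need to be measured with time(1) on the real file.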

