Manipulate a list of numbers and text strings with regex
by fyrmest from LinuxQuestions.org on (#582YC)
Hello,
I have a list of over 5,000 German vocabulary words, each preceded by a number. There is no punctuation in the file whatsoever, just number - space - word - space - number - space - word, etc., as follows:
68990 hast 68312 sein 67939 ihr 67905 da 67378 aus 67376 kann 67062 aber 66548 Aber 65985 schon 65002 wenn 62904 wird 61210 um 60432 Wie 59151 als 57636 bist 57362 im 56522 mal 55227 doch 54294 gut 53007 meine 52473 jetzt 50439 weiß 49877 Wenn 48978 werden
I would like to add a line feed after each vocabulary word, so that each line contains one number, a space, and its word, followed by a line feed.
Note that the numbers range from 1 to 6 digits in length, and the words are German, UTF-8 encoded, in a mix of upper and lower case.
I am somewhat familiar with regex, awk, sed and grep, but not enough to be able to create the winning expression to make the magic happen.
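One possible approach is sketched below in Python rather than sed or awk. It assumes the whole list fits in memory. The regex matches each number, the whitespace after it, and the single run of non-whitespace characters that follows, then writes each pair on its own line. The function name `split_pairs` and the file names are illustrative placeholders, not anything from the original post.

```python
import re
import sys

# One pair = a run of digits, whitespace, then one "word" (any run of
# non-whitespace characters, so umlauts and ß are matched unchanged).
# \d+ is used instead of \d{1,6} so a longer number is never split.
PAIR = re.compile(r"(\d+)\s+(\S+)")


def split_pairs(text: str) -> str:
    """Return the text with one 'number word' pair per line."""
    return "".join(f"{num} {word}\n" for num, word in PAIR.findall(text))


if __name__ == "__main__":
    # Hypothetical usage: python split_pairs.py words.txt > out.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(split_pairs(f.read()))
```

Python 3 strings are Unicode, so `\S+` keeps characters like ä, ö, ü and ß intact. If awk is preferred, `awk '{for (i = 1; i < NF; i += 2) print $i, $(i+1)}' words.txt` should give the same result by walking the whitespace-separated fields two at a time.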

