Article 4Y22K Awk to find lines in a file with unbalanced numbers of parentheses

by kmkocot from LinuxQuestions.org on (#4Y22K)
Hi all,

I'm a biologist working with files that represent phylogenetic trees. I have a couple of files from analyses that have been running for literally months, but at least one line in each file got corrupted when the hard drive filled up. The files have one tree per line. Each line should have matching numbers of "(" and ")" parentheses (and therefore an even total; see below for an example of a good one) and end with a semicolon, but at least one line got truncated or otherwise messed up somehow.

I used these commands but found no lines without terminal semicolons:
Code:
wirenia@wirenia:~/Desktop/pb_tests$ grep '[^;]$' Chain5.treelist
wirenia@wirenia:~/Desktop/pb_tests$ grep '[^;]$' Chain6.treelist
wirenia@wirenia:~/Desktop/pb_tests$ grep -v \;$ Chain5.treelist
wirenia@wirenia:~/Desktop/pb_tests$ grep -v \;$ Chain6.treelist

So the problem seems to be that the numbers of left and right parentheses don't match on some line, but I'm not sure how to figure out which line or lines are the offenders. I ran the commands below, but they only tell me whether the total number of parentheses on a line is even or odd. The total should be even, but a line with 2 more left than right parentheses would still 'pass' this test.
Code:
wirenia@wirenia:~/Desktop/pb_tests$ awk -F '[()]' 'NF % 2 == 0' Chain5.treelist
wirenia@wirenia:~/Desktop/pb_tests$ awk -F '[()]' 'NF % 2 == 0' Chain6.treelist

Any suggestions would be greatly appreciated!

Thank you,
Kevin

Code:
Sample tree:
(((((Praticolella_mexicana:0.00480602,Polygyra_cereolus:0.00371952):1.47793,((Camaena_cicatricosa:0.115961,Camaena_poyuensis:0.103371):0.953901,((Aegista_aubryana:0.136351,Aegista_diversifamilia:0.192602):0.843191,Mastigeulota_kiangsinensis:0.522806):0.303342):0.100898):0.121644,((Cernuella_virgata:0.260359,Helicella_itala:0.140086):0.617919,(Cylindrus_obtusus:0.686453,(Helix_aspersa:0.516477,Cepaea_nemoralis:1.89785):0.308462):0.414249):0.137774):1.01136,(Arion_rufus:2.29168,(((Achatina_fulica:1.69866,(Rhopalocaulis_grandidieri:2.29815,((Myosotella_myosotis:1.82384,((Physella_acuta:2.23482,((Radix_balthica:0.142582,Galba_pervia:0.208535):0.50107,(Biomphalaria_glabrata:0.404376,(Planorbarius_corneus:0.00209395,Planorbella_duryi:0.00295444):0.472689):0.687874):0.168854):0.293892,(((((Onchidella_celtica:0.202836,Onchidella_borealis:0.889813):0.185783,(Peronia_peronii:0.159037,Platevindex_mortoni:0.198364):0.101483):0.170859,(((Carychium_tridentatum:0.501811,Ovatella_vulcani:0.317034):0.0582359,(Ellobium_chinense:0.0951185,Auriculinella_bidentata:0.349713):0.0828497):0.0754304,Trimusculus_reticulatus:0.297743):0.0656323):0.0868645,((Pyramidella_dolabrata:0.980992,Salinator_rhamphidia:0.349017):0.0732706,Acochlidium_fijensis:0.465269):0.0420865):0.0276939,(((((Ringicula_conformis:1.11622,((Valvata_sp:0.685025,Microdiscula_charopa:2.95017):0.224082,((((Marisa_cornuarietis:0.179614,Pomacea_canaliculata:0.30989):0.293205,((Turritella_bacillum:0.202813,Tylomelania_sarasinorum:0.189615):0.162401,(Cymatium_parthenopeum:0.155288,(Ilyanassa_obsoleta:0.134292,((Menathais_tuberosa:0.0574794,(Concholepas_concholepas:0.114143,Thais_clavigera:0.110627):0.048665):0.0853467,Conus_striatus:0.240452):0.0408924):0.0419603):0.172519):0.0449037):0.0472011,(Bellamya_quadrata:0.200306,Cipangopaludina_cathayensis:0.178853):0.57194):0.154863,((Titiscania_limacina:0.71693,(Georissa_bangueyensis:0.950074,Clithon_retropictus:0.206275):0.216874):0.105551,(Angaria_delphinus:0.350123,Phasianella_solida:0.561236):0.211091):0.202515):1.24934):2.44132):0.107756,(((Berthellina_sp:0.231649,Pleurobranchaea_novaezealandiae:0.245021):0.328029,(((Notodoris_gardineri:0.373909,Homoiodoris_japonica:0.336948):0.0108453,((Hypselodoris_festiva:0.212083,(Chromodoris_magnifica:0.0107541,Chromodoris_quadricolor:0.0190366):0.180875):0.0775515,((((Tritonia_diomedea:0.191568,Sakuraeolis_japonica:0.34007):0.0426962,Melibe_leonina:0.301892):0.378198,Roboastra_europaea:0.192852):0.125796,Nembrotha_kubaryana:0.153817):0.0875222):0.0345442):0.0680711,Phyllidia_ocellata:0.279088):0.157023):0.267243,((Micromelo_undatus:0.0508658,Hydatina_physis:0.0510341):0.21094,Pupa_strigosa:0.714736):0.307753):0.0715796):0.0425515,(((Illbia_ilbi:1.32697,Runcina_ornata:1.47963):0.262683,((Smaragdinella_calyculata:0.167591,Bulla_sp:0.608702):0.123347,(Odontoglaja_guamensis:0.832243,Sagaminopteron_nigropunctatus:0.340381):0.12399):0.268611):0.0540452,(((Aplysia_californica:0.0374974,Aplysia_dactylomela:0.0167428):0.0267237,(Aplysia_kurodai:0.0140971,Aplysia_vaccaria:0.0103373):0.0444639):0.171587,Tylodina_sp:0.415181):0.0632413):0.123332):0.063078,(Siphonaria_pectinata:0.288448,Siphonaria_gigas:0.509812):0.0896595):0.0615275,(Ascobulla_fragilis:0.395827,(Placida_sp:0.136588,((Elysia_ornata:0.135254,Elysia_chlorotica:0.120108):0.0609832,(Plakobranchus_cf_ocellatus:0.121242,Thuridilla_gracilis:0.179529):0.0317251):0.0298276):0.38227):0.144995):0.262714):0.109505):0.0445404):0.0841337,Pedipes_pedipes:1.1272):0.160818):0.377031):0.201989,((Pupilla_muscorum:0.842588,(Vertigo_pusilla:0.762949,Gastrocopta_cristata:0.47463):0.156762):0.223448,(Achatinella_sowerbyana:0.08089,Achatinella_mustelina:0.0780632):2.00741):0.498303):0.13143,(Succinea_putris:1.75508,Naesiotus_nux:1.52222):0.267404):0.256461):0.252589):1.69963,Cerion_uva:0.401049,Cerion_incanum:0.770272);
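
One possible way to locate the offending lines, offered as a minimal sketch and not something tested on the actual .treelist files: count the "(" and ")" characters on each line separately and report every line where the two counts differ. The function name `unbalanced` is hypothetical. The sketch relies on awk's `gsub()` returning the number of substitutions it made, and on "&" in the replacement standing for the matched text, so the line itself is left unchanged.

```shell
# Hypothetical helper (name is an assumption, not from the original post):
# for each input file, print "file:line: N open, M close" for every line
# whose counts of "(" and ")" differ.
unbalanced() {
    awk '{
        # gsub() returns how many matches it replaced; replacing each
        # match with "&" (the match itself) leaves $0 unchanged.
        opens  = gsub(/\(/, "&")
        closes = gsub(/\)/, "&")
        if (opens != closes)
            printf "%s:%d: %d open, %d close\n", FILENAME, FNR, opens, closes
    }' "$@"
}
```

It could be called as `unbalanced Chain5.treelist Chain6.treelist`. Unlike the parity test, this would flag a line with 2 more left than right parentheses. Equal counts alone do not prove the parentheses are nested in a valid order, but a truncated line would normally end up with more "(" than ")".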