Article 4ZXG7 Convert PDF file to CSV in Ubuntu 18.04

by yjacobo from LinuxQuestions.org on (#4ZXG7)
Hi all. I currently need to convert a PDF file into a CSV file. The original PDF comes from our payroll software, and I need it as CSV to generate some reports for our company's board.
So far, I have used pdftotext to create a text file:

pdftotext -nopgbrk -layout ACUM\ EMPLEADOS\ DIC\ 2019-2.pdf try1.txt

Then I used sed to create the CSV file:

sed 's/ \+/,/g' try1.txt > try1.csv
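The sed expression above turns every run of spaces into a comma. That also splits multi-word descriptions such as "Sueldo Ordinario", and it leaves the thousands separators in amounts like 27,389.82 indistinguishable from field separators. A minimal sketch, assuming that `-layout` pads columns with at least two spaces while words inside a column are separated by a single space, splits only on runs of two or more spaces and uses `|` as an intermediate delimiter so the commas in amounts survive. The function name `layout_to_fields` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split pdftotext -layout output into fields.
# Assumes columns are separated by 2+ spaces and words within a column by 1.
# Uses "|" as the delimiter because the amounts themselves contain commas.
layout_to_fields() {
  sed -E \
    -e 's/^ +//' \
    -e 's/ +$//' \
    -e 's/ {2,}/|/g'
}
```

For example, `pdftotext -nopgbrk -layout ACUM\ EMPLEADOS\ DIC\ 2019-2.pdf - | layout_to_fields > try1.psv` (giving `-` as the text-file argument makes pdftotext write to stdout). If some columns in the PDF end up only one space apart, this assumption breaks and those values stay merged.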

This is the output I get:

,COMPANYNAME
,*Acumulados,Anuales,-,Empleado,Mes

Ejercicio:,Diciembre,-,Diciembre,2019

,ID#,de,Concepto:,Concepto,Percepciones,Deducciones
Departamento:,Obras
Ubicacion:,Distrito,Federal
,1346,NAME1,NAME2,,NAME3,NAME4
,1,Sueldo,Ordinario,27,389.82,0.00
,21,Vacaciones,1,928.86,0.00
,23,Aguinaldo,14,466.45,0.00
,31,Bono,de,Asistencia,2,738.98,0.00
,32,Bono,de,Puntualidad,2,738.98,0.00
,34,Otras,Percepciones,437.57,0.00
,42,Vale,Despensa,0.00,0.00
,101,ISR,0.00,10,909.86
,102,IMSS,0.00,1,095.00
,131,Préstamo,Caja,de,Ahorro,0.00,4,600.00
,134,Crédito,INFONAVIT,0.00,4,930.49
,199,Ajuste,por,Redondeo,0.00,-0.38
,223,Exento,de,Aguinaldo,0.00,0.00
,301,Impuesto,Estatal,0.00,0.00
,302,IMSS,-,Empresa,0.00,0.00
,303,IMSS,-,Retiro,0.00,0.00
,304,INFONAVIT,-,Empresa,0.00,0.00
,623,Aguinaldo,Complemento,3,513.31,0.00
,53,213.97,21,534.97
,4456,NAME1,NAME2,,NAME3
,1,Sueldo,Ordinario,34,319.16,0.00
,23,Aguinaldo,16,933.80,0.00
,31,Bono,de,Asistencia,3,431.92,0.00
,32,Bono,de,Puntualidad,3,431.92,0.00
,42,Vale,Despensa,0.00,0.00
,101,ISR,0.00,14,797.94
,102,IMSS,0.00,990.32
,199,Ajuste,por,Redondeo,0.00,0.80
,223,Exento,de,Aguinaldo,0.00,0.00
,301,Impuesto,Estatal,0.00,0.00
,302,IMSS,-,Empresa,0.00,0.00
,303,IMSS,-,Retiro,0.00,0.00
,304,INFONAVIT,-,Empresa,0.00,0.00
,606,Compensacion,Variable,4,000.00,0.00
,623,Aguinaldo,Complemento,4,046.26,0.00
,66,163.06,15,789.06
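In this output the concept lines have a varying number of fields because both the descriptions and the amounts were split. An alternative that does not depend on spacing is to classify each line by its tokens. A concept line starts with a numeric ID and ends with two amounts. An employee line starts with a numeric ID followed by a name. A totals line is just two amounts. The awk sketch below assumes exactly that shape, which holds for the sample above but may not hold for every page of the report. Header lines such as "Departamento:" are simply dropped and would need separate handling. The function name `parse_payroll_lines` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify pdftotext lines by their tokens, not spacing.
# Emits pipe-delimited rows (amounts contain commas, so "," is not safe here):
#   concept:  ID|description|amount|amount
#   employee: ID|name
#   totals:   ||amount|amount
# Assumes each concept line ends with exactly two amounts.
parse_payroll_lines() {
  awk '
    function is_amount(s) { return s ~ /^-?[0-9][0-9,]*\.[0-9][0-9]$/ }

    # Concept line: numeric ID, one or more description words, two amounts.
    $1 ~ /^[0-9]+$/ && NF >= 4 && is_amount($(NF-1)) && is_amount($NF) {
      desc = $2
      for (i = 3; i <= NF - 2; i++) desc = desc " " $i
      print $1 "|" desc "|" $(NF-1) "|" $NF
      next
    }

    # Employee line: numeric ID followed by a name (last token is not an amount).
    $1 ~ /^[0-9]+$/ && NF >= 2 && !is_amount($NF) {
      name = $2
      for (i = 3; i <= NF; i++) name = name " " $i
      print $1 "|" name
      next
    }

    # Totals line: only the two summed amounts.
    NF == 2 && is_amount($1) && is_amount($2) { print "||" $1 "|" $2 }
  '
}
```

Usage would be along the lines of `pdftotext -nopgbrk -layout input.pdf - | parse_payroll_lines`. Because awk re-joins description words with single spaces, "IMSS - Empresa" and "NAME1 NAME2, NAME3 NAME4" come out as one field each.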

So far it seems right. However, I need the CSV file to have the following format so that it can be read by an external API that processes the data:

,COMPANYNAME,,,,,,,,,
,*Acumulados Anuales - Empleado Mes,,,,,,,,,
Ejercicio:,,,Noviembre - Noviembre 2019,,,,,,,
,ID# de Concepto:,,Concepto,Percepciones,Deducciones,Prestaciones,Obligaciones,GravIMSS,,
Departamento:,,,Obras,,,,,,,
Ubicacion:,,Distrito Federal,,,,,,,,
,1346,,"NAME1 NAME2, NAME3 NAME4",,,,,,,
,1,,Sueldo Ordinario,"29,318.68",0.00,0.00,0.00,0.00,,
,31,,Bono de Asistencia,"2,931.86",0.00,0.00,0.00,0.00,,
,32,,Bono de Puntualidad,"2,931.86",0.00,0.00,0.00,0.00,,
,42,,Vale Despensa,0.00,0.00,777.12,0.00,0.00,,
,101,,ISR,0.00,"6,458.24",0.00,0.00,0.00,,
,102,,IMSS,0.00,"1,059.68",0.00,0.00,0.00,,
,131,,Préstamo Caja de Ahorro,0.00,"4,600.00",0.00,0.00,0.00,,
,134,,Crédito INFONAVIT,0.00,"4,771.44",0.00,0.00,0.00,,
,151,,Seguro de Gastos de Vivienda,0.00,15.00,0.00,0.00,0.00,,
,199,,Ajuste por Redondeo,0.00,-0.96,0.00,0.00,0.00,,
,301,,Impuesto Estatal,0.00,0.00,0.00,606.90,0.00,,
,302,,IMSS - Empresa,0.00,0.00,0.00,"5,221.52",0.00,,
,303,,IMSS - Retiro,0.00,0.00,0.00,"2,023.06",0.00,,
,304,,INFONAVIT - Empresa,0.00,0.00,0.00,"1,964.14",0.00,,
,,,,"35,182.40","16,903.40",777.12,"9,815.62",0.00,,
,4456,,"NAME1 NAME2, NAME3",,,,,,,
,1,,Sueldo Ordinario,"34,319.16",0.00,0.00,0.00,0.00,,
,31,,Bono de Asistencia,"3,431.92",0.00,0.00,0.00,0.00,,
,32,,Bono de Puntualidad,"3,431.92",0.00,0.00,0.00,0.00,,
,42,,Vale Despensa,0.00,0.00,777.12,0.00,0.00,,
,101,,ISR,0.00,"9,264.33",0.00,0.00,0.00,,
,102,,IMSS,0.00,958.38,0.00,0.00,0.00,,
,199,,Ajuste por Redondeo,0.00,0.29,0.00,0.00,0.00,,
,301,,Impuesto Estatal,0.00,0.00,0.00,793.20,0.00,,
,302,,IMSS - Empresa,0.00,0.00,0.00,"4,776.56",0.00,,
,303,,IMSS - Retiro,0.00,0.00,0.00,"1,835.06",0.00,,
,304,,INFONAVIT - Empresa,0.00,0.00,0.00,"1,781.62",0.00,,
,603,,Bono Desempeno,0.00,0.00,0.00,0.00,0.00,,
,606,,Compensacion Variable,"4,000.00",0.00,0.00,0.00,0.00,,
,,,,"45,183.00","10,223.00",777.12,"9,186.44",0.00,,
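In the target layout every row has 11 comma-separated columns: an empty column, the ID, another empty column, the description, five amount columns, and two trailing empty columns. Any value containing a comma (thousands separators, "NAME2, NAME3") is wrapped in double quotes. Assuming rows have already been reduced to pipe-delimited fields (ID|description|amounts...), the formatting step could be sketched as below. The function name `to_target_csv` is hypothetical. Because the sample PDF output only shows the Percepciones and Deducciones columns, this sketch leaves Prestaciones, Obligaciones and GravIMSS empty rather than inventing values for them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map pipe-delimited rows (ID|description|amount...) onto
# the 11-column target layout. Missing amount columns are left empty.
to_target_csv() {
  awk -F'|' '
    # RFC 4180-style quoting: wrap fields that contain a comma or double quote,
    # doubling any embedded double quotes.
    function csvq(s) {
      if (s ~ /[",]/) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
      return s
    }
    {
      for (i = 1; i <= 11; i++) row[i] = ""
      row[2] = $1                                  # ID (empty on totals rows)
      row[4] = $2                                  # description or employee name
      for (i = 3; i <= NF && i <= 7; i++) row[i + 2] = $i   # up to 5 amounts
      line = csvq(row[1])
      for (i = 2; i <= 11; i++) line = line "," csvq(row[i])
      print line
    }
  '
}
```

Fed the totals row `||35,182.40|16,903.40`, it yields `,,,,"35,182.40","16,903.40",,,,,`, which has the same shape as the totals rows in the target. Quoting only fields that need it matches the sample. Quoting every field would also be valid CSV, but it would no longer match the sample byte for byte.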

My idea is to create a simple shell script that can do this. Any suggestions on how to accomplish this? Thanks in advance.
External Content
Source RSS or Atom Feed
Feed Location https://feeds.feedburner.com/linuxquestions/latest
Feed Title LinuxQuestions.org
Feed Link https://www.linuxquestions.org/questions/