merging *.csv files to remove duplicate lines
by rdx from LinuxQuestions.org (#5FVXW)
Given: a .csv file of stock market values for a particular stock,
each line has fields for:
Code:
year-month-day,open,high,low,close,adj close,volume

I can get the data for the last year (~252 entries) as a file in the same format. Since I update the data frequently, most of the lines in the update file are duplicates. The easy way to do what I want is something like:

Code:
cat stock.csv update.csv | sort -u > updated_stock.csv

Note: the input files are both sorted by date, which is perfect and exactly what I want in the output.
Question: Is there a more efficient way to merge these files given the parameters?
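Since both inputs are already sorted, one likely more efficient route is sort's merge mode, which combines pre-sorted files without a full re-sort. A minimal sketch, assuming GNU/POSIX sort and the same file names as above:

Code:
# -m merges already-sorted inputs without re-sorting (linear time);
# -u keeps only the first of each run of identical lines
sort -m -u stock.csv update.csv > updated_stock.csv

As with the sort -u pipeline, this only drops byte-identical lines; if the same date reappears with revised values (e.g., a recalculated adj close), both lines survive.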