
Parsing a CSV file with bash and awk

Good day, Habr reader!

I needed to translate the interface of a system. The translation of each form is stored in a separate XML file, and the files are spread across folders, which is very inconvenient. I decided to build a single dictionary in Excel for working with the translations of all forms. This task splits into two subtasks: extract the information from all the XML files into one CSV file and, once the translation is done, generate XML files with the original structure from that CSV file. Bash and awk were chosen as the tools. The first subtask is fairly trivial, so there is no point in describing it. But how do you parse the CSV file?

There is plenty of information on this topic on the Internet, but most examples handle only the simple cases. I did not find anything suitable for input like this, for example:

./web/analyst/xml/list.template.xml;test;"t ""test""; est"
./web/analyst/xml/list.template.xml;%1 _{factory_desc}s found. Displaying %2 through %3;Found objects: %1. Displayed from %2 to %3

In Excel, these lines look like this:
File | Tag | Translation
./web/analyst/xml/list.template.xml | test | t "test"; est
./web/analyst/xml/list.template.xml | %1 _{factory_desc}s found. Displaying %2 through %3 | Found objects: %1. Displayed from %2 to %3
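
A quick check illustrates why the simple recipes fall short. The sketch below (the helper name count_naive_fields is hypothetical, not part of the original scripts) splits the first sample line on every semicolon using awk's -F option. The line holds three fields, but the semicolon inside the quoted value is counted as a separator too.

```shell
# Hypothetical helper: count fields when every ';' is treated as a separator.
count_naive_fields() {
    printf '%s\n' "$1" | awk -F';' '{ print NF }'
}

count_naive_fields './web/analyst/xml/list.template.xml;test;"t ""test""; est"'
# prints 4, although the line contains only 3 fields
```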

I took the example from OpenNET as a basis and modified it. Here is the text of the awk program:

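    {
        # Append a separator so that every field, including the last one, ends with ";"
        $0=$0";";
        while($0)
        {
            # Find the next field, which ends with ; or with ";
            match($0,/[^;"]*;|^"[^"]*(""[^"]*)*";/);
            # Store the matched field in the variables F and SF
            SF=F=substr($0,RSTART,RLENGTH);
            # Strip the trailing ; or "; and the opening quote
            gsub(/^"|";$|;$/,"",F);
            # Replace doubled quotes with single ones
            gsub("\"\"","\"",F);
            ++c;
            # First field: store the path to the XML file in file_to
            if (c%3==1) file_to=AWK_XML_PATH F;
            # Second field: write the tag to the XML file
            if (c%3==2) print "  <ResourceString>\r\n    <tag>"F"</tag>" > file_to;
            # Third field: write the value to the XML file
            if (c%3==0) print "    <value>"F"</value>\r\n  </ResourceString>\r\n" > file_to;
            # Escape backslashes in SF
            gsub(/\\/,"\\\\",SF);
            # Escape regex special characters in SF
            gsub("([][?$|^+*()])","\\\\""&",SF);
            # Remove SF, together with its separator, from the start of the line
            sub(SF,"");
        }
    }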
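
The field-splitting loop can be tried in isolation on a single line. The following is a minimal sketch rather than the program itself: the function name split_fields is hypothetical, both alternatives of the pattern are anchored to the start of the line, and the sketch advances past each field with substr() where the program escapes the field and removes it with sub(), which sidesteps the escaping step entirely.

```shell
# Hypothetical helper: print each field of a ';'-separated CSV line on its own line.
split_fields() {
    printf '%s\n' "$1" | awk '{
        # Make sure the last field also ends with a separator.
        $0 = $0 ";"
        while ($0 != "") {
            # A plain field or a quoted field, each followed by ";".
            if (!match($0, /^[^;"]*;|^"[^"]*(""[^"]*)*";/)) break
            F = substr($0, RSTART, RLENGTH)
            # Drop the consumed field from the start of the line.
            $0 = substr($0, RSTART + RLENGTH)
            # Strip the separator and the enclosing quotes, then undouble quotes.
            gsub(/^"|";$|;$/, "", F)
            gsub(/""/, "\"", F)
            print F
        }
    }'
}

split_fields './web/analyst/xml/list.template.xml;test;"t ""test""; est"'
# prints:
# ./web/analyst/xml/list.template.xml
# test
# t "test"; est
```

A line that does not fit either form, such as an unquoted field with a stray quote in the middle, stops the loop here instead of being parsed.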

And here is a fragment of the bash script (XML_PATH is a variable holding the path to the directory that contains the folders with the XML files):

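    # Convert the input CSV to UTF-8, strip carriage returns and drop the header row
    iconv -f WINDOWS-1251 -t UTF8 "$1" | tr -d '\r' | sed '1d' > translation.csv
    # Parse the CSV and write the XML files
    awk -v AWK_XML_PATH="$XML_PATH" -f csv_parse.awk translation.csv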
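
The effect of the two cleanup stages in the pipeline can be checked on a small sample. This is only a sketch: the function name clean_csv, the header text and the file name ./a.xml are made up for illustration, and the iconv step is left out so that the example does not depend on the input encoding.

```shell
# Hypothetical helper: the tr and sed stages of the pipeline, without iconv.
clean_csv() {
    tr -d '\r' | sed '1d'
}

# A header row plus one data row, with Windows (CRLF) line endings.
printf 'File;Tag;Translation\r\n./a.xml;test;value\r\n' | clean_csv
# prints: ./a.xml;test;value
```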

As a result, from the table
File | Tag | Translation
./web/analyst/xml/list.template.xml | test | t "test"; est
./web/analyst/xml/list.template.xml | %1 _{factory_desc}s found. Displaying %2 through %3 | Found objects: %1. Displayed from %2 to %3

A list.template.xml file is generated with the following contents:

      <ResourceString>
        <tag>test</tag>
        <value>t "test"; est</value>
      </ResourceString>

      <ResourceString>
        <tag>%1 _{factory_desc}s found. Displaying %2 through %3</tag>
        <value>Found objects: %1. Displayed from %2 to %3</value>
      </ResourceString>

P.S.
I know that other tools, Python perhaps, could solve this problem more efficiently. This example will be useful to those who for some reason cannot use them.

Source: https://habr.com/ru/post/184956/

