Friday, December 18, 2009

awk in multiline records

I had a database creation script that looks like spanning thousands of lines. However I was only interested to see only few of them which were only containing name pattern TEST1 and TEST2.

I could have used grep to find the valid lines of the file but my records were spread across the lines means each valid record consisted of multiple lines. So I googled for the problem and found that awk can also do a multi line search given proper inputs.

Suppose I have a file input.sql that contains many multiline database object creation queries. I now have to find all queries which only contains TEST1 and TEST2. All of these queries/records are separated by separator ";".

Write a awk input script as follows.

#scrip.awk
BEGIN {
RS=";"
}
{
if (($0 ~ /TEST1/ )||($0 ~ /TEST2/)) {
print $0 ";"
}
}

Here I am specifying that my record separator RS is equal to ";". Also In the awk execution line I specified that if each of my record field which is represented by $0 contains either TEST1 or TEST2 then dump that field to console using print command of awk. Also I have appended a additional string ";" so that my output file output.sql contains actual queries when run on sqlplus prompt

Now I can execute the command as follows on command prompt.
awk -f script.awk input.sql > output.sql

This should give you the all matching record containing TEST1 and TEST2 in the file output.sql

A very nice awk Related tutorial can be found at
http://www.ibm.com/developerworks/library/l-awk1.html
http://www.ibm.com/developerworks/library/l-awk2.html

Sphere: Related Content