awk in multiline records
I had a database creation script that looks like spanning thousands of lines. However I was only interested to see only few of them which were only containing name pattern TEST1 and TEST2.
I could have used grep to find the valid lines of the file but my records were spread across the lines means each valid record consisted of multiple lines. So I googled for the problem and found that awk can also do a multi line search given proper inputs.
Suppose I have a file input.sql that contains many multiline database object creation queries. I now have to find all queries which only contains TEST1 and TEST2. All of these queries/records are separated by separator ";".
Write a awk input script as follows.
#scrip.awk
BEGIN {
RS=";"
}
{
if (($0 ~ /TEST1/ )||($0 ~ /TEST2/)) {
print $0 ";"
}
}
Here I am specifying that my record separator RS is equal to ";". Also In the awk execution line I specified that if each of my record field which is represented by $0 contains either TEST1 or TEST2 then dump that field to console using print command of awk. Also I have appended a additional string ";" so that my output file output.sql contains actual queries when run on sqlplus prompt
Now I can execute the command as follows on command prompt.
awk -f script.awk input.sql > output.sql
This should give you the all matching record containing TEST1 and TEST2 in the file output.sql
A very nice awk Related tutorial can be found at
http://www.ibm.com/developerworks/library/l-awk1.html
http://www.ibm.com/developerworks/library/l-awk2.html