Multi-line Search And Replace

May 30, 2024 Post a Comment

I have an unstructured file and I would like to search and replace pattern of strings. Must replace the string that exists between SELECT and FROM strings; the one's outside of t

Solution 1:

With sed:

sed '/SELECT/,/FROM/ {s/as col4 *//;s/\([A-Za-z]*\.\)\{0,\}col4/upper(&) as col4/;}' file

Explanations:

s/as col4 *//: existing as col4 is removed to prevent duplicates after second substitution
$[A-Za-z]*\.$\{0,\}col4: search for 0 or more combinations of letters and dots followed by col4
upper(&) as col4/;: replace with new text(matching string is inserted using &)

Solution 2:

I think this, at least, 95% does it. Please tell me if there's an error:

withopen('ej.txt', 'r') as file:
    string=file.read().replace('\n',' ')


import re

matches=re.findall(r'SELECT.*?FROM',string)
replacements={"col4,":"upper(col4) as col4,",
             "sch.col4":"upper(sch.col4)",
             "sch.tab.col4":"upper(sch.tab.col4)",
             "col4 as col4,": "upper(col4) as col4,"}
new_matches=[]
for match in matches:
    for k,v in replacements.items():
        match=match.replace(k,v)
    new_matches.append(match)


for k,v in {k:v for k,v inzip(matches,new_matches)}.items() :
    string=string.replace(k,v)

string

Solution 3:

Here is a short awk script performing your request:

awk '/SELECT/,/FROM/ {$0=gensub(/^[^[:space:]]*col4/,"upper(\\0)",-1);}1' input.txt

The output is:

abc 12345 !$% DATA SELECT
col1 as col1,
col2 as col2.
col3,
upper(sch.col4) as col4,
upper(sch.tab.col4) as col4_1,
upper(col4),
col5 FROM sch.tab
xyz 34354 ^&* DATA SELECT
col5 as col5,
col3,
upper(col4),
upper(col4) as col4,
upper(col4) FROM
blah blah blah

Explanation:

/SELECT/,/FROM/ inclusive range selecting each line from /SELECT/ to /FROM/

$0=gensub(***) update current line with substitions from gensub()

/^[^[:space:]]*col4/ search for non space prefix to col4 at the beginning of line

upper(\\0)",-1 replace found-match with-upper('found-match') only first match

1 print the current line. 1

Solution 4:

Your description of the transformations you need is incomplete (e.g. you say you want to change col4, to upper(col4) as col4, but line 7 of the expected output doesn't reflect that) so I set that aside and just wrote this which but will produce the output you want from the input you provided (using GNU awk for the 3rd arg to match()) and hopefully this is what you actually want:

$ cat tst.awk
/SELECT/ { inBlock=1 }
inBlock {
    if ( match($0,/^((sch\.(tab\.)?)?col4\>)( as.*)/,a) ) {
        $0="upper(" a[1] ")" a[4]
    }
    elseif ( match($0,/^(col4\>)(.*)/,a) ) {
        $0="upper(" a[1] ") as " a[1] a[2]
    }
}
/FROM/   { inBlock=0 }
{ print }

$ awk -f tst.awk file
col4 isrequired to be upper so
make col4 upper
abc 12345!$%DATASELECT
col1 as col1,
col2 as col2.
col3,
upper(sch.col4) as col4,
upper(sch.tab.col4) as col4_1,
upper(col4) as col4,
col5 FROM sch.tab
xyz 34354^&*DATASELECT
col5 as col5,
col3,
upper(col4) as col4,
upper(col4) as col4,
upper(col4) as col4 FROM
blah blah blah

Python Playground