I want to delete 2 tags and the content between them in either of these 2 cases, both of which should be covered at the same time:
<tag>single line of text here </tag>
or
<tag> line 0
line 1 of text
line 2 of text
final line text </tag>
Is it possible to have a single sed expression that deletes the single line (first case) as well as the 4 (or n) lines containing the start tag, the end tag and the content in between (second case)? If not, can several sed expressions be applied in succession to get the same result whichever case occurs?
PS I have seen that if I do the simple
sed -i '/<tag>/,/<\/tag>/d' test.xml
this won't work for the first case (as it deletes everything from that line to the end of the file)
PS2 there are no other tags in the <tag> content in my case (I'm pruning a pom.xml file, removing some custom properties)
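To make the failure described in the PS concrete, here is a minimal sketch. The sample text and the function name `range_only` are made up for illustration. As far as I understand range addresses, sed starts looking for the closing pattern only on the line after the one that opened the range, so a one-line `<tag>...</tag>` opens a range that runs to the next line containing `</tag>`, or to the end of the input if there is none.

```shell
# Hypothetical sample: a one-line <tag> element followed by unrelated lines.
sample='keep 1
<tag>single line of text here </tag>
keep 2
keep 3'

# The range opens on the <tag> line; the closing pattern is only searched for
# from the next line onward, is never found here, and so everything up to the
# end of the input is deleted.
range_only() {
    sed '/<tag>/,/<\/tag>/d'
}

printf '%s\n' "$sample" | range_only
# prints only: keep 1
```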
Just use awk. Using GNU awk (which you must have since you're using GNU sed) for multi-char RS and RT, regardless of whether you have < or > between the tags:
$ awk -v RS='</tag>' -v ORS= 'RT{sub(/<tag>.*/,"")} 1' file
or
The above was run against this sample input from the question:
$ cat file
<tag>single line of text here </tag>
or
<tag> line 0
line 1 of text
line 2 of text
final line text </tag>
Consider also input such as:
$ echo 'foo<tag>a</tag>bar this<tag>a</tag>that' |
awk -v RS='</tag>' -v ORS= '{sub(/<tag>.*/,"")} 1'
foobar thisthat
FWIW, with Perl it could just be:
perl -i -gpe 's|<tag>.*?</tag>||gs' file
Technically, the most recent POSIX ERE spec includes the *? operator but I don't know any sed -E implementation that supports it:
Each of the duplication symbols ('+', '*', '?', and intervals) can be suffixed by the repetition modifier '?' (<question-mark>), in which case matching behavior for that repetition shall be changed from the leftmost longest possible match to the leftmost shortest possible match, including the null match
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06
Also, you'd have to use an extension like GNU's -z, or accumulate chunks of the file into the hold space and perform lots of bookkeeping (which is fraught with gotchas).
With input:
aaa
bbb<tag>single line of text here </tag>ccc
ddd
eee<tag> line 0
line 1 of text
line 2 of text
final line text </tag>fff
ggg
Given input like 'foo<tag>a</tag>bar this<tag>a</tag>that' that would
output foothat instead of foobar thisthat.
output will be:
aaa
bbbccc
ddd
eeefff
ggg
This might work for you (GNU sed):
sed ':a;/<tag>/{
:b;s#</tag>#\n#;tc;N;bb
:c;s/<tag>.*\n//;ba
}' file
If a line contains <tag>, keep appending lines to the pattern space until </tag> is found, and replace that </tag> with a newline. Then remove everything from <tag> through the last newline. Repeat, in case the same line opens another <tag>.
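The same script can be spread over several lines with comments, which may make the loop easier to follow. This is only a sketch: it assumes GNU sed (for `\n` in the replacement), and `strip_tag_blocks` is a made-up wrapper name.

```shell
# Same GNU sed script as above, one command per line.
# strip_tag_blocks is a hypothetical wrapper name used only for this sketch.
strip_tag_blocks() {
    sed '
:a
/<tag>/{
    :b
    # Mark the first closing tag by turning it into a newline.
    s#</tag>#\n#
    # If that worked, jump ahead and delete the block.
    tc
    # Otherwise append the next input line and look again.
    N
    bb
    :c
    # Delete from the opening tag through the marker newline.
    s/<tag>.*\n//
    # Start over in case the same line opens another block.
    ba
}' "$@"
}

printf '%s\n' 'foo<tag>a</tag>bar this<tag>a</tag>that' | strip_tag_blocks
# prints: foobar thisthat
```

Only the first closing tag is marked on each pass, which is why two blocks on one line are removed one at a time and the text between them survives.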
For my situation, I found a quick solution with 2 sed commands applied in succession:
sed -i '/<tag>.*<\/tag>/d' test.xml
sed -i '/<tag>/,/<\/tag>/d' test.xml
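The two passes can probably be folded into a single sed invocation with two `-e` expressions, which is closer to what the question asked for. A sketch under that assumption (the function name `prune_tag_lines` is hypothetical, and `-i` is left out so it can be tried on standard input first):

```shell
# First expression: delete lines holding a complete <tag>...</tag> pair.
# "d" ends the cycle, so the range expression never sees those lines.
# Second expression: delete multi-line blocks from <tag> to </tag>.
prune_tag_lines() {
    sed -e '/<tag>.*<\/tag>/d' -e '/<tag>/,/<\/tag>/d' "$@"
}

printf '%s\n' 'aaa' 'bbb<tag>single line of text here </tag>ccc' 'ddd' \
    'eee<tag> line 0' 'line 1 of text' 'line 2 of text' \
    'final line text </tag>fff' 'ggg' | prune_tag_lines
```

The order of the expressions matters: with the range first, a one-line element would open a range again. Note also that this deletes whole lines, so any text sharing a line with the tags (bbb, ccc, eee and fff in this sample) goes too. That is fine when the tags sit on lines of their own, as in the pom.xml case.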
This GNU sed command in null-data mode (-z) may work.
$ sed -Ez 's~<tag>[^>]*(>[^>]*)?</tag>\n*~~g' input_file
With Raku/Sparrow:
between: { "<tag>" } { "</tag>" }
:any:
end:
code: <<RAKU
!raku
# update all matched blocks
for streams().values -> $block {
    # delete all lines
    # of a block including
    # open and close tag
    for $block<> -> $i {
        my $line-num = $i<index>;
        # update file in place
        # by removing a line
        replace(
            "/path/to/file.txt",
            $line-num,
            ""
        );
    }
}
RAKU
< and > occurring between the tags in the example as that's also necessary for testing in general. – Ed Morton, Mar 5 at 18:05