Sed expression for deleting content (single line AND multiple lines) between tags, tags included - Stack Overflow


I want to delete a pair of tags and the content between them. Both of the following cases need to be handled at the same time:

<tag>single line of text here </tag>

    <tag> line 0
line 1 of text
          line 2 of text
    final line text </tag>

Is it possible to have a single sed expression that deletes both the single line (first case) and the 4 (or n) lines containing the start tag, the end tag and the content between them (second case)? If not, can several sed expressions be applied in succession to get the same result, no matter which case occurs?

PS: I have seen that the simple

  sed -i '/<tag>/,/<\/tag>/d' test.xml

does not work for the first case (it deletes everything from that line to the end of the file).

PS2: There are no other tags inside the <tag> content in my case (I'm pruning a pom.xml file, removing some custom properties).
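The behaviour described in the PS follows from how sed evaluates a two-address range: once the first address matches, the closing pattern is only tested on later lines, so a closing tag on the same line is never seen. A minimal sketch, assuming a made-up four-line sample (the "keep N" lines are illustrative and not from the pom.xml):

```shell
# Hypothetical sample: a single-line <tag> pair surrounded by unrelated lines.
input='keep 1
<tag>single line of text here </tag>
keep 2
keep 3'

# The range opens on line 2. sed only starts looking for </tag> on line 3,
# finds none, and therefore deletes through the end of the input.
range_out=$(printf '%s\n' "$input" | sed '/<tag>/,/<\/tag>/d')
printf '%s\n' "$range_out"   # prints only: keep 1
```

If a later line did contain </tag>, the deletion would stop there instead of at the end of the file, which is still more than intended.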

asked Mar 5 at 16:05, edited Mar 6 at 9:21, by MS13

Create and show us a minimal reproducible example that contains multiple tags; sample input/output with just 1 pair of tags isn't adequate to test a potential solution with. Also include < and > occurring between the tags in the example, as that's also necessary for testing in general. – Ed Morton Commented Mar 5 at 18:05
I suggest to use an XML parser (xmlstarlet, xmllint ...). – Cyrus Commented Mar 5 at 18:56

6 Answers

Just use awk. The following uses GNU awk (which you must have since you're using GNU sed) for its multi-char RS and RT, and it works regardless of whether you have < or > between the tags:

$ awk -v RS='</tag>' -v ORS= 'RT{sub(/<tag>.*/,"")} 1' file


or

The above was run against this sample input from the question:

$ cat file
<tag>single line of text here </tag>

or

    <tag> line 0
line 1 of text
          line 2 of text
    final line text </tag>

Consider also input such as:

$ echo 'foo<tag>a</tag>bar this<tag>a</tag>that' |
    awk -v RS='</tag>' -v ORS= '{sub(/<tag>.*/,"")} 1'
foobar thisthat

FWIW, with Perl it could just be:

perl -i -gpe 's|<tag>.*?</tag>||gs' file

Technically, the most recent POSIX ERE spec includes the *? operator, but I don't know of any sed -E implementation that supports it:

Each of the duplication symbols ('+', '*', '?', and intervals) can be suffixed by the repetition modifier '?' (<question-mark>), in which case matching behavior for that repetition shall be changed from the leftmost longest possible match to the leftmost shortest possible match, including the null match

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_04_06

Also, with sed you'd have to use an extension like GNU's -z, or accumulate chunks of the file into the hold space and perform lots of bookkeeping (which is fraught with gotchas).

With input:

aaa
bbb<tag>single line of text here </tag>ccc
ddd
    eee<tag> line 0
line 1 of text
          line 2 of text
    final line text </tag>fff
ggg

output will be:

aaa
bbbccc
ddd
    eeefff
ggg

(Reader comment from the thread: Given input like 'foo<tag>a</tag>bar this<tag>a</tag>that' that would output foothat instead of foobar thisthat.)

This might work for you (GNU sed):

sed ':a;/<tag>/{                 
          :b;s#</tag>#\n#;tc;N;bb         
          :c;s/<tag>.*\n//;ba
      }' file

If a line contains <tag>, append following lines to the pattern space until </tag> is found, and replace that </tag> with a newline. Then remove everything from <tag> through the last newline.

Repeat while the line still contains <tag>.
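To make the control flow easier to follow, here is the same script written one command per line and run against input that has text before and after the tags. This is a sketch that assumes GNU sed (it relies on \n in the replacement); the sample input is illustrative.

```shell
# Sample with text on the same lines as the tags.
input='aaa
bbb<tag>single line of text here </tag>ccc
ddd
    eee<tag> line 0
line 1 of text
          line 2 of text
    final line text </tag>fff
ggg'

loop_out=$(printf '%s\n' "$input" | sed '
  :a
  /<tag>/{
    :b
    # Mark the first closing tag with a newline; once found, jump to :c.
    s#</tag>#\n#
    tc
    # Not found yet: append the next input line and look again.
    N
    bb
    :c
    # Delete from <tag> through the newline marker, then recheck the line.
    s/<tag>.*\n//
    ba
  }')
printf '%s\n' "$loop_out"
```

Because the marker is always the last newline in the pattern space, the greedy .* cannot run past the closing tag that was just found.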

For my situation, I found a quick solution using 2 sed commands applied in succession (the single-line case first, then the multi-line range):

sed -i '/<tag>.*<\/tag>/d' test.xml
sed -i '/<tag>/,/<\/tag>/d' test.xml
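The two commands can probably be folded into one sed invocation, because d ends the current cycle: a line deleted by the first expression never reaches the range expression and so cannot open a range. A sketch on made-up input (the "keep N" lines are illustrative), without -i so nothing is modified on disk:

```shell
# Hypothetical sample containing both cases from the question.
input='keep 1
<tag>single line of text here </tag>
keep 2
    <tag> line 0
line 1 of text
          line 2 of text
    final line text </tag>
keep 3'

# First expression: open and close tag on the same line.
# Second expression: a range from the open tag to a later close tag.
one_pass=$(printf '%s\n' "$input" |
  sed -e '/<tag>.*<\/tag>/d' -e '/<tag>/,/<\/tag>/d')
printf '%s\n' "$one_pass"   # keep 1, keep 2, keep 3 on separate lines
```

Both forms delete whole lines, so any text sharing a line with the tags is removed as well.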

This GNU sed command, which uses null-data mode (-z) to read the whole file as one record, may work.

$ sed -Ez 's~<tag>[^>]*(>[^>]*)?</tag>\n*~~g' input_file

With Raku/Sparrow:

between: { "<tag>" } { "</tag>" }
:any:
end:

code: <<RAKU
!raku
# update all matched blocks
for streams().values -> $block {
  # delete all lines of a block,
  # including the open and close tags
  for $block<> -> $i {
    my $line-num = $i<index>;
    # update the file in place
    # by removing a line
    replace(
      "/path/to/file.txt",
      $line-num,
      ""
    );
  }
}
RAKU
