sed inline not working to modify XML-style input

Question

The problem is that I cannot put the complete command on one line with sed. I have done it before, but with these files it does not work. My example:

<file>Documents/time/text1</file> //2X slash + 2 words to remove !!
<file>Commun/text2</file> //1X slash to remove + 1 words to remove
<file>Current/text3</file> //1X slash to remove + 1 words to remove

Why does this code not work in line ?

sed 's/Documents//g' | sed 's/time//g' | sed 's/Commun//g' | sed 's/Current//g' | sed 's/Current//g' | sed '/<file>/s|<file>/|<file>|' | sed '/<file>/s|<file>/|<file>|' tracklist.txt > newtracklist.txt

please update the question with a) a textual description of what you're attempting to do (eg, retrieve the file name), b) the expected result (corresponding to the provided sample input) and c) details of what you mean by does not work ... (syntax) error? wrong output? hangs? — markp-fuso, Commented Feb 27, 2024 at 20:53
If the answer to "Why does this code not work in line?" isn't just "because you aren't calling sed -i" then edit your question to clarify what you mean by "in line" and what exactly the problem is you're trying to solve. — Ed Morton, Commented Mar 2, 2024 at 14:16

markp-fuso · Accepted Answer · 2024-02-27 21:17:24Z

Running OP's current pipeline of sed scripts causes the contents of the input file (tracklist.txt) to be printed to stdout and then the pipeline hangs (ie, no other output, no return to a command prompt). I'm guessing this is what OP is referring to when stating it does not work ... ??

Primary issue: the input file (tracklist.txt) needs to be provided as an argument to the 1st sed script and not as an argument to the last sed script.

Recommendation:

# instead of this:
sed 's/Documents//g' | ... | sed '/<file>/s|<file>/|<file>|' tracklist.txt
                                                             ^^^^^^^^^^^^^
# do this:
sed 's/Documents//g' tracklist.txt | ... | sed '/<file>/s|<file>/|<file>|'
                     ^^^^^^^^^^^^^

Running this updated version of OP's sed pipeline generates:

<file>text1</file>
<file>text2</file>
<file>text3</file>
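To see the fix end to end, here is a minimal sketch of the corrected pipeline. It assumes a temporary file created with mktemp stands in for tracklist.txt, drops the duplicated 's/Current//g' stage, and uses an illustrative variable name (pipeline_out). The original version hangs because a sed with no file argument reads standard input, so the first sed waits on the terminal, while the last sed, given a file name, ignores whatever the pipe sends it.

```shell
# Throwaway copy of the sample input (stand-in for tracklist.txt).
tmp=$(mktemp)
printf '%s\n' \
  '<file>Documents/time/text1</file>' \
  '<file>Commun/text2</file>' \
  '<file>Current/text3</file>' > "$tmp"

# Only the first sed names the file; every later sed reads the
# previous stage's output from the pipe.
pipeline_out=$(
  sed 's/Documents//g' "$tmp" |
    sed 's/time//g' |
    sed 's/Commun//g' |
    sed 's/Current//g' |
    sed '/<file>/s|<file>/|<file>|' |
    sed '/<file>/s|<file>/|<file>|'
)

printf '%s\n' "$pipeline_out"
rm -f "$tmp"
```

The two final stages are both needed for the first line only, which is left with two leading slashes once Documents and time are removed.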

While there are better tools for parsing HTML/XML, if OP must use sed then there are several ways to generate the same results but in a more efficient manner.

One idea requiring a single sed script:

sed -E 's|(<file>).*/([^/]+</file>)|\1\2|' tracklist.txt

Where:

-E - enables support for extended regexes
(<file>) - (1st capture group) matches the string <file>
([^/]+</file>) - (2nd capture group) matches all characters that are not a / followed by the string </file>
.*/ - everything between the two capture groups ending in a /
\1\2 - replacement string consists of the two capture groups appended together
NOTE: this works for the specific input provided by OP; it may need tweaking if the format of the input varies from what's shown in OP's sample input

For OP's sample input this generates:

<file>text1</file>
<file>text2</file>
<file>text3</file>
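The single-script approach can be tried without any input file by feeding the sample through a pipe. This is only a sketch: the variable names (sample, single_out, no_dir_out) are illustrative, and the extra entry with no directory part is an assumption added to show what happens when the pattern does not match.

```shell
sample='<file>Documents/time/text1</file>
<file>Commun/text2</file>
<file>Current/text3</file>'

# Greedy .* runs up to the last "/" that is still followed by
# "name</file>", so every leading directory is dropped at once.
single_out=$(printf '%s\n' "$sample" |
  sed -E 's|(<file>).*/([^/]+</file>)|\1\2|')
printf '%s\n' "$single_out"

# An entry with no directory part has no such "/", so the
# substitution does not apply and the line passes through unchanged.
no_dir_out=$(printf '%s\n' '<file>text4</file>' |
  sed -E 's|(<file>).*/([^/]+</file>)|\1\2|')
printf '%s\n' "$no_dir_out"
```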

Nice, but OP is using the wrong tool ^^ What if the closing tag is on a new line? That would be perfectly valid XML. — Gilles Quénot, Commented Feb 27, 2024 at 21:28
did you see the parts ... there are better tools for parsing HTML/XML, if OP must use sed .... this works for the specific input provided by OP; there are a slew of ways this answer breaks if the format of OP's input differs from that provided in the sample, and OP will find out real quickly should they actually run into such a situation — markp-fuso, Commented Feb 27, 2024 at 21:57
Thanks a lot, I have made a new script and it works fine: sed -r 's/\b(Documents|time|Commun|Current|CurrentTitle)\b//g' tracklist1.txt | sed -E 's|(<file>).*/([^/]+</file>)|\1\2|' > newtracklist.txt — Allan Tori, Commented Feb 28, 2024 at 17:57
@AllanTori -r is the old GNU sed option that's now replaced by -E, don't mix them. You don't need 2 separate sed commands anyway though as sed 'foo' | sed 'bar' is usually equivalent to sed 'foo; bar'. — Ed Morton, Commented Mar 2, 2024 at 14:19
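The last comment's point about merging two sed processes into one can be sketched as follows. This is a simplified stand-in for the script in the comments, not a copy of it: the \b word boundaries are left out because they are a GNU extension, CurrentTitle is dropped from the alternation, and the variable names (sample, piped, combined) are illustrative. Both forms use -E only.

```shell
sample='<file>Documents/time/text1</file>
<file>Commun/text2</file>
<file>Current/text3</file>'

# Two processes: the second sed reads the first one's output.
piped=$(printf '%s\n' "$sample" |
  sed -E 's/(Documents|time|Commun|Current)//g' |
  sed -E 's|(<file>).*/([^/]+</file>)|\1\2|')

# One process: each -e adds a command, and the commands run in
# order on every input line.
combined=$(printf '%s\n' "$sample" |
  sed -E -e 's/(Documents|time|Commun|Current)//g' \
         -e 's|(<file>).*/([^/]+</file>)|\1\2|')

printf '%s\n' "$combined"
```

For this sample the second command alone would already give the same result, since it removes everything up to the last slash regardless of the directory names.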

Gilles Quénot · Accepted Answer · 2024-02-27 21:23:01Z

Given your input XML file

With an r root node added:

<r>
  <file>Documents/time/text1</file>
  <file>Commun/text2</file>
  <file>Current/text3</file>
</r>

The code:

xidel --xquery '
    <r>{
        for $x in //file return
        <file>{tokenize($x, "/")[last()]}</file>
    }</r>
' --output-format=xml --output-node-indent file.xml

Yields:

<?xml version="1.0" encoding="UTF-8"?>
<r>
  <file>text1</file>
  <file>text2</file>
  <file>text3</file>
</r>

Explanations:

Here, instead of using the wrong tool (sed), I use a proper XML parser with XPath and XQuery (the former is a subset of the latter).

xidel is the Swiss army knife for manipulating HTML/XML.

Usage:

xidel ... file.xml > new_file.xml

If you want to edit on the fly:

xidel ... file.xml | sponge file.xml

sponge is from moreutils.

jubilatious1 · Accepted Answer · 2024-03-05 05:26:41Z

Using Raku (formerly known as Perl_6)

~$ raku -MXML -e 'my $xml = open-xml( $*ARGFILES.Str );
       for $xml.elements( :RECURSE(0), :TAG{"file"} ) -> $E {
           my $old = $E.contents[0];
           my $new = XML::Text.new( text => $old.text.match(/ <?after "/"> <-[/]>+ $/) );
           $E.replace( $old, $new );
       };
       .say for $xml;' file.xml

OR:

% raku -MXML -e 'my $xml = open-xml( $*ARGFILES.Str );
      for $xml.elements( :RECURSE(0), :TAG{"file"} ) -> $E {
          my $old = $E.contents[0];
          my $new = XML::Text.new( text => $old.text.path.basename );
          $E.replace( $old, $new );
      };
      .say for $xml;' file.xml

Raku is a programming language in the Perl-family that features high-level Grammars for parsing text. Along with Raku/Rakudo itself, community members support modules in the Raku/Rakudo ecosystem. One of those modules is the (Raku-native) XML module.

Similar to the OP's other question, in Raku with the XML module you can (for example) limit replacements to (1) the top level and (2) only the <file> TAG. This is done by setting the code to iterate through elements with the limitations :RECURSE(0), :TAG{"file"}. FYI, you can iterate through all TAGs at all depths if so desired: simply set :RECURSE(Inf) and remove the :TAG named-argument, which sets the :TAG restriction to False.

The first answer above identifies suitable TAGs/levels for replacement. Thus identified, each element's internal (i.e. non-TAG) contents[0] are assigned to the variable $old, which is actually an XML::Text object. The $old object is .text extracted into a string, and the desired match is found. A new (XML::Text.new) object is created ($new) with the now-corrected text => 'value' key/value pair. From here the XML-module's replace routine completes the job: replace( $old, $new ).

The second answer above is a clever twist on the first. Because the OP wants to edit path names, routines associated with Raku's IO::Path object class can be used. Raku's .IO routine understands the text as a valid path name, and Raku's .basename routine returns the final filename. This approach has the potential to increase code portability, because Raku has mechanisms for using the correct (/ or \ ) path-separator on different platforms.

Sample Input (thanks to @GillesQuénot!):

<r>
  <file>Documents/time/text1</file>
  <file>Commun/text2</file>
  <file>Current/text3</file>
</r>

Sample Output:

<?xml version="1.0"?><r>
  <file>text1</file>
  <file>text2</file>
  <file>text3</file>
</r>

https://github.com/raku-community-modules/XML
https://docs.raku.org/type/IO/Path
https://rakudo.org/
https://raku.org

Ah? I don't know Raku at all. I will look into this, it seems very interesting, thank you very much — Allan Tori, Commented Mar 6, 2024 at 4:01
Raku is the new name for the programming language formerly known as Perl6. It was re-named in 2019 to reduce confusion between the two sister languages in the Perl-family. You can find some interesting blogs/videos online. Feel free to check out the homepage ( raku.org ) and/or the subreddit ( reddit.com/r/rakulang ). — jubilatious1, Commented Mar 6, 2024 at 7:28
