
So I have an XML file that I want to parse from a Bash script using xmlstarlet (or an alternative, if people can give me an example).

The basic structure is this:

<character>
  <literal>恵</literal>
  <misc>
    <stroke_count>10</stroke_count>
  </misc>
  <reading_meaning>
    <rmgroup>
      <reading r_type="ja_on">ケイ</reading>
      <reading r_type="ja_on">エ</reading>
      <reading r_type="ja_kun">めぐ.む</reading>
      <reading r_type="ja_kun">めぐ.み</reading>
      <meaning>favor</meaning>
      <meaning>blessing</meaning>
      <meaning>grace</meaning>
      <meaning>kindness</meaning>
    </rmgroup>
  </reading_meaning>
</character>

There are some other fields as well, and the number of readings and meanings varies from character to character. Basically, I'd like to get all of the readings, meanings, stroke count, etc. out and generate an HTML table with Bash.

This is also a large file with many characters that need looking up, so I'd like a script that takes the character as $1 and uses it to look up the values by matching it against the <literal> tag. Ideally it'd be:

kanjilookup.sh 恵 

which would then generate an HTML table from the content.

Thoughts? (I'd also be open to using another program, such as xpath.)
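Because the file is large, one standard-library alternative to xmlstarlet for the lookup step is a streaming parse that stops at the first match. The following is a minimal Python sketch, assuming the layout shown here (a root element wrapping many `<character>` elements, with readings and meanings under `reading_meaning/rmgroup`); the function name `find_character` is hypothetical. Since `findall` returns every match, the varying number of readings and meanings needs no special handling.

```python
import xml.etree.ElementTree as ET


def find_character(source, literal):
    """Return the fields of the <character> whose <literal> equals `literal`,
    or None if no such character exists.

    `source` is a filename or a binary file object. iterparse reads the
    document incrementally, so the search stops at the first match instead
    of building the whole tree first.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "character":
            continue
        if elem.findtext("literal") == literal:
            rmgroup = "reading_meaning/rmgroup/"
            return {
                "literal": literal,
                "stroke_count": elem.findtext("misc/stroke_count"),
                # findall returns every match, however many there are.
                "readings": [(r.get("r_type"), r.text)
                             for r in elem.findall(rmgroup + "reading")],
                "meanings": [m.text for m in elem.findall(rmgroup + "meaning")],
            }
        # Not the one we want: drop its children to keep memory use down.
        elem.clear()
    return None
```

For example, `find_character("characters.xml", "恵")["meanings"]` gives `['favor', 'blessing', 'grace', 'kindness']` for the characters.xml sample in the first answer.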

  • This is basically what XSLT was made for. (Feb 17, 2013 at 8:06)
  • Can you give me an example? @thatotherguy
    – user798080, Feb 17, 2013 at 8:28

2 Answers


As @thatotherguy suggested, you'll probably want to do this with something like XSLT instead of Bash. You can parse XML with Bash, but it's probably going to get tricky pretty quickly.

For example, you could have an XSLT stylesheet that looks something like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<!-- kanjilookup.xsl -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="character"/>

  <xsl:output method="html" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- From https://stackoverflow.com/questions/9611569/xsl-how-do-you-capitalize-first-letter -->
  <xsl:variable name="vLower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="vUpper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template name="capitalize">
    <xsl:param name="string"/>
    <xsl:value-of select=
      "concat(translate(substring($string, 1, 1), $vLower, $vUpper),
              substring($string, 2))"/>
  </xsl:template>

  <xsl:template match="/">
    <!-- Abort if no character was passed in, or if it isn't in the file -->
    <xsl:if test="string-length($character) = 0 or not(//literal[. = $character])">
      <xsl:message terminate="yes">ERR: No input character given, or character not found.</xsl:message>
    </xsl:if>
    <xsl:apply-templates select="characters/character[literal[. = $character]]"/>
  </xsl:template>

  <xsl:template match="character">
    <xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html>
</xsl:text>
    <html>
      <head/>
      <body>
        <table>
          <tbody>
            <xsl:apply-templates/>
          </tbody>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="literal">
    <caption>
      <xsl:value-of select="."/>
    </caption>
  </xsl:template>

  <xsl:template match="stroke_count">
    <tr>
      <td>
        <xsl:call-template name="capitalize">
          <xsl:with-param name="string" select="translate(local-name(), '_', ' ')"/>
        </xsl:call-template>
      </td>
      <td><xsl:value-of select="."/></td>
    </tr>
  </xsl:template>

  <xsl:template match="misc | reading_meaning | rmgroup">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="reading | meaning">
    <tr>
      <td>
        <xsl:call-template name="capitalize">
          <xsl:with-param name="string" select="local-name()"/>
        </xsl:call-template>
        <xsl:apply-templates select="@r_type"/>
      </td>
      <td>
        <xsl:value-of select="."/>
      </td>
    </tr>
  </xsl:template>

  <xsl:template match="@r_type">
    <xsl:value-of select="concat(' ', '(', ., ')')"/>
  </xsl:template>

</xsl:stylesheet>
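The row labels come from combining `translate(local-name(), '_', ' ')`, the `capitalize` template, and the `@r_type` template. To see what that combination produces, here is the same string logic sketched in Python; `row_label` is a hypothetical name. One difference: Python's `str.upper` also handles non-ASCII letters, whereas the stylesheet's `translate` only maps a to z.

```python
def row_label(tag, r_type=None):
    """Mimic the stylesheet's left-hand cell: underscores become spaces,
    the first letter is upper-cased, and an r_type (if any) is appended
    in parentheses."""
    text = tag.replace("_", " ")          # translate(local-name(), '_', ' ')
    text = text[:1].upper() + text[1:]    # the 'capitalize' template
    if r_type is not None:
        text += f" ({r_type})"            # the @r_type template
    return text


print(row_label("stroke_count"))       # Stroke count
print(row_label("reading", "ja_on"))   # Reading (ja_on)
print(row_label("meaning"))            # Meaning
```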

Let's say you have a file called characters.xml:

<characters>
  <character>
    <literal>恵</literal>
    <misc>
      <stroke_count>10</stroke_count>
    </misc>
    <reading_meaning>
      <rmgroup>
        <reading r_type="ja_on">ケイ</reading>
        <reading r_type="ja_on">エ</reading>
        <reading r_type="ja_kun">めぐ.む</reading>
        <reading r_type="ja_kun">めぐ.み</reading>
        <meaning>favor</meaning>
        <meaning>blessing</meaning>
        <meaning>grace</meaning>
        <meaning>kindness</meaning>
      </rmgroup>
    </reading_meaning>
  </character>
</characters>

You could run kanjilookup.xsl on it with XMLStarlet like this (depending on how XMLStarlet was installed, the command may be `xmlstarlet` rather than `xml`):

xml tr kanjilookup.xsl -s character=恵 characters.xml 
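To get the `kanjilookup.sh 恵` interface from the question, that command can simply go in a small shell script. If you'd rather avoid xmlstarlet altogether, here is a rough standard-library Python equivalent; the script name `kanjilookup.py`, the function `lookup_html`, and the hard-coded `characters.xml` path are assumptions for illustration. Unlike the stylesheet, it puts `<caption>` directly inside `<table>`, ahead of `<tbody>`, which is where HTML expects it.

```python
import html
import sys
import xml.etree.ElementTree as ET


def lookup_html(root, literal):
    """Return an HTML page with a table for `literal`, or None if absent."""
    for char in root.iter("character"):
        if char.findtext("literal") != literal:
            continue
        rows = [("Stroke count", char.findtext("misc/stroke_count", ""))]
        for r in char.iterfind("reading_meaning/rmgroup/reading"):
            rows.append((f"Reading ({r.get('r_type')})", r.text or ""))
        for m in char.iterfind("reading_meaning/rmgroup/meaning"):
            rows.append(("Meaning", m.text or ""))
        # html.escape protects against '<' or '&' in the data.
        body = "\n".join(
            f"      <tr><td>{html.escape(k)}</td><td>{html.escape(v)}</td></tr>"
            for k, v in rows
        )
        return (
            "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"></head>\n"
            "<body>\n  <table>\n"
            f"    <caption>{html.escape(literal)}</caption>\n"
            f"    <tbody>\n{body}\n    </tbody>\n"
            "  </table>\n</body>\n</html>"
        )
    return None


# Act as a command-line tool only when exactly one character is given,
# e.g.  python3 kanjilookup.py 恵 > page.html
if __name__ == "__main__" and len(sys.argv) == 2:
    page = lookup_html(ET.parse("characters.xml").getroot(), sys.argv[1])
    if page is None:
        sys.exit(f"ERR: {sys.argv[1]} not found in characters.xml")
    print(page)
```

For an unknown character the script exits with an error message, much like the stylesheet's `xsl:message terminate="yes"`. Readings are emitted before meanings rather than strictly in document order, which matches the sample data.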

That'll produce an HTML table that looks like this (after pretty-printing):

<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <table>
      <tbody>
        <caption>恵</caption>
        <tr>
          <td>Stroke count</td>
          <td>10</td>
        </tr>
        <tr>
          <td>Reading (ja_on)</td>
          <td>ケイ</td>
        </tr>
        <tr>
          <td>Reading (ja_on)</td>
          <td>エ</td>
        </tr>
        <tr>
          <td>Reading (ja_kun)</td>
          <td>めぐ.む</td>
        </tr>
        <tr>
          <td>Reading (ja_kun)</td>
          <td>めぐ.み</td>
        </tr>
        <tr>
          <td>Meaning</td>
          <td>favor</td>
        </tr>
        <tr>
          <td>Meaning</td>
          <td>blessing</td>
        </tr>
        <tr>
          <td>Meaning</td>
          <td>grace</td>
        </tr>
        <tr>
          <td>Meaning</td>
          <td>kindness</td>
        </tr>
      </tbody>
    </table>
  </body>
</html>

You'd have to modify the XSLT stylesheet to suit your needs, of course.

  • Wow! Thanks for all the info! XSLT is something I'm definitely new to, so this is very helpful! =D
    – user798080, Feb 22, 2013 at 19:29

Nowadays, with XQuery, there is no reason to use XSLT anymore; XQuery is much nicer.

For example, with my XQuery interpreter (Xidel), you can run it directly, without an additional file, like this:

xidel --printed-node-format xml characters.xml -e "(character:='恵')[2]" -e - <<<'xquery version "1.0";
(<title>{$character}</title>,
 for $char in //character[literal eq $character]
 return
   <table>
     <tbody>
       <caption>{$character}</caption>
       <tr>
         <td>Stroke count</td>
         <td>{$char/misc/stroke_count/text()}</td>
       </tr>
       { for $reading in $char//rmgroup/reading
         return <tr>
                  <td>Reading ({$reading/@r_type/data(.)})</td>
                  <td>{$reading/text()}</td>
                </tr> }
       { for $meaning in $char//rmgroup/meaning
         return <tr>
                  <td>Meaning</td>
                  <td>{$meaning/text()}</td>
                </tr> }
     </tbody>
   </table>
)
'

This creates a table similar to the one in the XSLT answer (but you need to prepend <?xml version="1.0" encoding="utf-8"?> to the characters.xml posted there).