11

I have HTML code stored in the database, and I want to read it as XML.

My code:

http://rextester.com/RMEHO89992

This is an example of the HTML code I have:

<div>
  <section>
    <h4> <span> A </span> </h4>
    <ul>
      <li> <span> Ab</span> AD <span> AC </span> </li>
      <li> <span> Ag</span> <span> AL </span> </li>
    </ul>
    <h4> <span> B </span> </h4>
    <ul>
      <li> <span> Bb</span> BD <span> BC </span> </li>
      <li> <span> Bg</span> <span> BL </span> </li>
    </ul>
  </section>
</div>

and this is an example of the output I need:

Category  Selection Value
--------- --------- ------------
A         Ab        AD
A         Ag        AL
B         Bb        BD
B         Bg        BL

I need to get the value inside the <h4> tag as a Category, the first <span> tag as Selection, and the rest of the values as a concatenated string.

I've tried the following query:

SELECT
    ( isnull(t.v.value('(h4/span/span[1]/text())[1]','nvarchar(max)'),'')
    + isnull(t.v.value('(h4/span/text())[1]','nvarchar(max)'),'')
    + isnull(t.v.value('(h4/span/span[2]/text())[2]','nvarchar(max)'),'') ) AS [Category],
    ( isnull(c.g.value('(span[1]/text())[1]','nvarchar(max)'),'')
    + isnull(c.g.value('(span[1]/span/text())[1]','nvarchar(max)'),'')
    + isnull(c.g.value('(span[1]/text())[2]','nvarchar(max)'),'') ) AS [Selection],
    ( isnull(c.g.value('(span[2]/text())[1]','nvarchar(max)'),'')
    + isnull(c.g.value('(span[2]/span/text())[1]','nvarchar(max)'),'')
    + isnull(c.g.value('(span[2]/text())[2]','nvarchar(max)'),'') ) AS [Value]
FROM @htmlXML.nodes('div/section') as t(v)
CROSS APPLY t.v.nodes('./ul/li') AS c(g)

and:

SELECT
    t.v.value('.','nvarchar(max)') ,
    --( isnull(t.v.value('(h4/span/span[1]/text())[1]','nvarchar(max)'),'')+isnull(t.v.value('(h4/span/text())[1]','nvarchar(max)'),'')+isnull(t.v.value('(h4/span/span[2]/text())[2]','nvarchar(max)'),''))AS [Category],
    ( isnull(c.g.value('(span[1]/text())[1]','nvarchar(max)'),'')
    + isnull(c.g.value('(span[1]/span/text())[1]','nvarchar(max)'),'')
    + isnull(c.g.value('(span[1]/text())[2]','nvarchar(max)'),'') ) AS [Selection] ,
    ( isnull(c.g.value('(span[2]/text())[1]','nvarchar(max)'),'')
    + isnull(c.g.value('(span[2]/span/text())[1]','nvarchar(max)'),'')
    + isnull(c.g.value('(span[2]/text())[2]','nvarchar(max)'),'') ) AS [Value]
FROM @htmlXML.nodes('div/section/h4/span') as t(v)
CROSS APPLY @htmlXML.nodes('div/section/ul/li') AS c(g)

But it only gets the first category, and doesn't get all the values together.

Category  Selection Value
--------- --------- ------------
A         Ab        AC
B         Ab        AC
A         Ag        AL
B         Ag        AL
A         Bb        BC
B         Bb        BC
A         Bg        BL
B         Bg        BL

There can be N categories, and the values might or might not be inside <span> tags. How can I get all the categories with their corresponding values? Alternatively, how can I get:

category h4 number
-------- -----------
A        1
B        2

  • 1 means the first h4, 2 means the second h4

ul number Selection Value
--------- --------- ------------
1         Ab        AD
1         Ag        AL
2         Bb        BD
2         Bg        BL

I can't work out how to relate the ul number column to the h4 number column.

  • Are you sure the expected result is correct? Shouldn't it be AD AC for the first row in the third column?
    Commented Jan 28, 2017 at 16:08
  • I am trying to establish communication between nodes h4 and ul.
    – RedArmy
    Commented Jan 28, 2017 at 20:00

2 Answers

7

This is not exactly elegant but seems to do the job.

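-- @S is assumed to hold the HTML string from the question.
-- Wrap each <h4> and the <ul> that follows it in a <foo> element,
-- so that every <li> can reach its own heading via ../../h4.
DECLARE @X XML = REPLACE(REPLACE(@S, '<h4>', '<foo><h4>'), '</ul>', '</ul></foo>')

SELECT Category  = x.value('../../h4[1]/span[1]', 'varchar(10)'),
       Selection = x.value('descendant-or-self::text()[1]', 'varchar(10)'),
       -- Remaining text nodes: strip line breaks, trim, then collapse runs of spaces
       Value     = REPLACE(
                     REPLACE(
                       REPLACE(
                         LTRIM(RTRIM(
                           REPLACE(
                             REPLACE(
                               CAST(x.x.query('fn:data(descendant-or-self::text()[fn:position() > 1])') AS VARCHAR(MAX)),
                               char(10), ''),
                             char(13), '')
                         )),
                         ' ', ' |'),
                       '| ', ''),
                     '|', '')
FROM @X.nodes('div/section/foo/ul/li') x(x)
ORDER BY Category, Selection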

Which returns

+----------+-----------+-------+
| Category | Selection | Value |
+----------+-----------+-------+
| A        | Ab        | AD AC |
| A        | Ag        | AL    |
| B        | Bb        | BD BC |
| B        | Bg        | BL    |
+----------+-----------+-------+

I'm assuming this is what you want, as the desired results table in the question does not return the "rest of the values as a concatenated string".
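
The three outermost REPLACE calls in the query above are a common trick for collapsing runs of spaces into a single space. A minimal standalone sketch, assuming the input never contains a | character (the variable @s and its value are made up for illustration):

```sql
-- Hypothetical input with a run of five spaces between the two values.
DECLARE @s varchar(100) = 'AD     AC';

SELECT REPLACE(
         REPLACE(
           REPLACE(@s, ' ', ' |'),  -- every space becomes ' |'
           '| ', ''),               -- drop each '| ' pair, leaving one ' |' per run
         '|', '') AS Collapsed;     -- remove the leftover marker: 'AD AC'
```

If the data could contain |, a different marker character that cannot occur in the text would be needed.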

    14

    I am trying to establish communication between nodes h4 and ul.

    You can use the << and >> operators to check whether a node comes before or after another node in document order. Combine that with a position predicate, [1], to get the first such occurrence in document order.

    select H4.X.value('(span/text())[1]', 'varchar(10)') as Section,
           UL.X.query('.') as UL
    from @X.nodes('/div/section/h4') as H4(X)
      cross apply H4.X.nodes('(let $h4 := . (: Save current h4 node :)
                               return /div/section/ul[$h4 << .])[1]') as UL(X);

    rextester:

    << and >> are called Node Order Comparison Operators

    If you have an XML fragment like this:

    <N1>1</N1> <N2>2</N2> <N3>3</N3> <N4>4</N4> <N5>5</N5> 

    you can get all nodes before the first occurrence of N3 with this query:

    select @X.query('/*[. << /N3[1]]'); 

    Result:

    <N1>1</N1> <N2>2</N2> 

    /* gives you all root nodes. What is enclosed in [] is a predicate: . is the current node, and /N3[1] is the first N3 node in document order at the root level. So, of all the root nodes, you keep only those that precede the first N3.

    Here is almost the same query, only you get the nodes that follow the first N3 node:

    select @X.query('/*[. >> /N3[1]]'); 
    <N4>4</N4> <N5>5</N5> 

    To only get the first node after the first N3 node, you add the predicate [1]:

    select @X.query('/*[. >> /N3[1]][1]'); 
    <N4>4</N4> 
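
Returning to the original question, the h4-to-ul pairing shown earlier can be extended with one more cross apply to produce a row per li. This is an untested sketch rather than a definitive solution: the sample XML is the question's HTML with the padding spaces removed, and the Value expression is borrowed from the other answer, so the whitespace cleanup shown there may still be needed on real data.

```sql
-- Sample data copied from the question, with padding whitespace removed.
DECLARE @X xml = N'
<div><section>
  <h4><span>A</span></h4>
  <ul>
    <li><span>Ab</span>AD<span>AC</span></li>
    <li><span>Ag</span><span>AL</span></li>
  </ul>
  <h4><span>B</span></h4>
  <ul>
    <li><span>Bb</span>BD<span>BC</span></li>
    <li><span>Bg</span><span>BL</span></li>
  </ul>
</section></div>';

select H4.X.value('(span/text())[1]', 'varchar(10)') as Category,
       LI.X.value('(span/text())[1]', 'varchar(10)') as Selection,
       -- Every text node under the li except the first, space separated
       -- (same expression as in the other answer).
       cast(LI.X.query('fn:data(descendant-or-self::text()[fn:position() > 1])') as varchar(max)) as Value
from @X.nodes('/div/section/h4') as H4(X)
  -- First ul that follows the current h4 in document order.
  cross apply H4.X.nodes('(let $h4 := . return /div/section/ul[$h4 << .])[1]') as UL(X)
  -- One row per li inside that ul.
  cross apply UL.X.nodes('li') as LI(X);
```

The intent is that each li row carries the heading of the h4 that immediately precedes its ul, which is the relation between "h4 number" and "ul number" the question asks about.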