3
\$\begingroup\$

I need to optimize this snippet to be faster:

Dim lst = (From t In DocElet.ChildNodes Select ID = t.item("ID").outerxml).Distinct().ToList() Parallel.For(0, lst.Count, Sub(i) Dim P As XmlElement = GetElement(lst(i)) Dim ls = (From t In DocElet.ChildNodes Where t.item("ID").innerText = P.InnerText Select t) Parallel.ForEach(ls, Sub(D) Dim verif_date As String = D.Item("DAD").InnerText Sej.ID = D.Item("ID").InnerText End Sub) End Sub) 

This is the XML structure:

<?xml version="1.0" encoding="utf-8"?> <PatientData> <Sejour><ID></ID><DAD></DAD></Sejour> </PatientData> 

I'm asking how I can optimize my code because it takes a lot of time (50 sec) in the case where the list contains 20000 elements.

\$\endgroup\$

    1 Answer 1

    2
    \$\begingroup\$

    The indentation makes your snippet quite hard to read - the VB.NET syntax for inline delegates is fairly bulky (at least compared to C#), I'd give it more air; being able to read the code is step one.

    Dim lst = (From t In DocElet.ChildNodes Select ID = t.item("ID").outerxml).Distinct() .ToList() Parallel.For(0, lst.Count, Sub(i) Dim P As XmlElement = GetElement(lst(i)) Dim ls = (From t In DocElet.ChildNodes Where t.item("ID").innerText = P.InnerText Select t) Parallel.ForEach(ls, Sub(D) Dim verif_date As String = D.Item("DAD").InnerText Sej.ID = D.Item("ID").InnerText End Sub) End Sub) 

    Ok. So you're running a Parallel.For, and inside that, you're running a Parallel.ForEach. That's quite a lot of overhead just there, it's likely that you're shooting yourself in the foot here, I'd first remove the inner Parallel.ForEach:

    Parallel.For(0, lst.Count, Sub(i) Dim P As XmlElement = GetElement(lst(i)) Dim ls = (From t In DocElet.ChildNodes Where t.item("ID").innerText = P.InnerText Select t) For Each foo In ls Dim verif_date As String = D.Item("DAD").InnerText Sej.ID = D.Item("ID").InnerText Next End Sub) 

    What's the purpose of verif_date? If it's not used, remove it:

     For Each foo In ls Sej.ID = D.Item("ID").InnerText Next 

    Now the outer loop is making yet another selection:

     Dim ls = (From t In DocElet.ChildNodes Where t.item("ID").innerText = P.InnerText Select t) 

    I'd try to flatten that. I'm not very familiar with the LINQ syntax in VB.NET, but what you want is a SelectMany, where you'd select the child nodes where "ID"'s inner text matches the element's inner text for the node you're iterating - in other words the ls query gets outside the outer loop and into the lst query, so you only need a single loop (which can probably run in parallel).

    If the XML is massive, it could be worth trying some PLINQ:

    Dim lst = (From t In DocElet.ChildNodes Select ID = t.item("ID").outerxml) .Distinct() .AsParallel() .SelectMany(...) .ToList() 

    Now you can iterate lst with Parallel.ForEach (time it with a normal For Each first) and do your thing.

    As an aside, I have to say your naming is inconsistent and hard to follow: Sub(i) looks good (Sub(index) would be better), Sub(D) doesn't. I realize VB.NET isn't case-sensitive, but if you're going to have a convention to make function parameters camelCase, by all means stick to it. lst would probably be better off as elements or whatever makes sense - use real, readable words for your identifiers, not just arbitrary abbreviations.

    \$\endgroup\$
    2
    • 1
      \$\begingroup\$is it true : accessing XML from multiple threads with Parallel.For/Parallel.ForEach is not guaranteed to work correctly as these classes are not thread safe.\$\endgroup\$CommentedJun 27, 2014 at 15:36
    • 1
      \$\begingroup\$Awesome, thanks for that comment, that falls under "good to know"! ;)\$\endgroup\$CommentedJun 27, 2014 at 16:36

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.