10
\$\begingroup\$

I am still pretty new to Haskell and am working on a graph generator for undirected unlabeled graphs containing loops and I have a bottleneck in the following functions. So far, I have not given any thought to performance and just looked for correctness.

I know this kind of question is not really popular, but I'd be interested in general guidelines for improving performance given some naive but correct implementation like the following.

For instance:

  1. Where to enforce strictness?
  2. Should I use vector/array instead of lists? When, where and why?
  3. Should I improve single functions by replacing recursion using folds/maps/etc. or similar?

Additional info:

  1. I'm using -O2 with ghc
  2. profiling shows (when normalizing runtime of connections to 100%)
    • arqSeq 40% (with almost all time spend in boundsequences)
    • connectionCombinations 60% with a quarter of the time spend in occurences

I don't think you need to understand all the details. I'm looking more for micro-improvements (the following determines arcs from a given node to equivalence classes of vertices grouped by their degree).

import Data.List import Control.Monad type Arcs = Int type Count = Int type Vertex = Int type MSequence = [Int] data EquivClass = EC { order :: Int, verts :: [Vertex] } deriving (Eq, Show) type ECPartition = [EquivClass] type NodeSelection = [(Arcs,Count)] type ECNodeSelection = (EquivClass, NodeSelection) -- number of occurences of unique elements in a list occurences :: (Ord a) => [a] -> [(a, Int)] occurences = map (\xs@(x:_) -> (x, length xs)) . group . reverse . sort -- number of vertices in an equivalence class ecLength :: EquivClass -> Int ecLength = length . verts -- for a given y = (y_1,...,y_n) and a bound m, find all vectors -- x = (x_1,...,x_n) such that |x| = m and x_i <= y_i boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]] boundSequences m x | m <= sum x = (fByM . sequence . ranges) x | otherwise = [[]] where fByM = filter (\x -> sum x == m) ranges = map (\x -> [0..x]) -- return m-sequences (combinations of number of arcs) for the given -- order to an ECPartition arcSeq :: Int -> ECPartition -> [MSequence] arcSeq m x = boundSequences m (map ecProd x) where ecProd e = ecLength e * order e -- return all the possible arc combinations to an equivalence -- class for a given number of arcs connectionCombinations :: Int -> EquivClass -> [NodeSelection] connectionCombinations arcs = map groupOcc . prune arcs . sequence . orderRep where orderRep (EC o v) = replicate (length v) [0..o] prune a = nub . map (reverse . sort) . filter ((== a) . sum) groupOcc = filter ((/= 0) . fst) . occurences -- return all the possible lists of equivalence class node selections -- from a Partition and a givne number of arcs connections :: ECPartition -> Int -> [[ECNodeSelection]] connections p order = concatMap (con p) $ arcSeq order p where con p arcs = map (zip p) $ zipWithM connectionCombinations arcs p main = do -- small example. In the complete app this is called 10,000's of -- times in case of high degrees or vertex counts let cons = connections [EC 4 [1], EC 1 [2,3,4]] 5 mapM_ (print) cons 
\$\endgroup\$
5
  • 2
    \$\begingroup\$Possibly the most important thing is to figure out how to effectively measure performance, and determine based on that data where the bottlenecks are. Haskell's performance can be very hard to accurately predict without quantitative measurements.\$\endgroup\$CommentedDec 1, 2011 at 18:03
  • 3
    \$\begingroup\$Performance can be very hard to predict accurately without measurements in any language. Laziness just makes it harder to analyze those results (also in any language, but pervasive in Haskell).\$\endgroup\$CommentedDec 1, 2011 at 18:09
  • \$\begingroup\$@MattFenwick Please see my update, I have chosen this bunch of functions after profiling. Sorry I forgot to mention this.\$\endgroup\$
    – bbtrb
    CommentedDec 1, 2011 at 18:14
  • \$\begingroup\$Could you put up an example that compiles?\$\endgroup\$
    – Nathan Howell
    CommentedDec 1, 2011 at 18:20
  • \$\begingroup\$@NathanHowell: Code updated, should compile and run\$\endgroup\$
    – bbtrb
    CommentedDec 1, 2011 at 18:26

1 Answer 1

11
\$\begingroup\$
-- for a given y = (y_1,...,y_n) and a bound m, find all vectors -- x = (x_1,...,x_n) such that |x| = m and x_i <= y_i boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]] boundSequences m x | m <= sum x = (fByM . sequence . ranges) x | otherwise = [[]] where fByM = filter (\x -> sum x == m) ranges = map (\x -> [0..x]) 

Excepting some fortunate cases, you are wasting a lot of work here. sequence . ranges produces (y_1+1) * ... * (y_n+1) lists you have to check. If you write a function to produce only the successful lists, you can gain a lot of performance for larger input (it won't do too much for short lists with small elements, but it will help for those too).
Shouldn't the case sum x < m return [] rather than [[]]?

-- return all the possible arc combinations to an equivalence -- class for a given number of arcs connectionCombinations :: Int -> EquivClass -> [NodeSelection] connectionCombinations arcs = map groupOcc . prune arcs . sequence . orderRep where orderRep (EC o v) = replicate (length v) [0..o] prune a = nub . map (reverse . sort) . filter ((== a) . sum) groupOcc = filter ((/= 0) . fst) . occurences 

Instead of reverse . sort, use sortBy (flip compare). That won't make much difference for short lists, but it's cleaner, IMO. prune arcs . sequence contains another occurrence of the boundSequences problem, creating a lot of lists and filtering out most of them immediately. nub is not good, it is quadratic (worst case, O(total*distinct) in general). The things you're nubbing have an Ord instance, if the order doesn't matter, map head . group . sort is a much faster nub than nub (still better Data.Set.toList . Data.Set.fromList), if order matters, keep a Set of seen elements and for each new, output that and add it to the set of seen.

I've not yet profiled, but it may be that the profile is misleading because it attributes costs to the functions where results are forced rather than where they are calculated, that needs some investigation, check back later.

Okay, the given data doesn't run long enough to produce a meaningful profile, so I changed it to

let cons = connections [EC 4 [1 .. 6], EC 6 [2 .. 9]] 5 

That runs long enough to collect a handful of samples. Unfortunately, the profile wasn't very informative:

 total time = 7.90 secs (395 ticks @ 20 ms) total alloc = 8,098,386,128 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc connectionCombinations Main 100.0 100.0 

So I inserted a couple of {-# SCC #-} pragmas. The biggo in that programme is the sequence . orderRep, unsurprisingly. A rewrite along the lines mentioned above,

boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]] boundSequences m xs | sm < m = [] | sm == m = [xs] | otherwise = go sm m xs where sm = sum xs go _ r [] | r == 0 = [[]] | otherwise = [] go _ r [y] | y < r = [] | otherwise = [[r]] go s r (y:ys) = do let mny | s < r+y = r+y-s | otherwise = 0 mxy = min y r c <- [mny .. mxy] map (c:) (go (s-y) (r-c) ys) 

and using that in connectionCombinations instead of prune arcs . sequence (requires a small change in orderRep),

connectionCombinations :: Int -> EquivClass -> [NodeSelection] connectionCombinations arcs = map groupOcc . nub . map (sortBy (flip compare)) . boundSequences arcs . orderRep where orderRep (EC o v) = replicate (length v) o groupOcc = filter ((/= 0) . fst) . occurences 

brought the running time (ghc -O2, no profiling) down significantly:

dafis@schwartz:~/Cairo> time ./orArcs > /dev/null real 0m5.836s user 0m5.593s sys 0m0.231s dafis@schwartz:~/Cairo> time ./arcs > /dev/null real 0m0.008s user 0m0.005s sys 0m0.003s 

Whether any further optimisations (e.g. the abovementioned nub) are necessary, and where most of the time in the actual programme is spent, so that one could identify the most important points, I cannot tell without more realistic data.

\$\endgroup\$
1
  • \$\begingroup\$Wow, thank you for the elaborate answer. It wasn't obvious to me that the algorithm (using sequence) is so flawed, I've tried the most obvious approach first and hoped for haskell to figure the rest out for me.\$\endgroup\$
    – bbtrb
    CommentedDec 1, 2011 at 22:35