I am still pretty new to Haskell and am working on a graph generator for undirected unlabeled graphs containing loops and I have a bottleneck in the following functions. So far, I have not given any thought to performance and just looked for correctness.
I know this kind of question is not really popular, but I'd be interested in general guidelines for improving performance given some naive but correct implementation like the following.
For instance:
- Where to enforce strictness?
- Should I use vector/array instead of lists? When, where and why?
- Should I improve single functions by replacing recursion using folds/maps/etc. or similar?
Additional info:
- I'm using -O2 with
ghc
- profiling shows (when normalizing runtime of
connections
to 100%)arqSeq
40% (with almost all time spend inboundsequences
)connectionCombinations
60% with a quarter of the time spend inoccurences
I don't think you need to understand all the details. I'm looking more for micro-improvements (the following determines arcs from a given node to equivalence classes of vertices grouped by their degree).
import Data.List import Control.Monad type Arcs = Int type Count = Int type Vertex = Int type MSequence = [Int] data EquivClass = EC { order :: Int, verts :: [Vertex] } deriving (Eq, Show) type ECPartition = [EquivClass] type NodeSelection = [(Arcs,Count)] type ECNodeSelection = (EquivClass, NodeSelection) -- number of occurences of unique elements in a list occurences :: (Ord a) => [a] -> [(a, Int)] occurences = map (\xs@(x:_) -> (x, length xs)) . group . reverse . sort -- number of vertices in an equivalence class ecLength :: EquivClass -> Int ecLength = length . verts -- for a given y = (y_1,...,y_n) and a bound m, find all vectors -- x = (x_1,...,x_n) such that |x| = m and x_i <= y_i boundSequences :: (Num a, Ord a, Enum a) => a -> [a] -> [[a]] boundSequences m x | m <= sum x = (fByM . sequence . ranges) x | otherwise = [[]] where fByM = filter (\x -> sum x == m) ranges = map (\x -> [0..x]) -- return m-sequences (combinations of number of arcs) for the given -- order to an ECPartition arcSeq :: Int -> ECPartition -> [MSequence] arcSeq m x = boundSequences m (map ecProd x) where ecProd e = ecLength e * order e -- return all the possible arc combinations to an equivalence -- class for a given number of arcs connectionCombinations :: Int -> EquivClass -> [NodeSelection] connectionCombinations arcs = map groupOcc . prune arcs . sequence . orderRep where orderRep (EC o v) = replicate (length v) [0..o] prune a = nub . map (reverse . sort) . filter ((== a) . sum) groupOcc = filter ((/= 0) . fst) . occurences -- return all the possible lists of equivalence class node selections -- from a Partition and a givne number of arcs connections :: ECPartition -> Int -> [[ECNodeSelection]] connections p order = concatMap (con p) $ arcSeq order p where con p arcs = map (zip p) $ zipWithM connectionCombinations arcs p main = do -- small example. In the complete app this is called 10,000's of -- times in case of high degrees or vertex counts let cons = connections [EC 4 [1], EC 1 [2,3,4]] 5 mapM_ (print) cons