5
\$\begingroup\$

upd: I am very sorry about my mistake. The old version of data is download from luogu, but the website do NOT allow me to download the data of the worst case. In fact, when facing same amount of operations, the memory usage depends on the type of operations and their orders. I re-generate some data, and update the link and profiling logs.


I am new in Haskell, and I try to implement an AVL tree. To test my code, I submit it to an Online Judge, which has a problem of balanced-tree. However, I get a MLE (Memory Limit Exceeded).

In fact, when facing 10^5 times of operations, my code requires 58 MB memories, while native C++ code only need about 2 to 3 MB. The speed of C++ code is also 10x faster.

I wonder if haskell can deal with such large data structure while having performance close to native code. What should I do?

Here is my code:

type Info = (Int, Int) data Tree a = Empty | Ord a => Node (Tree a) a Info (Tree a) just :: Maybe a -> a just (Just a) = a height :: Tree a -> Int height Empty = 0 height (Node _ _ (h,_) _) = h size :: Tree a -> Int size Empty = 0 size (Node _ _ (_,s) _) = s maintainInfo :: Tree a -> Tree a maintainInfo (Node ls v _ rs) = Node ls v (max (height ls) (height rs) + 1, size ls + size rs + 1) rs rotateL :: Tree a -> Tree a rotateL (Node (Node ll lv _ lr) v _ rs) = maintainInfo (Node ll lv (0,0) (maintainInfo (Node lr v (0,0) rs))) rotateR :: Tree a -> Tree a rotateR (Node ls v _ (Node rl rv _ rr)) = maintainInfo (Node (maintainInfo (Node ls v (0,0) rl)) rv (0,0) rr) rotateL_ :: Tree a -> Tree a rotateL_ (Node (Node ll lv _ lr) v _ rs) | height ll >= height lr = rotateL (Node (Node ll lv (0,0) lr) v (0,0) rs) | otherwise = rotateL (Node (rotateR (Node ll lv (0,0) lr)) v (0,0) rs) rotateR_ :: Tree a -> Tree a rotateR_ (Node ls v _ (Node rl rv _ rr)) | height rl > height rr = rotateR (Node ls v (0,0) (rotateL (Node rl rv (0,0) rr))) | otherwise = rotateR (Node ls v (0,0) (Node rl rv (0,0) rr)) maintain :: Tree a -> Tree a maintain (Node ls v info rs) | height ls > height rs + 1 = rotateL_ (Node ls v info rs) | height rs > height ls + 1 = rotateR_ (Node ls v info rs) | otherwise = maintainInfo (Node ls v info rs) insert :: Ord a => a -> Tree a -> Tree a insert val Empty = Node Empty val (1,1) Empty insert val (Node ls v _ rs) | val < v = maintain (Node (insert val ls) v (0,0) rs) | val >= v = maintain (Node ls v (0,0) (insert val rs)) iter :: Tree a -> [a] iter Empty = [] iter (Node ls v _ rs) = iter ls ++ v : iter rs maxi :: Tree a -> Maybe a maxi Empty = Nothing maxi (Node _ v _ Empty) = Just v maxi (Node _ _ _ rs) = maxi rs mini :: Tree a -> Maybe a mini Empty = Nothing mini (Node Empty v _ _) = Just v mini (Node ls _ _ _) = mini ls eraseSwap :: Tree a -> Tree a eraseSwap Empty = Empty eraseSwap (Node ls _ _ Empty) = ls eraseSwap (Node ls _ _ rs) = let next = just (mini rs) in maintain (Node ls next (0,0) (erase next rs)) erase :: a -> Tree a -> Tree a erase _ Empty = Empty erase val (Node ls v _ rs) | val < v = maintain (Node (erase val ls) v (0,0) rs) | val > v = maintain (Node ls v (0,0) (erase val rs)) | val == v = eraseSwap (Node ls v (0,0) rs) next :: a -> Tree a -> Maybe a next _ Empty = Nothing next val (Node ls v _ rs) | val < v = Just (maybe v (min v) (next val ls)) | val >= v = next val rs prev :: a -> Tree a -> Maybe a prev _ Empty = Nothing prev val (Node ls v _ rs) | val <= v = prev val ls | val > v = Just (maybe v (max v) (prev val rs)) rank :: a -> Tree a -> Int rank _ Empty = 1 rank val (Node ls v _ rs) | val <= v = rank val ls | val > v = size ls + 1 + rank val rs select :: Int -> Tree a -> Maybe a select _ Empty = Nothing select k (Node ls v _ rs) | size ls >= k = select k ls | size ls == k - 1 = Just v | otherwise = select (k - size ls - 1) rs update :: [Int] -> Tree Int -> Tree Int update [1,x] = insert x update [2,x] = erase x ask :: [Int] -> Tree Int -> Int ask [3,x] tree = rank x tree ask [4,x] tree = just (select x tree) ask [5,x] tree = just (prev x tree) ask [6,x] tree = just (next x tree) solve :: Int -> Tree Int -> IO () solve 0 _ = return () solve n tree = do inputQuery <- getLine let l = map read (words inputQuery) :: [Int] if head l < 3 then do solve (n-1) (update l tree) else do print (ask l tree) solve (n-1) tree return () main :: IO () main = do inputN <- getLine let n = read inputN :: Int solve n Empty return () 

I humbly accept all suggestions, including those that are not aimed at performance but at coding style.


To test my code, I submit it to Luogu P3369, which is in Chinese. The simple translation is as follow:

You need to dynamically maintain a re-set M and provide the following operations:

  1. Insert a number x into M.
  2. Delete a number x from M (if there are multiple identical numbers, only delete one).
  3. Query how many numbers in M are smaller than x, and add one to the answer.
  4. Query the number that ranks in the x-th position after M is sorted from small to large.
  5. Query the predecessor of x in M (the predecessor is defined as the largest number that is smaller than x).
  6. Query the successor of x in M (the successor is defined as the smallest number that is greater than x).
  • For operations 3, 5, and 6, it is not guaranteed that the number x exists in the current re-set.
  • The first line is n, which indicates the number of operations. Each of the following n lines has two numbers opt and x. opt indicates the sequence number of the operation (1 <= opt <= 6)
  • For operations 3,4,5,6, each line outputs a number, indicating the corresponding answer.

If anyone has difficulty to login luogu, here is also some data files to download:

Here is the output of ghc -O2 -rtsopts main.hs && ./main < avl.in > tmp.out +RTS -s:

 1,579,041,840 bytes allocated in the heap 309,624,880 bytes copied during GC 19,305,064 bytes maximum residency (21 sample(s)) 186,640 bytes maximum slop 58 MiB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 360 colls, 0 par 0.324s 0.329s 0.0009s 0.0055s Gen 1 21 colls, 0 par 0.419s 0.425s 0.0202s 0.0500s INIT time 0.001s ( 0.001s elapsed) MUT time 0.897s ( 0.907s elapsed) GC time 0.743s ( 0.754s elapsed) EXIT time 0.000s ( 0.009s elapsed) Total time 1.641s ( 1.671s elapsed) %GC time 0.0% (0.0% elapsed) Alloc rate 1,759,782,364 bytes per MUT second Productivity 54.7% of total user, 54.3% of total elapsed 
\$\endgroup\$
0

    1 Answer 1

    9
    \$\begingroup\$

    As I said in a comment, I can't verify that your program as written uses 40MB - your profiling data and my own runs of your program show 3MB, in the same range as the C++ performance you refer to. Even with -O0, I see only 6MB residency for your input file with 10k lines.

    So I can't make any recommendations that are guaranteed to greatly reduce memory usage: it looks reduced enough to me already. But the first few things I would try are:

    Improve strictness

    1. Add {-# LANGUAGE Strict #-} to the top of the file. You don't need any laziness, and this will help prevent space leaks. When I try this I see no change in memory usage, and only a 5% or so speedup (not really statistically significant, I imagine)
    2. Replace your Info tuple with a real data type. This will let the runtime avoid using the existing tuple type, which is of course lazy, and use a type that's equivalent but can be strict (thanks to the Strict language extension).
    3. Replace all of your (0,0) expression with a reference to a top-level constant defined as untracked = Info 0 0. I think you did this in the version of this question you posted to Stack Overflow, calling it nilInfo instead. It's a good idea, so you don't waste time building this same boring value over and over. When I combine these 3 improvements, I again see the same memory usage, and a larger but still not significant speedup.

    Reduce use of dummy Info values

    Next, I note that many of your uses of this untracked "dummy Info" value are followed immediately by a call to maintainInfo, which ignores the Info field and constructs its own. So you are building a Tree node just to tear it down and build a new one. Instead, define a function that takes all the other Tree fields as parameters, and constructs a Node with the appropriate Info:

    mkNode :: Ord a => Tree a -> a -> Tree a -> Tree a mkNode ls v rs = Node ls v (Info (max (height ls) (height rs) + 1) (size ls + size rs + 1)) rs 

    The call sites are easy to update: just replace maintainInfo (Node x y untracked z) with mkNode x y z.

    You should be able to do something similar with the rotate family of functions, and maintain, all of which ignore Info on the nodes they are given, and construct new Info values based on their subtrees after rotation, but still waste time constructing Info metadata to pass around to each other. That's more surgery than I'm interested in doing myself for this answer, though.

    Participate in Prelude typeclasses

    Instead of bespoke functions iter, maxi, and mini, I suggest participating in Foldable. Its maximum and minimum functions don't match signature with your maxi and mini - you return a Maybe, in the modern style, while these functions with much older specifications simply crash when given an empty list. Fortunately, you never call maxi at all, and you assume that mini always returns a Just anyway. So you might as well make your Tree Foldable. This won't help performance at all, but it's just a good practice.

    extremum :: String -> (forall a. Tree a -> Tree a) -> (forall a. Ord a => Tree a -> a) extremum name next = impl where impl Empty = error (name <> ": empty Tree") impl n = go n go n = case next n of Empty -> let Node _ v _ _ = n in v n' -> go n' instance Foldable Tree where foldMap _ Empty = mempty foldMap f (Node ls v _ rs) = foldMap f ls <> f v <> foldMap f rs maximum = extremum "maximum" $ \(Node _ _ _ rs) -> rs minimum = extremum "minimum" $ \(Node ls _ _ _) -> ls length = size 

    I am being more "clever" than necessary with the extremum type signature and implementation. If you don't understand it or don't like it, a simpler way to implement maximum and minimum would be fine of course - it's nothing critical.

    More idiomatic use of comparisons in guard clauses

    In erase, you compare two values three times, checking for <, then for >, then for ==. Instead, use compare, which returns a type with 3 values, so you can simply fall into the right case directly.

    erase :: a -> Tree a -> Tree a erase _ Empty = Empty erase val (Node ls v _ rs) = case compare val v of LT -> maintain (Node (erase val ls) v untracked rs) GT -> maintain (Node ls v untracked (erase val rs)) EQ -> eraseSwap (Node ls v untracked rs) 

    In many other functions, you compare with two mutually exclusive conditions, such as >= and then <. That could be replaced with compare in a similar way (match LT in the first case and _ for the other case), but it is more common to use otherwise instead. Either way, you remove a redundant comparison and make the code easier to understand.

    next :: a -> Tree a -> Maybe a next _ Empty = Nothing next val (Node ls v _ rs) | val < v = Just (maybe v (min v) (next val ls)) | otherwise = next val rs 

    Your select implementation can avoid some redundant function calls by using a where clause to span all the guards:

    select :: Int -> Tree a -> Maybe a select _ Empty = Nothing select k (Node ls v _ rs) | lSize >= k = select k ls | lSize == k - 1 = Just v | otherwise = select (k - lSize - 1) rs where lSize = size ls 

    Consider a different data structure

    AVL trees are a fine data structure to implement for an exercise like this. But if you are worried about performance, they are not famous for being the fastest tree to keep balanced. Their invariant is very strict, and so a lot of time is spent keeping them perfectly balanced, while a data structure with a more relaxed invariant can let the trees get a little lopsided without letting them get too unbalanced. Red/black trees are another classic, simple example, and will spend less time maintaining balance at the cost of a slightly deeper average tree.

    Or you could use some types from the standard library. Data.IntMap should support all the features you need, for example, and I imagine it's more efficient than anything you could do yourself. Or there's probably a fun way to build this yourself out of a FingerTree, which would be somewhere in between doing it all yourself and having it all done for you. It depends on how much you want to do by hand for the exercise, and how much you're happy to have handled for you.

    \$\endgroup\$
    1
    • \$\begingroup\$Thanks a lot! I also realized the problem of my "Info" tuple, but I cannot find out a better solution. BTW, I have updated the new data, which has following feature: first inserting 50000 numbers, and then making queries randomly. After applying all of your optimization, the memory usage reduced from 58M to 46M. That is a great step, but the performance is still far from native code. I am going to check Data.IntMap and FingerTree. Maybe rotate-based tree just not fit haskell.\$\endgroup\$CommentedApr 2 at 1:45

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.