HackerRank, Haskell simple "compression" algorithm

Question

The question asked is very straight-forward and is simple enough to solve. What I am looking for is that hopefully I can get some understanding for using the constructs and built-ins of the Haskell language itself.

That said, the question is as follows:

Joseph and Jane are making a contest for apes. During the process, they have to communicate frequently with each other. Since they are not completely human, they cannot speak properly. They have to transfer messages using postcards of small sizes.
To save space on the small postcards, they devise a string compression algorithm:
If a character, \$ch\$, occurs \$n(> 1)\$ times in a row, then it will be represented by \$\{ch\}\{n\}\$. where \$\{x\}\$ is the value of \$x\$. For example, if the substring is a sequence of \$4\$ 'a' ("aaaa"). it will be represented as "a4".
If a character, \$ch\$, occurs exactly one time in a row, then it will be simply represented as \$\{ch\}\$. For example, if the substring is "a". then it will be represented as "a".
Help Joseph to compress a message. msg.
Input
The only line of input contains a string. msg.
Output
Print the string msg as a compressed message.

Create a messaging system where each word (token) will be compressed based on any consecutive characters.

Single sequenced characters remain the same but multiple identical consecutive characters will be replaced by the character followed by an integer representing the number of times it repeats consecutively.

Example input and output:

Sample Input #00: > abcaaabbb Sample Output #00: > abca3b3 Sample Input #01: > abcd Sample Output #01: > abcd

My code:

-- String representation as a simple Compression Algorithm -- module Main where import Text.Printf compression :: String -> String -> String -> Int -> String compression input output prevChar count | input == "" && output == "" = output | prevChar == "" = compression (tail input) output ([head input]) 1 | input == "" && count > 1 = output ++ prevChar ++ (show count) | input == "" && count < 2 = output ++ prevChar | prevChar == ([head input]) = compression (tail input) output prevChar (count+1) | prevChar /= ([head input]) && count < 2 = compression (tail input) (output ++ prevChar) ([head input]) 1 | prevChar /= ([head input]) && count > 1 = compression (tail input) (output ++ prevChar ++ (show count)) ([head input]) 1 main :: IO () main = do let s1 = "aaabaaaaccaaaaba" printf "Decompressed: %s and Compressed: %s" s1 (compression s1 "" "" 0)

I feel as though some of my Haskell code can be clunky and could use some more features of the language itself as I mentioned earlier.

Now this isn't a full-blown application or even a part of a module for an actual application but just an exercise to improve my working knowledge and skill for writing Haskell.

@J_H the text of the description from the image has been added — Sᴀᴍ Onᴇᴌᴀ, CommentedSep 13, 2024 at 15:11

Rainer P. · Accepted Answer · 2024-08-27 12:45:20Z

I agree with some of the criticism mentioned in the other review, notably:

You're trying to write a function from String to String, so you should have such a function.
In general, be more eager to break up your work into smaller named pieces.

I'd like to add another point, which is heavy use of existing functions (take, drop, takeWhile, dropWhile, splitAt, etc.) instead of iterating yourself. You can search for functions on hoogle.haskell.org by entering the desired type signature and it spits out all matching functions. In many cases, you will find an existing function that fits you needs. If not, write it yourself, but keep it as a separate helper function.

Also, use pattern matching in you function signature. If you write input == "" && output == "" in the pattern and then compression (tail input) output ([head input]) in the function body, you probably should have deconstructed the list during pattern matching. In my code below, I use input@(a : _) to extract the head and the full list (including the head), but I could also extract the tail if I wanted to (by replacing _ with a variable name). This line automatically rejects empty lists, so execution will fall through to until it finds a matching pattern.

I strongly disagree with the code provided in the other review, which is equally clunky than your original code, if not more. An idiomatic solution to your problem looks more like this:

compression :: String -> String compression "" = "" compression input@(a : _) | count > 1 = a : show count ++ compression remainder | otherwise = a : compression remainder where count = length $ takeWhile (== a) input remainder = dropWhile (== a) input

If you want to avoid iterating twice, you can use span, which returns a tuple and is equivalent to (takeWhile, dropWhile). The updated code is:

compression :: String -> String compression "" = "" compression input@(a : _) | count > 1 = a : show count ++ compression remainder | otherwise = a : compression remainder where count = length prefix (prefix, remainder) = span (== a) input

For absolute best performance, you can create a helper function that replaces dropWhile but also reports the number of dropped items. This avoids creating the prefix list and then counting its length.

compression :: String -> String compression "" = "" compression input@(a : _) | count > 1 = a : show count ++ compression remainder | otherwise = a : compression remainder where (remainder, count) = dropWhileAndCount (== a) input dropWhileAndCount :: (a -> Bool) -> [a] -> ([a], Int) dropWhileAndCount f = go 0 where go n (x : xs) | f x = go (n + 1) xs go n xs = (xs, n)

Agreed that this is a much nicer solution to the original coding challenge. — ShapeOfMatter, CommentedAug 28, 2024 at 16:09
take(drop)While was my exact thought process. I was on Hoogle search for them both right before I had originally posted this. I haven't leveraged where nearly enough and this shows it. I'll start to include this instead of creating more recursive cases. I think this is pretty self-evident with my code provided. Thanks again! — tijko, CommentedAug 31, 2024 at 17:42

Franky · Accepted Answer · 2024-09-13 14:35:02Z

The first two sections apply to every problem and every programming language.

Use complementary conditions

I recommend against the > 1 and < 2 conditions. When > 1 is one case, the other should be <= 1. How easy is it to spot the mistake in

f x | x < 1 = "foo" | x > 2 = "bar"

You would have never written the equivalent

f x | x < 1 = "foo" | x >= 3 = "bar"

because it is so obviously wrong. Of course the haskell way of writing it is

f x | x > 1 = "bar" | otherwise = "foo"

but I like my code to closely mirror the problem statement, see next section.

Follow the problem statement as closely as possible

This is very pedantic but it helps in larger problems. Your order of conditions is swapped. The question is

Single sequenced characters remain the same
multiple identical consecutive characters will...

but you handle the multiple case first. Ideally, at some place on code, you can write the code the sequence as the problem statement.

encode count char | count == 1 = char -- Single sequenced characters remain the same but | count > 1 = -- multiple identical consecutive characters will be replaced char -- by the character ++ -- followed by an show count -- integer representing the number of times it repeats consecutively

The comments here are clearly overdoing it, but in the real world you know which part is the most important to the customer or where the requirements are most likely to be changed (slightly).

use standard functions

Learn the standard functions. For the compression algorithm, group jumps to my mind:

group :: Eq a => [a] -> [[a]] > group "abcaaabbb" ["a","b","c","aaa","bbb"] compression = concatMap encode' . group where encode' x = encode (length x) x encode -- see above

ShapeOfMatter · Accepted Answer · 2024-08-21 19:11:55Z

There are a lot of tools Haskell provides for doing things like this elegantly; I'll try to limit my feedback to boring solutions that will be helpful.

You're trying to write a function from String to String, so you should have such a function. Rename what you've got now as compression', and provide a wrapper.
By convention, try to have "context like" arguments first and "data like" arguments last; this will assist with partial-application of your functions.
Having prevChar :: String isn't good; what if it happened to be "asdf"? Maybe Char would be better.
In general, be more eager to break up your work into smaller named pieces.
The quantity of arguments you're handling, and the quantity of cases you're handling in parallel, should feel like a red flag: you need a better abstraction. You've probably already noticed that Functor won't work for this. With a little experience, you'll start to notice that your current recursion scheme basically is the State monad. I'm using the mtl library for this but we don't need to use the Transformer version; just State will serve.

By the time I got this working, it didn't have much in common with what you wrote, sorry. Hopefully it contains the right number of new things for you to learn!

module Main where import Control.Monad (forM_, when) import Control.Monad.State (execState, get, gets, modify, put, State) import Text.Printf data CompressorState = CompressorState { output :: String , prevChar :: Maybe Char , count :: Int , input :: String } deriving (Show) type Compressor = State CompressorState compression' :: Compressor () compression' = do mChar <- getNext case mChar of Nothing -> flushCount Just char -> do previous <- getPrevious when (Just char /= previous) $ do flushCount emit char setPrevious char increment compression' where getNext = do cs@CompressorState{input = i} <- get case i of [] -> return Nothing c:i' -> do put cs{input = i'} -- Using record update syntax return (Just c) getPrevious = gets prevChar increment = modify (\cs@CompressorState{count = c} -> cs{count = c + 1}) flushCount = do c <- gets count when (1 < c) $ forM_ (show c) emit modify (\cs -> cs{count = 0}) emit :: Char -> Compressor () emit c = modify (\cs@CompressorState{output = o} -> cs{output = c : o}) setPrevious :: Char -> Compressor () setPrevious c = modify (\cs -> cs{prevChar = Just c}) compression :: String -> String compression i = reverse . output $ execState compression' CompressorState{ output = "" , prevChar = Nothing , count = 0 , input = i } examples :: [(String, String)] examples = [ ("abcaaabbb", "abca3b3") , ("abcd", "abcd") , ("aaabaaaaccaaaaba", "a3ba4c2a4ba") ] main :: IO () main = do let results = do -- This is in the List Monad! (input, reference) <- examples let output = compression input return (output == reference, input, output, reference) forM_ results print

I have to be honest, I'm not sure I would have used this exact form but I think that is only natural when it comes to writing in a functional form. That said, I was able to learn a few things just by having gone through what you did here and breaking each part down. — tijko, CommentedAug 22, 2024 at 13:56
I've seen this Monad scheme many places now where case catches Maybe and Nothing/Just — tijko, CommentedAug 23, 2024 at 0:11
All when, put, & get are new to me. I'll stop adding every minor comment but I want to say truly thank you. — tijko, CommentedAug 23, 2024 at 1:57
"I've seen this Monad scheme many places now where case catches Maybe" Just note that case expressions are just expressions, nothing about them is tied to monads. (One extra thing I considered using in my response was the LambdaCase extension, which often makes cases more fun to use.) — ShapeOfMatter, CommentedAug 24, 2024 at 0:34
Of course, I've just seen the do notation (where) is used heavily with case where Monads are being used. I'm not at all familiar enough to speak much more on it or LambdaCase. I'm still picking apart several aspects of your example. It's short but packed with nuance. — tijko, CommentedAug 24, 2024 at 1:14

Stack Exchange Network

HackerRank, Haskell simple "compression" algorithm

Input

Output

3 Answers 3

Use complementary conditions

Follow the problem statement as closely as possible

use standard functions

Hot Network Questions

HackerRank, Haskell simple "compression" algorithm

Input

Output

3 Answers 3

Use complementary conditions

Follow the problem statement as closely as possible

use standard functions

Related

Hot Network Questions