I'm a very new Haskell programmer, and I've written the following code for loading TSV files. I think it can be improved, but I'm not sure how.
- I feel like the giant "do" block in loadData is inelegant, and that there's likely a better way to approach this.
- I think I'm trying to avoid the IO monad because I'm not sure how to work with it, but there's probably a better way to handle mapping parseTsv over the contents of the files.
One note: I'm not super concerned about performance - this will run once at the beginning of my program. I want to load in all the files entirely to build a composite data structure from their contents.
    module LanguageMachine (loadData) where

    import System.Directory (listDirectory)
    import Data.List ((\\), elemIndex)
    import Data.Maybe (mapMaybe)

    parseTsv :: String -> [(String, Int)]
    parseTsv contents = mapMaybe parseLine (lines contents)

    parseLine :: String -> Maybe (String, Int)
    parseLine line =
        case elemIndex '\t' line of
            Just i ->
                let (word, count) = splitAt i line
                in Just (word, read count :: Int)
            Nothing -> Nothing

    loadData :: FilePath -> [FilePath] -> IO [(String, [(String, Int)])]
    loadData path exclude = do
        files <- listDirectory path
        let filtered = files \\ exclude
        let prefixed = map ((path ++ "/") ++) filtered
        contents <- traverse readFile prefixed
        let results = map parseTsv contents
        return $ zip filtered results
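For comparison, here is a minimal sketch of one way the "do" block might be shortened: read and parse each file in a single step, then traverse that step over the filtered file names. This is only one possible direction, not necessarily the most idiomatic one. The helper name loadFile is hypothetical, and `</>` from System.FilePath is swapped in for the manual `++ "/"`; the two parsing functions are repeated unchanged so the sketch stands alone.

```haskell
import Data.List ((\\), elemIndex)
import Data.Maybe (mapMaybe)
import System.Directory (listDirectory)
import System.FilePath ((</>))

parseTsv :: String -> [(String, Int)]
parseTsv contents = mapMaybe parseLine (lines contents)

parseLine :: String -> Maybe (String, Int)
parseLine line = case elemIndex '\t' line of
  Just i  -> let (word, count) = splitAt i line
             in Just (word, read count :: Int)
  Nothing -> Nothing

-- Hypothetical helper (not in the original code): read one file under
-- `dir` and pair its bare file name with its parsed rows.
loadFile :: FilePath -> FilePath -> IO (String, [(String, Int)])
loadFile dir name = (,) name . parseTsv <$> readFile (dir </> name)

-- Same type as the original loadData; the intermediate lists
-- (prefixed, contents, results) are folded into the traversal.
loadData :: FilePath -> [FilePath] -> IO [(String, [(String, Int)])]
loadData path exclude = do
  files <- listDirectory path
  traverse (loadFile path) (files \\ exclude)
```

The pure parsing stays outside IO; only loadFile touches the file system, and fmap (`<$>`) applies parseTsv to the result of readFile.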
The files look like this. Each line is a two-value TSV record: a word, then the number of occurrences of that word:

    ARIA	4308
    ORE	4208
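To make the expected result concrete, here is a small sketch of what parseTsv should produce for the sample above. It assumes the two fields are separated by a real tab character, and the name sample is made up for illustration; the parsing functions are copied from the module unchanged.

```haskell
import Data.List (elemIndex)
import Data.Maybe (mapMaybe)

parseTsv :: String -> [(String, Int)]
parseTsv contents = mapMaybe parseLine (lines contents)

parseLine :: String -> Maybe (String, Int)
parseLine line = case elemIndex '\t' line of
  Just i  -> let (word, count) = splitAt i line
             in Just (word, read count :: Int)
  Nothing -> Nothing

-- Illustrative input (assumption: fields are tab-separated).
sample :: String
sample = "ARIA\t4308\nORE\t4208\n"

-- parseTsv sample == [("ARIA", 4308), ("ORE", 4208)]
-- The count string still starts with the tab after splitAt,
-- but read skips leading whitespace, so it parses fine.
```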
Thank you!