
After becoming interested in various bits of functional programming provided by non-functional languages, I've recently decided to start learning Haskell. As I'm fairly experienced with conventional imperative programming languages, I decided to make my "hello world" something a little more complex - an implementation of gradient descent with 2 variables.

The idea with this code is that you fill in the training set for the algorithm in the code, and then run something similar to this:

descentFunc = singleDescend trainingSet 0.1
deltas = descend descentFunc (100, 1) 100000

where 0.1 is the learning rate (referred to as lr in the code), 100000 is the number of iterations for the loop, and (100, 1) is an initial guess for the coefficients.

The actual code that's running is below. As I'm entirely new to Haskell, I was wondering whether code such as this is acceptable, and whether there are any obvious idioms I'm missing/misusing or any glaring style errors I've made.

Any comments on my implementation of the algorithm are welcome also.

import Data.List

trainingSet = [ (1.0,10.0),(2.0,20.0),(3.0,30.0),(4.0,40.0) ]

hypothesis (d1,d2) x = d1 + (d2 * x)

squareDiff d (x,y) = diff * diff
  where diff = (hypothesis d x) - y

costFunc ts d = scaleFactor * sum squares
  where scaleFactor = 1.0 / (2 * genericLength ts)
        squares = map (squareDiff d) ts

diff d (x,y) = (hypothesis d x) - y

descendTheta thetaFunc deltaFunc ts lr deltas =
    deltaFunc deltas - (scaleFactor * sum scaledDiffs)
  where scaleFactor = lr / genericLength ts
        diffs = map (diff deltas) ts
        scaledDiffs = map (\(x,y) -> x * thetaFunc y) $ zip diffs ts

descendThetaZero = descendTheta (\_ -> 1) fst
descendThetaOne = descendTheta fst snd

singleDescend ts lr deltas = (thetaZero, thetaOne)
  where thetaZero = descendThetaZero ts lr deltas
        thetaOne = descendThetaOne ts lr deltas

descend func deltas i
  | i == 0    = deltas
  | otherwise = descend func (func deltas) (i - 1)
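As a sanity check, I worked one step by hand: with this training set, lr = 0.1, and a starting guess of (100, 1), the first iteration should produce (92.25, -17.25). Here is a minimal, self-contained sketch of a single step mirroring the code above (the names ts and step are just for this check):

```haskell
-- Minimal, self-contained sketch of one descent step on the same
-- training set with lr = 0.1, for checking the full code by hand.
ts :: [(Double, Double)]
ts = [(1,10), (2,20), (3,30), (4,40)]

step :: (Double, Double) -> (Double, Double)
step (d1, d2) = (d1 - s * sum diffs, d2 - s * sum (zipWith (*) diffs (map fst ts)))
  where diffs = [ d1 + d2 * x - y | (x, y) <- ts ]   -- prediction errors
        s     = 0.1 / fromIntegral (length ts)       -- lr scaled by set size

-- step (100, 1) == (92.25, -17.25)
```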
  • "Would you please consider accepting my answer - or do you still have questions?" – Commented Jul 7, 2012 at 23:27
  • "@epsilon'εⳆ2'halbe Sorry, I'm actually just having a difficult time deciding which answer to accept. I'm not convinced that code review lends itself very well to selecting a single answer. I will make a decision shortly though." – obmarg, Commented Jul 11, 2012 at 10:43
  • "Sorry, I wrote this comment when no other answer was there - of course, choose the answer most helpful to you." – Commented Jul 11, 2012 at 21:12

3 Answers


Welcome to Haskell! Your code is quite good for a beginner; I have only peripheral recommendations on your style.

import Test.HUnit
import Data.List
import Control.Arrow

As @epsilon 'εⳆ2' halbe suggested below, use type signatures to define your functions first; this will help you avoid obvious errors.

type DeltaPair = (Double, Double)
type PairFunc = DeltaPair -> Double

trainingSet = [(1.0,10.0),(2.0,20.0),(3.0,30.0),(4.0,40.0)]

The modules under Control.* are often very useful. These are just your functions, modified a little bit. This may seem like overkill to you, but as a beginner, you would profit tremendously if you internalize the functions in the Control.* space. At the very least, understand the functions used here. (I would certainly recommend your definition of diff over this one, but the point here is one of demonstration.)

diff :: DeltaPair -> DeltaPair -> Double
diff d = uncurry (-) . first (flip hypothesis d)
  where hypothesis x = uncurry (+) . first (* x)

The scaledDiffs can be written in two ways: one by recognizing that it is actually a zipWith, and the other by using the functions in Control.Arrow to manipulate the pair. The second gets you a pointfree form. Use whichever you feel comfortable with, but understand both.
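As a small illustration that the two forms agree (the diffs list here is hand-picked, and thetaFunc projects the x-coordinate, as in descendThetaOne):

```haskell
import Control.Arrow (second)

-- Illustrative values only, not taken from the reviewed code.
thetaFunc :: (Double, Double) -> Double
thetaFunc = fst

diffs :: [Double]
diffs = [91, 82, 73, 64]

ts :: [(Double, Double)]
ts = [(1,10), (2,20), (3,30), (4,40)]

viaZipWith, viaArrow :: [Double]
viaZipWith = zipWith (\x t -> x * thetaFunc t) diffs ts          -- direct zipWith
viaArrow   = map (uncurry (*) . second thetaFunc) (zip diffs ts) -- Arrow pointfree
-- both are [91.0, 164.0, 219.0, 256.0]
```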

I have replaced genericLength with fromIntegral . length because it seems from your training set that the set is not very large. If that is not the case, change it back to genericLength.
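For illustration, the two spellings give the same value; genericLength produces any Num result directly, while length returns an Int that then needs converting:

```haskell
import Data.List (genericLength)

n1, n2 :: Double
n1 = genericLength [10, 20, 30]           -- any Num result, no conversion step
n2 = fromIntegral (length [10, 20, 30])   -- Int length, converted afterwards
-- n1 == n2 == 3.0
```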

descendTheta :: PairFunc -> PairFunc -> [DeltaPair] -> Double -> DeltaPair -> Double
descendTheta thetaFunc deltaFunc ts lr deltas = deltaFunc deltas - sSum
  where scaleFactor = lr / fromIntegral (length ts)
        -- scaledDiffs ts = zipWith (\x y -> x * thetaFunc y) (diffs ts) ts
        scaledDiffs ts = map (uncurry (*) . second thetaFunc) $ zip (diffs ts) ts
        diffs = map (diff deltas)
        sSum = scaleFactor * sum (scaledDiffs ts)

It is often hard to decide which definitions go on the top level. A good rule of thumb is to see if the function being defined can be reused elsewhere. If it can't, then it is probably better off as a sub-definition under a where clause.

singleDescend :: [DeltaPair] -> Double -> DeltaPair -> DeltaPair
singleDescend ts lr deltas = (fn descendThetaZero, fn descendThetaOne)
  where fn f = f ts lr deltas
        descendThetaZero = descendTheta (const 1) fst
        descendThetaOne = descendTheta fst snd

We should always be on the lookout for general functions; they help us tremendously in refactoring code. Here is one such function.

napply :: Int -> (c -> c) -> c -> c
napply n = ((!! n) .) . iterate

descend :: (DeltaPair -> DeltaPair) -> DeltaPair -> Int -> DeltaPair
descend = flip . flip napply

deltas n = descend descentFunc (100, 1) n
  where descentFunc = singleDescend trainingSet 0.1
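For instance, napply applied to a simple doubling function (example is mine, just to show the mechanics):

```haskell
napply :: Int -> (c -> c) -> c -> c
napply n = ((!! n) .) . iterate

-- three doublings of 1: iterate (*2) 1 = [1,2,4,8,...], and index 3 picks 8
example :: Int
example = napply 3 (*2) 1   -- 8
```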

A few unit tests:

tests = TestList [ TestLabel "1" test1, TestLabel "2" test2, TestLabel "3" test3 ]
test1 = TestCase $ assertEqual "A1" (100.0, 1.0) (deltas 0)
test2 = TestCase $ assertEqual "A1" (92.25,-17.25) (deltas 1)
test3 = TestCase $ assertEqual "A1" (89.8375,-19.875) (deltas 2)

tt = runTestTT tests

These two do not seem to be used anywhere:

{-
squareDiff :: Num a => (a, a) -> (a, a) -> a
squareDiff = ((^ 2) .) . diff

costFunc :: Fractional a => [(a, a)] -> (a, a) -> a
costFunc ts d = scaleFactor * sum squares
  where scaleFactor = 1.0 / (2 * fromIntegral (length ts))
        squares = map (squareDiff d) ts
-}

One obvious change would be to add type signatures; I additionally introduced some type synonyms to distinguish between all those Doubles. A bit of explanation of the algorithm would also help - I don't know what it actually wants to achieve. (I am no programmer, so not understanding happens quite often for me.)

Next thing: it seems costFunc is never used - why is it there?

And I've replaced genericLength with length, converting the result - as the library documentation says, length is faster.

type TrainData = [(Double,Double)]
type LearnRate = Double
type DeltaPair = (Double,Double)

trainingSet :: TrainData
trainingSet = [ (1.0,10.0),(2.0,20.0),(3.0,30.0),(4.0,40.0) ]

hypothesis :: DeltaPair -> Double -> Double
hypothesis (d1,d2) x = d1 + (d2 * x)

costFunc :: TrainData -> DeltaPair -> Double
costFunc ts d = scaleFactor * sum squares
  where scaleFactor = 1.0 / (2 * fromIntegral (length ts))
        squares = map (square . diff d) ts
        square x = x * x

diff :: DeltaPair -> (Double, Double) -> Double
diff d (x,y) = hypothesis d x - y

descendTheta :: ((Double, Double) -> Double) -> (DeltaPair -> Double)
             -> TrainData -> LearnRate -> DeltaPair -> Double
descendTheta thetaFunc deltaFunc ts lr deltas =
    deltaFunc deltas - (scaleFactor * sum scaledDiffs)
  where scaleFactor = lr / fromIntegral (length ts)
        diffs = map (diff deltas) ts
        scaledDiffs = map (\(x,y) -> x * thetaFunc y) $ zip diffs ts

descendThetaZero :: TrainData -> LearnRate -> (Double, Double) -> Double
descendThetaZero = descendTheta (const 1) fst

descendThetaOne :: TrainData -> LearnRate -> (Double, Double) -> Double
descendThetaOne = descendTheta fst snd

singleDescend :: TrainData -> LearnRate -> DeltaPair -> DeltaPair
singleDescend ts lr deltas = (thetaZero, thetaOne)
  where thetaZero = descendThetaZero ts lr deltas
        thetaOne = descendThetaOne ts lr deltas

descend :: (DeltaPair -> DeltaPair) -> DeltaPair -> Int -> DeltaPair
descend f deltas i
  | i < 0     = error "no negative numbers"
  | i == 0    = deltas
  | otherwise = descend f (f deltas) (i-1)

main :: IO ()
main = print (descend descendFunc (100, 1) 10000)
  where descendFunc = singleDescend trainingSet 0.1

      This was the simplest rewrite of your code that I could come up with:

{-# LANGUAGE UnicodeSyntax #-}
import Data.List

trainingSet = [(1, 10), (2, 20), (3, 30), (4, 40)]

hypothesis (δ1, δ2) x = δ1 + δ2 * x

diff δs (x, y) = hypothesis δs x - y

average xs = realToFrac (sum xs) / genericLength xs

descendθ θFunc ts lr δs = lr * average (map (\t -> diff δs t * θFunc t) ts)

next ts lr δs@(δ0, δ1) = (δ0', δ1')
  where δ0' = δ0 - descendθ (\(x, y) -> 1) ts lr δs
        δ1' = δ1 - descendθ (\(x, y) -> x) ts lr δs

descend f δs i = iterate f δs !! i

      Some general comments:

      You don't need to put .0s on your numeric literals to indicate they are Doubles like you would in C. A type signature will suffice.
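For example, a signature alone is enough to pin the literals to Double (this is just the training set from above, rewritten):

```haskell
-- the signature fixes the literal types; no .0 suffixes needed
trainingSet :: [(Double, Double)]
trainingSet = [(1, 10), (2, 20), (3, 30), (4, 40)]
```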

I used the Unicode syntax extension for Greek variable names to make it easier on my eyes.

      Your descend function is just iterate in disguise.
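To see the equivalence, here are the recursive loop and the iterate form side by side (names are mine, monomorphised for the demo):

```haskell
-- the question's recursive loop
descendRec :: (a -> a) -> a -> Int -> a
descendRec f x i
  | i == 0    = x
  | otherwise = descendRec f (f x) (i - 1)

-- the same thing via iterate: build the infinite orbit, index into it
descendIter :: (a -> a) -> a -> Int -> a
descendIter f x i = iterate f x !! i

-- e.g. descendRec (*2) (1 :: Int) 5 == 32 == descendIter (*2) 1 5
```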

Your variable naming in your descendTheta function was incredibly confusing, since you name your output variables theta even though they are basically the same as your input variables, which you named delta. I decided to name them both delta for consistency, leaving theta to refer only to the projection of the data set that you are descending on.

Your scaledDiffs function was an incredibly awkward and round-about way to do what was essentially a single map and average. Unfortunately, the average function is not in the Haskell standard library, but the definition I included in the code is the most general version and will type-check for anything you would want to average.
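To show the generality, the same average definition (repeated here with its inferred signature) handles both integral and fractional inputs:

```haskell
import Data.List (genericLength)

-- works for any Real element type, producing any Fractional result
average :: (Real a, Fractional b) => [a] -> b
average xs = realToFrac (sum xs) / genericLength xs

intMean, doubleMean :: Double
intMean    = average [1, 2, 3 :: Int]       -- 2.0
doubleMean = average [2.5, 3.5 :: Double]   -- 3.0
```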

      Your training set and learning rate are constant over the entire run, so you could actually improve the above code even more by using the Reader monad to thread those variables through your code without explicit parameter-passing. If you don't know what the Reader monad is, just ask and I will update this post with a demonstration of it. It's very easy to use.
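Here is a minimal sketch of what that could look like, assuming the mtl package; Config, step, and the field names are my own invented names, not anything from the post above:

```haskell
import Control.Monad.Reader

-- hypothetical environment holding the run-wide constants
data Config = Config { trainSet :: [(Double, Double)], rate :: Double }

hypothesis :: (Double, Double) -> Double -> Double
hypothesis (d0, d1) x = d0 + d1 * x

diff :: (Double, Double) -> (Double, Double) -> Double
diff ds (x, y) = hypothesis ds x - y

-- one descent step; the training set and learning rate are read from the
-- environment instead of being passed as explicit parameters
step :: (Double, Double) -> Reader Config (Double, Double)
step ds@(d0, d1) = do
  ts <- asks trainSet
  lr <- asks rate
  let n      = fromIntegral (length ts)
      grad f = lr * sum [ diff ds t * f t | t <- ts ] / n
  return (d0 - grad (const 1), d1 - grad fst)
```

Running one step with `runReader (step (100, 1)) (Config [(1,10),(2,20),(3,30),(4,40)] 0.1)` should match the explicit-parameter version's first iteration.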

      Also, I disagree that you must put type signatures on your code. For application code that only you will maintain, using type signatures is mostly a matter of style. For libraries, though, or code other people have to maintain, I would strongly recommend type signatures.

  • "Very nice and elegant implementation. You could replace (\(x, y) -> 1) with const 1 and (\(x, y) -> x) with fst; otherwise +1" – Commented Jun 25, 2012 at 5:36
  • "@blufox Yeah, I wrote it that way to document what those two functions were doing by using the same variable names as the training set data, because it was difficult for me to understand at first." – Commented Jun 25, 2012 at 13:47
