Sunday, June 5, 2016

Simple Gradient Descent in C Tutorial

This post is about how to implement gradient descent quickly, dirtily, and simply in C. It assumes you have some working knowledge of what gradient descent is supposed to do, but I will explain as much as possible as we go along (I don't know it well myself). I will be using GCC (this comes with Linux, which is what I am using as well. Scary!) and GEdit, which is just Notepad for Linux. So all you need to follow this is some basic Linux commands. Or, the C code can be adapted for Windows or whatever Apple operating system you use. I don't even know what Apple operating systems are called and I made an iOS game once! Maybe it is called iOS... Anyways, let's get started.

Gradient descent is used in machine learning for all types of applications. If you want to know more about it, go google it; I recommend the Coursera course on Machine Learning. For this tutorial, all you need to know about gradient descent, and all we will be implementing functionality-wise, is simple: we will have a set of data points (x, y) and we will find the line that fits these points the best. Very excitement!

So, as an example (and what we will use), let's create a list of points that roughly follow the function f(x) = x. If you don't know what f(x) = x looks like, I will tell you: it is a 45-degree line going through the origin of a plot. Here is a picture of f(x):

[Plot: f(x) = x, a straight 45-degree line through the origin]


Mini-tutorial: To get Pinta, which is the image editor I used to make this fascinating image, open a terminal and type "sudo apt-get install pinta". Then search for "pinta" at the top left in Ubuntu and you can use it. Or google how to get GIMP, which is fancier.

So f(x) = x is just a straight line through the origin. If you haven't gotten the idea yet, when x = 10, f(x) = 10, and so on. If you haven't done math past high school, it is the same as y = x. f(x) is called a function and I am getting sidetracked...

With machine learning, the machine (which is the computer) needs data to "train" itself on. We will use a small amount of data to train our program: values that are close to points that actually fit f(x) = x. Instead of a data point (2, 2), for instance, we will use (2.4, 2.2). The reason I am doing this is so the line won't be exactly f(x) = x; I am trying to mimic real-world cases where the data won't be perfect. Here is the entire data set we will be training on. Notice that each point is a little bit "off" of the actual function we are mimicking.

1.2 1.1
2.4 2.2
3.1 3.3
4.4 4.3
5.3 5.4
6.3 6.7
7.6 7.7
8.7 8.6
9.7 9.8

Data set (x, y)

The left column holds our x values and the right column our y values. Hopefully you can see that this closely resembles f(x) = x. To further explain, these might be measurements you'd make of f(x) in the real world, where there is some error. Copy and paste the above data into a text file and save it as "data.dat".
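(Side note: if you would rather generate a noisy data set like this yourself instead of copying mine, a few lines of C will do it. This is just a sketch of one way, not part of the tutorial's code, and MakeData.c is a made-up name:)

//gcc MakeData.c -o MakeData
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* print 9 noisy points along f(x) = x; save them with: ./MakeData > data.dat
       rand() is not seeded here, so you get the same "random" points every run */
    int i;
    for (i = 1; i <= 9; i++) {
        /* jitter each coordinate by something in [-0.4, 0.4] */
        float jx = ((rand() % 9) - 4) / 10.0f;
        float jy = ((rand() % 9) - 4) / 10.0f;
        printf("%.1f %.1f\n", i + jx, i + jy);
    }
    return 0;
}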

Now, if you haven't already set up a file for the C code, go ahead and open a new text file and save it as "GradientDescent.c". Both of these files should be in the same folder or else this won't work. Now we will be redundant and add this data set to our C program. (Note that at this point, you should have a C program that can run. If you are using my setup listed above, you can copy and paste the code I have. Otherwise, hopefully you know how to add this to your own setup and aren't an idiot!) Ok, let's write some code:

//gcc GradientDescent.c -o GradientDescent
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define GNUPLOT "gnuplot -persist"

int main(int argc, char **argv) {

    /*training set
    1.2 1.1
    2.4 2.2
    3.1 3.3
    4.4 4.3
    5.3 5.4
    6.3 6.7
    7.6 7.7
    8.7 8.6
    9.7 9.8*/
    float x[] = {1.2, 2.4, 3.1, 4.4, 5.3, 6.3, 7.6, 8.7, 9.7};
    float y[] = {1.1, 2.2, 3.3, 4.3, 5.4, 6.7, 7.7, 8.6, 9.8};

    return 0;
}

Explanation: The first line of the code is a comment. It is a reminder to me of how to compile this code, because I always forget. So if you want to test that everything is working, navigate in your terminal to the directory with the two files you have created and type that commented line (without the "//"). It should create an executable that you can run by typing "./GradientDescent" and hitting enter. It won't do anything visible yet if you do, though.

The next lines are headers for the libraries we will be using: stdio.h for printf, fprintf, and popen, and stdlib.h for exit. As far as I can tell string.h isn't actually used, but keep it there for now until we get this thing working.

The line that says "#define" is a remnant from whoever's code I was looking at when I adapted this. It will be used later to call the plotting software in Linux to plot our data set and the line we are going to find. If you don't like that it is there, you can replace "GNUPLOT" later on with "gnuplot -persist". The #define directive simply replaces every occurrence of the first term with the second when the program compiles. For example, "#define PI 3.14" will change every term "PI" to 3.14 when it compiles. Google it if it isn't clear.
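To see the substitution concretely, here is the line where we use GNUPLOT later in this program, before and after the preprocessor runs:

/* what we write: */
gp = popen(GNUPLOT, "w");
/* what the compiler actually sees: */
gp = popen("gnuplot -persist", "w");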

The next part is a comment. Comments in C can start with "//" or be wrapped in "/* */". I put the data set in a comment so I could see it while I typed out the next part, which is the most important part we are adding: two arrays with our data values. Be careful to check that they are exactly the same as the data in our data.dat file. These could be loaded from the data.dat file, but to keep it simple (and because I would have to look it up), we are being redundant here and typing them twice. If you want to load the data into the arrays instead, there is a sketch of one way below. For now, just leave it be.
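(Here is that sketch. I haven't folded it into the tutorial's program, and it assumes x and y are declared as plain float x[9], y[9]; arrays rather than with the initializer lists above:)

/* sketch: fill x[] and y[] from data.dat instead of hard-coding the values */
FILE *fp = fopen("data.dat", "r");
if (fp == NULL) {
    printf("Could not open data.dat\n");
    return 1;
}
int n = 0;
while (n < 9 && fscanf(fp, "%f %f", &x[n], &y[n]) == 2) {
    n++;   /* n ends up as the number of points read (our m below) */
}
fclose(fp);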

Now on to gradient descent. I am going to puke the meat of the code out and explain it. So here it is:

float x[] = {1.2, 2.4, 3.1, 4.4, 5.3, 6.3, 7.6, 8.7, 9.7};
float y[] = {1.1, 2.2, 3.3, 4.3, 5.4, 6.7, 7.7, 8.6, 9.8};

/*number of examples*/
int m = 9;

/*thetas and temp thetas*/
float theta0, temp0, theta1, temp1;
theta0 = 0.0; temp0 = 0.0;
theta1 = 0.0; temp1 = 0.0;

/*# of iterations and learning rate*/
int iterations = 1000;
float alpha = 0.03;

int j = 0;
float h0 = 0.0;
float h1 = 0.0;
int i = 0;
for (i = 0; i < iterations; i++)
{
    /* reset the error sums for this pass */
    h0 = 0.0;
    h1 = 0.0;
    for (j = 0; j < m; j++)
    {
        /* accumulate the prediction errors over all m points */
        h0 = h0 + ((theta0 + x[j]*theta1) - y[j]);
        h1 = h1 + ((theta0 + x[j]*theta1) - y[j])*x[j];
    }
    /* average the sums, scale by alpha, and step the thetas */
    temp0 = theta0 - (alpha*h0)/(float)m;
    temp1 = theta1 - (alpha*h1)/(float)m;
    theta0 = temp0;
    theta1 = temp1;
}

printf("Theta0: %f\n", theta0);
printf("Theta1: %f\n", theta1);


Explanation: The first lines are our arrays of data points. We just discussed those. Next we define the number of training examples; this is how many data points we have. If my counting is correct, there are nine. We will call it m because I am a filthy and sloppy coder; also, that is the standard variable name for the number of examples. The reason we need it is that gradient descent averages the error over all of the examples, which is why you see the divisions by m in the update lines.

iterations is the variable that holds how many times we will run the gradient descent update, and alpha is the learning rate. Iterations means we will loop over the code below that many times to find our thetas. Whoa, back up. Too many Greek letters.

Above, where it says theta0 and theta1, think of these as the unknowns we are solving for. Remember that we are trying to mimic f(x) = x. Do you remember the equation for a line, y = mx + b? We are looking for (or rather creating) a line of the form y = (theta0) + (theta1)(x), so theta0 plays the role of b (the y-intercept) and theta1 plays the role of m (the slope). This will be our line that fits the data points. Gradient descent uses a fancy algorithm (or rather is a fancy algorithm) to find these values of theta so we can find that line and plot it.
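Once the thetas are found, evaluating that line at any x is a one-liner. A tiny sketch (predict is my made-up name; the tutorial code just does this inline):

/* evaluate the fitted line at a point: y = b + m*x with b = theta0, m = theta1 */
float predict(float theta0, float theta1, float x) {
    return theta0 + theta1*x;
}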

There is a lot more to this and I don't fully understand it all myself. But let's keep going.

There are two loops going on here. The first is the iterations loop. It runs a lot of times (1000) because we want to use the computer's speed to apply the gradient descent update over and over until we find a really good fit; doing this by hand would be out of the question. The second loop is a bit more complicated, but the update it builds is written out just below.
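Written out as formulas, each pass of the outer loop applies the standard batch gradient descent update for a line, which is exactly what the code does: the inner loop accumulates the two sums into h0 and h1, then the temp lines divide by m and scale by alpha:

theta0 := theta0 - alpha * (1/m) * sum over j of ((theta0 + theta1*x[j]) - y[j])
theta1 := theta1 - alpha * (1/m) * sum over j of ((theta0 + theta1*x[j]) - y[j]) * x[j]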

Gradient descent works by minimizing the error between the data set and the line we want. Deriving those updates involves calculus, so I won't go into that here. In fact, this is my first post, so I won't go into any of it here. Here is a picture that kind of illustrates what I mean:

[Plot for explanation: data points marked as X's with the fitted red line through them]

The X's are the data set and the red line is what we are finding. See how it goes through the X's as a best fit? It is determining which line to create while taking into account all of the data, correcting itself on every iteration until it is as close to all of the points as possible.
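For reference, the thing being minimized is the mean squared error between the line and the points. In the Coursera course mentioned above, it is written as the cost function:

J(theta0, theta1) = (1/(2m)) * sum over j of ((theta0 + theta1*x[j]) - y[j])^2

The updates in the loop above are just this function's derivatives with respect to theta0 and theta1, scaled by the learning rate alpha.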

Take-away: I will post the full code below. Here are some things I know that might be useful if you are trying to implement this:

Alpha is very sensitive, and changing it can make gradient descent not work. If alpha is too large, the thetas can overshoot and blow up instead of converging; if it is too small, convergence will be very slow. In this example the current setting works fine, but if you are using your own data set you might have to play around with it.

Here is the rest of the code that follows from above:

printf("Theta0: %f\n", theta0);
printf("Theta1: %f\n", theta1);

        FILE *gp;
        gp = popen(GNUPLOT,"w");
        if (gp==NULL)
           {
             printf("Error\n");
             exit(0);
           }
    fprintf(gp, "set xrange [0:12]\n");
    fprintf(gp, "set yrange [0:12]\n");
    fprintf(gp, "plot 'data.dat' using 1:2,");
    fprintf(gp, "%f+%f*x\n",theta0, theta1);
    fclose(gp);

This code opens up the plotter mentioned above and plots the data set as X's along with the line found with gradient descent. fprintf is a print command that is "printing" to the plotter program. If you look at the last two fprintf calls you can see what I was talking about: together they tell the plotting program to plot a line of the form "b+m*x" (a straight line). The slope and the y-intercept are the theta values we found; gradient descent finds both of these for us using the data set. So if you wanted to evaluate the line yourself (and you can do this to test it out) at, say, x = 9, you could put:

printf("%f+%f*9\n", theta0, theta1);

This should print a value close to 9. (Careful: printf("%f+%f*9\n", theta0, theta1) would just print the formula as text rather than evaluate it; that format is only useful for handing the formula to gnuplot, like the code above does.) If you run the program, notice how theta0 is close to 0 and theta1 is close to 1. With those values, theta0 + theta1*x is very close to just being x. Here is the complete code for GradientDescent.c:

//gcc GradientDescent.c -o GradientDescent
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define GNUPLOT "gnuplot -persist"

int main(int argc, char **argv) {

    /*training set
    1.2 1.1
    2.4 2.2
    3.1 3.3
    4.4 4.3
    5.3 5.4
    6.3 6.7
    7.6 7.7
    8.7 8.6
    9.7 9.8*/
    float x[] = {1.2, 2.4, 3.1, 4.4, 5.3, 6.3, 7.6, 8.7, 9.7};
    float y[] = {1.1, 2.2, 3.3, 4.3, 5.4, 6.7, 7.7, 8.6, 9.8};

    /*number of examples*/
    int m = 9;

    /*thetas and temp thetas*/
    float theta0, temp0, theta1, temp1;
    theta0 = 0.0; temp0 = 0.0;
    theta1 = 0.0; temp1 = 0.0;

    /*iterations and learning rate HIGHLY VARIABLE*/
    int iterations = 1000;
    float alpha = 0.03;

    int j = 0;
    float h0 = 0.0;
    float h1 = 0.0;
    int i = 0;
    for (i = 0; i < iterations; i++)
    {
        /* reset the error sums for this pass */
        h0 = 0.0;
        h1 = 0.0;
        for (j = 0; j < m; j++)
        {
            /* accumulate the prediction errors over all m points */
            h0 = h0 + ((theta0 + x[j]*theta1) - y[j]);
            h1 = h1 + ((theta0 + x[j]*theta1) - y[j])*x[j];
        }
        /* average the sums, scale by alpha, and step the thetas */
        temp0 = theta0 - (alpha*h0)/(float)m;
        temp1 = theta1 - (alpha*h1)/(float)m;
        theta0 = temp0;
        theta1 = temp1;
    }

    printf("Theta0: %f\n", theta0);
    printf("Theta1: %f\n", theta1);

    FILE *gp;
    gp = popen(GNUPLOT, "w");
    if (gp == NULL)
    {
        printf("Error opening gnuplot\n");
        exit(1);
    }
    fprintf(gp, "set xrange [0:12]\n");
    fprintf(gp, "set yrange [0:12]\n");
    fprintf(gp, "plot 'data.dat' using 1:2,");
    fprintf(gp, "%f+%f*x\n", theta0, theta1);
    pclose(gp);

    return 0;
}

Here is a screenshot of what your plot should look like:

[Screenshot: gnuplot window showing the data points plotted as X's and the fitted line through them]
Hopefully you can modify the code for your own needs; this is just a simple example. If you are on Linux, you can put the two files in the same folder, compile, and run, and that should get you started.

9 comments:

  1. On these lines:
    temp0 = theta0 - (alpha*h0)/(float)m;
    temp1 = theta1 - (alpha*h1)/(float)m;
    theta0 = temp0;
    theta1 = temp1;

    wouldn't this be easier?
    theta0 -= (alpha*h0)/(float)m;
    theta1 -= (alpha*h1)/(float)m;

    btw this tutorial is pretty good man keep it up

    1. Yes, it would be easier. Good catch. I stored them as temps when I was putting it together and never changed it. And thanks.

  2. Do you have a method to have this work on Windows? I'm trying to run this, but I'm having trouble with gnuplot.

    1. I don't have a solution for Windows, at least not for the plotting part. This will still give you the values of theta and you can make predictions, but you're right that gnuplot is a problem on Windows. It all depends on what IDE you are using as well. Have you looked for a plotting plugin for what you are using? What are you using to compile it?

    2. I haven't looked for a plotting plugin. I'm using CodeBlocks, not sure of the specific compiler. I did write the values of theta0 and theta1 to a text file and then plot them in R, however, just to check the rate of convergence.

    3. I did find a way to plot it, but it is in Java (or Visual Basic if you want to get real dirty). The Java code is almost exactly the same as the code above and it sounds like you would understand it. It uses a library called JavaPlot that invokes GNUPlot. I'm not super familiar with it, but I made a post about it here: http://filthysloppycoder.blogspot.com/2016/06/plotting-gradient-descent-graphically.html It's still a simple example. I'd be curious to see what results you found, because I kept the learning rate and iterations the same as when I was testing; I didn't think to change them since it runs fast enough on my laptop anyway. It's been a while since I learned gradient descent, so I've forgotten the ways to check the rate of convergence. I just set iterations super high to be safe, not efficient.

    4. I found a way to plot with CodeBlocks. The easiest way is to use Codeblocks-EP, which is a different version. I am going to write up a tutorial on it and will paste the link when I am done. It was a pain to figure out, but I have it working and the steps aren't all that hard.

    5. Ok, try following this: http://filthysloppycoder.blogspot.com/2016/06/simple-gradient-descent-example-using.html

    6. It appears that 1000 iterations was near ideal, as was a learning rate of 0.03. A learning rate of 0.05 with 1000 iterations took a while before it started to converge, and a learning rate of 0.01 required more than 1000 iterations to converge. The 0.01 learning rate did not converge until nearly 5000 iterations.

      Thanks for this and your other tutorial! I had not heard of koolplot before, but it seems very simple and useful.
