Saturday, June 18, 2016

Improving the fit of gradient descent using better functions

This is a continuation of my previous post, "simple gradient descent example using Codeblocks-EP in C language".

Here is an example of trying to fit a linear function "y = theta1*x + theta0" to a data set that mimics y = x^2.

The data set:

float x[] = {1,2,3,4,5,6,7,8,9,10};
float y[] = {1,4,9,16,25,36,49,64,81,100};

As you can see, the training set suggests that the data should follow y=x^2

However, when we fit a linear function and plot it, the result is wildly inaccurate for most values of x. It is a terrible fit:

[plot of the linear fit]

Not good. Now let's change our hypothesis function to a quadratic and see if we get a better fit. The inner update loop currently reads:

for (j = 0; j < m; j++)
{
    h0 = h0 + ((theta0 + x[j]*theta1) - y[j]);
    h1 = h1 + ((theta0 + x[j]*theta1) - y[j])*x[j];
}

Here we loop over all of the training examples and accumulate the update terms for a linear hypothesis. By adding another x[j] factor, we can hopefully get a quadratic fit that follows the data more closely. So we will replace it with:

for (j = 0; j < m; j++)
{
    h0 = h0 + ((theta0 + x[j]*x[j]*theta1) - y[j]);
    h1 = h1 + ((theta0 + x[j]*x[j]*theta1) - y[j])*x[j];
}

(Strictly speaking, the partial derivative of the squared-error cost with respect to theta1 for the hypothesis theta0 + theta1*x^2 carries a factor of x^2, so an exact gradient step would multiply by x[j]*x[j] in the h1 line as well, and would in turn need an even smaller learning rate. The update as written still drives both sums to zero at theta0 = 0, theta1 = 1, so for this data set it ends up at the right fit.)

Now let's see the plot and the values for theta0 and theta1:

[plot of the quadratic fit and the printed theta values]

A much better fit. Note that the learning rate and the number of iterations had to be changed as well. We went from:

int iterations = 1000;
float alpha = 0.03;

to more iterations and a smaller learning rate (with the quadratic hypothesis, alpha = 0.03 fails to converge):

int iterations = 10000;
float alpha = 0.003;

Here is the full code:

#include "koolplot.h"

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char **argv) {

    /* training set: y = x^2 for x = 1..10 */
    float x[] = {1,2,3,4,5,6,7,8,9,10};
    float y[] = {1,4,9,16,25,36,49,64,81,100};

    /* number of examples */
    int m = 10;

    /* thetas and temp thetas */
    float theta0 = 0.0, temp0 = 0.0;
    float theta1 = 0.0, temp1 = 0.0;

    /* iterations and learning rate -- HIGHLY VARIABLE */
    int iterations = 10000;
    float alpha = 0.003;

    int i, j;
    float h0, h1;

    for (i = 0; i < iterations; i++)
    {
        h0 = 0.0;
        h1 = 0.0;
        for (j = 0; j < m; j++)
        {
            h0 = h0 + ((theta0 + x[j]*x[j]*theta1) - y[j]);
            h1 = h1 + ((theta0 + x[j]*x[j]*theta1) - y[j])*x[j];
        }

        /* simultaneous update of both parameters */
        temp0 = theta0 - (alpha*h0)/(float)m;
        temp1 = theta1 - (alpha*h1)/(float)m;
        theta0 = temp0;
        theta1 = temp1;
    }

    printf("Theta0: %f\n", theta0);
    printf("Theta1: %f\n", theta1);

    Plotdata xx(0.0, 10.0), yy = theta0 + xx*xx*theta1;
    plot(xx, yy);

    return 0;
}










