Deriving Pythagoras' theorem using Machine Learning


In an earlier article introducing machine learning, we saw a comparison of equations derived from scientific principles vs. those derived via machine learning. In this article, we will see how Dythagoras, a citizen of the faraway planet Darth in the Dilky Day galaxy, uses machine learning to derive a correlation between the lengths of the 3 sides of a right-angled triangle.


Dythagoras - resemblance to Pythagoras not intentional


Unlike humans, the citizens of Darth (Darthians) are not very bright mathematically, but they are capable of making observations and building tools. They had been handed a computer and the Dasmic machine learning library by the human astronauts of a SpaceX expedition that visited them a few months ago.

The citizens of Darth need to shorten the travel time between two of their biggest cities, Aville and Cville. Currently the only way to go to Cville from Aville is via Bville, which takes several hours. The Darthians need to know the direct distance between Aville and Cville so they can decide whether a bridge can be built, but since the direct path crosses water filled with deadly predators, they can't measure the distance easily. However, they do know that Aville, Bville and Cville form the corners of a right-angled triangle. The task of finding the distance between Aville and Cville is assigned to Dythagoras, who is among their smartest citizens.


Using his measuring equipment, Dythagoras measures the distance between Aville and Bville to be 450000 meters, and that between Bville and Cville to be 650000 meters. Dythagoras has always been very interested in right-angled triangles and has an intuition that there is some correlation between the adjacent, opposite and hypotenuse sides of a right-angled triangle. Devoid of any mathematical intellect beyond observational capabilities, Dythagoras will now try to figure out this correlation using the human-provided computer and the Dasmic machine learning library.


First, Dythagoras draws some right-angled triangles in his backyard and takes their measurements. These discrete observations will form the training set. Being impatient and inexperienced with machine learning, he draws only 5 triangles and measures their 3 sides, noting them down in the following table (all units are in meters):

 Adjacent   Opposite   Hypotenuse
 1.000      1.000      1.414
 1.000      2.000      2.236
 1.000      3.000      3.162
 2.000      2.000      2.828
 2.000      3.000      3.606
Next, he writes some code in C#.NET to pass this data to the multi-variate linear regression (MVLR) algorithm of Dasmic. Since the data set is small, he uses the DataSetCompact class available in Dasmic. The code for the function is shown below (code in other classes to launch the console app is not shown).



using System;
using Dasmic.MLLib.Common.DataManagement;
using Dasmic.MLLib.Algorithms.Regression;
using Dasmic.MLLib.Algorithms.NeuralNetwork;

namespace DemoAppConsole
{
 internal class Pythagoras
 {
  public void Run_5_Samples()
  {
   string[] headers = { "Adjacent", "Opposite", "Hypotenuse" };
   DataSetCompact dsc = new DataSetCompact(headers, 2, -1);
   dsc.AddSingleRow(new double[] { 1, 1, 1.414213562 });
   dsc.AddSingleRow(new double[] { 1, 2, 2.236067977 });
   dsc.AddSingleRow(new double[] { 1, 3, 3.16227766 });
   dsc.AddSingleRow(new double[] { 2, 2, 2.828427125 });
   dsc.AddSingleRow(new double[] { 2, 3, 3.605551275 });

   Dasmic.MLLib.Common.MLCore.BuildBase build = new BuildLinearMultiVariable();
   Dasmic.MLLib.Common.MLCore.ModelBase model = build.BuildModel(dsc.GetAllDataRows(),
      dsc.GetAllAttributeHeaders(), dsc.GetIdxTargetAttribute());
   //Print the coeffs
   Console.WriteLine(model.ToString());
   Console.WriteLine("RMSE:" + model.GetModelRMSE(dsc.GetAllDataRows()));

   PrintTable(dsc, model); //Function to write values in HTML
  }
 }
}



Running the above code results in the equation:

H = 0.518810899066669 * X0 + 0.854650469199999 * X1 + 0.0427412288666704

where:
H is the predicted value of the hypotenuse,
X0 is the adjacent side,
X1 is the opposite side,
and the last numeric value is the bias term.
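For readers without access to Dasmic, the same fit can be reproduced with an ordinary least-squares solve. A minimal sketch in Python using NumPy (this is not Dasmic's code, just the same underlying arithmetic on the 5 training triangles):

```python
import numpy as np

# Dythagoras's five training triangles: adjacent, opposite, hypotenuse
data = np.array([
    [1, 1, 1.414213562],
    [1, 2, 2.236067977],
    [1, 3, 3.16227766],
    [2, 2, 2.828427125],
    [2, 3, 3.605551275],
])

# Design matrix with a column of ones for the bias term
X = np.column_stack([data[:, 0], data[:, 1], np.ones(len(data))])
y = data[:, 2]

# Ordinary least squares gives the MVLR coefficients
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)  # ≈ [0.5188, 0.8547, 0.0427]
```

The solution matches the coefficients in the equation above, which suggests Dasmic's MVLR is solving the same least-squares problem.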

Dythagoras wants to see how this equation performs in predicting the hypotenuse of the triangles he drew and used to train the model. He wants to see the difference between the predicted and measured (observed) values. He also expresses this difference as a % of the measured value to get an idea of how divergent the prediction is. The sign of the percentage shows whether the predicted value is more (positive) or less (negative) than the measured value.

The results shown in the table below surprise Dythagoras. The equation generated by the MVLR algorithm in Dasmic does a fairly good job of predicting the hypotenuse. The maximum % deviation between the observed and predicted values is 1.556%.

 Adjacent   Opposite   Hypotenuse   Predicted Hypotenuse   % Deviation
 1.000      1.000      1.414        1.416                   0.141
 1.000      2.000      2.236        2.271                   1.556
 1.000      3.000      3.162        3.126                  -1.163
 2.000      2.000      2.828        2.790                  -1.370
 2.000      3.000      3.606        3.644                   1.075

To further confirm the result, Dythagoras computes the Root Mean Square Error (RMSE) of the predicted values. The RMSE is 0.0334, which is low, and Dythagoras knows that the lower the RMSE, the better.
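The RMSE can be checked by hand. A short Python sketch of the same arithmetic (not Dasmic's GetModelRMSE, just the definition applied to the 5 training triangles):

```python
import math

# Coefficients of the equation derived above
a, b, bias = 0.518810899066669, 0.854650469199999, 0.0427412288666704

# (adjacent, opposite, measured hypotenuse)
samples = [
    (1, 1, 1.414213562),
    (1, 2, 2.236067977),
    (1, 3, 3.16227766),
    (2, 2, 2.828427125),
    (2, 3, 3.605551275),
]

# RMSE = sqrt(mean of squared (predicted - observed))
sq_err = [(a * x0 + b * x1 + bias - h) ** 2 for x0, x1, h in samples]
rmse = math.sqrt(sum(sq_err) / len(samples))
print(round(rmse, 4))  # 0.0334
```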

Excited and even more curious, Dythagoras draws two more triangles to see how well the equation above predicts their hypotenuses. The data from these two triangles was not used to derive the equation, so this is a good test of how well the equation performs on data it was not trained on.

 Adjacent   Opposite   Hypotenuse   Predicted Hypotenuse   % Deviation
 2.000      4.000      4.472        4.499                   0.600
 3.000      3.000      4.243        4.163                  -1.874
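Evaluating on held-out data takes only a few lines. A Python sketch of the check, using the first equation's coefficients and the signed % deviation formula 100 * (predicted - observed) / observed:

```python
# Coefficients from the 5-sample MVLR equation
a, b, bias = 0.518810899066669, 0.854650469199999, 0.0427412288666704

# The two held-out triangles: (adjacent, opposite, measured hypotenuse)
held_out = [(2, 4, 4.472135955), (3, 3, 4.242640687)]

deviations = []
for x0, x1, h in held_out:
    pred = a * x0 + b * x1 + bias
    dev = 100 * (pred - h) / h  # signed % deviation from the measurement
    deviations.append(round(dev, 3))
    print(f"{pred:.3f}  {dev:+.3f}%")
# -> 4.499  +0.600%
# -> 4.163  -1.874%
```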

Hmm, thinks Dythagoras, not bad, but 1.874% is a higher deviation than he had seen before.

Motivated by this result, Dythagoras spends an hour drawing 3 more triangles, extending his original data to a total of 10 observed measurements, shown below.

 Adjacent   Opposite   Hypotenuse
 1.000      1.000      1.414
 1.000      2.000      2.236
 1.000      3.000      3.162
 2.000      2.000      2.828
 2.000      3.000      3.606
 2.000      4.000      4.472
 3.000      3.000      4.243
 3.000      4.000      5.000
 3.000      5.000      5.831
 3.000      6.000      6.708

Dythagoras modifies the above code, adds this extra data to the DataSetCompact object, and re-runs the MVLR algorithm in Dasmic.

This time he gets the equation:

H = 0.544446143361904 * X0 + 0.834771484942861 * X1 + 0.0519642054285723

where:
H is the predicted value of the hypotenuse,
X0 is the adjacent side,
X1 is the opposite side,
and the last numeric value is the bias term.

Dythagoras again computes the difference between the predicted and measured (observed) values for all the data observations, with the results shown below:

 Adjacent   Opposite   Hypotenuse   Predicted Hypotenuse   % Deviation
 1.000      1.000      1.414        1.431                   1.200
 1.000      2.000      2.236        2.266                   1.337
 1.000      3.000      3.162        3.101                  -1.946
 2.000      2.000      2.828        2.810                  -0.637
 2.000      3.000      3.606        3.645                   1.099
 2.000      4.000      4.472        4.480                   0.175
 3.000      3.000      4.243        4.190                  -1.250
 3.000      4.000      5.000        5.024                   0.488
 3.000      5.000      5.831        5.859                   0.484
 3.000      6.000      6.708        6.694                  -0.213

Hmm... interesting! The triangle that had a 1.874% deviation is now down to 1.25%, but the deviation of one earlier triangle has increased from 1.163% to 1.946%, while that of another has dropped from 1.556% to 1.337%. The RMSE of 0.0337 is slightly worse than the earlier value of 0.0334, but not bad at all given that the number of data points has doubled.

Curious, Dythagoras wants to give it a shot with more data and see if the predictions can be improved further. After spending 2 hours drawing 5 more triangles and taking their measurements, Dythagoras's final table of all observations looks like:

 Adjacent   Opposite   Hypotenuse
 1.000      1.000      1.414
 1.000      2.000      2.236
 1.000      3.000      3.162
 2.000      2.000      2.828
 2.000      3.000      3.606
 2.000      4.000      4.472
 3.000      3.000      4.243
 3.000      4.000      5.000
 3.000      5.000      5.831
 3.000      6.000      6.708
 3.000      7.000      7.616
 4.000      4.000      5.657
 4.000      5.000      6.403
 4.000      6.000      7.211
 4.000      7.000      8.062

Dythagoras re-runs the MVLR algorithm of Dasmic with the new data added in the DataSetCompact object.

He gets the equation:

H = 0.547693920329669 * X0 + 0.836551018428572 * X1 + 0.0450441335494489

where:
H is the predicted value of the hypotenuse,
X0 is the adjacent side,
X1 is the opposite side,
and the last numeric value is the bias term.

Dythagoras again computes the difference between predicted and observed values for all the data observations, with the following results:

 Adjacent   Opposite   Hypotenuse   Predicted Hypotenuse   % Deviation
 1.000      1.000      1.414        1.429                   1.066
 1.000      2.000      2.236        2.266                   1.331
 1.000      3.000      3.162        3.102                  -1.894
 2.000      2.000      2.828        2.814                  -0.527
 2.000      3.000      3.606        3.650                   1.235
 2.000      4.000      4.472        4.487                   0.324
 3.000      3.000      4.243        4.198                  -1.057
 3.000      4.000      5.000        5.034                   0.687
 3.000      5.000      5.831        5.871                   0.685
 3.000      6.000      6.708        6.707                  -0.012
 3.000      7.000      7.616        7.544                  -0.943
 4.000      4.000      5.657        5.582                  -1.323
 4.000      5.000      6.403        6.419                   0.241
 4.000      6.000      7.211        7.255                   0.610
 4.000      7.000      8.062        8.092                   0.365

These results look good! The worst-case deviation is 1.894%, a reduction from the 1.946% seen earlier. The RMSE is 0.0413, which is higher than before, but then there are also more data observations. Dythagoras now feels confident that he does not need more observations, as adding more training data only leads to slight changes in RMSE and % deviations.

Before using the last equation to compute the distance between Aville and Cville, Dythagoras wants to see if another algorithm can get better results. The bright humans who gave him the computer and the Dasmic library were quite gung-ho about neural networks (NN), so he writes the following code to run a simple NN in Dasmic.




using System;
using Dasmic.MLLib.Common.DataManagement;
using Dasmic.MLLib.Algorithms.Regression;
using Dasmic.MLLib.Algorithms.NeuralNetwork;

namespace DemoAppConsole
{
 internal class Pythagoras
 {
  public void Run_15_Samples_NN()
  {
   string[] headers = { "Adjacent", "Opposite", "Hypotenuse" };
   DataSetCompact dsc = new DataSetCompact(headers, 2, -1);
   dsc.AddSingleRow(new double[] { 1, 1, 1.414213562 });
   dsc.AddSingleRow(new double[] { 1, 2, 2.236067977 });
   dsc.AddSingleRow(new double[] { 1, 3, 3.16227766 });
   dsc.AddSingleRow(new double[] { 2, 2, 2.828427125 });
   dsc.AddSingleRow(new double[] { 2, 3, 3.605551275 });

   dsc.AddSingleRow(new double[] { 2, 4, 4.472135955 });
   dsc.AddSingleRow(new double[] { 3, 3, 4.242640687 });
   dsc.AddSingleRow(new double[] { 3, 4, 5 });
    dsc.AddSingleRow(new double[] { 3, 5, 5.830951895 });
   dsc.AddSingleRow(new double[] { 3, 6, 6.708203932 });

   dsc.AddSingleRow(new double[] { 3, 7, 7.615773106 });
   dsc.AddSingleRow(new double[] { 4, 4, 5.656854249 });
   dsc.AddSingleRow(new double[] { 4, 5, 6.403124237 });
   dsc.AddSingleRow(new double[] { 4, 6, 7.211102551 });
   dsc.AddSingleRow(new double[] { 4, 7, 8.062257748 });

   Build2LBackPropagation build = new Build2LBackPropagation();
    //Threshold of .0001, max 10000 iterations, step size of .001
    build.SetParameters(0, .0001, 10000, .001);
   //Change the output layer activation function from Sigmoid to Linear
   build.SetActivationFunction(1, new Dasmic.MLLib.Algorithms.NeuralNetwork.Support.ActivationFunction.Linear());
   Dasmic.MLLib.Common.MLCore.ModelBase model = build.BuildModel(dsc.GetAllDataRows(),
   dsc.GetAllAttributeHeaders(), dsc.GetIdxTargetAttribute());

   //Print the coeffs
   Console.WriteLine(model.ToString());
   Console.WriteLine("RMSE:" + model.GetModelRMSE(dsc.GetAllDataRows()));

   PrintTable(dsc, model); //Function to write values in HTML
  }
 }
}
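The code above relies on Dasmic's Build2LBackPropagation. The essence of such a two-layer network — a sigmoid hidden layer feeding a linear output unit, trained by backpropagation — can be sketched in plain Python with NumPy. This is a toy stand-in, not Dasmic's implementation; the hidden-layer size, seed, learning rate, and iteration count are arbitrary choices, and exact side lengths stand in for Dythagoras's measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# The 15 observed triangles: (adjacent, opposite) -> hypotenuse
X = np.array([[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [2, 4], [3, 3],
              [3, 4], [3, 5], [3, 6], [3, 7], [4, 4], [4, 5], [4, 6],
              [4, 7]], dtype=float)
y = np.linalg.norm(X, axis=1)  # exact hypotenuse lengths

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 sigmoid units, one linear output unit
W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)

def forward(X):
    h = sigmoid(X @ W1 + b1)
    return h, (h @ W2 + b2).ravel()

def rmse(pred):
    return np.sqrt(np.mean((pred - y) ** 2))

_, pred0 = forward(X)
start = rmse(pred0)

lr = 0.01
for _ in range(20000):  # plain batch gradient descent
    h, pred = forward(X)
    err = (pred - y)[:, None] / len(X)   # gradient of half-MSE w.r.t. output
    gW2 = h.T @ err; gb2 = err.sum(0)
    dh = err @ W2.T * h * (1 - h)        # backpropagate through the sigmoid
    gW1 = X.T @ dh; gb1 = dh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
print(f"RMSE before: {start:.4f}, after: {rmse(pred):.4f}")
```

Training reduces the RMSE from its random starting point, but how low it gets depends heavily on the architecture and parameters, which mirrors the trouble Dythagoras runs into below.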



The differences between the predicted and measured (observed) values are shown below:

 Adjacent   Opposite   Hypotenuse   Predicted Hypotenuse   % Deviation
 1.000      1.000      1.414        1.669                  18.002
 1.000      2.000      2.236        2.276                   1.775
 1.000      3.000      3.162        3.024                  -4.370
 2.000      2.000      2.828        2.717                  -3.944
 2.000      3.000      3.606        3.543                  -1.746
 2.000      4.000      4.472        4.461                  -0.260
 3.000      3.000      4.243        4.096                  -3.456
 3.000      4.000      5.000        5.040                   0.804
 3.000      5.000      5.831        5.970                   2.378
 3.000      6.000      6.708        6.818                   1.638
 3.000      7.000      7.616        7.541                  -0.983
 4.000      4.000      5.657        5.614                  -0.754
 4.000      5.000      6.403        6.501                   1.527
 4.000      6.000      7.211        7.276                   0.901
 4.000      7.000      8.062        7.912                  -1.860

Ouch! With a maximum deviation of 18.002% and an RMSE of 0.1155, this specific NN is not suited to the problem. Dythagoras would have liked to try other NN architectures, but he is confident that the RMSE from MVLR is low enough to give a good prediction, and he does not have much time left; he needs to find the distance between Aville and Cville soon. Hence, he goes ahead with the last equation derived from MVLR:

H = 0.547693920329669 * X0 + 0.836551018428572 * X1 + 0.0450441335494489

Based on this equation, the distance between Aville and Cville comes to 790220.471 meters. A happy Dythagoras passes this information to his fellow citizens and instantly becomes a living legend.

For earthlings: the distance between Aville and Cville via Pythagoras' theorem (which states hypotenuse² = adjacent² + opposite²) is 790569.415 meters. The % deviation between the actual and predicted values is 0.044%, which is quite remarkable! As a fun exercise, try the other two equations derived from MVLR and see what % deviation you get.
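The final numbers are easy to verify with a few lines of Python (math.hypot computes the exact Pythagorean hypotenuse):

```python
import math

# Final MVLR coefficients
a, b, bias = 0.547693920329669, 0.836551018428572, 0.0450441335494489

adjacent, opposite = 450_000, 650_000  # meters: Aville-Bville, Bville-Cville

predicted = a * adjacent + b * opposite + bias
actual = math.hypot(adjacent, opposite)  # sqrt(adjacent² + opposite²)

print(f"{predicted:.3f}")  # 790220.471
print(f"{actual:.3f}")     # 790569.415
print(f"{100 * (actual - predicted) / actual:.3f}%")  # 0.044%
```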

Love or hate this article? Have other ideas? Please leave a comment below!

(Images source: royalty free pictures from paid account in dreamstime.com)

