A gentle introduction to Machine Learning and Artificial Intelligence



Artificial Intelligence (AI) deals with the capability of computers to predict the result of some phenomena based on past or current information. Machine learning (ML) is the group of algorithms and mathematical equations that make AI possible. The relation between the two can be summed up as: Artificial Intelligence is the practical and useful (to humans) application of Machine Learning. The foundations of ML are set in Mathematics, more specifically Statistics.

Consider this made-up equation:

M = 0.3 x V1 + 0.41 x V2 + 0.35 x V3 + 0.37 x V4 + 0.5 x V5

This equation computes the value of the variable M. Now let's say that this equation predicts your mood tomorrow. If M > 2, your mood will be good; otherwise it will be sour.



The variables V1 through V5  are defined as:

V1: A value for weather today (1 for Rainy, 2 for Cloudy, 3 for Sunny)
V2: If you exercised today (0 for no exercise, 1 for exercise)
V3: If you watched a good movie today (0 for no good movie, 1 for good movie)
V4: If you had a headache today (0 for headache, 1 for no headache)
V5: Result of your stock market trade today (1 for profit, 0 for no trading or no net gain or loss, -1 for loss)

Let us predict your mood when today was a cloudy day but you still exercised, ended the day with a movie, had no headache and made some money in the stock market. Putting corresponding values in our equation:

M = 0.3 x 2 + 0.41 x 1 + 0.35 x 1 + 0.37 x 1 + 0.5 x 1 =  2.23

Since M > 2, you are likely to be in a good mood tomorrow.
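
The same calculation can be sketched in a few lines of Python. This is only an illustration of the made-up equation above; the names WEIGHTS, THRESHOLD, mood_score and predict_mood are my own and not part of any library.

```python
# Weights from the made-up mood equation in this post.
WEIGHTS = {"V1": 0.3, "V2": 0.41, "V3": 0.35, "V4": 0.37, "V5": 0.5}
THRESHOLD = 2.0  # M > 2 means a good mood


def mood_score(features):
    """Return M, the weighted sum of the five feature values."""
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


def predict_mood(features):
    """Turn the score into 'good' or 'sour' using the threshold."""
    return "good" if mood_score(features) > THRESHOLD else "sour"


# Cloudy day, exercised, watched a good movie, no headache, made a profit.
today = {"V1": 2, "V2": 1, "V3": 1, "V4": 1, "V5": 1}
print(round(mood_score(today), 2), predict_mood(today))
```

Changing any value in the today dictionary (for example setting V5 to -1 for a trading loss) shows how a single feature can push M across the threshold.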

This equation, albeit simple, demonstrates some important concepts of AI and ML, which I highlight below:


Machine Learning
The method used to get the values which are multiplied by V1 through V5. In this case these values (also called weights) can be estimated by the multi-variate regression method.

Artificial Intelligence
The application of predicting your mood. If I were to market it with a jazzy name I would call it the ‘Artificial Intelligence Mood Predictor’. Machine Learning used with training data results in Artificial Intelligence.

Model
The linear equation M = 0.3 x V1 + 0.41 x V2 + 0.35 x V3 + 0.37 x V4 + 0.5 x V5.

Target Variable
The variable M.

Features or Input variables
Variables V1, V2, V3, V4 and V5.

Training data
Several past known values of V1, V2, V3, V4, V5 and M. The more values you know, the better the ML algorithm will do in figuring out the values of the weights.

Hypothesis
Same as Model.

Hypothesis space
List of all possible models. Each combination of weight values is a different model, and each will predict with a different accuracy. The goal of an ML algorithm is to find the best possible hypothesis for the given training data. Here are some possible hypotheses for our given problem:

M = 0.2 x V1 + 0.4 x V2 + 0.6 x V3 + 0.55 x V4 + 0.2 x V5
M = 0.35 x V1 + 0.3 x V2 + 0.12 x V3 + 0.7 x V4 + 0.1 x V5
M = 0.5 x V1 + 0.16 x V2 + 0.45 x V3 + 0.37 x V4 + 0.43 x V5

Discrete Variable
Type of variable whose values can be selected from a finite number of values. Generally any variable whose value is a string type or an integer type is a discrete variable.

Continuous Variable
Type of variable whose values can be selected from an infinite number of values. Generally any variable whose value is of a real type  is a continuous variable.

Classification Problem
In our equation, M can have values between -0.2 and 2.53 (the lowest value comes from a rainy day with a trading loss and zeros elsewhere: 0.3 x 1 - 0.5 x 1 = -0.2). However, by applying a comparison operator after computing M we have reduced it to only two categories (good or sour mood). Hence, the problem is a classification problem, as it ‘classifies’ the mood into two categories. As a general rule, if the target variable is of a discrete type then the problem is a classification problem.

Regression Problem
If we just use the value of M (between -0.2 and 2.53), then this problem is a regression problem. Note, however, that each of V1 through V5 can take only a few fixed values, so M has only a finite number of possible values between -0.2 and 2.53. True regression problems can have an infinite number of possible target values, which is only possible when some of the feature variables have infinitely many possible values. As a general rule, if the target variable is of a continuous type then the problem is a regression problem.
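
To tie the terms above together, here is a small sketch. It rests on a loud assumption: the training data is synthetic, built by listing every allowed feature combination and labelling it with the model from this post, as if that model were the true relationship. It then scores the three alternative hypotheses listed under Hypothesis space by mean squared error. The function names are my own.

```python
from itertools import product

# Allowed values of each feature, as defined earlier in the post.
FEATURE_VALUES = [
    (1, 2, 3),    # V1: weather
    (0, 1),       # V2: exercise
    (0, 1),       # V3: good movie
    (0, 1),       # V4: no headache
    (-1, 0, 1),   # V5: stock market result
]

MODEL = (0.3, 0.41, 0.35, 0.37, 0.5)

# The three alternative hypotheses listed under "Hypothesis space".
CANDIDATES = [
    (0.2, 0.4, 0.6, 0.55, 0.2),
    (0.35, 0.3, 0.12, 0.7, 0.1),
    (0.5, 0.16, 0.45, 0.37, 0.43),
]


def score(weights, features):
    """Target variable M for one set of feature values."""
    return sum(w * v for w, v in zip(weights, features))


def mean_squared_error(weights, data):
    """Average squared gap between a hypothesis and the known M values."""
    return sum((score(weights, x) - m) ** 2 for x, m in data) / len(data)


# Synthetic training data (an assumption): every combination, labelled by MODEL.
training_data = [(x, score(MODEL, x)) for x in product(*FEATURE_VALUES)]

# Regression view: the raw scores. Classification view: two discrete labels.
scores = [m for _, m in training_data]
labels = ["good" if m > 2 else "sour" for m in scores]
print(len(scores), round(min(scores), 2), round(max(scores), 2))
print(labels.count("good"), "good,", labels.count("sour"), "sour")

# A lower error means a better hypothesis for this training data.
for weights in [MODEL] + CANDIDATES:
    print(weights, round(mean_squared_error(weights, training_data), 4))
```

The first line of output gives the number of feature combinations and the smallest and largest value M can take. Because the data was generated from MODEL itself, MODEL scores an error of zero here; with real observations no hypothesis would be expected to fit perfectly.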

What AI and ML provide is not something new. Throughout the ages we humans have been doing predictions to better understand the world we live in. Every scientific and mathematical equation is a prediction that tells us the value of a target, based on some input features. 


For example, say you want to remodel your rectangular kitchen and put in new tiles. Each tile is 12” x 12” in dimension. You need to predict the number of tiles that will be sufficient to remodel your kitchen.




Based on the work done by ancient mathematicians this problem statement can be given as:

N = L x B

Where N is the number of tiles, and L and B are the length and breadth of the kitchen (in feet). Since each tile covers exactly one square foot, the number of tiles equals the floor area in square feet. N is the target variable and L and B are feature variables. All variables are continuous. Hence, if L=15 and B=20 the predicted value of N is:

N = 15 x 20 = 300.
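
As a quick sketch, the tile count can be written as a small Python function. The function name and the rounding rule are my own assumptions: partial tiles are rounded up along each side, which reduces to N = L x B when the tiles are 12 inches and the dimensions are whole feet.

```python
import math


def tiles_needed(length_ft, breadth_ft, tile_side_in=12):
    """Count square tiles needed to cover a rectangular floor.

    Assumption: partial tiles are rounded up along each side, which
    reduces to N = L x B for 12-inch tiles and whole-foot dimensions.
    """
    tile_side_ft = tile_side_in / 12
    tiles_along_length = math.ceil(length_ft / tile_side_ft)
    tiles_along_breadth = math.ceil(breadth_ft / tile_side_ft)
    return tiles_along_length * tiles_along_breadth


print(tiles_needed(15, 20))  # the 15 ft x 20 ft kitchen from the example
```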

Consider another equation:

P = nRT / V

Here we want to find the pressure of a gas (P) based on its temperature (T) and volume (V); n is the number of moles of the gas and R is the ideal gas constant.



If you are filling gas into a container of volume V that can sustain 100 psi of pressure and want to know if it is safe to fill the gas, you will run this equation. Based on the value of P (a regression problem), you will classify whether the container will explode (if P > 100 psi) or not (if P ≤ 100 psi). This equation is the famous Ideal Gas law, which is used to predict the behavior of ideal gases.
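
The two steps, computing P and then classifying the result, might look like this in Python. It is a sketch under stated assumptions: SI units go in (moles, kelvin, cubic metres), the result is converted to psi, the fill values are hypothetical, and the function names are my own. It also ignores the difference between absolute and gauge pressure.

```python
R = 8.314462618        # ideal gas constant in J/(mol K)
PA_PER_PSI = 6894.757  # pascals in one psi


def pressure_psi(n_moles, temperature_k, volume_m3):
    """Regression step: P = nRT / V, converted from pascals to psi."""
    pressure_pa = n_moles * R * temperature_k / volume_m3
    return pressure_pa / PA_PER_PSI


def is_safe(n_moles, temperature_k, volume_m3, rating_psi=100.0):
    """Classification step: compare the predicted pressure to the rating."""
    return pressure_psi(n_moles, temperature_k, volume_m3) <= rating_psi


# Hypothetical fill: 10 mol of gas at 300 K in a 0.05 cubic metre container.
p = pressure_psi(10, 300.0, 0.05)
verdict = "safe" if is_safe(10, 300.0, 0.05) else "exceeds the rating"
print(round(p, 1), "psi:", verdict)
```

Doubling the amount of gas to 20 mol in the same container doubles the predicted pressure and flips the classification.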

Neither of the equations above was derived through machine learning; both came from the tireless efforts of several mathematicians and scientists across several centuries. The formula for area was developed by ancient mathematicians, while the Ideal Gas law grew out of the work of Robert Boyle, Jacques A. C. Charles, and Joseph Gay‐Lussac between the 17th and 19th centuries (Ref.).

Can both these equations, and others, be derived through machine learning? The answer is yes! In the first case, by taking measurements of several kitchens and then placing tiles in them, we build up training data that can be used to train a model to predict the values of N. For the Ideal Gas equation, by taking several measurements with different gases in several containers at different temperatures, and then measuring the value of the target variable P, we can build training data that can be used to predict values of P based on the values of the variables n, T and V. Note that the equations that come out of an ML algorithm will look quite different from the ones we used. In another blog, I will demonstrate how we derive the Pythagoras theorem using ML.
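
As a rough sketch of the kitchen case, the code below fits a model to a handful of measurements and recovers something equivalent to N = L x B. Several things here are my own assumptions rather than a method described in this post: the measurements are hypothetical and generated from N = L x B, and the model is a least-squares fit on logarithms, log N = a x log L + b x log B, chosen because a product becomes a sum after taking logs.

```python
import math

# Hypothetical measurements: (length_ft, breadth_ft, tiles_used).
# They are generated from N = L x B here purely for illustration.
kitchens = [(8, 10), (10, 25), (12, 9), (15, 20), (20, 12), (25, 14)]
measurements = [(length, breadth, length * breadth) for length, breadth in kitchens]


def fit_log_linear(rows):
    """Least-squares fit of log N = a*log L + b*log B (no intercept)."""
    sxx = sxy = syy = sxz = syz = 0.0
    for length, breadth, tiles in rows:
        x, y, z = math.log(length), math.log(breadth), math.log(tiles)
        sxx += x * x
        sxy += x * y
        syy += y * y
        sxz += x * z
        syz += y * z
    # Solve the 2x2 normal equations with Cramer's rule.
    det = sxx * syy - sxy * sxy
    a = (sxz * syy - sxy * syz) / det
    b = (sxx * syz - sxy * sxz) / det
    return a, b


def predict_tiles(a, b, length, breadth):
    """Predict N from the fitted exponents."""
    return math.exp(a * math.log(length) + b * math.log(breadth))


a, b = fit_log_linear(measurements)
print(round(a, 3), round(b, 3), round(predict_tiles(a, b, 15, 20)))
```

With noise-free data the fitted exponents come out at essentially 1 and 1, which is N = L x B written in a different form. With real, noisy measurements the exponents would only be close to 1, which is one reason a learned equation looks different from the textbook one.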

Needless to say, building the training data for these two cases will be a very time consuming and laborious process.  On top of that, we have to account for another problem which impacts all AI applications.

The Accuracy of Prediction

The accuracy of a value predicted by an ML derived equation cannot be guaranteed. Generally each predicted value comes with a probability score showing the confidence level of the value.
In the mood predictor equation, for example, we can never say for sure that each time M > 2, the mood will be good. There will be instances when M > 2 and the mood is sour. We can attach a probability score saying that if M > 2, the mood will be good with a probability of 90%. The same will be the case when we use ML to predict the number of tiles or the gas pressure. Since those are regression problems, we can say that the predicted value will be within a tolerance range of, say, ±10% of the actual value.
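
Both ideas can be sketched in Python. The history below is made up purely for illustration, and the function names are my own: one function estimates how often M > 2 really went with a good mood, and the other checks whether a regression prediction falls within a ±10% tolerance of the actual value.

```python
def within_tolerance(predicted, actual, tolerance=0.10):
    """True when the prediction is within +/- tolerance of the actual value."""
    return abs(predicted - actual) <= tolerance * abs(actual)


def good_mood_rate(records, threshold=2.0):
    """Fraction of past days with M > threshold on which the mood was good."""
    outcomes = [was_good for m, was_good in records if m > threshold]
    return sum(outcomes) / len(outcomes)


# Made-up history: (predicted M, whether the mood actually turned out good).
history = [
    (2.23, True), (2.53, True), (2.10, True), (2.40, False), (2.05, True),
    (1.80, False), (1.20, False), (1.95, True), (0.60, False), (2.30, True),
]

print(round(good_mood_rate(history), 2))
print(within_tolerance(predicted=310, actual=300))
```

In this made-up history, five of the six days with M > 2 ended in a good mood, so the estimated confidence is about 83% rather than a guarantee.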

Whether a given range of tolerance is acceptable depends on the specific application where the equation is applied. For example, when you are filling a tank with nitrogen gas and need to maximize the capacity, you will want to be 100% sure of the volume of gas at which the pressure will exceed the rating of the tank. A ±10% tolerance can lead to excess pressure and a tank explosion, which can injure or kill humans. For other cases, like predicting your mood, a wrong prediction does not lead to serious consequences and AI can be used.

Increasing the accuracy of the predicted value is a challenge in ML and is one of the main tasks of data scientists, who try different ML models and reshuffle the training data. Some models can reach very high accuracy, such as 99.99%, on the data they are evaluated on, but 100% can never be guaranteed.

The fundamental difference between causation and correlation is the cause of this problem. ML models work by finding correlations between feature variables and the target variables. If a correlation is found and validated to exist all the time, scientists can take over and find if there is a valid causation behind the observed phenomena of correlation. To determine the causation, new scientific principles are found and a mathematical equation is derived which is based on solid evidence and backed by a rigorous mathematical proof.

Hence, while ML models can give a reasonably accurate answer, they are not a replacement for formally derived mathematical equations which are backed by formal proofs and require tireless efforts by scientists and mathematicians across multiple generations.

Due to the effort involved, generating formal proofs is a very expensive activity and may depend on scientific principles which are yet to be discovered. Using ML models offers a much cheaper alternative, which allows us to model all observed phenomena (which can be expressed in terms of numbers) in any field of work. There is a sacrifice in terms of guaranteed accuracy, but in several applications that is an acceptable trade-off relative to the benefits of AI and ML.

Love or hate this article? Have other ideas? Please leave a comment below!

