, thanks for the reply and your interest in the subject of neural networks.
Writing neural networks is currently a skill, so every new problem must be thoroughly analyzed, and the corresponding code usually comes from a lot of trial and error (there is really no out-of-the-box code that solves everything efficiently). The problem you present above is much more complicated than the one in this tutorial. I see it as a problem that requires the network to quantitatively identify numbers (instead of just performing a boolean test like the sample problem). For this reason, the sample code in the tutorial must be adjusted.
That being said, your problem allowed me to study the issue of having a network understand some rudimentary quantification. The following network creator code successfully creates a network that identifies the number 5 as the first number in a 2-number sequence (after some possible failed attempts, since I also experimented with weight reshuffling, and only for the first 5 integers).
Sequences from 1,1 to 5,5 (25 possibilities) make up the statistical universe:
Code: Select all
SetBatchLines, -1
; The code below does a lot of matrix calculations. This is important mostly as a means of organization: we would need far too many loose variables if we did not use matrices, so we are better off using them.
; We start by initializing random numbers into the weight variables (this simulates a first hypothesis of a solution and allows the training to begin).
; Since we are planning to have a first layer with 25 neurons that have 2 inputs each and a second layer with 1 neuron that has 25 inputs, we need a total of 75 initial hypotheses (random weights).
Loop 75
{
Random, Weight_%A_Index%, -1.0, 1.0
}
WEIGHTS_1 := Object()
NEXT_NUMBER := 1
Loop 2
{
CURRENT_ROW := A_index
Loop 25
{
NUMBER_TO_USE := NEXT_NUMBER
NEXT_NUMBER++
WEIGHTS_1[CURRENT_ROW, A_Index] := Weight_%NUMBER_TO_USE%
}
}
WEIGHTS_2 := Object()
Loop 25
{
NUMBER_TO_USE := NEXT_NUMBER
NEXT_NUMBER++
WEIGHTS_2[A_index, 1] := Weight_%NUMBER_TO_USE%
}
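; A quick shape check (derived from the code above): TRAINING_INPUTS will be 15x2 and WEIGHTS_1 is 2x25,
; so layer 1's output is a 15x25 matrix (one row per sample, one column per neuron).
; WEIGHTS_2 is 25x1, so layer 2's output is 15x1: one prediction per training sample.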
TRAINING_INPUTS := array([5,1],[1,1],[4,1],[1,4],[1,5],[2,4],[2,3],[3,3],[5,3],[3,4],[3,5],[5,5],[5,2],[2,2],[4,2]) ; 15 out of 25 possible cases are used as the training set.
EXPECTED_OUTPUTS := array([1],[0],[0],[0],[0],[0],[0],[0],[1],[0],[0],[1],[1],[0],[0])
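; Note the labeling rule: the expected output is 1 when the first number of the pair is 5, and 0 otherwise (e.g. [5,3] -> 1, [3,5] -> 0).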
; Below we are declaring a number of objects that we will need to hold our matrices.
OUTPUT_LAYER_1 := Object(), OUTPUT_LAYER_2 := Object(), OUTPUT_LAYER_1_DERIVATIVE := Object(), OUTPUT_LAYER_2_DERIVATIVE := Object(), LAYER_1_DELTA := Object(), LAYER_2_DELTA := Object(), OLD_INDEX := 0
Loop 1000 ; This is the training loop (the network creator code). In this loop we recalculate the weights to approximate the desired results based on the samples. We will do 1,000 training cycles (care must be taken not to overtrain!).
{
; First, we calculate an output from layer 1. This is done by multiplying the inputs and the weights.
OUTPUT_LAYER_1 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(TRAINING_INPUTS, WEIGHTS_1))
; Then we calculate a derivative (rate of change) for the output of layer 1.
OUTPUT_LAYER_1_DERIVATIVE := DERIVATIVE_OF_SIGMOID_OF_MATRIX(OUTPUT_LAYER_1)
; Next, we calculate the outputs of the second layer.
OUTPUT_LAYER_2 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(OUTPUT_LAYER_1, WEIGHTS_2))
; And then we also calculate a derivative (rate of change) for the outputs of layer 2.
OUTPUT_LAYER_2_DERIVATIVE := DERIVATIVE_OF_SIGMOID_OF_MATRIX(OUTPUT_LAYER_2)
; Next, we check the errors of layer 2. Since layer 2 is the last, this is just the difference between the calculated results and the expected results.
LAYER_2_ERROR := DEDUCT_MATRICES(EXPECTED_OUTPUTS, OUTPUT_LAYER_2)
; Now we calculate a delta for layer 2. A delta is a rate of change: how much a change will affect the results.
LAYER_2_DELTA := MULTIPLY_MEMBER_BY_MEMBER(LAYER_2_ERROR, OUTPUT_LAYER_2_DERIVATIVE)
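; Worked example for one sample: if the expected output is 1 and the network currently outputs 0.8,
; the error is 0.2, the derivative is 0.8 * (1 - 0.8) = 0.16, and so the delta is 0.2 * 0.16 = 0.032.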
; Then we transpose the matrix of weights of layer 2 (this is just to allow matrix multiplication; we are only rearranging the dimensions of the matrix).
WEIGHTS_2_TRANSPOSED := TRANSPOSE_MATRIX(WEIGHTS_2)
; !! IMPORTANT !!
; So, we multiply (matrix multiplication) the delta (rate of change) of layer 2 by the transposed matrix of weights of layer 2.
; This gives us a matrix that represents the error of layer 1 (REMEMBER: the error of layer 1 is measured by the rate of change of layer 2).
; It may seem counter-intuitive at first that the error of layer 1 is calculated solely from layer 2 quantities, but you have to interpret this line alongside the line below.
LAYER_1_ERROR := MULTIPLY_MATRICES(LAYER_2_DELTA, WEIGHTS_2_TRANSPOSED)
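; Shape check: LAYER_2_DELTA is 15x1 and WEIGHTS_2_TRANSPOSED is 1x25, so LAYER_1_ERROR comes out 15x25,
; matching OUTPUT_LAYER_1. Each layer-1 neuron receives a share of the error proportional to its weight into layer 2.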
; Thus, when we calculate the delta (rate of change) of layer 1, we finally connect the layer 2 quantities (by means of LAYER_1_ERROR) to the layer 1 quantities (by means of OUTPUT_LAYER_1_DERIVATIVE).
; The rates of change (deltas) are the key to understanding multi-layer neural networks. Their calculation answers this: if I change the weights of layer 1 by X, how much will it change layer 2's output?
; This delta defines the adjustment of the weights of layer 1 a few lines below...
LAYER_1_DELTA := MULTIPLY_MEMBER_BY_MEMBER(LAYER_1_ERROR, OUTPUT_LAYER_1_DERIVATIVE)
; Then we transpose the matrix of training inputs (again, just rearranging the dimensions to allow matrix multiplication).
TRAINING_INPUTS_TRANSPOSED := TRANSPOSE_MATRIX(TRAINING_INPUTS)
; Finally, we calculate how much we have to adjust the weights of layer 1. The delta of layer 1 combined with the inputs we used this time is the key here.
ADJUST_LAYER_1 := MULTIPLY_MATRICES(TRAINING_INPUTS_TRANSPOSED, LAYER_1_DELTA)
; Another matrix transposition to better suit multiplication...
OUTPUT_LAYER_1_TRANSPOSED := TRANSPOSE_MATRIX(OUTPUT_LAYER_1)
; And finally, we also calculate how much we have to adjust the weights of layer 2. The delta of layer 2 combined with the inputs of layer 2 (which are really the outputs of layer 1) is the key here.
ADJUST_LAYER_2 := MULTIPLY_MATRICES(OUTPUT_LAYER_1_TRANSPOSED,LAYER_2_DELTA)
; And then we adjust the weights to approximate the intended results.
WEIGHTS_1 := ADD_MATRICES(WEIGHTS_1, ADJUST_LAYER_1)
WEIGHTS_2 := ADD_MATRICES(WEIGHTS_2, ADJUST_LAYER_2)
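; (Note: adding the full adjustment is equivalent to using a learning rate of 1. Scaling ADJUST_LAYER_1 and
; ADJUST_LAYER_2 by a small factor before adding would give finer control over the training, but is not done here.)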
; The conditional below is just to display the current progress in the training loop.
If (A_Index >= OLD_INDEX + 10)
{
TrayTip, Status:, % "TRAINING A NEW NETWORK: " . Round(A_Index / 10, 0) . "`%"
OLD_INDEX := A_Index
}
}
; TESTING OUR OUTPUT NETWORK!
; The loop below will evaluate whether our calculated network is accurate enough to predict all possible cases. If not, the network will be dropped and the script reloaded (this is to avoid losing too much time on bad sets of initial weights...).
RIGHT_ANSWERS := 0
WRONG_ANSWERS := 0
Loop 5
{
NUMBER_1 := A_index
Loop 5
{
NUMBER_2 := A_Index
CASE := Array([NUMBER_1,NUMBER_2])
OUT_1 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(CASE, WEIGHTS_1))
OUT_2 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(OUT_1, WEIGHTS_2))
If ((NUMBER_1 = 5) AND (OUT_2[1,1] >= 0.5))
{
RIGHT_ANSWERS++
}
Else If ((NUMBER_1 = 5) AND (OUT_2[1,1] < 0.5))
{
WRONG_ANSWERS++
}
Else If ((NUMBER_1 < 5) AND (OUT_2[1,1] < 0.5))
{
RIGHT_ANSWERS++
}
Else If ((NUMBER_1 < 5) AND (OUT_2[1,1] >= 0.5)) ; >= so the 0.5 edge case is also counted as a wrong answer.
{
WRONG_ANSWERS++
}
}
}
; Now, if we ended up with a perfect network, it's time to prove its power!
; The code below will apply the network to solve every possible case (25 possibilities) and present the net's conclusions individually.
If (RIGHT_ANSWERS < 25)
{
TrayTip, Status:, % "BAD NETWORK !! " . RIGHT_ANSWERS . "/" . (RIGHT_ANSWERS + WRONG_ANSWERS) . " answers right. Reloading in 3 secs..."
Sleep 3000
Reload
}
Else
Loop 5
{
NUMBER_1 := A_index
Loop 5
{
NUMBER_2 := A_index
CASE := Array([NUMBER_1,NUMBER_2])
OUT_1 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(CASE, WEIGHTS_1))
OUT_2 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(OUT_1, WEIGHTS_2))
If (OUT_2[1,1] >= 0.5) ; >= to match the threshold used in the test loop above.
{
ANSWER := " is 5"
}
else
{
ANSWER := " is NOT 5"
}
msgbox % "The final network thinks the first number of [" . NUMBER_1 . "," . NUMBER_2 . "]" . ANSWER
}
}
RETURN ; aaaand that's it !! :D The logical part of the ANN code ends here (the results are displayed above). Below are just the bodies of the functions that do the math (matrix multiplication, sigmoid function, etc.), but you can have a look at them if you want; I will provide some explanation there too.
; The function below applies the sigmoid function to a single value and returns the result.
Sigmoid(x)
{
return 1 / (1 + exp(-1 * x))
}
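; Worked examples: Sigmoid(0) = 0.5, Sigmoid(2) ~ 0.881, Sigmoid(-2) ~ 0.119 (the output always falls between 0 and 1).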
Return
; The function below applies the derivative of the sigmoid function to a single value and returns the result. NOTE: it expects a value that is already a sigmoid output s, since the sigmoid's derivative at x equals s * (1 - s) where s = Sigmoid(x).
Derivative(x)
{
Return x * (1 - x)
}
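; Worked examples: Derivative(0.5) = 0.25 (the maximum, at the sigmoid's midpoint), Derivative(0.881) ~ 0.105.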
Return
; The function below applies the sigmoid function to all the members of a matrix and returns the results as a new matrix.
SIGMOID_OF_MATRIX(A)
{
RESULT_MATRIX := Object()
Loop % A.MaxIndex()
{
CURRENT_ROW := A_Index
Loop % A[1].MaxIndex()
{
CURRENT_COLUMN := A_Index
RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := 1 / (1 + exp(-1 * A[CURRENT_ROW, CURRENT_COLUMN]))
}
}
Return RESULT_MATRIX
}
Return
; The function below applies the derivative of the sigmoid function to all the members of a matrix (which must already contain sigmoid outputs) and returns the results as a new matrix.
DERIVATIVE_OF_SIGMOID_OF_MATRIX(A)
{
RESULT_MATRIX := Object()
Loop % A.MaxIndex()
{
CURRENT_ROW := A_Index
Loop % A[1].MaxIndex()
{
CURRENT_COLUMN := A_Index
RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW, CURRENT_COLUMN] * (1 - A[CURRENT_ROW, CURRENT_COLUMN])
}
}
Return RESULT_MATRIX
}
Return
; The function below multiplies the individual members of two matrices with the same coordinates one by one (This is NOT equivalent to matrix multiplication).
MULTIPLY_MEMBER_BY_MEMBER(A,B)
{
If ((A.MaxIndex() != B.MaxIndex()) OR (A[1].MaxIndex() != B[1].MaxIndex()))
{
msgbox, 0x10, Error, You cannot multiply matrices member by member unless both matrices are of the same size!
Return
}
RESULT_MATRIX := Object()
Loop % A.MaxIndex()
{
CURRENT_ROW := A_Index
Loop % A[1].MaxIndex()
{
CURRENT_COLUMN := A_Index
RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW, CURRENT_COLUMN] * B[CURRENT_ROW, CURRENT_COLUMN]
}
}
Return RESULT_MATRIX
}
Return
; The function below transposes a matrix. I.E.: Member[2,1] becomes Member[1,2]. Matrix dimensions ARE affected unless it is a square matrix.
TRANSPOSE_MATRIX(A)
{
TRANSPOSED_MATRIX := Object()
Loop % A.MaxIndex()
{
CURRENT_ROW := A_Index
Loop % A[1].MaxIndex()
{
CURRENT_COLUMN := A_Index
TRANSPOSED_MATRIX[CURRENT_COLUMN, CURRENT_ROW] := A[CURRENT_ROW, CURRENT_COLUMN]
}
}
Return TRANSPOSED_MATRIX
}
Return
; The function below adds a matrix to another.
ADD_MATRICES(A,B)
{
If ((A.MaxIndex() != B.MaxIndex()) OR (A[1].MaxIndex() != B[1].MaxIndex()))
{
msgbox, 0x10, Error, You cannot add matrices unless they are of the same size! (The number of rows and columns must be equal in both)
Return
}
RESULT_MATRIX := Object()
Loop % A.MaxIndex()
{
CURRENT_ROW := A_Index
Loop % A[1].MaxIndex()
{
CURRENT_COLUMN := A_Index
RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW,CURRENT_COLUMN] + B[CURRENT_ROW,CURRENT_COLUMN]
}
}
Return RESULT_MATRIX
}
Return
; The function below subtracts one matrix from another (returns A - B).
DEDUCT_MATRICES(A,B)
{
If ((A.MaxIndex() != B.MaxIndex()) OR (A[1].MaxIndex() != B[1].MaxIndex()))
{
msgbox, 0x10, Error, You cannot subtract matrices unless they are of the same size! (The number of rows and columns must be equal in both)
Return
}
RESULT_MATRIX := Object()
Loop % A.MaxIndex()
{
CURRENT_ROW := A_Index
Loop % A[1].MaxIndex()
{
CURRENT_COLUMN := A_Index
RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW,CURRENT_COLUMN] - B[CURRENT_ROW,CURRENT_COLUMN]
}
}
Return RESULT_MATRIX
}
Return
; The function below multiplies two matrices according to matrix multiplication rules.
MULTIPLY_MATRICES(A,B)
{
If (A[1].MaxIndex() != B.MaxIndex())
{
msgbox, 0x10, Error, Number of Columns in the first matrix must be equal to the number of rows in the second matrix.
Return
}
RESULT_MATRIX := Object()
Loop % A.MaxIndex() ; Rows of A
{
CURRENT_ROW := A_Index
Loop % B[1].MaxIndex() ; Cols of B
{
CURRENT_COLUMN := A_Index
RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := 0
Loop % A[1].MaxIndex()
{
RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] += A[CURRENT_ROW, A_Index] * B[A_Index, CURRENT_COLUMN]
}
}
}
Return RESULT_MATRIX
}
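; Worked example: multiplying a 1x2 matrix [[1,2]] by a 2x1 matrix [[3],[4]] yields the 1x1 matrix [[11]], since 1*3 + 2*4 = 11.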
Return
; The function below does a single step in matrix multiplication (THIS IS NOT USED HERE).
MATRIX_ROW_TIMES_COLUMN_MULTIPLY(A,B,RowA)
{
If (A[RowA].MaxIndex() != B.MaxIndex())
{
msgbox, 0x10, Error, Number of Columns in the first matrix must be equal to the number of rows in the second matrix.
Return
}
Result := 0
Loop % A[RowA].MaxIndex()
{
Result += A[RowA, A_index] * B[A_Index, 1]
}
Return Result
}
I consider the code above a mere first step towards making a network that solves the problem you proposed (quantitatively identifying the first number of a 3-number sequence). The network is a simple 2-layer network (25 neurons in the first layer, 1 neuron in the second).
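If you want to push this towards the 3-number case, my guess (untested, so treat it as a sketch rather than a finished solution) is that only the input dimension changes: the input matrix gains a third column, so WEIGHTS_1 must be built as a 3x25 matrix instead of 2x25, which also means 3*25 + 25 = 100 initial random weights. A rough sketch of just that part:
Code: Select all
; HYPOTHETICAL SKETCH (untested): building WEIGHTS_1 for 3-number input sequences.
; Only the row count changes from 2 to 3; WEIGHTS_2 and the training loop keep their structure.
WEIGHTS_1 := Object()
Loop 3 ; one row of weights per input number
{
CURRENT_ROW := A_Index
Loop 25 ; one column per neuron in layer 1
{
Random, RAND_WEIGHT, -1.0, 1.0
WEIGHTS_1[CURRENT_ROW, A_Index] := RAND_WEIGHT
}
}
; A training sample would then be a 3-number row like [5,1,3], with expected output [1]
; when (and only when) the first number is 5.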