Sometimes, neural networks and deep learning are regarded as voodoo magic. In the following blog post I want to bring a bit of clarity into the game. First, it is important to ask what is necessary to transform a neural network into a Turing-capable machine. The simplest way of implementing a computer is not a Turing machine itself, but a more basic structure: logic gates. Logic gates are described by truth tables and come in only a few kinds, such as AND, OR and NOT. The interesting aspect of logic gates is that they can do everything a Turing machine can do.
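How gates compose can be sketched in a few lines of Python: AND, OR and NOT are enough to build XOR, and from XOR a half adder, which is the first step towards an arithmetic circuit.

```python
# A minimal sketch of composing AND, OR and NOT gates into a half adder,
# illustrating that three basic gates already combine into arithmetic.

def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a

def xor(a, b):
    # XOR built purely from AND, OR and NOT
    return AND(OR(a, b), NOT(AND(a, b)))

def half_adder(a, b):
    # returns (sum bit, carry bit) of two one-bit inputs
    return xor(a, b), AND(a, b)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

Chaining many such adders gives multi-bit arithmetic, which is exactly how the circuit in the prime-number video below is put together.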
On YouTube there is a video available which shows a prime-number generator built from logic gates only. I would guess that the diagram was created by compiling a high-level C program. Most importantly, the shown logic gate diagram is very large: at least 40 logic gates are visible, perhaps more.
The reason why I'm explaining logic gates is that logic gates can be trained like a neural network. They can be seen as a real neural Turing machine. Not as an external tape used together with a neural network, and not as an LSTM network, but as a Turing-capable computer. So what is necessary to train a logic gate network? I have absolutely no idea, but I posted it as a question on Stack Exchange: https://ai.stackexchange.com/questions/5460/training-of-a-logicgate-network
I would guess that it is simply not possible to train a logic gate network. In theory perhaps, but in practice the state space is too huge. So we have a Turing-capable neural network but no idea how to adjust the weights. And this gives us the answer to what the limits of neural networks are. A normal three-layer neural network is not Turing-capable. Larger deep learning networks based on LSTM neurons, or DeepMind's Neural Turing Machine, are perhaps Turing-ready, but they are not more powerful than a logic gate network. The problem is that for normal neural networks, LSTM machines, logic gates or whatever kind of apparatus, no efficient learning algorithm is known. That is the real bottleneck.
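A rough back-of-the-envelope count shows why the state space explodes. Assuming a simple wiring model (each gate picks one of three types and connects each of its two inputs to any earlier signal; this model is an assumption for illustration, not a standard circuit formalism), the number of possible networks grows super-exponentially with the gate count:

```python
# Count the distinct gate networks under a toy wiring model:
# gate g chooses one of 3 types and wires each of its two inputs
# to any of the primary inputs or any earlier gate output.

def network_count(n_gates, n_inputs):
    total = 1
    for g in range(n_gates):
        signals = n_inputs + g          # primary inputs plus earlier gate outputs
        total *= 3 * signals * signals  # 3 gate types, two independent input choices
    return total

# even a 40-gate circuit (the prime-number example) is hopeless to brute-force
for n in (5, 10, 20, 40):
    print(n, network_count(n, 8))
```

Already at 40 gates the count has hundreds of digits, so exhaustive search is out of the question; and no gradient points the way through this discrete space.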
Learning algorithms like the delta rule, backpropagation or Quickprop are a not very elegant form of searching the error surface for a minimum. They are not able to find a minimal solution, and they won't even find a simple weight combination for calculating prime numbers. Using a neural network as a Turing machine is a dead end.
But I wouldn't call the deep learning movement in general a failure. Only the part of the community that is trying to move neural networks in the direction of a computing device is a failed project. Even with the fastest Nvidia cards this is not possible. Another aspect of deep learning is, in contrast, very attractive. It is called big data and means using neural networks for storing images.
What is the difference? A neural network can be seen in two ways: first, as a Turing-like device with the aim of searching for a program that converts input into output. That is equivalent to a neural Turing machine or to a logic gate network. The other option is to see a neural network as a probabilistic database. Here the aim is to store, for example, 10 GB of images and then search the database for similarity. That is a technology that works. It means that it can be used in practice.
I think the deep learning community should give up the idea that their 20-layer network is some kind of trainable computer. Realizing even a simple prime-number algorithm needs far more neurons (logic gates), and training such a computer is not possible with current hardware. But what does make sense is to see a neural network as a similarity search algorithm for retrieving images. The simplest form is to store one million dog photos as compressed JPEG files on a hard drive and use a convolutional filter to check whether a given image is similar to one of the photos. If yes, we can label the image with the name "dog".
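The dog-photo idea reduces to a nearest-neighbor lookup. In the sketch below, the feature vectors and labels are made up; in a real system each vector would come from a convolutional filter applied to the stored JPEGs.

```python
# Labelling by similarity search, assuming each stored image has already
# been reduced to a small feature vector (in practice, by a CNN).
# The vectors and labels below are invented for illustration.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# hypothetical database: feature vector -> label
database = [
    ([0.9, 0.1, 0.0], "dog"),
    ([0.8, 0.2, 0.1], "dog"),
    ([0.1, 0.9, 0.2], "cat"),
    ([0.0, 0.2, 0.9], "car"),
]

def label(query):
    # return the label of the most similar stored vector
    return max(database, key=lambda item: cosine_similarity(query, item[0]))[1]

print(label([0.85, 0.15, 0.05]))  # a dog-like query vector
```

Note that nothing here is "computed" in the Turing sense: accuracy comes entirely from how many vectors are stored and how good the similarity metric is.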
This special topic is currently not very well researched, and it makes sense to investigate it more deeply. I think we should use different vocabulary to make clear what we are doing. Instead of talking about "training a neural network", the aim is to build a similarity search algorithm which has access to an image database.
Neural Turing machines
In some papers the so-called "Neural Turing Machine" is presented. Often these papers are very complicated and contain many mathematical formulas. In reality, a neural Turing machine is simply a device made of logic gates whose exact configuration is driven by a learning algorithm. This kind of neural network is called a "McCulloch-Pitts neuron", and it is Turing-capable. It is a trainable Turing machine. But it can't be used for any practical purpose, because it is unclear how to train the logic gates, that is, how to decide whether gate #23 is an AND gate and, if yes, which two other neurons provide its input signals.
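A McCulloch-Pitts unit itself is simple enough to write down directly: binary inputs, fixed weights, and a hard threshold. Choosing the weights and threshold by hand yields the basic gates; the open question from above is precisely how to choose them automatically.

```python
# A McCulloch-Pitts unit: binary inputs, fixed weights, hard threshold.
# Different weight/threshold settings yield the basic logic gates.

def mcp_neuron(inputs, weights, threshold):
    # fire (output 1) iff the weighted sum reaches the threshold
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

def AND(a, b): return mcp_neuron((a, b), (1, 1), 2)
def OR(a, b):  return mcp_neuron((a, b), (1, 1), 1)
def NOT(a):    return mcp_neuron((a,), (-1,), 0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
```

Each unit is trivial; the difficulty lies entirely in searching the configuration space of thousands of interconnected units.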
The reason why today no complex logic gate circuits are wired up directly, but instead the von Neumann architecture is used, is that on a von Neumann machine a program can be executed from a tape, and such a program can be written in a higher-level language. Programs can also be converted into logic gates, but that amounts to building a different computer for every algorithm, which is not very useful.
How deep learning works
Deep learning has nothing to do with neural networks as computing devices. Instead, the algorithm can be described as a similarity search with convolutional neural networks. That means the CNN is used as a metric to determine whether two images are equal, and the input image is compared with a database of known images. The accuracy is higher if the database is bigger.
The misunderstanding is that most tutorials suggest a neural network works like a computer and that after training a certain program has been found. But in reality there is a huge difference between a Turing machine and a neural network. Instead it makes sense to call deep learning a sort of filter generation for determining whether two images are equal.
Perhaps a small example of how OCR works in reality. First, we need a database of .svg files. The file size should be 10 gigabytes or more. In that database every kind of character is stored, from every possible font. Now we take a new .svg file and search the database for a similar image. The similarity index is calculated with a convolutional neural network, and the request runs very fast. If we find a match, we know that the picture shows the character "w", for example. This has nothing to do with deep learning or neural networks in the computing sense, nor with a logic gate or a neural Turing machine. Instead the accuracy depends on two factors:
– size of the database
– similarity filter
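The OCR-by-lookup idea can be reduced to a toy sketch. The 3x3 "glyphs" below are invented placeholders for rendered font images, and plain pixel overlap stands in for the CNN similarity filter:

```python
# Toy OCR by lookup: store a tiny bitmap per character and label a query
# bitmap by its most similar stored entry. The 3x3 glyphs are invented;
# a real system would store rendered fonts and use a learned filter.

GLYPHS = {
    "I": (0,1,0, 0,1,0, 0,1,0),
    "L": (1,0,0, 1,0,0, 1,1,1),
    "T": (1,1,1, 0,1,0, 0,1,0),
}

def similarity(a, b):
    # fraction of matching pixels, a stand-in for the CNN metric
    return sum(x == y for x, y in zip(a, b)) / len(a)

def recognize(bitmap):
    return max(GLYPHS, key=lambda ch: similarity(GLYPHS[ch], bitmap))

# a slightly noisy "T": one pixel flipped
noisy_t = (1,1,1, 0,1,0, 1,1,0)
print(recognize(noisy_t))
```

Growing the glyph table corresponds to growing the database, and swapping the pixel-overlap metric for a CNN corresponds to improving the similarity filter; those are exactly the two factors above.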
It is wrong to store image data inside a neural network. The images can be stored in a normal database. And it is also wrong to search for an algorithm. Instead, a given algorithm is used to generate the image filter, and this results in optical character recognition.