Neural Forests
Use case

 
For a client in the climatology field, our mission was to forecast the level of rainfall in a French region at a 3-month horizon. To do this, our client provided us with a monthly database of a few dozen input fields representing weather indicators from around the world (atmospheric indices). This database contains a few hundred records, from the 1980s to today.

 

The level of rainfall is framed as a 5-class classification problem. The classes are: very wet, wet, average, dry, and very dry. According to weather experts, forecasting such a classification at a 3-month horizon is a very tough task.

Our client currently uses several statistical models, for instance logistic regression. The accuracy obtained is around 40% (on the validation set, which holds 30% of the raw records), meaning that 40% of the forecasts are correct on records that were not used for training. This low accuracy is understandable, since the forecasts are made 3 months before the target months!
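
As an illustration, a baseline of this kind could look like the minimal sketch below, using scikit-learn's logistic regression with a 70/30 split. The file name and column names are assumptions, not the client's actual data.

```python
# Hypothetical sketch of a logistic regression baseline with a 70/30 split.
# File name and column names are illustrative, not the client's actual data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = pd.read_csv("monthly_indices.csv")          # atmospheric indices, one row per month
X = data.drop(columns=["rainfall_class"])          # a few dozen weather indicators
y = data["rainfall_class"]                         # very wet, wet, average, dry, very dry

# 70% of records for training, 30% kept aside for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.30, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
```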

With our software NEURALSIGHT we launched the creation of 2000 deep neural networks, taking 60% of the raw records for training and keeping the remaining 40% for validation. The activation functions, the shape of the network, the training method and many other model parameters vary from one network to the next. In the end we obtained an accuracy slightly higher than our client's current best, but not significantly so.
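
NEURALSIGHT itself is proprietary, but the general idea of generating many networks with randomized hyperparameters can be sketched along these lines, using scikit-learn's MLPClassifier as a stand-in; the parameter ranges and the 60/40 split variables are assumptions made for illustration.

```python
# Hypothetical sketch: build many networks with randomized hyperparameters.
# MLPClassifier is only a stand-in for the proprietary NEURALSIGHT models;
# X_train, X_val, y_train, y_val are assumed to come from a 60/40 split.
import random
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

def random_network():
    """Draw one network configuration at random (ranges are illustrative)."""
    return MLPClassifier(
        hidden_layer_sizes=tuple(random.choice([16, 32, 64])
                                 for _ in range(random.randint(1, 4))),
        activation=random.choice(["relu", "tanh", "logistic"]),
        solver=random.choice(["adam", "sgd", "lbfgs"]),
        alpha=10 ** random.uniform(-5, -1),
        max_iter=2000,
    )

networks, scores = [], []
for _ in range(2000):                                   # 2000 candidate networks
    net = random_network().fit(X_train, y_train)        # 60% of records for training
    networks.append(net)
    scores.append(accuracy_score(y_val, net.predict(X_val)))   # 40% for validation
```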

Then we launched a NEHOOV proprietary genetic algorithm that builds “neural forests”, that is, sets of neural networks chosen from our 2000 networks: each set produces a forecast through a “democratic vote” of its members. This genetic algorithm considered only the top 500 neural networks, ran for 300 generations with a population size of 250 and an elitism level of 85%. The search for these neural forests is therefore relatively deep.
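
The NEHOOV algorithm is proprietary, but the core idea of evolving voting subsets can be sketched as follows: the fitness of a forest is the accuracy of its majority vote, and the search keeps the best forests from one generation to the next. Apart from the figures quoted above (top 500, 300 generations, population 250, 85% elitism), every detail here (forest sizes, crossover, mutation rate) is an invented illustration, not the actual method.

```python
# Hypothetical sketch of a genetic search for "neural forests": subsets of the
# best networks whose majority ("democratic") vote is scored on held-out data.
import random
from collections import Counter
import numpy as np

# Keep only the top 500 networks by individual validation accuracy
ranked = sorted(zip(scores, networks), key=lambda p: p[0], reverse=True)
top = [net for _, net in ranked[:500]]

# Cache each network's validation predictions so voting stays cheap
cached = {id(net): net.predict(X_val) for net in top}
y_true = np.asarray(y_val)

def forest_accuracy(forest):
    """Accuracy of the majority vote of a set of networks."""
    columns = zip(*(cached[id(net)] for net in forest))
    vote = [Counter(col).most_common(1)[0][0] for col in columns]
    return np.mean(np.array(vote) == y_true)

def random_forest():
    return random.sample(top, k=random.randint(3, 15))    # forest sizes are illustrative

population = [random_forest() for _ in range(250)]         # population size of 250
for generation in range(300):                              # 300 generations
    population.sort(key=forest_accuracy, reverse=True)
    elite = population[: int(0.85 * len(population))]      # keep the best 85% (elitism)
    children = []
    while len(elite) + len(children) < 250:
        a, b = random.sample(elite, 2)                      # crossover: mix two forests
        pool = list(set(a) | set(b))
        child = random.sample(pool, k=min(len(pool), random.randint(3, 15)))
        if random.random() < 0.2:                           # mutation: swap in a fresh network
            child[random.randrange(len(child))] = random.choice(top)
        children.append(child)
    population = elite + children

best_forest = max(population, key=forest_accuracy)
print("Best forest size:", len(best_forest),
      "validation accuracy:", forest_accuracy(best_forest))
```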

The result is clear: the best neural forest, containing 7 neural networks, reached an accuracy of 70% on the training and validation sets combined. On the validation set alone, the accuracy is around 60%, far above the client’s 40% result! This demonstrates the strength of neural forests.

So, why should we consider only “random forests” in life? 😉