The level of rainfall is treated as a 5-class classification problem, with the classes very wet, wet, average, dry and very dry. According to weather experts, forecasting this class three months ahead is an extremely difficult task.
Our client currently uses several statistical models, including logistic regression. Their Accuracy is around 40% on the validation set (30% of the raw records), meaning that 40% of the forecasts are correct on records that were not used for training. This low Accuracy is understandable, since the forecasts are prepared 3 months before the targeted months! For reference, guessing at random among five equally frequent classes would be right only 20% of the time.
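A minimal Python sketch of the two quantities in this paragraph, a random hold-out split and the Accuracy score, on toy data. The function names `holdout_split` and `accuracy` are illustrative only; the client's actual pipeline is not described here. The last line prints the 20% expected from uniform random guessing among five classes, a useful floor when reading the 40% figure.

```python
import random

CLASSES = ["very wet", "wet", "average", "dry", "very dry"]

def holdout_split(records, validation_fraction, seed=0):
    """Shuffle the records and split them into (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def accuracy(y_true, y_pred):
    """Fraction of forecasts that match the observed class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# A 70/30 split, as in the client's setup (hypothetical record ids).
train, val = holdout_split(range(100), 0.3)
print(len(train), len(val))  # 70 30

# Toy example: 5 validation records, 2 correct forecasts -> 40% Accuracy.
observed = ["wet", "dry", "average", "very dry", "wet"]
forecast = ["wet", "average", "average", "dry", "very wet"]
print(accuracy(observed, forecast))  # 0.4

# Guessing uniformly at random among 5 classes is right 1 time in 5 on average.
print(1 / len(CLASSES))  # 0.2
```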
With our software NEURALSIGHT, we launched the creation of 2000 deep neural networks, using 60% of the raw records for training and keeping the remaining 40% for validation. The activation functions, network shape, training method and many other model parameters varied from one network to the next. In the end, we obtained an Accuracy slightly higher than our client’s current best, but not significantly so.
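Building many networks whose settings vary from one construction to the next is essentially a random search over model configurations. The Python sketch below only samples the configurations. The activation list, layer sizes, optimizers, learning-rate range and the `NetworkConfig`/`sample_configs` names are hypothetical placeholders, since NEURALSIGHT’s actual search space is not given here. Each sampled configuration would then be trained on the 60% training split and scored on the 40% validation split.

```python
import random
from dataclasses import dataclass

# Hypothetical search space: NEURALSIGHT's real parameter ranges are not given in the text.
ACTIVATIONS = ["relu", "tanh", "sigmoid", "elu"]
OPTIMIZERS = ["sgd", "adam", "rmsprop"]
LAYER_SIZES = [8, 16, 32, 64, 128]

@dataclass(frozen=True)
class NetworkConfig:
    activation: str
    hidden_layers: tuple  # number of units in each hidden layer
    optimizer: str
    learning_rate: float

def sample_config(rng):
    """Draw one random network configuration from the hypothetical search space."""
    depth = rng.randint(1, 4)  # 1 to 4 hidden layers, inclusive
    hidden = tuple(rng.choice(LAYER_SIZES) for _ in range(depth))
    return NetworkConfig(
        activation=rng.choice(ACTIVATIONS),
        hidden_layers=hidden,
        optimizer=rng.choice(OPTIMIZERS),
        learning_rate=10 ** rng.uniform(-4, -1),  # log-uniform between 1e-4 and 1e-1
    )

def sample_configs(n, seed=0):
    """Sample n configurations reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_config(rng) for _ in range(n)]

configs = sample_configs(2000)
print(len(configs))  # 2000
print(configs[0])
```

Sampling the learning rate on a log scale is a common choice because its useful values span several orders of magnitude.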
We then launched a proprietary NEHOOV genetic algorithm that builds “neural forests”, that is, subsets of our 2000 networks in which each subset produces its forecast by “democratic vote”. This genetic algorithm considered only the top 500 neural networks and ran for 300 epochs with a population size of 250 and a eugenism level of 85%, so the search for these neural forests is fairly thorough.
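A minimal sketch of the idea, assuming each network’s validation-set forecasts are precomputed as integer class codes: a “neural forest” is a subset of networks whose forecast is the majority vote, and a plain genetic algorithm (tournament selection, uniform crossover, bit-flip mutation, elitism) searches over subsets. NEHOOV’s algorithm is proprietary, so the operators, `elite_fraction`, `mutation_rate` and all function names here are hypothetical, and the “eugenism level” parameter is not modeled. The defaults for `population_size` and `generations` mirror the 250 and 300 quoted above; treating NEHOOV’s “epochs” as generations is an assumption. In practice `predictions` would hold the forecasts of the top 500 networks only.

```python
import random
from collections import Counter

# Hypothetical sketch: a generic genetic algorithm, not NEHOOV's proprietary one.

def majority_vote(predictions, members, record):
    """Class chosen by most member networks for one record (first-seen class wins ties)."""
    votes = Counter(predictions[m][record] for m in members)
    return votes.most_common(1)[0][0]

def forest_accuracy(predictions, y_val, members):
    """Validation Accuracy of the forest made of the given member networks."""
    if not members:
        return 0.0
    hits = sum(majority_vote(predictions, members, r) == y for r, y in enumerate(y_val))
    return hits / len(y_val)

def evolve_forest(predictions, y_val, population_size=250, generations=300,
                  elite_fraction=0.1, mutation_rate=0.01, seed=0):
    """Search subsets of networks; returns (best_members, best_accuracy)."""
    rng = random.Random(seed)
    n = len(predictions)

    def members(mask):
        return [i for i, bit in enumerate(mask) if bit]

    def fitness(mask):
        return forest_accuracy(predictions, y_val, members(mask))

    def tournament(scored):
        a, b = rng.sample(scored, 2)  # binary tournament: the fitter of two wins
        return a[1] if a[0] >= b[0] else b[1]

    # Each individual is a bit mask: bit i set means network i joins the forest.
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in population),
                        key=lambda s: s[0], reverse=True)
        n_elite = max(1, int(elite_fraction * population_size))
        next_gen = [m for _, m in scored[:n_elite]]  # elitism: best forests survive unchanged
        while len(next_gen) < population_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [(not bit) if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return members(best), fitness(best)

# Toy example: 3 networks, 4 validation records, classes coded 0..4.
y_val = [0, 1, 2, 3]
predictions = [
    [0, 1, 2, 3],  # network 0: always right
    [4, 4, 2, 3],  # network 1: right on the last two records
    [4, 4, 1, 3],  # network 2: right on the last record only
]
print(forest_accuracy(predictions, y_val, [0, 1, 2]))  # 0.5
print(evolve_forest(predictions, y_val, population_size=20, generations=30))
```

The toy example shows why the subset matters: voting with all three networks lets the two weaker ones outvote the strong one, while smaller forests containing network 0 score higher.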
So, why should we consider only “random forests” in life? 😉