In this series of articles we will look into the age-old question that algorithmic traders have always struggled to answer:
"How do I know if I have over-fitted my trading system?"
Traders have struggled with this question since the dawn of algo trading, and over-optimization is probably the number one reason why many of them never produce a profitable trading system.
The question isn't easy to answer, that's for sure. But we are convinced it all comes down to statistical significance. If your back test results lack the required level of statistical significance, they are meaningless, and you will be trading a system that is unlikely to replicate those results in a live account.
So that's the answer, right? Well, yes... but how do you know if results are statistically significant, I hear you ask? Well, read on...
Although many (myself included) would consider walk forward analysis (WFA) to be the most effective back testing and optimization technique, it does have its pitfalls if not undertaken using scientific techniques – an approach I will simply term ‘best-practice’ for the purposes of this article.
So what does this best practice entail? First and foremost is the concept of statistical significance, and it is this that we will focus on. The need for any type of back testing or optimization to have statistical significance is absolutely imperative if the system is to deliver results.
Without it, test results are completely meaningless, and you will have wasted several hours of your life undertaking them. In fact, a lack of statistical significance is the number one reason (by far) why traders fail to optimize and back test effectively.
For Walk Forward Analysis in particular, understanding how statistical significance impacts the methodology is slightly more complex, but in this article (and the next in the series) I aim to cover the subject fully so that you can incorporate the techniques into your own testing.
A quick recap of Walk Forward Analysis before we start
For those who are new to walk forward analysis, the process is considered by many (myself included) to be the 'gold standard' method of backtesting and optimizing a trading system. It revolves around a number of in-sample optimization phases, each followed by its own out-of-sample back test phase, with each pair covering a different time epoch. The best way to understand it is to watch the following 1 minute quick-start we have prepared.
The optimization phases are designed to identify the optimal set of parameters that work best for that particular time epoch (this allows the model to adapt to changing market conditions and is a major advantage of WFA over other testing techniques).
The out-of-sample back test phases (out-of-sample because their data was not used in the optimization itself) are commonly termed ‘walk forward’ phases, and they are used to validate the optimal parameters that were identified in the previous optimization phase.
The fact that the entire WFA uses multiple stages, covering different time epochs (and therefore different market conditions / price action personalities) is what makes walk forward analysis so effective. It shows the ability of your system to adapt and perform well across many changing market conditions.
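To make the sequence of phases concrete, here is a minimal sketch of how a price history might be split into rolling in-sample and out-of-sample windows. The function name `walk_forward_windows`, the window lengths, and the choice to roll forward by one out-of-sample period are all assumptions made for illustration; they are not taken from any particular WFA tool.

```python
from typing import List, NamedTuple


class Window(NamedTuple):
    in_sample: range      # bar indices used for the optimization phase
    out_of_sample: range  # bar indices used for the walk forward phase


def walk_forward_windows(n_bars: int, in_sample_len: int,
                         out_sample_len: int) -> List[Window]:
    """Split n_bars into rolling in-sample / out-of-sample windows.

    Hypothetical helper: each window is shifted forward by one
    out-of-sample period, so the walk forward phases never overlap.
    """
    windows = []
    start = 0
    while start + in_sample_len + out_sample_len <= n_bars:
        split = start + in_sample_len
        windows.append(Window(range(start, split),
                              range(split, split + out_sample_len)))
        start += out_sample_len
    return windows


for w in walk_forward_windows(n_bars=1000, in_sample_len=400, out_sample_len=200):
    print(f"optimize on bars {w.in_sample.start}-{w.in_sample.stop - 1}, "
          f"walk forward on bars {w.out_of_sample.start}-{w.out_of_sample.stop - 1}")
```

Each walk forward window starts exactly where its optimization window ends, so the parameters chosen in one phase are always tested on data the optimizer has not seen.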
It was stated earlier that in order to ensure best-practice, it is imperative that any trading system testing technique is undertaken in a way that is statistically significant. Because WFA is slightly more complex than other techniques (but also with much greater benefits), understanding how statistical significance affects it is slightly more challenging. But let’s clear the mist now, and get into it.
Firstly, the way we need to look at the statistical significance of the in-sample optimization phases is fundamentally different to how we look at the statistical significance of the out-of-sample back tests. The consequences of a lack of statistical significance are also very different depending on which of the two we are considering.
Statistical Significance of the in-sample optimizations
As stated above, the purpose of the in-sample optimization phases is to identify the optimal parameter values that make the trading system most effective in the market conditions at that time.
When we undertake optimizations without sufficient statistical significance, this has the effect of reducing (or in extreme cases, completely eliminating) the predictive power of selecting the best parameter values from the optimization. When this is the case, the selection of parameters tends to be based more on randomness and chance than on the effectiveness of the parameters of your trading system. This then leads to poorly performing out-of-sample back tests, and poor performance if traded in a real money account. This is what the industry often terms ‘over-optimization’ or ‘over-fitting’.
I’ll let you into a secret. It is much easier to over-optimize than you would ever imagine. If I had to guess, I would estimate that 90% to 95% of algorithmic traders are over-optimizing. This is a real shame because it means that trading systems that could work well are probably thrown away by the trader. They don’t have a chance to work, because they are based predominantly on ‘random’ parameter values that performed well by chance in the optimization but will, in all likelihood, never perform well again in the future.
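The effect of choosing parameters by chance can be illustrated with a small simulation. The sketch below is a toy model, not a real optimizer: every 'parameter set' is assumed to have no edge at all, with trade results drawn as pure noise, and the function name `best_by_chance` and the numbers used are hypothetical.

```python
import random
import statistics


def best_by_chance(n_param_sets: int, n_trades: int, seed: int = 1):
    """Pick the best in-sample parameter set when none has a real edge.

    Assumption for illustration: every trade result is random noise with
    a true mean of zero, so any in-sample 'winner' is luck alone.
    """
    rng = random.Random(seed)

    def noise_trades():
        return [rng.gauss(0.0, 1.0) for _ in range(n_trades)]

    in_sample = [noise_trades() for _ in range(n_param_sets)]
    out_of_sample = [noise_trades() for _ in range(n_param_sets)]

    # 'Optimize': choose the parameter set with the highest in-sample mean.
    in_means = [statistics.fmean(trades) for trades in in_sample]
    best = max(range(n_param_sets), key=lambda i: in_means[i])

    # Return the winner's in-sample mean and its out-of-sample mean.
    return in_means[best], statistics.fmean(out_of_sample[best])


best_in, best_out = best_by_chance(n_param_sets=200, n_trades=30)
print(f"best in-sample mean per trade: {best_in:+.3f}")
print(f"same parameters out-of-sample: {best_out:+.3f}")
```

With many parameter sets and only a few trades each, the in-sample winner looks profitable simply because it is the largest of many noisy averages. Its out-of-sample result has no reason to follow, since no edge was there in the first place.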
Best Practice MT4 and MT5 EA Optimization Software Walk Forward Pro
Amongst many other features, Walk Forward Pro calculates the statistical significance of both in-sample and out-of-sample phases for you automatically.
Walk Forward Pro helps the user improve the predictive power of optimizations, leading to more robust systems.
Statistical Significance of the out-of-sample walk forward tests
The underlying implications of poor statistical significance in the out-of-sample phases are very different to the in-sample phases. Since the out-of-sample walk forward tests are used to validate the parameter values, when there is a lack of statistical significance here, it means the results of the cumulative walk forward tests can’t be trusted as being a true indication of the performance of your trading system.
There are two extreme scenarios here, but most people only consider the first: that the results could be inflated or over-optimistic, and that the system would never achieve similar results in a live account.
This is of course true. But if the statistical significance is really bad, it can also produce poor back test results for a system that, given the chance to perform longer term in a live trading account, would actually achieve great results. In the meantime, however, the trader has probably thrown this (perfectly good) system away.
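One simple way to see why a short out-of-sample test can mislead in either direction is to put an approximate confidence interval around the average trade result. The sketch below uses a normal approximation (mean plus or minus 1.96 standard errors), which assumes independent trades and is a rough guide only, especially for small samples. The helper `mean_confidence_interval` and the trade figures are made up for illustration, and this is not necessarily how any particular tool measures significance.

```python
import math
import statistics


def mean_confidence_interval(trade_returns, z: float = 1.96):
    """Approximate 95% confidence interval for the mean trade result.

    Assumes independent trades and a normal approximation; with very
    few trades the true uncertainty is wider than this suggests.
    """
    n = len(trade_returns)
    if n < 2:
        raise ValueError("need at least two trades")
    mean = statistics.fmean(trade_returns)
    std_err = statistics.stdev(trade_returns) / math.sqrt(n)
    return mean - z * std_err, mean + z * std_err


# Hypothetical results: the same average trade, seen 4 times and 400 times.
few_trades = [2.0, -1.0, 1.5, -1.0]
many_trades = few_trades * 100

print("4 trades:  ", mean_confidence_interval(few_trades))
print("400 trades:", mean_confidence_interval(many_trades))
```

Both samples have the same average trade, but with only a handful of trades the interval spans zero, so the test cannot distinguish a good system from a bad one. With many more trades the interval narrows, roughly in proportion to the square root of the number of trades.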
In the next article...
In Part 2, we will look at the contributing factors that determine statistical significance, and outline the techniques you can use to improve them.