Importance of Statistical Significance in Trading System Optimization
Best Practice Trading System Optimization Series - Part 2 of 4
- Martyn Tinsley
- 12 June, 2018
- Real-life practical suggestions for more effective backtesting and improved statistical significance, leading to more robust trading strategies.
This is part 2 of a series of articles. We highly recommend you read Part 1 first. In part 1 we looked at the importance of statistical significance in all types of back testing, but in particular for walk forward analysis and optimization. We also started to look at how the problems resulting from poor statistical significance in in-sample optimizations are very different from the problems arising from poor statistical significance in your out-of-sample back tests.
In this second part we will look at what constitutes good and bad statistical significance and also provide real-life practical suggestions of how you can address and improve your statistical significance, meaning you undertake more effective testing, leading in turn to more robust strategies.
Just like in part 1, we need to cover in-sample and out-of-sample separately, because they have fundamental differences.
Statistical Significance in your in-sample optimizations
The most important contributing factors to statistical significance in the in-sample optimizations are:
The sample size (which is the typical number of non-correlated trades generated in each of your optimization phases, and so tends to be proportional to the duration of each optimization).
A measure of the variance of trade returns experienced in the optimizations.
The number of independent variables or degrees of freedom in the optimization model (which is effectively the number of variables being simultaneously optimized in the walk forward analysis).
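The statistical significance is proportional to a function of the sample size, whilst being inversely proportional to functions of the variance and degrees of freedom. Schematically:
Statistical Significance ∝ f(n) / ( g(σ²) × h(d.f.) )
where n is the sample size, σ² is the variance of trade returns, and d.f. is the number of degrees of freedom. The functions f, g and h are left unspecified here; only the direction of each effect is implied.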
In other words, a trader should strive to achieve as many non-correlated trades as possible in each optimization, whilst keeping the degrees of freedom as low as feasible to achieve the results of the test. Variance cannot be controlled as easily and is more an attribute of the trading system.
Statistical Significance in your out-of-sample walk-forward back-tests
The main contributing factors to the statistical significance of the walk forward analysis out-of-sample data are:
The sample size (which is the typical number of non-correlated trades generated across the cumulative walk forward phases and so is dictated by the total duration of all back tests).
A measure of the variance of trade returns across all of the back-tests compared to the average.
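The statistical significance is proportional to a function of the sample size, whilst being inversely proportional to a function of the variance. Schematically:
Statistical Significance ∝ f(n) / g(σ²)
where n is the sample size and σ² is the variance of trade returns. The functions f and g are left unspecified here; only the direction of each effect is implied.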
In other words, a trader should strive to achieve as many non-correlated trades as possible across the cumulative walk-forward back-tests. Variance cannot be controlled as easily and is more an attribute of the trading system in question.
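To make the relationship concrete, here is a minimal Python sketch using the one-sample t-statistic of the mean trade return. This is a standard statistic that happens to have the shape described above (it grows with the square root of n and shrinks as the spread of returns widens); it is an illustrative stand-in, not necessarily the calculation used in any particular software. The function name and the sample returns are hypothetical.

```python
import math
from statistics import mean, stdev


def t_statistic(trade_returns):
    """One-sample t-statistic of the mean trade return against zero.

    t = mean / (sample standard deviation / sqrt(n))
    """
    n = len(trade_returns)
    if n < 2:
        raise ValueError("need at least two trades to estimate the variance")
    sd = stdev(trade_returns)  # sample standard deviation (n - 1 divisor)
    if sd == 0:
        raise ValueError("all returns are identical; t-statistic is undefined")
    return mean(trade_returns) / (sd / math.sqrt(n))


# Hypothetical per-trade returns, all with the same average of 0.75.
few_trades = [1.0, -0.5, 2.0, 0.5]       # small sample
many_trades = few_trades * 4             # same pattern, four times the trades
wide_spread = [3.0, -2.5, 4.0, -1.5]     # small sample, much higher variance

print("few trades  :", round(t_statistic(few_trades), 2))
print("many trades :", round(t_statistic(many_trades), 2))
print("wide spread :", round(t_statistic(wide_spread), 2))
```

With the same average return, the larger sample scores higher and the higher-variance sample scores lower, which is the behaviour the text describes.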
Important Note 1
Make sure you recognise that for the statistical significance of the in-sample phases, you need to measure the sample size IN EACH optimization.
However, for the out-of-sample phases in a multi-stage walk forward analysis, you need to measure the cumulative sample size ACROSS ALL walk-forward back-tests. So if you are using 4 stages, this should be the total sample size across all 4 walk-forward back-tests.
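The distinction in Note 1 can be sketched in a few lines of Python. The trade counts, dictionary keys and function names below are hypothetical, chosen only for illustration. Reporting the smallest in-sample count is one reasonable reading of "in each optimization", since the weakest stage is the one most at risk.

```python
def smallest_in_sample_size(stages):
    """In-sample: every optimization is judged on its own trades,
    so the stage with the fewest trades is the one to watch."""
    return min(stage["in_sample_trades"] for stage in stages)


def cumulative_out_of_sample_size(stages):
    """Out-of-sample: the walk-forward back-tests are judged together,
    so their trade counts are added up."""
    return sum(stage["out_of_sample_trades"] for stage in stages)


# Hypothetical 4-stage walk forward analysis.
stages = [
    {"in_sample_trades": 220, "out_of_sample_trades": 21},
    {"in_sample_trades": 180, "out_of_sample_trades": 14},
    {"in_sample_trades": 205, "out_of_sample_trades": 17},
    {"in_sample_trades": 240, "out_of_sample_trades": 19},
]

print("Smallest in-sample sample size:", smallest_in_sample_size(stages))
print("Cumulative out-of-sample sample size:", cumulative_out_of_sample_size(stages))
```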
Important Note 2
You will notice above we made reference to the number of "non-correlated" trades. What did we mean by this? If your rules dictate that your system has a single entry and exit for each trade, where the full trade size is opened and then closed as a whole, then you don't need to worry about it. Your trades should be uncorrelated already.
However, some traders choose to scale into and out of their trades, so that the full position size is entered and exited incrementally. As a methodology there is nothing wrong with this. However, it does mean that a single decision of the system might lead to 3, 10, or even more individual trades. These trades by their very nature will be highly correlated, and so they should not each count towards the sample size. Collectively, the trades arising from each decision contribute approximately 1 to the sample size.
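As a rough illustration of Note 2, the sketch below collapses scaled-in and scaled-out fills into one sample per system decision. It assumes each fill record carries a `decision_id` linking it to the signal that produced it; that field, the `profit` field and the function names are hypothetical, and your own trade log may need a different way of grouping fills (for example by symbol and entry signal time).

```python
from collections import defaultdict


def returns_per_decision(fills):
    """Sum the profit of every fill that belongs to the same decision."""
    totals = defaultdict(float)
    for fill in fills:
        totals[fill["decision_id"]] += fill["profit"]
    return dict(totals)


def effective_sample_size(fills):
    """Each decision counts once, however many fills it produced."""
    return len(returns_per_decision(fills))


# Hypothetical trade log: 5 fills, but only 2 independent decisions.
fills = [
    {"decision_id": "A", "profit": 12.0},
    {"decision_id": "A", "profit": 8.0},
    {"decision_id": "A", "profit": -3.0},
    {"decision_id": "B", "profit": -10.0},
    {"decision_id": "B", "profit": 4.0},
]

print("Raw number of trades:", len(fills))
print("Effective sample size:", effective_sample_size(fills))
print("Return per decision:", returns_per_decision(fills))
```

The per-decision totals, rather than the individual fills, are then the returns to use when estimating variance.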
Best Practice MT4 and MT5 EA Optimization Software Walk Forward Pro
Amongst many other features, Walk Forward Pro calculates the statistical significance of both in-sample and out-of-sample phases for you automatically.
Walk Forward Pro helps the user improve the predictive power of optimizations, leading to more robust systems.
What does this tell us?
1 - Firstly, it is clear that sample size (the number of uncorrelated trades) is critical to achieving good statistical significance. If your optimization periods are short and generate, say, just 10 trades on average, do you think this will allow a powerful prediction of the optimal parameters to use in the system? In this example, ‘No’ is the easy answer.
2 - Keeping the degrees of freedom down to a reasonable number is absolutely imperative. I had a client recently who was trying to optimize 14 different variables simultaneously. Even before knowing the sample size this was based on, I suspected instantly he was over-optimizing to an extreme level. Intuitively and from my own experience I knew that the statistical significance would almost certainly be virtually zero. In other words, the chance of identifying the best parameters would be negligible. This is over-fitting to the extreme.
3 - Based on the information above, the more perceptive amongst you will be thinking that it is much easier to get a good level of statistical significance for your out-of-sample walk forward back tests than for your in-sample optimizations. And generally speaking you’d be absolutely right. What we have to do, however, is try to balance these out if we are to maximise the effectiveness of our walk forward analysis (something we will cover in part 3).
So what does make a statistically significant walk forward optimization?
The actual calculations and the theory that supports them are complex and are beyond the scope of a blog series such as this. However those wanting to delve deeper can use the references at the bottom of the page to undertake further independent research. These are the papers we have used to inform our own thinking and implementation of statistical significance calculations in our software.
However, there are a couple of rules of thumb that I will state here, which can be used as a very rough guideline to whether your current back testing has anywhere near the appropriate level of statistical significance. These rules of thumb should, however, be used with caution. Depending on the variance of trade returns, you should be looking for an absolute “minimum” number of trades as follows:
Each In-sample optimization:
Minimum Sample Size = 50 + (50 * Num Variables being optimized)
Cumulative across all out-of-sample walk-forward back-tests:
Minimum Sample Size = 50
However, higher sample sizes will of course provide greater levels of statistical significance and therefore have i) higher predictive power in your in-sample optimizations, and ii) greater confidence in your out-of-sample walk forward tests.
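The two rules of thumb above translate directly into a small Python helper. The function and constant names are hypothetical; the numbers are simply the ones stated in the rules, and they remain rough minimums rather than guarantees.

```python
MIN_OUT_OF_SAMPLE_TRADES = 50  # cumulative, across all walk-forward back-tests


def min_in_sample_trades(num_variables):
    """Rule-of-thumb minimum trades for EACH in-sample optimization:
    50 + (50 * number of variables being optimized)."""
    if num_variables < 0:
        raise ValueError("number of variables cannot be negative")
    return 50 + 50 * num_variables


def meets_rules_of_thumb(in_sample_counts, out_of_sample_counts, num_variables):
    """Check every in-sample optimization individually and the
    out-of-sample back-tests cumulatively."""
    in_sample_ok = all(
        count >= min_in_sample_trades(num_variables) for count in in_sample_counts
    )
    out_of_sample_ok = sum(out_of_sample_counts) >= MIN_OUT_OF_SAMPLE_TRADES
    return in_sample_ok and out_of_sample_ok


for variables in (1, 3, 14):
    print(variables, "variables ->", min_in_sample_trades(variables), "trades per optimization")
```

Applied to the earlier example of optimizing 14 variables at once, the rule asks for at least 750 uncorrelated trades in each optimization, which shows how quickly the requirement grows with the degrees of freedom.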
In the next article...
In Part 3 we will consider how to architect your walk forward optimization settings to achieve the right balance between in-sample optimization significance and out-of-sample back test significance, which in turn will mean you can optimize the walk forward analysis process to produce more robust systems. This ‘balancing act’ is another aspect of best practice in walk forward optimizations.
References
Gregory T. Knofczynski and Daniel Mundfrom (2007). Sample Sizes When Using Multiple Linear Regression for Prediction.
Campbell R. Harvey and Yan Liu (2014). Evaluating Trading Strategies.
Peter C. Austin and Ewout W. Steyerberg (2015). The number of subjects per variable required in linear regression analyses.