Importance of Statistical Significance in Trading System Optimization
Best Practice Trading System Optimization Series - Part 2 of 4
Martyn Tinsley / 12 June, 2018

Abstract:
Real-life practical suggestions for more effective backtesting, and in turn more robust strategies, by improving the statistical significance of your approach.
This is part 2 of a series of articles. We highly recommend you read Part 1 first. In Part 1 we looked at the importance of statistical significance in all types of backtesting, but in particular for walk forward analysis and optimization. We also started to look at how the problems resulting from poor statistical significance in in-sample optimizations are very different from the problems arising from poor statistical significance in your out-of-sample backtests.
In this second part we will look at what constitutes good and bad statistical significance, and provide real-life practical suggestions for improving it, so that you undertake more effective testing, leading in turn to more robust strategies.
Just as in Part 1, we need to cover in-sample and out-of-sample separately, because they have fundamental differences.
Statistical Significance in your in-sample optimizations
The most important contributing factors to statistical significance in the in-sample optimizations are:

The sample size (the typical number of non-correlated trades generated in each optimization phase), which tends to be proportional to the duration of each optimization.

A measure of the variance of trade returns experienced in the optimizations.

The number of independent variables or degrees of freedom in the optimization model (which is effectively the number of variables being simultaneously optimized in the walk forward analysis).
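The statistical significance is proportional to a function of the sample size, whilst being inversely proportional to functions of the variance and degrees of freedom:
$$\text{Statistical significance} \propto \frac{f(n)}{g(\sigma^{2})\, h(\text{d.f.})}$$
where $n$ is the sample size, $\sigma^{2}$ is the variance of trade returns, d.f. is the number of degrees of freedom, and $f$, $g$ and $h$ simply stand for the (unspecified) functions referred to above.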
In other words, a trader should strive to achieve as many non-correlated trades as possible in each optimization, whilst keeping the degrees of freedom as low as is feasible for the purposes of the test. Variance cannot be controlled as easily and is more an attribute of the trading system.
Statistical Significance in your out-of-sample walk-forward backtests
The main contributing factors to the statistical significance of the out-of-sample data in a walk forward analysis are:

The sample size (the typical number of non-correlated trades generated across the cumulative walk-forward phases), which is dictated by the total duration of all backtests.

A measure of the variance of trade returns across all of the backtests compared to the average.
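The statistical significance is proportional to a function of the sample size, whilst being inversely proportional to a function of the variance:
$$\text{Statistical significance} \propto \frac{f(n)}{g(\sigma^{2})}$$
where $n$ is the sample size, $\sigma^{2}$ is the variance of trade returns, and $f$ and $g$ simply stand for the (unspecified) functions referred to above.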
In other words, a trader should strive to achieve as many non-correlated trades as possible across the cumulative walk-forward backtests. Variance cannot be controlled as easily and is more an attribute of the trading system in question.
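To make the relationship above concrete, here is a minimal Python sketch using the one-sample t-statistic of the mean trade return. This is a standard textbook measure chosen purely for illustration, and it is an assumption that it resembles the calculation used in any particular software. It grows with the square root of the sample size and shrinks as the spread of trade returns grows. The function name and the trade returns are hypothetical.

```python
import math
from statistics import mean, stdev


def t_statistic(trade_returns):
    """One-sample t-statistic of the mean trade return against zero.

    t = mean / (s / sqrt(n)), where s is the sample standard deviation.
    Illustrative only: a stand-in for "statistical significance".
    """
    n = len(trade_returns)
    if n < 2:
        raise ValueError("need at least two trades")
    s = stdev(trade_returns)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("trade returns have zero variance")
    return mean(trade_returns) / (s / math.sqrt(n))


# Hypothetical per-trade returns from the out-of-sample backtests.
small_sample = [1.0, -0.5, 2.0, -1.0]
# The same pattern repeated, purely to hold the mean fixed while n grows.
large_sample = small_sample * 4

print(t_statistic(small_sample))
print(t_statistic(large_sample))  # larger, because n is larger
```

With the average return held constant, the larger sample gives the higher t-statistic, which is the sense in which more non-correlated trades buy more significance.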
Important Note 1
Make sure you recognise that for the statistical significance of the in-sample phases, you need to measure the sample size IN EACH optimization.
However, for the out-of-sample phases in a multi-stage walk forward analysis, you need to measure the cumulative sample size ACROSS ALL walk-forward backtests. So if you are using 4 stages, this should be the total sample size across all 4 walk-forward backtests.
Important Note 2
You will notice above we made reference to the number of "non-correlated" trades. What did we mean by this? If your rules dictate that your system has a single entry and exit for each trade, where the full trade size is opened and then closed as a whole, then you don't need to worry about it. Your trades should be uncorrelated already.
However, some traders choose to scale into and out of their trades, so that the full position size is entered and exited incrementally. As a methodology there is nothing wrong with this. However, it does mean that a single decision of the system might lead to 3, 10, or even more individual trades. These trades by their very nature will be highly correlated, and so each individual trade should not contribute to the sample size. Instead, the group of trades arising from each decision should contribute approximately 1 to the sample size.
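One way to apply this when counting trades is sketched below in Python. It assumes each trade record carries a field identifying the system decision that produced it. The `decision_id` and `profit` field names are hypothetical, and how you tag fills with their originating decision will depend on your own trade log.

```python
from collections import defaultdict


def effective_sample_size(trades):
    """Count one sample per system decision rather than one per fill."""
    return len({trade["decision_id"] for trade in trades})


def profit_per_decision(trades):
    """Sum the profit of all scaled-in/scaled-out fills for each decision."""
    totals = defaultdict(float)
    for trade in trades:
        totals[trade["decision_id"]] += trade["profit"]
    return dict(totals)


# Hypothetical trade log: two decisions, five individual fills.
trades = [
    {"decision_id": "A", "profit": 10.0},
    {"decision_id": "A", "profit": 5.0},
    {"decision_id": "A", "profit": -3.0},
    {"decision_id": "B", "profit": -4.0},
    {"decision_id": "B", "profit": -2.0},
]

print(len(trades))                    # 5 individual trades
print(effective_sample_size(trades))  # 2 non-correlated samples
print(profit_per_decision(trades))    # {'A': 12.0, 'B': -6.0}
```

The per-decision totals, rather than the individual fills, are then the returns to feed into any significance calculation.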
Best Practice MT4 and MT5 EA Optimization Software - Walk Forward Pro
Amongst many other features, Walk Forward Pro calculates the statistical significance of both in-sample and out-of-sample phases for you automatically.
Walk Forward Pro helps the user improve the predictive power of optimizations, leading to more robust systems.
What does this tell us?
1 - Firstly, it is clear that sample size (the number of uncorrelated trades) is critical to achieving good statistical significance. If your optimization periods are short, and generate, say, just 10 trades on average, do you think this will allow a powerful prediction of the optimal parameters to use in the system? In this example, ‘No’ is the easy answer.
2 - Keeping the degrees of freedom down to a reasonable number is absolutely imperative. I had a client recently who was trying to optimize 14 different variables simultaneously. Even before knowing the sample size this was based on, I suspected instantly that he was over-optimizing to an extreme level. Intuitively and from my own experience I knew that the statistical significance would almost certainly be virtually zero. In other words, the chance of identifying the best parameters would be negligible. This is overfitting to the extreme.
3 - Based on the information above, the more perceptive amongst you will be thinking that it is much easier to get a good level of statistical significance for your out-of-sample walk-forward backtests than for your in-sample optimizations. And generally speaking you’d be absolutely right. What we have to do, however, is try to balance these out if we are to maximise the effectiveness of our walk forward analysis (something we will cover in Part 3).
Keeping the degrees of freedom (parameters being optimized) down to a manageable number is absolutely imperative to avoid overfitting.
So what does make a statistically significant walk forward optimization?
The actual calculations and the theory that supports them are complex and are beyond the scope of a blog series such as this. However, those wanting to delve deeper can use the references at the bottom of the page to undertake further independent research. These are the papers we have used to inform our own thinking and implementation of statistical significance calculations in our software.
There are, however, a couple of rules of thumb that I will state here, which can be used as a very rough guideline to whether your current backtesting has anywhere near the appropriate level of statistical significance. These rules of thumb should be used with caution. Depending on the variance of trade returns, you should be looking for an absolute “minimum” number of trades as follows:
Each in-sample optimization:
Minimum Sample Size = 50 + (50 * Num Variables being optimized)
Cumulative across all out-of-sample walk-forward backtests:
Minimum Sample Size = 50
However, higher sample sizes will of course provide greater levels of statistical significance and therefore i) higher predictive power in your in-sample optimizations, and ii) greater confidence in your out-of-sample walk-forward tests.
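The two rules of thumb above can be written as a small Python check. This is only a sketch of the rough guideline as stated, not of the full significance calculation. The function names and the trade counts are hypothetical, and the counts are assumed to be non-correlated trades.

```python
MIN_OUT_OF_SAMPLE_TRADES = 50  # cumulative across all walk-forward backtests


def min_in_sample_trades(num_variables):
    """Rule-of-thumb minimum trades for EACH in-sample optimization."""
    if num_variables < 0:
        raise ValueError("num_variables must be non-negative")
    return 50 + 50 * num_variables


def check_walk_forward(in_sample_counts, out_of_sample_counts, num_variables):
    """Apply both rules of thumb to one walk forward analysis.

    in_sample_counts: trades in each optimization (checked individually).
    out_of_sample_counts: trades in each walk-forward backtest (summed).
    """
    required = min_in_sample_trades(num_variables)
    total_out_of_sample = sum(out_of_sample_counts)
    return {
        "in_sample_required": required,
        "in_sample_ok": all(count >= required for count in in_sample_counts),
        "out_of_sample_total": total_out_of_sample,
        "out_of_sample_ok": total_out_of_sample >= MIN_OUT_OF_SAMPLE_TRADES,
    }


# Optimizing 14 variables at once would need at least 750 trades
# in every single in-sample optimization.
print(min_in_sample_trades(14))  # 750

# Hypothetical 4-stage analysis optimizing 3 variables.
print(check_walk_forward([220, 210, 250, 205], [15, 12, 18, 14], 3))
```

In the hypothetical 4-stage example each optimization clears the 200-trade minimum for 3 variables, and the out-of-sample backtests clear the minimum only because their 59 trades are counted cumulatively.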
In the next article...
In Part 3 we will consider how to architect your walk forward optimization settings to achieve the right balance between in-sample optimization significance and out-of-sample backtest significance, which in turn will mean you can optimize the walk forward analysis process to produce more robust systems. This ‘balancing act’ is another aspect of best practice in walk forward optimizations.
References
Gregory T. Knofczynski and Daniel Mundfrom (2007) Sample Sizes When Using Multiple Linear Regression for Prediction
Campbell R. Harvey and Yan Liu (2014) Evaluating Trading Strategies
Peter C. Austin and Ewout W. Steyerberg (2015) The number of subjects per variable required in linear regression analyses