Step-by-Step Best Practice Trading System Optimization Example
Best Practice Trading System Optimization Series: Part 4 of 4
Martyn Tinsley / 12 June, 2018

Abstract:
 A real-life case study of best-practice trading system optimization, using machine learning and statistical significance to improve trading system performance.
Introduction
This article is Part 4 of a series. It is written to be self-contained, so it can be read without Parts 1-3. However, if you struggle with some of the concepts or the rationale behind them, we recommend reading Part 1, Part 2 and Part 3.
The article provides a real-life case study of the step-by-step approach we take at Trade Like A Machine to best-practice backtesting and optimization of our trading systems. It illustrates how we i) improve the effectiveness of our trading systems significantly, and ii) ensure that the results are grounded in statistical significance, i.e. that they can be relied on. In other words, we want confidence that the system will deliver results in our live account that are comparable to those we see in backtesting.
The article walks through an example of the process using real data, in which we achieved an 8x improvement in the effectiveness of one of our trading systems (as measured by our reward/risk measure) by improving the statistical significance of the test. This was achieved by actually 'simplifying' the backtest settings.
Note that what we have learned from many years of research into best-practice backtesting and optimization has been built into our own testing and optimization software product, Walk Forward Pro. This is a commercially available product, and if you wish to read more about it you can do so here. In this article we therefore show how we undertake the optimization process using Walk Forward Pro, and include several screens from the application. However, the concepts could also be applied without Walk Forward Pro if you have the necessary skills.
Quick overview of the Walk Forward Pro software
In order to understand the basis of this case study, it is first necessary to have a brief explanation of what the Walk Forward Pro software does.
Walk Forward Pro uses ‘Multi-stage Walk Forward Optimization’ to ensure trading systems are optimized in the most effective way possible. For more information on WFO see here.
Version 2 of Walk Forward Pro, released in 2017, introduced the Machine Learning Module (artificial intelligence) to suggest the optimal settings for the WFO process based on the characteristics of the trading system being tested. We call this combination of Walk Forward Optimization and Machine Learning ‘MLO’ (Machine Learning Optimization).
Although ‘Walk Forward Optimization’ (WFO) is widely considered to be one of the best techniques available to test and optimize trading systems (if not the best technique), its effectiveness is often reduced when a trader chooses inappropriate settings for the process. This leaves the trader with a system that does not reach its full potential or, even worse, with an over-optimized system that has little or no chance of producing consistent profits in a real-money account.
The Machine Learning Module within Walk Forward Pro is designed to help traders avoid these problems and to achieve better and more robust results from their trading systems. Under the bonnet of the Machine Learning Module is a Neural Network. This provides assistance to the user by suggesting the settings that are most likely to produce an optimal walk forward optimization, based on ensuring statistical significance. As an example, it specifically addresses the following questions:

How many variables can be simultaneously optimized without over-optimizing the system? (sometimes termed ‘overfitting’)

How much price data is needed to achieve statistically significant test results?

How many stages should be used in the WFO based on the characteristics of my trading system?

What ‘optimization to walk forward ratio’ should be used, based on the characteristics of my trading system?
The complexity for the trader arises because these questions are interrelated: a decision made in relation to one of them will affect the optimal settings needed for the others.
Putting it into Practice: Running the Initial Walk Forward Optimization
Before the Machine Learning Algorithms in Walk Forward Pro can provide any intelligence about what the optimal settings might be, it is necessary to run a ‘standard’ test using settings that the trader thinks might be most effective. This standard test produces the information that the Machine Learning Algos need in order to then provide suggested settings that are likely to achieve better results.
Generally, when running a standard test, we tend to start with an ‘Optimization to Walk Forward Ratio’ of 3 or 4 (which seem to be values recommended by many users of WFO), and either 5 or 6 stages to prove how the system can adapt to changing market conditions. In this case we have chosen 6 stages, as you can see in the Walk Forward Pro setup screen below. Note also how in this study we will use AUDUSD and initially use 5 years of price data for the test. You can also see from the setting screen that we have chosen to use CAGR/MaxDD as our measure of Reward/Risk for the selection of parameters during the optimizations.
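To make these settings concrete, the sketch below lays out how 5 years of data might be divided into 6 stages with an ‘Optimization to Walk Forward Ratio’ of 3. It assumes a simple rolling scheme in which every stage steps forward by one walk forward window; the article does not describe Walk Forward Pro's exact windowing, and the start and end dates are hypothetical.

```python
from datetime import date, timedelta

def walk_forward_schedule(start, end, stages, ratio):
    """Split [start, end] into rolling optimization / walk forward windows.

    Assumption: each walk forward window has length w, each optimization
    window has length ratio * w, and each stage starts w later than the
    previous one, so the total span is (ratio + stages) * w.
    """
    total_days = (end - start).days
    oos_days = total_days / (ratio + stages)   # walk forward (out-of-sample) length
    is_days = ratio * oos_days                 # optimization (in-sample) length
    schedule = []
    for k in range(stages):
        offset = k * oos_days
        is_start = start + timedelta(days=round(offset))
        is_end = start + timedelta(days=round(offset + is_days))
        oos_end = start + timedelta(days=round(offset + is_days + oos_days))
        schedule.append((is_start, is_end, oos_end))
    return schedule

# Hypothetical 5-year test period.
schedule = walk_forward_schedule(date(2013, 1, 1), date(2018, 1, 1), stages=6, ratio=3)
for i, (is_start, is_end, oos_end) in enumerate(schedule, 1):
    print(f"Stage {i}: optimize {is_start} to {is_end}, walk forward to {oos_end}")
```

Under this assumed scheme, raising the ratio or the number of stages for the same amount of data shortens every walk forward window, which is one way to see why the settings cannot be chosen independently.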
The second highlighted area in the screen above shows the parameters we have chosen to optimize. In this case we have chosen 5 parameters, each taking just 3 values. As can be seen, this produces 243 parameter combinations (3 x 3 x 3 x 3 x 3).
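The size of the search space can be checked with a few lines of Python. The parameter names and values below are hypothetical placeholders; only the shape of the grid (5 parameters with 3 values each) comes from the test described above.

```python
from itertools import product
from math import prod

# Hypothetical parameter names and values, for illustration only.
grid = {
    "fast_ma": [10, 20, 30],
    "slow_ma": [50, 100, 150],
    "stop_loss_pips": [20, 40, 60],
    "take_profit_pips": [40, 80, 120],
    "atr_period": [7, 14, 21],
}

# The search space is the product of the number of values per parameter.
n_combinations = prod(len(values) for values in grid.values())
print(n_combinations)  # 243

# Enumerating the grid explicitly gives the same count.
all_combinations = list(product(*grid.values()))
print(len(all_combinations))  # 243

# With 3 values each, every extra parameter triples the number of candidates.
growth = [3 ** n for n in range(1, 6)]
print(growth)  # [3, 9, 27, 81, 243]
```

The growth list shows how quickly the number of candidates competing for the ‘best’ slot rises as parameters are added.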
The results of this ‘Standard’ Test
Following this initial test, Walk Forward Pro provides a wealth of information to the trader to help determine the performance of the trading system. The first results screen we will look at is what’s called the ‘Cumulative Walk Forward Equity Chart’. This shows the overall performance of all 6 walk forward phases (remember we used 6 stages) against the ‘out-of-sample’ data sets. So this is the performance of the system against data that was not used in the optimizations and has not been ‘seen’ before by the system. This out-of-sample performance therefore provides us with our validation that the optimization of parameters has worked well (or otherwise).
At the top of the screen we can see a table of metrics for each walk forward phase, with the values of the parameters that were chosen from the optimization phases. Under this is the equity chart with each of the 6 stages shown in a different colour.
Straight away we can see that the results appear to be less than impressive, with the system seemingly breaking even over the out-of-sample time period.
But let’s move on and take a look at the additional information that is available to us. The following section of the screen in Walk Forward Pro provides quantitative metrics for the out-of-sample tests, such as the Profit Factor, Calmar Ratio, Sharpe Ratio, CAGR (Compound Annual Growth Rate), Maximum Drawdown etc. It also provides a breakdown of trades, e.g. by long and short, to help identify any differences in performance.
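For readers who want to reproduce the reward/risk measure used in this study outside the application, here is a minimal sketch of CAGR, Maximum Drawdown and their ratio. It uses the common textbook definitions; the exact formulas inside Walk Forward Pro are not given in this article, and the equity values are made up.

```python
def cagr(equity, years):
    """Compound annual growth rate from the first to the last equity value."""
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0

def max_drawdown(equity):
    """Largest peak-to-trough decline, expressed as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def reward_risk(equity, years):
    """CAGR divided by maximum drawdown (infinite if there was no drawdown)."""
    drawdown = max_drawdown(equity)
    return float("inf") if drawdown == 0 else cagr(equity, years) / drawdown

# Made-up equity curve covering one year.
equity = [10_000, 10_400, 9_880, 10_600, 11_000]
print(round(cagr(equity, 1.0), 4))         # 0.1
print(round(max_drawdown(equity), 4))      # 0.05
print(round(reward_risk(equity, 1.0), 2))  # 2.0
```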
As stated previously, the results achieved from the test are not very impressive. The system appears to ‘break even’ at best. At this point we have to ask ourselves this very important question:
"Is this because the system doesn’t work, or is it because we have not chosen suitable WFO settings for the test?"
This is where the Machine Learning Module of Walk Forward Pro really excels, and provides the intelligence we need to answer this question. The following screen shows the results from the Machine Learning Module:
As can be seen above, there are three separate factors that are measured to help the trader identify problems with the WFO settings. These are as follows:
1. Optimization Predictive Power
If traders are to produce robust systems that perform well in real accounts, it is imperative that there is sufficient predictive power from the in-sample optimization phases. To ensure this is the case, the optimizations must be undertaken with adequate 'statistical significance'.
When the optimization phases lack statistical significance, the predictive power of selecting the best parameter values is reduced or, in extreme cases, completely eliminated. The selection of parameter values tends to be made based more on ‘randomness and chance’ than on the effectiveness of the actual parameter values chosen. This then leads to poor performance both in out-of-sample walk forward tests and when traded in real-money accounts. This is what the industry often calls 'over-optimization' or 'overfitting', and it is the number one reason why many traders fail to ever produce systems that perform well in live accounts.
An ‘Optimization Predictive Power Score’ of below 1 means it is likely that the parameter values were chosen based more on ‘randomness and chance’, than on the effectiveness of the actual parameter values. The trader should attempt to get this score as high as possible, in order to improve the effectiveness of the optimization phases.
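The effect of selecting parameters by ‘randomness and chance’ can be illustrated with a small simulation. This is not how the Optimization Predictive Power Score is calculated (the article does not disclose that); it is only a sketch in which 243 hypothetical parameter combinations all have zero true edge, the best one is picked in-sample, and the same pick is then measured out-of-sample. The trade counts, return distribution and seed are assumptions chosen for illustration.

```python
import random
from statistics import fmean

def best_in_sample_vs_out_of_sample(n_combos, n_trades, rng):
    """Pick the best of n_combos zero-edge 'strategies' in-sample and
    return that pick's (in-sample, out-of-sample) mean return per trade."""
    best_is, best_oos = float("-inf"), 0.0
    for _ in range(n_combos):
        # Every combination draws its trades from the same zero-mean noise.
        in_sample = fmean(rng.gauss(0.0, 1.0) for _ in range(n_trades))
        out_sample = fmean(rng.gauss(0.0, 1.0) for _ in range(n_trades))
        if in_sample > best_is:
            best_is, best_oos = in_sample, out_sample
    return best_is, best_oos

rng = random.Random(42)  # fixed seed so the run is repeatable
results = [best_in_sample_vs_out_of_sample(243, 50, rng) for _ in range(50)]
avg_is = fmean(r[0] for r in results)
avg_oos = fmean(r[1] for r in results)
print(f"Average in-sample result of the selected combination: {avg_is:.3f}")
print(f"Average out-of-sample result of the same selection:   {avg_oos:.3f}")
```

The selected combination looks clearly profitable in-sample even though nothing in the simulation has any edge, while its out-of-sample average stays close to zero. With few trades and many combinations, the ‘best’ result is mostly a measure of luck.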
2. Walk Forward Confidence
The second metric that is measured is called the ‘Walk Forward Confidence’. If traders are to produce robust systems that perform well in real accounts, it is imperative that the results achieved from the out-of-sample walk forward phases also have 'Statistical Significance'.
When the walk forward phases lack statistical significance, the results of the cumulative walk forward tests cannot be trusted as a true indication of the performance of your trading system.
Note that this ‘Walk Forward Confidence Score’ is based solely on analysis of the walk forward phases, and so gives a level of confidence in the results displayed on the ‘Out-of-Sample Walk Forward’ screen. This score does not reflect the significance of the in-sample optimizations or the robustness of the parameter selection (for this, see the 'Optimization Predictive Power Score' above).
A 'Walk Forward Confidence Score' below 1 means that the results achieved in the out-of-sample walk forward test will almost certainly not be representative of the results that would be experienced when trading the system in a live account. The trader should attempt to get this score as high as possible, in order to have confidence in the results.
3. Adaptability Score
One of the reasons that the ‘Walk Forward Optimization’ technique can be so effective is that it allows the optimizations and associated walk forward phases to be based on market conditions at that time. It allows systems therefore to adapt to changing markets by selecting parameters matched to those recent conditions.
The Adaptability Score measures the degree to which your walk forward optimization settings take advantage of this capability, i.e. how well your test settings show whether your system is capable of adapting when market conditions change.
Now that we understand what the three metrics on the Machine Learning tab represent, this screen starts to provide the answer to the question we asked previously:
"Does the trading system produce poor test results because the system doesn’t work, or is it because we have not chosen suitable WFO settings for the test?"
The highlighted section of the screen below shows where the problem might be. Major issues with the ‘Optimization Predictive Power’ have been identified.
Based on the description above of what the ‘Optimization Predictive Power’ measures, a lack of statistical significance in the in-sample phases of our test means that the selection of the ‘best’ parameter values has probably been made based more on ‘randomness and chance’ than on the effectiveness of the actual parameter values chosen.
Basically, we have “OVER-OPTIMIZED” our system!
Scrolling down the screen, Walk Forward Pro provides the user with a detailed explanation of this.
Scrolling further down the screen gets to the really interesting part. This is where we can see the settings that the Machine Learning Algorithms suggest might work better, increasing the statistical significance of the in-sample stages and enabling the optimization to select more effective parameter values.
Three Machine Learning options are presented to the user. We usually recommend starting off with Option 1. This uses the same amount of market price data as the initial test and so it provides a good like-for-like comparison of any improvements that are realised.
As can be seen for Option 1, in order to produce an ‘ACCEPTABLE’ level of ‘Optimization Predictive Power’ (previously this was classified as ‘POOR’), it is necessary to:

Reduce the number of parameters we are optimizing from 5 to 3

Retain the same number of stages (6)

Increase the ‘Optimization to Walk Forward ratio’ to 4.67 (from the initial value of 3.0)
An ‘ACCEPTABLE’ level of statistical significance is the bare minimum we should tolerate. So let’s go ahead and run this by clicking on the button ‘Run with these settings now’.
We first have to decide which 2 parameters we will no longer optimize (remember that we need to reduce this number from 5 to 3). This often comes down to instinct about which parameters have the least impact on the success of the system (only the individual trader will know this, based on their knowledge of their own system).
IMPORTANT: When removing a variable from an optimization, the trader must select a sensible ‘default value’ for that variable. Looking at the previous optimization and choosing an average of the values that seemed to perform best for this variable is usually the most logical approach. Following this, the Machine Learning Optimization (MLO) will proceed.
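One possible reading of this advice, sketched below, is to average the values that were selected for the parameter in each stage of the previous run and then snap that average to the nearest value that was actually tested. The function name, the per-stage values and the allowed values are all hypothetical; the trader's own judgement about the system should take priority over any mechanical rule.

```python
from statistics import mean

def default_from_previous_run(chosen_values, allowed_values):
    """Average the values picked in each stage of the earlier optimization,
    then snap the average to the nearest value that was actually tested."""
    target = mean(chosen_values)
    return min(allowed_values, key=lambda value: abs(value - target))

# Hypothetical values chosen for one parameter across the 6 stages.
chosen_per_stage = [14, 14, 21, 14, 7, 14]
allowed = [7, 14, 21]
print(default_from_previous_run(chosen_per_stage, allowed))  # 14
```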
Machine Learning Option 1 Results
Let’s take a look at the results. Considering the new equity curve below, we can immediately see that there has been a positive impact on the results. The out-of-sample results show 5 of the 6 stages are now profitable, with only one significant drawdown throughout.
The quantitative metrics shown below are also much improved:
Because this is a test informed by machine learning settings, Walk Forward Pro also now displays an additional comparison equity chart:
This chart shows the timeframe that is ‘common’ to both the initial ‘standard’ test (the blue dotted line) and the Machine Learning test (the solid red line). As can be seen, the machine learning test compares very favourably to the initial standard test, with a smoother equity curve and smaller drawdowns. We can see that the Reward/Risk measure (CAGR/MaxDD) is now 0.982 compared to 0.113 (which is over 8x better). The summary of results in our study so far is as follows:
Machine Learning Case Study Results Table
The following table shows the results of the initial baseline walk forward optimization, compared with the walk forward optimization that used settings informed by the Walk Forward Pro Machine Learning Module (and therefore with improved statistical significance).
| Test Type | Optimization Predictive Power | Walk Forward Confidence | Adaptability Score | Reward/Risk (CAGR/MaxDD) |
|---|---|---|---|---|
| Standard (Baseline) | POOR (0.69) | EXCELLENT (11.25) | GOOD (1.66) | Baseline |
| Machine Learning Option 1 | ACCEPTABLE (1.34) | EXCELLENT (8.18) | GOOD (1.08) | 0.982 (8x improvement from 0.113 in like-for-like period) |
It is worth noting that the machine learning algorithms ‘balanced’ out the statistical significance to address the serious issue we initially had with the ‘Optimization Predictive Power’. This was increased from POOR (0.69) to ACCEPTABLE (1.34).
The ‘Walk Forward Confidence’ and ‘Adaptability Score’ did need to be reduced in order to achieve this, however. This is the ‘balancing act’ that Parts 1 to 3 of this series discussed.
Important Note: The improvement achieved by using the settings suggested by the Machine Learning Algorithms is significant, and it has been achieved by optimizing fewer parameters. We see this effect time after time in our own testing. It contradicts the myth held by many traders that "the more parameters that are simultaneously optimized, the better". This is simply not true.
The usual outcome of optimizing many parameters at once is ‘over-optimization’ of the trading system. The reason is that the in-sample statistical significance is severely degraded, leading to low predictive power during the optimization process.
This low predictive power means that the selection of parameters is based more on randomness and chance than on the effectiveness of the parameter value(s).
This leads to poor results in the out-of-sample walk forward phases of the test, and also to poor results if the system is traded in a live account.
It comes as a surprise to most traders, but reducing the number of variables being optimized to a level that ensures adequate statistical significance usually improves the out-of-sample test results, and also significantly improves the robustness of the system.
The purpose of the Walk Forward Pro Machine Learning algorithms is to determine how many variables can be simultaneously optimized, what the optimal ‘number of stages’ is, and what the ‘optimization to walk forward ratio’ should be, in order to produce the optimal WFO.
Running Machine Learning Option 2
As you will know from Parts 1-3 of this series, the more price data that’s used for the test, the more statistically significant the results will be. Option 2 focusses on this area. As can be seen, ML Option 2 assumes that there is twice as much data available (10 years as opposed to the 5 years that was used in the initial standard test).
Based on this additional data being available, the Machine Learning Algorithms have determined that:

The number of parameters being optimized should be reduced from 5 to 3

The number of stages can be increased from 6 to 11

The ‘Optimization to Walk Forward Ratio’ should be increased from 3.0 to 4.96
Let’s run the test and see what the results show:
The equity curve above represents the 11 out-of-sample walk forward stages. Once again this is far superior to the initial run (where we had over-optimized). The quantitative metrics are as follows:
What we are most interested in is the equity comparison chart, since this allows a direct comparison of the ‘common timeframe’ shared by both the initial standard test and this machine learning test:
Remember that the blue dotted line represents the initial standard test, while the solid red line represents the new Machine Learning Option 2 test. Although there is an improvement in the results, it doesn’t appear to be as big an improvement as we achieved in Option 1. We might usually expect an even bigger improvement with Option 2 due to the increased statistical significance. However, this isn’t always the case, and it suggests that in trading the simpler approach often produces better results. So let’s take a look at the summary of results so far:
Machine Learning Case Study Results Table
The following table shows the results of the initial baseline walk forward optimization, compared with the walk forward optimization that used settings informed by the Walk Forward Pro Machine Learning Module (and therefore with improved statistical significance).
| Test Type | Optimization Predictive Power | Walk Forward Confidence | Adaptability Score | Reward/Risk (CAGR/MaxDD) |
|---|---|---|---|---|
| Standard (Baseline) | POOR (0.69) | EXCELLENT (11.25) | GOOD (1.66) | Baseline |
| Machine Learning Option 1 | ACCEPTABLE (1.34) | EXCELLENT (8.18) | GOOD (1.08) | 0.982 (8x improvement from 0.113 in like-for-like period) |
| Machine Learning Option 2 | ACCEPTABLE (1.92) | EXCELLENT (54.14) | VERY GOOD (2.00) | 0.277 (7x improvement from 0.036 in like-for-like period) |
Machine Learning Option 3
Option 3 attempts to produce improvements in results by reducing the number of parameters being optimized even further than in Option 1 (typically half the number of Option 1). Because Option 1 used three parameters, Option 3 uses two parameters (rounded up).
So let’s go ahead and run our optimization using these settings. Looking at the like-for-like periods compared with our initial standard test, we again see a significant improvement.
Here the reward/risk measure was 0.890, compared with 0.057 for the like-for-like period in the standard test. The final summary of our findings can now be presented. See the results table below:
Machine Learning Case Study Results Table
The following table shows the results of the initial baseline walk forward optimization, compared with the walk forward optimization that used settings informed by the Walk Forward Pro Machine Learning Module (and therefore with improved statistical significance).
| Test Type | Optimization Predictive Power | Walk Forward Confidence | Adaptability Score | Reward/Risk (CAGR/MaxDD) |
|---|---|---|---|---|
| Standard (Baseline) | POOR (0.69) | EXCELLENT (11.25) | GOOD (1.66) | Baseline |
| Machine Learning Option 1 | ACCEPTABLE (1.34) | EXCELLENT (8.18) | GOOD (1.08) | 0.982 (8x improvement from 0.113 in like-for-like period) |
| Machine Learning Option 2 | ACCEPTABLE (1.92) | EXCELLENT (54.14) | VERY GOOD (2.00) | 0.277 (7x improvement from 0.036 in like-for-like period) |
| Machine Learning Option 3 | GOOD (2.90) | VERY GOOD (6.73) | ACCEPTABLE (0.29) | 0.890 (15x improvement from 0.057 in like-for-like period) |
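The improvement factors quoted in the table can be recomputed directly from the reward/risk figures. The short check below uses only the numbers reported above; each machine learning result is divided by the standard test's result over the same like-for-like period.

```python
# (machine learning result, standard test result over the same period)
reward_risk_pairs = {
    "Machine Learning Option 1": (0.982, 0.113),
    "Machine Learning Option 2": (0.277, 0.036),
    "Machine Learning Option 3": (0.890, 0.057),
}

factors = {name: ml / standard for name, (ml, standard) in reward_risk_pairs.items()}
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x")
# Machine Learning Option 1: 8.7x
# Machine Learning Option 2: 7.7x
# Machine Learning Option 3: 15.6x
```

The figures in the table (8x, 7x and 15x) are these ratios rounded down.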
We hope you now see the benefit (necessity) of achieving statistical significance in your backtesting and optimization processes.
If you would like to find out more about Walk Forward Pro you can do so here