Dear NN-experts,
According to the neural network literature, normalization of all input and output data is highly recommended.
What do you think about that?
What kind of normalization (standard deviation, mean, etc.) do you use, and how do you calculate it?
Regards
Christian
In two cases I'm familiar with, Neuro-Lab and one other, simple normalization is not needed. It's done by the NN software, on both the inputs and the output, as the data is read. The bigger problem is outliers. Severe outliers can dramatically distort the range, such that most observations then fall into a small subset of the normalized range.
Here's my strategy for outliers: in Neuro-Lab, when you do "Apply Training Set" it gives back the Min, Max, Mean, and Standard Deviation. Min and Max can reveal outliers in your data. When I see that, I sometimes range check that variable and limit it to zero (an assumed mean) plus/minus 2 or 3 standard deviations. This can be CPU intensive, iterating through one or more affected DataSeries. Remember that this range check also happens during calculation of NNIndicator.Series, when the input script is also run.
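For anyone wanting a concrete starting point, here is a minimal sketch of that kind of range check, using plain arrays and illustrative names (in a Neuro-Lab script you would apply the same loop to the affected DataSeries):
CODE:
// Minimal sketch: clamp outliers to an assumed mean of zero +/- k standard deviations.
// Illustrative only -- not the actual script code discussed in this thread.
static void ClampOutliers(double[] values, double stdDev, double k)
{
    double upper = k * stdDev;   // assumed mean of zero, per the post above
    double lower = -k * stdDev;
    for (int i = 0; i < values.Length; i++)
    {
        if (values[i] > upper) values[i] = upper;
        else if (values[i] < lower) values[i] = lower;
    }
}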
It would be more useful if "Apply Training Set" displayed the median, rather than (or in addition to) the mean. That would remove the need to assume a zero mean.
Len
Len, thanks for your feedback.
When and where do you perform the range check?
In the input or output script? Or in a filter?
Thanks a lot in advance
Christian, NN beginner :-)
QUOTE:
At what time and where do you process the range check?
Very often in the output script, and in the input script for specific DataSeries where the Min or Max mentioned above shows that there are outliers. Unfortunately, it only takes one outlier to distort the results.
Len, Old man :-)
Hi Len,
In post 2 you describe how you avoid outliers with a range check.
Do you have a C# snippet showing how to do that?
Regards
Christian
See post #6 of "Neuro-Lab: Defining Output Values". Input or output, the same principle applies.
Hi Len,
I thought I would continue this thread rather than start a new one on the same topic.
How does Neuro-Lab actually normalize input data?
The reason I ask is that I am seeing different results when I normalize (actually, scale from 0 to 100) all input data before feeding it to neuroLab.Input() versus when I don't scale it. I realize that the training of each network is unique, but when I train two networks that are identical except for the normalization (same training data), I can see a distinct and repeatable difference in the resulting NNIndicator values of each network.
Also, as far as I can tell, normalizing the input data requires accessing the entire range of data so it can be scaled based on the max and min values. In other words, you need to loop over the entire data series to find the max and min values, and then scale the data based on them. But I'm not sure if it is "safe" to do this in a real auto-trading strategy?
Also, obviously the scaled values will change based on the max/min values in the data range. So the resulting value for a specific bar will change depending on the time range that was normalized/scaled.
The function I use is below.
Any thoughts/insights?
Thanks,
Tim
CODE:
Please log in to see this code.
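Since the code above isn't visible here, a hypothetical sketch of a rescale-to-0-100 routine of the kind being discussed (my own names, not the actual posted code; note the full pass over the series to find hi/lo, and the "hi > lo" guard discussed below):
CODE:
// Hypothetical sketch of a rescale-to-0-100 routine (not the original posted code).
// Note the full pass over the series to find hi/lo -- the "peeking" issue discussed later.
static double[] Rescale(double[] series)
{
    double lo = double.MaxValue, hi = double.MinValue;
    foreach (double v in series)
    {
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    double[] scaled = new double[series.Length];
    if (hi > lo)                     // guard against a flat series (division by zero)
    {
        for (int i = 0; i < series.Length; i++)
            scaled[i] = 100.0 * (series[i] - lo) / (hi - lo);
    }
    return scaled;                   // a flat series comes back as all zeros
}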
QUOTE:
How does Neuro-Lab actually normalize input data?
Here is the pertinent section of neural network xml:
CODE:
Please log in to see this code.
This defines the data ranges seen in the training data. Using these, the input and output DataSeries are transformed to have a min of zero and a max of 1. Your Rescale routine simply makes all Mins = 0 and all Maxs = 100.
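(Both transforms are ordinary min-max scaling; a minimal sketch of the 0-to-1 form, my own illustration rather than NeuroLab source:)
CODE:
// Sketch of min-max scaling with a stored training-time min and max.
// x' = (x - min) / (max - min): 0 at the training min, 1 at the training max.
static double ScaleToUnit(double x, double trainMin, double trainMax)
{
    return (x - trainMin) / (trainMax - trainMin);
    // Values outside the training range fall below 0 or above 1 --
    // the out-of-range situation discussed later in this thread.
}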
The issue I see with your routine is that it silently bypasses the scaling when "hi" is not greater than "lo". You don't know when you're passing a DataSeries of zeros into the NN calculation, which distorts the result. This could account for the difference you're seeing.
Sidebar: From the Min/Max output we can also deduce the meaning of the NNIndicator 0 to 100 output by the formula:
CODE:
Please log in to see this code.
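(The hidden formula is presumably along these lines, mapping the 0-100 NNIndicator value back to the output's original scale; this is my inference from the min/max scaling described above, not the exact posted code:)
CODE:
// Inferred sketch: map the 0-100 NNIndicator value back to the output's original scale.
static double PredictedOutput(double nnIndicatorValue, double minOutput, double maxOutput)
{
    return minOutput + (nnIndicatorValue / 100.0) * (maxOutput - minOutput);
}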
QUOTE:
So the resulting value for a specific bar will change depending on the time range that was normalized/scaled.
Absolutely true. A corollary: one of the issues in data modeling is detecting/handling the case where use of the network with new data includes values beyond the range seen in training. NN output in this case is indeterminate and can be wildly erratic.
Thanks Len!
OK, so it appears that during training it doesn't actually transform the input data into a new data series (like I was trying to do), but just converts the raw values on the fly for each input. I suspect that's why we don't see transformed data in the "Statistics" or "Training Set" tabs?
But what about after the network is trained, when actually using the NNindicator... on different stocks. Are these input values also normalized? How would it know the MinOutput and MaxOutput unless it first sees all of the data in the series?
Assuming that it does scan through the entire data series, is it "safe" for it to do this before we use the indicator in our strategy? In general I assume we need to be very careful that we don't leak any kind of "future knowledge" into the strategy.
Yes, I agree that I should remove the "if (hi > lo)" check. I just wanted to prevent NaNs from appearing in the data (see attached) and messing up the training.
Any ideas on how can we deal with this issue of the NNIndicator values changing for the same bar when using different data ranges?
I have over 20 input data series fed into my network and I've seen the bar value of the NNIndicator change by a couple of points just by adding a couple months to the beginning or ending of the data range. This makes it very difficult to gauge what NNIndicator value should trigger an action in a strategy!
Thanks,
Tim
QUOTE:
when actually using the NNindicator... on different stocks. Are these input values also normalized?
Yes, NNIndicator normalizes using the same Mins and Maxs established at training. The scan, as you call it, and the establishment of mins and maxs is done only once, in Neuro-Lab, at the time of "Apply Training Set." My "corollary" in post #8 highlights the inherent risk when NNIndicator at strategy time encounters input values beyond the range it saw at training time. My "Absolutely true" refers to using different training periods. After the network is trained, the Mins and Maxs are fixed. NNIndicator, in a strategy, will always return the same values for the same inputs.
QUOTE:
seen the bar value of the NNIndicator change by a couple of points just by adding a couple months to the beginning or ending of the data range.
This could only be because the inputs are different. Have you allowed enough lead bars for your indicators to settle?
OK, I get it now.
If the min or max values at strategy time are beyond the min or max at training time, then the NNIndicator will behave erratically. I wonder if that is what I am seeing. The different NNIndicator values I saw were 6 months into the data range, but it was during the big drops in 2009. I suspect some of the values at this time were below the training values.
Would re-normalizing the values (finding new min & max) at strategy time be advisable?
Thanks,
Tim
QUOTE:
If the min or max values at strategy time are beyond the min or max at training time, then the NNIndicator will behave erratically
1. Erratically, but consistently. Given identical inputs, it will produce the same (though possibly erratic due to out-of-range inputs) DataSeries every time. I get differences due to the Fidelity provider and my preference for the "Percent of Equity" PosSizer. Fidelity data loading often produces "1 bars added 1 bars corrected" messages. Corrections can lead to differences in NNIndicator, and thereby changes in the trades.
2. I try to train over a minimum 10-year period to include the market fluctuations of 2008 and 2009, which produced similarly extreme indicator ranges.
3. In 2013 I had suggested an NNIndicator modification to notify when out-of-range data was detected, but there was "no appetite" to change NeuroLab or NNIndicator. I closed the ticket in 2015.
And a "thank you", Tim. Your questions have prompted me to revisit the topic and write a range check method to validate the inputs in my version of NNIndicator.
Thanks Len!
Those are great points. I am also using the Fidelity provider data and a couple external index instruments.
How does the pos sizer affect the NNIndicator values?
I agree it would be very, very nice if we had some way of knowing when the NNIndicator gets values that were out-of-training-range.
Is there an easy way to tell which years are included in the "first/last" 10%-100% data range setting in Neuro-Lab? I would really like to train over the last 10 years, excluding the last year (so I can do out-of-training testing in the most recent past year).
You wrote your own version of NNIndicator?!
By the way, I would have given up on Neuro-Lab without your help. I am finally seeing how useful it can be, and have seen some impressive back-testing results. I've even tried using 2 different NNIndicators for buy & sell triggers and it worked very well.
Thanks,
Tim
QUOTE:
How does the pos sizer affect the NNIndicator values?
The PosSizer doesn't affect NN values; changing inputs affects NN values. Then, since the NN value is used as the trade priority, the new priority order may change which trades are chosen.
QUOTE:
... some way of knowing when the NNIndicator gets values that were out-of-training-range
I'll know the value of that soon. I wrote the code to do that in my NNIndicator since my last post, and I'm now testing to see whether it materially improves strategy results.
QUOTE:
You wrote your own version of NNIndicator?!
Yes, mine is 2-3 times faster than the stock version. Before the recent 1.0.3.1 NNIndicator update, it was 20x faster. It's limited to a 2-layer network and requires the strategy to also have a copy of the NN Input Script.
QUOTE:
... impressive back-testing results.
Yes, the learning curve is worth it. I've been using mathematical models <<bragging now>> since - wait for it - 1966. Back then it was polynomial curve fitting and linear regression. The beauty of neural networks is their "black box" nature. You wonder, "Does 10-bar MomentumPct of Volume signal future gain/loss?" Just throw it in! If it's meaningful, you'll see the improvement in a better formed "Evaluate Performance." No improvement? Take it out and try something else.
Len,
Are the <MinOutput> and <MaxOutput> values you indicated above for the entire dataset being trained, or for each stock/instrument in the set?
If it is (as I suspect) for the entire dataset being trained, then (unless I am totally confused) I think this is not sufficient "normalization" for NN training using a set of stocks.
The reason is that having only one min and one max over all stocks will hide major differences in the range of possible values that each particular stock actually had.
For a trivial (but extreme) example, over the entire training data available, the min and max close prices for PCLN are 6.60 and 1578.13 respectively (a range of 1571.53), while the min and max for QVCA are 1.65 and 31.02 (a range of 29.37). A price of 30 dollars for QVCA should obviously not be considered directly comparable to a price of 30 for PCLN! We need to normalize the prices for each stock by itself in order to safely compare them to each other.
Hopefully I am wrong and Neuro-Lab is normalizing all input for each stock. But if it is only normalizing over the entire data set then I think we will want to do something like I was trying to do above with rescale().
Thanks,
Tim
QUOTE:
We need to normalize the prices for each stock by itself in order to safely compare them to each other.
Normalization is not how the comparison problem is solved. You have correctly discerned that price is not always comparable between instruments. Therefore, it might not be a good input to the neural network. (It is somewhat useful in that low-priced stocks may behave differently than higher-priced stocks.) A better choice might be a comparable function of price, say MomentumPct of price.
The neural network has no concept of instrument (stock). It sees a stream of observations, simple rows of numbers, where the last number in each row is the dependent variable. From that stream and the network topology, Training computes weights that minimize the error/difference between the output of the network and the dependent variable.
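To make the comparability point concrete, a percent-change style input is on the same footing for a $30 stock and a $1,500 stock. A simple sketch (my own, not necessarily the exact MomentumPct formula):
CODE:
// Sketch: percent change over a lookback period -- comparable across instruments
// regardless of price level. Only looks backward, so it does not peek.
static double[] PercentChange(double[] close, int period)
{
    double[] result = new double[close.Length];
    for (int bar = period; bar < close.Length; bar++)
        result[bar] = 100.0 * (close[bar] - close[bar - period]) / close[bar - period];
    return result;
}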
OK, that makes sense I guess.
But using MomentumPct is, in a way, normalizing the price values?
I guess the reason I am still skeptical that we don't need to normalize any input data is that I consistently get smaller training errors when I normalize/scale all of the inputs in my input script (with the same training data and epoch count). Doesn't this indicate that normalizing is helping improve the accuracy of the network?
Thanks,
Tim
QUOTE:
Doesn't this indicate that normalizing is helping improve the accuracy of the network?
Peeking is helping improve the (apparent) accuracy of the network.
Think about what is happening when you "normalize" Bars.Close in your training data. You have indeed created a network input that has better predictive value. But let's look at why that is. Is it not true, in the training data, that a value of zero is guaranteed not to go lower, and a value of 100 is guaranteed not to go higher? So, when the input is zero, predicting it will go up is guaranteed. Oh, wait, that's only true in the training data. So, while your result may appear better, you have achieved that result by peeking. If bar one is a normalized zero, you are guaranteed that it will go up, because you have peeked at the entire DataSeries in your normalization. A better testing result, perhaps, but I wouldn't trust it outside the training date range.
MomentumPct only looks back, a totally different animal.
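One way to see the difference: scaling by the whole-series maximum uses bars that lie in the future, while a trailing maximum uses only the past. A small illustrative sketch (names are mine):
CODE:
// Peeking: the maximum of the ENTIRE series is not knowable at bar 'bar'.
static double ScaledWithPeek(double[] close, int bar)
{
    double max = double.MinValue;
    foreach (double v in close) if (v > max) max = v;   // looks at future bars too
    return 100.0 * close[bar] / max;
}

// No peeking: only bars up to and including the current one are used.
static double ScaledNoPeek(double[] close, int bar)
{
    double max = double.MinValue;
    for (int i = 0; i <= bar; i++) if (close[i] > max) max = close[i];
    return 100.0 * close[bar] / max;
}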
Ahhhh!!
That's what I was suspecting when I asked this (above):
"...normalizing the input data requires accessing the entire range of data so it can be scaled based on the max high and low values. In other words, you need to loop over the entire data series to find the max high & low values, and then scale the data based on them. But I'm not sure if it is "safe" to do this in a real auto-trading strategy?"
So I guess normalizing the input data by looping over all bars is indeed "cheating" by peeking and forcing a hard bottom and a hard top for each data series.
This would help explain why I was seeing very good results inside the trained data, and not so good results outside the trained data.
So is it true that in general we should only "loop over all the bars" one time in any NN input script or a strategy script (and that would be the main buy/sell loop)?
Thanks for all the help,
Tim
QUOTE:
So is it true that in general we should only "loop over all the bars" one time in any NN input script or a strategy script (and that would be the main buy/sell loop)?
Yes, true, because doing so uses knowledge of the future. I can't think of a situation where it's not peeking. I didn't see what you were asking in your earlier question.
Hi again,
I'm still a little confused about what kind of data is safe to use inside the Input script of a NN.
In a normal strategy there is a loop over all the bars, and it is obvious which bar you are processing in the loop, and that you shouldn't access any bars beyond the current one. But in the NN input script, at least the ones I've seen, there is no loop over all the bars... you just feed entire data series into the NeuroLab.Input() method.
And there are several indicators available that can return the lowest and highest values (and their bars) in a series up to a specified spot (bar). And there are also the DataSeries.MinValue and DataSeries.MaxValue fields, which return the lowest and highest values in the entire series.
Is it possible to use these in a legitimate way in an NN input script (ie. without tainting it with future knowledge)?
Being able to know the min and max value that occurred in a series (in at least some subset of all the data) can drastically improve the usefulness of the data fed to the NN Input() methods.
To illustrate this with a simple example, I created 2 trivial input scripts with the exact same output script.
First input script "nonNormalizedNN" is just inputting the raw Close price and Volume values:
CODE:
Please log in to see this code.
The second input script "normalizedNN" takes the series and divides it by the max value in the series:
CODE:
Please log in to see this code.
These were both configured identically (3 hidden nodes) and trained on the same data with the same number of epochs.
The resulting NNIndicator values were totally different (as you can see from the attached screenshot).
The non-normalized values did not track the increasing close price at all, but the normalized values did.
Obviously the normalizedNN network input script has knowledge of the maximum close price over the historical data. This knowledge is immensely valuable in any buy/sell strategy! I suppose I could keep it out of the NN input script and then add the maximum seen close price inside the buy strategy as it loops over the bars, but I want to know if it is OK to train a NN with it or not. It also is confusing to me that doing this normalizing in the 2nd script made any difference, since I assumed the NN was also normalizing the input data. At least it is keeping track of the max and min values for each series inside the stored XML file.
Sorry for my confusion over this subject, but it is not intuitive to me whether using the DataSeries.Max or DataSeries.Min values in a script is peeking or not. It seems like it is, and the results I see indicate it is, yet the trained NN XML file also keeps track of the min and max values seen, so I'm confused.
(Even if using this info in a NN or strategy script is indeed "cheating", it would only matter during actual live real-world usage if the stock actually goes above the maximum price that was seen during training. But prices do generally go up, so a stock could go above the maximum price seen in training pretty soon after the system goes live.)
Maybe this subject could be a help topic in the online/built-in help?
On another subject: I've seen good results using a NN to decide when to sell an open trade, but I saw you mention you don't like to use NNs for this. Why is that?
As always, thanks for all the help!
Tim
I noticed after I sent that last message that, for some strange reason, the NNIndicator for the first "nonNormalizedNN" network appears to be mirroring the volume rather than the close price.
So I created a new network with the same script, re-trained it, and got the same result.
I have no idea why.
Any ideas?
Thanks,
Tim
PS. Here's the entire script of "nonNormalizedNN":
CODE:
Please log in to see this code.
Here is the output script:
CODE:
Please log in to see this code.
Hi Len,
After a few more tests I can confirm that including both the close price and volume in a trivial input script will cause the trained network to appear to always track the volume rather than the close price. It doesn't matter what order the inputs are in, or even if you include another input; the resulting trained network will always track the volume.
I also have no idea why normalizing both inputs causes the trained network to switch and track the close price.
This makes no sense to me! Can you verify and/or explain this behavior?
Thanks,
Tim
Tim,
QUOTE:
This makes no sense to me! Can you verify and/or explain this behavior?
How can I say this tactfully? Since I don't know you, I will try humor. You have successfully proven two axioms of computer algorithms...
1. Garbage In, Garbage out
2. You can't make a silk purse from a sow's ear.
I am not going to try to explain more fully why your results mirror one input or the other.
The purpose of NNIndicator is to apply weights to data known as of a bar to predict future gain or loss at a future bar. Today's Close and Volume, as raw values or "normalized", are highly unlikely to predict the gain/loss 100 bars in the future.
Any attempt to normalize in the Input Script requires pre-reading the entire DataSeries, i.e., peeking. Remember that the input script runs not only at the time of training, but also when computing NNIndicator in the strategy, which typically occurs ahead of the strategy buy/sell loop. The normalization done by NeuroLab happens only when training, at the time of Select Training Data. Its purpose is to scale the data to an appropriate range for the sigmoid function. If your attempts at normalization result in a better backtest result, it is due to peeking and nothing more. There is another major difference: NeuroLab's scaling is done across all symbols at the same time, while yours, running as part of NNIndicator, runs one symbol at a time. Drop the idea of scaling.
Sidebar: Your +/- 16 collar in the Output Script is only correct for predicting 5 bars forward (and is dependent on the volatility of your chosen portfolio). I don't know what is appropriate for 100 bars. To find out, comment out the collar if/else if and run "Select Training Data". Then multiply the Std Dev column of "My Output" by 3 or 4 to get the collar limits. Adjust and uncomment, then run Select Training Data again before training.
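A small sketch of the collar idea described above: derive the limits from the reported standard deviation and clamp the raw output to them (values and names are illustrative):
CODE:
// Sketch: derive collar limits from the output's standard deviation and clamp to them.
static double ApplyCollar(double rawOutput, double outputStdDev, double multiplier)
{
    double limit = multiplier * outputStdDev;   // e.g. 3 or 4 times the Std Dev of "My Output"
    if (rawOutput > limit) return limit;
    if (rawOutput < -limit) return -limit;
    return rawOutput;
}
// Example: with a Std Dev of 8 and a multiplier of 3, the collar would be +/- 24.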
Thanks Len. And thanks for the humor... see attached picture I still have hanging on the wall. ;-)
I understand now that we should not do any kind of scaling or normalizing in input scripts. I won't do that any more. I also forgot that NeuroLab only normalizes the data during training, and not when running in a strategy.
Yet I'm still a little fuzzy on what indicators are safe to use in an input script. Obviously we should use oscillators or percentages, where the min/max values are capped/normalized. And of course Lowest, LowestBar, Highest, and HighestBar should not be used. But all indicators return an entire series of data, and we pass the entire series into the input method all at once. How can we know if any peeking was done when creating the data in the series? I just want to make sure I'm not doing any kind of cheating in my input script.
As for my trivial/garbage examples, of course I realized they wouldn't yield any useful result. But I did expect that if I passed data series A...Z into the input script, and the output script was based on input A, then after training the indicator would favor input A rather than another input. Apparently that's not always the case.
Also, thanks for highlighting the issue with my output collar. I don't normally use 100 bars, but I will check what the std dev is and adjust it accordingly.
Thanks again for all the help,
Tim
QUOTE:
Yet I'm still a little fuzzy on what indicators are safe to use in an input script.
The rules are pretty much the same as in a strategy. Indicator values at a specific bar, even if calculated ahead of the trading loop, are based only on data that occurred prior to and including that bar. That's not peeking.
During training, values from all inputs and the output are aligned to make an observation. All DataSeries[0] with output DataSeries[0], then [1], [2], etc.
QUOTE:
And of course Lowest, LowestBar, Highest, and HighestBar should not be used
Not true. Here's the definition of "Lowest": "Looks back the specified number of periods from the specified Bar and returns the lowest price within that period." That's perfectly acceptable.
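In other words, Lowest is a trailing-window calculation. A sketch of the idea (not the library's actual implementation):
CODE:
// Sketch: lowest value over the last 'period' bars up to and including 'bar' -- no future data.
static double LowestLookback(double[] price, int bar, int period)
{
    double lowest = double.MaxValue;
    int start = Math.Max(0, bar - period + 1);
    for (int i = start; i <= bar; i++)
        if (price[i] < lowest) lowest = price[i];
    return lowest;
}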
QUOTE:
Indicator values at a specific bar, even if calculated ahead of the trading loop, are based only on data that occurred prior to and including that bar.
Oh, OK, that makes sense now. Then I assume that's how most indicators are implemented?
QUOTE:
During training, values from all inputs and the output are aligned to make an observation. All DataSeries[0] with output DataSeries[0], then [1], [2], etc.
I'm not sure what you mean here. Are you saying that the entire output script is run for each input DataSeries in the input script?
Alright, if we can use Lowest, LowestBar, Highest, and HighestBar then apparently we can't pass in Bars.Count() for the period?
Do we have to make sure that the specified period is less than the starting lead bar of the training data?
Thanks,
Tim
QUOTE:
I'm not sure what you mean here (values from all inputs and the output are aligned to make an observation)
Select Training Data: On the first symbol, in the Input Script, each call to neuroLab.Input(ds) appends a DataSeries to a List<DataSeries>. When you call neuroLab.Output(ds), you define the output DataSeries. Each successive symbol extends these DataSeries, also determining the min, max, mean, and std dev across all symbols. It then scales each input and output "super-DataSeries" so its min is 0 and its max is 1.
Training: Training starts by setting random weights in the network topology. Then it applies ds[0] from each super-DataSeries to the inputs and runs the NN calculation, getting a predicted value. The difference from Output[0] is the error for observation 0. It applies ds[1], [2], etc., comparing to Output[1], [2], and sums the errors. The average of all the errors is the result of the epoch, one point on the Training Error graph. It intelligently adjusts the weights and repeats, calculating epoch 2, etc. If the inputs chosen are predictive, it will be able to reduce the error. At the end of training, the weights are saved.
NNIndicator: The Input Script is run, creating a List<DataSeries>. It scales these DataSeries using the mins and maxs learned in Select Training Data. It applies ds[0] from each (now single-symbol) DataSeries in the list to the inputs and runs the NN calculation, getting a predicted value. This, multiplied by 100, is NNIndicator[0]. It applies ds[1], [2], etc., giving NNIndicator[1], [2], etc.
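For readers who want to see the mechanics, here is a highly simplified sketch of the per-epoch error calculation described above, for a one-hidden-layer network with sigmoid activations. It is my own illustration, not NeuroLab code; biases and the weight-adjustment step are omitted:
CODE:
// Sketch: average error over all observations for one epoch of a simple
// one-hidden-layer network with sigmoid activations. Weight updates omitted.
static double EpochError(double[][] inputs,       // inputs[obs][i], already scaled to 0..1
                         double[] target,         // target[obs], already scaled to 0..1
                         double[,] hiddenWeights, // [hiddenNode, inputIndex]
                         double[] outputWeights)  // one weight per hidden node
{
    int hiddenCount = outputWeights.Length;
    double totalError = 0.0;
    for (int obs = 0; obs < inputs.Length; obs++)
    {
        double outSum = 0.0;
        for (int h = 0; h < hiddenCount; h++)
        {
            double hSum = 0.0;
            for (int i = 0; i < inputs[obs].Length; i++)
                hSum += hiddenWeights[h, i] * inputs[obs][i];
            outSum += outputWeights[h] * (1.0 / (1.0 + Math.Exp(-hSum)));  // sigmoid hidden output
        }
        double predicted = 1.0 / (1.0 + Math.Exp(-outSum));   // network output in 0..1
        totalError += Math.Abs(predicted - target[obs]);      // error for this observation
    }
    return totalError / inputs.Length;   // one point on the Training Error graph
}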
QUOTE:
apparently we can't pass in Bars.Count() for the period?
Yes. Simply ask the question, "Would I have known this at the time of bar, say, 100?" The concept of peeking is using information that you would not have known as of the bar where you're using it.
Thank you Len for that clear explanation. I think I now understand how the NN works with respect to the input and output data. (Is the above info documented somewhere?)
I will not try to normalize the data in the input script. But it sounds like it could be OK to do something similar to this if you have at least 200 lead bars in the Select Training Data tab:
CODE:
Please log in to see this code.
However, given that the network uses a single min/max (for each input) to scale the entire "appended" DataSeries, it does seem that we would get better results from a NN if we trained each one on a single symbol, rather than against a set of them (esp. if they are different kinds/sizes of companies, etc.). Yet it wouldn't be very practical to use numerous NNs in a single strategy. Would it?
Thanks,
Tim
QUOTE:
... better results from a NN if we trained each one on a single symbol, rather than against a set of them
Better results? I pretty strongly disagree. If by better results you mean really good backtest stats, go for it. But if you plan to trade from your strategy's signals, use at least 30 symbols and at least a five-year (daily) period. You can train a NN to fit the back data of a single symbol almost to perfection, but it's meaningless, because all you've done is memorize the historical pattern. What you're looking for, rather, is the general rule, so that performance is good across many symbols and market conditions. For the same robustness reason, you want a long training time period (up and down market conditions) and you don't want to train too long. Overtraining is a big risk with a NN, because the backtest results can look so good while actual trading results come nowhere close.
QUOTE:
Yet it wouldn't be very practical to use numerous NNs in a single strategy. Would it?
It could be. Perhaps you have a NN that predicts the market in general and a second NN that predicts tech stocks. I could conceive of a strategy that only buys when both are above specific thresholds. (Note that I've never tried it and there could be NeuroLab-specific implementation reasons why it wouldn't work. I might add here that I don't have the NeuroLab source code; some of my answers are conjecture.)
The only documentation I know is the User Guide.
Using a Neural Network is a little like the board game, Othello, which had the slogan, "A minute to learn, a lifetime to master."
Hi Len,
Thanks for the good advice. After some testing with single stocks and a set of them, I discovered you are right.
And I agree the goal is to get a NN that learns the rules, not the history.
Thanks,
Tim
Tim,
For topic purity, may I suggest that you edit your post #30, cutting from "Getting a little off-topic..." through the Othello comment. Then open a new topic, perhaps "Neuro-Lab: How to train". I'll attempt to address training in the new topic. Or use the similar existing thread, "Neuro-Lab: Number of epochs."
I didn't say Othello was that difficult, only that its motto fits NN development and usage.
Len
Thanks Len,
Agreed, so I edited my post.
I was about to re-post in the "Number of epochs" thread, but I found most of my answers there, and also in the linked thread.
I'll follow what you suggested in that thread (esp. posts 8 & 10). If I have new questions I'll post a new thread.
Thanks!
Tim