Mike, what a fantastic set of questions you ask. I agree with all that Bruce writes but would add a few other comments as well. Note that I have comments but really no answers; these are the same issues that I worry about. I would add one other big question to your list, and would similarly ask for any help / feedback on the extent to which others worry about this or take it into consideration in their own system design:
And the question is: HOW TO HANDLE DATA QUALITY ISSUES? It seems like I go through similar steps as you do: come up with what I perceive to be a winning formula / methodology, test it somewhat rigorously over various time periods and different data sets, and then move into implementation mode. Only when running the system each day do the data quality issues become apparent.
I happen to hold KRE today and yesterday as a result of one of my systems. I had a good day in it yesterday, and it closed at 4:00 at 20.89 (I was watching it fairly carefully). Lo and behold, this morning I open up my trading platform and the market center official close for the symbol was 20.30. This isn't even an error from the data providers; for some reason, the exchange(s) chose 20.30 as the official close. Now my system, which runs on EOD data, will forever view yesterday's price action as a loss when it really was a gain, and any buys triggered by systems based on yesterday's close of 20.30 will record what appears to be a big win today, when in reality the ETF closed down 1 cent from what was an executable fill at the close.
My worry is that this happens all the time, and that when we backtest, we get system results that are skewed by the bad data and not actually achievable in real-time trading. If anyone has any suggestions about how to handle this, I'd love to hear them.
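One possible way to at least spot these cases, offered only as a rough sketch and not as an answer: record the last trade you actually observed at the close alongside the official EOD close, and flag days where the two disagree by more than some tolerance. The function name, the tuple layout, and the 0.5% tolerance below are all assumptions made for illustration. The first row uses the KRE figures above; the second row is hypothetical.

```python
def flag_close_discrepancies(bars, tolerance=0.005):
    """Return the bars whose official close differs from the observed
    last trade by more than `tolerance` (relative to the last trade).

    bars: iterable of (label, observed_last_trade, official_close).
    The layout and the default tolerance are illustrative assumptions.
    """
    flagged = []
    for label, last_trade, official_close in bars:
        rel_diff = abs(official_close - last_trade) / last_trade
        if rel_diff > tolerance:
            flagged.append((label, last_trade, official_close, rel_diff))
    return flagged


bars = [
    ("yesterday", 20.89, 20.30),     # KRE: observed 4:00 print vs. official close
    ("hypothetical", 20.88, 20.88),  # made-up day where the two agree
]

flagged = flag_close_discrepancies(bars)
for label, last_trade, official_close, rel_diff in flagged:
    print(f"{label}: observed {last_trade:.2f}, official {official_close:.2f}, "
          f"difference {rel_diff:.2%}")
```

A check like this only identifies suspect days. Whether to exclude them, re-price them, or leave them in the backtest remains a judgment call.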
To some of your points: I was fortunate to attend a very interesting seminar this summer presented by Baruch College and Bloomberg Alpha: ARPM'09 (Advanced Risk and Portfolio Management). The presenter / author is generous enough to post all the course materials, as well as his entire textbook in slide form, on his website, which can be accessed here:
http://www.baruch.cuny.edu/math/arpm2009/course.html and here:
http://www.symmys.com -- the book is a good buy as well, btw. You may find some of the material interesting. I've been focused on using PCA as a forecasting tool, with some initially promising results (although I'm still early in the development / testing).
My own view and hope is that statistics is a tool to help with constructing and evaluating forecasts. As you clearly indicate, one has to be very careful in how various statistics are applied to data -- for example, applying many statistics to price data is generally taboo, while applying them to return data is potentially useful. I tend to focus only on dividend-adjusted return data (actually log returns) and only use the price series when I'm buying or selling.
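For anyone unfamiliar with the return calculation, here is a minimal sketch of turning a dividend-adjusted close series into log returns, r_t = ln(P_t / P_{t-1}). The function name and the sample prices are hypothetical, and the input is assumed to be already adjusted for dividends by the data source.

```python
import math


def log_returns(adjusted_closes):
    """Compute log returns ln(P_t / P_{t-1}) from a list of
    dividend-adjusted closes (assumed positive and in date order)."""
    return [
        math.log(curr / prev)
        for prev, curr in zip(adjusted_closes, adjusted_closes[1:])
    ]


# Hypothetical adjusted closes, for illustration only.
closes = [100.0, 102.0, 101.0]
returns = log_returns(closes)

# Log returns add across periods: their sum equals the log return
# over the whole span, ln(last / first).
total = sum(returns)
print(returns, total)
```

The additivity shown in the last step is one reason log returns are convenient to work with when applying statistics to a series.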
Finally, I'd add that given everything I've read and learned on the subject of building trading systems, there is clearly no one right way to do it. Given the thought you've put into the questions you've asked, I'd bet the common sense approach you take to designing and building systems is as good as most other approaches out there and would probably serve as a "best practice" for many.
Regards,
Steve