The Center for Data Innovation spoke with Thomas Wiecki, director of data science at Quantopian, a crowdsourced algorithmic trading firm based in Boston. Wiecki discussed how crowdsourced algorithms perform compared to traditional hedge funds, as well as how Quantopian uses a technique called probabilistic programming.
Joshua New: Quantopian is a pretty unique kind of investment firm. Could you explain its business model?
Thomas Wiecki: Quantopian is an investment manager powered by a community of over 140,000 quants (quantitative analysts). For our community, we have built a state-of-the-art backtester and a research environment, and we provide a rich suite of data sets in which to find alpha. While we never look at source code, we constantly scan the exhaust of the algorithms developed on the platform (their returns) for something interesting. Once we identify such an algorithm, we work with its author to license the strategy and include it in our fund, where we invest tens of millions of dollars into individual strategies. While most of the returns generated by an algorithm go back to the investor, the quant gets a 10 percent share of the net profits generated by his or her strategy.
New: How well do the algorithms you crowdsource perform compared to other kinds of quantitative investment firms that invest heavily in their own proprietary algorithms?
Wiecki: Because the user base and the number of strategies are so vast, there are all kinds of different strategies, many of which a sane person would never invest in. However, a small percentage of them can definitely hold their own against the best of what other quant firms develop. Our community is quite extraordinary in many ways, and I love interacting with its members in real life at our events, like the QuantCon conference. You meet so many smart and driven individuals from diverse backgrounds, ranging from astronomy to chemical engineering to machine learning. One of my favorite stories is about an actual Russian astronaut who uses Quantopian.
In terms of strategy styles, we also see a wide array of ideas, from classical mean reversion and momentum strategies to more exotic ones. We’re particularly focused on cross-sectional algorithms, pairs trading algorithms, and event-driven algorithms.
New: You’ve written about how important probabilistic programming is for your work. Can you describe what this is and how Quantopian uses it?
Wiecki: Probabilistic programming is a very convenient and flexible way to build Bayesian statistical models. For many problems we face on the research team, it turned out to be the perfect tool for the job. One particular problem is evaluating algorithms. One of the first things we learned is that you can’t really trust a backtest on historical data, as it is inevitably overfit. We therefore focus on the period after the algorithm was created, the out-of-sample period, because the quant couldn’t have overfit to it. But how long do we have to wait to gain confidence? Probabilistic programming allows us to quantify our uncertainty about the strategy over time, given its current track record. Based on the estimated probability that the strategy has an edge, we decide whether we have enough confidence to deploy it or whether we should wait for more track record to accumulate.
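As a rough illustration of how confidence in a strategy can grow with its out-of-sample track record, the sketch below uses a deliberately simple conjugate model rather than the MCMC-based PyMC3 models mentioned in this interview. It assumes daily returns are independent and normally distributed, fixes the volatility at its sample estimate, and places a skeptical Normal(0, 0.001²) prior on the mean daily return. The function name `prob_positive_edge`, the prior scale, and the example returns are hypothetical choices for illustration only.

```python
import statistics
from statistics import NormalDist


def prob_positive_edge(returns, prior_sd=0.001):
    """Posterior probability that the mean daily return is above zero.

    Illustrative assumptions (not Quantopian's production model):
      * daily returns are i.i.d. Normal(mu, sigma ** 2)
      * sigma is fixed at the sample standard deviation (a plug-in estimate)
      * the prior on mu is Normal(0, prior_sd ** 2), i.e. skeptical of any edge
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    xbar = statistics.fmean(returns)
    sigma = statistics.stdev(returns)
    if sigma == 0:
        raise ValueError("returns have zero variance")
    # Conjugate normal update: precisions (inverse variances) add up.
    post_precision = 1.0 / prior_sd ** 2 + n / sigma ** 2
    # The prior mean is zero, so only the data term contributes here.
    post_mean = (n * xbar / sigma ** 2) / post_precision
    post_sd = post_precision ** -0.5
    # Posterior mass above zero = P(mu > 0 | returns).
    return 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)


if __name__ == "__main__":
    # Hypothetical track record: alternating +2% and -1% days (mean +0.5%).
    pattern = [0.02, -0.01]
    for days in (20, 100, 200):
        p = prob_positive_edge(pattern * (days // 2))
        print(f"{days} days out of sample: P(edge) = {p:.3f}")
```

With a skeptical prior, a short record barely moves the estimate away from 50 percent, while the same average return sustained over more days pushes the probability toward 1. A production model would also have to cope with fat tails and changing volatility, which is where a flexible probabilistic programming tool is more useful than a closed-form update.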
Being able to build complex models also allows us to solve other problems. At the core, we face the problem of finding a needle in a haystack: the question is always whether a strategy looks good because it really has an edge in the market or because of random luck. By setting priors, we can constrain estimates to a reasonable range and regularize them to reduce the risk of a false positive. But how should we best set these priors? We could just use our own judgement, but a more satisfying method is to learn the right amount of regularization from the data itself. This is called a hierarchical model: we train it on a large number of strategies to learn their natural variability, which can then constrain our estimates of individual strategies in a smarter way.
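A full hierarchical model would be fit with a tool like PyMC3. As a lighter-weight sketch of the same idea, the code below swaps in empirical Bayes partial pooling with method-of-moments estimates. It assumes each strategy's true mean return is drawn from a common normal population and that its observed mean adds normal noise with a known standard error. The population spread `tau2` is learned from the strategies themselves, and each strategy is then shrunk toward the population mean in proportion to how noisy its own estimate is. The names `Estimate` and `shrink_estimates` and the example numbers are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float     # observed average daily return of one strategy
    std_err: float  # standard error of that average (sampling noise)


def shrink_estimates(estimates):
    """Empirical Bayes partial pooling of per-strategy mean returns.

    Model (an illustrative stand-in for a full hierarchical model):
        true_mean_i ~ Normal(mu, tau2)                      population level
        observed_i  ~ Normal(true_mean_i, std_err_i ** 2)   strategy level
    mu and the variance tau2 are estimated from the strategies themselves.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two strategies to learn the population")
    means = [e.mean for e in estimates]
    mu = statistics.fmean(means)  # population (grand) mean
    # Between-strategy variance: total spread of the observed means minus
    # the average sampling noise, floored at zero (method of moments).
    avg_noise = statistics.fmean(e.std_err ** 2 for e in estimates)
    tau2 = max(statistics.pvariance(means, mu) - avg_noise, 0.0)
    shrunk = []
    for e in estimates:
        noise = e.std_err ** 2
        # Weight on the population mean: near 1 for noisy estimates, near 0
        # for precise ones; with tau2 == 0 everything collapses onto mu.
        weight = 1.0 if tau2 == 0.0 else noise / (noise + tau2)
        shrunk.append(weight * mu + (1.0 - weight) * e.mean)
    return shrunk


if __name__ == "__main__":
    # Hypothetical strategies: two precise and two noisy track records.
    strategies = [
        Estimate(0.004, 0.0005),
        Estimate(0.004, 0.003),
        Estimate(-0.002, 0.0005),
        Estimate(-0.002, 0.003),
    ]
    for est, pooled in zip(strategies, shrink_estimates(strategies)):
        print(f"raw {est.mean:+.4f} -> pooled {pooled:+.4f}")
```

The less evidence behind a strategy's impressive-looking average, the harder it is pulled back toward the population, which is the kind of data-driven regularization against false positives that the hierarchical approach aims for.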
For all of this modeling we use PyMC3, an open-source Python package that Quantopian sponsors and I actively help to develop.
New: What is Quantopian Research? Have you learned anything that surprised you from it?
Wiecki: The research team in general solves quantitative problems at Quantopian. Mostly that’s focused on the fund, but we also developed some content around machine learning. For the fund we have recently built a risk model with some very interesting characteristics that combines macro and statistical risk factors. Probabilistic programming is actually a very interesting tool here as well.
We have also analyzed a large cohort of trading algorithms and compared their backtest performance to their out-of-sample performance. Interestingly, while the backtested Sharpe ratio is not a good predictor of out-of-sample performance, the tail ratio, a measure that compares the upper and lower tails of the returns distribution, turned out to be a pretty good predictor. Even more interesting, various backtest metrics can be combined using a machine learning classifier to predict out-of-sample performance with much higher accuracy. We have published our findings online.
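The interview does not give the exact formulas used in the study, so the sketch below assumes one common definition of the tail ratio (the one used, for example, in Quantopian's open-source empyrical library): the absolute 95th percentile of daily returns divided by the absolute 5th percentile. An annualized Sharpe ratio, assuming 252 trading days and a zero risk-free rate, is included for comparison. The function names and example returns are hypothetical.

```python
import math
import statistics


def tail_ratio(returns):
    """Ratio of the right tail (95th percentile) to the left tail (5th percentile)."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    # n=20 yields the 19 cut points 5%, 10%, ..., 95%; "inclusive" interpolates
    # linearly between sorted observations (like NumPy's default percentile).
    cuts = statistics.quantiles(returns, n=20, method="inclusive")
    left = abs(cuts[0])
    right = abs(cuts[-1])
    return math.inf if left == 0 else right / left


def annualized_sharpe(returns, periods_per_year=252):
    """Mean over standard deviation of periodic returns, scaled to a year.

    Assumes a zero risk-free rate.
    """
    return statistics.fmean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical returns: symmetric daily moves, then the same moves
    # with gains doubled so the right tail is fatter than the left.
    symmetric = [i / 1000 for i in range(-50, 51)]
    right_skewed = [2 * r if r > 0 else r for r in symmetric]
    print(tail_ratio(symmetric), tail_ratio(right_skewed))
    print(annualized_sharpe(right_skewed))
```

A ratio above 1 means the largest gains tend to be bigger than the largest losses. The classifier mentioned in the interview would take several such metrics as input features, which this sketch does not attempt.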
New: Is there an ultimate goal when it comes to algorithmic trading? As in, could Quantopian eventually have the most effective sets of trading algorithms possible so people wouldn’t even bother trying to develop new ones?
Wiecki: One of the key challenges in algorithmic trading is that markets are changing constantly. This is in stark contrast to many other domains like physics, where the equations are supposed to describe unalterable facts of the universe. As such, we have to constantly be on our toes to stay innovative and adapt to new challenges. This is a core strength of our community-focused approach. As new users from all over the world join the platform every day, we have an inexhaustible source of fresh ideas that keep us one step ahead.