First time here?
If you seek context or motivation, here's a first look and some use cases to get you thinking about the potential for an open, democratic prediction network.
Virtual office hours
The fastest way to get going is to drop by our regular Google Meet: dgh-gqxg-mwn at the following times:
- Tuesday 8-9pm EST
- Friday 12-1pm EST
Or contact us to be included in the invite.
The FAQ has moved to Frequently asked questions. Raise an issue on GitHub, or email us, if you are stuck.
Python Ultra-Quick Start
There's a new way to enter contests that is dead easy and "set and forget". Just fork this repository and enable GitHub actions. There's a notebook in the repo you can use to generate yourself an identity. There are some limitations to this approach, but it will get you on the leaderboards very quickly. If you enjoy copulas or multivariate distributional estimation, this one's for you!
See also our guide to GitHub actions and the blog article How to Enter a Cryptocurrency Copula Contest.
Python Video Tutorials
If you are looking to create streams or run "crawlers" which continuously predict streams in-process, it is best to work through these tutorials covering the basics.
- Your first submission
- Your first crawler
- Retrieving historical data
- Creating a data stream
- Modifying how your crawler makes predictions
- Modifying where your crawler makes predictions
Python Notebook Examples
The repository contains notebook examples you can easily run (for example just open them in Google Colab).
Example Crawlers and Patterns
See example_crawlers/README and don't forget you can click through to many examples directly from the leaderboard. Some examples:
- There's no shame in starting with something like Booze Mammal as suggested in the tutorials. That particular example always thinks the tails will be too thin, and I'll let you decide if that's a good idea or not.
- Shoal Gazelle uses the (new) StreamSkater class. This makes it trivial to put any model from the TimeMachines package in a crawler and set it loose. You can look at the Elo ratings for those point forecasts to get an idea of which are likely to work well. The TimeMachines package makes it easy to use Facebook Prophet, TSA and other popular libraries with one line of code - though that doesn't make them computationally efficient.
- Thallodal Cat is a SequentialStreamCrawler. It uses the simple DigestDist in place of a parametric model.
- See Soshed Boa for an example of using a statsmodels.tsa autoregressive model with automated order selection.
- Exactable Fox and Boatable Clam use the echochamber package. As you can see from the crawler code, the ESN is so fast to fit that the crawler simply fits it on the fly.
- Floatable Bee is another example of on-the-fly fitting.
- In contrast, Yex Cheetah and Comal Cheetah take advantage of stored parameters that are updated periodically by a separate process. Images such as this one allow for easy quality checking.
- The FitCrawler class is intended to make estimating models during downtime easier, and you can also supply an optional URL where parameters are stored. The FitCrawler is an example of SequentialStreamCrawler and illustrates some abstractions that might shorten your work. In particular the FitDist class represents a distribution that has a loss function, such as likelihood. An example is provided by expnormdist.
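To make the idea of "a distribution that has a loss function" concrete, here is a minimal sketch in plain Python. It is not the library's FitDist class: `ToyFitDist`, its methods and the grid search are all hypothetical names and choices made for illustration. The loss is the average negative log-likelihood of a normal distribution, and fitting simply picks the grid point with the smallest loss.

```python
import math
from typing import Sequence


class ToyFitDist:
    """Hypothetical stand-in for a distribution with a loss function.

    This is NOT the library's FitDist; it is a normal distribution whose
    loss is the average negative log-likelihood of the data.
    """

    def __init__(self, mu: float = 0.0, sigma: float = 1.0):
        self.mu = mu
        self.sigma = sigma

    @staticmethod
    def loss(data: Sequence[float], mu: float, sigma: float) -> float:
        # Average negative log-likelihood of data under N(mu, sigma^2)
        const = 0.5 * math.log(2 * math.pi) + math.log(sigma)
        return sum(const + 0.5 * ((x - mu) / sigma) ** 2 for x in data) / len(data)

    def fit(self, data: Sequence[float], mus: Sequence[float], sigmas: Sequence[float]) -> float:
        # Exhaustive grid search: keep the (mu, sigma) pair with the lowest loss
        best_loss, best_mu, best_sigma = min(
            (self.loss(data, m, s), m, s) for m in mus for s in sigmas
        )
        self.mu, self.sigma = best_mu, best_sigma
        return best_loss


dist = ToyFitDist()
dist.fit(
    data=[0.9, 1.1, 1.0, 1.2, 0.8],
    mus=[0.0, 0.5, 1.0, 1.5],
    sigmas=[0.1, 0.15, 0.5, 1.0],
)
```

The fitted `mu` and `sigma` are the kind of parameters that could be estimated during downtime and stored somewhere for a crawler to pick up later, though how the FitCrawler actually does this is best read from its source.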
Utilities for Distributional Sample Generation
A dedicated module contains utilities for quickly generating 225 samples. For instance, there are several ways to generate distributional submissions from point estimates, whether or not the time series takes on discrete values.
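As a rough illustration of turning a point estimate into a distributional submission, the sketch below spreads 225 values around the estimate using evenly spaced normal quantiles. The function names, the normal assumption and the `scale` parameter are hypothetical choices for this example; they are not the utilities in the module itself.

```python
from statistics import NormalDist
from typing import List

NUM_SAMPLES = 225  # the sample count mentioned in the text


def samples_from_point_estimate(point: float, scale: float, num: int = NUM_SAMPLES) -> List[float]:
    """Spread `num` values around `point` using evenly spaced normal quantiles."""
    dist = NormalDist(mu=point, sigma=scale)
    # Mid-point percentiles (i + 0.5) / num avoid the infinite 0% and 100% quantiles
    return [dist.inv_cdf((i + 0.5) / num) for i in range(num)]


def discrete_samples_from_point_estimate(point: float, scale: float, num: int = NUM_SAMPLES) -> List[float]:
    """Same idea, rounded to whole numbers for series that take discrete values."""
    return [float(round(x)) for x in samples_from_point_estimate(point, scale, num)]


continuous = samples_from_point_estimate(point=10.0, scale=2.0)
discrete = discrete_samples_from_point_estimate(point=10.0, scale=2.0)
```

Choosing `scale` is the real modeling decision here: too small and the tails will be too thin, too large and the submission is needlessly vague.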
More Python Patterns
- See our page on residual modeling for an illustration of a composition (stacking) pattern where one model predicts the errors of another.
- GitHub actions enable free ongoing scheduled computation. This is a great pattern for participation when a running process (crawler) is overkill, or when on-the-fly model fitting is unwieldy. With GitHub actions you can:
  - Maintain a repository with up-to-date model parameters like these expnorm files.
  - Top up your balance, as with the key maker.
  - Submit predictions for z-streams, as with this example.
  - Generate reports and pretty pictures.
- Staying in the game is critical. For instance, you can keep your crawler going by bouncing it with bash scripts, or by other means.
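The residual modeling (stacking) pattern mentioned in the list above can be sketched in a few lines of plain Python. Everything here is illustrative: the base model is a naive last-value forecast, the second model predicts the base model's next error as the mean of its recent errors, and the function names are hypothetical rather than taken from any package.

```python
from typing import List, Sequence


def base_forecast(history: Sequence[float]) -> float:
    """Base model: predict that the next value equals the last one seen."""
    return history[-1]


def residual_forecast(errors: Sequence[float], window: int = 5) -> float:
    """Second model: predict the base model's next error from its recent errors."""
    recent = errors[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def stacked_forecasts(series: Sequence[float], window: int = 5) -> List[float]:
    """One-step-ahead forecasts for series[1:], base forecast plus predicted residual."""
    forecasts: List[float] = []
    errors: List[float] = []
    for t in range(1, len(series)):
        base = base_forecast(series[:t])
        forecasts.append(base + residual_forecast(errors, window))
        errors.append(series[t] - base)  # realized error of the base model
    return forecasts


trend = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
stacked = stacked_forecasts(trend)
```

On this trending series the last-value model is always one step behind, and the residual model learns that bias after a single observation.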
Welcome, R statisticians! At present we don't have a fully fledged client for R, but it is certainly easy to enter z-stream contests. Here is an example:
More on the way soon.
Welcome, Julia developers! Video tutorials are on the way. For now:
More at Rusty's repo including an advanced electricity prediction project.
First and foremost: stay in the game. If your crawler stops, your predictions will continue to be judged (unless you use the cancellation method). If your crawler is persistent it may find its niche sooner or later.
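One way to stay in the game from within Python, as an alternative to bouncing with a bash script, is a small restart wrapper. This is a sketch under assumptions: `run_with_restarts` is a hypothetical helper, and `task` stands for whatever callable starts your crawler.

```python
import time
from typing import Callable


def run_with_restarts(task: Callable[[], None], max_restarts: int = 3, pause_seconds: float = 1.0) -> int:
    """Call task(); if it raises, wait and call it again, up to max_restarts times.

    Returns the number of restarts that were needed. Re-raises the last
    exception once the restart budget is exhausted.
    """
    restarts = 0
    while True:
        try:
            task()
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1
            time.sleep(pause_seconds)  # brief pause before trying again


# Usage with a stand-in task that fails twice and then succeeds
attempts = {"count": 0}


def flaky_task() -> None:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated crash")


restarts_needed = run_with_restarts(flaky_task, max_restarts=5, pause_seconds=0.0)
```

A wrapper like this only survives exceptions raised inside the process. If the machine itself reboots, you still need something external, such as a scheduler or the bash approach, to start it again.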
- An Introduction to Z-Streams explains mechanics, quarantine and implied percentiles.
- The Lottery Paradox article discusses the accuracy and reward of distributional predictions.
- As noted above the Crawler Examples README includes good jumping-off points, including ARIMA, ESN, NN, or wacky filters for noisy data.
- Our Listing of Python Time Series Packages covers prediction, outlier, classification, causality, copula, change-point, matrix-profiling, distribution fitting, hyper-parameter optimization, back-testing libraries and more, all ranked by downloads.
- As noted above our Elo ratings of time-series point estimators may be worth perusing.
- Read How to Enter a Cryptocurrency Copula Contest to better understand the motivations for z2 and z3 prediction.
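For readers who want a feel for implied percentiles before reading the z-streams introduction, here is a hedged sketch. It assumes an implied percentile is the position of a realized value within a set of submitted samples, and that it can be mapped to a z-score with the inverse normal CDF. The function names and the mid-rank convention are assumptions for illustration, and the platform's actual mechanics may differ.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist
from typing import Sequence


def implied_percentile(samples: Sequence[float], value: float) -> float:
    """Position of `value` within `samples`, kept strictly inside (0, 1)."""
    xs = sorted(samples)
    lo = bisect_left(xs, value)   # samples strictly below value
    hi = bisect_right(xs, value)  # samples at or below value
    # Mid-rank, shifted by half a sample so the result is never exactly 0 or 1
    return (0.5 * (lo + hi) + 0.5) / (len(xs) + 1)


def implied_z(samples: Sequence[float], value: float) -> float:
    """Map the implied percentile to a standard normal z-score."""
    return NormalDist().inv_cdf(implied_percentile(samples, value))


submitted = list(range(225))          # stand-in for 225 submitted samples
p_mid = implied_percentile(submitted, 112)
z_mid = implied_z(submitted, 112)
z_high = implied_z(submitted, 1000)   # a surprise well above every sample
z_low = implied_z(submitted, -5)      # a surprise well below every sample
```

If the samples describe the realized values well, z-scores computed this way should look roughly standard normal, which is what makes any leftover structure in them worth predicting.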
Yes, there is a book coming, to be published by MIT Press in late 2021. Contact us if you would like to volunteer to proofread it :)
For those who may be interested, and willing to invest the time, here is a sequence of talks providing what is hopefully a coherent case for an open prediction network. It isn't an elevator pitch. Along the way, we explain why vendor Automated Machine Learning and previous attempts at crowdsourcing might violate a fundamental axiom, at least if quantitative business optimization is to be fully democratized. Pro tip: Some people like to start with #3.
- A first look at microprediction and how it gives rise to the "ten minute data science project".
- What must a microprediction oracle do? The answer isn't Kaggle, or DataRobot.
- Business uses of microprediction include pretty much everything (unless "prediction" is in the name, ironically).
- Repeated value function prediction provides the link between microprediction and Control Theory and Reinforcement Learning, which is why microprediction is truly general.
- Why algorithms make excellent managers, and by implication why humans are terrible. Copious theory means algorithms are well placed to orchestrate production of prediction. Humans need not occupy a blocking role.
Learn to Code
If one person decides to learn to code because they were motivated by the prediction network project, we're done. Seriously, it isn't that hard. Here are some entirely free ways to get going.
- Visual Coding and games to get started with coding concepts (my daughter and I love some of these)
- Learn to Code, in Python if you've never coded before.
- Learn Python more quickly if you've coded in another language.
- Learn Data Science, in Python if you know Python
Please contact us if you can help improve these resources. As noted above, we'd love to answer your questions via Gitter, issues or discussions on GitHub, or even questions at Quora if they are tagged 'microprediction'.