Welcome to the prediction network. It's easier than you think.
Visit the knowledge center or jump right into the prediction web:
Got bigger questions? Raise an issue on Github, or a question at Gitter, or email us.
This site makes it easier for you to use the open microprediction network. See a first look at microprediction.
Any business operation can be optimized through frequent repeated predictions. See business uses of microprediction.
Anyone publishes live data. Anyone predicts it. All in real-time.
The open source project is supported by Intech Investments, a major U.S. fund manager whose expertise lies in converting volatility into portfolio alpha.
You can use it right now, for free, to predict anything you measure (see getting predictions). If you are a data scientist, you can supply predictions right now by creating Python, R or Julia programs (or using the API directly).
Raise an issue on Github, or a question at Gitter, or email us.
Okay so that's not actually a frequently asked question, but it is important. The short answer is "value functions". Techniques from Reinforcement Learning and Control Theory can be used to reduce business optimization to frequently repeated prediction. See the talk on repeated value function prediction.
See business uses of microprediction where we explain why you probably are, or could be. Don't think of microprediction as prediction - the term is loaded.
No. See business uses of microprediction. The "garbage-in-garbage-out" meme is for unreflective humans only. You're better than that.
Yes but you need not label it. You can transform it, disguise it, predict related quantities - or the difference between something you care about and your existing predictions of the same. There are many ways to use public prediction for private purposes. Contact us.
Only if you wish to receive prize-money.
You can assign prize money rights for your identity to a charity if you wish, and never reveal your identity (Charles Dickens would approve). We will introduce a set_charity() method soon. By default we will choose a charity from our list. At present the list is:
As noted in Dorothy, You're Not in Kaggle Anymore nearest neighbor isn't always the best way to understand something.
... but you are probably correct to treat AutoML as the nearest neighbor. However:
Read Dorothy, You're Not in Kaggle Anymore - though this title is somewhat ironic now that you can participate from inside Kaggle! See Kaggle kernel example.
Quite aside from the open source network ambition ...
You don't.
You run a program in the privacy of your own home that creates your identity, without interacting with the site in any way. See...
Your private identity. A write key is a Memorable Unique Identifier (MUID). See our explanation of memorable unique identifiers.
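If you prefer to see this in code, here is a rough sketch of minting a key locally, assuming the Python library exposes new_key as in recent versions (check the README for the exact call and a sensible difficulty):

```python
# A rough sketch of creating your own write key locally, without interacting
# with the site. new_key and the difficulty parameter are assumptions based
# on recent versions of the microprediction library.
from microprediction import new_key

write_key = new_key(difficulty=12)   # higher difficulty takes (much) longer to mine
print('Keep this somewhere safe:', write_key)
```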
Only a human would ask that. Welcome to the prediction network, thriving with artificial life forms. They don't all have emails, or bodies.
You can also contact us directly. As noted above we'd love to answer your questions via Gitter too or via issues on Github (or even questions at Quora or similar).
The prediction network is for any use. Commercial use is fine, with acknowledgement. See the terms of use.
Probably not. However here are some ideas.
See the Knowledge Center for resources.
There is a new listing of job opportunities that may interest you and some advice in the articles Want an Interview and Dorothy, You're Not in Kaggle Anymore.
This site offers you a chance to prove that you can successfully maintain a well-considered, live algorithm that is helping someone, somewhere. Maybe it is nicely packaged on Github. Well done! That won't guarantee you an interview in a competitive data science job market, but it might not hurt.
Kaggle is too easy. Not every employer will be impressed by offline analysis of data prepared for you by someone else.
Yes. Use the set_repository() method on your crawler. It will create a clickable CODE badge that will appear on the leaderboards.
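For instance, a minimal sketch (the repository URL and write key are placeholders, and we assume set_repository is available on your crawler as described above):

```python
# Point your crawler at your public code so a clickable CODE badge appears
# on the leaderboards.
from microprediction import MicroCrawler

crawler = MicroCrawler(write_key='your write key here')
crawler.set_repository('https://github.com/yourname/yourcrawler')
crawler.run()
```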
If you are a data scientist, you might be used to contest setups which are quite different. There is a warning against anchoring in Dorothy, You're not in Kaggle Anymore.
You run a program (a "crawler") that monitors live streams published here, and publishes predictions of them. You can start by running the default crawler.
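For example, a minimal sketch of running the default crawler (we assume the MicroCrawler class from the Python library; you will need your own write key):

```python
# Run the default crawler. It will wander between streams and submit
# distributional predictions indefinitely.
from microprediction import MicroCrawler

crawler = MicroCrawler(write_key='your write key here')
crawler.run()
```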
Read An Introduction to Z-Streams for an explanation. Short version:
There are rapidly falling marginal benefits to having more of your guesses close to the truth. It is much better to be reasonably close most of the time than to have many guesses close only occasionally.
Spread them as regularly as possible but in such a way that they reflect your best estimate of the distribution of outcomes. For example, if you know the inverse cumulative distribution function of your forecast, apply it to evenly spaced numbers taken from (0,1).
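As a sketch, assuming (purely for illustration) that your forecast is normal and that you submit the usual 225 scenarios:

```python
# Turn a forecast distribution into evenly spread scenarios by applying the
# inverse CDF (percent point function) to evenly spaced points in (0,1).
import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 1.0            # your forecast mean and standard deviation (illustrative)
n = 225                         # the platform expects 225 scenarios per submission
ps = (np.arange(n) + 0.5) / n   # evenly spaced points strictly inside (0,1)
scenarios = norm.ppf(ps, loc=mu, scale=sigma)
```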
Your most recent submissions will continue to be judged as new data arrives. But this isn't quite the Hotel California. Notice that MicroWriter has a cancel() method. Once that is called, you will only be judged on predictions already made (in just over an hour, you're out, worst case).
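A minimal sketch, assuming cancel takes the stream name (the name below is hypothetical):

```python
# Withdraw from a stream. Predictions already quarantined will still be judged
# until they expire, after which you are out.
from microprediction import MicroWriter

mw = MicroWriter(write_key='your write key here')
mw.cancel(name='some_stream.json')
```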
Actually no. You can run the default crawler without any modification.
On your local machine ... or the cloud ... or wherever you like. Just be sure it will run indefinitely. We make some suggestions in the tutorials for cloud use but we have no affiliation with any cloud compute provider.
There are many ways to ensure your crawler bounces back. See bouncing with bash, for example, or run an always-on task if you are using PythonAnywhere.
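If you prefer to stay in Python, here is one way to keep a crawler bouncing back (a sketch, not a substitute for a proper process supervisor):

```python
# Restart the crawler whenever it dies, with a short pause between attempts.
import time
from microprediction import MicroCrawler

while True:
    try:
        MicroCrawler(write_key='your write key here').run()
    except Exception as exc:
        print(f'Crawler stopped with {exc!r}; restarting in 60 seconds')
        time.sleep(60)
```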
We have no affiliation with cloud providers, but many are quite generous.
You can do this on the fly, or in more of an offline manner.
These are mere examples.
This depends on your preferred language.
You can try googling for clients in other languages. We will provide better supporting documentation and tutorials here in due course. You can also use the API directly from any language.
You don't. This isn't Kaggle. You run a program that views the streams on the site and makes HTTP calls to submit predictions.
See retrieving historical data or, if R is your thing, R module 1.
Yes, however this is better suited to predicting things whose distribution won't change quickly (such as z1~, z2~ or z3~ streams). With other streams, a one-off use of a notebook to submit predictions may quickly lead to stale predictions and send you plummeting down the leaderboard.
Yes. You'll be there by the time you get to:
See modifying how your crawler makes predictions.
It will do it automatically. However you can also list current prizes.
For whatever you want. Here is an example of using a notebook to pull data and do something interesting (well, we thought it was): analyzing emoji use in real time during Presidential Debates.
Yes. Take a look at the leaderboard and you will see CODE badges. Click through to open source implementations. There are also some detailed introductions to classes in the microprediction repository that you can love or leave alone. For example FitCrawler (Comal Cheetah) provides an example of a crawler that runs an online algorithm but occasionally re-fits parameters.
Yes. However this is generally not advisable except for z-streams (those with z1~, z2~ or z3~ prefixes) - and even then the variance of those distributions may well change throughout the day.
Why aren't predictions a single number?
Can I retrieve moments, such as the mean prediction?
Can I publish multiple quantities and create z2~ and z3~ streams of my own?
Over what horizon will my data be predicted?
On what frequency should I publish my data?
What prevents someone from creating thousands of spurious algorithms?
I don't understand MUIDs. What is a MUID?
Can I help you generate MUIDs so they can support civic or worthwhile prediction?
Why aren't predictions a single number?
Can my algorithms make their own predictions?
Can I solicit multivariate distributional predictions?
What prevents someone from creating thousands of spurious algorithms?
I still don't understand MUIDs. What is a MUID?
May I have a key please so I can create streams?
Can I help you generate MUIDs so they can support civic or worthwhile prediction?
The power of this platform comes from the ability to combine many forecasts and this works better if algorithms supply distributional predictions.
Another (related) reason we ask algorithms for multiple scenarios is that if they supply only one, this single number (point estimate) can be difficult to interpret. Imagine asking for predictions of where the tennis ball will land after it leaves Roger Federer’s racket. You can’t summarize that with a single number.
There is a third reason. Distributional estimates admit a stateless reward clearing mechanism that is O(1) in the number of contributions. This platform can handle roughly as many incoming scenarios as, say, Nasdaq processes trade orders. No shortage of cognitive diversity here!
The cumulative distribution function (CDF) provides a slightly noisy estimate. A better, more precise API is on the way that will provide moments.
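In the meantime, a rough sketch of backing a mean out of the CDF, assuming the reader exposes get_cdf returning the x values and cumulative probabilities (check the library for the exact signature; the stream name is hypothetical):

```python
# Estimate the mean prediction from the (slightly noisy) CDF.
import numpy as np
from microprediction import MicroReader

mr = MicroReader()
cdf = mr.get_cdf(name='some_stream.json')
xs, ps = np.array(cdf['x']), np.array(cdf['y'])
probs = np.diff(ps, prepend=0.0)           # approximate probability mass per point
mean_estimate = float(np.sum(xs * probs))  # noisy, as noted above
```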
Yes. This is an advanced use case requiring a rarer MUID (length 12+1, see min_len parameter in config.json). You can use the Copula API or the MicroWriter.cset method in the Python library to simultaneously set the value of multiple streams.
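A minimal sketch of cset, assuming it accepts parallel lists of names and values (the stream names are hypothetical, and remember this needs the longer, rarer write key):

```python
# Set the values of several related streams simultaneously, which also creates
# the z2~ and z3~ streams tracking their joint behaviour.
from microprediction import MicroWriter

mw = MicroWriter(write_key='your rarer, longer write key here')
mw.cset(names=['my_copula_a.json', 'my_copula_b.json', 'my_copula_c.json'],
        values=[1.7, -0.4, 3.2])
```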
Be sure to read An Introduction to Z-Streams. As is made clear in that article, your action will trigger prediction of:
If z-scores of z-curve projections of implied z-scores sounds a bit elaborate, it may well be. However, it encourages a separation of concerns in attacking an extremely difficult problem - the construction of accurate multi-dimensional joint distributions.
No. You can use the API directly, which is straightforward.
See the API page.
Loosely speaking, crawlers will predict your data (roughly) 1 minute, 5 minutes, 15 minutes ahead and 1 hour ahead. See An Introduction to Z-Streams.
To be more precise in answering the previous question, predictions get quarantined for 70 seconds; 5 minutes plus 10 seconds; 15 minutes plus 10 seconds; and 1 hour less 45 seconds. For the purpose of discussion, we shall presume that you are most interested in the 15 minute horizon. Then there are two main cases:
Where possible, try to avoid forcing algorithms to make predictions in a tight window. Many algorithms crawl many different data streams, so the chance of them helping your stream will be greater if they are able to predict your stream at a time of their choosing.
In general, you should think about the prediction API the same way you think about pre-processing for time series forecasting. Differencing, rescaling and transformations may be helpful. Try to ensure that changes in your time series are very roughly on the scale of 1.0, rather than in the thousands.
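For instance, a sketch of publishing scaled changes rather than raw levels (the scale, stream name and numbers are purely illustrative):

```python
# Publish differences rescaled so that typical changes are on the order of 1.0.
from microprediction import MicroWriter

mw = MicroWriter(write_key='your write key here')

SCALE = 1000.0           # chosen so typical changes land near 1.0
previous = 123456.0      # last raw observation (illustrative)
latest = 123789.0        # new raw observation (illustrative)
mw.set(name='my_scaled_changes.json', value=(latest - previous) / SCALE)
```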
As a more general comment, you want to attract the best algorithms and not just those that specialize in finding edge cases to exploit. You can view some examples of feeds which exhibit these subtle differences.
See An Introduction to Z-Streams.
Absolutely. See predicting and crawling instructions. As noted, you can use a different key to add volume and draw attention to a particular horizon.
Anyone and everyone.
Participation requires a key. Generating a key requires some appreciable compute time because write_keys are memorable unique identifiers (MUIDs). The system tracks a balance associated with each MUID and prevents participation once the balance drops below a bankruptcy threshold that is set at a negative number. Thus creating noisy algorithms is possible, but chews up CPU time. Furthermore, poorly performing algorithms don't count in the computation of the community distribution.
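To get a feel for why key generation chews up CPU, here is a purely conceptual sketch (this is not the real muid library, which maps hex digests to readable animal names rather than literal prefixes):

```python
# Keep hashing random candidates until the SHA-256 hex digest happens to start
# with a chosen string. Longer targets are exponentially rarer, which is why
# higher-difficulty keys take so much longer to mine.
import hashlib
import secrets

def mine(prefix='cafe', max_tries=1_000_000):
    for _ in range(max_tries):
        candidate = secrets.token_hex(16)
        digest = hashlib.sha256(candidate.encode()).hexdigest()
        if digest.startswith(prefix):
            return candidate, digest
    return None, None

key, digest = mine()
print(key, digest)
```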
Honestly you don’t have to worry too much about what MUIDs are, as the library will generate you a write key. However, here is some information that can help you understand why the generation of a write key takes some time.
Memorable Unique Identifiers are just randomly generated numbers that happen to be very lucky, in the sense that the hex digest of their SHA-256 hash looks like a cute animal name. If that doesn't make sense then:
We have a stockpile of MUIDs that can be used as write_keys for worthwhile purposes. Email us at info@microprediction.org.
What a great idea! Email us at info@microprediction.com if you have some spare CPU capacity.
Yes. It’s relatively painless and inexpensive to sponsor a competition. If your organization is interested in sponsoring, please email us at payments@microprediction.com.
Micropredictions, LLC will be offering compensation for various types of contribution to this project including, but not limited to, the creation of high performing prediction algorithms.
Yes. Micropredictions, LLC is supporting the open source code development.
You can expect 99.99% uptime. We make no warranties.
See An Introduction to Z-Streams.
You submit 225 numbers.
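A minimal sketch of a submission, assuming MicroWriter.submit takes the stream name, the list of values and a delay horizon, and that the writer exposes the quarantine delays as DELAYS (the stream name is hypothetical, and the scenarios here are random placeholders):

```python
# Submit 225 scenarios for one stream at the shortest quarantine horizon.
import numpy as np
from microprediction import MicroWriter

mw = MicroWriter(write_key='your write key here')
scenarios = list(np.random.normal(size=225))   # replace with real forecasts
mw.submit(name='some_stream.json', values=scenarios, delay=mw.DELAYS[0])
```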
The power of this platform comes from the ability to combine many forecasts and this works better if algorithms supply distributional predictions.
Another (related) reason we ask algorithms for multiple scenarios is that if they supply only one, this single number (point estimate) can be difficult to interpret. Imagine asking for predictions of where the tennis ball will land next after it leaves Roger Federer’s racket. You can’t summarize that with a single number.
There is a third reason. Distributional estimates admit a stateless reward clearing mechanism that is O(1) in the number of contributions. This platform can handle roughly as many incoming scenarios as, say, Nasdaq processes trade orders. No shortage of cognitive diversity here!
Absolutely. Why not publish your model residuals? You achieve at least ongoing performance analysis of your model, and also an accurate distributional description of your residuals. Perhaps also:
Every time a data feed value arrives, an implied z-score is computed based on the existing predictions from yourself and other algorithms. A secondary stream is automatically created where algorithms predict these normalized z-scores. (Your algorithm may be better at predicting the z-scores than the original margins, or vice versa).
See An Introduction to Z-Streams.
See An Introduction to Z-Streams.
See An Introduction to Z-Streams.
It isn't directly supported ... but we can't stop you using the same space-filling curve trick that is employed for the z2~ and z3~ streams. You could provide a univariate time series that is intended to be unpacked into a bivariate or trivariate series, say (which gives us an excuse to show the space filling curve again). However, if you do this, you are very strongly encouraged to use the exact same conventions and code. See the microconventions package on GitHub or PyPI. To do otherwise is probably asking too much of the human and artificial life forms here.
Absolutely. See predicting and crawling instructions. As noted, you can use a different key to add volume and draw attention to a particular horizon.
Anyone can, including algorithms that create algorithms.
Participation requires a write_key. Generating a write_key requires some appreciable compute time because write_keys are memorable unique identifiers (MUIDs). The system tracks a balance associated with each MUID and prevents participation once the balance drops below a bankruptcy threshold that is set at a negative number. Thus creating noisy algorithms is possible, but chews up CPU time.
Honestly you don’t have to worry too much about what MUIDs are. Just run the code and don't lose your key.
Memorable Unique Identifiers are just randomly generated numbers that happen to be very lucky, in the sense that the hex digest of their SHA-256 hash looks like a cute animal name. If that doesn't make sense then:
We have a stockpile of keys (i.e. MUIDS) for worthwhile purposes. We have a very liberal definition of worthwhile. Contact us.
What a great idea! Email us at info@microprediction.com if you have some spare CPU capacity.
Yes. It’s relatively painless and inexpensive to sponsor a competition. If your organization is interested in sponsoring, please email us at payments@microprediction.com.
Micropredictions, LLC will be offering compensation for various types of contribution to this project including, but not limited to, the creation of high performing prediction algorithms.
(Watch this space for an announcement.)
Yes. Micropredictions, LLC is supporting the open source code development.
You can expect 99.99% uptime. We make no warranties.