Knowledge Center

Frequently Asked Questions

Welcome to the prediction network. It's easier than you think.

Looking for "Hello World?"

Visit the knowledge center or jump right into the prediction web:

Python Module 1   R Module 1

 

The Big Questions

Got bigger questions? Raise an issue on GitHub, ask a question on Gitter, or email us.

What?

This site makes it easier for you to use the open microprediction network. See a first look at microprediction

Why?

Any business operation can be optimized through frequent repeated predictions. See business uses of microprediction.

How?

Anyone publishes live data. Anyone predicts it. All in real-time. 

Who?

The open source project is supported by Intech Investments, a major U.S. fund manager whose expertise lies in converting volatility into portfolio alpha.

When?

You can use it right now, for free, to predict anything you measure (see getting predictions). If you are a data scientist, you can supply predictions right now by creating Python, R or Julia programs (or using the API directly). 

R Module 1   Python Module 1

 

Conceptual Questions 

Raise an issue on GitHub, ask a question on Gitter, or email us.

How is it that frequently repeated prediction drives all Artificial Intelligence?

Okay, so that's not actually a frequently asked question, but it is important. The short answer is "value functions". Techniques from Reinforcement Learning and Control Theory can be used to reduce business optimization to frequently repeated prediction. See the talk on repeated value function prediction.

Why do I need this if I'm not predicting anything? 

See business uses of microprediction where we explain why you probably are, or could be. Don't think of microprediction as prediction - the term is loaded. 

Do I need to clean my data first?

No. See business uses of microprediction. The "garbage-in-garbage-out" meme is for unthinking humans only. You're better than that. 

Do I need to reveal my data? 

Yes, but you need not label it. You can transform it, disguise it, predict related quantities - or the difference between something you care about and your existing predictions of the same. There are many ways to use public prediction for private purposes. Contact us.

Do I need to reveal my identity? 

Only if you wish to receive prize-money. 

Can I assign my winnings to a charity?

You can assign prize money rights for your identity to a charity if you wish, and never reveal your identity (Charles Dickens would approve). We will introduce a set_charity() method soon. By default we will choose a charity from our list. At present the list is:

  1. Digital Divide Data (navigator report)
  2. Girls Who Code (navigator report)

How is This Different to ... 

As noted in Dorothy, You're Not in Kaggle Anymore, nearest neighbor isn't always the best way to understand something. 

Why don't I use automated Machine Learning instead?

... but you are probably correct to treat AutoML as the nearest neighbor. However: 

  1. AutoML doesn't solve data search.
  2. This is much easier. See a first look at microprediction. All the good AutoML is selected for you by ongoing competition.
  3. All the good AutoML is here. If not, someone will add it sooner or later. 
  4. If a human doesn't add it, an algorithm will add it sooner or later.
  5. The sum is greater. The ensemble of distributional predictions of many AutoML enabled crawlers is invariably superior to the best amongst them (in fact stronger statements could be made - see weak learner literature). 
  6. AutoML doesn't really exist. It still requires sporadic, surgical use of human intelligence. That comes at high cost, whereas here an extremely low cost version is facilitated by what is effectively a prediction micro-economy.
  7. It takes a village. Microprediction is inherently collective.  See extended rationale in "what must a microprediction oracle do?"
  8. The prediction network is free.
  9. The prediction network can scale to many hosts because it is open source. 
  10. The premise is wrong. You can use the prediction network to assess and complement your use of Automated Machine Learning, or in-house modeling (see ongoing performance analysis). 

How is this different to Kaggle?  

Read Dorothy, You're Not in Kaggle Anymore - though this title is somewhat ironic now that you can participate from inside Kaggle! See the Kaggle kernel example.

How is this different to a prediction market? 

Quite aside from the open source network ambition ...

  1. The intended use is very different: adding real business value in real time, not just intrigue. See business uses of microprediction.
  2. The API provided here is only suitable for frequently repeated predictions of the same type.
  3. Predictions are usable due to the large number of predictions judged out of sample.
  4. Predictions are made by algorithms, not people who come and go. 
  5. The prediction network is legal in all jurisdictions. There is no staking. It is a game of skill. 

Registration Questions 

How do I register?

You don't. 

How do I not register? 

You run a program in the privacy of your own home that creates your identity, without interacting with the site in any way. See...

Python Module 1 
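For the curious, here is a minimal sketch of local key creation, assuming the new_key helper in the microprediction Python package (higher difficulty burns more CPU but yields a rarer key):

    # Sketch only: create a write key locally, never touching the site
    from microprediction import new_key

    write_key = new_key(difficulty=9)  # can take a long time - that's by design
    print(write_key)                   # keep it secret; its hash is your public identity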

What is a write key? 

Your private identity. A write key is a Memorable Unique Identifier (MUID). See our explanation of memorable unique identifiers.  

Why don't I register with my email? 

Only a human would ask that. Welcome to the prediction network, thriving with artificial life forms. They don't all have emails, or bodies. 

Commercial Questions 

You can also contact us directly. As noted above, we'd love to answer your questions via Gitter or via issues on GitHub (or even questions at Quora or similar).

Can I use this commercially? 

The prediction network is for any use. Commercial use is fine, with acknowledgement. See the terms of use.

Can you help me sell my product? 

Probably not. However here are some ideas. 

  1. Use the set_repository() method on your crawler to create a badge on leaderboards that links back to you.
  2. Use microprediction to enhance your data product feed.
  3. Benchmark your AI product, if it can help with time series prediction in some way. 
  4. Join us. Support the creation of the world's first open source prediction network.  

Career Questions 

See the Knowledge Center for resources.

Can you help me get a job? 

There is a new listing of job opportunities that may interest you, and some advice in the articles Want an Interview and Dorothy, You're Not in Kaggle Anymore.

This site offers you a chance to prove that you can successfully maintain a well-considered, live algorithm that is helping someone, somewhere. Maybe it is nicely packaged on GitHub. Well done! That won't guarantee you an interview in a competitive data science job market, but it might not hurt.

How is this different to putting Kaggle on my resume?  

Kaggle is too easy. Not every employer will be impressed by offline analysis of data prepared for you by someone else. 

Can I tout my open-source library here? 

Yes. Use the set_repository() method on your crawler. It will create a clickable CODE badge that will appear on the leaderboards. 

Can I prove to someone else I am the creator of a high performing algorithm? 

  1. Use the set_repository() method on your crawler.
  2. Or do nothing. Read about keys. Your public identity is the hash of your write key. Your name on the leaderboard is merely a short version of your public identity. Only you can exhibit a private key whose hash is a match.   
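To make the second point concrete, here is a toy sketch (the platform's exact hashing and truncation conventions belong to the muid library, so treat this as illustrative only):

    # Toy illustration: public identity as the hash of a private write key
    import hashlib

    write_key = 'YOUR_PRIVATE_WRITE_KEY'  # hypothetical
    public_identity = hashlib.sha256(write_key.encode()).hexdigest()
    print(public_identity[:12])  # leaderboard names are short versions of this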

Operating a Crawler

If you are a data scientist, you might be used to contest setups which are quite different. There is a warning against anchoring in Dorothy, You're Not in Kaggle Anymore.

What does it mean to participate? 

You run a program (a "crawler") that monitors live streams published here, and publishes predictions of them. You can start by running the default crawler. 

Python Module 1

How will I be scored? 

Read An Introduction to Z-Streams for an explanation. Short version:

  1. Your distributional predictions submitted with delay=70 (say) are quarantined for 70 seconds. 
  2. Then they are judged alongside everyone else's when the next data point arrives. 
  3. The more of your 225 guesses (samples) fall close to the realized value, the more credits you get. 

Why shouldn't I put all my 225 samples on the most likely outcome? 

There are rapidly falling marginal benefits of having more of your guesses close to the truth. It is much better to be reasonably close more often than have lots of guesses close infrequently. 

How should I allocate my 225 guesses? 

Spread them as evenly as possible, but in such a way that they reflect your best estimate of the distribution of outcomes. For example, if you know the inverse cumulative distribution function of your forecast, apply it to evenly spaced numbers taken from (0,1).  
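For instance, a minimal sketch assuming your forecast is Gaussian (mu and sigma are your own estimates):

    # Spread 225 guesses via the inverse CDF of an assumed Gaussian forecast
    import numpy as np
    from scipy.stats import norm

    mu, sigma = 0.0, 1.0              # hypothetical forecast parameters
    p = (np.arange(225) + 0.5) / 225  # evenly spaced points in (0, 1)
    guesses = norm.ppf(p, loc=mu, scale=sigma).tolist()  # 225 representative samples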

I stopped submitting and started plummeting down the leaderboard. Why?

Your most recent submissions will continue to be judged as new data arrives. But this isn't quite the Hotel California. Notice that MicroWriter has a cancel() method. Once that is called, you will only be judged on predictions already made (in just over an hour you're out, worst case). 
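A sketch of checking out, assuming the Python client (consult the MicroWriter documentation for the exact signature):

    # Withdraw from a stream so only already-made predictions are judged
    from microprediction import MicroWriter

    mw = MicroWriter(write_key='YOUR_WRITE_KEY')  # hypothetical key
    mw.cancel(name='some_stream.json')            # hypothetical stream name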

Do I need to be a data scientist? 

Actually no. You can run the default crawler without any modification. 

Python Module 1

Where do I run my program? 

On your local machine ... or the cloud ... or wherever you like. Just be sure it will run indefinitely. We make some suggestions in the tutorials for cloud use but we have no affiliation with any cloud compute provider. 

Python Module 1

What if it crashes? 

There are many ways to ensure your crawler bounces back. See bouncing with bash, for example, or run an always-on task if you are using PythonAnywhere. 

Can I leverage free cloud compute? 

We have no affiliation with cloud providers, but many are quite generous. 

How do I estimate parameters? 

You can do this on the fly, or in more of an offline manner. 

  • FitCrawler (Comal Cheetah) re-estimates models during downtime. 
  • Our GitHub Actions example shows how to use GitHub Actions to maintain recently calibrated models. 

These are mere examples. 

Where should I get started if I want to enter contests? 

This depends on your preferred language.

R Module 1   Python Module 1

Do you support Julia, TypeScript, or Language-X? 

You can try googling for clients in other languages. We will provide better supporting documentation and tutorials here in due course. You can also use the API directly from any language. 

Where do I upload my predictions? 

You don't. This isn't Kaggle. You run a program that views the streams on the site and makes http calls to submit predictions. 

R Module 1   Python Module 1
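A minimal sketch of such a program, assuming the Python client's MicroWriter (the stream name, key, and naive model are all placeholders):

    # View a stream's recent history, form 225 guesses, submit them
    from microprediction import MicroWriter

    mw = MicroWriter(write_key='YOUR_WRITE_KEY')            # hypothetical key
    values = mw.get_lagged_values(name='some_stream.json')  # recent data points
    center = sum(values) / len(values)                      # naive point estimate
    guesses = [center + 0.01 * (i - 112) for i in range(225)]  # crude spread
    mw.submit(name='some_stream.json', values=guesses, delay=70)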

Where do I download data? 

See retrieving historical data or, if R is your thing, R Module 1.
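For example, a sketch using the Python client's reader (no key is needed just to read):

    # Pull recent history for a stream
    from microprediction import MicroReader

    mr = MicroReader()
    history = mr.get_lagged_values(name='some_stream.json')  # hypothetical stream
    times = mr.get_lagged_times(name='some_stream.json')     # matching timestamps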

Can I use a notebook?

Yes, however this is better suited to predicting things whose distribution won't change quickly (such as z1~, z2~ or z3~ streams).  With other streams, a one-off use of a notebook to submit predictions may quickly lead to stale predictions and send you plummeting down the leaderboard. 

R Module 1 Python Module 1

Can I create my own data stream that others will predict?

Yes. You'll be there by the time you get to:

Python Module 4

How do I select which streams my algorithm will predict? 

See modifying how your crawler makes predictions

Can my crawler determine which streams have prizes? 

It will do so automatically. However you can also list current prizes.

Can I use the data in a notebook or application? 

For whatever you want. Here is an example of using a notebook to pull data and do something interesting (well, we thought it was): analyzing emoji use in real time during the Presidential Debates.

Are there crawlers I can copy and fork? 

Yes. Take a look at the leaderboard and you will see CODE badges. Click through to open source implementations. There are also some detailed introductions to classes in the microprediction repository that you can love or leave alone. For example, FitCrawler (Comal Cheetah) provides an example of a crawler that runs an online algorithm but occasionally re-fits parameters.

Can I make occasional predictions - say once a day? 

Yes. However this is generally not advisable except for z-streams (those with z1~, z2~ or z3~ prefixes) - and even then the variance of those distributions may well change throughout the day.  

More Questions ...

Questions About Getting Predictions

Why aren't predictions a single number?

Can I retrieve moments, such as the mean prediction?

Can I publish multiple quantities and create z2~ and z3~ streams of my own?

Do I have to use Python or R?

Over what horizon will my data be predicted?

On what frequency should I publish my data?

Can I predict my own stream?

Who contributes algorithms?

What prevents someone from creating thousands of spurious algorithms?

I don't understand MUIDs. What is a MUID?

I have an important use for mankind but don't want to spend many hours producing a write_key. Is there an easier way to obtain a write_key?

Can I help you generate MUIDs so they can support civic or worthwhile prediction?

Can I offer prize money?

Are others offering prize money?

Will it be maintained?

How reliable is the underlying hosted database technology?

Questions About Getting Predictions

 

Why aren't predictions a single number?

The power of this platform comes from the ability to combine many forecasts and this works better if algorithms supply distributional predictions.

Another (related) reason we ask algorithms for multiple scenarios is that if they supply only one, this single number (point estimate) can be difficult to interpret. Imagine asking for predictions of where the tennis ball will land after it leaves Roger Federer’s racket. You can’t summarize that with a single number.

There is a third reason. Distributional estimates admit a stateless reward clearing mechanism that is O(1) in the number of contributions. This platform can handle roughly as many incoming scenarios as, say, Nasdaq processes trade orders. No shortage of cognitive diversity here!

 

Can I retrieve moments, such as the mean prediction?

The cumulative distribution function (CDF) provides a slightly noisy estimate. A better, more precise API that will provide moments is on the way.

 

Can I publish multiple quantities and create z2~ and z3~ streams of my own?

Yes. This is an advanced use case requiring a rarer MUID (length 12+1, see min_len parameter in config.json). You can use the Copula API or the MicroWriter.cset method in the Python library to simultaneously set the value of multiple streams.
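A sketch of the cset call (stream names are placeholders, and the write key must be the rarer, longer kind):

    # Set the values of several of your streams simultaneously
    from microprediction import MicroWriter

    mw = MicroWriter(write_key='A_RARE_WRITE_KEY')  # needs the longer MUID
    mw.cset(names=['mine-a.json', 'mine-b.json'],   # hypothetical stream names
            values=[3.14, 2.71])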

Be sure to read An Introduction to Z-Streams. As is made clear in that article, your action will trigger prediction of:

  1. Margins
  2. Implied z-scores for each margin
  3. Pairs and triples of z-scores projected from R^2 or R^3 back into one dimension using a Morton space-filling Z-curve.
  4. Implied z-scores of the projected z-scores.

If z-scores of z-curve projections of implied z-scores sounds a bit elaborate, it may well be. However, it encourages a separation of concerns in attacking an extremely difficult problem - the construction of accurate multi-dimensional joint distributions. 

 

Do I have to use Python or R?

No. You can use the API directly, which is straightforward.

  1. Send a PUT request to api.microprediction.org/live/YOUR_STREAM_NAME with a scalar value in the data payload.
  2. Make GET requests (e.g. www.microprediction.org/cdf/YOUR_STREAM_NAME) to retrieve the predicted distribution.

See the API page.
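A hedged sketch of both calls (the exact payload field names are assumptions; the API page is authoritative):

    import requests

    STREAM = 'YOUR_STREAM_NAME'   # hypothetical
    WRITE_KEY = 'YOUR_WRITE_KEY'

    # Publish a scalar value
    requests.put('https://api.microprediction.org/live/' + STREAM,
                 data={'write_key': WRITE_KEY, 'value': 3.14})

    # Retrieve the community's predicted distribution
    cdf = requests.get('https://www.microprediction.org/cdf/' + STREAM).json()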

 

Over what horizon will my data be predicted?

Loosely speaking, crawlers will predict your data roughly 1 minute, 5 minutes, 15 minutes, and 1 hour ahead. See An Introduction to Z-Streams.  

 

On what frequency should I publish my data?

To be more precise in answering the previous question, predictions get quarantined for 70 seconds; 5 minutes plus 10 seconds; 15 minutes plus 10 seconds; and 1 hour less 45 seconds. For the purpose of discussion, we shall presume that you are most interested in the 15 minute horizon. Then there are two main cases:

  1. Absolute levels. Say you want people to predict the absolute level of a quantity roughly 15 minutes ahead. You are better off publishing once every 16 or 20 minutes to avoid race conditions, as compared with every 15 minutes. This gives the algorithms a minute or so to absorb the most recent data point.
  2. Changes to near-martingales. On the other hand, if you want people to predict a quantity that is approximately a martingale (like a stock price, which is inherently very hard to predict), then you might want to publish the difference of that value sampled every 15 minutes instead - just shy of the 910 second quarantine time. The rationale here is that most algorithms will not need to update their predictions very much. They can make their prediction of the difference a few minutes prior to the 15 minute cutoff without unduly hurting their chances.

Where possible, try to avoid forcing algorithms to make predictions in a tight window. Many algorithms crawl to many different data streams so the chance of them helping your stream will be greater if they are able to predict your stream at a time of their choosing. 

In general, you should think about using the prediction API the same way you think about time series forecasting pre-processing. Differencing, rescaling and transformations may be helpful. Try to ensure that changes in your time series are very roughly on the scale of 1.0, as compared with thousands.
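For instance, a minimal pre-processing sketch along those lines (the scale constant is yours to choose):

    def to_published_value(history, scale=1000.0):
        """Turn the latest raw reading into a roughly unit-scale change."""
        if len(history) < 2:
            return 0.0
        return (history[-1] - history[-2]) / scale  # aim for changes near 1.0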

As a more general comment, you want to attract the best algorithms and not just those that specialize in finding edge cases to exploit. You can view some examples of feeds which exhibit these subtle differences.

See An Introduction to Z-Streams.  

 

Can I predict my own stream?

Absolutely. See predicting and crawling instructions. As noted, you can use a different key to add volume and draw attention to a particular horizon.

 

Who contributes algorithms?

Anyone and everyone.

 

What prevents someone from creating thousands of spurious algorithms?

Participation requires a key. Generating a key requires some appreciable compute time because write_keys are memorable unique identifiers (MUIDs). The system tracks a balance associated with each MUID and prevents participation once the balance drops below a bankruptcy threshold that is set at a negative number. Thus creating noisy algorithms is possible, but chews up CPU time. Furthermore, poorly performing algorithms don't count in the computation of the community distribution. 

 

I don't understand MUIDs. What is a MUID?

Honestly, you don't have to worry too much about what MUIDs are, as the library will generate you a write key. However, here is some information that can help you understand why the generation of a write key takes some time.

Memorable Unique Identifiers are just randomly generated numbers that happen to be very lucky, in the sense that the hex digest of their SHA-256 hash looks like a cute animal name. If that doesn't make sense then:

  1. We suggest this video introduction.
  2. Or ... if you know what a hash is already just read the MUID README.
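If you prefer the idea in code, here is a toy sketch (the real muid library applies its own conventions for what counts as a cute animal name):

    # Toy illustration of why MUID generation takes time
    import hashlib
    import secrets

    candidate = secrets.token_hex(16)                      # a random key
    code = hashlib.sha256(candidate.encode()).hexdigest()  # its public hash
    # A MUID is a candidate whose hash happens to spell a readable animal
    # name; finding one takes many random draws, hence the CPU cost.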
 

I have an important use for mankind but don't want to spend many hours producing a write_key. Is there an easier way to obtain a write_key?

We have a stockpile of MUIDs that can be used as write_keys for worthwhile purposes. Email us at info@microprediction.org.

 

Can I help you generate MUIDs so they can support civic or worthwhile prediction?

What a great idea! Email us at info@microprediction.com if you have some spare CPU capacity.

 

Can I offer prize money?

Yes. It’s relatively painless and inexpensive to sponsor a competition. If your organization is interested in sponsoring, please email us at payments@microprediction.com.

 

Are others offering prize money?

Micropredictions, LLC will be offering compensation for various types of contribution to this project including, but not limited to, the creation of high performing prediction algorithms.

 

Will it be maintained?

Yes. Micropredictions, LLC is supporting the open source code development.

 

How reliable is the underlying hosted database technology?

You can expect 99.99% uptime. We make no warranties.


Questions About Making Predictions

 

See An Introduction to Z-Streams.  

 

What am I submitting?

You submit 225 numbers: samples (scenarios) representing your predictive distribution.

 



Can my algorithms make their own predictions?

Absolutely. Why not publish your model residuals? At a minimum you get ongoing performance analysis of your model, plus an accurate distributional description of your residuals. Perhaps also:

  1. Sooner or later your residuals will correlate to a data source you don't know about.
  2. You'll be able to combine the information with your existing model to make a better one.
 


What are the Z1-streams?

Every time a data feed value arrives, an implied z-score is computed based on the existing predictions from yourself and other algorithms. A secondary stream is automatically created where algorithms predict these normalized z-scores. (Your algorithm may be better at predicting the z-scores than the original margins, or vice versa).

See An Introduction to Z-Streams
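One way to picture the implied z-score (a sketch; the platform's exact computation follows its own conventions):

    # Map the crowd's CDF evaluated at the realized value to a normal score
    from scipy.stats import norm

    def implied_z(community_cdf_at_x):
        return norm.ppf(community_cdf_at_x)

    z = implied_z(0.975)  # a value in the crowd's upper tail maps to z of about 1.96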

 

What are the Z2-streams?

See An Introduction to Z-Streams

 

What are the Z3-streams?

See An Introduction to Z-Streams

 


Can I solicit multivariate distributional predictions?

It isn't directly supported ... but we can't stop you using the same space-filling curve trick that is employed for the z2~ and z3~ streams. You could provide a univariate time series intended to be unpacked into a bivariate or trivariate one, say (which gives us an excuse to show the space-filling curve again). However, if you do this, you are very strongly encouraged to use the exact same conventions and code. See the microconventions package on GitHub or PyPI. To do otherwise is probably asking too much of the human and artificial life forms here. 
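For intuition only, here is a toy bit-interleaving sketch; the platform's actual conventions live in microconventions, which you should use instead:

    def morton_interleave(x, y, bits=16):
        # Pack two non-negative integers below 2**bits into one by
        # interleaving their bits (the Morton / Z-curve idea).
        z = 0
        for i in range(bits):
            z |= ((x >> i) & 1) << (2 * i)      # x bits to even positions
            z |= ((y >> i) & 1) << (2 * i + 1)  # y bits to odd positions
        return z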

Who contributes algorithms?

Anyone can, including algorithms that create algorithms.

 


I still don't understand MUIDs. What is a MUID?

Honestly you don’t have to worry too much about what MUIDs are. Just run the code and don't lose your key.

Memorable Unique Identifiers are just randomly generated numbers that happen to be very lucky, in the sense that the hex digest of their SHA-256 hash looks like a cute animal name. If that doesn't make sense then:

  1. See our introduction to private keys
  2. Or ... if you know what a hash is already just read the MUID README.
 

May I have a key please so I can create streams? 

We have a stockpile of keys (i.e. MUIDs) for worthwhile purposes. We have a very liberal definition of worthwhile. Contact us.

 

