Introduction

Python Module 5

Modifying how a crawler makes predictions

In Module 2 we instantiated the default crawler and ran it from the cloud.

Now, we will modify how it predicts. 

[Video: python_module_5_crawler_prediction]

Here are the steps shown:

(a) Your identity (recap)

As with Modules 1 and 2, we use Google Colab to install the microprediction package

pip install microprediction

Then we import

from microprediction import new_key, MicroWriter

And burn a key

write_key = new_key(difficulty=9)

which lets us instantiate a MicroWriter as before (I recommend difficulty=10 or 11 instead). 

mw = MicroWriter(write_key=write_key)

We reveal the private write_key so you can cut and paste it into the dashboard

print(write_key)

 

(b) Hacker account on PythonAnywhere (recap)

As with Module 2, we visit PythonAnywhere and set up a Hacker account. We then open a bash console (Consoles->Bash) on PythonAnywhere and use pip3.8 to install the microprediction package.

pip3.8 install --user --upgrade microprediction

Then we click on Files and create a directory for our crawler. 

(c) Subclass MicroCrawler

However, whereas before we simply ran the default crawler, this time we first subclassed the crawler and overrode the default sample method.

class MyCrawler(MicroCrawler):

After including some constructor boilerplate, we copied over the default sample method as well, then made a small modification to the way it generates 225 guesses of what the future value of the time series will be. 
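In rough outline, the pattern looks something like the following. This is a sketch only; the exact sample signature and the default logic may vary by package version, so consult the gist below for the real code from the video.

from microprediction import MicroCrawler
import numpy as np

class MyCrawler(MicroCrawler):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)   # constructor boilerplate

    def sample(self, lagged_values, lagged_times=None, name=None, delay=None, **ignored):
        # Illustrative only: resample recent values and add a little noise to
        # produce self.num_predictions (i.e. 225) guesses of the next value.
        recent = np.array(lagged_values[:25])
        scale = np.std(recent) if len(recent) > 1 else 1.0
        guesses = np.random.choice(recent, size=self.num_predictions) + 0.1 * scale * np.random.randn(self.num_predictions)
        return list(guesses)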

The code created during this video is found at this gist, which you can open in Colab and modify. As per Module 2, we used Colab as a scratch space and then copied our debugged crawler to PythonAnywhere so it can run forever.

(d) Instantiate and run, as before

Now, instantiating your new type of crawler...

crawler = MyCrawler(write_key=write_key)

we can use the run method as we did in Module 2.

crawler.run()

The crawler can be run from the bash console. For example

python3.8 /home/yourusername/first_crawler/default_crawler.py

where I have used the full path so you can copy it over to an always-on task (see the Tasks menu on PythonAnywhere).
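For instance, the file that the task points at might contain something like this (hypothetical contents; substitute your own path, class, and write key):

# Contents of /home/yourusername/first_crawler/default_crawler.py (illustrative)
from microprediction import MicroCrawler

class MyCrawler(MicroCrawler):
    pass   # your subclass with the modified sample method, as in step (c)

if __name__ == '__main__':
    crawler = MyCrawler(write_key='insert your write key here')
    crawler.run()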

 

(There are, of course, plenty of other ways to develop, and you might prefer to test locally. In a future module we'll go over the use of local virtual environments and IDEs, for those not already familiar with them.)

But for now, as with previous Python modules, punch your write key into the dashboard (which greets you at Microprediction.org, or from the top right corner at Microprediction.com) to see how your crawler is doing. 

 

 

Summary

A crawler's predictive intelligence resides in its sample method, which you override when subclassing MicroCrawler.

Distributional prediction ...

Your crawler's sample method should return 225 guesses of what the next value in the time series will be (where by "next" we really mean the next to arrive after a delay of 70 seconds, say). The delay parameter is passed to the sample method and is denominated in seconds. See the gist for an example. 
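For illustration only, here is one way the delay might be used inside your subclass's sample method. The scaling rule is an assumption of mine, not the package default: the spread of the guesses grows with the delay horizon.

def sample(self, lagged_values, lagged_times=None, name=None, delay=None, **ignored):
    import numpy as np
    # Assumption: lagged_values[0] is the most recent value, and uncertainty
    # grows roughly with the square root of the delay horizon.
    last = lagged_values[0]
    vol = np.std(np.diff(lagged_values[:50])) if len(lagged_values) > 2 else 1.0
    spread = vol * np.sqrt((delay or 70) / 70.0)
    return list(last + spread * np.random.randn(self.num_predictions))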

... on the fly

Your sample method can do anything it likes, provided this occurs reasonably quickly. If your model requires parameter estimation or other lengthy calculations, you may wish to perform that elsewhere (discussed further in future modules) or find a way to do it incrementally (more on that too, to follow). 
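As a toy sketch of the incremental approach (the caching scheme and update rule here are hypothetical), a crawler might maintain cheap per-stream state rather than refitting a model on every call to sample:

import numpy as np
from microprediction import MicroCrawler

class CachingCrawler(MicroCrawler):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.state = {}   # running estimates, keyed by stream name

    def sample(self, lagged_values, lagged_times=None, name=None, delay=None, **ignored):
        # Exponentially weighted mean and variance, nudged a little on each
        # call, so sample never has to do an expensive refit.
        x = lagged_values[0]
        s = self.state.get(name, {'mean': x, 'var': 1.0})
        s['mean'] = 0.95 * s['mean'] + 0.05 * x
        s['var'] = 0.95 * s['var'] + 0.05 * (x - s['mean']) ** 2
        self.state[name] = s
        return list(s['mean'] + np.sqrt(s['var']) * np.random.randn(self.num_predictions))

Because the state is updated a little at a time, each call stays fast no matter how long the crawler has been running.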

Continue

In the next module, we will show how to retrieve historical stream data.