Introduction

Python Module 6

Modifying how a crawler chooses what to predict

Modifying Crawler Navigation

In Module 2 we instantiated the default crawler and ran it from the cloud. In Module 5 we changed the way it predicts.

Now, we will modify what it predicts. 

 

Here are the steps:

(a) Basic setup (recap)

As with Modules 1, 2 and 5 we use Google Colab to install the microprediction package:

!pip install microprediction

Then we import what we need:

from microprediction import new_key, MicroWriter

and burn a key in Colab, to avoid spending CPU time on PythonAnywhere:

write_key = new_key(difficulty=9)

Also as with Module 5, we assume that you have already visited PythonAnywhere and set up a Hacker account. As with Module 2, we open a bash console (Consoles->Bash) on PythonAnywhere and use pip3.8 to install the microprediction package:

pip3.8 install --user --upgrade microprediction

Then we click on Files and create a directory for our crawler.
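Putting the Colab steps together, a minimal key-burning cell might look like the sketch below. Treat it as illustrative: the get_balance call is simply one way to confirm the key is accepted, and a difficulty 9 key can take a long time to generate.

# Google Colab cell, after !pip install microprediction
from microprediction import new_key, MicroWriter

write_key = new_key(difficulty=9)     # burns CPU; expect a long wait at difficulty 9
print(write_key)                      # copy this somewhere safe for PythonAnywhere

mw = MicroWriter(write_key=write_key)
print(mw.get_balance())               # sanity check that the key is recognized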

(b) Constructor arguments

We subclass MicroCrawler as before:

class ElectricityCrawler(MicroCrawler):

However this time it has a more instructive name. Also, we can supply constructor arguments that we did not use previously. For example:

crawler = ElectricityCrawler(write_key=write_key, max_active=32)

limits the crawler to 32 active horizons. Other examples are found in the example gist. As noted in the video, you can refer directly to the crawler code, where it is evident which arguments you can supply.
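As a sketch, any such arguments are passed straight through to MicroCrawler. The max_active argument comes from the example above; stop_loss and min_lags are shown on the assumption that they appear in the crawler's __init__, which remains the authoritative list.

from microprediction import MicroCrawler

class ElectricityCrawler(MicroCrawler):
    pass

crawler = ElectricityCrawler(write_key=write_key,
                             max_active=32,   # at most 32 active horizons
                             stop_loss=5,     # abandon streams where we keep losing
                             min_lags=50)     # skip streams with very little history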

Please note that I misspoke once in the video, referring to min_len when I meant min_lags.

(c) Stream selector methods

The crawler code also makes it apparent which methods can be modified when subclassing. We can modify the crawler's include_stream method:

def include_stream(self, name, **ignore):

For instance, if we only care about electricity, this function might simply return

return 'electricity' in name

and the crawler will only consider those streams. Conversely, if we wish to eliminate some streams by similar pattern matching, we override

def exclude_stream(self, name, **ignore):

and, also just as an example, to avoid predicting any derived streams (whose names contain a '~'):

return '~' in name

since a stream is excluded when exclude_stream returns True.

These examples are found in the example gist.
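Putting the two selector methods together, a minimal sketch of the subclass might look like this:

from microprediction import MicroCrawler

class ElectricityCrawler(MicroCrawler):

    def include_stream(self, name, **ignore):
        # Only consider streams whose names mention electricity
        return 'electricity' in name

    def exclude_stream(self, name, **ignore):
        # Returning True excludes; here we skip derived streams,
        # whose names contain a '~'
        return '~' in name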

Finally, and as with the previous two modules, we then instantiate the new crawler, set it loose, and punch our write key into the dashboard to see how it is doing. 
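For instance, the last few lines of the crawler script on PythonAnywhere might read as follows (a sketch; run loops indefinitely, so leave the console running):

if __name__ == '__main__':
    write_key = 'REPLACE_WITH_YOUR_BURNED_KEY'
    crawler = ElectricityCrawler(write_key=write_key, max_active=32)
    crawler.run()    # loops indefinitely; check the dashboard for progress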

 

 

Summary

You can change where your crawler directs effort in two ways:

Constructor arguments

See the gist for an example, but the best reference is the __init__ method of the crawler code. 

Selector methods

Once again the crawler code is the best place to go; it makes apparent which methods you can override, should you wish to steer your crawler towards, or away from, certain types of data stream.

Congratulations

You are through the introductory Python modules.