In Module 2 we instantiated the default crawler and ran it from the cloud. In Module 5 we changed the way it predicts.
Now, we will modify what it predicts.
Here are the steps:
As with Modules 1, 2 and 5, we use Google Colab to install the microprediction package with
!pip install microprediction
then we import
from microprediction import new_key, MicroWriter
and burn a key in Colab to avoid spending CPU time on PythonAnywhere. That is:
write_key = new_key(difficulty=9)
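Put together, the Colab cell might look something like the following sketch (the print at the end is just so the key can be copied over to PythonAnywhere later):

!pip install microprediction
from microprediction import new_key
write_key = new_key(difficulty=9)   # burning a difficulty 9 key can take some time
print(write_key)                    # copy this key for use on PythonAnywhere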
Also, as with Module 5, we're assuming that you've already visited PythonAnywhere and set up a Hacker account. As with Module 2, we open a bash console (Consoles -> Bash) on PythonAnywhere and use pip3.8 to install the microprediction package:
pip3.8 install --user --upgrade microprediction
Then we click on Files and create a directory for our crawler.
We subclass MicroCrawler as before:
class ElectricityCrawler(MicroCrawler):
However, this time it has a more instructive name. Also, we can supply constructor arguments that we did not supply previously. For example:
crawler = ElectricityCrawler(write_key=write_key, max_active=32)
limits the crawler to 32 active horizons. Other examples are found in the example gist. As noted in the video, you can refer directly to the crawler code, where it is evident which arguments you can supply.
Please note that I misspoke once in the video, referring to min_len when I meant min_lags.
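As a sketch, a script on PythonAnywhere using such constructor arguments might begin like this (max_active=32 is the example above; the min_lags value is only a placeholder, and the crawler's __init__ is the authoritative list of arguments):

from microprediction import MicroCrawler

class ElectricityCrawler(MicroCrawler):
    pass   # prediction logic is inherited from the default crawler for now

write_key = 'PASTE THE KEY BURNED IN COLAB HERE'

# Illustrative values only; see the crawler's __init__ for every argument you can supply
crawler = ElectricityCrawler(write_key=write_key, max_active=32, min_lags=25)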
The crawler code also makes it apparent which methods can be overridden when subclassing. For example, we can override the crawler's include_stream method:
def include_stream(self, name, **ignore):
For instance, if we only care about electricity streams, this method might simply be:
return 'electricity' in name
and the crawler will only consider those streams. Conversely, if we wish to eliminate some streams by similar pattern matching, we override
def exclude_stream(self, name, **ignore):
and have it return True for streams we want the crawler to skip. For example, to avoid predicting any derived streams (whose names contain '~'):
return '~' in name
These examples are found in the example gist.
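Putting both overrides together, a sketch of the subclass might look like this:

from microprediction import MicroCrawler

class ElectricityCrawler(MicroCrawler):

    def include_stream(self, name, **ignore):
        # Only consider streams whose names mention electricity
        return 'electricity' in name

    def exclude_stream(self, name, **ignore):
        # Skip derived streams, whose names contain '~'
        return '~' in name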
Finally, as with the previous two modules, we instantiate the new crawler, set it loose, and punch our write key into the dashboard to see how it is doing.
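Continuing the sketch above, the last few lines of the PythonAnywhere script might be:

crawler = ElectricityCrawler(write_key=write_key)
crawler.run()   # runs indefinitely; use your write key at the dashboard to see how it is doing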
To recap, you can change where your crawler directs effort in two ways:
1. By supplying constructor arguments. See the gist for an example, but the best reference is the __init__ method of the crawler code.
2. By overriding methods such as include_stream and exclude_stream. Again, the crawler code is the best place to go and makes it apparent which methods you can override, should you wish to steer your crawler towards, or away from, certain types of data stream.