Introduction

Python Module 2

Your first cloud crawler

Create Your First Crawler

In the previous module we submitted predictions to a single data stream.

Now, we will create a crawler that predicts many different streams. 


Here are the steps shown in the video:

(a) Your identity

As in Module 1, we use Google Colab to install the microprediction package:

pip install microprediction

Then we import

from microprediction import new_key, MicroWriter

And burn a key

write_key = new_key(difficulty=9)

(I recommend difficulty=10 or 11 rather than 9.) The key lets us instantiate a MicroWriter as before:

mw = MicroWriter(write_key=write_key)

We print the private write_key so you can copy and paste it into the dashboard:

print(write_key)

 

(b) Hacker account on PythonAnywhere

We visit PythonAnywhere, a cloud compute provider where you only pay for CPU compute seconds, and set up a Hacker account. This isn't the only way to run a crawler. In future modules we'll cover other providers and other alternatives, such as running locally on your machine.

(c) We set up our crawler

We open a bash console (Consoles->Bash) on PythonAnywhere and use pip3.8 to install the microprediction package:

pip3.8 install --user --upgrade microprediction
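A common stumbling block is installing the package for one Python version and then running the script with another. The following is a minimal sketch of a check you could run with python3.8 after the install. It uses only the standard library, and the helper name is_installed is mine, not part of the microprediction package.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if this interpreter can find the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Show which interpreter is running, then whether it can see the package.
print("Interpreter:", sys.executable)
print("microprediction found:", is_installed("microprediction"))
```

If the second line reports False, the package was most likely installed for a different interpreter than the one running the check.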

Then we click on Files and create a directory for our crawler. Our Python script will be very simple. Aside from the import line, from microprediction import MicroCrawler, we need only instantiate the crawler and run it. First we instantiate:

crawler = MicroCrawler(write_key="65a4sdf65as")

where you substitute your own write key, of course. Then we simply call the run method:

crawler.run()

The crawler can be run from the bash console. For example:

python3.8 /home/yourusername/first_crawler/default_crawler.py

where I have used the full path so you can copy it over to an always-on task (see the Tasks menu on PythonAnywhere).
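Putting the pieces together, here is one way the whole of default_crawler.py might look. It is a sketch, not the script from the video. The only microprediction calls are the two shown above. Reading the key from an environment variable rather than typing it into the file is my own addition, and the variable name MICROPREDICTION_WRITE_KEY and the helper names are hypothetical.

```python
# default_crawler.py: a minimal sketch of the complete script.
import os

# Hypothetical variable name chosen for this sketch; use any name you like.
ENV_VAR = "MICROPREDICTION_WRITE_KEY"


def get_write_key(env_var=ENV_VAR):
    """Read the write key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your write key before starting the crawler")
    return key


def run_crawler(write_key):
    # Imported here so the helper above still works where the package is absent.
    from microprediction import MicroCrawler

    crawler = MicroCrawler(write_key=write_key)
    crawler.run()  # the same two lines described in the text


if __name__ == "__main__":
    try:
        key = get_write_key()
    except RuntimeError as err:
        print(err)  # no key configured, so explain and do nothing
    else:
        run_crawler(key)
```

In bash, the variable can be set for a single run by prefixing the command, for example MICROPREDICTION_WRITE_KEY=yourkey python3.8 /home/yourusername/first_crawler/default_crawler.py. Pasting the key directly into the script, as in the video, works just as well. The environment variable only keeps the private key out of a file you might later share.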

 

Don't ask me why I called it mw instead of crawler in the video, but that doesn't matter. What matters is where you are on the leaderboard. So, as with the previous Python module, punch your write key into the dashboard (which greets you at Microprediction.org, or from the top right corner at Microprediction.com) to see how your crawler is doing. 

 

 

Summary

Once again it only took us ten minutes, and in this case two lines of Python, to kick off a crawler.

PythonAnywhere

You can run your Python code anywhere you wish. I merely used PythonAnywhere as an example. 

Don't let it stop

Unlike your submission from the first Python module, this crawler will predict fast-moving time series constantly. If you stop the program, your predictions will rapidly become stale and you may plummet down the leaderboards. For this reason we recommend using an always-on task at PythonAnywhere. There are plenty of alternatives, however, which we will cover in future modules.
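If you run the crawler somewhere that does not restart stopped programs for you, a small wrapper can restart it after an error. The following is a generic sketch using only the standard library. The function run_with_restarts and its parameters are hypothetical, and it assumes the crawler signals a failure by raising an exception, which is not something this module confirms.

```python
import time


def run_with_restarts(start, max_restarts=5, pause_seconds=60.0, sleep=time.sleep):
    """Call start() and call it again if it raises, up to max_restarts times.

    Returns the number of restarts that were needed. Ctrl-C still stops the
    program, because KeyboardInterrupt is not a subclass of Exception.
    """
    restarts = 0
    while True:
        try:
            start()
            return restarts  # start() returned normally, so stop looping
        except Exception as err:
            if restarts >= max_restarts:
                raise  # give up and surface the last error
            restarts += 1
            print(f"Stopped with {err!r}; restart {restarts} of {max_restarts}")
            sleep(pause_seconds)  # wait a little before trying again
```

With a crawler instantiated as in step (c), the call would be run_with_restarts(crawler.run). The pause between attempts avoids restarting in a tight loop if the same error keeps recurring.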

Continue

In the next module we will show how to retrieve historical stream data.