Python FitCrawler

A crawler that periodically fits parameters

Slides for this video are available here.

Comal Cheetah

This talk is intended to help you understand the operation of Comal Cheetah. The crawler makes use of FitCrawler, which in turn subclasses SequentialStreamCrawler, as this is an easy way to keep track of the state stored for each stream. When the crawler has a few seconds to spare, it pulls a stream from a queue and tries to improve its parameters. Normally, however, there isn't time to fit, so it merely updates itself using the new data point (by moving an anchor point towards the arriving data point).
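The two modes described above (a cheap update on every data point, and an occasional fit when time permits) can be sketched roughly as follows. This is a minimal illustration and not the Comal Cheetah or FitCrawler code: the names AnchorState, one_step_error, fit_alpha and on_spare_time, the stream names, and the small grid search standing in for a real optimizer are all assumptions made for the example.

```python
from collections import deque


class AnchorState:
    """Hypothetical per-stream state: an anchor point and a step-size parameter."""

    def __init__(self, anchor=0.0, alpha=0.1):
        self.anchor = anchor
        self.alpha = alpha  # fraction of the gap closed on each update

    def update(self, value):
        # Cheap online step: move the anchor part of the way toward the new data point
        self.anchor += self.alpha * (value - self.anchor)
        return self.anchor


def one_step_error(alpha, history):
    """Mean squared one-step-ahead error of the anchor over a history (length >= 2)."""
    state = AnchorState(anchor=history[0], alpha=alpha)
    total = 0.0
    for value in history[1:]:
        total += (value - state.anchor) ** 2
        state.update(value)
    return total / (len(history) - 1)


def fit_alpha(history, candidates=(0.05, 0.1, 0.2, 0.4, 0.8)):
    # Stand-in for a real optimizer: a small grid search over the step size
    return min(candidates, key=lambda a: one_step_error(a, history))


# Hypothetical stream names and histories, used only for illustration
histories = {
    "stream_a.json": [1.0, 1.1, 0.9, 1.0, 1.05, 0.95],
    "stream_b.json": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
}
states = {name: AnchorState(anchor=h[-1]) for name, h in histories.items()}
fit_queue = deque(histories)  # streams waiting for their turn to be fitted


def on_spare_time():
    """Pull one stream from the queue, refit its parameter, and requeue it."""
    name = fit_queue.popleft()
    states[name].alpha = fit_alpha(histories[name])
    fit_queue.append(name)
    return name
```

In this sketch, update is what runs on almost every arrival, while on_spare_time is called only when a few seconds are free, so each stream is refitted in rotation.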

This talk also discusses some Python optimization packages such as HyperOpt, Optuna, PySOT, PyMoo and scipy.optimize. Our preliminary investigations suggest that the existing FitCrawler might benefit from a different choice of optimizer (it currently uses HyperOpt).




The microprediction package contains subclasses of MicroCrawler that you may choose to subclass in turn.


See code for an example of a crawler that maintains state attached to each stream. By default this provides sample_using_state, which its parent uses to sample. This means that if future changes to the time series are independent (a big if!), you don't need to implement your own sample method.
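To make the independence assumption concrete, here is a minimal sketch of sampling from per-stream state. It is not the library's sample_using_state: the class ChangeState, the function sample_from_state, and its parameters are hypothetical names chosen for this example. The state stores recent changes, and samples are produced by adding randomly chosen past changes to the last observed value.

```python
import random


class ChangeState:
    """Hypothetical per-stream state: the last value and a buffer of recent changes."""

    def __init__(self, max_changes=1000):
        self.last_value = None
        self.changes = []
        self.max_changes = max_changes

    def update(self, value):
        # Record the change from the previous point, keeping only the most recent ones
        if self.last_value is not None:
            self.changes.append(value - self.last_value)
            self.changes = self.changes[-self.max_changes:]
        self.last_value = value


def sample_from_state(state, num, rng=None):
    """Draw num scenarios for the next value, treating past changes as independent."""
    rng = rng or random.Random()
    if not state.changes:
        # Nothing learned yet: fall back to repeating the last value
        return [state.last_value] * num
    return sorted(state.last_value + rng.choice(state.changes) for _ in range(num))


state = ChangeState()
for value in [1.0, 2.0, 4.0]:
    state.update(value)
scenarios = sample_from_state(state, num=5, rng=random.Random(0))
```

If changes are in fact serially dependent, resampling them this way will misstate the distribution, which is the "big if" noted above.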


See the code for ExpNormDist for an example of implementing a DistMachine, something that can be supplied to a SequentialFitCrawler. Morally speaking, a distribution machine is a distribution (of the next data point to arrive) that is also a state machine (has an update method) and knows its own distribution (has an inv_cdf method). 
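As a rough illustration of the idea (and not the ExpNormDist implementation or the actual DistMachine interface), a distribution machine might look like the sketch below. The class name NormalDistMachine, the decay parameter, and the choice of a Gaussian with exponentially weighted mean and variance are all assumptions made for the example; only the update and inv_cdf method names come from the description above.

```python
from statistics import NormalDist


class NormalDistMachine:
    """Hypothetical distribution machine: a running Gaussian estimate of the next point."""

    def __init__(self, decay=0.05):
        self.decay = decay  # weight given to each new observation
        self.mean = 0.0
        self.var = 1.0

    def update(self, value):
        # State-machine part: exponentially weighted updates of mean and variance
        diff = value - self.mean
        self.mean += self.decay * diff
        self.var = (1 - self.decay) * (self.var + self.decay * diff * diff)

    def inv_cdf(self, p):
        # Distribution part: quantile of the current estimate, for 0 < p < 1
        return NormalDist(mu=self.mean, sigma=self.var ** 0.5).inv_cdf(p)


machine = NormalDistMachine()
for value in [10.0, 10.5, 9.5, 10.2, 9.8]:
    machine.update(value)

# Evenly spaced percentiles turned into scenarios for the next data point
percentiles = [(i + 0.5) / 9 for i in range(9)]
scenarios = [machine.inv_cdf(p) for p in percentiles]
```

Because the machine exposes inv_cdf, a crawler can produce its samples simply by evaluating it on a grid of percentiles, as in the last two lines.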

These examples are best understood by tracing into the code, and their use is entirely optional. An alternative would be to derive directly from MicroCrawler and implement your own sample method, as we have done in the introductory modules.