# Welcome to the pytorch-lifestream docs

## Library content

Here is a brief overview of the library with links to the detailed descriptions.

Library modules:
- `ptls.preprocessing` - transforms data to a `ptls`-compatible format with `pandas` or `pyspark`: categorical encoding, datetime transformation, numerical feature preprocessing.
- `ptls.data_load` - everything you need to prepare your data for training and validation.
    - `ptls.data_load.datasets` - PyTorch `Dataset` API implementation for data access.
    - `ptls.data_load.iterable_processing` - generator-style filters for data transformation.
    - `ptls.data_load.augmentations` - functions for data augmentation.
- `ptls.frames` - tools for training encoders with popular frameworks like CoLES, SimCLR, CPC, VICReg, ...
    - `ptls.frames.coles` - contrastive learning on sub-sequences.
    - `ptls.frames.cpc` - contrastive learning for future event state prediction.
    - `ptls.frames.bert` - methods inspired by NLP and transformer models.
    - `ptls.frames.supervised` - modules for supervised training.
    - `ptls.frames.inference` - inference module.
- `ptls.nn` - layers for model creation:
    - `ptls.nn.trx_encoder` - layers that produce the representation for a single transaction.
    - `ptls.nn.seq_encoder` - layers for sequence processing, like `RNN` or `Transformer`.
    - `ptls.nn.pb` - `PaddedBatch`-compatible layers, similar to `torch.nn` modules, but working with `ptls-data`.
    - `ptls.nn.head` - composite layers for final embedding transformation.
    - `ptls.nn.seq_step.py` - changes the sequence along the time axis.
    - `ptls.nn.binarization`, `ptls.nn.normalization` - other groups of layers.
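The modules above all operate on the same basic unit of data: one record per user, holding an event-time array and one aligned array per feature. As a conceptual illustration (not the library's required schema — the field names `client_id`, `mcc_code`, `amount` here are just examples), grouping flat event rows into such records can be sketched with the standard library alone:

```python
from collections import defaultdict

def rows_to_records(rows):
    """Group flat (client_id, event_time, feature...) rows into one
    dict-of-arrays record per client, sorted by event time."""
    by_client = defaultdict(list)
    for row in rows:
        by_client[row["client_id"]].append(row)
    records = []
    for client_id, events in by_client.items():
        events.sort(key=lambda e: e["event_time"])
        records.append({
            "client_id": client_id,
            "event_time": [e["event_time"] for e in events],
            "mcc_code": [e["mcc_code"] for e in events],
            "amount": [e["amount"] for e in events],
        })
    return records

rows = [
    {"client_id": 1, "event_time": 20, "mcc_code": 5411, "amount": 3.5},
    {"client_id": 1, "event_time": 10, "mcc_code": 5812, "amount": 7.0},
    {"client_id": 2, "event_time": 15, "mcc_code": 5411, "amount": 1.2},
]
records = rows_to_records(rows)
```

All feature arrays inside one record stay aligned by position, which is what lets sequence layers consume several features of the same event at once.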
## How to guide
- Prepare your data.
    - Use `Pyspark` in local or cluster mode for big datasets and `Pandas` for small ones.
    - Split data into the required parts (train, valid, test, ...).
    - Use `ptls.preprocessing` for simple data preparation.
    - Transform features to a compatible format using `Pyspark` or `Pandas` functions. You can also use `ptls.data_load.preprocessing` for common data transformation patterns.
    - Split sequences into the `ptls-data` format with `ptls.data_load.split_tools`. Save prepared data in `Parquet` format or keep it in memory (`Pickle` also works).
    - Use one of the available `ptls.data_load.datasets` to define the input for the models.
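The two most common feature transformations in the preparation step above are categorical encoding and datetime conversion. A hand-rolled stdlib sketch of both, only to show the idea that `ptls.preprocessing` automates (reserving index 0 for padding/unknown is a choice made here for illustration, not a library guarantee):

```python
from datetime import datetime, timezone

def fit_category_encoding(values):
    """Map each distinct category to a positive integer index;
    0 is reserved for padding / unseen categories."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}

def encode(values, mapping):
    """Replace categories with their indices; unseen values become 0."""
    return [mapping.get(v, 0) for v in values]

def to_unix_time(timestamps):
    """Convert ISO-8601 strings to unix seconds, a common event_time format."""
    return [int(datetime.fromisoformat(t).replace(tzinfo=timezone.utc).timestamp())
            for t in timestamps]

mcc = ["grocery", "cafe", "grocery", "fuel"]
mapping = fit_category_encoding(mcc)
codes = encode(mcc, mapping)
times = to_unix_time(["1970-01-01T00:00:10", "1970-01-01T00:01:00"])
```

Fitting the encoding on the train split and reusing the same mapping for valid/test keeps the index space consistent across splits.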
- Choose a framework for encoder training.
    - There are both supervised and unsupervised frameworks in `ptls.frames`.
    - Keep in mind that each framework requires its own batch format. Tools for batch collate can be found in the selected framework package.
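Whatever the framework-specific batch format, the common core of collating is padding variable-length sequences to a rectangle while remembering the true lengths, so downstream layers can mask the padding. The real `PaddedBatch` is a torch-based class; this is only a conceptual stdlib sketch:

```python
def collate_padded(batch, pad_value=0):
    """Pad a list of variable-length sequences to equal length and
    keep the original lengths for masking, PaddedBatch-style."""
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in batch]
    return padded, lengths

padded, lengths = collate_padded([[5, 3, 8], [1], [2, 2]])
```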
- Build an encoder.
    - All parts are available in `ptls.nn`.
    - You can also use pretrained layers.
- Train your encoder with the selected framework and `pytorch_lightning`.
    - Provide data with one of the DataLoaders compatible with the selected framework.
    - Monitor the progress on TensorBoard.
    - Optionally tune hyperparameters.
    - Save the trained encoder for future use.
        - You can use it as a standalone solution (e.g. to get class label probabilities).
        - Or it can be a pretrained part of another neural network.
- Use the encoder in your project.
    - Run predict for your data and get logits, probas, scores or embeddings.
    - Use `ptls.data_load` and `ptls.data_load.datasets` tools to keep your data transformation consistent and to collect batches for inference.
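Keeping the same data transformation at train and inference time is exactly what the generator-style filters of `ptls.data_load.iterable_processing` are for: small chainable generators applied to a stream of records. A stdlib sketch of the idea (the filter names below are illustrative, not the library's):

```python
def min_seq_len_filter(records, min_len=2):
    """Drop records with too few events; a typical chainable filter."""
    for rec in records:
        if len(rec["event_time"]) >= min_len:
            yield rec

def take_features(records, keys):
    """Keep only the requested fields of each record."""
    for rec in records:
        yield {k: rec[k] for k in keys if k in rec}

recs = [
    {"client_id": 1, "event_time": [10, 20], "amount": [3.5, 7.0]},
    {"client_id": 2, "event_time": [15], "amount": [1.2]},
]
pipeline = take_features(min_seq_len_filter(recs), ["client_id", "event_time"])
out = list(pipeline)
```

Because each stage is a generator, the same pipeline object can wrap a training dataset or an inference stream without materializing intermediate copies.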
## How to create your own components

It is possible to create a specific component for every library module. Here are the links to the detailed descriptions: