Bidirectional Encoder Representations from Transformers (BERT)

Transformer-based models pretrained on unsupervised tasks are state of the art in NLP. We implement them for sequential data.

The following pretraining tasks are implemented:

  • Replaced Token Detection (RTD) from ELECTRA
  • Next Sequence Prediction (NSP) from BERT
  • Sequences Order Prediction (SOP) from ALBERT
  • Masked Language Model (MLM) from RoBERTa

All of these tasks learn the internal structure of the data and use it to build representations.

NSP and SOP learn:

  • 'global' representation of the sequence
  • the embedding for each transaction is an internal state of seq_encoder
  • the embedding for the whole sequence is the output of seq_encoder

RTD learns:

  • 'local' representation of the sequence
  • the embedding for each transaction is an internal state of seq_encoder
  • the embedding for the whole sequence is available but is not learned

MLM learns:

  • 'local' representation of the sequence
  • the embedding for each transaction comes from trx_encoder
  • the pretrained MLM transformer serves as seq_encoder; the CLS token is not learned

MLM

ptls.frames.bert.MLMPretrainModule is a lightning module.
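As a rough illustration of the masking idea behind MLM, the sketch below hides random events in one sequence and keeps the originals as prediction targets. It is plain Python and does not call ptls. The helper name mask_sequence, the MASK placeholder and the mask_prob parameter are assumptions made for this example, not part of MLMPretrainModule.

```python
import random

MASK = "<MASK>"  # hypothetical placeholder used only in this sketch


def mask_sequence(events, mask_prob=0.15, seed=0):
    """Return (masked_events, targets) for one sequence.

    targets[i] holds the original event at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for event in events:
        if rng.random() < mask_prob:
            # Hide the event from the model and remember what it was.
            masked.append(MASK)
            targets.append(event)
        else:
            masked.append(event)
            targets.append(None)
    return masked, targets


events = ["grocery", "fuel", "cafe", "pharmacy", "grocery", "taxi"]
masked, targets = mask_sequence(events, mask_prob=0.5, seed=1)
```

A model trained on such samples only receives a loss at the masked positions, which is why the task yields a 'local' representation.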

ptls.frames.bert.MlmDataset and ptls.frames.bert.MlmIterableDataset are compatible datasets. ptls.frames.bert.MlmIndexedDataset is also compatible with MLM. MlmDataset samples one slice per user. MlmIndexedDataset samples all possible slices for each user. Because MlmIndexedDataset builds an index over the data, it has no iterable-style variant.
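The difference between sampling one slice per user and indexing every slice can be sketched as follows. This is a simplified stand-in written in plain Python, not the ptls implementation: the helpers one_random_slice and build_slice_index, and the seq_len and step parameters, are assumptions chosen for illustration.

```python
import random


def one_random_slice(seq, max_len, rng):
    """Pick a single contiguous slice of at most max_len events."""
    if len(seq) <= max_len:
        return list(seq)
    start = rng.randrange(len(seq) - max_len + 1)
    return list(seq[start:start + max_len])


def build_slice_index(sequences, seq_len, step):
    """List (user_position, start) pairs for every window taken at the given step."""
    index = []
    for user_pos, seq in enumerate(sequences):
        for start in range(0, max(len(seq) - seq_len, 0) + 1, step):
            index.append((user_pos, start))
    return index


sequences = [list(range(10)), list(range(100, 104))]
rng = random.Random(0)

# One sample per user: the number of samples equals the number of users.
sampled = [one_random_slice(seq, max_len=4, rng=rng) for seq in sequences]

# Indexed variant: every window is addressable by an integer position.
index = build_slice_index(sequences, seq_len=4, step=2)
windows = [sequences[u][s:s + 4] for u, s in index]
```

Building the index needs random access to all sequences up front, which a streaming, iterable-style dataset cannot provide.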

RTD

ptls.frames.bert.RtdModule is a lightning module.

ptls.frames.bert.RtdDataset and ptls.frames.bert.RtdIterableDataset are compatible datasets.
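The replaced token detection objective can be illustrated with a small plain-Python sketch: some events are swapped for others, and a per-position label records which ones. The helper replace_tokens, the vocabulary argument and replace_prob are hypothetical names for this example. Replacements here are drawn uniformly at random, whereas ELECTRA itself uses a small generator network, and RtdModule may differ from both.

```python
import random


def replace_tokens(events, vocabulary, replace_prob=0.15, seed=0):
    """Return (corrupted_events, labels); labels[i] is 1 where the event was replaced."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for event in events:
        if rng.random() < replace_prob:
            # Draw a replacement that differs from the original,
            # so that the label is unambiguous.
            candidates = [v for v in vocabulary if v != event]
            corrupted.append(rng.choice(candidates))
            labels.append(1)
        else:
            corrupted.append(event)
            labels.append(0)
    return corrupted, labels


vocabulary = ["grocery", "fuel", "cafe", "pharmacy", "taxi"]
events = ["grocery", "fuel", "cafe", "pharmacy", "grocery", "taxi"]
corrupted, labels = replace_tokens(events, vocabulary, replace_prob=0.5, seed=3)
```

The classifier predicts one label per position, so the training signal is attached to individual transactions rather than to the whole sequence.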

SOP

ptls.frames.bert.SopModule is a lightning module.

ptls.frames.bert.SopDataset and ptls.frames.bert.SopIterableDataset are compatible datasets. They require a splitter from ptls.frames.coles.split_strategy.
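A minimal sketch of how an order prediction sample might be built from one sequence is shown below. It does not use ptls: the fixed cut in the middle stands in for a real splitter, and the helper name make_sop_sample and the label convention (1 for original order, 0 for swapped) are assumptions for this example.

```python
import random


def make_sop_sample(events, rng):
    """Split one sequence into two parts and return (first, second, label).

    label is 1 when the parts keep their original order and 0 when they are swapped.
    """
    cut = len(events) // 2  # stand-in for a configurable split strategy
    left, right = events[:cut], events[cut:]
    if rng.random() < 0.5:
        return left, right, 1
    return right, left, 0


rng = random.Random(0)
events = [1, 2, 3, 4, 5, 6]
samples = [make_sop_sample(events, rng) for _ in range(8)]
```

Both parts always come from the same sequence, so the model has to rely on ordering cues rather than on differences between users.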

NSP

ptls.frames.bert.NspModule is a lightning module.

ptls.frames.bert.NspDataset and ptls.frames.bert.NspIterableDataset are compatible datasets. They require a splitter from ptls.frames.coles.split_strategy.
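The pairing logic behind next sequence prediction can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the NspDataset code: make_nsp_pairs is a hypothetical helper, the split in the middle stands in for a real splitter, and negatives are built by pairing a left part with the right part of a different sequence.

```python
import random


def make_nsp_pairs(sequences, rng):
    """Return (left, right, label) triples; label is 1 for a true continuation."""
    halves = [(seq[: len(seq) // 2], seq[len(seq) // 2:]) for seq in sequences]
    pairs = []
    for i, (left, right) in enumerate(halves):
        # Positive pair: both parts come from the same sequence.
        pairs.append((left, right, 1))
        # Negative pair: the right part is borrowed from another sequence.
        j = rng.choice([k for k in range(len(halves)) if k != i])
        pairs.append((left, halves[j][1], 0))
    return pairs


sequences = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
pairs = make_nsp_pairs(sequences, random.Random(0))
```

The sketch needs at least two sequences, since every negative pair borrows its right part from a different one.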

Classes

See docstrings for classes.

  • ptls.frames.bert.MlmDataset
  • ptls.frames.bert.MlmIterableDataset
  • ptls.frames.bert.MlmIndexedDataset
  • ptls.frames.bert.RtdDataset
  • ptls.frames.bert.RtdIterableDataset
  • ptls.frames.bert.SopDataset
  • ptls.frames.bert.SopIterableDataset
  • ptls.frames.bert.NspDataset
  • ptls.frames.bert.NspIterableDataset

  • ptls.frames.bert.MLMPretrainModule
  • ptls.frames.bert.RtdModule
  • ptls.frames.bert.SopModule
  • ptls.frames.bert.NspModule