torchtext/models/roberta/bundler.py [236:247]:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    RoBERTa iterates on BERT's pretraining procedure, including training the model longer,
    with bigger batches over more data; removing the next sentence prediction objective;
    training on longer sequences; and dynamically changing the masking pattern applied
    to the training data.

    The RoBERTa model was pretrained on the combination of five datasets: BookCorpus,
    English Wikipedia, CC-News, OpenWebText, and STORIES. Together, these datasets
    contain over 160GB of text.

    Originally published by the authors of RoBERTa under MIT License
    and redistributed with the same license.
    [`License <https://github.com/pytorch/fairseq/blob/main/LICENSE>`__,
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
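
Since both excerpts come from the docstrings in torchtext/models/roberta/bundler.py, a short usage sketch of the bundle API may be useful alongside them. This is a minimal sketch, assuming the ``ROBERTA_BASE_ENCODER`` bundle and the ``get_model``/``transform``/``to_tensor`` helpers exposed by torchtext; the exact output shape noted in the comment is an assumption for the 768-dimensional base encoder.

    >>> import torch, torchtext
    >>> from torchtext.functional import to_tensor
    >>> # Grab the pretrained base encoder bundle the docstring above describes
    >>> roberta_base = torchtext.models.ROBERTA_BASE_ENCODER
    >>> model = roberta_base.get_model()
    >>> transform = roberta_base.transform()
    >>> # Tokenize, numericalize, and pad a small batch (RoBERTa uses pad index 1)
    >>> input_batch = ["Hello world", "How are you!"]
    >>> model_input = to_tensor(transform(input_batch), padding_value=1)
    >>> output = model(model_input)
    >>> output.shape  # roughly torch.Size([2, seq_len, 768]) for the base encoder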



torchtext/models/roberta/bundler.py [282:293]:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    RoBERTa iterates on BERT's pretraining procedure, including training the model longer,
    with bigger batches over more data; removing the next sentence prediction objective;
    training on longer sequences; and dynamically changing the masking pattern applied
    to the training data.

    The RoBERTa model was pretrained on the combination of five datasets: BookCorpus,
    English Wikipedia, CC-News, OpenWebText, and STORIES. Together, these datasets
    contain over 160GB of text.

    Originally published by the authors of RoBERTa under MIT License
    and redistributed with the same license.
    [`License <https://github.com/pytorch/fairseq/blob/main/LICENSE>`__,
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -



