Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61988987%3A17610%2F19%3AA20020EU" target="_blank" >RIV/61988987:17610/19:A20020EU - isvavai.cz</a>

  • Result on the web

    <a href="http://dx.doi.org/10.18653/v1/P19-1439" target="_blank" >http://dx.doi.org/10.18653/v1/P19-1439</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.18653/v1/P19-1439" target="_blank" >10.18653/v1/P19-1439</a>

Alternative languages

  • Result language

    English

  • Original language name

    Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling

  • Original language description

    Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling. We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling. Our primary results support the use of language modeling, especially when combined with pretraining on additional labeled-data tasks. However, our results are mixed across pretraining tasks and show some concerning trends: In ELMo's pretrain-then-freeze paradigm, random baselines are worryingly strong and results vary strikingly across target tasks. In addition, fine-tuning BERT on an intermediate task often negatively impacts downstream transfer. In a more positive trend, we see modest gains from multitask training, suggesting the development of more sophisticated multitask and transfer learning techniques as an avenue for further research.

  • Czech name

  • Czech description
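
The description above refers to ELMo's pretrain-then-freeze paradigm, in which a pretrained sentence encoder is kept fixed and only a small task-specific head is trained on the target task. The following is a minimal, self-contained sketch of that setup in plain PyTorch; the toy encoder, dimensions, random data, and training loop are illustrative assumptions only, not the paper's actual models or implementation.

    # Minimal sketch of the pretrain-then-freeze paradigm (illustrative only).
    # The encoder below is a toy stand-in for a pretrained model such as ELMo
    # or BERT; all sizes and data are arbitrary assumptions.
    import torch
    import torch.nn as nn

    VOCAB, DIM, CLASSES = 1000, 64, 2

    # Stand-in "pretrained" sentence encoder.
    encoder = nn.Sequential(
        nn.Embedding(VOCAB, DIM),
        nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
            num_layers=2,
        ),
    )

    # Freeze the encoder: its weights receive no gradient updates.
    for p in encoder.parameters():
        p.requires_grad = False

    # Only this lightweight classification head is trained on the target task.
    head = nn.Linear(DIM, CLASSES)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Toy target-task batch: random token ids and labels.
    tokens = torch.randint(0, VOCAB, (8, 16))   # (batch, sequence length)
    labels = torch.randint(0, CLASSES, (8,))

    for step in range(3):
        with torch.no_grad():                   # frozen encoder, no gradients
            sentence_repr = encoder(tokens).mean(dim=1)  # mean-pool tokens
        loss = loss_fn(head(sentence_repr), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"step {step}: loss {loss.item():.4f}")

The contrasting setup mentioned in the description, fine-tuning BERT on an intermediate task, would instead leave all encoder parameters trainable and update them together with the head, first on the intermediate task and then on the downstream target task.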

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

    S - Specific research at universities

Others

  • Publication year

    2019

  • Confidentiality

    S - Complete and true data about the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proceedings of the 57th Conference of the Association for Computational Linguistics

  • ISBN

    978-1-950737-48-2

  • ISSN

  • e-ISSN

  • Number of pages

    12

  • Pages from-to

    4465-4476

  • Publisher name

    Association for Computational Linguistics

  • Place of publication

    Florence

  • Event location

    Florence

  • Event date

    Jul 28, 2019

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article

    000493046106098