
ELECTRA

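ELECTRA uses a discriminative pre-training objective to get more learning out of the same amount of pre-training compute. It replaces the masked language modeling (MLM) objective with replaced token detection: a small generator swaps some input tokens for plausible alternatives, and the main model is trained to predict, for every token, whether it is the original or a replacement. Because the loss covers all input positions rather than only the masked subset, the task is more sample-efficient.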
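To make the discriminative task concrete, here is a minimal sketch of how one training example could be built for it. The task is usually called replaced token detection: some tokens are swapped out, and the model labels every position as original (0) or replaced (1). The function name `make_rtd_example`, its parameters, and the toy vocabulary are hypothetical, and uniform random sampling is only a stand-in for the small generator network that ELECTRA trains to propose replacements.

```python
import random


def make_rtd_example(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one replaced-token-detection example (illustrative sketch only).

    Returns (corrupted_tokens, labels, positions), where labels[i] is 1 if
    the token at position i differs from the original and 0 otherwise.
    """
    rng = rng or random.Random(0)
    # Choose how many positions to corrupt (at least one, if there are any tokens).
    n_replace = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    positions = rng.sample(range(len(tokens)), n_replace)

    corrupted = list(tokens)
    for i in positions:
        # Assumption: uniform sampling stands in for a learned generator.
        corrupted[i] = rng.choice(vocab)

    # If the sampled token happens to equal the original, the position
    # counts as "original", so labels compare tokens, not positions.
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels, sorted(positions)


if __name__ == "__main__":
    vocab = ["the", "chef", "cooked", "ate", "a", "meal", "dog", "ran"]
    tokens = ["the", "chef", "cooked", "the", "meal"]
    corrupted, labels, positions = make_rtd_example(tokens, vocab, rng=random.Random(1))
    print(corrupted)
    print(labels)
```

Note that every position receives a label, so the discriminator gets a training signal from the whole sequence, whereas an MLM loss would only be computed at the selected positions.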