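ELECTRA uses a discriminative training objective to get more learning out of the same amount of pre-training compute. It replaces the masked language model objective with replaced token detection: a small generator swaps some input tokens for plausible alternatives, and the main model is trained to predict, for every token, whether it is original or replaced. This is more sample-efficient because the loss is computed over all input tokens rather than only the masked subset. 27.07.2023 17:54 aior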
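To make the objective concrete, here is a minimal sketch of how per-token targets and a discriminator loss could be computed for a replaced-token task. It is an illustration under stated assumptions, not ELECTRA's actual implementation: the function names `rtd_labels` and `rtd_loss`, the example sentence, and the uniform 0.5 predictions are all hypothetical, and the real model works on token IDs and network outputs rather than plain lists.

```python
import math

def rtd_labels(original, corrupted):
    """Return 1 where the corrupted token differs from the original, else 0."""
    if len(original) != len(corrupted):
        raise ValueError("sequences must have equal length")
    return [int(o != c) for o, c in zip(original, corrupted)]

def rtd_loss(probs_replaced, labels):
    """Mean binary cross-entropy over every position in the sequence."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs_replaced, labels):
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(labels)

# Hypothetical example: a generator has swapped one token.
original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]

labels = rtd_labels(original, corrupted)

# Assumed discriminator output: probability that each token was replaced.
uninformed = [0.5] * len(labels)
loss = rtd_loss(uninformed, labels)
```

Every position contributes to the loss here, including the four unchanged tokens, which is the property that separates this kind of objective from one that scores only the masked positions.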