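GShard is a model-parallelism module introduced by Google in 2020, consisting of a set of lightweight sharding annotation APIs and an extension to the XLA compiler. It enables training very large Transformer models, with hundreds of billions of parameters (the original paper demonstrates a 600-billion-parameter sparse Mixture-of-Experts model), by automatically partitioning the computation across many accelerator devices. 27.07.2023 17:54 aior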