A technique where certain parts of the network use lower precision (like float16) to speed up computation and reduce memory usage while maintaining model quality. 27.07.2023 17:54 aior