A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT2, decoders, etc.) on CPU and GPU.
nlp gpu decoder machine-translation inference pytorch transformer albert bert roberta gpt2 huggingface-transformers
Updated Jul 18, 2025 - C++