ORTModule + OpenAI Triton Integration now available.
Stability/usability improvements for WebGPU.
WebNN ops coverage improvements (SAM, Stable Diffusion).
WebGPU ops coverage improvements (SAM, T5, Whisper).
Swift Package Manager support for ONNX Runtime inference and ONNX Runtime extensions via onnxruntime-swift-package-manager.
Mobile support for CLIPImageProcessor pre-processing and CLIP scenario.
Improve React Native performance with JSI.
Ops support: Equal, Less, LessOrEqual, Greater, GreaterOrEqual, LayerNorm, Asin, Sign, DepthToSpace, SpaceToDepth.
Support for resize with asymmetric transformation mode on HTP backend.
Enable context binary cache to reduce initialization time.
Support user-provided CUDA compute stream.
Allow CUDA allocator to be registered with ONNX Runtime externally.
Relax CUDA Graph constraints to allow more models to utilize it.
Initial fp8 support (QDQ, Cast, MatMul).
Add FlashAttention v2 support for Attention, MultiHeadAttention and PackedMultiHeadAttention ops.
Optimize BeamScore to improve BeamSearch performance.
Improve LLM quantization accuracy with SmoothQuant.
Make Float16_t and BFloat16_t full-featured fp16 interfaces that support conversion and expose floating-point properties (e.g. …).
Make Float16 and BFloat16 full-featured fp16 interfaces that support conversion and expose floating-point properties (e.g. …).
Expose OrtValue API as the new preferred API to run inference in C#. This reduces garbage and exposes direct native memory access via Slice-like interfaces.
Support for external initializers so that large models can be instantiated without filesystem access.
Support for fp16 and bf16 tensors as inputs and outputs, along with utilities to convert between these and fp32 data. On JDK 20 and newer the fp16 conversion methods use the JDK's Float.float16ToFloat and Float.floatToFloat16 methods, which can be hardware accelerated and vectorized on some platforms.
New session option to disable the default CPU EP fallback: session.disable_cpu_ep_fallback.
Support for serialization of models >=2GB.
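
Several of the items above mention converting between fp16, bf16 and fp32 data. To make concrete what such a conversion involves at the bit level, here is a minimal sketch in Python using only the standard library. It is not ONNX Runtime code, and the function names (fp32_to_fp16_bits, fp16_bits_to_fp32, fp32_to_bf16_bits, bf16_bits_to_fp32) are hypothetical names chosen for this illustration. It assumes IEEE 754 binary16 for fp16 and, for bf16, the upper 16 bits of a binary32 value with round-to-nearest-even.

```python
import struct


def fp32_to_fp16_bits(value: float) -> int:
    """Return the raw 16-bit IEEE 754 binary16 pattern for value.

    struct's "e" format does the rounding; it raises OverflowError
    when value is too large to be represented in half precision.
    """
    return struct.unpack("<H", struct.pack("<e", value))[0]


def fp16_bits_to_fp32(bits: int) -> float:
    """Decode a raw binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def fp32_to_bf16_bits(value: float) -> int:
    """Return the raw 16-bit bfloat16 pattern for value.

    bfloat16 keeps the sign, the 8 exponent bits and the top 7
    mantissa bits of a float32, so the conversion is a rounded
    truncation of the 32-bit pattern (round-to-nearest-even here).
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Force a mantissa bit on so the result stays a NaN after truncation.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(bits: int) -> float:
    """Decode a raw bfloat16 bit pattern by padding the low 16 bits with zeros."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for x in (1.0, -2.0, 0.1, 3.14159):
        h = fp32_to_fp16_bits(x)
        b = fp32_to_bf16_bits(x)
        print(f"{x!r}: fp16=0x{h:04X} -> {fp16_bits_to_fp32(h)!r}, "
              f"bf16=0x{b:04X} -> {bf16_bits_to_fp32(b)!r}")
```

The two formats trade precision for range differently: fp16 keeps 10 mantissa bits but only 5 exponent bits, while bf16 keeps the full float32 exponent and only 7 mantissa bits, which is why the bf16 path never overflows for finite float32 inputs that are not near the maximum.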