
NVIDIA TensorRT 6 Breaks 10 millisecond barrier for BERT-Large

by
staff
from High-Performance Computing News Analysis | insideHPC on (#4QJ23)

Today, NVIDIA released TensorRT 6, which includes new capabilities that dramatically accelerate conversational AI applications, speech recognition, and 3D image segmentation for medical applications, as well as image-based applications in industrial automation. TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for AI applications. "With today's release, TensorRT continues to expand its set of optimized layers, provides highly requested capabilities for conversational AI applications, and delivers tighter integrations with frameworks to provide an easy path to deploy your applications on NVIDIA GPUs. In TensorRT 6, we're also releasing new optimizations that deliver inference for BERT-Large in only 5.8 ms on T4 GPUs, making it practical for enterprises to deploy this model in production for the first time."
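The 5.8 ms figure is a per-inference latency measurement against the article's "10 millisecond barrier." As a minimal sketch of how such a latency budget can be checked, the snippet below times repeated calls to an inference function and reports mean and p99 latency; `run_inference` is a stand-in workload, since real TensorRT BERT-Large inference requires a GPU and the TensorRT runtime:

```python
import time
import statistics

LATENCY_BUDGET_MS = 10.0  # the "10 millisecond barrier" from the article

def run_inference(batch):
    # Hypothetical stand-in for executing a real TensorRT engine;
    # in practice this would be an engine's execute call on the GPU.
    return [x * 2 for x in batch]

def measure_latency_ms(fn, batch, warmup=10, iters=100):
    """Time repeated calls to fn(batch); return per-call latencies in ms."""
    for _ in range(warmup):  # warm-up runs are excluded from timing
        fn(batch)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(batch)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

samples = measure_latency_ms(run_inference, list(range(256)))
mean_ms = statistics.mean(samples)
p99_ms = sorted(samples)[int(0.99 * len(samples)) - 1]
print(f"mean {mean_ms:.3f} ms, p99 {p99_ms:.3f} ms, "
      f"within budget: {p99_ms < LATENCY_BUDGET_MS}")
```

Reporting a high percentile (p99) rather than only the mean is the usual convention for production latency targets, since tail latency is what users actually experience.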

The post NVIDIA TensorRT 6 Breaks 10 millisecond barrier for BERT-Large appeared first on insideHPC.

External Content
Source RSS or Atom Feed
Feed Location http://insidehpc.com/feed/
Feed Title High-Performance Computing News Analysis | insideHPC
Feed Link https://insidehpc.com/