
TensorRT

Auto-generated from NVIDIA/TensorRT by Mutable.ai Auto Wiki

TensorRT (GitHub Repository)

  • Developer: NVIDIA
  • Written in: C++
  • Stars: 8.4k
  • Watchers: 141
  • Created: 2019-05-02
  • Last updated: 2024-01-06
  • License: Apache License 2.0
  • Homepage: developer.nvidia.com/tensorrt
  • Repository: NVIDIA/TensorRT

Auto Wiki

  • Generated at: 2024-01-07
  • Generated from: commit a1820e
  • Version: 0.0.4

TensorRT is an SDK for high-performance deep learning inference optimization and runtime for NVIDIA GPUs. It allows developers to optimize neural network models for fast, low-latency execution on NVIDIA hardware.

The key functionality provided by TensorRT includes:

  • The TensorRT Core Functionality implements the core C++ API for building and running optimized models on NVIDIA GPUs. This includes model loading, graph optimization, execution, and plugins. Tools for debugging and analysis are also provided.

  • TensorRT Plugins allow developers to integrate custom layers, operation plugins, and optimizations into TensorRT. This provides flexibility to optimize a wide range of models. Plugins implement functionality like ROI pooling, sparse tensor operations, normalization layers, etc.

  • TensorRT Python Support exposes the C++ API to Python, enabling pip installation and Python imports. Python Bindings use PyBind11 to bind the C++ API. Python Packaging handles building wheels and Loading TensorRT Libraries.

  • TensorRT Code Samples demonstrate C++ and Python workflows for tasks like model loading, deployment, calibration and leveraging different model types with TensorRT.

  • TensorRT Model Parsers allow importing models from frameworks like Caffe, ONNX, and UFF into TensorRT engines for execution.

  • TensorRT Docker Support provides Dockerfiles and scripts for setting up development environments.

  • TensorRT Demo Applications showcase end-to-end examples for models like ResNet, BERT, Tacotron-2, optimized with TensorRT.

The key design choice is providing a high-performance C++ API for inference on NVIDIA GPUs, while also enabling Python usage and integration of custom plugins and layers. The tools, samples and parsers simplify workflows.

TensorRT Core Functionality

References: TensorRT, tools

The core TensorRT C++ API provides the main functionality for building and optimizing deep learning models to run efficiently on NVIDIA GPUs. The API is defined in header files under the include directory. This includes interfaces for key components like the network definition and execution.

The main interfaces defined are:

  • INetworkDefinition represents the network being built and allows adding operators and tensors through its methods. This is the primary interface for constructing networks programmatically.

  • IBuilder handles building engines from networks. Its build methods compile the network into an optimized engine.

  • ICudaEngine represents the optimized engine artifact that can be saved, loaded, and run on the GPU.

  • IExecutionContext manages launching and running the engine. Its enqueue methods execute inference asynchronously on a CUDA stream.

Plugins extend TensorRT through the IPluginV2 family of interfaces.

Network building involves:

  1. Creating a network definition
  2. Adding operators like convolution and pooling
  3. Configuring the build (precision, workspace, optimization profiles) through a builder configuration
  4. Building an engine from the network

Engines can then be run by:

  1. Creating an execution context from the engine
  2. Allocating buffers and setting bindings
  3. Enqueuing execution asynchronously on a CUDA stream (see the sketch below)
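
A minimal sketch of this build-then-run flow using the TensorRT Python API (which mirrors the C++ interfaces above), with NumPy and PyCUDA standing in for buffer management; the layer choices, shapes, and weights are illustrative only:

    import numpy as np
    import pycuda.autoinit                      # creates a CUDA context
    import pycuda.driver as cuda
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)

    # 1-3. Create a network definition, add operators, and configure the build.
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    inp = network.add_input("input", trt.float32, (1, 3, 224, 224))
    kernel = np.ones((8, 3, 3, 3), dtype=np.float32)
    conv = network.add_convolution_nd(inp, 8, (3, 3), kernel, trt.Weights())
    pool = network.add_pooling_nd(conv.get_output(0), trt.PoolingType.MAX, (2, 2))
    pool.stride_nd = (2, 2)
    network.mark_output(pool.get_output(0))
    config = builder.create_builder_config()

    # 4. Build an engine from the network.
    plan = builder.build_serialized_network(network, config)
    engine = trt.Runtime(logger).deserialize_cuda_engine(plan)

    # Run: create an execution context, bind device buffers, enqueue asynchronously.
    context = engine.create_execution_context()
    h_in = np.random.rand(1, 3, 224, 224).astype(np.float32)
    h_out = np.empty(tuple(engine.get_binding_shape(1)), dtype=np.float32)
    d_in, d_out = cuda.mem_alloc(h_in.nbytes), cuda.mem_alloc(h_out.nbytes)
    stream = cuda.Stream()
    cuda.memcpy_htod_async(d_in, h_in, stream)
    context.execute_async_v2([int(d_in), int(d_out)], stream.handle)
    cuda.memcpy_dtoh_async(h_out, d_out, stream)
    stream.synchronize()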

The tools directory contains utilities like Polygraphy for tasks like model conversion and debugging issues. The scripts folder has utilities for copyright management and stub library generation.

TensorRT C++ API

References: include

The TensorRT C++ API is defined in header files under the include directory. The key files that define the API include …/NvInferImpl.h and …/NvInferLegacyDims.h.

…/NvInferLegacyDims.h defines fixed-rank dimension classes such as Dims2, Dims3, and Dims4 for representing tensor shapes of different ranks. These classes provide a type-safe way to work with tensors whose rank is known at compile time.

TensorRT Tools

References: tools

The …/trt-engine-explorer directory provides a collection of tools for analyzing TensorRT engine artifacts and performance. It contains utilities for tasks like model analysis, conversion, and debugging.

The core functionality is contained within importable Python modules. The …/parser.py module contains the parsing functionality for reading engine metadata files and cleaning the raw data. The …/df_preprocessing.py module preprocesses the raw layer data into a DataFrame for downstream use.

The toolkit provides various "views" of the engine data through reusable modules. The …/graphing.py module generates visualizations of the engine topology as a graph. The …/plotting.py module contains functions for creating different types of plots from the layer properties in the DataFrame. Additional reporting functionality is provided in …/excel_summary.py. Linting of layers is supported by classes in …/lint.py.

Utilities for tasks like building engines, parsing logs, and configuring GPUs are located in …/utils. Comprehensive tests are in …/tests. Documentation and resources are located in files like …/RESOURCES.md and …/README.md.

TensorRT Scripts

References: scripts

This section covers the script utilities provided under scripts. These scripts handle tasks related to managing copyright headers and generating libraries.

The main scripts are …/copyright-scan.py and …/stubify.sh.

…/copyright-scan.py allows recursively scanning files in a directory and its subdirectories to check and update copyright headers in files. It defines patterns to match different file types and standardized headers. Command line arguments provide configuration options for the root directory and dry run mode. Common directories are explicitly excluded from scanning.
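
The sketch below illustrates the kind of recursive scan such a script performs; the header pattern, scanned extensions, and excluded directories are illustrative placeholders, not the script's actual configuration:

    import argparse
    import re
    from pathlib import Path

    HEADER_RE = re.compile(r"Copyright \(c\) \d{4}.*NVIDIA", re.IGNORECASE)
    SCAN_EXTS = {".py", ".cpp", ".h", ".cu", ".sh"}       # illustrative
    EXCLUDE_DIRS = {".git", "build", "third_party"}       # illustrative

    def scan(root: Path, dry_run: bool) -> None:
        for path in root.rglob("*"):
            if any(part in EXCLUDE_DIRS for part in path.parts):
                continue
            if not (path.is_file() and path.suffix in SCAN_EXTS):
                continue
            head = path.read_text(errors="ignore")[:1024]
            if not HEADER_RE.search(head):
                action = "would update" if dry_run else "missing header in"
                print(f"{action}: {path}")

    if __name__ == "__main__":
        ap = argparse.ArgumentParser()
        ap.add_argument("--root", type=Path, default=Path("."))
        ap.add_argument("--dry-run", action="store_true")
        args = ap.parse_args()
        scan(args.root, args.dry_run)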

…/stubify.sh generates "stub" shared libraries that can be used for testing without dependencies on the real implementation. It takes the path to an input library as the first argument and the output stub library path as the second argument. The script uses nm to extract all strong symbols from the input library. It then prints empty function definitions for each symbol, which are compiled into the output stub library using the CC compiler. The soname of the output stub library is set to match the input library.

TensorRT Python Support

References: python

The python directory contains code that provides Python bindings and packaging to access the TensorRT C++ API from Python. This allows tasks like model parsing, optimization, and inference to be performed directly from Python code.

The main functionality is organized into several key subdirectories:

  • …/src implements the core Python bindings that expose the full TensorRT C++ API to Python. This includes bindings for important parts like parsers, inference, and utilities.

  • …/docstrings contains docstrings that define the documentation for the TensorRT Python API. This includes docstrings for core types, classes, functions for tasks like calibration and plugins.

  • …/include defines the Python bindings for TensorRT by containing header files that allow TensorRT C++ APIs and classes to be accessed from Python.

  • …/packaging contains all code necessary to package the TensorRT Python API into distributable Python packages. This allows installation via pip and importing in Python.

The main business logic implemented is:

  • Defining the Python bindings in …/__init__.py

  • Automatically loading any native TensorRT libraries packaged with a wheel via …/__init__.py

  • Implementing common conversions for interfacing frameworks in …/utils.cpp

  • Packaging the module, docs, and libraries into a distributable wheel format

This allows the full TensorRT functionality to be accessed directly from Python via its standard mechanisms like imports and pip. The bindings, packaging, and documentation work together to provide a seamless Python interface.
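
For example, once the wheel is installed with pip, the bindings can be used like any other Python module (a minimal sanity check, not tied to any particular sample):

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)                      # wraps the C++ IBuilder
    print("TensorRT version:", trt.__version__)
    print("Platform has fast FP16:", builder.platform_has_fast_fp16)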

TensorRT Quickstart Guides

References: quickstart

The quickstart directory contains introductory examples and tutorials for using TensorRT to optimize deep learning models for inference. It includes important subdirectories that provide sample code demonstrating common workflows:

  • The …/IntroNotebooks directory contains Jupyter notebooks and Python files that load models and make predictions with optimized models.

  • The …/SemanticSegmentation directory shows examples for performing semantic segmentation with TensorRT.

  • The …/common directory contains common utilities like functions defined in …/logger.h that are reused across samples.

  • The …/deploy_to_triton directory demonstrates deploying a model optimized with TensorRT to the Triton inference server. This includes code in …/triton_client.py for using a Triton client to run inference on the server.

The key functionality demonstrated in these examples includes model loading, conversion between formats like ONNX, building inference applications, optimization workflows, and deploying models to production services. The README and code samples provide tutorials that introduce TensorRT concepts through concrete examples and exercises.

TensorRT Docker Support

References: docker

The Dockerfiles and scripts in docker provide a standardized way to set up TensorRT development environments using Docker containers. The Dockerfiles define images for common operating systems like Ubuntu and CentOS that install TensorRT, its dependencies, and other development tools. Specific Dockerfiles target different OS versions and architectures.

The …/build.sh script builds images from the Dockerfiles. It parses its command line arguments, constructs the corresponding docker build command string, and executes it.

The …/launch.sh script launches containers from the built images, mounting the local source directory and configuring runtime options such as GPU visibility. It constructs the docker run command string from its arguments and executes it.

Key Dockerfiles target Ubuntu 18.04, Ubuntu 20.04, CentOS 7, and aarch64 cross-compilation; they are described in the TensorRT Docker Support section below.

TensorRT Plugins

References: plugin

The plugin directory contains implementations of custom layers and optimizations that can be integrated into TensorRT as plugins. Plugins allow techniques such as non-maximum suppression, fused multi-head attention, custom normalization layers, and sparse tensor operations to be optimized and run as part of the TensorRT inference pipeline.

Plugins are implemented as C++ classes that inherit from interfaces defined in files like …/plugin.h to integrate with TensorRT. Creator classes defined in files like …/batchedNMSPlugin.cpp handle plugin instantiation and lifecycle functions.
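
As an illustration, a registered plugin can be located from Python through the global plugin registry; the plugin name "BatchedNMS_TRT" and version "1" below are assumed registration values for the batched NMS plugin:

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(logger, "")            # registers the built-in plugins

    registry = trt.get_plugin_registry()
    creator = registry.get_plugin_creator("BatchedNMS_TRT", "1")
    if creator is not None:
        # Each creator advertises the fields its createPlugin() call expects.
        print("plugin fields:", [f.name for f in creator.field_names])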

Common Plugins

References: plugin/common

The core utilities and base classes reused across plugins are contained in the …/common directory. This directory provides important interfaces and functionality through files like …/plugin.h.

…/plugin.h contains macros and utilities for tasks commonly needed by plugins like error handling and serialization.

The …/kernels subdirectory holds optimized CUDA kernel implementations for common deep learning operations through functions in files like …/kernel.h. These kernels handle the data-parallel aspects of algorithms.

Utilities exist for common tasks across plugin types, through files like …/cudaDriverWrapper.h and …/reducedMathPlugin.cpp.

Overall, these common plugin components provide a foundation that plugin developers can leverage to build custom layers for TensorRT more modularly and consistently.

Attention and Transformer Plugins

References: plugin/bertQKVToContextPlugin, plugin/skipLayerNormPlugin

The plugins in this section implement attention and transformer models like BERT. Key plugins handle the multi-head attention computation, which is a core component of transformer architectures.

The …/bertQKVToContextPlugin directory contains plugins for efficiently computing multi-head attention. The …/fused_multihead_attention subdirectory provides fused GPU kernel implementations of the multi-head attention computation, and …/fused_multihead_attention_v2 provides an updated set of fused GPU kernels.

The header …/fused_multihead_attention_v2.h defines the structures that hold the parameters and buffers needed to run multi-head attention.

The …/zeroPadding2d.h file defines a class for padding inputs that handles padding transparently on the GPU to prepare inputs for the attention kernels.

The …/skipLayerNormPlugin implements skip layer normalization commonly used in transformer models like BERT.

Computer Vision Plugins

References: plugin/batchedNMSPlugin, plugin/cropAndResizePlugin, plugin/detectionLayerPlugin

The computer vision plugins implement common computer vision tasks like non-maximum suppression and bounding box processing. The …/batchedNMSPlugin directory contains a plugin that implements non-maximum suppression (NMS) for object detection models. NMS is used to suppress overlapping bounding boxes.

The main classes are defined in …/batchedNMSPlugin.h and implemented in …/batchedNMSPlugin.cpp.

The …/cropAndResizePlugin directory contains a plugin that implements a crop and resize layer. The main classes are defined in …/cropAndResizePlugin.h and implemented in …/cropAndResizePlugin.cpp.

The …/detectionLayerPlugin directory contains a plugin for object detection post-processing. The class is defined in …/detectionLayerPlugin.h and implemented in …/detectionLayerPlugin.cpp

Normalization Plugins

References: plugin/instanceNormalizationPlugin, plugin/groupNormalizationPlugin

The …/instanceNormalizationPlugin and …/groupNormalizationPlugin directories contain plugins that implement instance normalization and group normalization layers respectively. Instance normalization normalizes the input by calculating the mean and variance of each channel across a batch of examples, while group normalization divides channels into groups and normalizes within each group.

The core instance normalization computation is performed by functions defined in …/instanceNormFwd.h. This includes functions for calculating buffer sizes and executing the instance norm kernel on tensors. Similarly, the main computation for group normalization is performed by a function referenced in …/GroupNormalizationPlugin_PluginReference.py. This applies group normalization using NumPy functions.

Both plugins utilize common normalization utilities defined in files like …/instanceNormCommon.h. These include functions for loading data, arithmetic operations, activations, parallel reductions, and type conversions.
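
A NumPy reference for the group normalization computation described above, in the spirit of the plugin reference script (the function signature and defaults here are illustrative):

    import numpy as np

    def group_norm(x, gamma, beta, num_groups, eps=1e-5):
        """x: (N, C, H, W); gamma, beta: (C,)."""
        n, c, h, w = x.shape
        g = x.reshape(n, num_groups, c // num_groups, h, w)
        mean = g.mean(axis=(2, 3, 4), keepdims=True)
        var = g.var(axis=(2, 3, 4), keepdims=True)
        g = (g - mean) / np.sqrt(var + eps)
        out = g.reshape(n, c, h, w)
        return out * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)

    # Instance normalization is the special case where every channel is its
    # own group, i.e. num_groups == C.
    x = np.random.rand(2, 8, 4, 4).astype(np.float32)
    y = group_norm(x, np.ones(8, np.float32), np.zeros(8, np.float32), num_groups=4)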

Sparse Operation Plugins

References: plugin/scatterPlugin

This section covers plugins implemented in TensorRT for sparse tensor operations. The …/scatterPlugin directory contains an implementation of a scatter plugin. The file …/scatterPlugin.cpp contains the main class which implements the core scatter logic.

The plugin class handles the scatter computation in its enqueue() method, which takes the input and output tensors and launches the scatter operation on the GPU. It also implements plugin lifecycle functions such as attaching to CUDA contexts and serialization. This allows the scatter operation to be optimized and run efficiently as part of the TensorRT engine.

The accompanying creator class is responsible for plugin instantiation. Its createPlugin() method constructs a new plugin instance given metadata such as the plugin name and fields.

The file …/CMakeLists.txt contains CMake configuration needed to build the scatter plugin. It finds the C++ and CUDA source files in the directory and propagates these variables to parent CMake scripts, allowing the source files to be compiled.

In summary, these files implement a TensorRT plugin that allows scattering of tensor data to run efficiently inside inference graphs: the plugin class handles the core scatter computation and lifecycle functions, while the creator class manages plugin instantiation.
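
For reference, a NumPy sketch of a ScatterND-style update, which is the kind of scatter the plugin performs on the GPU (the exact semantics supported by the plugin may differ):

    import numpy as np

    def scatter_nd(data, indices, updates):
        """Copy `data`, then write each update at the location its index row selects."""
        out = data.copy()
        flat_idx = indices.reshape(-1, indices.shape[-1])
        flat_upd = updates.reshape(-1, *updates.shape[indices.ndim - 1:])
        for idx, upd in zip(flat_idx, flat_upd):
            out[tuple(idx)] = upd
        return out

    data = np.zeros((4, 4), dtype=np.float32)
    indices = np.array([[0, 1], [2, 3]])          # each row indexes one element
    updates = np.array([5.0, 7.0], dtype=np.float32)
    print(scatter_nd(data, indices, updates))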

Utility Operation Plugins

References: plugin/batchTilePlugin, plugin/clipPlugin

This section covers plugins that implement utility operations for neural networks like tiling. The …/batchTilePlugin directory contains a plugin that tiles the input tensor across the batch dimension, effectively replicating the input for each batch entry.

The core logic lives in a class defined in …/batchTilePlugin.h. This class takes two input tensors, with the second providing the template shape to tile across the batch axis of the first input. The tiling operation loops through the batch size, copying the second input tensor, and concatenating the copies along the batch dimension to construct the tiled output tensor.

A class in …/batchTilePlugin.h handles plugin instantiation.

No parameters are needed for the tiling since the input shapes already provide the necessary information. The CMake configuration in …/CMakeLists.txt collects the source files for building the plugin.

The …/clipPlugin directory implements a clipping plugin that limits output values to a specified min-max range. The core clipping operation is performed by a function defined in …/clip.h, which takes the input data and clips values in-place.

The plugin class in …/clipPlugin.h applies the clipping by looping through the input tensor values and clamping each one; it is constructed with the min and max clip values.
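
NumPy reference sketches for the two utility operations described above (shapes and values are illustrative):

    import numpy as np

    def batch_tile(batch_input, template):
        """Replicate `template` once per entry in `batch_input`'s batch dimension."""
        n = batch_input.shape[0]
        return np.concatenate([template[np.newaxis, ...]] * n, axis=0)

    def clip(x, clip_min, clip_max):
        """Limit values to the [clip_min, clip_max] range, as the clip plugin does."""
        return np.clip(x, clip_min, clip_max)

    a = np.zeros((4, 3, 8, 8), dtype=np.float32)       # provides the batch size
    b = np.random.rand(3, 8, 8).astype(np.float32)     # template tiled across the batch
    print(batch_tile(a, b).shape)                      # -> (4, 3, 8, 8)
    print(clip(np.array([-2.0, 0.5, 9.0]), 0.0, 1.0))  # -> [0.  0.5 1. ]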

TensorRT Python Support

References: python

The TensorRT Python support code provides a seamless interface to access the high-performance TensorRT C++ API directly from Python. It handles tasks like defining Python bindings for TensorRT types and functions, packaging the module for distribution via pip, and automatically loading native TensorRT libraries.

The core functionality is implemented in directories like …/src. Here, files define the main TensorRT Python module and imports submodules for key parts of the API like inference and parsers. Files like …/utils.cpp provide common conversions between data types.

The bindings are defined using PyBind11 to expose the C++ API to Python: Python classes wrap the corresponding TensorRT C++ classes, and Python functions wrap the C++ free functions.

Packaging is handled in directories like …/packaging. Files in subdirectories like …/bindings_wheel and …/libs_wheel configure building Python wheels for installation via pip.

Documentation is generated from docstrings in …/docstrings. The strings define namespaces and describe classes and functions. This documentation is then exposed from the Python module.

Overall, the Python support code leverages modern Python tools to deliver a fully-featured interface to TensorRT's low-level performance. Developers can integrate deep learning models into applications while retaining high efficiency.

Python Bindings

References: python/src, python/include

The …/src directory implements the Python bindings for the TensorRT C++ API using PyBind11. The main implementation file is …/pyTensorRT.cpp, which defines the core TensorRT Python module and handles importing submodules that bind different parts of the API.

The …/parsers subdirectory binds parsers for popular model formats. For example, …/pyCaffe.cpp defines bindings for the Caffe parser class. This allows loading Caffe models directly from Python.

…/infer contains bindings for core inference functionality.

Utilities in …/utils.cpp handle common conversions between data types when interfacing frameworks.

…/ForwardDeclarations.h provides forward declarations of classes to bind, while …/utils.h contains helper functions for the bindings.

Python Packaging

References: python/packaging

The Python packaging directories handle distributing the TensorRT Python bindings as distributable Python wheels. This allows the bindings to be installed via pip and used from Python code.

The …/bindings_wheel directory contains the code to build a wheel package for the bindings. The key file is …/__init__.py which defines the bindings that expose the TensorRT C++ API to Python.

The …/libs_wheel directory handles packaging the TensorRT native libraries into wheels. The …/__init__.py file automatically loads any library files packaged with the wheel to make them available to Python code without additional load logic.

The …/frontend_sdist directory packages the API for distribution via pip. The main component is the tensorrt module in …/tensorrt which implements the wrapping of the C++ API.

The key business logic is defining the bindings in …/__init__.py and automatically loading libraries via …/__init__.py. This allows distributing TensorRT via standard Python mechanisms while providing access to the low-level C++ capabilities.

The …/setup.py file defines metadata and builds the wheel package for the bindings. It specifies package data files and ensures compatibility.

The …/setup.py configures building a wheel for the native libraries. It determines dependencies and packages the shared library files.

Python Documentation

References: python/docstrings

The …/docstrings directory contains documentation strings (docstrings) that define the Python API for TensorRT. These docstrings are exposed via Python's documentation generation tools to provide documentation to users of the TensorRT Python package.

The file …/pyPluginDoc.h contains docstrings for the plugin-related classes and functions.

The …/pyOnnxDoc.h file documents the ONNX parser bindings.

Calibration is documented in …/pyInt8Doc.h, which contains docstrings describing the INT8 calibrator classes and their methods.

These docstring headers define documentation only and contain no implementation code; they describe the functionality implemented in the binding source files.

Loading TensorRT Libraries

References: python/packaging/libs_wheel/tensorrt_libs

The __init__.py file in the …/tensorrt_libs directory automatically loads any TensorRT native library files that have been packaged with the Python wheel. This makes the native library functionality available for import and use in other modules without needing additional load code.

The file loops through the files in the directory, passing each one to a function to attempt loading it as a dynamic library. Any failures to load are ignored using a try/except block.

By loading all packaged libraries during import of the module, their functionality is exposed and can be directly used by any code that imports without requiring further logic. Loading is done upfront to avoid needing loading code elsewhere.
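
A hedged sketch of this eager loading pattern; the directory handling and filename glob are illustrative rather than the module's exact code:

    import ctypes
    import glob
    import os

    _HERE = os.path.dirname(os.path.abspath(__file__))

    for _lib in sorted(glob.glob(os.path.join(_HERE, "*.so*"))):
        try:
            # RTLD_GLOBAL makes the symbols visible to the bindings loaded later.
            ctypes.CDLL(_lib, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            # Libraries that fail to load (e.g. wrong platform) are skipped.
            pass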

TensorRT Code Samples

References: samples, quickstart

The TensorRT code samples demonstrate end-to-end workflows for tasks like model loading, calibration, deployment, and leveraging different model types. Code is organized into the following areas:

  • The samples directory contains C++ and Python samples that show usage of various TensorRT APIs. This includes algorithm selection, building RNNs, dynamic shapes, INT8 precision, custom plugins, and I/O formats.

  • The quickstart directory provides introductory Jupyter notebooks and examples for common deep learning tasks like semantic segmentation, image classification, and model deployment to Triton.

Some key implementation details:

  • The …/common directory contains reusable utilities like data loading, engine building, inference execution, and performance reporting used across samples.

  • Notebooks in …/IntroNotebooks provide tutorials on common deep learning workflows optimized using TensorRT.

TensorRT C++ Sample Code

References: samples, samples/common, samples/sampleINT8API, samples/sampleCharRNN

The TensorRT C++ sample code provides examples that demonstrate common workflows for building and running inference using the TensorRT C++ API. Key code samples are located in the samples and …/common directories.

The …/common directory contains utilities that are reused across multiple samples. This includes functions for tasks like data loading and running inference. The utilities handle common boilerplate and allow the samples to focus on demonstrating TensorRT capabilities.

Many samples are implemented as C++ classes that encapsulate the end-to-end workflow. For example, classes in …/sampleCharRNN.cpp handle loading models, constructing networks using the API, building engines, and executing inference in a loop.

Samples cover common C++ workflows like model loading and engine building. The utilities and class-based approach promote code reuse and isolate complex logic. The detailed samples provide many examples of integrating TensorRT into full applications.

TensorRT Python Sample Code

References: samples/python, samples/python/efficientnet, samples/python/introductory_parser_samples

The TensorRT samples Python directory contains numerous examples that demonstrate common workflows using the TensorRT Python API for tasks like model loading, preprocessing, inference, evaluation, and deployment. Many of these samples are organized into subdirectories based on the type of model or operation they showcase.

The …/introductory_parser_samples directory contains introductory examples of basic usage. The scripts in this directory show how to build a TensorRT engine from an ONNX model and run inference on the engine to classify images.
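
A minimal sketch of the parse-then-build flow these introductory samples follow, using the TensorRT Python API; the model path and file names are placeholders:

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse ONNX model")

    config = builder.create_builder_config()
    plan = builder.build_serialized_network(network, config)
    with open("model.engine", "wb") as f:
        f.write(plan)                   # serialized engine, reloadable at runtime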

The …/tensorflow_object_detection_api directory contains examples for object detection models.

The …/efficientnet directory implements an end-to-end pipeline for EfficientNet models.

The …/network_api_pytorch_mnist directory shows converting a trained PyTorch MNIST model to a TensorRT engine.

TensorRT Model Format Samples

References: samples/sampleOnnxMNIST, samples/python/network_api_pytorch_mnist

The samples in the directory …/sampleOnnxMNIST and …/network_api_pytorch_mnist demonstrate parsing models from different frameworks into TensorRT.

The …/sampleOnnxMNIST sample shows how to take an ONNX model of an MNIST classifier, parse it with the ONNX parser into a TensorRT network, build an engine from that network, allocate buffers, and run inference on the engine to classify digit images. The main sample class encapsulates building the engine as well as handling data I/O and buffer management when running inference on the engine context.

The …/network_api_pytorch_mnist sample demonstrates an end-to-end workflow of training a CNN model for MNIST classification using PyTorch, extracting the weights, building a TensorRT engine populated with these weights, and performing inference. Functions are also provided for loading input data and running the full inference workflow.

TensorRT Plugin Samples

References: samples/python/onnx_custom_plugin

The …/onnx_custom_plugin directory implements custom plugins for TensorRT. It contains code to build a plugin as a shared library, load the plugin at runtime, modify an ONNX model to use the custom plugin, build a TensorRT engine incorporating the plugin, and test the plugin implementation.

The subdirectory …/plugin implements the custom operation using CUDA and the TensorRT plugin interfaces: a C++/CUDA implementation of the hardmax operation packaged as a TensorRT plugin.

The file …/CMakeLists.txt configures building the custom plugin as a shared library target, setting compiler flags and linking dependencies.

The file …/load_plugin_lib.py provides a way to dynamically load the custom plugin library at runtime by checking OS-specific paths and names.

The file …/model.py preprocesses an ONNX model by replacing nodes with the custom operation.

The file …/sample.py builds a TensorRT engine from the modified ONNX model, incorporating the custom plugin. It runs inference on samples.

The file …/test_custom_hardmax_plugin.py tests that the custom plugin implementation matches a NumPy reference.
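
A hedged sketch of the runtime loading step; the shared-library path below is an assumed build output, not the sample's actual filename:

    import ctypes
    import tensorrt as trt

    PLUGIN_LIB = "./build/libcustomHardmaxPlugin.so"   # assumed CMake build output

    # Loading with RTLD_GLOBAL lets the library's static initializers register
    # the plugin creator with TensorRT's global registry.
    ctypes.CDLL(PLUGIN_LIB, mode=ctypes.RTLD_GLOBAL)

    logger = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(logger, "")
    names = [c.name for c in trt.get_plugin_registry().plugin_creator_list]
    print("registered plugins:", names)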

TensorRT Deployment Samples

References: quickstart/deploy_to_triton

These samples demonstrate deploying TensorRT optimized models to production servers like Triton Inference Server for low latency prediction services. The …/deploy_to_triton directory provides end-to-end examples of deploying a ResNet50 image classification model to Triton.

The sample workflow exports a PyTorch ResNet50 model to ONNX format using …/export_resnet_to_onnx.py. This script loads the pretrained model, runs a forward pass to determine the expected input and output shapes, and saves the model to ONNX with those specifications.
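
A sketch of the kind of export such a script performs, assuming PyTorch and torchvision; the opset version, axis names, and output filename are illustrative:

    import torch
    import torchvision

    model = torchvision.models.resnet50(pretrained=True).eval()
    dummy = torch.randn(1, 3, 224, 224)          # trace input with the expected shape

    torch.onnx.export(
        model, dummy, "resnet50.onnx",
        input_names=["input"], output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
        opset_version=13,
    )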

Triton is then configured by placing the exported ONNX model in its model repository format at /model_repository. A configuration file defines the model name and expected inputs/outputs.

A Triton client implemented in …/triton_client.py handles connecting to the server, preparing input data for the model, running inference requests, and retrieving outputs. It loads an image, creates objects to represent data for the model, and prints predictions.

The README at …/README.md discusses optimizing the model with TensorRT, setting up Triton with the model, and using the Python client to query the model for predictions. It provides an end-to-end workflow guide.

TensorRT Optimization Samples

References: samples/sampleINT8API, samples/python/tensorflow_object_detection_api

The samples demonstrating optimizations like INT8 calibration focus on lowering the precision of models from FP32 to INT8 for deployment on NVIDIA hardware. This provides a significant speedup with minimal loss in accuracy through the use of calibration.

The …/sampleINT8API directory contains a C++ sample that builds an INT8 engine from an ONNX model. It demonstrates setting the INT8 configuration and running inference.

The Python sample in …/tensorflow_object_detection_api shows a complete workflow for object detection models. When building the INT8 engine, it supports INT8 calibration through command line arguments and loads a set of calibration images to drive the calibration passes.

Some key functionality related to the optimization samples (a configuration sketch follows the list):

  • Input preprocessing converts and normalizes images into the format the models expect
  • Engine building supports INT8 calibration driven by a set of calibration images
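
A hedged configuration sketch for enabling INT8 when building an engine; the calibrator argument stands in for an IInt8EntropyCalibrator2 implementation fed with real calibration images, which these samples provide:

    import tensorrt as trt

    def build_int8_engine(builder, network, calibrator):
        """Build a serialized engine with INT8 enabled, given a calibrator object."""
        config = builder.create_builder_config()
        config.set_flag(trt.BuilderFlag.INT8)
        config.int8_calibrator = calibrator        # drives the calibration passes
        # FP16 is commonly enabled too so unsupported layers can fall back to it.
        config.set_flag(trt.BuilderFlag.FP16)
        return builder.build_serialized_network(network, config)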

TensorRT Model Parsers

References: parsers

The TensorRT parsers subsystem implements parsers that can import models from frameworks like Caffe and build them into TensorRT engines. This allows using pre-trained models with TensorRT.

The parsers leverage several key components:

  • The …/caffe directory contains implementations for the Caffe parser.

  • Common utilities like data type definitions are provided in …/common.

  • Configuration and build rules are defined in …/CMakeLists.txt.

The Caffe parser workflow involves parsing prototxt files to extract the network structure and caffemodel files to extract weights, then returning a populated TensorRT network definition to the caller.

The parsers provide a consistent way to import models into TensorRT through common interfaces while handling differences in frameworks. This allows leveraging pre-trained models with TensorRT optimizations.

Caffe Model Parser

References: parsers/caffe, parsers/caffe/caffeParser, parsers/caffe/caffeWeightFactory

The …/caffe directory contains code for parsing Caffe models and converting them to TensorRT engines. It implements functionality to load and parse Caffe prototxt and protobuf files, extract network structure and layer configurations, load pretrained weights, and build equivalent TensorRT networks that can be used for inference.

The main components are:

  • The key parsing class is defined in …/caffeParser.h.

  • Lookup tables in …/opParsers map Caffe layers to parsing functions that initialize layers in TensorRT.

  • Individual layer parsing functions are defined in subdirectories like …/opParsers and extract layer attributes and weights.

The overall workflow is:

  1. The key parsing class loads models and begins parsing layers.

  2. Layer parsing functions are called based on the lookup tables.

  3. These functions initialize weights and extract layer configurations.

  4. Layers are added to build an equivalent TensorRT network definition.

Some important implementation details:

  • Lookup tables and individual layer parsers provide an extensible design.

  • Common utilities are defined in files like …/caffeMacros.h.

This parser enables working with Caffe models in TensorRT by mapping layers, weights, and attributes between frameworks during parsing.

UFF Model Parser

References: TensorRT

The …/common directory contains code for parsing models into TensorRT engines. It provides utilities used by parsers.

The parsing process works as follows:

  • The network structure is read into memory

  • Layers are converted to TensorRT plugin nodes

  • Weights and biases are deserialized

  • The parsed network is returned to be optimized and used for engine building

Key aspects of the implementation include:

  • Handling different layer types

  • Mapping layers to equivalent plugin types

  • Initializing plugin attributes from the layer config

This allows models exported from frameworks to be imported into TensorRT for optimization. The parser converts the topology and initializes parameters for the plugins.

Common Parser Utilities

References: parsers/common

The common parser utilities provide functionality that is reused across different parser implementations in TensorRT. At the core is the …/parserUtils.h header, which contains general purpose utilities to help with parsing models.

The header overloads stream operators so that dimensions and types can be pretty printed as strings for debugging. Additional helper functions work with representations of tensor dimensions, and a macro handles logging errors and returning early from functions.

The …/half.h header simplifies usage of half precision types by providing typedefs and handling compiler warnings.

Together these common parser utility headers provide generic functions, types, and helpers that underpin the implementation of individual model parsers in TensorRT. They handle common lower-level tasks like working with tensors and data types so the parsers can focus on higher-level model parsing logic.

TensorRT Docker Support

References: docker

The TensorRT Docker Support code provides tools for setting up development environments using Docker containers. This improves the workflow for contributors by ensuring a consistent environment.

The docker directory contains Dockerfiles, scripts, and other files that automate the setup of these environments. The Dockerfiles define images for common operating systems like Ubuntu and CentOS that will be used as bases. They install all necessary dependencies through package managers, configure environment variables, and create a non-root user for development activities.

The …/build.sh script handles building images from the Dockerfiles. It parses command line arguments to customize the build, such as specifying tags. Images can then be rebuilt easily if requirements change.

The …/launch.sh script launches containers from the built images. It mounts the local source code directory inside the container and sets runtime options like the number of GPUs to expose. This allows developing directly within the container using the same environment that builds will occur in.

The …/ubuntu-cross-aarch64.Dockerfile sets up a cross-compilation environment. It extracts prebuilt binary packages rather than compiling complex dependencies, and creates stub libraries so the target libraries can be linked against during builds. This allows compiling for other architectures without needing to build the target's dependencies locally.

Dockerfiles

References: docker/centos-7.Dockerfile, docker/ubuntu-18.04.Dockerfile, docker/ubuntu-20.04.Dockerfile

The Dockerfiles define Docker images that can be used to set up TensorRT development environments. The key Dockerfiles are:

  • …/centos-7.Dockerfile:

    • Installs CUDA, cuDNN and TensorRT from NVIDIA repositories
    • Adds a user for development
    • Sets environment variables and paths for TensorRT
  • …/ubuntu-18.04.Dockerfile:

    • Installs build dependencies and Python packages
    • Adds a non-root user and configures permissions
    • Sets environment variables to specify library and binary paths
  • …/ubuntu-20.04.Dockerfile:

    • Installs CUDA, cuDNN and other system libraries
    • Installs TensorRT packages for specified versions
    • Sets environment variables and the work directory /workspace

The Dockerfiles primarily install TensorRT and its dependencies using package managers. They also create a non-root user to isolate development activities. Key steps include setting environment variables and configuring paths. This allows developers to quickly start developing against TensorRT without installing libraries manually on each system.

Build and Launch Scripts

References: docker/build.sh, docker/launch.sh

The scripts …/build.sh and …/launch.sh are used to build and launch Docker containers for the TensorRT project.

…/build.sh parses command line arguments to determine the Dockerfile path and image name. It conditionally includes arguments during the build process. The script constructs a command string based on the parsed arguments and executes it to build the container.

…/launch.sh allows launching a TensorRT container with configurable options. It uses a while loop to parse arguments passed to the script, setting variables for things like the Docker image tag and number of GPUs. The script builds up a string conditionally based on the argument values. Options like mounting directories and setting the image name are added. The final command is printed and executed to launch the container.

Both scripts provide an interface for building and launching TensorRT containers with different configurations in a parameterized way based on command line inputs. They handle tasks like argument parsing, command string construction, and executing commands without relying on external libraries.

Cross Compilation Support

References: docker/ubuntu-cross-aarch64.Dockerfile

The …/ubuntu-cross-aarch64.Dockerfile sets up a cross-compilation environment for building TensorRT and related libraries for aarch64 targets such as NVIDIA's embedded and automotive platforms. It extracts prebuilt binary packages for CUDA, cuDNN and TensorRT into the /pdk_files/ directory rather than compiling from source. This avoids needing to cross-compile complex dependencies.

It then sets symlinks from the target header file locations under /pdk_files/ to the host include paths. For example, it symlinks /pdk_files/cudnn/usr/include/aarch64-linux-gnu to /usr/include.

To allow linking against the target libraries during compilation, it creates stub libraries that export only empty symbols, for example a stub for libnvinfer.so under /pdk_files/tensorrt/lib/stubs/.

This allows the cross-compiler to link without errors while the real implementation is provided at runtime on the target system.

The environment is set up to use these target library paths and the build output directory for any builds initiated in the container. This allows compiling applications and libraries for ARM64 using the cross-compiler while still having access to the target library headers and stubs.

TensorRT Demo Applications

References: demo

This section demonstrates end-to-end deep learning applications built with TensorRT. Key demos include:

  • Object Detection: The …/EfficientDet directory contains sample code showing EfficientDet usage for detection.

  • Speech Recognition: The …/Jasper directory contains an example of accelerating a pre-trained model for low-latency speech recognition.

  • Text-to-Speech: The …/tensorrt directory contains code for converting the Tacotron 2 and WaveGlow PyTorch models to TensorRT engines for text-to-speech synthesis.

  • Image Generation: The …/Diffusion directory contains pipelines for image generation with diffusion models, covering text-to-image, image-to-image, and inpainting.

  • Language Models: The …/HuggingFace directory contains demos for transformer language models built on the NNDF (Neural Network Driven Framework) utilities.

Image Classification and Object Detection Models

References: demo/EfficientDet

The demos in this section showcase popular models for image classification and object detection that have been optimized for TensorRT. The …/EfficientDet directory contains Jupyter notebooks and Python files that demonstrate end-to-end usage of EfficientDet models for object detection.

The notebooks in …/notebooks load a pretrained Keras EfficientDet model, convert it into a TensorRT engine, and run inference on sample images, walking through each step of the workflow.

Overall, these demos provide a complete workflow for using popular computer vision models optimized with TensorRT, from loading models to running inference. They allow users to leverage pre-trained models for classification and detection tasks with high performance.

Speech Recognition Models

References: demo/Jasper, demo/Tacotron2

This section discusses demos for speech recognition models like Jasper, Tacotron 2, and WaveGlow contained in the demo directory. The …/Jasper directory contains code demonstrating low-latency speech recognition inference using a pre-trained Jasper model. A class handles loading an optimized TensorRT engine generated from the PyTorch Jasper model. This class provides methods to preprocess input audio by extracting log mel filterbank features and normalizing the data. The /notebooks subdirectory contains a Jupyter notebook that loads a Jasper model checkpoint, converts it to an engine, and runs inference on sample audio to recognize speech in real-time.

The …/Tacotron2 directory contains an end-to-end text-to-speech system that uses Tacotron 2 for mel spectrogram generation from text and WaveGlow for waveform synthesis from mel spectrograms. A driver function orchestrates running the full TTS pipeline on sample text by tokenizing the input and running the models in sequence. The /tensorrt subdirectory contains code to convert the PyTorch models to ONNX and generate TensorRT engines to accelerate inference.

Generative and GAN Models

References: TensorRT

The …/Diffusion directory contains demos for accelerating diffusion models such as Stable Diffusion using TensorRT. It provides examples of text-to-image generation, image-to-image translation, and image inpainting with diffusion models.

Classes in …/models.py handle exporting models between formats by inheriting from a base export class.

Jupyter notebooks demonstrate full workflows like converting PyTorch diffusion models to TensorRT engines. The demo entry point scripts load the pipeline classes to run the requested model or task.

Pipelines for tasks such as text-to-image generation are implemented in classes under directories like …/txt2img_pipeline.py. These encapsulate end-to-end workflows.

Model classes in …/models.py define interfaces for the individual diffusion model components and their TensorRT engines; optimization profiles for dynamic input shapes are defined per model.

Python demo scripts in the top level directory implement complete pipelines for each task, abstracting away TensorRT details. This accelerates diffusion model deployment while maintaining usability.