TensorRT

Auto-generated from NVIDIA/TensorRT by Mutable.ai Auto Wiki
GitHub Repository
Developer: NVIDIA
Written in: C++
Stars: 9.0k
Watchers: 146
Created: 05/02/2019
Last updated: 04/03/2024
License: Apache License 2.0
Homepage: developer.nvidia.com/tensorrt
Repository: NVIDIA/TensorRT
Auto Wiki
Revision: 0
Software Version: 0.0.8 Basic
Generated from: Commit 147005
Generated at: 04/04/2024

The NVIDIA TensorRT repository is a comprehensive collection of tools, utilities, and sample applications that demonstrate the usage and capabilities of the TensorRT library for optimizing and deploying deep learning models. TensorRT is a high-performance deep learning inference optimizer and runtime engine developed by NVIDIA, designed to accelerate the execution of deep neural networks on NVIDIA GPUs.

The most important components of this repository include the TensorRT Python bindings, a diverse set of sample applications, and a collection of custom TensorRT plugins. The Python bindings, located in the python directory, expose the core functionality of the TensorRT library to Python, including the ability to define neural network models, configure engine building and inference, and manage resources.

The sample applications, found in the samples directory, showcase the integration of TensorRT with popular deep learning models and frameworks, such as Detectron2, EfficientNet, BiDAF, and YOLOv3. These samples demonstrate how to convert pre-trained models to the ONNX format, build TensorRT engines, and run inference on the optimized models. The samples also highlight advanced TensorRT features, such as algorithm selection, dynamic input reshaping, and custom plugin integration.

The custom TensorRT plugins, located in the plugin directory, extend the functionality of the TensorRT inference engine by providing specialized operations and layers. These plugins include implementations for critical components like the Fused Multi-Head Attention (FMHA) operation, efficient non-maximum suppression, modulated deformable convolution, and various normalization techniques. The plugins are designed to leverage the performance and efficiency of the TensorRT framework, allowing users to integrate custom functionality into their deep learning pipelines.

The repository also includes a set of tools and utilities, such as the Polygraphy tool, the TensorRT Engine Explorer, the ONNX GraphSurgeon, and quantization toolkits for PyTorch and TensorFlow. These tools provide additional functionality for working with TensorRT, including model inspection, transformation, and deployment optimization.

Overall, the NVIDIA TensorRT repository is a valuable resource for deep learning practitioners and researchers who need to deploy high-performance inference on NVIDIA hardware. The repository's modular design, comprehensive samples, and extensible plugin system make it a powerful platform for integrating TensorRT into a wide range of deep learning workflows.

TensorRT Python Bindings

References: python

The Python bindings expose the TensorRT library to Python, so that networks can be defined, engines built, and inference run from Python code. The bindings are organized into the core inference API, the ONNX parser bindings, and the package entry point.

Core Functionality

References: python/src/infer

The core functionality of the TensorRT Python API is implemented in the …/infer directory. This directory contains the key interfaces and classes for defining neural network models, configuring inference, and working with various TensorRT components.

Algorithm Selection and Reporting

The IAlgorithmSelector interface in the TensorRT Python API allows applications to customize which algorithms the builder chooses for each layer while building an engine, and to receive a report of the algorithms that were finally selected.

Foundational Types

The TensorRT Python API provides a set of foundational types and data structures that are used throughout the library. These types serve as the building blocks for working with data, dimensions, and versioned interfaces in TensorRT.

Int8 Calibration

The pyInt8Doc.h file in the …/infer directory defines several classes that extend the IInt8Calibrator interface, which is used to perform calibration for 8-bit integer (Int8) inference in TensorRT.

Plugins

The TensorRT library provides a plugin system that allows users to implement custom layers and register them with the inference engine. This extends the capabilities of TensorRT beyond the built-in layers, enabling users to integrate specialized operations tailored to their specific deep learning models and use cases.

Packaging and Distribution

The packaging and distribution of the TensorRT Python bindings as a standalone wheel is handled in the …/bindings_wheel directory. This directory contains the Python code shipped with the wheel, including utility functions, context manager support, and compatibility helpers for various TensorRT types.

ONNX Parser

References: python/src/parsers

The ONNX Parser in the TensorRT Python Bindings provides the core functionality for parsing ONNX models and interacting with the parsing process.

Entry Point

The tensorrt/__init__.py file is the main entry point for the TensorRT Python package. It serves as the primary interface for users to access the core functionality of the TensorRT library.

TensorRT Sample Applications

References: samples, samples/python

The samples directory contains a comprehensive set of sample applications and utilities that demonstrate the usage and capabilities of the NVIDIA TensorRT library. The samples are organized into three main categories: "Hello World" samples, TensorRT API samples, and application samples.

Detectron2 Integration

The …/detectron2 directory provides a set of scripts for integrating the Detectron2 Mask R-CNN R50-FPN 3x model with NVIDIA TensorRT.

EfficientNet Integration

The …/efficientnet directory focuses on the conversion, inference, and evaluation of the EfficientNet V1 and V2 models using NVIDIA TensorRT.

BiDAF Model Refitting

The …/engine_refit_onnx_bidaf directory demonstrates how to work with a refittable TensorRT engine using an ONNX-based Bidirectional Attention Flow (BiDAF) model.

Custom TensorRT Plugins

The …/python_plugin directory showcases the implementation of a custom "Circular Padding Plugin" using various Python-based frameworks. The plugin is designed to perform circular padding on input tensors, a common operation in deep learning models.
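
Circular padding fills the border with values wrapped around from the opposite edge of the tensor, not with a constant. The plain-Python reference below pads a 2-D grid and is meant only to show what the plugin computes. The function name `circular_pad_2d` and its `(top, bottom, left, right)` argument order are illustrative choices, not the sample's plugin interface.

```python
from typing import List


def circular_pad_2d(x: List[List[float]], top: int, bottom: int,
                    left: int, right: int) -> List[List[float]]:
    """Pad a 2-D grid by wrapping around its edges (circular / periodic padding)."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(-top, h + bottom):
        row = x[i % h]  # Python's modulo wraps negative and overflowing row indices
        out.append([row[j % w] for j in range(-left, w + right)])  # same wrap for columns
    return out
```

For example, `circular_pad_2d([[1, 2, 3], [4, 5, 6]], 1, 0, 1, 1)` prepends a copy of the last row and wraps one column onto each side. Taking the indices modulo the input size keeps the logic to a single pass.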

Weight Stripping

The …/sample_weight_stripping directory demonstrates how to build a weight-stripped TensorRT engine. Such an engine omits refittable weights from the serialized plan to reduce its size, and the weights are refitted into the engine before inference.

Utility Scripts

The …/scripts directory contains two utility scripts for downloading and extracting the MNIST dataset.

Progress Monitoring

The …/simple_progress_monitor directory demonstrates how to implement a custom progress monitor during the TensorRT engine build process using the IProgressMonitor interface.
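
The callbacks such a monitor receives can be illustrated without a GPU. The sketch below is a duck-typed stand-in that assumes the Python `IProgressMonitor` methods are `phase_start(phase_name, parent_phase, num_steps)`, `step_complete(phase_name, step)` and `phase_finish(phase_name)`. In real code the class would subclass `tensorrt.IProgressMonitor` and be attached to the builder configuration. That wiring is omitted here so the example runs without TensorRT. The class name `IndentingProgressMonitor` is hypothetical.

```python
from typing import Dict, List, Optional


class IndentingProgressMonitor:
    """Hypothetical stand-in for a tensorrt.IProgressMonitor subclass that records a nested log."""

    def __init__(self) -> None:
        self.depth: Dict[str, int] = {}  # nesting depth of each phase that is currently running
        self.log: List[str] = []

    def phase_start(self, phase_name: str, parent_phase: Optional[str], num_steps: int) -> None:
        # A phase is indented one level deeper than its parent; top-level phases start at depth 0.
        self.depth[phase_name] = self.depth[parent_phase] + 1 if parent_phase in self.depth else 0
        self.log.append("  " * self.depth[phase_name] + f"{phase_name}: started ({num_steps} steps)")

    def step_complete(self, phase_name: str, step: int) -> bool:
        self.log.append("  " * self.depth[phase_name] + f"{phase_name}: step {step} done")
        return True  # returning False asks the builder to cancel the build

    def phase_finish(self, phase_name: str) -> None:
        self.log.append("  " * self.depth.pop(phase_name) + f"{phase_name}: finished")
```

Tracking depth per phase name lets nested phases, such as per-layer work inside an overall build phase, be shown as an indented tree.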

TensorFlow Object Detection API Integration

The …/tensorflow_object_detection_api directory provides a set of scripts and utilities for running TensorFlow Object Detection API (TFOD) models using NVIDIA TensorRT.

YOLOv3 Integration

The …/yolov3_onnx directory contains a set of Python scripts that demonstrate converting a pre-trained YOLOv3 object detection model from the Darknet format to the ONNX format, and then optimizing and executing the ONNX model with the TensorRT library.

Common Utilities

References: samples/common

The …/common directory provides a set of utility classes, functions, and headers that are commonly used across various TensorRT sample applications. This directory serves as a centralized location for reusable code, helping to ensure consistency and efficiency in the development of TensorRT-based projects.

EfficientDet Integration

The …/ directory provides a comprehensive set of scripts and utilities for converting, optimizing, and executing the EfficientDet object detection model using NVIDIA TensorRT.

ONNX Custom Plugin

The …/onnx_custom_plugin directory contains the implementation of a custom TensorRT plugin for the Hardmax operation, as well as scripts for preprocessing an ONNX model, building a TensorRT engine, and testing the custom plugin.
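
ONNX Hardmax sets the first maximum element along the chosen axis to 1 and all other elements to 0. A small reference for the last axis of a 2-D input is useful for checking a custom plugin's output on toy cases. It is not the sample's GPU kernel, and the function name is illustrative.

```python
from typing import List


def hardmax_rows(x: List[List[float]]) -> List[List[float]]:
    """One-hot encode the position of the maximum in each row; the first maximum wins ties."""
    out = []
    for row in x:
        k = max(range(len(row)), key=lambda j: row[j])  # max() returns the first maximal index
        out.append([1.0 if j == k else 0.0 for j in range(len(row))])
    return out
```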

TensorRT Tools

The tools directory contains a collection of tools and utilities for optimizing and running various deep learning models using the NVIDIA TensorRT library.

Polygraphy

References: tools/Polygraphy

The Polygraphy tool is a part of the NVIDIA TensorRT project and provides a unified interface for working with various deep learning inference backends, including ONNX Runtime, PyTorch, TensorFlow, and TensorRT.

TensorFlow Quantization

The …/tensorflow-quantization directory contains a toolkit for quantizing TensorFlow 2 models, enabling their deployment on NVIDIA hardware and software stacks such as TensorRT.

PyTorch Quantization

The …/pytorch-quantization directory contains a toolkit for quantizing PyTorch models, enabling their deployment on NVIDIA hardware and software stacks, such as TensorRT.
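
Quantization-aware training relies on fake quantization: values are scaled, rounded and clamped to the integer range, then scaled back to floats. The sketch below reproduces only that arithmetic. It assumes symmetric, per-tensor, narrow-range quantization (levels -127 to 127) and takes the range `amax` from simple max calibration. It does not use the toolkit's own classes, and the function names are hypothetical. Python's built-in `round()` rounds halves to even, which may differ from the toolkit's rounding mode.

```python
from typing import List


def max_calibrate(samples: List[float]) -> float:
    """Max calibration: the quantization range is the largest absolute value observed."""
    return max(abs(v) for v in samples)


def fake_quantize(x: List[float], amax: float, num_bits: int = 8) -> List[float]:
    """Quantize to signed integers in [-bound, bound], then dequantize back to floats."""
    bound = 2 ** (num_bits - 1) - 1  # 127 for 8 bits (symmetric, narrow range)
    scale = amax / bound             # real-valued distance between adjacent integer levels
    return [max(-bound, min(bound, round(v / scale))) * scale for v in x]
```

Values beyond `amax` saturate at plus or minus `amax`. For example, with `amax = 1.0` an input of 2.0 comes back as 1.0, which is why the choice of calibration method matters.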

ONNX GraphSurgeon

The …/onnx-graphsurgeon directory provides the ONNX GraphSurgeon tool, which allows users to easily generate new ONNX graphs or modify existing ones. The tool is composed of three major components: Importers, the Intermediate Representation (IR), and Exporters.

TensorRT Engine Explorer (Experimental)

The …/trt-engine-explorer directory contains a set of Python modules, scripts, and Jupyter notebooks for analyzing, visualizing, and optimizing TensorRT engine plans.

TensorRT Plugins

References: plugin

The plugin directory contains a collection of custom TensorRT plugins that extend the functionality of the TensorRT inference engine. These plugins provide specialized operations and layers that can be used in neural network models.

Fused Multi-Head Attention (FMHA)

The …/fused_multihead_attention directory contains the core functionality for the Fused Multi-Head Attention (FMHA) operation, which is a crucial component in the QKV to Context transformation.

QKV to Context Plugin Implementations

The …/bertQKVToContextPlugin directory contains the implementation of a custom TensorRT plugin that performs the Query-Key-Value (QKV) to Context transformation, a key component in Transformer-based models like BERT. The plugin supports various data types (FP32, FP16, INT8), different GPU architectures, and includes optimizations for variable sequence lengths and explicit INT8 precision.
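
The following CPU reference shows what the QKV-to-context step computes per attention head: single-head scaled dot-product attention, softmax(QKᵀ/√d)·V. It deliberately leaves out the attention mask, multiple heads, the packed QKV layout, and the INT8 scaling that the plugin handles. The function name is illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)                       # subtract the max score for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        probs = [w / total for w in weights]  # attention weights for this query sum to 1
        out.append([sum(p * vj[c] for p, vj in zip(probs, v)) for c in range(len(v[0]))])
    return out
```

Each output row is a convex combination of the rows of V. This is why identical keys produce the plain average of their values.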

Padding and Unpadding Functionality

The QkvPaddingRunner class is responsible for managing the padding and unpadding of input tensors for the Multi-Head Attention (MHA) operation in the BERT model.
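
Removing padding lets attention kernels skip work on padding tokens when a batch holds sequences of different lengths. The general idea is shown in plain Python below. Valid tokens are packed contiguously, and a list of cumulative offsets records where each sequence starts. The names `unpad`, `pad` and `cu_seqlens` are illustrative. Each token here is a single number, not a hidden-size vector, and the sketch does not describe `QkvPaddingRunner`'s internals.

```python
from typing import List, Tuple


def unpad(batch: List[List[float]], seq_lens: List[int]) -> Tuple[List[float], List[int]]:
    """Drop padding tokens; return the packed tokens plus cumulative sequence offsets."""
    packed: List[float] = []
    cu_seqlens = [0]
    for row, n in zip(batch, seq_lens):
        packed.extend(row[:n])                 # keep only the valid prefix of each sequence
        cu_seqlens.append(cu_seqlens[-1] + n)  # sequence b occupies packed[cu[b]:cu[b + 1]]
    return packed, cu_seqlens


def pad(packed: List[float], cu_seqlens: List[int], max_len: int,
        fill: float = 0.0) -> List[List[float]]:
    """Inverse of unpad: scatter packed tokens back into a fixed-length batch."""
    out = []
    for b in range(len(cu_seqlens) - 1):
        seq = packed[cu_seqlens[b]:cu_seqlens[b + 1]]
        out.append(seq + [fill] * (max_len - len(seq)))
    return out
```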

Efficient NMS Plugin

The EfficientNMSPlugin is a TensorRT plugin that performs efficient non-maximum suppression (NMS) on object detection outputs. It supports both a standard NMS mode and a mode compatible with the ONNX NonMaxSuppression op, so it can be used with a wide range of object detection models and frameworks.
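
At its core, NMS is a greedy loop. Candidates are sorted by score, the best one is kept, and any remaining candidate that overlaps a kept box by more than an IoU threshold is discarded. The sketch below is class-agnostic, works on a single image, and takes boxes in corner format `(x1, y1, x2, y2)`. The parameter names echo common NMS settings but are not the plugin's attribute list. Batching, per-class handling and the GPU implementation are omitted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5,
        score_threshold: float = 0.0, max_out: int = 100) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if len(keep) == max_out:
            break
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```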

Modulated Deformable Convolution Plugin

The ModulatedDeformableConvPluginDynamic class provides a custom TensorRT plugin for performing modulated deformable convolution, a key operation used in deep learning models for computer vision tasks.

Skip Layer Normalization Plugin

The SkipLayerNormPluginDynamic and SkipLayerNormVarSeqlenPlugin classes in the …/skipLayerNormPlugin directory provide custom TensorRT plugins for performing skip layer normalization, a common operation in transformer-based neural networks.
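
Skip layer normalization fuses a residual addition with layer normalization. A minimal reference for a single hidden vector follows. The optional `bias` argument and the default `eps` are assumptions for illustration, not the plugin's exact configuration.

```python
import math
from typing import List, Optional


def skip_layer_norm(x: List[float], skip: List[float], gamma: List[float], beta: List[float],
                    bias: Optional[List[float]] = None, eps: float = 1e-5) -> List[float]:
    """LayerNorm(x + skip [+ bias]) * gamma + beta over one hidden vector."""
    h = [a + b + (bias[i] if bias else 0.0) for i, (a, b) in enumerate(zip(x, skip))]
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)  # biased variance, as in LayerNorm
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(h, gamma, beta)]
```

Fusing the addition into the normalization avoids writing the residual sum to memory and reading it back, which is the main motivation for a dedicated plugin.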

Embedding Layer Normalization Plugin

The EmbLayerNormPluginDynamic class is a custom TensorRT plugin that performs embedding layer normalization, a crucial component in BERT-based natural language processing models. The plugin looks up the word and segment embeddings for the input token IDs and segment IDs, adds the position embeddings, and then applies layer normalization to the resulting sum.
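
A plain-Python reference for the embedding-plus-normalization step, one token at a time, follows. The argument names and the default `eps` are assumptions. The plugin's additional outputs and its GPU data layout are not modelled.

```python
import math
from typing import List


def emb_layer_norm(token_ids: List[int], segment_ids: List[int],
                   word_emb: List[List[float]], seg_emb: List[List[float]],
                   pos_emb: List[List[float]], gamma: List[float], beta: List[float],
                   eps: float = 1e-12) -> List[List[float]]:
    """Sum word, segment and position embeddings for each token, then layer-normalize."""
    out = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        # Table lookups: the token ID selects a word row, the segment ID a segment row.
        h = [w + s + p for w, s, p in zip(word_emb[tok], seg_emb[seg], pos_emb[pos])]
        mean = sum(h) / len(h)
        var = sum((v - mean) ** 2 for v in h) / len(h)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([(v - mean) * inv_std * g + b for v, g, b in zip(h, gamma, beta)])
    return out
```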

Scatter Elements Plugin

The ScatterElementsPlugin provides a custom TensorRT plugin for performing a scatter operation with various reduction types, such as sum, multiplication, mean, minimum, and maximum.
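
ScatterElements-style semantics can be written down directly for a 2-D tensor along axis 0: each update is combined with the output element that its index selects. This is a sketch only, and the reduction names are illustrative. "Mean" is left out because frameworks differ on whether the original value takes part in the average.

```python
import copy
import operator
from typing import Callable, Dict, List

# Reduction applied when an update lands on an element: f(existing value, update).
REDUCTIONS: Dict[str, Callable[[float, float], float]] = {
    "none": lambda old, new: new,  # plain overwrite; in this sequential loop the last write wins
    "sum": operator.add,
    "mul": operator.mul,
    "min": min,
    "max": max,
}


def scatter_elements_axis0(data: List[List[float]], indices: List[List[int]],
                           updates: List[List[float]], reduction: str = "none") -> List[List[float]]:
    """out[indices[i][j]][j] = reduce(out[indices[i][j]][j], updates[i][j]) along axis 0."""
    out = copy.deepcopy(data)  # the input tensor is left untouched
    combine = REDUCTIONS[reduction]
    for i, row in enumerate(indices):
        for j, target in enumerate(row):
            out[target][j] = combine(out[target][j], updates[i][j])
    return out
```

On a GPU, duplicate indices turn these updates into races. This is why reductions other than plain overwrite are usually implemented with atomic operations.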

ROI Align Plugin

The ROIAlign plugin is a custom TensorRT plugin that performs the Region of Interest (ROI) Align operation, a common operation used in object detection and instance segmentation tasks. The plugin is designed to provide a highly optimized and configurable implementation of the ROI Align operation, allowing it to be seamlessly integrated into a TensorRT-based inference pipeline.
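
The heart of ROI Align is bilinear sampling at fractional positions, averaged over a small grid of sample points in each output bin, so no coordinate is rounded to the feature-map grid. The single-channel sketch below assumes the ROI is already in feature-map coordinates and uses a fixed `sampling_ratio`. The boundary handling follows a common reference convention. Half-pixel offset options and the multi-channel, batched layout of a production plugin are omitted, and the function names are illustrative.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]


def bilinear(feat: FeatureMap, y: float, x: float) -> float:
    """Bilinearly interpolate a 2-D feature map at the fractional location (y, x)."""
    h, w = len(feat), len(feat[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0  # the sample lies outside the feature map
    y, x = min(max(y, 0.0), h - 1.0), min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feat[y0][x0] + (1 - ly) * lx * feat[y0][x1]
            + ly * (1 - lx) * feat[y1][x0] + ly * lx * feat[y1][x1])


def roi_align(feat: FeatureMap, roi: Tuple[float, float, float, float],
              out_h: int, out_w: int, sampling_ratio: int = 2) -> FeatureMap:
    """Pool an ROI (x1, y1, x2, y2, in feature-map units) into an out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    bin_h, bin_w = (y2 - y1) / out_h, (x2 - x1) / out_w
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Average a sampling_ratio x sampling_ratio grid of evenly spaced points in the bin.
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    y = y1 + (i + (sy + 0.5) / sampling_ratio) * bin_h
                    x = x1 + (j + (sx + 0.5) / sampling_ratio) * bin_w
                    total += bilinear(feat, y, x)
            row.append(total / (sampling_ratio * sampling_ratio))
        pooled.append(row)
    return pooled
```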

Multiscale Deformable Attention Plugin

The MultiscaleDeformableAttnPlugin class provides a custom TensorRT plugin for the Multiscale Deformable Attention (MSDA) operation, a key component in deep learning models for computer vision tasks.

Multilevel Propose ROI Plugin

The MultilevelProposeROI plugin is a crucial component in the sampleMaskRCNN application, responsible for generating the first-stage detection (ROI candidates) from the Region Proposal Network (RPN) outputs and pre-defined anchors. The plugin takes two input tensors, object_score and object_delta, and generates one output tensor of ROI candidates.

Instance Normalization Plugin

The InstanceNormalizationPlugin is a custom TensorRT plugin that performs instance normalization, a common operation in deep learning models for image generation tasks. The plugin is based on the ONNX opset 6 definition for InstanceNormalization.
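
Instance normalization normalizes each channel of each sample independently over its spatial positions, then applies a per-channel scale and bias. The plain-Python reference below flattens the spatial dimensions, so the input is indexed `[N][C][H*W]`. The default `eps` of 1e-5 matches the default epsilon attribute of the ONNX InstanceNormalization operator. The function name is illustrative.

```python
import math
from typing import List


def instance_norm(x: List[List[List[float]]], scale: List[float], bias: List[float],
                  eps: float = 1e-5) -> List[List[List[float]]]:
    """y = scale[c] * (x - mean) / sqrt(var + eps) + bias[c], statistics per (sample, channel)."""
    out = []
    for sample in x:                         # batch dimension N
        channels = []
        for c, values in enumerate(sample):  # channel dimension C; values holds the H*W positions
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            inv_std = 1.0 / math.sqrt(var + eps)
            channels.append([scale[c] * (v - mean) * inv_std + bias[c] for v in values])
        out.append(channels)
    return out
```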

Clip Plugin

References: plugin/clipPlugin

The ClipPlugin is a custom TensorRT plugin that clips the values of the input tensor to a specified minimum and maximum. The plugin is implemented in the …/clipPlugin directory.

Batched NMS Plugin

The BatchedNMSPlugin and BatchedNMSDynamicPlugin classes are custom TensorRT plugins that provide an efficient implementation of batched non-maximum suppression (NMS) for object detection models.

Voxel Generator Plugin

The VoxelGeneratorPlugin is a custom TensorRT plugin responsible for generating voxel features from point cloud data, a crucial component in 3D object detection tasks. The plugin takes in a point cloud and point count, and outputs a set of voxel features, voxel coordinates, and voxel parameters.
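
The point-to-voxel assignment at the core of this step works as follows. Along each axis, a point's voxel index is `floor((p - range_min) / voxel_size)`. Points outside the configured range are dropped, and each voxel keeps at most a fixed number of points. The parameter names in the sketch are hypothetical. The plugin's actual output layout, with separate feature, coordinate and parameter tensors on the GPU, is not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple

VoxelKey = Tuple[int, int, int]


def voxelize(points: Sequence[Sequence[float]], range_min: Sequence[float],
             range_max: Sequence[float], voxel_size: Sequence[float],
             max_points_per_voxel: int) -> Dict[VoxelKey, List[Sequence[float]]]:
    """Group (x, y, z, ...) points into voxels keyed by their integer (ix, iy, iz) index."""
    voxels: Dict[VoxelKey, List[Sequence[float]]] = {}
    for p in points:
        if not all(range_min[d] <= p[d] < range_max[d] for d in range(3)):
            continue  # points outside the configured range are ignored
        key = (math.floor((p[0] - range_min[0]) / voxel_size[0]),
               math.floor((p[1] - range_min[1]) / voxel_size[1]),
               math.floor((p[2] - range_min[2]) / voxel_size[2]))
        bucket = voxels.setdefault(key, [])
        if len(bucket) < max_points_per_voxel:  # extra points in a full voxel are dropped
            bucket.append(p)
    return voxels
```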

Special Slice Plugin

The SpecialSlice plugin is a custom TensorRT plugin that is used to perform a specialized slicing operation on input tensors. This plugin is likely used as part of a larger neural network model, such as MaskRCNN, to extract specific information from the model's output.

Resize Nearest Plugin

The ResizeNearest plugin is a custom TensorRT plugin that performs nearest-neighbor image resizing, a common operation in computer vision tasks. The plugin is designed to efficiently resize the spatial dimensions (height and width) of input tensors while keeping the other dimensions unchanged.
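
Nearest-neighbour resizing by a single scale factor maps each output pixel back to the source pixel `floor(dst / scale)`. The reference below works on one 2-D spatial slice. Applying it to every (batch, channel) slice leaves those dimensions unchanged. The exact coordinate mapping the plugin uses is an assumption here, and the function name is illustrative.

```python
from typing import List


def resize_nearest(x: List[List[float]], scale: float) -> List[List[float]]:
    """Nearest-neighbour resize of an H x W grid by the same factor in both dimensions."""
    h, w = len(x), len(x[0])
    out_h, out_w = int(h * scale), int(w * scale)
    # int() floors the non-negative source coordinate; min() guards the last row/column.
    return [[x[min(h - 1, int(i / scale))][min(w - 1, int(j / scale))]
             for j in range(out_w)]
            for i in range(out_h)]
```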

Reorg Plugin

References: plugin/reorgPlugin

The Reorg class template provides the core functionality for the "reorg" operation, which is commonly used in deep learning models, particularly in object detection tasks. The Reorg class is specialized into two subclasses: ReorgStatic and ReorgDynamic, which handle static and dynamic input shapes, respectively.
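
Reorg belongs to the space-to-depth family of operations. Each `stride x stride` block of spatial positions is moved into the channel dimension, turning a `(C, H, W)` tensor into `(C·stride², H/stride, W/stride)`. The generic sketch below shows that data movement. Darknet's original reorg layer, used by YOLOv2-style models, may order the output elements differently, so rely only on the shape transformation. The function name is illustrative.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed [C][H][W]


def space_to_depth(x: Tensor3, stride: int) -> Tensor3:
    """Move each stride x stride spatial block into the channel dimension."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % stride or w % stride:
        raise ValueError("H and W must be divisible by the stride")
    oh, ow = h // stride, w // stride
    out: Tensor3 = []
    for ch in range(c):
        for dy in range(stride):
            for dx in range(stride):
                # One output channel per (input channel, offset inside the block).
                out.append([[x[ch][y * stride + dy][xx * stride + dx] for xx in range(ow)]
                            for y in range(oh)])
    return out
```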

Region Plugin

The Region plugin is a custom TensorRT plugin specifically designed for object detection tasks, particularly for the YOLOv2 model. The plugin is responsible for processing the output of a neural network to generate bounding boxes, objectness scores, and class probabilities for detected objects.
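
For orientation, the standard YOLOv2 decoding of one anchor's raw outputs works like this. A sigmoid maps the centre offsets and the objectness score into (0, 1). The width and height scale the anchor prior exponentially. A softmax turns the class logits into probabilities. The text does not say which of these steps the plugin performs and which are left to post-processing. The output layout `[tx, ty, tw, th, to, class logits...]` and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(t: Sequence[float], cx: int, cy: int, anchor: Tuple[float, float],
                grid_w: int, grid_h: int) -> Tuple[Tuple[float, float, float, float], float, List[float]]:
    """Decode [tx, ty, tw, th, to, class logits...] for the anchor at grid cell (cx, cy)."""
    tx, ty, tw, th, to, *logits = t
    box = ((sigmoid(tx) + cx) / grid_w,        # centre x as a fraction of the image width
           (sigmoid(ty) + cy) / grid_h,        # centre y as a fraction of the image height
           anchor[0] * math.exp(tw) / grid_w,  # width: anchor prior (in grid cells) scaled by exp(tw)
           anchor[1] * math.exp(th) / grid_h)  # height, decoded the same way
    objectness = sigmoid(to)
    m = max(logits)                            # subtract the max logit for numerical stability
    e = [math.exp(z - m) for z in logits]
    total = sum(e)
    return box, objectness, [v / total for v in e]
```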

Pyramid ROI Align Plugin

The PyramidROIAlign plugin is a custom TensorRT plugin that performs Region of Interest (ROI) alignment on feature maps from a Convolutional Neural Network (CNN) using a feature pyramid approach. This functionality is commonly used in object detection and segmentation tasks, such as in Faster R-CNN and Mask R-CNN models.
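
Before aligning, a feature-pyramid ROI Align has to decide which pyramid level to pool each ROI from. A widely used heuristic from the Feature Pyramid Network paper assigns larger boxes to coarser levels: `k = floor(k0 + log2(sqrt(w·h) / 224))`, clamped to the available levels. The sketch assumes pixel coordinates and the common constants (224, k0 = 4, levels 2 to 5). The plugin exposes its own configuration, which may differ, and the function name is illustrative.

```python
import math


def fpn_level(x1: float, y1: float, x2: float, y2: float, canonical_size: float = 224.0,
              canonical_level: int = 4, min_level: int = 2, max_level: int = 5) -> int:
    """Pick the pyramid level P_k for an ROI: k = floor(k0 + log2(sqrt(w * h) / 224))."""
    area = max(x2 - x1, 0.0) * max(y2 - y1, 0.0)
    if area <= 0.0:
        return min_level  # degenerate boxes are sent to the finest level
    k = math.floor(canonical_level + math.log2(math.sqrt(area) / canonical_size))
    return max(min_level, min(max_level, k))
```

Doubling a box's side length moves it up one level, so each level handles roughly one octave of object sizes.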

Proposal Plugin

The ProposalPlugin and ProposalDynamicPlugin classes are responsible for generating object proposals in object detection models, specifically for the Faster R-CNN architecture. These plugins handle tasks such as anchor generation, non-maximum suppression, and bounding box regression to produce the final set of region of interest (ROI) bounding boxes.
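
The bounding-box regression step works on one anchor at a time. The predicted deltas `(dx, dy, dw, dh)` shift the anchor's centre by a fraction of its size and rescale its width and height in log space. The box is then clipped to the image. The sketch uses the `w = x2 - x1` convention, whereas some Faster R-CNN implementations add 1 to widths and heights. Anchor generation, score sorting and NMS are left out, and the function names are illustrative.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def apply_deltas(anchor: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Faster R-CNN style box regression: shift the anchor centre and scale its size."""
    x1, y1, x2, y2 = anchor
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    pcx, pcy = cx + dx * w, cy + dy * h          # centre moves by a fraction of the anchor size
    pw, ph = w * math.exp(dw), h * math.exp(dh)  # size changes in log space
    return (pcx - 0.5 * pw, pcy - 0.5 * ph, pcx + 0.5 * pw, pcy + 0.5 * ph)


def clip_box(box: Box, img_w: float, img_h: float) -> Box:
    """Clamp all box corners to the image bounds."""
    x1, y1, x2, y2 = box
    return (min(max(x1, 0.0), img_w), min(max(y1, 0.0), img_h),
            min(max(x2, 0.0), img_w), min(max(y2, 0.0), img_h))
```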

TensorRT Core Functionality

References: include, parsers

The core functionality of the NVIDIA TensorRT library is defined in the header files and implementation details located in the include directory. This includes the runtime, parsers, and consistency checking components.

Runtime and Consistency Checking

The core runtime functionality of the NVIDIA TensorRT library is defined in the NvInferImpl.h file. This file contains the implementation details of various API methods and classes that are crucial for working with TensorRT.

Plugins and Utilities

The NvInferPlugin.h file provides an API for using the NVIDIA-provided TensorRT plugins. It defines a single function, initLibNvInferPlugins(), which is used to initialize and register all the existing TensorRT plugins to the Plugin Registry, with an optional namespace. This function should be called once before accessing the Plugin Registry.

Safe Runtime

The safe runtime API for TensorRT provides a functionally safe execution environment for deploying TensorRT models in safety-critical applications.

Version and Configuration Management

The NvInferVersion.h file defines the version information for the NVIDIA TensorRT library and its components. This includes the major, minor, patch, and build numbers, as well as the release type (early access, release candidate, or general availability).

Parsers

References: parsers, parsers/common

The parsers directory contains the implementation of the parsers used by TensorRT, a C++ library for optimizing and deploying deep learning models.

TensorRT Docker Support

References: docker

The docker directory contains scripts and configuration files for building and launching Docker containers for the NVIDIA TensorRT deep learning inference optimization library.

Building Docker Images

The build.sh script in the docker directory is responsible for building Docker images for the NVIDIA TensorRT deep learning inference optimization library. This script allows users to customize the Dockerfile, image name, and CUDA version used in the build process.

Launching Docker Containers

References: docker/launch.sh

The launch.sh script in the docker directory is responsible for launching Docker containers for the NVIDIA TensorRT deep learning inference optimization library. The script provides a convenient way to start a TensorRT Docker container with various configuration options.

TensorRT Quickstart Guide

References: quickstart

The TensorRT Quickstart Guide provides introductory examples and tutorials for working with NVIDIA TensorRT, a high-performance deep learning inference optimizer and runtime engine.

TensorRT Quickstart Guide Utilities

References: quickstart/common

The …/common directory contains several utility files that provide common functionality for working with TensorRT, particularly related to image processing and logging.

TensorFlow-TensorRT (TF-TRT) Model Integration

The …/IntroNotebooks directory provides utility files and example notebooks for working with TensorFlow-TensorRT (TF-TRT) models. The main functionality is provided in the helper.py and onnx_helper.py files.

Semantic Segmentation with TensorRT

The …/SemanticSegmentation directory contains code that demonstrates the process of exporting a pre-trained PyTorch model for semantic segmentation to the ONNX format, and then running inference on the exported ONNX model using the TensorRT runtime.

Deploying TensorRT Models to Triton Inference Server

This subsection provides instructions and sample code for deploying a pre-trained ResNet-50 model, optimized using NVIDIA TensorRT, on the Triton Inference Server.
