Auto Wiki by Mutable.ai

tensorflow

Auto-generated from tensorflow/tensorflow by Mutable.ai Auto Wiki

tensorflow

GitHub Repository
Developer: tensorflow
Written in: C++
Stars: 180k
Watchers: 7.7k
Created: 2015-11-07
Last updated: 2024-01-09
License: Apache License 2.0
Homepage: tensorflow.org
Repository: tensorflow/tensorflow

Auto Wiki
Generated at: 2024-01-11
Generated from: Commit fd1635
Version: 0.0.4

TensorFlow is an end-to-end open source platform for machine learning. It provides a comprehensive ecosystem of tools, libraries, and community resources that enable researchers and developers to build and deploy machine learning powered applications.

At its core, TensorFlow provides an extensible framework for constructing, executing, and differentiating the computations necessary for training neural networks and making predictions. The key components that power this functionality are:

  • The Python API defined in …/python provides easy model construction and training using high-level abstractions like layers, models, optimizers, and training loops. It allows building and training complex neural networks in just a few lines of code.

  • The C++ API defined in …/cc enables extending TensorFlow natively in C++ without Python dependencies. It provides classes like ClientSession and Scope for manipulating graph execution from C++.

  • The Core Framework implemented in tensorflow and …/core provides the foundational data structures and operations that power TensorFlow. This includes classes like Tensor and Graph, device-specific kernel implementations, distributed execution infrastructure, and optimization passes.

  • Tools for distribution, mobile deployment, debugging, profiling, and conversion enable using TensorFlow in a wide variety of environments. Utilities like AutoGraph allow transparently converting Python control flow to TensorFlow ops.

The overall architecture centers on constructing computational graphs, executing them across devices, and automatically differentiating them to update model parameters. Key design choices include supporting both graph-based and eager execution, providing multiple client APIs, using protocol buffer formats like GraphDef and SavedModel so APIs can evolve compatibly, and implementing core operations as differentiable kernels.
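
These pieces are easiest to see from the public Python API. A minimal, illustrative sketch of eager execution, graph tracing via tf.function, and automatic differentiation:

```python
import tensorflow as tf

# Eager execution: ops run immediately and return concrete values.
x = tf.constant([[1.0, 2.0]])
w = tf.Variable([[3.0], [4.0]])

# tf.function traces the Python callable into a reusable graph.
@tf.function
def predict(x):
    return tf.matmul(x, w)

# GradientTape records executed ops and replays their registered
# gradient kernels backwards to differentiate the computation.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(predict(x))

print(tape.gradient(loss, w))
```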

TensorFlow Core Framework

References: tensorflow, tensorflow/core

The …/core directory contains core framework functionality that powers TensorFlow's machine learning capabilities. It provides fundamental abstractions, operations, kernels, and utilities that are implemented throughout TensorFlow.

Core data types and common utilities live in …/framework. This includes the Tensor class, which represents multidimensional arrays, along with utilities for attributes, cancellation, control flow, and more.

The Graph class defined in …/graph.h represents the overall graph structure and provides APIs for construction, querying and serialization. It owns Node and Edge objects that define how tensors flow through the graph. The NodeBuilder class in …/node_builder.cc allows incrementally constructing NodeDef protocol buffers that define graph operations.

Centralized configuration and flag functionality is provided in …/config by the Flags class, which stores all flag values in a thread-safe global container accessed via its Global() method. This allows consistent flag access across TensorFlow.

Low-level platform abstractions that are shared across implementations are contained in …/platform. This includes filesystem access via the FileSystem class, which acts as a common interface for filesystem implementations, and threading functionality via mutexes.

Core operations are implemented across files in …/ops and registered with TensorFlow using the REGISTER_OP macro. The OpKernel base class defined in …/op_kernel.h handles aspects like input/output shape validation and allocation for kernels. Subclasses implement the core Compute() method to define each operation's logic.

Kernels that execute operations for different device types are defined in …/kernels. They inherit from the OpKernel base class.

Utilities for common tasks such as parsing and serialization are contained in …/util. Graph optimizations are provided in …/grappler using the GrapplerItem class, which encapsulates the graph and metadata passed to the Grappler optimizer.

TensorFlow Core Framework

References: tensorflow/core

The …/framework directory contains core framework functionality that powers TensorFlow's operations, data structures, and utilities. It defines fundamental abstractions like Tensor and OpKernel that are used throughout the system.

Some key components include:

  • The Tensor class defined in …/tensor.h represents multidimensional arrays of elements that flow through TensorFlow graphs. It handles data types, shapes, and underlying buffer management.

  • The OpKernel base class defined in …/op_kernel.h represents a single operation that can be executed on a particular device. Subclasses implement the core operation logic in their Compute() method.

  • The Device abstraction in …/device.h represents a compute resource like a CPU or GPU. Subclasses implement device-specific behavior.

  • Utilities for cancellation primitives in …/cancellation.h for aborting operations via callbacks.

Some important implementation details:

  • The Tensor class manages the underlying buffer, allowing different implementations. It handles reference counting for memory management.

  • The OpKernel base class provides a common interface for kernels. Its constructor validates and caches input/output shapes and types.

  • Subclasses like CpuDevice implement Device methods for specific hardware. The DeviceContext abstraction handles per-step resources.

  • Cancellation uses CancellationManager to register callbacks and CancellationToken objects to propagate cancellations.

Python API

References: tensorflow/python

The …/python directory provides the core Python implementation of TensorFlow. It defines the fundamental building blocks for constructing and running TensorFlow models, training machine learning algorithms, and interacting with TensorFlow from Python.

Some key aspects implemented in this directory include:

  • The Session class in …/session.py is central to executing operations and evaluating tensors. It handles running operations and unpacking/repacking structured fetches and feeds via _FetchHandler, as sketched after this list.

  • The InteractiveSession subclass installs itself as the default session on construction, making it convenient for interactive use in shells and notebooks.

  • Functions like TF_Run() in …/tf_session_helper.cc provide a bridge between Python and TensorFlow by handling conversions for feeds and fetches between NumPy and TensorFlow types during session runs.

  • The Graph class represents TensorFlow graphs, containing a dictionary of Operation objects. Operation objects have inputs and outputs, which are Tensor objects representing multidimensional arrays.

  • The Tensor class represents tensors and their shapes. It supports operations like addition and has properties like dtype and shape.

  • The Dataset class represents an input pipeline as a sequence of elements. The Iterator class iterates over datasets. The DistributedIterator handles distributing the base dataset across devices and building iterators to prefetch data.

  • Training loops contain the core logic for fitting models to data using different execution modes and data types. They handle aspects like batch processing and callback invocation.
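
A minimal sketch of the graph-and-session workflow described above, using the tf.compat.v1 surface of these classes:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Graph construction: Operations and Tensors are recorded on the
# default Graph rather than executed immediately.
a = tf.placeholder(tf.float32, shape=(2,))
b = tf.constant([3.0, 4.0])
total = a + b

# A Session executes the graph; feeds supply placeholder values and
# fetches name the tensors to evaluate.
with tf.Session() as sess:
    print(sess.run(total, feed_dict={a: [1.0, 2.0]}))  # [4. 6.]
```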

Python API

References: tensorflow/python

The …/client directory provides the core Python APIs for constructing, executing and interacting with TensorFlow graphs. This includes defining and running computational graphs, as well as retrieving results and tensors.

Key aspects of working with these APIs include constructing computational graphs using classes like Graph and Operation, launching a Session to execute a graph, and retrieving results through the Session interface.

Core Framework

References: tensorflow/python, tensorflow/python/client, tensorflow/python/framework

The core TensorFlow functionality for constructing and manipulating graphs is implemented in the Graph class defined in …/ops.py. The Graph class represents the basic TensorFlow computation graph, containing a dictionary of Operation objects which represent individual operations like matrix multiplications.

Operation objects have inputs and outputs, which are Tensor objects representing multidimensional arrays. The Tensor class, also defined in …/ops.py, carries properties like shape and dtype and supports basic operations like addition.

The IndexedSlices class in …/indexed_slices.py represents sparse tensors, containing values, indices, and dense_shape properties. It supports operations like addition through registered conversion and gradient functions.

The core Graph class:

  • Represents the TensorFlow computation graph
  • Contains a dictionary of Operation objects representing individual operations
  • Provides APIs for construction, querying, and serialization

The Operation class:

  • Represents individual ops like matrix multiplications in the graph
  • Has inputs and outputs, which are Tensor objects
  • Supports methods like name and type for introspecting properties

The Tensor class:

  • Represents tensors and their shapes and dtypes
  • Supports basic operations like addition
  • Maintains properties like shape and dtype

The IndexedSlices class:

  • Represents sparse tensors with values, indices, dense_shape
  • Supports operations like addition via custom gradient functions
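
A short illustration of where IndexedSlices shows up in practice: gradients of tf.gather arrive as sparse IndexedSlices rather than dense tensors.

```python
import tensorflow as tf

params = tf.Variable(tf.ones((5, 3)))
with tf.GradientTape() as tape:
    # Only rows 0 and 2 are touched, so the gradient is sparse.
    loss = tf.reduce_sum(tf.gather(params, [0, 2]))

grad = tape.gradient(loss, params)
print(type(grad).__name__)                 # IndexedSlices
print(grad.values, grad.indices, grad.dense_shape)
```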

Layers and Models

References: tensorflow/python/keras, tensorflow/python/layers, tensorflow/python/estimator

The …/layers directory contains implementations of common neural network layers that can be used as building blocks for machine learning models. Some key functionality includes:

  • Core layers:

    • The Dense layer implements a fully-connected layer. The call() method performs a dot product between the inputs and kernel, adds any bias, and applies the activation.
    • The Dropout layer applies dropout regularization during training by randomly setting units to 0 based on a dropout rate.
  • Convolutional layers:

    • Layers like Conv2D and Conv3D leverage TensorFlow operations such as tf.nn.conv2d() to perform convolutions. The call() method handles input/output preprocessing and delegates to these ops.
  • Recurrent layers:

    • The LSTM and GRU layers implement common RNN cell types. Their call() methods contain the core RNN computations, applying different update rules based on their gates or reset mechanism.
  • Normalization:

    • Layers such as BatchNormalization normalize inputs using statistics collected over mini-batches, allowing faster training. The call() method computes mean/variance and applies the transformation.
  • Activation layers:

    • Layers like ReLU and Softmax apply common activation functions element-wise using ops such as tf.nn.relu(). The call() method handles this application.

These layers provide common neural network building blocks in a reusable object-oriented way. Their modular implementations leverage efficient TensorFlow operations. The call() method of each layer encapsulates its core logic. Layers can be composed to build complex machine learning models using the Keras functional or subclassing APIs. The Layer base class defines the common layer interface.
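
A small example of composing these building blocks with the Keras Sequential API:

```python
import tensorflow as tf

# Dense, BatchNormalization, Dropout, and activation layers stacked
# into a classifier; each layer's call() runs when the model is called.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```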

Training

References: tensorflow/python/training

The …/training.py file contains utilities and classes for building TensorFlow training loops. It provides common functionality for training machine learning models in TensorFlow, including:

  • The MonitoredSession class handles running TensorFlow sessions within a monitored context, executing hooks at checkpoints. It handles initialization, recovery from errors, and running hooks to help manage distributed training sessions.

  • SessionRunHooks like CheckpointSaverHook allow inserting custom evaluation and monitoring code into the training loop. For example, CheckpointSaverHook saves model checkpoints periodically during training.

  • Training loops utilize classes like MonitoredSession and hooks to repeatedly run the model and optimizer over batches of data, iterating optimization steps to fit the model parameters.

The main classes and their roles are:

  • MonitoredSession is the primary interface for running a training loop. It handles session creation/management, running hooks, and handles failures like restoring from checkpoints on exceptions.

  • Scaffold handles initialization operations, variables, and saving/restoring checkpoints. Properties like init_op, saver, and summary_op are used by MonitoredSession.

  • SessionRunHook is the base class hooks must inherit from. Hooks run callbacks at checkpoints to insert monitoring code into the training loop run by MonitoredSession.

The key aspects of building and optimizing models with this code are:

  • MonitoredSession runs the core training loop, calling hooks before/after its run() method which executes one step.

  • Scaffold handles common initialization/recovery logic via properties used by MonitoredSession.

  • Hooks like CheckpointSaverHook allow running custom callbacks periodically in the training loop to save checkpoints.

  • SessionRunHook defines the callback interface hooks must implement to insert code at checkpoints in the training loop.
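
A compact sketch of such a loop using the tf.compat.v1 training APIs (the checkpoint directory and step counts are placeholders):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

step = tf.train.get_or_create_global_step()
w = tf.Variable(1.0)
loss = tf.square(w - 3.0)                     # stand-in for a model loss
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
    loss, global_step=step)

# MonitoredTrainingSession wires up a Scaffold, runs hooks around each
# step, saves checkpoints periodically, and restores after failures.
with tf.train.MonitoredTrainingSession(
        checkpoint_dir="/tmp/ckpt",
        hooks=[tf.train.StopAtStepHook(last_step=100)],
        save_checkpoint_steps=50) as sess:
    while not sess.should_stop():
        sess.run(train_op)
```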

Utilities

References: tensorflow/python/util

Some key utility modules include nest, which flattens and repacks arbitrarily nested structures of tensors; tf_decorator and tf_inspect, which provide decorator-aware introspection; and deprecation, which implements deprecation annotations and warnings.
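
A minimal example of the public tf.nest wrapper around these utilities:

```python
import tensorflow as tf

params = {"layer1": {"w": tf.ones((2, 2)), "b": tf.zeros(2)}, "lr": 0.1}

# tf.nest flattens nested structures to a flat list and packs them back;
# TensorFlow uses this pervasively for feeds, fetches, and gradients.
flat = tf.nest.flatten(params)
rebuilt = tf.nest.pack_sequence_as(params, flat)
tf.nest.assert_same_structure(params, rebuilt)  # raises on mismatch
```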

Debugging

References: tensorflow/python/debug

The core utilities for debugging TensorFlow graphs and tensors are provided in the …/debug_utils.py file. This file contains high-level functions for programmatically adding debug tensor watches to graphs via the lower-level RunOptions protobuf API. The main functions are add_debug_tensor_watch() and watch_graph().

add_debug_tensor_watch() allows adding a watch for a single tensor, specified by node name and output slot. It takes the run_options and modifies the debug options within it. watch_graph() allows adding watches for the entire graph in one call. It iterates through all operations in the graph and calls add_debug_tensor_watch() for each tensor, optionally applying regex filters.

Both graph watching functions set the parallel_iteration attribute of while loops to 1 to prevent concurrent execution, and reset the disk byte usage if specified.
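
A sketch of the programmatic watch API described above, assuming the tfdbg v1 surface:

```python
import tensorflow.compat.v1 as tf
from tensorflow.python.debug.lib import debug_utils

tf.disable_eager_execution()
a = tf.constant([1.0, 2.0], name="a")
b = a * 2.0

run_options = tf.RunOptions()
# Watch every tensor in the graph, dumping values to a local directory.
debug_utils.watch_graph(
    run_options,
    tf.get_default_graph(),
    debug_urls=["file:///tmp/tfdbg_dump"])

with tf.Session() as sess:
    sess.run(b, options=run_options)
```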

Distributed

References: tensorflow/python/distribute

This directory implements DTensor-backed distribution strategies. The strategies rely on utilities in …/dtensor_util.py like conversion functions and the DTensorDistributedValue class. Tests for the strategies and utilities are in files like …/mirrored_strategy_test.py.

The MirroredStrategy class provides an implementation of synchronous distributed training across multiple local devices like GPUs. When initialized, it builds a DTensor mesh configuration to distribute variables and computations. Variables created under the strategy will have a replicated layout across the mesh dimension, allowing data parallelism. The key methods are __init__ to initialize the strategy and mesh, and reduce() to aggregate values across replicas.

The DTensorStrategyExtended class implements a strategy directly backed by DTensor. When initialized, it takes a container_strategy and mesh configuration. It overrides variable creation via _create_variable() to use DVariable instances instead of regular TensorFlow variables. call_for_each_replica() ensures function inputs and outputs are converted to/from DTensor format. It implements distributing datasets by unbatching, assigning layouts, and handling global batching with DTensorDataset. The mesh configuration plays a central role in determining variable layouts and replication.

The MultiWorkerMirroredStrategy class extends single-worker replication to multiple workers. It initializes based on a provided mesh or cluster_resolver, parses DTensor environment variables, and builds the distributed mesh using DTensor APIs.
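
The stable tf.distribute.MirroredStrategy exposes the same user-facing pattern; a minimal sketch:

```python
import tensorflow as tf

# Uses all visible GPUs; falls back to CPU with a single replica.
strategy = tf.distribute.MirroredStrategy()
print("replicas:", strategy.num_replicas_in_sync)

# Variables created inside scope() are replicated across the devices.
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="sgd", loss="mse")
```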

Profiling

References: tensorflow/python/profiler

The …/internal directory contains implementations of core profiling functionality for TensorFlow in Python:

  • …/flops_registry.py registers floating point operation counts for operations, with unit tests in …/flops_registry_test.py.
  • …/model_analyzer_testlib.py provides utilities for building test models, and …/print_model_analysis_test.py tests printing model analysis.
  • …/profiler_pywrap_impl.cc and …/profiler_pywrap_impl.h implement Python bindings for the C++ profiler backend.
  • …/profiler_wrapper.cc wraps the profiler session functionality.
  • …/python_hooks.h defines types for Python profiling hooks.
  • …/traceme_wrapper.cc implements tracing of Python function calls.

The …/flops_registry.py file provides a unified way to calculate floating point operations (FLOPs) for TensorFlow operations. It registers op-specific FLOPs calculation functions with a decorator. These functions leverage common FLOPs calculation functions defined in the file. Convolutional and pooling operations are handled specially, first verifying data format and then calculating kernel operations based on shapes and kernel size. Reduction operations separately calculate reduce and finalization FLOPs. Other operations directly calculate FLOPs based on input and output shapes by multiplying by operations per element.

The …/profiler_wrapper.cc file wraps the underlying C++ profiler session functionality via the ProfilerSession class. Methods like start() and stop() delegate to the real C++ session implementation. Functions such as trace() and monitor() collect remote profiling data by converting options, releasing the GIL for blocking C++ calls, and converting between formats. This provides a safe Python interface to the complex C++ implementation.

The …/traceme_wrapper.cc file defines functionality for tracing Python function calls via the Trace class. The constructor initializes the underlying C++ trace from kwargs. Methods like SetMetadata() and Stop() proxy calls to corresponding C++ trace methods. A PyBind11 module exposes this functionality to Python.

Eager Execution

References: tensorflow/python/eager

The …/execute.py file contains the core functions for executing operations in eager mode:

  • execute(): Handles argument conversion and execution of operations. It converts Python objects to tensors before execution.

Within execute(), functions like args_to_matching_eager() convert Python objects like lists and NumPy arrays to tensors with matching dtypes before execution. This ensures operations receive the expected tensor inputs.

These functions provide the fundamental primitives to execute operations eagerly without static graph construction.

The main class for managing the execution context is the Context class. It has responsibilities like setting the execution mode and handling device placement. Methods like set_execution_mode() allow changing the execution mode, and device() can be used as a context manager for device scopes.
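
A short example of eager execution and device placement through the public API:

```python
import tensorflow as tf

# TF2 executes eagerly by default: ops run as they are called.
assert tf.executing_eagerly()

x = tf.constant([[1.0, 2.0]])
y = tf.matmul(x, tf.transpose(x))   # executed immediately
print(y.numpy())                    # [[5.]]

# Device scopes control where ops are placed.
with tf.device("/CPU:0"):
    z = tf.square(y)
```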

AutoGraph

References: tensorflow/python/autograph

The …/impl directory contains the core implementation of AutoGraph, TensorFlow's tool for automatically converting Python code into equivalent TensorFlow graphs. This allows arbitrary Python code to be compiled into a graph and take advantage of TensorFlow's distributed execution capabilities.

The main components are:

The PyToTF class in …/api.py is the core AST transformer that recursively walks Python code and applies various conversion passes to transform control flow, functions, variables etc. into TensorFlow operations. It has methods like transform_ast() that descend the AST, applying transformations at each node.

The converted_call() function in the same file handles converting any function calls recursively. It first tries executing the function normally, and if not supported it will convert the function and arguments to TensorFlow operations before invoking it. This function plays a key role in lowering Python language constructs to TensorFlow primitives.

Error handling during staging is implemented in _attach_error_metadata() from api.py. It attaches contextual information like the source map to errors, so they can be remapped to the original Python code location. The StackTraceMapper context manager defined in the file helps with remapping stack traces during error recovery.

Classes like AutoGraphError and functions like is_autograph_strict_conversion_mode() in api.py provide the core APIs for error handling and configuration control during conversion.

Overall, the main components work together to recursively convert Python code into an equivalent TensorFlow graph representation. The PyToTF transformer walks the AST, while converted_call() handles lowering function calls. Error handling utilities ensure errors can be recovered back to the original Python code location.
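
A small demonstration of the conversion from the user's side: a Python loop and branch become tf.while_loop and tf.cond when traced.

```python
import tensorflow as tf

def count_even(n):
    total = tf.constant(0)
    for i in tf.range(n):      # Python for-loop over a Tensor
        if i % 2 == 0:         # data-dependent branch
            total += 1
    return total

graph_fn = tf.function(count_even)   # AutoGraph runs during tracing
print(graph_fn(tf.constant(10)))     # tf.Tensor(5, ...)

# The generated code can be inspected directly.
print(tf.autograph.to_code(count_even))
```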

Compatibility

References: tensorflow/python/compat

The …/compat directory contains utilities for maintaining API compatibility across TensorFlow versions. This allows code to work seamlessly across releases.

The compat module centralizes forward compatibility checks by comparing the current date to a stored compatibility date number. This simple check allows flexibility via environment variables or direct calls. The forward_compatible() function uses this check to gate new features that would break compatibility.

The v2_compat module provides a unified way to toggle TensorFlow 1.x and 2.x behaviors globally via enable_v2_behavior() and disable_v2_behavior(). The core switching logic is done in internal modules like tf2 and ops. The Registry class registers callbacks to modularly switch modules like tf.data between versions.

Metrics are tracked with monitoring.BoolGauge to measure how often behaviors are enabled or disabled. This provides visibility into usage. Rigorous tests in compat_test.py and disable_v2_behavior_test.py validate the expected functionality under different conditions.
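
A brief example of the forward-compatibility gate in use:

```python
import tensorflow as tf

# True once the current date is past the horizon stored for this change.
if tf.compat.forward_compatible(year=2024, month=6, day=1):
    pass  # emit the new graph construct
else:
    pass  # keep emitting the old one for older consumers

# Tests can open the window explicitly with a context manager.
with tf.compat.forward_compatibility_horizon(2024, 6, 2):
    assert tf.compat.forward_compatible(2024, 6, 1)
```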

C++ API

References: tensorflow/cc

The …/cc directory contains TensorFlow's C++ API. It provides classes and functions for constructing, executing, and differentiating TensorFlow graphs from C++ code.

The core functionality includes defining operations and building graphs using the Operation and Scope classes. Operations represent graph nodes and are constructed within scopes to ensure unique naming. Core operations like constants and control flow are implemented in templates in …/ops for flexibility.

Gradient functions are registered in …/gradients using REGISTER_GRADIENT_OP and implemented for individual operations. The GradOpRegistry maps operations to their gradient functions.

Graphs are executed using sessions constructed from the ClientSession class in …/client. This class manages running operations and fetching outputs. Multi-threaded execution is supported via Schedule().

SavedModels containing trained graphs can be loaded from C++ and functions within executed using classes like SavedModelAPI and ConcreteFunction defined in …/saved_model.

Experimental APIs exposing higher-level concepts are provided in …/experimental. This includes objects, modules and a runtime via classes like Object, Runtime.

The core abstractions include the Scope and Operation classes for graph construction, the GradOpRegistry for gradient lookup, ClientSession for execution, and SavedModelBundle for loading trained models.

Framework

References: tensorflow/cc, tensorflow/cc/framework

The Scope class handles naming and scoping for graph construction. It represents a named context for operations during construction and provides unique operation names via its GetUniqueNameForOp() method. This handles name collisions within a scope's namespace. Nested sub-scopes can be created that inherit attributes from parent scopes.

The Operation class represents a node in the TensorFlow graph. It contains a pointer to the underlying Node object and extracts important metadata like inputs, outputs, and attributes from the operation definition via its constructor.

The GradOpRegistry class manages a mapping from operation names to GradFunc callbacks that define their gradient functions. Its Register() method adds mappings, and Lookup() retrieves registered gradient functions. This registry enables gradient computation by associating each operation with its gradient function.

The AddSymbolicGradients() function takes initial gradients and the computation graph. It adds the necessary nodes to the graph to backpropagate the gradients through to the requested inputs. It leverages the gradients registered in the GradOpRegistry for each operation.

The SymbolicGradientBuilder class is responsible for the core logic of propagating gradients backwards through the graph. It uses the gradients registered in the GradOpRegistry for each operation to determine how to construct the backward pass graph.

The Coordinator class provides synchronization primitives to coordinate starting and stopping multiple training processes represented by RunnerInterface objects like QueueRunner across threads or processes. It keeps track of registered runners using a std::vector and handles requesting stops, waiting for completion, and aggregating status across runners.

The QueueRunner class implements executing queue-related operations in parallel threads while coordinating behavior using a Coordinator. It initializes from a QueueRunnerDef protocol buffer and executes enqueue operations on queues in parallel threads via its Start() method. It implements the RunnerInterface to be coordinated by a Coordinator.

Ops

References: tensorflow/cc/ops

The …/ops directory contains implementations of core operations for TensorFlow graphs constructed using the C++ API. Operations are implemented through classes that define the computation to be performed for that operation.

Some key operations include:

  • Constant operations are implemented in …/const_op.h. Template specializations of Const handle different data types, and functions like ConstFromProto() allow constructing constants from protos.

  • Operations have associated test files that validate the operation's behavior and properties. For example, …/const_op_test.cc tests constants are constructed correctly.

Client

References: tensorflow/cc/client

The ClientSession class provides the main interface for clients to interact with and execute graphs constructed via the C++ API on a TensorFlow runtime. It represents a session that can be used to drive the evaluation of a TensorFlow graph.

The key aspects of ClientSession are:

  • It manages an underlying Impl class which holds the Session object and graph. The Impl synchronizes access to the graph using a mutex.

  • Constructors initialize a new Session and Impl instance, taking either a Scope or session options.

  • The main execution method is Run(), which handles feeding inputs, fetching outputs, and executing ops on the session. There are overloaded versions supporting options.

  • MakeCallable() and ReleaseCallable() allow bundling subgraphs into reusable handles called "callables" for modularization.

  • MaybeExtendGraph() in Impl checks if the graph has changed size, and extends the session with the new definition if needed.

The Impl class is crucial, as it manages the underlying Session and graph synchronization. It contains the Session pointer and holds a shared_ptr to the Graph definition. Impl uses a mutex to protect access to the graph size in MaybeExtendGraph(). This method extends the session if the size has changed, ensuring the session always uses the latest graph.

ClientSession's key responsibility is the Run() method. It collects feed inputs, passes everything to the underlying Session's Run(), handles any errors, and returns fetch outputs. Overloaded versions support options. MakeCallable() and ReleaseCallable() simply wrap the corresponding session methods, while also calling MaybeExtendGraph() to synchronize the graph.

SavedModel

References: tensorflow/cc/saved_model

The SavedModelBundle class represents a loaded TensorFlow SavedModel. It contains the MetaGraphDef protocol buffer, which defines the graph, signatures, and metadata required to run the model. It also contains a Session object, which is used to execute the graph.

The SavedModelBundle is loaded from disk using the static LoadSavedModel() function. This handles loading the MetaGraphDef, restoring variables from the checkpoint using RestoreSession(), and initializing the session by running initialization ops. It returns a SavedModelBundle containing the loaded graph and session.

The MetaGraphDef contains important model metadata like the graph definition, signatures defining inputs/outputs, assets, and variable initializers. It is parsed from the SavedModel files when loading.

The Session class represents a TensorFlow session, which is required to execute the graph. It is initialized by LoadSavedModel() and stored in the SavedModelBundle. The session can be retrieved via GetSession() and used to run the model by passing inputs through signatures.

Signatures are represented by SignatureDef protocol buffers, containing metadata like input/output names and shapes. They are extracted from the MetaGraphDef and stored in the SavedModelBundle. Signatures define the inputs and outputs for different model functions.

The RestoreSession() function handles restoring variables and running initialization ops. It identifies the necessary ops from the MetaGraphDef and variable_reader, constructs a feed dict with variable paths, and runs the restore/initialization ops in the session. This restores the variable values required for the model to operate correctly.

The LoadSavedModel() function is the main entry point. It handles loading the MetaGraphDef, restoring variables with RestoreSession(), and initializing the session. It collects loading metrics and fingerprints the model. The loaded SavedModelBundle returned represents the fully restored model that is ready to run inferences.
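
For comparison, the same artifact round-trips through the Python API; loading performs the analogous MetaGraphDef parsing, variable restore, and signature setup (paths here are placeholders):

```python
import tensorflow as tf

# Export a trivial model as a SavedModel.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
tf.saved_model.save(model, "/tmp/demo_saved_model")

# Load it back: variables are restored and signatures become callable.
loaded = tf.saved_model.load("/tmp/demo_saved_model")
print(list(loaded.signatures))        # e.g. ['serving_default']
print(loaded(tf.ones((1, 4))))        # call the restored model
```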

Distributed TensorFlow

References: tensorflow/distribute

The core functionality for distributed training and execution in TensorFlow is implemented in the …/distribute directory. This directory provides abstractions and APIs for distributing computation across multiple devices or machines.

The tf.distribute API defines distribution strategies like MirroredStrategy and MultiWorkerMirroredStrategy that handle distributing computation across devices. The DistributedVariable class is the primary way of wrapping variables to make them accessible from multiple devices. It handles initialization, aggregation, and recovery of variables distributed across devices or machines.

The …/rpc directory contains experimental RPC ops for distributed training. The RpcServer class manages registration of model and loss functions via its FunctionRegistry. It then starts the gRPC server. The RpcClient class allows clients to make asynchronous gRPC calls to functions registered on the server via an RPCState object. The …/kernels subdirectory and …/metadata_for_rpc_ops.cc file define the core TensorFlow ops for the RPC functionality. Utilities for secure gRPC credentials are in …/oss.

Distribution Strategies

References: tensorflow/python/distribute

The TensorFlow distribution strategies implement distributed training across multiple GPUs, TPUs, or machines. The core strategies are MirroredStrategy and MultiWorkerMirroredStrategy.

MirroredStrategy allows running replicated model copies on multiple local devices like GPUs. When initialized, it builds a DTensor mesh configuration to distribute variables and computations. Variables created under the strategy will have a replicated layout across the mesh dimension, allowing data parallelism. Methods like call_for_each_replica() ensure function inputs and outputs are executed on each replica device. It inherits from distribute_lib.Strategy and delegates to a DTensorStrategyExtended instance, allowing reuse of TensorFlow strategy functionality while leveraging DTensor for distributed execution and synchronization.

MultiWorkerMirroredStrategy extends single-worker replication to multiple workers for large-scale distributed training. It initializes based on a provided mesh or cluster_resolver, parses DTensor environment variables, and builds the distributed mesh using DTensor APIs. This allows variables and computations to be distributed across the worker devices in a synchronized manner.

The strategies rely on common utilities in …/dtensor_util.py like conversion functions and the DTensorDistributedValue class. The DTensorDistributedValue class represents distributed values that can be operated on across the mesh. It contains methods like values and merge() for accessing and aggregating the underlying per-replica values.

The strategies make use of the ParallelDevice abstraction defined in …/parallel_device.py to execute operations in parallel across multiple underlying devices such as TPU cores or GPUs. The ParallelDevice handles the core logic of distributing computation by packing and unpacking tensors. When operations are executed on it, the pack/unpack calls ensure the operations are properly distributed.
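
The per-replica execution and aggregation pattern looks like this with the stable strategy API (a sketch):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

@tf.function
def replica_step():
    ctx = tf.distribute.get_replica_context()
    # Each replica computes its own value...
    return tf.cast(ctx.replica_id_in_sync_group + 1, tf.float32)

per_replica = strategy.run(replica_step)
# ...and reduce() aggregates the per-replica values across devices.
print(strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None))
```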

Distributed Variables

References: tensorflow

DistributedVariables are implemented in the DistributedVariable class. The DistributedVariable class handles initialization, aggregation, and recovery of variables distributed across a set of devices coordinated by a tf.distribute.Strategy.

Some key aspects of how DistributedVariable works:

  • It aggregates updates across devices by summing or averaging partial updates using methods like merge().

  • Initialization is coordinated so each device's slice is initialized independently and the overall variable value is consistent.

  • During training, gradients are aggregated from all replicas/devices and applied evenly using methods like scatter().

  • Values can be automatically mirrored to all devices or explicitly accessed by device to enable distributed execution.

  • Synchronization barriers ensure variables are in a consistent state before and after distributed operations like all_reduce().

  • Recovery handles restoring variable values if any devices fail, ensuring the value is replicated correctly across all surviving devices.
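
A small example of the aggregation behavior, using a synchronization/aggregation policy on a variable created under a strategy scope:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # One component per device; reads aggregate across them.
    counter = tf.Variable(
        0.0,
        synchronization=tf.VariableSynchronization.ON_READ,
        aggregation=tf.VariableAggregation.SUM)

@tf.function
def step():
    counter.assign_add(1.0)    # each replica updates its local component

strategy.run(step)
print(counter.read_value())    # summed across replicas
```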

Utilities

References: tensorflow

Common utilities used by distribution strategies are contained in the …/cross_device_ops.py file. This file contains functions that are commonly used across different distribution strategies when distributing computation. Some important utilities include:

  • The all_gather() function, which gathers a tensor from multiple workers/devices and concatenates them along a new axis. This is useful for aggregation operations.

  • The broadcast() function, which broadcasts a tensor to all devices. This allows a computation to access a value from any device.

  • The reduce() function, which applies a reduction like sum or mean across devices. This is used to aggregate values from devices.

These cross-device utilities provide common distributed primitives that can be leveraged by different distribution strategies. The all_gather(), broadcast(), and reduce() functions handle collective communication between devices. Strategies can call these functions to distribute computations with minimal device-specific logic.

The all_gather() function is implemented in …/cross_device_ops.py. It takes the tensor to gather and the axis as arguments. It handles shape validation and concatenation of the gathered tensors.

The reduce() and broadcast() functions are also implemented in cross_device_ops.py. reduce() takes the tensor and reduction function like sum as arguments. It calls the reduction function on each device, gathers the results, and returns the reduced value. broadcast() replicates the tensor across devices by copying the value.
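
The public counterparts of these primitives are the tf.distribute.CrossDeviceOps implementations, which a strategy can be configured with; for example:

```python
import tensorflow as tf

# Choose a concrete cross-device reduction implementation.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())

per_replica = strategy.run(lambda: tf.constant(1.0))
# reduce() aggregates per-replica values via the configured op.
print(strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None))
```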

Experimental RPC Ops

References: tensorflow/distribute/experimental/rpc, tensorflow/distribute/experimental/rpc/kernels

The experimental RPC Ops allow implementing distributed training workflows by registering TensorFlow model and loss functions on an RPC server. These functions can then be asynchronously called from TensorFlow client sessions running on other devices or machines. This allows distributed execution of the registered functions across multiple devices.

The main components are the RpcServer class defined in …/rpc_ops.cc and the RpcClient class in the same file. RpcServer manages function registration via its FunctionRegistry. It starts a gRPC server on a given address with the registered functions. RpcClient handles making asynchronous gRPC calls to the server functions.

The RpcServerRegisterOp defined in …/metadata_for_rpc_ops.cc adds functions to the server's registry by capturing the TensorFlow function with the f attribute. RpcCall takes a client, function name, and arguments to call the registered function on the server. It returns a future resource representing the asynchronous call.

The future returned from RpcCall can be used with RpcCheckStatus and RpcGetValue defined in the same file to retrieve the status and output of the remote function call. RpcCheckStatus outputs the error code and message, while RpcGetValue returns the function result. DeleteRpcFutureResource cleans up the future after use.

Utilities for gRPC credentials are provided in …/grpc_credentials.cc. This file contains functions to get insecure credentials for creating unauthenticated gRPC servers and channels within Google's internal network.
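
These kernels are surfaced in Python under tf.distribute.experimental.rpc. The sketch below follows that experimental API; treat the exact names and signatures as assumptions:

```python
import tensorflow as tf

# Server side: register a function and start serving (address is a placeholder).
server = tf.distribute.experimental.rpc.Server.create("grpc", "localhost:2222")

@tf.function(input_signature=[
    tf.TensorSpec([], tf.int32), tf.TensorSpec([], tf.int32)])
def remote_add(a, b):
    return a + b

server.register("add", remote_add)
server.start()

# Client side: call the registered function asynchronously.
client = tf.distribute.experimental.rpc.Client.create(
    "grpc", address="localhost:2222", name="worker_client")
result = client.call("add",
                     args=[tf.constant(2), tf.constant(3)],
                     output_specs=tf.TensorSpec([], tf.int32))
if result.is_ok():
    print(result.get_value())
```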

TensorFlow Lite

References: tensorflow/lite

TensorFlow Lite provides tools and APIs for working with machine learning models across a variety of platforms and devices. It contains functionality for common tasks over the entire model lifecycle including conversion from TensorFlow, optimization for mobile and embedded targets, running inference locally or on hardware accelerators, and developing custom models and operators.

The core functionality is organized into several key components:

  • Model Conversion: The TOCO tool in …/toco handles converting TensorFlow models to the TFLite format. It applies optimizations and supports multiple input formats.

  • Optimization: Tools in …/optimize allow calibrating, quantizing and debugging models to reduce size and accelerate performance.

  • Inference: The Interpreter class in …/core loads models and executes operations. Kernels in …/kernels implement common operators.

  • Hardware Acceleration: Delegates such as the NNAPI delegate in …/nnapi offload operations to hardware accelerators on mobile devices to further boost speed.

  • C++ Runtime: The C API in …/c exposes a C interface that custom operators and delegates implement to integrate with the runtime.

  • Python Bindings: The Interpreter class in …/python provides a high-level Python interface for tasks like conversion and running models.

  • Mobile Platforms: Platform-specific code in directories like …/java and …/objc deploys models on Android and iOS.

Some important implementation details:

The Interpreter class is central to running models. Models are loaded via FlatBufferModel and wired into an Interpreter by the InterpreterBuilder, which also configures delegates. AllocateTensors() prepares inputs/outputs, while Invoke() executes the model.

The TOCO tool converts models by applying graph transformations defined in GraphTransformation subclasses to optimize models. The Converter class orchestrates the full conversion process.

Optimization uses the Calibrator to collect calibration data and then quantizes models. The TFLiteConverter handles model conversion in Python.

Delegates like StatefulNnApiDelegate partition models and map ops/tensors to hardware using NNAPIOpBuilder. Kernels implement ops by specializing for data types and hardware.

The C API exposes the TfLiteInterpreter class along with types like TfLiteTensor. Platform integrations provide native language bindings for these C interfaces.

Model Conversion

References: tensorflow/lite, tensorflow/lite/toco

The core functionality for converting TensorFlow models to the TensorFlow Lite format is handled by the TOCO converter. TOCO stands for TensorFlow Lite Optimizing Converter, and is implemented in the …/toco directory.

TOCO handles the end-to-end conversion process from a few important steps:

  1. Importing models: The Import() function reads the input TensorFlow model, typically in GraphDef format, and builds an internal representation using the Model class.

  2. Applying optimizations: A series of GraphTransformation passes are applied via TransformWithStatus() to optimize the model topology before conversion. This includes fusing operations and simplifying the graph.

  3. Exporting models: Once optimizations are complete, the converted model is exported to the TensorFlow Lite format using Export(). This writes the converted model in the FlatBuffer format.

The core conversion workflow is orchestrated by the Convert() function. It handles importing the model, running the transformations, then exporting the result.

The Model class represents the internal graph structure being converted. It contains a list of Operator nodes that make up the model. The Operator class encapsulates individual operation nodes in the graph.

Graph transformations are implemented as subclasses of the base GraphTransformation class. For example, the FuseActivationFunctions transformation matches patterns in the model involving activations and replaces them with optimized subgraphs. The main transformations are applied by calling RunGraphTransformationsWithStatus().

TOCO also provides debugging utilities like dumping models as Graphviz graphs for inspection. Comprehensive tests in files like toco_convert_test.cc help prevent regressions.
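
Regardless of the backend, the public Python entry point drives the same import, optimize, and export pipeline (paths are placeholders):

```python
import tensorflow as tf

# Convert a SavedModel to a TFLite FlatBuffer.
converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/demo_saved_model")
tflite_model = converter.convert()         # returns FlatBuffer bytes

with open("/tmp/model.tflite", "wb") as f:
    f.write(tflite_model)
```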

Optimization

References: tensorflow/lite/tools/optimize

The core functionality for optimizing TensorFlow Lite models is contained in the …/optimize directory. This directory contains tools for quantizing models to reduce size and speed up inference, as well as pruning, calibration and other techniques.

Quantization rewrites model weights and optionally activations to lower-precision types such as int8, trading a small amount of accuracy for reduced size and faster inference.

Calibration involves running a model on sample data to collect per-tensor statistics without modifying ops. The collected statistics can then be applied to models using classes like CalibrationReader defined in …/calibration_reader.h.

Pruning functionality is contained in subdirectories like …/sparsity. This provides utilities for working with sparse tensors needed for pruning-related optimizations.
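
From Python, calibration and quantization are driven through the converter; a sketch of post-training quantization with a representative dataset (the model path and input shape are placeholders):

```python
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/demo_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Calibration: representative samples are run through the model so
# per-tensor min/max statistics can be collected.
def representative_dataset():
    for _ in range(100):
        yield [np.random.rand(1, 4).astype(np.float32)]

converter.representative_dataset = representative_dataset
quantized_model = converter.convert()      # quantized FlatBuffer bytes
```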

Inference

References: tensorflow/lite/c, tensorflow/lite/core, tensorflow/lite/kernels

The core TensorFlow Lite runtime provides APIs and implementations for performing inference with TensorFlow Lite models across platforms. The main classes and interfaces involved are TfLiteInterpreter, TfLiteTensor, and TfLiteDelegate.

The TfLiteInterpreter class represents a TensorFlow Lite model that can be executed. It contains the graph operations and handles running inference. Key methods on it include AllocateTensors(), Invoke(), and others for preprocessing, running, and retrieving results from the model.

TfLiteTensor represents a multi-dimensional tensor that can be used as model inputs/outputs. It contains the data buffer and shape/type information.

Delegates allow accelerating inference by offloading execution to hardware backends. The TfLiteDelegate interface defines callbacks for operations like Prepare(), Copy() and Invoke() that a delegate must implement. Concrete delegates then provide optimized implementations targeting backends like NNAPI.

The core runtime implementations are in the …/core directory. Here, important classes include:

  • Interpreter: Represents a model and handles running inference
  • OpResolver: Maps operation names to registration functions
  • Subgraph: Represents a portion of the model graph that can run independently

Key files implementing these classes are interpreter.cc, mutable_op_resolver.cc, and subgraph.cc.

The …/kernels directory contains implementations of common operators. Functions like Add() in reference_ops.h provide optimized vectorized kernels. Operator implementations must register in register.cc by returning a TfLiteRegistration from functions like Register_ADD().

The C API implementation in …/c exposes interfaces for loading models, creating interpreters, manipulating tensors, and running inference in an opaque manner from C/C++. Functions like TfLiteInterpreterCreate(), TfLiteInterpreterInvoke(), and types like TfLiteModel and TfLiteTensor comprise the main C API.

In summary, these components provide the fundamental abstractions and implementations for performing inference with TensorFlow Lite models across platforms and languages. The key classes coordinate model loading, execution, and result retrieval.

Hardware Acceleration

References: tensorflow/lite/delegates, tensorflow/lite/nnapi

The TensorFlow Lite delegates provide frameworks and implementations for accelerating TensorFlow Lite model execution via specialized hardware backends. Key delegates include:

  • TensorFlow Lite GPU delegate: This allows offloading supported TensorFlow Lite operations to GPUs for acceleration. It is implemented in the …/gpu directory and supports backends like OpenCL, OpenGL, and Metal.

  • TensorFlow Lite Hexagon delegate: Located in …/hexagon, this delegate enables executing TensorFlow Lite models on Qualcomm Snapdragon devices by offloading operations to the powerful Hexagon DSP.

  • TensorFlow Lite NNAPI delegate: Found in …/nnapi, this delegate leverages the Neural Networks API (NNAPI) on Android to distribute computations to hardware accelerators and neural processing units (NPUs) available on Android devices.

The delegates follow a common pattern of mapping TensorFlow Lite operations to optimized kernels for the target hardware. Key implementation techniques include:

  • Representing operations as polymorphic GPUOperation classes that encapsulate kernels, attributes, and execution. Subclasses implement specific operations.

  • Selecting optimal operation implementations during conversion based on model properties and device capabilities using selector functions.

  • Partitioning models into executable subgraphs that can run on the hardware using classes like GraphBuilder and GraphFloat32.

  • Compiling models just-in-time for execution, caching results, and efficiently transferring data between frameworks.

  • Providing interfaces to integrate acceleration into TensorFlow Lite from languages like Java/C++ through wrappers and JNI bindings.
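
From Python, a delegate is loaded as a shared library and handed to the interpreter (the library name here is illustrative):

```python
import tensorflow as tf

# Load a delegate plugin, e.g. an Edge TPU or GPU delegate library.
delegate = tf.lite.experimental.load_delegate("libedgetpu.so.1")

interpreter = tf.lite.Interpreter(
    model_path="/tmp/model.tflite",
    experimental_delegates=[delegate])
interpreter.allocate_tensors()             # partitions ops to the delegate
```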

C++ Runtime

References: tensorflow

The C API implementation for TensorFlow Lite exposes interfaces for initializing, running inference, and deleting models from C/C++ code. It provides a clean interface for common tasks like initialization, input/output handling, and model execution. The interface handles interactions with the underlying TensorFlow Lite runtime in a portable way.

Python API

References: tensorflow/lite/python

The …/python directory contains the main Python APIs and tools for working with TensorFlow Lite models. This includes functionality for common tasks like converting models, analyzing models, optimizing models through calibration and quantization, and running inference with models.

The …/convert.py module provides utilities for converting TensorFlow models to the TFLite format. It includes high-level functions like convert() which handles the overall conversion workflow, as well as lower-level functions for converting specific model types like GraphDefs and SavedModels. Conversion can be configured through flags to control quantization, target ops, and other optimization settings.

Model analysis is supported through the ModelAnalyzer class defined in …/analyzer.py. The ModelAnalyzer.analyze() method parses a model and prints out details about its structure like subgraphs, ops, and tensors. It can check for GPU compatibility issues.

Model optimization is handled by the Calibrator class in …/calibrator.py. The Calibrator calibrates models by feeding representative data via its _feed_tensors() method. This collects min/max stats used in quantization. The calibrate_and_quantize() method then quantizes the model weights and activations. Quantization behavior can be configured through options.

Inference is performed using the Interpreter class defined in …/interpreter.py. An Interpreter instance can be created from a model file or buffer. It exposes common methods like allocate_tensors(), set_tensor(), invoke(), and get_tensor() to run inference. Hardware acceleration is supported by loading delegates with load_delegate().

The …/interpreter_wrapper module provides a Python wrapper class called InterpreterWrapper that encapsulates the underlying C++ Interpreter class. This handles initializing, allocating tensors, invoking, and accessing results through Pythonic methods while abstracting away complexity.
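
A minimal inference loop with the Python Interpreter (the model path is a placeholder):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="/tmp/model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed an input tensor, run the model, and read back the result.
x = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(output_details[0]["index"])
```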

Mobile Platforms

References: tensorflow/lite/objc, tensorflow/lite/java, tensorflow/lite/swift

The …/objc and …/java directories contain tools and APIs tailored for building machine learning applications on mobile platforms like Android and iOS.

The …/objc directory provides an Objective-C API and runtime for using TensorFlow Lite models on Apple platforms. The core functionality is contained in …/sources, which defines important classes like TFLInterpreter and TFLTensor. TFLInterpreter represents a loaded TFLite model and is used to run inference via its invoke() method. TFLTensor encapsulates properties of tensors like name, type, and quantization parameters. Classes like TFLMetalDelegate and TFLCoreMLDelegate in …/sources allow accelerating compatible operations via backends like Metal and Core ML.

The …/java directory contains a Java API and tools for using TFLite on Android. The main API code is in …/java, defining classes like Interpreter and Tensor. Interpreter loads models from files and runs inference via its run() method. Tensor represents the multi-dimensional data arrays in models. Example apps in …/demo demonstrate common workflows like image classification on camera frames using the Java API. Tools in …/ovic provide computer vision functionality for tasks like classification and detection optimized for mobile.

Microcontrollers

References: tensorflow/lite/micro

The …/micro directory contains functionality for deploying TensorFlow Lite models to microcontrollers with limited memory and compute resources. At the core is the MicroInterpreter class, which allows loading and running TensorFlow Lite models. It provides methods like AllocateTensors(), Invoke(), and GetTensor() to interface with models.

The /kernels/ subdirectory implements compute kernels like Conv, Add, Mul that execute operations optimized for supported microcontroller hardware. Useful utilities are provided in /micro_utils/, such as timing functions and logging. Example model usage is demonstrated in /examples/.

The MicroInterpreter class is the primary interface for working with models. Its AllocateTensors() method allocates memory for model tensors based on their sizes. Invoke() runs the model on input tensors, populating output tensors. GetTensor() retrieves output tensor values after invocation.

Kernels implement core operations and are optimized for efficiency on constrained hardware. For example, the Conv kernel performs convolution using minimal memory. Kernels allow models to run without needing a full TensorFlow implementation.

Experimental

References: tensorflow/lite/experimental

The …/experimental directory contains functionality that is still under development and subject to change. This includes new features, extensions to core TensorFlow Lite capabilities, and experimental APIs. Code in this directory aims to prototype and evaluate new ideas before they graduate to stable status.

Some key areas of experimental and preview functionality include:

  • Acceleration configuration in …/configuration allows integrating hardware accelerators into TensorFlow Lite models through delegate plugins. Implementations like CoreMLPlugin and HexagonPlugin initialize backends by parsing configuration. This provides a way to leverage specialized hardware during inference.

  • Model modification in …/model_modifier contains functionality for tasks like embedding validation graphs within models. The CustomValidationEmbedder class handles connecting the validation subgraph to the model inputs/outputs. This allows running validation checks during inference.

  • Resource management in …/resource defines common interfaces and data structures for representing and accessing shared state like variables across operators. The ResourceBase interface and ResourceMap storage provide a standardized way to integrate new resource types.

  • Control flow representation in …/remat contains the ModelControlDependencies data structure and serialization utilities for capturing control dependencies between operations. This enables rematting models by reconstructing the control flow graph.

  • Hardware acceleration evaluation in …/mini_benchmark provides an abstracted framework and implementations for running validation tests on device configurations. The MiniBenchmark interface and BlockingValidatorRunner allow executing benchmarks to continuously evaluate performance.

  • Audio feature extraction in …/microfrontend contains implementations of complete audio pipelines that can run on embedded devices. The AudioMicrofrontendOp integrates optimized signal processing libraries with TensorFlow Lite to extract features from raw audio.

TensorFlow Addons

References: tensorflow

TensorFlow Addons provides additional libraries that extend TensorFlow functionality: new model architectures, custom layers, losses and metrics beyond TensorFlow's built-in functions, and other utilities.

Examples

References: tensorflow/examples

This section demonstrates common machine learning techniques using examples in TensorFlow. The examples cover tasks like image classification, audio processing, transfer learning, and extending TensorFlow with custom operations.

The …/image_retraining directory shows how to perform transfer learning by taking a pretrained model and retraining it for a new classification task. It uses the retrain.py script to load datasets, train a model, and evaluate accuracy. The script demonstrates common transfer learning workflows like fine-tuning models on new classes.

The …/wav_to_spectrogram directory contains examples for audio processing using TensorFlow. It implements an end-to-end example of converting an audio waveform (.wav file) to a spectrogram image representation in the WavToSpectrogram function. The function builds a TensorFlow graph to read the wav, apply a short-time Fourier transform to generate the spectrogram, scale and format the output, and save it as a PNG image.

The …/speech_commands directory provides examples for training and evaluating models for speech command recognition. It contains functionality for loading and preprocessing audio datasets in the AudioProcessor class. The directory also implements common model architectures for speech tasks in the RecognizeCommands class. The examples demonstrate training loops, evaluation, and exporting models.

The …/adding_an_op directory shows how to extend TensorFlow by defining and registering new operations. It provides examples in Python, C++, and CUDA. The AddOneOp class implements a new operation to add one to a tensor. Tests are included to ensure new operations integrate with TensorFlow features like graphs and eager execution.

Image Classification

References: tensorflow/examples/image_retraining, tensorflow/examples/label_image

This section covers examples of using pre-trained models like ResNet, Inception, and MobileNet to perform image classification tasks. The main functionality demonstrated is loading a pre-trained model, preparing an input image, running inference to classify the image, and interpreting the predictions.

Key files that demonstrate image classification include:

  • …/README.md explains retraining a model for new classes using retrain.py.

  • …/main.cc contains a C++ demo that loads the Inception model, prepares an input image, runs inference to classify it and prints predictions.

  • …/label_image.py shows a Python implementation that performs the same task as the C++ demo.

Both demos load a frozen model graph from disk (LoadGraph() in C++, load_graph() in Python). They prepare the input image for the model by decoding, resizing, and normalizing it (ReadTensorFromImageFile() in C++, read_tensor_from_image_file() in Python).

Inference is run on the model to classify the image by passing the preprocessed image tensor to sess.run(). This returns the predictions as a tensor.

The predictions are interpreted by looking up the top classes and their human-readable labels. For the C++ demo, GetTopLabels() analyzes the outputs to retrieve the highest scoring predictions and indices, while PrintTopLabels() displays them. In Python, the top-5 classes are retrieved and labels loaded with load_labels() are displayed.

These examples demonstrate the common workflow for using pre-trained models like Inception with TensorFlow: loading the model, preprocessing inputs, running inference, and interpreting the outputs.
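That workflow can be condensed into a hedged Python sketch; the file names, tensor names, and preprocessing constants below vary per model and are assumptions:

    import numpy as np
    import tensorflow.compat.v1 as tf

    tf.disable_eager_execution()

    def load_graph(model_file):
        # Read a frozen GraphDef from disk and import it into a new Graph.
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(model_file, "rb") as f:
            graph_def.ParseFromString(f.read())
        graph = tf.Graph()
        with graph.as_default():
            tf.import_graph_def(graph_def, name="")
        return graph

    def read_tensor_from_image_file(path, size=299, mean=128.0, std=128.0):
        # Decode, resize, and normalize the image the way the model expects.
        img = tf.image.decode_jpeg(tf.read_file(path), channels=3)
        img = tf.expand_dims(tf.cast(img, tf.float32), 0)
        img = tf.image.resize_bilinear(img, [size, size])
        return tf.Session().run((img - mean) / std)

    graph = load_graph("inception_v3_frozen.pb")          # hypothetical file
    image = read_tensor_from_image_file("grace_hopper.jpg")
    with tf.Session(graph=graph) as sess:
        preds = sess.run("InceptionV3/Predictions/Reshape_1:0",  # assumed names
                         {"input:0": image})
    top5 = np.argsort(preds[0])[-5:][::-1]  # indices of the top-5 classes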

Object Detection

References: tensorflow/examples/multibox_detector

The TensorFlow C++ demo in the …/multibox_detector directory demonstrates object detection on images using a pre-trained Single Shot MultiBox Detector (SSD) model. The SSD model works by applying a convolutional network to extract image features, then predicting bounding boxes and confidences for multiple classes simultaneously using those features. It was trained end-to-end with a multi-task loss to optimize for classification and bounding box regression, allowing detection in one forward pass.

The main.cc file contains the main function to run object detection. It handles loading the SSD model graph with LoadGraph(), preprocessing input images with ReadTensorFromImageFile(), running inference on the model using a Session, and postprocessing the outputs with functions like GetTopDetections() and PrintTopDetections() to extract detections. DrawBox() directly modifies image pixels to visualize detections on the original image.

LoadGraph() loads the model graph definition containing the SSD network architecture and trained weights. ReadTensorFromImageFile() reads an image, resizes it and normalizes the pixels. GetTopDetections() runs a TopK operation to extract the highest scoring class IDs and scores. PrintTopDetections() decodes the locations outputs using priors, draws boxes on the image, and prints information.

The demo thus provides an end-to-end example of multi-class object detection on images with a pre-trained TensorFlow model from C++, built around the Session API and the LoadGraph(), ReadTensorFromImageFile(), GetTopDetections(), and PrintTopDetections() helpers.
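The top-k selection at the heart of GetTopDetections() can be sketched in Python; the score tensor shape and k are illustrative:

    import tensorflow as tf

    # Hypothetical per-box confidence scores from the SSD head: one score
    # for each of 784 candidate boxes.
    scores = tf.random.uniform([784])

    # Keep the k highest-scoring candidates and their box indices; the C++
    # demo uses a TopK op the same way before decoding box locations.
    top_scores, top_indices = tf.math.top_k(scores, k=5)

The location outputs for the surviving indices are then decoded against the priors and drawn onto the image.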

Transfer Learning

References: tensorflow/examples/image_retraining

The image_retraining example demonstrates transfer learning by fine-tuning pretrained models on new image datasets. It shows how to take a model pretrained on a large dataset like ImageNet and adapt it for a new classification task with fewer examples, such as identifying specific objects or scenes.

The main functionality is contained in the retrain.py script, which is documented in …/README.md. This script handles loading and preprocessing images, training a model via transfer learning, and exporting the trained model.

It loads a pretrained frozen graph and replaces the original classification layer with a new fully-connected layer sized to the new set of labels. Only this final layer is trained; all earlier layers are frozen so their weights are not updated during fine-tuning, allowing the model to retain its original learned features.

To keep training fast, the script caches the penultimate-layer activations ("bottlenecks") for each image and runs a session-based training loop that feeds batches of these values to the new layer, periodically evaluating accuracy on a validation split.

After training, the script exports the model so it can be loaded and used for inference. This demonstrates how transfer learning can adapt powerful pretrained models to new domains with a small amount of retraining.
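The same recipe can be sketched with modern Keras APIs. This is an equivalent illustration rather than the retrain.py implementation; the backbone, input size, and class count are assumptions:

    import tensorflow as tf

    # Pretrained backbone with its classification head removed; freezing it
    # means only the new final layer learns during fine-tuning.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, pooling="avg")
    base.trainable = False

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(5, activation="softmax"),  # 5 hypothetical classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=10)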

Audio Processing

References: tensorflow/examples/wav_to_spectrogram, tensorflow/examples/speech_commands

This section covers examples of audio processing tasks using TensorFlow contained in the …/wav_to_spectrogram and …/speech_commands directories. The examples demonstrate common techniques like loading audio data, extracting features, training models, and evaluating performance.

The …/wav_to_spectrogram directory contains an example that loads a wav audio file, computes the short-time Fourier transform (STFT) to generate a spectrogram representation, and saves the output as a PNG image. It uses the AudioSpectrogram op to calculate the STFT. The WavToSpectrogram function builds a TensorFlow graph that reads the wav, applies the STFT to generate the spectrogram, scales it, encodes it to PNG, and writes the output file.

The …/speech_commands directory contains various examples for speech recognition tasks using the Speech Commands dataset. It provides functionality for loading audio clips from disk in …/input_data.py and applying preprocessing like normalizing volume, time shifting clips, mixing with background noise, and computing Mel-frequency cepstral coefficients (MFCCs) or spectrograms with ops like audio_ops.mfcc(). The AudioProcessor class manages the loading, preprocessing, and partitioning of the audio data into training, validation, and test sets for use in models.

The directory also contains examples for training models on the preprocessed data. The models.py module implements common neural network architectures for speech recognition through functions such as create_conv_model() and create_single_fc_model(). The train.py script handles downloading and preprocessing the dataset, defining a model graph, running a training loop, and evaluating performance on validation and test sets.

Additional examples demonstrate labeling audio files with a trained model, calculating accuracy metrics by comparing predictions to ground truths, and evaluating models on continuous audio streams.
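The feature-extraction steps can be sketched with the public tf.signal API; the examples use internal audio ops, but the computation is analogous, and the frame sizes and bin counts below are assumptions:

    import tensorflow as tf

    # One second of hypothetical 16 kHz audio.
    audio = tf.random.normal([16000])

    # Short-time Fourier transform: 30 ms windows with a 10 ms hop.
    stft = tf.signal.stft(audio, frame_length=480, frame_step=160)
    spectrogram = tf.abs(stft)

    # Map the magnitude spectrogram onto 40 mel bins, then take MFCCs.
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=40,
        num_spectrogram_bins=spectrogram.shape[-1],
        sample_rate=16000,
        lower_edge_hertz=20.0,
        upper_edge_hertz=4000.0)
    mel = tf.tensordot(spectrogram, mel_matrix, 1)
    log_mel = tf.math.log(mel + 1e-6)
    mfccs = tf.signal.mfccs_from_log_mel_spectrograms(log_mel)[..., :13]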

Adding Operations

References: tensorflow/examples/adding_an_op

Adding new operations to TensorFlow allows extending its functionality to support custom machine learning models, data processing tasks, and more. Operations are the fundamental building blocks that TensorFlow graphs are composed of, so being able to define new operations is crucial for creating customized models and workflows.

The …/adding_an_op directory contains code that demonstrates different techniques for adding operations to TensorFlow from Python, C++ and other languages. A key class shown is OpKernel, which all new operation kernels should subclass. OpKernel defines the Compute() method that contains the core logic an operation implements. New operations are registered with TensorFlow using the REGISTER_OP macro, while kernels are registered for specific devices using REGISTER_KERNEL_BUILDER.

Some important files include:

  • zero_out_op_kernel_1.cc - Defines a ZeroOutOp kernel class that zeros out elements of a tensor except the first. Shows implementing an operation kernel in C++.

  • zero_out_op_1.py - Loads the ZeroOutOp kernel as a new zero_out operation from a shared object file. Demonstrates exposing a new op to Python.

  • fact_test.py - Contains a test case that exercises a user-defined operation, validating it can be used like built-in ops.

To define a new operation, developers typically subclass OpKernel to provide the CPU/GPU implementation, register the op with TensorFlow, then load and use it from Python. Op names can carry a namespace prefix to keep custom ops from colliding with built-in ones. The examples in this directory provide templates for adding ops through various languages and mechanisms.
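For instance, a compiled kernel can be loaded and called from Python roughly as follows, modeled on zero_out_op_1.py; the shared-object path is an assumption:

    import tensorflow as tf

    # Load the compiled kernel; TensorFlow generates a Python wrapper named
    # after the registered op (ZeroOut -> zero_out).
    zero_out_module = tf.load_op_library("./zero_out_op_kernel_1.so")

    # The op keeps the first element and zeros out the rest.
    print(zero_out_module.zero_out([[1, 2], [3, 4]]))
    # => [[1 0]
    #     [0 0]]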

Custom Operations

References: tensorflow/examples/custom_ops_doc

The custom_ops_doc examples demonstrate different techniques for building custom operations (ops) in TensorFlow. The …/multiplex_1 directory contains a basic example op called MultiplexDenseOp that performs element-wise selection similar to NumPy's np.where().

The MultiplexDenseOp class is defined in multiplex_1_kernel.cc. It inherits from the OpKernel class, which is the base class for TensorFlow operations. The Compute() method contains the main logic of the op. It first checks that the shapes of the input tensors match using OP_REQUIRES(). It then extracts the flat tensor representations using a template parameter T to support different data types like int32. A for loop directly iterates over each element to select the value from the first input (a_values) if the condition is true, or the second (b_values) if false.

The multiplex_1_op.cc file registers the Examples1>MultiplexDense op with TensorFlow using the REGISTER_OP macro. It supplies a shape function that uses the InferenceContext to merge the input shapes, asserting they all match; shape mismatches are therefore caught early in graph mode.

The multiplex_1_test.py file contains unit tests for the op. The MultiplexOpRank1Test class contains test methods like test_multiplex_int() that generate sample input tensors, calculate the expected output via NumPy, run the op, and assert the result matches the expectation. Tests validate functionality for different data types and shapes, and check the correct errors are raised for invalid inputs.
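The test pattern can be sketched as follows. Because the custom kernel must be compiled separately, this sketch substitutes the built-in tf.where(), which performs the same element-wise selection:

    import numpy as np
    import tensorflow as tf

    cond = np.array([True, False, True, False])
    a = np.array([1, 2, 3, 4], dtype=np.int32)
    b = np.array([10, 20, 30, 40], dtype=np.int32)

    # Expected output computed with NumPy, as in MultiplexOpRank1Test.
    expected = np.where(cond, a, b)  # [ 1 20  3 40]

    # The custom op would be invoked here; tf.where is its built-in analogue.
    result = tf.where(cond, a, b)
    np.testing.assert_array_equal(result.numpy(), expected)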

Continuous Integration

References: ci

TensorFlow's continuous integration (CI) system provides an automated way to test, build, and deploy changes to the library across different platforms and configurations. Scripts and jobs run whenever code is pushed, continuously validating the codebase. The main subdirectories that make up the CI system are:

…/official - Contains the primary scripts and utilities used to run TensorFlow's continuous integration jobs. It provides a standardized way to test TensorFlow across different environments by defining them in configuration files and running scripts in a controlled manner. Some key components include the …/any.sh script which can run tests, builds or other scripts within a TensorFlow CI environment. It handles setting up common environments and running commands either natively or inside Docker containers depending on the configuration. The …/wheel.sh script implements an end-to-end workflow to build, test and publish TensorFlow Python wheels.

…/utilities - Provides common functions and abstractions used across CI scripts like setting up environments and tools, running commands, parsing outputs, and cleaning up. It includes files like …/setup.sh which loads variables and defines the tfrun function to consistently execute code locally or in containers.

…/containers - Orchestrates building the Docker images used in CI from the TensorFlow codebase. It contains tools for building images that bundle TensorFlow, its dependencies, and utilities for development and testing tasks.

The system executes a standardized set of tests on each code change by reading configurations from files, setting up common environments via functions like tfrun, running scripts that orchestrate the build/test process, and conditionally executing steps based on environment variables. This provides an automated way to continuously validate TensorFlow across platforms and configurations in CI systems.

Scripts

References: ci/official, ci/official/wheel.sh

The main scripts that orchestrate the build, test and release pipeline for TensorFlow are …/wheel.sh and …/libtensorflow.sh. These top-level scripts define end-to-end workflows that are executed during continuous integration to build, test, and publish TensorFlow packages.

The …/wheel.sh script handles building, testing, and publishing TensorFlow Python wheels. It first sources utilities from …/setup.sh to configure the environment, then builds the pip package builder with Bazel and runs it to produce the wheels. It next runs the …/rename_and_verify_wheels.sh script to validate the built wheels and, if enabled, uploads them to PyPI and Google Cloud Storage. Finally, it runs Bazel tests against the built wheels.

The …/libtensorflow.sh script handles building, testing, and publishing libtensorflow packages for Linux. It sources setup scripts to configure environments before running bazel test and bazel build on libtensorflow. It then runs the …/repack_libtensorflow.sh script to repackage artifacts before uploading to Google Cloud Storage if enabled.

Utilities

References: ci/official/utilities

The utilities implemented across TensorFlow CI jobs address common tasks needed to set up, run, and clean up continuous integration processes. Key utility scripts are located in the …/utilities directory.

Some important utilities include:

  • The setup.sh script loads CI environment variables and defines the tfrun function for consistently running commands locally or in Docker. It handles Docker configuration and calls platform-specific setup scripts.

  • The setup_docker.sh script manages pulling, building, and running the "tf" Docker container that commands are executed inside. It redefines tfrun to run inside this container.

  • Platform-specific setup files like setup_macos.sh perform one-time tasks such as configuring Bazel directories, installing tools, and enabling uploads.

  • The cleanup_summary.sh file contains functions for extracting ResultStore URLs from logs and printing messages if extraction fails.

  • extract_resultstore_links.py defines utility functions for tasks like parsing build logs to extract metadata on Bazel invocations.

  • Cleanup utilities like cleanup_docker.sh provide instructions for removing the Docker container when tests complete.

The setup.sh script loads environment variables from the TFCI file if set. It defines the TFCI_GIT_DIR variable and handles variable precedence when loading values. The tfrun function centralizes command execution. Platform configuration like calling setup_macos.sh is done conditionally based on OS. An EXIT trap runs cleanup functions.

The setup_docker.sh script manages the Docker image using variables to determine if it should pull, rebuild, or upload. It runs Docker commands and defines tfrun to execute within the "tf" container. The container is run interactively if needed.
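The dispatch performed by tfrun can be illustrated in Python; the real implementation is a Bash function, and the environment-variable name here is an assumption:

    import os
    import subprocess

    def tfrun(*cmd):
        # Run a command natively, or inside the long-lived "tf" container
        # when Docker execution is enabled (hypothetical flag name).
        if os.environ.get("TFCI_DOCKER_ENABLE") == "1":
            cmd = ("docker", "exec", "tf") + cmd
        subprocess.run(cmd, check=True)

    tfrun("bazel", "test", "//tensorflow/python/...")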

Containers

References: ci/official/containers

The …/containers directory contains Docker container definitions and build tools for TensorFlow continuous integration and releases.

The …/linux_arm64 subdirectory focuses on building Docker images for the Linux ARM64 architecture. The build.sh script handles building Docker images for different targets, setting tags, and pushing images to Google Container Registry. It builds images using Dockerfiles, setting build arguments and targets based on requirements.

The …/builder.* subdirectories contain tools and scripts for building dependencies during the container build process. The builder.devtoolset directory builds a cross-compilation GCC toolchain targeting manylinux2014 on ARM64. It downloads and builds glibc 2.17 and libstdc++ 4.8, then configures and builds GCC to target this environment. The builder.patchelf directory builds the patchelf binary patching tool from source using build_patchelf.sh, ensuring it is available later in the build.

The …/devel.usertools directory provides utilities for common development tasks on ARM64 like get_test_list.sh to get Bazel test names, rename_and_verify_wheels.sh to check wheels, and squash_testlogs.py which uses JUnitXml to merge test results. The setup_venv_test.sh script sets up Python virtual environments for testing.

Requirements

References: ci/official/requirements_updater

The …/requirements_updater directory handles managing Python dependency requirements files across TensorFlow versions. It utilizes the pip-compile tool to compile a base requirements file into locked requirements files for specific Python versions supported by TensorFlow.

The main scripts that drive the requirements updating process are:

  • …/updater.sh, which runs pip-compile via a Bazel target to generate locked requirements files for each version. It defines the list of supported versions in SUPPORTED_VERSIONS and loops through each one to compile requirements.

  • …/release_updater.sh, which updates the locked requirements files when the base dependencies change. It ensures consistency across TensorFlow releases by re-running the compilation on each supported version.

The key Bazel rule is //:requirements_"$VERSION".update, which invokes pip-compile to generate a locked requirements file for a given Python version; a sketch of the loop that drives it appears after the summary list below.

By leveraging pip-compile and Bazel, this directory provides an automated process for managing TensorFlow's Python dependencies across versions through:

  • Defining supported versions

  • Compiling requirements into locked files

  • Ensuring consistency by re-compiling on base dependency changes

  • Cleaning up generated lock files
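Rendered in Python for illustration (the real updater is a Bash script, and the version list here is illustrative), the core loop looks like:

    import subprocess

    SUPPORTED_VERSIONS = ["3_9", "3_10", "3_11", "3_12"]  # illustrative

    for version in SUPPORTED_VERSIONS:
        # Each Bazel target runs pip-compile and writes a locked
        # requirements file for that Python version.
        subprocess.run(
            ["bazel", "run", f"//:requirements_{version}.update"],
            check=True,
        )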

Tests

References: ci/official/wheel_test

The main tests related to continuous integration are those verifying that the TensorFlow API packages can be imported from built wheel files. This is done through the test_import_api_packages test defined in the …/test_import_api_packages.py file.

The ImportApiPackagesTest class is the primary way of checking importability. It loads the list of API packages from the _api/v2/api_packages.txt file included in the wheel. Some packages known to not directly map to importable modules are skipped.

The main test logic is in the test_import_runtime() method, which loops over each non-skipped package name and imports it with __import__(), catching any failures. Failures are logged, and the test fails at the end if any package could not be imported.
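A hedged sketch of that loop, with illustrative names:

    import logging

    def check_imports(package_names, skip=()):
        # Try to import every listed API package, collecting the failures.
        failures = []
        for name in package_names:
            if name in skip:
                continue
            try:
                __import__(name)
            except ImportError as err:
                logging.error("could not import %s: %s", name, err)
                failures.append(name)
        return failures

    # The real test reads the names from the _api/v2/api_packages.txt file
    # shipped inside the wheel.
    assert not check_imports(["tensorflow", "tensorflow.dtypes"])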

Another important piece is the …/update_requirements.sh script. It takes the wheel file and Python version as arguments. This script generates an input requirements file then runs a Bazel target to compile an updated lock file mapping dependencies to exact versions. This lock file is later used to resolve dependencies for the hermetic Python environments used by the import test.