Auto Wiki

Auto-generated from keras-team/keras by Auto Wiki

GitHub Repository
Written in: Python
Watchers: 1.9k
Last updated: 2023-12-28
License: Apache License 2.0
Generated at: 2023-12-28
Generated from: commit 8e897f

Keras is a high-level deep learning API that provides building blocks for developing and training neural networks. It runs on top of lower-level frameworks such as TensorFlow, PyTorch, and JAX.

Keras uses an object-oriented approach, with base classes like Layer and Model that provide common functionality and subclasses that implement specific logic. The layers, models, losses, and other components integrate seamlessly, enabling neural network models to be developed quickly.


References: keras/models

The Keras model classes Sequential, Functional, and Model provide the core building blocks for constructing neural networks. Sequential defines a linear stack of layers, making it simple to build basic models by stacking layers one after another. Under the hood, Sequential delegates to an underlying Functional model for more complex operations.

Functional represents models as directed graphs of layers, allowing arbitrary connections between inputs and outputs. Its __init__() method initializes the model from input and output tensors, while call() runs the model by passing inputs through the graph. Properties like layers and methods like get_config() provide information on the model structure and enable serialization.

The Model class acts as the base for all Keras models. It inherits from Layer to make models layers themselves that can be connected, and Trainer to add common training methods. Model handles initialization of Functional and subclassing models, building from configuration, I/O via save() and load_weights(), and printing summaries with summary().

Sequential provides a simple API for adding layers with add() while utilizing an underlying Functional model as needed. Its call() method applies each layer sequentially. Functional represents the most flexible approach by allowing arbitrary connections specified during initialization.

Together these classes define the core Keras abstractions for constructing models programmatically through sequential stacking, graphs, or subclassing and enable common functionality like training, serialization, and inference.
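The three construction styles described above can be sketched as follows (a minimal, illustrative example; the layer sizes are arbitrary):

```python
import numpy as np
import keras
from keras import layers

# 1) Sequential: a linear stack of layers.
seq = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(2),
])

# 2) Functional: an explicit graph from input tensors to output tensors.
inp = keras.Input(shape=(4,))
h = layers.Dense(8, activation="relu")(inp)
out = layers.Dense(2)(h)
func = keras.Model(inputs=inp, outputs=out)

# 3) Subclassing: override call() to define the forward pass directly.
class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.d1 = layers.Dense(8, activation="relu")
        self.d2 = layers.Dense(2)

    def call(self, inputs):
        return self.d2(self.d1(inputs))

x = np.zeros((3, 4), dtype="float32")
print(seq(x).shape, func(x).shape, MyModel()(x).shape)  # each has shape (3, 2)
```

All three models expose the same training and inference interface inherited from Model, so the choice of style is about how the architecture is expressed, not what the model can do.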

Model Base Class

References: keras/models/

The Model class acts as the base class for all Keras models. It inherits from both the Layer class and the Trainer class. Inheriting from Layer allows models to be treated as layers themselves that can be connected together in Keras' functional API. Inheriting from Trainer adds common training and inference methods like fit(), predict(), etc.

The __init__() method detects if the model is being initialized as a Functional model or a subclassing model and initializes it appropriately by calling either the Layer or Functional __init__().

The call() method raises an error in the base class, as subclasses must override it to define the model's forward pass. Core methods like fit(), predict(), etc. are passed through to the Trainer implementation to reuse common training logic.

The build() method constructs the model layers. It has special logic to support building from a configuration instead of arguments, allowing models to be defined programmatically without instantiating layers.

The save() and load_weights() methods handle checkpointing the full model weights and architecture or just the weights respectively. Under the hood they use Keras' saving library APIs.

The summary() method prints a text summary of the model using the summary utils. It has options like line_length to customize the output for terminals of different widths.

By providing a common base with standard methods, the Model class allows Keras models to be initialized flexibly and trained/evaluated while abstracting away lower level details. It provides the foundation for Keras' modular approach to building deep learning models.

Sequential Models

References: keras/models/

The Sequential class represents a linear stack of layers and provides a simple interface for sequentially adding layers to a model. It handles the ordering of layers and underlying functionality like input/output shape inference. Sequential inherits from the base Model class.

Layers can be added to the model sequentially using the add() method, which validates the layer and adds it to the internal _layers list. This list is used to track the order of layers. add() will also call the _maybe_rebuild() method to reconstruct the underlying Functional model if needed.

The build() method constructs this underlying Functional model, which is stored in the _functional attribute. It takes the first layer as input, applies each subsequent layer in turn, and sets the resulting tensor as the output. This Functional model is then used for the call() method and other operations.

The call() method has two modes. It will either delegate to the underlying Functional model, or directly apply each layer in sequence if the model has not yet been built or if the inputs are not supported by _functional.

The layers property returns only the layers added by the user, filtering out the automatically generated InputLayer.

The get_config() method serializes the layer configurations for saving and loading models. The from_config() deserializes and recreates the model from this saved configuration.

In summary, Sequential provides a simple linear interface while utilizing the more complex Functional model under the hood when needed. It handles ordering layers, building the model, and serializing the configuration.
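A short sketch of the add() workflow (assuming an input shape is supplied so the model can build eagerly):

```python
import keras
from keras import layers

model = keras.Sequential()
model.add(keras.Input(shape=(10,)))        # becomes an InputLayer internally
model.add(layers.Dense(16, activation="relu"))
model.add(layers.Dense(1))

# The `layers` property filters out the automatically generated InputLayer.
print(len(model.layers))  # 2
```

Because the input shape is known from the start, each add() can rebuild the underlying Functional model immediately; without an Input, building is deferred until the model is first called on data.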

Functional Models

References: keras/models/

The Functional class represents a Keras model defined as a directed graph of layers. It inherits from the Function and Model classes. The __init__ method initializes a functional model from input and output tensors, validating that the inputs and outputs are Keras tensors.

The call method defines how the model runs on input data by standardizing the inputs and running them through the graph of layers via _run_through_graph(). It supports passing training arguments. _standardize_inputs processes model inputs for calling, converting, adjusting ranks, and adding metadata. _flatten_to_reference_inputs flattens nested inputs.

Properties like input_shape, output_shape and layers provide information about the model structure. Methods like compute_output_spec also define the model behavior. The get_config method serializes the model configuration including layers, nodes, and input/output mappings. functional_from_config deserializes a model from this configuration.

Within the forward pass, _run_through_graph() applies each layer in topological order, feeding each layer's output to the next, while _standardize_inputs converts the model inputs to tensors and adjusts ranks and metadata as needed. The layers property stores the graph's layers as a list.
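For instance, a small two-input functional graph (an illustrative sketch; the shapes are arbitrary):

```python
import numpy as np
import keras
from keras import layers

# Two named inputs merged into a single output: an arbitrary graph, not a stack.
a = keras.Input(shape=(8,), name="a")
b = keras.Input(shape=(8,), name="b")
merged = layers.Concatenate()([a, b])
out = layers.Dense(1)(merged)
model = keras.Model(inputs=[a, b], outputs=out)

# The call() method standardizes these inputs and runs them through the graph.
y = model([np.zeros((2, 8), "float32"), np.ones((2, 8), "float32")])
print(y.shape)  # (2, 1)
```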

Model Utilities

References: keras/models/

The …/ file contains utilities for cloning Keras models. It allows creating a new model instance with the same architecture and layers as an existing model, but with newly initialized weights. This is useful for tasks like model ensembling where you want multiple similar models.

The main functionality is contained in the clone_model() function. This function checks if the input model is a Sequential or Functional model and dispatches the cloning logic accordingly. For Sequential models it calls _clone_sequential_model(), and for Functional models it calls _clone_functional_model().

For subclassed models, it will serialize and deserialize the model configuration to recreate the model instance. Custom clone functions and input tensors are not supported for subclassed models.

_clone_sequential_model() iterates through the layers of the input Sequential model and clones each layer using the provided clone_function. It also handles cloning the input layer if needed.

_clone_functional_model() recursively runs the model to clone each layer. It checks the input tensors, and if not provided it will create new placeholder input tensors. It runs the model graph while applying the clone_function to each layer to clone it.
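A minimal usage sketch of clone_model():

```python
import keras
from keras import layers

base = keras.Sequential([keras.Input(shape=(4,)), layers.Dense(2)])

# clone_model() copies the architecture but re-initializes the weights,
# which is what you want when building an ensemble of similar models.
clone = keras.models.clone_model(base)
print([tuple(w.shape) for w in clone.get_weights()])  # same shapes as `base`
```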

Model Testing

References: keras/models/, keras/models/

This section covers the test cases that validate the behavior of Keras models. The main classes for model testing are SequentialTest and FunctionalTest, which inherit from testing.TestCase.

SequentialTest contains methods that exercise different functionality of the Sequential model. Some examples include:

  • Adding layers with and without specifying input shapes using Input and direct specification
  • Building and calling models on both symbolic and eager tensors
  • Serialization of built and unbuilt models
  • Checking for errors like adding duplicate layers

FunctionalTest contains methods for testing the Functional model class. Some examples include:

  • Building models with multiple inputs and outputs
  • Passing scalar and tensor inputs
  • Calling models eagerly and symbolically
  • Input and output specifications
  • Passing inputs by name
  • Serialization

Both classes leverage functionality from the Keras and TensorFlow Python APIs like backend.KerasTensor. The tests validate correct behavior, ensure expected errors are raised, and cover major use cases.


Overall the test cases provide comprehensive validation of core model functionality through different scenarios. This ensures the expected interfaces are met and implementations function properly.


References: keras/layers

The …/layers directory contains implementations of commonly used neural network layers that serve as basic building blocks for constructing models. These layers implement fundamental operations like dense connections, convolutions, recurrent connections, and more that are used widely across different model types and domains. The layers define clean Keras interfaces while delegating the actual computations to lower-level frameworks like TensorFlow wherever possible. This separation of concerns allows new layers to be easily added without having to reimplement core logic.

The key layers provided include Dense for dense connections, Conv1D/2D/3D for convolutions, LSTM/GRU/SimpleRNN for recurrent layers, BatchNormalization for normalization, and Activation layers for applying functions like ReLU. These layers have been optimized and tested extensively to ensure they meet expectations. The …/ file centralizes access to all layers by importing and re-exporting them from their respective sub-modules, providing a single namespace.

Now discussing some important classes and files in more detail:

The …/core directory contains fundamental layer types. The Dense layer in …/ implements the core dense connection between inputs and weights through matrix multiplication in its call() method.

The …/convolutional directory provides convolutional layers. The Conv2D layer in …/ inherits from the BaseConv layer, which defines common convolutional logic in call(). It overrides __init__() to set hyperparameters and calls the parent initialization.

The …/rnn directory contains recurrent layers. The LSTM class in …/ implements the core LSTM cell recurrence in its call() method based on the standard equations. The RNN layer in …/ handles running RNN cells on input sequences via inner_loop().

The …/normalization directory implements normalization layers. The BatchNormalization layer in …/ maintains moving averages of mean and variance to consistently normalize inputs during training and inference.

The …/activations directory contains activation layers. The Activation layer in …/ simply applies the given activation function to inputs in its call() method.

Core Layers

References: keras/layers/core

The core layers in Keras implement fundamental neural network building blocks like dense connections, embeddings, and input placeholders. These layers form the basic building blocks that can be combined to construct complex models.

The …/core directory contains implementations of common layer types. The Dense layer in …/ handles dense connections through matrix multiplication. Its call() method performs the core computation of multiplying the inputs by the kernel weights. The Embedding layer in …/ maps integers to dense vectors. It uses ops.take() in its call() method to extract embedding vectors from the weights matrix.

The InputLayer class in …/ is used to define input tensors for models. Its main responsibilities are constructing Keras tensors from arguments like shape and dtype, storing the input tensor, and registering it as the layer output. The Input function provides a cleaner API than using InputLayer directly. The Identity layer in …/ simply returns its input, preserving properties. This allows inserting identity layers without affecting computation.

The Lambda layer in …/ allows arbitrary Python functions to be used as layers. It implements layer methods like call() to wrap functions as layers. The Masking layer in …/ handles masking timesteps. Its compute_mask() method generates masks by checking for equality with the mask_value. The EinsumDense layer in …/ performs dense connections using Einstein summation notation, supporting arbitrary input dimensionality through careful analysis of the provided equation string.
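A brief sketch of two of these layers in use:

```python
import numpy as np
import keras
from keras import layers

# Embedding looks up a dense vector for each integer id (ops.take under the hood).
emb = layers.Embedding(input_dim=100, output_dim=8)
vecs = emb(np.array([[3, 7, 7]]))
print(vecs.shape)  # (1, 3, 8): one 8-dim vector per id

# Lambda wraps an arbitrary Python function as a layer.
double = layers.Lambda(lambda t: t * 2.0)
out = double(np.ones((2, 2), dtype="float32"))
```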

Convolutional Layers

References: keras/layers/convolutional

The …/convolutional directory contains implementations of common convolutional operations as Keras layers. It provides classes for standard convolutions like Conv1D, Conv2D, Conv3D, as well as transposed/fractionally-strided convolutions with Conv1DTranspose, Conv2DTranspose, Conv3DTranspose. Depthwise convolutions are implemented in DepthwiseConv1D and DepthwiseConv2D. Separable convolutions are supported via SeparableConv1D and SeparableConv2D.

The core convolutional logic is defined in BaseConv, which serves as the parent class for standard convolution layers. It handles tasks like input validation, weight initialization, and computing the output shape. Child classes like Conv2D inherit this functionality while implementing the specific convolution operation.

DepthwiseConv2D inherits from BaseDepthwiseConv and performs depthwise separable convolutions. It splits the input into channels, applies a separate depthwise kernel to each channel, and concatenates the results.

SeparableConv2D leverages BaseSeparableConv to optimize separable convolutions. BaseSeparableConv contains the logic to first apply a depthwise convolution followed by a pointwise convolution in a single step.
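For example, the output-shape computation handled by BaseConv can be seen directly: a 3×3 convolution with "valid" padding shrinks each spatial dimension by kernel_size - 1.

```python
import numpy as np
import keras
from keras import layers

conv = layers.Conv2D(filters=4, kernel_size=3, padding="valid")
out = conv(np.zeros((1, 8, 8, 3), dtype="float32"))
print(out.shape)  # (1, 6, 6, 4): 8 - (3 - 1) = 6 per spatial dim, 4 filters
```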

Comprehensive tests validate these layers against NumPy reference implementations and across a range of arguments.

Recurrent Layers

References: keras/layers/rnn

The core RNN layers in Keras are implemented in the …/rnn directory. This includes commonly used RNN cell types like LSTM, GRU, and SimpleRNN as well as variants like ConvLSTM. Each layer type has its own class that inherits from the base RNN class.

The RNN class in …/ serves as the base class for all RNN layers. It handles running the input sequence through an RNN cell and returning the outputs. The RNN class takes a cell or list of cells as its first argument. Its key methods include __init__(), compute_output_shape(), build(), call(), and inner_loop(). inner_loop() uses the Keras backend RNN op for the core RNN computation. The RNN layer relies on the cell implementing the call() method and having state_size and output_size attributes. It supports stacked RNNs through the StackedRNNCells wrapper class and handles statefulness through trainable state tensors.

The LSTM class is defined in …/ It uses the LSTMCell class which implements the core LSTM cell logic in its call() method. LSTMCell computes the gates and cell state using the standard LSTM equations. It has two implementations for efficiency - one that splits computations and one that fuses them. The LSTM class handles actually running the RNN by calling LSTMCell on each timestep using inner_loop(). It integrates optimized backend implementations like CuDNN.

The GRU class is defined in a similar way in …/ The GRUCell class contains the core GRU cell logic, computing gates and updating the hidden state. The GRU class wraps GRUCell to apply it to input sequences as a full layer.

The SimpleRNN layer defined in …/ uses the SimpleRNNCell class which defines a basic RNN cell. SimpleRNN handles running the cell on full input sequences.

Variants like ConvLSTM defined in …/ combine convolutions with standard LSTM computations, applying convolutional operations for both input and recurrent transformations.
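The shapes involved in running an RNN cell over a sequence can be illustrated briefly (sizes here are arbitrary):

```python
import numpy as np
import keras
from keras import layers

# A batch of 2 sequences, 5 timesteps each, 3 features per timestep.
lstm = layers.LSTM(16, return_sequences=True, return_state=True)
seq_out, h, c = lstm(np.zeros((2, 5, 3), dtype="float32"))
print(seq_out.shape, h.shape, c.shape)
# seq_out: (2, 5, 16) -- one output per timestep
# h, c:    (2, 16)    -- final hidden and cell states
```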

Normalization Layers

References: keras/layers/normalization

The Keras layers in the …/normalization directory implement various normalization techniques that can be applied to inputs. Normalization is useful for stabilizing the learning process and making the optimization problem easier to solve. The key layers are:

The BatchNormalization layer implements batch normalization through the following steps: in the build() method it adds trainable gamma and beta weights, and in call() it computes mean/variance of the current batch or uses moving averages from training, applies the normalization formula, and scales and shifts the values using gamma and beta.

LayerNormalization normalizes over the specified axis by computing mean and variance with ops.moments(), applying the normalization formula with options for gamma and beta, and supporting masking.

GroupNormalization reshapes inputs into groups, computes per-group stats with _apply_normalization(), applies the formula, and reshapes back, with options like gamma and beta. It generalizes layer and instance normalization.

UnitNormalization calculates the L2 norm across axes in call() with ops.sum() and ops.rsqrt(), and multiplies the inputs by the inverse normalization values to implement L2 normalization.
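The normalization formula these layers share can be sketched in NumPy. Note that batch_norm below is a hypothetical helper showing the inference-time computation, not the Keras implementation, which additionally tracks moving averages and learns gamma and beta:

```python
import numpy as np

def batch_norm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-3):
    # Normalize per feature, then scale and shift.
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 4))
out = batch_norm(x, x.mean(axis=0), x.var(axis=0))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 mean, ~unit std
```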

Thorough unit tests validate these layers, including correctness tests that pass random inputs.

Activation Layers

References: keras/layers/activations

The Keras layers in the …/activations directory implement common activation functions as reusable Keras layers. This allows different activations to be easily added to models during construction.

The directory contains layer classes for activations like ReLU, LeakyReLU, PReLU, ELU, and Softmax. Each layer class inherits from the base Layer class and focuses the call() method on applying the activation function. This delegates other responsibilities like output shape inference to Keras.

The ReLU layer in …/ applies the activations.relu function to its inputs in call(). It optionally takes hyperparameters like max_value and negative_slope.

The LeakyReLU layer in …/ applies activations.leaky_relu in its call() method, taking a negative_slope parameter.

The PReLU layer in …/ learns the alpha parameter, setting its shape in build() based on shared_axes. Its call() method calculates PReLU directly from inputs and the alpha weight.

The ELU layer in …/ applies the activations.elu function in its call() method, optionally taking an alpha parameter.

Thorough unit tests in files like …/ verify the implementations match specifications and expectations for serialization, input handling, and error cases.

Pooling Layers

References: keras/layers/pooling

Pooling layers downsample inputs spatially to reduce the number of parameters and computations in convolutional networks. The MaxPooling2D and AveragePooling2D layers are commonly used for this purpose. MaxPooling2D performs 2D max pooling on input tensors, taking the maximum value in each pooling window. AveragePooling2D similarly pools inputs by taking the average value in each window. Both layers inherit core pooling logic from the BasePooling layer defined in …/

BasePooling handles aspects like parameter validation, padding, and output shape computation that are shared between pooling layers. It implements the core call() method which performs the actual pooling operation by calling tf.nn.max_pool() or tf.nn.avg_pool() based on the pool_mode. Subclasses like MaxPooling2D and AveragePooling2D initialize the layer parameters and specify the pool_mode of "max" or "average" respectively.

The MaxPooling2D and AveragePooling2D layers downsample inputs by applying the specified pooling operation over windows defined by pool_size and shifted by strides. They support both "valid" and "same" padding modes, which determine how the input is padded before pooling and thus the output shape. Thorough tests defined in files like …/ and …/ validate the correctness of these layer implementations.
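The windowing behavior is easy to see on a small grid (an illustrative sketch):

```python
import numpy as np
import keras
from keras import layers

# A 4x4 single-channel input holding the values 0..15 row by row.
x = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)

pool = layers.MaxPooling2D(pool_size=2, strides=2, padding="valid")
out = pool(x)
print(np.array(out).reshape(2, 2))
# Each output is the max of one non-overlapping 2x2 window:
# [[ 5.  7.]
#  [13. 15.]]
```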

Regularization Layers

References: keras/layers/regularization

The Dropout, GaussianDropout, SpatialDropout1D/2D/3D, and ActivityRegularization layers implement various regularization techniques to help prevent overfitting during neural network training.

Dropout randomly sets input units to zero during training based on a given dropout rate. This disrupts co-adaptations on the training data and forces units to learn more robust representations. The core dropout logic is implemented in the layer's call() method. GaussianDropout applies dropout by multiplying inputs by factors drawn from a Gaussian distribution. SpatialDropout1D/2D/3D drop entire feature maps rather than individual values, helping regularize spatial structure in early convolutional layers.

The ActivityRegularization layer allows easily adding L1 and L2 regularization to the activations of another layer during training. In its __init__() method, it sets the layer's activity_regularizer attribute to an instance of regularizers.L1L2 using provided L1 and L2 factors. This causes the regularization penalties on the layer's activations to be added to the overall loss. The layer's call() method simply returns its input, ensuring it is applied without changing the model architecture but still applies regularization. Together, these layers provide a variety of tools to prevent overfitting through regularization.
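The training-versus-inference behavior of dropout can be sketched directly (a minimal example; the seed is arbitrary):

```python
import numpy as np
import keras
from keras import layers

drop = layers.Dropout(rate=0.5, seed=0)
x = np.ones((1, 8), dtype="float32")

# At inference (training=False) dropout is the identity.
infer = np.array(drop(x, training=False))

# During training, units are zeroed at the given rate and the kept units
# are scaled by 1 / (1 - rate) so the expected activation is unchanged.
train = np.array(drop(x, training=True))
print(infer)  # all ones
print(train)  # a mix of 0.0 and 2.0
```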

Attention Layers

References: keras/layers/attention

The …/attention directory implements various attention mechanisms for Keras. The core Attention class in …/ handles dot-product attention.

Thorough unit tests validate the attention implementations and calculations. The layers provide clean modular implementations of important attention mechanisms.
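A short sketch of the Attention layer on dummy tensors (shapes are arbitrary; when no key is given, the value tensor doubles as the key):

```python
import numpy as np
import keras
from keras import layers

rng = np.random.default_rng(0)
query = rng.standard_normal((2, 4, 8)).astype("float32")  # (batch, Tq, dim)
value = rng.standard_normal((2, 6, 8)).astype("float32")  # (batch, Tv, dim)

# Dot-product attention: scores = query . key^T, softmaxed over Tv,
# then used to take a weighted sum of the values.
out = layers.Attention()([query, value])
print(out.shape)  # (2, 4, 8): one attended vector per query position
```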

Preprocessing Layers

References: keras/layers/preprocessing

The Keras preprocessing layers provide a set of tools for common feature engineering and preprocessing tasks in deep learning models. These layers allow features to be normalized, discretized, encoded, and transformed before being fed into models. They handle common tasks like:

  • Normalization: The Normalization layer centers and scales features to have zero mean and unit variance. It can learn normalization statistics from data via adapt() or accept precomputed values.

  • Discretization: The Discretization layer buckets continuous features into discrete bins based on learned quantiles or predefined boundaries. It supports different output_mode encodings.

  • Encoding: The CategoryEncoding layer encodes categorical integer features into one-hot, multi-hot or count representations. The Hashing layer maps features to an integer hash space.

  • Transformations: Layers like FeatureSpace and TextVectorization allow complex feature engineering by applying combinations of preprocessing techniques. FeatureSpace handles normalization, discretization, crossings and outputs features in various formats. TextVectorization handles common NLP tasks like tokenization and vocabulary indexing.

These layers provide a consistent Keras interface for feature preprocessing tasks. Their adapt() methods allow statistics and vocabularies to be learned from training data. They produce outputs compatible with deep learning models and support TensorFlow data pipelines.

The core classes implementing these techniques include Normalization, Discretization, CategoryEncoding, Hashing, FeatureSpace, and TextVectorization. Key methods include their __init__(), adapt(), call(), and get_config() methods which handle initialization, statistic learning, core logic, and serialization respectively. Algorithms like quantile binning and hashing are implemented via lower level TensorFlow functions. The layers provide a clean high-level API for feature engineering in Keras models.
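The adapt() workflow can be sketched with the Normalization layer (a minimal example with hand-checkable numbers):

```python
import numpy as np
import keras
from keras import layers

data = np.array([[0.0], [2.0], [4.0]], dtype="float32")

norm = layers.Normalization()
norm.adapt(data)  # learns mean = 2 and variance = 8/3 from the data

out = np.array(norm(data))
print(out.ravel().round(3))  # roughly [-1.225, 0., 1.225]
```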

Merging Layers

References: keras/layers/merging

The Keras layers for merging multiple inputs implement common elementwise operations like addition, multiplication, concatenation, and more. These layers take two or more input tensors and combine them through elementwise functions to produce a single output tensor.

The core layers are Add, Multiply, Concatenate, and Dot. The Add layer implements elementwise addition by overriding the _merge_function in the Merge base class. It sequentially adds the input tensors using ops.add(). Multiply similarly overrides _merge_function to multiply the inputs together with ops.multiply(). Concatenate validates shapes and concatenates inputs along an axis with ops.concatenate(). Dot performs dot products along configurable axes via batch_dot().

Other layers implement minimum, maximum, average, subtract, and more. For example, Minimum finds elementwise minimum values by setting the first input as the initial output and recursively taking the minimum of each subsequent input with ops.minimum(). Average calculates averages by adding inputs with ops.add() and dividing by the count.

All merging layers inherit from the Merge base class in …/ This handles common functionality like input validation, broadcasting, and masking. Individual layers override _merge_function to apply the specific TensorFlow operation.

The …/ file contains comprehensive tests for the layers. It defines test parameters and runs correctness, error, and basic checks on each layer. These tests validate the key merging operations and error cases.
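The elementwise semantics of the merging layers can be seen on small inputs:

```python
import numpy as np
import keras
from keras import layers

a = np.ones((2, 3), dtype="float32")
b = np.full((2, 3), 2.0, dtype="float32")

added = layers.Add()([a, b])                  # elementwise sum: 3.0 everywhere
cat = layers.Concatenate(axis=-1)([a, b])     # joined along the last axis

print(np.array(added))
print(cat.shape)  # (2, 6)
```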

Reshaping Layers

References: keras/layers/reshaping

The Keras layers in the …/reshaping directory provide common reshaping operations that can modify the dimensions or structure of input tensors. This includes layers like Flatten, Reshape, and Cropping which are important for preprocessing data.

The Flatten layer takes a tensor of any dimensions and reshapes it into a 2D tensor by flattening all dimensions except the batch dimension. This is useful for converting convolutional or recurrent outputs into dense inputs. The Flatten layer preserves the batch size and handles different data formats like 'channels_first' and 'channels_last'. It uses ops.reshape() to perform the flattening based on the computed output shape.

The Reshape layer reshapes the input tensor into a target shape specified during initialization. It implements the compute_output_shape() method to determine the output shape based on the input shape and target shape. The Reshape layer resolves target shapes containing -1 dimensions by replacing them with inferred sizes. In its build() method, it stores the resolved target shape which is then used in call() with ops.reshape() to perform the actual reshaping.

The Cropping layers allow cropping portions of the input tensor along certain axes. For example, Cropping1D performs 1D cropping on the temporal dimension, Cropping2D crops spatial dimensions of images, and Cropping3D crops volumetric data. They take cropping parameters that specify how many elements to remove from each edge. The cropping logic is implemented in the call() method, which slices the inputs accordingly based on the cropping amounts and data format. compute_output_shape() calculates the output shape after cropping.

The Cropping layers support different cropping configurations, such as asymmetric cropping amounts on each side, the same cropping on all sides, and cropping different axes by different amounts. They validate that cropping values are within the bounds of the input dimensions. Unit tests thoroughly validate the behavior of these layers under different arguments and configurations.

The Reshape and Flatten layers provide simple ways to modify tensor dimensions with minimal preprocessing code. The Cropping layers allow removing unwanted edge elements from inputs. Together these layers implement common reshaping operations as reusable Keras components.
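These shape transformations can be sketched on a dummy tensor:

```python
import numpy as np
import keras
from keras import layers

x = np.zeros((2, 3, 4), dtype="float32")  # (batch, steps, features)

print(layers.Flatten()(x).shape)     # (2, 12): all but the batch dim collapsed
print(layers.Reshape((4, 3))(x).shape)  # (2, 4, 3): same element count, new layout
print(layers.Cropping1D(cropping=(1, 1))(x).shape)  # (2, 1, 4): one step trimmed per side
```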


References: keras/callbacks

Callbacks allow customizing model training by hooking into different stages of the process. Key stages include the start and end of epochs, batches, training, validation, and prediction. The Keras callback system handles this through callback classes that inherit from the base Callback class.

Callbacks implement methods like on_train_begin(), on_epoch_end(), and on_batch_end() to run custom code at these points. This allows behaviors like monitoring metrics, early stopping, model checkpointing, and progress logging. Callbacks are grouped into a CallbackList which ensures all callbacks are properly called at each stage.

The core Callback class defines the callback interface and empty method implementations subclasses can override. It has attributes to store the model and training parameters set via set_model() and set_params().

Many common callbacks are provided in Keras. The History callback automatically records metrics after each epoch into a history dictionary accessible on the model. It overrides on_train_begin() to initialize storage and on_epoch_end() to append results.

The ProgbarLogger prints metrics and progress to stdout using a progress bar. It implements callback methods like on_train_batch_end() to update the bar. The CSVLogger similarly logs to a CSV file by overriding on_epoch_end().

The TensorBoard callback handles logging metrics, weights, and graphs to TensorBoard. It manages summary writers and invokes logging callbacks at different points via methods like on_epoch_end().

The ModelCheckpoint callback saves models or weights periodically using options like save_best_only. It overrides on_train_batch_end() and on_epoch_end() to handle checkpointing. The EarlyStopping callback stops training when a monitored metric stops improving by checking for improvement in on_epoch_end().

The ReduceLROnPlateau callback reduces the learning rate when a metric stops improving. It overrides on_epoch_end() to check for improvement and call the optimizer if needed.
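A minimal custom callback shows the hook pattern described above (LossHistory is an illustrative example, not a built-in):

```python
import numpy as np
import keras

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        # `logs` carries the metrics computed for the finished epoch.
        self.losses.append(logs.get("loss"))

model = keras.Sequential([keras.Input(shape=(2,)), keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

cb = LossHistory()
model.fit(np.zeros((4, 2)), np.zeros((4, 1)), epochs=2, verbose=0, callbacks=[cb])
print(len(cb.losses))  # 2: one entry per epoch
```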

Callback Base Class

References: keras

The Callback base class is defined in …/ It provides a common interface that all Keras callbacks must implement. The key aspects of the Callback class are:

  • It is the root of the Keras callback hierarchy; every built-in and custom callback subclasses it.

  • Empty method implementations are provided for all callback hook points like on_train_begin(), on_epoch_begin(), etc. This defines the expected callback API.

  • Subclasses can override any specific hook methods to insert custom logic. For example, on_epoch_end() to evaluate metrics after each epoch.

  • Metrics from the last batch or epoch are passed into each hook via the logs dictionary.

  • The params attribute allows callbacks to access the training configuration parameters.

  • Callbacks have access to the underlying model via model which enables inspecting layers/weights.

  • Callbacks can maintain custom state across epochs via instance attributes or properties. This state is not tracked with the model itself.

Some important callback implementations include:

  • The ModelCheckpoint callback saves models or weights periodically using options like save_best_only to save improved models based on a monitor metric. It overrides on_epoch_end() to check for metric improvement and save accordingly.

  • The EarlyStopping callback stops training when a monitored quantity stops improving. It overrides on_epoch_end() to check the metrics tracked in self.monitor against the best value and stop training if patience is exceeded.

  • The ReduceLROnPlateau callback reduces the learning rate when a metric stops improving. It overrides on_epoch_end() similarly to check for non-improvement in the monitored metric.

The Callback base class provides a standardized interface for callbacks to hook into the Keras training loop at different points. This allows custom training behaviors to be flexibly implemented and composed together through subclassing.
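
The hook-dispatch pattern described above can be sketched without Keras itself. The class and method names below mirror the real API, but the toy training loop and the pared-down History re-implementation are illustrative only:

```python
# Framework-free sketch of the callback hook pattern. Hook names mirror
# the Keras API; the loop and History re-implementation are simplified.

class Callback:
    """Base class: every hook is a no-op so subclasses override freely."""
    def set_model(self, model):
        self.model = model

    def on_train_begin(self, logs=None): pass
    def on_epoch_begin(self, epoch, logs=None): pass
    def on_epoch_end(self, epoch, logs=None): pass
    def on_train_end(self, logs=None): pass

class History(Callback):
    """Records per-epoch metrics, like the built-in History callback."""
    def on_train_begin(self, logs=None):
        self.history = {}

    def on_epoch_end(self, epoch, logs=None):
        for key, value in (logs or {}).items():
            self.history.setdefault(key, []).append(value)

# A toy training loop invoking the hooks in order.
history = History()
history.on_train_begin()
for epoch in range(3):
    history.on_epoch_begin(epoch)
    history.on_epoch_end(epoch, logs={"loss": 1.0 / (epoch + 1)})
history.on_train_end()

print(history.history["loss"])
```

The empty base-class hooks are what let the training loop call every callback at every point without checking which methods a subclass chose to override.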

Model Checkpointing

References: keras/callbacks/

The ModelCheckpoint callback saves Keras models during training either at the end of each epoch or every N batches. It allows saving either the full model or just the model weights. The callback implements methods like on_train_batch_end(), on_epoch_begin(), and on_epoch_end() to determine when to save.

The ModelCheckpoint class handles the main saving logic. Its __init__() method sets up saving options like the file path and the 'best' metric value to monitor. on_train_batch_end() saves when save_freq is an integer and that many batches have elapsed. on_epoch_end() saves at the end of each epoch when save_freq is set to "epoch".

_save_model() performs the actual saving of either the full model or just the weights. It checks if the current result improves the monitored metric compared to the previous 'best'. If save_best_only=True, it will only overwrite the file in this case. _get_file_path() formats the file path using placeholders like {epoch} from the logs. It raises an error if the format fails.

The ModelCheckpoint callback thus provides a robust way to periodically save Keras models with options to control the saving behavior. By implementing callbacks that run at different points in training, it enables saving models either at the end of each epoch or every N batches for later resuming. The class encapsulates the saving logic while exposing configurable options for controlling file paths and when to save.
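
The save_best_only decision can be sketched in a few lines. BestCheckpointer below is a hypothetical stand-in for ModelCheckpoint that records which epochs would trigger a save instead of writing model files:

```python
# Simplified sketch of the save_best_only logic in ModelCheckpoint.
# BestCheckpointer is a hypothetical stand-in; the real callback calls
# _save_model() to write the model or its weights to disk.

class BestCheckpointer:
    def __init__(self, monitor="val_loss", mode="min"):
        self.monitor = monitor
        self.better = (lambda a, b: a < b) if mode == "min" else (lambda a, b: a > b)
        self.best = float("inf") if mode == "min" else -float("inf")
        self.saved_epochs = []

    def on_epoch_end(self, epoch, logs):
        current = logs.get(self.monitor)
        if current is not None and self.better(current, self.best):
            self.best = current
            self.saved_epochs.append(epoch)  # stand-in for _save_model()

ckpt = BestCheckpointer()
for epoch, val_loss in enumerate([0.9, 0.7, 0.8, 0.6]):
    ckpt.on_epoch_end(epoch, {"val_loss": val_loss})

print(ckpt.saved_epochs)  # only epochs that improved on the best value
```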

Early Stopping

References: keras/callbacks/

The EarlyStopping callback implements early stopping to prevent overfitting. It monitors a given metric like validation loss during training and stops training if the metric does not improve for a specified number of epochs, called the patience. This helps avoid wasting resources on epochs unlikely to improve performance.

The EarlyStopping class inherits from Callback and overrides methods like on_train_begin, on_epoch_end, and on_train_end to implement the early stopping logic. In __init__() it initializes parameters like the metric to monitor, patience, and whether to restore the best model weights.

on_train_begin() resets tracking variables. on_epoch_end() gets the monitored metric value from the logs, checks for improvement over the best seen so far using self.monitor_op, and resets the wait counter if improved. It stops training if patience is exceeded with no improvement. on_train_end() restores the best weights seen during training if configured.

EarlyStopping handles different improvement modes like 'min' and 'max' by introspecting the metric name and setting self.monitor_op to the appropriate Keras operator like ops.less or ops.greater. This allows it to work with any metric without additional configuration.
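
The patience mechanism can be sketched as follows. EarlyStopper is a hypothetical, simplified stand-in for the real callback, hard-coded to mode="min" on val_loss:

```python
# Simplified sketch of EarlyStopping's patience logic (mode="min").
# EarlyStopper is illustrative; the real callback also supports
# min_delta, baseline, and restoring the best weights.

class EarlyStopper:
    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0
        self.stopped_epoch = None

    def on_epoch_end(self, epoch, logs):
        current = logs["val_loss"]
        if current < self.best:
            self.best = current
            self.wait = 0  # improvement resets the counter
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # stand-in for setting model.stop_training = True
                self.stopped_epoch = epoch

stopper = EarlyStopper(patience=2)
for epoch, val_loss in enumerate([0.5, 0.4, 0.45, 0.44, 0.43]):
    stopper.on_epoch_end(epoch, {"val_loss": val_loss})
    if stopper.stopped_epoch is not None:
        break

print(stopper.stopped_epoch)  # stops after 2 epochs without improvement
```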

Learning Rate Scheduling

References: keras/callbacks/

The LearningRateScheduler callback allows dynamically adjusting the learning rate of the optimizer at the beginning of each epoch during training. It inherits from the Callback class and stores the provided schedule function and verbose flag in its __init__ method.

In the on_epoch_begin method, it first checks that the optimizer has a learning_rate attribute, then gets the current learning rate value by calling backend.convert_to_numpy() on it. It passes the current epoch and rate to schedule to get the updated rate, handling both the new and old API signatures of schedule for backward compatibility. It verifies that the returned value is a valid float, sets the new rate on the optimizer, and, if verbose, logs the new rate.

In on_epoch_end, it simply logs the final learning rate value to the logs dictionary. By implementing these methods, it allows dynamically adjusting the learning rate at each epoch during training via the user-provided schedule function. This provides a simple and flexible way to schedule learning rates from within Keras.
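
The per-epoch update can be sketched without Keras. The schedule function below follows the (epoch, lr) signature the callback expects, while a plain dict stands in for the optimizer's learning_rate attribute:

```python
# Sketch of per-epoch learning rate scheduling. The schedule(epoch, lr)
# signature matches what LearningRateScheduler calls; the dict is a
# stand-in for the real optimizer object.

def schedule(epoch, lr):
    # Halve the rate every 2 epochs (illustrative policy).
    return lr * 0.5 if epoch > 0 and epoch % 2 == 0 else lr

optimizer = {"learning_rate": 0.1}
rates = []
for epoch in range(5):
    optimizer["learning_rate"] = schedule(epoch, optimizer["learning_rate"])
    rates.append(round(optimizer["learning_rate"], 6))

print(rates)  # [0.1, 0.1, 0.05, 0.05, 0.025]
```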


Logging Callbacks

References: keras/callbacks/, keras/callbacks/

The CSVLogger and TensorBoard callbacks handle logging metrics and model topology during training. CSVLogger writes metric values to a CSV file at the end of each epoch using the on_epoch_end method. It handles different data types when writing by converting values to strings with handle_value().

TensorBoard enables metric logging and visualizations in TensorBoard. It implements on_train_begin(), on_train_end(), on_epoch_begin(), on_epoch_end(), on_train_batch_begin(), and on_train_batch_end() to log at different points, and it manages summary writers for different log directories with methods like _push_writer() and _pop_writer().

The callbacks handle logging at different points of training via the Keras callback interface, providing a high level API for users. CSVLogger standardizes writing metric values to CSV while TensorBoard enables visualizations and flexible logging.

Remote Monitoring

References: keras/callbacks/

The RemoteMonitor callback integrates Keras training with remote monitoring platforms. The RemoteMonitor class inherits from the base Callback class and is initialized with parameters like the server URL and request path. During training, the RemoteMonitor overrides the on_epoch_end method to collect the epoch number and log metrics. It handles any NumPy arrays in the data and sends a POST request to the server path, serializing the data to JSON. This allows monitoring metrics and logs on each epoch end.

The key aspects of the RemoteMonitor implementation are:

  • The class inherits from Callback to hook into the Keras training loop.

  • The __init__ method sets configuration parameters like the server URL, request path, and headers.

  • on_epoch_end collects the epoch number and log metrics from the logs dictionary.

  • Any NumPy arrays in the data are converted to lists to ensure they can be serialized.

  • The data is sent as either JSON if send_as_json=True or form-encoded otherwise via a POST request to the given server path.

  • Any RequestException from the requests library is caught and a warning is printed, allowing training to continue.

This allows critical training metrics and logs to be streamed to a remote monitoring service after each epoch for analysis, debugging and tracking progress over the course of training in a production setting.

Application Callbacks

References: keras/applications/, keras/applications/

The VGG16 and Xception models defined in …/ and …/ are accompanied by utilities tailored for these applications. When using these models, these utilities can be applied to take advantage of features specific to each model.

The VGG16 and Xception models leverage utilities in …/ for preprocessing inputs and decoding predictions. This file defines functions like preprocess_input() and decode_predictions() that are used by the model definitions.

The preprocess_input() function implements preprocessing expected by the models, such as scaling pixel values between -1 and 1. This function is called by each model to ensure inputs are in the expected format before being passed to the model layers.

The decode_predictions() function provides a convenient way to decode the raw predictions output by each model and obtain human-readable class labels. It handles mapping predictions back to the corresponding ImageNet classes. This utility allows easily interpreting results from the pretrained models.

When using these models via transfer learning, the preprocess_input() and decode_predictions() functions can be used to preprocess inputs and postprocess predictions specifically for each model architecture. This allows taking full advantage of utilities implemented as part of each pretrained model definition.

Advanced Callbacks

References: keras/callbacks/, keras/callbacks/

This section covers additional specialized callback classes provided in Keras that implement more advanced or niche functionality compared to the core callbacks.

The …/ file defines the LambdaCallback class, which allows users to define simple callback functions inline without creating new classes. It takes anonymous functions as arguments for different callback events like on_epoch_begin and on_batch_end. These functions will then be called at the appropriate points in training. This provides flexibility while keeping callbacks lightweight.

The …/ file defines the TerminateOnNaN callback class. This callback checks for invalid loss values like NaN or infinity after each training batch by implementing the on_batch_end method. If an invalid loss is encountered, it prints a message and sets the model's stop_training flag to terminate training, helping avoid wasting resources on failed runs.
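
The check TerminateOnNaN performs after each batch amounts to a NaN/infinity test on the loss, which can be sketched with the standard library:

```python
import math

# Sketch of the TerminateOnNaN check performed after each training batch.
# The real callback reads the loss from the logs dict and sets
# model.stop_training; here a plain flag stands in.

def should_terminate(loss):
    return loss is not None and (math.isnan(loss) or math.isinf(loss))

stop_training = False
for batch, loss in enumerate([0.8, 0.5, float("nan")]):
    if should_terminate(loss):
        stop_training = True  # stand-in for model.stop_training = True
        break

print(stop_training, batch)
```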


Optimizers

References: keras/optimizers

The …/optimizers directory contains implementations of various optimization algorithms that can be used to train Keras models. Optimization algorithms are essential for training deep learning models as they iteratively update model weights to minimize a loss function. The key optimization algorithms implemented in Keras include:

The Optimizer base class defined in …/ provides a common interface for all Keras optimizers. It imports the appropriate backend optimizer class based on the Keras backend in use, allowing Keras to support multiple backends like TensorFlow, PyTorch, and JAX with a single optimizer API. The Optimizer class ensures a consistent method signature regardless of backend.

The …/schedules directory contains implementations of learning rate schedules that can be used with Keras optimizers. Learning rate schedules control how the learning rate decays over the course of training, allowing the optimizer to efficiently converge on optimal weights. Schedules like ExponentialDecay, PiecewiseConstantDecay, and CosineDecay are defined to implement important decay functions.

The core stochastic gradient descent algorithm with optional momentum is implemented in the SGD class located at …/ It performs single variable updates using momentum calculations defined in its update_step() method.

Adaptive learning rate methods that dynamically adapt the learning rate for each parameter are implemented in files like …/ for RMSprop, …/ for Adam, and …/ for Adadelta. Each implements the characteristic update rules through update_step() methods while leveraging common functionality from Optimizer.

Additional optimization algorithms are located in files such as …/ for Adagrad and …/ for FTRL. The FTRL optimizer class maintains "accumulators" to track parameter-specific learning rates over time. Adagrad adapts rates based on accumulated squared gradients computed in its update_step().

Thorough unit tests validate the key functionality, configurations, and mathematical correctness of each optimizer against "golden" values. These help prevent regressions in the optimization logic.

Stochastic Gradient Descent

References: keras/optimizers/

The SGD optimizer implements the stochastic gradient descent algorithm for training neural networks. SGD is one of the most commonly used optimization algorithms in deep learning.

SGD works by estimating the gradient of the loss function for each training example (or mini-batch) and updating the weights by taking a step in the opposite direction of that gradient, scaled by the learning rate. This has the effect of minimizing the loss function.

The SGD class in Keras handles the implementation of the SGD algorithm. It inherits from the base Optimizer class. In its __init__() method, it initializes properties like the learning rate.

The build() method initializes momentum variables if momentum is enabled. For each trainable variable, it adds a momentum variable to the momentums list using self.add_variable_from_reference(). This sets up the variables needed for applying momentum during weight updates.

At the core of SGD is the update_step() method. This performs a single optimization step. If momentum is disabled, it simply performs a vanilla gradient descent update by subtracting the raw loss gradient from the weights, proportional to the learning rate.

If momentum is enabled, update_step() computes the new momentum value using either the vanilla or Nesterov formula as defined in the class docstring. It then applies this momentum to smoothly update the weights in the direction of the loss gradient. This helps accelerate SGD convergence.
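
The vanilla, momentum, and Nesterov variants can be illustrated for a single scalar weight. This is a simplified sketch of the update_step() math; the real implementation operates on backend tensor variables:

```python
# Sketch of the SGD update_step() math for one scalar weight, covering
# the vanilla, momentum, and Nesterov variants.

def sgd_step(w, grad, velocity, lr=0.1, momentum=0.9, nesterov=False):
    if momentum == 0.0:
        return w - lr * grad, velocity            # vanilla gradient descent
    velocity = momentum * velocity - lr * grad    # momentum accumulation
    if nesterov:
        w = w + momentum * velocity - lr * grad   # look-ahead update
    else:
        w = w + velocity
    return w, velocity

# Minimize loss = w**2, whose gradient is 2*w.
w, v = 1.0, 0.0
for _ in range(3):
    grad = 2.0 * w
    w, v = sgd_step(w, grad, v)
print(round(w, 4))
```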

Adaptive Learning Rate Methods

References: keras/optimizers/, keras/optimizers/, keras/optimizers/

These algorithms adapt the learning rate during training based on the characteristics of the gradients:

  • The RMSprop optimizer normalizes the gradient by a running average of its recent magnitude. It maintains a moving average of the squared gradients, called the velocities, and divides the gradient by the square root of this average. This lowers the effective learning rate for parameters whose gradients change frequently and raises it for infrequently updated parameters.

  • The Adam optimizer is based on adaptive estimates of lower-order moments. It computes bias-corrected first and second moment estimates of the gradients called momentums and velocities respectively. It then uses these estimates to perform an adaptive learning rate optimization where frequent parameters have a smaller effective learning rate.

  • The Adadelta optimizer works similarly to Adagrad in adapting the learning rate for each parameter, but does not monotonically decrease the learning rate. Instead it adapts based on a moving window of gradient updates, maintaining accumulated_grad and accumulated_delta_var variables to store exponentially weighted moving averages of squared gradients and parameter updates. It then computes the adaptive learning rate from these accumulated values.

The implementations of these algorithms in Keras closely follow their mathematical formulations. For RMSprop, the RMSprop class stores the velocities moving average in its _velocities attribute. The core update logic in update_step() normalizes the gradient by the square root of _velocities.

For Adam, the Adam class stores the momentums, velocities, and optional velocity hats in _momentums, _velocities and _velocity_hats. The update_step() method calculates the bias-corrected moment estimates and uses them to perform the adaptive update.

Adadelta maintains accumulated_grad and accumulated_delta_var lists for each parameter. The update_step() method assigns new values to these based on the current gradient and previous accumulated values, then applies the adaptive update.
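
The Adam update can be illustrated for a single scalar parameter. This is a simplified sketch of the bias-corrected moment estimates described above, not the tensor implementation:

```python
import math

# Sketch of one Adam update_step() for a scalar parameter, using the
# bias-corrected first moment (momentum) and second moment (velocity).

def adam_step(w, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    m = beta_1 * m + (1 - beta_1) * grad        # first moment estimate
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta_1 ** t)               # bias correction
    v_hat = v / (1 - beta_2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize loss = w**2; each step moves w by roughly lr toward zero.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2.0 * w
    w, m, v = adam_step(w, grad, m, v, t)
print(w)
```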

Additional Optimization Algorithms

References: keras/optimizers/, keras/optimizers/, keras/optimizers/, keras/optimizers/

These additional optimization algorithms provide alternative approaches to updating model variables during training. Adagrad, Adamax, Nadam, and FTRL each implement distinct optimization algorithms through custom Keras optimizer classes.

The Adagrad optimizer tracks a separate learning rate for each model parameter, lowering the learning rate more for frequently updated parameters. It maintains per-variable accumulator tensors initialized in its build() method. The update_step() method calculates adaptive learning rates by dividing the overall rate by the square root of the accumulators plus a small epsilon value.

Adamax is based on the Adam algorithm but uses the infinity norm rather than root-mean-square. It initializes separate momentum _m and norm _u variables for each model variable in build(). The update_step() method calculates new momentum m and norm u values, then updates variables using an adaptive learning rate derived from the beta1 exponential moving average.

Nadam implements Nesterov-accelerated Adam, using momentum _momentums and velocity _velocities estimates initialized in build(). Its update_step() method contains the core Nadam update logic, calculating updates based on these estimates, the gradient, learning rate, and other Nadam hyperparameters.

The Ftrl optimizer is suitable for shallow models with large sparse feature spaces. It initializes accumulators and linear variables for each model variable in build(). The update_step() method performs the FTRL update steps outlined in its docstring, using the gradient, learning rate, and regularization terms like L1 and L2. It clips the linear variable and divides by the quadratic term to obtain the final variable update.

Learning Rate Schedules

References: keras/optimizers/schedules/, keras/optimizers/schedules/

The Keras optimizers package provides several classes for controlling the learning rate decay over the course of model training. The core class is LearningRateSchedule, which defines the interface for learning rate schedules through its __call__ method. This method takes a step value and returns the decayed learning rate.

The ExponentialDecay, PiecewiseConstantDecay, PolynomialDecay, InverseTimeDecay, and CosineDecay classes all implement different decay functions to reduce the learning rate over time in a controlled manner. ExponentialDecay decays the rate exponentially using parameters like initial learning rate, decay steps, and decay rate. PiecewiseConstantDecay allows specifying constant rates for intervals of steps defined by boundaries and values lists. PolynomialDecay decays polynomially using initial/final rates, decay steps, and a power value. InverseTimeDecay decays the rate inversely proportional to time. CosineDecay provides cosine decay with optional warmup by increasing the rate linearly at first.

The classes each override LearningRateSchedule's __call__ method to implement the specific decay function. For example, ExponentialDecay computes the decayed rate using its parameters in an exponential formula. PiecewiseConstantDecay uses conditional logic on the step value to lookup the appropriate constant rate from its lists. PolynomialDecay applies its polynomial formula.

The serialize and deserialize functions allow serializing and deserializing learning rate schedules for checkpointing.
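
The exponential schedule, for example, reduces to a one-line formula. This sketch mirrors the ExponentialDecay computation with continuous (non-staircase) decay; the parameter values are illustrative:

```python
# Sketch of the ExponentialDecay formula (continuous decay):
#   lr(step) = initial_lr * decay_rate ** (step / decay_steps)

def exponential_decay(step, initial_lr=0.1, decay_steps=100, decay_rate=0.5):
    return initial_lr * decay_rate ** (step / decay_steps)

print(exponential_decay(0))    # 0.1
print(exponential_decay(100))  # 0.05
print(exponential_decay(200))  # 0.025
```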

Optimizer Base Class

References: keras/optimizers/

The Optimizer class serves as the base class for all Keras optimizer implementations. It inherits from either TFOptimizer, TorchOptimizer, or JaxOptimizer depending on the Keras backend in use. These backend-specific subclasses contain the implementations of optimizer updates and gradient computations that are compatible with each backend framework. BaseOptimizer defines a more generic base class with minimal functionality that is used if an unsupported backend is detected.

When a Keras optimizer is instantiated, it will actually be one of the backend subclass objects under the hood. The Optimizer class ensures a consistent interface for all optimizers regardless of backend.

By conditionally importing and assigning the appropriate backend subclass, Optimizer provides a common Keras optimizer interface while routing the implementation to backend-specific code. This allows Keras code and APIs to remain backend-agnostic.

Optimizer Testing

References: keras/optimizers/, keras/optimizers/, keras/optimizers/

The unit tests for Keras optimizers validate that the key optimization algorithms are implemented correctly. There are test files for the main optimizers:

  • …/ contains tests for the Stochastic Gradient Descent (SGD) optimizer. This tests basic update logic, configuration, weight decay, correctness over many steps, and gradient clipping.

  • …/ tests the RMSprop optimizer update logic, serialization, single step updates, weight decay, correctness against golden values, and gradient clipping functionality.

  • …/ focuses on testing Adam optimizer updates, configuration, weight decay, correctness, clipping, and exponential moving averages.

Each file contains a test case class like SGDTest that inherits from testing.TestCase. This class holds test methods that directly exercise the optimizer code. Tests configure dummy data, apply optimizer updates, and validate the results match expectations. This validates the core optimization algorithms are implemented correctly.

The tests thoroughly cover serialization, single step updates, weight decay, correctness over many steps, and gradient clipping. This ensures the optimizers continue functioning properly under various conditions. The unit tests provide an effective way to prevent regressions and verify the optimizers meet their specifications.


Losses

References: keras/losses

The …/losses directory contains the core loss functions for training neural networks in Keras. The main loss functions are defined in …/, including MeanSquaredError and CategoricalCrossentropy.

These losses are implemented as classes that inherit from the base Loss class defined in …/. The Loss class standardizes how losses are implemented in Keras by handling the calling of the subclass' call() method, applying masking and weighting sample losses, and reducing the losses as specified by the reduction type.

The loss function classes are initialized with hyperparameters like the name, reduction type, and dtype. Their call() method contains the core logic to calculate the loss values from y_true and y_pred.

Unit tests for the losses are in …/ Tests are provided for MeanSquaredError and CategoricalCrossentropy. The tests validate behaviors like correctness on sample data, weighted vs unweighted losses, and different reduction types.

Loss Function Implementations

References: keras/losses/

The …/ file implements many common loss functions for training neural networks. The main classes defined are LossFunctionWrapper and individual loss functions like MeanSquaredError, MeanAbsoluteError, CategoricalCrossentropy, and SparseCategoricalCrossentropy.

LossFunctionWrapper acts as a base class that wraps loss functions. It handles calling the loss function and allows configuring the reduction type, such as 'sum' or 'mean'. This provides a consistent API for losses.

Loss functions preprocess inputs with utilities like squeeze_to_same_rank and support passing sample weights. They compute the loss directly from labels and predictions. Losses that handle probabilities like CategoricalCrossentropy convert logits to probabilities internally.

Utilities are also defined, such as convert_binary_labels_to_hinge which preprocesses labels for hinge-based losses. Losses support both standalone functions and classes, enabling both functional and object-oriented usage in Keras models.

Loss Base Class

References: keras/losses/

The Loss class is the base class that all Keras loss functions must inherit from. It standardizes the implementation of loss functions by defining a common interface and functionality. The Loss class handles calling the subclass' call() method to calculate the raw loss values from the inputs. It then applies masking, weighting by sample weights, and reduction to the losses.

The key method subclasses must implement is call(), which contains the logic to calculate the raw loss values from the inputs y_true and y_pred. The Loss class calls this method and passes the results to further processing.

Loss standardizes several aspects of loss function implementation. It sets the name, reduction type, and dtype for all loss functions. These properties ensure losses can be identified and will work properly with Keras models.

The class centralizes common loss reduction logic in methods like reduce_weighted_values(). This method handles applying masking to the sample weights, normalizing input shapes with squeeze_to_same_rank(), weighting the losses by sample weights, and passing the weighted losses to reduce_values() for reduction. reduce_values() sums the losses and optionally divides by the batch size for 'sum_over_batch_size' reduction.

By subclassing Loss, loss functions leverage this standardized reduction logic and common interfaces. This ensures all losses work consistently with Keras. The base class' methods also take care of many implementation details so subclass code can focus just on calculating raw loss values.
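
The weighting-and-reduction pipeline can be sketched with plain Python lists. The function name and list-based inputs are illustrative; the real base class operates on backend tensors:

```python
# Sketch of the Loss base class pipeline: per-sample losses are weighted
# by sample weights, then reduced. reduce_weighted_losses is a
# hypothetical stand-in for the real tensor-based helpers.

def reduce_weighted_losses(losses, sample_weight=None,
                           reduction="sum_over_batch_size"):
    if sample_weight is not None:
        losses = [l * w for l, w in zip(losses, sample_weight)]
    if reduction == "none":
        return losses
    total = sum(losses)
    if reduction == "sum":
        return total
    return total / len(losses)  # "sum_over_batch_size"

raw = [1.0, 2.0, 3.0]
print(reduce_weighted_losses(raw))                          # 2.0
print(reduce_weighted_losses(raw, reduction="sum"))         # 6.0
print(reduce_weighted_losses(raw, sample_weight=[1, 0, 1]))
```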

Loss Utilities

References: keras/losses/, keras/losses/

The Loss base class standardizes the implementation of all loss functions. It defines the core interface and methods that subclasses must implement, including call(). The base call() method handles masking and weighting of losses.

The LossFunctionWrapper class is used to wrap legacy loss functions that do not support masking or weighting. It overrides their call() method to apply masking/weighting before delegating to the wrapped function. This allows these legacy losses to still work as expected with masking and sample weights.

Loss Function Tests

References: keras/losses/

The unit tests in …/ validate that Keras loss function implementations behave as expected. An ExampleLoss class is defined that implements mean squared error, acting as a simple test case.

The main LossTest class contains various methods to test loss functionality:

  • test_reduction ensures losses calculate correctly under different reduction types like 'none', 'sum', and 'sum_over_batch_size'.

  • test_mask checks masking, where the loss is only calculated on unmasked values.

  • test_sample_weight tests that sample weighting works as expected, and test_mask_and_sample_weight combines both features.

  • test_rank_adjustment verifies upgrading and downgrading input ranks, while test_mixed_dtypes checks that inputs with different dtypes are handled properly.

  • test_get_method checks the get() utility for losses, and test_dtype_arg validates that the dtype argument sets the correct output dtype.

These tests use pytest for running and numpy/Keras backend for operations. They comprehensively validate the core loss calculation logic and ensure losses behave as intended under various conditions. This helps prevent regressions and ensures consistent loss calculation in Keras.


Metrics

References: keras/metrics

The …/metrics directory contains classes that implement metrics for evaluating model performance during training and testing. Metrics compute statistics like accuracy, precision and recall on model predictions. They are used to monitor and optimize models.

Key classes include Metric, Mean, Sum and MeanMetricWrapper. Metric defines the base interface for metrics, handling state tracking and resetting between epochs. Mean and Sum compute weighted averages and totals. MeanMetricWrapper allows wrapping functions to track their mean value.

Other important files provide shared building blocks: utilities for updating confusion matrices and classes for computing sums and means. The metric-type subdirectories hold classes for specific metric types - classification, regression, etc.

Some key classes:

  • Classes like Accuracy, BinaryAccuracy and CategoricalAccuracy compute accuracy-based metrics for classification. Accuracy calculates the fraction of correct predictions by tracking a 'total' and 'count' variable.

  • FBetaScore and its subclasses calculate F-scores for classification. FBetaScore maintains variables for true/false positives/negatives and computes precision, recall and the final F-beta score.

  • Regression metric classes implement common regression metrics. For example, MeanSquaredError computes the mean squared error directly, while RootMeanSquaredError first calculates the squared error then takes the root.

  • Confusion-matrix-based metrics like Precision and Recall accumulate true/false counts, then compute results based on these variables and the confusion matrix.

  • IoU and its subclasses evaluate semantic segmentation using intersection-over-union. They accumulate predictions into confusion matrices to calculate true/false positives/negatives for the IoU.

Metric Base Class

References: keras/metrics/

The Metric base class defines the core interface that all Keras metrics must implement. It handles tracking metric state and computations through subclasses.

The Metric class provides methods for adding metric variables via add_variable(), resetting variables between epochs with reset_state(), and accumulating updates into the variables with update_state(). Subclasses must implement update_state() to define the specific update logic, and result() to compute the final metric value from the state variables.

Metric also defines __call__() as a convenience method that calls update_state() and then result(). Stateless variants, stateless_update_state() and stateless_result(), return updated variable values rather than mutating state in place, supporting functional backends and distributed settings.

The key aspects of the Metric implementation are:

  • It tracks metric variables through the _variables property and add_variable() method
  • reset_state() simply resets all variables to zero between epochs
  • update_state() accumulates updates into the state variables, with specific logic defined by subclasses
  • result() computes the final metric from the state variables, with logic defined by subclasses

In summary, Metric provides the common scaffolding for metric state management and computations, while subclasses implement the unique logic for each specific metric. This standardized interface allows new metrics to be easily implemented.

Reduction Metrics

References: keras/metrics/

The …/ file contains utilities for computing reduction metrics like sums and means. It defines the core Sum() and Mean() metric classes, which handle reducing values across samples.

The Sum() metric class tracks the running total of values in a total variable using the Zeros() initializer. In update_state(), it calls reduce_to_samplewise_values() to apply sample weights and reduce extra dimensions if needed. This summed value is then assigned to total. reset_state() resets the total to 0, and result() simply returns the total.

The Mean() metric works similarly but tracks both the running total and sample count in variables. In update_state(), it assigns the summed values to total and increments count. reset_state() resets both variables, and result() returns the total divided by the count to compute the actual mean.

The reduce_to_samplewise_values() utility function handles reducing tensor values to the sample dimension based on weights. It takes the values, weights, reduction type like sum() or mean(), and dtype. This allows metrics to work across different value and weight shapes.

The MeanMetricWrapper class inherits from Mean() and allows wrapping an arbitrary metric function. The wrapped function's output is reduced like other metrics and its mean tracked over time. This provides a simple way to track the average of any loss or evaluation metric.
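
The total/count bookkeeping behind Mean can be sketched with plain Python. The list-based update_state() below is a simplified stand-in for the tensor version:

```python
# Sketch of the Mean reduction metric: update_state() accumulates a
# weighted total and count, result() divides them. Lists stand in for
# backend tensors.

class Mean:
    def __init__(self):
        self.reset_state()

    def reset_state(self):
        self.total = 0.0
        self.count = 0.0

    def update_state(self, values, sample_weight=None):
        weights = sample_weight or [1.0] * len(values)
        self.total += sum(v * w for v, w in zip(values, weights))
        self.count += sum(weights)

    def result(self):
        return self.total / self.count if self.count else 0.0

m = Mean()
m.update_state([1.0, 3.0])
m.update_state([5.0], sample_weight=[2.0])
print(m.result())  # (1 + 3 + 5*2) / (1 + 1 + 2) = 3.5
```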

Confusion Matrix Utilities

References: keras/metrics/

The …/ file provides important utilities for updating confusion matrices. It contains the ConfusionMatrix Enum which defines the possible confusion matrix variables as TRUE_POSITIVES, FALSE_POSITIVES, TRUE_NEGATIVES, FALSE_NEGATIVES.

The core function for updating confusion matrix variables is update_confusion_matrix_variables(). It handles tiling predictions, labels, and thresholds to compute the true positives, false positives, etc in an element-wise way. For improved efficiency, _update_confusion_matrix_variables_optimized() provides an optimized implementation when thresholds are evenly distributed. It leverages "buckets" based on thresholds to update variables in one pass.

The confusion_matrix() function computes the actual confusion matrix values given predictions, labels, and number of classes. It uses tf.scatter_nd() to bin the predictions and labels into a confusion matrix tensor.
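The binning can be reproduced in NumPy, with np.add.at standing in for the scatter operation (a sketch of the idea, not the Keras implementation):

```python
import numpy as np

def confusion_matrix(labels, preds, num_classes):
    # Count (true class, predicted class) pairs into a matrix where
    # rows are true classes and columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (labels, preds), 1)
    return cm

cm = confusion_matrix(np.array([0, 1, 1]), np.array([0, 1, 0]), num_classes=2)
```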

The is_evenly_distributed_thresholds() helper checks if a list of thresholds is evenly spaced, enabling use of the optimized method.

Accuracy Metrics

References: keras/metrics/

The …/ file implements several important accuracy metrics for evaluating classification models. It contains classes that calculate prediction accuracy in different ways depending on the type of classification problem.

The core Accuracy class calculates the fraction of predictions where the predicted class is equal to the true class. It tracks a 'total' and 'count' variable to calculate accuracy as the total correct predictions over the total number of samples. The Accuracy class subclasses MeanMetricWrapper from …/ to calculate the mean accuracy over batches.

For binary classification problems, the BinaryAccuracy class uses the binary_accuracy function to compare predictions to labels based on a threshold. This allows for non-integer predictions as long as they are above or below the threshold. The class checks that the threshold is valid during initialization.
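The threshold comparison can be sketched in plain Python (a sketch of the comparison only, not Keras's binary_accuracy):

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    # A prediction counts as class 1 when it exceeds the threshold;
    # accuracy is the fraction of thresholded predictions matching labels.
    matches = [float(t == (p > threshold)) for t, p in zip(y_true, y_pred)]
    return sum(matches) / len(matches)

acc = binary_accuracy([1, 0, 1, 1], [0.9, 0.3, 0.4, 0.8])  # 3 of 4 correct -> 0.75
```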

For multi-class classification with one-hot encoded labels, the CategoricalAccuracy class leverages the categorical_accuracy function. This function handles the details of squeezing labels and casting types as needed before finding matching elements in the predictions and labels.

The sparse_categorical_accuracy function and associated SparseCategoricalAccuracy class are for evaluating models using sparse categorical labels instead of one-hot.

The top_k_categorical_accuracy function calculates accuracy based on whether the true label is within the top K highest probability predictions for each sample.
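A NumPy sketch of the top-K check, assuming y_true holds integer class ids and y_pred holds per-sample probability rows (the Keras function instead takes one-hot labels):

```python
import numpy as np

def top_k_accuracy(y_true, y_pred, k=2):
    # Take the indices of the k largest probabilities per sample and
    # check whether the true class id is among them.
    top_k = np.argsort(y_pred, axis=-1)[:, -k:]
    hits = [t in row for t, row in zip(y_true, top_k)]
    return float(np.mean(hits))

score = top_k_accuracy([0, 2], np.array([[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]), k=2)
```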

All of these accuracy metric classes inherit from MeanMetricWrapper to calculate a mean accuracy score over batches during training or evaluation. This base class handles updating the metric result and returning it after each call. It also supports passing optional sample weights.

Regression Metrics

References: keras/metrics/

The regression metrics defined in …/ allow evaluating model performance on regression tasks. The cosine_similarity() function, for example, computes cosine similarity from L2-normalized true and predicted values and is used by the CosineSimilarity metric class.
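A NumPy sketch of that computation (the Keras version also guards against zero-length vectors):

```python
import numpy as np

def cosine_similarity(y_true, y_pred, axis=-1):
    # L2-normalize both tensors along `axis`, then take the dot product.
    y_true = y_true / np.linalg.norm(y_true, axis=axis, keepdims=True)
    y_pred = y_pred / np.linalg.norm(y_pred, axis=axis, keepdims=True)
    return np.sum(y_true * y_pred, axis=axis)
```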

Probabilistic Metrics

References: keras/metrics/

The …/ file defines several probabilistic metric classes that can be used to evaluate models trained with probabilistic losses. These metrics wrap loss functions to compute a performance metric rather than an error signal.

The main classes defined are KLDivergence, Poisson, BinaryCrossentropy, CategoricalCrossentropy, and SparseCategoricalCrossentropy. All inherit from MeanMetricWrapper to compute the mean metric value over samples.

KLDivergence computes the Kullback-Leibler divergence between y_true and y_pred using y_true * log(y_true / y_pred). It wraps the kl_divergence function.

Poisson computes the Poisson metric between y_true and y_pred using y_pred - y_true * log(y_pred). It wraps the poisson function.
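Both formulas can be written out directly. The epsilon clipping below illustrates the kind of numerical-stability handling involved; the exact constants in Keras may differ:

```python
import numpy as np

def kl_divergence(y_true, y_pred, eps=1e-7):
    # Sum over classes of y_true * log(y_true / y_pred).
    y_true = np.clip(y_true, eps, 1.0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return np.sum(y_true * np.log(y_true / y_pred), axis=-1)

def poisson(y_true, y_pred, eps=1e-7):
    # Mean over the last axis of y_pred - y_true * log(y_pred).
    return np.mean(y_pred - y_true * np.log(y_pred + eps), axis=-1)
```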

CategoricalCrossentropy assumes one-hot encoded multi-class labels. It computes cross-entropy using categorical_crossentropy, allowing options like from_logits and label_smoothing. The entropy computation axis can be set.

SparseCategoricalCrossentropy expects integer class labels rather than one-hot. It wraps sparse_categorical_crossentropy to compute the metric.

All classes are designed to directly evaluate models or be used with Model.compile() for evaluation during training. They provide a standardized way to monitor probabilistic losses as metrics.

F-beta Metrics

References: keras/metrics/

The …/ file defines two main classes for computing F-beta scores for classification problems: FBetaScore and F1Score.

The FBetaScore class computes the F-beta score, which is a weighted harmonic mean of precision and recall. It takes the beta parameter to weight recall vs precision, with higher beta placing more importance on recall. The class inherits from the Metric base class and implements the common Keras metric API.

Internally, it maintains state variables like true_positives, false_positives etc using Keras variables initialized in _build(). The update_state() method handles updating these variables based on the passed y_true and y_pred tensors after applying thresholding. Thresholding logic converts prediction probabilities to binary predictions by comparing values to the threshold parameter.

The final F-beta score is computed in result() via formulas involving the precision, recall, and beta values. It returns a single value or supports different averaging schemes like 'macro' based on the average argument. Sample weights can also be applied via the sample_weight argument.

The F1Score class is a simple subclass that sets beta=1 to compute the specific F1 score metric. Both classes follow best practices like input validation, configurable options, and serializable config.
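The score computed in result() follows the standard F-beta formula; a sketch from scalar counts, with the per-class state and averaging logic omitted:

```python
def fbeta_score(tp, fp, fn, beta=1.0):
    # Weighted harmonic mean of precision and recall; beta > 1 favors recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = fbeta_score(tp=5, fp=5, fn=0, beta=1.0)  # precision 0.5, recall 1.0
```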

IoU Metrics

References: keras/metrics/

The …/ file defines several classes for computing the Intersection-Over-Union (IoU) metric, which is commonly used to evaluate semantic segmentation models. The IoU metric measures the overlap between predicted regions and ground truth regions.

The _IoUBase class acts as the base for all IoU metric classes. It handles accumulating predictions and labels into a total_cm confusion matrix using the update_state() method. This function converts inputs to tensors and ignores any provided ignore_class, accumulating the current confusion matrix returned by confusion_matrix() into the total confusion matrix.

The IoU class computes IoU for specific target classes. Its result() method calculates true positives, false positives, and false negatives from the confusion matrix totals for each target class. It then computes the individual class IoUs and returns their mean.
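The per-class arithmetic in result() can be sketched from an accumulated confusion matrix, with rows as true classes and columns as predictions (a NumPy sketch, not the Keras code):

```python
import numpy as np

def per_class_iou(cm):
    # IoU per class = TP / (TP + FP + FN), read off the confusion matrix.
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp  # belonged to the class, but missed
    return tp / (tp + fp + fn)

ious = per_class_iou(np.array([[2, 1], [0, 3]]))
```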

The MeanIoU class computes the mean IoU across all classes by setting the target classes to a list of all class IDs. In result(), it simply calls the superclass method and divides the total IoU by the number of valid classes to get the average IoU.

The OneHotIoU class handles one-hot encoded labels and predictions. Its update_state() method uses argmax to convert the inputs to integer format before accumulating them into the confusion matrix. This allows computing IoU when the inputs are provided as probability distributions over classes rather than discrete class IDs.

Hinge Metrics

References: keras/metrics/

The …/ file defines three classes - Hinge, SquaredHinge, and CategoricalHinge - that implement different hinge-based loss metrics.

All three classes inherit from the MeanMetricWrapper class in …/. This class handles computing the mean of the loss function over samples. The child hinge metric classes override the __init__ method to set the specific loss function - hinge, squared_hinge, or categorical_hinge respectively - via the fn argument.

To use one of the hinge metric classes, you instantiate it and then call the update_state() method to calculate the loss on new data. The result() method then returns the average loss value. The reset_state() method clears the accumulated state.

For example, to calculate the standard hinge loss on data you would:

  1. Create a Hinge instance
  2. Call update_state() to calculate the loss
  3. result() returns the average hinge loss value
  4. Optionally reset_state() before the next update
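The hinge formula itself is the mean of max(1 - y_true * y_pred, 0) with labels in {-1, 1}; a plain-Python sketch of the value computed by one update:

```python
def hinge(y_true, y_pred):
    # Labels are expected as -1 or 1; the loss is zero once the margin
    # y_true * y_pred reaches 1.
    losses = [max(1.0 - t * p, 0.0) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

loss = hinge([-1, 1], [-0.8, 0.9])
```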

This provides a simple way to integrate hinge-based losses as metrics within Keras models.

Confusion Matrix Metrics

References: keras/metrics/

The …/ file contains metrics that rely on the confusion matrix for evaluation. Metrics like Precision and Recall are implemented, which compute various true and false positive/negative counts from the confusion matrix to calculate precision and recall scores.

Several classes inherit from the Metric base class to implement confusion matrix metrics. The _ConfusionMatrixConditionCount abstract base class counts conditions in the confusion matrix, and classes like TruePositives inherit from it to compute specific counts.

The Precision and Recall classes leverage counts from subclasses of _ConfusionMatrixConditionCount to calculate precision and recall scores directly. Threshold-based metrics like SensitivityAtSpecificity and SpecificityAtSensitivity search for the optimal threshold satisfying their condition.
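The final scores reduce to simple ratios of the accumulated counts (a sketch; the Keras classes additionally handle thresholds, top-k, and class ids):

```python
def precision_recall(tp, fp, fn):
    # Precision: fraction of positive predictions that were correct.
    # Recall: fraction of actual positives that were found.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

p, r = precision_recall(tp=8, fp=2, fn=2)  # (0.8, 0.8)
```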

The AUC metric approximates the area under the ROC or PR curve by dividing it into buckets. It computes the average true and false positive rates within each bucket to estimate the area via Riemann summation. All metrics override update_state() to accumulate confusion matrix counts using metrics_utils.update_confusion_matrix_variables().


References: keras/datasets

The …/datasets package provides easy access to several common datasets that can be conveniently loaded for testing, debugging, and example code in Keras. It contains modules for loading popular image, text, and tabular datasets such as MNIST, Fashion MNIST, CIFAR10, IMDB reviews, and Boston housing prices data.

The core module is …/, which imports the individual dataset modules. This allows other code to access the datasets through a single consistent interface - for example, by calling functions like load_data() defined in the underlying modules.

Some key business logic implemented across the dataset modules includes:

  • Downloading data files from online sources if not already cached locally using get_file(). This ensures the data is available even when no local copy exists yet.

  • Loading data stored in common formats like .npz arrays, pickle files, and JSON dictionaries using functions like np.load() and json.load().

  • Preprocessing data into NumPy arrays with the expected shapes through steps like parsing images/labels, truncating sequences, filtering words, and splitting into train/test sets.

  • Providing higher-level functions that encapsulate the entire loading and preprocessing pipeline with simple interfaces, hiding implementation details.

  • Implementing data loading in a consistent way across different types of datasets to make them easy to use interchangeably in examples and testing.

This package makes it very convenient to access common datasets without having to manually download, parse and preprocess data each time. The standardized interfaces also allow focusing experiment code solely on models without dataset handling logic.

MNIST Dataset

References: keras/datasets/

The …/ file provides easy access to the MNIST dataset of handwritten digits for use in examples and testing. It defines a single function, load_data(), which loads the MNIST training and test data from a compressed .npz file stored on Google Cloud Storage.

The load_data() function takes an optional path argument to specify where to cache the dataset locally. It uses get_file() to download the .npz file if not already present, handling file downloading and caching. The .npz file contains compressed NumPy arrays storing the image and label data.

load_data() loads this file using np.load(), extracting the x_train, y_train, x_test, y_test arrays containing 60,000 training images and labels and 10,000 test images and labels. Each grayscale image is 28x28 pixels in size. The label arrays contain the corresponding digit classes from 0-9.

load_data() returns a tuple of the training and test data, along with basic validation of the array shapes. No other algorithms or classes are implemented - it provides a simple interface for easily loading the standardized MNIST dataset into NumPy for use in models.

Fashion MNIST Dataset

References: keras/datasets/

The …/ file provides access to the Fashion MNIST dataset. This dataset contains 60,000 training images and 10,000 test images of clothing items like shirts, shoes and bags. Each 28x28 grayscale image is labeled with one of 10 class names.

The load_data() function handles downloading and loading the dataset. It first defines the local directory and URLs for the required gzip files containing the image and label data, and get_file() is used to download these files if not present locally. For each file type (train/test images and labels), it reads the raw byte contents: np.frombuffer() extracts the label data, while the raw pixels are reshaped into image arrays with np.reshape(). Finally, four NumPy arrays are returned containing the preprocessed training and test data - (x_train, y_train) and (x_test, y_test).

CIFAR10 and CIFAR100 Datasets

References: keras/datasets/, keras/datasets/

The files …/ and …/ provide access to the CIFAR-10 and CIFAR-100 image classification datasets.

The …/ file contains the load_data() function, which loads and prepares the CIFAR-10 data for use in Keras models. It downloads the CIFAR-10 archive if needed, then loads the training and test images and labels using the load_batch() function defined elsewhere. It reshapes and transposes the data if needed to match Keras' data format. Finally, it returns NumPy arrays containing the preprocessed image and label data.

The …/ file contains a similar load_data() function for CIFAR-100. It takes an optional label_mode parameter that can be "fine" or "coarse" to control the type of labels returned. It downloads the CIFAR-100 dataset if not present, then loads the training and test batches using load_batch(). It reshapes the label arrays and transposes the image arrays if needed. Finally, it returns tuples of NumPy arrays containing the loaded and processed CIFAR-100 image and label data.

Both files provide easy access to popular image classification datasets for use in examples and testing Keras models. The load_data() functions handle downloading, loading, preprocessing and returning the data in a format that can be directly used with Keras APIs.

IMDB Movie Reviews Dataset

References: keras/datasets/

The Keras dataset module provides access to the IMDB movie reviews dataset for sentiment classification. The IMDB dataset contains 50,000 movie reviews from IMDB, labeled as either positive or negative.

The load_data() function handles loading and preprocessing the IMDB dataset. It downloads the required files if needed, then loads the preprocessed training and test data using np.load(). It shuffles the data randomly using np.random.RandomState() for training. It optionally indexes words starting from a given index using list comprehensions. It also calls the remove_long_seq() function to truncate sequences longer than the provided maxlen parameter. It concatenates the training and test data, filters words using num_words and skip_top, and replaces out-of-vocabulary words with the oov_char. Finally it splits the data back into training and test sets to return.

The get_word_index() function downloads the word index JSON file from the source if needed. It loads and returns a Python dictionary by parsing the JSON file with json.load(). This dictionary maps words in the dataset to their integer indices.

Boston Housing Dataset

References: keras/datasets/

The …/ file provides access to the Boston housing prices regression dataset. This dataset contains information about houses in Boston from the 1970s, including attributes like crime rate and accessibility to ports. The target variable is the median home price.

The core functionality is the load_data() function, which loads the dataset. It takes the path, test split fraction, and random seed as arguments. It first checks that the test split is valid. It then calls get_file() to download the data if needed, caching it locally. The data is loaded into NumPy arrays for the features x and targets y using np.load(). It shuffles the data indices using np.random.RandomState() with the provided seed. The shuffled data is then split into train and test sets by slicing the arrays based on 1 - test_split. These sliced arrays - x_train, y_train, x_test, y_test - are returned.
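The seeded shuffle-and-split logic can be sketched as follows (`shuffle_and_split` is a hypothetical helper name, not part of Keras):

```python
import numpy as np

def shuffle_and_split(x, y, test_split=0.2, seed=113):
    # Shuffle indices with a seeded RandomState for reproducibility,
    # then slice off the last `test_split` fraction as the test set.
    indices = np.arange(len(x))
    np.random.RandomState(seed).shuffle(indices)
    x, y = x[indices], y[indices]
    split = int(len(x) * (1 - test_split))
    return (x[:split], y[:split]), (x[split:], y[split:])

(x_train, y_train), (x_test, y_test) = shuffle_and_split(np.arange(10), np.arange(10) * 2)
```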

California Housing Dataset

References: keras/datasets/

The load_data() function in …/ provides access to the California housing prices regression dataset. This function downloads the dataset from an online source if it is not already cached locally. It loads the data from a file containing NumPy arrays for features x and targets y.

The function takes parameters like version, path, test_split, and seed to control how the data is loaded. It asserts that test_split is between 0 and 1. It uses get_file() to download the dataset file if not present.

If version is "small" it subsets the first 600 rows of x and y. It shuffles the data indices using np.random.RandomState() with the provided seed for reproducibility. Finally, it splits the shuffled data into training and test sets based on test_split, returning them as NumPy arrays (x_train, y_train), (x_test, y_test).

Reuters Dataset

References: keras/datasets/

The load_data() function in …/ provides access to the Reuters newswire categorization dataset. It downloads the dataset from Cloud storage if needed, then shuffles and splits the word sequences and labels into train and test sets. Preprocessing options like truncating long sequences, filtering rare words, and replacing rare words with a special token can be applied.

The data is loaded from files containing word indices (xs) and labels (labels) using np.load(). The sequences and labels are shuffled together using np.random.RandomState() before being split into train and test sets. remove_long_seq() truncates sequences longer than the specified maxlen. Word frequencies are used to filter the most common words and replace rare words.

get_word_index() downloads a JSON file mapping words to indices if needed using get_file(). It loads and returns this file as a dictionary. get_label_names() simply returns the list of label names in the order they appear in the training data.


References: keras/applications

The …/applications directory contains implementations of popular pretrained computer vision models that can be used for transfer learning tasks, including VGG16, ResNet50, and InceptionV3.

Each model is defined as a Keras Model class or function in its own file, such as …/. These files leverage common Keras layers to build the architectures block by block according to specifications in research papers. Weights pretrained on ImageNet are also provided and can be loaded with a single line of code.

Utilities in files like …/ provide functions for standardizing inputs and outputs across models. preprocess_input() handles preprocessing inputs like scaling pixels, while decode_predictions() decodes outputs into human-readable class names. Comprehensive unit tests in …/ validate model behavior under different configurations.

These pretrained models can be easily leveraged for transfer learning with utilities like preprocess_input() and decode_predictions(), and pretrained weights can be loaded with a single line of code. The modular Keras implementations make them straightforward to use and customize.

Pretrained Models

References: keras/applications/, keras/applications/

The files …/, …/, and related files contain implementations of popular convolutional neural network architectures that can be used for transfer learning. These pretrained models were trained on the ImageNet dataset and can serve as base models for feature extraction or fine-tuning on new tasks and datasets.

The VGG16() function in …/ constructs the VGG16 model architecture using Keras layers like Conv2D, MaxPooling2D, Flatten, and Dense. It builds the model block by block, applying these layers to recreate the full VGG16 network structure. The function supports loading pretrained ImageNet weights and handling input shapes.

The InceptionV3() function in …/ defines the Keras implementation of the Inception V3 CNN architecture. It builds the model by stacking mixed blocks containing convolutional and pooling layers. These mixed blocks implement the Inception module architecture where multiple convolution kernels are applied in parallel. The function handles configuring the model, preprocessing inputs, and loading pretrained weights.

Both files contain utilities like preprocess_input() for standardizing input data and decode_predictions() for decoding model outputs. They implement these architectures and related functionality using Keras's functional API with layers. The pretrained weights loaded by these models can be used to initialize a base model for transfer learning to new domains and tasks.

Model Utilities

References: keras/applications/

The utilities in …/ provide common functionality for loading pretrained weights, preprocessing inputs, and decoding predictions from models trained on ImageNet.

This file contains functions for preprocessing input images with preprocess_input() in different modes like "caffe", "tf", and "torch". It can preprocess NumPy arrays or tensors. It first validates the mode and data_format arguments before calling _preprocess_numpy_input() or _preprocess_tensor_input() to perform the actual preprocessing. These functions handle operations such as RGB-BGR conversion, mean subtraction, and scaling pixels depending on the specified mode.
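A NumPy sketch of the "caffe"-style path: an RGB-to-BGR channel flip followed by per-channel mean subtraction, using the usual ImageNet BGR channel means (the function name below is illustrative):

```python
import numpy as np

def preprocess_caffe(images):
    # images: array of shape (..., height, width, 3) in RGB order.
    images = images[..., ::-1].astype("float64")    # RGB -> BGR
    images -= np.array([103.939, 116.779, 123.68])  # subtract BGR channel means
    return images

x = preprocess_caffe(np.zeros((1, 2, 2, 3)))
```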

Predictions from models can be decoded into human-readable class names and descriptions using decode_predictions(). It first loads the global CLASS_INDEX mapping, then iterates through predictions to find the top indices and look up the corresponding class information.

The input shape for models is validated and processed by obtain_input_shape(). It handles default input sizes, required shapes for pretrained weights, minimum size checks, and will raise errors for invalid shapes.

Model Testing

References: keras/applications/, keras/applications/

The Keras applications module contains comprehensive unit tests for the pretrained model implementations. These tests are contained in two main files:

The …/ file contains tests for all the available pretrained models. The ApplicationsTest class inherits from testing.TestCase and parameterized.TestCase to leverage Keras testing utilities and parameterize the tests. Tests are run for each model in the MODEL_LIST with different input shapes, channels, and data formats. Key tests include loading models, running inference on sample data, serialization/deserialization, and specifying classifier activations.

The …/ file contains unit tests for the imagenet_utils preprocessing module. The TestImageNetUtils class contains tests for preprocess_input() with both numeric and symbolic data in different modes. It also tests obtain_input_shape() with valid and invalid cases. Tests are parameterized to cover different arguments.

These two files provide a comprehensive set of unit tests for the pretrained model implementations and associated preprocessing utilities. The parameterized tests in ApplicationsTest validate each model under a variety of configurations. TestImageNetUtils thoroughly tests the key preprocessing logic. Together they help ensure the applications code functions as expected.


References: examples/keras_io

The …/keras_io directory contains examples demonstrating end-to-end workflows for training neural network models on different types of data using Keras. It covers domains like computer vision, natural language processing, audio processing, time series forecasting, and structured data modeling.

Some key subdirectories and their purposes are:

  • /vision contains computer vision examples applying CNNs, Transformers and other architectures to tasks like image classification, object detection, segmentation using popular datasets like CIFAR-10, Flowers, COCO.

  • /nlp includes natural language processing examples for sequence modeling, language modeling, text classification and generation leveraging techniques like RNNs, CNNs and Transformers. It uses datasets like IMDB, WikiText.

  • /tensorflow/audio explores audio processing and speech tasks, training models like 1D CNNs and Transformers on datasets for speaker ID, speech recognition, accent detection.

  • /structured_data demonstrates loading tabular data and building neural networks for classification and recommendations on datasets like Adult Census Income, MovieLens ratings. It utilizes techniques like embeddings.

  • /timeseries applies CNNs, RNNs, Transformers and Graph Networks to problems like anomaly detection, forecasting on benchmark time series datasets like NAB, FordA, traffic speeds data.

Many files showcase end-to-end implementations with core classes and functions that support these workflows.

Computer Vision

References: examples/keras_io/vision

This section covers workflows for training computer vision models on image data using the Keras examples. The key functionality demonstrated includes:

  • Loading and preprocessing popular image datasets like CIFAR-10, MNIST, Flowers, Oxford Pets, etc. using TensorFlow Datasets and Keras preprocessing layers. This is handled by functions like get_dataset() which load, preprocess, and return batches of images and labels for training.

  • Defining common CNN architectures like ResNet, VGG, MobileNet, DenseNet using the Keras Applications and Keras layers like Conv2D, MaxPooling2D, BatchNormalization. Functions like get_model() build models using these components.

  • Implementing Transformer models for vision using components like Patches to extract patches from images, PatchEncoder to encode patches with positional embeddings, and TransformerEncoder to apply self-attention. Files like …/ demonstrate this approach.

  • Training models on datasets with techniques like data augmentation with RandomFlip, RandomRotation, learning rate scheduling, early stopping, model checkpointing. These are handled by functions like run_experiment().

  • Evaluating trained models on validation data and visualizing predictions. Functions like predict() and display() perform evaluation and visualization.

Some important implementation details:

The get_dataset() function in files like …/ loads images and masks from disk using TensorFlow IO, resizes them to the expected input size, and vectorizes them for training. It returns a TensorFlow Dataset object that can be iterated over to fetch batches.

The get_model() function in files like …/ defines common CNN architectures. It uses Keras layers like SeparableConv2D and Conv2DTranspose with techniques like batch normalization and residual connections.

Files like …/ demonstrate how the Patches class extracts patches from input images using keras.ops.image.extract_patches. The PatchEncoder class encodes the patches into embedding vectors using Dense layers. It also adds learned position embeddings. The TransformerEncoder applies self-attention on the encoded patches using MultiHeadAttention.
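For non-overlapping patches (stride equal to patch size), the extraction can be approximated with reshapes alone; a NumPy sketch, whereas keras.ops.image.extract_patches also supports strides and padding:

```python
import numpy as np

def extract_patches(image, patch_size):
    # Split an (H, W, C) image into flattened non-overlapping patches
    # of shape (num_patches, patch_size * patch_size * C).
    h, w, c = image.shape
    p = patch_size
    patches = image[: h - h % p, : w - w % p].reshape(h // p, p, w // p, p, c)
    return patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * c)

patches = extract_patches(np.arange(16).reshape(4, 4, 1), patch_size=2)
```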

The run_experiment() function in many files handles training the models end-to-end on the preprocessed datasets. It applies techniques like learning rate scheduling, early stopping, model checkpointing.
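The early-stopping piece of such a training loop boils down to tracking the best monitored value and a patience counter. `EarlyStopper` below is a minimal stand-in for the Keras EarlyStopping callback, showing only that logic:

```python
class EarlyStopper:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            # Improvement: record it and reset the patience counter.
            self.best = val_loss
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience

stopper = EarlyStopper(patience=2)
decisions = [stopper.should_stop(v) for v in [1.0, 0.9, 0.95, 0.96]]
```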

Functions like predict() make predictions on held-out data, and display() visualizes examples to qualitatively evaluate models.

Natural Language Processing

References: examples/keras_io/nlp

This section covers several examples that demonstrate common natural language processing workflows using Keras for tasks like text classification, sequence modeling, and language modeling. The …/nlp directory contains examples applying recurrent neural networks, convolutional neural networks, and Transformer architectures to problems involving text data.

Some key examples include:

  • …/ demonstrates a basic sequence-to-sequence model for adding strings of numbers. It uses the CharacterTable class to one-hot encode variable length string inputs and outputs. An RNN encoder encodes the input while an RNN decoder generates the target sequence.

  • …/ implements sentiment classification on the IMDB dataset using a bidirectional LSTM architecture. It loads and preprocesses the dataset, defines the biLSTM model, and trains it end-to-end for text classification.

  • …/ contains an encoder-decoder model for English to French translation. An LSTM encoder encodes the input sequence while an LSTM decoder generates the target sequence conditioned on the encoder output.

  • …/ demonstrates neural machine translation from English to Spanish using the Transformer architecture. It leverages classes like TransformerEncoder and TransformerDecoder from the KerasNLP library.

  • …/ shows how to pretrain a BERT model from scratch on the WikiText-2 dataset using HuggingFace Transformers functionality.

These examples cover important NLP tasks, model types, and state-of-the-art techniques using popular libraries like Keras, KerasNLP, and HuggingFace Transformers. They provide full workflows from data preprocessing to model definition, training, and evaluation.

Audio Processing

References: examples/keras_io/tensorflow/audio

The examples in the …/audio directory demonstrate workflows for training audio models on tasks like speaker recognition, speech recognition, and accent classification. The code provides end-to-end examples of processing raw audio data into features, building deep learning models, training them, and evaluating performance.

The …/ file implements a speaker recognition model using a 1D CNN. It loads speech samples from different speakers with added background noise, takes the fast Fourier transform (FFT) of the samples to represent them in the frequency domain, and trains the CNN to predict the correct speaker. The residual_block function defines the residual block architecture used in the CNN, which helps train very deep networks and model long-term dependencies in audio. The add_noise function implements an important preprocessing step of adding background noise to the training set in a way that scales the noise based on each sample's amplitude. The audio_to_fft function applies the FFT transform needed to input audio features to the CNN model.
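A NumPy sketch of the FFT step: take the real FFT of each 1-D sample and keep the magnitude spectrum as the model's frequency-domain input (the example's own function may additionally slice or normalize the spectrum):

```python
import numpy as np

def audio_to_fft(audio):
    # audio: (batch, num_samples). The magnitude of the one-sided FFT
    # gives num_samples // 2 + 1 frequency bins per sample.
    return np.abs(np.fft.rfft(audio, axis=-1))

spectrum = audio_to_fft(np.ones((1, 8)))
```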

The …/ file trains a model to classify English accents. It uses the pre-trained Yamnet model to extract embeddings from input audio clips with the filepath_to_embeddings function. This function loads the audio, resamples it, runs it through Yamnet to get embeddings, and duplicates the labels to match the number of embeddings for each clip. It creates a labeled TensorFlow dataset with the dataframe_to_dataset function for training the model. Class weights are calculated from the dataset by counting samples for each class using tf.math.bincount on the Yamnet outputs.

The …/ implements an end-to-end automatic speech recognition model using a Transformer architecture. The TransformerEncoder class defines the encoder layer by applying multi-head attention, feed-forward networks, and layer normalization. The TransformerDecoder class similarly implements the decoder layer with masked multi-head self-attention and encoder-decoder attention. The Transformer model class combines the encoder and decoder. It also includes functions for preprocessing the LJSpeech dataset into spectrograms and text.

Generative Modeling

References: examples/keras_io/generative, examples/keras_io/tensorflow/generative

The code in …/generative and …/generative demonstrates workflows for training popular generative models like GANs, VAEs, and diffusion models on image datasets. It implements models such as CycleGAN, DCGAN, VAE, PixelCNN, DDIM, DDPM and more to generate photos, faces, digits and other types of images.

Key classes that power many of these generative models include GAN, DiffusionModel, VAE, and CycleGAN. The GAN class handles the core training logic for GANs by overriding the train_step() method. This method trains the discriminator on real and fake images, and the generator using a loss function. The DiffusionModel class implements the overall training and sampling process for diffusion models. Its train_step() method trains the model via denoising score matching by diffusing inputs with noise and computing the MSE loss between predicted and actual noise. The VAE class combines the encoder and decoder models into a single end-to-end trainable model. The CycleGAN class calculates important cycle consistency and adversarial losses during training.

Some important implementation techniques seen across models include:

  • Defining model architectures like discriminators using convolutional layers and generators using transposed convolutions.
  • Implementing diffusion schedules via utilities that define the forward and reverse diffusion processes.
  • Applying techniques like normalization, position embeddings, and attention in diffusion model architectures.
  • Sampling from models during training using callbacks to monitor generation quality.
  • Preprocessing datasets and defining metrics to evaluate model performance.
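As one example of a diffusion schedule utility from the list above, a cosine schedule of the kind used in DDIM-style examples can be sketched with NumPy; the rate bounds `min_signal_rate` and `max_signal_rate` are illustrative:

```python
import numpy as np

def diffusion_schedule(diffusion_times, min_signal_rate=0.02, max_signal_rate=0.95):
    """Cosine diffusion schedule sketch: maps diffusion time in [0, 1]
    to (noise_rate, signal_rate) pairs satisfying
    signal_rate**2 + noise_rate**2 == 1, so variance is preserved."""
    start_angle = np.arccos(max_signal_rate)
    end_angle = np.arccos(min_signal_rate)
    angles = start_angle + diffusion_times * (end_angle - start_angle)
    signal_rates = np.cos(angles)  # how much of the clean image survives
    noise_rates = np.sin(angles)   # how much noise is mixed in
    return noise_rates, signal_rates
```

In the forward process, a noisy input is then formed as `signal_rates * images + noise_rates * noise`, and the network is trained to recover the noise component.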

The code provides fully implemented generative modeling workflows, demonstrating best practices for building, training, and evaluating popular generative architectures on image and sequence data.

Reinforcement Learning

References: examples/keras_io/tensorflow/rl

This section demonstrates workflows for training reinforcement learning models to optimize agent behaviors. The code implements several important RL algorithms.

The …/rl directory contains Python files that apply different RL algorithms to solve OpenAI Gym environments. The file uses an actor-critic method with a shared neural network to solve CartPole-v1. It trains the actor to output advantageous action probabilities and the critic to estimate returns. The model is updated via policy gradients to maximize rewards.

Within that file, a single Keras model shares hidden layers between the actor and the critic: the actor head outputs a probability for each action, while the critic head outputs the expected future reward. Training minimizes the error between the critic's predictions and the actual returns, while increasing the probability of actions whose returns exceed the critic's estimate.

Structured Data

References: examples/keras_io/structured_data, examples/keras_io/tensorflow/structured_data

The examples in …/structured_data and …/structured_data demonstrate workflows for training models on tabular and structured data. This includes loading and preprocessing CSV datasets, building neural network models to perform tasks like classification and recommendation, and training the models on the data.

The FeatureSpace class implemented in …/ provides a clean interface for handling feature preprocessing. It is initialized with a features dictionary specifying the preprocessing type for each feature. The adapt() method indexes categorical values and computes normalization stats from the training data. When called on a feature dictionary, FeatureSpace returns a concatenated preprocessed vector. This enables asynchronous preprocessing via TensorFlow data pipelines, or embedding the preprocessing directly in inference models.
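A minimal sketch of this pattern, with illustrative column names and preprocessing choices:

```python
import tensorflow as tf
import keras
from keras.utils import FeatureSpace

# The feature names ("age", "job") and preprocessing modes below are
# illustrative, not taken from any specific example dataset.
feature_space = FeatureSpace(
    features={
        "age": "float_normalized",    # z-scored using training statistics
        "job": "string_categorical",  # indexed and one-hot encoded
    },
    output_mode="concat",
)

train_samples = {
    "age": [25.0, 40.0, 31.0],
    "job": ["engineer", "teacher", "engineer"],
}
ds = tf.data.Dataset.from_tensor_slices(train_samples).batch(3)
feature_space.adapt(ds)  # index categories, compute normalization stats

# Calling the FeatureSpace on a batched feature dict yields one
# concatenated preprocessed vector per sample.
batch = {"age": tf.constant([25.0, 40.0]), "job": tf.constant(["teacher", "engineer"])}
preprocessed = feature_space(batch)
```

The same `feature_space` object can then be applied inside a `tf.data` pipeline (asynchronously, ahead of the model) or composed into an end-to-end inference model.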

The RecommenderNet class in …/ defines an embedding-based collaborative filtering model for movie recommendation. It embeds users and movies, computes the dot product of the embeddings to get a match score, adds per-user and per-movie biases, and applies a sigmoid to squash the result into [0, 1]. This model is trained on the MovieLens dataset to minimize binary cross-entropy loss.
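The scoring computation just described — embedding dot product plus biases, through a sigmoid — can be sketched in NumPy; the vectors and biases here stand in for learned embedding weights:

```python
import numpy as np

def match_score(user_vec, movie_vec, user_bias, movie_bias):
    """Sketch of RecommenderNet-style scoring: the dot product of user
    and movie embeddings plus both bias terms, squashed by a sigmoid
    into a [0, 1] match score."""
    logit = np.dot(user_vec, movie_vec) + user_bias + movie_bias
    return 1.0 / (1.0 + np.exp(-logit))
```

Aligned embeddings (a large positive dot product) push the score toward 1, while the bias terms capture how generous a user is and how popular a movie is overall.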

Time Series

References: examples/keras_io/timeseries

This section demonstrates workflows for training various types of time series models on benchmark datasets. The examples cover common time series tasks like anomaly detection, classification, and forecasting.

The …/timeseries directory contains several illustrative examples. The file …/ shows how to perform anomaly detection on time series data from the NAB dataset using a convolutional autoencoder model. It loads and normalizes the data to create fixed-length sequences, builds an encoder-decoder model with Conv1D layers, trains it on normal sequences, then detects anomalies in test data based on reconstruction error thresholding.
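The reconstruction-error thresholding step can be sketched in NumPy; `model_reconstruct` is a hypothetical stand-in for the trained autoencoder's predict call:

```python
import numpy as np

def anomaly_mask(model_reconstruct, train_seqs, test_seqs):
    """Sketch of anomaly detection by reconstruction error: the
    threshold is the maximum reconstruction MAE observed on normal
    training sequences; test sequences whose MAE exceeds it are
    flagged as anomalous."""
    train_mae = np.mean(np.abs(model_reconstruct(train_seqs) - train_seqs), axis=(1, 2))
    threshold = train_mae.max()
    test_mae = np.mean(np.abs(model_reconstruct(test_seqs) - test_seqs), axis=(1, 2))
    return test_mae > threshold
```

The intuition: an autoencoder trained only on normal sequences reconstructs them well, so unusually large reconstruction error signals a pattern it has never seen.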

The file …/ demonstrates time series classification from scratch on the FordA dataset using a fully convolutional neural network (FCNN) architecture. It loads and standardizes the FordA data, defines an FCNN model with Conv1D and pooling layers via the make_model() function, trains it end-to-end with callbacks, and evaluates the saved best model on held-out test data.

Another example is …/, which builds a Transformer model for the time series classification task. It leverages the transformer_encoder() block to efficiently stack encoder layers, applies global average pooling, and adds a classifier head. The model achieves around 85% accuracy on the FordA dataset without hyperparameter tuning.

The file …/ shows time series forecasting of climate data using an LSTM architecture. It loads the Jena weather data, preprocesses it into windows via timeseries_dataset_from_array(), trains an LSTM-Dense model to predict 12 hours ahead, and validates predictions against true future values.
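The windowing performed by timeseries_dataset_from_array() can be sketched in plain NumPy to make the input/target pairing concrete; the parameter names `sequence_length` and `horizon` are illustrative:

```python
import numpy as np

def make_windows(series, sequence_length, horizon):
    """Pair each window of `sequence_length` past observations with
    the value `horizon` steps after the window ends, mimicking the
    sliding-window behavior of timeseries_dataset_from_array()."""
    x, y = [], []
    for start in range(len(series) - sequence_length - horizon + 1):
        end = start + sequence_length
        x.append(series[start:end])       # input window of past values
        y.append(series[end + horizon - 1])  # target `horizon` steps ahead
    return np.array(x), np.array(y)
```

In the forecasting example, the window covers past weather observations and the target is the temperature a fixed number of hours after the window.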

Transfer Learning

References: examples/keras_io/pytorch, examples/keras_io/tensorflow/vision

This section demonstrates workflows for leveraging pretrained models via transfer learning. Transfer learning is a technique where a model pre-trained on a large dataset is reused as the starting point for a new task, rather than training a model completely from scratch. The weights of the pretrained model are used as an initialization for the new model, and then a subset of the layers are fine-tuned on the new task while keeping other layers frozen. This helps the model learn meaningful representations from a smaller dataset more efficiently.
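The freeze-then-fine-tune recipe described above can be sketched in Keras; the tiny `base` model here is a stand-in for a real pretrained network such as VGG16 or ResNet50, and the shapes are illustrative:

```python
import keras
from keras import layers

# Stand-in for a pretrained base; in practice this would be e.g.
# keras.applications.ResNet50(weights="imagenet", include_top=False).
base = keras.Sequential([layers.Dense(16, activation="relu")], name="pretrained_base")
base.trainable = False  # freeze the pretrained weights

inputs = keras.Input(shape=(8,))
x = base(inputs, training=False)  # run the base in inference mode
outputs = layers.Dense(3, activation="softmax")(x)  # new task-specific head
model = keras.Model(inputs, outputs)

# Later, the base can be unfrozen for fine-tuning at a low learning
# rate, e.g.: base.trainable = True; model.compile(
#     optimizer=keras.optimizers.Adam(1e-5), loss="categorical_crossentropy")
```

Only the new head's weights are trainable at first, so the small target dataset updates far fewer parameters than training from scratch would require.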

The code provides examples of transfer learning using popular pretrained models like VGG16, ResNet50, BERT and BiT. The MyBiTModel class loads a BiT model hub module and adds a new classification head on top to fine-tune the model for a new dataset. Only the head layers are initialized randomly while the BiT module weights remain fixed. The model is trained on a small Flowers dataset, demonstrating that BiT can achieve good accuracy even with limited labeled data.

The file shows loading a pretrained ResNet18 model from TorchVision and fine-tuning it for image classification on Imagenette using Keras. The TorchModuleWrapper layer plays a key role by allowing any PyTorch module to be used as a Keras layer, enabling the PyTorch ResNet18 to be included inside Keras models and trained end-to-end.