keras
Autogenerated from keras-team/keras by Mutable.ai Auto Wiki

GitHub Repository
Developer: keras-team
Written in: Python
Stars: 60k
Watchers: 1.9k
Created: 2015-03-28
Last updated: 2023-12-28
License: Apache License 2.0
Homepage: keras.io
Repository: keras-team/keras

Auto Wiki
Generated at: 2023-12-28
Generated from: Commit 8e897f
Version: 0.0.4
Keras is a high-level deep learning API that provides building blocks for developing and training neural networks. It runs on top of lower-level frameworks such as TensorFlow, PyTorch, and JAX.
The key components of Keras include:

- Models: Keras provides model classes like [Sequential](…/sequential.py), [Model](…/model.py), and [Functional](…/functional.py) for defining neural network architectures. These handle model topology, training loops, inference, and serialization. [Layers](…/layer.py) are composed to build models.
- Layers: Keras implements common neural network layers like [Dense](…/dense.py), [Conv2D](…/conv2d.py), and [LSTM](…/lstm.py), which provide the basic building blocks for models. Layers encapsulate weights, computations, and regularization.
- Optimizers: Keras provides implementations of optimization algorithms like [SGD](…/sgd.py), [Adam](…/adam.py), and [RMSprop](…/rmsprop.py) for adapting model weights during training to minimize loss.
- Losses: Keras contains loss functions like [categorical_crossentropy](…/losses.py) and [mean_squared_error](…/losses.py) for quantifying model error to optimize during training.
- Metrics: Keras implements metrics like [Accuracy](…/accuracy_metrics.py) and [Precision](…/confusion_metrics.py) for monitoring model performance during training and evaluation.
- Callbacks: Keras provides callback classes like [ModelCheckpoint](…/model_checkpoint.py) and [EarlyStopping](…/early_stopping.py) that plug into model training to enable checkpointing, early stopping, and other behaviors.
- Datasets: Keras includes access to common datasets like MNIST, CIFAR-10, and IMDB to make model building easier.
- Applications: Keras provides implementations of popular pretrained models like VGG16 and ResNet50 that can be used for transfer learning.
- Examples: The examples directory contains end-to-end workflows demonstrating how to build, train, and evaluate models on different types of data.

Keras uses an object-oriented approach, with base classes like `Layer` and `Model` that provide common functionality and subclasses that implement specific logic. The layers, models, losses, and other components integrate seamlessly to enable quickly developing neural network models.
Models
References: keras/models
The Keras model classes `Sequential`, `Functional`, and `Model` provide the core building blocks for constructing neural networks. `Sequential` defines a linear stack of layers, making it simple to build basic models compositionally. Under the hood, `Sequential` utilizes an underlying `Functional` model for more complex operations.
`Functional` represents models as directed graphs of layers, allowing arbitrary connections between inputs and outputs. Its `__init__()` method initializes the model from input and output tensors, while `call()` runs the model by passing inputs through the graph. Properties like `layers` and methods like `get_config()` provide information on the model structure and enable serialization.
The `Model` class acts as the base for all Keras models. It inherits from `Layer` to make models layers themselves that can be connected, and from `Trainer` to add common training methods. `Model` handles initialization of `Functional` and subclassed models, building from configuration, I/O via `save()` and `load_weights()`, and printing summaries with `summary()`.
`Sequential` provides a simple API for adding layers with `add()` while utilizing an underlying `Functional` model as needed. Its `call()` method applies each layer sequentially. `Functional` represents the most flexible approach by allowing arbitrary connections specified during initialization.
Together these classes define the core Keras abstractions for constructing models programmatically through sequential stacking, graphs, or subclassing, and enable common functionality like training, serialization, and inference.
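To make the sequential-stacking idea concrete, here is a minimal plain-Python sketch (not the Keras source) of a `Sequential`-like container whose call applies each layer in the order it was added. `TinySequential` and the lambda "layers" are illustrative stand-ins.

```python
# Illustrative sketch: a minimal Sequential-like container. Any callable
# acts as a "layer"; calling the model feeds each output to the next layer.

class TinySequential:
    def __init__(self, layers=None):
        self._layers = list(layers or [])

    def add(self, layer):
        # Layers are tracked in insertion order.
        self._layers.append(layer)

    def __call__(self, x):
        # Apply each layer sequentially.
        for layer in self._layers:
            x = layer(x)
        return x

model = TinySequential()
model.add(lambda x: x * 2)   # stand-in for a real layer
model.add(lambda x: x + 1)
print(model(3))  # → 7
```

The real `Sequential` adds validation, shape inference, and an underlying `Functional` model, but the calling convention is the same linear pass shown here.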
Model Base Class
References: keras/models/model.py
The `Model` class acts as the base class for all Keras models. It inherits from both the `Layer` class and the `Trainer` class. Inheriting from `Layer` allows models to be treated as layers themselves that can be connected together in Keras' functional API. Inheriting from `Trainer` adds common training and inference methods like `fit()` and `predict()`.
The `__init__()` method detects whether the model is being initialized as a Functional model or a subclassed model and initializes it appropriately by calling either the `Layer` or `Functional` `__init__()`.
The `call()` method raises an error in the base class, as subclasses must override it to define the model's forward pass. Core methods like `fit()` and `predict()` are passed through to the `Trainer` implementation to inherit common training logic.
The `build()` method constructs the model layers. It has special logic to support building from a configuration instead of arguments, allowing models to be defined programmatically without instantiating layers.
The `save()` and `load_weights()` methods handle checkpointing the full model weights and architecture, or just the weights, respectively. Under the hood they use Keras' saving library APIs.
The `summary()` method prints a text summary of the model using the summary utils. It has options like `line_length` to customize the output for terminals of different widths.
By providing a common base with standard methods, the `Model` class allows Keras models to be initialized flexibly and trained/evaluated while abstracting away lower-level details. It provides the foundation for Keras' modular approach to building deep learning models.
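The base-class pattern described above (a `call()` that raises until a subclass overrides it) can be sketched in plain Python. The class names here are hypothetical, not the Keras implementation.

```python
# Illustrative sketch of the subclassing pattern: the base class refuses
# to run until a concrete model defines its forward pass in call().

class BaseModel:
    def call(self, inputs):
        raise NotImplementedError(
            "Subclasses must override call() to define the forward pass."
        )

    def __call__(self, inputs):
        # Dispatch to the subclass-defined forward pass.
        return self.call(inputs)

class DoublingModel(BaseModel):
    def call(self, inputs):
        return [2 * v for v in inputs]

print(DoublingModel()([1, 2, 3]))  # → [2, 4, 6]
```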
Sequential Models
References: keras/models/sequential.py
The `Sequential` class represents a linear stack of layers and provides a simple interface for sequentially adding layers to a model. It handles the ordering of layers and underlying functionality like input/output shape inference. `Sequential` inherits from the base `Model` class.
Layers can be added to the model sequentially using the `add()` method, which validates the layer and adds it to the internal `_layers` list. This list is used to track the order of layers. `add()` also calls the `_maybe_rebuild()` method to reconstruct the underlying `Functional` model if needed.
The `build()` method constructs this underlying `Functional` model, which is stored in the `_functional` attribute. It takes the first layer as input, applies each subsequent layer in turn, and sets the resulting tensor as the output. This `Functional` model is then used for the `call()` method and other operations.
The `call()` method has two modes. It either delegates to the underlying `Functional` model, or directly applies each layer in sequence if the model has not yet been built or if the inputs are not supported by `_functional`.
The `layers` property returns only the layers added by the user, filtering out the automatically generated `InputLayer`.
The `get_config()` method serializes the layer configurations for saving and loading models. The `from_config()` method deserializes and recreates the model from this saved configuration.
In summary, `Sequential` provides a simple linear interface while utilizing the more complex `Functional` model under the hood when needed. It handles ordering layers, building the model, and serializing the configuration.
Functional Models
References: keras/models/functional.py
The `Functional` class represents a Keras model defined as a directed graph of layers. It inherits from the `Function` and `Model` classes. The `__init__()` method initializes a functional model from input and output tensors, validating that the inputs and outputs are Keras tensors.
The `call()` method defines how the model runs on input data by standardizing the inputs and running them through the graph of layers via `_run_through_graph()`, which applies each layer in topological order, passing the output of each layer to the next. It supports passing arguments like training modes. `_standardize_inputs()` processes model inputs for calling, converting them to tensors and adjusting ranks and metadata as needed, while `_flatten_to_reference_inputs()` flattens nested inputs.
Properties like `input_shape`, `output_shape`, and `layers` provide information about the model structure, and methods like `compute_output_spec()` also define the model behavior. The `get_config()` method serializes the model configuration, including layers, nodes, and input/output mappings, and `functional_from_config()` deserializes a model from this configuration.
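The graph-execution idea behind running inputs through a layer graph can be sketched in plain Python. This is an illustrative analog of the behavior attributed to `_run_through_graph()`, resolved here by recursion with caching (which visits nodes in dependency order); the names `run_through_graph` and the dict-based graph format are assumptions, not the Keras API.

```python
# Illustrative sketch: each node names its input nodes; computed outputs
# are cached so shared layers run once and feed every consumer.

def run_through_graph(graph, inputs):
    """graph: {node_name: (fn, [input_node_names])}; inputs: {name: value}."""
    cache = dict(inputs)

    def compute(name):
        if name not in cache:
            fn, deps = graph[name]
            # Resolve dependencies first, then apply this node's function.
            cache[name] = fn(*[compute(d) for d in deps])
        return cache[name]

    return compute

graph = {
    "dense": (lambda x: x * 10, ["input"]),
    "out":   (lambda a, b: a + b, ["dense", "input"]),  # skip connection
}
compute = run_through_graph(graph, {"input": 3})
print(compute("out"))  # → 33
```

The "out" node consumes both the "dense" output and the raw input, mirroring how the functional API permits arbitrary connections between layers.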
Model Utilities
References: keras/models/cloning.py
The …/cloning.py file contains utilities for cloning Keras models. It allows creating a new model instance with the same architecture and layers as an existing model, but with newly initialized weights. This is useful for tasks like model ensembling where you want multiple similar models.
The main functionality is contained in the `clone_model()` function. This function checks whether the input model is a `Sequential` or `Functional` model and dispatches the cloning logic accordingly. For `Sequential` models it calls `_clone_sequential_model()`, and for `Functional` models it calls `_clone_functional_model()`.
For subclassed models, it serializes and deserializes the model configuration to recreate the model instance. Custom clone functions and input tensors are not supported for subclassed models.
`_clone_sequential_model()` iterates through the layers of the input `Sequential` model and clones each layer using the provided `clone_function`. It also handles cloning the input layer if needed.
`_clone_functional_model()` recursively runs the model to clone each layer. It checks the input tensors, and if they are not provided it creates new placeholder input tensors. It runs the model graph while applying the `clone_function` to each layer to clone it.
Model Testing
References: keras/models/sequential_test.py, keras/models/functional_test.py
This section covers the test cases that validate the behavior of Keras models. The main classes for model testing are `SequentialTest` and `FunctionalTest`, which inherit from `testing.TestCase`.
`SequentialTest` contains methods that exercise different functionality of the `Sequential` model. Some examples include:
- Adding layers with and without specifying input shapes, using `Input` and direct specification
- Building and calling models on both symbolic and eager tensors
- Serialization of built and unbuilt models
- Checking for errors like adding duplicate layers

`FunctionalTest` contains methods for testing the `Functional` model class. Some examples include:
- Building models with multiple inputs and outputs
- Passing scalar and tensor inputs
- Calling models eagerly and symbolically
- Input and output specifications
- Passing inputs by name
- Serialization

Both classes leverage functionality from the Keras and TensorFlow Python APIs like `backend.KerasTensor`. The tests validate correct behavior, ensure expected errors are raised, and cover major use cases.
Some key methods in more detail:
- `test_basic_flow_with_input` and `test_legacy_flow_with_input_shape` in `SequentialTest` add layers with and without specifying input shapes
- `test_dict_inputs` and `test_list_inputs` pass different input types to models
- `test_errors` checks error cases in `Sequential`
- `test_serialization` serializes and deserializes models
- `test_basic_flow_multi_input` builds a multi-input model in `FunctionalTest`

Overall the test cases provide comprehensive validation of core model functionality through different scenarios. This ensures the expected interfaces are met and implementations function properly.
Layers
References: keras/layers
The …/layers directory contains implementations of commonly used neural network layers that serve as basic building blocks for constructing models. These layers implement fundamental operations like dense connections, convolutions, recurrent connections, and more that are used widely across different model types and domains. The layers define clean Keras interfaces while delegating the actual computations to lower-level frameworks like TensorFlow wherever possible. This separation of concerns allows new layers to be easily added without having to reimplement core logic.
The key layers provided include `Dense` for dense connections, `Conv1D/2D/3D` for convolutions, `LSTM/GRU/SimpleRNN` for recurrent layers, `BatchNormalization` for normalization, and `Activation` layers for applying functions like ReLU. These layers have been optimized and tested extensively to ensure they meet expectations. The …/__init__.py file centralizes access to all layers by importing and re-exporting them from their respective submodules, providing a single namespace.
Now discussing some important classes and files in more detail:
The …/core directory contains fundamental layer types. The `Dense` layer in …/dense.py implements the core dense connection between inputs and weights through matrix multiplication in its `call()` method.
The …/convolutional directory provides convolutional layers. The `Conv2D` layer in …/conv2d.py inherits from the `BaseConv` layer, which defines common convolutional logic in `call()`. It overrides `__init__()` to set hyperparameters and calls the parent initialization.
The …/rnn directory contains recurrent layers. The `LSTM` class in …/lstm.py implements the core LSTM cell recurrence in its `call()` method based on the standard equations. The `RNN` layer in …/rnn.py handles running RNN cells on input sequences via `inner_loop()`.
The …/normalization directory implements normalization layers. The `BatchNormalization` layer in …/batch_normalization.py maintains moving averages of mean and variance to consistently normalize inputs during training and inference.
The …/activations directory contains activation layers. The `Activation` layer in …/activation.py simply applies the given activation function to inputs in its `call()` method.
Core Layers
References: keras/layers/core
The core layers in Keras implement fundamental neural network building blocks like dense connections, embeddings, and input placeholders. These layers form the basic building blocks that can be combined to construct complex models.
The …/core directory contains implementations of common layer types. The `Dense` layer in …/dense.py handles dense connections through matrix multiplication. Its `call()` method performs the core computation of multiplying the inputs by the kernel weights. The `Embedding` layer in …/embedding.py maps integers to dense vectors. It uses `ops.take()` in its `call()` method to extract embedding vectors from the weights matrix.
The `InputLayer` class in …/input_layer.py is used to define input tensors for models. Its main responsibilities are constructing Keras tensors from arguments like shape and dtype, storing the input tensor, and registering it as the layer output. The `Input` function provides a cleaner API than using `InputLayer` directly. The `Identity` layer in …/identity.py simply returns its input, preserving properties. This allows inserting identity layers without affecting computation.
The `Lambda` layer in …/lambda_layer.py allows arbitrary Python functions to be used as layers. It implements layer methods like `call()` to wrap functions as layers. The `Masking` layer in …/masking.py handles masking timesteps. Its `compute_mask()` method generates masks by checking for equality with the `mask_value`. The `EinsumDense` layer in …/einsum_dense.py performs dense connections using Einstein summation notation, supporting arbitrary input dimensionality through careful analysis of the provided equation string.
Convolutional Layers
References: keras/layers/convolutional
The …/convolutional directory contains implementations of common convolutional operations as Keras layers. It provides classes for standard convolutions like `Conv1D`, `Conv2D`, and `Conv3D`, as well as transposed/fractionally-strided convolutions with `Conv1DTranspose`, `Conv2DTranspose`, and `Conv3DTranspose`. Depthwise convolutions are implemented in `DepthwiseConv1D` and `DepthwiseConv2D`. Separable convolutions are supported via `SeparableConv1D` and `SeparableConv2D`.
The core convolutional logic is defined in `BaseConv`, which serves as the parent class for standard convolution layers. It handles tasks like input validation, weight initialization, and computing the output shape. Child classes like `Conv2D` inherit this functionality while implementing the specific convolution operation.
`DepthwiseConv2D` inherits from `BaseDepthwiseConv` and performs depthwise separable convolutions. It splits the input into channels, applies a separate depthwise kernel to each channel, and concatenates the results.
`SeparableConv2D` leverages `BaseSeparableConv` to optimize separable convolutions. `BaseSeparableConv` contains the logic to first apply a depthwise convolution followed by a pointwise convolution in a single step.
Comprehensive testing is provided in files like conv_test.py and depthwise_conv_test.py to validate the layers against NumPy implementations and with various arguments.
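The sliding-window operation these layers wrap can be sketched in plain Python. This shows a single-channel, stride-1, "valid"-padding 2D convolution (cross-correlation, per the deep-learning convention); `conv2d_valid` is an illustrative name, not part of Keras.

```python
# Illustrative sketch: slide the kernel over every valid position and
# sum the elementwise products of the overlapping window.

def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1      # "valid" output height
    ow = len(image[0]) - kw + 1   # "valid" output width
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(ow)
        ]
        for i in range(oh)
    ]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
edge = [[1, -1]]  # 1x2 horizontal-difference kernel
print(conv2d_valid(image, edge))  # → [[-1, -1], [-1, -1], [-1, -1]]
```

Real `Conv2D` layers generalize this to batches, multiple channels, strides, dilation, and padding modes, delegating the heavy lifting to backend ops.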
Recurrent Layers
References: keras/layers/rnn
The core RNN layers in Keras are implemented in the …/rnn directory. This includes commonly used RNN cell types like `LSTM`, `GRU`, and `SimpleRNN`, as well as variants like `ConvLSTM`. Each layer type has its own class that inherits from the base `RNN` class.
The `RNN` class in …/rnn.py serves as the base class for all RNN layers. It handles running the input sequence through an RNN cell and returning the outputs. The `RNN` class takes a cell or list of cells as its first argument. Its key methods include `__init__()`, `compute_output_shape()`, `build()`, `call()`, and `inner_loop()`. `inner_loop()` uses the Keras backend RNN op for the core RNN computation. The `RNN` layer relies on the cell implementing the `call()` method and having `state_size` and `output_size` attributes. It supports stacked RNNs through the `StackedRNNCells` wrapper class and handles statefulness through trainable state tensors.
The `LSTM` class is defined in …/lstm.py. It uses the `LSTMCell` class, which implements the core LSTM cell logic in its `call()` method. `LSTMCell` computes the gates and cell state using the standard LSTM equations. It has two implementations for efficiency: one that splits computations and one that fuses them. The `LSTM` class handles actually running the RNN by calling `LSTMCell` on each timestep using `inner_loop()`. It integrates optimized backend implementations like CuDNN.
The `GRU` class is defined in a similar way in …/gru.py. The `GRUCell` class contains the core GRU cell logic, computing gates and updating the hidden state. The `GRU` class wraps `GRUCell` to apply it to input sequences as a full layer.
The `SimpleRNN` layer defined in …/simple_rnn.py uses the `SimpleRNNCell` class, which defines a basic RNN cell. `SimpleRNN` handles running the cell on full input sequences.
Variants like `ConvLSTM`, defined in …/conv_lstm.py, combine convolutions with standard LSTM computations, applying convolutional operations for both input and recurrent transformations.
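The "standard LSTM equations" referenced above can be written out for a single timestep. This is a scalar-weight plain-Python sketch of one cell step (input, forget, and output gates plus the candidate cell state); the weight-dict layout is an illustrative simplification of the real per-gate weight matrices.

```python
# Illustrative single-step LSTM cell with scalar state and weights.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    # w holds per-gate input weight (w*), recurrent weight (u*), bias (b*).
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate state
    c = f * c_prev + i * g       # new cell state: keep some old, admit some new
    h = o * math.tanh(c)         # new hidden state, gated by the output gate
    return h, c

w = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, w=w)
print(-1.0 < h < 1.0)  # tanh and the gates keep h bounded
```

The `LSTM` layer applies exactly this kind of step at each timestep (with vector-valued states and fused matrix multiplies) via `inner_loop()`.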
Normalization Layers
References: keras/layers/normalization
The Keras layers in the …/normalization directory implement various normalization techniques that can be applied to inputs. Normalization is useful for stabilizing the learning process and making the optimization problem easier to solve. The key layers are:

- `BatchNormalization` in …/batch_normalization.py applies batch normalization to inputs. It maintains moving averages of the mean and variance computed during training to apply consistent normalization during inference.
- `LayerNormalization` in …/layer_normalization.py normalizes over the last axis of inputs by default, providing layer normalization.
- `GroupNormalization` in …/group_normalization.py divides channels into groups and computes the mean and variance within each group, generalizing layer and instance normalization.
- `UnitNormalization` in …/unit_normalization.py normalizes inputs so each has an L2 norm of 1 across specified axes.

The `BatchNormalization` layer implements batch normalization through the following steps: in the `build()` method it adds trainable `gamma` and `beta` weights, and in `call()` it computes the mean/variance of the current batch or uses moving averages from training, applies the normalization formula, and scales and shifts the values using `gamma` and `beta`.
`LayerNormalization` normalizes over the specified `axis` by computing mean and variance with `ops.moments()`, applying the normalization formula with options for `gamma` and `beta`, and supporting masking.
`GroupNormalization` reshapes inputs into groups, computes per-group stats with `_apply_normalization()`, applies the formula, and reshapes back, with options like `gamma` and `beta`. It generalizes layer and instance normalization.
`UnitNormalization` calculates the L2 norm across axes in `call()` with `ops.sum()` and `ops.rsqrt()`, and multiplies the inputs by the inverse normalization values to implement L2 normalization.
Thorough unit tests in files like batch_normalization_test.py and layer_normalization_test.py validate the layers, including correctness tests that pass random inputs.
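The normalization formula these layers share (subtract the mean, divide by the standard deviation, then scale and shift with `gamma` and `beta`) can be sketched in plain Python for a 1D batch. `batch_norm` is an illustrative name; the real layers operate on tensors per-feature.

```python
# Illustrative sketch of the batch-normalization formula:
# y = gamma * (x - mean) / sqrt(var + eps) + beta

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n
    # eps guards against division by zero for near-constant batches.
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in out])  # zero-mean, roughly unit-variance values
```

During training `BatchNormalization` uses the current batch's statistics as above; at inference it substitutes the moving averages accumulated during training.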
Activation Layers
References: keras/layers/activations
The Keras layers in the …/activations directory implement common activation functions as reusable Keras layers. This allows different activations to be easily added to models during construction.
The directory contains layer classes for activations like `ReLU`, `LeakyReLU`, `PReLU`, `ELU`, and `Softmax`. Each layer class inherits from the base `Layer` class and focuses its `call()` method on applying the activation function. This delegates other responsibilities like output shape inference to Keras.
The `ReLU` layer in …/relu.py applies the `activations.relu` function to its inputs in `call()`. It optionally takes hyperparameters like `max_value` and `negative_slope`.
The `LeakyReLU` layer in …/leaky_relu.py applies `activations.leaky_relu` in its `call()` method, taking a `negative_slope` parameter.
The `PReLU` layer in …/prelu.py learns the `alpha` parameter, setting its shape in `build()` based on `shared_axes`. Its `call()` method calculates PReLU directly from the inputs and the `alpha` weight.
The `ELU` layer in …/elu.py applies the `activations.elu` function in its `call()` method, optionally taking an `alpha` parameter.
Thorough unit tests in files like …/prelu_test.py verify the implementations match specifications and expectations for serialization, input handling, and error cases.
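The functions these layers apply are simple enough to state directly. Below is a plain-Python sketch of ReLU (with the optional `max_value` cap mentioned above) and leaky ReLU; the default `negative_slope` value here is illustrative, not necessarily the Keras default.

```python
# Illustrative scalar versions of the activation functions.

def relu(x, max_value=None):
    # Clip negatives to zero; optionally cap the positive side.
    y = max(0.0, x)
    return min(y, max_value) if max_value is not None else y

def leaky_relu(x, negative_slope=0.25):
    # Negative inputs leak through with a small slope instead of dying.
    return x if x >= 0 else negative_slope * x

print(relu(-3.0), relu(5.0, max_value=4.0))  # → 0.0 4.0
print(leaky_relu(-4.0))                      # → -1.0
```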
Pooling Layers
References: keras/layers/pooling
Pooling layers downsample inputs spatially to reduce the number of parameters and computations in convolutional networks. The `MaxPooling2D` and `AveragePooling2D` layers are commonly used for this purpose. `MaxPooling2D` performs 2D max pooling on input tensors, taking the maximum value in each pooling window. `AveragePooling2D` similarly pools inputs by taking the average value in each window. Both layers inherit core pooling logic from the `BasePooling` layer defined in …/base_pooling.py.
`BasePooling` handles aspects like parameter validation, padding, and output shape computation that are shared between pooling layers. It implements the core `call()` method, which performs the actual pooling operation by calling `tf.nn.max_pool()` or `tf.nn.avg_pool()` based on the `pool_mode`. Subclasses like `MaxPooling2D` and `AveragePooling2D` initialize the layer parameters and specify a `pool_mode` of "max" or "average" respectively.
The `MaxPooling2D` and `AveragePooling2D` layers downsample inputs by applying the specified pooling operation over windows defined by the `pool_size` and shifted by `strides`. They support both "valid" and "same" padding modes, which determine how the input is padded before pooling and thus the output shape. Thorough tests defined in files like …/max_pooling_test.py and …/average_pooling_test.py validate the correctness of these layer implementations.
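The windowed downsampling described above can be sketched in plain Python for the single-channel, "valid"-padding case. `max_pool_2d` is an illustrative helper, not a Keras function.

```python
# Illustrative sketch of 2D max pooling: slide a pool_size window by
# stride and keep the maximum value in each window.

def max_pool_2d(x, pool_size=2, stride=2):
    rows = (len(x) - pool_size) // stride + 1
    cols = (len(x[0]) - pool_size) // stride + 1
    return [
        [
            max(
                x[i * stride + di][j * stride + dj]
                for di in range(pool_size)
                for dj in range(pool_size)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]

x = [[1, 3, 2, 4],
     [5, 6, 1, 2],
     [7, 2, 9, 1],
     [3, 4, 5, 8]]
print(max_pool_2d(x))  # → [[6, 4], [7, 9]]
```

Average pooling replaces `max(...)` with the mean of the same window, which is exactly the difference between the two layers' `pool_mode` settings.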
Regularization Layers
References: keras/layers/regularization
The `Dropout`, `GaussianDropout`, `SpatialDropout1D/2D/3D`, and `ActivityRegularization` layers implement various regularization techniques to help prevent overfitting during neural network training.
`Dropout` randomly sets input units to zero during training based on a given dropout rate. This disrupts co-adaptations on the training data and forces units to learn more robust representations. The core dropout logic is implemented in the layer's `call()` method. `GaussianDropout` applies dropout by multiplying inputs by factors drawn from a Gaussian distribution. `SpatialDropout1D/2D/3D` drop entire feature maps rather than individual values, helping regularize spatial structure in early convolutional layers.
The `ActivityRegularization` layer allows easily adding L1 and L2 regularization to the activations of another layer during training. In its `__init__()` method, it sets the layer's `activity_regularizer` attribute to an instance of `regularizers.L1L2` using the provided L1 and L2 factors. This causes the regularization penalties on the layer's activations to be added to the overall loss. The layer's `call()` method simply returns its input, so it can be applied without changing the model architecture while still applying regularization. Together, these layers provide a variety of tools to prevent overfitting through regularization.
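The dropout mechanic can be sketched in plain Python using the common "inverted dropout" formulation, where surviving units are scaled by 1/(1 - rate) so expected activations are unchanged between training and inference. The function name and seeding are illustrative.

```python
# Illustrative inverted dropout: zero each unit with probability `rate`
# during training, scale survivors so the expected value is preserved.

import random

def dropout(inputs, rate, training=True, seed=None):
    if not training or rate == 0.0:
        # At inference dropout is a no-op.
        return list(inputs)
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in inputs]

print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, seed=0))  # zeros and 2.0s
print(dropout([1.0, 2.0], rate=0.5, training=False))    # → [1.0, 2.0]
```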
Attention Layers
References: keras/layers/attention
The …/attention directory implements various attention mechanisms for Keras. The core `Attention` class in …/attention.py handles dot-product attention.

- The `Attention` class calculates attention scores between query and key tensors using `_calculate_scores()`. It supports masking invalid positions.
- `_calculate_scores()` handles calculating scores between query and key. It supports different `score_mode` options like "dot".
- `_apply_scores()` applies the scores. It masks positions, computes softmax over the scores, optionally applies dropout, and takes a matrix multiplication of the distribution and value.
- Causal masking is supported by lower-triangular masking in `_calculate_score_mask()`.
- `AdditiveAttention` in …/additive_attention.py implements additive attention. It overrides `_calculate_scores()` to calculate scores as a nonlinear sum of query and key.
- `MultiHeadAttention` in …/multi_head_attention.py enables parallel attention heads. It uses `EinsumDense` to project inputs and `einsum` to compute attention in parallel heads. `_compute_attention_mask()` handles masking.
- `GroupedQueryAttention` in …/grouped_query_attention.py allows differing numbers of query and key/value heads. It uses `EinsumDense` and `ops.repeat()` to match dimensions before attention in `_compute_attention()`.

Thorough unit tests validate the attention implementations and calculations. The layers provide clean, modular implementations of important attention mechanisms.
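The score/softmax/weighted-sum pipeline described above can be sketched in plain Python for a single query. This mirrors the dot-product flow of `_calculate_scores()` followed by `_apply_scores()`; batching, masking, and dropout are omitted, and the helper names are illustrative.

```python
# Illustrative dot-product attention: scores = q·k_i, weights = softmax(scores),
# output = sum_i weights[i] * values[i].

import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    # Weighted sum of the value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(dim)]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attention([5.0, 0.0], keys, values)
print([round(v, 3) for v in out])  # heavily weights the first value
```

Multi-head variants run several such computations in parallel over projected queries, keys, and values, which is what the `einsum`-based implementation vectorizes.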
Preprocessing Layers
References: keras/layers/preprocessing
The Keras preprocessing layers provide a set of tools for common feature engineering and preprocessing tasks in deep learning models. These layers allow features to be normalized, discretized, encoded, and transformed before being fed into models. They handle common tasks like:

- Normalization: The `Normalization` layer centers and scales features to have zero mean and unit variance. It can learn normalization statistics from data via `adapt()` or accept precomputed values.
- Discretization: The `Discretization` layer buckets continuous features into discrete bins based on learned quantiles or predefined boundaries. It supports different `output_mode` encodings.
- Encoding: The `CategoryEncoding` layer encodes categorical integer features into one-hot, multi-hot, or count representations. The `Hashing` layer maps features to an integer hash space.
- Transformations: Layers like `FeatureSpace` and `TextVectorization` allow complex feature engineering by applying combinations of preprocessing techniques. `FeatureSpace` handles normalization, discretization, and crossings, and outputs features in various formats. `TextVectorization` handles common NLP tasks like tokenization and vocabulary indexing.

These layers provide a consistent Keras interface for feature preprocessing tasks. Their `adapt()` methods allow statistics and vocabularies to be learned from training data. They produce outputs compatible with deep learning models and support TensorFlow data pipelines.
The core classes implementing these techniques include `Normalization`, `Discretization`, `CategoryEncoding`, `Hashing`, `FeatureSpace`, and `TextVectorization`. Key methods include their `__init__()`, `adapt()`, `call()`, and `get_config()` methods, which handle initialization, statistic learning, core logic, and serialization respectively. Algorithms like quantile binning and hashing are implemented via lower-level TensorFlow functions. The layers provide a clean high-level API for feature engineering in Keras models.
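The adapt-then-call pattern can be sketched in plain Python: a tiny `Normalization`-like object learns mean and variance from the data it sees, then scales features to zero mean and unit variance. The class name and scalar-feature simplification are illustrative, not the Keras implementation.

```python
# Illustrative adapt()/call() pattern for a normalization preprocessor.

class TinyNormalization:
    def __init__(self, eps=1e-7):
        self.eps = eps
        self.mean = 0.0
        self.var = 1.0

    def adapt(self, data):
        # Learn normalization statistics from the training data.
        n = len(data)
        self.mean = sum(data) / n
        self.var = sum((v - self.mean) ** 2 for v in data) / n

    def __call__(self, data):
        # Apply the learned statistics to any incoming data.
        std = (self.var + self.eps) ** 0.5
        return [(v - self.mean) / std for v in data]

norm = TinyNormalization()
norm.adapt([10.0, 20.0, 30.0])
print([round(v, 3) for v in norm([10.0, 20.0, 30.0])])  # centered, unit-scaled
```

Separating `adapt()` from `__call__()` means the same learned statistics are applied consistently at training and inference time, which is the design the real preprocessing layers follow.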
Merging Layers
References: keras/layers/merging
The Keras layers for merging multiple inputs implement common elementwise operations like addition, multiplication, concatenation, and more. These layers take two or more input tensors and combine them through elementwise functions to produce a single output tensor.
The core layers are `Add`, `Multiply`, `Concatenate`, and `Dot`. The `Add` layer implements elementwise addition by overriding the `_merge_function` in the `Merge` base class. It sequentially adds the input tensors using `ops.add()`. `Multiply` similarly overrides `_merge_function` to multiply the inputs together with `ops.multiply()`. `Concatenate` validates shapes and concatenates inputs along an axis with `ops.concatenate()`. `Dot` performs dot products along configurable axes via `batch_dot()`.
Other layers implement minimum, maximum, average, subtract, and more. For example, `Minimum` finds elementwise minimum values by setting the first input as the initial output and recursively taking the minimum of each subsequent input with `ops.minimum()`. `Average` calculates averages by adding inputs with `ops.add()` and dividing by the count.
All merging layers inherit from the `Merge` base class in …/base_merge.py. This handles common functionality like input validation, broadcasting, and masking. Individual layers override `_merge_function` to apply the specific TensorFlow operation.
The …/merging_test.py file contains comprehensive tests for the layers. It defines test parameters and runs correctness, error, and basic checks on each layer. These tests validate the key merging operations and error cases.
Reshaping Layers
References: keras/layers/reshaping
The Keras layers in the …/reshaping
directory provide common reshaping operations that can modify the dimensions or structure of input tensors. This includes layers like Flatten
, Reshape
, and Cropping
which are important for preprocessing data.
The Flatten
layer takes a tensor of any dimensions and reshapes it into a 2D tensor by flattening all dimensions except the batch dimension. This is useful for converting convolutional or recurrent outputs into dense inputs. The Flatten
layer preserves the batch size and handles different data formats like 'channels_first' and 'channels_last'. It uses ops.reshape()
to perform the flattening based on the computed output shape.
The Reshape
layer reshapes the input tensor into a target shape specified during initialization. It implements the compute_output_shape()
method to determine the output shape based on the input shape and target shape. The Reshape
layer resolves target shapes containing -1
dimensions by replacing them with inferred sizes. In its build()
method, it stores the resolved target shape which is then used in call()
with ops.reshape()
to perform the actual reshaping.
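The -1 resolution step can be sketched as follows. This is an illustrative re-implementation of the idea, not the layer's actual code:

```python
def resolve_target_shape(input_shape, target_shape):
    """Replace a single -1 in target_shape with the inferred size."""
    input_size = 1
    for d in input_shape:
        input_size *= d
    known = 1
    for d in target_shape:
        if d != -1:
            known *= d
    if -1 in target_shape:
        if input_size % known != 0:
            raise ValueError("Cannot infer the -1 dimension")
        inferred = input_size // known
        return tuple(inferred if d == -1 else d for d in target_shape)
    return tuple(target_shape)

# A length-12 input reshaped to (3, -1) infers the -1 as 4.
print(resolve_target_shape((12,), (3, -1)))  # (3, 4)
```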
The Cropping
layers allow cropping portions of the input tensor along certain axes. For example, Cropping1D
performs 1D cropping on the temporal dimension, Cropping2D
crops spatial dimensions of images, and Cropping3D
crops volumetric data. They take cropping parameters that specify how many elements to remove from each edge. The cropping logic is implemented in the call()
method, which slices the inputs accordingly based on the cropping amounts and data format. compute_output_shape()
calculates the output shape after cropping.
The Cropping
layers support different cropping configurations like asymmetric cropping amounts on each side, same cropping on all sides, and cropping of different axes by different amounts. They validate that cropping values are within bounds of the input dimensions. Unit tests in files like cropping1d_test.py
and cropping2d_test.py
thoroughly validate the behavior of these layers under different arguments and configurations.
The Reshape
and Flatten
layers provide simple ways to modify tensor dimensions with minimal preprocessing code. The Cropping
layers allow removing unwanted edge elements from inputs. Together these layers implement common reshaping operations as reusable Keras components.
Callbacks
References: keras/callbacks
Callbacks allow customizing model training by hooking into different stages of the process. Key stages include the start and end of epochs, batches, training, validation, and prediction. The Keras callback system handles this through callback classes that inherit from the base Callback
class.
Callbacks implement methods like on_train_begin()
, on_epoch_end()
, and on_batch_end()
to run custom code at these points. This allows behaviors like monitoring metrics, early stopping, model checkpointing, and progress logging. Callbacks are grouped into a CallbackList
which ensures all callbacks are properly called at each stage.
The core Callback
class defines the callback interface and empty method implementations subclasses can override. It has attributes to store the model and training parameters set via set_model()
and set_params()
.
Many common callbacks are provided in Keras. The History
callback automatically records metrics after each epoch into a history
dictionary accessible on the model. It overrides on_train_begin()
to initialize storage and on_epoch_end()
to append results.
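The History pattern can be sketched with a minimal stand-in class. This is illustrative only; the real callback receives its logs dictionary from the Keras training loop:

```python
class History:
    # Minimal sketch of the History callback's bookkeeping.
    def on_train_begin(self):
        self.history = {}

    def on_epoch_end(self, epoch, logs=None):
        # Append each metric from this epoch to its running list.
        for key, value in (logs or {}).items():
            self.history.setdefault(key, []).append(value)

h = History()
h.on_train_begin()
h.on_epoch_end(0, {"loss": 0.9, "accuracy": 0.5})
h.on_epoch_end(1, {"loss": 0.7, "accuracy": 0.6})
print(h.history["loss"])  # [0.9, 0.7]
```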
The ProgbarLogger
prints metrics and progress to stdout using a progress bar. It implements callback methods like on_train_batch_end()
to update the bar. The CSVLogger
similarly logs to a CSV file by overriding on_epoch_end()
.
The TensorBoard
callback handles logging metrics, weights, and graphs to TensorBoard. It manages summary writers and invokes logging callbacks at different points via methods like on_epoch_end()
.
The ModelCheckpoint
callback saves models or weights periodically using options like save_best_only
. It overrides on_train_batch_end()
and on_epoch_end()
to handle checkpointing. The EarlyStopping
callback stops training when a monitored metric stops improving by checking for improvement in on_epoch_end()
.
The ReduceLROnPlateau
callback reduces the learning rate when a metric stops improving. It overrides on_epoch_end()
to check for improvement and call the optimizer if needed.
Callback Base Class
References: keras
The Callback
base class is defined in …/callback.py
. It provides a common interface that all Keras callbacks must implement. The key aspects of the Callback
class are:

It is the root of the callback class hierarchy; all Keras callbacks subclass it and reuse its common functionality. 
Empty method implementations are provided for all callback hook points like
on_train_begin()
,on_epoch_begin()
, etc. This defines the expected callback API. 
Subclasses can override any specific hook methods to insert custom logic. For example,
on_epoch_end()
to evaluate metrics after each epoch. 
Each hook receives a
logs
dictionary carrying metrics from the last batch or epoch. 
The
params
attribute allows callbacks to access the training configuration parameters. 
Callbacks have access to the underlying model via
model
which enables inspecting layers/weights. 
Callbacks can maintain custom state across epochs via instance attributes or properties. This state is not tracked with the model itself.
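The hook-based design described above can be sketched as a minimal base class with empty hook methods. This is a simplified model of the interface, not the real implementation:

```python
class Callback:
    """Minimal sketch: empty hooks that subclasses override."""
    def set_model(self, model):
        self.model = model

    def set_params(self, params):
        self.params = params

    def on_train_begin(self, logs=None): pass
    def on_epoch_begin(self, epoch, logs=None): pass
    def on_epoch_end(self, epoch, logs=None): pass
    def on_train_end(self, logs=None): pass

class EpochCounter(Callback):
    # Subclasses override only the hooks they need and may keep
    # custom state in instance attributes.
    def on_train_begin(self, logs=None):
        self.epochs_seen = 0

    def on_epoch_end(self, epoch, logs=None):
        self.epochs_seen += 1

cb = EpochCounter()
cb.on_train_begin()
for epoch in range(3):
    cb.on_epoch_end(epoch)
print(cb.epochs_seen)  # 3
```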
Some important callback implementations include:

The
ModelCheckpoint
callback saves models or weights periodically using options like save_best_only
to save improved models based on a monitor metric. It overrides on_epoch_end()
to check for metric improvement and save accordingly. 
The
EarlyStopping
callback stops training when a monitored quantity stops improving. It overrides on_epoch_end()
to check the metric tracked in self.monitor
against the best value and stop training if patience is exceeded. 
The
ReduceLROnPlateau
callback reduces the learning rate when a metric stops improving. It overrides on_epoch_end()
similarly to check for non-improvement in the monitored metric.
The Callback
base class provides a standardized interface for callbacks to hook into the Keras training loop at different points. This allows custom training behaviors to be flexibly implemented and composed together through subclassing.
Model Checkpointing
References: keras/callbacks/model_checkpoint.py
The ModelCheckpoint
callback saves Keras models during training either at the end of each epoch or every N batches. It allows saving either the full model or just the model weights. The callback implements methods like on_train_batch_end()
, on_epoch_begin()
, and on_epoch_end()
to determine when to save.
The ModelCheckpoint
class handles the main saving logic. Its __init__()
method sets up saving options like the file path and monitors the 'best' metric value. on_train_batch_end()
saves if the batch interval in save_freq
is reached. on_epoch_end()
saves at the end of each epoch when save_freq
is set to 'epoch'.
_save_model()
performs the actual saving of either the full model or just the weights. It checks if the current result improves the monitored metric compared to the previous 'best'. If save_best_only=True
, it will only overwrite the file in this case. _get_file_path()
formats the file path using placeholders like {epoch}
from the logs. It raises an error if the format fails.
The ModelCheckpoint
callback thus provides a robust way to periodically save Keras models with options to control the saving behavior. By implementing callbacks that run at different points in training, it enables saving models either at the end of each epoch or every N batches for later resuming. The class encapsulates the saving logic while exposing configurable options for controlling file paths and when to save.
Early Stopping
References: keras/callbacks/early_stopping.py
The EarlyStopping
callback implements early stopping to prevent overfitting. It monitors a given metric like validation loss during training and stops training if the metric does not improve for a specified number of epochs, called the patience. This helps avoid wasting resources on epochs unlikely to improve performance.
The EarlyStopping
class inherits from Callback
and overrides methods like on_train_begin
, on_epoch_end
, and on_train_end
to implement the early stopping logic. In __init__()
it initializes parameters like the metric to monitor, patience, and whether to restore the best model weights.
on_train_begin()
resets tracking variables. on_epoch_end()
gets the monitored metric value from the logs, checks for improvement over the best seen so far using self.monitor_op
, and resets the wait counter if improved. It stops training if patience is exceeded with no improvement. on_train_end()
restores the best weights seen during training if configured.
EarlyStopping
handles different improvement modes like 'min' and 'max' by introspecting the metric name and setting self.monitor_op
to the appropriate Keras operator like ops.less
or ops.greater
. This allows it to work with any metric without additional configuration.
Learning Rate Scheduling
References: keras/callbacks/learning_rate_scheduler.py
The LearningRateScheduler
callback allows dynamically adjusting the learning rate of the optimizer at the beginning of each epoch during training. It inherits from the Callback
class and stores the provided schedule
function and verbose flag in its __init__
method.
In the on_epoch_begin
method, it first checks that the optimizer has a learning_rate
attribute. It then gets the current learning rate value by calling backend.convert_to_numpy()
on it. It passes the current epoch and rate to schedule
to get the updated rate. It handles both the new and old API signatures for schedule
for backward compatibility. It checks the returned type is a valid float, and sets the new rate on the optimizer. If verbose, it logs the new rate.
In on_epoch_end
, it simply logs the final learning rate value to the logs dictionary. By implementing these methods, it allows dynamically adjusting the learning rate at each epoch during training via the user-provided schedule
function. This provides a simple and flexible way to schedule learning rates from within Keras.
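A schedule function in the new two-argument style takes the epoch and the current rate and returns the new rate. The decay constants below are hypothetical, chosen only for illustration:

```python
import math

def schedule(epoch, lr):
    """Hypothetical schedule: hold the rate for 5 epochs,
    then decay it exponentially each epoch."""
    if epoch < 5:
        return lr
    return lr * math.exp(-0.1)

lr = 0.01
for epoch in range(7):
    lr = schedule(epoch, lr)
print(round(lr, 6))  # two decay steps: 0.01 * e^-0.2 ≈ 0.008187
```

In Keras such a function would be passed to the callback as LearningRateScheduler(schedule).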
Logging
References: keras/callbacks/csv_logger.py
, keras/callbacks/tensorboard.py
The CSVLogger
and TensorBoard
callbacks handle logging metrics and model topology during training. CSVLogger
writes metric values to a CSV file at the end of each epoch using the on_epoch_end
method. It handles different data types when writing by converting values to strings with handle_value()
.
TensorBoard
enables visualizations and metrics logging for TensorBoard. It has methods like on_train_begin()
, on_train_end()
, on_epoch_begin()
, on_epoch_end()
, on_train_batch_begin()
, on_train_batch_end()
to log at different points. It manages different summary writers for directories with methods like _push_writer()
and _pop_writer()
. Key functionality includes:
 Configuring embeddings visualization with
_configure_embeddings()
.  Logging weights as histograms with
_log_weights()
.  Logging metrics like loss with
_log_epoch_metrics()
.  Batch level logging by pushing/popping writers.
 Profile batch tracing using methods like
_start_trace()
and_stop_trace()
.
The callbacks handle logging at different points of training via the Keras callback interface, providing a high level API for users. CSVLogger
standardizes writing metric values to CSV while TensorBoard
enables visualizations and flexible logging.
Remote Monitoring
References: keras/callbacks/remote_monitor.py
The RemoteMonitor
callback integrates Keras training with remote monitoring platforms. The RemoteMonitor
class inherits from the base Callback
class and is initialized with parameters like the server URL and request path. During training, the RemoteMonitor
overrides the on_epoch_end
method to collect the epoch number and log metrics. It handles any NumPy arrays in the data and sends a POST request to the server path, serializing the data to JSON. This allows monitoring metrics and logs on each epoch end.
The key aspects of the RemoteMonitor
implementation are:

The class inherits from
Callback
to hook into the Keras training loop. 
The
__init__
method sets configuration parameters like the server URL, request path, headers, etc., using the requests
library under the hood. 
on_epoch_end
collects the epoch number and log metrics from the logs dictionary. 
Any NumPy arrays in the data are converted to lists to ensure they can be serialized.

The data is sent as either JSON if
send_as_json=True
or form-encoded otherwise via a POST request to the given server path. 
Any
RequestException
from therequests
library is caught and a warning is printed, allowing training to continue.
This allows critical training metrics and logs to be streamed to a remote monitoring service after each epoch for analysis, debugging and tracking progress over the course of training in a production setting.
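The payload-building step (converting NumPy values to plain Python types before JSON serialization) can be sketched with a hypothetical helper; the real callback then POSTs this string with the requests library:

```python
import json

def build_payload(epoch, logs):
    """Sketch of assembling a JSON-serializable epoch payload."""
    send = {"epoch": epoch}
    for key, value in logs.items():
        # Stand-in for the np.ndarray -> list conversion: anything
        # exposing tolist() is converted to plain Python values.
        send[key] = value.tolist() if hasattr(value, "tolist") else value
    return json.dumps(send)

print(build_payload(3, {"loss": 0.25, "accuracy": 0.91}))
# {"epoch": 3, "loss": 0.25, "accuracy": 0.91}
```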
Application Callbacks
References: keras/applications/vgg16.py
, keras/applications/xception.py
The VGG16
and Xception
models defined in …/vgg16.py
and …/xception.py
include callbacks tailored for these applications. When using these models, certain callbacks can be applied to take advantage of features specific to each model.
The VGG16
and Xception
models leverage utilities in …/imagenet_utils.py
for preprocessing inputs and decoding predictions. This file defines functions like preprocess_input()
and decode_predictions()
that are used by the model definitions.
The preprocess_input()
function implements preprocessing expected by the models, such as scaling pixel values between -1 and 1. This function is called by each model to ensure inputs are in the expected format before being passed to the model layers.
The decode_predictions()
function provides a convenient way to decode the raw predictions output by each model and obtain human-readable class labels. It handles mapping predictions back to the corresponding ImageNet classes. This utility allows easily interpreting results from the pretrained models.
When using these models via transfer learning, the preprocess_input()
and decode_predictions()
functions can be leveraged via callbacks to preprocess inputs and postprocess predictions specifically for each model architecture. This allows taking full advantage of utilities implemented as part of each pretrained model definition.
Advanced Callbacks
References: keras/callbacks/lambda_callback.py
, keras/callbacks/terminate_on_nan.py
This section covers additional specialized callback classes provided in Keras that implement more advanced or niche functionality compared to the core callbacks.
The …/lambda_callback.py
file defines the LambdaCallback
class, which allows users to define simple callback functions inline without creating new classes. It takes anonymous functions as arguments for different callback events like on_epoch_begin
and on_batch_end
. These functions will then be called at the appropriate points in training. This provides flexibility while keeping callbacks lightweight.
The …/terminate_on_nan.py
file defines the TerminateOnNaN
callback class. This callback checks for invalid loss values like NaN or infinity after each training batch by implementing the on_batch_end
method. If an invalid loss is encountered, it prints a message and sets the model's stop_training
flag to terminate training, helping avoid wasting resources on failed runs.
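The invalid-loss check is simple arithmetic and can be sketched in plain Python (an illustrative helper, not the callback's code):

```python
import math

def batch_ended_ok(logs):
    """Sketch of TerminateOnNaN's check: flag the run as failed when
    the batch loss is NaN or infinite."""
    loss = logs.get("loss")
    if loss is not None and (math.isnan(loss) or math.isinf(loss)):
        return False  # the real callback sets model.stop_training = True
    return True

print(batch_ended_ok({"loss": 0.3}))           # True
print(batch_ended_ok({"loss": float("nan")}))  # False
```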
Optimizers
References: keras/optimizers
The …/optimizers
directory contains implementations of various optimization algorithms that can be used to train Keras models. Optimization algorithms are essential for training deep learning models as they iteratively update model weights to minimize a loss function. The key optimization algorithms implemented in Keras include:
The Optimizer
base class defined in …/optimizer.py
provides a common interface for all Keras optimizers. It imports the appropriate backend optimizer class based on the Keras backend in use, allowing Keras to support multiple backends like TensorFlow, PyTorch, and JAX with a single optimizer API. The Optimizer
class ensures a consistent method signature regardless of backend.
The …/schedules
directory contains implementations of learning rate schedules that can be used with Keras optimizers. Learning rate schedules control how the learning rate decays over the course of training, allowing the optimizer to efficiently converge on optimal weights. Schedules like ExponentialDecay
, PiecewiseConstantDecay
, and CosineDecay
are defined to implement important decay functions.
The core stochastic gradient descent algorithm with optional momentum is implemented in the SGD
class located at …/sgd.py
. It performs single variable updates using momentum calculations defined in its update_step()
method.
Adaptive learning rate methods that dynamically adapt the learning rate for each parameter are implemented in files like …/rmsprop.py
for RMSprop, …/adam.py
for Adam, and …/adadelta.py
for Adadelta. Each implements the characteristic update rules through update_step()
methods while leveraging common functionality from Optimizer
.
Additional optimization algorithms are located in files such as …/adagrad.py
for Adagrad and …/ftrl.py
for FTRL. The FTRL optimizer class maintains "accumulators" to track parameter-specific learning rates over time. Adagrad adapts rates based on accumulated squared gradients computed in its update_step()
.
Thorough unit tests located in files like sgd_test.py
validate the key functionality, configurations, and mathematical correctness of each optimizer against "golden" values. These help prevent regressions in the optimization logic.
Stochastic Gradient Descent
References: keras/optimizers/sgd.py
The SGD
optimizer implements the stochastic gradient descent algorithm for training neural networks. SGD is one of the most commonly used optimization algorithms in deep learning.
SGD works by estimating the gradient of the loss on each training example (or mini-batch) and stepping the weights in the opposite direction of that gradient, scaled by the learning rate. Repeated over many examples, this drives the loss toward a minimum.
The SGD
class in Keras handles the implementation of the SGD algorithm. It inherits from the base Optimizer
class. In its __init__()
method, it initializes properties like the learning rate.
The build()
method initializes momentum variables if momentum is enabled. For each trainable variable, it adds a momentum variable to the momentums
list using self.add_variable_from_reference()
. This sets up the variables needed for applying momentum during weight updates.
At the core of SGD is the update_step()
method. This performs a single optimization step. If momentum is disabled, it simply performs a vanilla gradient descent update by subtracting the raw loss gradient from the weights, proportional to the learning rate.
If momentum is enabled, update_step()
computes the new momentum value using either the vanilla or Nesterov formula as defined in the class docstring. It then applies this momentum to smoothly update the weights in the direction of the loss gradient. This helps accelerate SGD convergence.
Adaptive Learning Rate Methods
References: keras/optimizers/rmsprop.py
, keras/optimizers/adam.py
, keras/optimizers/adadelta.py
These algorithms adapt the learning rate during training based on the characteristics of the gradients:

The
RMSprop
optimizer normalizes the gradient by the running average of its recent magnitude. It maintains a moving average of the squared gradients called the velocities
to divide the gradient by. This has the effect of lowering the learning rate for parameters that are changing frequently and raising it for infrequent parameters. 
The
Adam
optimizer is based on adaptive estimates of lower-order moments. It computes bias-corrected first and second moment estimates of the gradients called momentums
and velocities
respectively. It then uses these estimates to perform an adaptive learning rate optimization where frequent parameters have a smaller effective learning rate. 
The
Adadelta
optimizer works similarly to Adagrad in adapting the learning rate for each parameter, but does not monotonically decrease the learning rate. Instead it adapts based on a moving window of gradient updates, maintaining accumulated_grad
and accumulated_delta_var
variables to store exponentially weighted moving averages of squared gradients and parameter updates. It then computes the adaptive learning rate from these accumulated values.
The implementations of these algorithms in Keras closely follow their mathematical formulations. For RMSprop, the RMSprop
class stores the velocities
moving average in its _velocities
attribute. The core update logic in update_step()
normalizes the gradient by the square root of _velocities
.
For Adam, the Adam
class stores the momentums, velocities, and optional velocity hats in _momentums
, _velocities
and _velocity_hats
. The update_step()
method calculates the bias-corrected moment estimates and uses them to perform the adaptive update.
Adadelta maintains accumulated_grad
and accumulated_delta_var
lists for each parameter. The update_step()
method assigns new values to these based on the current gradient and previous accumulated values, then applies the adaptive update.
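The Adam update described above can be sketched for a single scalar weight. This follows the standard Adam formulation with bias correction; it is a simplified stand-in for the tensor-based update_step():

```python
def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
                eps=1e-7):
    """Sketch of one Adam step for a scalar weight (t is 1-based)."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentums)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (velocities)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_update(w, grad=0.5, m=m, v=v, t=1)
print(round(w, 6))  # ≈ 0.999: one step of size ~lr in the gradient direction
```

Note how on the first step the bias-corrected update has magnitude close to the learning rate regardless of the gradient's scale, which is a characteristic property of Adam.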
Additional Optimization Algorithms
References: keras/optimizers/adagrad.py
, keras/optimizers/adamax.py
, keras/optimizers/nadam.py
, keras/optimizers/ftrl.py
These additional optimization algorithms provide alternative approaches to updating model variables during training. Adagrad, Adamax, Nadam, and FTRL each implement distinct optimization algorithms through custom Keras optimizer classes.
The Adagrad
optimizer tracks a separate learning rate for each model parameter, lowering the learning rate more for frequently updated parameters. It maintains per-variable accumulator tensors initialized in its build()
method. The update_step()
method calculates adaptive learning rates by dividing the overall rate by the square root of the accumulators plus a small epsilon value.
Adamax
is based on the Adam algorithm but uses the infinity norm rather than root-mean-square. It initializes separate momentum _m
and norm _u
variables for each model variable in build()
. The update_step()
method calculates new momentum m
and norm u
values, then updates variables using an adaptive learning rate derived from the beta1 exponential moving average.
Nadam
implements Nesterov-accelerated Adam, using momentum _momentums
and velocity _velocities
estimates initialized in build()
. Its update_step()
method contains the core Nadam update logic, calculating updates based on these estimates, the gradient, learning rate, and other Nadam hyperparameters.
The Ftrl
optimizer is suitable for shallow models with large sparse feature spaces. It initializes accumulators and linear variables for each model variable in build()
. The update_step()
method performs the FTRL update steps outlined in its docstring, using the gradient, learning rate, and regularization terms like L1 and L2. It clips the linear variable and divides by the quadratic term to obtain the final variable update.
Learning Rate Schedules
References: keras/optimizers/schedules/__init__.py
, keras/optimizers/schedules/learning_rate_schedule.py
The Keras optimizers package provides several classes for controlling the learning rate decay over the course of model training. The core class is LearningRateSchedule
, which defines the interface for learning rate schedules through its __call__
method. This method takes a step value and returns the decayed learning rate.
The ExponentialDecay
, PiecewiseConstantDecay
, PolynomialDecay
, InverseTimeDecay
, and CosineDecay
classes all implement different decay functions to reduce the learning rate over time in a controlled manner. ExponentialDecay
decays the rate exponentially using parameters like initial learning rate, decay steps, and decay rate. PiecewiseConstantDecay
allows specifying constant rates for intervals of steps defined by boundaries and values lists. PolynomialDecay
decays polynomially using initial/final rates, decay steps, and a power value. InverseTimeDecay
decays the rate inversely proportional to time. CosineDecay
provides cosine decay with optional warmup by increasing the rate linearly at first.
The classes each override LearningRateSchedule
's __call__
method to implement the specific decay function. For example, ExponentialDecay
computes the decayed rate using its parameters in an exponential formula. PiecewiseConstantDecay
uses conditional logic on the step value to lookup the appropriate constant rate from its lists. PolynomialDecay
applies its polynomial formula.
The serialize
and deserialize
functions allow serializing and deserializing learning rate schedules for checkpointing.
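The exponential decay formula can be sketched as a plain function; the parameter values below are hypothetical defaults for illustration:

```python
def exponential_decay(step, initial_lr=0.1, decay_steps=1000,
                      decay_rate=0.96, staircase=False):
    """lr = initial_lr * decay_rate ** (step / decay_steps)."""
    exponent = step / decay_steps
    if staircase:
        # Staircase mode decays in discrete intervals rather than
        # continuously.
        exponent = int(exponent)
    return initial_lr * decay_rate ** exponent

print(exponential_decay(0))       # 0.1
print(exponential_decay(1000))    # ≈ 0.096
print(exponential_decay(1500, staircase=True))  # ≈ 0.096 (still interval 1)
```

An optimizer calls the schedule's __call__ with the current step each time it needs a learning rate, so decay happens automatically as training progresses.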
Optimizer Base Class
References: keras/optimizers/optimizer.py
The Optimizer
class serves as the base class for all Keras optimizer implementations. It inherits from either TFOptimizer
, TorchOptimizer
, or JaxOptimizer
depending on the Keras backend in use. These backend-specific subclasses contain the implementations of optimizer updates and gradient computations that are compatible with each backend framework. BaseOptimizer
defines a more generic base class with minimal functionality that is used if an unsupported backend is detected.
When a Keras optimizer is instantiated, it will actually be one of the backend subclass objects under the hood. The Optimizer
class ensures a consistent interface for all optimizers regardless of backend.
By conditionally importing and assigning the appropriate backend subclass, Optimizer
provides a common Keras optimizer interface while routing the implementation to backend-specific code. This allows Keras code and APIs to remain backend-agnostic.
Optimizer Testing
References: keras/optimizers/sgd_test.py
, keras/optimizers/rmsprop_test.py
, keras/optimizers/adam_test.py
The unit tests for Keras optimizers validate that the key optimization algorithms are implemented correctly. There are test files for the main optimizers:

…/sgd_test.py
contains tests for the Stochastic Gradient Descent (SGD) optimizer. This tests basic update logic, configuration, weight decay, correctness over many steps, and gradient clipping. 
…/rmsprop_test.py
tests the RMSprop optimizer update logic, serialization, single step updates, weight decay, correctness against golden values, and gradient clipping functionality. 
…/adam_test.py
focuses on testing Adam optimizer updates, configuration, weight decay, correctness, clipping, and exponential moving averages.
Each file contains a test case class like SGDTest
that inherits from testing.TestCase
. This class holds test methods that directly exercise the optimizer code. Tests configure dummy data, apply optimizer updates, and validate the results match expectations. This validates the core optimization algorithms are implemented correctly.
Together, the tests cover serialization, single-step updates, weight decay, correctness over many steps, and gradient clipping, providing an effective way to prevent regressions and verify that the optimizers meet their specifications.
Losses
References: keras/losses
The …/losses
directory contains the core loss functions for training neural networks in Keras. The main loss functions are defined in …/losses.py
, including MeanSquaredError
and CategoricalCrossentropy
.
These losses are implemented as classes that inherit from the base Loss
class defined in …/loss.py
. The Loss
class standardizes how losses are implemented in Keras by handling the calling of the subclass' call()
method, applying masking and weighting sample losses, and reducing the losses as specified by the reduction type.
The loss function classes are initialized with hyperparameters like the name, reduction type, and dtype. Their call()
method contains the core logic to calculate the loss values from y_true
and y_pred
.
Unit tests for the losses are in …/loss_test.py
. Tests are provided for MeanSquaredError
and CategoricalCrossentropy
. The tests validate behaviors like correctness on sample data, weighted vs unweighted losses, and different reduction types.
Loss Function Implementations
References: keras/losses/losses.py
The …/losses.py
file implements many common loss functions for training neural networks. The main classes defined are LossFunctionWrapper
and individual loss functions like MeanSquaredError
, MeanAbsoluteError
, CategoricalCrossentropy
, and SparseCategoricalCrossentropy
.
LossFunctionWrapper
acts as a base class that wraps loss functions. It handles calling the loss function and allows configuring the reduction type, such as 'sum' or 'mean'. This provides a consistent API for losses.
Loss functions preprocess inputs with utilities like squeeze_to_same_rank
and support passing sample weights. They compute the loss directly from labels and predictions. Losses that handle probabilities like CategoricalCrossentropy
convert logits to probabilities internally.
Utilities are also defined, such as convert_binary_labels_to_hinge
which preprocesses labels for hinge-based losses. Losses support both standalone functions and classes, enabling both functional and object-oriented usage in Keras models.
Loss Base Class
References: keras/losses/loss.py
The Loss
class is the base class that all Keras loss functions must inherit from. It standardizes the implementation of loss functions by defining a common interface and functionality. The Loss
class handles calling the subclass' call()
method to calculate the raw loss values from the inputs. It then applies masking, weighting by sample weights, and reduction to the losses.
The key method subclasses must implement is call()
, which contains the logic to calculate the raw loss values from the inputs y_true
and y_pred
. The Loss
class calls this method and passes the results to further processing.
Loss
standardizes several aspects of loss function implementation. It sets the name, reduction type, and dtype for all loss functions. These properties ensure losses can be identified and will work properly with Keras models.
The class centralizes common loss reduction logic in methods like reduce_weighted_values()
. This method handles applying masking to the sample weights, normalizing input shapes with squeeze_to_same_rank()
, weighting the losses by sample weights, and passing the weighted losses to reduce_values()
for reduction. reduce_values()
sums the losses and optionally divides by the batch size for 'sum_over_batch_size' reduction.
By subclassing Loss
, loss functions leverage this standardized reduction logic and common interfaces. This ensures all losses work consistently with Keras. The base class' methods also take care of many implementation details so subclass code can focus just on calculating raw loss values.
Loss Utilities
References: keras/losses/__init__.py
, keras/losses/loss_test.py
The Loss
base class standardizes the implementation of all loss functions. It defines the core interface and methods that subclasses must implement, including call()
. The base call()
method handles masking and weighting of losses.
The LossFunctionWrapper
class is used to wrap legacy loss functions that do not support masking or weighting. It overrides their call()
method to apply masking/weighting before delegating to the wrapped function. This allows these legacy losses to still work as expected with masking and sample weights.
Loss Function Tests
References: keras/losses/loss_test.py
The unit tests in …/loss_test.py
validate that Keras loss function implementations behave as expected. An ExampleLoss
class is defined that implements mean squared error, acting as a simple test case.
The main LossTest
class contains various methods to test loss functionality. The test_reduction
method ensures losses calculate correctly under different reduction types like 'none', 'sum', and 'sum_over_batch_size'. test_mask
checks masking, where the loss is only calculated on unmasked values. test_sample_weight
tests sample weighting works as expected. test_mask_and_sample_weight
combines these features. test_rank_adjustment
verifies upgrading and downgrading input ranks. test_mixed_dtypes
handles different dtype inputs properly. test_get_method
checks the get()
utility for losses. test_dtype_arg
validates the dtype argument sets the correct output dtype.
These tests use pytest as the test runner and numpy/the Keras backend for numerical operations. They comprehensively validate the core loss calculation logic and ensure losses behave as intended under various conditions. This helps prevent regressions and ensures consistent loss calculation in Keras.
Metrics
References: keras/metrics
The …/metrics
directory contains classes that implement metrics for evaluating model performance during training and testing. Metrics compute statistics like accuracy, precision and recall on model predictions. They are used to monitor and optimize models.
Key classes include Metric
, Mean
, Sum
and MeanMetricWrapper
. Metric
defines the base interface for metrics, handling state tracking and resetting between epochs. Mean
and Sum
compute weighted averages and totals. MeanMetricWrapper
allows wrapping functions to track their mean value.
Other important files are metrics_utils.py
, reduction_metrics.py
and the metric-type subdirectories. metrics_utils.py provides utilities like updating confusion matrices. reduction_metrics.py contains classes for computing sums and means. The subdirectories hold classes for specific metric types: classification, regression, etc.
Some key classes:

- Classes like Accuracy, BinaryAccuracy and CategoricalAccuracy in accuracy_metrics.py compute accuracy-based metrics for classification. Accuracy calculates the fraction of correct predictions by tracking a 'total' and 'count' variable.
- FBetaScore and subclasses in f_score_metrics.py calculate F-scores for classification. FBetaScore maintains variables for true/false positives/negatives and computes precision, recall and the final F-beta score.
- Classes in regression_metrics.py implement common regression metrics. For example, MeanSquaredError computes the mean squared error directly, while RootMeanSquaredError first calculates the squared error then takes the root.
- Metrics in confusion_metrics.py rely on confusion matrices. Classes like Precision and Recall accumulate true/false counts, then compute results based on these variables and the confusion matrix.
- IoU and subclasses in iou_metrics.py evaluate semantic segmentation using intersection-over-union. They accumulate predictions into confusion matrices to calculate true/false positives/negatives for the IoU.
Metric Base Class
References: keras/metrics/metric.py
The Metric
base class defines the core interface that all Keras metrics must implement. It handles tracking metric state and computations through subclasses.
The Metric
class provides methods for adding metric variables via add_variable()
, resetting variables between epochs with reset_state()
, and accumulating updates into the variables with update_state()
. Subclasses must implement update_state()
to define the specific update logic, and result()
to compute the final metric value from the state variables.
Metric
also defines __call__()
as a convenience method to directly call update_state()
and result()
. It allows metrics to be updated without a Keras session via stateless_update_state()
and computed via stateless_result()
for distributed settings.
The key aspects of the Metric
implementation are:
- It tracks metric variables through the _variables property and add_variable() method.
- reset_state() simply resets all variables to zero between epochs.
- update_state() accumulates updates into the state variables, with specific logic defined by subclasses.
- result() computes the final metric from the state variables, with logic defined by subclasses.
In summary, Metric
provides the common scaffolding for metric state management and computations, while subclasses implement the unique logic for each specific metric. This standardized interface allows new metrics to be easily implemented.
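The state-tracking contract can be sketched in plain Python. The class below is a hypothetical illustration of the update_state()/result()/reset_state() pattern, not the real Keras Metric base class (which also handles variables, dtypes, and distribution):

```python
import numpy as np

class MeanAbsoluteErrorMetric:
    """Illustrates the Metric contract: accumulate state, compute result, reset."""

    def __init__(self):
        self.total = 0.0  # running sum of absolute errors
        self.count = 0    # number of values seen so far

    def update_state(self, y_true, y_pred):
        # Accumulate updates into the state variables.
        errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
        self.total += errors.sum()
        self.count += errors.size

    def result(self):
        # Compute the final metric value from the accumulated state.
        return self.total / max(self.count, 1)

    def reset_state(self):
        # Reset all state between epochs.
        self.total, self.count = 0.0, 0
```

Subclasses of the real Metric class implement the same three methods, with add_variable() supplying backend-aware state storage.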
Reduction Metrics
References: keras/metrics/reduction_metrics.py
The …/reduction_metrics.py
file contains utilities for computing reduction metrics like sums and means. It defines the core Sum()
and Mean()
metric classes, which handle reducing values across samples.
The Sum()
metric class tracks the running total of values in a total
variable using the Zeros()
initializer. In update_state()
, it calls reduce_to_samplewise_values()
to apply sample weights and reduce extra dimensions if needed. This summed value is then assigned to total
. reset_state()
resets the total to 0, and result()
simply returns the total.
The Mean()
metric works similarly but tracks both the running total
and sample count
in variables. In update_state()
, it assigns the summed values to total
and increments count
. reset_state()
resets both variables, and result()
returns the total divided by the count to compute the actual mean.
The reduce_to_samplewise_values()
utility function handles reducing tensor values to the sample dimension based on weights. It takes the values, weights, reduction type like sum()
or mean()
, and dtype. This allows metrics to work across different value and weight shapes.
The MeanMetricWrapper
class inherits from Mean()
and allows wrapping an arbitrary metric function. The wrapped function's output is reduced like other metrics and its mean tracked over time. This provides a simple way to track the average of any loss or evaluation metric.
Confusion Matrix Utilities
References: keras/metrics/metrics_utils.py
The …/metrics_utils.py
file provides important utilities for updating confusion matrices. It contains the ConfusionMatrix
Enum which defines the possible confusion matrix variables as TRUE_POSITIVES
, FALSE_POSITIVES
, TRUE_NEGATIVES
, FALSE_NEGATIVES
.
The core function for updating confusion matrix variables is update_confusion_matrix_variables()
. It handles tiling predictions, labels, and thresholds to compute the true positives, false positives, etc. in an element-wise way. For improved efficiency, _update_confusion_matrix_variables_optimized()
provides an optimized implementation when thresholds are evenly distributed. It leverages "buckets" based on thresholds to update variables in one pass.
The confusion_matrix()
function computes the actual confusion matrix values given predictions, labels, and number of classes. It uses tf.scatter_nd()
to bin the predictions and labels into a confusion matrix tensor.
The is_evenly_distributed_thresholds()
helper checks if a list of thresholds is evenly spaced, enabling use of the optimized method.
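The binning a confusion matrix performs can be sketched with a simple loop (the real implementation vectorizes this with tf.scatter_nd; this is an illustrative equivalent, not the Keras code):

```python
import numpy as np

def confusion_matrix(labels, predictions, num_classes):
    """Bin (label, prediction) pairs into a num_classes x num_classes matrix.

    Rows are true classes, columns are predicted classes, so cm[i, j] counts
    samples of true class i that the model predicted as class j.
    """
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(labels, predictions):
        cm[t, p] += 1
    return cm
```

The diagonal holds the correct predictions; off-diagonal entries are the misclassifications that metrics like Precision and Recall aggregate.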
Accuracy Metrics
References: keras/metrics/accuracy_metrics.py
The …/accuracy_metrics.py
file implements several important accuracy metrics for evaluating classification models. It contains classes that calculate prediction accuracy in different ways depending on the type of classification problem.
The core Accuracy
class calculates the fraction of predictions where the predicted class is equal to the true class. It tracks a 'total' and 'count' variable to calculate accuracy as the total correct predictions over the total number of samples. The Accuracy
class subclasses MeanMetricWrapper
from …/reduction_metrics.py
to calculate the mean accuracy over batches.
For binary classification problems, the BinaryAccuracy
class uses the binary_accuracy
function to compare predictions to labels based on a threshold. This allows for non-integer predictions as long as they are above or below the threshold. The class checks that the threshold is valid during initialization.
For multi-class classification with one-hot encoded labels, the CategoricalAccuracy
class leverages the categorical_accuracy
function. This function handles the details of squeezing labels and casting types as needed before finding matching elements in the predictions and labels.
The sparse_categorical_accuracy
function and associated SparseCategoricalAccuracy
class are for evaluating models using sparse categorical labels instead of one-hot.
The top_k_categorical_accuracy
function calculates accuracy based on whether the true label is within the top K highest probability predictions for each sample.
All of these accuracy metric classes inherit from MeanMetricWrapper
to calculate a mean accuracy score over batches during training or evaluation. This base class handles updating the metric result and returning it after each call. It also supports passing optional sample weights.
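The two comparison strategies can be sketched in NumPy. These hypothetical helpers mirror the logic described above (thresholded comparison for binary, argmax matching for categorical), not the exact Keras functions:

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred) > threshold  # probabilities -> binary decisions
    return float(np.mean(y_true == y_pred))

def categorical_accuracy(y_true_onehot, y_pred_probs):
    """Fraction of samples whose argmax prediction matches the one-hot label."""
    return float(np.mean(np.argmax(y_true_onehot, axis=-1) ==
                         np.argmax(y_pred_probs, axis=-1)))
```

Sparse categorical accuracy works the same way as the categorical variant, except the true labels arrive as integer class IDs and need no argmax.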
Regression Metrics
References: keras/metrics/regression_metrics.py
The regression metrics defined in …/regression_metrics.py
allow evaluating model performance on regression tasks. Key metrics include:
- MeanSquaredError computes the mean squared error between true and predicted values, a standard regression loss metric. It inherits from reduction_metrics.MeanMetricWrapper to accumulate error values in update_state().
- MeanAbsoluteError computes the mean absolute error.
- RootMeanSquaredError computes the root mean squared error. It overrides update_state() to first call squeeze_to_same_rank() on inputs before computing the squared error passed to the base Mean class. result() returns the square root of the mean squared error.
- R2Score computes the R-squared score, a standard regression metric. It has options for standard or adjusted R2. Methods like __init__(), _build() and update_state() accumulate the sums of squares, cross products etc. needed to compute the final R2 score in result().
The cosine_similarity()
function directly computes cosine similarity from normalized true and predicted values, used by the CosineSimilarity
metric class.
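The R-squared computation reduces to two sums of squares. A minimal NumPy sketch of the standard (unadjusted) formula, not the stateful Keras R2Score class:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Standard (unadjusted) R-squared: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```

A perfect fit scores 1.0; always predicting the mean of y_true scores 0.0, which is why the Keras class accumulates running sums rather than needing all samples at once.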
Probabilistic Metrics
References: keras/metrics/probabilistic_metrics.py
The …/probabilistic_metrics.py
file defines several probabilistic metric classes that can be used to evaluate models trained with probabilistic losses. These metrics wrap loss functions to compute a performance metric rather than an error signal.
The main classes defined are KLDivergence
, Poisson
, BinaryCrossentropy
, CategoricalCrossentropy
, and SparseCategoricalCrossentropy
. All inherit from MeanMetricWrapper
to compute the mean metric value over samples.
KLDivergence
computes the KullbackLeibler divergence between y_true
and y_pred
using y_true * log(y_true / y_pred)
. It wraps the kl_divergence
function.
Poisson
computes the Poisson metric between y_true
and y_pred
using y_pred - y_true * log(y_pred)
. It wraps the poisson
function.
CategoricalCrossentropy
assumes one-hot encoded multi-class labels. It computes cross-entropy using categorical_crossentropy
, allowing options like from_logits
and label_smoothing
. The entropy computation axis can be set.
SparseCategoricalCrossentropy
expects integer class labels rather than one-hot. It wraps sparse_categorical_crossentropy
to compute the metric.
All classes are designed to directly evaluate models or be used with Model.compile()
for evaluation during training. They provide a standardized way to monitor probabilistic losses as metrics.
F-beta Metrics
References: keras/metrics/f_score_metrics.py
The …/f_score_metrics.py
file defines two main classes for computing F-beta scores for classification problems: FBetaScore
and F1Score
.
The FBetaScore
class computes the F-beta score, which is a weighted harmonic mean of precision and recall. It takes the beta parameter to weight recall vs. precision, with higher beta placing more importance on recall. The class inherits from the Metric
base class and implements the common Keras metric API.
Internally, it maintains state variables like true_positives
, false_positives
etc using Keras variables initialized in _build()
. The update_state()
method handles updating these variables based on the passed y_true
and y_pred
tensors after applying thresholding. Thresholding logic converts prediction probabilities to binary predictions by comparing values to the threshold
parameter.
The final F-beta score is computed in result()
via formulas involving the precision, recall, and beta values. It returns a single value or supports different averaging schemes like 'macro' based on the average
argument. Sample weights can also be applied via the sample_weight
argument.
The F1Score
class is a simple subclass that sets beta=1
to compute the specific F1 score metric. Both classes follow best practices like input validation, configurable options, and serializable config.
IoU Metrics
References: keras/metrics/iou_metrics.py
The …/iou_metrics.py
file defines several classes for computing the Intersection-over-Union (IoU) metric, which is commonly used to evaluate semantic segmentation models. The IoU metric measures the overlap between predicted regions and ground truth regions.
The _IoUBase
class acts as the base for all IoU metric classes. It handles accumulating predictions and labels into a total_cm
confusion matrix using the update_state()
method. This function converts inputs to tensors and ignores any provided ignore_class
, accumulating the current confusion matrix returned by confusion_matrix()
into the total confusion matrix.
The IoU
class computes IoU for specific target classes. Its result()
method calculates true positives, false positives, and false negatives from the confusion matrix totals for each target class. It then computes the individual class IoUs and returns their mean.
The MeanIoU
class computes the mean IoU across all classes by setting the target classes to a list of all class IDs. In result()
, it simply calls the superclass method and divides the total IoU by the number of valid classes to get the average IoU.
The OneHotIoU
class handles one-hot encoded labels and predictions. Its update_state()
method uses argmax
to convert the inputs to integer format before accumulating them into the confusion matrix. This allows computing IoU when the inputs are provided as probability distributions over classes rather than discrete class IDs.
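Reading per-class IoU off an accumulated confusion matrix can be sketched as follows. This is an illustrative NumPy equivalent of the result() arithmetic, assuming rows are true classes and columns are predicted classes:

```python
import numpy as np

def iou_from_confusion_matrix(cm):
    """Per-class IoU = TP / (TP + FP + FN) from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)            # correctly predicted pixels per class
    fp = cm.sum(axis=0) - tp    # predicted as the class but actually another
    fn = cm.sum(axis=1) - tp    # actually the class but predicted as another
    return tp / (tp + fp + fn)
```

MeanIoU then averages these per-class values over the valid (non-empty) classes.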
Hinge Metrics
References: keras/metrics/hinge_metrics.py
The …/hinge_metrics.py
file defines three classes, Hinge, SquaredHinge, and CategoricalHinge, that implement different hinge-based loss metrics.
All three classes inherit from the MeanMetricWrapper
class in …/reduction_metrics.py
. This class handles computing the mean of the loss function over samples. The child hinge metric classes override the __init__ method to set the specific loss function (hinge, squared_hinge, or categorical_hinge respectively) via the fn argument.
To use one of the hinge metric classes, you instantiate it and then call the update_state()
method to calculate the loss on new data. The result()
method then returns the average loss value. The reset_state()
method clears the accumulated state.
For example, to calculate the standard hinge loss on data you would:

1. Create a Hinge instance.
2. Call update_state() to calculate the loss on new data.
3. result() returns the average hinge loss value.
4. Optionally call reset_state() before the next update.
This provides a simple way to integrate hingebased losses as metrics within Keras models.
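The underlying hinge computation can be sketched in a few lines of NumPy, assuming labels in {-1, 1} as the hinge formulation expects (this mirrors the standard max(1 - y_true * y_pred, 0) definition, not the Keras class itself):

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    """Mean hinge loss, max(1 - y_true * y_pred, 0), with labels in {-1, 1}."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(1.0 - y_true * y_pred, 0.0)))
```

Confident correct predictions (margin >= 1) contribute zero; predictions inside the margin or on the wrong side are penalized linearly.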
Confusion Matrix Metrics
References: keras/metrics/confusion_metrics.py
The …/confusion_metrics.py
file contains metrics that rely on the confusion matrix for evaluation. Metrics like Precision
and Recall
are implemented, which compute various true and false positive/negative counts from the confusion matrix to calculate precision and recall scores.
Several classes inherit from the Metric
base class to implement confusion matrix metrics. The _ConfusionMatrixConditionCount
abstract base class counts conditions in the confusion matrix, and classes like TruePositives
inherit from it to compute specific counts.
The Precision
and Recall
classes leverage counts from subclasses of _ConfusionMatrixConditionCount
to calculate precision and recall scores directly. Threshold-based metrics like SensitivityAtSpecificity
and SpecificityAtSensitivity
search for the optimal threshold satisfying their condition.
The AUC
metric approximates the area under the ROC or PR curve by dividing it into buckets. It computes the average true and false positive rates within each bucket to estimate the area via Riemann summation. All metrics override update_state()
to accumulate confusion matrix counts using metrics_utils.update_confusion_matrix_variables()
.
Datasets
References: keras/datasets
The …/datasets
package provides easy access to several common datasets that can be conveniently loaded for testing, debugging, and example code in Keras. It contains modules for loading popular image, text, and tabular datasets such as MNIST, Fashion MNIST, CIFAR10, IMDB reviews, and Boston housing prices data.
The core module is …/__init__.py
, which imports the individual dataset modules. This allows other code to access the datasets through a single consistent interface. For example, functions like load_data()
and get_data()
defined in the underlying modules.
Some key business logic implemented across the dataset modules includes:

- Downloading data files from online sources if not already cached locally using get_file() defined in utils.py. This ensures data is always available even if the source changes.
- Loading data stored in common formats like .npz arrays, pickle files, and JSON dictionaries using functions like np.load() and json.load().
- Preprocessing data into NumPy arrays with the expected shapes through steps like parsing images/labels, truncating sequences, filtering words, and splitting into train/test sets.
- Providing higher-level functions that encapsulate the entire loading and preprocessing pipeline with simple interfaces, hiding implementation details.
- Implementing data loading in a consistent way across different types of datasets to make them easy to use interchangeably in examples and testing.
This package makes it very convenient to access common datasets without having to manually download, parse and preprocess data each time. The standardized interfaces also allow focusing experiment code solely on models without dataset handling logic.
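The download-if-missing pattern these modules share can be sketched with the standard library. `load_npz_dataset` is a hypothetical helper mirroring what get_file() plus np.load() accomplish, not a Keras function:

```python
import os
import urllib.request
import numpy as np

def load_npz_dataset(url, cache_path):
    """Download a .npz archive if not cached, then return its train/test arrays."""
    if not os.path.exists(cache_path):              # only download on a cache miss
        urllib.request.urlretrieve(url, cache_path)
    with np.load(cache_path) as data:               # .npz archives hold named arrays
        return (data["x_train"], data["y_train"]), (data["x_test"], data["y_test"])
```

Because the cache check runs first, repeated calls are cheap and work offline once the file exists locally.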
MNIST Dataset
References: keras/datasets/mnist.py
The …/mnist.py
file provides easy access to the MNIST dataset of handwritten digits for use in examples and testing. It defines a single function, load_data()
, which loads the MNIST training and test data from a compressed .npz file stored on Google Cloud Storage.
The load_data()
function takes an optional path
argument to specify where to cache the dataset locally. It uses get_file()
to download the .npz file if not already present, handling file downloading and caching. The .npz file contains compressed NumPy arrays storing the image and label data.
load_data()
loads this file using np.load()
, extracting the x_train
, y_train
, x_test
, y_test
arrays containing 60,000 training images and labels and 10,000 test images and labels. Each grayscale image is 28x28 pixels in size. The label arrays contain the corresponding digit classes from 0-9.
load_data()
returns a tuple of the training and test data, along with basic validation of the array shapes. No other algorithms or classes are implemented  it provides a simple interface for easily loading the standardized MNIST dataset into NumPy for use in models.
Fashion MNIST Dataset
References: keras/datasets/fashion_mnist.py
The …/fashion_mnist.py
file provides access to the Fashion MNIST dataset. This dataset contains 60,000 training images and 10,000 test images of clothing items like shirts, shoes and bags. Each 28x28 grayscale image is labeled with one of 10 class names.
The load_data()
function handles downloading and loading the dataset. It first defines the local directory and URLs for the required gzip files containing the image and label data. get_file()
is used to download these files if not present locally. For each file type (train/test images and labels), gzip.open()
reads the raw byte contents. np.frombuffer()
extracts the label data while np.reshape()
reshapes the raw pixels into a 2D image matrix for the images. Finally, four NumPy arrays are returned containing the preprocessed training and test data  (x_train
, y_train
) and (x_test
, y_test
).
CIFAR10 and CIFAR100 Datasets
References: keras/datasets/cifar10.py
, keras/datasets/cifar100.py
The files …/cifar10.py
and …/cifar100.py
provide access to the CIFAR10 and CIFAR100 image classification datasets.
The …/cifar10.py
file contains the load_data()
function, which loads and prepares the CIFAR10 data for use in Keras models. It downloads the CIFAR10 archive if needed, then loads the training and test images and labels using the load_batch()
function defined elsewhere. It reshapes and transposes the data if needed to match Keras' data format. Finally, it returns NumPy arrays containing the preprocessed image and label data.
The …/cifar100.py
file contains a similar load_data()
function for CIFAR100. It takes an optional label_mode
parameter that can be "fine" or "coarse" to control the type of labels returned. It downloads the CIFAR100 dataset if not present, then loads the training and test batches using load_batch()
. It reshapes the label arrays and transposes the image arrays if needed. Finally, it returns tuples of NumPy arrays containing the loaded and processed CIFAR100 image and label data.
Both files provide easy access to popular image classification datasets for use in examples and testing Keras models. The load_data()
functions handle downloading, loading, preprocessing and returning the data in a format that can be directly used with Keras APIs.
IMDB Movie Reviews Dataset
References: keras/datasets/imdb.py
The Keras dataset module provides access to the IMDB movie reviews dataset for sentiment classification. The IMDB dataset contains 50,000 movie reviews from IMDB, labeled as either positive or negative.
The load_data()
function handles loading and preprocessing the IMDB dataset. It downloads the required files if needed, then loads the preprocessed training and test data using np.load()
. It shuffles the data randomly using np.random.RandomState()
for training. It optionally indexes words starting from a given index using list comprehensions. It also calls the remove_long_seq()
function to truncate sequences longer than the provided maxlen
parameter. It concatenates the training and test data, filters words using num_words
and skip_top
, and replaces outofvocabulary words with the oov_char
. Finally it splits the data back into training and test sets to return.
The get_word_index()
function downloads the word index JSON file from the source if needed. It loads and returns a Python dictionary by parsing the JSON file with json.load()
. This dictionary maps words in the dataset to their integer indices.
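The vocabulary filtering step can be sketched with a list comprehension. `filter_vocabulary` is a hypothetical helper illustrating the rule (indices outside the kept range become the out-of-vocabulary token), not the exact imdb.load_data code:

```python
def filter_vocabulary(sequence, num_words, skip_top=0, oov_char=2):
    """Replace word indices outside [skip_top, num_words) with the OOV token.

    skip_top drops the most frequent words (low indices); num_words caps the
    vocabulary size (high indices).
    """
    return [w if skip_top <= w < num_words else oov_char for w in sequence]
```

This keeps sequence lengths intact while collapsing rare and overly common words into a single placeholder index.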
Boston Housing Dataset
References: keras/datasets/boston_housing.py
The …/boston_housing.py
file provides access to the Boston housing prices regression dataset. This dataset contains information about houses in Boston from the 1970s, including attributes like crime rate and accessibility to ports. The target variable is the median home price.
The core functionality is the load_data()
function, which loads the dataset. It takes the path, test split fraction, and random seed as arguments. It first checks that the test split is valid. It then calls get_file()
to download the data if needed, caching it locally. The data is loaded into NumPy arrays for the features x
and targets y
using np.load()
. It shuffles the data indices using np.random.RandomState()
with the provided seed. The shuffled data is then split into train and test sets by slicing the arrays based on 1  test_split
. These sliced arrays  x_train
, y_train
, x_test
, y_test
 are returned.
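The seeded shuffle-then-slice split shared by the regression datasets can be sketched as follows (an illustrative helper, not the module's code; the split point is len(x) * (1 - test_split)):

```python
import numpy as np

def shuffled_split(x, y, test_split=0.2, seed=113):
    """Shuffle x and y with a fixed seed, then slice into train/test sets."""
    rng = np.random.RandomState(seed)   # fixed seed -> reproducible split
    indices = rng.permutation(len(x))
    x, y = np.asarray(x)[indices], np.asarray(y)[indices]
    n_train = int(len(x) * (1.0 - test_split))
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])
```

Shuffling features and targets with the same index permutation keeps each (x, y) pair aligned, and the fixed seed makes the split reproducible across runs.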
California Housing Dataset
References: keras/datasets/california_housing.py
The load_data()
function in …/california_housing.py
provides access to the California housing prices regression dataset. This function downloads the dataset from an online source if it is not already cached locally. It loads the data from a file containing NumPy arrays for features x
and targets y
.
The function takes parameters like version
, path
, test_split
, and seed
to control how the data is loaded. It asserts that test_split
is between 0 and 1. It uses get_file()
to download the dataset file if not present.
If version
is "small" it subsets the first 600 rows of x
and y
. It shuffles the data indices using np.random.RandomState()
with the provided seed
for reproducibility. Finally, it splits the shuffled data into training and test sets based on test_split
, returning them as NumPy arrays (x_train, y_train), (x_test, y_test)
.
Reuters Dataset
References: keras/datasets/reuters.py
The load_data()
function in …/reuters.py
provides access to the Reuters newswire categorization dataset. It downloads the dataset from Cloud storage if needed, then shuffles and splits the word sequences and labels into train and test sets. Preprocessing options like truncating long sequences, filtering rare words, and replacing rare words with a special token can be applied.
The data is loaded from files containing word indices (xs
) and labels (labels
) using np.load()
. The sequences and labels are shuffled together using np.random.RandomState()
before being split into train and test sets. remove_long_seq()
truncates sequences longer than the specified maxlen
. Word frequencies are used to filter the most common words and replace rare words.
get_word_index()
downloads a JSON file mapping words to indices if needed using get_file()
. It loads and returns this file as a dictionary. get_label_names()
simply returns the list of label names in the order they appear in the training data.
Applications
References: keras/applications
The …/applications
directory contains implementations of popular pretrained models that can be used for transfer learning tasks. These include computer vision models like VGG16
, ResNet50
, InceptionV3
.
Each model is defined as a Keras Model
class or function in its own file, such as …/vgg16.py
. These files leverage common Keras layers to build the architectures block-by-block according to specifications in research papers. Weights pretrained on ImageNet are also provided and can be loaded with a single line of code.
Utilities in files like …/imagenet_utils.py
provide functions for standardizing inputs and outputs across models. preprocess_input()
handles preprocessing inputs like scaling pixels, while decode_predictions()
decodes outputs into human-readable class names. Comprehensive unit tests in …/applications_test.py
validate model behavior under different configurations.
Some key models and their implementations:
- VGG16, VGG19: Stack convolutional and max pooling blocks to implement the architectures. Weights are loaded from files.
- ResNet50, ResNet101: Use ResNet() with a stack_fn to build residual blocks and define the models.
- InceptionV3: Stack mixed blocks applying parallel convolutions to implement Inception modules.
- MobileNetV2: Apply depthwise separable convolutions via _inverted_res_block() modules to achieve efficiency.
These pretrained models can be easily leveraged for transfer learning tasks with utilities like preprocess_input()
and decode_predictions()
, and by loading weights with a single line of code. The modular implementations in Keras make them straightforward to use and customize.
Pretrained Models
References: keras/applications/vgg16.py
, keras/applications/inception_v3.py
The files …/vgg16.py
, …/inception_v3.py
, and related files contain implementations of popular convolutional neural network architectures that can be used for transfer learning. These pretrained models were trained on the ImageNet dataset and can serve as base models for feature extraction or fine-tuning on new tasks and datasets.
The VGG16()
function in …/vgg16.py
constructs the VGG16 model architecture using Keras layers like Conv2D
, MaxPooling2D
, Flatten
, and Dense
. It builds the model block by block, applying these layers to recreate the full VGG16 network structure. The function supports loading pretrained ImageNet weights and handling input shapes.
The InceptionV3()
function in …/inception_v3.py
defines the Keras implementation of the Inception V3 CNN architecture. It builds the model by stacking mixed
blocks containing convolutional and pooling layers. These mixed
blocks implement the Inception module architecture where multiple convolution kernels are applied in parallel. The function handles configuring the model, preprocessing inputs, and loading pretrained weights.
Both files contain utilities like preprocess_input()
for standardizing input data and decode_predictions()
for decoding model outputs. They implement these architectures and related functionality using Keras's functional API with layers. The pretrained weights loaded by these models can be used to initialize a base model for transfer learning to new domains and tasks.
Model Utilities
References: keras/applications/imagenet_utils.py
The utilities in …/imagenet_utils.py
provide common functionality for loading pretrained weights, preprocessing inputs, and decoding predictions from models trained on ImageNet.
This file contains functions for preprocessing input images with preprocess_input()
in different modes like "caffe", "tf", and "torch". It can preprocess NumPy arrays or tensors. It first validates the mode
and data_format
arguments before calling _preprocess_numpy_input()
or _preprocess_tensor_input()
to perform the actual preprocessing. These functions handle operations such as RGBBGR conversion, mean subtraction, and scaling pixels depending on the specified mode.
Predictions from models can be decoded into human-readable class names and descriptions using decode_predictions()
. It first loads the global CLASS_INDEX
mapping, then iterates through predictions to find the top indices and look up the corresponding class information.
The input shape for models is validated and processed by obtain_input_shape()
. It handles default input sizes, required shapes for pretrained weights, minimum size checks, and will raise errors for invalid shapes.
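The "caffe" and "tf" preprocessing modes described above can be sketched in NumPy. These are illustrative stand-ins (hypothetical helper names), assuming the widely published ImageNet per-channel BGR means; the real preprocess_input() additionally handles data formats, tensors, and the "torch" mode:

```python
import numpy as np

def preprocess_caffe_style(x):
    """'caffe' mode: convert RGB to BGR, then subtract per-channel ImageNet means."""
    x = np.asarray(x, dtype="float32")
    x = x[..., ::-1]                           # RGB -> BGR channel flip
    x -= np.array([103.939, 116.779, 123.68])  # zero-center each BGR channel
    return x

def preprocess_tf_style(x):
    """'tf' mode: scale pixels from [0, 255] to [-1, 1]."""
    return np.asarray(x, dtype="float32") / 127.5 - 1.0
```

Using the preprocessing mode the pretrained weights were trained with is essential; mixing modes silently degrades prediction quality.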
Model Testing
References: keras/applications/applications_test.py
, keras/applications/imagenet_utils_test.py
The Keras applications module contains comprehensive unit tests for the pretrained model implementations in applications.py. These tests are contained in two main files:
The …/applications_test.py
file contains tests for all the available pretrained models. The ApplicationsTest
class inherits from testing.TestCase
and parameterized.TestCase
to leverage Keras testing utilities and parameterize the tests. Tests are run for each model in the MODEL_LIST
with different input shapes, channels, and data formats. Key tests include loading models, running inference on sample data, serialization/deserialization, and specifying classifier activations.
The …/imagenet_utils_test.py
file contains unit tests for the imagenet_utils
preprocessing module. The TestImageNetUtils
class contains tests for preprocess_input()
with both numeric and symbolic data in different modes. It also tests obtain_input_shape()
with valid and invalid cases. Tests are parameterized to cover different arguments.
These two files provide a comprehensive set of unit tests for the pretrained model implementations and associated preprocessing utilities. The parameterized tests in ApplicationsTest
validate each model under a variety of configurations. TestImageNetUtils
thoroughly tests the key preprocessing logic. Together they help ensure the applications code functions as expected.
Examples
References: examples/keras_io
The …/keras_io
directory contains examples demonstrating end-to-end workflows for training neural network models on different types of data using Keras. It covers domains like computer vision, natural language processing, audio processing, time series forecasting, and structured data modeling.
Some key subdirectories and their purposes are:

- /vision contains computer vision examples applying CNNs, Transformers and other architectures to tasks like image classification, object detection, and segmentation using popular datasets like CIFAR10, Flowers, and COCO.
- /nlp includes natural language processing examples for sequence modeling, language modeling, text classification and generation leveraging techniques like RNNs, CNNs and Transformers. It uses datasets like IMDB and WikiText.
- /tensorflow/audio explores audio processing and speech tasks, training models like 1D CNNs and Transformers on datasets for speaker ID, speech recognition, and accent detection.
- /structured_data demonstrates loading tabular data and building neural networks for classification and recommendations on datasets like Adult Census Income and MovieLens ratings. It utilizes techniques like embeddings.
- /timeseries applies CNNs, RNNs, Transformers and Graph Networks to problems like anomaly detection and forecasting on benchmark time series datasets like NAB, FordA, and traffic speed data.
Many files showcase end-to-end implementations with core classes and functions that aid workflows. Some examples:

- The TextVectorization layer in files like text_classification_from_scratch.py handles text preprocessing into integer sequences.
- The PatchEncoder class encodes images into patch embeddings with positional encodings for vision Transformers in files like cct.py.
- The TransformerBlock class implements transformer layers in NLP examples like neural_machine_translation_with_transformer.py.
- Files like timeseries_classification_transformer.py contain functions like transformer_encoder() that build reusable transformer blocks.
- The FeatureSpace class in structured_data_classification_with_feature_space.py handles structured data preprocessing.
Computer Vision
References: examples/keras_io/vision
This section covers workflows for training computer vision models on image data using the Keras examples. The key functionality demonstrated includes:

- Loading and preprocessing popular image datasets like CIFAR10, MNIST, Flowers, and Oxford Pets using TensorFlow Datasets and Keras preprocessing layers. This is handled by functions like get_dataset() which load, preprocess, and return batches of images and labels for training.
- Defining common CNN architectures like ResNet, VGG, MobileNet, and DenseNet using Keras Applications and Keras layers like Conv2D, MaxPooling2D, and BatchNormalization. Functions like get_model() build models from these components.
- Implementing Transformer models for vision using components like Patches to extract patches from images, PatchEncoder to encode patches with positional embeddings, and TransformerEncoder to apply self-attention. Files like …/cct.py demonstrate this approach.
- Training models with techniques like data augmentation (RandomFlip, RandomRotation), learning rate scheduling, early stopping, and model checkpointing. These are handled by functions like run_experiment().
- Evaluating trained models on validation data and visualizing predictions. Functions like predict() and display() perform evaluation and visualization.
Some important implementation details:
The get_dataset()
function in files like …/oxford_pets_image_segmentation.py
loads images and masks from disk using TensorFlow IO, resizes them to the expected input size, and vectorizes them for training. It returns a TensorFlow Dataset
object that can be iterated over to fetch batches.
The get_model()
function in files like …/oxford_pets_image_segmentation.py
defines common CNN architectures. It uses Keras layers like SeparableConv2D
and Conv2DTranspose
with techniques like batch normalization and residual connections.
Files like …/cct.py
demonstrate how the Patches
class extracts patches from input images using keras.ops.image.extract_patches
. The PatchEncoder
class encodes the patches into embedding vectors using Dense
layers. It also adds learned position embeddings. The TransformerEncoder
applies self-attention to the encoded patches using MultiHeadAttention.
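The patch-extraction and encoding steps can be sketched in plain NumPy (the real example uses keras.ops.image.extract_patches plus trainable Dense and Embedding layers; the function names and random projection below are illustrative stand-ins):

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping flattened patches."""
    h, w, c = image.shape
    ph = pw = patch_size
    patches = (image.reshape(h // ph, ph, w // pw, pw, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, ph * pw * c))
    return patches  # shape: (num_patches, patch_size * patch_size * C)

def encode_patches(patches, proj, pos_emb):
    """Project patches to embeddings and add position embeddings."""
    return patches @ proj + pos_emb

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = extract_patches(image, patch_size=8)   # 16 patches of length 192
proj = rng.random((8 * 8 * 3, 64))               # stand-in for a Dense kernel
pos_emb = rng.random((patches.shape[0], 64))     # stand-in for learned position embeddings
encoded = encode_patches(patches, proj, pos_emb)
print(encoded.shape)  # -> (16, 64)
```

In the real models, proj and pos_emb are trainable weights, and the encoded patches feed directly into stacked transformer encoder blocks.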
The run_experiment()
function in many files handles training the models end-to-end on the preprocessed datasets. It applies techniques like learning rate scheduling, early stopping, and model checkpointing.
Functions like predict()
make predictions on heldout data, and display()
visualizes examples to qualitatively evaluate models.
Natural Language Processing
References: examples/keras_io/nlp
This section covers several examples that demonstrate common natural language processing workflows using Keras for tasks like text classification, sequence modeling, and language modeling. The …/nlp
directory contains examples applying recurrent neural networks, convolutional neural networks, and Transformer architectures to problems involving text data.
Some key examples include:

- …/addition_rnn.py demonstrates a basic sequence-to-sequence model for adding strings of numbers. It uses the CharacterTable class to one-hot encode variable-length string inputs and outputs. An RNN encoder encodes the input while an RNN decoder generates the target sequence.
- …/bidirectional_lstm_imdb.py implements sentiment classification on the IMDB dataset using a bidirectional LSTM architecture. It loads and preprocesses the dataset, defines the bi-LSTM model, and trains it end-to-end for text classification.
- …/lstm_seq2seq.py contains an encoder-decoder model for English-to-French translation. An LSTM encoder encodes the input sequence while an LSTM decoder generates the target sequence conditioned on the encoder output.
- …/neural_machine_translation_with_keras_nlp.py demonstrates neural machine translation from English to Spanish using the Transformer architecture. It leverages classes like TransformerEncoder and TransformerDecoder from the KerasNLP library.
- …/pretraining_BERT.py shows how to pretrain a BERT model from scratch on the WikiText-2 dataset using HuggingFace Transformers functionality.
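The one-hot encoding idea behind CharacterTable can be sketched in a few lines of plain Python (a hypothetical minimal version; the real class also vectorizes in NumPy and handles fixed-width padding):

```python
class CharTable:
    """Minimal sketch of a CharacterTable-style one-hot codec."""
    def __init__(self, chars):
        self.chars = sorted(set(chars))
        self.char_index = {c: i for i, c in enumerate(self.chars)}

    def encode(self, text, num_rows):
        """One-hot encode `text` into a num_rows x vocab matrix (zero-padded)."""
        x = [[0] * len(self.chars) for _ in range(num_rows)]
        for i, c in enumerate(text[:num_rows]):
            x[i][self.char_index[c]] = 1
        return x

    def decode(self, x):
        """Recover the string from a one-hot matrix, skipping padding rows."""
        return "".join(self.chars[row.index(1)] for row in x if 1 in row)

table = CharTable("0123456789+ ")
encoded = table.encode("12+3", num_rows=5)
print(table.decode(encoded))  # -> 12+3
```

Encoding and decoding round-trip through the same vocabulary, which is what lets the addition example read generated digit sequences back out of model predictions.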
These examples cover important NLP tasks, model types, and state-of-the-art techniques using popular libraries like Keras, KerasNLP, and HuggingFace Transformers. They provide full workflows from data preprocessing to model definition, training, and evaluation.
Audio Processing
References: examples/keras_io/tensorflow/audio
The examples in the …/audio
directory demonstrate workflows for training audio models on tasks like speaker recognition, speech recognition, and accent classification. The code provides end-to-end examples of processing raw audio data into features, building deep learning models, training them, and evaluating performance.
The …/speaker_recognition_using_cnn.py
file implements a speaker recognition model using a 1D CNN. It loads speech samples from different speakers with added background noise, takes the fast Fourier transform (FFT) of the samples to represent them in the frequency domain, and trains the CNN to predict the correct speaker. The residual_block
function defines the residual block architecture used in the CNN, which helps train very deep networks and model long-term dependencies in audio. The add_noise
function implements an important preprocessing step of adding background noise to the training set in a way that scales the noise based on each sample's amplitude. The audio_to_fft
function applies the FFT transform that converts raw audio samples into the frequency-domain features fed to the CNN model.
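The FFT and amplitude-scaled noise steps can be sketched with NumPy (an illustrative reconstruction; the real functions operate on batched TensorFlow tensors and the scale factor varies per sample):

```python
import numpy as np

def audio_to_fft_sketch(samples):
    """Return the magnitude of the first half of the FFT (positive frequencies)."""
    fft = np.fft.fft(samples)
    return np.abs(fft)[: len(samples) // 2]

def add_noise_sketch(audio, noise, scale=0.5):
    """Add noise scaled so its amplitude is proportional to the signal's."""
    prop = np.max(np.abs(audio)) / np.max(np.abs(noise))
    return audio + noise * prop * scale

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)                    # a 440 Hz test tone
noisy = add_noise_sketch(tone, rng.normal(size=8000))
spectrum = audio_to_fft_sketch(noisy)
print(int(np.argmax(spectrum)))  # -> 440 (tone survives the added noise)
```

Keeping only the first half of the FFT is standard for real-valued signals, since the second half mirrors the first.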
The …/uk_ireland_accent_recognition.py
file trains a model to classify English accents. It uses the pretrained Yamnet model to extract embeddings from input audio clips with the filepath_to_embeddings
function. This function loads the audio, resamples it, runs it through Yamnet to get embeddings, and duplicates the labels to match the number of embeddings for each clip. It creates a labeled TensorFlow dataset with the dataframe_to_dataset
function for training the model. Class weights are calculated from the dataset by counting samples for each class using tf.math.bincount
on the Yamnet outputs.
The …/transformer_asr.py
implements an endtoend automatic speech recognition model using a Transformer architecture. The TransformerEncoder
class defines the encoder layer by applying multi-head attention, feed-forward networks, and layer normalization. The TransformerDecoder
class similarly implements the decoder layer with masked multi-head self-attention and encoder-decoder attention. The Transformer
model class combines the encoder and decoder. It also includes functions for preprocessing the LJSpeech dataset into spectrograms and text.
Generative Modeling
References: examples/keras_io/generative
, examples/keras_io/tensorflow/generative
The code in …/generative
and …/generative
demonstrates workflows for training popular generative models like GANs, VAEs, and diffusion models on image datasets. It implements models such as CycleGAN, DCGAN, VAE, PixelCNN, DDIM, DDPM and more to generate photos, faces, digits and other types of images.
Key classes that power many of these generative models include GAN
, DiffusionModel
, VAE
, and CycleGAN
. The GAN
class handles the core training logic for GANs by overriding the train_step()
method. This method trains the discriminator on real and fake images, and trains the generator with an adversarial loss that rewards fooling the discriminator. The DiffusionModel
class implements the overall training and sampling process for diffusion models. Its train_step()
method trains the model via denoising score matching by diffusing inputs with noise and computing the MSE loss between predicted and actual noise. The VAE
class combines the encoder and decoder models into a single end-to-end trainable model. The CycleGAN
class calculates important cycle consistency and adversarial losses during training.
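The diffuse-then-regress step inside DiffusionModel.train_step() can be sketched with NumPy (illustrative: the linear beta schedule and the dummy noise predictor are placeholders for the trained network and the schedule utilities in the examples):

```python
import numpy as np

rng = np.random.default_rng(0)

# A simple linear beta schedule over T diffusion steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)   # cumulative product \bar{alpha}_t

def diffuse(x0, t, noise):
    """Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

def train_step_sketch(x0, predict_noise):
    """One training step: diffuse inputs, then regress the injected noise."""
    t = rng.integers(0, T)
    noise = rng.normal(size=x0.shape)
    x_t = diffuse(x0, t, noise)
    predicted = predict_noise(x_t, t)          # the network's job
    return np.mean((predicted - noise) ** 2)   # MSE between predicted and true noise

x0 = rng.normal(size=(8, 8))
# A trivial "model" that always predicts zero noise, just to exercise the loss.
loss = train_step_sketch(x0, predict_noise=lambda x_t, t: np.zeros_like(x_t))
```

Sampling runs this in reverse: starting from pure noise, the trained predictor is applied step by step to progressively denoise toward an image.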
Some important implementation techniques seen across models include:
- Defining model architectures like discriminators using convolutional layers and generators using transposed convolutions.
- Implementing diffusion schedules via utilities that define the forward and reverse diffusion processes.
- Applying techniques like normalization, position embeddings, and attention in diffusion model architectures.
- Sampling from models during training using callbacks to monitor generation quality.
- Preprocessing datasets and defining metrics to evaluate model performance.
The code provides fully implemented generative modeling workflows, demonstrating best practices for building, training, and evaluating popular generative architectures on image and sequence data.
Reinforcement Learning
References: examples/keras_io/tensorflow/rl
This section demonstrates workflows for training reinforcement learning models to optimize agent behaviors. The code implements several important RL algorithms.
The …/rl
directory contains Python files that apply different RL algorithms to solve OpenAI Gym environments. The actor_critic_cartpole.py
file uses an actor-critic method with a shared neural network to solve CartPole-v1. It trains the actor to output advantageous action probabilities and the critic to estimate returns. The model is updated via policy gradients to maximize rewards.
The file implements an actor-critic method using a Keras deep learning model with shared layers between the actor and critic. The actor outputs the probability of each action, while the critic outputs the expected future reward. The model is trained to maximize rewards by minimizing the loss between the critic's predictions and actual returns, as well as increasing the probability of actions that lead to higher returns compared to the critic's predictions.
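The "actual returns" the critic is trained against are the standard discounted returns of an episode, which can be computed in a few lines (a generic calculation, not code lifted from the example):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t = r_t + gamma * G_{t+1} for each step."""
    returns = []
    running = 0.0
    for r in reversed(rewards):          # accumulate from the end of the episode
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# CartPole gives +1 per surviving step, so a 3-step episode yields:
print([round(g, 4) for g in discounted_returns([1.0, 1.0, 1.0])])
# -> [2.9701, 1.99, 1.0]
```

The policy-gradient update then scales each action's log-probability by how much its return exceeded the critic's estimate.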
Structured Data
References: examples/keras_io/structured_data
, examples/keras_io/tensorflow/structured_data
The examples in …/structured_data
and …/structured_data
demonstrate workflows for training models on tabular and structured data. This includes loading and preprocessing CSV datasets, building neural network models to perform tasks like classification and recommendation, and training the models on the data.
The FeatureSpace
class implemented in …/structured_data_classification_with_feature_space.py
provides a clean interface for handling feature preprocessing. It is initialized with a features dictionary specifying the preprocessing type for each feature. The adapt()
method indexes categorical values and computes normalization stats from the training data. When called on a feature dictionary, FeatureSpace
returns a concatenated preprocessed vector. This allows preprocessing to run asynchronously in TensorFlow data pipelines, or to be included directly inside inference models.
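The adapt-then-call pattern can be sketched in plain Python (a hypothetical minimal analogue; the real FeatureSpace builds Keras preprocessing layers and supports many more feature types):

```python
import math

class TinyFeatureSpace:
    """Minimal sketch: index categorical values, normalize numeric ones."""
    def __init__(self, features):
        self.features = features   # e.g. {"age": "float_normalized", "sex": "string_categorical"}
        self.state = {}

    def adapt(self, rows):
        """Learn vocabularies and normalization stats from training rows."""
        for name, kind in self.features.items():
            values = [row[name] for row in rows]
            if kind == "string_categorical":
                self.state[name] = {v: i for i, v in enumerate(sorted(set(values)))}
            else:  # float_normalized
                mean = sum(values) / len(values)
                var = sum((v - mean) ** 2 for v in values) / len(values)
                self.state[name] = (mean, math.sqrt(var) or 1.0)

    def __call__(self, row):
        """Return one concatenated preprocessed vector for a feature dict."""
        out = []
        for name, kind in self.features.items():
            if kind == "string_categorical":
                vocab = self.state[name]
                onehot = [0.0] * len(vocab)
                onehot[vocab[row[name]]] = 1.0
                out.extend(onehot)
            else:
                mean, std = self.state[name]
                out.append((row[name] - mean) / std)
        return out

fs = TinyFeatureSpace({"age": "float_normalized", "sex": "string_categorical"})
fs.adapt([{"age": 20.0, "sex": "F"}, {"age": 40.0, "sex": "M"}])
print(fs({"age": 30.0, "sex": "F"}))  # -> [0.0, 1.0, 0.0]
```

Because all learned state lives in the object after adapt(), the same preprocessing can run in a data pipeline during training and inside the model at inference time.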
The RecommenderNet
class in …/collaborative_filtering_movielens.py
defines an embeddingbased collaborative filtering model for movie recommendation. It embeds users and movies, computes the dot product of the embeddings to get a match score, and adds biases before passing through sigmoid. This model is trained on the MovieLens dataset to minimize binary cross entropy loss.
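The match-score computation can be sketched with NumPy (illustrative: here the embedding tables are random stand-ins, whereas the real model learns them during training):

```python
import numpy as np

def match_score(user_vec, movie_vec, user_bias, movie_bias):
    """Dot product of embeddings plus biases, squashed through a sigmoid."""
    logit = user_vec @ movie_vec + user_bias + movie_bias
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(0)
embedding_size = 50
user_embeddings = rng.normal(scale=0.05, size=(100, embedding_size))   # 100 users
movie_embeddings = rng.normal(scale=0.05, size=(500, embedding_size))  # 500 movies
user_bias = np.zeros(100)
movie_bias = np.zeros(500)

# Predicted affinity of user 3 for movie 42, in (0, 1).
score = match_score(user_embeddings[3], movie_embeddings[42],
                    user_bias[3], movie_bias[42])
```

Training with binary cross-entropy pushes this score toward 1 for user-movie pairs with high ratings and toward 0 otherwise.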
Time Series
References: examples/keras_io/timeseries
This section demonstrates workflows for training various types of time series models on benchmark datasets. The examples cover common time series tasks like anomaly detection, classification, and forecasting.
The …/timeseries
directory contains several illustrative examples. The file …/timeseries_anomaly_detection.py
shows how to perform anomaly detection on time series data from the NAB dataset using a convolutional autoencoder model. It loads and normalizes the data to create fixed-length sequences, builds an encoder-decoder model with Conv1D layers, trains it on normal sequences, then detects anomalies in test data based on reconstruction error thresholding.
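The thresholding step can be sketched as follows (illustrative: the example derives the threshold from the maximum reconstruction MAE observed on training data, and the error values below are made up):

```python
import numpy as np

def anomaly_flags(train_errors, test_errors):
    """Flag test sequences whose reconstruction error exceeds the training max."""
    threshold = np.max(train_errors)   # largest MAE seen on normal data
    return test_errors > threshold

train_mae = np.array([0.02, 0.03, 0.025, 0.04])   # errors on normal sequences
test_mae = np.array([0.03, 0.12, 0.05, 0.35])     # errors on unseen sequences
print(anomaly_flags(train_mae, test_mae))  # -> [False  True  True  True]
```

The intuition: an autoencoder trained only on normal sequences reconstructs anomalous ones poorly, so unusually high reconstruction error marks an anomaly.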
The file …/timeseries_classification_from_scratch.py
demonstrates time series classification from scratch on the FordA dataset using a fully convolutional neural network (FCNN) architecture. It loads and standardizes the FordA data, defines an FCNN model with Conv1D and pooling layers via the make_model()
function, trains it end-to-end with callbacks, and evaluates the saved best model on held-out test data.
Another example is …/timeseries_classification_transformer.py
, which builds a Transformer model for the time series classification task. It leverages the transformer_encoder()
block to efficiently stack encoder layers, applies global average pooling, and adds a classifier head. The model achieves around 85% accuracy on the FordA dataset without hyperparameter tuning.
The file …/timeseries_weather_forecasting.py
shows time series forecasting of climate data using an LSTM architecture. It loads Jena weather data, preprocesses it into windows via timeseries_dataset_from_array(), trains an LSTM model with a Dense output head to predict 12 hours ahead, and validates predictions against true future values.
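The windowing performed by timeseries_dataset_from_array() can be sketched in plain Python (a hypothetical minimal version returning lists rather than a tf.data.Dataset):

```python
def make_windows(series, input_len, target_offset):
    """Pair each input window with the value `target_offset` steps after it."""
    inputs, targets = [], []
    for start in range(len(series) - input_len - target_offset + 1):
        end = start + input_len
        inputs.append(series[start:end])          # model input: a history window
        targets.append(series[end + target_offset - 1])  # value to forecast
    return inputs, targets

series = list(range(10))   # stand-in for hourly sensor readings
x, y = make_windows(series, input_len=3, target_offset=2)
print(x[0], y[0])  # -> [0, 1, 2] 4
```

In the weather example the offset corresponds to the 12-hour forecasting horizon, and each window row carries multiple climate features rather than a single value.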
Transfer Learning
References: examples/keras_io/pytorch
, examples/keras_io/tensorflow/vision
This section demonstrates workflows for leveraging pretrained models via transfer learning. Transfer learning is a technique where a model pretrained on a large dataset is reused as the starting point for a new task, rather than training a model completely from scratch. The weights of the pretrained model initialize the new model, and then a subset of the layers is fine-tuned on the new task while the other layers stay frozen. This helps the model learn meaningful representations from a smaller dataset more efficiently.
The code provides examples of transfer learning using popular pretrained models like VGG16, ResNet50, BERT and BiT. The MyBiTModel
class loads a BiT model hub module and adds a new classification head on top to fine-tune the model for a new dataset. Only the head layers are initialized randomly while the BiT module weights remain fixed. The model is trained on a small Flowers dataset, demonstrating that BiT can achieve good accuracy even with limited labeled data.
The torchvision_keras.py
file shows loading a pretrained ResNet18 model from TorchVision and fine-tuning it for image classification on Imagenette using Keras. The TorchModuleWrapper
layer plays a key role by allowing any PyTorch module to be used as a Keras layer, enabling the PyTorch ResNet18 to be included inside Keras models and trained end-to-end.
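The freeze-the-backbone idea common to these examples can be sketched with NumPy (illustrative: a fixed random projection stands in for pretrained weights, and only a logistic-regression head is updated):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed random projection standing in for pretrained layers.
W_frozen = rng.normal(scale=0.1, size=(20, 8))
def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)   # ReLU features, never updated

def head_loss_and_grad(feats, labels, w):
    """Binary cross-entropy loss and gradient for a linear head."""
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))   # sigmoid predictions
    loss = -np.mean(labels * np.log(p + 1e-9) + (1 - labels) * np.log(1 - p + 1e-9))
    grad = feats.T @ (p - labels) / len(labels)
    return loss, grad

x = rng.normal(size=(64, 20))
labels = (x[:, 0] > 0).astype(float)   # a toy labeling rule
feats = backbone(x)                    # computed once: the backbone is frozen

w_head = np.zeros(8)                   # only the head is trainable
initial_loss, _ = head_loss_and_grad(feats, labels, w_head)
for _ in range(200):                   # gradient descent on the head alone
    _, grad = head_loss_and_grad(feats, labels, w_head)
    w_head -= 0.5 * grad
final_loss, _ = head_loss_and_grad(feats, labels, w_head)
```

In the real examples the backbone is a BiT module or a TorchVision ResNet18 wrapped in TorchModuleWrapper, and the head is a Keras Dense layer trained by model.fit().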