fastai
The fastai repository provides a high-level deep learning library built on top of PyTorch. It aims to make deep learning more accessible and productive through its design, abstractions, and features.
Some of the key aspects of fastai include:
- Provides domain-specific libraries for computer vision, natural language processing, tabular data, and collaborative filtering that handle common tasks like loading data and defining models (see Computer Vision, Natural Language Processing, Tabular Data)
- Implements a flexible callback system that allows injecting arbitrary code during model training, like learning rate scheduling, mixed precision, and regularization (see Callbacks)
- Contains a training loop abstraction that handles optimization, losses, metrics, and other training mechanics in a consistent way (see Model Training)
- Provides utilities for loading, splitting, labeling, encoding, normalizing, and transforming various types of data (see Data Loading and Preprocessing)
- Implements distributed training functionality to train models across multiple GPUs or machines (see Distributed Training)
- Supports mixed precision training using float16 to accelerate training on GPUs (see Mixed Precision)
- Contains tools for model interpretation, analysis, and debugging, like visualization and identifying top losses (see Interpretability)
The key design choices are composing domain-specific libraries on top of a flexible core, and providing high-level abstractions while still allowing detailed customization via the callback system. The libraries build on PyTorch and leverage its capabilities.
Computer Vision
References: fastai/vision, dev_nbs/course, nbs/examples
The core computer vision functionality in fastai allows training common CNN models on image data with a simplified and optimized training process. The …/vision package handles this end-to-end, from loading and preprocessing image datasets to defining models and running a high-level training loop.
Data Loading
References: fastai/vision/data.py, dev_nbs/course
The main classes for loading and preprocessing image data are defined in …/data.py. Its central class provides methods for loading image data into PyTorch DataLoaders from various sources like folders, lists, and DataFrames. It handles details like preprocessing, normalization, and splitting data into training and validation sets.
Models
References: fastai/vision/models, fastai/vision/models/__init__.py, fastai/vision/models/all.py
The …/models directory contains implementations of common computer vision models as well-defined classes.
Data Augmentation
References: fastai/vision/augment.py
The …/augment.py file contains implementations of common image augmentation techniques that can be randomly applied during training. The logic for deciding at random whether to apply a transform is provided by a base class, and the individual augmentation classes inherit from it to transform images.
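As a rough illustration of random application, here is a framework-free sketch in which a transform fires with probability `p` on each call. The `RandomTransform` and `hflip` names are hypothetical, and images are stood in for by nested lists rather than tensors.

```python
import random
from typing import Callable, List, Optional


class RandomTransform:
    """Apply `fn` to an item with probability `p` (hypothetical helper, not fastai's API)."""

    def __init__(self, fn: Callable, p: float = 0.5, seed: Optional[int] = None):
        self.fn = fn
        self.p = p
        self.rng = random.Random(seed)  # private RNG so runs are reproducible

    def __call__(self, item):
        # One draw per item: apply the transform only when the draw falls below p.
        if self.rng.random() < self.p:
            return self.fn(item)
        return item


def hflip(img: List[List[int]]) -> List[List[int]]:
    """Flip an image, stored as a list of pixel rows, left to right."""
    return [row[::-1] for row in img]


img = [[1, 2, 3],
       [4, 5, 6]]
print(RandomTransform(hflip, p=1.0)(img))  # [[3, 2, 1], [6, 5, 4]]
print(RandomTransform(hflip, p=0.0)(img))  # [[1, 2, 3], [4, 5, 6]]
```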
Natural Language Processing
References: fastai/text, dev_nbs
The …/text directory provides utilities for common natural language processing tasks like text classification and language modeling. It contains functionality for preprocessing text data, creating data loaders, and defining neural network models for NLP problems.
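Preprocessing text usually means tokenizing it and then numericalizing the tokens against a vocabulary. The following framework-free sketch shows both steps. The tokenizer regex, the `min_freq` cutoff, and the special-token names (loosely modeled on the "xx"-prefixed tokens fastai uses) are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical special tokens, loosely modeled on fastai's "xx"-prefixed ones.
UNK, PAD, BOS = "xxunk", "xxpad", "xxbos"
SPECIALS = [UNK, PAD, BOS]


def tokenize(text: str) -> List[str]:
    """Lowercase, split into words and punctuation, and mark the start of the text."""
    return [BOS] + re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(docs: List[List[str]], min_freq: int = 2) -> List[str]:
    """Special tokens first, then tokens seen at least `min_freq` times (sorted for determinism)."""
    counts = Counter(tok for doc in docs for tok in doc)
    frequent = sorted(t for t, c in counts.items() if c >= min_freq and t not in SPECIALS)
    return SPECIALS + frequent


def numericalize(tokens: List[str], vocab: List[str]) -> List[int]:
    """Map tokens to vocabulary indices; out-of-vocabulary tokens map to the UNK index."""
    index: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, index[UNK]) for tok in tokens]


docs = [tokenize(t) for t in ["The movie was great!", "The plot was thin."]]
vocab = build_vocab(docs)
print(vocab)                         # ['xxunk', 'xxpad', 'xxbos', 'the', 'was']
print(numericalize(docs[0], vocab))  # [2, 3, 0, 4, 0, 0]
```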
Models
References: fastai/text/models/awdlstm.py, fastai/text/models/core.py
The …/models directory contains implementations of common neural networks for natural language processing tasks. The core architecture, the AWD-LSTM recurrent network, is defined in …/awdlstm.py.
Training
References: fastai/text/learner.py
The training loop calculates losses using the model's predictions on batches of inputs. It applies the specified optimizer to minimize these losses over epochs. Callbacks can be added to customize training. For example, one callback implements a one-cycle learning rate schedule.
Tabular Data
References: fastai/tabular, nbs/examples
The fastai library provides a set of tools for building, training, and evaluating machine learning models on structured tabular data. The core functionality is centered around preprocessing tabular data stored in Pandas DataFrames, defining common tabular model architectures, and abstracting the training loop into a learner class tailored for tabular tasks.
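Typical preprocessing steps for a tabular DataFrame are filling missing numeric values (often with a flag column recording that a value was missing), normalizing continuous columns, and converting categorical columns to integer codes. Below is a framework-free sketch of those three steps on plain lists. The function names are hypothetical, and in practice the median, mean, and standard deviation would be computed on the training set only and reused for validation data.

```python
import statistics
from typing import List, Optional, Tuple


def fill_missing(values: List[Optional[float]]) -> Tuple[List[float], List[bool]]:
    """Replace None with the median of the present values; also return a 'was missing' flag."""
    median = statistics.median([v for v in values if v is not None])
    filled = [median if v is None else v for v in values]
    flags = [v is None for v in values]
    return filled, flags


def normalize(values: List[float]) -> List[float]:
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def categorify(values: List[Optional[str]]) -> List[int]:
    """Map categories to integer codes, reserving 0 for missing values."""
    classes = sorted({v for v in values if v is not None})
    codes = {c: i + 1 for i, c in enumerate(classes)}
    return [codes.get(v, 0) for v in values]


ages, age_na = fill_missing([20.0, None, 40.0, 30.0])
print(ages, age_na)  # [20.0, 30.0, 40.0, 30.0] [False, True, False, False]
print(normalize(ages))
print(categorify(["b", "a", None, "b"]))  # [2, 1, 0, 2]
```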
Training Loop
References: fastai/tabular/learner.py, fastai/callback/all.py
The …/learner.py file handles training tabular models. It constructs data loaders from input data and passes batches to the model during optimization.
Data Loading and Preprocessing
References: fastai/data, fastai/text, fastai/tabular, fastai/vision
The fastai library provides extensive functionality for loading, preprocessing, and transforming various types of data for deep learning tasks. This functionality is implemented across several key modules and files in the library.
Data Loading
References: fastai/data/load.py, fastai/data/external.py
The core functionality for loading data from various sources into PyTorch datasets and dataloaders is handled by code in the …/load.py file, which contains implementations of the main objects used for loading data.
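At its core, a data loader turns a dataset into (optionally shuffled) minibatches. Here is a minimal generator-based sketch; the function is hypothetical, and its parameter names `bs`, `shuffle`, and `drop_last` are chosen to echo common data loader options.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], bs: int, shuffle: bool = False,
            drop_last: bool = False, seed: Optional[int] = None) -> Iterator[List[T]]:
    """Yield lists of up to `bs` items, optionally in a shuffled order."""
    idxs = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(idxs)
    for start in range(0, len(idxs), bs):
        batch = [items[i] for i in idxs[start:start + bs]]
        if drop_last and len(batch) < bs:
            break  # skip a final, incomplete batch
        yield batch


print(list(batches(list(range(10)), bs=4)))                  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(list(batches(list(range(10)), bs=4, drop_last=True)))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```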
Data Splits
References: fastai/data/load.py
The …/load.py file contains functionality for splitting data into training and validation sets when loading data. It supports passing a validation split ratio.
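A split by validation ratio is commonly done by shuffling the item indices with a fixed seed and reserving the first fraction for validation. Here is a sketch under that assumption (`random_split` and its defaults are hypothetical):

```python
import random
from typing import List, Tuple


def random_split(n_items: int, valid_pct: float = 0.2, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle indices with a fixed seed; the first valid_pct of them become validation."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train indices, validation indices)


train_idx, valid_idx = random_split(10, valid_pct=0.2, seed=0)
print(len(train_idx), len(valid_idx))  # 8 2
```

Fixing the seed keeps the split identical across runs, so validation scores stay comparable between experiments.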
Labeling
References: fastai/data/load.py
The …/load.py file contains functionality for labeling and encoding targets as part of the data loading process. Errors during the labeling process are caught and informative errors are raised to help with debugging. The labeling functionality provides a consistent interface that works across different types of data and tasks.
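Labeling often means extracting a label from each item, such as a class name embedded in a file name, and then encoding the labels as integers. Here is a hypothetical sketch of both steps; it raises an informative error when an item cannot be labeled.

```python
import re
from typing import List, Tuple


def regex_label(path: str, pattern: str) -> str:
    """Extract a label from a file path using the first capture group of `pattern`."""
    match = re.search(pattern, path)
    if match is None:
        # Fail loudly, naming the offending item, instead of silently mislabeling it.
        raise ValueError(f"pattern {pattern!r} did not match {path!r}")
    return match.group(1)


def encode_labels(labels: List[str]) -> Tuple[List[str], List[int]]:
    """Build a sorted vocabulary of classes and map each label to its index."""
    vocab = sorted(set(labels))
    index = {c: i for i, c in enumerate(vocab)}
    return vocab, [index[c] for c in labels]


paths = ["images/cat_001.jpg", "images/dog_002.jpg", "images/cat_003.jpg"]
labels = [regex_label(p, r"([a-z]+)_\d+\.jpg$") for p in paths]
vocab, codes = encode_labels(labels)
print(vocab, codes)  # ['cat', 'dog'] [0, 1, 0]
```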
Transforms
References: fastai/data/transforms.py, fastai/vision/augment.py
The …/transforms.py file contains utilities for loading, splitting, and transforming datasets.
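fastai's transforms can pair an encoding step with a decoding step, so that processed data can be turned back into a human-readable form for display. The following sketch shows that idea with a hypothetical `Standardize` transform on plain lists of numbers.

```python
from typing import List


class Standardize:
    """Reversible transform: `encode` standardizes values, `decode` restores the originals."""

    def __init__(self, mean: float, std: float):
        self.mean, self.std = mean, std

    def encode(self, xs: List[float]) -> List[float]:
        return [(x - self.mean) / self.std for x in xs]

    def decode(self, xs: List[float]) -> List[float]:
        return [x * self.std + self.mean for x in xs]


tfm = Standardize(mean=2.0, std=4.0)
encoded = tfm.encode([2.0, 6.0, 10.0])
print(encoded)              # [0.0, 1.0, 2.0]
print(tfm.decode(encoded))  # [2.0, 6.0, 10.0]
```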
Pipelines
References: fastai/data/block.py
The file …/block.py contains classes and functions for building data pipelines from a data source.
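Conceptually, a data pipeline of this kind chains a few small functions: one collects items from the source, one derives each label, and one splits the items into training and validation sets. The `MiniDataBlock` class below is a hypothetical, stripped-down sketch of that idea. It loosely follows the shape of fastai's data block API without reproducing it.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class MiniDataBlock:
    """Hypothetical, stripped-down data block: each pipeline stage is a plain function."""
    get_items: Callable[[object], Sequence]
    get_y: Callable[[object], object]
    splitter: Callable[[Sequence], Tuple[List[int], List[int]]]

    def datasets(self, source) -> Tuple[list, list]:
        items = list(self.get_items(source))
        train_idx, valid_idx = self.splitter(items)

        def build(idxs: List[int]) -> list:
            # Each sample pairs the raw item (the input) with its label.
            return [(items[i], self.get_y(items[i])) for i in idxs]

        return build(train_idx), build(valid_idx)


def folder_splitter(items: Sequence[str]) -> Tuple[List[int], List[int]]:
    """Send items under 'train/' to training and items under 'valid/' to validation."""
    train = [i for i, p in enumerate(items) if p.startswith("train/")]
    valid = [i for i, p in enumerate(items) if p.startswith("valid/")]
    return train, valid


block = MiniDataBlock(
    get_items=sorted,
    get_y=lambda path: path.split("/")[1],  # label = parent folder name
    splitter=folder_splitter,
)
train_ds, valid_ds = block.datasets(["valid/cat/3.jpg", "train/dog/2.jpg", "train/cat/1.jpg"])
print(train_ds)  # [('train/cat/1.jpg', 'cat'), ('train/dog/2.jpg', 'dog')]
print(valid_ds)  # [('valid/cat/3.jpg', 'cat')]
```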
Downloads
References: fastai/data/external.py, fastai/data/download_checks.py
The main functionality for downloading external datasets is handled in …/external.py. This file contains a class of constants that centralizes dataset URLs. It also contains a function for retrieving configuration settings and constructing download paths.
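Here is a sketch of the two ideas in that description: a single class of URL constants, and a helper that derives local archive and extraction paths from a URL. Everything here is hypothetical, including the `DatasetURLs` class, the example.com base URL, the `DATA_HOME` environment variable, and the archive/data directory layout.

```python
import os
from pathlib import Path
from typing import Tuple
from urllib.parse import urlparse


class DatasetURLs:
    """Hypothetical constants class: one place to keep dataset locations."""
    BASE = "https://example.com/datasets/"
    MNIST_SAMPLE = f"{BASE}mnist_sample.tgz"


def data_root() -> Path:
    """Download root, overridable through a hypothetical DATA_HOME environment variable."""
    return Path(os.environ.get("DATA_HOME", Path.home() / ".datasets"))


ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".zip")


def local_paths(url: str, root: Path) -> Tuple[Path, Path]:
    """Return (archive file, extraction directory) for a dataset URL."""
    fname = Path(urlparse(url).path).name
    stem = fname
    for suffix in ARCHIVE_SUFFIXES:
        if fname.endswith(suffix):
            stem = fname[: -len(suffix)]
            break
    return root / "archive" / fname, root / "data" / stem


archive, dest = local_paths(DatasetURLs.MNIST_SAMPLE, Path("/tmp/ds"))
print(archive)  # /tmp/ds/archive/mnist_sample.tgz (on POSIX systems)
print(dest)     # /tmp/ds/data/mnist_sample
```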
Model Training
References: fastai, nbs/examples
The …/learner.py file provides high-level utilities for training PyTorch models. The core class is Learner, which combines a model, data loaders (…/load.py), a loss function, an optimizer, and callbacks into a single object. Its main methods orchestrate the overall training loop by calling callbacks at appropriate points.
Training Loop
References: fastai/learner.py
The training loop, run by the Learner's fit method, handles the overall flow of training. It uses the model, data loaders, loss function, optimizer, and callbacks held by the Learner, and orchestrates the process by calling the callbacks at each step.
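Here is a framework-free sketch of a training loop that fires callback events. The event names `before_fit`, `before_epoch`, `after_batch`, `after_epoch`, and `after_fit` follow fastai's naming. The `MiniLearner` class, its one-parameter model, and the recorder callback are hypothetical.

```python
from typing import List, Sequence, Tuple


class Callback:
    """Base class: subclasses override only the events they care about."""
    def before_fit(self, learn): pass
    def before_epoch(self, learn): pass
    def after_batch(self, learn): pass
    def after_epoch(self, learn): pass
    def after_fit(self, learn): pass


class MiniLearner:
    """Hypothetical learner fitting y = w * x with per-sample SGD on squared error."""

    def __init__(self, data: Sequence[Tuple[float, float]], lr: float, cbs: List[Callback]):
        self.data, self.lr, self.cbs = data, lr, cbs
        self.w = 0.0

    def _event(self, name: str) -> None:
        for cb in self.cbs:
            getattr(cb, name)(self)

    def fit(self, n_epoch: int) -> None:
        self._event("before_fit")
        for self.epoch in range(n_epoch):
            self._event("before_epoch")
            for x, y in self.data:
                pred = self.w * x
                self.loss = (pred - y) ** 2
                grad = 2 * (pred - y) * x   # d(loss)/dw
                self.w -= self.lr * grad    # plain SGD step
                self._event("after_batch")
            self._event("after_epoch")
        self._event("after_fit")


class EpochLossRecorder(Callback):
    """Sums batch losses within each epoch and keeps one total per epoch."""
    def before_fit(self, learn): self.history = []
    def before_epoch(self, learn): self.total = 0.0
    def after_batch(self, learn): self.total += learn.loss
    def after_epoch(self, learn): self.history.append(self.total)


rec = EpochLossRecorder()
learn = MiniLearner(data=[(1.0, 3.0), (2.0, 6.0)], lr=0.05, cbs=[rec])
learn.fit(20)
print(round(learn.w, 4))  # 3.0
```

Because the loop only announces events, new behavior (logging, scheduling, early stopping) can be added as another callback without touching `fit`.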
Callbacks
References: fastai/callback, fastai/callback/core.py
Callbacks allow injecting custom logic into the training loop at different points. Each callback responds to training events, such as the start and end of the whole training run, of each epoch, and of each batch.
Optimizers
References: fastai/optimizer.py
The …/optimizer.py file provides implementations of common optimizers for updating model weights during training. Optimizers are implemented by composing callback functions that define the optimization steps.
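The composition idea can be sketched with plain floats: an optimizer holds a list of small "stepper" functions and runs each one, in order, on every parameter. The names below are hypothetical and are not fastai's exact signatures. Real optimizers also operate on tensors and keep per-parameter state such as momentum buffers, which this sketch omits.

```python
from typing import Callable, Dict, List

Param = Dict[str, float]  # hypothetical stand-in for a tensor: {"value": ..., "grad": ...}


def weight_decay(p: Param, lr: float, wd: float, **kwargs) -> None:
    """Decoupled weight decay: shrink the value toward zero."""
    p["value"] *= 1 - lr * wd


def sgd_step(p: Param, lr: float, **kwargs) -> None:
    """Move the value against its gradient."""
    p["value"] -= lr * p["grad"]


class ComposedOptimizer:
    """Runs each stepper, in order, on every parameter with shared hyperparameters."""

    def __init__(self, params: List[Param], steppers: List[Callable], **hypers: float):
        self.params, self.steppers, self.hypers = params, steppers, hypers

    def step(self) -> None:
        for p in self.params:
            for stepper in self.steppers:
                stepper(p, **self.hypers)


p = {"value": 1.0, "grad": 0.5}
opt = ComposedOptimizer([p], [weight_decay, sgd_step], lr=0.1, wd=0.01)
opt.step()
print(p["value"])  # approximately 0.949
```

Swapping or appending steppers changes the update rule without writing a new optimizer class.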
Learning Rate Schedulers
References: fastai/optimizer.py
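The …/optimizer.py file supports learning rate schedules, which dynamically adjust the learning rate during training to improve optimization. Its optimizers expose the learning rate as a hyperparameter that can be updated between steps; the schedules that drive these updates are applied by callbacks in fastai/callback/schedule.py.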
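A well-known schedule of this kind is the one-cycle policy: the learning rate warms up to a maximum and then anneals to a very small value. Here is a minimal sketch built from two cosine segments. The function names, parameter names, and default values are assumptions for illustration.

```python
import math


def cos_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation: returns `start` at pct=0 and `end` at pct=1."""
    return start + (1 + math.cos(math.pi * (1 - pct))) * (end - start) / 2


def one_cycle_lr(pct: float, lr_max: float, pct_warm: float = 0.25,
                 div: float = 25.0, div_final: float = 1e5) -> float:
    """Learning rate at fraction `pct` of training: cosine warm-up, then cosine decay."""
    if pct < pct_warm:
        return cos_anneal(lr_max / div, lr_max, pct / pct_warm)
    return cos_anneal(lr_max, lr_max / div_final, (pct - pct_warm) / (1 - pct_warm))


for pct in (0.0, 0.25, 0.5, 1.0):
    print(pct, one_cycle_lr(pct, lr_max=1e-3))
```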
Losses
References: fastai/losses.py
The …/losses.py file provides commonly used loss functions for training deep learning models on different types of tasks.
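One loss fastai provides is a label-smoothing cross-entropy. It trains against a softened target distribution instead of a hard one-hot label. The sketch below shows the computation for a single example in plain Python. The function names are hypothetical, and this is not fastai's implementation.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def label_smoothing_ce(logits: List[float], target: int, eps: float = 0.1) -> float:
    """Cross-entropy against a smoothed target: 1 - eps on the true class plus eps/n everywhere."""
    logp = log_softmax(logits)
    n = len(logits)
    smoothed = [(1 - eps) * (i == target) + eps / n for i in range(n)]
    return -sum(q * lp for q, lp in zip(smoothed, logp))


logits = [2.0, 0.5, -1.0]
plain = label_smoothing_ce(logits, target=0, eps=0.0)   # ordinary cross-entropy
smooth = label_smoothing_ce(logits, target=0, eps=0.1)
print(plain, smooth)
```

With `eps=0` the function reduces to ordinary cross-entropy. With `eps>0` it also penalizes low probability on the other classes, which discourages overconfident predictions.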
Metrics
References: fastai/metrics.py
The …/metrics.py file provides implementations of many common machine learning metrics for monitoring model training. It contains both individual metric functions and metric classes that accumulate values over batches.
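Accumulating over batches matters because averaging per-batch metric values gives wrong answers when batches differ in size, and is wrong for metrics such as F1 even when batch sizes are equal. Here is a hypothetical sketch for accuracy:

```python
from typing import Sequence


class AccumAccuracy:
    """Accumulates correct/total counts across batches, then divides once at the end."""

    def __init__(self):
        self.reset()

    def reset(self) -> None:
        self.correct, self.total = 0, 0

    def accumulate(self, preds: Sequence[int], targs: Sequence[int]) -> None:
        self.correct += sum(p == t for p, t in zip(preds, targs))
        self.total += len(targs)

    @property
    def value(self) -> float:
        return self.correct / self.total


batches = [([1, 1, 1, 1], [1, 1, 1, 1]),  # 4 of 4 correct
           ([0], [1])]                     # 0 of 1 correct
metric = AccumAccuracy()
for preds, targs in batches:
    metric.accumulate(preds, targs)
naive = sum(sum(p == t for p, t in zip(ps, ts)) / len(ts) for ps, ts in batches) / len(batches)
print(metric.value, naive)  # 0.8 0.5
```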
Utilities
References: fastai/torch_core.py
This section covers helper functions provided in …/torch_core.py that simplify training PyTorch models. Key functionality includes:
Distributed Training
References: fastai/distributed.py
The …/distributed.py file provides functionality for distributing model training across multiple GPUs or machines. It handles wrapping models and data for parallel computation.
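In data-parallel training, each process usually works on a distinct shard of every epoch's data while gradients are averaged across processes. The hypothetical helper below sketches only the sharding step, in the spirit of PyTorch's DistributedSampler. It pads the index list so every process gets the same number of items, then interleaves indices across ranks. Shuffling and gradient averaging are omitted.

```python
from typing import List


def shard_indices(n_items: int, rank: int, world_size: int) -> List[int]:
    """Indices for process `rank`: pad by wrapping to a multiple of world_size,
    then take every world_size-th index starting at rank."""
    per_rank = -(-n_items // world_size)          # ceiling division
    total = per_rank * world_size
    padded = [i % n_items for i in range(total)]  # repeat early items to pad
    return padded[rank:total:world_size]


shards = [shard_indices(10, rank, world_size=4) for rank in range(4)]
print(shards)  # [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]
```

Equal shard sizes keep the processes in lockstep, since each one runs the same number of batches before gradients are synchronized.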
Mixed Precision
References: fastai
Mixed precision training with float16 can accelerate training on GPUs by performing operations with lower-precision numbers while still tracking the model parameters in float32 to preserve accuracy. This allows utilizing the GPU's tensor cores, which provide a significant speedup for float16 matrix operations.
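Plain Python can show why the parameters are kept in float32: the `struct` module's `'e'` and `'f'` formats round a value to IEEE half and single precision. In this sketch, a small weight update that survives in float32 is rounded away entirely in float16. The weight and update values are illustrative.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 single-precision value (struct format 'f')."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


weight, update = 1.0, 1e-4
w16 = to_fp16(to_fp16(weight) + to_fp16(update))
w32 = to_fp32(to_fp32(weight) + to_fp32(update))
print(w16)        # 1.0: the update is below half the float16 spacing near 1.0 and is lost
print(w32 > 1.0)  # True: a float32 copy keeps it
```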
Callbacks
References: fastai/callback, nbs/examples
The core functionality of callbacks in fastai is to customize model training by injecting logic at different points in the training loop. Callbacks allow injecting code before, after, or during batches, epochs, and entire training runs. This provides a flexible way to implement techniques like learning rate scheduling, regularization, mixed precision training, and distributed training without modifying the core training loop code.
Data Augmentation Callbacks
References: fastai/callback/mixup.py
The …/mixup.py file implements callbacks for data augmentation techniques, such as MixUp and CutMix, that are applied during model training.
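MixUp blends pairs of training examples and their labels using a weight `lam` drawn from a Beta(alpha, alpha) distribution. The plain-Python sketch below mixes one pair with one-hot labels. The function name, `alpha=0.4`, and the choice to keep `lam >= 0.5` are assumptions here. For cross-entropy, an equivalent formulation mixes the two losses instead of the labels.

```python
import random
from typing import List, Tuple


def mixup(x1: List[float], y1: List[float], x2: List[float], y2: List[float],
          alpha: float, rng: random.Random) -> Tuple[List[float], List[float], float]:
    """Blend two examples and their one-hot labels with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1 - lam)  # assumption: keep the blend closer to the first example

    def blend(a: List[float], b: List[float]) -> List[float]:
        return [lam * u + (1 - lam) * v for u, v in zip(a, b)]

    return blend(x1, x2), blend(y1, y2), lam


x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
                  alpha=0.4, rng=random.Random(0))
print(lam, x, y)  # the label is now a soft target of the form [lam, 1 - lam]
```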
Optimization Callbacks
References: fastai/callback/schedule.py, fastai/callback/tracker.py
The …/schedule.py file implements callbacks that customize optimization during training by adjusting hyperparameters like the learning rate.
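The references for this section also include fastai/callback/tracker.py, whose callbacks monitor a metric across epochs, for example to stop training early. Here is a hypothetical sketch of the early-stopping logic on its own. The `patience` and `min_delta` parameter names are assumptions.

```python
class EarlyStopper:
    """Signals a stop after `patience` epochs without improvement in the monitored loss."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, valid_loss: float) -> bool:
        if valid_loss < self.best - self.min_delta:
            self.best, self.wait = valid_loss, 0  # improvement: reset the counter
        else:
            self.wait += 1                        # no improvement this epoch
        return self.wait >= self.patience


stopper = EarlyStopper(patience=2)
stopped_at = None
for epoch, loss in enumerate([1.0, 0.8, 0.85, 0.9, 0.7]):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 3 0.8
```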
Regularization Callbacks
References: fastai
The main regularization technique in fastai is weight decay, which is applied as part of the optimizer step rather than as a separate loss term. By default fastai uses decoupled weight decay, shrinking each parameter toward zero in proportion to the learning rate and the weight-decay coefficient. It can alternatively apply classic L2 regularization by adding the corresponding term to the gradients. Either way, this helps prevent overfitting by penalizing large weights.
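Weight decay can be applied either as an L2 term added to the gradient or as a direct, "decoupled" shrinkage of the weights. The following sketch with hypothetical function names shows that the two give the same update for plain SGD:

```python
def sgd_with_l2(w: float, grad: float, lr: float, wd: float) -> float:
    # L2 regularization: the penalty wd/2 * w**2 contributes wd * w to the gradient.
    return w - lr * (grad + wd * w)


def sgd_with_decoupled_wd(w: float, grad: float, lr: float, wd: float) -> float:
    # Decoupled weight decay: shrink the weight directly, then take the plain gradient step.
    return w * (1 - lr * wd) - lr * grad


a = sgd_with_l2(2.0, 0.3, lr=0.1, wd=0.01)
b = sgd_with_decoupled_wd(2.0, 0.3, lr=0.1, wd=0.01)
print(abs(a - b) < 1e-12)  # True: identical for plain SGD
```

With adaptive optimizers such as Adam, the two differ. The L2 term is rescaled by each parameter's adaptive denominator, but decoupled decay is not, which is why the choice matters there.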
Logging Callbacks
References: fastai/callback/progress.py
The …/progress.py file contains callbacks that handle logging training progress and metrics to files. These callbacks provide a consistent interface for monitoring and recording a model's training progress.
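A file-based logger can be as simple as writing a header once and appending one CSV row per epoch. The class below is a hypothetical sketch using Python's `csv` module, with an in-memory buffer standing in for a file.

```python
import csv
import io
from typing import Dict, List


class CSVMetricsLogger:
    """Hypothetical logger: writes a header once, then one row of metrics per epoch."""

    def __init__(self, stream, fields: List[str]):
        self.writer = csv.DictWriter(stream, fieldnames=fields)
        self.writer.writeheader()

    def after_epoch(self, metrics: Dict[str, float]) -> None:
        self.writer.writerow(metrics)


buf = io.StringIO()  # stands in for an open file
logger = CSVMetricsLogger(buf, ["epoch", "train_loss", "valid_loss"])
logger.after_epoch({"epoch": 0, "train_loss": 0.9, "valid_loss": 1.1})
logger.after_epoch({"epoch": 1, "train_loss": 0.6, "valid_loss": 0.8})
print(buf.getvalue().splitlines())
# ['epoch,train_loss,valid_loss', '0,0.9,1.1', '1,0.6,0.8']
```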
Model Analysis Callbacks
References: fastai/callback/hook.py
This section covers callbacks that can be used during model training for interpretation, analysis, and debugging purposes. The …/hook.py file contains utilities that allow inspecting and analyzing models.
Distributed Training Callbacks
References: fastai
The …/distributed.py file contains utilities for distributing training across multiple GPUs or machines. It provides functionality for wrapping models and handling distributed training.
FP16 Training Callbacks
References: fastai/callback/fp16.py
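The …/fp16.py file contains callbacks that enable mixed precision training with float16. During mixed precision training, most forward and backward computations run in float16 to save memory and speed up computation, while a float32 copy of the parameters is kept for the weight update to preserve numerical stability. The loss is also scaled up before the backward pass so that small gradients do not underflow in float16.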
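Loss scaling can be illustrated with `struct`-based float16 rounding. Multiplying the loss by a constant multiplies every gradient by that constant, which lifts tiny values into float16's representable range. The gradients are then divided by the same constant before the update. The sketch below uses a hypothetical static scale. In practice, for example with PyTorch's GradScaler, the scale is adjusted dynamically and lowered whenever scaled gradients overflow.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


grad = 1e-8        # a very small gradient value
scale = 2.0 ** 16  # hypothetical static loss scale

lost = to_fp16(grad)          # rounds to zero: below half the smallest float16 subnormal (about 6e-8)
kept = to_fp16(grad * scale)  # the scaled value is representable in float16
recovered = kept / scale      # unscale in higher precision before the update
print(lost)                           # 0.0
print(abs(recovered - grad) < 1e-11)  # True
```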
Medical Imaging Callbacks
References: fastai/medical/imaging.py
The …/imaging.py file contains specialized tools for preprocessing medical imaging data, such as DICOM files, for model training. A file's metadata and its pixel data can each be accessed in a form suited to analysis or training.
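A common preprocessing step for CT scans stored as DICOM has two parts. First, raw pixel values are converted to Hounsfield units with the RescaleSlope and RescaleIntercept metadata fields. Second, the image is windowed: values are clipped to a range of interest and that range is scaled to [0, 1]. The plain-Python sketch below uses hypothetical function names, and the example values and brain-style window (level 40, width 80) are illustrative assumptions.

```python
from typing import List


def to_hounsfield(raw: List[float], slope: float, intercept: float) -> List[float]:
    """Apply the DICOM RescaleSlope / RescaleIntercept linear mapping."""
    return [v * slope + intercept for v in raw]


def window(hu: List[float], level: float, width: float) -> List[float]:
    """Clip to [level - width/2, level + width/2] and rescale that range to [0, 1]."""
    lo, hi = level - width / 2, level + width / 2
    return [min(max((v - lo) / (hi - lo), 0.0), 1.0) for v in hu]


hu = to_hounsfield([0, 1024, 1064, 2048], slope=1.0, intercept=-1024.0)
print(hu)                                  # [-1024.0, 0.0, 40.0, 1024.0]
print(window(hu, level=40.0, width=80.0))  # [0.0, 0.0, 0.5, 1.0]
```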
Interpretability
References: fastai
The fastai library provides tools for interpreting models, implemented in …/interpret.py.
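The idea behind identifying top losses is to compute an unreduced loss for every validation item and sort those losses, so the worst predictions can be inspected first. Here is a hypothetical sketch:

```python
from typing import List, Tuple


def top_losses(losses: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (index, loss) for the k items with the highest loss, worst first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [(i, losses[i]) for i in order[:k]]


per_item = [0.1, 2.3, 0.05, 1.7]  # per-item losses, i.e. computed without averaging
print(top_losses(per_item, k=2))  # [(1, 2.3), (3, 1.7)]
```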
Model Analysis
References: fastai/callback/hook.py
The …/hook.py file contains utilities for analyzing models during and after training. It provides functions for inspecting models by passing dummy data through them.
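To pass dummy data through a model and record what each layer produces, one usually relies on forward hooks. In PyTorch these are registered with `Module.register_forward_hook`, and the hook receives the module, its input, and its output. The framework-free sketch below imitates that pattern with a hypothetical `Layer` class and reports each layer's output size.

```python
from typing import Callable, List, Tuple


class Layer:
    """Hypothetical layer that calls registered hooks after computing its output."""

    def __init__(self, name: str, fn: Callable):
        self.name, self.fn, self.hooks = name, fn, []

    def add_hook(self, hook: Callable) -> None:
        self.hooks.append(hook)

    def __call__(self, x):
        out = self.fn(x)
        for hook in self.hooks:
            hook(self, x, out)  # same (module, input, output) shape as a PyTorch forward hook
        return out


layers = [Layer("pool", lambda x: x[::2]),    # halves the length
          Layer("repeat", lambda x: x + x),   # doubles it again
          Layer("head", lambda x: [sum(x)])]  # reduces to one value

sizes: List[Tuple[str, int]] = []
for layer in layers:
    layer.add_hook(lambda mod, inp, out: sizes.append((mod.name, len(out))))

x = [0.0] * 8  # dummy input: only its size matters
for layer in layers:
    x = layer(x)
print(sizes)  # [('pool', 4), ('repeat', 8), ('head', 1)]
```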
Model Debugging
References: fastai/callback/hook.py, fastai/interpret.py
This section covers identifying and debugging errors in models. The …/hook.py file contains utilities for inspecting models.
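A common debugging aid built on a model's predictions is a list of the class pairs it confuses most often. Here is a hypothetical sketch that counts (actual, predicted) mismatches:

```python
from collections import Counter
from typing import List, Tuple


def most_confused(actual: List[str], predicted: List[str],
                  min_count: int = 1) -> List[Tuple[str, str, int]]:
    """(actual, predicted, count) for each wrong pair seen at least min_count times, most frequent first."""
    counts = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    pairs = [(a, p, n) for (a, p), n in counts.items() if n >= min_count]
    return sorted(pairs, key=lambda t: t[2], reverse=True)


actual    = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["dog", "cat", "cat", "cat", "dog", "dog"]
print(most_confused(actual, predicted))
# [('dog', 'cat', 2), ('cat', 'dog', 1), ('bird', 'dog', 1)]
```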
Visualization
References: fastai/callback/hook.py, fastai/interpret.py
The …/hook.py file contains utilities for visualizing models.
Metrics
References: fastai/metrics.py
The …/metrics.py file contains implementations of many common machine learning metrics. It provides individual metric functions as well as the class for accumulating metrics over batches during training.
Losses
References: fastai/losses.py
The …/losses.py file provides loss functions that are used for training and that can also support model analysis during and after training, for example by computing per-item losses to find the worst predictions. It contains common losses that are useful for training deep learning models, as well as losses designed for semantic segmentation tasks.
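A typical segmentation loss is the soft Dice loss, which measures the overlap between a predicted probability mask and the ground-truth mask. The plain-Python sketch below works on flattened masks. The function name and the smoothing constant of 1.0 are assumptions.

```python
from typing import List


def soft_dice_loss(probs: List[float], targets: List[float], smooth: float = 1.0) -> float:
    """1 - Dice coefficient, using predicted probabilities instead of hard masks."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1 - (2 * intersection + smooth) / (total + smooth)


mask = [1.0, 0.0, 1.0, 1.0]                       # flattened ground-truth mask
print(soft_dice_loss(mask, mask))                  # 0.0: perfect overlap
print(soft_dice_loss([0.0, 1.0, 0.0, 0.0], mask))  # 0.8: no overlap
```

Unlike per-pixel cross-entropy, Dice depends on the overlap relative to the mask size, so small foreground regions are not swamped by a large background.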
Medical Data
References: fastai/medical
The …/medical directory contains functionality for medical imaging and text data. It provides utilities for loading, preprocessing, and analyzing medical image and text data.