Quickstart Tutorial
In this quickstart tutorial, we will walk you through the MNIST image classification task with an RPN model built based on the Taylor's expansion and dual LPHM reconciliation functions via the APIs provided by tinybig.
We assume you have already installed the latest tinybig and its dependency packages. If you haven't installed them yet, please refer to the installation page for guidance.
This quickstart tutorial is prepared based on the RPN 1 paper [1] and the RPN 2 paper [2]. We also recommend reading those papers first for detailed technical information about the RPN models and the tinybig toolkit.
References
[1] Jiawei Zhang. RPN: Reconciled Polynomial Network. Towards Unifying PGMs, Kernel SVMs, MLP and KAN. 2024. ArXiv, abs/2407.04819.
[2] Jiawei Zhang. RPN 2: On Interdependence Function Learning. Towards Unifying and Advancing CNN, RNN, GNN and Transformer. 2024. ArXiv, abs/2411.11162.
Environment Setup
This tutorial was written on a Mac with Apple silicon, so we will use 'mps' as the device here; you can change it to 'cpu' or 'cuda' according to the device you are using.
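The device can be selected with standard PyTorch code, for example (a minimal sketch; the fallback logic is just one possible choice):

```python
import torch

# Pick the computation device: 'mps' on Apple silicon, 'cuda' on NVIDIA GPUs, otherwise 'cpu'.
if torch.backends.mps.is_available():
    device = 'mps'
elif torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'
print('device:', device)
```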
Loading Datasets
MNIST Dataloader
In this quickstart tutorial, we will take the MNIST dataset as an example to illustrate how tinybig loads data:
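The snippet below is a minimal sketch of this loading step. The mnist_data.load(cache_dir='./data/') call matches the method described below, while the constructor arguments of tinybig.data.mnist and the dataloader dictionary keys are assumptions and may differ slightly from the actual API:

```python
from tinybig.data import mnist

# Create the MNIST dataset wrapper (the batch-size argument names are assumptions).
mnist_data = mnist(name='mnist', train_batch_size=64, test_batch_size=64)

# Download the dataset (if not cached) and build the train/test dataloaders.
mnist_loaders = mnist_data.load(cache_dir='./data/')
train_loader = mnist_loaders['train_loader']
test_loader = mnist_loaders['test_loader']
```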
Data downloading outputs
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 12146011.18it/s]
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 278204.89it/s]
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:04<00:00, 390733.03it/s]
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 2221117.96it/s]
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
The mnist_data.load(cache_dir='./data/') method will download the MNIST dataset from torchvision to the local directory './data/'. With the train_loader and test_loader, we can access the MNIST image and label mini-batches in the training and testing sets:
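For example, a mini-batch can be fetched and printed with a loop like the following sketch:

```python
# Fetch one mini-batch from the training loader and inspect its shape and contents.
for X, y in train_loader:
    print('X shape:', X.shape, 'y.shape:', y.shape)
    print('X', X)
    print('y', y)
    break
```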
Data batch printing outputs
X shape: torch.Size([64, 784]) y.shape: torch.Size([64])
X tensor([[-0.4242, -0.4242, -0.4242, ..., -0.4242, -0.4242, -0.4242],
[-0.4242, -0.4242, -0.4242, ..., -0.4242, -0.4242, -0.4242],
[-0.4242, -0.4242, -0.4242, ..., -0.4242, -0.4242, -0.4242],
...,
[-0.4242, -0.4242, -0.4242, ..., -0.4242, -0.4242, -0.4242],
[-0.4242, -0.4242, -0.4242, ..., -0.4242, -0.4242, -0.4242],
[-0.4242, -0.4242, -0.4242, ..., -0.4242, -0.4242, -0.4242]])
y tensor([3, 7, 8, 5, 6, 1, 0, 3, 1, 7, 4, 1, 3, 4, 4, 8, 4, 8, 2, 4, 3, 5, 5, 7,
5, 9, 4, 2, 2, 3, 3, 4, 1, 2, 7, 2, 9, 0, 2, 4, 9, 4, 9, 2, 1, 3, 6, 5,
9, 4, 4, 8, 0, 3, 2, 8, 0, 7, 3, 4, 9, 4, 0, 5])
Built-in image data transformation
Note: tinybig.data.mnist has a built-in method to flatten and normalize the MNIST images from tensors of size \(28 \times 28\) into vectors of length \(784\) via torchvision.transforms:
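The built-in transformation is roughly equivalent to the following torchvision pipeline; this is an illustrative sketch rather than the exact composition used inside tinybig.data.mnist (the normalization constants are the standard MNIST mean and standard deviation, which match the -0.4242 background values printed above):

```python
import torchvision.transforms as transforms

# Convert each image to a tensor, normalize with the standard MNIST statistics,
# and flatten the 28x28 image into a 784-dimensional vector.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
    transforms.Lambda(lambda x: x.flatten()),
])
```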
Creating the RPN Model
To model the underlying data distribution mapping \(f: R^m \to R^n\), the RPN model disentangles the input data from model parameters into three component functions:
- Data Expansion Function: \(\kappa: R^m \to R^D\),
- Parameter Reconciliation Function: \(\psi: R^l \to R^{n \times D}\),
- Remainder Function: \(\pi: R^m \to R^n\),
where \(m\) and \(n\) denote the input and output space dimensions, respectively. Notation \(D\) denotes the target expansion space dimension (determined by the expansion function and input dimension \(m\)) and \(l\) is the number of learnable parameters in the model (determined by the reconciliation function and dimensions \(n\) and \(D\)).
So, the underlying mapping \(f\) can be approximated by RPN as the inner product of the expansion function with the reconciliation function, subsequently summed with the remainder function: $$ g(\mathbf{x} | \mathbf{w}) = \left \langle \kappa(\mathbf{x}), \psi(\mathbf{w}) \right \rangle + \pi(\mathbf{x}), $$ where \(\mathbf{x} \in R^m\) denotes an input data instance and \(\mathbf{w} \in R^l\) denotes the learnable parameter vector.
Data Expansion Function
Various data expansion functions have been implemented in tinybig already. In this tutorial, we will use the Taylor's expansion function as an example to illustrate how data expansion works.
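A minimal sketch of this step is shown below. The class name taylor_expansion and the parameters d and postprocess_functions follow the tinybig documentation, but the exact signature (in particular, how the 'layer_norm' post-processing and the device are passed) is an assumption:

```python
from tinybig.expansion import taylor_expansion

# Taylor's expansion of order d=2 with 'layer_norm' post-processing.
exp_func = taylor_expansion(name='taylor_expansion', d=2, postprocess_functions='layer_norm')

# Apply the expansion to a batch containing a single data instance (2D input of shape (1, 784)).
x = X[0:1, :]
kappa_x = exp_func(x=x, device=device)
print('Expansion dimension D:', kappa_x.shape[1])
```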
Data expansion printing outputs
In the above code, we define a Taylor's expansion function of order d=2 with 'layer_norm' as the post-processing function. By applying the expansion function to a data batch with one single data instance, we print the expansion output dimension as \(D = 784 + 784 \times 784 = 615440\).
Expansion function input shapes
Note: the expansion function accepts batch inputs as 2D tensors of shape (B, m), where B and m denote the batch size and input dimension, e.g., X[0:1,:] or X. If we feed a list, an array, or a 1D tensor, e.g., X[0,:], to the expansion function, it will report errors.
All the expansion functions in tinybig have a method calculate_D(m), which automatically calculates the target expansion space dimension \(D\) based on the input space dimension, i.e., the parameter \(m\). The calculated \(D\) will be used later in the reconciliation functions.
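For example, with the exp_func object defined above, the expansion dimension for MNIST can be computed as follows:

```python
# Compute the target expansion dimension from the input dimension m = 784.
D = exp_func.calculate_D(m=784)
print('D:', D)  # expected: 784 + 784 * 784 = 615440
```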
Parameter Reconciliation Function
In tinybig, we have implemented different categories of parameter reconciliation functions. Below, we will use the dual LPHM function to illustrate how parameter reconciliation works. Several other reconciliation functions will also be introduced in the tutorial articles.
Assuming we need to build an RPN layer with the output dimension \(n=64\) here:
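The following sketch sets up the dual LPHM reconciliation function; the class name dual_lphm_reconciliation and the calculate_l(n, D) method are assumptions based on the tinybig documentation, and the partition sizes p and q are illustrative values chosen to satisfy the divisibility constraints discussed next:

```python
from tinybig.reconciliation import dual_lphm_reconciliation

n = 64  # output dimension of the RPN layer

# Dual LPHM reconciliation with partition sizes p, q and rank r.
# p must divide n (8 | 64) and q must divide D (784 | 615440).
rec_func = dual_lphm_reconciliation(name='dual_lphm_reconciliation', p=8, q=784, r=5)

# Number of learnable parameters required to fabricate the (n x D) parameter matrix.
l = rec_func.calculate_l(n=n, D=D)
print('l:', l)
```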
For the parameters, we need to make sure \(p\) divides \(n\) and \(q\) divides \(D\). As for the rank parameter \(r\), its value depends on how many parameters we plan to use for the model. We use r=5 here, but you can also try other rank values, e.g., r=2, which will further reduce the number of parameters while still achieving decent performance.
Automatic parameter creation
We do not create the parameters here; they will be created automatically in the RPN head to be used below.
Remainder Function
By default, we will use the zero remainder in this tutorial, which will not create any learnable parameters:
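A minimal sketch of this setup is shown below (the module path and class name zero_remainder are assumptions based on the tinybig documentation):

```python
from tinybig.remainder import zero_remainder

# Zero remainder: pi(x) = 0, which introduces no extra learnable parameters.
rem_func = zero_remainder(name='zero_remainder')
```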
RPN Head
Based on the above component functions, we can combine them together to define the RPN model. Below, we will first define the RPN head, which will be used to compose the layers of RPN.
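A sketch of the head construction is shown below; the rpn_head class and the data_transformation and parameter_fabrication argument names follow the discussion in the next paragraph, while the remaining argument names are assumptions:

```python
from tinybig.module import rpn_head

# One RPN head with a single parameter channel, combining the three component functions.
head = rpn_head(
    m=784, n=64, channel_num=1,
    data_transformation=exp_func,    # the Taylor's expansion function
    parameter_fabrication=rec_func,  # the dual LPHM reconciliation function
    remainder=rem_func,              # the zero remainder function
    device=device,
)
```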
Here, we build an RPN head with one channel of parameters. The parameter data_transformation is a more general name for data_expansion, and parameter_fabrication can be viewed as equivalent to parameter_reconciliation.
We use these general data_transformation and parameter_fabrication names here not only for their current functionality but also to establish a framework that allows for the future expansion of tinybig, enabling the addition of new functions and components under these broader categorical names.
RPN Layer
The above head can be used to build the first layer of the RPN model:
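A minimal sketch (the rpn_layer class name and its arguments are assumptions based on the tinybig documentation):

```python
from tinybig.module import rpn_layer

# The first RPN layer (784 -> 64), composed of the single head defined above.
layer_1 = rpn_layer(m=784, n=64, heads=[head], device=device)
```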
Deep RPN Model with Multiple Layers
Via a similar process, we can also define two more RPN layers:
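For instance, the two extra layers can be built as sketched below; the make_head helper is a hypothetical convenience function introduced only for this tutorial, and the p, q values are illustrative choices that satisfy the divisibility constraints for each layer's expansion dimension:

```python
def make_head(m, n, p, q, r=5):
    # Hypothetical helper: build one RPN head for a layer with dimensions m -> n.
    return rpn_head(
        m=m, n=n, channel_num=1,
        data_transformation=taylor_expansion(name='taylor_expansion', d=2, postprocess_functions='layer_norm'),
        parameter_fabrication=dual_lphm_reconciliation(name='dual_lphm_reconciliation', p=p, q=q, r=r),
        remainder=zero_remainder(name='zero_remainder'),
        device=device,
    )

# Hidden layer (64 -> 64) and output layer (64 -> 10); here D = 64 + 64*64 = 4160,
# so q = 65 divides D, while p divides each layer's output dimension n.
layer_2 = rpn_layer(m=64, n=64, heads=[make_head(m=64, n=64, p=8, q=65)], device=device)
layer_3 = rpn_layer(m=64, n=10, heads=[make_head(m=64, n=10, p=2, q=65)], device=device)
```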
By stacking these three layers on top of each other, we can build a deep RPN model:
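A sketch of the model assembly (the rpn model class and its argument names are assumptions based on the tinybig documentation):

```python
from tinybig.model import rpn

# Stack the three layers into a deep RPN model.
model = rpn(name='rpn_model', layers=[layer_1, layer_2, layer_3], device=device)
```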
Later on, in the tutorials and examples, we will introduce an easier way to define the model architecture directly with the configuration file instead.
RPN Training on MNIST
Below, we will train the RPN model with the loaded MNIST mnist_loaders.
Learner Setup
tinybig provides a built-in learner module, which can train the input model with the provided optimizer. Below, we will set up the learner with torch.nn.CrossEntropyLoss as the loss function, torch.optim.AdamW as the optimizer, and torch.optim.lr_scheduler.ExponentialLR as the learning rate scheduler:
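A minimal sketch of the learner setup is shown below; the backward_learner class name and its argument names are assumptions based on the tinybig documentation, and the learning rate and decay factor are chosen to match the values shown in the training records below:

```python
import torch
from tinybig.learner import backward_learner

# Loss, optimizer, and learning-rate scheduler used by the learner.
loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(params=model.parameters(), lr=2e-3)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=0.95)

# Train for just 3 epochs to quickly assess the model's performance.
learner = backward_learner(n_epochs=3, optimizer=optimizer, loss=loss, lr_scheduler=lr_scheduler)
```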
Here, we train the model for just 3 epochs to quickly assess its performance. You can increase the number of epochs to train the model until convergence.
Training
With the previously loaded MNIST mnist_loaders, we can train the RPN model built above with the learner. To monitor the learning performance, we also pass an evaluation metric to the learner to record the training accuracy scores:
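A sketch of the training call is shown below; the accuracy metric class and the train(...) signature are assumptions based on the tinybig documentation:

```python
from tinybig.metric import accuracy

# Accuracy metric used to monitor the training performance.
metric = accuracy(name='accuracy_metric')

# Count the learnable parameters, then train the model with the MNIST dataloaders.
print('parameter num:', sum(p.numel() for p in model.parameters() if p.requires_grad))
training_records = learner.train(model=model, data_loader=mnist_loaders, device=device, metric=metric)
```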
We count the total number of learnable parameters involved in the RPN model built above and provide the tqdm training records as follows:
Model training records
parameter num: 9330
100%|██████████| 938/938 [00:42<00:00, 21.86it/s, epoch=0/3, loss=0.0519, lr=0.002, metric_score=0.969, time=43.1]
Epoch: 0, Test Loss: 0.12760563759773874, Test Score: 0.9621, Time Cost: 3.982516050338745
100%|██████████| 938/938 [00:43<00:00, 21.74it/s, epoch=1/3, loss=0.0112, lr=0.0019, metric_score=1, time=90.2]
Epoch: 1, Test Loss: 0.09334634791371549, Test Score: 0.9717, Time Cost: 4.184643030166626
100%|██████████| 938/938 [00:42<00:00, 21.90it/s, epoch=2/3, loss=0.0212, lr=0.0018, metric_score=1, time=137]
Epoch: 2, Test Loss: 0.08378902525169431, Test Score: 0.9749, Time Cost: 4.574808120727539
Testing
Furthermore, by applying the trained model to the testing set, we can obtain the prediction results of the model as follows:
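A sketch of this evaluation step (the test(...) method and its signature are assumptions based on the tinybig documentation):

```python
# Apply the trained model to the testing set and report the accuracy score.
test_result = learner.test(model=model, test_loader=test_loader, device=device, metric=metric)
print(test_result)
```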
The above results indicate that RPN with a 3-layer architecture achieves a decent testing accuracy score of 0.9749 while using only 9330 learnable parameters, far fewer than those of MLP and KAN models with similar architectures.
What is the Next?
After finishing this quickstart tutorial, you may also check the Tutorials tab and the Examples tab of this website for more in-depth tutorial articles and code examples developed based on the tinybig library.