tinybig.module
This module provides the layer, head and component function modules to build the RPN model within the tinyBIG toolkit.
RPN Model Architecture
Formally, given the underlying data distribution mapping \(f: \mathbb{R}^m \to \mathbb{R}^n\), the RPN model approximates the function \(f\) as follows: $$ \begin{equation} g(\mathbf{x} | \mathbf{w}) = \left \langle \kappa_{\xi} (\mathbf{x}), \psi(\mathbf{w}) \right \rangle + \pi(\mathbf{x}), \end{equation} $$
The RPN model disentangles the input data from the model parameters through the expansion function \(\kappa\) and the reconciliation function \(\psi\), whose inner product is then summed with the remainder function \(\pi\) (a minimal sketch of these components follows the list below), where
- \(\kappa_{\xi}: \mathbb{R}^m \to \mathbb{R}^{D}\) is named the data interdependent transformation function. It is a composite function of the data transformation function \(\kappa\) and the data interdependence function \(\xi\), where \(D\) denotes the target expansion space dimension.
- \(\psi: \mathbb{R}^l \to \mathbb{R}^{n \times D}\) is named the parameter reconciliation function (or, more generally, the parameter fabrication function), which is defined only on the parameters without any input data.
- \(\pi: \mathbb{R}^m \to \mathbb{R}^n\) is named the remainder function.
- \(\xi_a: \mathbb{R}^{b \times m} \to \mathbb{R}^{m \times m'}\) and \(\xi_i: \mathbb{R}^{b \times m} \to \mathbb{R}^{b \times b'}\), defined on the input data batch \(\mathbf{X} \in \mathbb{R}^{b \times m}\), are named the attribute and instance data interdependence functions, respectively.
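To make the factored form \(g(\mathbf{x} | \mathbf{w}) = \langle \kappa_{\xi}(\mathbf{x}), \psi(\mathbf{w}) \rangle + \pi(\mathbf{x})\) concrete, here is a minimal NumPy sketch. The component choices are illustrative assumptions, not the toolkit's defaults: an order-2 Taylor expansion for \(\kappa\) (with no interdependence), a plain reshape for \(\psi\) (identity fabrication), and a zero remainder for \(\pi\).

```python
import numpy as np

def kappa(x):
    # Assumed expansion: order-2 Taylor terms, kappa: R^m -> R^D
    # with D = m + m^2 (first- and second-order monomials).
    return np.concatenate([x, np.outer(x, x).flatten()])

def psi(w, n, D):
    # Assumed reconciliation: identity fabrication, psi: R^l -> R^{n x D}
    # with l = n * D, i.e. the parameter vector is simply reshaped.
    return w.reshape(n, D)

def pi(x, n):
    # Assumed remainder: the zero remainder, pi: R^m -> R^n.
    return np.zeros(n)

m, n = 3, 2
D = m + m * m                      # expansion dimension for this kappa
x = np.random.randn(m)             # input instance
w = np.random.randn(n * D)         # flattened learnable parameters

# g(x | w) = <kappa(x), psi(w)> + pi(x)
g = psi(w, n, D) @ kappa(x) + pi(x)
print(g.shape)                     # (n,)
```

Note how the expansion touches only the data and the reconciliation touches only the parameters; swapping in a different \(\kappa\), \(\psi\), or \(\pi\) changes the model without altering this inner-product skeleton.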
Data Interdependent Transformation Function
Given an input data batch \(\mathbf{X} \in \mathbb{R}^{b \times m}\), the data interdependent transformation function \(\kappa_{\xi}\) can be formulated as follows:
\[ \begin{equation} \kappa_{\xi}(\mathbf{X}) = \mathbf{A}^\top_{\xi_i} \kappa(\mathbf{X} \mathbf{A}_{\xi_a}) \in \mathbb{R}^{b' \times D}. \end{equation} \]
The attribute and instance interdependence matrices \(\mathbf{A}_{\xi_a} \in \mathbb{R}^{m \times m'}\) and \(\mathbf{A}_{\xi_i} \in \mathbb{R}^{b \times b'}\) are computed with the corresponding interdependence functions defined above, i.e.,
\[ \begin{equation} \mathbf{A}_{\xi_a} = \xi_a(\mathbf{X}) \in \mathbb{R}^{m \times m'} \text{, and } \mathbf{A}_{\xi_i} = \xi_i(\mathbf{X}) \in \mathbb{R}^{b \times b'}. \end{equation} \]
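The matrix form above can be checked dimension-by-dimension with a short NumPy sketch. The interdependence functions below are placeholder identity maps and \(\kappa\) is again an assumed order-2 Taylor expansion applied row-wise; these are stand-ins for illustration, not functions from the tinybig API.

```python
import numpy as np

def xi_a(X, m_prime):
    # Placeholder attribute interdependence, xi_a: R^{b x m} -> R^{m x m'}.
    # Here a constant identity-like matrix that ignores X.
    return np.eye(X.shape[1], m_prime)

def xi_i(X, b_prime):
    # Placeholder instance interdependence, xi_i: R^{b x m} -> R^{b x b'}.
    return np.eye(X.shape[0], b_prime)

def kappa(X):
    # Assumed row-wise order-2 Taylor expansion: R^{b x m'} -> R^{b x D},
    # with D = m' + m'^2.
    second = np.einsum('bi,bj->bij', X, X).reshape(X.shape[0], -1)
    return np.concatenate([X, second], axis=1)

b, m = 4, 3
m_prime, b_prime = m, b
X = np.random.randn(b, m)

A_a = xi_a(X, m_prime)             # (m, m')
A_i = xi_i(X, b_prime)             # (b, b')

# kappa_xi(X) = A_i^T kappa(X A_a) in R^{b' x D}
out = A_i.T @ kappa(X @ A_a)
print(out.shape)                   # (b', m_prime + m_prime**2)
```

With identity interdependence matrices this reduces to the plain expansion \(\kappa(\mathbf{X})\); learned or structural \(\xi_a\) and \(\xi_i\) mix attributes and instances before and after the expansion, respectively.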
RPN Layer with Multi-Head
Similar to the Transformer with multi-head attention, the RPN model employs a multi-head architecture, where each head can disentangle the input data and model parameters using different expansion, reconciliation, and remainder functions: $$ \begin{equation} \text{Fusion} \left( \left\{ \left\langle \kappa^{(h)}_{\xi^{(h)}}(\mathbf{X}), \psi^{(h)}(\mathbf{w}^{(h)}) \right\rangle + \pi^{(h)}(\mathbf{X}) \right\}; h \in \{1, 2, \cdots, H \} \right), \end{equation} $$ where the superscript \(h\) indicates the head index, \(H\) denotes the total number of heads, and \(\text{Fusion}(\cdot)\) denotes the multi-head fusion function. By default, summation is used to combine the results from all heads.
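As an illustration of the default summation fusion, the sketch below evaluates \(H\) heads on the same batch and sums their outputs. It reuses the same assumed components as the earlier sketches (order-2 Taylor expansion, reshape reconciliation, zero remainder, no interdependence), so only the fusion step is new here.

```python
import numpy as np

def kappa(X):
    # Assumed shared expansion: row-wise order-2 Taylor terms.
    second = np.einsum('bi,bj->bij', X, X).reshape(X.shape[0], -1)
    return np.concatenate([X, second], axis=1)

H = 2                              # number of heads
b, m, n = 4, 3, 2
D = m + m * m                      # per-head expansion dimension
X = np.random.randn(b, m)

# one flattened parameter vector per head, reconciled by reshaping
heads = [np.random.randn(n * D) for _ in range(H)]

# summation fusion over <kappa(X), psi(w^(h))> + pi(X), with pi = 0
output = sum(kappa(X) @ w.reshape(n, D).T for w in heads)
print(output.shape)                # (b, n)
```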
RPN Head with Multi-Channel
Similar to convolutional neural networks (CNNs) employing multiple filters, RPN allows each head to have multiple channels of parameters applied to the same data expansion. For example, for the \(h\)-th head, RPN defines its multi-channel parameters as \(\mathbf{w}^{(h),0}, \mathbf{w}^{(h),1}, \cdots, \mathbf{w}^{(h), C-1}\), where \(C\) denotes the number of channels. These parameters are all reconciled using the same parameter reconciliation function, as shown below: $$ \begin{equation} \text{Fusion} \left( \left\{ \left\langle \kappa^{(h)}_{\xi^{(h), c}}(\mathbf{X}), \psi^{(h)}(\mathbf{w}^{(h), c}) \right\rangle + \pi^{(h)}(\mathbf{X}) \right\}; h \in \{1, 2, \cdots, H \}, c \in \{0, 1, \cdots, C-1 \} \right), \end{equation} $$
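A hedged sketch of a single head with \(C\) parameter channels follows. For brevity it assumes one expansion shared across channels (dropping the per-channel interdependence \(\xi^{(h),c}\) from the equation above), a shared reshape reconciliation \(\psi^{(h)}\), and summation as the channel fusion; none of these choices is prescribed by the toolkit.

```python
import numpy as np

def kappa(X):
    # Assumed shared expansion: row-wise order-2 Taylor terms.
    second = np.einsum('bi,bj->bij', X, X).reshape(X.shape[0], -1)
    return np.concatenate([X, second], axis=1)

C = 3                              # channels in this head
b, m, n = 4, 3, 2
D = m + m * m
X = np.random.randn(b, m)

# C parameter vectors w^(h,0..C-1), all reconciled by the same reshape
channels = [np.random.randn(n * D) for _ in range(C)]

# per-channel inner products against the shared expansion, fused by summation
head_output = sum(kappa(X) @ w.reshape(n, D).T for w in channels)
print(head_output.shape)           # (b, n)
```

Channels thus multiply the parameter capacity of a head without recomputing the (often expensive) data expansion.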
Classes in this Module
This module contains the following categories of component functions and modules:
- Base Function Template
- Data Transformation/Expansion/Compression Functions
- Parameter Fabrication/Reconciliation Functions
- Data/Structural Interdependence Functions
- Remainder Functions
- Fusion Functions
- RPN Layer with Multi-Head
- RPN Head with Multi-Channel