hm_reconciliation

Bases: fabrication

The hypercomplex multiplication based parameter reconciliation function.

It performs the hypercomplex multiplication based parameter reconciliation, and returns the reconciled parameter matrix of shape (n, D). This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

...

Notes

Formally, given a parameter vector \(\mathbf{w}\) of length \(l\), we partition it into two sub-vectors and reshape them into two parameter sub-matrices \(\mathbf{A} \in R^{p \times q}\) and \(\mathbf{B} \in R^{s \times t}\). The hypercomplex multiplication based reconciliation computes the Kronecker product of these two parameter matrices to define the reconciled parameter matrix of shape (n, D) as follows: $$ \begin{equation} \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} \in {R}^{n \times D}, \end{equation} $$ where the dimensions must satisfy the constraints \(l = pq + st\), \(n = ps\), and \(D = qt\).
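
As a small illustration of the Kronecker product used in this definition, the sketch below builds it from plain Python lists. The helper name `kron` and the example matrices are chosen here for illustration only and are not part of tinybig. The point is the index rule: entry (i, j) of the result is `A[i // s][j // t] * B[i % s][j % t]`.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows (illustrative helper)."""
    p, q = len(A), len(A[0])
    s, t = len(B), len(B[0])
    # Entry (i, j) of the (p*s) x (q*t) result pairs A[i // s][j // t] with B[i % s][j % t].
    return [
        [A[i // s][j // t] * B[i % s][j % t] for j in range(q * t)]
        for i in range(p * s)
    ]


A = [[1, 2], [3, 4]]   # p = 2, q = 2
B = [[0, 1]]           # s = 1, t = 2
K = kron(A, B)         # shape (n, D) = (p*s, q*t) = (2, 4)
print(K)               # [[0, 1, 0, 2], [0, 3, 0, 4]]
```

Each entry of A scales a full copy of B, which is why the result has n = ps rows and D = qt columns.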

In implementation, to reduce the number of hyper-parameters and accommodate the parameter dimensions, the size of matrix \(\mathbf{A}\) is fixed by two hyper-parameters \(p\) and \(q\), i.e., \(\mathbf{A} \in {R}^{p \times q}\). The size of matrix \(\mathbf{B}\) then follows directly as \(s \times t\), where \(s =\frac{n}{p}\) and \(t = \frac{D}{q}\). The hyper-parameters \(p\) and \(q\) must be divisors of \(n\) and \(D\), respectively. Since both \(\mathbf{A}\) and \(\mathbf{B}\) originate from \(\mathbf{w}\), the required length of \(\mathbf{w}\) is $$ \begin{equation} l = p \times q + \frac{n}{p} \times \frac{D}{q}. \end{equation} $$
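
To make the parameter count concrete, here is a minimal sketch of the length formula as a standalone function. The name `hm_param_length` and the example sizes are assumptions made for this illustration, not part of the library; the function only restates the formula and the divisibility requirement stated above.

```python
def hm_param_length(n: int, D: int, p: int, q: int) -> int:
    """Length l of w needed for A (p x q) and B (n/p x D/q); illustrative helper."""
    if n % p != 0 or D % q != 0:
        raise ValueError(f"p={p} must divide n={n} and q={q} must divide D={D}")
    s, t = n // p, D // q
    return p * q + s * t


n, D = 64, 784
l = hm_param_length(n, D, p=8, q=28)
print(l)        # 448   (8*28 for A plus 8*28 for B)
print(n * D)    # 50176 (entries in the full n x D matrix)
```

In this example the reconciled (n, D) matrix has 50176 entries but is generated from only 448 learnable parameters.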

Attributes:

| Name | Type | Description |
|------|------|-------------|
| name | str, default = 'hypercomplex_multiplication_reconciliation' | Name of the hypercomplex multiplication based parameter reconciliation function. |
| p | int, default = None | Parameter sub-matrix row dimension. If p is not provided, it is set to find_close_factors(n) on the first call to calculate_l or forward. |
| q | int, default = None | Parameter sub-matrix column dimension. If q is not provided, it is set to find_close_factors(D) on the first call to calculate_l or forward. |

Methods:

| Name | Description |
|------|-------------|
| __init__ | It initializes the hypercomplex multiplication based parameter reconciliation function. |
| calculate_l | It calculates the length of required parameters for the reconciliation function. |
| forward | It implements the abstract forward method declared in the base reconciliation class. |

Source code in tinybig/reconciliation/lowrank_reconciliation.py
class hm_reconciliation(fabrication):
    r"""
    The hypercomplex multiplication based parameter reconciliation function.

    It performs the hypercomplex multiplication based parameter reconciliation,
    and returns the reconciled parameter matrix of shape (n, D).
    This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

    ...

    Notes
    ----------
    Formally, given a parameter vector $\mathbf{w}$ of length $l$, we partition it into two sub-vectors and reshape them
    into two parameter sub-matrices $\mathbf{A} \in R^{p \times q}$ and $\mathbf{B} \in R^{s \times t}$.
    The hypercomplex multiplication based reconciliation computes the Kronecker product of these two parameter matrices
    to define the reconciled parameter matrix of shape (n, D) as follows:
    $$
        \begin{equation}
            \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} \in {R}^{n \times D},
        \end{equation}
    $$
    where the dimensions must satisfy the constraints $l = pq + st$, $n = ps$, and $D = qt$.

    In implementation, to reduce the number of hyper-parameters and accommodate the parameter dimensions,
    we can maintain the size of matrix $\mathbf{A}$ as fixed by two hyper-parameters $p$ and $q$, i.e.,
    $\mathbf{A} \in {R}^{p \times q}$.
    Subsequently, the desired size of matrix $\mathbf{B}$ can be directly calculated as $s \times t$,
    where $s =\frac{n}{p}$ and $t = \frac{D}{q}$.
    The hyper-parameters $p$ and $q$ need to be divisors of $n$ and $D$, respectively.
    Since both $\mathbf{A}$ and $\mathbf{B}$ originate from $\mathbf{w}$, the desired parameter length defining
    $\mathbf{w}$ can be obtained as
    $$
        \begin{equation}
            l = p \times q + \frac{n}{p} \times \frac{D}{q}.
        \end{equation}
    $$

    Attributes
    ----------
    name: str, default = 'hypercomplex_multiplication_reconciliation'
        Name of the hypercomplex multiplication based parameter reconciliation function
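    p: int, default = None
        Parameter sub-matrix row dimension.
        If p is not provided, it is set to find_close_factors(n) on the first call to calculate_l or forward.
    q: int, default = None
        Parameter sub-matrix column dimension.
        If q is not provided, it is set to find_close_factors(D) on the first call to calculate_l or forward.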

    Methods
    ----------
    __init__
        It initializes the hypercomplex multiplication based parameter reconciliation function.

    calculate_l
        It calculates the length of required parameters for the reconciliation function.

    forward
        It implements the abstract forward method declared in the base reconciliation class.
    """
    def __init__(self, name='hypercomplex_multiplication_reconciliation', p: int = None, q: int = None, *args, **kwargs):
        """
        The initialization method of the hypercomplex multiplication based parameter reconciliation function.

        It initializes a hypercomplex multiplication based parameter reconciliation function object.
        This method also calls the initialization method of the base class.

        Parameters
        ----------
        name: str, default = 'hypercomplex_multiplication_reconciliation'
            Name of the hypercomplex multiplication based parameter reconciliation function.
        p: int, default = None
            Parameter sub-matrix row dimension.
            If p is not provided, it is set to find_close_factors(n) on the first call to calculate_l or forward.
        q: int, default = None
            Parameter sub-matrix column dimension.
            If q is not provided, it is set to find_close_factors(D) on the first call to calculate_l or forward.

        Returns
        ----------
        fabrication
            The hypercomplex multiplication based parameter reconciliation function object.
        """
        super().__init__(name=name, *args, **kwargs)
        self.p = p
        self.q = q

    def calculate_l(self, n: int, D: int):
        r"""
        The required parameter number calculation method.

        It calculates the number of required learnable parameters, i.e., $l$, of the parameter reconciliation function
        based on the output and intermediate expansion space dimensions, $n$ and $D$, and the hyper-parameters $p$ and $q$,
        which can be represented as follows:
        $$
            \begin{equation}
                l = p \times q + \frac{n}{p} \times \frac{D}{q}.
            \end{equation}
        $$

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.

        Returns
        -------
        int
            The number of required learnable parameters.
        """
        if self.p is None:
            self.p = find_close_factors(n)
        if self.q is None:
            self.q = find_close_factors(D)

        if n % self.p != 0 or D % self.q != 0:
            raise ValueError('The input dimensions {} and {} cannot be divided by parameter p {} and q {}'.format(n, D, self.p, self.q))

        # Integer division is exact here because divisibility was checked above.
        s, t = n // self.p, D // self.q
        assert (self.p * self.q * s * t == n * D)
        return self.p * self.q + s * t

    def forward(self, n: int, D: int, w: torch.nn.Parameter, device='cpu', *args, **kwargs):
        r"""
        The forward method of the parameter reconciliation function.

        It applies the hypercomplex multiplication based parameter reconciliation operation to the input parameter vector $\mathbf{w}$,
        and returns the reconciled parameter matrix of shape (n, D) subject to the parameters $p$ and $q$ as follows:
        $$
            \begin{equation}
                \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} \in {R}^{n \times D},
            \end{equation}
        $$
        where $\mathbf{A} \in {R}^{p \times q}$ and $\mathbf{B} \in {R}^{s \times t}$ are two sub-matrices obtained
        by partitioning $\mathbf{w}$ into two sub-vectors and subsequently reshaping them into matrices.

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.
        w: torch.nn.Parameter
            The learnable parameters of the model.
        device: str, default = 'cpu'
            Device to perform the parameter reconciliation.

        Returns
        ----------
        torch.Tensor
            The reconciled parameter matrix of shape (n, D).
        """
        if self.p is None:
            self.p = find_close_factors(n)
        if self.q is None:
            self.q = find_close_factors(D)

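        # w must have shape (1, l); calculate_l also checks that p divides n and q divides D.
        assert w.ndim == 2 and w.size(1) == self.calculate_l(n=n, D=D)
        s, t = n // self.p, D // self.q
        A, B = torch.split(w, [self.p*self.q, s*t], dim=1)
        # Reshape the two sub-vectors into matrices so the einsum below yields the Kronecker product.
        A, B = A.reshape(self.p, self.q), B.reshape(s, t)
        return torch.einsum('pq,st->psqt', A, B).reshape(self.p*s, self.q*t)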

__init__(name='hypercomplex_multiplication_reconciliation', p=None, q=None, *args, **kwargs)

The initialization method of the hypercomplex multiplication based parameter reconciliation function.

It initializes a hypercomplex multiplication based parameter reconciliation function object. This method also calls the initialization method of the base class.

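Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| name | str | Name of the hypercomplex multiplication based parameter reconciliation function. | 'hypercomplex_multiplication_reconciliation' |
| p | int | Parameter sub-matrix row dimension. If p is not provided, it is set to find_close_factors(n) on the first call to calculate_l or forward. | None |
| q | int | Parameter sub-matrix column dimension. If q is not provided, it is set to find_close_factors(D) on the first call to calculate_l or forward. | None |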

Returns:

| Type | Description |
|------|-------------|
| fabrication | The hypercomplex multiplication based parameter reconciliation function object. |

Source code in tinybig/reconciliation/lowrank_reconciliation.py
def __init__(self, name='hypercomplex_multiplication_reconciliation', p: int = None, q: int = None, *args, **kwargs):
    """
    The initialization method of the hypercomplex multiplication based parameter reconciliation function.

    It initializes a hypercomplex multiplication based parameter reconciliation function object.
    This method also calls the initialization method of the base class.

    Parameters
    ----------
    name: str, default = 'hypercomplex_multiplication_reconciliation'
        Name of the hypercomplex multiplication based parameter reconciliation function.
    p: int, default = None
        Parameter sub-matrix row dimension.
        If p is not provided, it is set to find_close_factors(n) on the first call to calculate_l or forward.
    q: int, default = None
        Parameter sub-matrix column dimension.
        If q is not provided, it is set to find_close_factors(D) on the first call to calculate_l or forward.

    Returns
    ----------
    fabrication
        The hypercomplex multiplication based parameter reconciliation function object.
    """
    super().__init__(name=name, *args, **kwargs)
    self.p = p
    self.q = q

calculate_l(n, D)

The required parameter number calculation method.

It calculates the number of required learnable parameters, i.e., \(l\), of the parameter reconciliation function based on the output and intermediate expansion space dimensions, \(n\) and \(D\), and the hyper-parameters \(p\) and \(q\), which can be represented as follows: $$ \begin{equation} l = p \times q + \frac{n}{p} \times \frac{D}{q}. \end{equation} $$

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| n | int | The dimension of the output space. | required |
| D | int | The dimension of the intermediate expansion space. | required |

Returns:

| Type | Description |
|------|-------------|
| int | The number of required learnable parameters. |

Source code in tinybig/reconciliation/lowrank_reconciliation.py
def calculate_l(self, n: int, D: int):
    r"""
    The required parameter number calculation method.

    It calculates the number of required learnable parameters, i.e., $l$, of the parameter reconciliation function
    based on the output and intermediate expansion space dimensions, $n$ and $D$, and the hyper-parameters $p$ and $q$,
    which can be represented as follows:
    $$
        \begin{equation}
            l = p \times q + \frac{n}{p} \times \frac{D}{q}.
        \end{equation}
    $$

    Parameters
    ----------
    n: int
        The dimension of the output space.
    D: int
        The dimension of the intermediate expansion space.

    Returns
    -------
    int
        The number of required learnable parameters.
    """
    if self.p is None:
        self.p = find_close_factors(n)
    if self.q is None:
        self.q = find_close_factors(D)

    if n % self.p != 0 or D % self.q != 0:
        raise ValueError('The input dimensions {} and {} cannot be divided by parameter p {} and q {}'.format(n, D, self.p, self.q))

    # Integer division is exact here because divisibility was checked above.
    s, t = n // self.p, D // self.q
    assert (self.p * self.q * s * t == n * D)
    return self.p * self.q + s * t
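
Because calculate_l raises a ValueError unless p divides n and q divides D, it can help to see which (p, q) pairs are valid for a given (n, D) and what parameter length each one yields. The sketch below enumerates them in plain Python; the helper names `divisors` and `valid_choices` are hypothetical and introduced only for this illustration, and the code says nothing about how find_close_factors picks its values.

```python
def divisors(m: int) -> list:
    """All positive divisors of m, in increasing order."""
    return [d for d in range(1, m + 1) if m % d == 0]


def valid_choices(n: int, D: int) -> dict:
    """Map each valid (p, q) pair to l = p*q + (n/p)*(D/q)."""
    return {
        (p, q): p * q + (n // p) * (D // q)
        for p in divisors(n)
        for q in divisors(D)
    }


choices = valid_choices(6, 8)
best_l = min(choices.values())
best = sorted(pq for pq, l in choices.items() if l == best_l)
print(best_l)   # 14
print(best)     # [(1, 8), (2, 4), (3, 2), (6, 1)]
```

Since l = pq + nD/(pq), the length depends on p and q only through their product, which is why several pairs tie for the smallest value here.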

forward(n, D, w, device='cpu', *args, **kwargs)

The forward method of the parameter reconciliation function.

It applies the hypercomplex multiplication based parameter reconciliation operation to the input parameter vector \(\mathbf{w}\), and returns the reconciled parameter matrix of shape (n, D) subject to the parameters \(p\) and \(q\) as follows: $$ \begin{equation} \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} \in {R}^{n \times D}, \end{equation} $$ where \(\mathbf{A} \in {R}^{p \times q}\) and \(\mathbf{B} \in {R}^{s \times t}\) are two sub-matrices obtained by partitioning \(\mathbf{w}\) into two sub-vectors and subsequently reshaping them into matrices.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| n | int | The dimension of the output space. | required |
| D | int | The dimension of the intermediate expansion space. | required |
| w | Parameter | The learnable parameters of the model. | required |
| device | str | Device to perform the parameter reconciliation. | 'cpu' |

Returns:

| Type | Description |
|------|-------------|
| Tensor | The reconciled parameter matrix of shape (n, D). |

Source code in tinybig/reconciliation/lowrank_reconciliation.py
def forward(self, n: int, D: int, w: torch.nn.Parameter, device='cpu', *args, **kwargs):
    r"""
    The forward method of the parameter reconciliation function.

    It applies the hypercomplex multiplication based parameter reconciliation operation to the input parameter vector $\mathbf{w}$,
    and returns the reconciled parameter matrix of shape (n, D) subject to the parameters $p$ and $q$ as follows:
    $$
        \begin{equation}
            \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} \in {R}^{n \times D},
        \end{equation}
    $$
    where $\mathbf{A} \in {R}^{p \times q}$ and $\mathbf{B} \in {R}^{s \times t}$ are two sub-matrices obtained
    by partitioning $\mathbf{w}$ into two sub-vectors and subsequently reshaping them into matrices.

    Parameters
    ----------
    n: int
        The dimension of the output space.
    D: int
        The dimension of the intermediate expansion space.
    w: torch.nn.Parameter
        The learnable parameters of the model.
    device: str, default = 'cpu'
        Device to perform the parameter reconciliation.

    Returns
    ----------
    torch.Tensor
        The reconciled parameter matrix of shape (n, D).
    """
    if self.p is None:
        self.p = find_close_factors(n)
    if self.q is None:
        self.q = find_close_factors(D)

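    # w must have shape (1, l); calculate_l also checks that p divides n and q divides D.
    assert w.ndim == 2 and w.size(1) == self.calculate_l(n=n, D=D)
    s, t = n // self.p, D // self.q
    A, B = torch.split(w, [self.p*self.q, s*t], dim=1)
    # Reshape the two sub-vectors into matrices so the einsum below yields the Kronecker product.
    A, B = A.reshape(self.p, self.q), B.reshape(s, t)
    return torch.einsum('pq,st->psqt', A, B).reshape(self.p*s, self.q*t)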
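
The partition, reshape, and Kronecker product steps described for forward can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the tinybig implementation: the name `hm_reconcile` is hypothetical, p and q are passed explicitly rather than inferred, and `torch.kron` is used for the product, with the einsum form shown alongside for comparison.

```python
import torch


def hm_reconcile(w: torch.Tensor, n: int, D: int, p: int, q: int) -> torch.Tensor:
    """Illustrative standalone helper (not the tinybig API): returns kron(A, B) of shape (n, D)."""
    if n % p != 0 or D % q != 0:
        raise ValueError("p must divide n and q must divide D")
    s, t = n // p, D // q
    flat = w.reshape(-1)
    if flat.numel() != p * q + s * t:
        raise ValueError("w must hold exactly p*q + s*t values")
    # Partition w into two sub-vectors, then reshape them into A (p x q) and B (s x t).
    a, b = torch.split(flat, [p * q, s * t])
    A, B = a.reshape(p, q), b.reshape(s, t)
    return torch.kron(A, B)


n, D, p, q = 6, 8, 2, 4
l = p * q + (n // p) * (D // q)            # 8 + 3 * 2 = 14
w = torch.nn.Parameter(torch.randn(1, l))
W = hm_reconcile(w, n, D, p, q)
print(W.shape)                              # torch.Size([6, 8])

# The same product written with einsum: index order (p, s, q, t) flattens to (p*s, q*t).
A = w[0, :p * q].reshape(p, q)
B = w[0, p * q:].reshape(n // p, D // q)
W_einsum = torch.einsum('pq,st->psqt', A, B).reshape(n, D)
print(torch.allclose(W, W_einsum))          # True
```

Both forms are differentiable with respect to w, so gradients from the (n, D) matrix flow back to the l underlying parameters.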