dual_lphm_reconciliation

Bases: fabrication

The dual low-rank parameterized hypercomplex multiplication (Dual-LPHM) based parameter reconciliation function.

It performs the Dual-LPHM parameter reconciliation, and returns the Dual-LPHM reconciled parameter matrix of shape (n, D). This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

The dual low-rank parameterized hypercomplex multiplication based parameter reconciliation can be viewed as a more aggressive version of the LPHM based parameter reconciliation function. It replaces both $\mathbf{A}$ and $\mathbf{B}$ in the hypercomplex multiplication based parameter reconciliation with the products of two low-rank sub-matrices, respectively.

...

Notes

Formally, given the parameter vector $\mathbf{w} \in {R}^{l}$ and a rank hyper-parameter $r$, together with the parameter sub-matrix dimension parameters $p$ and $q$, the Dual-LPHM reconciliation function partitions $\mathbf{w}$ into four sub-vectors and subsequently reshapes them into four matrices $\mathbf{P} \in {R}^{p \times r}$, $\mathbf{Q} \in {R}^{q \times r}$, $\mathbf{S} \in {R}^{\frac{n}{p} \times r}$ and $\mathbf{T} \in {R}^{\frac{D}{q} \times r}$. These sub-matrices define the Dual-LPHM reconciliation function as follows:

$$ \begin{equation} \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} = ( \mathbf{P} \mathbf{Q}^\top) \otimes ( \mathbf{S} \mathbf{T}^\top) \in {R}^{n \times D}. \end{equation} $$

For the shapes to be well defined, $p$ must divide $n$ and $q$ must divide $D$. The parameter vector length $l$ is the total number of entries in the four sub-matrices:

$$ \begin{equation} l = r( p + q + \frac{n}{p} + \frac{D}{q} ). \end{equation} $$

The Dual-LPHM parameter reconciliation function imposes strict constraints on the parameters $p$ and $q$, which must be divisors of the target dimensions $n$ and $D$, respectively, i.e.,

$$ \begin{equation} n \% p = 0 \text{, and } D \% q = 0. \end{equation} $$
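The length formula and the divisibility constraint can be checked with a few lines of plain Python. This is a minimal sketch, not the library's code: the helper name `dual_lphm_param_count` and the example dimensions are assumptions chosen for illustration.

```python
def dual_lphm_param_count(n: int, D: int, p: int, q: int, r: int) -> int:
    """Return l = r * (p + q + n/p + D/q) for the Dual-LPHM factorization.

    Hypothetical helper for illustration only.
    """
    # p must divide n and q must divide D, otherwise S and T have no valid shape.
    if n % p != 0 or D % q != 0:
        raise ValueError(f"p={p} must divide n={n} and q={q} must divide D={D}")
    s, t = n // p, D // q  # row counts of S and T
    return r * (p + q + s + t)


n, D, p, q, r = 6, 8, 2, 4, 2
l = dual_lphm_param_count(n, D, p, q, r)
print(l, n * D)  # prints: 22 48
```

With these example values, 22 learnable parameters stand in for the 48 entries of the full (n, D) matrix. The gap widens as $n$ and $D$ grow while $r$ stays small.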

Attributes:

Name	Type	Description
`name`	`str, default = 'dual_lphm_reconciliation'`	Name of the Dual-LPHM parameter reconciliation function
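`p`	`int, default = None`	Parameter sub-matrix row dimension. If p is not provided, it is set via `find_close_factors(n)` when `calculate_l` or `forward` is called.
`q`	`int, default = None`	Parameter sub-matrix column dimension. If q is not provided, it is set via `find_close_factors(D)` when `calculate_l` or `forward` is called.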
`r`	`int, default = 2`	Submatrix rank parameter.

Methods:

Name	Description
`__init__`	It initializes the Dual-LPHM parameter reconciliation function.
`calculate_l`	It calculates the length of required parameters for the reconciliation function.
`forward`	It implements the abstract forward method declared in the base reconciliation class.

Source code in tinybig/reconciliation/lowrank_reconciliation.py

class dual_lphm_reconciliation(fabrication):
    r"""
    The dual low-rank parameterized hypercomplex multiplication (Dual-LPHM) based parameter reconciliation function.

    It performs the Dual-LPHM parameter reconciliation, and returns the Dual-LPHM reconciled parameter matrix of shape (n, D).
    This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

    The dual low-rank parameterized hypercomplex multiplication based parameter reconciliation can be viewed as a more
    aggressive version of the LPHM based parameter reconciliation function.
    It replaces both $\mathbf{A}$ and $\mathbf{B}$ in the hypercomplex multiplication based parameter reconciliation
    with the products of two low-rank sub-matrices, respectively.

    ...

    Notes
    ----------
    Formally, given the parameter vector $\mathbf{w} \in {R}^{l}$ and a rank hyper-parameter $r$, together with the
    parameter sub-matrix dimension parameters $p$ and $q$, the Dual-LPHM reconciliation function partitions $\mathbf{w}$
    into four sub-vectors and subsequently reshapes them into four matrices $\mathbf{P} \in {R}^{p \times r}$,
    $\mathbf{Q} \in {R}^{q \times r}$, $\mathbf{S} \in {R}^{\frac{n}{p} \times r}$ and $\mathbf{T} \in {R}^{\frac{D}{q} \times r}$.
    These sub-matrices $\mathbf{P}$, $\mathbf{Q}$, $\mathbf{S}$ and $\mathbf{T}$ help define the Dual-LPHM reconciliation function as follows:
    $$
        \begin{equation}
            \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} = ( \mathbf{P} \mathbf{Q}^\top) \otimes ( \mathbf{S} \mathbf{T}^\top) \in {R}^{n \times D}.
        \end{equation}
    $$
    For the shapes to be well defined, $p$ must divide $n$ and $q$ must divide $D$. The parameter vector
    length $l$ is the total number of entries in the four sub-matrices:
    $$
        \begin{equation}
            l = r( p + q + \frac{n}{p} + \frac{D}{q} ).
        \end{equation}
    $$

    The Dual-LPHM parameter reconciliation function imposes strict constraints on the parameters $p$ and $q$, which
    must be divisors of the target dimensions $n$ and $D$, respectively, i.e.,
    $$
        \begin{equation}
            n \\% p = 0 \text{, and } D \\% q = 0.
        \end{equation}
    $$

    Attributes
    ----------
    name: str, default = 'dual_lphm_reconciliation'
        Name of the Dual-LPHM parameter reconciliation function
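    p: int, default = None
        Parameter sub-matrix row dimension.
        If p is not provided, it is set via find_close_factors(n) when calculate_l or forward is called.
    q: int, default = None
        Parameter sub-matrix column dimension.
        If q is not provided, it is set via find_close_factors(D) when calculate_l or forward is called.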
    r: int, default = 2
        Submatrix rank parameter.

    Methods
    ----------
    __init__
        It initializes the Dual-LPHM parameter reconciliation function.

    calculate_l
        It calculates the length of required parameters for the reconciliation function.

    forward
        It implements the abstract forward method declared in the base reconciliation class.
    """
    def __init__(self, name='dual_lphm_reconciliation', p: int = None, q: int = None, r=2, *args, **kwargs):
        """
        The initialization method of the Dual-LPHM parameter reconciliation function.

        It initializes a Dual-LPHM parameter reconciliation function object.
        This method also calls the initialization method of the base class.

        Parameters
        ----------
        name: str, default = 'dual_lphm_reconciliation'
            Name of the Dual-LPHM parameter reconciliation function.
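        p: int, default = None
            Parameter sub-matrix row dimension.
            If p is not provided, it is set via find_close_factors(n) when calculate_l or forward is called.
        q: int, default = None
            Parameter sub-matrix column dimension.
            If q is not provided, it is set via find_close_factors(D) when calculate_l or forward is called.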
        r: int, default = 2
            Submatrix rank parameter.

        Returns
        ----------
        fabrication
            The Dual-LPHM parameter reconciliation function object.
        """
        super().__init__(name=name, *args, **kwargs)
        self.p = p
        self.q = q
        self.r = r

    def calculate_l(self, n: int, D: int):
        r"""
        The required parameter number calculation method.

        It calculates the number of required learnable parameters, i.e., $l$, of the parameter reconciliation function
        based on the output and intermediate expansion space dimensions, $n$ and $D$, and the dimension and rank parameters
        $p$, $q$ and $r$, which can be represented as follows:
        $$
            \begin{equation}
                l = r( p + q + \frac{n}{p} + \frac{D}{q} ).
            \end{equation}
        $$

        Notes
        ----------
        The Dual-LPHM parameter reconciliation function imposes strict constraints on the parameters $p$ and $q$, which
        must be divisors of the target dimensions $n$ and $D$, respectively, i.e.,
        $$
            \begin{equation}
                n \\% p = 0 \text{, and } D \\% q = 0.
            \end{equation}
        $$

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.

        Returns
        -------
        int
            The number of required learnable parameters.
        """
        if self.p is None:
            self.p = find_close_factors(n)
        if self.q is None:
            self.q = find_close_factors(D)

        if n % self.p != 0 or D % self.q != 0:
            raise ValueError('The input dimensions {} and {} cannot be divided by parameter p {} and q {}'.format(n, D, self.p, self.q))
        s, t = int(n / self.p), int(D / self.q)
        assert (self.p * self.q * s * t == n * D)
        return self.p * self.r + self.q * self.r + s * self.r + t * self.r

    def forward(self, n: int, D: int, w: torch.nn.Parameter, device='cpu', *args, **kwargs):
        r"""
        The forward method of the parameter reconciliation function.

        It applies the Dual-LPHM parameter reconciliation operation to the input parameter vector $\mathbf{w}$,
        and returns the reconciled parameter matrix of shape (n, D) subject to the dimension and rank parameters
        $p$, $q$ and $r$ as follows:
        $$
            \begin{equation}
                \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} = ( \mathbf{P} \mathbf{Q}^\top) \otimes ( \mathbf{S} \mathbf{T}^\top) \in {R}^{n \times D}.
            \end{equation}
        $$
        where $\mathbf{P} \in {R}^{p \times r}$, $\mathbf{Q} \in {R}^{q \times r}$, $\mathbf{S} \in {R}^{\frac{n}{p} \times r}$ and
        $\mathbf{T} \in {R}^{\frac{D}{q} \times r}$ are all obtained by partitioning $\mathbf{w}$ into sub-vectors
        and subsequently reshaping them into matrices.

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.
        w: torch.nn.Parameter
            The learnable parameters of the model.
        device: str, default = 'cpu'
            Device to perform the parameter reconciliation.

        Returns
        ----------
        torch.Tensor
            The reconciled parameter matrix of shape (n, D).
        """
        if self.p is None:
            self.p = find_close_factors(n)
        if self.q is None:
            self.q = find_close_factors(D)

        assert w.ndim == 2 and w.size(1) == self.calculate_l(n=n, D=D)
        s, t = int(n/self.p), int(D/self.q)
        P, Q, S, T = torch.split(w, [self.p*self.r, self.q*self.r, s*self.r, t*self.r], dim=1)
        A = torch.matmul(P.view(self.p, -1), Q.view(-1, self.q)).view(1, -1)
        B = torch.matmul(S.view(s, -1), T.view(-1, t)).view(1, -1)
        return torch.einsum('pq,st->psqt', A, B).view(self.p*s, self.q*t)

`__init__(name='dual_lphm_reconciliation', p=None, q=None, r=2, *args, **kwargs)`

The initialization method of the Dual-LPHM parameter reconciliation function.

It initializes a Dual-LPHM parameter reconciliation function object. This method also calls the initialization method of the base class.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the Dual-LPHM parameter reconciliation function.	`'dual_lphm_reconciliation'`
`p`	`int`	Parameter sub-matrix row dimension. If p is not provided, it is set via `find_close_factors(n)` when `calculate_l` or `forward` is called.	`None`
`q`	`int`	Parameter sub-matrix column dimension. If q is not provided, it is set via `find_close_factors(D)` when `calculate_l` or `forward` is called.	`None`
`r`	`int`	Submatrix rank parameter.	`2`

Returns:

Type	Description
`fabrication`	The Dual-LPHM parameter reconciliation function object.

Source code in tinybig/reconciliation/lowrank_reconciliation.py

def __init__(self, name='dual_lphm_reconciliation', p: int = None, q: int = None, r=2, *args, **kwargs):
    """
    The initialization method of the Dual-LPHM parameter reconciliation function.

    It initializes a Dual-LPHM parameter reconciliation function object.
    This method also calls the initialization method of the base class.

    Parameters
    ----------
    name: str, default = 'dual_lphm_reconciliation'
        Name of the Dual-LPHM parameter reconciliation function.
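    p: int, default = None
        Parameter sub-matrix row dimension.
        If p is not provided, it is set via find_close_factors(n) when calculate_l or forward is called.
    q: int, default = None
        Parameter sub-matrix column dimension.
        If q is not provided, it is set via find_close_factors(D) when calculate_l or forward is called.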
    r: int, default = 2
        Submatrix rank parameter.

    Returns
    ----------
    fabrication
        The Dual-LPHM parameter reconciliation function object.
    """
    super().__init__(name=name, *args, **kwargs)
    self.p = p
    self.q = q
    self.r = r

`calculate_l(n, D)`

The required parameter number calculation method.

It calculates the number of required learnable parameters, i.e., $l$, of the parameter reconciliation function based on the output and intermediate expansion space dimensions, $n$ and $D$, and the dimension and rank parameters $p$, $q$ and $r$, which can be represented as follows:

$$ \begin{equation} l = r( p + q + \frac{n}{p} + \frac{D}{q} ). \end{equation} $$

Notes

The Dual-LPHM parameter reconciliation function imposes strict constraints on the parameters $p$ and $q$, which must be divisors of the target dimensions $n$ and $D$, respectively, i.e.,

$$ \begin{equation} n \% p = 0 \text{, and } D \% q = 0. \end{equation} $$

Parameters:

Name	Type	Description	Default
`n`	`int`	The dimension of the output space.	required
`D`	`int`	The dimension of the intermediate expansion space.	required

Returns:

Type	Description
`int`	The number of required learnable parameters.

Source code in tinybig/reconciliation/lowrank_reconciliation.py

def calculate_l(self, n: int, D: int):
    r"""
    The required parameter number calculation method.

    It calculates the number of required learnable parameters, i.e., $l$, of the parameter reconciliation function
    based on the output and intermediate expansion space dimensions, $n$ and $D$, and the dimension and rank parameters
    $p$, $q$ and $r$, which can be represented as follows:
    $$
        \begin{equation}
            l = r( p + q + \frac{n}{p} + \frac{D}{q} ).
        \end{equation}
    $$

    Notes
    ----------
    The Dual-LPHM parameter reconciliation function imposes strict constraints on the parameters $p$ and $q$, which
    must be divisors of the target dimensions $n$ and $D$, respectively, i.e.,
    $$
        \begin{equation}
            n \\% p = 0 \text{, and } D \\% q = 0.
        \end{equation}
    $$

    Parameters
    ----------
    n: int
        The dimension of the output space.
    D: int
        The dimension of the intermediate expansion space.

    Returns
    -------
    int
        The number of required learnable parameters.
    """
    if self.p is None:
        self.p = find_close_factors(n)
    if self.q is None:
        self.q = find_close_factors(D)

    if n % self.p != 0 or D % self.q != 0:
        raise ValueError('The input dimensions {} and {} cannot be divided by parameter p {} and q {}'.format(n, D, self.p, self.q))
    s, t = int(n / self.p), int(D / self.q)
    assert (self.p * self.q * s * t == n * D)
    return self.p * self.r + self.q * self.r + s * self.r + t * self.r

`forward(n, D, w, device='cpu', *args, **kwargs)`

The forward method of the parameter reconciliation function.

It applies the Dual-LPHM parameter reconciliation operation to the input parameter vector $\mathbf{w}$, and returns the reconciled parameter matrix of shape (n, D) subject to the dimension and rank parameters $p$, $q$ and $r$ as follows:

$$ \begin{equation} \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} = ( \mathbf{P} \mathbf{Q}^\top) \otimes ( \mathbf{S} \mathbf{T}^\top) \in {R}^{n \times D}. \end{equation} $$

where $\mathbf{P} \in {R}^{p \times r}$, $\mathbf{Q} \in {R}^{q \times r}$, $\mathbf{S} \in {R}^{\frac{n}{p} \times r}$ and $\mathbf{T} \in {R}^{\frac{D}{q} \times r}$ are all obtained by partitioning $\mathbf{w}$ into sub-vectors and subsequently reshaping them into matrices.

Parameters:

Name	Type	Description	Default
`n`	`int`	The dimension of the output space.	required
`D`	`int`	The dimension of the intermediate expansion space.	required
`w`	`Parameter`	The learnable parameters of the model.	required
`device`	`str`	Device to perform the parameter reconciliation.	`'cpu'`

Returns:

Type	Description
`Tensor`	The reconciled parameter matrix of shape (n, D).
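The formula above can be traced entry by entry with a dependency-free sketch. It is an illustration of $( \mathbf{P} \mathbf{Q}^\top) \otimes ( \mathbf{S} \mathbf{T}^\top)$ only, not the library's implementation: the helper names are hypothetical, each sub-vector is assumed to be reshaped row-major into a (rows, r) matrix, and the resulting entry layout is not guaranteed to match what `forward` returns.

```python
def matmul_t(X, Y):
    """Return X @ Y^T for nested lists X (a x r) and Y (b x r)."""
    return [[sum(x * y for x, y in zip(row_x, row_y)) for row_y in Y] for row_x in X]


def kron(A, B):
    """Kronecker product: out[a*s + c][b*t + d] = A[a][b] * B[c][d]."""
    p, q, s, t = len(A), len(A[0]), len(B), len(B[0])
    out = [[0.0] * (q * t) for _ in range(p * s)]
    for a in range(p):
        for b in range(q):
            for c in range(s):
                for d in range(t):
                    out[a * s + c][b * t + d] = A[a][b] * B[c][d]
    return out


def dual_lphm_sketch(w, n, D, p, q, r):
    """Split the flat list w into P, Q, S, T and return (P Q^T) kron (S T^T)."""
    s, t = n // p, D // q
    sizes = [p * r, q * r, s * r, t * r]
    assert len(w) == sum(sizes), "w must have length r * (p + q + n/p + D/q)"
    mats, start = [], 0
    for rows, size in zip([p, q, s, t], sizes):
        chunk = w[start:start + size]
        # Assumed convention: row-major reshape of each chunk to (rows, r).
        mats.append([chunk[i * r:(i + 1) * r] for i in range(rows)])
        start += size
    P, Q, S, T = mats
    return kron(matmul_t(P, Q), matmul_t(S, T))


n, D, p, q, r = 4, 6, 2, 3, 1
w = [float(v) for v in range(1, 10)]  # l = 1 * (2 + 3 + 2 + 2) = 9
W = dual_lphm_sketch(w, n, D, p, q, r)
print(len(W), len(W[0]))  # prints: 4 6
```

Here $\mathbf{A} = \mathbf{P} \mathbf{Q}^\top$ has shape (2, 3) and $\mathbf{B} = \mathbf{S} \mathbf{T}^\top$ has shape (2, 2), so the Kronecker product has shape (4, 6), which is (n, D).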

Source code in tinybig/reconciliation/lowrank_reconciliation.py

def forward(self, n: int, D: int, w: torch.nn.Parameter, device='cpu', *args, **kwargs):
    r"""
    The forward method of the parameter reconciliation function.

    It applies the Dual-LPHM parameter reconciliation operation to the input parameter vector $\mathbf{w}$,
    and returns the reconciled parameter matrix of shape (n, D) subject to the dimension and rank parameters
    $p$, $q$ and $r$ as follows:
    $$
        \begin{equation}
            \psi(\mathbf{w}) = \mathbf{A} \otimes \mathbf{B} = ( \mathbf{P} \mathbf{Q}^\top) \otimes ( \mathbf{S} \mathbf{T}^\top) \in {R}^{n \times D}.
        \end{equation}
    $$
    where $\mathbf{P} \in {R}^{p \times r}$, $\mathbf{Q} \in {R}^{q \times r}$, $\mathbf{S} \in {R}^{\frac{n}{p} \times r}$ and
    $\mathbf{T} \in {R}^{\frac{D}{q} \times r}$ are all obtained by partitioning $\mathbf{w}$ into sub-vectors
    and subsequently reshaping them into matrices.

    Parameters
    ----------
    n: int
        The dimension of the output space.
    D: int
        The dimension of the intermediate expansion space.
    w: torch.nn.Parameter
        The learnable parameters of the model.
    device: str, default = 'cpu'
        Device to perform the parameter reconciliation.

    Returns
    ----------
    torch.Tensor
        The reconciled parameter matrix of shape (n, D).
    """
    if self.p is None:
        self.p = find_close_factors(n)
    if self.q is None:
        self.q = find_close_factors(D)

    assert w.ndim == 2 and w.size(1) == self.calculate_l(n=n, D=D)
    s, t = int(n/self.p), int(D/self.q)
    P, Q, S, T = torch.split(w, [self.p*self.r, self.q*self.r, s*self.r, t*self.r], dim=1)
    A = torch.matmul(P.view(self.p, -1), Q.view(-1, self.q)).view(1, -1)
    B = torch.matmul(S.view(s, -1), T.view(-1, t)).view(1, -1)
    return torch.einsum('pq,st->psqt', A, B).view(self.p*s, self.q*t)
