duplicated_padding_reconciliation

Bases: fabrication

The duplicated padding based parameter reconciliation function.

It performs the duplicated padding based parameter reconciliation, and returns the reconciled parameter matrix of shape (n, D). This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

...

Notes

Specifically, the parameter vector $\mathbf{w} \in {R}^{l}$ of length $l$ can be reshaped into a matrix $\mathbf{W}$ comprising $s$ rows and $t$ columns, where $l = s \times t$. Taking the Kronecker product of a constant matrix $\mathbf{C} \in {R}^{p \times q}$, whose entries are all ones, with $\mathbf{W}$, the duplicated padding based parameter reconciliation function can be defined as follows: $$ \begin{equation} \psi(\mathbf{w}) = \mathbf{C} \otimes \mathbf{W} = \begin{bmatrix} C_{1,1} \mathbf{W} & C_{1,2} \mathbf{W} & \cdots & C_{1,q} \mathbf{W} \\ C_{2,1} \mathbf{W} & C_{2,2} \mathbf{W} & \cdots & C_{2,q} \mathbf{W} \\ \vdots & \vdots & \ddots & \vdots \\ C_{p,1} \mathbf{W} & C_{p,2} \mathbf{W} & \cdots & C_{p,q} \mathbf{W} \\ \end{bmatrix} \in {R}^{ps \times qt}, \end{equation} $$ where $\mathbf{W} = \text{reshape}(\mathbf{w})$ and $\otimes$ denotes the Kronecker product operator. The output dimensions must satisfy $p \times s = n$ and $q \times t = D$, which gives $s = \frac{n}{p}$ and $t = \frac{D}{q}$.
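The definition above can be illustrated with a minimal sketch built on `torch.kron`. It is not the library's own code: the helper name `duplicated_padding`, the row-major reshape of `w`, and the explicit divisibility check are assumptions made for illustration.

```python
import torch


def duplicated_padding(w: torch.Tensor, n: int, D: int, p: int = 2, q: int = 2) -> torch.Tensor:
    """Hypothetical helper: tile W = reshape(w) into an (n, D) matrix via kron(C, W)."""
    if n % p != 0 or D % q != 0:
        raise ValueError("n must be divisible by p and D by q")
    s, t = n // p, D // q
    W = w.reshape(s, t)  # assumed row-major reshape of the l = s * t parameters
    C = torch.ones(p, q, dtype=w.dtype, device=w.device)  # all-ones constant matrix
    return torch.kron(C, W)  # shape (p * s, q * t) = (n, D)


w = torch.arange(1.0, 7.0)  # l = 6 parameters: 1, 2, ..., 6
out = duplicated_padding(w, n=4, D=6, p=2, q=2)  # s = 2, t = 3
print(out)
```

Because $\mathbf{C}$ contains only ones, every block of the result is an unscaled copy of $\mathbf{W}$, so `W.repeat(p, q)` would produce the same matrix in this sketch.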

For the duplicated padding based parameter reconciliation, the number of required parameters $l$ is defined as $$ \begin{equation} l= s \times t = \frac{n \times D}{pq}, \end{equation} $$ where $p$ and $q$ are the duplication numbers in the row and column, respectively.
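A short worked example of this count follows, as a sketch only. The function name `required_parameters` is hypothetical, and raising an error on non-divisible sizes is an assumption about how invalid inputs might be handled.

```python
def required_parameters(n: int, D: int, p: int = 2, q: int = 2) -> int:
    """Hypothetical helper returning l = (n / p) * (D / q)."""
    if n % p != 0 or D % q != 0:
        raise ValueError(f"n={n} must be divisible by p={p} and D={D} by q={q}")
    return (n // p) * (D // q)


l = required_parameters(n=4, D=6, p=2, q=3)  # s = 2, t = 2
full = 4 * 6                                 # entries in the (n, D) matrix
print(l, full)
```

With $p = 2$ and $q = 3$, the $4 \times 6$ matrix has 24 entries but needs only 4 learnable parameters, a reduction by the factor $pq = 6$.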

Attributes:

Name	Type	Description
`name`	`str, default = 'duplicated_padding_reconciliation'`	Name of the parameter reconciliation function.
`p`	`int, default = 2`	Duplication times in the rows.
`q`	`int, default = None`	Duplication times in the columns. If q is not provided, it defaults to the value of p.

Methods:

Name	Description
`__init__`	It initializes the parameter reconciliation function.
`calculate_l`	It calculates the length of required parameters.
`forward`	It implements the abstract forward method declared in the base reconciliation class.

Source code in tinybig/reconciliation/basic_reconciliation.py

class duplicated_padding_reconciliation(fabrication):
    r"""
    The duplicated padding based parameter reconciliation function.

    It performs the duplicated padding based parameter reconciliation, and returns the reconciled parameter matrix of shape (n, D).
    This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

    ...

    Notes
    ----------
    Specifically, the parameter vector $\mathbf{w} \in {R}^{l}$ of length $l$
    can be reshaped into a matrix $\mathbf{W}$ comprising $s$ rows and $t$ columns, where $l = s \times t$.
    Taking the Kronecker product of a constant matrix $\mathbf{C} \in {R}^{p \times q}$, whose entries are all ones,
    with $\mathbf{W}$, the **duplicated padding based parameter reconciliation** function
    can be defined as follows:
    $$
        \begin{equation}
            \psi(\mathbf{w}) = \mathbf{C} \otimes \mathbf{W} =  \begin{bmatrix}
                                                                C_{1,1} \mathbf{W} & C_{1,2} \mathbf{W}      & \cdots & C_{1,q} \mathbf{W}      \\\\
                                                                C_{2,1} \mathbf{W} & C_{2,2} \mathbf{W}      & \cdots & C_{2,q} \mathbf{W}      \\\\
                                                                \vdots & \vdots & \ddots & \vdots \\\\
                                                                C_{p,1} \mathbf{W} & C_{p,2} \mathbf{W}      & \cdots & C_{p,q} \mathbf{W}      \\\\
                                                                \end{bmatrix} \in {R}^{ps \times qt},
        \end{equation}
    $$
    where $\mathbf{W} = \text{reshape}(\mathbf{w})$ and $\otimes$ denotes the Kronecker product operator.
    The output dimensions must satisfy $p \times s = n$ and $q \times t = D$, which gives
    $s = \frac{n}{p}$ and $t = \frac{D}{q}$.

    For the duplicated padding based parameter reconciliation, the number of required parameters $l$ is defined as
    $$
        \begin{equation}
            l= s \times t = \frac{n \times D}{pq},
        \end{equation}
    $$
    where $p$ and $q$ are the duplication numbers in the row and column, respectively.

    Attributes
    ----------
    name: str, default = 'duplicated_padding_reconciliation'
        Name of the parameter reconciliation function
    p: int, default = 2
        Duplication times in the rows.
    q: int, default = None
        Duplication times in the columns.
        If q is not provided, it defaults to the value of p.

    Methods
    ----------
    __init__
        It initializes the parameter reconciliation function.

    calculate_l
        It calculates the length of required parameters.

    forward
        It implements the abstract forward method declared in the base reconciliation class.
    """
    def __init__(self, name='duplicated_padding_reconciliation', p=2, q=None, *args, **kwargs):
        """
        The initialization method of the duplicated padding based parameter reconciliation function.

        It initializes a duplicated padding based parameter reconciliation function object.
        This method also calls the initialization method of the base class.

        Parameters
        ----------
        name: str, default = 'duplicated_padding_reconciliation'
            Name of the parameter reconciliation function
        p: int, default = 2
            Duplication times in the rows.
        q: int, default = None
            Duplication times in the columns.
            If q is not provided, it defaults to the value of p.

        Returns
        ----------
        fabrication
            The duplicated padding based parameter reconciliation function object.
        """
        super().__init__(name=name, *args, **kwargs)
        self.p = p
        self.q = q if q is not None else p

    def calculate_l(self, n: int, D: int):
        r"""
        The required parameter number calculation method.

        It calculates the number of required learnable parameters, i.e., $l$, of the parameter reconciliation function
        based on the output and intermediate expansion space dimensions, $n$ and $D$, and duplication parameters $p$ and $q$,
        which can be represented as follows:
        $$
            \begin{equation}
                l= s \times t = \frac{n \times D}{pq}.
            \end{equation}
        $$

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.

        Returns
        -------
        int
            The number of required learnable parameters.
        """
        s, t = int(n/self.p), int(D/self.q)
        assert (self.p * self.q * s * t == n * D)
        return s * t

    def forward(self, n: int, D: int, w: torch.nn.Parameter, device='cpu', *args, **kwargs):
        r"""
        The forward method of the parameter reconciliation function.

        It applies the duplicated padding based parameter reconciliation operation to the input parameter vector,
        and returns the reconciled parameter matrix of shape (n, D) subject to duplication parameters $p$ and $q$ as follows:
        $$
            \begin{equation}
                \psi(\mathbf{w}) = \mathbf{C} \otimes \mathbf{W} =  \begin{bmatrix}
                                                                    C_{1,1} \mathbf{W} & C_{1,2} \mathbf{W}      & \cdots & C_{1,q} \mathbf{W}      \\\\
                                                                    C_{2,1} \mathbf{W} & C_{2,2} \mathbf{W}      & \cdots & C_{2,q} \mathbf{W}      \\\\
                                                                    \vdots & \vdots & \ddots & \vdots \\\\
                                                                    C_{p,1} \mathbf{W} & C_{p,2} \mathbf{W}      & \cdots & C_{p,q} \mathbf{W}      \\\\
                                                                    \end{bmatrix} \in {R}^{n \times D},
            \end{equation}
        $$
        where $\mathbf{W} = \text{reshape}(\mathbf{w}) \in R^{s \times t}$ and $\otimes$ denotes the Kronecker product operator.

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.
        w: torch.nn.Parameter
            The learnable parameters of the model.
        device: str, default = 'cpu'
            Device to perform the parameter reconciliation.

        Returns
        ----------
        torch.Tensor
            The reconciled parameter matrix of shape (n, D).
        """
        assert w.ndim == 2 and w.size(1) == self.calculate_l(n=n, D=D)
        s, t = int(n / self.p), int(D / self.q)
        A = torch.ones(self.p, self.q, device=device).view(1, -1)
        return torch.einsum('pq,st->psqt', A, w).view(self.p*s, self.q*t).to(device)

`__init__(name='duplicated_padding_reconciliation', p=2, q=None, *args, **kwargs)`

The initialization method of the duplicated padding based parameter reconciliation function.

It initializes a duplicated padding based parameter reconciliation function object. This method also calls the initialization method of the base class.

Parameters:

Name	Description	Default
`name`	Name of the parameter reconciliation function	`'duplicated_padding_reconciliation'`
`p`	Duplication times in the rows.	`2`
`q`	Duplication times in the columns. If q is not provided, it defaults to the value of p.	`None`

Returns:

Type	Description
`fabrication`	The duplicated padding based parameter reconciliation function object.

Source code in tinybig/reconciliation/basic_reconciliation.py

def __init__(self, name='duplicated_padding_reconciliation', p=2, q=None, *args, **kwargs):
    """
    The initialization method of the duplicated padding based parameter reconciliation function.

    It initializes a duplicated padding based parameter reconciliation function object.
    This method also calls the initialization method of the base class.

    Parameters
    ----------
    name: str, default = 'duplicated_padding_reconciliation'
        Name of the parameter reconciliation function
    p: int, default = 2
        Duplication times in the rows.
    q: int, default = None
        Duplication times in the columns.
        If q is not provided, it defaults to the value of p.

    Returns
    ----------
    fabrication
        The duplicated padding based parameter reconciliation function object.
    """
    super().__init__(name=name, *args, **kwargs)
    self.p = p
    self.q = q if q is not None else p

`calculate_l(n, D)`

The required parameter number calculation method.

It calculates the number of required learnable parameters, i.e., $l$, of the parameter reconciliation function based on the output and intermediate expansion space dimensions, $n$ and $D$, and duplication parameters $p$ and $q$, which can be represented as follows: $$ \begin{equation} l= s \times t = \frac{n \times D}{pq}. \end{equation} $$

Parameters:

Name	Type	Description	Default
`n`	`int`	The dimension of the output space.	required
`D`	`int`	The dimension of the intermediate expansion space.	required

Returns:

Type	Description
`int`	The number of required learnable parameters.

Source code in tinybig/reconciliation/basic_reconciliation.py

def calculate_l(self, n: int, D: int):
    r"""
    The required parameter number calculation method.

    It calculates the number of required learnable parameters, i.e., $l$, of the parameter reconciliation function
    based on the output and intermediate expansion space dimensions, $n$ and $D$, and duplication parameters $p$ and $q$,
    which can be represented as follows:
    $$
        \begin{equation}
            l= s \times t = \frac{n \times D}{pq}.
        \end{equation}
    $$

    Parameters
    ----------
    n: int
        The dimension of the output space.
    D: int
        The dimension of the intermediate expansion space.

    Returns
    -------
    int
        The number of required learnable parameters.
    """
    s, t = int(n/self.p), int(D/self.q)
    assert (self.p * self.q * s * t == n * D)
    return s * t

`forward(n, D, w, device='cpu', *args, **kwargs)`

The forward method of the parameter reconciliation function.

It applies the duplicated padding based parameter reconciliation operation to the input parameter vector, and returns the reconciled parameter matrix of shape (n, D) subject to duplication parameters $p$ and $q$ as follows: $$ \begin{equation} \psi(\mathbf{w}) = \mathbf{C} \otimes \mathbf{W} = \begin{bmatrix} C_{1,1} \mathbf{W} & C_{1,2} \mathbf{W} & \cdots & C_{1,q} \mathbf{W} \\ C_{2,1} \mathbf{W} & C_{2,2} \mathbf{W} & \cdots & C_{2,q} \mathbf{W} \\ \vdots & \vdots & \ddots & \vdots \\ C_{p,1} \mathbf{W} & C_{p,2} \mathbf{W} & \cdots & C_{p,q} \mathbf{W} \\ \end{bmatrix} \in {R}^{n \times D}, \end{equation} $$ where $\mathbf{W} = \text{reshape}(\mathbf{w}) \in R^{s \times t}$ and $\otimes$ denotes the Kronecker product operator.

Parameters:

Name	Type	Description	Default
`n`	`int`	The dimension of the output space.	required
`D`	`int`	The dimension of the intermediate expansion space.	required
`w`	`Parameter`	The learnable parameters of the model.	required
`device`	`str`	Device to perform the parameter reconciliation.	`'cpu'`

Returns:

Type	Description
`Tensor`	The reconciled parameter matrix of shape (n, D).
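The `'pq,st->psqt'` einsum pattern and the Kronecker product in the formula can be related with a small check. This is a sketch under stated assumptions rather than a reproduction of the library's tensor shapes: it keeps the all-ones matrix at shape `(p, q)` and uses an already reshaped `W` of shape `(s, t)`, and all variable names are illustrative.

```python
import torch

p, q, s, t = 2, 3, 2, 2
W = torch.arange(float(s * t)).reshape(s, t)  # stand-in for reshape(w), shape (s, t)
C = torch.ones(p, q)                          # all-ones constant matrix, shape (p, q)

# out4[a, i, b, j] = C[a, b] * W[i, j]; merging the index pairs (a, i) and (b, j)
# lays the copies of W out block by block, as in the matrix shown above.
out4 = torch.einsum('pq,st->psqt', C, W)
via_einsum = out4.reshape(p * s, q * t)
via_kron = torch.kron(C, W)

print(torch.equal(via_einsum, via_kron))
```

The two results agree because entry $(a s + i, b t + j)$ of $\mathbf{C} \otimes \mathbf{W}$ is $C_{a,b} W_{i,j}$, which is exactly what the reshaped einsum output holds at that position.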

Source code in tinybig/reconciliation/basic_reconciliation.py

def forward(self, n: int, D: int, w: torch.nn.Parameter, device='cpu', *args, **kwargs):
    r"""
    The forward method of the parameter reconciliation function.

    It applies the duplicated padding based parameter reconciliation operation to the input parameter vector,
    and returns the reconciled parameter matrix of shape (n, D) subject to duplication parameters $p$ and $q$ as follows:
    $$
        \begin{equation}
            \psi(\mathbf{w}) = \mathbf{C} \otimes \mathbf{W} =  \begin{bmatrix}
                                                                C_{1,1} \mathbf{W} & C_{1,2} \mathbf{W}      & \cdots & C_{1,q} \mathbf{W}      \\\\
                                                                C_{2,1} \mathbf{W} & C_{2,2} \mathbf{W}      & \cdots & C_{2,q} \mathbf{W}      \\\\
                                                                \vdots & \vdots & \ddots & \vdots \\\\
                                                                C_{p,1} \mathbf{W} & C_{p,2} \mathbf{W}      & \cdots & C_{p,q} \mathbf{W}      \\\\
                                                                \end{bmatrix} \in {R}^{n \times D},
        \end{equation}
    $$
    where $\mathbf{W} = \text{reshape}(\mathbf{w}) \in R^{s \times t}$ and $\otimes$ denotes the Kronecker product operator.

    Parameters
    ----------
    n: int
        The dimension of the output space.
    D: int
        The dimension of the intermediate expansion space.
    w: torch.nn.Parameter
        The learnable parameters of the model.
    device: str, default = 'cpu'
        Device to perform the parameter reconciliation.

    Returns
    ----------
    torch.Tensor
        The reconciled parameter matrix of shape (n, D).
    """
    assert w.ndim == 2 and w.size(1) == self.calculate_l(n=n, D=D)
    s, t = int(n / self.p), int(D / self.q)
    A = torch.ones(self.p, self.q, device=device).view(1, -1)
    return torch.einsum('pq,st->psqt', A, w).view(self.p*s, self.q*t).to(device)
