
masking_reconciliation

Bases: fabrication

The masking parameter reconciliation function.

It performs masking parameter reconciliation and returns the masked parameter matrix of shape (n, D). This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

...

Notes

To mitigate the identified limitation of the identity parameter reconciliation function, the masking parameter reconciliation function curtails the count of effective learnable parameters in \(\mathbf{W}\) to a reduced number \(l\) via a randomly generated masking matrix \(\mathbf{M}\) as follows: $$ \begin{equation} \psi({\mathbf{w}}) = (\mathbf{M} \odot \text{reshape}(\mathbf{w})) = (\mathbf{M} \odot \mathbf{W}) \in \mathbb{R}^{n \times D}, \end{equation} $$ where \(\mathbf{M} \in \{0, 1\}^{n \times D}\) denotes the binary masking matrix with \(l\) non-zero entries in expectation and \(\odot\) denotes the element-wise product operator.
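
The masking operation \(\psi\) can be illustrated without any deep-learning library. The sketch below is a minimal stdlib-only illustration, not the tinybig implementation (which works on torch tensors via `w.view(n, D)` and an element-wise product); the helper names `reshape_row_major` and `apply_mask` are hypothetical, and the mask is written out by hand rather than drawn at random.

```python
from typing import List

Matrix = List[List[float]]


def reshape_row_major(w: List[float], n: int, D: int) -> Matrix:
    """Hypothetical helper: reshape a flat vector of length n * D into an n x D matrix (row-major)."""
    if len(w) != n * D:
        raise ValueError(f"expected {n * D} parameters, got {len(w)}")
    return [w[i * D:(i + 1) * D] for i in range(n)]


def apply_mask(M: List[List[int]], W: Matrix) -> Matrix:
    """Hypothetical helper: element-wise (Hadamard) product of a binary mask M and a matrix W."""
    return [[m * x for m, x in zip(m_row, w_row)] for m_row, w_row in zip(M, W)]


w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # flat parameter vector with n * D = 2 * 3 entries
M = [[1, 0, 1],
     [0, 1, 0]]                       # binary mask with 3 non-zero entries
psi = apply_mask(M, reshape_row_major(w, n=2, D=3))
print(psi)  # [[1.0, 0.0, 3.0], [0.0, 5.0, 0.0]]
```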

To facilitate practical implementation, instead of predefining the parameter dimension \(l\), the masking reconciliation function takes the masking ratio \(p\) as its parameter. Together with the output dimensions \(n \times D\), this ratio determines the requisite parameter dimension as follows: $$ \begin{equation} l = p \times n \times D, \end{equation} $$ where the masking ratio takes values in \(p \in [0, 1]\). For example, with \(p = 1.0\) all parameters are used, while with \(p = 0.0\) no parameters are used.
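
As a worked instance of the formula above, take the illustrative values \(n = 4\) and \(D = 10\) with the default masking ratio \(p = 0.5\) (numbers chosen only for this example):

$$ \begin{equation} l = p \times n \times D = 0.5 \times 4 \times 10 = 20. \end{equation} $$

Since each entry of \(\mathbf{M}\) is kept independently with probability \(p\), a particular mask keeps about 20 of the 40 entries of \(\mathbf{W}\), with 20 being the expected count.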

Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | str, default = 'masking_reconciliation' | Name of the parameter reconciliation function. |
| `p` | float, default = 0.5 | The masking ratio, i.e., the fraction of entries in the parameter matrix that are used (kept unmasked); e.g., p=1.0: all parameters are used; p=0.0: no parameters are used. |
| `fixed_mask` | bool, default = True | If True, the masking matrix is generated once (on the first forward call) and reused afterwards; if False, a new masking matrix is generated on every forward call. |

Methods:

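| Name | Description |
|---|---|
| `__init__` | It initializes the parameter reconciliation function. |
| `calculate_l` | It calculates the length of the required parameter vector. |
| `forward` | It implements the abstract forward method declared in the base reconciliation class. |
| `generate_masking_matrix` | It generates the binary masking matrix of shape (n, D). |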

Source code in tinybig/reconciliation/basic_reconciliation.py
class masking_reconciliation(fabrication):
    r"""
    The masking parameter reconciliation function.

    It performs masking parameter reconciliation and returns the masked parameter matrix of shape (n, D).
    This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

    ...

    Notes
    ----------
    To mitigate the identified limitation of the identity parameter reconciliation function, the masking parameter
    reconciliation function curtails the count of effective learnable parameters in $\mathbf{W}$ to a reduced
    number $l$ via a randomly generated masking matrix $\mathbf{M}$ as follows:
    $$
        \begin{equation}
            \psi({\mathbf{w}}) = (\mathbf{M} \odot \text{reshape}(\mathbf{w})) = (\mathbf{M} \odot \mathbf{W}) \in \mathbb{R}^{n \times D},
        \end{equation}
    $$
    where $\mathbf{M} \in \{0, 1\}^{n \times D}$ denotes the binary masking matrix with $l$ non-zero entries
    in expectation and $\odot$ denotes the element-wise product operator.

    To facilitate practical implementation, instead of predefining the parameter dimension $l$, the masking
    reconciliation function takes the masking ratio $p$ as its parameter. Together with the output dimensions
    $n \times D$, this ratio determines the requisite parameter dimension as follows:
    $$
        \begin{equation}
            l = p \times n \times D,
        \end{equation}
    $$
    where the masking ratio takes values in $p \in [0, 1]$. For example, with $p = 1.0$ all parameters are used,
    while with $p = 0.0$ no parameters are used.


    Attributes
    ----------
    name: str, default = 'masking_reconciliation'
        Name of the parameter reconciliation function.
    p: float, default = 0.5
        The masking ratio, i.e., the fraction of entries in the parameter matrix that are used (kept unmasked),
        e.g., p=1.0: all parameters are used; p=0.0: no parameters are used.
    fixed_mask: bool, default = True
        If True, the masking matrix is generated once (on the first forward call) and reused afterwards;
        if False, a new masking matrix is generated on every forward call.

    Methods
    ----------
    __init__
        It initializes the parameter reconciliation function.

    calculate_l
        It calculates the length of required parameters.

    forward
        It implements the abstract forward method declared in the base reconciliation class.

    generate_masking_matrix
        It generates the binary masking matrix of shape (n, D).
    """
    def __init__(self, name='masking_reconciliation', p=0.5, fixed_mask: bool = True, *args, **kwargs):
        """
        The initialization method of the masking parameter reconciliation function.

        It initializes a masking parameter reconciliation function object.
        This method also calls the initialization method of the base class.

        Parameters
        ----------
        name: str, default = 'masking_reconciliation'
            Name of the parameter reconciliation function.
        p: float, default = 0.5
            The masking ratio, i.e., the fraction of entries in the parameter matrix that are used (kept unmasked),
            e.g., p=1.0: all parameters are used; p=0.0: no parameters are used.
        fixed_mask: bool, default = True
            If True, the masking matrix is generated once (on the first forward call) and reused afterwards;
            if False, a new masking matrix is generated on every forward call.

        Returns
        ----------
        object
            The masking parameter reconciliation function object.
        """
        super().__init__(name=name, *args, **kwargs)
        self.p = p
        self.mask_matrix = None
        self.fixed_mask = fixed_mask

    def calculate_l(self, n: int, D: int):
        r"""
        The required parameter number calculation method.

        It calculates the number of learnable parameters that must be allocated for the parameter reconciliation
        function, based on the output and intermediate expansion space dimensions, $n$ and $D$. Only
        $$
            \begin{equation}
                l = p \times n \times D
            \end{equation}
        $$
        of these parameters remain unmasked in expectation, where $p$ is the masking ratio. However, since the
        mask is applied on top of a full parameter matrix of shape (n, D), this method returns $n \times D$.

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.

        Returns
        -------
        int
            The number of learnable parameters to allocate, i.e., $n \times D$.
        """
        return n * D

    def generate_masking_matrix(self, n, D):
        """
        The masking matrix generation method.

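        It generates the binary masking matrix of shape (n, D) subject to the masking ratio parameter $p$.
        The method first draws a matrix of shape (n, D) with entries sampled uniformly from [0, 1) and then
        compares it with $p$, so each entry of the masking matrix is True (kept) independently with probability $p$.
        The generated matrix is stored in self.mask_matrix and also returned.

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.

        Returns
        -------
        torch.Tensor
            The binary (boolean) masking matrix of shape (n, D).
        """
        # torch.rand samples uniformly from [0, 1), so each entry is True with probability p.
        self.mask_matrix = torch.rand(n, D) < self.p
        return self.mask_matrix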

    def forward(self, n: int, D: int, w: torch.nn.Parameter, device='cpu', *args, **kwargs):
        r"""
        The forward method of the parameter reconciliation function.

        It applies the masking parameter reconciliation operation to the input parameter vector,
        and returns the reconciled parameter matrix of shape (n, D) subject to the masking ratio $p$ as follows:
        $$
            \begin{equation}
                \psi({\mathbf{w}}) = (\mathbf{M} \odot \text{reshape}(\mathbf{w})) = (\mathbf{M} \odot \mathbf{W}) \in \mathbb{R}^{n \times D},
            \end{equation}
        $$

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.
        w: torch.nn.Parameter
            The learnable parameter vector; it is reshaped to (n, D) before masking, so it must hold n * D elements.
        device: str, default = 'cpu'
            Device to perform the parameter reconciliation.

        Returns
        ----------
        torch.Tensor
            The reconciled parameter matrix of shape (n, D).
        """
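        # Draw a new mask on every call when fixed_mask is False;
        # otherwise draw it once (on the first call) and reuse it afterwards.
        if not self.fixed_mask or self.mask_matrix is None:
            self.generate_masking_matrix(n=n, D=D)
        # Reshape the flat parameter vector to (n, D) and zero out the masked entries.
        return w.view(n, D) * self.mask_matrix.to(device)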

__init__(name='masking_reconciliation', p=0.5, fixed_mask=True, *args, **kwargs)

The initialization method of the masking parameter reconciliation function.

It initializes a masking parameter reconciliation function object. This method also calls the initialization method of the base class.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | str | Name of the parameter reconciliation function. | 'masking_reconciliation' |
| `p` | float | The masking ratio, i.e., the fraction of entries in the parameter matrix that are used (kept unmasked); e.g., p=1.0: all parameters are used; p=0.0: no parameters are used. | 0.5 |
| `fixed_mask` | bool | If True, the masking matrix is generated once (on the first forward call) and reused afterwards; if False, a new masking matrix is generated on every forward call. | True |

Returns:

| Type | Description |
|---|---|
| object | The masking parameter reconciliation function object. |


calculate_l(n, D)

The required parameter number calculation method.

It calculates the number of learnable parameters that must be allocated for the parameter reconciliation function, based on the output and intermediate expansion space dimensions, \(n\) and \(D\). Only $$ \begin{equation} l = p \times n \times D \end{equation} $$ of these parameters remain unmasked in expectation, where \(p\) is the masking ratio. However, since the mask is applied on top of a full parameter matrix of shape (n, D), this method returns \(n \times D\).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n` | int | The dimension of the output space. | required |
| `D` | int | The dimension of the intermediate expansion space. | required |

Returns:

| Type | Description |
|---|---|
| int | The number of learnable parameters to allocate, i.e., \(n \times D\). |
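
To make the difference between the allocated length and the effective count concrete, the following stdlib sketch computes both for a few masking ratios. The helper `parameter_counts` is hypothetical and only does the arithmetic: its first value mirrors what `calculate_l` returns, and its second value is the expected number of unmasked entries, \(l = p \times n \times D\).

```python
from typing import Tuple


def parameter_counts(n: int, D: int, p: float) -> Tuple[int, float]:
    """Hypothetical helper: (allocated length, expected number of unmasked entries)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("masking ratio p must lie in [0, 1]")
    allocated = n * D              # length of the flat parameter vector, as returned by calculate_l
    expected_unmasked = p * n * D  # l = p * n * D, the expected count of kept entries
    return allocated, expected_unmasked


print(parameter_counts(4, 10, 0.5))  # (40, 20.0)
print(parameter_counts(4, 10, 1.0))  # (40, 40.0)
print(parameter_counts(4, 10, 0.0))  # (40, 0.0)
```

Masked entries still occupy memory in `w`, but because they are multiplied by zero in the reconciled matrix, they receive zero gradient through it.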


forward(n, D, w, device='cpu', *args, **kwargs)

The forward method of the parameter reconciliation function.

It applies the masking parameter reconciliation operation to the input parameter vector, and returns the reconciled parameter matrix of shape (n, D) subject to the masking ratio \(p\) as follows: $$ \begin{equation} \psi({\mathbf{w}}) = (\mathbf{M} \odot \text{reshape}(\mathbf{w})) = (\mathbf{M} \odot \mathbf{W}) \in \mathbb{R}^{n \times D}, \end{equation} $$

Parameters:

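| Name | Type | Description | Default |
|---|---|---|---|
| `n` | int | The dimension of the output space. | required |
| `D` | int | The dimension of the intermediate expansion space. | required |
| `w` | torch.nn.Parameter | The learnable parameter vector; it is reshaped to (n, D) before masking, so it must contain \(n \times D\) elements. | required |
| `device` | str | Device on which to perform the parameter reconciliation. | 'cpu' |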

Returns:

| Type | Description |
|---|---|
| Tensor | The reconciled parameter matrix of shape (n, D). |
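
The branching on `fixed_mask` in `forward` effectively caches the mask. The following sketch mirrors that control flow in plain Python so it can run without torch; `MaskingSketch` is a hypothetical stand-in for the tinybig class that uses nested lists and the `random` module instead of tensors, and it leaves out the `device` argument.

```python
import random
from typing import List, Optional


class MaskingSketch:
    """Hypothetical stdlib mirror of the mask handling in masking_reconciliation.forward."""

    def __init__(self, p: float = 0.5, fixed_mask: bool = True, seed: int = 0):
        self.p = p
        self.fixed_mask = fixed_mask
        self.mask_matrix: Optional[List[List[bool]]] = None
        self._rng = random.Random(seed)

    def generate_masking_matrix(self, n: int, D: int) -> List[List[bool]]:
        # Each entry is kept independently with probability p, like torch.rand(n, D) < p.
        self.mask_matrix = [[self._rng.random() < self.p for _ in range(D)] for _ in range(n)]
        return self.mask_matrix

    def forward(self, n: int, D: int, w: List[float]) -> List[List[float]]:
        # Draw a new mask on every call when fixed_mask is False;
        # otherwise draw it once and reuse it on later calls.
        if not self.fixed_mask or self.mask_matrix is None:
            self.generate_masking_matrix(n, D)
        W = [w[i * D:(i + 1) * D] for i in range(n)]  # row-major reshape, like w.view(n, D)
        # Keep unmasked entries and zero out masked ones (element-wise product with a 0/1 mask).
        return [[x if keep else 0.0 for x, keep in zip(w_row, m_row)]
                for w_row, m_row in zip(W, self.mask_matrix)]


w = [1.0] * 12  # n * D = 3 * 4 flat parameters

fixed = MaskingSketch(p=0.5, fixed_mask=True)
fixed.forward(3, 4, w)
mask_a = fixed.mask_matrix
fixed.forward(3, 4, w)
print(fixed.mask_matrix is mask_a)  # True: the first mask is reused

resampled = MaskingSketch(p=0.5, fixed_mask=False)
resampled.forward(3, 4, w)
mask_b = resampled.mask_matrix
resampled.forward(3, 4, w)
print(resampled.mask_matrix is mask_b)  # False: a fresh mask is drawn on each call
```

As in the source code, the cached mask is not regenerated when `n` or `D` change between calls with `fixed_mask=True`; in the original, `w.view(n, D) * self.mask_matrix` would then combine tensors of mismatched shapes.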


generate_masking_matrix(n, D)

The masking matrix generation method.

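It generates the binary masking matrix of shape (n, D) subject to the masking ratio parameter \(p\). The method first draws a matrix of shape (n, D) with entries sampled uniformly from [0, 1) and then compares it with \(p\), so each entry of the masking matrix is True (kept) independently with probability \(p\). The generated matrix is stored in self.mask_matrix.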

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n` | int | The dimension of the output space. | required |
| `D` | int | The dimension of the intermediate expansion space. | required |

Returns:

| Type | Description |
|---|---|
| Tensor | The binary (boolean) masking matrix of shape (n, D). |
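
The sampling rule in `generate_masking_matrix` (compare uniform samples in [0, 1) against \(p\)) can be checked empirically. This stdlib sketch uses `random.Random.random` as a stand-in for `torch.rand`; `bernoulli_mask` and `kept_fraction` are hypothetical helpers. For \(p = 0.0\) and \(p = 1.0\) the fraction of kept entries is exactly 0 and 1, and for intermediate values it fluctuates around \(p\).

```python
import random
from typing import List


def bernoulli_mask(n: int, D: int, p: float, seed: int = 0) -> List[List[bool]]:
    """Hypothetical stdlib analogue of `torch.rand(n, D) < p`.

    random.random() samples uniformly from [0, 1), so each entry is True with probability p.
    """
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(D)] for _ in range(n)]


def kept_fraction(mask: List[List[bool]]) -> float:
    """Fraction of True (unmasked) entries in a binary mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


for p in (0.0, 0.25, 0.5, 1.0):
    print(p, kept_fraction(bernoulli_mask(100, 100, p)))
```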
