masking_reconciliation

Bases: fabrication

The masking parameter reconciliation function.

It performs the masking parameter reconciliation, and returns the masked parameter matrix of shape (n, D). This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

...

Notes

To mitigate the identified limitation of the identity parameter reconciliation function, the masking parameter reconciliation function curtails the number of learnable parameters in $\mathbf{W}$ to a reduced number $l$ via a randomly generated masking matrix $\mathbf{M}$ as follows: $$ \begin{equation} \psi({\mathbf{w}}) = (\mathbf{M} \odot \text{reshape}(\mathbf{w})) = (\mathbf{M} \odot \mathbf{W}) \in \mathbb{R}^{n \times D}, \end{equation} $$ where $\mathbf{M} \in \{0, 1\}^{n \times D}$ denotes the binary masking matrix with $l$ non-zero entries and $\odot$ denotes the element-wise product operator.
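
The formula above can be illustrated with a dependency-free sketch. It assumes row-major reshaping (which is what `torch.Tensor.view` does for a contiguous vector) and uses plain Python lists in place of tensors; the helper names `reshape` and `masked_reconcile` are hypothetical and not part of tinybig.

```python
from typing import List

Matrix = List[List[float]]


def reshape(w: List[float], n: int, D: int) -> Matrix:
    """Reshape a flat vector of length n * D into an n x D matrix, row by row."""
    if len(w) != n * D:
        raise ValueError(f"expected {n * D} entries, got {len(w)}")
    return [w[i * D:(i + 1) * D] for i in range(n)]


def masked_reconcile(w: List[float], mask: List[List[int]], n: int, D: int) -> Matrix:
    """Compute psi(w) = M (element-wise product) reshape(w)."""
    W = reshape(w, n, D)
    return [[m * x for m, x in zip(mask_row, w_row)] for mask_row, w_row in zip(mask, W)]


w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
M = [[1, 0, 1],
     [0, 1, 0]]
print(masked_reconcile(w, M, n=2, D=3))  # [[1.0, 0.0, 3.0], [0.0, 5.0, 0.0]]
```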

To facilitate practical implementation, instead of predefining the parameter dimension $l$, the masking reconciliation function takes the masking ratio $p$ as its parameter. Together with the output dimensions $n \times D$, this ratio determines the number of retained parameters as follows: $$ \begin{equation} l = p \times n \times D, \end{equation} $$ where the masking ratio takes values in $p \in [0, 1]$: with $p = 1.0$ all parameters are used, while with $p = 0.0$ no parameters are used. In the implementation, each entry of $\mathbf{M}$ is kept independently with probability $p$, so $l$ is the expected number of non-zero entries rather than an exact count, and the parameter vector $\mathbf{w}$ itself still has length $n \times D$ (see `calculate_l`).
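
A worked check of $l = p \times n \times D$ for a hypothetical $4 \times 8$ output. The function name `masked_param_count` and its range check on $p$ are illustrative assumptions; the documented class does not validate $p$.

```python
def masked_param_count(n: int, D: int, p: float) -> float:
    """Return l = p * n * D, the target number of unmasked entries."""
    # Illustrative range check; the documented class itself does not validate p.
    if not 0.0 <= p <= 1.0:
        raise ValueError("masking ratio p must lie in [0, 1]")
    return p * n * D


# Hypothetical 4 x 8 output matrix, i.e. n * D = 32 entries in total.
for p in (0.0, 0.25, 0.5, 1.0):
    print(p, masked_param_count(n=4, D=8, p=p))
# 0.0 0.0
# 0.25 8.0
# 0.5 16.0
# 1.0 32.0
```

In general $p \times n \times D$ need not be an integer, which is another reason to read $l$ as a target (expected) count.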

Attributes:

Name	Type	Description
`name`	`str, default = 'masking_reconciliation'`	Name of the parameter reconciliation function.
`p`	`float, default = 0.5`	The masking ratio, i.e., the fraction of entries in the parameter matrix that are kept, e.g., p=1.0: all parameters are used; p=0.0: no parameters are used.
`fixed_mask`	`bool, default = True`	Whether the masking matrix is generated once and reused across forward calls (True) or redrawn on every call (False).
`mask_matrix`	`torch.Tensor, default = None`	The most recently generated binary masking matrix of shape (n, D); `None` until the first forward call.

Methods:

Name	Description
`__init__`	It initializes the parameter reconciliation function.
`calculate_l`	It calculates the length of required parameters.
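`forward`	It implements the abstract forward method declared in the base reconciliation class.
`generate_masking_matrix`	It generates the binary masking matrix of shape (n, D) subject to the masking ratio p.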

Source code in tinybig/reconciliation/basic_reconciliation.py

class masking_reconciliation(fabrication):
    r"""
    The masking parameter reconciliation function.

    It performs the masking parameter reconciliation, and returns the masked parameter matrix of shape (n, D).
    This class inherits from the reconciliation class (i.e., the fabrication class in the module directory).

    ...

    Notes
    ----------
    To mitigate the identified limitation of the identity parameter reconciliation function, the masking parameter
    reconciliation function curtails the number of learnable parameters in $\mathbf{W}$ to a reduced number $l$
    via a randomly generated masking matrix $\mathbf{M}$ as follows:
    $$
        \begin{equation}
            \psi({\mathbf{w}}) = (\mathbf{M} \odot \text{reshape}(\mathbf{w})) = (\mathbf{M} \odot \mathbf{W}) \in \mathbb{R}^{n \times D},
        \end{equation}
    $$
    where $\mathbf{M} \in \{0, 1\}^{n \times D}$ denotes the binary masking matrix with $l$
    non-zero entries and $\odot$ denotes the element-wise product operator.

    To facilitate practical implementation, instead of predefining the parameter dimension $l$, the masking
    reconciliation function takes the masking ratio $p$ as its parameter. Together with the output dimensions
    $n \times D$, this ratio determines the number of retained parameters as follows:
    $$
        \begin{equation}
            l = p \times n \times D,
        \end{equation}
    $$
    where the masking ratio takes values in $p \in [0, 1]$: with $p = 1.0$ all parameters are used, while with
    $p = 0.0$ no parameters are used. Each entry of $\mathbf{M}$ is kept independently with probability $p$,
    so $l$ is the expected number of non-zero entries, and the parameter vector $\mathbf{w}$ itself has length
    $n \times D$.


    Attributes
    ----------
    name: str, default = 'masking_reconciliation'
        Name of the parameter reconciliation function
    p: float, default = 0.5
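        The masking ratio, i.e., the fraction of entries in the parameter matrix that are kept,
        e.g., p=1.0: all parameters are used; p=0.0: no parameters are used.
    fixed_mask: bool, default = True
        Whether the masking matrix is generated once and reused across forward calls (True)
        or redrawn on every call (False).
    mask_matrix: torch.Tensor, default = None
        The most recently generated binary masking matrix of shape (n, D); None until the first forward call.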

    Methods
    ----------
    __init__
        It initializes the parameter reconciliation function.

    calculate_l
        It calculates the length of required parameters.

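    forward
        It implements the abstract forward method declared in the base reconciliation class.

    generate_masking_matrix
        It generates the binary masking matrix of shape (n, D) subject to the masking ratio p.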
    """
    def __init__(self, name='masking_reconciliation', p=0.5, fixed_mask: bool = True, *args, **kwargs):
        """
        The initialization method of the masking parameter reconciliation function.

        It initializes a masking parameter reconciliation function object.
        This method also calls the initialization method of the base class.

        Parameters
        ----------
        name: str, default = 'masking_reconciliation'
            Name of the parameter reconciliation function.
        p: float, default = 0.5
            The masking ratio, i.e., the fraction of entries in the parameter matrix that are kept,
            e.g., p=1.0: all parameters are used; p=0.0: no parameters are used.
        fixed_mask: bool, default = True
            Whether the masking matrix is generated once and reused across forward calls (True)
            or redrawn on every call (False).

        Returns
        ----------
        fabrication
            The masking parameter reconciliation function object.
        """
        super().__init__(name=name, *args, **kwargs)
        self.p = p
        self.mask_matrix = None
        self.fixed_mask = fixed_mask

    def calculate_l(self, n: int, D: int):
        r"""
        The required parameter number calculation method.

        It calculates the length of the learnable parameter vector $\mathbf{w}$ required by the parameter
        reconciliation function, based on the output and intermediate expansion space dimensions, $n$ and $D$.
        Because $\mathbf{w}$ is reshaped into an $(n, D)$ matrix before masking, the returned length is
        $n \times D$, of which only
        $$
            \begin{equation}
                l = p \times n \times D
            \end{equation}
        $$
        entries are left unmasked in expectation and are thus effectively learnable.

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.

        Returns
        -------
        int
            The length of the learnable parameter vector, i.e., $n \times D$.
        """
        return n * D

    def generate_masking_matrix(self, n: int, D: int, device: str = 'cpu'):
        """
        The masking matrix generation method.

        It generates the binary masking matrix of shape (n, D) subject to the masking ratio parameter $p$.
        The method first draws a matrix of shape (n, D) with entries uniformly distributed in [0, 1) and then
        compares it with $p$, so each entry is kept (True) with probability $p$.
        The result is stored in `self.mask_matrix` and also returned.

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.
        device: str, default = 'cpu'
            Device on which the masking matrix is created.

        Returns
        -------
        torch.Tensor
            The binary masking matrix of shape (n, D).
        """
        self.mask_matrix = torch.rand(n, D, device=device) < self.p
        return self.mask_matrix

    def forward(self, n: int, D: int, w: torch.nn.Parameter, device='cpu', *args, **kwargs):
        r"""
        The forward method of the parameter reconciliation function.

        It applies the masking parameter reconciliation operation to the input parameter vector,
        and returns the reconciled parameter matrix of shape (n, D) subject to the masking ratio $p$ as follows:
        $$
            \begin{equation}
                \psi({\mathbf{w}}) = (\mathbf{M} \odot \text{reshape}(\mathbf{w})) = (\mathbf{M} \odot \mathbf{W}) \in \mathbb{R}^{n \times D}.
            \end{equation}
        $$

        Parameters
        ----------
        n: int
            The dimension of the output space.
        D: int
            The dimension of the intermediate expansion space.
        w: torch.nn.Parameter
            The learnable parameters of the model.
        device: str, default = 'cpu'
            Device to perform the parameter reconciliation.

        Returns
        ----------
        torch.Tensor
            The reconciled parameter matrix of shape (n, D).
        """
        # Redraw the mask on every call unless it is fixed and has already been generated.
        if not self.fixed_mask or self.mask_matrix is None:
            self.generate_masking_matrix(n=n, D=D, device=device)
        # Zero out the masked entries of the reshaped parameter matrix W.
        return w.view(n, D) * self.mask_matrix.to(device)

`__init__(name='masking_reconciliation', p=0.5, fixed_mask=True, *args, **kwargs)`

The initialization method of the masking parameter reconciliation function.

It initializes a masking parameter reconciliation function object. This method also calls the initialization method of the base class.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the parameter reconciliation function.	`'masking_reconciliation'`
`p`	`float`	The masking ratio, i.e., the fraction of entries in the parameter matrix that are kept, e.g., p=1.0: all parameters are used; p=0.0: no parameters are used.	`0.5`
`fixed_mask`	`bool`	Whether the masking matrix is generated once and reused across forward calls (True) or redrawn on every call (False).	`True`

Returns:

Type	Description
`fabrication`	The masking parameter reconciliation function object.

`calculate_l(n, D)`

The required parameter number calculation method.

It calculates the length of the learnable parameter vector $\mathbf{w}$ required by the parameter reconciliation function, based on the output and intermediate expansion space dimensions, $n$ and $D$. Because $\mathbf{w}$ is reshaped into an $(n, D)$ matrix before masking, the returned length is $n \times D$, of which only $$ \begin{equation} l = p \times n \times D \end{equation} $$ entries are left unmasked in expectation and are thus effectively learnable.

Parameters:

Name	Type	Description	Default
`n`	`int`	The dimension of the output space.	required
`D`	`int`	The dimension of the intermediate expansion space.	required

Returns:

Type	Description
`int`	The number of required learnable parameters.
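
A short derivation, added here for illustration, of why the masked entries do not learn even though $\mathbf{w}$ has $n \times D$ entries. For any scalar loss $\mathcal{L}$ (a symbol not used elsewhere on this page), the element-wise product gives

$$
\begin{equation}
\frac{\partial \mathcal{L}}{\partial W_{ij}} = M_{ij} \, \frac{\partial \mathcal{L}}{\partial \psi(\mathbf{w})_{ij}}, \qquad \mathbb{E}\Big[\sum_{i=1}^{n} \sum_{j=1}^{D} M_{ij}\Big] = p \times n \times D,
\end{equation}
$$

so, as long as the same mask is used (`fixed_mask=True`), entries with $M_{ij} = 0$ receive zero gradient and the expected number of entries that do learn is $p \times n \times D$. With `fixed_mask=False` a new mask is drawn on each forward call, so different entries receive gradients at different steps.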

`forward(n, D, w, device='cpu', *args, **kwargs)`

The forward method of the parameter reconciliation function.

It applies the masking parameter reconciliation operation to the input parameter vector, and returns the reconciled parameter matrix of shape (n, D) subject to the masking ratio $p$ as follows: $$ \begin{equation} \psi({\mathbf{w}}) = (\mathbf{M} \odot \text{reshape}(\mathbf{w})) = (\mathbf{M} \odot \mathbf{W}) \in \mathbb{R}^{n \times D}. \end{equation} $$ If `fixed_mask` is True, the masking matrix is generated on the first call and reused afterwards; otherwise a new one is drawn on every call.

Parameters:

Name	Type	Description	Default
`n`	`int`	The dimension of the output space.	required
`D`	`int`	The dimension of the intermediate expansion space.	required
`w`	`Parameter`	The learnable parameters of the model.	required
`device`	`str`	Device to perform the parameter reconciliation.	`'cpu'`

Returns:

Type	Description
`Tensor`	The reconciled parameter matrix of shape (n, D).
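
The caching rule in `forward` (reuse the mask when `fixed_mask` is true and a mask already exists, otherwise draw a new one) can be mirrored in a small stand-alone sketch. `MaskCache` is a hypothetical class written for illustration with Python's `random` module instead of `torch.rand`; it is not the tinybig implementation.

```python
import random
from typing import List, Optional


class MaskCache:
    """Hypothetical stand-in that mirrors the mask-caching rule of forward()."""

    def __init__(self, p: float, fixed_mask: bool = True, seed: Optional[int] = None):
        self.p = p
        self.fixed_mask = fixed_mask
        self.mask_matrix: Optional[List[List[bool]]] = None
        self._rng = random.Random(seed)

    def generate_masking_matrix(self, n: int, D: int) -> List[List[bool]]:
        # Keep an entry when its uniform draw from [0, 1) falls below p.
        self.mask_matrix = [[self._rng.random() < self.p for _ in range(D)] for _ in range(n)]
        return self.mask_matrix

    def forward(self, n: int, D: int, w: List[float]) -> List[List[float]]:
        # Draw a new mask unless the mask is fixed and has already been generated.
        if not self.fixed_mask or self.mask_matrix is None:
            self.generate_masking_matrix(n, D)
        W = [w[i * D:(i + 1) * D] for i in range(n)]  # row-major reshape to (n, D)
        # Multiplying by a bool keeps the entry (True -> 1) or zeroes it (False -> 0).
        return [[x * keep for x, keep in zip(w_row, m_row)]
                for w_row, m_row in zip(W, self.mask_matrix)]


w = [float(k) for k in range(1, 13)]

fixed = MaskCache(p=0.5, fixed_mask=True, seed=0)
out1 = fixed.forward(3, 4, w)
first_mask = fixed.mask_matrix
out2 = fixed.forward(3, 4, w)
print(out1 == out2, fixed.mask_matrix is first_mask)  # True True

fresh = MaskCache(p=0.5, fixed_mask=False, seed=0)
fresh.forward(3, 4, w)
old_mask = fresh.mask_matrix
fresh.forward(3, 4, w)
print(fresh.mask_matrix is old_mask)  # False: a new mask was drawn
```

With `fixed_mask=True` the same subset of entries is active throughout; with `fixed_mask=False` a fresh mask is drawn on every call, so the active subset changes from call to call.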

`generate_masking_matrix(n, D, device='cpu')`

The masking matrix generation method.

It generates the binary masking matrix of shape (n, D) subject to the masking ratio parameter $p$. The method first draws a matrix of shape (n, D) with entries uniformly distributed in [0, 1) and then compares it with $p$, so each entry is kept (True) with probability $p$. The result is stored in the `mask_matrix` attribute.

Parameters:

Name	Type	Description	Default
`n`	`int`	The dimension of the output space.	required
`D`	`int`	The dimension of the intermediate expansion space.	required
`device`	`str`	Device on which the masking matrix is created.	`'cpu'`

Returns:

Type	Description
`Tensor`	The binary masking matrix of shape (n, D).
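
The thresholding step (`torch.rand(n, D) < p`) can be checked empirically with a stdlib sketch: the realised number of kept entries fluctuates around $l = p \times n \times D$. The function below is a hypothetical stand-in that uses `random.Random.random()`, which, like `torch.rand`, draws from $[0, 1)$.

```python
import random
from typing import List


def generate_masking_matrix(n: int, D: int, p: float, rng: random.Random) -> List[List[bool]]:
    """Return an n x D boolean mask whose entries are True with probability p."""
    # rng.random() is uniform on [0.0, 1.0): p = 1.0 keeps every entry, p = 0.0 keeps none.
    return [[rng.random() < p for _ in range(D)] for _ in range(n)]


rng = random.Random(42)
n, D, p = 64, 64, 0.3
mask = generate_masking_matrix(n, D, p, rng)
kept = sum(sum(row) for row in mask)  # True counts as 1
print(kept, p * n * D)  # realised count vs. expected count l = p * n * D
```

For $n = D = 64$ and $p = 0.3$ the expected count is about 1228.8 and the standard deviation $\sqrt{n D p (1 - p)}$ is about 29, so the printed count typically lies within a few dozen of the expectation.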
