
layer

Bases: Module, function

The RPN layer class for implementing the multi-head module.

It will be used to compose the RPN model with deep architectures.

...

Notes

Similar to Transformers, each layer of an RPN model allows a multi-head architecture, where each head disentangles the input data and model parameters using its own expansion, reconciliation and remainder functions:

$$
\begin{equation}
g(\mathbf{x} | \mathbf{w}, H) = \sum_{h=0}^{H-1} \left\langle \kappa^{(h)}(\mathbf{x}), \psi^{(h)}(\mathbf{w}^{(h)}) \right\rangle + \pi^{(h)}(\mathbf{x}),
\end{equation}
$$

where the superscript "\(h\)" indicates the head index and \(H\) denotes the total number of heads. The equation combines the heads by summation; in the constructor shown on this page, a layer with more than one head and no `head_fusion` falls back to `mean_fusion`.
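The multi-head formula can be illustrated with a minimal pure-Python sketch. It is not the tinybig implementation: the helper names (`inner`, `head_output`, `layer_output`) and the two toy heads are hypothetical, the output is a scalar for brevity, and the remainder term is read as part of each summand.

```python
from typing import Sequence


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equally long vectors."""
    assert len(a) == len(b)
    return sum(ai * bi for ai, bi in zip(a, b))


def head_output(x, w, expand, reconcile, remainder) -> float:
    """One head: <kappa(x), psi(w)> + pi(x)."""
    return inner(expand(x), reconcile(w)) + remainder(x)


def layer_output(x, weights, heads) -> float:
    """Sum the per-head outputs over h = 0, ..., H-1."""
    return sum(head_output(x, w, *fns) for w, fns in zip(weights, heads))


# Two hypothetical heads, each given as (expansion, reconciliation, remainder).
identity_head = (lambda x: x, lambda w: w, lambda x: 0.0)
square_head = (lambda x: [v * v for v in x], lambda w: w, lambda x: sum(x))

x = [1.0, 2.0]
weights = [[0.5, 0.5], [1.0, 1.0]]
y = layer_output(x, weights, [identity_head, square_head])
print(y)  # 1.5 from the first head plus 8.0 from the second
```

Each head keeps its own parameter vector, so adding a head only adds one more term to the sum.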

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| m | int | The input dimension of the layer. |
| n | int | The output dimension of the layer. |
| heads | torch.nn.ModuleList, default = torch.nn.ModuleList() | The list of RPN heads involved in the layer. |
| head_fusion | fusion, default = None | The fusion function of the outputs learned by multi-heads. |
| device | str, default = 'cpu' | The device for hosting the RPN layer. |

Methods:

| Name | Description |
| --- | --- |
| `__init__` | The initialization method of the RPN-layer module with multiple RPN heads. |
| `get_width` | The head number retrieval method. |
| `initialize_parameters` | Head parameter initialization method. |
| `initialize_fusion_parameters` | Fusion component parameter initialization method. |
| `multi_head_fusion` | The multi-head outputs fusion method. |
| `forward` | The forward method of this multi-head RPN layer module. |
| `__call__` | The re-implementation of the callable method of this RPN layer module. |

Source code in tinybig/module/base_layer.py
class layer(Module, function):
    r"""
    The RPN layer class for implementing the multi-head module.

    It will be used to compose the RPN model with deep architectures.

    ...

    Notes
    ----------
    Similar to the Transformers, for each layer of RPN model, it allows a multi-head architecture,
    where each head will disentangle the input data and model parameters using different expansion,
    reconciliation and remainder functions shown as follows:
    $$
        \begin{equation}
            g(\mathbf{x} | \mathbf{w}, H) = \sum_{h=0}^{H-1} \left\langle \kappa^{(h)}(\mathbf{x}), \psi^{(h)}(\mathbf{w}^{(h)}) \right\rangle + \pi^{(h)}(\mathbf{x}),
        \end{equation}
    $$
    where the superscript "$h$" indicates the head index and $H$ denotes the total head number.
    By default, summation is used to combine the results from all these heads.

    Attributes
    ----------
    m: int
        The input dimension of the layer.
    n: int
        The output dimension of the layer.
    heads: torch.nn.ModuleList, default = torch.nn.ModuleList()
        The list of RPN heads involved in the layer.
    head_fusion: fusion, default = None
        The fusion function of the outputs learned by multi-heads.
    device: str, default = 'cpu'
        The device for hosting the RPN layer.

    Methods
    ----------
    __init__
        The initialization method of the RPN-layer module with multiple RPN heads.

    get_width
        The head number retrieval method.

    initialize_parameters
        Head parameter initialization method.

    initialize_fusion_parameters
        Fusion component parameter initialization method.

    multi_head_fusion
        The multi-head outputs fusion method.

    forward
        The forward method of this multi-head RPN layer module.

    __call__
        The re-implementation of the callable method of this RPN layer module.
    """
    def __init__(
        self,
        m: int,
        n: int,
        name: str = "rpn_layer",
        heads: list = None,
        head_configs: dict | list = None,
        width: int = None,
        width_alloc: int | list = None,
        head_fusion=None,
        head_fusion_configs=None,
        parameters_init_method: str = 'xavier_uniform',
        device='cpu',
        *args, **kwargs
    ):
        r"""
        The initialization method of the RPN-layer module with multiple RPN heads.

        It initializes the RPN layer module composed with multiple RPN heads.
        Specifically, this method initializes the dimension configurations of the layer,
        the component heads, and defines the device to host the head.

        Parameters
        ----------
        m: int
            The input dimension of the layer.
        n: int
            The output dimension of the layer.
        heads: list, default = None
            The list of RPN heads involved in the layer. The heads can be initialized
            either directly with the heads parameter or via the head_configs parameter.
        head_configs: dict | list, default = None
            The list of RPN head configurations in the layer.
        width: int, default = None
            The total number of heads in the layer. It is optional: if "heads" or "head_configs" provides
            sufficient information for the head initialization, this width parameter can be set to None.
        width_alloc: int | list, default = None
            RPN allows heads with different configurations. Instead of listing such configurations one by one,
            each configuration type can be listed once together with its repeat count,
            which is specified by this optional head number allocation parameter.
        head_fusion: fusion, default = None
            The fusion function of the outputs learned by multi-heads.
        head_fusion_configs: dict, default = None
            The fusion function configurations of the outputs learned by multi-heads.
        device: str, default = 'cpu'
            The device for hosting the RPN layer.

        Returns
        ----------
        object
            This method will return the initialized RPN-layer object.
        """
        Module.__init__(self)
        function.__init__(self, name=name, device=device)

        assert m is not None and n is not None
        self.m = m
        self.n = n
        self.fusion_parameters = None
        self.heads = torch.nn.ModuleList()
        self.parameters_init_method = parameters_init_method

        # the multi-head initialization
        if heads is not None:
            # initialize heads from the provided head parameter directly
            self.heads.extend(heads)
            width = len(self.heads)
        elif head_configs is None:
            raise ValueError("Both heads and head_configs are None, this layer cannot be initialized...")
        else:
            # initialize heads from the head configs

            # process the width, width_alloc and head_configs parameters to make them consistent
            width, width_alloc, head_configs = config.process_num_alloc_configs(width, width_alloc, head_configs)
            assert len(width_alloc) == len(head_configs) and sum(width_alloc) == width

            # initialize the multi-head involved in the layer
            for head_repeat_time, head_config in zip(width_alloc, head_configs):
                for head_id in range(0, head_repeat_time):
                    head_class_name = head_config['head_class']
                    head_parameters = head_config['head_parameters']
                    head_parameters['m'] = self.m
                    head_parameters['n'] = self.n
                    head_parameters['device'] = device
                    head_parameters['parameters_init_method'] = self.parameters_init_method
                    self.heads.append(config.get_obj_from_str(head_class_name)(**head_parameters))

        assert len(self.heads) == width and [(self.m, self.n)] * width == [(head.m, head.n) for head in self.heads]

        self.head_fusion = config.instantiation_functions(functions=head_fusion, function_configs=head_fusion_configs, device=device)
        if len(self.heads) > 1 and self.head_fusion is None:
            # use self.heads: the `heads` argument is None when heads are built from head_configs
            self.head_fusion = mean_fusion(dims=[head.get_n() for head in self.heads])
        self.w_head_fusion = None
        self.create_learnable_parameters(init_type=self.parameters_init_method)

    def get_m(self):
        """
        Retrieves the input dimension of the layer.

        Returns
        -------
        int
            The input dimension (`m`) of the layer.
        """
        return self.m

    def get_n(self):
        """
        Retrieves the output dimension of the layer.

        If the layer uses a fusion component, this method calculates the output dimension based
        on the fusion component's configuration.

        Returns
        -------
        int
            The output dimension (`n`) of the layer.
        """
        if self.head_fusion is not None:
            return self.head_fusion.calculate_n()
        else:
            return self.n

    def create_learnable_parameters(self, initialize_parameter_at_creation: bool = False, init_type='xavier_uniform', init_bias=True, *args, **kwargs):

        """
        Creates and optionally initializes learnable parameters for the fusion component.

        This method is responsible for creating any learnable parameters required by the fusion component
        of the layer. It also supports optional parameter initialization.

        Parameters
        ----------
        initialize_parameter_at_creation : bool, default = False
            Whether to initialize the parameters at the time of their creation.
        init_type : str, default = 'xavier_uniform'
            The type of initialization to apply. Supported types include:
            - 'xavier_uniform': Xavier uniform initialization.
            - 'kaiming_uniform': Kaiming uniform initialization.
        init_bias : bool, default = True
            Whether to initialize biases in the parameters, if applicable.

        Returns
        -------
        None
            This method does not return any value. It updates the layer's attributes in-place.
        """

        if self.head_fusion is not None and self.head_fusion.require_parameters:
            l = self.head_fusion.calculate_l()
            self.w_head_fusion = torch.nn.Parameter(torch.rand(1, l))

        if initialize_parameter_at_creation:
            self.initialize_parameters(init_type=init_type, init_bias=init_bias)

    def initialize_parameters(self, init_type='xavier_uniform', init_bias=True):
        """
        Head parameter initialization method.

        It initializes the learnable parameters in each head involved in the layer,
        which will call the parameter initialization method in each of the heads.

        Returns
        -------
        None
            The initialization method doesn't have any return values.
        """
        for head in self.heads:
            head.initialize_parameters(init_type=init_type, init_bias=init_bias)
        if self.w_head_fusion is not None:
            # match the initializer to the requested init_type
            if init_type == 'kaiming_uniform':
                torch.nn.init.kaiming_uniform_(self.w_head_fusion, a=math.sqrt(5))
            else:
                torch.nn.init.xavier_uniform_(self.w_head_fusion)

    def initialize_fusion_parameters(self):
        """
        Fusion component parameter initialization method.

        It initializes the learnable parameters for the fusion component.
        The RPN head also allows the linear fusion component to combine the
        outputs of multi-head with learnable parameters.

        Returns
        -------
        None
            The initialization method doesn't have any return values.
        """
        self.fusion_parameters = torch.nn.Parameter(torch.rand(self.n, self.n*len(self.heads)))
        torch.nn.init.xavier_uniform_(self.fusion_parameters)

    def get_width(self):
        """
        Retrieves the number of heads in the layer.

        Returns
        -------
        int
            The number of heads in the multi-head RPN layer.
        """
        return len(self.heads)

    def to_config(self):
        """
        Converts the layer instance into a configuration dictionary.

        This method generates a configuration dictionary that captures the class name, its attributes,
        and the configurations of its heads and fusion components. It can be used to save or reconstruct
        the layer.

        Returns
        -------
        dict
            A dictionary containing the class name and parameters:
            - "layer_class": The full class name of the layer.
            - "layer_parameters": A dictionary of layer attributes, including:
              - "name": The name of the layer.
              - "device": The device hosting the layer.
              - "m": The input dimension.
              - "n": The output dimension.
              - "head_configs": A list of configurations for the heads in the layer.
              - "head_fusion_configs": The configuration of the fusion component, if present.

        """
        layer_class = f"{self.__class__.__module__}.{self.__class__.__name__}"
        layer_parameters = {
            'name': self.name,
            'device': self.device,
            'm': self.m,
            'n': self.n,
            # the attribute is `self.heads`; an empty ModuleList yields an empty list
            'head_configs': [head.to_config() for head in self.heads],
        }

        if self.head_fusion is not None:
            layer_parameters['head_fusion_configs'] = self.head_fusion.to_config()

        return {
            "layer_class": layer_class,
            "layer_parameters": layer_parameters
        }

    def forward(self, x: torch.Tensor, fusion_strategy: str = 'average', device: str = 'cpu', *args, **kwargs):
        r"""
        The forward method of this multi-head RPN layer module.

        It calculates the outputs of the multi-head RPN layer based on the inputs, subject to a given fusion strategy.
        For each layer of RPN model, RPN allows a multi-head architecture,
        where each head will disentangle the input data and model parameters using different expansion,
        reconciliation and remainder functions shown as follows:
        $$
            \begin{equation}
                g(\mathbf{x} | \mathbf{w}, H) = \sum_{h=0}^{H-1} \left\langle \kappa^{(h)}(\mathbf{x}), \psi^{(h)}(\mathbf{w}^{(h)}) \right\rangle + \pi^{(h)}(\mathbf{x}),
            \end{equation}
        $$
        where the superscript "$h$" indicates the head index and $H$ denotes the total head number.
        By default, summation is used to combine the results from all these heads.

        Parameters
        ----------
        x: torch.Tensor
            The input data to the layer.
        fusion_strategy: str, default = 'average'
            The optional fusion_strategy of the forward method. If it is set as None, this layer will use the default
             fusion_strategy at initialization of this layer.
        device: str, default = 'cpu'
            Device used to host this layer for calculation.

        Returns
        -------
        torch.Tensor
            It will return the learning results of this RPN layer.
        """
        assert x is not None and x.ndim == 2 and x.size(1) == self.get_m()

        results = []
        for head in self.heads:
            results.append(head(x=x, device=device))
        assert results != [] and [results[0].shape] * len(results) == [result.shape for result in results]

        if self.head_fusion is not None:
            assert self.head_fusion.get_num() == len(results) and [results[0].shape] * len(results) == [result.shape for result in results]
            result = self.head_fusion(x=results, w=self.w_head_fusion, device=device)
        else:
            assert len(results) == 1
            result = results[0]

        assert result.size(1) == self.get_n()
        return result

__init__(m, n, name='rpn_layer', heads=None, head_configs=None, width=None, width_alloc=None, head_fusion=None, head_fusion_configs=None, parameters_init_method='xavier_uniform', device='cpu', *args, **kwargs)

The initialization method of the RPN-layer module with multiple RPN heads.

It initializes the RPN layer module composed with multiple RPN heads. Specifically, this method initializes the dimension configurations of the layer, the component heads, and defines the device to host the head.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| m | int | The input dimension of the layer. | required |
| n | int | The output dimension of the layer. | required |
| heads | list | The list of RPN heads involved in the layer. The heads can be initialized either directly with the heads parameter or via the head_configs parameter. | None |
| head_configs | dict \| list | The list of RPN head configurations in the layer. | None |
| width | int | The total number of heads in the layer. It is optional: if "heads" or "head_configs" provides sufficient information for the head initialization, this width parameter can be set to None. | None |
| width_alloc | int \| list | RPN allows heads with different configurations. Instead of listing such configurations one by one, each configuration type can be listed once together with its repeat count, which is specified by this optional head number allocation parameter. | None |
| head_fusion | | The fusion function of the outputs learned by multi-heads. | None |
| head_fusion_configs | | The fusion function configurations of the outputs learned by multi-heads. | None |
| device | | The device for hosting the RPN layer. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| object | This method will return the initialized RPN-layer object. |
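How `width_alloc` pairs with `head_configs` can be sketched as follows. This is a hedged illustration rather than the library code: `expand_head_configs` and the class paths `my_pkg.head_a` / `my_pkg.head_b` are hypothetical, and the helper returns plain dictionaries instead of instantiating heads. Unlike the constructor, it copies each parameter dictionary so that repeated heads do not share one object.

```python
import copy


def expand_head_configs(head_configs, width_alloc, m, n):
    """Repeat each head configuration according to its allocation count."""
    assert len(head_configs) == len(width_alloc)
    expanded = []
    for repeat, head_config in zip(width_alloc, head_configs):
        for _ in range(repeat):
            # Copy so the caller's configuration is left untouched.
            params = copy.deepcopy(head_config["head_parameters"])
            # Every head inherits the layer's input and output dimensions.
            params["m"] = m
            params["n"] = n
            expanded.append({
                "head_class": head_config["head_class"],
                "head_parameters": params,
            })
    return expanded


head_configs = [
    {"head_class": "my_pkg.head_a", "head_parameters": {}},
    {"head_class": "my_pkg.head_b", "head_parameters": {"note": "demo"}},
]
expanded = expand_head_configs(head_configs, width_alloc=[2, 1], m=4, n=3)
width = len(expanded)  # equals sum(width_alloc)
```

With `width_alloc=[2, 1]` the first configuration is used twice and the second once, which matches the constructor's check that `sum(width_alloc) == width`.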

Source code in tinybig/module/base_layer.py
def __init__(
    self,
    m: int,
    n: int,
    name: str = "rpn_layer",
    heads: list = None,
    head_configs: dict | list = None,
    width: int = None,
    width_alloc: int | list = None,
    head_fusion=None,
    head_fusion_configs=None,
    parameters_init_method: str = 'xavier_uniform',
    device='cpu',
    *args, **kwargs
):
    r"""
    The initialization method of the RPN-layer module with multiple RPN heads.

    It initializes the RPN layer module composed with multiple RPN heads.
    Specifically, this method initializes the dimension configurations of the layer,
    the component heads, and defines the device to host the head.

    Parameters
    ----------
    m: int
        The input dimension of the layer.
    n: int
        The output dimension of the layer.
    heads: list, default = None
        The list of RPN heads involved in the layer. The heads can be initialized
        either directly with the heads parameter or via the head_configs parameter.
    head_configs: dict | list, default = None
        The list of RPN head configurations in the layer.
    width: int, default = None
        The total number of heads in the layer. It is optional: if "heads" or "head_configs" provides
        sufficient information for the head initialization, this width parameter can be set to None.
    width_alloc: int | list, default = None
        RPN allows heads with different configurations. Instead of listing such configurations one by one,
        each configuration type can be listed once together with its repeat count,
        which is specified by this optional head number allocation parameter.
    head_fusion: fusion, default = None
        The fusion function of the outputs learned by multi-heads.
    head_fusion_configs: dict, default = None
        The fusion function configurations of the outputs learned by multi-heads.
    device: str, default = 'cpu'
        The device for hosting the RPN layer.

    Returns
    ----------
    object
        This method will return the initialized RPN-layer object.
    """
    Module.__init__(self)
    function.__init__(self, name=name, device=device)

    assert m is not None and n is not None
    self.m = m
    self.n = n
    self.fusion_parameters = None
    self.heads = torch.nn.ModuleList()
    self.parameters_init_method = parameters_init_method

    # the multi-head initialization
    if heads is not None:
        # initialize heads from the provided head parameter directly
        self.heads.extend(heads)
        width = len(self.heads)
    elif head_configs is None:
        raise ValueError("Both heads and head_configs are None, this layer cannot be initialized...")
    else:
        # initialize heads from the head configs

        # process the width, width_alloc and head_configs parameters to make them consistent
        width, width_alloc, head_configs = config.process_num_alloc_configs(width, width_alloc, head_configs)
        assert len(width_alloc) == len(head_configs) and sum(width_alloc) == width

        # initialize the multi-head involved in the layer
        for head_repeat_time, head_config in zip(width_alloc, head_configs):
            for head_id in range(0, head_repeat_time):
                head_class_name = head_config['head_class']
                head_parameters = head_config['head_parameters']
                head_parameters['m'] = self.m
                head_parameters['n'] = self.n
                head_parameters['device'] = device
                head_parameters['parameters_init_method'] = self.parameters_init_method
                self.heads.append(config.get_obj_from_str(head_class_name)(**head_parameters))

    assert len(self.heads) == width and [(self.m, self.n)] * width == [(head.m, head.n) for head in self.heads]

    self.head_fusion = config.instantiation_functions(functions=head_fusion, function_configs=head_fusion_configs, device=device)
    if len(self.heads) > 1 and self.head_fusion is None:
        # use self.heads: the `heads` argument is None when heads are built from head_configs
        self.head_fusion = mean_fusion(dims=[head.get_n() for head in self.heads])
    self.w_head_fusion = None
    self.create_learnable_parameters(init_type=self.parameters_init_method)

create_learnable_parameters(initialize_parameter_at_creation=False, init_type='xavier_uniform', init_bias=True, *args, **kwargs)

Creates and optionally initializes learnable parameters for the fusion component.

This method is responsible for creating any learnable parameters required by the fusion component of the layer. It also supports optional parameter initialization.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| initialize_parameter_at_creation | bool | Whether to initialize the parameters at the time of their creation. | False |
| init_type | str | The type of initialization to apply. Supported types include 'xavier_uniform' (Xavier uniform initialization) and 'kaiming_uniform' (Kaiming uniform initialization). | 'xavier_uniform' |
| init_bias | bool | Whether to initialize biases in the parameters, if applicable. | True |

Returns:

| Type | Description |
| --- | --- |
| None | This method does not return any value. It updates the layer's attributes in-place. |
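The conditional creation of the fusion weights can be sketched without PyTorch. `ToyFusion` and `create_fusion_weight` are hypothetical stand-ins, and the rule "one weight per head" inside `calculate_l` is an assumption made only for this example; the real fusion component decides its own parameter count.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyFusion:
    """Hypothetical stand-in for a fusion component."""
    num_heads: int
    require_parameters: bool

    def calculate_l(self) -> int:
        # Assumption for this sketch: one learnable weight per head.
        return self.num_heads


def create_fusion_weight(fusion: Optional[ToyFusion], seed: int = 0) -> Optional[List[List[float]]]:
    """Return a 1 x l weight matrix, or None when no weights are needed."""
    if fusion is None or not fusion.require_parameters:
        return None
    rng = random.Random(seed)
    length = fusion.calculate_l()
    # Uniform samples in [0, 1), the same range torch.rand draws from.
    return [[rng.random() for _ in range(length)]]


no_fusion = create_fusion_weight(None)
parameter_free = create_fusion_weight(ToyFusion(num_heads=3, require_parameters=False))
learnable = create_fusion_weight(ToyFusion(num_heads=3, require_parameters=True))
```

Only the last call produces weights; the first two leave the weight as `None`, which mirrors `w_head_fusion` staying `None` in the layer.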

Source code in tinybig/module/base_layer.py
def create_learnable_parameters(self, initialize_parameter_at_creation: bool = False, init_type='xavier_uniform', init_bias=True, *args, **kwargs):

    """
    Creates and optionally initializes learnable parameters for the fusion component.

    This method is responsible for creating any learnable parameters required by the fusion component
    of the layer. It also supports optional parameter initialization.

    Parameters
    ----------
    initialize_parameter_at_creation : bool, default = False
        Whether to initialize the parameters at the time of their creation.
    init_type : str, default = 'xavier_uniform'
        The type of initialization to apply. Supported types include:
        - 'xavier_uniform': Xavier uniform initialization.
        - 'kaiming_uniform': Kaiming uniform initialization.
    init_bias : bool, default = True
        Whether to initialize biases in the parameters, if applicable.

    Returns
    -------
    None
        This method does not return any value. It updates the layer's attributes in-place.
    """

    if self.head_fusion is not None and self.head_fusion.require_parameters:
        l = self.head_fusion.calculate_l()
        self.w_head_fusion = torch.nn.Parameter(torch.rand(1, l))

    if initialize_parameter_at_creation:
        self.initialize_parameters(init_type=init_type, init_bias=init_bias)

forward(x, fusion_strategy='average', device='cpu', *args, **kwargs)

The forward method of this multi-head RPN layer module.

It calculates the outputs of the multi-head RPN layer based on the inputs, subject to a given fusion strategy. For each layer of an RPN model, RPN allows a multi-head architecture, where each head disentangles the input data and model parameters using its own expansion, reconciliation and remainder functions:

$$
\begin{equation}
g(\mathbf{x} | \mathbf{w}, H) = \sum_{h=0}^{H-1} \left\langle \kappa^{(h)}(\mathbf{x}), \psi^{(h)}(\mathbf{w}^{(h)}) \right\rangle + \pi^{(h)}(\mathbf{x}),
\end{equation}
$$

where the superscript "\(h\)" indicates the head index and \(H\) denotes the total number of heads. When `head_fusion` is set, the head outputs are combined by that fusion function; otherwise the layer must have exactly one head, whose output is returned unchanged.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Tensor | The input data to the layer. | required |
| fusion_strategy | str | The optional fusion strategy of the forward method. If it is set to None, the layer uses the fusion strategy chosen at initialization. | 'average' |
| device | str | Device used to host this layer for calculation. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| Tensor | It will return the learning results of this RPN layer. |
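The control flow of the forward pass (run every head, check that the shapes agree, then fuse or require a single head) can be sketched on nested lists. Everything here is hypothetical: `layer_forward`, `mean_of_heads` and the two toy heads only imitate the structure of the method and do not use the tinybig fusion classes.

```python
from typing import List

Batch = List[List[float]]  # shape (batch_size, dim)


def shape(batch: Batch):
    return (len(batch), len(batch[0]))


def mean_of_heads(results: List[Batch]) -> Batch:
    """Element-wise mean over the head outputs (a hypothetical fusion)."""
    count = len(results)
    rows, cols = shape(results[0])
    return [[sum(r[i][j] for r in results) / count for j in range(cols)]
            for i in range(rows)]


def layer_forward(x: Batch, heads, fusion=None) -> Batch:
    results = [head(x) for head in heads]
    # Every head must produce the same output shape.
    assert results and all(shape(r) == shape(results[0]) for r in results)
    if fusion is not None:
        return fusion(results)
    # Without a fusion function only a single head is allowed.
    assert len(results) == 1
    return results[0]


def double(batch: Batch) -> Batch:
    return [[2.0 * v for v in row] for row in batch]


def negate(batch: Batch) -> Batch:
    return [[-v for v in row] for row in batch]


x = [[1.0, 2.0], [3.0, 4.0]]
fused = layer_forward(x, [double, negate], fusion=mean_of_heads)
single = layer_forward(x, [double])
```

Calling `layer_forward(x, [double, negate])` without a fusion function fails the single-head assertion, which is the same guard the method applies when `head_fusion` is `None`.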

Source code in tinybig/module/base_layer.py
def forward(self, x: torch.Tensor, fusion_strategy: str = 'average', device: str = 'cpu', *args, **kwargs):
    r"""
    The forward method of this multi-head RPN layer module.

    It calculates the outputs of the multi-head RPN layer based on the inputs, subject to a given fusion strategy.
    For each layer of RPN model, RPN allows a multi-head architecture,
    where each head will disentangle the input data and model parameters using different expansion,
    reconciliation and remainder functions shown as follows:
    $$
        \begin{equation}
            g(\mathbf{x} | \mathbf{w}, H) = \sum_{h=0}^{H-1} \left\langle \kappa^{(h)}(\mathbf{x}), \psi^{(h)}(\mathbf{w}^{(h)}) \right\rangle + \pi^{(h)}(\mathbf{x}),
        \end{equation}
    $$
    where the superscript "$h$" indicates the head index and $H$ denotes the total head number.
    By default, summation is used to combine the results from all these heads.

    Parameters
    ----------
    x: torch.Tensor
        The input data to the layer.
    fusion_strategy: str, default = 'average'
        The optional fusion_strategy of the forward method. If it is set as None, this layer will use the default
         fusion_strategy at initialization of this layer.
    device: str, default = 'cpu'
        Device used to host this layer for calculation.

    Returns
    -------
    torch.Tensor
        It will return the learning results of this RPN layer.
    """
    assert x is not None and x.ndim == 2 and x.size(1) == self.get_m()

    results = []
    for head in self.heads:
        results.append(head(x=x, device=device))
    assert results != [] and [results[0].shape] * len(results) == [result.shape for result in results]

    if self.head_fusion is not None:
        assert self.head_fusion.get_num() == len(results) and [results[0].shape] * len(results) == [result.shape for result in results]
        result = self.head_fusion(x=results, w=self.w_head_fusion, device=device)
    else:
        assert len(results) == 1
        result = results[0]

    assert result.size(1) == self.get_n()
    return result

get_m()

Retrieves the input dimension of the layer.

Returns:

| Type | Description |
| --- | --- |
| int | The input dimension (`m`) of the layer. |

Source code in tinybig/module/base_layer.py
def get_m(self):
    """
    Retrieves the input dimension of the layer.

    Returns
    -------
    int
        The input dimension (`m`) of the layer.
    """
    return self.m

get_n()

Retrieves the output dimension of the layer.

If the layer uses a fusion component, this method calculates the output dimension based on the fusion component's configuration.

Returns:

| Type | Description |
| --- | --- |
| int | The output dimension (`n`) of the layer. |
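Why the output dimension is delegated to the fusion component can be seen from a small sketch. The two fusion classes below are hypothetical and only illustrate that different fusion rules may report different output widths through `calculate_n`; `layer_output_dim` mirrors the branch in `get_n()`.

```python
class MeanFusion:
    """Hypothetical fusion that averages heads: the output width stays n."""

    def __init__(self, dims):
        self.dims = list(dims)

    def calculate_n(self):
        assert len(set(self.dims)) == 1, "averaging needs equal head widths"
        return self.dims[0]


class ConcatFusion:
    """Hypothetical fusion that concatenates heads: the widths add up."""

    def __init__(self, dims):
        self.dims = list(dims)

    def calculate_n(self):
        return sum(self.dims)


def layer_output_dim(n, head_fusion=None):
    """Ask the fusion component when present, otherwise use the layer's n."""
    return head_fusion.calculate_n() if head_fusion is not None else n


plain = layer_output_dim(3)
averaged = layer_output_dim(3, MeanFusion([3, 3]))
concatenated = layer_output_dim(3, ConcatFusion([3, 3]))
```

The forward method asserts that the fused result has `get_n()` columns, so a fusion that changes the width must report it here.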

Source code in tinybig/module/base_layer.py
def get_n(self):
    """
    Retrieves the output dimension of the layer.

    If the layer uses a fusion component, this method calculates the output dimension based
    on the fusion component's configuration.

    Returns
    -------
    int
        The output dimension (`n`) of the layer.
    """
    if self.head_fusion is not None:
        return self.head_fusion.calculate_n()
    else:
        return self.n

get_width()

Retrieves the number of heads in the layer.

Returns:

| Type | Description |
| --- | --- |
| int | The number of heads in the multi-head RPN layer. |

Source code in tinybig/module/base_layer.py
def get_width(self):
    """
    Retrieves the number of heads in the layer.

    Returns
    -------
    int
        The number of heads in the multi-head RPN layer.
    """
    return len(self.heads)

initialize_fusion_parameters()

Fusion component parameter initialization method.

It initializes the learnable parameters for the fusion component. The RPN head also allows the linear fusion component to combine the outputs of multi-head with learnable parameters.

Returns:

| Type | Description |
| --- | --- |
| None | The initialization method doesn't have any return values. |
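The shape `(n, n * H)` of `fusion_parameters` is what a linear map from the concatenated head outputs back to `n` values would need. The sketch below shows that reading in pure Python; how the library actually applies these parameters is not shown on this page, so `linear_fusion` and `xavier_uniform_matrix` are hypothetical helpers. The sampling bound follows the Xavier uniform rule, U(-a, a) with a = sqrt(6 / (fan_in + fan_out)).

```python
import math
import random


def xavier_uniform_matrix(rows, cols, seed=0):
    """Sample a rows x cols matrix from U(-a, a), a = sqrt(6 / (rows + cols))."""
    bound = math.sqrt(6.0 / (rows + cols))
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def linear_fusion(head_outputs, weight):
    """Concatenate the head outputs of one sample and map them to n values."""
    concatenated = [v for output in head_outputs for v in output]
    assert all(len(row) == len(concatenated) for row in weight)
    return [sum(w * v for w, v in zip(row, concatenated)) for row in weight]


n, num_heads = 2, 3
weight = xavier_uniform_matrix(n, n * num_heads)      # shape (n, n * H)
head_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # H outputs of width n
fused = linear_fusion(head_outputs, weight)           # n values
```

With `n = 2` and three heads, the weight matrix has 2 rows and 6 columns, and the fused output again has width 2.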

Source code in tinybig/module/base_layer.py
def initialize_fusion_parameters(self):
    """
    Fusion component parameter initialization method.

    It initializes the learnable parameters for the fusion component.
    The RPN head also allows the linear fusion component to combine the
    outputs of multi-head with learnable parameters.

    Returns
    -------
    None
        The initialization method doesn't have any return values.
    """
    self.fusion_parameters = torch.nn.Parameter(torch.rand(self.n, self.n*len(self.heads)))
    torch.nn.init.xavier_uniform_(self.fusion_parameters)

initialize_parameters(init_type='xavier_uniform', init_bias=True)

Head parameter initialization method.

It initializes the learnable parameters in each head involved in the layer, which will call the parameter initialization method in each of the heads.

Returns:

| Type | Description |
| --- | --- |
| None | The initialization method doesn't have any return values. |
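The two initializers named on this page sample from different uniform ranges. The sketch below computes those ranges for a weight of shape `(1, l)`, using the standard formulas behind `torch.nn.init.xavier_uniform_` and `torch.nn.init.kaiming_uniform_`. The function names and the value `l = 4` are hypothetical; for a `(1, l)` tensor, fan_in is `l` and fan_out is `1`.

```python
import math


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Bound a of U(-a, a) used by Xavier (Glorot) uniform initialization."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def kaiming_uniform_bound(fan_in, a=0.0):
    """Bound b of U(-b, b) for Kaiming uniform with leaky-ReLU slope a."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain * math.sqrt(3.0 / fan_in)


l = 4  # hypothetical number of fusion weights, so the tensor has shape (1, l)
xavier = xavier_uniform_bound(fan_in=l, fan_out=1)
kaiming = kaiming_uniform_bound(fan_in=l, a=math.sqrt(5))
```

With `a = sqrt(5)` the Kaiming bound simplifies to `1 / sqrt(fan_in)`, which is narrower than the Xavier bound for this shape.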

Source code in tinybig/module/base_layer.py
def initialize_parameters(self, init_type='xavier_uniform', init_bias=True):
    """
    Head parameter initialization method.

    It initializes the learnable parameters in each head involved in the layer,
    which will call the parameter initialization method in each of the heads.

    Returns
    -------
    None
        The initialization method doesn't have any return values.
    """
    for head in self.heads:
        head.initialize_parameters(init_type=init_type, init_bias=init_bias)
    if self.w_head_fusion is not None:
        # match the initializer to the requested init_type
        if init_type == 'kaiming_uniform':
            torch.nn.init.kaiming_uniform_(self.w_head_fusion, a=math.sqrt(5))
        else:
            torch.nn.init.xavier_uniform_(self.w_head_fusion)

to_config()

Converts the layer instance into a configuration dictionary.

This method generates a configuration dictionary that captures the class name, its attributes, and the configurations of its heads and fusion components. It can be used to save or reconstruct the layer.

Returns:

| Type | Description |
| --- | --- |
| dict | A dictionary containing the class name and parameters. |

- "layer_class": The full class name of the layer.
- "layer_parameters": A dictionary of layer attributes, including:
  - "name": The name of the layer.
  - "device": The device hosting the layer.
  - "m": The input dimension.
  - "n": The output dimension.
  - "head_configs": A list of configurations for the heads in the layer.
  - "head_fusion_configs": The configuration of the fusion component, if present.
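The layout of the returned dictionary can be sketched with a standalone helper. `layer_to_config` is hypothetical, the head entry uses a made-up class path, and the value of `"layer_class"` assumes the class is importable as `tinybig.module.base_layer.layer`, which is inferred from the file path shown on this page.

```python
import json


def layer_to_config(name, device, m, n, head_configs, head_fusion_config=None):
    """Assemble a dictionary with the keys described above."""
    layer_parameters = {
        "name": name,
        "device": device,
        "m": m,
        "n": n,
        "head_configs": list(head_configs),
    }
    # The fusion entry is only added when a fusion component exists.
    if head_fusion_config is not None:
        layer_parameters["head_fusion_configs"] = head_fusion_config
    return {
        "layer_class": "tinybig.module.base_layer.layer",  # assumed module path
        "layer_parameters": layer_parameters,
    }


# Hypothetical head configuration, for illustration only.
head_config = {"head_class": "my_pkg.my_head", "head_parameters": {"m": 4, "n": 3}}
config = layer_to_config("rpn_layer", "cpu", m=4, n=3, head_configs=[head_config])

# The dictionary holds only plain values, so it survives a JSON round trip.
restored = json.loads(json.dumps(config))
```

Because the result contains only strings, numbers, lists and dictionaries, it can be written to a configuration file and read back unchanged.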

Source code in tinybig/module/base_layer.py
def to_config(self):
    """
    Converts the layer instance into a configuration dictionary.

    This method generates a configuration dictionary that captures the class name, its attributes,
    and the configurations of its heads and fusion components. It can be used to save or reconstruct
    the layer.

    Returns
    -------
    dict
        A dictionary containing the class name and parameters:
        - "layer_class": The full class name of the layer.
        - "layer_parameters": A dictionary of layer attributes, including:
          - "name": The name of the layer.
          - "device": The device hosting the layer.
          - "m": The input dimension.
          - "n": The output dimension.
          - "head_configs": A list of configurations for the heads in the layer.
          - "head_fusion_configs": The configuration of the fusion component, if present.

    """
    layer_class = f"{self.__class__.__module__}.{self.__class__.__name__}"
    layer_parameters = {
        'name': self.name,
        'device': self.device,
        'm': self.m,
        'n': self.n,
        # the attribute is `self.heads`; an empty ModuleList yields an empty list
        'head_configs': [head.to_config() for head in self.heads],
    }

    if self.head_fusion is not None:
        layer_parameters['head_fusion_configs'] = self.head_fusion.to_config()

    return {
        "layer_class": layer_class,
        "layer_parameters": layer_parameters
    }