chain_interdependence_layer

Bases: layer

A chain interdependence layer for capturing sequential dependencies in data.

This layer integrates multiple chain interdependence heads to model sequential interdependencies. It supports features such as multi-hop connections, inverse or exponential approximations, parameter reconciliation, and various output processing functions.
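
A minimal usage sketch is shown below. It is not taken from the library's documentation: the import path is assumed from the source file listed further down (tinybig/layer/chain_based_layers.py), the dimensions m=64, n=32, chain_length=8 are illustrative, and the example assumes these values are compatible with the chain interdependence function and that the base `layer` class follows the usual `torch.nn.Module` calling conventions.

```python
# Minimal sketch under the assumptions stated above: build a single-head
# chain interdependence layer and push a batch of 2-D inputs through it.
import torch
from tinybig.layer.chain_based_layers import chain_interdependence_layer

chain_layer = chain_interdependence_layer(
    m=64, n=32,        # input / output dimensions (illustrative values)
    chain_length=8,    # length of the modeled chain (illustrative value)
    channel_num=1,     # channels per chain interdependence head
    width=1,           # number of chain interdependence heads
    device='cpu',
)

x = torch.randn(16, 64)                    # shape (batch_size, m); forward asserts x.ndim == 2
y = chain_layer.forward(x, device='cpu')
print(y.shape)                             # expected: torch.Size([16, 32]), i.e. (batch_size, n)
```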

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `m` | `int` | The input dimension of the layer. |
| `n` | `int` | The output dimension of the layer. |
| `chain_length` | `int` | The length of the chain for modeling interdependencies. |
| `channel_num` | `int` | The number of channels in each chain interdependence head. |
| `width` | `int` | The number of chain interdependence heads in the layer. |
| `name` | `str` | The name of the layer. |
| `bi_directional` | `bool` | Whether to include bi-directional dependencies in the chain. |
| `with_multihop` | `bool` | Whether to enable multi-hop dependencies. |
| `h` | `int` | The number of hops for multi-hop connections. |
| `accumulative` | `bool` | Whether to accumulate dependencies across hops. |
| `with_inverse_approx` | `bool` | Whether to use inverse approximation for interdependence. |
| `with_exponential_approx` | `bool` | Whether to use exponential approximation for interdependence. |
| `self_dependence` | `bool` | Whether to include self-dependencies in the chain. |
| `self_scaling` | `float` | The scaling factor for self-dependencies. |
| `with_dual_lphm` | `bool` | Whether to use dual LPHM reconciliation for parameters. |
| `with_lorr` | `bool` | Whether to use LORR reconciliation for parameters. |
| `r` | `int` | The rank for parameter reconciliation. |
| `enable_bias` | `bool` | Whether to enable bias in parameter reconciliation. |
| `with_residual` | `bool` | Whether to include a residual connection in the layer. |
| `with_batch_norm` | `bool` | Whether to apply batch normalization to the output. |
| `with_relu` | `bool` | Whether to apply ReLU activation to the output. |
| `with_dropout` | `bool` | Whether to apply dropout to the output. |
| `p` | `float` | Dropout probability. |
| `with_softmax` | `bool` | Whether to apply softmax activation to the output. |
| `parameters_init_method` | `str` | The initialization method for parameters. |
| `device` | `str` | The device to run the layer on ('cpu' or 'cuda'). |

Methods:

| Name | Description |
| --- | --- |
| `__init__` | Initializes the chain interdependence layer with specified parameters. |
| `forward` | Performs a forward pass through the layer. |

Source code in tinybig/layer/chain_based_layers.py
class chain_interdependence_layer(layer):
    """
    A chain interdependence layer for capturing sequential dependencies in data.

    This layer integrates multiple chain interdependence heads to model sequential interdependencies.
    It supports features such as multi-hop connections, inverse or exponential approximations,
    parameter reconciliation, and various output processing functions.

    Attributes
    ----------
    m : int
        The input dimension of the layer.
    n : int
        The output dimension of the layer.
    chain_length : int
        The length of the chain for modeling interdependencies.
    channel_num : int
        The number of channels in each chain interdependence head.
    width : int
        The number of chain interdependence heads in the layer.
    name : str
        The name of the layer.
    bi_directional : bool
        Whether to include bi-directional dependencies in the chain.
    with_multihop : bool
        Whether to enable multi-hop dependencies.
    h : int
        The number of hops for multi-hop connections.
    accumulative : bool
        Whether to accumulate dependencies across hops.
    with_inverse_approx : bool
        Whether to use inverse approximation for interdependence.
    with_exponential_approx : bool
        Whether to use exponential approximation for interdependence.
    self_dependence : bool
        Whether to include self-dependencies in the chain.
    self_scaling : float
        The scaling factor for self-dependencies.
    with_dual_lphm : bool
        Whether to use dual LPHM reconciliation for parameters.
    with_lorr : bool
        Whether to use LORR reconciliation for parameters.
    r : int
        The rank for parameter reconciliation.
    enable_bias : bool
        Whether to enable bias in parameter reconciliation.
    with_residual : bool
        Whether to include a residual connection in the layer.
    with_batch_norm : bool
        Whether to apply batch normalization to the output.
    with_relu : bool
        Whether to apply ReLU activation to the output.
    with_dropout : bool
        Whether to apply dropout to the output.
    p : float
        Dropout probability.
    with_softmax : bool
        Whether to apply softmax activation to the output.
    parameters_init_method : str
        The initialization method for parameters.
    device : str
        The device to run the layer on ('cpu' or 'cuda').

    Methods
    -------
    __init__(...)
        Initializes the chain interdependence layer with specified parameters.
    forward(x, fusion_strategy='average', device='cpu', *args, **kwargs)
        Performs a forward pass through the layer.
    """
    def __init__(
        self,
        m: int, n: int,
        chain_length: int,
        channel_num: int = 1,
        width: int = 1,
        name: str = 'chain_interdependence_layer',
        # interdependence function parameters
        bi_directional: bool = False,
        with_multihop: bool = False, h: int = 1, accumulative: bool = False,
        with_inverse_approx: bool = False,
        with_exponential_approx: bool = False,
        self_dependence: bool = True,
        self_scaling: float = 1.0,
        # parameter reconciliation function parameters
        with_dual_lphm: bool = False,
        with_lorr: bool = False, r: int = 3,
        enable_bias: bool = False,
        # remainder function parameters
        with_residual: bool = False,
        # output processing parameters
        with_batch_norm: bool = False,
        with_relu: bool = True,
        with_dropout: bool = False, p: float = 0.25,
        with_softmax: bool = True,
        # other parameters
        parameters_init_method: str = 'xavier_normal',
        device: str = 'cpu', *args, ** kwargs
    ):
        """
        Initializes the chain interdependence layer.

        Parameters
        ----------
        m : int
            The input dimension of the layer.
        n : int
            The output dimension of the layer.
        chain_length : int
            The length of the chain for modeling interdependencies.
        channel_num : int, default=1
            The number of channels in each chain interdependence head.
        width : int, default=1
            The number of chain interdependence heads in the layer.
        name : str, default='chain_interdependence_layer'
            The name of the layer.
        bi_directional : bool, default=False
            Whether to include bi-directional dependencies in the chain.
        with_multihop : bool, default=False
            Whether to enable multi-hop dependencies.
        h : int, default=1
            The number of hops for multi-hop connections.
        accumulative : bool, default=False
            Whether to accumulate dependencies across hops.
        with_inverse_approx : bool, default=False
            Whether to use inverse approximation for interdependence.
        with_exponential_approx : bool, default=False
            Whether to use exponential approximation for interdependence.
        self_dependence : bool, default=True
            Whether to include self-dependencies in the chain.
        self_scaling : float, default=1.0
            The scaling factor for self-dependencies.
        with_dual_lphm : bool, default=False
            Whether to use dual LPHM reconciliation for parameters.
        with_lorr : bool, default=False
            Whether to use LORR reconciliation for parameters.
        r : int, default=3
            The rank for parameter reconciliation.
        enable_bias : bool, default=False
            Whether to enable bias in parameter reconciliation.
        with_residual : bool, default=False
            Whether to include a residual connection in the layer.
        with_batch_norm : bool, default=False
            Whether to apply batch normalization to the output.
        with_relu : bool, default=True
            Whether to apply ReLU activation to the output.
        with_dropout : bool, default=False
            Whether to apply dropout to the output.
        p : float, default=0.25
            Dropout probability.
        with_softmax : bool, default=True
            Whether to apply softmax activation to the output.
        parameters_init_method : str, default='xavier_normal'
            The initialization method for parameters.
        device : str, default='cpu'
            The device to run the layer on ('cpu' or 'cuda').

        Returns
        -------
        None
        """
        print('* chain_interdependence_layer, width:', width)
        heads = [
            chain_interdependence_head(
                m=m, n=n,
                chain_length=chain_length,
                channel_num=channel_num,
                # -----------------------
                bi_directional=bi_directional,
                with_multihop=with_multihop, h=h, accumulative=accumulative,
                with_inverse_approx=with_inverse_approx,
                with_exponential_approx=with_exponential_approx,
                self_dependence=self_dependence,
                self_scaling=self_scaling,
                # -----------------------
                with_dual_lphm=with_dual_lphm,
                with_lorr=with_lorr, r=r,
                enable_bias=enable_bias,
                # -----------------------
                with_residual=with_residual,
                # -----------------------
                with_batch_norm=with_batch_norm,
                with_relu=with_relu,
                with_dropout=with_dropout, p=p,
                with_softmax=with_softmax,
                # -----------------------
                parameters_init_method=parameters_init_method,
                device=device, *args, ** kwargs
            )
        ] * width
        print('--------------------------')
        super().__init__(name=name, m=m, n=n, heads=heads, device=device, *args, **kwargs)

    def forward(self, x: torch.Tensor, fusion_strategy: str = 'average', device: str = 'cpu', *args, **kwargs):
        """
        Performs a forward pass through the chain interdependence layer.

        Parameters
        ----------
        x : torch.Tensor
            The input tensor with shape `(batch_size, m)`.
        fusion_strategy : str, default='average'
            The strategy for fusing outputs from multiple heads.
        device : str, default='cpu'
            The device to run the computation on ('cpu' or 'cuda').

        Returns
        -------
        torch.Tensor
            The output tensor with shape `(batch_size, n)` after applying the layer.
        """
        assert x is not None and x.ndim == 2

        results = []
        for head in self.heads:
            results.append(head(x=x, device=device))
        assert results != [] and [results[0].shape] * len(results) == [result.shape for result in results]

        if self.head_fusion is not None:
            assert self.head_fusion.get_num() == len(results) and [results[0].shape] * len(results) == [result.shape for result in results]
            result = self.head_fusion(x=results, w=self.w_head_fusion, device=device)
        else:
            assert len(results) == 1
            result = results[0]

        return result

__init__(m, n, chain_length, channel_num=1, width=1, name='chain_interdependence_layer', bi_directional=False, with_multihop=False, h=1, accumulative=False, with_inverse_approx=False, with_exponential_approx=False, self_dependence=True, self_scaling=1.0, with_dual_lphm=False, with_lorr=False, r=3, enable_bias=False, with_residual=False, with_batch_norm=False, with_relu=True, with_dropout=False, p=0.25, with_softmax=True, parameters_init_method='xavier_normal', device='cpu', *args, **kwargs)

Initializes the chain interdependence layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `m` | `int` | The input dimension of the layer. | required |
| `n` | `int` | The output dimension of the layer. | required |
| `chain_length` | `int` | The length of the chain for modeling interdependencies. | required |
| `channel_num` | `int` | The number of channels in each chain interdependence head. | `1` |
| `width` | `int` | The number of chain interdependence heads in the layer. | `1` |
| `name` | `str` | The name of the layer. | `'chain_interdependence_layer'` |
| `bi_directional` | `bool` | Whether to include bi-directional dependencies in the chain. | `False` |
| `with_multihop` | `bool` | Whether to enable multi-hop dependencies. | `False` |
| `h` | `int` | The number of hops for multi-hop connections. | `1` |
| `accumulative` | `bool` | Whether to accumulate dependencies across hops. | `False` |
| `with_inverse_approx` | `bool` | Whether to use inverse approximation for interdependence. | `False` |
| `with_exponential_approx` | `bool` | Whether to use exponential approximation for interdependence. | `False` |
| `self_dependence` | `bool` | Whether to include self-dependencies in the chain. | `True` |
| `self_scaling` | `float` | The scaling factor for self-dependencies. | `1.0` |
| `with_dual_lphm` | `bool` | Whether to use dual LPHM reconciliation for parameters. | `False` |
| `with_lorr` | `bool` | Whether to use LORR reconciliation for parameters. | `False` |
| `r` | `int` | The rank for parameter reconciliation. | `3` |
| `enable_bias` | `bool` | Whether to enable bias in parameter reconciliation. | `False` |
| `with_residual` | `bool` | Whether to include a residual connection in the layer. | `False` |
| `with_batch_norm` | `bool` | Whether to apply batch normalization to the output. | `False` |
| `with_relu` | `bool` | Whether to apply ReLU activation to the output. | `True` |
| `with_dropout` | `bool` | Whether to apply dropout to the output. | `False` |
| `p` | `float` | Dropout probability. | `0.25` |
| `with_softmax` | `bool` | Whether to apply softmax activation to the output. | `True` |
| `parameters_init_method` | `str` | The initialization method for parameters. | `'xavier_normal'` |
| `device` | `str` | The device to run the layer on ('cpu' or 'cuda'). | `'cpu'` |

Returns:

| Type | Description |
| --- | --- |
| `None` | |
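
The constructor sketch below (not from the library's documentation) enables several of the optional behaviors listed above. The argument names follow the signature shown here; the values are illustrative and assumed to be mutually compatible. As the source below shows, every one of these keywords is forwarded to each chain_interdependence_head the layer creates.

```python
# Hypothetical configuration sketch: keyword values are illustrative only.
from tinybig.layer.chain_based_layers import chain_interdependence_layer

chain_layer = chain_interdependence_layer(
    m=128, n=64, chain_length=16,        # illustrative dimensions
    width=2, channel_num=2,              # two heads, two channels per head
    bi_directional=True,                 # chain dependencies in both directions
    with_multihop=True, h=2,             # 2-hop connections ...
    accumulative=True,                   # ... accumulated across hops
    with_lorr=True, r=4,                 # LORR parameter reconciliation with rank 4
    enable_bias=True,
    with_residual=True,                  # residual (remainder) connection
    with_batch_norm=True, with_relu=True,
    with_dropout=True, p=0.25,
    with_softmax=False,
    parameters_init_method='xavier_normal',
    device='cpu',
)
```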
Source code in tinybig/layer/chain_based_layers.py
def __init__(
    self,
    m: int, n: int,
    chain_length: int,
    channel_num: int = 1,
    width: int = 1,
    name: str = 'chain_interdependence_layer',
    # interdependence function parameters
    bi_directional: bool = False,
    with_multihop: bool = False, h: int = 1, accumulative: bool = False,
    with_inverse_approx: bool = False,
    with_exponential_approx: bool = False,
    self_dependence: bool = True,
    self_scaling: float = 1.0,
    # parameter reconciliation function parameters
    with_dual_lphm: bool = False,
    with_lorr: bool = False, r: int = 3,
    enable_bias: bool = False,
    # remainder function parameters
    with_residual: bool = False,
    # output processing parameters
    with_batch_norm: bool = False,
    with_relu: bool = True,
    with_dropout: bool = False, p: float = 0.25,
    with_softmax: bool = True,
    # other parameters
    parameters_init_method: str = 'xavier_normal',
    device: str = 'cpu', *args, ** kwargs
):
    """
    Initializes the chain interdependence layer.

    Parameters
    ----------
    m : int
        The input dimension of the layer.
    n : int
        The output dimension of the layer.
    chain_length : int
        The length of the chain for modeling interdependencies.
    channel_num : int, default=1
        The number of channels in each chain interdependence head.
    width : int, default=1
        The number of chain interdependence heads in the layer.
    name : str, default='chain_interdependence_layer'
        The name of the layer.
    bi_directional : bool, default=False
        Whether to include bi-directional dependencies in the chain.
    with_multihop : bool, default=False
        Whether to enable multi-hop dependencies.
    h : int, default=1
        The number of hops for multi-hop connections.
    accumulative : bool, default=False
        Whether to accumulate dependencies across hops.
    with_inverse_approx : bool, default=False
        Whether to use inverse approximation for interdependence.
    with_exponential_approx : bool, default=False
        Whether to use exponential approximation for interdependence.
    self_dependence : bool, default=True
        Whether to include self-dependencies in the chain.
    self_scaling : float, default=1.0
        The scaling factor for self-dependencies.
    with_dual_lphm : bool, default=False
        Whether to use dual LPHM reconciliation for parameters.
    with_lorr : bool, default=False
        Whether to use LORR reconciliation for parameters.
    r : int, default=3
        The rank for parameter reconciliation.
    enable_bias : bool, default=False
        Whether to enable bias in parameter reconciliation.
    with_residual : bool, default=False
        Whether to include a residual connection in the layer.
    with_batch_norm : bool, default=False
        Whether to apply batch normalization to the output.
    with_relu : bool, default=True
        Whether to apply ReLU activation to the output.
    with_dropout : bool, default=False
        Whether to apply dropout to the output.
    p : float, default=0.25
        Dropout probability.
    with_softmax : bool, default=True
        Whether to apply softmax activation to the output.
    parameters_init_method : str, default='xavier_normal'
        The initialization method for parameters.
    device : str, default='cpu'
        The device to run the layer on ('cpu' or 'cuda').

    Returns
    -------
    None
    """
    print('* chain_interdependence_layer, width:', width)
    heads = [
        chain_interdependence_head(
            m=m, n=n,
            chain_length=chain_length,
            channel_num=channel_num,
            # -----------------------
            bi_directional=bi_directional,
            with_multihop=with_multihop, h=h, accumulative=accumulative,
            with_inverse_approx=with_inverse_approx,
            with_exponential_approx=with_exponential_approx,
            self_dependence=self_dependence,
            self_scaling=self_scaling,
            # -----------------------
            with_dual_lphm=with_dual_lphm,
            with_lorr=with_lorr, r=r,
            enable_bias=enable_bias,
            # -----------------------
            with_residual=with_residual,
            # -----------------------
            with_batch_norm=with_batch_norm,
            with_relu=with_relu,
            with_dropout=with_dropout, p=p,
            with_softmax=with_softmax,
            # -----------------------
            parameters_init_method=parameters_init_method,
            device=device, *args, ** kwargs
        )
    ] * width
    print('--------------------------')
    super().__init__(name=name, m=m, n=n, heads=heads, device=device, *args, **kwargs)

forward(x, fusion_strategy='average', device='cpu', *args, **kwargs)

Performs a forward pass through the chain interdependence layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x` | `Tensor` | The input tensor with shape (batch_size, m). | required |
| `fusion_strategy` | `str` | The strategy for fusing outputs from multiple heads. | `'average'` |
| `device` | `str` | The device to run the computation on ('cpu' or 'cuda'). | `'cpu'` |

Returns:

| Type | Description |
| --- | --- |
| `Tensor` | The output tensor with shape (batch_size, n) after applying the layer. |
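
The sketch below restates the forward contract implied by the source that follows; it reuses the chain_layer instance from the constructor sketch earlier on this page and uses assumed batch and feature sizes.

```python
# Forward-pass sketch: the input must be 2-D with shape (batch_size, m);
# each head returns a tensor of identical shape, and with width > 1 the
# per-head results are combined by the layer's head-fusion function.
import torch

x = torch.randn(8, 128)                      # (batch_size, m) for the m=128 layer above
out = chain_layer.forward(x, device='cpu')
assert out.shape == (8, 64)                  # (batch_size, n)
```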

Source code in tinybig/layer/chain_based_layers.py
def forward(self, x: torch.Tensor, fusion_strategy: str = 'average', device: str = 'cpu', *args, **kwargs):
    """
    Performs a forward pass through the chain interdependence layer.

    Parameters
    ----------
    x : torch.Tensor
        The input tensor with shape `(batch_size, m)`.
    fusion_strategy : str, default='average'
        The strategy for fusing outputs from multiple heads.
    device : str, default='cpu'
        The device to run the computation on ('cpu' or 'cuda').

    Returns
    -------
    torch.Tensor
        The output tensor with shape `(batch_size, n)` after applying the layer.
    """
    assert x is not None and x.ndim == 2

    results = []
    for head in self.heads:
        results.append(head(x=x, device=device))
    assert results != [] and [results[0].shape] * len(results) == [result.shape for result in results]

    if self.head_fusion is not None:
        assert self.head_fusion.get_num() == len(results) and [results[0].shape] * len(results) == [result.shape for result in results]
        result = self.head_fusion(x=results, w=self.w_head_fusion, device=device)
    else:
        assert len(results) == 1
        result = results[0]

    return result