feature_selection_compression

Bases: transformation

The feature selection based data compression function.

It performs the data compression with the provided feature selection method, which incrementally selects useful features from the provided data batch.

Notes

Formally, given an input data instance \(\mathbf{x} \in {R}^m\), we can represent the feature selection-based data compression function as follows:

\[
    \begin{equation}
    \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in {R}^d.
    \end{equation}
\]

The output dimension \(d\) may require manual setup, e.g., as a hyper-parameter \(D\).
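
As a rough illustration of the mapping above, the following sketch compresses a batch from \(m = 4\) to \(d = 2\) columns by keeping the highest-variance features. The variance criterion and the helper name `select_top_variance` are assumptions made for this example only; the actual selection rule is whatever the configured `fs_function` implements.

```python
from statistics import pvariance


def select_top_variance(batch, d):
    """Keep the d columns with the largest variance (a stand-in selection rule)."""
    m = len(batch[0])
    if not 0 < d <= m:
        raise ValueError(f"d must satisfy 0 < d <= m, got d={d}, m={m}")
    # Score every column by its population variance across the batch.
    scores = [pvariance([row[j] for row in batch]) for j in range(m)]
    # Take the d best-scoring column indices, then restore the original column order.
    keep = sorted(sorted(range(m), key=lambda j: scores[j], reverse=True)[:d])
    return [[row[j] for j in keep] for row in batch], keep


batch = [
    [1.0, 5.0, 0.0, 2.0],
    [1.0, 7.0, 0.1, 9.0],
    [1.0, 6.0, 0.0, 4.0],
]
compressed, kept = select_top_variance(batch, d=2)
print(kept)        # [1, 3]
print(compressed)  # [[5.0, 2.0], [7.0, 9.0], [6.0, 4.0]]
```

Each row keeps its position in the batch; only the feature axis shrinks, which is the \({R}^m \to {R}^d\) behaviour described above.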

Parameters:

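| Name | Type | Description | Default |
|---|---|---|---|
| `D` | `int` | Number of features to retain after compression. | required |
| `name` | `str` | Name of the transformation. | `'feature_selection_compression'` |
| `fs_function` | `feature_selection` | A pre-configured feature selection function. Takes precedence over `fs_function_configs` when both are given. | `None` |
| `fs_function_configs` | `dict` | Configuration for initializing the feature selection function. Must contain the key `'function_class'` (the class name string resolved by `config.get_obj_from_str`) and may contain `'function_parameters'` (a dict of constructor keyword arguments; `n_feature` is set to `D` if absent). | `None` |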

Raises:

| Type | Description |
|---|---|
| `ValueError` | If neither `fs_function` nor `fs_function_configs` is specified. |

Methods:

| Name | Description |
|---|---|
| `__init__` | Initializes the feature selection based compression function. |
| `calculate_D` | It validates and returns the specified number of features (`D`). |
| `forward` | It applies the feature selection and compression function to the input tensor. |

Source code in tinybig/compression/feature_selection_compression.py
class feature_selection_compression(transformation):
    r"""
        The feature selection based data compression function.

        It performs the data compression with the provided feature selection method,
        which incrementally selects useful features from the provided data batch.

        Notes
        ----------
        Formally, given an input data instance $\mathbf{x} \in {R}^m$, we can represent the feature selection-based data compression function as follows:

        $$
            \begin{equation}
            \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in {R}^d.
            \end{equation}
        $$

        The output dimension $d$ may require manual setup, e.g., as a hyper-parameter $D$.

        Parameters
        ----------
        D : int
            Number of features to retain after compression.
        name : str, default = 'feature_selection_compression'
            Name of the transformation.
        fs_function : feature_selection, default = None
            A pre-configured feature selection function.
        fs_function_configs : dict, default = None
            Configuration for initializing the feature selection function. Should include the class name
            and optional parameters.

        Raises
        ------
        ValueError
            If neither `fs_function` nor `fs_function_configs` is specified.

        Methods
        -------
        __init__(D, name='feature_selection_compression', fs_function=None, fs_function_configs=None, *args, **kwargs)
            Initializes the feature selection based compression function.
        calculate_D(m: int)
            It validates and returns the specified number of features (`D`).
        forward(x: torch.Tensor, device: str = 'cpu', *args, **kwargs)
            It applies the feature selection and compression function to the input tensor.
    """
    def __init__(self, D: int, name='feature_selection_compression', fs_function: feature_selection = None, fs_function_configs: dict = None, *args, **kwargs):
        """
            The initialization method of the feature selection based compression function.

            It initializes the compression function based on
            the provided feature selection method (or its configs).

            Parameters
            ----------
            D : int
                Number of features to retain after compression.
            name : str, default = 'feature_selection_compression'
                Name of the transformation.
            fs_function : feature_selection, default = None
                A pre-configured feature selection function.
            fs_function_configs : dict, default = None
                Configuration for initializing the feature selection function. Should include the class name
                and optional parameters.

            Returns
            ----------
            transformation
                The feature selection based compression function.
        """

        super().__init__(name=name, *args, **kwargs)
        self.D = D

        if fs_function is not None:
            # A ready-made selector takes precedence over any configs.
            self.fs_function = fs_function
        elif fs_function_configs is not None:
            function_class = fs_function_configs['function_class']
            # Copy so that injecting n_feature does not mutate the caller's dict.
            function_parameters = dict(fs_function_configs.get('function_parameters', {}))
            if 'n_feature' in function_parameters:
                assert function_parameters['n_feature'] == D, 'n_feature in fs_function_configs must equal D'
            else:
                function_parameters['n_feature'] = D
            self.fs_function = config.get_obj_from_str(function_class)(**function_parameters)
        else:
            raise ValueError('You must specify either fs_function or fs_function_configs...')

    def calculate_D(self, m: int):
        """
            The compression dimension calculation method.

            It returns the output dimension of the compression, which is the configured `D`.
            The input dimension `m` is used only for validation: `D` must be set and must be less than or equal to `m`.

            Parameters
            ----------
            m : int
                Total number of features in the input.

            Returns
            -------
            int
                The number of features to retain (`D`).

            Raises
            ------
            AssertionError
                If `D` is not set or is greater than `m`.
        """
        assert self.D is not None and self.D <= m, 'You must specify a D that is no larger than m!'
        return self.D

    def forward(self, x: torch.Tensor, device: str = 'cpu', *args, **kwargs):
        r"""
            The forward method of the feature selection based compression function.

            It applies the feature selection based compression function to the input tensor.

            Formally, given an input data instance $\mathbf{x} \in {R}^m$, we can represent the feature selection-based data compression function as follows:

            $$
                \begin{equation}
                \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in {R}^d.
                \end{equation}
            $$

            Parameters
            ----------
            x : torch.Tensor
                Input tensor of shape `(batch_size, num_features)`.
            device : str, optional
                Device for computation (e.g., 'cpu' or 'cuda'). Defaults to 'cpu'.

            Returns
            -------
            torch.Tensor
                Compressed tensor of shape `(batch_size, D)`.

            Raises
            ------
            AssertionError
                If the output tensor shape does not match the expected `(batch_size, D)`.
        """
        b, m = x.shape
        x = self.pre_process(x=x, device=device)

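        # Tensor.numpy() raises for CUDA tensors and for tensors that require grad,
        # so detach and move to CPU before the NumPy round trip.
        compression = self.fs_function(torch.from_numpy(x.detach().cpu().numpy())).to(device)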

        assert compression.shape == (b, self.calculate_D(m=m))
        return self.post_process(x=compression, device=device)

__init__(D, name='feature_selection_compression', fs_function=None, fs_function_configs=None, *args, **kwargs)

The initialization method of the feature selection based compression function.

It initializes the compression function based on the provided feature selection method (or its configs).

Parameters:

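| Name | Type | Description | Default |
|---|---|---|---|
| `D` | `int` | Number of features to retain after compression. | required |
| `name` | `str` | Name of the transformation. | `'feature_selection_compression'` |
| `fs_function` | `feature_selection` | A pre-configured feature selection function. Takes precedence over `fs_function_configs` when both are given. | `None` |
| `fs_function_configs` | `dict` | Configuration for initializing the feature selection function. Must contain the key `'function_class'` (the class name string resolved by `config.get_obj_from_str`) and may contain `'function_parameters'` (a dict of constructor keyword arguments; `n_feature` is set to `D` if absent). | `None` |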

Returns:

| Type | Description |
|---|---|
| `transformation` | The feature selection based compression function. |
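
The resolution order described above (a ready-made `fs_function` first, then `fs_function_configs`, otherwise an error) can be sketched without the library. In this sketch, `get_obj_from_str` is a hypothetical stand-in for `config.get_obj_from_str` built on `importlib`, and `types.SimpleNamespace` stands in for a real selector class purely because it accepts arbitrary keyword arguments; neither reflects how the library itself is implemented.

```python
import importlib


def get_obj_from_str(path):
    """Hypothetical stand-in for config.get_obj_from_str: resolve 'module.Name'."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def resolve_fs_function(D, fs_function=None, fs_function_configs=None):
    """Pick the selector the same way the constructor's branches are ordered."""
    if fs_function is not None:
        # A pre-configured selector wins over any configs.
        return fs_function
    if fs_function_configs is not None:
        function_class = fs_function_configs["function_class"]
        # Copy the parameters so the caller's dict is left untouched.
        params = dict(fs_function_configs.get("function_parameters", {}))
        # n_feature defaults to D; an explicit value must agree with D.
        assert params.setdefault("n_feature", D) == D, "n_feature must equal D"
        return get_obj_from_str(function_class)(**params)
    raise ValueError("You must specify either fs_function or fs_function_configs...")


# Stand-in selector class: SimpleNamespace simply stores the keyword arguments.
selector = resolve_fs_function(
    D=3,
    fs_function_configs={"function_class": "types.SimpleNamespace"},
)
print(selector.n_feature)  # 3
```

Calling `resolve_fs_function(D=3)` with neither argument raises `ValueError`, matching the Raises entry for the class.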

Source code in tinybig/compression/feature_selection_compression.py
def __init__(self, D: int, name='feature_selection_compression', fs_function: feature_selection = None, fs_function_configs: dict = None, *args, **kwargs):
    """
        The initialization method of the feature selection based compression function.

        It initializes the compression function based on
        the provided feature selection method (or its configs).

        Parameters
        ----------
        D : int
            Number of features to retain after compression.
        name : str, default = 'feature_selection_compression'
            Name of the transformation.
        fs_function : feature_selection, default = None
            A pre-configured feature selection function.
        fs_function_configs : dict, default = None
            Configuration for initializing the feature selection function. Should include the class name
            and optional parameters.

        Returns
        ----------
        transformation
            The feature selection based compression function.
    """

    super().__init__(name=name, *args, **kwargs)
    self.D = D

    if fs_function is not None:
        # A ready-made selector takes precedence over any configs.
        self.fs_function = fs_function
    elif fs_function_configs is not None:
        function_class = fs_function_configs['function_class']
        # Copy so that injecting n_feature does not mutate the caller's dict.
        function_parameters = dict(fs_function_configs.get('function_parameters', {}))
        if 'n_feature' in function_parameters:
            assert function_parameters['n_feature'] == D, 'n_feature in fs_function_configs must equal D'
        else:
            function_parameters['n_feature'] = D
        self.fs_function = config.get_obj_from_str(function_class)(**function_parameters)
    else:
        raise ValueError('You must specify either fs_function or fs_function_configs...')

calculate_D(m)

The compression dimension calculation method.

It returns the output dimension of the compression, which is the configured D. The input dimension m is used only for validation: D must be set and must be less than or equal to m.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `m` | `int` | Total number of features in the input. | required |

Returns:

| Type | Description |
|---|---|
| `int` | The number of features to retain (`D`). |

Raises:

| Type | Description |
|---|---|
| `AssertionError` | If `D` is not set or is greater than `m`. |
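
The boundary behaviour of this check is easy to misread, so here is a minimal standalone sketch of it. The function name `check_output_dim` is hypothetical; it mirrors only the condition documented above and is not the library method itself.

```python
def check_output_dim(D, m):
    """Return D after checking that it is set and does not exceed m."""
    assert D is not None and D <= m, "D must be set and be at most m"
    return D


print(check_output_dim(D=3, m=10))   # 3
print(check_output_dim(D=10, m=10))  # 10, since D == m is accepted

try:
    check_output_dim(D=11, m=10)
except AssertionError as err:
    print("rejected:", err)
```

Because the check is an `assert`, it is skipped when Python runs with the `-O` flag; code that must reject an oversized `D` in optimized mode would need an explicit `if ... raise` instead.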

Source code in tinybig/compression/feature_selection_compression.py
def calculate_D(self, m: int):
    """
        The compression dimension calculation method.

        It returns the output dimension of the compression, which is the configured `D`.
        The input dimension `m` is used only for validation: `D` must be set and must be less than or equal to `m`.

        Parameters
        ----------
        m : int
            Total number of features in the input.

        Returns
        -------
        int
            The number of features to retain (`D`).

        Raises
        ------
        AssertionError
            If `D` is not set or is greater than `m`.
    """
    assert self.D is not None and self.D <= m, 'You must specify a D that is no larger than m!'
    return self.D

forward(x, device='cpu', *args, **kwargs)

The forward method of the feature selection based compression function.

It applies the feature selection based compression function to the input tensor.

Formally, given an input data instance \(\mathbf{x} \in {R}^m\), we can represent the feature selection-based data compression function as follows:

\[
    \begin{equation}
    \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in {R}^d.
    \end{equation}
\]

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape `(batch_size, num_features)`. | required |
| `device` | `str` | Device for computation (e.g., 'cpu' or 'cuda'). Defaults to 'cpu'. | `'cpu'` |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Compressed tensor of shape `(batch_size, D)`. |

Raises:

| Type | Description |
|---|---|
| `AssertionError` | If the output tensor shape does not match the expected `(batch_size, D)`. |
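
The shape contract above means a selector that disagrees with `D` is caught at call time rather than silently passed on. The following sketch reproduces that contract with plain Python lists instead of tensors; `forward_sketch` and `first_two_columns` are hypothetical names for this example, and the pre- and post-processing steps of the real method are omitted.

```python
def forward_sketch(batch, selector, D):
    """Apply a selector, then enforce the (batch_size, D) output contract."""
    b, m = len(batch), len(batch[0])
    compressed = selector(batch)
    # Same validation as calculate_D: D must be set and must not exceed m.
    assert D is not None and D <= m, "D must be set and be at most m"
    shape = (len(compressed), len(compressed[0]))
    assert shape == (b, D), f"expected {(b, D)}, got {shape}"
    return compressed


def first_two_columns(batch):
    """Toy selector that always keeps the first two features."""
    return [row[:2] for row in batch]


batch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
out = forward_sketch(batch, first_two_columns, D=2)
print(out)  # [[1.0, 2.0], [4.0, 5.0]]

try:
    # The selector still returns 2 columns, so D=3 violates the contract.
    forward_sketch(batch, first_two_columns, D=3)
except AssertionError as err:
    print("shape check failed:", err)
```

This is why the constructor forces `n_feature` to equal `D` when building the selector from configs: the two values must agree for the final assertion to pass.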

Source code in tinybig/compression/feature_selection_compression.py
def forward(self, x: torch.Tensor, device: str = 'cpu', *args, **kwargs):
    r"""
        The forward method of the feature selection based compression function.

        It applies the feature selection based compression function to the input tensor.

        Formally, given an input data instance $\mathbf{x} \in {R}^m$, we can represent the feature selection-based data compression function as follows:

        $$
            \begin{equation}
            \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in {R}^d.
            \end{equation}
        $$

        Parameters
        ----------
        x : torch.Tensor
            Input tensor of shape `(batch_size, num_features)`.
        device : str, optional
            Device for computation (e.g., 'cpu' or 'cuda'). Defaults to 'cpu'.

        Returns
        -------
        torch.Tensor
            Compressed tensor of shape `(batch_size, D)`.

        Raises
        ------
        AssertionError
            If the output tensor shape does not match the expected `(batch_size, D)`.
    """
    b, m = x.shape
    x = self.pre_process(x=x, device=device)

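    # Tensor.numpy() raises for CUDA tensors and for tensors that require grad,
    # so detach and move to CPU before the NumPy round trip.
    compression = self.fs_function(torch.from_numpy(x.detach().cpu().numpy())).to(device)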

    assert compression.shape == (b, self.calculate_D(m=m))
    return self.post_process(x=compression, device=device)