feature_selection_compression

Bases: transformation

The feature selection based data compression function.

It performs the data compression with the provided feature selection method, which incrementally selects useful features from the provided data batch.

Notes

Formally, given an input data instance \(\mathbf{x} \in \mathbb{R}^m\), we can represent the feature selection-based data compression function as follows:

\[
    \begin{equation}
    \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in \mathbb{R}^d.
    \end{equation}
\]

The output dimension \(d\) is not inferred from the data; it is set manually through the hyper-parameter \(D\).
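To make the mapping from \(m\) features down to \(d\) features concrete, here is a minimal plain-Python sketch. It assumes a variance-based scoring rule purely for illustration; the helper names `select_top_variance` and `kappa` are hypothetical and are not part of the library, whose actual selection method is whatever `fs_function` provides.

```python
from statistics import pvariance


def select_top_variance(batch, d):
    """Return the sorted indices of the d columns with the largest variance.

    Illustrative scoring rule only; a real feature selector may use any criterion.
    """
    m = len(batch[0])
    # Score each of the m features by its population variance over the batch.
    scores = [pvariance([row[j] for row in batch]) for j in range(m)]
    # Keep the d best-scoring feature indices, restored to their original order.
    top = sorted(range(m), key=lambda j: scores[j], reverse=True)[:d]
    return sorted(top)


def kappa(x, indices):
    """Compress one instance x with m features by keeping only the selected ones."""
    return [x[j] for j in indices]


batch = [
    [1.0, 5.0, 0.0, 2.0],
    [1.0, 9.0, 0.1, 4.0],
    [1.0, 1.0, 0.2, 6.0],
]
indices = select_top_variance(batch, d=2)   # here m = 4 and d = 2
compressed = kappa(batch[0], indices)
print(indices, compressed)
```

In this toy batch the constant first column and the nearly constant third column are dropped, so each instance shrinks from four values to two.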

Parameters:

Name	Type	Description	Default
`D`	`int`	Number of features to retain after compression.	required
`name`	`str`	Name of the transformation.	`= 'feature_selection_compression'`
`fs_function`	`feature_selection`	A pre-configured feature selection function.	`= None`
`fs_function_configs`	`dict`	Configuration for initializing the feature selection function. Should include the class name and optional parameters.	`= None`

Raises:

Type	Description
`ValueError`	If neither `fs_function` nor `fs_function_configs` is specified.
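The way `D` interacts with `fs_function_configs` can be sketched in isolation. The snippet below mirrors the constructor logic shown in the source listing (the `function_class` and `function_parameters` keys, and the `n_feature` check), but it is a standalone approximation: the helper name `resolve_fs_parameters`, the class path `my_package.my_selector`, and the `threshold` parameter are hypothetical placeholders, and the class is not actually instantiated.

```python
def resolve_fs_parameters(D, fs_function_configs):
    """Reconcile D with the configured n_feature, as the constructor appears to do."""
    if fs_function_configs is None:
        raise ValueError('You must specify either fs_function or fs_function_configs...')
    function_class = fs_function_configs['function_class']
    # Copy the parameters so the caller's dict is left untouched
    # (the listed constructor updates the nested dict in place instead).
    function_parameters = dict(fs_function_configs.get('function_parameters', {}))
    if 'n_feature' in function_parameters:
        # An explicit n_feature must agree with D.
        assert function_parameters['n_feature'] == D, 'n_feature must equal D'
    else:
        # Otherwise D is injected as n_feature.
        function_parameters['n_feature'] = D
    return function_class, function_parameters


configs = {
    'function_class': 'my_package.my_selector',   # hypothetical class path
    'function_parameters': {'threshold': 0.1},    # hypothetical parameter
}
class_path, params = resolve_fs_parameters(D=8, fs_function_configs=configs)
print(class_path, params)
```

If `function_parameters` already carries an `n_feature` that differs from `D`, the check fails with an `AssertionError` rather than the `ValueError` listed above.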

Methods:

Name	Description
`__init__`	Initializes the feature selection based compression function.
`calculate_D`	It validates and returns the specified number of features (`D`).
`forward`	It applies the feature selection and compression function to the input tensor.

Source code in tinybig/compression/feature_selection_compression.py

class feature_selection_compression(transformation):
    r"""
        The feature selection based data compression function.

        It performs the data compression with the provided feature selection method,
        which incrementally selects useful features from the provided data batch.

        Notes
        ----------
        Formally, given an input data instance $\mathbf{x} \in \mathbb{R}^m$, we can represent the feature selection-based data compression function as follows:

        $$
            \begin{equation}
            \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in \mathbb{R}^d.
            \end{equation}
        $$

        The output dimension $d$ is not inferred from the data; it is set manually through the hyper-parameter $D$.

        Parameters
        ----------
        D : int
            Number of features to retain after compression.
        name : str, default = 'feature_selection_compression'
            Name of the transformation.
        fs_function : feature_selection, default = None
            A pre-configured feature selection function.
        fs_function_configs : dict, default = None
            Configuration for initializing the feature selection function. Should include the class name
            and optional parameters.

        Raises
        ------
        ValueError
            If neither `fs_function` nor `fs_function_configs` is specified.

        Methods
        -------
        __init__(D, name='feature_selection_compression', fs_function=None, fs_function_configs=None, *args, **kwargs)
            Initializes the feature selection based compression function.
        calculate_D(m: int)
            It validates and returns the specified number of features (`D`).
        forward(x: torch.Tensor, device: str = 'cpu', *args, **kwargs)
            It applies the feature selection and compression function to the input tensor.
    """
    def __init__(self, D: int, name='feature_selection_compression', fs_function: feature_selection = None, fs_function_configs: dict = None, *args, **kwargs):
        """
            The initialization method of the feature selection based compression function.

            It initializes the compression function based on
            the provided feature selection method (or its configs).

            Parameters
            ----------
            D : int
                Number of features to retain after compression.
            name : str, default = 'feature_selection_compression'
                Name of the transformation.
            fs_function : feature_selection, default = None
                A pre-configured feature selection function.
            fs_function_configs : dict, default = None
                Configuration for initializing the feature selection function. Should include the class name
                and optional parameters.

            Returns
            ----------
            transformation
                The feature selection based compression function.
        """

        super().__init__(name=name, *args, **kwargs)
        self.D = D

        if fs_function is not None:
            self.fs_function = fs_function
        elif fs_function_configs is not None:
            function_class = fs_function_configs['function_class']
            function_parameters = fs_function_configs['function_parameters'] if 'function_parameters' in fs_function_configs else {}
            if 'n_feature' in function_parameters:
                assert function_parameters['n_feature'] == D
            else:
                function_parameters['n_feature'] = D
            self.fs_function = config.get_obj_from_str(function_class)(**function_parameters)
        else:
            raise ValueError('You must specify either fs_function or fs_function_configs...')

    def calculate_D(self, m: int):
        """
            The compression dimension calculation method.

            It returns the configured compression dimension `D` after checking it against the input dimension `m`.
            `D` must be set and must be less than or equal to `m`.

            Parameters
            ----------
            m : int
                Total number of features in the input.

            Returns
            -------
            int
                The number of features to retain (`D`).

            Raises
            ------
            AssertionError
                If `D` is not set or is greater than `m`.
        """
        assert self.D is not None and self.D <= m, 'You must specify a D that is no greater than m!'
        return self.D

    def forward(self, x: torch.Tensor, device: str = 'cpu', *args, **kwargs):
        r"""
            The forward method of the feature selection based compression function.

            It applies the feature selection based compression function to the input tensor.

            Formally, given an input data instance $\mathbf{x} \in \mathbb{R}^m$, we can represent the feature selection-based data compression function as follows:

            $$
                \begin{equation}
                \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in \mathbb{R}^d.
                \end{equation}
            $$

            Parameters
            ----------
            x : torch.Tensor
                Input tensor of shape `(batch_size, num_features)`.
            device : str, optional
                Device for computation (e.g., 'cpu' or 'cuda'). Defaults to 'cpu'.

            Returns
            -------
            torch.Tensor
                Compressed tensor of shape `(batch_size, D)`.

            Raises
            ------
            AssertionError
                If the output tensor shape does not match the expected `(batch_size, D)`.
        """
        b, m = x.shape
        x = self.pre_process(x=x, device=device)

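        # Tensor.numpy() only works on CPU tensors that do not require grad, so detach and move to CPU first.
        compression = self.fs_function(torch.from_numpy(x.detach().cpu().numpy())).to(device)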

        assert compression.shape == (b, self.calculate_D(m=m))
        return self.post_process(x=compression, device=device)

`__init__(D, name='feature_selection_compression', fs_function=None, fs_function_configs=None, *args, **kwargs)`

The initialization method of the feature selection based compression function.

It initializes the compression function based on the provided feature selection method (or its configs).

Parameters:

Name	Type	Description	Default
`D`	`int`	Number of features to retain after compression.	required
`name`	`str`	Name of the transformation.	`= 'feature_selection_compression'`
`fs_function`	`feature_selection`	A pre-configured feature selection function.	`= None`
`fs_function_configs`	`dict`	Configuration for initializing the feature selection function. Should include the class name and optional parameters.	`= None`

Returns:

Type	Description
`transformation`	The feature selection based compression function.

Source code in tinybig/compression/feature_selection_compression.py

def __init__(self, D: int, name='feature_selection_compression', fs_function: feature_selection = None, fs_function_configs: dict = None, *args, **kwargs):
    """
        The initialization method of the feature selection based compression function.

        It initializes the compression function based on
        the provided feature selection method (or its configs).

        Parameters
        ----------
        D : int
            Number of features to retain after compression.
        name : str, default = 'feature_selection_compression'
            Name of the transformation.
        fs_function : feature_selection, default = None
            A pre-configured feature selection function.
        fs_function_configs : dict, default = None
            Configuration for initializing the feature selection function. Should include the class name
            and optional parameters.

        Returns
        ----------
        transformation
            The feature selection based compression function.
    """

    super().__init__(name=name, *args, **kwargs)
    self.D = D

    if fs_function is not None:
        self.fs_function = fs_function
    elif fs_function_configs is not None:
        function_class = fs_function_configs['function_class']
        function_parameters = fs_function_configs['function_parameters'] if 'function_parameters' in fs_function_configs else {}
        if 'n_feature' in function_parameters:
            assert function_parameters['n_feature'] == D
        else:
            function_parameters['n_feature'] = D
        self.fs_function = config.get_obj_from_str(function_class)(**function_parameters)
    else:
        raise ValueError('You must specify either fs_function or fs_function_configs...')

`calculate_D(m)`

The compression dimension calculation method.

It returns the configured compression dimension `D` after checking it against the input dimension `m`. `D` must be set and must be less than or equal to `m`.

Parameters:

Name	Type	Description	Default
`m`	`int`	Total number of features in the input.	required

Returns:

Type	Description
`int`	The number of features to retain (`D`).

Raises:

Type	Description
`AssertionError`	If `D` is not set or is greater than `m`.
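The boundary behaviour of this check is easy to miss, so here is a standalone sketch of it. The function name `calculate_D_sketch` is hypothetical; it reproduces only the condition `D is not None and D <= m` from the source listing, outside of any class.

```python
def calculate_D_sketch(D, m):
    """Return D after checking that it is set and does not exceed m."""
    assert D is not None and D <= m, 'D must be set and no greater than m'
    return D


# The boundary case D == m is accepted: nothing is actually dropped.
print(calculate_D_sketch(D=4, m=4))
print(calculate_D_sketch(D=2, m=10))

# A missing D or a D larger than m is rejected.
for bad_D in (None, 5):
    try:
        calculate_D_sketch(D=bad_D, m=4)
    except AssertionError as err:
        print(f'D={bad_D} rejected: {err}')
```

Because the check is an `assert` statement, it is skipped when Python runs with the `-O` flag, so callers that need a guaranteed check may want to validate `D` themselves.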

Source code in tinybig/compression/feature_selection_compression.py

def calculate_D(self, m: int):
    """
        The compression dimension calculation method.

        It returns the configured compression dimension `D` after checking it against the input dimension `m`.
        `D` must be set and must be less than or equal to `m`.

        Parameters
        ----------
        m : int
            Total number of features in the input.

        Returns
        -------
        int
            The number of features to retain (`D`).

        Raises
        ------
        AssertionError
            If `D` is not set or is greater than `m`.
    """
    assert self.D is not None and self.D <= m, 'You must specify a D that is no greater than m!'
    return self.D

`forward(x, device='cpu', *args, **kwargs)`

The forward method of the feature selection based compression function.

It applies the feature selection based compression function to the input tensor.

Formally, given an input data instance \(\mathbf{x} \in \mathbb{R}^m\), we can represent the feature selection-based data compression function as follows:

\[
    \begin{equation}
    \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in \mathbb{R}^d.
    \end{equation}
\]

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape `(batch_size, num_features)`.	required
`device`	`str`	Device for computation (e.g., 'cpu' or 'cuda'). Defaults to 'cpu'.	`'cpu'`

Returns:

Type	Description
`Tensor`	Compressed tensor of shape `(batch_size, D)`.

Raises:

Type	Description
`AssertionError`	If the output tensor shape does not match the expected `(batch_size, D)`.
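The shape contract of `forward` can be illustrated with a small PyTorch sketch. Everything here is an assumption made for illustration: `first_d_columns` is a hypothetical stand-in for a real feature selection function, and `forward_sketch` omits the pre- and post-processing steps of the actual method. The sketch detaches the tensor and moves it to the CPU before selection, since `Tensor.numpy()` (used in the source listing) only works on CPU tensors that do not require grad.

```python
import torch


def first_d_columns(x: torch.Tensor, d: int) -> torch.Tensor:
    """Hypothetical stand-in selector: keeps the first d feature columns."""
    return x[:, :d]


def forward_sketch(x: torch.Tensor, D: int, device: str = 'cpu') -> torch.Tensor:
    """Select D of the m input features and check the (batch_size, D) contract."""
    b, m = x.shape
    assert D is not None and D <= m
    # Run the selector on a detached CPU copy, then move the result to the target device.
    compression = first_d_columns(x.detach().cpu(), D).to(device)
    # The output must keep the batch size and have exactly D features.
    assert compression.shape == (b, D)
    return compression


x = torch.arange(12, dtype=torch.float32).reshape(3, 4)   # batch_size = 3, m = 4
out = forward_sketch(x, D=2)
print(out.shape)
```

A selector that returns the wrong number of columns would trip the final shape assertion, which corresponds to the `AssertionError` documented above.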

Source code in tinybig/compression/feature_selection_compression.py

def forward(self, x: torch.Tensor, device: str = 'cpu', *args, **kwargs):
    r"""
        The forward method of the feature selection based compression function.

        It applies the feature selection based compression function to the input tensor.

        Formally, given an input data instance $\mathbf{x} \in \mathbb{R}^m$, we can represent the feature selection-based data compression function as follows:

        $$
            \begin{equation}
            \kappa(\mathbf{x}) = \text{feature-selection}(\mathbf{x}) \in \mathbb{R}^d.
            \end{equation}
        $$

        Parameters
        ----------
        x : torch.Tensor
            Input tensor of shape `(batch_size, num_features)`.
        device : str, optional
            Device for computation (e.g., 'cpu' or 'cuda'). Defaults to 'cpu'.

        Returns
        -------
        torch.Tensor
            Compressed tensor of shape `(batch_size, D)`.

        Raises
        ------
        AssertionError
            If the output tensor shape does not match the expected `(batch_size, D)`.
    """
    b, m = x.shape
    x = self.pre_process(x=x, device=device)

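    # Tensor.numpy() only works on CPU tensors that do not require grad, so detach and move to CPU first.
    compression = self.fs_function(torch.from_numpy(x.detach().cpu().numpy())).to(device)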

    assert compression.shape == (b, self.calculate_D(m=m))
    return self.post_process(x=compression, device=device)