
rpn_head

Bases: Module

The RPN head class for implementing the multi-channel module.

It will be used to compose the RPN layer module for building deep RPN models.

...

Notes

Similar to convolutional neural networks (CNNs) employing multiple filters, RPN allows each head to have multiple channels of parameters applied to the same data expansion. RPN defines its multi-channel parameters as \(\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}\), where \(C\) denotes the number of channels. Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its output with such multi-channel parameters as follows:

$$ \begin{equation} g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}), \end{equation} $$

where these multi-channel parameters are fabricated from length \(l\) to shape \((n, D)\) using the identical parameter reconciliation function.
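The formula above can be sketched in plain PyTorch. This is only an illustration under stated assumptions, not tinybig's implementation: the data expansion \(\kappa\) is taken to be the identity (so \(D = m\)), the reconciliation \(\psi\) is taken to be a plain reshape (so \(l = n \times D\)), and the remainder \(\pi\) is omitted unless one is passed in. The function name `multi_channel_head` and its arguments are hypothetical.

```python
import torch
import torch.nn.functional as F


def multi_channel_head(x, w, n, kappa=lambda t: t, pi=None):
    """Sum per-channel inner products <kappa(x), psi(w^c)> and add pi(x).

    Assumption: psi is a plain reshape of each length-l vector into (n, D).
    """
    kappa_x = kappa(x)                       # expanded input, shape (batch, D)
    D = kappa_x.size(-1)
    out = torch.zeros(x.size(0), n, dtype=x.dtype)
    for c in range(w.size(0)):               # one pass per channel
        psi_w = w[c].view(n, D)              # fabricated matrix, shape (n, D)
        out = out + F.linear(kappa_x, psi_w)
    if pi is not None:                       # optional remainder term
        out = out + pi(x)
    return out


torch.manual_seed(0)
m, n, C = 3, 5, 2
x = torch.rand(4, m)                         # batch of 4 inputs
w = torch.rand(C, n * m)                     # C channels, l = n * D with D = m
y = multi_channel_head(x, w, n=n)
```

Because every channel sees the same expanded input here, the sum over channels equals a single linear map whose weight is the sum of the per-channel matrices.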

Attributes:

Name Type Description
m int

The input dimension of the head.

n int

The output dimension of the head.

l int, default = None

The number of parameters for each channel in the head.

channel_num int, default = 1

The number of channels in the head.

data_transformation object, default = None

The data transformation function of the head. The data transformation can be initialized directly with this parameter or with the data_transformation_configs parameter.

parameter_fabrication object, default = None

The parameter fabrication function of the head. The parameter fabrication can be initialized directly with this parameter or with the parameter_fabrication_configs parameter.

remainder object, default = None

The remainder function of the head. The remainder can be initialized directly with this parameter or with the remainder_configs parameter.

w torch.nn.Parameter, default = None

The parameters used for the parameter reconciliation function of length \(l\) for each channel, which will be fabricated into a parameter matrix of shape \((n, D)\).

b torch.nn.Parameter, default = None

The (optional) bias parameters used for the parameter reconciliation function.

w_remainder torch.nn.Parameter, default = None

The (optional) parameters used for the remainder function.

b_remainder torch.nn.Parameter, default = None

The (optional) bias parameters used for the remainder function.

device str, default = 'cpu'

The device for hosting the head.
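The shapes of the learnable attributes listed above can be sketched as follows. The sizes are arbitrary, and the relations `D = m` and `l = n * D` assume an identity data expansion and an identity reconciliation, which is an illustrative choice and not a requirement of the head.

```python
import torch

m, n, channel_num = 3, 5, 2
D = m          # assumption: identity data expansion
l = n * D      # assumption: identity parameter reconciliation

# One row of length l per channel, as in the attribute description of w.
w = torch.nn.Parameter(torch.rand(channel_num, l))
b = torch.nn.Parameter(torch.rand(n))               # optional bias
w_remainder = torch.nn.Parameter(torch.rand(n, m))  # optional remainder weights
b_remainder = torch.nn.Parameter(torch.rand(n))     # optional remainder bias

# Each channel's row is fabricated into an (n, D) matrix; here by a reshape.
per_channel = [w[c:c + 1, :].view(n, D) for c in range(channel_num)]
```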

Methods:

Name Description
__init__

The initialization method of the RPN-head with multiple channels.

initialize_parameters

The parameter initialization method.

initialize_parameters_kaiming

The kaiming parameter initialization method.

initialize_parameters_xavier

The xavier initialization method.

output_processing

The output processing method of the head.

forward

The forward method of the RPN head module.

__call__

The re-implementation of the builtin callable method based on the forward method.
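The two initialization methods listed above can be illustrated on a stand-alone parameter. This is a minimal sketch that uses only public `torch.nn.init` calls. The helper names `init_kaiming` and `init_xavier` are hypothetical, and the bias bound `1 / sqrt(fan_in)` follows the rule used in the source below.

```python
import math
import torch

channel_num, l, n = 2, 15, 5
w = torch.nn.Parameter(torch.rand(channel_num, l))
b = torch.nn.Parameter(torch.rand(n))


def init_kaiming(w, b):
    # Kaiming-uniform weights with a = sqrt(5), then a uniform bias.
    torch.nn.init.kaiming_uniform_(w, a=math.sqrt(5))
    fan_in = w.size(1)  # for a 2-D tensor PyTorch takes fan_in = size(1)
    bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
    torch.nn.init.uniform_(b, -bound, bound)
    return bound


def init_xavier(w):
    # Xavier-uniform weights; returns the bound sqrt(6 / (fan_in + fan_out)).
    torch.nn.init.xavier_uniform_(w)
    return math.sqrt(6.0 / (w.size(1) + w.size(0)))


kaiming_bound = init_kaiming(w, b)
w_kaiming_max = w.detach().abs().max().item()
b_kaiming_max = b.detach().abs().max().item()

xavier_bound = init_xavier(w)
w_xavier_max = w.detach().abs().max().item()
```

Since `w` is stored with shape `(channel_num, l)`, the fan-in seen by these initializers is `l`, not the column count `D` of the fabricated `(n, D)` matrix. That may matter when comparing against a standard linear layer.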

Source code in tinybig/module/base_head.py
class rpn_head(torch.nn.Module):
    r"""
    The RPN head class for implementing the multi-channel module.

    It will be used to compose the RPN layer module for building deep RPN models.

    ...

    Notes
    ----------
    Similar to convolutional neural networks (CNNs) employing multiple filters, RPN allows each head to have multiple
    channels of parameters applied to the same data expansion.
    RPN defines its multi-channel parameters as $\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}$,
    where $C$ denotes the number of channels.
    Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its
    output with such multi-channel parameters as follows:
    $$
        \begin{equation}
            g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}),
        \end{equation}
    $$
    where these multi-channel parameters are fabricated from length $l$ to shape $(n, D)$ using the identical
    parameter reconciliation function.

    Attributes
    ----------
    m: int
        The input dimension of the head.
    n: int
        The output dimension of the head.
    l: int, default = None
        The number of parameters for each channel in the head.
    channel_num: int, default = 1
        The number of channels in the head.
    data_transformation: object, default = None
        The data transformation function of the head. The data transformation can be initialized directly
        with this parameter or with the data_transformation_configs parameter.
    parameter_fabrication: object, default = None
        The parameter fabrication function of the head. The parameter fabrication can be initialized directly
        with this parameter or with the parameter_fabrication_configs parameter.
    remainder: object, default = None
        The remainder function of the head. The remainder can be initialized directly
        with this parameter or with the remainder_configs parameter.
    w: torch.nn.Parameter, default = None
        The parameters used for the parameter reconciliation function of length $l$ for each channel,
        which will be fabricated into a parameter matrix of shape $(n, D)$.
    b: torch.nn.Parameter, default = None
        The (optional) bias parameters used for the parameter reconciliation function.
    w_remainder: torch.nn.Parameter, default = None
        The (optional) parameters used for the remainder function.
    b_remainder: torch.nn.Parameter, default = None
        The (optional) bias parameters used for the remainder function.
    device: str, default = 'cpu'
        The device for hosting the head.

    Methods
    ----------
    __init__
        The initialization method of the RPN-head with multiple channels.

    initialize_parameters
        The parameter initialization method.

    initialize_parameters_kaiming
        The kaiming parameter initialization method.

    initialize_parameters_xavier
        The xavier initialization method.

    output_processing
        The output processing method of the head.

    forward
        The forward method of the RPN head module.

    __call__
        The re-implementation of the builtin callable method based on the forward method.
    """
    def __init__(
        self,
        m: int,
        n: int,
        batch_num: int = None,
        channel_num: int = 1,
        l: int = None,
        l_attribute_interdependence: int = None,
        l_instance_interdependence: int = None,
        l_channel_fusion: int = None,

        input_process_functions=None,
        data_transformation: transformation_class = None,
        attribute_interdependence: interdependence_class = None,
        instance_interdependence: interdependence_class = None,
        parameter_fabrication: fabrication_class = None,
        channel_fusion: fusion_class = None,
        remainder: remainder_class = None,
        output_process_functions=None,

        input_process_function_configs=None,
        data_transformation_configs=None,
        attribute_interdependence_configs=None,
        instance_interdependence_configs=None,
        parameter_fabrication_configs=None,
        channel_fusion_configs=None,
        remainder_configs=None,
        output_process_function_configs=None,

        create_parameters_at_init: bool = True,
        device='cpu',
        *args, **kwargs
    ):
        r"""
        The initialization method of the RPN-head with multiple channels.

        It initializes the RPN head module with multiple channels.
        Specifically, this method initializes the dimension configurations of the head,
        the component functions used in the head, and defines the device to host the head.

        Parameters
        ----------
        m: int
            The input dimension of the head.
        n: int
            The output dimension of the head.
        l: int, default = None
            The number of parameters for each channel in the head.
        channel_num: int, default = 1
            The number of channels in the head.
        data_transformation: object, default = None
            The data transformation function of the head. The data transformation can be initialized directly
            with this parameter or with the data_transformation_configs parameter.
        parameter_fabrication: object, default = None
            The parameter fabrication function of the head. The parameter fabrication can be initialized directly
            with this parameter or with the parameter_fabrication_configs parameter.
        remainder: object, default = None
            The remainder function of the head. The remainder can be initialized directly
            with this parameter or with the remainder_configs parameter.
        output_process_functions: object, default = None
            The output processing functions. The output processing functions can be initialized directly
            with this parameter or with the output_process_function_configs parameter.
        data_transformation_configs: dict, default = None
            The data transformation function configuration.
        parameter_fabrication_configs: dict, default = None
            The parameter fabrication function configuration.
        remainder_configs: dict, default = None
            The remainder function configuration.
        output_process_function_configs: dict, default = None
            The output processing function configuration.
        device: str, default = 'cpu'
            The device for hosting the head.

        Returns
        ----------
        object
            This method will return the initialized RPN-head object.
        """
        super().__init__()
        assert (channel_num >= 1) and (m is not None and m >= 1) and (n is not None and n >= 1)

        # initialize the basic attributes
        self.m = m
        self.n = n
        self.batch_num = batch_num
        self.channel_num = channel_num
        self.l = l
        self.l_attribute_interdependence = l_attribute_interdependence
        self.l_instance_interdependence = l_instance_interdependence
        self.l_channel_fusion = l_channel_fusion
        self.device = device

        # initialize data_transformation, interdependence, interdependence_fusion, parameter_fabrication, channel_fusion and remainder functions from either input objects or input configs
        self.data_transformation = config.instantiation_functions(functions=data_transformation, function_configs=data_transformation_configs, device=device)
        self.parameter_fabrication = config.instantiation_functions(functions=parameter_fabrication, function_configs=parameter_fabrication_configs, device=device)
        self.remainder = config.instantiation_functions(functions=remainder, function_configs=remainder_configs, device=device)

        self.attribute_interdependence = config.instantiation_functions(functions=attribute_interdependence, function_configs=attribute_interdependence_configs, device=device)
        self.instance_interdependence = config.instantiation_functions(functions=instance_interdependence, function_configs=instance_interdependence_configs, device=device)

        self.input_process_functions = config.instantiation_functions(input_process_functions, input_process_function_configs, device=device)
        self.output_process_functions = config.instantiation_functions(output_process_functions, output_process_function_configs, device=device)

        self.channel_fusion = config.instantiation_functions(functions=channel_fusion, function_configs=channel_fusion_configs, device=device)
        if self.channel_num > 1 and self.channel_fusion is None:
            self.channel_fusion = mean_fusion(dims=[self.n] * self.channel_num)

        # create learnable parameters for parameter fabrication and remainder functions
        self.w = None
        self.b = None
        self.w_remainder = None
        self.b_remainder = None
        self.w_attribute_interdependence = None
        self.w_instance_interdependence = None
        self.w_channel_fusion = None

        if create_parameters_at_init:
            self.create_learnable_parameters()

    def get_m(self):
        return self.m

    def get_n(self):
        return self.n

    def get_channel_num(self):
        return self.channel_num

    def get_batch_num(self):
        return self.batch_num

    def create_learnable_parameters(
        self,
        initialize_parameter_at_creation: bool = False,
        init_type='xavier_uniform',
        init_bias=True,
        *args, **kwargs
    ):
        m_prime, b_prime = self.m, self.batch_num

        if self.attribute_interdependence is not None:
            if self.attribute_interdependence.require_parameters:
                if self.l_attribute_interdependence is None:
                    self.l_attribute_interdependence = self.attribute_interdependence.calculate_l()
                self.w_attribute_interdependence = torch.nn.Parameter(torch.rand(self.channel_num, self.l_attribute_interdependence, device=self.device))
            assert self.m is not None and self.m >= 1
            m_prime = self.attribute_interdependence.calculate_m_prime(m=self.m)

        if self.instance_interdependence is not None:
            if self.instance_interdependence.require_parameters:
                if self.l_instance_interdependence is None:
                    self.l_instance_interdependence = self.instance_interdependence.calculate_l()
                self.w_instance_interdependence = torch.nn.Parameter(torch.rand(self.channel_num, self.l_instance_interdependence, device=self.device))
            if self.batch_num is not None:
                assert self.batch_num is not None and self.batch_num >= 1
                b_prime = self.instance_interdependence.calculate_b_prime(b=self.batch_num)

        # create learnable parameters for parameter_fabrication function
        if self.parameter_fabrication is not None and self.parameter_fabrication.require_parameters:
            if self.l is None:
                self.l = self.parameter_fabrication.calculate_l(n=self.n, D=self.data_transformation.calculate_D(m=m_prime))
            self.w = torch.nn.Parameter(torch.rand(self.channel_num, self.l, device=self.device))
            if self.parameter_fabrication.enable_bias:
                self.b = torch.nn.Parameter(torch.rand(self.n, device=self.device))

        # create learnable parameters for remainder function
        if self.remainder is not None and self.remainder.require_parameters:
            self.w_remainder = torch.nn.Parameter(torch.rand(self.n, self.m, device=self.device))
            if self.remainder.enable_bias:
                self.b_remainder = torch.nn.Parameter(torch.rand(self.n, device=self.device))
        elif self.remainder is not None and self.m != self.n and not self.remainder.require_parameters and not isinstance(self.remainder, (tinybig.remainder.zero_remainder, tinybig.remainder.constant_remainder)):
            raise ValueError('The input and output dimensions {}, {} are different, parameters will be needed '
                             'by the {} to adjust the input dimensions.'.format(self.m, self.n, self.remainder.get_name()))

        # create learnable parameters for channel_fusion function
        if self.channel_fusion is not None and self.channel_fusion.require_parameters:
            if self.l_channel_fusion is None:
                self.l_channel_fusion = self.channel_fusion.calculate_l()
            self.w_channel_fusion = torch.nn.Parameter(torch.rand(1, self.l_channel_fusion, device=self.device))

        # initialize the parameter with certain methods...
        if initialize_parameter_at_creation:
            self.initialize_parameters(init_type=init_type, init_bias=init_bias)

    def initialize_parameters(self, init_type='xavier_uniform', init_bias=True, *args, **kwargs):
        """
        The parameter initialization method.

        It initializes the multi-channel parameters in the head with different initialization approaches,
        e.g., xavier_uniform or kaiming_uniform.
        Depending on the "init_type" parameter, this method will call the corresponding initialization method.

        Parameters
        ----------
        init_type: str, default = 'xavier_uniform'
            The parameter initialization approach.
        init_bias: bool, default = True
            The boolean tag of bias initialization.

        Returns
        -------
        None
            This initialization method doesn't have any return values.
        """
        if init_type == 'kaiming_uniform':
            self.initialize_parameters_kaiming(init_bias=init_bias, *args, **kwargs)
        elif init_type == 'xavier_uniform':
            self.initialize_parameters_xavier(init_bias=init_bias, *args, **kwargs)

    def initialize_parameters_kaiming(self, init_bias=True, *args, **kwargs):
        """
        The kaiming parameter initialization method.

        It initializes the multi-channel parameters in the head with kaiming_uniform_ method from pytorch.

        Parameters
        ----------
        init_bias: bool, default = True
            The boolean tag of bias initialization.

        Returns
        -------
        None
            This initialization method doesn't have any return values.
        """

        if self.w_attribute_interdependence is not None:
            torch.nn.init.kaiming_uniform_(self.w_attribute_interdependence, a=math.sqrt(5))

        if self.w_instance_interdependence is not None:
            torch.nn.init.kaiming_uniform_(self.w_instance_interdependence, a=math.sqrt(5))

        if self.w is not None:
            torch.nn.init.kaiming_uniform_(self.w, a=math.sqrt(5))

        if self.w_remainder is not None:
            torch.nn.init.kaiming_uniform_(self.w_remainder, a=math.sqrt(5))

        if self.w_channel_fusion is not None:
            torch.nn.init.kaiming_uniform_(self.w_channel_fusion, a=math.sqrt(5))

        if init_bias:
            if self.b is not None:
                fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.w)
                bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
                torch.nn.init.uniform_(self.b, -bound, bound)
            if self.b_remainder is not None:
                fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.w_remainder)
                bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
                torch.nn.init.uniform_(self.b_remainder, -bound, bound)

    def initialize_parameters_xavier(self, init_bias=True, *args, **kwargs):
        """
        The xavier initialization method.

        It initializes the multi-channel parameters in the head with xavier_uniform_ method from pytorch.

        Parameters
        ----------
        init_bias: bool, default = True
            The boolean tag of bias initialization.

        Returns
        -------
        None
            This initialization method doesn't have any return values.
        """
        if self.w_attribute_interdependence is not None:
            torch.nn.init.xavier_uniform_(self.w_attribute_interdependence)

        if self.w_instance_interdependence is not None:
            torch.nn.init.xavier_uniform_(self.w_instance_interdependence)

        if self.w is not None:
            torch.nn.init.xavier_uniform_(self.w)

        if self.w_remainder is not None:
            torch.nn.init.xavier_uniform_(self.w_remainder)

        if self.w_channel_fusion is not None:
            torch.nn.init.xavier_uniform_(self.w_channel_fusion)

        if init_bias:
            if self.b is not None:
                torch.nn.init.xavier_uniform_(self.b.view(1, -1))
            if self.b_remainder is not None:
                torch.nn.init.xavier_uniform_(self.b_remainder.view(1, -1))

    def to_config(self):
        head_class = f"{self.__class__.__module__}.{self.__class__.__name__}"
        head_parameters = {
            'name': getattr(self, 'name', None),  # name is not set in __init__, so fall back to None
            'device': self.device,
            'm': self.m,
            'n': self.n,
            'l': self.l,
            'batch_num': self.batch_num,
            'channel_num': self.channel_num,
        }

        if self.data_transformation is not None:
            head_parameters['data_transformation_configs'] = self.data_transformation.to_config()
        if self.attribute_interdependence is not None:
            head_parameters['attribute_interdependence_configs'] = self.attribute_interdependence.to_config()
        if self.instance_interdependence is not None:
            head_parameters['instance_interdependence_configs'] = self.instance_interdependence.to_config()
        if self.parameter_fabrication is not None:
            head_parameters['parameter_fabrication_configs'] = self.parameter_fabrication.to_config()
        if self.channel_fusion is not None:
            head_parameters['channel_fusion_configs'] = self.channel_fusion.to_config()
        if self.remainder is not None:
            head_parameters['remainder_configs'] = self.remainder.to_config()
        if self.input_process_functions is not None:
            head_parameters['input_process_function_configs'] = function.functions_to_configs(self.input_process_functions)
        if self.output_process_functions is not None:
            head_parameters['output_process_function_configs'] = function.functions_to_configs(self.output_process_functions)

        return {
            "head_class": head_class,
            "head_parameters": head_parameters
        }

    def __call__(self, *args, **kwargs):
        """
        The re-implementation of the builtin callable method based on the forward method.

        It re-implements the callable method of the head, which will call the "forward" method to calculate the output
        with the multi-channel RPN head module.

        Returns
        -------
        torch.Tensor
            The processed output of the head.
        """
        return self.forward(*args, **kwargs)

    def calculate_kappa_x(self, x: torch.Tensor, device='cpu', *args, **kwargs):
        if self.data_transformation is not None:
            if self.data_transformation.device != device:
                self.data_transformation.to(device)

            kappa_x = self.data_transformation(x, device=device)
            return kappa_x
        else:
            return x

    def calculate_phi_w(self, D: int, channel_index: int = 0, device='cpu', *args, **kwargs):
        assert channel_index in range(self.channel_num)

        if self.parameter_fabrication is not None:
            if self.parameter_fabrication.device != device:
                self.parameter_fabrication.to(device)

            if self.w is not None and 0 <= channel_index < self.w.size(0):
                w_chunk = self.w[channel_index:channel_index+1, :]
            else:
                w_chunk = None
            phi_w = self.parameter_fabrication(w=w_chunk, n=self.n, D=D, device=device)
            return phi_w
        else:
            return None

    def calculate_pi_x(self, x: torch.Tensor, device='cpu', *args, **kwargs):
        if self.remainder is not None:
            if isinstance(self.remainder, tinybig.remainder.zero_remainder):
                return None

            if self.remainder.device != device:
                self.remainder.to(device)
            pi_x = self.remainder(x=x, w=self.w_remainder, b=self.b_remainder, m=self.m, n=self.n, device=device)
            return pi_x
        else:
            return None

    def calculate_attribute_xi_x(self, x: torch.Tensor, channel_index: int = 0, kappa_x: torch.Tensor = None, device='cpu', *args, **kwargs):
        if self.attribute_interdependence is not None:
            if self.attribute_interdependence.device != device:
                self.attribute_interdependence.to(device)

            if self.w_attribute_interdependence is not None and 0 <= channel_index < self.w_attribute_interdependence.size(0):
                w_chunks = self.w_attribute_interdependence[channel_index:channel_index+1, :]
            else:
                w_chunks = None
            xi_x = self.attribute_interdependence(x=x, w=w_chunks, kappa_x=kappa_x, device=device)
            return xi_x
        else:
            return kappa_x if kappa_x is not None else x

    def calculate_instance_xi_x(self, x: torch.Tensor, channel_index: int = 0, kappa_x: torch.Tensor = None, device='cpu', *args, **kwargs):
        if self.instance_interdependence is not None:
            if self.instance_interdependence.device != device:
                self.instance_interdependence.to(device)

            if self.w_instance_interdependence is not None and 0 <= channel_index < self.w_instance_interdependence.size(0):
                w_chunks = self.w_instance_interdependence[channel_index:channel_index+1, :]
            else:
                w_chunks = None
            xi_x = self.instance_interdependence(x=x, w=w_chunks, kappa_x=kappa_x, device=device)
            return xi_x
        else:
            return kappa_x if kappa_x is not None else x

    # computes the expanded input for one channel: attribute interdependence, data expansion, then instance interdependence
    def calculate_kappa_xi_x(self, x: torch.Tensor, channel_index: int = 0, device='cpu', *args, **kwargs):

        # ************** Attribute Interdependence Block **************
        xi_x = self.calculate_attribute_xi_x(x=x, channel_index=channel_index, device=self.device)

        # ************** Data Expansion Block **************
        kappa_x = self.calculate_kappa_x(x=xi_x, device=device, *args, **kwargs)
        assert kappa_x.shape[1] == self.data_transformation.calculate_D(m=xi_x.shape[1])

        # ************** Instance Interdependence Block **************
        kappa_xi_x = self.calculate_instance_xi_x(x=x, channel_index=channel_index, kappa_x=kappa_x, device=self.device)

        return kappa_xi_x

    def calculate_inner_product(self, kappa_xi_x: torch.Tensor, phi_w: torch.Tensor, device: str = 'cpu', *args, **kwargs):
        if phi_w is not None:
            assert kappa_xi_x.ndim == 2 and phi_w.ndim == 2 and kappa_xi_x.size(-1) == phi_w.size(-1)
            if device != 'mps' and (kappa_xi_x.is_sparse or phi_w.is_sparse):
                inner_prod = torch.sparse.mm(kappa_xi_x, phi_w.T)
                if self.b is not None:
                    inner_prod += self.b
            else:
                inner_prod = F.linear(kappa_xi_x, phi_w, bias=self.b)
        else:
            inner_prod = kappa_xi_x
        return inner_prod

    def forward(self, x: torch.Tensor, device='cpu', *args, **kwargs):
        r"""
        The forward method of the RPN head module.

        Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its
        output with multi-channel parameters as follows:
        $$
            \begin{equation}
                g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}),
            \end{equation}
        $$
        where these multi-channel parameters $\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}$ are fabricated
        from length $l$ to shape $(n, D)$ using the identical parameter reconciliation function.

        Parameters
        ----------
        x: torch.Tensor
            The input data vector.
        device: str, default = 'cpu'
            The device for hosting the head.

        Returns
        -------
        torch.Tensor
            The processed output of the head.
        """
        # ************** Input Processing Block **************
        if x is None:
            raise ValueError("x cannot be None...")

        x = function.func_x(x=x, functions=self.input_process_functions, device=device)

        inner_products = []

        pre_computed_kappa_xi_x = None
        # if neither interdependence function has learnable parameters, kappa_xi_x can be pre-computed once and reused across channels
        if ((self.attribute_interdependence is None or not self.attribute_interdependence.require_parameters) and
            (self.instance_interdependence is None or not self.instance_interdependence.require_parameters)):
            pre_computed_kappa_xi_x = self.calculate_kappa_xi_x(x=x, channel_index=0, device=device)

        for channel_index in range(self.channel_num):

            # ************** Data Transformation Block **************
            if ((self.attribute_interdependence is None or not self.attribute_interdependence.require_parameters)
                and (self.instance_interdependence is None or not self.instance_interdependence.require_parameters)
                and pre_computed_kappa_xi_x is not None):
                kappa_xi_x = pre_computed_kappa_xi_x
            else:
                kappa_xi_x = self.calculate_kappa_xi_x(x=x, channel_index=channel_index, device=device)

            # ************** Parameter Reconciliation Block **************
            phi_w = self.calculate_phi_w(D=kappa_xi_x.size(1), channel_index=channel_index, device=device, *args, **kwargs)

            # ************** Inner Product Calculation Block **************
            inner_prod = self.calculate_inner_product(kappa_xi_x=kappa_xi_x, phi_w=phi_w, device=device, *args, **kwargs)
            inner_products.append(inner_prod)

        # ************** Multi-Channel Fusion Block **************
        if self.channel_fusion is not None:
            assert self.channel_fusion.get_dims() is None or self.channel_fusion.get_num() == len(inner_products)
            result = self.channel_fusion(x=inner_products, w=self.w_channel_fusion, device=device)
            n = self.channel_fusion.calculate_n(dims=[result.size(-1) for result in inner_products])
        else:
            assert len(inner_products) == 1
            result = inner_products[0]
            n = self.n
        assert result.size(-1) == n

        # ************** Remainder Block **************
        pi_x = self.calculate_pi_x(x=x, device=device, *args, **kwargs)
        if pi_x is not None:
            assert pi_x.size(-1) == n
            result += pi_x

        # ************** Output Processing Block **************
        return function.func_x(x=result, functions=self.output_process_functions, device=self.device)

__call__(*args, **kwargs)

The re-implementation of the builtin callable method based on the forward method.

It re-implements the callable method of the head, which will call the "forward" method to calculate the output with the multi-channel RPN head module.

Returns:

| Type | Description |
| --- | --- |
| Tensor | The processed output of the head. |

Source code in tinybig/module/base_head.py
def __call__(self, *args, **kwargs):
    """
    The re-implementation of the builtin callable method based on the forward method.

    It re-implements the callable method of the head, which will call the "forward" method to calculate the output
    with the multi-channel RPN head module.

    Returns
    -------
    torch.Tensor
        The processed output of the head.
    """
    return self.forward(*args, **kwargs)

__init__(m, n, batch_num=None, channel_num=1, l=None, l_attribute_interdependence=None, l_instance_interdependence=None, l_channel_fusion=None, input_process_functions=None, data_transformation=None, attribute_interdependence=None, instance_interdependence=None, parameter_fabrication=None, channel_fusion=None, remainder=None, output_process_functions=None, input_process_function_configs=None, data_transformation_configs=None, attribute_interdependence_configs=None, instance_interdependence_configs=None, parameter_fabrication_configs=None, channel_fusion_configs=None, remainder_configs=None, output_process_function_configs=None, create_parameters_at_init=True, device='cpu', *args, **kwargs)

The initialization method of the RPN-head with multiple channels.

It initializes the RPN head module with multiple channels. Specifically, this method initializes the dimension configurations of the head and the component functions used in the head, and sets the device that hosts the head.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| m | int | The input dimension of the head. | required |
| n | int | The output dimension of the head. | required |
| l | int | The number of parameters for each channel in the head. | None |
| channel_num | int | The number of channels in the head. | 1 |
| data_transformation | transformation | The data transformation function of the head. It can be initialized directly with this parameter or with the data_transformation_configs parameter. | None |
| parameter_fabrication | fabrication | The parameter fabrication function of the head. It can be initialized directly with this parameter or with the parameter_fabrication_configs parameter. | None |
| remainder | remainder | The remainder function of the head. It can be initialized directly with this parameter or with the remainder_configs parameter. | None |
| output_process_functions | object | The output processing functions. They can be initialized directly with this parameter or with the output_process_function_configs parameter. | None |
| data_transformation_configs | dict | The data transformation function configuration. | None |
| parameter_fabrication_configs | dict | The parameter fabrication function configuration. | None |
| remainder_configs | dict | The remainder function configuration. | None |
| output_process_function_configs | dict | The output processing function configuration. | None |
| device | str | The device for hosting the head. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| object | This method will return the initialized RPN-head object. |

Source code in tinybig/module/base_head.py
def __init__(
    self,
    m: int,
    n: int,
    batch_num: int = None,
    channel_num: int = 1,
    l: int = None,
    l_attribute_interdependence: int = None,
    l_instance_interdependence: int = None,
    l_channel_fusion: int = None,

    input_process_functions=None,
    data_transformation: transformation_class = None,
    attribute_interdependence: interdependence_class = None,
    instance_interdependence: interdependence_class = None,
    parameter_fabrication: fabrication_class = None,
    channel_fusion: fusion_class = None,
    remainder: remainder_class = None,
    output_process_functions=None,

    input_process_function_configs=None,
    data_transformation_configs=None,
    attribute_interdependence_configs=None,
    instance_interdependence_configs=None,
    parameter_fabrication_configs=None,
    channel_fusion_configs=None,
    remainder_configs=None,
    output_process_function_configs=None,

    create_parameters_at_init: bool = True,
    device='cpu',
    *args, **kwargs
):
    r"""
    The initialization method of the RPN-head with multiple channels.

    It initializes the RPN head module with multiple channels.
    Specifically, this method initializes the dimension configurations of the head,
    the component functions used in the head, and defines the device to host the head.

    Parameters
    ----------
    m: int
        The input dimension of the head.
    n: int
        The output dimension of the head.
    l: int, default = None
        The number of parameters for each channel in the head.
    channel_num: int, default = 1
        The number of channels in the head.
    data_transformation: object, default = None
        The data transformation function of the head. The data transformation can be initialized directly
        with this parameter or with the data_transformation_configs parameter.
    parameter_fabrication: object, default = None
        The parameter fabrication function of the head. The parameter fabrication can be initialized directly
        with this parameter or with the parameter_fabrication_configs parameter.
    remainder: object, default = None
        The remainder function of the head. The remainder can be initialized directly
        with this parameter or with the remainder_configs parameter.
    output_process_functions: object, default = None
        The output processing functions. The output processing functions can be initialized directly
        with this parameter or with the output_process_function_configs parameter.
    data_transformation_configs: dict, default = None
        The data transformation function configuration.
    parameter_fabrication_configs: dict, default = None
        The parameter fabrication function configuration.
    remainder_configs: dict, default = None
        The remainder function configuration.
    output_process_function_configs: dict, default = None
        The output processing function configuration.
    device: str, default = 'cpu'
        The device for hosting the head.

    Returns
    -------
    object
        This method will return the initialized RPN-head object.
    """
    super().__init__()
    assert (channel_num >= 1) and (m is not None and m >= 1) and (n is not None and n >= 1)

    # initialize the basic attributes
    self.m = m
    self.n = n
    self.batch_num = batch_num
    self.channel_num = channel_num
    self.l = l
    self.l_attribute_interdependence = l_attribute_interdependence
    self.l_instance_interdependence = l_instance_interdependence
    self.l_channel_fusion = l_channel_fusion
    self.device = device

    # initialize data_transformation, interdependence, interdependence_fusion, parameter_fabrication, channel_fusion and remainder functions from either input objects or input configs
    self.data_transformation = config.instantiation_functions(functions=data_transformation, function_configs=data_transformation_configs, device=device)
    self.parameter_fabrication = config.instantiation_functions(functions=parameter_fabrication, function_configs=parameter_fabrication_configs, device=device)
    self.remainder = config.instantiation_functions(functions=remainder, function_configs=remainder_configs, device=device)

    self.attribute_interdependence = config.instantiation_functions(functions=attribute_interdependence, function_configs=attribute_interdependence_configs, device=device)
    self.instance_interdependence = config.instantiation_functions(functions=instance_interdependence, function_configs=instance_interdependence_configs, device=device)

    self.input_process_functions = config.instantiation_functions(input_process_functions, input_process_function_configs, device=device)
    self.output_process_functions = config.instantiation_functions(output_process_functions, output_process_function_configs, device=device)

    self.channel_fusion = config.instantiation_functions(functions=channel_fusion, function_configs=channel_fusion_configs, device=device)
    if self.channel_num > 1 and self.channel_fusion is None:
        self.channel_fusion = mean_fusion(dims=[self.n] * self.channel_num)

    # create learnable parameters for parameter fabrication and remainder functions
    self.w = None
    self.b = None
    self.w_remainder = None
    self.b_remainder = None
    self.w_attribute_interdependence = None
    self.w_instance_interdependence = None
    self.w_channel_fusion = None

    if create_parameters_at_init:
        self.create_learnable_parameters()
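
The constructor above accepts each component either as a ready-made object or as a configuration, and it falls back to a mean fusion when the head has several channels but no fusion function. The following plain-Python sketch illustrates that resolution order. `resolve_component`, `REGISTRY`, the `"name"` and `"parameters"` config keys, and the `MeanFusion` stand-in are hypothetical names introduced for illustration; they are not the tinybig `config.instantiation_functions` API, whose config schema is not shown in this section. The sketch also assumes that an explicitly passed object takes precedence over a config.

```python
from typing import Any, Dict, List, Optional


class MeanFusion:
    """Hypothetical stand-in: averages equally sized channel outputs."""

    def __init__(self, dims: List[int]):
        self.dims = dims

    def __call__(self, xs: List[List[float]]) -> List[float]:
        # Element-wise mean over the channel outputs.
        return [sum(vals) / len(xs) for vals in zip(*xs)]


# Hypothetical registry mapping config names to classes.
REGISTRY: Dict[str, type] = {"mean_fusion": MeanFusion}


def resolve_component(obj: Optional[Any] = None,
                      config: Optional[Dict[str, Any]] = None) -> Optional[Any]:
    """Return obj if given, else build from config, else None (assumed precedence)."""
    if obj is not None:
        return obj
    if config is not None:
        cls = REGISTRY[config["name"]]
        return cls(**config.get("parameters", {}))
    return None


def resolve_channel_fusion(channel_num: int, n: int, obj=None, config=None):
    fusion = resolve_component(obj, config)
    # Mirrors the default in __init__: several channels but no fusion given.
    if channel_num > 1 and fusion is None:
        fusion = MeanFusion(dims=[n] * channel_num)
    return fusion


single = resolve_channel_fusion(channel_num=1, n=4)   # no fusion needed
multi = resolve_channel_fusion(channel_num=3, n=4)    # default mean fusion
from_config = resolve_component(
    config={"name": "mean_fusion", "parameters": {"dims": [2, 2]}}
)
```

With a single channel no fusion object is created, which matches the `assert len(inner_products) == 1` branch in `forward`.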

forward(x, device='cpu', *args, **kwargs)

The forward method of the RPN head module.

Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its output with multi-channel parameters as follows: $$ \begin{equation} g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}), \end{equation} $$ where these multi-channel parameters \(\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}\) are fabricated from length \(l\) to shape \((n, D)\) using the identical parameter reconciliation function.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Tensor | The input data vector. | required |
| device | str | The device for hosting the head. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The processed output of the head. |

Source code in tinybig/module/base_head.py
def forward(self, x: torch.Tensor, device='cpu', *args, **kwargs):
    r"""
    The forward method of the RPN head module.

    Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its
    output with multi-channel parameters as follows:
    $$
        \begin{equation}
            g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}),
        \end{equation}
    $$
    where these multi-channel parameters $\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}$ are fabricated
    from length $l$ to shape $(n, D)$ using the identical parameter reconciliation function.

    Parameters
    ----------
    x: torch.Tensor
        The input data vector.
    device: str, default = 'cpu'
        The device for hosting the head.

    Returns
    -------
    torch.Tensor
        The processed output of the head.
    """
    # ************** Input Processing Block **************
    if x is None:
        raise ValueError("x cannot be None...")

    x = function.func_x(x=x, functions=self.input_process_functions, device=device)

    inner_products = []

    pre_computed_kappa_xi_x = None
    # if neither interdependence function has parameters, kappa_xi_x can be pre-computed once and reused across channels
    if ((self.attribute_interdependence is None or not self.attribute_interdependence.require_parameters) and
        (self.instance_interdependence is None or not self.instance_interdependence.require_parameters)):
        pre_computed_kappa_xi_x = self.calculate_kappa_xi_x(x=x, channel_index=0, device=device)

    for channel_index in range(self.channel_num):

        # ************** Data Transformation Block **************
        if ((self.attribute_interdependence is None or not self.attribute_interdependence.require_parameters)
            and (self.instance_interdependence is None or not self.instance_interdependence.require_parameters)
            and pre_computed_kappa_xi_x is not None):
            kappa_xi_x = pre_computed_kappa_xi_x
        else:
            kappa_xi_x = self.calculate_kappa_xi_x(x=x, channel_index=channel_index, device=device)

        # ************** Parameter Reconciliation Block **************
        phi_w = self.calculate_phi_w(D=kappa_xi_x.size(1), channel_index=channel_index, device=device, *args, **kwargs)

        # ************** Inner Product Calculation Block **************
        inner_prod = self.calculate_inner_product(kappa_xi_x=kappa_xi_x, phi_w=phi_w, device=device, *args, **kwargs)
        inner_products.append(inner_prod)

    # ************** Multi-Channel Fusion Block **************
    if self.channel_fusion is not None:
        assert self.channel_fusion.get_dims() is None or self.channel_fusion.get_num() == len(inner_products)
        result = self.channel_fusion(x=inner_products, w=self.w_channel_fusion, device=device)
        n = self.channel_fusion.calculate_n(dims=[result.size(-1) for result in inner_products])
    else:
        assert len(inner_products) == 1
        result = inner_products[0]
        n = self.n
    assert result.size(-1) == n

    # ************** Remainder Block **************
    pi_x = self.calculate_pi_x(x=x, device=device, *args, **kwargs)
    if pi_x is not None:
        assert pi_x.size(-1) == n
        result += pi_x

    # ************** Output Processing Block **************
    return function.func_x(x=result, functions=self.output_process_functions, device=self.device)
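
To make the flow of `forward` concrete, here is a dependency-free sketch of the same steps for a single input vector: expand the data once, take one inner product per channel, fuse the channels, then add the remainder. It is an illustration under stated assumptions, not the library implementation: `head_forward`, `matvec`, and `square_expansion` are hypothetical names, the interdependence and input/output processing steps are omitted, and the channel fusion is assumed to be an element-wise mean (the equation writes a sum over channels, while `__init__` defaults to `mean_fusion` for multi-channel heads).

```python
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Inner product of v with an (n, D) matrix: one dot product per row."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def head_forward(
    x: Vector,
    kappa: Callable[[Vector], Vector],
    channel_weights: List[Matrix],
    remainder: Optional[Callable[[Vector], Vector]] = None,
) -> Vector:
    # The expansion has no learnable parameters here, so compute it once
    # and reuse it for every channel (mirrors pre_computed_kappa_xi_x).
    kappa_x = kappa(x)

    # One inner product per channel.
    inner_products = [matvec(W, kappa_x) for W in channel_weights]

    # Channel fusion: element-wise mean over channels (assumed behaviour).
    num_channels = len(inner_products)
    result = [sum(vals) / num_channels for vals in zip(*inner_products)]

    # Optional remainder term pi(x), added only when present.
    if remainder is not None:
        pi_x = remainder(x)
        assert len(pi_x) == len(result)
        result = [r + p for r, p in zip(result, pi_x)]
    return result


def square_expansion(x: Vector) -> Vector:
    """Hypothetical expansion kappa(x) = [x, x^2], so D = 2m."""
    return x + [v * v for v in x]


x = [1.0, 2.0]                    # m = 2
W0 = [[1.0, 0.0, 0.0, 0.0]]       # channel 0, shape (n, D) = (1, 4)
W1 = [[0.0, 0.0, 0.0, 1.0]]       # channel 1, shape (1, 4)
out = head_forward(x, square_expansion, [W0, W1], remainder=lambda v: [sum(v)])
# out == [5.5]: channels give 1.0 and 4.0, their mean is 2.5, remainder adds 3.0
```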

initialize_parameters(init_type='xavier_uniform', init_bias=True, *args, **kwargs)

The parameter initialization method.

It initializes the multi-channel parameters in the head with different initialization approaches, e.g., xavier_uniform or kaiming_uniform. Depending on the "init_type" parameter, this method will call the corresponding initialization method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| init_type | str | The parameter initialization approach. | 'xavier_uniform' |
| init_bias | bool | The boolean tag of bias initialization. | True |

Returns:

| Type | Description |
| --- | --- |
| None | This initialization method doesn't have any return values. |

Source code in tinybig/module/base_head.py
def initialize_parameters(self, init_type='xavier_uniform', init_bias=True, *args, **kwargs):
    """
    The parameter initialization method.

    It initializes the multi-channel parameters in the head with different initialization approaches,
    e.g., xavier_uniform or kaiming_uniform.
    Depending on the "init_type" parameter, this method will call the corresponding initialization method.

    Parameters
    ----------
    init_type: str, default = 'xavier_uniform'
        The parameter initialization approach.
    init_bias: bool, default = True
        The boolean tag of bias initialization.

    Returns
    -------
    None
        This initialization method doesn't have any return values.
    """
    if init_type == 'kaiming_uniform':
        self.initialize_parameters_kaiming(init_bias=init_bias, *args, **kwargs)
    elif init_type == 'xavier_uniform':
        self.initialize_parameters_xavier(init_bias=init_bias, *args, **kwargs)
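
Note that the `if`/`elif` chain above has no `else` branch, so an unrecognized `init_type` leaves the parameters untouched without any error. The sketch below reproduces that dispatch with a lookup table and adds an opt-in `strict` flag that raises instead. The `strict` flag, the `INITIALIZERS` table, and the recording stubs are hypothetical additions for illustration and are not part of the tinybig API.

```python
from typing import Callable, Dict, List, Tuple

# Records which initializer ran; stands in for the real in-place tensor updates.
calls: List[Tuple[str, bool]] = []


def init_kaiming(init_bias: bool = True) -> None:
    calls.append(("kaiming_uniform", init_bias))


def init_xavier(init_bias: bool = True) -> None:
    calls.append(("xavier_uniform", init_bias))


INITIALIZERS: Dict[str, Callable[..., None]] = {
    "kaiming_uniform": init_kaiming,
    "xavier_uniform": init_xavier,
}


def initialize_parameters(init_type: str = "xavier_uniform",
                          init_bias: bool = True,
                          strict: bool = False) -> None:
    initializer = INITIALIZERS.get(init_type)
    if initializer is None:
        if strict:
            raise ValueError(f"unknown init_type: {init_type!r}")
        return  # same as the source: unknown names are silently ignored
    initializer(init_bias=init_bias)


initialize_parameters()                                   # default: xavier
initialize_parameters("kaiming_uniform", init_bias=False)
initialize_parameters("he_normal")                        # silently ignored
```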

initialize_parameters_kaiming(init_bias=True, *args, **kwargs)

The kaiming parameter initialization method.

It initializes the multi-channel parameters in the head with the kaiming_uniform_ method from PyTorch.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| init_bias | bool | The boolean tag of bias initialization. | True |

Returns:

| Type | Description |
| --- | --- |
| None | This initialization method doesn't have any return values. |

Source code in tinybig/module/base_head.py
def initialize_parameters_kaiming(self, init_bias=True, *args, **kwargs):
    """
    The kaiming parameter initialization method.

    It initializes the multi-channel parameters in the head with the kaiming_uniform_ method from PyTorch.

    Parameters
    ----------
    init_bias: bool, default = True
        The boolean tag of bias initialization.

    Returns
    -------
    None
        This initialization method doesn't have any return values.
    """

    if self.w_attribute_interdependence is not None:
        torch.nn.init.kaiming_uniform_(self.w_attribute_interdependence, a=math.sqrt(5))

    if self.w_instance_interdependence is not None:
        torch.nn.init.kaiming_uniform_(self.w_instance_interdependence, a=math.sqrt(5))

    if self.w is not None:
        torch.nn.init.kaiming_uniform_(self.w, a=math.sqrt(5))

    if self.w_remainder is not None:
        torch.nn.init.kaiming_uniform_(self.w_remainder, a=math.sqrt(5))

    if self.w_channel_fusion is not None:
        torch.nn.init.kaiming_uniform_(self.w_channel_fusion, a=math.sqrt(5))

    if init_bias:
        if self.b is not None:
            fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.w)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            torch.nn.init.uniform_(self.b, -bound, bound)
        if self.b_remainder is not None:
            fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.w_remainder)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            torch.nn.init.uniform_(self.b_remainder, -bound, bound)
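
The choice `a=math.sqrt(5)` has a simple consequence that is easy to check by hand. PyTorch documents the Kaiming uniform bound as `gain * sqrt(3 / fan_in)` with `gain = sqrt(2 / (1 + a^2))` for the leaky-ReLU slope `a`; with `a = sqrt(5)` this reduces to `1 / sqrt(fan_in)`, the same bound the code above uses for the biases. The sketch below works through that arithmetic with the standard library only. The function names are hypothetical; for a 2-D tensor, PyTorch takes `fan_in` as the size of the second dimension.

```python
import math


def kaiming_uniform_bound(fan_in: int, a: float = math.sqrt(5)) -> float:
    """Sampling bound of kaiming_uniform_ (fan_in mode, leaky_relu gain)."""
    gain = math.sqrt(2.0 / (1.0 + a * a))   # leaky-ReLU gain for slope a
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std             # uniform on [-bound, bound]


def bias_bound(fan_in: int) -> float:
    """Bias bound as computed in the source: 1/sqrt(fan_in), or 0 if fan_in is 0."""
    return 1.0 / math.sqrt(fan_in) if fan_in > 0 else 0.0


# With a = sqrt(5) both bounds coincide, e.g. 0.125 for fan_in = 64.
weight_b = kaiming_uniform_bound(64)
bias_b = bias_bound(64)
```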

initialize_parameters_xavier(init_bias=True, *args, **kwargs)

The xavier initialization method.

It initializes the multi-channel parameters in the head with the xavier_uniform_ method from PyTorch.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| init_bias | bool | The boolean tag of bias initialization. | True |

Returns:

| Type | Description |
| --- | --- |
| None | This initialization method doesn't have any return values. |

Source code in tinybig/module/base_head.py
def initialize_parameters_xavier(self, init_bias=True, *args, **kwargs):
    """
    The xavier initialization method.

    It initializes the multi-channel parameters in the head with the xavier_uniform_ method from PyTorch.

    Parameters
    ----------
    init_bias: bool, default = True
        The boolean tag of bias initialization.

    Returns
    -------
    None
        This initialization method doesn't have any return values.
    """
    if self.w_attribute_interdependence is not None:
        torch.nn.init.xavier_uniform_(self.w_attribute_interdependence)

    if self.w_instance_interdependence is not None:
        torch.nn.init.xavier_uniform_(self.w_instance_interdependence)

    if self.w is not None:
        torch.nn.init.xavier_uniform_(self.w)

    if self.w_remainder is not None:
        torch.nn.init.xavier_uniform_(self.w_remainder)

    if self.w_channel_fusion is not None:
        torch.nn.init.xavier_uniform_(self.w_channel_fusion)

    if init_bias:
        if self.b is not None:
            torch.nn.init.xavier_uniform_(self.b.view(1, -1))
        if self.b_remainder is not None:
            torch.nn.init.xavier_uniform_(self.b_remainder.view(1, -1))
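
Because `xavier_uniform_` needs a tensor with at least two dimensions, the source reshapes each bias to `(1, -1)` before initializing it. The sketch below works out what that implies for the sampling range, using the documented Xavier uniform bound `gain * sqrt(6 / (fan_in + fan_out))`. The function names are hypothetical; for a 2-D tensor of shape `(rows, cols)`, PyTorch takes `fan_in = cols` and `fan_out = rows`.

```python
import math


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Sampling bound of xavier_uniform_: values are drawn from [-bound, bound]."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def xavier_bias_bound(num_bias: int) -> float:
    """Bound for a bias of num_bias entries viewed as shape (1, num_bias)."""
    # Shape (1, num_bias) gives fan_in = num_bias and fan_out = 1.
    return xavier_uniform_bound(fan_in=num_bias, fan_out=1)


square_bound = xavier_uniform_bound(fan_in=3, fan_out=3)   # sqrt(6 / 6) = 1.0
bias_bound_5 = xavier_bias_bound(5)                        # sqrt(6 / 6) = 1.0
```

Under this formula the bias range depends only on the number of bias entries, unlike the Kaiming variant, where the bias bound is derived from the fan-in of the corresponding weight tensor.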