
head

Bases: Module, function

The RPN head class for implementing the multi-channel module.

It will be used to compose the RPN layer module for building deep RPN models.

...

Notes

Similar to convolutional neural networks (CNNs) employing multiple filters, RPN allows each head to have multiple channels of parameters applied to the same data expansion. RPN defines its multi-channel parameters as \(\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}\), where \(C\) denotes the number of channels. Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its output with such multi-channel parameters as follows:

$$
\begin{equation}
g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}),
\end{equation}
$$

where these multi-channel parameters are fabricated from length \(l\) to shape \((n, D)\) using the identical parameter reconciliation function.
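The formula can be traced by hand with a small pure-Python sketch. Everything named here is an assumption made for illustration and is not part of tinybig: `kappa` is a toy expansion that appends element-wise squares, `psi` is an identity reconciliation that only reshapes `l = n * D` values into `(n, D)`, and `pi` is a zero remainder. The sketch follows the summation in the formula; note that the constructor in the source listing falls back to `mean_fusion` when `channel_num > 1` and no channel fusion is supplied, so the library's default may average the channels instead.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def kappa(x: Vector) -> Vector:
    """Hypothetical data expansion: append element-wise squares, so D = 2 * m."""
    return x + [v * v for v in x]


def psi(w: Vector, n: int, D: int) -> Matrix:
    """Hypothetical identity reconciliation: reshape l = n * D values to (n, D)."""
    assert len(w) == n * D
    return [w[i * D:(i + 1) * D] for i in range(n)]


def pi(x: Vector, n: int) -> Vector:
    """Hypothetical zero remainder."""
    return [0.0] * n


def head_output(x: Vector, channels: List[Vector], n: int) -> Vector:
    """Evaluate g(x | w, C) = sum_c <kappa(x), psi(w^c)> + pi(x)."""
    kx = kappa(x)
    D = len(kx)
    out = pi(x, n)
    for w_c in channels:  # one parameter vector of length l per channel
        W = psi(w_c, n, D)
        for i in range(n):
            out[i] += sum(W[i][j] * kx[j] for j in range(D))
    return out


x = [1.0, 2.0]                                   # m = 2, so D = 4
w0 = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]    # selects x[0] and x[1]
w1 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]    # selects x[0]**2 and x[1]**2
print(head_output(x, [w0, w1], n=2))             # [2.0, 6.0]
```

Both channels share the same expansion `kappa(x)`; only the parameter vectors differ, which is what the multi-channel formulation describes.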

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `m` | int | The input dimension of the head. |
| `n` | int | The output dimension of the head. |
| `l` | int, optional | The number of parameters for each channel in the head. |
| `channel_num` | int, default=1 | The number of channels in the head. |
| `batch_num` | int, optional | The batch size used in instance interdependence functions. |
| `data_transformation` | object, optional | The data transformation function for the head. |
| `parameter_fabrication` | object, optional | The parameter fabrication function for the head. |
| `remainder` | object, optional | The remainder function for the head. |
| `w` | Parameter, optional | Parameters for parameter reconciliation, with a length of \(l\) per channel. |
| `b` | Parameter, optional | Bias parameters for parameter reconciliation. |
| `w_remainder` | Parameter, optional | Parameters for the remainder function. |
| `b_remainder` | Parameter, optional | Bias parameters for the remainder function. |
| `device` | str, default='cpu' | The device hosting the head. |
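To make the parameter shapes concrete, here is a rough counting sketch. The shapes are taken from `create_learnable_parameters` in the source listing: `w` is `(channel_num, l)`, `b` is `(n,)`, `w_remainder` is `(n, m)` and `b_remainder` is `(n,)`. The function name and its boolean flags are hypothetical, and the sketch deliberately ignores the interdependence and channel-fusion parameters.

```python
def count_head_parameters(
    m: int,
    n: int,
    l: int,
    channel_num: int = 1,
    fabrication_bias: bool = False,
    remainder_has_parameters: bool = False,
    remainder_bias: bool = False,
) -> dict:
    """Count the entries of w, b, w_remainder and b_remainder for one head."""
    counts = {
        "w": channel_num * l,                                      # (channel_num, l)
        "b": n if fabrication_bias else 0,                         # (n,)
        "w_remainder": n * m if remainder_has_parameters else 0,   # (n, m)
        "b_remainder": n if (remainder_has_parameters and remainder_bias) else 0,
    }
    counts["total"] = sum(counts.values())
    return counts


# Two channels of 12 parameters each, a fabrication bias, and a linear remainder.
print(count_head_parameters(m=4, n=3, l=12, channel_num=2,
                            fabrication_bias=True,
                            remainder_has_parameters=True))
```

With these inputs the count is 24 for `w`, 3 for `b`, 12 for `w_remainder`, and 39 in total; only `w` grows with the number of channels.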

Methods:

| Name | Description |
| --- | --- |
| `__init__` | Initializes the RPN head with multi-channel settings. |
| `get_m` | Retrieves the input dimension of the head. |
| `get_n` | Retrieves the output dimension of the head. |
| `get_channel_num` | Retrieves the number of channels in the head. |
| `get_batch_num` | Retrieves the batch size used in instance interdependence functions. |
| `create_learnable_parameters` | Creates learnable parameters for the head. |
| `initialize_parameters` | Initializes parameters for the head using various strategies. |
| `initialize_parameters_fanout_std_uniform` | Initializes parameters with a fan-out-based uniform distribution. |
| `initialize_parameters_kaiming_uniform` | Initializes parameters using the Kaiming uniform distribution. |
| `initialize_parameters_xavier_uniform` | Initializes parameters using the Xavier uniform distribution. |
| `initialize_parameters_xavier_normal` | Initializes parameters using the Xavier normal distribution. |
| `to_config` | Converts the head configuration into a dictionary format. |
| `calculate_kappa_x` | Computes the transformed data \(\kappa(\mathbf{x})\). |
| `calculate_phi_w` | Computes the reconciled parameters \(\psi(\mathbf{w})\). |
| `calculate_pi_x` | Computes the remainder term \(\pi(\mathbf{x})\). |
| `calculate_attribute_xi_x` | Computes the attribute interdependence \(\xi_{\text{attribute}}(\mathbf{x})\). |
| `calculate_instance_xi_x` | Computes the instance interdependence \(\xi_{\text{instance}}(\mathbf{x})\). |
| `calculate_kappa_xi_x` | Computes the combined transformed and interdependent data. |
| `calculate_inner_product` | Computes the inner product of \(\kappa(\mathbf{x})\) and \(\psi(\mathbf{w})\). |
| `fusion` | Combines the multi-channel outputs into a single output. |
| `forward` | Executes the forward pass of the head. |
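Of the initialization strategies listed, the fan-out-based one is the simplest to illustrate. In the source listing it draws every parameter from a uniform distribution on `[-std, std]` with `std = 1 / sqrt(fan_out)`, where `fan_out` defaults to `n`. The sketch below mirrors that rule in plain Python; the helper name `fanout_std_uniform`, the nested-list representation, and the `seed` argument are assumptions made for illustration, not library API.

```python
import math
import random
from typing import List


def fanout_std_uniform(rows: int, cols: int, fan_out: int, seed: int = 0) -> List[List[float]]:
    """Sample a rows x cols matrix from U(-std, std) with std = 1 / sqrt(fan_out)."""
    assert fan_out > 0
    std = 1.0 / math.sqrt(fan_out)
    rng = random.Random(seed)  # local generator, so the sketch is reproducible
    return [[rng.uniform(-std, std) for _ in range(cols)] for _ in range(rows)]


# Two channels with eight parameters each and fan_out = n = 4, so std = 0.5.
w = fanout_std_uniform(rows=2, cols=8, fan_out=4)
bound = 1.0 / math.sqrt(4)
print(all(abs(v) <= bound for row in w for v in row))
```

A larger output dimension gives a narrower sampling range, which keeps the initial outputs on a comparable scale as `n` grows.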

Source code in tinybig/module/base_head.py
class head(Module, function):
    r"""
    The RPN head class for implementing the multi-channel module.

    It will be used to compose the RPN layer module for building deep RPN models.

    ...

    Notes
    ----------
    Similar to convolutional neural networks (CNNs) employing multiple filters, RPN allows each head to have multiple
    channels of parameters applied to the same data expansion.
    RPN defines its multi-channel parameters as $\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}$,
    where $C$ denotes the number of channels.
    Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its
    output with such multi-channel parameters as follows:
    $$
        \begin{equation}
            g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}),
        \end{equation}
    $$
    where these multi-channel parameters are fabricated from length $l$ to shape $(n, D)$ using the identical
    parameter reconciliation function.

    Attributes
    ----------
    m: int
        The input dimension of the head.
    n: int
        The output dimension of the head.
    l: int, optional
        The number of parameters for each channel in the head.
    channel_num: int, default=1
        The number of channels in the head.
    batch_num: int, optional
        The batch size used in instance interdependence functions.
    data_transformation: object, optional
        The data transformation function for the head.
    parameter_fabrication: object, optional
        The parameter fabrication function for the head.
    remainder: object, optional
        The remainder function for the head.
    w: torch.nn.Parameter, optional
        Parameters for parameter reconciliation, with a length of $l$ per channel.
    b: torch.nn.Parameter, optional
        Bias parameters for parameter reconciliation.
    w_remainder: torch.nn.Parameter, optional
        Parameters for the remainder function.
    b_remainder: torch.nn.Parameter, optional
        Bias parameters for the remainder function.
    device: str, default='cpu'
        The device hosting the head.

    Methods
    -------
    __init__
        Initializes the RPN head with multi-channel settings.
    get_m
        Retrieves the input dimension of the head.
    get_n
        Retrieves the output dimension of the head.
    get_channel_num
        Retrieves the number of channels in the head.
    get_batch_num
        Retrieves the batch size used in instance interdependence functions.
    create_learnable_parameters
        Creates learnable parameters for the head.
    initialize_parameters
        Initializes parameters for the head using various strategies.
    initialize_parameters_fanout_std_uniform
        Initializes parameters with a fan-out-based uniform distribution.
    initialize_parameters_kaiming_uniform
        Initializes parameters using the Kaiming uniform distribution.
    initialize_parameters_xavier_uniform
        Initializes parameters using the Xavier uniform distribution.
    initialize_parameters_xavier_normal
        Initializes parameters using the Xavier normal distribution.
    to_config
        Converts the head configuration into a dictionary format.
    calculate_kappa_x
        Computes the transformed data $\kappa(\mathbf{x})$.
    calculate_phi_w
        Computes the reconciled parameters $\psi(\mathbf{w})$.
    calculate_pi_x
        Computes the remainder term $\pi(\mathbf{x})$.
    calculate_attribute_xi_x
        Computes the attribute interdependence $\xi_{\text{attribute}}(\mathbf{x})$.
    calculate_instance_xi_x
        Computes the instance interdependence $\xi_{\text{instance}}(\mathbf{x})$.
    calculate_kappa_xi_x
        Computes the combined transformed and interdependent data.
    calculate_inner_product
        Computes the inner product of $\kappa(\mathbf{x})$ and $\psi(\mathbf{w})$.
    fusion
        Combines the multi-channel outputs into a single output.
    forward
        Executes the forward pass of the head.
    """
    def __init__(
        self,
        m: int,
        n: int,
        name: str = 'rpn_head',
        batch_num: int = None,
        channel_num: int = 1,
        l: int = None,
        l_attribute_interdependence: int = None,
        l_instance_interdependence: int = None,
        l_channel_fusion: int = None,

        input_process_functions=None,
        data_transformation: transformation_class = None,
        attribute_interdependence: interdependence_class = None,
        instance_interdependence: interdependence_class = None,
        parameter_fabrication: fabrication_class = None,
        channel_fusion: fusion_class = None,
        remainder: remainder_class = None,
        output_process_functions=None,

        input_process_function_configs=None,
        data_transformation_configs=None,
        attribute_interdependence_configs=None,
        instance_interdependence_configs=None,
        parameter_fabrication_configs=None,
        channel_fusion_configs=None,
        remainder_configs=None,
        output_process_function_configs=None,

        create_parameters_at_init: bool = True,
        parameters_init_method: str = None,
        device='cpu',
        *args, **kwargs
    ):
        r"""
        The initialization method of the RPN head with multiple channels.

        It initializes the RPN head module with multi-channel parameters.
        Specifically, this method initializes the dimension configurations of the head
        and the component functions used in the head, and defines the device to host the head.

        Parameters
        ----------
        m: int
            The input dimension of the head.
        n: int
            The output dimension of the head.
        l: int, default = None
            The number of parameters for each channel in the head.
        channel_num: int, default = 1
            The number of channels in the head.
        data_transformation: object, default = None
            The data transformation function of the head. The data transformation can be initialized directly
            with this parameter or with the data_transformation_configs parameter.
        parameter_fabrication: object, default = None
            The parameter fabrication function of the head. The parameter fabrication can be initialized directly
            with this parameter or with the parameter_fabrication_configs parameter.
        remainder: object, default = None
            The remainder function of the head. The remainder can be initialized directly
            with this parameter or with the remainder_configs parameter.
        output_process_functions: object, default = None
            The output processing functions. The output processing functions can be initialized directly
            with this parameter or with the output_process_function_configs parameter.
        data_transformation_configs: dict, default = None
            The data transformation function configuration.
        parameter_fabrication_configs: dict, default = None
            The parameter fabrication function configuration.
        remainder_configs: dict, default = None
            The remainder function configuration.
        output_process_function_configs: dict, default = None
            The output processing function configuration.
        device: str, default = 'cpu'
            The device for hosting the head.

        Returns
        ----------
        object
            This method will return the initialized RPN-head object.
        """
        Module.__init__(self)
        function.__init__(self, name=name, device=device)

        assert (channel_num >= 1) and (m is not None and m >= 1) and (n is not None and n >= 1)
        # initialize the basic attributes
        self.m = m
        self.n = n
        self.batch_num = batch_num
        self.channel_num = channel_num
        self.l = l
        self.l_attribute_interdependence = l_attribute_interdependence
        self.l_instance_interdependence = l_instance_interdependence
        self.l_channel_fusion = l_channel_fusion

        # initialize data_transformation, interdependence, interdependence_fusion, parameter_fabrication, channel_fusion and remainder functions from either input objects or input configs
        self.data_transformation = config.instantiation_functions(functions=data_transformation, function_configs=data_transformation_configs, device=device)
        self.parameter_fabrication = config.instantiation_functions(functions=parameter_fabrication, function_configs=parameter_fabrication_configs, device=device)
        self.remainder = config.instantiation_functions(functions=remainder, function_configs=remainder_configs, device=device)

        self.attribute_interdependence = config.instantiation_functions(functions=attribute_interdependence, function_configs=attribute_interdependence_configs, device=device)
        self.instance_interdependence = config.instantiation_functions(functions=instance_interdependence, function_configs=instance_interdependence_configs, device=device)

        self.input_process_functions = config.instantiation_functions(input_process_functions, input_process_function_configs, device=device)
        self.output_process_functions = config.instantiation_functions(output_process_functions, output_process_function_configs, device=device)
        self.channel_fusion = config.instantiation_functions(functions=channel_fusion, function_configs=channel_fusion_configs, device=device)
        if self.channel_num > 1 and self.channel_fusion is None:
            self.channel_fusion = mean_fusion(dims=[self.n] * self.channel_num)

        # create learnable parameters for parameter fabrication and remainder functions
        self.w = None
        self.b = None
        self.w_remainder = None
        self.b_remainder = None
        self.w_attribute_interdependence = None
        self.w_instance_interdependence = None
        self.w_channel_fusion = None

        self.parameters_init_method = parameters_init_method
        if create_parameters_at_init:
            self.create_learnable_parameters()

    def get_m(self):
        """
        Retrieves the input dimension (`m`) of the head.

        Returns
        -------
        int
            The input dimension of the head.
        """
        return self.m

    def get_n(self):
        """
        Retrieves the output dimension (`n`) of the head.

        Returns
        -------
        int
            The output dimension of the head.
        """
        return self.n

    def get_channel_num(self):
        """
        Retrieves the number of channels in the head.

        Returns
        -------
        int
            The number of channels in the head.
        """
        return self.channel_num

    def get_batch_num(self):
        """
        Retrieves the batch size used in instance interdependence functions.

        Returns
        -------
        int or None
            The batch size used for instance interdependence, or None if not specified.
        """
        return self.batch_num

    def create_learnable_parameters(
        self,
        initialize_parameter_at_creation: bool = False,
        init_type: str = 'xavier_uniform',
        init_bias: bool = True,
        *args, **kwargs
    ):
        """
        Creates learnable parameters for the head.

        This method creates parameters for the attribute and instance interdependence functions,
        parameter reconciliation, the remainder function, and channel fusion, based on the head configuration.

        Parameters
        ----------
        initialize_parameter_at_creation: bool, default=False
            Whether to initialize parameters during creation.
        init_type: str, default='xavier_uniform'
            The initialization method for parameters.
        init_bias: bool, default=True
            Whether to initialize bias parameters.

        Returns
        -------
        None
        """
        m_prime, b_prime = self.m, self.batch_num

        if self.attribute_interdependence is not None:
            if self.attribute_interdependence.require_parameters:
                if self.l_attribute_interdependence is None:
                    self.l_attribute_interdependence = self.attribute_interdependence.calculate_l()
                self.w_attribute_interdependence = torch.nn.Parameter(torch.rand(self.channel_num, self.l_attribute_interdependence, device=self.device))
            assert self.m is not None and self.m >= 1
            m_prime = self.attribute_interdependence.calculate_m_prime(m=self.m)

        if self.instance_interdependence is not None:
            if self.instance_interdependence.require_parameters:
                if self.l_instance_interdependence is None:
                    self.l_instance_interdependence = self.instance_interdependence.calculate_l()
                self.w_instance_interdependence = torch.nn.Parameter(torch.rand(self.channel_num, self.l_instance_interdependence, device=self.device))
            if self.batch_num is not None:
                assert self.batch_num is not None and self.batch_num >= 1
                b_prime = self.instance_interdependence.calculate_b_prime(b=self.batch_num)

        # create learnable parameters for parameter_fabrication function
        if self.parameter_fabrication is not None and self.parameter_fabrication.require_parameters:
            if self.l is None:
                self.l = self.parameter_fabrication.calculate_l(n=self.n, D=self.data_transformation.calculate_D(m=m_prime))
            self.w = torch.nn.Parameter(torch.rand(self.channel_num, self.l, device=self.device))
            if self.parameter_fabrication.enable_bias:
                self.b = torch.nn.Parameter(torch.rand(self.n, device=self.device))

        # create learnable parameters for remainder function
        if self.remainder is not None and self.remainder.require_parameters:
            self.w_remainder = torch.nn.Parameter(torch.rand(self.n, self.m, device=self.device))
            if self.remainder.enable_bias:
                self.b_remainder = torch.nn.Parameter(torch.rand(self.n, device=self.device))
        elif self.remainder is not None and self.m != self.n and not self.remainder.require_parameters and not isinstance(self.remainder, (tinybig.remainder.zero_remainder, tinybig.remainder.constant_remainder)):
            raise ValueError('The input and output dimensions {}, {} are different, parameters will be needed '
                             'by the {} to adjust the input dimensions.'.format(self.m, self.n, self.remainder.get_name()))

        # create learnable parameters for channel_fusion function
        if self.channel_fusion is not None and self.channel_fusion.require_parameters:
            if self.l_channel_fusion is None:
                self.l_channel_fusion = self.channel_fusion.calculate_l()
            self.w_channel_fusion = torch.nn.Parameter(torch.rand(1, self.l_channel_fusion, device=self.device))

        # initialize the parameter with certain methods...
        init_type = self.parameters_init_method if self.parameters_init_method is not None else init_type
        if initialize_parameter_at_creation:
            self.initialize_parameters(init_type=init_type, init_bias=init_bias)

    def initialize_parameters(self, init_type='xavier_uniform', init_bias=True, *args, **kwargs):
        """
        The parameter initialization method.

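        It initializes the multi-channel parameters in the head with different initialization approaches,
        e.g., xavier_uniform or kaiming_uniform.
        Depending on the "init_type" parameter, this method will call the corresponding initialization method.
        If the head was created with a "parameters_init_method", that value overrides "init_type".
        Supported values are 'kaiming_uniform', 'xavier_uniform', 'xavier_normal' and 'fanout_std_uniform';
        any other value leaves the parameters unchanged.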

        Parameters
        ----------
        init_type: str, default = 'xavier_uniform'
            The parameter initialization approach.
        init_bias: bool, default = True
            The boolean tag of bias initialization.

        Returns
        -------
        None
            This initialization method doesn't have any return values.
        """
        init_type = self.parameters_init_method if self.parameters_init_method is not None else init_type

        print('parameter init type', init_type)

        if init_type == 'kaiming_uniform':
            self.initialize_parameters_kaiming_uniform(init_bias=init_bias, *args, **kwargs)
        elif init_type == 'xavier_uniform':
            self.initialize_parameters_xavier_uniform(init_bias=init_bias, *args, **kwargs)
        elif init_type == 'xavier_normal':
            self.initialize_parameters_xavier_normal(init_bias=init_bias, *args, **kwargs)
        elif init_type == 'fanout_std_uniform':
            self.initialize_parameters_fanout_std_uniform(init_bias=init_bias, *args, **kwargs)

    def initialize_parameters_fanout_std_uniform(self, init_bias=True, fan_out: int = None, *args, **kwargs):
        """
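        The fan-out-based uniform parameter initialization method.

        It initializes the multi-channel parameters in the head by sampling from the uniform distribution
        on [-std, std], where std = 1 / sqrt(fan_out).

        Parameters
        ----------
        init_bias: bool, default = True
            The boolean tag of bias initialization.
        fan_out: int, default = None
            The fan-out used to compute std. If None, the output dimension n is used,
            falling back to the input dimension m.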

        Returns
        -------
        None
            This initialization method doesn't have any return values.
        """
        fan_out = fan_out if fan_out is not None else self.n
        if fan_out is None: fan_out = self.m
        assert fan_out is not None and fan_out > 0
        std = 1. / math.sqrt(fan_out)

        if self.w_attribute_interdependence is not None:
            self.w_attribute_interdependence.data.uniform_(-std, std)

        if self.w_instance_interdependence is not None:
            self.w_instance_interdependence.data.uniform_(-std, std)

        if self.w is not None:
            self.w.data.uniform_(-std, std)
            if init_bias and self.b is not None:
                self.b.data.uniform_(-std, std)

        if self.w_remainder is not None:
            self.w_remainder.data.uniform_(-std, std)
            if init_bias and self.b_remainder is not None:
                self.b_remainder.data.uniform_(-std, std)

        if self.w_channel_fusion is not None:
            self.w_channel_fusion.data.uniform_(-std, std)

    def initialize_parameters_kaiming_uniform(self, init_bias=True, *args, **kwargs):
        """
        The kaiming parameter initialization method.

        It initializes the multi-channel parameters in the head with kaiming_uniform_ method from pytorch.

        Parameters
        ----------
        init_bias: bool, default = True
            The boolean tag of bias initialization.

        Returns
        -------
        None
            This initialization method doesn't have any return values.
        """

        if self.w_attribute_interdependence is not None:
            torch.nn.init.kaiming_uniform_(self.w_attribute_interdependence, a=math.sqrt(5))

        if self.w_instance_interdependence is not None:
            torch.nn.init.kaiming_uniform_(self.w_instance_interdependence, a=math.sqrt(5))

        if self.w is not None:
            torch.nn.init.kaiming_uniform_(self.w, a=math.sqrt(5))

        if self.w_remainder is not None:
            torch.nn.init.kaiming_uniform_(self.w_remainder, a=math.sqrt(5))

        if self.w_channel_fusion is not None:
            torch.nn.init.kaiming_uniform_(self.w_channel_fusion, a=math.sqrt(5))

        if init_bias:
            if self.b is not None:
                fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.w)
                bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
                torch.nn.init.uniform_(self.b, -bound, bound)
            if self.b_remainder is not None:
                fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.w_remainder)
                bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
                torch.nn.init.uniform_(self.b_remainder, -bound, bound)

    def initialize_parameters_xavier_uniform(self, init_bias=True, *args, **kwargs):
        """
        The xavier initialization method.

        It initializes the multi-channel parameters in the head with xavier_uniform_ method from pytorch.

        Parameters
        ----------
        init_bias: bool, default = True
            The boolean tag of bias initialization.

        Returns
        -------
        None
            This initialization method doesn't have any return values.
        """
        if self.w_attribute_interdependence is not None:
            torch.nn.init.xavier_uniform_(self.w_attribute_interdependence)

        if self.w_instance_interdependence is not None:
            torch.nn.init.xavier_uniform_(self.w_instance_interdependence)

        if self.w is not None:
            torch.nn.init.xavier_uniform_(self.w)

        if self.w_remainder is not None:
            torch.nn.init.xavier_uniform_(self.w_remainder)

        if self.w_channel_fusion is not None:
            torch.nn.init.xavier_uniform_(self.w_channel_fusion)

        if init_bias:
            if self.b is not None:
                torch.nn.init.xavier_uniform_(self.b.view(1, -1))
            if self.b_remainder is not None:
                torch.nn.init.xavier_uniform_(self.b_remainder.view(1, -1))

    def initialize_parameters_xavier_normal(self, init_bias=True, *args, **kwargs):
        """
        The xavier normal initialization method.

        It initializes the multi-channel parameters in the head with xavier_normal_ method from pytorch.

        Parameters
        ----------
        init_bias: bool, default = True
            The boolean tag of bias initialization.

        Returns
        -------
        None
            This initialization method doesn't have any return values.
        """
        if self.w_attribute_interdependence is not None:
            torch.nn.init.xavier_normal_(self.w_attribute_interdependence)

        if self.w_instance_interdependence is not None:
            torch.nn.init.xavier_normal_(self.w_instance_interdependence)

        if self.w is not None:
            torch.nn.init.xavier_normal_(self.w)

        if self.w_remainder is not None:
            torch.nn.init.xavier_normal_(self.w_remainder)

        if self.w_channel_fusion is not None:
            torch.nn.init.xavier_normal_(self.w_channel_fusion)

        if init_bias:
            if self.b is not None:
                torch.nn.init.xavier_normal_(self.b.view(1, -1))
            if self.b_remainder is not None:
                torch.nn.init.xavier_normal_(self.b_remainder.view(1, -1))

    def to_config(self):
        """
        Converts the configuration of the head into a dictionary.

        This includes the head's attributes, such as dimensions, transformation functions,
        interdependence functions, fabrication functions, and remainder functions.

        Returns
        -------
        dict
            A dictionary containing the head's class and parameter configurations.
        """
        head_class = f"{self.__class__.__module__}.{self.__class__.__name__}"
        head_parameters = {
            'name': self.name,
            'device': self.device,
            'm': self.m,
            'n': self.n,
            'l': self.l,
            'batch_num': self.batch_num,
            'channel_num': self.channel_num,
        }

        if self.data_transformation is not None:
            head_parameters['data_transformation_configs'] = self.data_transformation.to_config()
        if self.attribute_interdependence is not None:
            head_parameters['attribute_interdependence_configs'] = self.attribute_interdependence.to_config()
        if self.instance_interdependence is not None:
            head_parameters['instance_interdependence_configs'] = self.instance_interdependence.to_config()
        if self.parameter_fabrication is not None:
            head_parameters['parameter_fabrication_configs'] = self.parameter_fabrication.to_config()
        if self.channel_fusion is not None:
            head_parameters['channel_fusion_configs'] = self.channel_fusion.to_config()
        if self.remainder is not None:
            head_parameters['remainder_configs'] = self.remainder.to_config()
        if self.input_process_functions is not None:
            head_parameters['input_process_function_configs'] = function.functions_to_configs(self.input_process_functions)
        if self.output_process_functions is not None:
            head_parameters['output_process_function_configs'] = function.functions_to_configs(self.output_process_functions)

        return {
            "head_class": head_class,
            "head_parameters": head_parameters
        }

    def calculate_kappa_x(self, x: torch.Tensor, device='cpu', *args, **kwargs):
        r"""
        Computes the transformed data $\kappa(\mathbf{x})$ using the data transformation function.

        If no data transformation function is defined, the input data is returned as-is.

        Parameters
        ----------
        x: torch.Tensor
            The input data to be transformed.
        device: str, default='cpu'
            The device to execute the data transformation.

        Returns
        -------
        torch.Tensor
            The transformed data $\kappa(\mathbf{x})$.
        """
        if self.data_transformation is not None:
            if self.data_transformation.device != device:
                self.data_transformation.to(device)

            kappa_x = self.data_transformation(x, device=device)
            return kappa_x
        else:
            return x

    def calculate_phi_w(self, D: int, channel_index: int = 0, device='cpu', *args, **kwargs):
        r"""
        Computes the reconciled parameters $\psi(\mathbf{w})$ for a specific channel.

        Parameters
        ----------
        D: int
            The dimensionality of the transformed data $\kappa(\mathbf{x})$.
        channel_index: int, default=0
            The index of the channel for which parameters are computed.
        device: str, default='cpu'
            The device to execute the parameter reconciliation.

        Returns
        -------
        torch.Tensor or None
            The reconciled parameters $\psi(\mathbf{w})$ for the specified channel, or None if not applicable.
        """
        assert channel_index in range(self.channel_num)

        if self.parameter_fabrication is not None:
            if self.parameter_fabrication.device != device:
                self.parameter_fabrication.to(device)

            if self.w is not None and 0 <= channel_index < self.w.size(0):
                w_chunk = self.w[channel_index:channel_index+1, :]
            else:
                w_chunk = None
            phi_w = self.parameter_fabrication(w=w_chunk, n=self.n, D=D, device=device)
            return phi_w
        else:
            return None

    def calculate_pi_x(self, x: torch.Tensor, device='cpu', *args, **kwargs):
        r"""
        Computes the remainder term $\pi(\mathbf{x})$ using the remainder function.

        Parameters
        ----------
        x: torch.Tensor
            The input data to compute the remainder term.
        device: str, default='cpu'
            The device to execute the remainder calculation.

        Returns
        -------
        torch.Tensor or None
            The remainder term $\pi(\mathbf{x})$, or None if no remainder function is defined or the remainder is a zero remainder.
        """
        if self.remainder is not None:
            if isinstance(self.remainder, tinybig.remainder.zero_remainder):
                return None

            if self.remainder.device != device:
                self.remainder.to(device)
            pi_x = self.remainder(x=x, w=self.w_remainder, b=self.b_remainder, m=self.m, n=self.n, device=device)
            return pi_x
        else:
            return None

    def calculate_attribute_xi_x(self, x: torch.Tensor, channel_index: int = 0, kappa_x: torch.Tensor = None, device='cpu', *args, **kwargs):
        r"""
        Computes the attribute interdependence $\xi_{\text{attribute}}(\mathbf{x})$.

        Parameters
        ----------
        x: torch.Tensor
            The input data to compute the attribute interdependence.
        channel_index: int, default=0
            The index of the channel for which interdependence is computed.
        kappa_x: torch.Tensor, optional
            The precomputed transformed data to use for interdependence calculation.
        device: str, default='cpu'
            The device to execute the interdependence calculation.

        Returns
        -------
        torch.Tensor
            The attribute interdependence $\xi_{\text{attribute}}(\mathbf{x})$.
        """
        if self.attribute_interdependence is not None:
            if self.attribute_interdependence.device != device:
                self.attribute_interdependence.to(device)

            if self.w_attribute_interdependence is not None and 0 <= channel_index < self.w_attribute_interdependence.size(0):
                w_chunks = self.w_attribute_interdependence[channel_index:channel_index+1, :]
            else:
                w_chunks = None

            xi_x = self.attribute_interdependence(x=x, w=w_chunks, kappa_x=kappa_x, device=device)

            return xi_x
        else:
            return kappa_x if kappa_x is not None else x

    def calculate_instance_xi_x(self, x: torch.Tensor, channel_index: int = 0, kappa_x: torch.Tensor = None, device='cpu', *args, **kwargs):
        r"""
        Computes the instance interdependence $\xi_{\text{instance}}(\mathbf{x})$.

        Parameters
        ----------
        x: torch.Tensor
            The input data to compute the instance interdependence.
        channel_index: int, default=0
            The index of the channel for which interdependence is computed.
        kappa_x: torch.Tensor, optional
            The precomputed transformed data to use for interdependence calculation.
        device: str, default='cpu'
            The device to execute the interdependence calculation.

        Returns
        -------
        torch.Tensor
            The instance interdependence $\xi_{\text{instance}}(\mathbf{x})$.
        """
        if self.instance_interdependence is not None:
            if self.instance_interdependence.device != device:
                self.instance_interdependence.to(device)

            if self.w_instance_interdependence is not None and 0 <= channel_index < self.w_instance_interdependence.size(0):
                w_chunks = self.w_instance_interdependence[channel_index:channel_index+1, :]
            else:
                w_chunks = None
            xi_x = self.instance_interdependence(x=x, w=w_chunks, kappa_x=kappa_x, device=device)
            return xi_x
        else:
            return kappa_x if kappa_x is not None else x

    # computes kappa_xi_x for a single channel; forward() pre-computes it once and reuses it across channels when possible
    def calculate_kappa_xi_x(self, x: torch.Tensor, channel_index: int = 0, device='cpu', *args, **kwargs):
        r"""
        Computes the combined transformed and interdependent data $\kappa(\xi(\mathbf{x}))$.

        Parameters
        ----------
        x: torch.Tensor
            The input data to compute the combined transformation and interdependence.
        channel_index: int, default=0
            The index of the channel for which the computation is performed.
        device: str, default='cpu'
            The device to execute the computation.

        Returns
        -------
        torch.Tensor
            The combined transformed and interdependent data $\kappa(\xi(\mathbf{x}))$.
        """
        # ************** Attribute Interdependence Block **************
        xi_x = self.calculate_attribute_xi_x(x=x, channel_index=channel_index, device=self.device)
        # ************** Data Expansion Block **************
        kappa_x = self.calculate_kappa_x(x=xi_x, device=device, *args, **kwargs)
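        # calculate_kappa_x returns its input unchanged when no data transformation is set, so only check D when one exists
        assert self.data_transformation is None or kappa_x.shape[1] == self.data_transformation.calculate_D(m=xi_x.shape[1])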

        # ************** Instance Interdependence Block **************
        kappa_xi_x = self.calculate_instance_xi_x(x=x, channel_index=channel_index, kappa_x=kappa_x, device=self.device)
        return kappa_xi_x

    def calculate_inner_product(self, kappa_xi_x: torch.Tensor, phi_w: torch.Tensor, device: str = 'cpu', *args, **kwargs):
        r"""
        Computes the inner product of $\kappa(\mathbf{x})$ and $\psi(\mathbf{w})$.

        Parameters
        ----------
        kappa_xi_x: torch.Tensor
            The transformed and interdependent input data.
        phi_w: torch.Tensor
            The reconciled parameters.
        device: str, default='cpu'
            The device hosting the operation.

        Returns
        -------
        torch.Tensor
            The inner product of the transformed data and parameters.
        """
        if phi_w is not None:
            assert kappa_xi_x.ndim == 2 and phi_w.ndim == 2 and kappa_xi_x.size(-1) == phi_w.size(-1)
            if device != 'mps' and (kappa_xi_x.is_sparse or phi_w.is_sparse):
                inner_prod = torch.sparse.mm(kappa_xi_x, phi_w.T)
                if self.b is not None:
                    inner_prod += self.b
            else:
                inner_prod = F.linear(kappa_xi_x, phi_w, bias=self.b)
        else:
            inner_prod = kappa_xi_x
        return inner_prod

    def fusion(self, inner_products: list[torch.Tensor], device: str = 'cpu', *args, **kwargs):
        """
        Combines the multi-channel outputs into a single output.

        If a channel fusion function is defined, it applies the function to combine
        the inner product results. Otherwise, exactly one inner product is expected and it is returned as-is.

        Parameters
        ----------
        inner_products: list of torch.Tensor
            The inner products computed from each channel.
        device: str, default='cpu'
            The device hosting the operation.

        Returns
        -------
        torch.Tensor
            The fused output.
        """
        if self.channel_fusion is not None:
            assert self.channel_fusion.get_dims() is None or self.channel_fusion.get_num() == len(inner_products)
            result = self.channel_fusion(x=inner_products, w=self.w_channel_fusion, device=device)
            n = self.channel_fusion.calculate_n(dims=[inner_prod.size(-1) for inner_prod in inner_products])
        else:
            assert len(inner_products) == 1
            result = inner_products[0]
            n = self.n
        assert result.size(-1) == n
        return result

    def forward(self, x: torch.Tensor, device='cpu', *args, **kwargs):
        r"""
        The forward method of the RPN head module.

        Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its
        output with multi-channel parameters as follows:
        $$
            \begin{equation}
                g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}),
            \end{equation}
        $$
        where these multi-channel parameters $\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}$ are fabricated
        from length $l$ to shape $(n, D)$ using the identical parameter reconciliation function.

        Parameters
        ----------
        x: torch.Tensor
            The input data vector.
        device: str, default = 'cpu'
            The device for hosting the head.

        Returns
        -------
        torch.Tensor
            The processed output of the head.
        """
        # ************** Input Processing Block **************
        if x is None:
            raise ValueError("x cannot be None...")

        x = function.func_x(x=x, functions=self.input_process_functions, device=device)

        inner_products = []

        pre_computed_kappa_xi_x = None
        # if neither interdependence function requires parameters, kappa_xi_x is identical for every channel and can be pre-computed once and reused
        if (
            (self.attribute_interdependence is None or not self.attribute_interdependence.require_parameters) and
            (self.instance_interdependence is None or not self.instance_interdependence.require_parameters) and
            self.channel_num > 1
        ):
            pre_computed_kappa_xi_x = self.calculate_kappa_xi_x(x=x, channel_index=0, device=device)

        for channel_index in range(self.channel_num):

            # ************** Data Transformation Block **************
            if (
                (self.attribute_interdependence is None or not self.attribute_interdependence.require_parameters)
                and (self.instance_interdependence is None or not self.instance_interdependence.require_parameters)
                and pre_computed_kappa_xi_x is not None
            ):
                kappa_xi_x = pre_computed_kappa_xi_x
            else:
                kappa_xi_x = self.calculate_kappa_xi_x(x=x, channel_index=channel_index, device=device)

            # ************** Parameter Reconciliation Block **************
            phi_w = self.calculate_phi_w(D=kappa_xi_x.size(-1), channel_index=channel_index, device=device, *args, **kwargs)

            # ************** Inner Product Calculation Block **************
            inner_prod = self.calculate_inner_product(kappa_xi_x=kappa_xi_x, phi_w=phi_w, device=device, *args, **kwargs)

            inner_products.append(inner_prod)

        # ************** Multi-Channel Fusion Block **************
        result = self.fusion(inner_products=inner_products, device=device)

        # ************** Remainder Block **************
        pi_x = self.calculate_pi_x(x=x, device=device, *args, **kwargs)

        if pi_x is not None:
            assert pi_x.size(-1) == result.size(-1)
            result += pi_x

        # ************** Output Processing Block **************
        output = function.func_x(x=result, functions=self.output_process_functions, device=self.device)
        return output
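
The control flow of `forward` can be traced without tinybig or PyTorch. The sketch below is a dependency-free illustration, not the library's implementation: plain nested lists stand in for tensors, and `linear`, `mean_fusion` and `head_forward` are hypothetical helpers. It assumes averaging for the fusion step, because `__init__` falls back to `mean_fusion` when `channel_num > 1` and no fusion function is given.

```python
def linear(rows, weight, bias=None):
    """Plain-list stand-in for F.linear: rows @ weight.T + bias."""
    out = []
    for row in rows:
        vals = [sum(a * b for a, b in zip(row, w_row)) for w_row in weight]
        if bias is not None:
            vals = [v + b for v, b in zip(vals, bias)]
        out.append(vals)
    return out


def mean_fusion(channel_outputs):
    """Element-wise mean over a list of equally shaped channel outputs."""
    count = len(channel_outputs)
    return [
        [sum(vals) / count for vals in zip(*rows)]
        for rows in zip(*channel_outputs)
    ]


def head_forward(x, channel_weights, kappa=None, remainder=None):
    """Hypothetical forward pass: expand once, take one inner product per
    channel, fuse the channels, then add the remainder if there is one."""
    kappa_x = [kappa(row) for row in x] if kappa is not None else x
    inner_products = [linear(kappa_x, w) for w in channel_weights]
    if len(inner_products) > 1:
        result = mean_fusion(inner_products)
    else:
        result = inner_products[0]
    if remainder is not None:
        pi_x = remainder(x)
        result = [[r + p for r, p in zip(r_row, p_row)]
                  for r_row, p_row in zip(result, pi_x)]
    return result


x = [[1.0, 2.0]]
w0 = [[1.0, 0.0], [0.0, 1.0]]   # channel 0, shape (n=2, D=2)
w1 = [[3.0, 0.0], [0.0, 3.0]]   # channel 1, shape (n=2, D=2)
out = head_forward(x, [w0, w1], remainder=lambda rows: rows)
print(out)  # [[3.0, 6.0]]
```

With mean fusion the channel outputs are averaged rather than summed as in the equation, so the fused term differs from the sum by a factor of $C$.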

__init__(m, n, name='rpn_head', batch_num=None, channel_num=1, l=None, l_attribute_interdependence=None, l_instance_interdependence=None, l_channel_fusion=None, input_process_functions=None, data_transformation=None, attribute_interdependence=None, instance_interdependence=None, parameter_fabrication=None, channel_fusion=None, remainder=None, output_process_functions=None, input_process_function_configs=None, data_transformation_configs=None, attribute_interdependence_configs=None, instance_interdependence_configs=None, parameter_fabrication_configs=None, channel_fusion_configs=None, remainder_configs=None, output_process_function_configs=None, create_parameters_at_init=True, parameters_init_method=None, device='cpu', *args, **kwargs)

The initialization method of the RPN-head with multiple channels.

It initializes the RPN head module with multiple channels. Specifically, this method initializes the dimension configurations of the head and the component functions used in the head, and defines the device that hosts the head.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| m | int | The input dimension of the head. | required |
| n | int | The output dimension of the head. | required |
| l | int | The number of parameters for each channel in the head. | None |
| channel_num | int | The number of channels in the head. | 1 |
| data_transformation | transformation | The data transformation function of the head. The data transformation can be initialized directly with this parameter or with the data_transformation_configs parameter. | None |
| parameter_fabrication | fabrication | The parameter fabrication function of the head. The parameter fabrication can be initialized directly with this parameter or with the parameter_fabrication_configs parameter. | None |
| remainder | remainder | The remainder function of the head. The remainder can be initialized directly with this parameter or with the remainder_configs parameter. | None |
| output_process_functions | object | The output processing functions. The output processing functions can be initialized directly with this parameter or with the output_process_function_configs parameter. | None |
| data_transformation_configs | dict | The data transformation function configuration. | None |
| parameter_fabrication_configs | dict | The parameter fabrication function configuration. | None |
| remainder_configs | dict | The remainder function configuration. | None |
| output_process_function_configs | dict | The output processing function configuration. | None |
| device | str | The device for hosting the head. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| object | This method will return the initialized RPN-head object. |
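
The initializer's source validates the dimensions and falls back to mean fusion when several channels are requested without a fusion function. A minimal sketch of just that decision logic follows. `resolve_head_setup` is a hypothetical helper, and the dictionary it returns only describes the fusion that tinybig would instantiate; it is not a tinybig object.

```python
def resolve_head_setup(m, n, channel_num=1, channel_fusion=None):
    """Mirror the dimension checks and the fusion default of the head initializer.

    The source uses an assert; an exception is raised here so that the
    check also runs under `python -O`.
    """
    if m is None or m < 1 or n is None or n < 1 or channel_num < 1:
        raise ValueError("m, n and channel_num must all be >= 1")
    if channel_num > 1 and channel_fusion is None:
        # One entry per channel, each of width n, as in dims=[n] * channel_num.
        channel_fusion = {"type": "mean_fusion", "dims": [n] * channel_num}
    return {"m": m, "n": n, "channel_num": channel_num, "channel_fusion": channel_fusion}


single = resolve_head_setup(m=4, n=2)
multi = resolve_head_setup(m=4, n=2, channel_num=3)
print(single["channel_fusion"])  # None
print(multi["channel_fusion"])   # {'type': 'mean_fusion', 'dims': [2, 2, 2]}
```

A single-channel head therefore needs no fusion function at all, while a multi-channel head always ends up with one.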

Source code in tinybig/module/base_head.py
def __init__(
    self,
    m: int,
    n: int,
    name: str = 'rpn_head',
    batch_num: int = None,
    channel_num: int = 1,
    l: int = None,
    l_attribute_interdependence: int = None,
    l_instance_interdependence: int = None,
    l_channel_fusion: int = None,

    input_process_functions=None,
    data_transformation: transformation_class = None,
    attribute_interdependence: interdependence_class = None,
    instance_interdependence: interdependence_class = None,
    parameter_fabrication: fabrication_class = None,
    channel_fusion: fusion_class = None,
    remainder: remainder_class = None,
    output_process_functions=None,

    input_process_function_configs=None,
    data_transformation_configs=None,
    attribute_interdependence_configs=None,
    instance_interdependence_configs=None,
    parameter_fabrication_configs=None,
    channel_fusion_configs=None,
    remainder_configs=None,
    output_process_function_configs=None,

    create_parameters_at_init: bool = True,
    parameters_init_method: str = None,
    device='cpu',
    *args, **kwargs
):
    r"""
    The initialization method of the RPN-head with multiple channels.

    It initializes the RPN head module with multiple channels.
    Specifically, this method initializes the dimension configurations of the head,
    the component functions used in the head, and defines the device to host the head.

    Parameters
    ----------
    m: int
        The input dimension of the head.
    n: int
        The output dimension of the head.
    l: int, default = None
        The number of parameters for each channel in the head.
    channel_num: int, default = 1
        The number of channels in the head.
    data_transformation: object, default = None
        The data transformation function of the head. The data transformation can be initialized directly
        with this parameter or with the data_transformation_configs parameter.
    parameter_fabrication: object, default = None
        The parameter fabrication function of the head. The parameter fabrication can be initialized directly
        with this parameter or with the parameter_fabrication_configs parameter.
    remainder: object, default = None
        The remainder function of the head. The remainder can be initialized directly
        with this parameter or with the remainder_configs parameter.
    output_process_functions: object, default = None
        The output processing functions. The output processing functions can be initialized directly
        with this parameter or with the output_process_function_configs parameter.
    data_transformation_configs: dict, default = None
        The data transformation function configuration.
    parameter_fabrication_configs: dict, default = None
        The parameter fabrication function configuration.
    remainder_configs: dict, default = None
        The remainder function configuration.
    output_process_function_configs: dict, default = None
        The output processing function configuration.
    device: str, default = 'cpu'
        The device for hosting the head.

    Returns
    ----------
    object
        This method will return the initialized RPN-head object.
    """
    Module.__init__(self)
    function.__init__(self, name=name, device=device)

    assert (channel_num >= 1) and (m is not None and m >= 1) and (n is not None and n >= 1)
    # initialize the basic attributes
    self.m = m
    self.n = n
    self.batch_num = batch_num
    self.channel_num = channel_num
    self.l = l
    self.l_attribute_interdependence = l_attribute_interdependence
    self.l_instance_interdependence = l_instance_interdependence
    self.l_channel_fusion = l_channel_fusion

    # initialize data_transformation, interdependence, interdependence_fusion, parameter_fabrication, channel_fusion and remainder functions from either input objects or input configs
    self.data_transformation = config.instantiation_functions(functions=data_transformation, function_configs=data_transformation_configs, device=device)
    self.parameter_fabrication = config.instantiation_functions(functions=parameter_fabrication, function_configs=parameter_fabrication_configs, device=device)
    self.remainder = config.instantiation_functions(functions=remainder, function_configs=remainder_configs, device=device)

    self.attribute_interdependence = config.instantiation_functions(functions=attribute_interdependence, function_configs=attribute_interdependence_configs, device=device)
    self.instance_interdependence = config.instantiation_functions(functions=instance_interdependence, function_configs=instance_interdependence_configs, device=device)

    self.input_process_functions = config.instantiation_functions(input_process_functions, input_process_function_configs, device=device)
    self.output_process_functions = config.instantiation_functions(output_process_functions, output_process_function_configs, device=device)
    self.channel_fusion = config.instantiation_functions(functions=channel_fusion, function_configs=channel_fusion_configs, device=device)
    if self.channel_num > 1 and self.channel_fusion is None:
        self.channel_fusion = mean_fusion(dims=[self.n] * self.channel_num)

    # create learnable parameters for parameter fabrication and remainder functions
    self.w = None
    self.b = None
    self.w_remainder = None
    self.b_remainder = None
    self.w_attribute_interdependence = None
    self.w_instance_interdependence = None
    self.w_channel_fusion = None

    self.parameters_init_method = parameters_init_method
    if create_parameters_at_init:
        self.create_learnable_parameters()

calculate_attribute_xi_x(x, channel_index=0, kappa_x=None, device='cpu', *args, **kwargs)

Computes the attribute interdependence \(\xi_{\text{attribute}}(\mathbf{x})\).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Tensor | The input data to compute the attribute interdependence. | required |
| channel_index | int | The index of the channel for which interdependence is computed. | 0 |
| kappa_x | Tensor | The precomputed transformed data to use for interdependence calculation. | None |
| device | str | The device to execute the interdependence calculation. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The attribute interdependence \(\xi_{\text{attribute}}(\mathbf{x})\). |
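
Two details of this method are easy to miss: the per-channel weights are taken with a slice, which keeps the leading channel dimension, and the method falls back to `kappa_x` (or `x`) when no interdependence function is set. The sketch below reproduces both on plain lists; `select_channel_weights` and `attribute_xi_x` are hypothetical stand-ins, and the `interdependence` callable is an assumed interface rather than a tinybig class.

```python
def select_channel_weights(w, channel_index):
    """Return a one-row slice for the channel, or None when it is unavailable."""
    if w is not None and 0 <= channel_index < len(w):
        return w[channel_index:channel_index + 1]   # slicing keeps the row dimension
    return None


def attribute_xi_x(x, interdependence=None, w=None, channel_index=0, kappa_x=None):
    """Hypothetical stand-in for calculate_attribute_xi_x on plain lists."""
    if interdependence is None:
        # No function configured: pass the (possibly transformed) data through.
        return kappa_x if kappa_x is not None else x
    return interdependence(x, select_channel_weights(w, channel_index), kappa_x)


w = [[0.1, 0.2], [0.3, 0.4]]   # shape (channel_num=2, l=2)
x = [[1.0, 2.0]]
print(select_channel_weights(w, 1))   # [[0.3, 0.4]]
print(select_channel_weights(w, 5))   # None
print(attribute_xi_x(x))              # [[1.0, 2.0]]
```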

Source code in tinybig/module/base_head.py
def calculate_attribute_xi_x(self, x: torch.Tensor, channel_index: int = 0, kappa_x: torch.Tensor = None, device='cpu', *args, **kwargs):
    r"""
    Computes the attribute interdependence $\xi_{\text{attribute}}(\mathbf{x})$.

    Parameters
    ----------
    x: torch.Tensor
        The input data to compute the attribute interdependence.
    channel_index: int, default=0
        The index of the channel for which interdependence is computed.
    kappa_x: torch.Tensor, optional
        The precomputed transformed data to use for interdependence calculation.
    device: str, default='cpu'
        The device to execute the interdependence calculation.

    Returns
    -------
    torch.Tensor
        The attribute interdependence $\xi_{\text{attribute}}(\mathbf{x})$.
    """
    if self.attribute_interdependence is not None:
        if self.attribute_interdependence.device != device:
            self.attribute_interdependence.to(device)

        if self.w_attribute_interdependence is not None and 0 <= channel_index < self.w_attribute_interdependence.size(0):
            w_chunks = self.w_attribute_interdependence[channel_index:channel_index+1, :]
        else:
            w_chunks = None

        xi_x = self.attribute_interdependence(x=x, w=w_chunks, kappa_x=kappa_x, device=device)

        return xi_x
    else:
        return kappa_x if kappa_x is not None else x

calculate_inner_product(kappa_xi_x, phi_w, device='cpu', *args, **kwargs)

Computes the inner product of \(\kappa(\mathbf{x})\) and \(\psi(\mathbf{w})\).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| kappa_xi_x | Tensor | The transformed and interdependent input data. | required |
| phi_w | Tensor | The reconciled parameters. | required |
| device | str | The device hosting the operation. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The inner product of the transformed data and parameters. |
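
In shape terms, the method multiplies data of shape `(batch, D)` by the transpose of parameters of shape `(n, D)` and adds the bias, which is what `F.linear` computes in the dense branch. The following dependency-free sketch shows the same arithmetic on nested lists; `inner_product` is a hypothetical helper and omits the sparse and device handling of the real method.

```python
def inner_product(kappa_xi_x, phi_w, b=None):
    """Compute kappa_xi_x @ phi_w.T (+ b) on nested lists.

    kappa_xi_x has shape (batch, D) and phi_w has shape (n, D), so the
    result has shape (batch, n). A missing phi_w passes the data through.
    """
    if phi_w is None:
        return kappa_xi_x
    D = len(phi_w[0])
    if any(len(row) != D for row in kappa_xi_x) or any(len(row) != D for row in phi_w):
        raise ValueError("kappa_xi_x and phi_w must share the last dimension D")
    out = [[sum(a * w for a, w in zip(row, w_row)) for w_row in phi_w]
           for row in kappa_xi_x]
    if b is not None:
        out = [[v + bias for v, bias in zip(row, b)] for row in out]
    return out


kappa_xi_x = [[1.0, 2.0, 3.0]]                 # (batch=1, D=3)
phi_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]     # (n=2, D=3)
print(inner_product(kappa_xi_x, phi_w, b=[0.5, -1.0]))  # [[1.5, 4.0]]
```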

Source code in tinybig/module/base_head.py
def calculate_inner_product(self, kappa_xi_x: torch.Tensor, phi_w: torch.Tensor, device: str = 'cpu', *args, **kwargs):
    r"""
    Computes the inner product of $\kappa(\mathbf{x})$ and $\psi(\mathbf{w})$.

    Parameters
    ----------
    kappa_xi_x: torch.Tensor
        The transformed and interdependent input data.
    phi_w: torch.Tensor
        The reconciled parameters.
    device: str, default='cpu'
        The device hosting the operation.

    Returns
    -------
    torch.Tensor
        The inner product of the transformed data and parameters.
    """
    if phi_w is not None:
        assert kappa_xi_x.ndim == 2 and phi_w.ndim == 2 and kappa_xi_x.size(-1) == phi_w.size(-1)
        if device != 'mps' and (kappa_xi_x.is_sparse or phi_w.is_sparse):
            inner_prod = torch.sparse.mm(kappa_xi_x, phi_w.T)
            if self.b is not None:
                inner_prod += self.b
        else:
            inner_prod = F.linear(kappa_xi_x, phi_w, bias=self.b)
    else:
        inner_prod = kappa_xi_x
    return inner_prod

calculate_instance_xi_x(x, channel_index=0, kappa_x=None, device='cpu', *args, **kwargs)

Computes the instance interdependence \(\xi_{\text{instance}}(\mathbf{x})\).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Tensor | The input data to compute the instance interdependence. | required |
| channel_index | int | The index of the channel for which interdependence is computed. | 0 |
| kappa_x | Tensor | The precomputed transformed data to use for interdependence calculation. | None |
| device | str | The device to execute the interdependence calculation. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The instance interdependence \(\xi_{\text{instance}}(\mathbf{x})\). |
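
Where the attribute variant relates features, an instance interdependence function relates the instances in a batch, which is why the head carries a `batch_num` attribute. The sketch below assumes, purely for illustration, that such a function reduces to a fixed `b × b'` mixing matrix `A` applied as $A^\top \kappa(\mathbf{x})$. The helper `mix_instances` is hypothetical, and the actual tinybig functions may derive `A` from the data or from learnable parameters.

```python
def mix_instances(kappa_x, A):
    """Return A^T @ kappa_x: output row j is sum_i A[i][j] * kappa_x[i]."""
    b = len(kappa_x)
    if len(A) != b:
        raise ValueError("A needs one row per instance in the batch")
    width = len(kappa_x[0])
    b_out = len(A[0])
    return [
        [sum(A[i][j] * kappa_x[i][k] for i in range(b)) for k in range(width)]
        for j in range(b_out)
    ]


kappa_x = [[1.0, 2.0], [3.0, 4.0]]        # two instances, D=2
identity = [[1.0, 0.0], [0.0, 1.0]]       # every instance depends only on itself
averaging = [[0.5, 0.5], [0.5, 0.5]]      # every instance sees the batch mean
print(mix_instances(kappa_x, identity))   # [[1.0, 2.0], [3.0, 4.0]]
print(mix_instances(kappa_x, averaging))  # [[2.0, 3.0], [2.0, 3.0]]
```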

Source code in tinybig/module/base_head.py
def calculate_instance_xi_x(self, x: torch.Tensor, channel_index: int = 0, kappa_x: torch.Tensor = None, device='cpu', *args, **kwargs):
    r"""
    Computes the instance interdependence $\xi_{\text{instance}}(\mathbf{x})$.

    Parameters
    ----------
    x: torch.Tensor
        The input data to compute the instance interdependence.
    channel_index: int, default=0
        The index of the channel for which interdependence is computed.
    kappa_x: torch.Tensor, optional
        The precomputed transformed data to use for interdependence calculation.
    device: str, default='cpu'
        The device to execute the interdependence calculation.

    Returns
    -------
    torch.Tensor
        The instance interdependence $\xi_{\text{instance}}(\mathbf{x})$.
    """
    if self.instance_interdependence is not None:
        if self.instance_interdependence.device != device:
            self.instance_interdependence.to(device)

        if self.w_instance_interdependence is not None and 0 <= channel_index < self.w_instance_interdependence.size(0):
            w_chunks = self.w_instance_interdependence[channel_index:channel_index+1, :]
        else:
            w_chunks = None
        xi_x = self.instance_interdependence(x=x, w=w_chunks, kappa_x=kappa_x, device=device)
        return xi_x
    else:
        return kappa_x if kappa_x is not None else x

calculate_kappa_x(x, device='cpu', *args, **kwargs)

Computes the transformed data \(\kappa(\mathbf{x})\) using the data transformation function.

If no data transformation function is defined, the input data is returned as-is.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Tensor | The input data to be transformed. | required |
| device | str | The device to execute the data transformation. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The transformed data \(\kappa(\mathbf{x})\). |
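
To make the role of $\kappa$ concrete, the sketch below uses a made-up expansion that appends all pairwise products to the input, so an input of width $m$ grows to width $D = m + m^2$. The names `second_order_expansion`, `calculate_D` and `kappa` are hypothetical; tinybig ships its own transformation classes, which are not reproduced here.

```python
def second_order_expansion(row):
    """Hypothetical expansion: the inputs followed by all pairwise products."""
    return list(row) + [a * b for a in row for b in row]


def calculate_D(m):
    """Expanded width of second_order_expansion for an input of width m."""
    return m + m * m


def kappa(x, expansion=None):
    """Apply the expansion row by row, or return x unchanged if none is set."""
    if expansion is None:
        return x
    return [expansion(row) for row in x]


x = [[1.0, 2.0]]
kappa_x = kappa(x, second_order_expansion)
print(kappa_x)                                # [[1.0, 2.0, 1.0, 2.0, 2.0, 4.0]]
print(len(kappa_x[0]) == calculate_D(m=2))    # True
```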

Source code in tinybig/module/base_head.py
def calculate_kappa_x(self, x: torch.Tensor, device='cpu', *args, **kwargs):
    r"""
    Computes the transformed data $\kappa(\mathbf{x})$ using the data transformation function.

    If no data transformation function is defined, the input data is returned as-is.

    Parameters
    ----------
    x: torch.Tensor
        The input data to be transformed.
    device: str, default='cpu'
        The device to execute the data transformation.

    Returns
    -------
    torch.Tensor
        The transformed data $\kappa(\mathbf{x})$.
    """
    if self.data_transformation is not None:
        if self.data_transformation.device != device:
            self.data_transformation.to(device)

        kappa_x = self.data_transformation(x, device=device)
        return kappa_x
    else:
        return x

calculate_kappa_xi_x(x, channel_index=0, device='cpu', *args, **kwargs)

Computes the combined transformed and interdependent data \(\kappa(\xi(\mathbf{x}))\).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | Tensor | The input data to compute the combined transformation and interdependence. | required |
| channel_index | int | The index of the channel for which the computation is performed. | 0 |
| device | str | The device to execute the computation. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| Tensor | The combined transformed and interdependent data \(\kappa(\xi(\mathbf{x}))\). |
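
The method chains three stages in a fixed order: attribute interdependence, then data expansion, then instance interdependence. The sketch below traces that order with toy callables on plain lists. `kappa_xi_x` here is a hypothetical simplification: each stage takes only the output of the previous one, whereas the real instance stage receives both `x` and `kappa_x`.

```python
def kappa_xi_x(x, attribute=None, expansion=None, instance=None):
    """Apply the three stages in order, skipping any stage that is not set."""
    xi_x = attribute(x) if attribute is not None else x
    kappa_x = [expansion(row) for row in xi_x] if expansion is not None else xi_x
    return instance(kappa_x) if instance is not None else kappa_x


x = [[1, 2], [3, 4]]
result = kappa_xi_x(
    x,
    attribute=lambda rows: [row[::-1] for row in rows],   # toy: reverse the features
    expansion=lambda row: row + [v * v for v in row],     # toy: append the squares
    instance=lambda rows: rows[::-1],                     # toy: reverse the batch
)
print(result)  # [[4, 3, 16, 9], [2, 1, 4, 1]]
```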

Source code in tinybig/module/base_head.py
def calculate_kappa_xi_x(self, x: torch.Tensor, channel_index: int = 0, device='cpu', *args, **kwargs):
    r"""
    Computes the combined transformed and interdependent data $\kappa(\xi(\mathbf{x}))$.

    Parameters
    ----------
    x: torch.Tensor
        The input data to compute the combined transformation and interdependence.
    channel_index: int, default=0
        The index of the channel for which the computation is performed.
    device: str, default='cpu'
        The device to execute the computation.

    Returns
    -------
    torch.Tensor
        The combined transformed and interdependent data $\kappa(\xi(\mathbf{x}))$.
    """
    # ************** Attribute Interdependence Block **************
    xi_x = self.calculate_attribute_xi_x(x=x, channel_index=channel_index, device=self.device)
    # ************** Data Expansion Block **************
    kappa_x = self.calculate_kappa_x(x=xi_x, device=device, *args, **kwargs)
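    # calculate_kappa_x returns its input unchanged when no data transformation is set, so only check D when one exists
    assert self.data_transformation is None or kappa_x.shape[1] == self.data_transformation.calculate_D(m=xi_x.shape[1])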

    # ************** Instance Interdependence Block **************
    kappa_xi_x = self.calculate_instance_xi_x(x=x, channel_index=channel_index, kappa_x=kappa_x, device=self.device)
    return kappa_xi_x

calculate_phi_w(D, channel_index=0, device='cpu', *args, **kwargs)

Computes the reconciled parameters \(\psi(\mathbf{w})\) for a specific channel.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| D | int | The dimensionality of the transformed data \(\kappa(\mathbf{x})\). | required |
| channel_index | int | The index of the channel for which parameters are computed. | 0 |
| device | str | The device to execute the parameter reconciliation. | 'cpu' |

Returns:

| Type | Description |
| --- | --- |
| Tensor or None | The reconciled parameters \(\psi(\mathbf{w})\) for the specified channel, or None if not applicable. |
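
Parameter reconciliation turns one channel's flat parameter vector of length $l$ into a matrix of shape $(n, D)$. The simplest possible case, sketched below, is a plain reshape with $l = n \cdot D$. `identity_reconciliation` and this list-based `calculate_phi_w` are hypothetical; the real method passes a `(1, l)` tensor slice to the configured fabrication function, which may use a much smaller $l$.

```python
def identity_reconciliation(w, n, D):
    """Reshape a flat vector of length l = n * D into n rows of D values."""
    if len(w) != n * D:
        raise ValueError(f"expected l = n * D = {n * D} parameters, got {len(w)}")
    return [w[r * D:(r + 1) * D] for r in range(n)]


def calculate_phi_w(w, channel_index, n, D, fabrication=identity_reconciliation):
    """Pick one channel's parameters and fabricate its (n, D) matrix."""
    if not 0 <= channel_index < len(w):
        raise IndexError("channel_index out of range")
    return fabrication(w[channel_index], n, D)


# Two channels, each with l = 6 parameters, for n = 2 outputs and D = 3.
w = [[1, 2, 3, 4, 5, 6],
     [6, 5, 4, 3, 2, 1]]
print(calculate_phi_w(w, channel_index=1, n=2, D=3))  # [[6, 5, 4], [3, 2, 1]]
```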

Source code in tinybig/module/base_head.py
def calculate_phi_w(self, D: int, channel_index: int = 0, device='cpu', *args, **kwargs):
    r"""
    Computes the reconciled parameters $\psi(\mathbf{w})$ for a specific channel.

    Parameters
    ----------
    D: int
        The dimensionality of the transformed data $\kappa(\mathbf{x})$.
    channel_index: int, default=0
        The index of the channel for which parameters are computed.
    device: str, default='cpu'
        The device to execute the parameter reconciliation.

    Returns
    -------
    torch.Tensor or None
        The reconciled parameters $\psi(\mathbf{w})$ for the specified channel, or None if not applicable.
    """
    assert channel_index in range(self.channel_num)

    if self.parameter_fabrication is not None:
        if self.parameter_fabrication.device != device:
            self.parameter_fabrication.to(device)

        if self.w is not None and 0 <= channel_index < self.w.size(0):
            w_chunk = self.w[channel_index:channel_index+1, :]
        else:
            w_chunk = None
        phi_w = self.parameter_fabrication(w=w_chunk, n=self.n, D=D, device=device)
        return phi_w
    else:
        return None

calculate_pi_x(x, device='cpu', *args, **kwargs)

Computes the remainder term \(\pi(\mathbf{x})\) using the remainder function.

Parameters:

Name Type Description Default
x Tensor

The input data to compute the remainder term.

required
device

The device to execute the remainder calculation.

'cpu'

Returns:

Type Description
Tensor or None

The remainder term \(\pi(\mathbf{x})\) if a remainder function is defined, otherwise None.
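
As a rough illustration of why a parameterized remainder is needed when the input and output dimensions differ, the sketch below computes a linear remainder \(\pi(\mathbf{x}) = \mathbf{x}\mathbf{W}^\top + \mathbf{b}\) with \(\mathbf{W}\) of shape \((n, m)\). The linear form and the helper name `linear_remainder` are assumptions made for this example; the actual remainder functions in tinybig may compute something different.

```python
def linear_remainder(x, w_remainder, b_remainder=None):
    """Compute pi(x) = x W^T + b for a batch of row vectors.

    Assumption: x has shape (batch, m) and w_remainder has shape (n, m).
    """
    out = []
    for row in x:
        # One dot product per output dimension maps m inputs to n outputs.
        projected = [sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in w_remainder]
        if b_remainder is not None:
            projected = [p + b for p, b in zip(projected, b_remainder)]
        out.append(projected)
    return out


x = [[1.0, 2.0, 3.0]]              # one instance, m = 3
w_remainder = [[1.0, 0.0, 0.0],    # n = 2 rows, each of length m
               [0.0, 1.0, 1.0]]
pi_x = linear_remainder(x, w_remainder, b_remainder=[0.5, -0.5])
```

The result has last dimension \(n\), which is what lets it be added to the fused inner products in the forward pass.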

Source code in tinybig/module/base_head.py
def calculate_pi_x(self, x: torch.Tensor, device='cpu', *args, **kwargs):
    r"""
    Computes the remainder term $\pi(\mathbf{x})$ using the remainder function.

    Parameters
    ----------
    x: torch.Tensor
        The input data to compute the remainder term.
    device: str, default='cpu'
        The device to execute the remainder calculation.

    Returns
    -------
    torch.Tensor or None
        The remainder term $\pi(\mathbf{x})$ if a remainder function is defined, otherwise None.
    """
    if self.remainder is not None:
        if isinstance(self.remainder, tinybig.remainder.zero_remainder):
            return None

        if self.remainder.device != device:
            self.remainder.to(device)
        pi_x = self.remainder(x=x, w=self.w_remainder, b=self.b_remainder, m=self.m, n=self.n, device=device)
        return pi_x
    else:
        return None

create_learnable_parameters(initialize_parameter_at_creation=False, init_type='xavier_uniform', init_bias=True, *args, **kwargs)

Creates learnable parameters for the head.

This method creates parameters for the attribute and instance interdependence functions, the parameter reconciliation function, the remainder function, and the channel fusion function, based on the head configuration.

Parameters:

Name Type Description Default
initialize_parameter_at_creation bool

Whether to initialize parameters during creation.

False
init_type str

The initialization method for parameters.

'xavier_uniform'
init_bias bool

Whether to initialize bias parameters.

True

Returns:

Type Description
None
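
The shapes allocated by this method can be summarized with a small bookkeeping sketch. It mirrors the shapes used for `w`, `b`, `w_remainder`, `b_remainder`, and `w_channel_fusion` in the source, but omits the interdependence parameters for brevity. The function `head_parameter_shapes` and its flag names are hypothetical and exist only for this example.

```python
from math import prod


def head_parameter_shapes(m, n, l, channel_num=1, fabrication_bias=False,
                          remainder_needs_parameters=False, remainder_bias=False,
                          l_channel_fusion=None):
    """Return the shape of each learnable tensor a head would allocate."""
    shapes = {"w": (channel_num, l)}            # one row of length l per channel
    if fabrication_bias:
        shapes["b"] = (n,)
    if remainder_needs_parameters:
        shapes["w_remainder"] = (n, m)          # maps m inputs to n outputs
        if remainder_bias:
            shapes["b_remainder"] = (n,)
    if l_channel_fusion is not None:
        shapes["w_channel_fusion"] = (1, l_channel_fusion)
    return shapes


shapes = head_parameter_shapes(
    m=4, n=3, l=12, channel_num=2,
    remainder_needs_parameters=True, remainder_bias=True,
)
total = sum(prod(shape) for shape in shapes.values())
```

Here the head would hold 24 reconciliation parameters, 12 remainder weights, and 3 remainder biases.
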
Source code in tinybig/module/base_head.py
def create_learnable_parameters(
    self,
    initialize_parameter_at_creation: bool = False,
    init_type: str = 'xavier_uniform',
    init_bias: bool = True,
    *args, **kwargs
):
    """
    Creates learnable parameters for the head.

    This method creates parameters for the attribute and instance interdependence functions,
    the parameter reconciliation function, the remainder function, and the channel fusion
    function, based on the head configuration.

    Parameters
    ----------
    initialize_parameter_at_creation: bool, default=False
        Whether to initialize parameters during creation.
    init_type: str, default='xavier_uniform'
        The initialization method for parameters.
    init_bias: bool, default=True
        Whether to initialize bias parameters.

    Returns
    -------
    None
    """
    m_prime, b_prime = self.m, self.batch_num

    if self.attribute_interdependence is not None:
        if self.attribute_interdependence.require_parameters:
            if self.l_attribute_interdependence is None:
                self.l_attribute_interdependence = self.attribute_interdependence.calculate_l()
            self.w_attribute_interdependence = torch.nn.Parameter(torch.rand(self.channel_num, self.l_attribute_interdependence, device=self.device))
        assert self.m is not None and self.m >= 1
        m_prime = self.attribute_interdependence.calculate_m_prime(m=self.m)

    if self.instance_interdependence is not None:
        if self.instance_interdependence.require_parameters:
            if self.l_instance_interdependence is None:
                self.l_instance_interdependence = self.instance_interdependence.calculate_l()
            self.w_instance_interdependence = torch.nn.Parameter(torch.rand(self.channel_num, self.l_instance_interdependence, device=self.device))
        if self.batch_num is not None:
            assert self.batch_num is not None and self.batch_num >= 1
            b_prime = self.instance_interdependence.calculate_b_prime(b=self.batch_num)

    # create learnable parameters for parameter_fabrication function
    if self.parameter_fabrication is not None and self.parameter_fabrication.require_parameters:
        if self.l is None:
            self.l = self.parameter_fabrication.calculate_l(n=self.n, D=self.data_transformation.calculate_D(m=m_prime))
        self.w = torch.nn.Parameter(torch.rand(self.channel_num, self.l, device=self.device))
        if self.parameter_fabrication.enable_bias:
            self.b = torch.nn.Parameter(torch.rand(self.n, device=self.device))

    # create learnable parameters for remainder function
    if self.remainder is not None and self.remainder.require_parameters:
        self.w_remainder = torch.nn.Parameter(torch.rand(self.n, self.m, device=self.device))
        if self.remainder.enable_bias:
            self.b_remainder = torch.nn.Parameter(torch.rand(self.n, device=self.device))
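    # guard against a missing remainder function before inspecting its attributes
    elif self.remainder is not None and self.m != self.n and not self.remainder.require_parameters and not isinstance(self.remainder, tinybig.remainder.zero_remainder) and not isinstance(self.remainder, tinybig.remainder.constant_remainder):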
        raise ValueError('The input and output dimensions {}, {} are different, parameters will be needed '
                         'by the {} to adjust the input dimensions.'.format(self.m, self.n, self.remainder.get_name()))

    # create learnable parameters for channel_fusion function
    if self.channel_fusion is not None and self.channel_fusion.require_parameters:
        if self.l_channel_fusion is None:
            self.l_channel_fusion = self.channel_fusion.calculate_l()
        self.w_channel_fusion = torch.nn.Parameter(torch.rand(1, self.l_channel_fusion, device=self.device))

    # initialize the parameter with certain methods...
    init_type = self.parameters_init_method if self.parameters_init_method is not None else init_type
    if initialize_parameter_at_creation:
        self.initialize_parameters(init_type=init_type, init_bias=init_bias)

forward(x, device='cpu', *args, **kwargs)

The forward method of the RPN head module.

Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its output with multi-channel parameters as follows: $$ \begin{equation} g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}), \end{equation} $$ where these multi-channel parameters \(\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}\) are fabricated from length \(l\) to shape \((n, D)\) using the identical parameter reconciliation function.

Parameters:

Name Type Description Default
x Tensor

The input data vector.

required
device

The device for hosting the head.

'cpu'

Returns:

Type Description
Tensor

The processed output of the head.
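
The formula above can be traced by hand with a small pure-Python sketch. It assumes an identity data expansion, already-reconciled \((n, D)\) matrices, and summation as the channel fusion, matching the equation rather than any particular fusion function; `inner_product` and `head_output` are hypothetical helpers, not tinybig APIs.

```python
def inner_product(kappa_x, phi_w):
    """<kappa(x), psi(w)> for one instance: a length-D vector times an (n, D) matrix."""
    return [sum(k * p for k, p in zip(kappa_x, row)) for row in phi_w]


def head_output(x, phi_ws, pi_x=None, kappa=lambda v: v):
    """g(x | w, C) = sum_c <kappa(x), psi(w^c)> + pi(x).

    Assumptions: identity expansion by default, summation fusion, and
    one reconciled (n, D) matrix per channel in `phi_ws`.
    """
    kappa_x = kappa(x)                 # computed once and shared by all channels
    result = [0.0] * len(phi_ws[0])
    for phi_w in phi_ws:
        channel_out = inner_product(kappa_x, phi_w)
        result = [r + c for r, c in zip(result, channel_out)]
    if pi_x is not None:
        result = [r + p for r, p in zip(result, pi_x)]
    return result


x = [1.0, 2.0]
phi_ws = [
    [[1.0, 0.0], [0.0, 1.0]],   # channel 0
    [[1.0, 1.0], [2.0, 0.0]],   # channel 1
]
g = head_output(x, phi_ws, pi_x=[0.5, 0.5])
```

Channel 0 contributes `[1.0, 2.0]`, channel 1 contributes `[3.0, 2.0]`, and the remainder adds `0.5` to each output.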

Source code in tinybig/module/base_head.py
def forward(self, x: torch.Tensor, device='cpu', *args, **kwargs):
    r"""
    The forward method of the RPN head module.

    Based on the data expansion, parameter reconciliation and remainder functions, the RPN head will calculate its
    output with multi-channel parameters as follows:
    $$
        \begin{equation}
            g(\mathbf{x} | \mathbf{w}, C) = \sum_{c=0}^{C-1} \left\langle \kappa(\mathbf{x}), \psi(\mathbf{w}^{c}) \right\rangle + \pi(\mathbf{x}),
        \end{equation}
    $$
    where these multi-channel parameters $\mathbf{w}^{0}, \mathbf{w}^{1}, \cdots, \mathbf{w}^{C-1}$ are fabricated
    from length $l$ to shape $(n, D)$ using the identical parameter reconciliation function.

    Parameters
    ----------
    x: torch.Tensor
        The input data vector.
    device: str, default = 'cpu'
        The device for hosting the head.

    Returns
    -------
    torch.Tensor
        The processed output of the head.
    """
    # ************** Input Processing Block **************
    if x is None:
        raise ValueError("x cannot be None...")

    x = function.func_x(x=x, functions=self.input_process_functions, device=device)

    inner_products = []

    pre_computed_kappa_xi_x = None
    # if neither interdependence function has learnable parameters, kappa_xi_x is identical for every channel, so it is computed once and reused
    if (
        (self.attribute_interdependence is None or not self.attribute_interdependence.require_parameters) and
        (self.instance_interdependence is None or not self.instance_interdependence.require_parameters) and
        self.channel_num > 1
    ):
        pre_computed_kappa_xi_x = self.calculate_kappa_xi_x(x=x, channel_index=0, device=device)

    for channel_index in range(self.channel_num):

        # ************** Data Transformation Block **************
        if (
            (self.attribute_interdependence is None or not self.attribute_interdependence.require_parameters)
            and (self.instance_interdependence is None or not self.instance_interdependence.require_parameters)
            and pre_computed_kappa_xi_x is not None
        ):
            kappa_xi_x = pre_computed_kappa_xi_x
        else:
            kappa_xi_x = self.calculate_kappa_xi_x(x=x, channel_index=channel_index, device=device)

        # ************** Parameter Reconciliation Block **************
        phi_w = self.calculate_phi_w(D=kappa_xi_x.size(-1), channel_index=channel_index, device=device, *args, **kwargs)

        # ************** Inner Product Calculation Block **************
        inner_prod = self.calculate_inner_product(kappa_xi_x=kappa_xi_x, phi_w=phi_w, device=device, *args, **kwargs)

        inner_products.append(inner_prod)

    # ************** Multi-Channel Fusion Block **************
    result = self.fusion(inner_products=inner_products, device=device)

    # ************** Remainder Block **************
    pi_x = self.calculate_pi_x(x=x, device=device, *args, **kwargs)

    if pi_x is not None:
        assert pi_x.size(-1) == result.size(-1)
        result += pi_x

    # ************** Output Processing Block **************
    output = function.func_x(x=result, functions=self.output_process_functions, device=self.device)
    return output

fusion(inner_products, device='cpu', *args, **kwargs)

Combines the multi-channel outputs into a single output.

If a channel fusion function is defined, it applies the function to combine the inner product results. Otherwise, the head must have exactly one channel, and that channel's result is returned unchanged.

Parameters:

Name Type Description Default
inner_products list[Tensor]

The inner products computed from each channel.

required
device str

The device hosting the operation.

'cpu'

Returns:

Type Description
Tensor

The fused output.
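
The two branches described above can be sketched as follows. The summation fusion is only one possible choice and is assumed here for illustration; `fuse` and `sum_fusion` are hypothetical names and do not correspond to tinybig functions.

```python
def sum_fusion(inner_products):
    """Element-wise sum of the per-channel outputs (one possible fusion)."""
    return [sum(values) for values in zip(*inner_products)]


def fuse(inner_products, fusion_fn=None):
    """Combine per-channel outputs, or pass a single channel through unchanged."""
    if fusion_fn is not None:
        return fusion_fn(inner_products)
    if len(inner_products) != 1:
        raise ValueError("a fusion function is required when there is more than one channel")
    return inner_products[0]


fused = fuse([[1.0, 2.0], [3.0, 4.0]], fusion_fn=sum_fusion)
single = fuse([[1.0, 2.0]])
```

A fusion that concatenates channels instead of summing them would change the output dimension, which is why the source checks the fused result against the dimension reported by the fusion function.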

Source code in tinybig/module/base_head.py
def fusion(self, inner_products: list[torch.Tensor], device: str = 'cpu', *args, **kwargs):
    """
    Combines the multi-channel outputs into a single output.

    If a channel fusion function is defined, it applies the function to combine
    the inner product results. Otherwise, the head must have exactly one channel,
    and that channel's result is returned unchanged.

    Parameters
    ----------
    inner_products: list of torch.Tensor
        The inner products computed from each channel.
    device: str, default='cpu'
        The device hosting the operation.

    Returns
    -------
    torch.Tensor
        The fused output.
    """
    if self.channel_fusion is not None:
        assert self.channel_fusion.get_dims() is None or self.channel_fusion.get_num() == len(inner_products)
        result = self.channel_fusion(x=inner_products, w=self.w_channel_fusion, device=device)
        n = self.channel_fusion.calculate_n(dims=[result.size(-1) for result in inner_products])
    else:
        assert len(inner_products) == 1
        result = inner_products[0]
        n = self.n
    assert result.size(-1) == n
    return result

get_batch_num()

Retrieves the batch size used in instance interdependence functions.

Returns:

Type Description
int or None

The batch size used for instance interdependence, or None if not specified.

Source code in tinybig/module/base_head.py
def get_batch_num(self):
    """
    Retrieves the batch size used in instance interdependence functions.

    Returns
    -------
    int or None
        The batch size used for instance interdependence, or None if not specified.
    """
    return self.batch_num

get_channel_num()

Retrieves the number of channels in the head.

Returns:

Type Description
int

The number of channels in the head.

Source code in tinybig/module/base_head.py
def get_channel_num(self):
    """
    Retrieves the number of channels in the head.

    Returns
    -------
    int
        The number of channels in the head.
    """
    return self.channel_num

get_m()

Retrieves the input dimension (m) of the head.

Returns:

Type Description
int

The input dimension of the head.

Source code in tinybig/module/base_head.py
def get_m(self):
    """
    Retrieves the input dimension (`m`) of the head.

    Returns
    -------
    int
        The input dimension of the head.
    """
    return self.m

get_n()

Retrieves the output dimension (n) of the head.

Returns:

Type Description
int

The output dimension of the head.

Source code in tinybig/module/base_head.py
def get_n(self):
    """
    Retrieves the output dimension (`n`) of the head.

    Returns
    -------
    int
        The output dimension of the head.
    """
    return self.n

initialize_parameters(init_type='xavier_uniform', init_bias=True, *args, **kwargs)

The parameter initialization method.

It initializes the multi-channel parameters in the head with different initialization approaches, e.g., xavier_uniform or kaiming_uniform. Depending on the "init_type" parameter, this method will call the corresponding initialization method. If the head's parameters_init_method attribute is set, it takes precedence over "init_type".

Parameters:

Name Type Description Default
init_type

The parameter initialization approach.

'xavier_uniform'
init_bias

The boolean tag of bias initialization.

True

Returns:

Type Description
None

This initialization method doesn't have any return values.
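
In the source, an unrecognized `init_type` matches none of the branches, so the parameters keep the values they were created with. A table-driven variant that fails loudly instead could look like the sketch below. This is an alternative design shown for illustration, not the library's behavior, and `resolve_initializer` is a hypothetical helper.

```python
def resolve_initializer(init_type, initializers, override=None):
    """Pick an initializer by name; `override` mimics a head-level preference."""
    name = override if override is not None else init_type
    try:
        return name, initializers[name]
    except KeyError:
        raise ValueError(
            f"unknown init_type {name!r}; expected one of {sorted(initializers)}"
        ) from None


calls = []  # records which initializer ran, standing in for real tensor updates
initializers = {
    "kaiming_uniform": lambda: calls.append("kaiming_uniform"),
    "xavier_uniform": lambda: calls.append("xavier_uniform"),
    "xavier_normal": lambda: calls.append("xavier_normal"),
    "fanout_std_uniform": lambda: calls.append("fanout_std_uniform"),
}

name, init_fn = resolve_initializer("xavier_uniform", initializers, override="xavier_normal")
init_fn()
```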

Source code in tinybig/module/base_head.py
def initialize_parameters(self, init_type='xavier_uniform', init_bias=True, *args, **kwargs):
    """
    The parameter initialization method.

    It initializes the multi-channel parameters in the head with different initialization approaches,
    e.g., xavier_uniform or kaiming_uniform.
    Depending on the "init_type" parameter, this method will call the corresponding initialization method.
    If the head's parameters_init_method attribute is set, it takes precedence over "init_type".

    Parameters
    ----------
    init_type: str, default = 'xavier_uniform'
        The parameter initialization approach.
    init_bias: bool, default = True
        The boolean tag of bias initialization.

    Returns
    -------
    None
        This initialization method doesn't have any return values.
    """
    init_type = self.parameters_init_method if self.parameters_init_method is not None else init_type

    print('parameter init type', init_type)

    if init_type == 'kaiming_uniform':
        self.initialize_parameters_kaiming_uniform(init_bias=init_bias, *args, **kwargs)
    elif init_type == 'xavier_uniform':
        self.initialize_parameters_xavier_uniform(init_bias=init_bias, *args, **kwargs)
    elif init_type == 'xavier_normal':
        self.initialize_parameters_xavier_normal(init_bias=init_bias, *args, **kwargs)
    elif init_type == 'fanout_std_uniform':
        self.initialize_parameters_fanout_std_uniform(init_bias=init_bias, *args, **kwargs)

initialize_parameters_fanout_std_uniform(init_bias=True, fan_out=None, *args, **kwargs)

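The fan-out scaled uniform initialization method.

It draws the parameters in the head (and, when init_bias is True, the biases) from the uniform distribution on \([-1/\sqrt{\text{fan\_out}}, 1/\sqrt{\text{fan\_out}}]\), where fan_out defaults to the output dimension n and falls back to the input dimension m.

Parameters:

Name Type Description Default
init_bias

The boolean tag of bias initialization.

True
fan_out int

The fan-out used to compute the bound. When None, n is used, falling back to m.

None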

Returns:

Type Description
None

This initialization method doesn't have any return values.
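
The bound used by this method can be reproduced with the standard library alone. The sketch below resolves `fan_out` in the same order as the source (explicit argument, then `n`, then `m`) and samples a small matrix; `fanout_uniform_bound` and `sample_uniform` are hypothetical helpers written for this example.

```python
import math
import random


def fanout_uniform_bound(fan_out=None, n=None, m=None):
    """Resolve fan_out (argument, then n, then m) and return 1 / sqrt(fan_out)."""
    if fan_out is None:
        fan_out = n
    if fan_out is None:
        fan_out = m
    if fan_out is None or fan_out <= 0:
        raise ValueError("fan_out must be a positive integer")
    return 1.0 / math.sqrt(fan_out)


def sample_uniform(rows, cols, bound, seed=0):
    """Draw a rows-by-cols matrix uniformly from [-bound, bound]."""
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


bound = fanout_uniform_bound(n=16)   # 1 / sqrt(16) = 0.25
w = sample_uniform(rows=2, cols=8, bound=bound)
```

Note that the same bound is applied to every parameter tensor in the head, regardless of that tensor's own shape.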

Source code in tinybig/module/base_head.py
def initialize_parameters_fanout_std_uniform(self, init_bias=True, fan_out: int = None, *args, **kwargs):
    """
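    The fan-out scaled uniform initialization method.

    It draws the parameters in the head (and, when init_bias is True, the biases) from the
    uniform distribution on [-1/sqrt(fan_out), 1/sqrt(fan_out)], where fan_out defaults to
    the output dimension n and falls back to the input dimension m.

    Parameters
    ----------
    init_bias: bool, default = True
        The boolean tag of bias initialization.
    fan_out: int, default = None
        The fan-out used to compute the bound. When None, n is used, falling back to m.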

    Returns
    -------
    None
        This initialization method doesn't have any return values.
    """
    fan_out = fan_out if fan_out is not None else self.n
    if fan_out is None: fan_out = self.m
    assert fan_out is not None and fan_out > 0
    std = 1. / math.sqrt(fan_out)

    if self.w_attribute_interdependence is not None:
        self.w_attribute_interdependence.data.uniform_(-std, std)

    if self.w_instance_interdependence is not None:
        self.w_instance_interdependence.data.uniform_(-std, std)

    if self.w is not None:
        self.w.data.uniform_(-std, std)
        if init_bias and self.b is not None:
            self.b.data.uniform_(-std, std)

    if self.w_remainder is not None:
        self.w_remainder.data.uniform_(-std, std)
        if init_bias and self.b_remainder is not None:
            self.b_remainder.data.uniform_(-std, std)

    if self.w_channel_fusion is not None:
        self.w_channel_fusion.data.uniform_(-std, std)

initialize_parameters_kaiming_uniform(init_bias=True, *args, **kwargs)

The kaiming parameter initialization method.

It initializes the multi-channel parameters in the head with kaiming_uniform_ method from pytorch.

Parameters:

Name Type Description Default
init_bias

The boolean tag of bias initialization.

True

Returns:

Type Description
None

This initialization method doesn't have any return values.
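
For reference, the bound that PyTorch's `kaiming_uniform_` uses with its default fan-in mode and leaky ReLU gain can be worked out directly. With `a = sqrt(5)` the weight bound reduces to \(1/\sqrt{\text{fan\_in}}\), the same as the bias bound computed in the source. The helper names below are hypothetical; for a 2-D tensor such as `w` of shape `(channel_num, l)`, fan-in is the second dimension.

```python
import math


def kaiming_uniform_bound(fan_in, a=math.sqrt(5)):
    """Bound of the uniform distribution used by Kaiming uniform initialization."""
    gain = math.sqrt(2.0 / (1.0 + a ** 2))   # leaky ReLU gain with negative slope a
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std


def bias_bound(fan_in):
    """Bias bound used in the source: 1 / sqrt(fan_in), or 0 when fan_in is 0."""
    return 1.0 / math.sqrt(fan_in) if fan_in > 0 else 0.0


weight_bound = kaiming_uniform_bound(fan_in=16)
b_bound = bias_bound(fan_in=16)
```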

Source code in tinybig/module/base_head.py
def initialize_parameters_kaiming_uniform(self, init_bias=True, *args, **kwargs):
    """
    The kaiming parameter initialization method.

    It initializes the multi-channel parameters in the head with kaiming_uniform_ method from pytorch.

    Parameters
    ----------
    init_bias: bool, default = True
        The boolean tag of bias initialization.

    Returns
    -------
    None
        This initialization method doesn't have any return values.
    """

    if self.w_attribute_interdependence is not None:
        torch.nn.init.kaiming_uniform_(self.w_attribute_interdependence, a=math.sqrt(5))

    if self.w_instance_interdependence is not None:
        torch.nn.init.kaiming_uniform_(self.w_instance_interdependence, a=math.sqrt(5))

    if self.w is not None:
        torch.nn.init.kaiming_uniform_(self.w, a=math.sqrt(5))

    if self.w_remainder is not None:
        torch.nn.init.kaiming_uniform_(self.w_remainder, a=math.sqrt(5))

    if self.w_channel_fusion is not None:
        torch.nn.init.kaiming_uniform_(self.w_channel_fusion, a=math.sqrt(5))

    if init_bias:
        if self.b is not None:
            fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.w)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            torch.nn.init.uniform_(self.b, -bound, bound)
        if self.b_remainder is not None:
            fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.w_remainder)
            bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
            torch.nn.init.uniform_(self.b_remainder, -bound, bound)

initialize_parameters_xavier_normal(init_bias=True, *args, **kwargs)

The xavier normal initialization method.

It initializes the multi-channel parameters in the head with xavier_normal_ method from pytorch.

Parameters:

Name Type Description Default
init_bias

The boolean tag of bias initialization.

True

Returns:

Type Description
None

This initialization method doesn't have any return values.
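
Xavier normal initialization draws values from a zero-mean normal distribution with standard deviation \(\text{gain} \cdot \sqrt{2 / (\text{fan\_in} + \text{fan\_out})}\). The sketch below computes that value for 2-D shapes, including the `(1, n)` view used for biases in the source; `fans_2d` and `xavier_normal_std` are hypothetical helpers for this example.

```python
import math
import random


def fans_2d(shape):
    """For a 2-D shape (rows, cols), fan_in is cols and fan_out is rows."""
    rows, cols = shape
    return cols, rows


def xavier_normal_std(shape, gain=1.0):
    """Standard deviation used by Xavier normal initialization."""
    fan_in, fan_out = fans_2d(shape)
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


weight_std = xavier_normal_std((2, 6))   # e.g. w with channel_num=2, l=6
bias_std = xavier_normal_std((1, 7))     # a bias of length 7 viewed as (1, 7)

# Draw an example bias vector with that standard deviation.
rng = random.Random(0)
bias = [rng.gauss(0.0, bias_std) for _ in range(7)]
```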

Source code in tinybig/module/base_head.py
def initialize_parameters_xavier_normal(self, init_bias=True, *args, **kwargs):
    """
    The xavier normal initialization method.

    It initializes the multi-channel parameters in the head with xavier_normal_ method from pytorch.

    Parameters
    ----------
    init_bias: bool, default = True
        The boolean tag of bias initialization.

    Returns
    -------
    None
        This initialization method doesn't have any return values.
    """
    if self.w_attribute_interdependence is not None:
        torch.nn.init.xavier_normal_(self.w_attribute_interdependence)

    if self.w_instance_interdependence is not None:
        torch.nn.init.xavier_normal_(self.w_instance_interdependence)

    if self.w is not None:
        torch.nn.init.xavier_normal_(self.w)

    if self.w_remainder is not None:
        torch.nn.init.xavier_normal_(self.w_remainder)

    if self.w_channel_fusion is not None:
        torch.nn.init.xavier_normal_(self.w_channel_fusion)

    if init_bias:
        if self.b is not None:
            torch.nn.init.xavier_normal_(self.b.view(1, -1))
        if self.b_remainder is not None:
            torch.nn.init.xavier_normal_(self.b_remainder.view(1, -1))

initialize_parameters_xavier_uniform(init_bias=True, *args, **kwargs)

The xavier initialization method.

It initializes the multi-channel parameters in the head with xavier_uniform_ method from pytorch.

Parameters:

Name Type Description Default
init_bias

The boolean tag of bias initialization.

True

Returns:

Type Description
None

This initialization method doesn't have any return values.
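
Xavier uniform initialization samples from \([-a, a]\) with \(a = \text{gain} \cdot \sqrt{6 / (\text{fan\_in} + \text{fan\_out})}\). A minimal sketch of that bound for 2-D shapes follows; `xavier_uniform_bound` is a hypothetical helper, and the shapes are chosen only to give round numbers.

```python
import math


def xavier_uniform_bound(shape, gain=1.0):
    """Bound of the uniform distribution used by Xavier uniform initialization."""
    rows, cols = shape            # fan_out = rows, fan_in = cols for a 2-D tensor
    return gain * math.sqrt(6.0 / (rows + cols))


weight_bound = xavier_uniform_bound((2, 22))   # sqrt(6 / 24) = 0.5
bias_bound = xavier_uniform_bound((1, 5))      # bias of length 5 viewed as (1, 5)
```

Because the bias is initialized through a `(1, n)` view, its bound depends on its own length rather than on the shape of the weight tensor.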

Source code in tinybig/module/base_head.py
def initialize_parameters_xavier_uniform(self, init_bias=True, *args, **kwargs):
    """
    The xavier initialization method.

    It initializes the multi-channel parameters in the head with xavier_uniform_ method from pytorch.

    Parameters
    ----------
    init_bias: bool, default = True
        The boolean tag of bias initialization.

    Returns
    -------
    None
        This initialization method doesn't have any return values.
    """
    if self.w_attribute_interdependence is not None:
        torch.nn.init.xavier_uniform_(self.w_attribute_interdependence)

    if self.w_instance_interdependence is not None:
        torch.nn.init.xavier_uniform_(self.w_instance_interdependence)

    if self.w is not None:
        torch.nn.init.xavier_uniform_(self.w)

    if self.w_remainder is not None:
        torch.nn.init.xavier_uniform_(self.w_remainder)

    if self.w_channel_fusion is not None:
        torch.nn.init.xavier_uniform_(self.w_channel_fusion)

    if init_bias:
        if self.b is not None:
            torch.nn.init.xavier_uniform_(self.b.view(1, -1))
        if self.b_remainder is not None:
            torch.nn.init.xavier_uniform_(self.b_remainder.view(1, -1))

to_config()

Converts the configuration of the head into a dictionary.

This includes the head's attributes, such as dimensions, transformation functions, interdependence functions, fabrication functions, and remainder functions.

Returns:

Type Description
dict

A dictionary containing the head's class and parameter configurations.
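
The returned dictionary has two top-level keys, with optional component configurations nested under `head_parameters`. The sketch below reproduces that layout with placeholder values. The class path, the head name, and the contents of the nested configuration are illustrative assumptions, and `build_head_config` is a hypothetical helper.

```python
import json


def build_head_config(head_class, name, device, m, n, l, batch_num, channel_num,
                      **component_configs):
    """Assemble a config dict in the same layout as the one described above."""
    head_parameters = {
        "name": name,
        "device": device,
        "m": m,
        "n": n,
        "l": l,
        "batch_num": batch_num,
        "channel_num": channel_num,
    }
    # Components that are not defined (None) are left out entirely.
    for key, config in component_configs.items():
        if config is not None:
            head_parameters[key] = config
    return {"head_class": head_class, "head_parameters": head_parameters}


config = build_head_config(
    head_class="tinybig.module.base_head.head",   # illustrative class path
    name="example_head", device="cpu", m=4, n=3, l=None,
    batch_num=None, channel_num=1,
    data_transformation_configs={"placeholder": "result of data_transformation.to_config()"},
    remainder_configs=None,
)
serialized = json.dumps(config, indent=2)
```

Since the sketch holds only plain dictionaries, strings, numbers, and None, it can be serialized to JSON and loaded back without loss.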

Source code in tinybig/module/base_head.py
def to_config(self):
    """
    Converts the configuration of the head into a dictionary.

    This includes the head's attributes, such as dimensions, transformation functions,
    interdependence functions, fabrication functions, and remainder functions.

    Returns
    -------
    dict
        A dictionary containing the head's class and parameter configurations.
    """
    head_class = f"{self.__class__.__module__}.{self.__class__.__name__}"
    head_parameters = {
        'name': self.name,
        'device': self.device,
        'm': self.m,
        'n': self.n,
        'l': self.l,
        'batch_num': self.batch_num,
        'channel_num': self.channel_num,
    }

    if self.data_transformation is not None:
        head_parameters['data_transformation_configs'] = self.data_transformation.to_config()
    if self.attribute_interdependence is not None:
        head_parameters['attribute_interdependence_configs'] = self.attribute_interdependence.to_config()
    if self.instance_interdependence is not None:
        head_parameters['instance_interdependence_configs'] = self.instance_interdependence.to_config()
    if self.parameter_fabrication is not None:
        head_parameters['parameter_fabrication_configs'] = self.parameter_fabrication.to_config()
    if self.channel_fusion is not None:
        head_parameters['channel_fusion_configs'] = self.channel_fusion.to_config()
    if self.remainder is not None:
        head_parameters['remainder_configs'] = self.remainder.to_config()
    if self.input_process_functions is not None:
        head_parameters['input_process_function_configs'] = function.functions_to_configs(self.input_process_functions)
    if self.output_process_functions is not None:
        head_parameters['output_process_function_configs'] = function.functions_to_configs(self.output_process_functions)

    return {
        "head_class": head_class,
        "head_parameters": head_parameters
    }