sst2
Bases: text_dataloader
A dataloader class for the SST-2 dataset.
This class provides methods to load and preprocess SST-2, a binary sentiment classification dataset in which each sentence carries one of two sentiment labels.
Attributes:
Name | Type | Description |
---|---|---|
name | str, default = 'sst2' | The name of the dataset. |
train_batch_size | int, default = 64 | The batch size for training data. |
test_batch_size | int, default = 64 | The batch size for testing data. |
max_seq_len | int, default = 32 | The maximum sequence length for text data. |
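To make the `max_seq_len` attribute concrete, here is a minimal sketch of how a maximum sequence length is commonly enforced: sequences longer than the limit are truncated and shorter ones are right-padded. This is a generic illustration, not the class's actual preprocessing code; the helper `fit_to_length` and the padding id `PAD_ID = 0` are hypothetical names introduced only for this example.

```python
from typing import List

MAX_SEQ_LEN = 32  # documented default of max_seq_len
PAD_ID = 0        # hypothetical padding id, not taken from tinybig


def fit_to_length(token_ids: List[int],
                  max_seq_len: int = MAX_SEQ_LEN,
                  pad_id: int = PAD_ID) -> List[int]:
    """Truncate to max_seq_len, then right-pad with pad_id up to max_seq_len."""
    clipped = token_ids[:max_seq_len]
    return clipped + [pad_id] * (max_seq_len - len(clipped))


# A 3-token sentence is padded; a 50-token sentence is truncated.
short = fit_to_length([5, 9, 2])
long = fit_to_length(list(range(1, 51)))
print(len(short), len(long))
```

Both results have exactly `MAX_SEQ_LEN` entries, which is what allows sentences of different lengths to be stacked into one batch.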
Methods:
Name | Description |
---|---|
__init__ | Initializes the SST-2 dataset dataloader. |
load_datapipe | Loads training and testing pipelines for the SST-2 dataset. |
get_class_number | Returns the number of classes in the SST-2 dataset (2). |
get_train_number | Returns the number of training examples (67,349). |
get_test_number | Returns the number of testing examples (872). |
get_idx_to_label | Returns the mapping from indices to labels. |
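The documented example counts and default batch sizes determine how many batches one pass over each split produces. The sketch below works through that arithmetic. It assumes the final partial batch is kept unless `drop_last` is set; the source does not state which behavior the dataloader uses, and `batches_per_epoch` is a hypothetical helper, not part of tinybig.

```python
import math

TRAIN_EXAMPLES = 67_349   # documented value of get_train_number()
TEST_EXAMPLES = 872       # documented value of get_test_number()
TRAIN_BATCH_SIZE = 64     # documented default of train_batch_size
TEST_BATCH_SIZE = 64      # documented default of test_batch_size


def batches_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches in one pass; the last partial batch is kept unless drop_last."""
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


train_batches = batches_per_epoch(TRAIN_EXAMPLES, TRAIN_BATCH_SIZE)
test_batches = batches_per_epoch(TEST_EXAMPLES, TEST_BATCH_SIZE)
print(train_batches, test_batches)  # 1053 14
```

With the defaults, the last training batch holds 21 examples and the last testing batch holds 40, so code that assumes a fixed batch dimension of 64 needs to handle those smaller batches.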
Source code in tinybig/data/text_dataloader_torchtext.py
get_class_number(*args, **kwargs)
staticmethod
Returns the number of classes in the SST-2 dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | tuple | Additional arguments. | () |
**kwargs | dict | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
int | The number of classes (2). |
get_idx_to_label(*args, **kwargs)
staticmethod
Returns the mapping from indices to labels for the SST-2 dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | tuple | Additional arguments. | () |
**kwargs | dict | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
dict | A dictionary mapping indices to labels. |
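An index-to-label dictionary of this kind is typically used to turn predicted class indices back into readable labels. The sketch below shows that decoding step. The mapping `IDX_TO_LABEL` is an assumption that follows the common SST-2 convention (0 for negative, 1 for positive); the exact keys and label strings returned by `get_idx_to_label` are not shown in this page, and `decode` is a hypothetical helper.

```python
from typing import Dict, List

# Assumed mapping following the common SST-2 convention; the dictionary
# actually returned by get_idx_to_label may use different label strings.
IDX_TO_LABEL: Dict[int, str] = {0: "negative", 1: "positive"}


def decode(predictions: List[int], idx_to_label: Dict[int, str]) -> List[str]:
    """Translate predicted class indices into human-readable labels."""
    return [idx_to_label[index] for index in predictions]


labels = decode([1, 0, 1], IDX_TO_LABEL)
print(labels)  # ['positive', 'negative', 'positive']
```

The mapping has one entry per class, so its length should equal the value reported by `get_class_number`.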
get_test_number(*args, **kwargs)
staticmethod
Returns the number of testing examples in the SST-2 dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | tuple | Additional arguments. | () |
**kwargs | dict | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
int | The number of testing examples (872). |
get_train_number(*args, **kwargs)
staticmethod
Returns the number of training examples in the SST-2 dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | tuple | Additional arguments. | () |
**kwargs | dict | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
int | The number of training examples (67,349). |
load_datapipe(cache_dir='./data/', *args, **kwargs)
staticmethod
Loads training and testing pipelines for the SST-2 dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cache_dir | str | Directory to store cached data. | './data/' |
*args | tuple | Additional arguments. | () |
**kwargs | dict | Additional keyword arguments. | {} |
Returns:
Type | Description |
---|---|
tuple | A tuple containing training and testing data pipelines. |
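Because `load_datapipe` is a static method, it can be called on the class without creating an instance. The sketch below wraps that call in a small helper. It assumes the class is importable from the module path listed under "Source code in" above, that the returned tuple is ordered (training, testing) as the description suggests, and that tinybig and its text dependencies are installed. `load_sst2_pipes` is a hypothetical wrapper, not part of the library.

```python
def load_sst2_pipes(cache_dir: str = "./data/"):
    """Return the (train, test) data pipelines for SST-2.

    The import is done lazily so this module can be imported even when
    tinybig is not installed; the call itself still requires it.
    """
    # Assumed import path, taken from the source file named in this page.
    from tinybig.data.text_dataloader_torchtext import sst2

    # Static method: no sst2 instance is needed. The tuple order
    # (train first, test second) is an assumption based on the description.
    train_pipe, test_pipe = sst2.load_datapipe(cache_dir=cache_dir)
    return train_pipe, test_pipe
```

Calling `load_sst2_pipes()` with no arguments uses the documented default cache directory `./data/`; passing another path changes where the cached data is stored.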