cnlpt.HierarchicalTransformer module
Module containing the Hierarchical Transformer implementation, adapted from Xin Su.
- cnlpt.HierarchicalTransformer.set_seed(seed, n_gpu)
Set the random seeds for random, numpy, and pytorch to a specific value.
- Parameters:
seed – the seed to use
n_gpu – the number of GPUs being used
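A minimal usage sketch (the seed value is illustrative; torch.cuda.device_count() is used here to supply n_gpu):

    import torch
    from cnlpt.HierarchicalTransformer import set_seed

    # Seed python's random module, numpy, and pytorch in one call
    # so that training runs are reproducible.
    set_seed(42, n_gpu=torch.cuda.device_count())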
- class cnlpt.HierarchicalTransformer.MultiHeadAttention
Bases: Module
Multi-Head Attention module
Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)
- Parameters:
n_head – the number of attention heads
d_model – the dimensionality of the input and output of the encoder
d_k – the size of the query and key vectors
d_v – the size of the value vector
- __init__(n_head, d_model, d_k, d_v, dropout=0.1)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(q, k, v, mask=None)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
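A hedged usage sketch: the dimensions below are illustrative, and the return of (output, attention weights) is assumed to follow the original attention-is-all-you-need-pytorch implementation:

    import torch
    from cnlpt.HierarchicalTransformer import MultiHeadAttention

    mha = MultiHeadAttention(n_head=8, d_model=512, d_k=64, d_v=64)
    x = torch.randn(2, 10, 512)  # (batch, seq_len, d_model)
    # Self-attention: the same tensor serves as query, key, and value.
    out, attn = mha(x, x, x)
    # out: (2, 10, 512); attn: (2, 8, 10, 10) in the original implementation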
- class cnlpt.HierarchicalTransformer.PositionwiseFeedForward
Bases: Module
A two-layer position-wise feed-forward module
Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)
- Parameters:
d_in – the dimensionality of the input and output of the encoder
d_hid – the inner hidden size of the positionwise FFN in the encoder
dropout – the amount of dropout to use in training (default 0.1)
- __init__(d_in, d_hid, dropout=0.1)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
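A short sketch (sizes are illustrative); the module applies the same two-layer FFN at every position, so the output shape matches the input:

    import torch
    from cnlpt.HierarchicalTransformer import PositionwiseFeedForward

    ffn = PositionwiseFeedForward(d_in=512, d_hid=2048)
    x = torch.randn(2, 10, 512)  # (batch, seq_len, d_in)
    y = ffn(x)                   # same shape: (2, 10, 512)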
- class cnlpt.HierarchicalTransformer.ScaledDotProductAttention
Bases: Module
Scaled Dot-Product Attention
Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)
- Parameters:
temperature – the temperature for scaled dot product attention
attn_dropout – the amount of dropout to use in training for scaled dot product attention (default 0.1, not tuned in the rest of the code)
- __init__(temperature, attn_dropout=0.1)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(q, k, v, mask=None)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
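A sketch of standalone use; setting temperature to sqrt(d_k) follows the original paper, and the (output, attention) return is assumed from the original implementation:

    import torch
    from cnlpt.HierarchicalTransformer import ScaledDotProductAttention

    d_k = 64
    attn_fn = ScaledDotProductAttention(temperature=d_k ** 0.5)
    # Per-head tensors: (batch, n_head, seq_len, d_k) for q and k,
    # (batch, n_head, seq_len, d_v) for v.
    q = torch.randn(2, 8, 10, d_k)
    k = torch.randn(2, 8, 10, d_k)
    v = torch.randn(2, 8, 10, d_k)
    out, attn = attn_fn(q, k, v)  # softmax(q @ k^T / temperature) @ v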
- class cnlpt.HierarchicalTransformer.EncoderLayer
Bases: Module
Composes two sub-layers: multi-head self-attention followed by a position-wise feed-forward network
Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)
- Parameters:
d_model – the dimensionality of the input and output of the encoder
d_inner – the inner hidden size of the positionwise FFN in the encoder
n_head – the number of attention heads
d_k – the size of the query and key vectors
d_v – the size of the value vector
dropout – the amount of dropout to use in training in both the attention and FFN steps (default 0.1)
- __init__(d_model, d_inner, n_head, d_k, d_v, dropout=0.1)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(enc_input, slf_attn_mask=None)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
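A usage sketch with illustrative sizes; the (output, self-attention) return is assumed from the original implementation:

    import torch
    from cnlpt.HierarchicalTransformer import EncoderLayer

    layer = EncoderLayer(d_model=512, d_inner=2048, n_head=8, d_k=64, d_v=64)
    # In the hierarchical model, each position is one chunk embedding.
    chunks = torch.randn(2, 16, 512)  # (batch, num_chunks, d_model)
    enc_out, slf_attn = layer(chunks)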
- class cnlpt.HierarchicalTransformer.HierarchicalModel
Bases: PreTrainedModel
Hierarchical Transformer model (https://arxiv.org/abs/2105.06752)
Adapted from Xin Su’s implementation (https://github.com/xinsu626/DocTransformer)
- Parameters:
config – the model configuration (a CnlpConfig instance; see config_class below)
transformer_head_config – configuration for the chunk-level transformer head
class_weights – optional per-class weights to apply in the classification loss
final_task_weight – the relative weight to give the final task when computing the combined loss
freeze – the proportion of encoder weights to freeze during training (default -1.0, i.e. freeze nothing)
- config_class
alias of CnlpConfig
- __init__(config, *, freeze=-1.0, class_weights=None)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=False, event_tokens=None)
Forward method.
- Parameters:
input_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – A batch of chunked documents as tokenizer indices.
attention_mask (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Attention masks for the batch.
token_type_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Token type IDs for the batch.
position_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Position IDs for the batch.
head_mask (torch.LongTensor of shape (num_heads,) or (num_layers, num_heads), optional) – Token encoder head mask.
inputs_embeds (torch.FloatTensor of shape (batch_size, num_chunks, chunk_len, hidden_size), optional) – A batch of chunked documents as token embeddings.
labels (torch.LongTensor of shape (batch_size, num_tasks), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, …, self.num_labels[task_ind] - 1]. If self.num_labels[task_ind] == 1, a regression loss is computed (Mean-Square loss); if self.num_labels[task_ind] > 1, a classification loss is computed (Cross-Entropy).
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers.
output_hidden_states – If True, return a matrix of shape (batch_size, num_chunks, hidden_size) representing the contextualized embeddings of each chunk. The 0-th element of each chunk is the classifier representation for that instance.
event_tokens – not currently used (only relevant for token classification)
Returns:
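A hedged sketch of the chunked input layout this forward method expects; shapes follow the parameter docs above, and the token ids are random stand-ins (a real HierarchicalModel instance, built from a CnlpConfig, would be needed for the final call):

    import torch

    # One batch of 2 documents, each split into 4 chunks of 128 tokens.
    batch_size, num_chunks, chunk_len = 2, 4, 128
    input_ids = torch.randint(0, 30000, (batch_size, num_chunks, chunk_len))
    attention_mask = torch.ones_like(input_ids)
    labels = torch.zeros(batch_size, 1, dtype=torch.long)  # (batch_size, num_tasks)

    # With a HierarchicalModel instance `model`, the call would be:
    # outputs = model(input_ids=input_ids,
    #                 attention_mask=attention_mask,
    #                 labels=labels)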