cnlpt.HierarchicalTransformer module

Module containing the Hierarchical Transformer model and its building blocks, adapted from Xin Su's implementation.

cnlpt.HierarchicalTransformer.set_seed(seed, n_gpu)

Set the random seeds for random, numpy, and pytorch to a specific value.

Parameters:
  • seed – the seed to use

  • n_gpu – the number of GPUs being used
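
Example (a minimal usage sketch; the seed value and the use of torch.cuda.device_count() for the GPU count are illustrative choices, not library defaults):

    import torch

    from cnlpt.HierarchicalTransformer import set_seed

    # Fix the random, numpy, and torch seeds for a reproducible run.
    n_gpu = torch.cuda.device_count()
    set_seed(42, n_gpu)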

class cnlpt.HierarchicalTransformer.MultiHeadAttention

Bases: Module

Multi-Head Attention module

Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)

Parameters:
  • n_head – the number of attention heads

  • d_model – the dimensionality of the input and output of the encoder

  • d_k – the size of the query and key vectors

  • d_v – the size of the value vector

__init__(n_head, d_model, d_k, d_v, dropout=0.1)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(q, k, v, mask=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
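
Example (a minimal self-attention sketch; the sizes are illustrative, and the (output, attention) return pair is assumed from Yu-Hsiang Huang's original implementation rather than documented here):

    import torch

    from cnlpt.HierarchicalTransformer import MultiHeadAttention

    n_head, d_model, d_k, d_v = 8, 512, 64, 64  # illustrative sizes
    mha = MultiHeadAttention(n_head, d_model, d_k, d_v, dropout=0.1)

    # Self-attention over a batch of 2 sequences of length 16: q, k, and v are the same tensor.
    x = torch.randn(2, 16, d_model)
    output, attn = mha(x, x, x, mask=None)  # assumed (output, attention weights) return
    print(output.shape)  # expected torch.Size([2, 16, 512])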

class cnlpt.HierarchicalTransformer.PositionwiseFeedForward

Bases: Module

A two-layer position-wise feed-forward module

Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)

Parameters:
  • d_in – the dimensionality of the input and output of the encoder

  • d_hid – the inner hidden size of the positionwise FFN in the encoder

  • dropout – the amount of dropout to use in training (default 0.1)

__init__(d_in, d_hid, dropout=0.1)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
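
Example (a minimal sketch with illustrative sizes; because the layer is applied position-wise, the output is expected to keep the input shape):

    import torch

    from cnlpt.HierarchicalTransformer import PositionwiseFeedForward

    d_in, d_hid = 512, 2048  # illustrative sizes
    ffn = PositionwiseFeedForward(d_in, d_hid, dropout=0.1)

    x = torch.randn(2, 16, d_in)
    y = ffn(x)
    print(y.shape)  # expected torch.Size([2, 16, 512])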

class cnlpt.HierarchicalTransformer.ScaledDotProductAttention

Bases: Module

Scaled Dot-Product Attention

Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)

Parameters:
  • temperature – the temperature for scaled dot product attention

  • attn_dropout – the amount of dropout to use in training for scaled dot product attention (default 0.1, not tuned in the rest of the code)

__init__(temperature, attn_dropout=0.1)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(q, k, v, mask=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
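
Example (a minimal sketch; the per-head (batch, n_head, seq_len, d_k) layout and the (output, attention) return pair are assumptions based on the original implementation, and the usual sqrt(d_k) temperature is an illustrative choice):

    import torch

    from cnlpt.HierarchicalTransformer import ScaledDotProductAttention

    d_k = 64
    attention = ScaledDotProductAttention(temperature=d_k ** 0.5)

    # Per-head tensors, shaped (batch, n_head, seq_len, d_k) / (batch, n_head, seq_len, d_v).
    q = torch.randn(2, 8, 16, d_k)
    k = torch.randn(2, 8, 16, d_k)
    v = torch.randn(2, 8, 16, d_k)
    output, attn = attention(q, k, v, mask=None)  # assumed (output, attention weights) return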

class cnlpt.HierarchicalTransformer.EncoderLayer

Bases: Module

An encoder layer composed of a multi-head self-attention sub-layer and a position-wise feed-forward sub-layer

Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)

Parameters:
  • d_model – the dimensionality of the input and output of the encoder

  • d_inner – the inner hidden size of the positionwise FFN in the encoder

  • n_head – the number of attention heads

  • d_k – the size of the query and key vectors

  • d_v – the size of the value vector

  • dropout – the amount of dropout to use in training in both the attention and FFN steps (default 0.1)

__init__(d_model, d_inner, n_head, d_k, d_v, dropout=0.1)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(enc_input, slf_attn_mask=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
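
Example (a minimal sketch of running one encoder layer over a batch of chunk embeddings; the sizes are illustrative and the (output, self-attention) return pair is assumed from the original implementation):

    import torch

    from cnlpt.HierarchicalTransformer import EncoderLayer

    d_model, d_inner, n_head, d_k, d_v = 512, 2048, 8, 64, 64  # illustrative sizes
    layer = EncoderLayer(d_model, d_inner, n_head, d_k, d_v, dropout=0.1)

    # A batch of 2 documents, each represented by 10 chunk embeddings of size d_model.
    chunk_reps = torch.randn(2, 10, d_model)
    enc_output, enc_slf_attn = layer(chunk_reps, slf_attn_mask=None)  # assumed return pair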

class cnlpt.HierarchicalTransformer.HierarchicalModel

Bases: PreTrainedModel

Hierarchical Transformer model (https://arxiv.org/abs/2105.06752)

Adapted from Xin Su’s implementation (https://github.com/xinsu626/DocTransformer)

Parameters:
  • config

  • transformer_head_config

  • class_weights

  • final_task_weight

  • freeze

config_class

alias of CnlpConfig

__init__(config, *, freeze=-1.0, class_weights=None)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=False, event_tokens=None)

Forward method.

Parameters:
  • input_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – A batch of chunked documents as tokenizer indices.

  • attention_mask (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Attention masks for the batch.

  • token_type_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Token type IDs for the batch.

  • position_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Position IDs for the batch.

  • head_mask (torch.LongTensor of shape (num_heads,) or (num_layers, num_heads), optional) – Token encoder head mask.

  • inputs_embeds (torch.FloatTensor of shape (batch_size, num_chunks, chunk_len, hidden_size), optional) – A batch of chunked documents as token embeddings.

  • labels (torch.LongTensor of shape (batch_size, num_tasks), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, …, self.num_labels[task_ind] - 1]. If self.num_labels[task_ind] == 1, a regression loss (mean squared error) is computed; if self.num_labels[task_ind] > 1, a classification loss (cross-entropy) is computed.

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers.

  • output_hidden_states – If True, return a tensor of shape (batch_size, num_chunks, hidden_size) representing the contextualized embeddings of each chunk. The 0-th element of each chunk is the classifier representation for that instance.

  • event_tokens – not currently used (only relevant for token classification)

Returns:
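
Example (a minimal sketch of the chunked input layout the forward pass expects; all sizes and the vocabulary bound are illustrative, and the model construction, which requires a CnlpConfig, is omitted):

    import torch

    # 2 documents, 10 chunks per document, 128 tokens per chunk (illustrative sizes).
    batch_size, num_chunks, chunk_len = 2, 10, 128

    input_ids = torch.randint(0, 30000, (batch_size, num_chunks, chunk_len))
    attention_mask = torch.ones_like(input_ids)
    labels = torch.zeros((batch_size, 1), dtype=torch.long)  # one classification task

    # Assuming `model` is an already-instantiated HierarchicalModel:
    # outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)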