cnlpt.HierarchicalTransformer module

Module containing the Hierarchical Transformer model and its building blocks, adapted from Xin Su's implementation.

cnlpt.HierarchicalTransformer.set_seed(seed, n_gpu)

Set the random seeds for random, numpy, and pytorch to a specific value.

Parameters:
  • seed – the seed to use

  • n_gpu – the number of GPUs being used
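
Example (a minimal usage sketch; the seed value and the use of torch.cuda.device_count() for the GPU count are illustrative choices, not library defaults):

    import torch

    from cnlpt.HierarchicalTransformer import set_seed

    # Fix the random, numpy, and torch seeds for a reproducible run.
    n_gpu = torch.cuda.device_count()
    set_seed(42, n_gpu)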

class cnlpt.HierarchicalTransformer.MultiHeadAttention

Bases: Module

Multi-Head Attention module

Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)

Parameters:
  • n_head – the number of attention heads

  • d_model – the dimensionality of the input and output of the encoder

  • d_k – the size of the query and key vectors

  • d_v – the size of the value vector

__init__(n_head, d_model, d_k, d_v, dropout=0.1)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(q, k, v, mask=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
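
Example (a minimal self-attention sketch; the sizes are illustrative, and the (output, attention) return pair is assumed from Yu-Hsiang Huang's original implementation rather than documented here):

    import torch

    from cnlpt.HierarchicalTransformer import MultiHeadAttention

    n_head, d_model, d_k, d_v = 8, 512, 64, 64  # illustrative sizes
    mha = MultiHeadAttention(n_head, d_model, d_k, d_v, dropout=0.1)

    # Self-attention over a batch of 2 sequences of length 16: q, k, and v are the same tensor.
    x = torch.randn(2, 16, d_model)
    output, attn = mha(x, x, x, mask=None)  # assumed (output, attention weights) return
    print(output.shape)  # expected torch.Size([2, 16, 512])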

class cnlpt.HierarchicalTransformer.PositionwiseFeedForward

Bases: Module

A two-layer position-wise feed-forward module

Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)

Parameters:
  • d_in – the dimensionality of the input and output of the encoder

  • d_hid – the inner hidden size of the positionwise FFN in the encoder

  • dropout – the amount of dropout to use in training (default 0.1)

__init__(d_in, d_hid, dropout=0.1)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
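
Example (a minimal sketch with illustrative sizes; because the layer is applied position-wise, the output is expected to keep the input shape):

    import torch

    from cnlpt.HierarchicalTransformer import PositionwiseFeedForward

    d_in, d_hid = 512, 2048  # illustrative sizes
    ffn = PositionwiseFeedForward(d_in, d_hid, dropout=0.1)

    x = torch.randn(2, 16, d_in)
    y = ffn(x)
    print(y.shape)  # expected torch.Size([2, 16, 512])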

class cnlpt.HierarchicalTransformer.ScaledDotProductAttention

Bases: Module

Scaled Dot-Product Attention

Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)

Parameters:
  • temperature – the temperature for scaled dot product attention

  • attn_dropout – the amount of dropout to use in training for scaled dot product attention (default 0.1, not tuned in the rest of the code)

__init__(temperature, attn_dropout=0.1)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(q, k, v, mask=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
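
Example (a minimal sketch; the per-head (batch, n_head, seq_len, d_k) layout and the (output, attention) return pair are assumptions based on the original implementation, and the usual sqrt(d_k) temperature is an illustrative choice):

    import torch

    from cnlpt.HierarchicalTransformer import ScaledDotProductAttention

    d_k = 64
    attention = ScaledDotProductAttention(temperature=d_k ** 0.5)

    # Per-head tensors, shaped (batch, n_head, seq_len, d_k) / (batch, n_head, seq_len, d_v).
    q = torch.randn(2, 8, 16, d_k)
    k = torch.randn(2, 8, 16, d_k)
    v = torch.randn(2, 8, 16, d_k)
    output, attn = attention(q, k, v, mask=None)  # assumed (output, attention weights) return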

class cnlpt.HierarchicalTransformer.EncoderLayer

Bases: Module

An encoder layer composed of a multi-head self-attention sub-layer and a position-wise feed-forward sub-layer

Original author: Yu-Hsiang Huang (https://github.com/jadore801120/attention-is-all-you-need-pytorch)

Parameters:
  • d_model – the dimensionality of the input and output of the encoder

  • d_inner – the inner hidden size of the positionwise FFN in the encoder

  • n_head – the number of attention heads

  • d_k – the size of the query and key vectors

  • d_v – the size of the value vector

  • dropout – the amount of dropout to use in training in both the attention and FFN steps (default 0.1)

__init__(d_model, d_inner, n_head, d_k, d_v, dropout=0.1)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(enc_input, slf_attn_mask=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
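
Example (a minimal sketch of running one encoder layer over a batch of chunk embeddings; the sizes are illustrative and the (output, self-attention) return pair is assumed from the original implementation):

    import torch

    from cnlpt.HierarchicalTransformer import EncoderLayer

    d_model, d_inner, n_head, d_k, d_v = 512, 2048, 8, 64, 64  # illustrative sizes
    layer = EncoderLayer(d_model, d_inner, n_head, d_k, d_v, dropout=0.1)

    # A batch of 2 documents, each represented by 10 chunk embeddings of size d_model.
    chunk_reps = torch.randn(2, 10, d_model)
    enc_output, enc_slf_attn = layer(chunk_reps, slf_attn_mask=None)  # assumed return pair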

class cnlpt.HierarchicalTransformer.HierarchicalModel

Bases: PreTrainedModel

Hierarchical Transformer model (https://arxiv.org/abs/2105.06752)

Adapted from Xin Su’s implementation (https://github.com/xinsu626/DocTransformer)

Parameters:
  • config

  • transformer_head_config

  • class_weights

  • final_task_weight

  • freeze

config_class

alias of CnlpConfig

__init__(config, *, freeze=-1.0, class_weights=None)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=False, event_tokens=None)

Forward method.

Parameters:
  • input_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – A batch of chunked documents as tokenizer indices.

  • attention_mask (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Attention masks for the batch.

  • token_type_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Token type IDs for the batch.

  • position_ids (torch.LongTensor of shape (batch_size, num_chunks, chunk_len), optional) – Position IDs for the batch.

  • head_mask (torch.LongTensor of shape (num_heads,) or (num_layers, num_heads), optional) – Token encoder head mask.

  • inputs_embeds (torch.FloatTensor of shape (batch_size, num_chunks, chunk_len, hidden_size), optional) – A batch of chunked documents as token embeddings.

  • labels (torch.LongTensor of shape (batch_size, num_tasks), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, …, self.num_labels[task_ind] - 1]. If self.num_labels[task_ind] == 1, a regression loss (mean squared error) is computed; if self.num_labels[task_ind] > 1, a classification loss (cross-entropy) is computed.

  • output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers.

  • output_hidden_states – If True, return a tensor of shape (batch_size, num_chunks, hidden_size) representing the contextualized embeddings of each chunk. The 0-th element of each chunk is the classifier representation for that instance.

  • event_tokens – not currently used (only relevant for token classification)

Returns:
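
Example (a minimal sketch of the chunked input layout the forward pass expects; all sizes and the vocabulary bound are illustrative, and the model construction, which requires a CnlpConfig, is omitted):

    import torch

    # 2 documents, 10 chunks per document, 128 tokens per chunk (illustrative sizes).
    batch_size, num_chunks, chunk_len = 2, 10, 128

    input_ids = torch.randint(0, 30000, (batch_size, num_chunks, chunk_len))
    attention_mask = torch.ones_like(input_ids)
    labels = torch.zeros((batch_size, 1), dtype=torch.long)  # one classification task

    # Assuming `model` is an already-instantiated HierarchicalModel:
    # outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)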