Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
import numpy as np
import torch
import tensorflow as tf

x = torch.tensor(
    [
        [1.0, 2, 3, 4, 5],
        [6.0, 7, 8, 9, 10],
    ]
)
x
tensor([[ 1.,  2.,  3.,  4.,  5.],
        [ 6.,  7.,  8.,  9., 10.]])

Tips
numpy.pad and torch.nn.utils.rnn.pad_sequence can only increase the length of a sequence (NumPy array, list or Tensor), while tf.keras.preprocessing.sequence.pad_sequences can both increase and decrease (truncate) the length of a sequence.

numpy.pad implements many different ways to pad a sequence (constant, edge, linear_ramp, maximum, mean, median, minimum, reflect, symmetric, wrap, empty, and an arbitrary padding function), while torch.nn.utils.rnn.pad_sequence and tf.keras.preprocessing.sequence.pad_sequences only support padding with a constant value (as this is the main use case in NLP).

You can easily control the final length (after padding) with numpy.pad and tf.keras.preprocessing.sequence.pad_sequences. torch.nn.utils.rnn.pad_sequence pads every tensor to the max length among all tensors, so you cannot easily use it to pad sequences to an arbitrary length.

numpy.pad pads a single iterable object (NumPy array, list or Tensor), torch.nn.utils.rnn.pad_sequence pads a sequence of Tensors, and tf.keras.preprocessing.sequence.pad_sequences pads a sequence of iterable objects (NumPy arrays, lists or Tensors).
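The sketch below illustrates the numpy.pad modes listed above. It pads a toy array [1, 2, 3] by two values on each side, first with several built-in modes and then with a padding function. The helper name pad_with_minus_one is hypothetical. Its callback signature (vector, iaxis_pad_width, iaxis, kwargs), which fills the pad regions of vector in place, follows the numpy.pad documentation. The last call checks that a negative pad width is rejected, which confirms that numpy.pad cannot shorten a sequence.

```python
import numpy as np

a = np.array([1, 2, 3])

# Pad 2 values before and 2 values after with several built-in modes.
results = {
    mode: np.pad(a, (2, 2), mode=mode)
    for mode in ["edge", "reflect", "symmetric", "wrap", "maximum"]
}


def pad_with_minus_one(vector, pad_width, iaxis, kwargs):
    # Hypothetical padding function: numpy.pad passes a 1-D view that already
    # includes the pad regions, and we fill those regions in place with -1.
    before, after = pad_width
    vector[:before] = -1
    if after > 0:  # vector[-0:] would select the whole view
        vector[-after:] = -1


results["custom"] = np.pad(a, (2, 2), mode=pad_with_minus_one)

for mode, padded in results.items():
    print(f"{mode:>9}: {padded}")

# numpy.pad can only grow an array: a negative pad width raises ValueError.
try:
    np.pad(a, (-1, 0))
    shrink_rejected = False
except ValueError:
    shrink_rejected = True
print("negative pad width rejected:", shrink_rejected)
```

With NumPy's mode semantics this gives:

- edge: [1 1 1 2 3 3 3]
- reflect: [3 2 1 2 3 2 1] (the edge value is not repeated)
- symmetric: [2 1 1 2 3 3 2] (the edge value is repeated)
- wrap: [2 3 1 2 3 1 2]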
Overall,
tf.keras.preprocessing.sequence.pad_sequences is the most useful for NLP.
torch.nn.utils.rnn.pad_sequence seems to be quite limited.
numpy.pad can be used to easily implement a customized padding strategy.
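The sketch below illustrates the last point and also works around the fixed-length limitation of torch.nn.utils.rnn.pad_sequence. It is a minimal custom batch padder built on numpy.pad. pad_batch and its parameters are hypothetical and not part of any library:

- With length=None, it pads every sequence to the longest one, like torch.nn.utils.rnn.pad_sequence.
- With an explicit length, it also truncates longer sequences from the end.
- With batch_first=False, it returns the time-major (length, batch) layout. This is the layout torch.nn.utils.rnn.pad_sequence produces by default, since its batch_first argument defaults to False.

```python
import numpy as np


def pad_batch(sequences, length=None, value=0, batch_first=True):
    """Hypothetical helper: pad (and, if needed, truncate) 1-D sequences at the
    end so that they all have the same length, using numpy.pad."""
    arrays = [np.asarray(seq) for seq in sequences]
    if length is None:
        # Same target length as torch.nn.utils.rnn.pad_sequence: the longest input.
        length = max(len(arr) for arr in arrays)
    rows = []
    for arr in arrays:
        arr = arr[:length]  # drop values from the end if the sequence is too long
        rows.append(np.pad(arr, (0, length - len(arr)), constant_values=value))
    batch = np.stack(rows)  # shape: (batch, length)
    # Transpose to the time-major (length, batch) layout if requested.
    return batch if batch_first else batch.T


seqs = [[1, 2, 3], [1, 2, 3, 4]]
print(pad_batch(seqs, batch_first=False))      # time-major, like pad_sequence's default
print(pad_batch(seqs, length=6))               # arbitrary target length
print(pad_batch([[1, 2, 3, 4, 5]], length=3))  # truncation
```

The time-major result has one row per position rather than one row per sequence. This is why indexing the first row gives the first element of every sequence.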
a = [1, 2, 3, 4, 5]
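# pad 2 values before and 3 after; constant_values=(before, after)
np.pad(a, (2, 3), "constant", constant_values=(4, 6))
array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6])

# batch_first defaults to False, so the result has shape (max_len, batch_size)
t = torch.nn.utils.rnn.pad_sequence(
    [
        torch.tensor([1, 2, 3]),
        torch.tensor([1, 2, 3, 4]),
    ]
)
t
tensor([[1, 1],
        [2, 2],
        [3, 3],
        [0, 4]])

# the first element of both sequences, not the first sequence
t[0]
tensor([1, 1])

torch.nn.utils.rnn.pad_sequence(  # fails: expects a sequence of Tensors, not lists
    [
        [1, 2, 3],
        [1, 2, 3, 4],
    ]
)

---------------------------------------------------------------------------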
AttributeError Traceback (most recent call last)
<ipython-input-25-242698656610> in <module>
2 [
3 [1, 2, 3],
----> 4 [1, 2, 3, 4],
5 ]
6 )
~/.local/lib/python3.7/site-packages/torch/nn/utils/rnn.py in pad_sequence(sequences, batch_first, padding_value)
325 # assuming trailing dimensions and type of all the Tensors
326 # in sequences are same and fetching those from sequences[0]
--> 327 max_size = sequences[0].size()
328 trailing_dims = max_size[1:]
329 max_len = max([s.size(0) for s in sequences])
AttributeError: 'list' object has no attribute 'size'

tf.keras.preprocessing.sequence.pad_sequences(
    [[1, 2, 3, 4, 5]],
    maxlen=3,
    dtype="long",
    value=0,
    truncating="post",  # too long: drop values from the end
    padding="post",
)
array([[1, 2, 3]])

tf.keras.preprocessing.sequence.pad_sequences(
    [[1, 2, 3, 4, 5]],
    maxlen=9,
    dtype="long",
    value=0,
    truncating="post",
    padding="post",  # too short: append value (0) at the end
)
array([[1, 2, 3, 4, 5, 0, 0, 0, 0]])
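Both calls above use "post". However, tf.keras.preprocessing.sequence.pad_sequences defaults to padding="pre" and truncating="pre", which pads at the front and drops values from the front instead. The pure-Python sketch below spells out that per-sequence rule. pad_or_truncate is a hypothetical name, and the code illustrates the documented behavior rather than reproducing the Keras implementation.

```python
def pad_or_truncate(seq, maxlen, value=0, padding="pre", truncating="pre"):
    """Hypothetical sketch of the per-sequence rule of Keras' pad_sequences."""
    if padding not in ("pre", "post") or truncating not in ("pre", "post"):
        raise ValueError('padding and truncating must be "pre" or "post"')
    seq = list(seq)
    if len(seq) > maxlen:
        # "pre" drops values from the front, "post" drops them from the end.
        seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
    pad = [value] * (maxlen - len(seq))
    # "pre" puts the padding in front, "post" appends it.
    return pad + seq if padding == "pre" else seq + pad


seq = [1, 2, 3, 4, 5]
print(pad_or_truncate(seq, 3, truncating="post"))  # [1, 2, 3], as in the first call above
print(pad_or_truncate(seq, 3))                     # [3, 4, 5]
print(pad_or_truncate(seq, 9, padding="post"))     # [1, 2, 3, 4, 5, 0, 0, 0, 0]
print(pad_or_truncate(seq, 9))                     # [0, 0, 0, 0, 1, 2, 3, 4, 5]
```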