-
Notifications
You must be signed in to change notification settings - Fork 2.2k
add min_padding_length to TokenCharactersIndexer (#1954) #1967
add min_padding_length to TokenCharactersIndexer (#1954) #1967
Conversation
@@ -33,10 +36,12 @@ class TokenCharactersIndexer(TokenIndexer[List[int]]): | |||
""" | |||
# pylint: disable=no-self-use | |||
def __init__(self, | |||
min_padding_length: int = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be added at the end of the argument list, otherwise you'll break existing calls to the constructor (which is why tests are failing). This is also why it needs to have a default value of 0 - we don't want to break backwards compatibility.
And, lastly, it's important for every new piece of functionality in the core library to have a test making sure it does what it's supposed to. Can you add a test making sure that things actually get padded correctly when this parameter is used?
Thanks for your reply. I am willing to add a test case and make code backward compatibility
|
* test min_padding_length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, it looks very close to done. We just need to switch the line endings back to how they were in the token indexer, and simplify the test a bit.
assert padded == {"char": [[2, 3, 3, 4, 5, 6, 7, 8], | ||
[9, 10, 0, 0, 0, 0, 0, 0], | ||
[11, 12, 4, 10, 13, 14, 4, 0], | ||
[15, 0, 0, 0, 0, 0, 0, 0]]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 84 through 126 here test functionality that's already tested elsewhere. The only thing we really need to test is the last case that you have, starting at line 128.
key, value = item.keys(), item.values() | ||
assert set([key_padding_lengths]) == set(key) | ||
value_padding_lengths = max(value_padding_lengths, max(value)) | ||
assert value_padding_lengths == 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need for lines 130 through 141; you can just go straight to computing the padding, and just have the assert that's on line 145. The rest is all implicitly tested with that one assert.
@@ -1,113 +1,119 @@ | |||
from typing import Dict, List |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you ended up changing the line endings on this file to DOS line endings. Can you switch it back?
OK. PR's coming soon. |
* test min_padding_length
….com/WrRan/allennlp into WrRan-patch-token_characters_indexer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM!
patch to #1954
Although the parameter
min_padding_length
is provided a default value (0
), I think it may be a better idea with no default value.