Multiline summary is not split correctly #315

FlorianGD · 2025-05-20T13:52:59Z

I saw the error in CI because I was using master (which was needed for a while to work with pre-commit version 4 and above).

Some multi-line sentences are split incorrectly. This is the current behavior of the split_summary function:

>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 'and split in two. I even have a third sentence here.',
 'Second sentence is long',
 '',
 'And other text here.']

I think this comes from the split_summary function that does this:

    lines[0] = first_sentence
    if rest_text:
        lines.insert(2, rest_text)

and inserts the rest of the text at the wrong place when a sentence is long enough to continue onto the next line.
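To make the misplacement concrete, here is a minimal reproduction of just the two statements quoted above. It assumes `first_sentence` and `rest_text` have already been computed from `lines[0]`; the values are hard-coded here and the rest of the real function is not shown.

```python
lines = [
    "First sentence is here. Second sentence is long",
    "and split in two. I even have a third sentence here.",
    "",
    "And other text here.",
]

# Assumed results of splitting lines[0] at the first sentence boundary.
first_sentence = "First sentence is here."
rest_text = "Second sentence is long"

result = list(lines)  # work on a copy so the input stays intact
result[0] = first_sentence
if rest_text:
    # Index 2 is only right when lines[1] is the blank separator line.
    # Here lines[1] is the continuation of the second sentence, so the
    # rest of the text lands after its own continuation.
    result.insert(2, rest_text)

print(result)
```

The fixed index works when the summary sits on a single line followed by a blank line, but not when the summary text wraps onto `lines[1]`.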

I am not familiar with the code base, but maybe something along those lines could work?

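import re
from typing import List

# ABBREVIATIONS is assumed to be defined elsewhere in the module.


def split_summary(lines) -> List[str]: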
    """Split multi-sentence summary into the first sentence and the rest."""
    if not lines or not lines[0].strip():
        return lines

    text = lines[0].strip()

    tokens = re.split(r"(\s+)", text)  # Keep whitespace for accurate rejoining
    sentence = []
    rest = []
    i = 0

    while i < len(tokens):
        token = tokens[i]
        sentence.append(token)

        if token.endswith(".") and not any(
            "".join(sentence).strip().endswith(abbr) for abbr in ABBREVIATIONS
        ):
            i += 1
            break

        i += 1

    rest = tokens[i:]
    first_sentence = "".join(sentence).strip()
    rest_text = "".join(rest).strip()

    new_lines = [first_sentence, ""]
    if rest_text:
        new_lines.append(rest_text)
    # Keep the remaining non-empty lines even when the first line holds a single sentence.
    new_lines.extend(line for line in lines[1:] if line)

    return new_lines

This gives:

>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 '',
 'Second sentence is long',
 'and split in two. I even have a third sentence here.',
 'And other text here.']

I do not know if the result should be processed more before returning or if it is something that is taken into account elsewhere in the codebase.
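One thing that may need handling: the output above drops the blank line that separated "And other text here." from the preceding paragraph. A minimal sketch of a variant that keeps the later lines untouched is below. `place_rest` is a hypothetical helper name, not something from the code base, and it assumes the first sentence and the rest text have already been extracted from `lines[0]`.

```python
from typing import List


def place_rest(lines: List[str], first_sentence: str, rest_text: str) -> List[str]:
    """Hypothetical helper: rebuild the docstring lines after the summary split."""
    new_lines = [first_sentence, ""]
    tail = list(lines[1:])

    if rest_text:
        # The rest of the first line goes directly before its continuation.
        new_lines.append(rest_text)
    else:
        # No leftover text: skip leading blank lines so the summary is
        # followed by exactly one blank line.
        while tail and not tail[0].strip():
            tail.pop(0)

    # Later lines, including blank paragraph separators, are kept as-is.
    new_lines.extend(tail)
    return new_lines


example = place_rest(
    [
        "First sentence is here. Second sentence is long",
        "and split in two. I even have a third sentence here.",
        "",
        "And other text here.",
    ],
    "First sentence is here.",
    "Second sentence is long",
)
print(example)
```

Whether the blank lines should be preserved at this point or handled later in the formatting pipeline is something I cannot tell from the outside.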


github-actions bot added the "fresh" label (This is a new issue) on May 20, 2025

weibullguy added the "P: bug" (PEP 257 violation or existing functionality that doesn't work as documented) and "C: convention" (Relates to docstring format convention) labels and removed the "fresh" (This is a new issue) label on May 20, 2025

github-actions bot added the "U: high" label on May 20, 2025
