Multiline summary is not split correctly · Issue #315 · PyCQA/docformatter · GitHub

Multiline summary is not split correctly #315


Open
FlorianGD opened this issue May 20, 2025 · 0 comments
Labels
C: convention (relates to docstring format convention), P: bug (PEP 257 violation or existing functionality that doesn't work as documented), U: high

Comments

@FlorianGD

I noticed the error in CI because I was using master (which was needed for a while to work with pre-commit version 4 and above).

Some multi-line sentences are split incorrectly. This is the behavior of the split_summary function:

>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 'and split in two. I even have a third sentence here.',
 'Second sentence is long',
 '',
 'And other text here.']

I think this comes from the split_summary function that does this:

    lines[0] = first_sentence
    if rest_text:
        lines.insert(2, rest_text)

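and inserts the remainder at the wrong place when the summary wraps onto a second line. Index 2 assumes that lines[1] is the blank line after the summary, so when lines[1] is actually the continuation of the second sentence, the rest of the first line ("Second sentence is long") ends up after that continuation instead of before it.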
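The ordering can be reproduced without docformatter by replaying those two list operations on the example input. This minimal sketch hard-codes the result of splitting the first line (first_sentence and rest_text) instead of computing it:

```python
# Replay of the two list operations quoted above, on the example input.
lines = [
    "First sentence is here. Second sentence is long",
    "and split in two. I even have a third sentence here.",
    "",
    "And other text here.",
]
# Hard-coded: what splitting lines[0] alone at its first sentence end yields.
first_sentence = "First sentence is here."
rest_text = "Second sentence is long"

lines[0] = first_sentence
if rest_text:
    # Index 2 assumes lines[1] is the blank separator line, which is
    # false when the summary wraps onto a second line.
    lines.insert(2, rest_text)

print(lines)
```

The printed list matches the faulty output above: "Second sentence is long" lands after "and split in two. ...", and no blank line follows the first sentence.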

I am not familiar with the codebase, but perhaps something along these lines could work?

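import re
from typing import List

# ABBREVIATIONS is assumed to be the abbreviation list docformatter already
# uses for sentence detection; it is not defined in this snippet.


def split_summary(lines) -> List[str]:
    """Split multi-sentence summary into the first sentence and the rest."""
    if not lines or not lines[0].strip():
        return lines

    text = lines[0].strip()

    tokens = re.split(r"(\s+)", text)  # Keep whitespace for accurate rejoining
    sentence = []
    i = 0

    # Collect tokens up to and including the first one that ends a sentence.
    while i < len(tokens):
        token = tokens[i]
        sentence.append(token)

        if token.endswith(".") and not any(
            "".join(sentence).strip().endswith(abbr) for abbr in ABBREVIATIONS
        ):
            i += 1
            break

        i += 1

    rest = tokens[i:]
    first_sentence = "".join(sentence).strip()
    rest_text = "".join(rest).strip()

    if not rest_text:
        # The first line holds a single sentence: keep the other lines as-is
        # instead of dropping them.
        lines[0] = first_sentence
        return lines

    # First sentence, blank separator, rest of the first line, then the
    # remaining non-blank lines in their original order.
    new_lines = [first_sentence, "", rest_text]
    new_lines.extend(line for line in lines[1:] if line)

    return new_lines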

This gives:

>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 '',
 'Second sentence is long',
 'and split in two. I even have a third sentence here.',
 'And other text here.']

I do not know whether the result needs further processing before it is returned (for example, the blank line before "And other text here." is dropped), or whether that is handled elsewhere in the codebase.
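One hedged alternative is to treat the whole summary paragraph (every line before the first blank line) as a unit, so a wrapped second sentence stays together and the blank line before the description survives. This is only a sketch: split_summary_paragraph and the ABBREVIATIONS tuple below are hypothetical stand-ins, not docformatter's API, and the remainder is left on one unwrapped line on the assumption that rewrapping happens in a later step.

```python
import re
from typing import List

# Hypothetical stand-in for docformatter's own ABBREVIATIONS constant.
ABBREVIATIONS = ("e.g.", "i.e.", "etc.", "et al.", "Dr.", "Mr.", "Mrs.", "Ms.")


def split_summary_paragraph(lines: List[str]) -> List[str]:
    """Split off the first sentence of the whole summary paragraph.

    Hypothetical alternative: the summary paragraph is every line before
    the first blank line, so a sentence that wraps is handled as one unit.
    """
    if not lines or not lines[0].strip():
        return lines

    # The summary paragraph ends at the first blank line (or end of input).
    end = next((i for i, line in enumerate(lines) if not line.strip()), len(lines))
    paragraph = " ".join(line.strip() for line in lines[:end])

    # Find the first whitespace-delimited token that ends with a period and
    # is not the tail of a known abbreviation.
    for match in re.finditer(r"\S+", paragraph):
        head = paragraph[: match.end()]
        if match.group().endswith(".") and not any(
            head.endswith(abbr) for abbr in ABBREVIATIONS
        ):
            break
    else:
        return lines  # No sentence end found: leave the lines untouched.

    first_sentence = head
    rest_text = paragraph[match.end():].strip()
    if not rest_text:
        return lines  # Single-sentence summary: nothing to split.

    # lines[end:] starts with the original blank line (if any), so the
    # separation from the following paragraphs is preserved.
    return [first_sentence, "", rest_text, *lines[end:]]
```

With the example input from this issue, this returns ['First sentence is here.', '', 'Second sentence is long and split in two. I even have a third sentence here.', '', 'And other text here.'], while a single summary sentence that wraps across two lines is returned unchanged.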

@github-actions (bot) added the "fresh" label (This is a new issue) on May 20, 2025.
@weibullguy added the "P: bug" and "C: convention" labels and removed the "fresh" label on May 20, 2025.
Projects
None yet
Development

No branches or pull requests

2 participants