8000 Don't insert a duplicate import when applying package autocorrects by neilparikh · Pull Request #8765 · sorbet/sorbet · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Don't insert a duplicate import when applying package autocorrects #8765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

neilparikh
Copy link
Collaborator
@neilparikh neilparikh commented Apr 12, 2025

When applying autocorrects in core/ErrorFlusherStdout.cc, we will now ignore the autocorrects that are set on the missing import errors. Instead, we store all the missing imports for a given package in PackageInfo, and then aggregate those to compute the autocorrect. This means that 2 missing import errors for the same missing import will only lead to the import being inserted once

Something similar is done in code_action.cc. There, we have a list of the all the edits for autocorrects for the current file. While constructing that list, we will skip missing import errors. Then, we aggregate the autocorrects from the PackageInfo, but only those that apply for the current file.

Easier to review commit-by-commit.

Motivation

Don't import the same package more than once.

Test plan

See included automated tests.

@neilparikh neilparikh changed the title Neil dedup import cli Don't insert a duplicate import when applying package autocorrects in -a mode Apr 12, 2025
@neilparikh neilparikh marked this pull request as ready for review April 12, 2025 00:54
@neilparikh neilparikh requested a review from a team as a code owner April 12, 2025 00:54
@neilparikh neilparikh requested review from froydnj and removed request for a team April 12, 2025 00:54
Copy link
Contributor
@froydnj froydnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right general idea, but it is the wrong layer for the implementation to live at -- ErrorFlusherStdout should not have any knowledge of the packager errors, for instance. (Note also that we'd have to write separate support for the LSP implementation, which is not desirable.)

I think it's possible to rewrite this such that the worker threads in the VisibilityChecker maintain their own lists of any import errors that have happened, and they can send those accumulated errors off to a queue which gets drained and deduplicates errors before sending them off to the actual error queue in the GlobalState. (I believe that we do this kind of pattern in at least one other place?) I do not think we want to be storing this kind of information in the package DB -- we can maintain this same mapping in whatever structure we have in the VisibilityChecker instead. Happy to explain/explore more over Slack.

size_t idx;
for (auto result = taskq->try_pop(idx); !result.done(); result = taskq->try_pop(idx)) {
ast::ParsedFile &f = files[idx];
if (!f.file.data(gs).isPackage(gs)) {
auto pkgName = gs.packageDB().getPackageNameForFile(f.file);
if (pkgName.exists()) {
core::Context ctx{gs, core::Symbols::root(), f.file};
VisibilityCheckerPass pass{ctx, gs.packageDB().getPackageInfo(pkgName)};
VisibilityCheckerPass pass{ctx, *nonConstGs.packageDB().getPackageInfoNonConst(pkgName)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible for this kind of pattern -- having non-constant shared data used in worker threads -- to be safe, but it is not safe in this instance. We could have multiple files that share the same package both have import errors, and then separate threads would be modifying the same package without any locking between them.

@neilparikh neilparikh force-pushed the neil-dedup-import-cli branch 2 times, most recently from 6de52db to 93edb89 Compare April 21, 2025 01:10
@neilparikh neilparikh force-pushed the neil-dedup-import-cli branch 2 times, most recently from e0f5469 to b011b82 Compare April 25, 2025 00:24
@neilparikh neilparikh changed the title Don't insert a duplicate import when applying package autocorrects in -a mode Don't insert a duplicate import when applying package autocorrects Apr 25, 2025
@neilparikh neilparikh requested a review from froydnj April 25, 2025 00:26
@neilparikh neilparikh force-pushed the neil-dedup-import-cli branch from b011b82 to 767cd2b Compare April 25, 2025 18:55
Copy link
Contributor
@froydnj froydnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the review delay here, I think this is better than the previous version. Still a few things to fix.

IIUC, this is going to collide with @jez's PackageRef work, so you may want to coordinate -- if you haven't already -- on what the right thing to do there is.

Comment on lines +632 to +640
resultq->push({f.file, std::move(pass.missingImports)}, 1);
}

barrier.DecrementCount();
});

barrier.Wait();

std::pair<core::FileRef, UnorderedMap<std::pair<core::packages::MangledName, core::packages::ImportType>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will have to be rebased across #8852.

Even after that's done, we should make the previous loop here use multiplexJob so that the aggregation loop below can proceed in parallel with the processing of the visibility checking and not wait for all the visibility checks to be done.

for (auto &[key, value] : trackedMissingImports_) {
auto importName = key.first;
auto importType = key.second;
auto autocorrect = value.second;
Copy link
Contributor

Choose a reason for hid 8000 ing this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto autocorrect = value.second;
auto &autocorrect = value.second;

// test_import.
continue;
}
allEdits.insert(allEdits.end(), autocorrect.edits.begin(), autocorrect.edits.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These autocorrect.edits should probably be move iterators?

std::vector<core::AutocorrectSuggestion::Edit> allEdits;
for (auto &[_, value] : trackedMissingImports_) {
auto files = value.first;
auto autocorrect = value.second;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto autocorrect = value.second;
auto &autocorrect = value.second;

auto files = value.first;
auto autocorrect = value.second;
if (files.contains(file)) {
allEdits.insert(allEdits.end(), autocorrect.edits.begin(), autocorrect.edits.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on move iterators here.

Comment on lines +475 to +476
// TODO(neil): find out why this is sometimes not true and enable this ENFORCE
/* ENFORCE(autocorrect.has_value()); */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing this TODO before committing?

: title(title), edits(edits), isDidYouMean(isDidYouMean) {}
bool skipWhenAggregated;

AutocorrectSuggestion() : title(""), edits({}), isDidYouMean(false), skipWhenAggregated(false) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: I would be inclined to declare the (bool-only) fields as bool isDidYouMean = false; in the class definition and then this constructor could just be AutocorrectSuggestion() = default;.

Comment on lines +706 to +708
if (files.contains(file)) {
files.erase(file);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: better to do something like:

auto it = files.find(file);
if (it == files.end()) {
  continue;
}
files.erase(it);

so you don't do unnecessary hash lookups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0