8000 Find circular dependencies · Issue #189 · seddonym/grimp · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Find circular dependencies #189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
K4liber opened this issue Feb 16, 2025 · 26 comments
Open

Find circular dependencies #189

K4liber opened this issue Feb 16, 2025 · 26 comments

Comments

@K4liber
Copy link
K4liber commented Feb 16, 2025

Adding a functionality for finding circular dependencies between modules/packages.

See seddonym/import-linter#250

@seddonym
Copy link
Owner

Thanks for looking at this - just FYI I am on the fence about adding this to Grimp - when I've looked at this before, the results you get on a real code base are not as helpful as you might expect. (You can have potentially millions of results just because if a codebase isn't layered, then there are potentially very many cycles.)

There is always the option of converting the graph to networkx as documented here, if analysis is needed that Grimp doesn't support.

I would be open to reviewing something, but it may be thorny to figure out how to present useful results, on real code bases.

@seddonym
Copy link
Owner
seddonym commented Feb 17, 2025

Oh, I've just realised what this is related to - seddonym/import-linter#250. Sorry if I'm giving you mixed messages 😄 I've added a link in the description.

As I mentioned in my other comment, I think we might want to do some cycle-detection in Grimp. But I'm le 8000 ss convinced that this should be part of Grimp's public API.

Instead I would imagine we would have a high level method a bit like find_illegal_dependencies_for_layers that returns a meaningful result that Import Linter could be a thin wrapper around. It's probably worth agreeing on what that API might look like - and thoroughly defining what it would actually return, in the case of complex import graphs - before going too far down the implementation route.

@K4liber
Copy link
Author
K4liber commented Feb 22, 2025

@seddonym thanks for the comments. Looking at the current API of Grimp, I can see that a public method def find_cycles(self) -> list[list[str]] has a higher generality level than other public methods. Something more specific should fit better to the current code base.

@Peter554 already took a look into the Draft of implementation and he has some good insights. Before any implementation let's agree: 1. If you would like to have such a functionality in Grimp and 2. How the method signature should look like.

When it comes to the point nr 1, I would really encourage you guys to welcome such functionality in the Grimp since it will allows to implement a new contract in the import-linter (seddonym/import-linter#250) with the efficient (Rust) and independent (using Grimp instead of third-party package) approach. The new contract is a really friendly contract from the user perspective since it does not require any considerations on the layers hierarchy. It does simply not allow for any bidirectional dependency on modules or packages level. If I would start using import-linter for my project, such a contract could be the easiest start to improve the dependencies structure.

The proposed method signatures (point 2.):

  1. (get it all approach) def find_circular_dependencies(self, consider_package_dependencies: bool = False) -> set[PackageDependency]
  • (+) finds all circular dependencies in a single request
  • (-) can be slow
  1. (fail fast approach) def find_circular_dependency(self, consider_package_dependencies: bool = False) -> PackageDependency | None
  • (+) return the first found circular dependency which should be much faster to execute on a large code base
  • (-) even if faster than finding all dependencies at once, the user experience decreases since having multiple dependencies and willing to fix all of them, the user need to take an iterative approach (it's the similar case while you try to fix multiple circular import errors in python and you just get one error at each program execution)

Please let me know guys what do you think and thanks for your time. I am not writing any more code until we agree on something here :)

@Peter554
Copy link
Contributor
Peter554 commented Feb 22, 2025

Thanks @K4liber - my feeling is that it probably does make sense to add this to grimp in some form, although I think/hope we can make the PR quite a bit simpler (see #190 (comment)). Anyways, let's wait for @seddonym to say more what he thinks.

As for the proposed signature, I'd x-post my suggestion here #190 (comment). How about this?

def find_shortest_cycle(self, module: str, as_package: bool) -> Optional[Tuple[str, ...]]:
    """
    Returns the shortest import cycle from one module/package to itself, or None if no cycle exists.
    """
    ...
  • It's more granular - some users won't want to compute cycles for the entire graph but just care about a single module/package. Computing the cycles for the entire graph is likely to be extremely slow for the sorts of codebases @seddonym and I work with (the graph is very big!).
  • The module: str, as_package: bool is quite consistent with the existing grimp APIs (compare with e.g. find_downstream_modules)

@Peter554
Copy link
Contributor

@K4liber I didn't quite understand your suggestion - what does consider_package_dependencies do? Perhaps it would clearer if you add a docstring to those proposed signatures.

@K4liber
Copy link
Author
K4liber commented Feb 22, 2025

The problem is described in more details in this closed issue: seddonym/import-linter#249

@K4liber
Copy link
Author
K4liber commented Feb 23, 2025

@Peter554 The proposed signatures with docstrings:

  1. get it all approach
def find_circular_dependencies(
            self,
            consider_package_dependencies: bool = False
        ) -> set[list[str]]:
        """
        Find all circular dependencies between imports.

        "consider_package_dependencies=True" indicates looking for dependencies 
        not only between modules, but also on package level. Let's consider the following structure:

        project_root
        - package_1
        - - __init__.py
        - - module_a.py
        - - module_c.py
        - package_2
        - - __init__.py
        - - module_b.py

        and the following dependencies:

        - module_a depends on module_b
        - module_b depends on module_c

        We do not have any circular dependency between modules, but we have a circular
        dependency between packages:

        - package_1 depends on package_2 (since module_a depends on module_b)
        - package_2 depends on package_1 (since module_b depends on module_c)

        The method will return the following circular dependencies in such a case:

        {["package_1", "package_2", "package_1"]}

        """

  1. fail fast approach
def find_circular_dependency(
            self,
            consider_package_dependencies: bool = False
        ) -> list[str] | None:
        """
        Returns first found circular dependency between imports if any found.

        "consider_package_dependencies=True" indicates looking for dependencies 
        not only between modules, but also on package level. Let's consider the following structure:

        project_root
        - package_1
        - - __init__.py
        - - module_a.py
        - - module_c.py
        - package_2
        - - __init__.py
        - - module_b.py

        and the following dependencies:

        - module_a depends on module_b
        - module_b depends on module_c

        We do not have any circular dependency between modules, but we have a circular
        dependency between packages:

        - package_1 depends on package_2 (since module_a depends on module_b)
        - package_2 depends on package_1 (since module_b depends on module_c)

        The method will return the following circular dependency in such a case:

        ["package_1", "package_2", "package_1"]

        """

@seddonym
Copy link
Owner

Thanks @K4liber for your continued thinking about this!

When it comes to the point nr 1, I would really encourage you guys to welcome such functionality in the Grimp since it will allows to implement a new contract in the import-linter

I'm 100% behind this, in theory - I also want to be able to use a contract like that. But I think the details are tricky. There are two main things I'd like to understand better:

  • Useful results: we should make sure that Grimp / (or at least Import Linter) provides meaningful insight into import cycles, even for large or highly cyclic code bases.
  • Determinism: I think we should design for deterministic results. find_illegal_dependencies_for_layers currently isn't fully deterministic (at least not last time I checked), and that makes it harder to test and harder to guarantee that updates won't break peoples' contracts. For example, in a data structure that we choose to represent a cycle, how do we choose which item comes first?

Starter for ten: how should we represent the cycles in the following code base?

graph
    a --> b 
    b --> c
    c --> d
    c --> e
    d --> a
    e --> a
Loading

Or this one?

graph
   a <--> b
   b <--> c
   c <--> d
   c <--> a
   b <--> d
   d <--> a
Loading

@K4liber
Copy link
Author
K4liber commented Mar 1, 2025

@seddonym good points.

But first let me go back to the @Peter554 proposition "find_shortest_cycle". I am not sure how we should utilize such a function. I think that the story goes as follows:

"I (as a developer in some python project) would like to check if I do have any circular dependencies. I would like to keep the structure of my modules/packages clean and hierarchical. It will allow we to avoid circular import errors and keep reduced binding between components.

I would like to search for circular dependencies in my project on the module or package level. I could fix such dependencies one by one, but it will be more friendly to use if a functionality could yield all the found dependencies at once."

I think we should go for a "find_circular_dependency" solution which will just yield a first found dependency if it will be significantly more optimal and we do not need to bother about repeating ourselves in the output by returning the equivalent cycles (sub-cycles or ones with changed order) multiple times.

Coming back to @seddonym questions. I hope I partially have answered them already. Maybe "find_circular_dependency" is not the best when it comes to usefulness, but it is good enough with keeping a good balance between usefulness and simplicity.

When it comes to "determinism", I assume existence of a deterministic algorithm that can be used in "find_circular_dependency". Maybe we could introduce an optional "root" parameter, which is a starting node (package/module) for the algorithm, to make sure the output is "deterministic". I will check what kind of algorithms we can use.

@Peter554
Copy link
Contributor
Peter554 commented Mar 1, 2025

Hi @K4liber thanks for the update.

But first let me go back to the @Peter554 proposition find_shortest_cycle(module, as_package). I am not sure how we should utilize such a function.

My thinking is that this function could be called for a single module/package, but could equally be called for many modules /packages (even for every module/package in the entire graph) in a for loop (this would be needed if you want an import linter contract that checks the entire graph).

about repeating ourselves in the output by returning the equivalent cycles (sub-cycles or ones with changed order) multiple times.

This is a good point. In my suggested approach above using find_shortest_cycle, we'd probably want to deduplicate equivalent cycles. To clarify the approach I'm suggesting:

  • Add find_shortest_cycle(module, as_package) to grimp
  • Create a contract in import linter that calls find_shortest_cycle for all modules of interest (maybe all the modules in the graph, if that's really what you want)
  • Deduplicate equivalent cycles in import linter to get the contract result (I think this shouldn't be too hard)
  • Show the contract results

I could fix such dependencies one by one, but it will be more friendly to use if a functionality could yield all the found dependencies at once."

With my suggested approach we could show one cycle (the shortest cycle) per module/package in the graph in import linter (deduplicating equivalent cycles). This would probably already be plenty of feedback for the user. It's worth noting that none of the methods within grimp so far return all possible chains (see e.g. #193) - I believe computing all chains is an NP hard problem (presumably this is the same for cycles).

I can see you are making some good points. I'm most unsure about:

  • Precedent/consistency: The API you are proposing is not very consistent with the existing grimp APIs. Grimp doesn't yet have any methods that operate on the entire graph like this. Is there a way we can make the proposal more consistent with grimps existing APIs?
  • Performance: The method you are proposing - to analyse the entire graph with no option to do anything more granular - sounds very computationally intensive. Maybe it's not - I might be misunderstanding the proposed algorithm. The sorts of graphs @seddonym and I work with are very big. See large_graph in the benchmarks file - do you think your method could work for such a graph?

Perhaps this isn't an either/or question. Perhaps we want both of these. A more granular API and a higher level API. This would be kind of consistent with the import chain APIs: find_shortest_chain, find_shortest_chains and find_illegal_dependencies_for_layers. Perhaps eventually we'll want multiple methods related to cycles. But I'd suggest the lower-level, simpler API to start i.e. find_shortest_cycle.

@K4liber
Copy link
Author
K4liber commented Mar 1, 2025

@Peter554 since the find_shortest_cycle(module, as_package) fits better to the current design of Grimp API that's an advantage for this proposition. And finding the shortest cycle starting from a provided node makes it deterministic or at least leave less space for not-deterministic behavior.

I think we can just use BFS for finding the shortest cycle in directed graph which has a O(E) complexity (O(V+E) to be more precised, but we can assume E > V).

In the more high level functionality (import-linter) user optionally provide a node (module or package) and only find the shortest cycle with this node included. If the user do not provide the node, we loop over every node which will give in worst case scenario O(V*E) complexity.

I think we can already build a useful contract from that. But we can also discuss about more comprehensive approach with returning all cycles at once and if the find_shortest_cycle could be a part of such an algorithm or we are going to repeat a lot of computations.

To conclude, it makes sense for me to have the following methods:

  1. (Grimp)
def find_shortest_cycle(
            self, 
            module: str,
            as_package: bool
    ) -> Optional[Tuple[str, ...]]:
    """
    Returns the shortest import cycle from one module/package to itself, or None if no cycle exists.
    """
    ...
  1. (import-linter)
def find_circular_dependency(
            self,
            modules: Optional[list[str]] = None,
            as_package: bool = False
        ) -> Optional[list[str]]:

And then create a new contract based on find_circular_dependency.

@Peter554 Peter554 mentioned this issue Mar 2, 2025
5 tasks
@Peter554
Copy link
Contributor
Peter554 commented Mar 2, 2025

@seddonym @K4liber I've started a draft PR on the find_shortest_cycle method I'm proposing -> #194

@seddonym
Copy link
Owner
seddonym commented Mar 4, 2025

Thanks for the comments everyone.

My current thinking about our end goal would be for Grimp to have a high level function something like this:

@dataclass
class CyclicDependency:
    parent: str
    siblings: frozenset[str]
    # Cycles are chains of imports, from importer to imported, and so on.
    # The first item in the cycle will be from the sibling first in the alphabet.
    cycles: frozenset[tuple[str, ...]] 


def find_cycles_between_siblings(
    self,
    ancestor: str,
    allow_indirect: bool = True,
    max_generations: int | None = None,
    max_cycles_per_dependency: int = 10,
    ignore_imports: Set[str] = frozenset(),
) -> Set[CyclicDependency]:
    """
    Return cyclic dependencies between sets of siblings that are descendants of the supplied package.
    
    Let's say we search an ancestor called `mypackage.foo`.
    First, import cycles will be searched for between its children
    (`mypackage.foo.blue`, `mypackage.foo.green`, `mypackage.foo.yellow`)
    - if any are found they'll be added to the results.

    Then we proceed to the next generation. Each children is considered in
    turn as the new ancestor, whose children are searched. And so on, depending
    on the value of `max_generations`.
 
    If a cycle is found, the algorithm will temporarily remove it from the graph before continuing
    with its search. As a result, any cycles that have sections in common with other cycles may not be returned.

    Args:
        - ancestor: name of the Python module whose descendants we should
           consider, e.g. `mypackage.blue`.
        - allow_indirect: whether to consider cycles containing modules that are
           not descendants of the supplied ancestor.
        - max_generations: how many generations of modules to consider, e.g. a
           max_depth of 2 would first consider the children, and then move down a
           generation to consider the children of each of those children.
        - max_cycles_per_dependency: the maximum number of cycles to include in each
          `CyclicDependency`. For code bases with a very large number of cycles, this helps
          limit the time the search takes to run.
        - ignore_imports: set of import expressions to exclude from consideration.

    Returns a set of CyclicDependencies, e.g.:
        {
            CyclicDependency(
                parent="mypackage.foo",
                siblings=frozenset({"mypackage.foo.blue", "mypackage.foo.green", "mypackage.foo.yellow"}),
                cycles=frozenset({
                    (
                        # The beginning and end of each cycle will all be within `blue`, because
                        # it's earlier than `green` and `yellow` in the alphabet.
                        "mypackage.foo.blue.alpha",
                        "mypackage.foo.yellow.beta.one",
                        "mypackage.foo.green.gamma",
                        "mypackage.bar",  # Included because allow_indirect_imports=True
                        "mypackage.foo.blue.delta.two",
                    ),
                    (
                        "mypackage.foo.blue.alpha",
                        "mypackage.foo.green.epsilon",
                        "mypackage.foo.blue.zeta",
                    ),
                }),
            ),
            CyclicDependency(
                parent="mypackage.foo",
                siblings=frozenset({"mypackage.foo.blue", "mypackage.foo.green"}),
                cycles=frozenset({
                    (
                        "mypackage.foo.blue.alpha",
                        "mypackage.foo.green.beta",
                        "mypackage.foo.blue.gamma",
                    ),
                    (
                        "mypackage.foo.blue.delta",
                        "mypackage.foo.green.epsilon",
                        "mypackage.foo.green.zeta",
                        "mypackage.foo.blue.eta",
                    ),
                }),
            ),
            CyclicDependency(
                parent="mypackage.foo.yellow",
                siblings=frozenset({"mypackage.foo.yellow.alpha", "mypackage.yellow.gamma"}),
                cycles=frozenset({
                    (
                        "mypackage.foo.yellow.alpha.one",
                        "mypackage.foo.yellow.gamma.two",
                    ),
                }),
            ),
        }    
"""

Open to making some lower level functions too, but interested to know what people think of this as a first pass of an API to work towards? Is the documentation clear?

@K4liber
Copy link
Author
K4liber commented Mar 8, 2025

@seddonym solid documentation, sounds good to me. With the additional parameters, the described functionality will be scalable.

I have two questions:

  1. I am not sure what is a justification on CyclicDependency.sibilings attribute existence. Maybe it can just be a property calculated over the cycles? Its a detail actually, but I am not sure if it should be included in the broken contract log.

  2. How about next to the max_cycles_per_dependency parameter we can introduce max_cycles_overall? It will allow for ASAP feedback if one executes the contract check with max_cycles_overall=1.

@seddonym
Copy link
Owner
seddonym commented Mar 10, 2025

I am not sure what is a justification on CyclicDependency.sibilings attribute existence. Maybe it can just be a property calculated over the cycles? Its a detail actually, but I am not sure if it should be included in the broken contract log.

The reason I included this here is so we could report more nicely about the cycle without querying the graph again. E.g.

Cycle detected in `mypackage.foo`.
There is a cycle between mypackage.foo.blue, mypackage.foo.green, and mypackage.foo.yellow.

    mypackage.foo.blue.alpha -> 
    mypackage.foo.yellow.beta.one ->
    mypackage.foo.green.gamma ->
    mypackage.bar ->
    mypackage.foo.blue.delta.two

Does that make sense?

@seddonym
Copy link
Owner

How about next to the max_cycles_per_dependency parameter we can introduce max_cycles_overall? It will allow for ASAP feedback if one executes the contract check with max_cycles_overall=1.

Not a bad idea, though it does complicate the API even more. Another option would be to return a generator instead of a set so callers could just get the first one. I'm not sure. Maybe we should start with what we have and experiment with what it's like to use on a real code base.

@K4liber
Copy link
Author
K4liber commented Mar 10, 2025

I am not sure what is a justification on CyclicDependency.sibilings attribute existence. Maybe it can just be a property calculated over the cycles? Its a detail actually, but I am not sure if it should be included in the broken contract log.

The reason I included this here is so we could report more nicely about the cycle without querying the graph again. E.g.

Cycle detected in `mypackage.foo`.
There is a cycle between mypackage.foo.blue, mypackage.foo.green, and mypackage.foo.yellow.

    mypackage.foo.blue.alpha -> 
    mypackage.foo.yellow.beta.one ->
    mypackage.foo.green.gamma ->
    mypackage.bar ->
    mypackage.foo.blue.delta.two

Does that make sense?

Yes, makes sense :)

@K4liber
Copy link
Author
K4liber commented Mar 10, 2025

How about next to the max_cycles_per_dependency parameter we can introduce max_cycles_overall? It will allow for ASAP feedback if one executes the contract check with max_cycles_overall=1.

Not a bad idea, though it does complicate the API even more. Another option would be to return a generator instead of a set so callers could just get the first one. I'm not sure. Maybe we should start with what we have and experiment with what it's like to use on a real code base.

Yeah, using a generator could be a good idea. That's probably a detail that can be applied in the very end of development without a lot of effort. When it comes to me, I am satisfied with such a contract. The goal is clear now. We can start/continue the development.

@K4liber
Copy link
Author
K4liber commented Mar 23, 2025

@Peter554 what are your thoughts on having a high level method find_cycles_between_siblings implemented in Grimp with a signature that @seddonym suggested?

@K4liber
Copy link
Author
K4liber commented Mar 23, 2025

@seddonym I just have a second thoughts that the max_cycles_per_dependency parameter is maybe not the best parameter to reduce the search time. I dont think we are going to implement an algorithm that will keep counting the cycles per each CycleDependency and when the limit is reached, it can start to ignore the rest of a search space for this CyclicDependency. It seems like we still need to "find it all" and then reduce the results based on the max_cycles_per_dependency value. Parameter itself is ok, but the statement "For code bases with a very large number of cycles, this helps limit the time the search takes to run." to be reconsider after low level implementation of the cycle search algorithm.

@seddonym
Copy link
Owner

It seems like we still need to "find it all" and then reduce the results based on the max_cycles_per_dependency value.

Could you say a little bit more about why you think this is the case? I would expect the algorithm to work similarly to the pathfinding used for find_illegal_dependencies_for_layers, where we find the shortest path and then remove it from the graph before looking for the next one.

I've included this kind of behaviour in the proposed docstring above:

If a cycle is found, the algorithm will temporarily remove it from the graph before continuing with its search.

@K4liber
Copy link
Author
K4liber commented Mar 24, 2025

Oh, you wrote about it already. But what do you mean by "removing cycle from the graph"? If you remove any edge it can affect other cycles, not only the recently found one, right?

@seddonym
Copy link
Owner

If you remove any edge it can affect other cycles, not only the recently found one, right?

Yes exactly. Bear in mind that the last time I ran a cycle detection algorithm on the main codebase I work on, it returned many millions of cycles before I stopped it! It just wasn't useful.

This is a design choice that we'll need to evaluate by running the algorithm on real code bases, in my opinion.

@K4liber
Copy link
Author
K4liber commented Mar 25, 2025

In such a case, we cannot guarantee that the method will return all the cycles. I am ok with that, but I think it should be underlined in the method description.

As a result, any cycles that have sections in common with other cycles may not be returned., it is already there, nevermind. Let's code ;)

@K4liber
Copy link
Author
87FF K4liber commented May 19, 2025

I was reading Bob Uncle book and he mentioned "Acyclic dependencies principle" (https://en.wikipedia.org/wiki/Acyclic_dependencies_principle) which is exactly what I wanted to achieve through a "Tree contract" (seddonym/import-linter#250). Btw. It should not be called "Tree contract", but "DAG contract" since DAG is a structure allowed for the package dependencies, tree is too restrict. "AcyclicDependenciesContract" sounds good as well.

@seddonym
Copy link
Owner

I agree, I think acyclic is a good name for the contract type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
0