-
Notifications
You must be signed in to change notification settings - Fork 15
Find circular dependencies #189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for looking at this - just FYI I am on the fence about adding this to Grimp - when I've looked at this before, the results you get on a real code base are not as helpful as you might expect. (You can have potentially millions of results just because if a codebase isn't layered, then there are potentially very many cycles.) There is always the option of converting the graph to networkx as documented here, if analysis is needed that Grimp doesn't support. I would be open to reviewing something, but it may be thorny to figure out how to present useful results, on real code bases. |
Oh, I've just realised what this is related to - seddonym/import-linter#250. Sorry if I'm giving you mixed messages 😄 I've added a link in the description. As I mentioned in my other comment, I think we might want to do some cycle-detection in Grimp. But I'm le 8000 ss convinced that this should be part of Grimp's public API. Instead I would imagine we would have a high level method a bit like |
@seddonym thanks for the comments. Looking at the current API of Grimp, I can see that a public method @Peter554 already took a look into the Draft of implementation and he has some good insights. Before any implementation let's agree: 1. If you would like to have such a functionality in Grimp and 2. How the method signature should look like. When it comes to the point nr 1, I would really encourage you guys to welcome such functionality in the Grimp since it will allows to implement a new contract in the The proposed method signatures (point 2.):
Please let me know guys what do you think and thanks for your time. I am not writing any more code until we agree on something here :) |
Thanks @K4liber - my feeling is that it probably does make sense to add this to grimp in some form, although I think/hope we can make the PR quite a bit simpler (see #190 (comment)). Anyways, let's wait for @seddonym to say more what he thinks. As for the proposed signature, I'd x-post my suggestion here #190 (comment). How about this? def find_shortest_cycle(self, module: str, as_package: bool) -> Optional[Tuple[str, ...]]:
"""
Returns the shortest import cycle from one module/package to itself, or None if no cycle exists.
"""
...
|
@K4liber I didn't quite understand your suggestion - what does |
The problem is described in more details in this closed issue: seddonym/import-linter#249 |
@Peter554 The proposed signatures with docstrings:
|
Thanks @K4liber for your continued thinking about this!
I'm 100% behind this, in theory - I also want to be able to use a contract like that. But I think the details are tricky. There are two main things I'd like to understand better:
Starter for ten: how should we represent the cycles in the following code base? graph
a --> b
b --> c
c --> d
c --> e
d --> a
e --> a
Or this one? graph
a <--> b
b <--> c
c <--> d
c <--> a
b <--> d
d <--> a
|
@seddonym good points. But first let me go back to the @Peter554 proposition "find_shortest_cycle". I am not sure how we should utilize such a function. I think that the story goes as follows: "I (as a developer in some python project) would like to check if I do have any circular dependencies. I would like to keep the structure of my modules/packages clean and hierarchical. It will allow we to avoid circular import errors and keep reduced binding between components. I would like to search for circular dependencies in my project on the module or package level. I could fix such dependencies one by one, but it will be more friendly to use if a functionality could yield all the found dependencies at once." I think we should go for a "find_circular_dependency" solution which will just yield a first found dependency if it will be significantly more optimal and we do not need to bother about repeating ourselves in the output by returning the equivalent cycles (sub-cycles or ones with changed order) multiple times. Coming back to @seddonym questions. I hope I partially have answered them already. Maybe "find_circular_dependency" is not the best when it comes to usefulness, but it is good enough with keeping a good balance between usefulness and simplicity. When it comes to "determinism", I assume existence of a deterministic algorithm that can be used in "find_circular_dependency". Maybe we could introduce an optional "root" parameter, which is a starting node (package/module) for the algorithm, to make sure the output is "deterministic". I will check what kind of algorithms we can use. |
Hi @K4liber thanks for the update.
My thinking is that this function could be called for a single module/package, but could equally be called for many modules /packages (even for every module/package in the entire graph) in a
This is a good point. In my suggested approach above using
With my suggested approach we could show one cycle (the shortest cycle) per module/package in the graph in import linter (deduplicating equivalent cycles). This would probably already be plenty of feedback for the user. It's worth noting that none of the methods within grimp so far return all possible chains (see e.g. #193) - I believe computing all chains is an NP hard problem (presumably this is the same for cycles). I can see you are making some good points. I'm most unsure about:
Perhaps this isn't an either/or question. Perhaps we want both of these. A more granular API and a higher level API. This would be kind of consistent with the import chain APIs: |
@Peter554 since the I think we can just use BFS for finding the shortest cycle in directed graph which has a O(E) complexity (O(V+E) to be more precised, but we can assume E > V). In the more high level functionality (import-linter) user optionally provide a node (module or package) and only find the shortest cycle with this node included. If the user do not provide the node, we loop over every node which will give in worst case scenario O(V*E) complexity. I think we can already build a useful contract from that. But we can also discuss about more comprehensive approach with returning all cycles at once and if the To conclude, it makes sense for me to have the following methods:
And then create a new contract based on |
Thanks for the comments everyone. My current thinking about our end goal would be for Grimp to have a high level function something like this: @dataclass
class CyclicDependency:
parent: str
siblings: frozenset[str]
# Cycles are chains of imports, from importer to imported, and so on.
# The first item in the cycle will be from the sibling first in the alphabet.
cycles: frozenset[tuple[str, ...]]
def find_cycles_between_siblings(
self,
ancestor: str,
allow_indirect: bool = True,
max_generations: int | None = None,
max_cycles_per_dependency: int = 10,
ignore_imports: Set[str] = frozenset(),
) -> Set[CyclicDependency]:
"""
Return cyclic dependencies between sets of siblings that are descendants of the supplied package.
Let's say we search an ancestor called `mypackage.foo`.
First, import cycles will be searched for between its children
(`mypackage.foo.blue`, `mypackage.foo.green`, `mypackage.foo.yellow`)
- if any are found they'll be added to the results.
Then we proceed to the next generation. Each children is considered in
turn as the new ancestor, whose children are searched. And so on, depending
on the value of `max_generations`.
If a cycle is found, the algorithm will temporarily remove it from the graph before continuing
with its search. As a result, any cycles that have sections in common with other cycles may not be returned.
Args:
- ancestor: name of the Python module whose descendants we should
consider, e.g. `mypackage.blue`.
- allow_indirect: whether to consider cycles containing modules that are
not descendants of the supplied ancestor.
- max_generations: how many generations of modules to consider, e.g. a
max_depth of 2 would first consider the children, and then move down a
generation to consider the children of each of those children.
- max_cycles_per_dependency: the maximum number of cycles to include in each
`CyclicDependency`. For code bases with a very large number of cycles, this helps
limit the time the search takes to run.
- ignore_imports: set of import expressions to exclude from consideration.
Returns a set of CyclicDependencies, e.g.:
{
CyclicDependency(
parent="mypackage.foo",
siblings=frozenset({"mypackage.foo.blue", "mypackage.foo.green", "mypackage.foo.yellow"}),
cycles=frozenset({
(
# The beginning and end of each cycle will all be within `blue`, because
# it's earlier than `green` and `yellow` in the alphabet.
"mypackage.foo.blue.alpha",
"mypackage.foo.yellow.beta.one",
"mypackage.foo.green.gamma",
"mypackage.bar", # Included because allow_indirect_imports=True
"mypackage.foo.blue.delta.two",
),
(
"mypackage.foo.blue.alpha",
"mypackage.foo.green.epsilon",
"mypackage.foo.blue.zeta",
),
}),
),
CyclicDependency(
parent="mypackage.foo",
siblings=frozenset({"mypackage.foo.blue", "mypackage.foo.green"}),
cycles=frozenset({
(
"mypackage.foo.blue.alpha",
"mypackage.foo.green.beta",
"mypackage.foo.blue.gamma",
),
(
"mypackage.foo.blue.delta",
"mypackage.foo.green.epsilon",
"mypackage.foo.green.zeta",
"mypackage.foo.blue.eta",
),
}),
),
CyclicDependency(
parent="mypackage.foo.yellow",
siblings=frozenset({"mypackage.foo.yellow.alpha", "mypackage.yellow.gamma"}),
cycles=frozenset({
(
"mypackage.foo.yellow.alpha.one",
"mypackage.foo.yellow.gamma.two",
),
}),
),
}
""" Open to making some lower level functions too, but interested to know what people think of this as a first pass of an API to work towards? Is the documentation clear? |
@seddonym solid documentation, sounds good to me. With the additional parameters, the described functionality will be scalable. I have two questions:
|
The reason I included this here is so we could report more nicely about the cycle without querying the graph again. E.g.
Does that make sense? |
Not a bad idea, though it does complicate the API even more. Another option would be to return a generator instead of a set so callers could just get the first one. I'm not sure. Maybe we should start with what we have and experiment with what it's like to use on a real code base. |
Yes, makes sense :) |
Yeah, using a generator could be a good idea. That's probably a detail that can be applied in the very end of development without a lot of effort. When it comes to me, I am satisfied with such a contract. The goal is clear now. We can start/continue the development. |
@seddonym I just have a second thoughts that the |
Could you say a little bit more about why you think this is the case? I would expect the algorithm to work similarly to the pathfinding used for I've included this kind of behaviour in the proposed docstring above:
|
Oh, you wrote about it already. But what do you mean by "removing cycle from the graph"? If you remove any edge it can affect other cycles, not only the recently found one, right? |
Yes exactly. Bear in mind that the last time I ran a cycle detection algorithm on the main codebase I work on, it returned many millions of cycles before I stopped it! It just wasn't useful. This is a design choice that we'll need to evaluate by running the algorithm on real code bases, in my opinion. |
In such a case, we cannot guarantee that the method will return all the cycles. I am ok with that, but I think it should be underlined in the method description.
|
I was reading Bob Uncle book and he mentioned "Acyclic dependencies principle" (https://en.wikipedia.org/wiki/Acyclic_dependencies_principle) which is exactly what I wanted to achieve through a "Tree contract" (seddonym/import-linter#250). Btw. It should not be called "Tree contract", but "DAG contract" since DAG is a structure allowed for the package dependencies, tree is too restrict. "AcyclicDependenciesContract" sounds good as well. |
I agree, I think |
Uh oh!
There was an error while loading. Please reload this page.
Adding a functionality for finding circular dependencies between modules/packages.
See seddonym/import-linter#250
The text was updated successfully, but these errors were encountered: