-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workspace discovery #14308
base: main
Are you sure you want to change the base?
Workspace discovery #14308
Conversation
eb9fa30
to
3cd8c12
Compare
3cd8c12
to
7522b98
Compare
|
82a48ae
to
1fc09bb
Compare
Oh damn you pre-commit. I'm just going to exclude that file over wasting time fighting pre-commit configuration. |
4b8d1cf
to
1a6640a
Compare
1a6640a
to
35d3969
Compare
I will review this today. |
Oh I'm not requested here -- that's fine actually. Some feedback on the summary:
No strong opinion though it'd be nice to change this in uv if you decide to do something different here.
I would probably advise against doing this... It makes the logic more complex to implement, test, and document, and it deviates from uv without good reason? I would be surprised if this ever comes up. Even the example you gave isn't quite right given that the exclude contains a dash, if I'm understanding correctly ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workspace discovery looks great, thank you!
My main question about this PR (and most of my comments) are about how we intend to map these workspace settings to a module resolver search paths config (which is one of the most important outputs from workspace discovery.) I think currently in this PR there is a mismatch between workspace discovery and the module resolver's translation to search paths, which means we will do the wrong thing in a lot of cases, but I'm not sure if it's the module resolver or the workspace code that should change, because I don't understand what the intended semantics of SearchPathSettings::workspace_root
are.
My review is mostly focused on these semantic and project structure questions; relying on Charlie's greater relevant experience from uv
for the more filesystem and pyproject oriented stuff.
pub struct SearchPathSettings { | ||
/// List of user-provided paths that should take first priority in the module resolution. | ||
/// Examples in other type checkers are mypy's MYPYPATH environment variable, | ||
/// or pyright's stubPath configuration setting. | ||
pub extra_paths: Vec<SystemPathBuf>, | ||
|
||
/// The root of the workspace, used for finding first-party modules. | ||
pub src_root: SystemPathBuf, | ||
pub workspace_root: SystemPathBuf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear what our intended semantics are in cases where the location of pyproject.toml
is not the Python search path root, for example if the search path root for Python code is inside a src/
directory relative to pyproject.toml
(I think this is a common setup.) Should workspace_root
in this case be the directory containing pyproject.toml
, or the src/
subdirectory?
It seems to me that the "workspace root" should probably be the directory containing pyproject.toml
, but that path doesn't really belong in SearchPathSettings
, it belongs in a project/workspace config struct. SearchPathSettings
should IMO only contain the path(s) where we actually find Python imports relative to (which in the above example would be the src/
subdirectory.) So renaming a SearchPathSettings
member to workspace_root
seems confusing to me, and probably not the right direction? (I realize the comment already described this as "the root of the workspace".)
A related question is how workspace members' code is expected to get onto the import search path. Do we offer some red-knot-specific functionality to make this happen automatically, or do we require that either workspace members are editable-installed into the venv (and thus we find them via .pth
files) or else you have to use extra_paths
explicitly for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did ask about this in Discord because I felt we discussed this before and it was unclear to me how it should work. @AlexWaygood then gave me a reply that seemed reasonable to me, see https://discord.com/channels/1039017663004942429/1228460843033821285/1304410214757433364
Generally, this PR doesn't change how we resolve the search path settings. It keeps doing what we used to do and I also don't mind reverting the rename of the SearchPathSettings::workspace_root
.
A related question is how workspace members' code is expected to get onto the import search path. Do we offer some red-knot-specific functionality to make this happen automatically, or do we require that either workspace members are editable-installed into the venv (and thus we find them via .pth files) or else you have to use extra_paths explicitly for them?
I understand that we rely on the package manager to put projects on the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with everything @AlexWaygood says in the first two paragraphs of that Discord comment. I'm not sure about this part:
I think we should probably rename the first-party search path; calling it src is kind of confusing. It's not meant to correspond to the src directory inside a workspace; it's meant to correspond to the workspace root
In a project layout where the Python packages all go inside a src/
subdirectory, at runtime the project must be installed to be importable, and usually it is run via some installed command/script, and .
is not actually added to the import search path in that case. Which is a good thing, because if it were, you'd end up with overlapping search paths and all your modules importable as both foo
and src.foo
, which is not good, especially at runtime.
It is not clear to me how we can detect this project structure and avoid putting the workspace root on the search path in this case. I guess one option is just to punt on it and hope nobody is affected by the fact that src.foo
imports are wrongly allowed.
I understand that we rely on the package manager to put projects on the path.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm also not clear that it's right to have only one "first-party" search path. I guess it partly depends what we mean by "first-party", but to the extent that it influences things like "which code do we show diagnostics for", I think it is important that all code within the workspace root (even if it is inside a src/
subdir or a workspace member, and thus discovered via editable-install .pth
files in the venv) should be considered as "first-party".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm also not clear that it's right to have only one "first-party" search path. I guess it partly depends what we mean by "first-party", but to the extent that it influences things like "which code do we show diagnostics for", I think it is important that all code within the workspace root (even if it is discovered via editable-install
.pth
files in the venv) should be considered as "first-party".
I suppose that depends what we're using the "first-party search path" for. If we're using it as the canonical definition for what constitutes "first-party" code, then what you say makes sense. But do we need to tie those two concepts together? (I guess it would simplify some things?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are obviously lots of possible ways to structure this that could work, but I think it's pretty confusing to have a "first-party search path" and then a concept of "first-party code" which is not related to the first-party search path. It overloads the term "first-party" and suggests that one of those things should be renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. If we're going to go with the "there should only ever be one of this kind of search path", I'd rename it everywhere to workspace_root
or just root
, rather than first_party
.
|
||
let root = path.to_path_buf(); | ||
if !system.is_directory(path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run red-knot on a single file within a package (if you want to see errors only from that one file, but you want imports from other modules to work)? If so, is it the caller's responsibility to call this function with just the directory portion of the file name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run red-knot on a single file within a package (if you want to see errors only from that one file, but you want imports from other modules to work)?
Not today, long-term yes. But it doesn't fundamentally change how red knot works. It only changes which files it calls check
for.
In the file case, red knot searches the workspace, and then sets the open_files
(or possibly open_paths
, this is TBD) to the paths provided via the CLI and then calls check
.
It's important that the workspace discovery is done as if no path was provided to guarantee that red knot picks up the same settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, but I'm not sure if I fully understand this comment:
It's important that the workspace discovery is done as if no path was provided to guarantee that red knot picks up the same settings.
Workspace discovery always starts with some path that is provided by the user, right? So what does it mean to do workspace discovery "as if no path was provided"?
// Create a package with a default configuration | ||
PackageMetadata { | ||
name: path.file_name().unwrap_or("<virtual>").into(), | ||
root: path.to_path_buf(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be looking for __init__.py
and walking up until we don't find one here, so that virtual packages can work (and imports within them can work) even if you give a path inside the package? I think both pyright and mypy have some version of this logic. It doesn't work with namespace packages, but I'm not sure there's anything that can be done there -- at that point we can't really figure out where the root is supposed to be, and you need explicit config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could consider adding a feature like this if users ask for it in the future. Supporting it would complicate things a bit, because we then create a workspace that covers more files than path
, that means that red knot needs to filter out all files that are outside path
before calling check
.
}; | ||
|
||
let Some(name) = project.name.as_ref() else { | ||
return Err(WorkspaceDiscoveryError::MissingPackageName { package_path: root }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason this needs to be an error for us? It seems odd to me that we'll happily synthesize a workspace with the name <virtual>
if you don't have a pyproject.toml
at all, but if you do have a pyproject.toml
we'll error if it doesn't supply a name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a strong reason for it. The same as that it doesn't matter much to us whether packages have unique names inside a workspace. But I do see value in having the discovery consistent between uv and red-knot and name
is a required setting according to the pyproject.toml
specification, unless the project
table isn't specified at all.
That's why I think the question rather is: Should the project
table be required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason this needs to be an error for us? It seems odd to me that we'll happily synthesize a workspace with the name
We also only use virtual
in case red knot runs in /
. It otherwise uses the directory name as fallback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be reasonable to allow this (projects without a project
table). The project
table isn't required in the spec, and recent PEPs (like the dependency groups PEP) have tried to make those more first-class. We don't fully support them in uv (we allow them as workspace roots, but it's considered legacy), but we arguably should in the future.
So, in short: I think it would make more sense for red-knot to allow members / roots that lack a [project]
table, but it's also reasonable to defer in favor of consistency with uv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards keeping it as is for now. We might actually need the name, depending on where we land on how workspaces integrate into module resolution (do we rely on the packager, in this case we won't need the name, or do we have our own, custom handling for packages, in which case we need the name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards keeping it as is for now. We might actually need the name, depending on where we land on how workspaces integrate into module resolution (do we rely on the packager, in this case we won't need the name, or do we have our own, custom handling for packages, in which case we need the name)
I don't understand how these two things are related at all, or why we would ever need a project name for module resolution. The project name is just PyPI metadata that's unrelated to the import system or package names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked this through in our 1:1 and this was a misunderstanding on my end of how python does module resolution.
let workspace = WorkspaceMetadata::discover(&root, &system, None) | ||
.context("Failed to discover workspace")?; | ||
|
||
assert_eq!(workspace.root(), &*root); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the project structure in this test, the workspace root should not be a search path at all for the module resolver; only the src/
subdirectory should be. Because Python imports in this structure won't look like import src.foo
, they'll look like import foo
. It's not clear to me the intention in this PR of how to handle that.
Maybe we expect the src/
subdirectory to get onto the search path via editable install of this project into the venv, but in that case we really don't want to add the workspace root to the search path, because then both /app
and /app/src
would be on the search path and we'd wrongly support both import src.foo
and import foo
as different ways to import the same module, and this is really confusing; those imports won't both work at runtime, and if they do that's even worse, because you end up with two separate copies of the foo
module and all its contents. Overlapping search paths are bad.
If the intention is that SearchPathSettings::workspace_root
should not itself go onto the SearchPaths::static_paths
at all, but instead should just be used to categorize other search paths discovered (e.g. from editable-install pth files in the venv) into first-party vs third-party (depending whether they are within the workspace root), this could work, but it would mean that we need to change the code in SearchPaths::from_settings
so that it doesn't add workspace_root
to static_paths
.
In this case my concern would be that I think some projects don't install their main first-party codebase into their venv as an editable install, and we need to figure out how to support that case (and how to tell the difference.) I think other type checkers don't rely on editable install of the main first-party codebase.
|
||
system | ||
.memory_file_system() | ||
.write_files([(root.join("src/foo.py"), ""), (root.join("src/bar.py"), "")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it somewhat confusing to use the src/
naming convention in this test, because I think it strongly suggests that this case simply represents user error.
The specific name src/
suggests "my top-level Python package(s) live inside app/src/
, and that should be the search path root", not "my top-level Python package is named src
, and app/
should be the search path root". But our handling here assumes the latter.
If someone has a Python codebase that isn't packaged with a pyproject.toml
, it's unlikely they put their top-level Python modules inside a src/
subdirectory; they probably just put them directly in /app
. (The reason this is likely is that if they don't have packaging metadata, they can't easily editable-install their project into the venv, and so they are probably relying on Python's implicit "CWD is on sys.path
" behavior.) I've seen lots of real-world apps without packaging metadata that worked this way.
If they don't have packaging metadata, they should run red-knot directly on their Python source tree, not on a parent directory of it, because we don't have a good way to figure out which subdirectory is the Python import root (unless we special-case src/
and assume that a subdirectory with that name is where the Python code lives.)
If they do run red-knot on /app
when their Python modules live in app/src
, and we have no packaging metadata to tell us that the src/
subdirectory is the real import root, I guess we have to just assume that /app
is the import root (and their top-level Python package is actually named src
), since there's no config to tell us otherwise. But given the specific subdirectory name src
, this is probably the wrong conclusion; most likely all their imports will fail to resolve because they import foo
, they don't import src.foo
. We could consider some heuristics, e.g. treat the subdirectory name src/
specially, or look for whether or not there is a src/__init__.py
.
This is also related (in the opposite direction) to whether if they run red-knot on /app/src/foo/bar/
and /app/src/foo/__init__.py
exists, should we walk up to the first directory that doesn't have __init__.py
and determine that app/src/
is the actual root, or should we just let their imports fail because we are assuming the wrong root. (I think in this case we should do the walk-up.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also related (in the opposite direction) to whether if they run red-knot on /app/src/foo/bar/ and /app/src/foo/init.py exists, should we walk up to the first directory that doesn't have init.py and determine that app/src/ is the actual root, or should we just let their imports fail because we are assuming the wrong root. (I think in this case we should do the walk-up.)
This is an interesting example and I think it might make sense for us to walk up the tree, but I'm not sure if it should change the workspace-root or only the inferred SearchPathSettings
. I'd like to exclude this for now but it's something I should bring up in the CLI design document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it's totally fine to defer the walking-up thing, as long as we have it noted somehow as something we probably want to address in future.
In general I don't think my comments on this PR are blocking, which is why I accepted. I don't think this PR does anything wrong, it's more that the rename to workspace_root
and the way the projects in the tests are structured highlight an existing issue that I think we need to come to a resolution on. (If the tests just didn't use a src/
subdirectory layout, then I think they would be fully correct as-is, and we'd just be punting the problem of how we handle src/
layouts for future resolution.)
src_root: self | ||
extra_paths: self.clone().extra_paths.unwrap_or_default(), | ||
workspace_root: self | ||
.clone() | ||
.src_root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the src_root
from the SearchPathConfiguration
represents some explicit red-knot config where the user tells us their source root, and this overrides setting the discovered workspace root as the workspace root in the SearchPathSettings
?
… identical names (#9094) ## Summary Align uv's workspace discovery with red knots (see astral-sh/ruff#14308 (comment)) * Detect nested workspace inside the current workspace rather than testing if the current workspace is a member of any outer workspace. * Detect packages with identical names. ## Test Plan I added two integration tests. I also back ported the tests to main to verify that both these invalid workspaces aren't catched by uv today. That makes this, technically, a breaking change but I would consider the change a bug fix. --------- Co-authored-by: Charlie Marsh <[email protected]>
Summary
Disocver workspaces and packages by searching for
pyproject.toml
. This PR loads the pyproject.toml files but it disregards all settings other thanknot.workspace
andproject.name
. I plan to add settings support in a separate PR (this one is already chunky).path
's ancestor chain and find the firstpyproject.toml
.pyproject.toml
contains noknot.workspace
table, then keep traversing thepath
's ancestorchain until we find one or reach the root.
package (the first found package without a
knot.workspace
table is a member. If not, createa single package workspace for the closest package.
pyproject.toml
with aknot.workspace
table, then create a single-package workspace.pyproject.toml
, create an ad-hoc workspace forpath
that consists of a single package and uses the default settings.
Differences to uv
I tried to keep the implementation close to uv's behavior. The main differences are:
pyproject.toml
(nor in any of its ancestor) is a valid use case. Red knot then creates a single package workspace with the default settings.Red knot includes a member if it is explicitly listed in members but matches aexclude
glob. E.g.members=["packages/*", "frontend"]
andexclude=["frontend-*]
still includesfrontend
if the directory is an exact match (not a glob-match). I think uv always excludesfrontend
which seems counterintuitivemembers
globs and then removing excluded members. uv does that too, but not for the "current package" where it usesis_excluded_from_workspace
andis_included_in_workspace
. I found using only globs easier. (https://github.com/astral-sh/uv/blob/e47e8fe9656b9f23891f9651f7bbc881f9e2bc7f/crates/uv-workspace/src/workspace.rs#L1133-L1147)CC: @charliermarsh
Open questions
Right now, the discovery fails if a package has no name. This matches
uv
s behavior.Test Plan
Added unit tests and basic file watcher tests. We should add more file watcher tests but I struggled coming up with good examples because most test cases require some form of configuration support to make the change observable.