Workspace discovery #14308

MichaReiser · 2024-11-13T08:21:46Z

Summary

Disocver workspaces and packages by searching for pyproject.toml. This PR loads the pyproject.toml files but it disregards all settings other than knot.workspace and project.name. I plan to add settings support in a separate PR (this one is already chunky).

Traverse upwards in the path's ancestor chain and find the first pyproject.toml.
If the pyproject.toml contains no knot.workspace table, then keep traversing the path's ancestor
chain until we find one or reach the root.
If we've found a workspace, then resolve the workspace's members and assert that the closest
package (the first found package without a knot.workspace table is a member. If not, create
a single package workspace for the closest package.
If there's no pyproject.toml with a knot.workspace table, then create a single-package workspace.
If no ancestor directory contains any pyproject.toml, create an ad-hoc workspace for path
that consists of a single package and uses the default settings.

Differences to uv

I tried to keep the implementation close to uv's behavior. The main differences are:

Running Red Knot in a directory without any pyproject.toml (nor in any of its ancestor) is a valid use case. Red knot then creates a single package workspace with the default settings.
uv errors when the current workspace is included in any outer workspace. However, I think it won't complain if the current workspace contains a member that is a workspace itself. This seems the wrong way around to me because what's important is that the current workspace has no nested workspace. Outer workspaces can be validated when running a command in them. This is why Red Knot only validates the workspace's members instead of discovering parent directories (https://github.com/astral-sh/uv/blob/e47e8fe9656b9f23891f9651f7bbc881f9e2bc7f/crates/uv-workspace/src/workspace.rs#L1193)
Red knot includes a member if it is explicitly listed in members but matches a exclude glob. E.g. members=["packages/*", "frontend"] and exclude=["frontend-*] still includes frontend if the directory is an exact match (not a glob-match). I think uv always excludes frontend which seems counterintuitive
The member discovery in Red Knot collects all member paths by running all members globs and then removing excluded members. uv does that too, but not for the "current package" where it uses is_excluded_from_workspace and is_included_in_workspace. I found using only globs easier. (https://github.com/astral-sh/uv/blob/e47e8fe9656b9f23891f9651f7bbc881f9e2bc7f/crates/uv-workspace/src/workspace.rs#L1133-L1147)

CC: @charliermarsh

Open questions

Right now, the discovery fails if a package has no name. This matches uvs behavior.

Test Plan

Added unit tests and basic file watcher tests. We should add more file watcher tests but I struggled coming up with good examples because most test cases require some form of configuration support to make the change observable.

github-actions · 2024-11-13T10:05:31Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser · 2024-11-13T12:24:29Z

Oh damn you pre-commit. I'm just going to exclude that file over wasting time fighting pre-commit configuration.

charliermarsh · 2024-11-13T17:23:45Z

I will review this today.

crates/red_knot_workspace/src/workspace/pyproject/package_name.rs

crates/ruff_db/src/system/memory_fs.rs

crates/red_knot_workspace/src/workspace/pyproject.rs

charliermarsh · 2024-11-13T17:34:04Z

Oh I'm not requested here -- that's fine actually. Some feedback on the summary:

uv errors when the current workspace is included in any outer workspace. However, I think it won't complain if the current workspace contains a member that is a workspace itself. This seems the wrong way around to me because what's important is that the current workspace has no nested workspace. Outer workspaces can be validated when running a command in them.

No strong opinion though it'd be nice to change this in uv if you decide to do something different here.

Red knot includes a member if it is explicitly listed in members but matches a exclude glob. E.g. members=["packages/", "frontend"] and exclude=["frontend-] still includes frontend if the directory is an exact match (not a glob-match). I think uv always excludes frontend which seems counterintuitive

I would probably advise against doing this... It makes the logic more complex to implement, test, and document, and it deviates from uv without good reason? I would be surprised if this ever comes up. Even the example you gave isn't quite right given that the exclude contains a dash, if I'm understanding correctly ;)

carljm

The workspace discovery looks great, thank you!

My main question about this PR (and most of my comments) are about how we intend to map these workspace settings to a module resolver search paths config (which is one of the most important outputs from workspace discovery.) I think currently in this PR there is a mismatch between workspace discovery and the module resolver's translation to search paths, which means we will do the wrong thing in a lot of cases, but I'm not sure if it's the module resolver or the workspace code that should change, because I don't understand what the intended semantics of SearchPathSettings::workspace_root are.

My review is mostly focused on these semantic and project structure questions; relying on Charlie's greater relevant experience from uv for the more filesystem and pyproject oriented stuff.

carljm · 2024-11-13T20:15:06Z

crates/red_knot_python_semantic/src/program.rs

 pub struct SearchPathSettings {
    /// List of user-provided paths that should take first priority in the module resolution.
    /// Examples in other type checkers are mypy's MYPYPATH environment variable,
    /// or pyright's stubPath configuration setting.
    pub extra_paths: Vec<SystemPathBuf>,

    /// The root of the workspace, used for finding first-party modules.
-    pub src_root: SystemPathBuf,
+    pub workspace_root: SystemPathBuf,


I'm not clear what our intended semantics are in cases where the location of pyproject.toml is not the Python search path root, for example if the search path root for Python code is inside a src/ directory relative to pyproject.toml (I think this is a common setup.) Should workspace_root in this case be the directory containing pyproject.toml, or the src/ subdirectory?

It seems to me that the "workspace root" should probably be the directory containing pyproject.toml, but that path doesn't really belong in SearchPathSettings, it belongs in a project/workspace config struct. SearchPathSettings should IMO only contain the path(s) where we actually find Python imports relative to (which in the above example would be the src/ subdirectory.) So renaming a SearchPathSettings member to workspace_root seems confusing to me, and probably not the right direction? (I realize the comment already described this as "the root of the workspace".)

A related question is how workspace members' code is expected to get onto the import search path. Do we offer some red-knot-specific functionality to make this happen automatically, or do we require that either workspace members are editable-installed into the venv (and thus we find them via .pth files) or else you have to use extra_paths explicitly for them?

I did ask about this in Discord because I felt we discussed this before and it was unclear to me how it should work. @AlexWaygood then gave me a reply that seemed reasonable to me, see https://discord.com/channels/1039017663004942429/1228460843033821285/1304410214757433364

Generally, this PR doesn't change how we resolve the search path settings. It keeps doing what we used to do and I also don't mind reverting the rename of the SearchPathSettings::workspace_root.

A related question is how workspace members' code is expected to get onto the import search path. Do we offer some red-knot-specific functionality to make this happen automatically, or do we require that either workspace members are editable-installed into the venv (and thus we find them via .pth files) or else you have to use extra_paths explicitly for them?

I understand that we rely on the package manager to put projects on the path.

I agree with everything @AlexWaygood says in the first two paragraphs of that Discord comment. I'm not sure about this part:

I think we should probably rename the first-party search path; calling it src is kind of confusing. It's not meant to correspond to the src directory inside a workspace; it's meant to correspond to the workspace root

In a project layout where the Python packages all go inside a src/ subdirectory, at runtime the project must be installed to be importable, and usually it is run via some installed command/script, and . is not actually added to the import search path in that case. Which is a good thing, because if it were, you'd end up with overlapping search paths and all your modules importable as both foo and src.foo, which is not good, especially at runtime.

It is not clear to me how we can detect this project structure and avoid putting the workspace root on the search path in this case. I guess one option is just to punt on it and hope nobody is affected by the fact that src.foo imports are wrongly allowed.

I understand that we rely on the package manager to put projects on the path.

👍

I guess I'm also not clear that it's right to have only one "first-party" search path. I guess it partly depends what we mean by "first-party", but to the extent that it influences things like "which code do we show diagnostics for", I think it is important that all code within the workspace root (even if it is inside a src/ subdir or a workspace member, and thus discovered via editable-install .pth files in the venv) should be considered as "first-party".

I guess I'm also not clear that it's right to have only one "first-party" search path. I guess it partly depends what we mean by "first-party", but to the extent that it influences things like "which code do we show diagnostics for", I think it is important that all code within the workspace root (even if it is discovered via editable-install .pth files in the venv) should be considered as "first-party".

I suppose that depends what we're using the "first-party search path" for. If we're using it as the canonical definition for what constitutes "first-party" code, then what you say makes sense. But do we need to tie those two concepts together? (I guess it would simplify some things?)

There are obviously lots of possible ways to structure this that could work, but I think it's pretty confusing to have a "first-party search path" and then a concept of "first-party code" which is not related to the first-party search path. It overloads the term "first-party" and suggests that one of those things should be renamed.

Agreed. If we're going to go with the "there should only ever be one of this kind of search path", I'd rename it everywhere to workspace_root or just root, rather than first_party.

carljm · 2024-11-13T20:36:39Z

crates/red_knot_workspace/src/workspace/metadata.rs


-        let root = path.to_path_buf();
+        if !system.is_directory(path) {


Can you run red-knot on a single file within a package (if you want to see errors only from that one file, but you want imports from other modules to work)? If so, is it the caller's responsibility to call this function with just the directory portion of the file name?

Can you run red-knot on a single file within a package (if you want to see errors only from that one file, but you want imports from other modules to work)?

Not today, long-term yes. But it doesn't fundamentally change how red knot works. It only changes which files it calls check for.

In the file case, red knot searches the workspace, and then sets the open_files (or possibly open_paths, this is TBD) to the paths provided via the CLI and then calls check.

It's important that the workspace discovery is done as if no path was provided to guarantee that red knot picks up the same settings.

Makes sense, but I'm not sure if I fully understand this comment:

It's important that the workspace discovery is done as if no path was provided to guarantee that red knot picks up the same settings.

Workspace discovery always starts with some path that is provided by the user, right? So what does it mean to do workspace discovery "as if no path was provided"?

carljm · 2024-11-13T21:03:00Z

crates/red_knot_workspace/src/workspace/metadata.rs

+            // Create a package with a default configuration
+            PackageMetadata {
+                name: path.file_name().unwrap_or("<virtual>").into(),
+                root: path.to_path_buf(),


Should we be looking for __init__.py and walking up until we don't find one here, so that virtual packages can work (and imports within them can work) even if you give a path inside the package? I think both pyright and mypy have some version of this logic. It doesn't work with namespace packages, but I'm not sure there's anything that can be done there -- at that point we can't really figure out where the root is supposed to be, and you need explicit config.

We could consider adding a feature like this if users ask for it in the future. Supporting it would complicate things a bit, because we then create a workspace that covers more files than path, that means that red knot needs to filter out all files that are outside path before calling check.

crates/red_knot_workspace/src/workspace/metadata.rs

carljm · 2024-11-13T21:10:22Z

crates/red_knot_workspace/src/workspace/metadata.rs

+        };
+
+        let Some(name) = project.name.as_ref() else {
+            return Err(WorkspaceDiscoveryError::MissingPackageName { package_path: root });


Is there any reason this needs to be an error for us? It seems odd to me that we'll happily synthesize a workspace with the name <virtual> if you don't have a pyproject.toml at all, but if you do have a pyproject.toml we'll error if it doesn't supply a name.

I don't think there's a strong reason for it. The same as that it doesn't matter much to us whether packages have unique names inside a workspace. But I do see value in having the discovery consistent between uv and red-knot and name is a required setting according to the pyproject.toml specification, unless the project table isn't specified at all.

That's why I think the question rather is: Should the project table be required?

Is there any reason this needs to be an error for us? It seems odd to me that we'll happily synthesize a workspace with the name

We also only use virtual in case red knot runs in /. It otherwise uses the directory name as fallback

I think it would be reasonable to allow this (projects without a project table). The project table isn't required in the spec, and recent PEPs (like the dependency groups PEP) have tried to make those more first-class. We don't fully support them in uv (we allow them as workspace roots, but it's considered legacy), but we arguably should in the future.

So, in short: I think it would make more sense for red-knot to allow members / roots that lack a [project] table, but it's also reasonable to defer in favor of consistency with uv.

I'm leaning towards keeping it as is for now. We might actually need the name, depending on where we land on how workspaces integrate into module resolution (do we rely on the packager, in this case we won't need the name, or do we have our own, custom handling for packages, in which case we need the name)

I'm leaning towards keeping it as is for now. We might actually need the name, depending on where we land on how workspaces integrate into module resolution (do we rely on the packager, in this case we won't need the name, or do we have our own, custom handling for packages, in which case we need the name)

I don't understand how these two things are related at all, or why we would ever need a project name for module resolution. The project name is just PyPI metadata that's unrelated to the import system or package names.

We talked this through in our 1:1 and this was a misunderstanding on my end of how python does module resolution.

_typos.toml

crates/red_knot_workspace/src/workspace/metadata.rs

carljm · 2024-11-13T21:32:24Z

crates/red_knot_workspace/src/workspace/metadata.rs

+        let workspace = WorkspaceMetadata::discover(&root, &system, None)
+            .context("Failed to discover workspace")?;
+
+        assert_eq!(workspace.root(), &*root);


In the project structure in this test, the workspace root should not be a search path at all for the module resolver; only the src/ subdirectory should be. Because Python imports in this structure won't look like import src.foo, they'll look like import foo. It's not clear to me the intention in this PR of how to handle that.

Maybe we expect the src/ subdirectory to get onto the search path via editable install of this project into the venv, but in that case we really don't want to add the workspace root to the search path, because then both /app and /app/src would be on the search path and we'd wrongly support both import src.foo and import foo as different ways to import the same module, and this is really confusing; those imports won't both work at runtime, and if they do that's even worse, because you end up with two separate copies of the foo module and all its contents. Overlapping search paths are bad.

If the intention is that SearchPathSettings::workspace_root should not itself go onto the SearchPaths::static_paths at all, but instead should just be used to categorize other search paths discovered (e.g. from editable-install pth files in the venv) into first-party vs third-party (depending whether they are within the workspace root), this could work, but it would mean that we need to change the code in SearchPaths::from_settings so that it doesn't add workspace_root to static_paths.

In this case my concern would be that I think some projects don't install their main first-party codebase into their venv as an editable install, and we need to figure out how to support that case (and how to tell the difference.) I think other type checkers don't rely on editable install of the main first-party codebase.

carljm · 2024-11-13T21:37:18Z

crates/red_knot_workspace/src/workspace/metadata.rs

+
+        system
+            .memory_file_system()
+            .write_files([(root.join("src/foo.py"), ""), (root.join("src/bar.py"), "")])


I find it somewhat confusing to use the src/ naming convention in this test, because I think it strongly suggests that this case simply represents user error.

The specific name src/ suggests "my top-level Python package(s) live inside app/src/, and that should be the search path root", not "my top-level Python package is named src, and app/ should be the search path root". But our handling here assumes the latter.

If someone has a Python codebase that isn't packaged with a pyproject.toml, it's unlikely they put their top-level Python modules inside a src/ subdirectory; they probably just put them directly in /app. (The reason this is likely is that if they don't have packaging metadata, they can't easily editable-install their project into the venv, and so they are probably relying on Python's implicit "CWD is on sys.path" behavior.) I've seen lots of real-world apps without packaging metadata that worked this way.

If they don't have packaging metadata, they should run red-knot directly on their Python source tree, not on a parent directory of it, because we don't have a good way to figure out which subdirectory is the Python import root (unless we special-case src/ and assume that a subdirectory with that name is where the Python code lives.)

If they do run red-knot on /app when their Python modules live in app/src, and we have no packaging metadata to tell us that the src/ subdirectory is the real import root, I guess we have to just assume that /app is the import root (and their top-level Python package is actually named src), since there's no config to tell us otherwise. But given the specific subdirectory name src, this is probably the wrong conclusion; most likely all their imports will fail to resolve because they import foo, they don't import src.foo. We could consider some heuristics, e.g. treat the subdirectory name src/ specially, or look for whether or not there is a src/__init__.py.

This is also related (in the opposite direction) to whether if they run red-knot on /app/src/foo/bar/ and /app/src/foo/__init__.py exists, should we walk up to the first directory that doesn't have __init__.py and determine that app/src/ is the actual root, or should we just let their imports fail because we are assuming the wrong root. (I think in this case we should do the walk-up.)

This is also related (in the opposite direction) to whether if they run red-knot on /app/src/foo/bar/ and /app/src/foo/init.py exists, should we walk up to the first directory that doesn't have init.py and determine that app/src/ is the actual root, or should we just let their imports fail because we are assuming the wrong root. (I think in this case we should do the walk-up.)

This is an interesting example and I think it might make sense for us to walk up the tree, but I'm not sure if it should change the workspace-root or only the inferred SearchPathSettings. I'd like to exclude this for now but it's something I should bring up in the CLI design document

Yes, I think it's totally fine to defer the walking-up thing, as long as we have it noted somehow as something we probably want to address in future.

In general I don't think my comments on this PR are blocking, which is why I accepted. I don't think this PR does anything wrong, it's more that the rename to workspace_root and the way the projects in the tests are structured highlight an existing issue that I think we need to come to a resolution on. (If the tests just didn't use a src/ subdirectory layout, then I think they would be fully correct as-is, and we'd just be punting the problem of how we handle src/ layouts for future resolution.)

carljm · 2024-11-13T22:26:43Z

crates/red_knot_workspace/src/workspace/settings.rs

-            src_root: self
+            extra_paths: self.clone().extra_paths.unwrap_or_default(),
+            workspace_root: self
+                .clone()
                .src_root


So the src_root from the SearchPathConfiguration represents some explicit red-knot config where the user tells us their source root, and this overrides setting the discovered workspace root as the workspace root in the SearchPathSettings?

… identical names (#9094) ## Summary Align uv's workspace discovery with red knots (see astral-sh/ruff#14308 (comment)) * Detect nested workspace inside the current workspace rather than testing if the current workspace is a member of any outer workspace. * Detect packages with identical names. ## Test Plan I added two integration tests. I also back ported the tests to main to verify that both these invalid workspaces aren't catched by uv today. That makes this, technically, a breaking change but I would consider the change a bug fix. --------- Co-authored-by: Charlie Marsh <[email protected]>

MichaReiser force-pushed the micha/workspace-discovery branch from eb9fa30 to 3cd8c12 Compare November 13, 2024 09:45

MichaReiser added 3 commits November 13, 2024 10:45

Basic workspace discovery without configuration support

8a0beff

Add system.glob

e4ba975

More integration tests

7522b98

MichaReiser force-pushed the micha/workspace-discovery branch from 3cd8c12 to 7522b98 Compare November 13, 2024 09:45

MichaReiser added the red-knot Multi-file analysis & type inference label Nov 13, 2024

Documentation, use explicit error codes, move tests

9a42466

MichaReiser force-pushed the micha/workspace-discovery branch from 82a48ae to 1fc09bb Compare November 13, 2024 12:14

Fix tests on windows

244a469

MichaReiser force-pushed the micha/workspace-discovery branch 5 times, most recently from 4b8d1cf to 1a6640a Compare November 13, 2024 16:20

Some windows annoyance

35d3969

MichaReiser force-pushed the micha/workspace-discovery branch from 1a6640a to 35d3969 Compare November 13, 2024 16:25

MichaReiser marked this pull request as ready for review November 13, 2024 16:28

MichaReiser requested review from carljm, AlexWaygood and sharkdp as code owners November 13, 2024 16:28

charliermarsh reviewed Nov 13, 2024

View reviewed changes

crates/red_knot_workspace/src/workspace/pyproject/package_name.rs Show resolved Hide resolved

charliermarsh reviewed Nov 13, 2024

View reviewed changes

crates/ruff_db/src/system/memory_fs.rs Show resolved Hide resolved

charliermarsh reviewed Nov 13, 2024

View reviewed changes

crates/red_knot_workspace/src/workspace/pyproject.rs Outdated Show resolved Hide resolved

MichaReiser mentioned this pull request Nov 13, 2024

Detect nested workspace inside the current workspace and members with identical names astral-sh/uv#9094

Merged

carljm approved these changes Nov 13, 2024

View reviewed changes

Remove is_excluded deviation

9bcdd7d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workspace discovery #14308

Workspace discovery #14308

MichaReiser commented Nov 13, 2024 •

edited

Loading

github-actions bot commented Nov 13, 2024 •

edited

Loading

MichaReiser commented Nov 13, 2024 •

edited

Loading

charliermarsh commented Nov 13, 2024

charliermarsh commented Nov 13, 2024

carljm left a comment

carljm Nov 13, 2024

MichaReiser Nov 14, 2024

carljm Nov 14, 2024

carljm Nov 14, 2024 •

edited

Loading

AlexWaygood Nov 14, 2024

carljm Nov 14, 2024

AlexWaygood Nov 14, 2024

carljm Nov 13, 2024

MichaReiser Nov 14, 2024 •

edited

Loading

carljm Nov 14, 2024

carljm Nov 13, 2024

MichaReiser Nov 14, 2024 •

edited

Loading

carljm Nov 13, 2024

MichaReiser Nov 14, 2024

MichaReiser Nov 14, 2024

charliermarsh Nov 14, 2024

MichaReiser Nov 14, 2024

carljm Nov 14, 2024

MichaReiser Nov 14, 2024

carljm Nov 13, 2024

carljm Nov 13, 2024

MichaReiser Nov 14, 2024

carljm Nov 14, 2024 •

edited

Loading

carljm Nov 13, 2024


		let root = path.to_path_buf();
		if !system.is_directory(path) {

Workspace discovery #14308

Are you sure you want to change the base?

Workspace discovery #14308

Conversation

MichaReiser commented Nov 13, 2024 • edited Loading

Summary

Differences to uv

Open questions

Test Plan

github-actions bot commented Nov 13, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

MichaReiser commented Nov 13, 2024 • edited Loading

charliermarsh commented Nov 13, 2024

charliermarsh commented Nov 13, 2024

carljm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carljm Nov 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MichaReiser Nov 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MichaReiser Nov 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carljm Nov 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MichaReiser commented Nov 13, 2024 •

edited

Loading

github-actions bot commented Nov 13, 2024 •

edited

Loading

`ruff-ecosystem` results

MichaReiser commented Nov 13, 2024 •

edited

Loading

carljm Nov 14, 2024 •

edited

Loading

MichaReiser Nov 14, 2024 •

edited

Loading

MichaReiser Nov 14, 2024 •

edited

Loading

carljm Nov 14, 2024 •

edited

Loading