Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workspace discovery #14308

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Workspace discovery #14308

wants to merge 7 commits into from

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Nov 13, 2024

Summary

Disocver workspaces and packages by searching for pyproject.toml. This PR loads the pyproject.toml files but it disregards all settings other than knot.workspace and project.name. I plan to add settings support in a separate PR (this one is already chunky).

  1. Traverse upwards in the path's ancestor chain and find the first pyproject.toml.
  2. If the pyproject.toml contains no knot.workspace table, then keep traversing the path's ancestor
    chain until we find one or reach the root.
  3. If we've found a workspace, then resolve the workspace's members and assert that the closest
    package (the first found package without a knot.workspace table is a member. If not, create
    a single package workspace for the closest package.
  4. If there's no pyproject.toml with a knot.workspace table, then create a single-package workspace.
  5. If no ancestor directory contains any pyproject.toml, create an ad-hoc workspace for path
    that consists of a single package and uses the default settings.

Differences to uv

I tried to keep the implementation close to uv's behavior. The main differences are:

  • Running Red Knot in a directory without any pyproject.toml (nor in any of its ancestor) is a valid use case. Red knot then creates a single package workspace with the default settings.
  • uv errors when the current workspace is included in any outer workspace. However, I think it won't complain if the current workspace contains a member that is a workspace itself. This seems the wrong way around to me because what's important is that the current workspace has no nested workspace. Outer workspaces can be validated when running a command in them. This is why Red Knot only validates the workspace's members instead of discovering parent directories (https://github.com/astral-sh/uv/blob/e47e8fe9656b9f23891f9651f7bbc881f9e2bc7f/crates/uv-workspace/src/workspace.rs#L1193)
  • Red knot includes a member if it is explicitly listed in members but matches a exclude glob. E.g. members=["packages/*", "frontend"] and exclude=["frontend-*] still includes frontend if the directory is an exact match (not a glob-match). I think uv always excludes frontend which seems counterintuitive
  • The member discovery in Red Knot collects all member paths by running all members globs and then removing excluded members. uv does that too, but not for the "current package" where it uses is_excluded_from_workspace and is_included_in_workspace. I found using only globs easier. (https://github.com/astral-sh/uv/blob/e47e8fe9656b9f23891f9651f7bbc881f9e2bc7f/crates/uv-workspace/src/workspace.rs#L1133-L1147)

CC: @charliermarsh

Open questions

Right now, the discovery fails if a package has no name. This matches uvs behavior.

Test Plan

Added unit tests and basic file watcher tests. We should add more file watcher tests but I struggled coming up with good examples because most test cases require some form of configuration support to make the change observable.

@MichaReiser MichaReiser added the red-knot Multi-file analysis & type inference label Nov 13, 2024
Copy link
Contributor

github-actions bot commented Nov 13, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser
Copy link
Member Author

MichaReiser commented Nov 13, 2024

Oh damn you pre-commit. I'm just going to exclude that file over wasting time fighting pre-commit configuration.

@MichaReiser MichaReiser force-pushed the micha/workspace-discovery branch 5 times, most recently from 4b8d1cf to 1a6640a Compare November 13, 2024 16:20
@MichaReiser MichaReiser marked this pull request as ready for review November 13, 2024 16:28
@charliermarsh
Copy link
Member

I will review this today.

@charliermarsh
Copy link
Member

Oh I'm not requested here -- that's fine actually. Some feedback on the summary:

uv errors when the current workspace is included in any outer workspace. However, I think it won't complain if the current workspace contains a member that is a workspace itself. This seems the wrong way around to me because what's important is that the current workspace has no nested workspace. Outer workspaces can be validated when running a command in them.

No strong opinion though it'd be nice to change this in uv if you decide to do something different here.

Red knot includes a member if it is explicitly listed in members but matches a exclude glob. E.g. members=["packages/", "frontend"] and exclude=["frontend-] still includes frontend if the directory is an exact match (not a glob-match). I think uv always excludes frontend which seems counterintuitive

I would probably advise against doing this... It makes the logic more complex to implement, test, and document, and it deviates from uv without good reason? I would be surprised if this ever comes up. Even the example you gave isn't quite right given that the exclude contains a dash, if I'm understanding correctly ;)

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workspace discovery looks great, thank you!

My main question about this PR (and most of my comments) are about how we intend to map these workspace settings to a module resolver search paths config (which is one of the most important outputs from workspace discovery.) I think currently in this PR there is a mismatch between workspace discovery and the module resolver's translation to search paths, which means we will do the wrong thing in a lot of cases, but I'm not sure if it's the module resolver or the workspace code that should change, because I don't understand what the intended semantics of SearchPathSettings::workspace_root are.

My review is mostly focused on these semantic and project structure questions; relying on Charlie's greater relevant experience from uv for the more filesystem and pyproject oriented stuff.

pub struct SearchPathSettings {
/// List of user-provided paths that should take first priority in the module resolution.
/// Examples in other type checkers are mypy's MYPYPATH environment variable,
/// or pyright's stubPath configuration setting.
pub extra_paths: Vec<SystemPathBuf>,

/// The root of the workspace, used for finding first-party modules.
pub src_root: SystemPathBuf,
pub workspace_root: SystemPathBuf,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear what our intended semantics are in cases where the location of pyproject.toml is not the Python search path root, for example if the search path root for Python code is inside a src/ directory relative to pyproject.toml (I think this is a common setup.) Should workspace_root in this case be the directory containing pyproject.toml, or the src/ subdirectory?

It seems to me that the "workspace root" should probably be the directory containing pyproject.toml, but that path doesn't really belong in SearchPathSettings, it belongs in a project/workspace config struct. SearchPathSettings should IMO only contain the path(s) where we actually find Python imports relative to (which in the above example would be the src/ subdirectory.) So renaming a SearchPathSettings member to workspace_root seems confusing to me, and probably not the right direction? (I realize the comment already described this as "the root of the workspace".)

A related question is how workspace members' code is expected to get onto the import search path. Do we offer some red-knot-specific functionality to make this happen automatically, or do we require that either workspace members are editable-installed into the venv (and thus we find them via .pth files) or else you have to use extra_paths explicitly for them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did ask about this in Discord because I felt we discussed this before and it was unclear to me how it should work. @AlexWaygood then gave me a reply that seemed reasonable to me, see https://discord.com/channels/1039017663004942429/1228460843033821285/1304410214757433364

Generally, this PR doesn't change how we resolve the search path settings. It keeps doing what we used to do and I also don't mind reverting the rename of the SearchPathSettings::workspace_root.

A related question is how workspace members' code is expected to get onto the import search path. Do we offer some red-knot-specific functionality to make this happen automatically, or do we require that either workspace members are editable-installed into the venv (and thus we find them via .pth files) or else you have to use extra_paths explicitly for them?

I understand that we rely on the package manager to put projects on the path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with everything @AlexWaygood says in the first two paragraphs of that Discord comment. I'm not sure about this part:

I think we should probably rename the first-party search path; calling it src is kind of confusing. It's not meant to correspond to the src directory inside a workspace; it's meant to correspond to the workspace root

In a project layout where the Python packages all go inside a src/ subdirectory, at runtime the project must be installed to be importable, and usually it is run via some installed command/script, and . is not actually added to the import search path in that case. Which is a good thing, because if it were, you'd end up with overlapping search paths and all your modules importable as both foo and src.foo, which is not good, especially at runtime.

It is not clear to me how we can detect this project structure and avoid putting the workspace root on the search path in this case. I guess one option is just to punt on it and hope nobody is affected by the fact that src.foo imports are wrongly allowed.

I understand that we rely on the package manager to put projects on the path.

👍

Copy link
Contributor

@carljm carljm Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm also not clear that it's right to have only one "first-party" search path. I guess it partly depends what we mean by "first-party", but to the extent that it influences things like "which code do we show diagnostics for", I think it is important that all code within the workspace root (even if it is inside a src/ subdir or a workspace member, and thus discovered via editable-install .pth files in the venv) should be considered as "first-party".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm also not clear that it's right to have only one "first-party" search path. I guess it partly depends what we mean by "first-party", but to the extent that it influences things like "which code do we show diagnostics for", I think it is important that all code within the workspace root (even if it is discovered via editable-install .pth files in the venv) should be considered as "first-party".

I suppose that depends what we're using the "first-party search path" for. If we're using it as the canonical definition for what constitutes "first-party" code, then what you say makes sense. But do we need to tie those two concepts together? (I guess it would simplify some things?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are obviously lots of possible ways to structure this that could work, but I think it's pretty confusing to have a "first-party search path" and then a concept of "first-party code" which is not related to the first-party search path. It overloads the term "first-party" and suggests that one of those things should be renamed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. If we're going to go with the "there should only ever be one of this kind of search path", I'd rename it everywhere to workspace_root or just root, rather than first_party.


let root = path.to_path_buf();
if !system.is_directory(path) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run red-knot on a single file within a package (if you want to see errors only from that one file, but you want imports from other modules to work)? If so, is it the caller's responsibility to call this function with just the directory portion of the file name?

Copy link
Member Author

@MichaReiser MichaReiser Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run red-knot on a single file within a package (if you want to see errors only from that one file, but you want imports from other modules to work)?

Not today, long-term yes. But it doesn't fundamentally change how red knot works. It only changes which files it calls check for.

In the file case, red knot searches the workspace, and then sets the open_files (or possibly open_paths, this is TBD) to the paths provided via the CLI and then calls check.

It's important that the workspace discovery is done as if no path was provided to guarantee that red knot picks up the same settings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, but I'm not sure if I fully understand this comment:

It's important that the workspace discovery is done as if no path was provided to guarantee that red knot picks up the same settings.

Workspace discovery always starts with some path that is provided by the user, right? So what does it mean to do workspace discovery "as if no path was provided"?

// Create a package with a default configuration
PackageMetadata {
name: path.file_name().unwrap_or("<virtual>").into(),
root: path.to_path_buf(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be looking for __init__.py and walking up until we don't find one here, so that virtual packages can work (and imports within them can work) even if you give a path inside the package? I think both pyright and mypy have some version of this logic. It doesn't work with namespace packages, but I'm not sure there's anything that can be done there -- at that point we can't really figure out where the root is supposed to be, and you need explicit config.

Copy link
Member Author

@MichaReiser MichaReiser Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider adding a feature like this if users ask for it in the future. Supporting it would complicate things a bit, because we then create a workspace that covers more files than path, that means that red knot needs to filter out all files that are outside path before calling check.

};

let Some(name) = project.name.as_ref() else {
return Err(WorkspaceDiscoveryError::MissingPackageName { package_path: root });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason this needs to be an error for us? It seems odd to me that we'll happily synthesize a workspace with the name <virtual> if you don't have a pyproject.toml at all, but if you do have a pyproject.toml we'll error if it doesn't supply a name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a strong reason for it. The same as that it doesn't matter much to us whether packages have unique names inside a workspace. But I do see value in having the discovery consistent between uv and red-knot and name is a required setting according to the pyproject.toml specification, unless the project table isn't specified at all.

That's why I think the question rather is: Should the project table be required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason this needs to be an error for us? It seems odd to me that we'll happily synthesize a workspace with the name

We also only use virtual in case red knot runs in /. It otherwise uses the directory name as fallback

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be reasonable to allow this (projects without a project table). The project table isn't required in the spec, and recent PEPs (like the dependency groups PEP) have tried to make those more first-class. We don't fully support them in uv (we allow them as workspace roots, but it's considered legacy), but we arguably should in the future.

So, in short: I think it would make more sense for red-knot to allow members / roots that lack a [project] table, but it's also reasonable to defer in favor of consistency with uv.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards keeping it as is for now. We might actually need the name, depending on where we land on how workspaces integrate into module resolution (do we rely on the packager, in this case we won't need the name, or do we have our own, custom handling for packages, in which case we need the name)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards keeping it as is for now. We might actually need the name, depending on where we land on how workspaces integrate into module resolution (do we rely on the packager, in this case we won't need the name, or do we have our own, custom handling for packages, in which case we need the name)

I don't understand how these two things are related at all, or why we would ever need a project name for module resolution. The project name is just PyPI metadata that's unrelated to the import system or package names.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked this through in our 1:1 and this was a misunderstanding on my end of how python does module resolution.

_typos.toml Show resolved Hide resolved
let workspace = WorkspaceMetadata::discover(&root, &system, None)
.context("Failed to discover workspace")?;

assert_eq!(workspace.root(), &*root);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the project structure in this test, the workspace root should not be a search path at all for the module resolver; only the src/ subdirectory should be. Because Python imports in this structure won't look like import src.foo, they'll look like import foo. It's not clear to me the intention in this PR of how to handle that.

Maybe we expect the src/ subdirectory to get onto the search path via editable install of this project into the venv, but in that case we really don't want to add the workspace root to the search path, because then both /app and /app/src would be on the search path and we'd wrongly support both import src.foo and import foo as different ways to import the same module, and this is really confusing; those imports won't both work at runtime, and if they do that's even worse, because you end up with two separate copies of the foo module and all its contents. Overlapping search paths are bad.

If the intention is that SearchPathSettings::workspace_root should not itself go onto the SearchPaths::static_paths at all, but instead should just be used to categorize other search paths discovered (e.g. from editable-install pth files in the venv) into first-party vs third-party (depending whether they are within the workspace root), this could work, but it would mean that we need to change the code in SearchPaths::from_settings so that it doesn't add workspace_root to static_paths.

In this case my concern would be that I think some projects don't install their main first-party codebase into their venv as an editable install, and we need to figure out how to support that case (and how to tell the difference.) I think other type checkers don't rely on editable install of the main first-party codebase.


system
.memory_file_system()
.write_files([(root.join("src/foo.py"), ""), (root.join("src/bar.py"), "")])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it somewhat confusing to use the src/ naming convention in this test, because I think it strongly suggests that this case simply represents user error.

The specific name src/ suggests "my top-level Python package(s) live inside app/src/, and that should be the search path root", not "my top-level Python package is named src, and app/ should be the search path root". But our handling here assumes the latter.

If someone has a Python codebase that isn't packaged with a pyproject.toml, it's unlikely they put their top-level Python modules inside a src/ subdirectory; they probably just put them directly in /app. (The reason this is likely is that if they don't have packaging metadata, they can't easily editable-install their project into the venv, and so they are probably relying on Python's implicit "CWD is on sys.path" behavior.) I've seen lots of real-world apps without packaging metadata that worked this way.

If they don't have packaging metadata, they should run red-knot directly on their Python source tree, not on a parent directory of it, because we don't have a good way to figure out which subdirectory is the Python import root (unless we special-case src/ and assume that a subdirectory with that name is where the Python code lives.)

If they do run red-knot on /app when their Python modules live in app/src, and we have no packaging metadata to tell us that the src/ subdirectory is the real import root, I guess we have to just assume that /app is the import root (and their top-level Python package is actually named src), since there's no config to tell us otherwise. But given the specific subdirectory name src, this is probably the wrong conclusion; most likely all their imports will fail to resolve because they import foo, they don't import src.foo. We could consider some heuristics, e.g. treat the subdirectory name src/ specially, or look for whether or not there is a src/__init__.py.

This is also related (in the opposite direction) to whether if they run red-knot on /app/src/foo/bar/ and /app/src/foo/__init__.py exists, should we walk up to the first directory that doesn't have __init__.py and determine that app/src/ is the actual root, or should we just let their imports fail because we are assuming the wrong root. (I think in this case we should do the walk-up.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also related (in the opposite direction) to whether if they run red-knot on /app/src/foo/bar/ and /app/src/foo/init.py exists, should we walk up to the first directory that doesn't have init.py and determine that app/src/ is the actual root, or should we just let their imports fail because we are assuming the wrong root. (I think in this case we should do the walk-up.)

This is an interesting example and I think it might make sense for us to walk up the tree, but I'm not sure if it should change the workspace-root or only the inferred SearchPathSettings. I'd like to exclude this for now but it's something I should bring up in the CLI design document

Copy link
Contributor

@carljm carljm Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's totally fine to defer the walking-up thing, as long as we have it noted somehow as something we probably want to address in future.

In general I don't think my comments on this PR are blocking, which is why I accepted. I don't think this PR does anything wrong, it's more that the rename to workspace_root and the way the projects in the tests are structured highlight an existing issue that I think we need to come to a resolution on. (If the tests just didn't use a src/ subdirectory layout, then I think they would be fully correct as-is, and we'd just be punting the problem of how we handle src/ layouts for future resolution.)

src_root: self
extra_paths: self.clone().extra_paths.unwrap_or_default(),
workspace_root: self
.clone()
.src_root
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the src_root from the SearchPathConfiguration represents some explicit red-knot config where the user tells us their source root, and this overrides setting the discovered workspace root as the workspace root in the SearchPathSettings?

charliermarsh added a commit to astral-sh/uv that referenced this pull request Nov 15, 2024
… identical names (#9094)

## Summary

Align uv's workspace discovery with red knots (see
astral-sh/ruff#14308 (comment))

* Detect nested workspace inside the current workspace rather than
testing if the current workspace is a member of any outer workspace.
* Detect packages with identical names.

## Test Plan

I added two integration tests. I also back ported the tests to main to
verify that both these invalid workspaces aren't catched by uv today.
That makes this, technically, a breaking change but I would consider the
change a bug fix.

---------

Co-authored-by: Charlie Marsh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants