Make each assignment to define a distinct variable with independent type #18516

Open
JukkaL opened this issue Jan 23, 2025 · 4 comments

@JukkaL
Collaborator

JukkaL commented Jan 23, 2025

These long-standing issues can be solved by generalizing the variable renaming pass that is used by --allow-redefinition:

Each assignment to a name would generate a separate internal variable. We will use "phi nodes" to merge these variables when different control flow paths assign to different variants. Here is a simple example:

def f() -> None:
    if c():
        x = 0
    else:
        x = ""
    reveal_type(x)

The new variable renaming pass would produce a new AST that resembles this program (note that phi(...) is a new special AST node type and not a function call):

def f() -> None:
    if c():
        x = 0
    else:
        x' = ""
    x'' = phi(x, x')
    reveal_type(x'')  # int | str

This resembles the static single assignment (SSA) form used by many compilers, but it probably would not conform to SSA fully, for various practical reasons.
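To make the renaming idea concrete, here is a minimal toy sketch of how per-assignment variants and a phi-style merge could be tracked. It is not mypy's implementation: the `Scope` class, the `merge` function, and the use of plain strings for types are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical helper: tracks the current variant of each name."""
    current: dict[str, str] = field(default_factory=dict)            # name -> current variant
    types: dict[str, frozenset[str]] = field(default_factory=dict)   # variant -> type names
    counter: dict[str, int] = field(default_factory=dict)            # name -> variants created

    def fresh(self, name: str) -> str:
        # Produce x, x', x'', ... for successive definitions of the same name.
        n = self.counter.get(name, 0)
        self.counter[name] = n + 1
        return name + "'" * n

    def assign(self, name: str, type_name: str) -> str:
        # Every assignment defines a new variant with its own independent type.
        variant = self.fresh(name)
        self.types[variant] = frozenset({type_name})
        self.current[name] = variant
        return variant

    def fork(self) -> Scope:
        # A branch gets its own name -> variant map but shares the type table and counters.
        return Scope(dict(self.current), self.types, self.counter)


def merge(parent: Scope, branches: list[Scope]) -> None:
    """Join control flow paths, creating a phi variant where the branches disagree."""
    names = sorted(set().union(*(b.current for b in branches)))
    for name in names:
        variants = list(dict.fromkeys(b.current[name] for b in branches if name in b.current))
        if len(variants) == 1:
            parent.current[name] = variants[0]  # all paths agree, no phi needed
        else:
            merged = parent.fresh(name)  # x'' = phi(x, x')
            parent.types[merged] = frozenset().union(*(parent.types[v] for v in variants))
            parent.current[name] = merged


root = Scope()
then_branch = root.fork()
then_branch.assign("x", "int")   # x = 0
else_branch = root.fork()
else_branch.assign("x", "str")   # x' = ""
merge(root, [then_branch, else_branch])
merged_name = root.current["x"]
print(merged_name, ":", " | ".join(sorted(root.types[merged_name])))  # x'' : int | str
```

The sketch ignores names that are defined on only some paths, loops, and everything else a real pass would need to handle.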

I'm working on a prototype implementation.

I hope that we can make this sufficiently compatible with the current semantics so that we can enable it by default (in mypy 2.0, possibly).

An alternative way to provide similar functionality would be to infer variable types from multiple assignments, similar to what already happens if a variable is declared as x: object. The renaming approach has a few notable benefits:

  • It can (more) easily support partial types (e.g. inferring a list item type from an x.append call).
  • It's a generalization of how we've already implemented --allow-redefinition.
  • It should make it easier to generate efficient code in mypyc.
  • The implementation will mostly be a new renaming pass, so it won't make other parts of mypy much more complicated (though mypyc needs changes).
  • The conditional type binder has some tech debt and I'm not excited about making it even more complicated, which would be the case if we'd use the alternative approach.
@tyralla
Collaborator

tyralla commented Jan 23, 2025

Cool! Do you think it would make sense, and be sufficiently easy to implement, to differentiate between annotated and unannotated variables (possibly depending on another option)? I mean something like this:

--allow-redefinition-soft

x = 1
x = ""
reveal_type(x)  # `str`
y: int = 1
y = ""  # error
reveal_type(y)  # `int`
z: int = 1
z: str = ""  # error
reveal_type(z)  # `int`

--allow-redefinition-medium

x = 1
x = ""
reveal_type(x)  # `str`
y: int = 1
y = ""  # error
reveal_type(y)  # `int`
z: int = 1
z: str = ""
reveal_type(z)  # `str`

--allow-redefinition-strong

x = 1
x = ""
reveal_type(x)  # `str`
y: int = 1
y = ""
reveal_type(y)  # `str`
z: int = 1
z: str = ""
reveal_type(z)  # `str`
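The three proposed modes can be summarized as a small decision function. This is only a toy model of the examples above: the mode names `"soft"`, `"medium"`, and `"strong"`, the `Var` class, and the `assign` and `run` helpers are hypothetical, types are compared as plain strings, and the handling of an annotated redefinition of an unannotated variable is an assumption, since the examples do not cover it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Var:
    type: str
    annotated: bool


def assign(mode: str, var: Optional[Var], value_type: str,
           annotation: Optional[str] = None) -> tuple[Var, bool]:
    """Return the variable state after one assignment and whether it is an error."""
    new_var = Var(annotation or value_type, annotation is not None)
    if var is None:
        return new_var, False  # first definition
    if annotation is None and value_type == var.type:
        return var, False  # ordinary compatible assignment
    if not var.annotated:
        # x = 1; x = "" is accepted in all three modes.
        # Assumption: an annotated redefinition of an unannotated variable is accepted too.
        return new_var, False
    if annotation is not None:
        allowed = mode in ("medium", "strong")  # z: int = 1; z: str = ""
    else:
        allowed = mode == "strong"              # y: int = 1; y = ""
    return (new_var, False) if allowed else (var, True)


def run(mode: str, steps: list[tuple[str, Optional[str]]]) -> tuple[str, int]:
    """Apply (value_type, annotation) steps; return the final type and the error count."""
    var: Optional[Var] = None
    errors = 0
    for value_type, annotation in steps:
        var, error = assign(mode, var, value_type, annotation)
        errors += error
    assert var is not None
    return var.type, errors


x_steps = [("int", None), ("str", None)]
y_steps = [("int", "int"), ("str", None)]
z_steps = [("int", "int"), ("str", "str")]
for mode in ("soft", "medium", "strong"):
    print(mode, run(mode, x_steps), run(mode, y_steps), run(mode, z_steps))
```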

@JukkaL
Collaborator Author

JukkaL commented Jan 24, 2025

My current work-in-progress implementation behaves almost like your --allow-redefinition-soft example. However, the following is allowed, since the first definition won't be visible to the second as they are mutually exclusive:

if f():
    x: int = 0
else:
    x: str = ""

I'm still figuring out what the exact rules should be.

@ilevkivskyi
Member

@JukkaL I was thinking about our discussion today on implementing this using the binder. I found one case where we need to be careful. Imagine we have a conditional definition at class level that must be deferred:

class C:
    if bool():
        x = 1
    else:
        x = somehow.defer

Then, after deferring and processing some other code involving C.x, we will get an incomplete/incorrect type. So each time we defer, we need to both:

  • reset var.type to None
  • reset var.is_ready to False
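A minimal sketch of why both resets matter, under stated assumptions: the `Var` class below is a stand-in with only the two attributes named above (not mypy's actual node class), `infer_conditional` is a hypothetical helper, and a branch type of `None` stands for an expression that cannot be inferred yet.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Var:
    """Stand-in for a variable symbol; only the attributes discussed above."""
    name: str
    type: Optional[str] = None
    is_ready: bool = False


def infer_conditional(var: Var, branch_types: list[Optional[str]]) -> bool:
    """Accumulate a type across branches; return False if the definition must be deferred."""
    seen: list[str] = []
    for branch_type in branch_types:
        if branch_type is None:
            # Deferring: drop the partial result so that other code looking at
            # the variable does not observe a type built from only some branches.
            var.type = None
            var.is_ready = False
            return False
        seen.append(branch_type)
        var.type = " | ".join(dict.fromkeys(seen))
    var.is_ready = True
    return True


x = Var("C.x")
first_pass = infer_conditional(x, ["int", None])    # second branch cannot be inferred yet
state_after_first = (x.type, x.is_ready)
second_pass = infer_conditional(x, ["int", "str"])  # after the deferred work is done
print(first_pass, state_after_first, second_pass, x.type, x.is_ready)
```

Without the reset, the first pass would leave `x.type` as `int`, which is the incomplete type described above.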

@JukkaL
Collaborator Author

JukkaL commented Jan 29, 2025

Created another issue which explains in some detail how to implement this using the binder:

I'm currently prototyping both approaches, and the binder-based approach seems much simpler (but less powerful) so far, which is probably a good tradeoff. Using renaming for this seems pretty complicated.
