Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python library does not deserialise maps with extra serialised fields #20109

Open
TrentHouliston opened this issue Jan 25, 2025 · 1 comment
Open
Labels

Comments

@TrentHouliston
Copy link
Contributor

TrentHouliston commented Jan 25, 2025

The python version of the protobuf deserialiser doesn't deserialise map files properly if they have additional fields in the serialised data.

If you have two protocol buffers like so

message A {
    map<int32, string> a = 1;
}

message B {
    message BB {
        optional int32 key = 1;
        optional string value = 2;
    }

    repeated BB a = 2;
}

Then according to the protobuf spec they should be encoded identically. However in the Python decoder at least, if there is an additional field in message like

message C {
    message CC {
        optional int32 key = 1;
        optional string value = 2;
        optional int32 other = 3;
    }

    repeated CC a = 2;
}

Then when decoding C as an A rather than just the other field being ignored (like it would be for message B) instead the entire key/value pair ends up not being decoded and silently ignored.

Instead, you should get each map value as normal with the extra field ignored as this would match the "maps are equivalent to a repeated virtual message" behaviour.

When encoding and decoding using protoc it displays that behaviour, so at the very least, it is inconsistent.

What version of protobuf and what language are you using?
protoc --version libprotoc 3.21.4
pip show protobuf Version: 5.29.3

Language: Python

What operating system (Linux, Windows, ...) and version?
Linux/MacOS

What runtime / compiler are you using (e.g., python version or gcc version)
Python 3.10.5

@TrentHouliston TrentHouliston added the untriaged auto added to all issues by default when created. label Jan 25, 2025
@TrentHouliston TrentHouliston changed the title Python library does not deserialise maps with extra serialised fields. Python library does not deserialise maps with extra serialised fields Jan 25, 2025
@acozzette
Copy link
Member

I believe that what happens in this case is that upb puts the entire map entry (i.e. key + value + other) in the unknown fields. So if you look at the map field it will appear to be missing that entry, but the entry will still be there if you serialize the message. I assume the rationale is that this way we're able to preserve the contents of other, whereas it would be silently dropped if we were to parse it into the map.

It would be nice for the behavior in this situation to be more consistent, but I don't think we will be able to prioritize working on it unfortunately. It's probably best to try to avoid this edge case of unknown fields in map entries.

@acozzette acozzette added upb and removed untriaged auto added to all issues by default when created. labels Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants