Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable deprecated proto2 groups #630

Closed
wants to merge 1 commit into from
Closed

Conversation

timostamm
Copy link
Member

Groups are a deprecated proto2 feature. On the syntax level, they allow to declare a field and a message at the same time. On the wire level, the group message is encoded with delimited tags instead by length.

Here is an example - message M has a group field g:

syntax="proto2";
message M {
  optional group G = 1 {
    string field = 1;
  }
}

In descriptors, groups are represented as messages, and like many other implementations, Protobuf-ES generates groups as messages, Unfortunately, it does not implement the group wire format, and fails to serialize / parse messages correctly from the binary format.

This PR adds the following method to DescMessage:

  /**
   * Is this message a proto2 group?
   * Groups are a deprecated feature.
   */
  isLegacyProto2Group(): boolean;

For message M, the method returns false, but for message M.G, the method returns true.

To avoid serializing broken data, this PR updates protoc-gen-es to not generate code for groups. Message M.G and the field M.g will not be included in the generated code. Groups will still be handled as unknown fields, and are retained in a serialization roundtrip.

Since the group encoding will be revived with editions with the message_encoding feature, we will implement the group encoding soon, and it's quite possible we'll be able re-enable code generation for proto2 groups.

@timostamm timostamm requested review from smaye81, srikrsna-buf, a team and emcfarlane November 30, 2023 14:18
@jhump
Copy link
Member

jhump commented Nov 30, 2023

Is there any concern about backwards-compatibility now that this repo and the proto plugin are post-v1.0?

After all, groups would technically work for users if they were using protobuf-es in both the client and server.

Since the group encoding will be revived with editions with the message_encoding feature, we will implement the group encoding soon, and it's quite possible we'll be able re-enable code generation for proto2 groups.

If we are seriously considering adding back group support later, when the wire format is implemented, perhaps it would be best to just accelerate implementing the wire format?

Also, instead of DescMessage.isLegacyProto2Group(), you could add DescField.isDelimitedMessageEncoding(). That way it's relevant in both editions and proto2 files. The implementation can check the field descriptor proto's type field (if it's set to TYPE_GROUP) and examine the resolved feature set's message encoding value.

@timostamm
Copy link
Member Author

After all, groups would technically work for users if they were using protobuf-es in both the client and server.

Only by accident. It's quite clearly a bug to serialize data to the wire that's not conformant. That means it's fine to break behavior for a bug fix, unless it's likely that users depend on the broken behavior. Considering that groups have been deprecated with a notice in the documentation for longer than protobuf-es exists, that does seem unlikely.

Also, instead of DescMessage.isLegacyProto2Group(), you could add DescField.isDelimitedMessageEncoding(). That way it's relevant in both editions and proto2 files.

The intention was to not generate the unused messages for group declarations. message_encoding = DELIMITED can be used with any message. group implicitly creates a message that can't be referenced elsewhere.

Maybe that distinction isn't that important 🤔 I'm currently looking into the group encoding, and will get back to this.

@jhump
Copy link
Member

jhump commented Dec 1, 2023

group implicitly creates a message that can't be referenced elsewhere.

This is not accurate. It can be used from other messages like any other message. It just won't use group encoding from those other contexts. While this may not have been intentional when they originally created groups (and may be an artifact of the descriptor representation), it's still a part of the language that can't be changed due to backwards compatibility reasons.

So this is valid:

syntax = "proto2";
message Foo {
    optional group Bar = 1 {
        optional string name = 1;
        optional uint64 id = 2;
    }
}
message Baz {
    optional Foo.Bar bar = 1;
}

@timostamm
Copy link
Member Author

Oh boy, groups are a bit messy 🙁 We could stop generating group fields, but keep generating group messages. I think it's actually the cleaner solution to fully support proto2 groups. See #640.

@timostamm timostamm closed this Dec 5, 2023
@timostamm timostamm deleted the tstamm/disable-groups branch December 5, 2023 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants