Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GCC 15 Wunterminated-string-initialization issues #1543

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ac000
Copy link
Member

@ac000 ac000 commented Jan 21, 2025

    Fix GCC 15 Wunterminated-string-initialization issues
    
    When building with the upcoming GCC 15, we are hit by the new
    Wunterminated-string-initialization compiler warning (however, due to
    -Werror, this is a build failure). E.g.
    
      src/nxt_string.c: In function ‘nxt_encode_uri’:
      src/nxt_string.c:601:36: error: initializer-string for array of ‘unsigned char’ is too long [-Werror=unterminated-string-initialization]
        601 |     static const u_char  hex[16] = "0123456789ABCDEF";
            |                                    ^~~~~~~~~~~~~~~~~~
    
    We have created a character array containing 16 elements, but is not NUL
    terminated. While this type of (lookup/translation table) use case is
    perfectly valid and we could simply disable the warning, it is probably
    still a useful warning to have enabled.
    
    There is a 'nonstring' attribute, but that doesn't influence this
    warning. Well not yet at least... but that looks to change!
    
    Lets just do what nginx does and create them NUL terminated. Except in
    src/nxt_http_parse.c where they *really* want to be non-NUL
    terminated and we use *arrays* of chars.
    
    Link: <https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Common-Variable-Attributes.html>
    Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178#c21>
    Signed-off-by: Andrew Clayton <[email protected]>

@ac000 ac000 force-pushed the unterminated-strings branch 3 times, most recently from fe37226 to a365f98 Compare January 21, 2025 04:03
@ac000 ac000 changed the title Don't declare unterminated character arrays/strings Fix GCC 15 Wunterminated-string-initialization issues Jan 21, 2025
@ac000 ac000 force-pushed the unterminated-strings branch from a365f98 to 72ea5ce Compare January 21, 2025 04:17
@ac000 ac000 marked this pull request as ready for review January 21, 2025 05:49
@ac000 ac000 requested review from hongzhidao and avahahn January 21, 2025 05:50
@ac000 ac000 force-pushed the unterminated-strings branch from 72ea5ce to 98f11e6 Compare January 21, 2025 15:51
@ac000
Copy link
Member Author

ac000 commented Jan 21, 2025

  • Highlight that this is a build failure...
$ git range-diff 72ea5ce9...98f11e65
1:  72ea5ce9 ! 1:  98f11e65 Fix GCC 15 Wunterminated-string-initialization issues
    @@ Commit message
         Fix GCC 15 Wunterminated-string-initialization issues
     
         When building with the upcoming GCC 15, we are hit by the new
    -    Wunterminated-string-initialization compiler warning. E.g.
    +    Wunterminated-string-initialization compiler warning (however, due to
    +    -Werror, this is a build failure). E.g.
     
           src/nxt_string.c: In function ‘nxt_encode_uri’:
           src/nxt_string.c:601:36: error: initializer-string for array of ‘unsigned char’ is too long [-Werror=unterminated-string-initialization]

Copy link
Contributor

@hongzhidao hongzhidao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that I’m unable to check if the change is reasonable.
As I understand, we introduce nxt_str_t to handle string with a non traditional way, that it explicitly specify its length. In this case, the stuff like hex is an array of u_char/uint8 type, I think they are not entirely strings whose ending char ‘\0’ is used.
I’d like to ask team’s review.
Thanks.

@ac000
Copy link
Member Author

ac000 commented Jan 21, 2025

Taking this as an example

    static const u_char  hex[16] = "0123456789ABCDEF";                          

That declares a non-NUL terminated string (by loosing the terminating \0 from the string literal), which in this case is OK, but could often be an error, which this new compiler warning warns about. See here for details (Yes, this was introduced by Alex!).

You know that these are usually used as translation table type things and is really being treated as just an array of characters.

So assuming we want to keep this new warning enabled. The two main options are

Make it NUL terminated, which this patch does. (As you say ) It doesn't matter that it's NUL terminated or not (other than using an extra byte) as we only ever access the array indices 0-15 and never the NUL byte.

This is what nginx does.

The other option would be to define it as a character array like

    static const u_char  hex[16] = { '0', '1', '2', '3', '4', '5', '6', '7',
                                     '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' }; 

I chose the first option for most of the cases as it's simpler and they don't matter being NUL terminated.

You can see I took the second option for a couple of them where it does matter that they are non-NUL terminated.

I suppose you could maybe declare it as a nxt_str_t... OK, lets stop there... (besides that's just going to give you a pointer to a NUL terminated string literal, which is no better than char [] and char [] better describes its purpose...)

When building with the upcoming GCC 15, we are hit by the new
Wunterminated-string-initialization compiler warning (however, due to
-Werror, this is a build failure). E.g.

  src/nxt_string.c: In function ‘nxt_encode_uri’:
  src/nxt_string.c:601:36: error: initializer-string for array of ‘unsigned char’ is too long [-Werror=unterminated-string-initialization]
    601 |     static const u_char  hex[16] = "0123456789ABCDEF";
        |                                    ^~~~~~~~~~~~~~~~~~

We have created a character array containing 16 elements, but is not NUL
terminated. While this type of (lookup/translation table) use case is
perfectly valid and we could simply disable the warning, it is probably
still a useful warning to have enabled.

There is a 'nonstring' attribute, but that doesn't influence this
warning. Well not yet at least... but that looks to change!

Lets just do what nginx does and create them NUL terminated. Except in
src/nxt_http_parse.c where they *really* want to be non-NUL
terminated and we use *arrays* of chars.

Link: <https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Common-Variable-Attributes.html>
Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178#c21>
Signed-off-by: Andrew Clayton <[email protected]>
@ac000 ac000 force-pushed the unterminated-strings branch from 98f11e6 to 36d6901 Compare January 21, 2025 23:04
@ac000
Copy link
Member Author

ac000 commented Jan 21, 2025

  • There is work under way to make the nonstring attribute silence the Wunterminated-string-initialization warning. Although I think it's too late for that work to make it into GCC 15 as it's now in the regression and documentation fixes stage...
 $ git range-diff 98f11e65...36d69013
1:  98f11e65 ! 1:  36d69013 Fix GCC 15 Wunterminated-string-initialization issues
    @@ Commit message
         still a useful warning to have enabled.
     
         There is a 'nonstring' attribute, but that doesn't influence this
    -    warning.
    +    warning. Well not yet at least... but that looks to change!
     
         Lets just do what nginx does and create them NUL terminated. Except in
         src/nxt_http_parse.c where they *really* want to be non-NUL
         terminated and we use *arrays* of chars.
     
    +    Link: <https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/Common-Variable-Attributes.html>
    +    Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178#c21>
         Signed-off-by: Andrew Clayton <[email protected]>
     
      ## src/nxt_http_parse.c ##

@hongzhidao
Copy link
Contributor

To minimize it, it’s d discussion about if the below code is good or not.

static const u_char hex[16] = "0123456789ABCDEF";

Personally I prefer to keep the explicit [16] because:

  1. It clearly expresses the intent of a hexadecimal lookup table
  2. Provides explicit boundary checking
  3. Self-documenting code - size is known without looking at the initialization value

Regarding string initialization: The original syntax fits this lookup table use case. GCC 15's warning is a bit overzealous here. We don't actually need this string to be NUL-terminated since it's only accessed through 0-15 indices.

Removing [16] just to satisfy the compiler warning actually obscures the code's intent. If we really need to handle this warning, I'd prefer to suppress it with a compiler directive rather than change the code structure.

Welcome to others suggestions on it. It’s also a discussion if we need to support GCC 15 and the new compile option.

@ac000
Copy link
Member Author

ac000 commented Jan 22, 2025

To minimize it, it’s d discussion about if the below code is good or not.

static const u_char hex[16] = "0123456789ABCDEF";

Personally I prefer to keep the explicit [16] because:

This fails to build (due to -Werror) with GCC 15 (I.e. the Fedora Rawhide GCC CI check, the Clang check is failing due to Ruby 3.4 API deprecations)... which is the whole point of this PR...

Regarding string initialization: The original syntax fits this lookup table use case. GCC 15's warning is a bit overzealous here. We don't actually need this string to be NUL-terminated since it's only accessed through 0-15 indices.

No, I know we don't need it to be NUL terminated, if we did then we had bigger problems.

Making it NUL terminated is harmless however... and is what the nginx folks did. nginx actually builds fine with GCC 15 (with -Wextra).

Removing [16] just to satisfy the compiler warning actually obscures the code's intent. If we really need to handle this

Yes, I agree with that...

warning, I'd prefer to suppress it with a compiler directive rather than change the code structure.

Welcome to others suggestions on it. It’s also a discussion if we need to support GCC 15 and the new compile option.

Do we need to support GCC 15?, err yes... why wouldn't we? We support all versions of Clang/GCC that support -std=gnu11

Seeing as there is work in progress to make the nonstring variable attribute quell this warning, that will hopefully make GCC 16. We could temporarily disable this warning on GCC and the re-enable it for GCC 16 (hopefully) and later.

@hongzhidao
Copy link
Contributor

I thought the PR is only because of the option -Wu-s-i* support in GCC 15.

@ac000
Copy link
Member Author

ac000 commented Jan 22, 2025

Correct.

@hongzhidao
Copy link
Contributor

While I'm comfortable with the current approach, I'll stay neutral as others might prefer the new GCC
15 option.

@ac000
Copy link
Member Author

ac000 commented Jan 22, 2025

Well, we'll need to do something... GCC 15 will land in Fedora 42 in April, but then even before that it is currently stopping the Fedora Rawhide GCC CI from running...

Comment on lines -519 to +522
static const u_char normal[256] nxt_aligned(64) =
/* The last '\0' is not needed because the string is NUL terminated */
static const u_char normal[] nxt_aligned(64) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ac000 ,

That comment is incorrect:

/* The last '\0' is not needed because the string is NUL terminated */

Even though this is a "string literal", it is not a string. The standard prohibits having NUL bytes in the middle of a string, and this is thus not a string. See footnote 75 in C23 (n3220): https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf#subsection.6.4.5.

75)A string literal may not be a string (see 7.1.1), because a null character can be embedded in it by a \0 escape sequence.

So I would disable the diagnostic. This change (I'm referring specifically to this hunk, and not the entire patch; most of the patch, I like it) is bad for readability, IMO. Instead, I'd take the opportunity to fix the few places where it wouldn't hurt to adapt to the warning, but then disable it until [[gnu::nonstring()]] allows you to re-enable it.

Cheers,
Alex

Cc: @thesamesam

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add the [[gnu::nonstring]] attribute already, so that later you only need to re-enable the diagnostic. That would already enhance readability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ac000 ,

That comment is incorrect:

/* The last '\0' is not needed because the string is NUL terminated */

Admittedly I just yanked that comment out of nginx, but In this case I would take string to just mean "array of characters".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ac000 ,
That comment is incorrect:

/* The last '\0' is not needed because the string is NUL terminated */

Admittedly I just yanked that comment out of nginx,

Ughhh. Maybe tell them that their comment is actively harmful?

but In this case I would take string to just mean "array of characters".

Yeah, but I this one is really too obscure. The new code feels like a hybrid between a string and a nonstring. And I'd say the [256] was actually informative, since it's not obvious to see how many zeros there are.

"the string is NUL terminated" reminds just so much to a proper string, that it will most likely confuse someone at some point.

I'd keep the old version here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, can you point to the comment in nginx? I can't find it with grep.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior related commits:

nginx/nginx@e5efadb

nginx/nginx@3338cfd

(sounds like this confused Igor himself.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context, I would assume "string" just means "string of characters" where string is synonymous with 'series', like you might say "string of robberies".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context, I would assume "string" just means "string of characters" where string is synonymous with 'series', like you might say "string of robberies".

I'd use "array" instead of string. Or "string literal", if you want. But using string seems inducing confusion.

Maybe I'm too much of a pedantic regarding wording issues; that's my job. :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, if we're being pedantic then I think the word would be pedant.

Anyway, I'm probably just going to close this PR and simply disable the warning for now, and then look at sprinkling __attribute__((nonstring))'s (nicer named) around when Kees' patch makes it in, maybe GCC 16, maybe even GCC 15, which would be prefect, or we may be sitting here in two years time still waiting for it to land. Never can tell... (No, think positive thoughts...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, if we're being pedantic then I think the word would be pedant.

Lol. True.

Anyway, I'm probably just going to close this PR and simply disable the warning for now,

I do like that one, however: #1543 (comment)

I'd do that change, and then turn the diagnostic off.

and then look at sprinkling __attribute__((nonstring))'s (nicer named) around when Kees' patch makes it in, maybe GCC 16, maybe even GCC 15, which would be prefect, or we may be sitting here in two years time still waiting for it to land. Never can tell... (No, think positive thoughts...)

:-)

Comment on lines -167 to +170
static const nxt_http_ver_t http11 = { "HTTP/1.1" };
static const nxt_http_ver_t http10 = { "HTTP/1.0" };
static const nxt_http_ver_t http11 =
{{ 'H', 'T', 'T', 'P', '/', '1', '.', '1' }};
static const nxt_http_ver_t http10 =
{{ 'H', 'T', 'T', 'P', '/', '1', '.', '0' }};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep this as it was. It was much more readable. I'd make it a [[gnu::nonstring]], and wait for a future compiler to re-enable the diagnostic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it looks hideous, but it meant we could keep the warning enabled... but then that whole nxt_http_ver_t thing is only used in that function and it seems to be purely to allow for doing a string comparison using unit64_t's...

Comment on lines -115 to +116
static const u_char hexadecimal[16] = "0123456789abcdef";
static const u_char HEXADECIMAL[16] = "0123456789ABCDEF";
static const u_char hexadecimal[] = "0123456789abcdef";
static const u_char HEXADECIMAL[] = "0123456789ABCDEF";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I like it. Everything that need not be a nonstring, should be a string. Everybody knows hexadecimal enough that the lost information is negligible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants