Given the untyped_sequence and int_sequence below:
typedef struct {
    void* data;       // first item
    size_t size;      // number of items
    size_t item_size; // item byte size
} untyped_sequence;

typedef struct {
    int* data;        // first int
    size_t size;      // number of ints
    size_t item_size; // int byte size
} int_sequence;
QUESTION: Is it UB to put them as two union members, initialize an instance of that union using the int_sequence member, and then mutate the int data using the untyped_sequence member?
GCC, Clang and MSVC give no warnings about this, but that doesn't necessarily mean anything.
Minimal runnable example (https://godbolt./z/PT6ahh4qq):
#include <string.h>
#include <stdio.h>

typedef struct {
    void* data;       // first item
    size_t size;      // number of items
    size_t item_size; // item byte size
} untyped_sequence;

typedef struct {
    int* data;        // first int
    size_t size;      // number of ints
    size_t item_size; // int byte size
} int_sequence;

typedef union {
    int_sequence typed;
    untyped_sequence untyped;
} sequence;

void untyped_zero_first(untyped_sequence untyped) {
    memset(untyped.data, 0, untyped.size * untyped.item_size);
}

int main(void) {
    int ints[4] = {1, 2, 3, 4};
    sequence s = {
        .typed.data = ints,
        .typed.size = 4,
        .typed.item_size = sizeof(int)
    };
    untyped_zero_first(s.untyped);
    // prints "0, 0, 0, 0" for GCC, Clang, MSVC - but is it UB?
    printf("%d, %d, %d, %d\n", ints[0], ints[1], ints[2], ints[3]);
}
Is this union pointer member type punning UB in C?
Yes, in that the language spec does not define the behavior (as opposed to explicitly declaring it undefined).
Unlike C++, C does not have a sense of an "active" member of a union. Accessing a different member than was initialized or last stored does not, in and of itself, produce undefined behavior. Since C17, the behavior is not even implementation-defined. You can just do it, which involves (as a note in the spec clarifies) reinterpreting the appropriate part of the stored value according to the type of the accessed member.
But in your particular case, that's not enough. C does not require that the size and representation of type void * be the same as the size and representation of type int *. As far as the spec is concerned, there is no telling, at the point where your example code calls untyped_zero_first(s.untyped), what s.untyped.data points to. It might even be a trap representation if your implementation's void * representation affords those.
In practice, you're unlikely to run into a modern platform in which different object pointer types in fact do have different size or representation, so your code is likely to work as intended, but C does not guarantee that.
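Union Members (C11, Section 6.7.2.1, Paragraph 16):
"A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa."
Union Type Punning (C11, Section 6.5.2.3, Footnote 95):
"If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation."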
Effective Type Rule, commonly called the Strict Aliasing Rule (C11, Section 6.5, Paragraph 7):
"An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type."
Answering in a few words:
An example that both does and does not invoke UB, assuming the correctness of the implementation.
#include <string.h>
#include <stdio.h>

typedef struct {
    void* data;       // first item
    size_t size;      // number of items
    size_t item_size; // item byte size
} untyped_sequence;

typedef struct {
    int* data;        // first int
    size_t size;      // number of ints
    size_t item_size; // int byte size
} int_sequence;

typedef struct {
    float* data;      // first float
    size_t size;      // number of floats
    size_t item_size; // float byte size
} float_sequence;

typedef union {
    int_sequence typed;
    untyped_sequence untyped;
    float_sequence floatseq;
} sequence;

void untyped_zero_first(untyped_sequence untyped) {
    memset(untyped.data, 0, untyped.size * untyped.item_size);
}

int main(void) {
    int ints[4] = {1, 2, 3, 4};
    // no UB here
    sequence s = {
        .typed.data = ints,
        .typed.size = 4,
        .typed.item_size = sizeof(int)
    };
    untyped_zero_first(s.untyped);
    printf("%d, %d, %d, %d\n", s.typed.data[0], s.typed.data[1], s.typed.data[2], s.typed.data[3]);
    // UB: the int objects are read through float lvalues
    printf("%f, %f, %f, %f\n", s.floatseq.data[0], s.floatseq.data[1], s.floatseq.data[2], s.floatseq.data[3]);
}
void * can be just converted to an int * easily. – KamilCuk Commented Nov 17, 2024 at 15:25
void * and int * may differ in size, code risks UB. Consider void * not fully well defined when int * is smaller. – chux Commented Nov 17, 2024 at 15:26
untyped_sequence and int_sequence could differ in size. – chux Commented Nov 17, 2024 at 15:39