I'm creating a DWARF parser that must be able to detect which classes are abstract. The parser is built to parse the output of a specific code base. I have a working method that solves 90% of the cases, but I want to reach full coverage of the code base. To show what the final 10% is about, I created a minimal, reproducible example.
Based on the C++ definition, we have the following statements:
In my example I have the following:
- Interface, which is abstract and only has pure virtual functions.
- Partial, which is abstract because it derives from an abstract class and does not override the function bFunc(), so the final overrider is still pure virtual.
- Full, which is concrete (i.e. not an abstract class), as the final overriders of both functions are not pure virtual.

The above classes are used as interfaces. I furthermore specify two classes which I will use to create objects:
- Foo, based on the Partial class
- Bar, based on the Full class

The code is shown below:
// Pure virtual abstract class, used as an interface
class Interface {
public:
    virtual void aFunc() = 0;
    virtual void bFunc() = 0;
};

// Still abstract Interface
class Partial: public Interface {
public:
    void aFunc() override {}
};

// Concrete Interface
class Full: public Interface {
public:
    void aFunc() override {}
    void bFunc() override {}
};

// Concrete
class Foo : public Partial {
public:
    void fooFunc(){ }
    void bFunc() override {}
};

// Concrete
class Bar : public Full {
public:
    void barFunc(){ }
};

Foo inst_foo;
Bar inst_bar;
Partial* inter_foo = &inst_foo;
Full* inter_bar = &inst_bar;

// Main
int main() {
    return 0;
}
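As a sanity check on the classification above (independent of any DWARF parsing), the compiler itself can confirm which classes are abstract. The sketch below repeats the class definitions in compact form and uses std::is_abstract from the standard <type_traits> header; the assertion messages are my own wording.

```cpp
#include <type_traits>

class Interface { public: virtual void aFunc() = 0; virtual void bFunc() = 0; };
class Partial : public Interface { public: void aFunc() override {} };
class Full : public Interface { public: void aFunc() override {} void bFunc() override {} };
class Foo : public Partial { public: void fooFunc() {} void bFunc() override {} };
class Bar : public Full { public: void barFunc() {} };

// Ground truth the DWARF parser is expected to reproduce.
static_assert(std::is_abstract<Interface>::value, "Interface is abstract");
static_assert(std::is_abstract<Partial>::value, "Partial still has a pure bFunc()");
static_assert(!std::is_abstract<Full>::value, "Full overrides both functions");
static_assert(!std::is_abstract<Foo>::value, "Foo supplies the missing bFunc()");
static_assert(!std::is_abstract<Bar>::value, "Bar inherits a complete Full");
```

If any of these expectations were wrong, the translation unit would fail to compile, so this gives a fixed reference to compare the parser output against.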
Class diagram
      ┌─────────┐
      │Interface│
      └────▲────┘
           │
    ┌──────┴─────┐
┌───┴───┐    ┌───┴──┐
│Partial│    │ Full │
└───▲───┘    └───▲──┘
    │            │
 ┌──┴──┐      ┌──┴──┐
 │ Foo │      │ Bar │
 └─────┘      └─────┘
Now let us examine the DWARF debug_info dump. I've separated the class description from the variable description. The full example could be found at:
Class-info
<1><33>: Abbrev Number: 15 (DW_TAG_class_type)
<34> DW_AT_name : (string) Foo
<38> DW_AT_byte_size : (implicit_const) 8
<38> DW_AT_decl_file : (implicit_const) 1
<38> DW_AT_decl_line : (data1) 22
<39> DW_AT_decl_column : (implicit_const) 7
<39> DW_AT_containing_type: (ref4) <0x348>
<3d> DW_AT_sibling : (ref4) <0xe4>
<2><41>: Abbrev Number: 9 (DW_TAG_inheritance)
<42> DW_AT_type : (ref4) <0x1a7>, Partial
<46> DW_AT_data_member_location: (implicit_const) 0
<46> DW_AT_accessibility: (implicit_const) 1 (public)
...
<1><fd>: Abbrev Number: 15 (DW_TAG_class_type)
<fe> DW_AT_name : (string) Bar
<102> DW_AT_byte_size : (implicit_const) 8
<102> DW_AT_decl_file : (implicit_const) 1
<102> DW_AT_decl_line : (data1) 29
<103> DW_AT_decl_column : (implicit_const) 7
<103> DW_AT_containing_type: (ref4) <0x348>
<107> DW_AT_sibling : (ref4) <0x18e>
<2><10b>: Abbrev Number: 9 (DW_TAG_inheritance)
<10c> DW_AT_type : (ref4) <0x260>, Full
<110> DW_AT_data_member_location: (implicit_const) 0
<110> DW_AT_accessibility: (implicit_const) 1 (public)
...
<1><1a7>: Abbrev Number: 13 (DW_TAG_class_type)
<1a8> DW_AT_name : (strp) (offset: 0xa5d): Partial
<1ac> DW_AT_byte_size : (implicit_const) 8
<1ac> DW_AT_decl_file : (implicit_const) 1
<1ac> DW_AT_decl_line : (data1) 9
<1ad> DW_AT_decl_column : (implicit_const) 7
<1ad> DW_AT_containing_type: (ref4) <0x348>
<1b1> DW_AT_sibling : (ref4) <0x23d>
<2><1b5>: Abbrev Number: 9 (DW_TAG_inheritance)
<1b6> DW_AT_type : (ref4) <0x348>, Interface
<1ba> DW_AT_data_member_location: (implicit_const) 0
<1ba> DW_AT_accessibility: (implicit_const) 1 (public)
...
<1><260>: Abbrev Number: 13 (DW_TAG_class_type)
<261> DW_AT_name : (strp) (offset: 0x21e3): Full
<265> DW_AT_byte_size : (implicit_const) 8
<265> DW_AT_decl_file : (implicit_const) 1
<265> DW_AT_decl_line : (data1) 15
<266> DW_AT_decl_column : (implicit_const) 7
<266> DW_AT_containing_type: (ref4) <0x348>
<26a> DW_AT_sibling : (ref4) <0x316>
<2><26e>: Abbrev Number: 9 (DW_TAG_inheritance)
<26f> DW_AT_type : (ref4) <0x348>, Interface
<273> DW_AT_data_member_location: (implicit_const) 0
<273> DW_AT_accessibility: (implicit_const) 1 (public)
...
<1><348>: Abbrev Number: 13 (DW_TAG_class_type)
<349> DW_AT_name : (strp) (offset: 0x332d): Interface
<34d> DW_AT_byte_size : (implicit_const) 8
<34d> DW_AT_decl_file : (implicit_const) 1
<34d> DW_AT_decl_line : (data1) 2
<34e> DW_AT_decl_column : (implicit_const) 7
<34e> DW_AT_containing_type: (ref4) <0x348>
<352> DW_AT_sibling : (ref4) <0x404>
Variable-info
<1><e9>: Abbrev Number: 11 (DW_TAG_variable)
<ea> DW_AT_name : (strp) (offset: 0x2e8c): inst_foo
<ee> DW_AT_decl_file : (implicit_const) 1
<ee> DW_AT_decl_line : (data1) 34
<ef> DW_AT_decl_column : (implicit_const) 10
<ef> DW_AT_type : (ref4) <0x33>, Foo
<f3> DW_AT_external : (flag_present) 1
<f3> DW_AT_location : (exprloc) 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
...
<1><193>: Abbrev Number: 11 (DW_TAG_variable)
<194> DW_AT_name : (strp) (offset: 0xfe3): inst_bar
<198> DW_AT_decl_file : (implicit_const) 1
<198> DW_AT_decl_line : (data1) 35
<199> DW_AT_decl_column : (implicit_const) 10
<199> DW_AT_type : (ref4) <0xfd>, Bar
<19d> DW_AT_external : (flag_present) 1
<19d> DW_AT_location : (exprloc) 9 byte block: 3 8 0 0 0 0 0 0 0 (DW_OP_addr: 8)
...
<1><242>: Abbrev Number: 11 (DW_TAG_variable)
<243> DW_AT_name : (strp) (offset: 0x1eda): inter_foo
<247> DW_AT_decl_file : (implicit_const) 1
<247> DW_AT_decl_line : (data1) 36
<248> DW_AT_decl_column : (implicit_const) 10
<248> DW_AT_type : (ref4) <0x256>
<24c> DW_AT_external : (flag_present) 1
<24c> DW_AT_location : (exprloc) 9 byte block: 3 10 0 0 0 0 0 0 0 (DW_OP_addr: 10)
<1><256>: Abbrev Number: 6 (DW_TAG_pointer_type)
<257> DW_AT_byte_size : (implicit_const) 8
<257> DW_AT_type : (ref4) <0x1a7>, Partial
...
<1><31b>: Abbrev Number: 11 (DW_TAG_variable)
<31c> DW_AT_name : (strp) (offset: 0x73b): inter_bar
<320> DW_AT_decl_file : (implicit_const) 1
<320> DW_AT_decl_line : (data1) 37
<321> DW_AT_decl_column : (implicit_const) 10
<321> DW_AT_type : (ref4) <0x32f>
<325> DW_AT_external : (flag_present) 1
<325> DW_AT_location : (exprloc) 9 byte block: 3 18 0 0 0 0 0 0 0 (DW_OP_addr: 18)
<1><32f>: Abbrev Number: 6 (DW_TAG_pointer_type)
<330> DW_AT_byte_size : (implicit_const) 8
<330> DW_AT_type : (ref4) <0x260>, Full
What I currently do to check whether a class is abstract is check whether the value of the DW_AT_containing_type attribute of the class is self-referencing. This is a cheap method that works for 90% of the codebase the parser is intended for. In the example above it works for the Interface class, as the DW_AT_containing_type has value 0x348 and the class definition starts at offset 0x348.
This doesn't work for the class Partial or Full. Purely by looking at the DWARF description of the class, there is no distinction between Partial, which is abstract, and Full, which isn't.
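To make the current check concrete, here is a minimal sketch of it. ClassDie and is_self_referencing are hypothetical names for illustration, not part of any DWARF library; the struct only holds the two values the check needs.

```cpp
#include <cstdint>

// Hypothetical, simplified view of a DW_TAG_class_type DIE.
struct ClassDie {
    std::uint64_t offset;           // offset of the DIE itself, e.g. 0x348
    bool has_containing_type;       // whether DW_AT_containing_type is present
    std::uint64_t containing_type;  // value of DW_AT_containing_type
};

// Current heuristic: the class is treated as abstract when its
// DW_AT_containing_type points back at the class DIE itself.
constexpr bool is_self_referencing(const ClassDie& die) {
    return die.has_containing_type && die.containing_type == die.offset;
}
```

With the offsets from the dump, the check succeeds for Interface (0x348 refers to 0x348) but fails for both Partial (0x1a7 refers to 0x348) and Full (0x260 refers to 0x348), which is exactly why the two cannot be told apart this way.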
The complete DWARF info also contains information about the member functions. For the sake of clarity I've only kept the relevant entries.
<1><348>: Abbrev Number: 13 (DW_TAG_class_type)
<349> DW_AT_name : (strp) (offset: 0x332d): Interface
<2><3c7>: Abbrev Number: 16 (DW_TAG_subprogram)
<3c8> DW_AT_external : (flag_present) 1
<3c8> DW_AT_name : (strp) (offset: 0xf55): aFunc
<3d2> DW_AT_virtuality : (implicit_const) 1 (virtual)
<3d5> DW_AT_containing_type: (ref4) <0x348>
<3d9> DW_AT_declaration : (flag_present) 1
<2><3e7>: Abbrev Number: 10 (DW_TAG_subprogram)
<3e8> DW_AT_name : (strp) (offset: 0x1293): bFunc
<3f2> DW_AT_virtuality : (implicit_const) 1 (virtual)
<3f5> DW_AT_containing_type: (ref4) <0x348>
<3f9> DW_AT_declaration : (flag_present) 1
<3f9> DW_AT_object_pointer: (ref4) <0x3fd>
...
<1><1a7>: Abbrev Number: 13 (DW_TAG_class_type)
<1a8> DW_AT_name : (strp) (offset: 0xa5d): Partial
<2><220>: Abbrev Number: 10 (DW_TAG_subprogram)
<221> DW_AT_name : (strp) (offset: 0xf55): aFunc
<22b> DW_AT_virtuality : (implicit_const) 1 (virtual)
<22e> DW_AT_containing_type: (ref4) <0x1a7>
<232> DW_AT_declaration : (flag_present) 1
...
<1><260>: Abbrev Number: 13 (DW_TAG_class_type)
<261> DW_AT_name : (strp) (offset: 0x21e3): Full
<2><2d9>: Abbrev Number: 16 (DW_TAG_subprogram)
<2da> DW_AT_name : (strp) (offset: 0xf55): aFunc
<2e4> DW_AT_virtuality : (implicit_const) 1 (virtual)
<2e7> DW_AT_containing_type: (ref4) <0x260>
<2eb> DW_AT_declaration : (flag_present) 1
<2><2f9>: Abbrev Number: 10 (DW_TAG_subprogram)
<2fa> DW_AT_name : (strp) (offset: 0x1293): bFunc
<304> DW_AT_virtuality : (implicit_const) 1 (virtual)
<307> DW_AT_containing_type: (ref4) <0x260>
<30b> DW_AT_accessibility: (implicit_const) 1 (public)
<30b> DW_AT_declaration : (flag_present) 1
Unfortunately, GCC does not distinguish virtual and pure virtual via the DW_AT_virtuality attribute (as defined in DWARF 5). Although aFunc() and bFunc() are pure virtual in Interface, only the virtual value of the attribute is used.
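For reference, DWARF 5 lists three values for DW_AT_virtuality. The sketch below encodes them and adds a hypothetical helper, is_marked_pure, that a parser could try first before falling back to another heuristic. Whether a given producer ever emits the pure-virtual value is an assumption to verify against the toolchain in use; in the GCC dump above every entry carries the value 1.

```cpp
#include <cstdint>

// Values of DW_AT_virtuality as listed in the DWARF 5 specification.
enum DwVirtuality : std::uint8_t {
    DW_VIRTUALITY_none = 0x00,
    DW_VIRTUALITY_virtual = 0x01,
    DW_VIRTUALITY_pure_virtual = 0x02,
};

// Hypothetical helper: true only when the producer explicitly marked
// the subprogram as pure virtual.
constexpr bool is_marked_pure(std::uint8_t virtuality) {
    return virtuality == DW_VIRTUALITY_pure_virtual;
}
```

For the dump above this returns false for every function, including aFunc() and bFunc() in Interface, so it cannot replace the analysis of overridden functions for this code base.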
Nevertheless, the above shows that we can make a distinction by analysing the implemented functions:
- The Interface class is abstract, as its DW_AT_containing_type is self-referencing.
- The Interface class declares two virtual functions: aFunc() and bFunc().
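The function-based analysis can be sketched as follows. This is only an illustration under several assumptions: ClassInfo, ClassMap, unresolved, is_abstract and example_classes are hypothetical names, functions are matched by name only (overloads and cv-qualifiers are ignored), and every virtual function of a self-referencing root class is assumed to be pure because the dump does not say otherwise.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified summary of one DW_TAG_class_type DIE.
struct ClassInfo {
    std::vector<std::string> bases;           // targets of DW_TAG_inheritance children
    std::set<std::string> virtual_functions;  // children carrying DW_AT_virtuality
    bool self_referencing;                    // DW_AT_containing_type == own offset
};

using ClassMap = std::map<std::string, ClassInfo>;

// Names of the virtual functions still assumed to be pure in class `name`.
std::set<std::string> unresolved(const ClassMap& classes, const std::string& name) {
    auto it = classes.find(name);
    assert(it != classes.end() && "class must be known to the parser");
    const ClassInfo& cls = it->second;
    // Assumption: a self-referencing root is an interface with only pure functions.
    if (cls.self_referencing) return cls.virtual_functions;
    std::set<std::string> pending;
    for (const std::string& base : cls.bases)
        for (const std::string& fn : unresolved(classes, base))
            if (cls.virtual_functions.count(fn) == 0) pending.insert(fn);  // not overridden
    return pending;
}

bool is_abstract(const ClassMap& classes, const std::string& name) {
    return !unresolved(classes, name).empty();
}

// The five classes of the example, filled in by hand from the source and the dump.
ClassMap example_classes() {
    ClassMap c;
    c["Interface"] = ClassInfo{{}, {"aFunc", "bFunc"}, true};
    c["Partial"] = ClassInfo{{"Interface"}, {"aFunc"}, false};
    c["Full"] = ClassInfo{{"Interface"}, {"aFunc", "bFunc"}, false};
    c["Foo"] = ClassInfo{{"Partial"}, {"bFunc"}, false};
    c["Bar"] = ClassInfo{{"Full"}, {}, false};
    return c;
}
```

For the example, is_abstract(example_classes(), "Partial") is true because bFunc() is never redeclared, while Full, Foo and Bar come out concrete. The known weak spots are the root assumption (a polymorphic root with non-pure virtual functions would be misclassified) and a derived class that re-declares a function as pure.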
I don't believe the method is foolproof, but it is the best method I could think of.
My question is: Is there a better way to determine if a class is abstract?