Friday, 16 August 2013

Faster inheritance through directly inlining code

Faster inheritance through directly inlining code

I've come up with a way(not necessarily new) that replaces virtual tables
for switch statements. This method allows inlining virtual functions at
the cost of added memory.
Instead using a table look up, a sort of switch is used
switch (objecttype)
{
case objectA : inlined virtual function call for foo from objectA; break;
case objectB : inlined virtual function call for foo from objectB; break;
case objectC: inlined virtual function call for foo from objectC; break;
default: vtable call;
}
So instead of using a ptr lookup and making a call, a comparison is done.
Inlining of code can be done for known classes.
To make this work well(avoid more than just the function call), objects
would need to store their type. Types would need to be sequential.
e.g.,
class A
{
ushort objectType; // internal id, say for class A it is 1000
ushort objectInc; // internal. represents a sort of offset into the
jump table
}
class B : A
{
unshort objectInc; // one more than A's objectInc, has the same
objectType
}
etc...
Then the switch statement can be made into an efficient jump table
comparing on the objectType(make sure it's correct) and using objectInc
and the code size to jump directly to the virtual function's code(instead
of a bunch of comparisons).
As far as I can tell the downside to this scheme is more memory(larger
classes and more inlined functions) and more compiler complexity but
virtual functions can be directly inlined(the whole switch statement could
be) so there are no wrapping calls. The only extra overhead should be just
a few cycles extra due to a few comparisons and jumps(which is O(1)).
Does anyone have any useful comments on the performance of such a scheme
and why it is not used(I'm sure I'm not the first to think of this)? I
think it would be rather efficient except possibly cache invalidation due
to the comparison but I think on average for base classes, it could be
made within just a few cycles of a direct inlined method call.
BTW, the table could sort of be looked as a list of inlined virtual
function calls for each object derivative.
suppose we have the following
class A
{
void foo();
}
class B : A
{
override void foo();
}
class C : A
{
override void foo();
}
A a = new C();
a.foo(); // but calls fooWrap
/// internal
void fooWrap(A a)
{
switch(a.Type)
{
case A: a.foo(); break; // A.foo() can be inlined here
case B: b.foo(); break; // B.foo() can be inlined here
case C: c.foo(); break; // C.foo() can be inlined here
default: // use vtable lookup, a's type is not known at compile time
}
}
(normally fooWrap would be the vtable lookup)
now fooWrap could be inlined directly too in which case the cost of the
call to foo is just the switch statement, which can be further optimized
by using an efficient jump list.

No comments:

Post a Comment