Hello,
     This email summarizes my questions and suggestions related to 
optimizing the Ip::Address implementation.
Disclaimer: I am writing from a general performance and API principles 
point of view. I am not an IP expert and may miss some critical 
portability concerns, among other things. I am counting on Amos and 
others to audit any ideas worth considering.
Ip::Address is actually a socket address. Besides the IP address itself, 
the class maintains the socket address family and port (at least).
Currently, Ip::Address is designed around a single sockaddr_in6 data 
member that holds the address family, socket port, the IP address 
itself, and some extra info. This is 28 bytes of storage, most of which 
is used only when the address is an IPv6 address.
If we were to deal with IPv4 addresses only, the corresponding "native" 
IPv4 sockaddr_in structure takes 16 bytes, only the first 8 of which 
are normally used.
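To make these sizes easy to check on a given platform, here is a small 
C++ sketch that asks the compiler for them. The constant names are made 
up for illustration. The 16, 8, and 28 byte figures are what common 
Linux and BSD systems report; other platforms may differ.

```cpp
#include <cstddef>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical names, for illustration only.

// Total storage taken by each "native" socket address structure.
constexpr std::size_t Ipv4SockAddrSize = sizeof(sockaddr_in);
constexpr std::size_t Ipv6SockAddrSize = sizeof(sockaddr_in6);

// Bytes of sockaddr_in that precede the sin_zero padding, i.e. the
// family, port, and IPv4 address fields that are normally used.
constexpr std::size_t Ipv4UsedBytes = offsetof(sockaddr_in, sin_zero);

// Storage left unused when an IPv4 socket address is kept in a
// sockaddr_in6-sized member.
constexpr std::size_t Ipv4InIpv6Waste = Ipv6SockAddrSize - Ipv4UsedBytes;
```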
Currently, the Ip::Address implementation strategy can be summarized as 
"convert any input into a full-blown IPv6 address as needed, store that 
IPv6 address, and then convert that IPv6 address into whatever the code 
actually needs". This is a simple, solid approach that probably helped 
eliminate many early IPv6 implementation bugs.
Unfortunately, it is rather expensive performance-wise because the input 
is usually not an IPv6 address and the output is usually not an IPv6 
address either. While each single conversion (with the exception of 
conversion from and to textual representation) is relatively cheap, 
these conversions add up. Besides, the conversions themselves and even 
the "what am I really?" tests are often written rather inefficiently.
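A minimal sketch of the kind of round trip described above, assuming the 
stored form is an IPv4-mapped IPv6 address (::ffff:a.b.c.d). The helper 
names MapToV6(), IsV4Mapped(), and MapToV4() are hypothetical and are 
not the actual Ip::Address methods; the point is only to show that each 
step copies or scans part of a structure.

```cpp
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical helper: "up" conversion of an IPv4 socket address into
// an IPv4-mapped IPv6 socket address.
inline sockaddr_in6 MapToV6(const sockaddr_in &v4)
{
    sockaddr_in6 v6;
    std::memset(&v6, 0, sizeof(v6));
    v6.sin6_family = AF_INET6;
    v6.sin6_port = v4.sin_port; // both are in network byte order
    v6.sin6_addr.s6_addr[10] = 0xff;
    v6.sin6_addr.s6_addr[11] = 0xff;
    std::memcpy(&v6.sin6_addr.s6_addr[12], &v4.sin_addr, sizeof(v4.sin_addr));
    return v6;
}

// Hypothetical "what am I really?" test: inspects the first 12 bytes
// of the stored IPv6 address.
inline bool IsV4Mapped(const sockaddr_in6 &v6)
{
    return IN6_IS_ADDR_V4MAPPED(&v6.sin6_addr) != 0;
}

// Hypothetical helper: "down" conversion back to an IPv4 socket address.
inline sockaddr_in MapToV4(const sockaddr_in6 &v6)
{
    assert(IsV4Mapped(v6));
    sockaddr_in v4;
    std::memset(&v4, 0, sizeof(v4));
    v4.sin_family = AF_INET;
    v4.sin_port = v6.sin6_port;
    std::memcpy(&v4.sin_addr, &v6.sin6_addr.s6_addr[12], sizeof(v4.sin_addr));
    return v4;
}
```

An IPv4-in, IPv4-out caller pays for MapToV6(), at least one 
IsV4Mapped() scan, and MapToV4(), even though it never needed the IPv6 
form.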
For example, to go from an IPv4 socket address to an a.b.c.d textual 
representation of that address, the current code may allocate, 
initialize, use, and discard several sockaddr_in6 structures, do a dozen 
sockaddr_in6 structure scans for certain values, and copy parts of 
sockaddr_in and sockaddr_in6 structures a few times along the way to 
the ntoa() call. After the ntoa() call, the code scans the resulting 
string to find its end.
The old code equivalent of the above? Initialize one IPv4 sockaddr_in 
structure on stack and pass it to ntoa()!
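For comparison, the direct path could look roughly like the sketch 
below. It assumes that "ntoa()" means the standard inet_ntoa() call; 
MakeIpv4() and Ipv4ToText() are hypothetical names used here for 
illustration and are not taken from the old code.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>

// Hypothetical helper: initialize one IPv4 sockaddr_in on the stack
// from host-byte-order values.
inline sockaddr_in MakeIpv4(const std::uint32_t hostOrderIp, const std::uint16_t hostOrderPort)
{
    sockaddr_in sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(hostOrderPort);
    sa.sin_addr.s_addr = htonl(hostOrderIp);
    return sa;
}

// Hypothetical helper: format the address part as a.b.c.d text.
// inet_ntoa() returns a pointer to a static buffer, so the result is
// copied into a std::string before anybody calls inet_ntoa() again.
inline std::string Ipv4ToText(const sockaddr_in &sa)
{
    assert(sa.sin_family == AF_INET);
    return std::string(inet_ntoa(sa.sin_addr));
}
```

No sockaddr_in6 structure is created or scanned anywhere on this path.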
I see three ways to optimize this without breaking correctness:
* one-struct: Keep using a single sockaddr_in6 data member and convert 
everything into an IPv6 socket address as we do now. Optimize each 
conversion step, remove repeated "who am I?" checks during each step, 
and optimize each check itself.
* union: Replace sockaddr_in6 storage with a union of sockaddr_in and 
sockaddr_in6 structures (or their wrappers). The Ip::Address class will 
check the address family part (that must overlap in all socket 
addresses, by design) to know what the address is and then use the right 
structure. This will avoid most conversions for the currently dominating 
IPv4-to-IPv4 path. If the caller does need a different version of the 
address than the one we stored, the "up" or "down" conversion is 
unavoidable and is still handled by the Ip::Address class.
* two-struct: Similar to union, but keeping two separate structures, not 
merged in a union. This costs more memory and whole-object copying 
cycles, but allows us to keep both native and converted versions of the 
addresses. I do not know whether that is useful.
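To illustrate the union option, here is a minimal sketch of a class that 
stores either native structure and uses the overlapping address family 
field to tell which one is valid. SocketAddress and its methods are 
hypothetical names, not a proposed Ip::Address API, and the "up" and 
"down" conversions are deliberately left out.

```cpp
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical sketch of union-based storage; not the real Ip::Address.
class SocketAddress
{
public:
    SocketAddress()
    {
        std::memset(&storage_, 0, sizeof(storage_));
        storage_.any.sa_family = AF_UNSPEC;
    }

    // Store the caller's structure as is, without any conversion.
    explicit SocketAddress(const sockaddr_in &v4): SocketAddress() { storage_.v4 = v4; }
    explicit SocketAddress(const sockaddr_in6 &v6): SocketAddress() { storage_.v6 = v6; }

    // The only "what am I?" check: one comparison of the family field.
    bool isIpv4() const { return family() == AF_INET; }
    bool isIpv6() const { return family() == AF_INET6; }

    // Native access; valid only for the stored address version.
    const sockaddr_in &ipv4() const { assert(isIpv4()); return storage_.v4; }
    const sockaddr_in6 &ipv6() const { assert(isIpv6()); return storage_.v6; }

private:
    // The family field starts every socket address structure, so it
    // can be read through the generic sockaddr member.
    sa_family_t family() const { return storage_.any.sa_family; }

    union {
        sockaddr any;
        sockaddr_in v4;
        sockaddr_in6 v6;
    } storage_;
};
```

With this layout, the IPv4-to-IPv4 path is one structure copy in and one 
reference out. The two-struct variant would replace the union with two 
separate members plus some flag saying which of them are filled in.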
The "one struct" approach is simpler to implement given the current code 
state, but the "union" approach should be faster as long as IPv4-to-IPv4 
paths are common. The union approach may also yield overall simpler code 
as there will be very few "what am I now?" checks and concerns. The 
two-struct approach attempts to minimize back-and-forth conversion costs 
by giving access to both v4 and v6 versions at the same time.
FWIW, on a conceptual level, the boost.asio library uses a union approach 
(only one address version is valid per Address object) while their 
implementation uses the two-struct approach (there is a place to store 
both IPv4 and IPv6 versions). This does not surprise me much because 
this Boost library does not seem to care about RAM/copying overheads.
Interestingly enough, boost.asio does not support transparent v4-v6 
conversions (AFAICT): To do a conversion, the calling code needs to know 
what IP version the generic address is storing, get that 
version-specific address, and then ask it to convert to another version. 
Perhaps that means that nobody needs a general "give me IPv6 regardless 
of what you really have" API?
https://svn.boost.org/trac/boost/browser/trunk/boost/asio/ip/address.hpp
There are some secondary optimizations and API changes that I would like 
to discuss, but let's first see if there is a consensus on which overall 
design approach is the way to go. There may be more options than the 
three outlined above.
Thank you,
Alex.
Received on Thu Nov 18 2010 - 20:27:42 MST