Hi Robert,
I won't volunteer to 'help' per se, but I will volunteer to peer 
interestedly over your shoulder the whole time.  I might even point at 
stuff, and say "Hmmmm" on occasion.
OK, that's my way of saying this sounds like an interesting area of 
work.  I was just looking at that part of the code this past weekend 
(it hasn't been touched since 2.2STABLE5+patch, as far as I could 
tell) for a client who was curious about byte range requests and 
Squid.  What I'll actually do in the way of helping is keep up with 
your development and pitch in where I can with testing and idea 
bouncing.  I might even attempt to write a line or two of code.
Let me know if there's anything I can do to help get things rolling.
First thoughts:
This probably requires a disk format or swap.state change, whether we 
break the object up or keep all the pieces in one place.
It breaks the storetree code that is going to be merged in from the 
squidng project.  storetree hashes the URL for storage indexing and 
keeps no record of what was just stored.  If the object is not 
complete, reiser_raw won't know that and will serve it incomplete, so 
a new field would also be needed in reiser_raw.  Which I guess applies 
to the standard object store as well...so chalk that one up as a 
"necessary change" for DEVEL2.5.
Possibilities:
One idea: rewrite the file, merging the 'pieces' in as new pieces come 
in, until we have a complete object.  That adds significant write 
overhead, but it keeps the index simple, and objects end up as large 
as they can be, reducing read and seek overhead.  I think we should 
avoid fragmenting the object store, if possible, for performance 
reasons.  But that's just a hunch; I could be wrong, since range 
requests are mostly being used on big objects anyway, I guess?  This 
plan also doesn't account for 'holes' in the ranges being requested. 
How likely are those?  And in such a case, would it be wiser to just 
accept fetching the whole object (or the data between the two points) 
rather than take on the complexity of keeping several separate parts 
of one object at different locations in the store?
I'll stop talking for now until I actually understand what I'm talking 
about.
Robert Collins wrote:
> Hi everyone,
>     I've added a new branch on sourceforge for working on storing and returning HITS from partial responses. I don't know how fast
> I'll move it along :]. If anyone wants to collaborate on it then fantastic.
> 
> for reference, the tag is storepartial.
> 
> My rough approach plan is to
> a) get strong validation working. (if it's not already)
> b) figure out how best to store multiple sections from a URL in the object store. I.e., should we consider each non-overlapping range a
> separate URI? Or perhaps store a series of sections in the ondisk object with common details in the meta data and then 1..n sections
> of defined length and offset? I haven't put a great deal of effort into this, and I'm hoping to avoid invalidating the existing
> caches when it happens.
> c) get squid caching the partial responses and serving hits that are completely covered by in-cache range data.
> d) look into extra optimisations (for example, if we have a partial response in cache, ask the origin for a HEAD, and if a strong
> validator comparison succeeds send one or more range requests to the origin, fulfilling the client from the store and from the
> origin.)
> 
> Rob
                                   --
                      Joe Cooper <joe@swelltech.com>
                  Affordable Web Caching Proxy Appliances
                         http://www.swelltech.com
Received on Tue Dec 12 2000 - 02:33:20 MST