Apologies in advance if this is inappropriate for this mailing
list...
I am writing a Perl tool which "crawls" the Squid cache directories
looking for cached files matching some criteria, so that they can be
processed in some way. Currently I'm working from Squid 2.2.STABLE5
plus some patches, but I understand that the format of the cache
content files themselves is supposed to be fairly stable going forward?
My understanding from the Squid Programmer's Guide
<http://www.squid-cache.org/Doc/Prog-Guide/prog-guide-23.html>, plus
inspection of actual cache contents, was that a cache file is supposed
to consist of a string of meta-data tuples (type=byte, length=int,
value). I inferred by eyeballing the data that this was followed by
the actual HTTP headers, a blank line, and the data associated with the
object.
This doesn't quite seem to match what I'm parsing, though it's fairly
close. What I'm actually finding in the files is more like this:
Meta-data header for the meta-data itself (type=0x03, header_meta_length, value={
    meta-data tuple (store_meta_type, meta_length, value);
    meta-data tuple (store_meta_type, meta_length, value);
    ...
    0x00 0x00
}
HTTP headers...
HTTP headers...
(blank line)
Object data
EOF
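For illustration only, here is a minimal Python sketch of reading the outer meta-data header in the layout above. (The tool itself is in Perl.) It makes three assumptions: the leading type byte is 0x03, header_meta_length is a 4-byte int in the host's byte order, and that length counts from the start of the file, so it covers the type byte and the length int themselves. The function name split_meta_block is hypothetical and does not come from Squid.

```python
import struct

# Assumptions (not confirmed from the Squid sources):
#   - the file starts with a single type byte 0x03 ("type=03x" above)
#   - header_meta_length is a 4-byte int in native byte order ("=i")
#   - header_meta_length counts from the start of the file
META_HEADER_TYPE = 0x03
INT_FMT = "=i"
INT_SIZE = struct.calcsize(INT_FMT)


def split_meta_block(data: bytes):
    """Hypothetical helper: return (meta_tuples_bytes, rest_of_file)."""
    if len(data) < 1 + INT_SIZE:
        raise ValueError("file too short for a meta-data header")
    if data[0] != META_HEADER_TYPE:
        raise ValueError(f"unexpected leading type byte 0x{data[0]:02x}")
    (header_len,) = struct.unpack_from(INT_FMT, data, 1)
    # The declared length must at least cover the 5-byte prefix and must
    # not run past the end of the file.
    if not (1 + INT_SIZE <= header_len <= len(data)):
        raise ValueError(f"implausible header_meta_length {header_len}")
    # Bytes between the prefix and header_len hold the meta-data tuples;
    # everything after that is HTTP headers plus object data.
    return data[1 + INT_SIZE:header_len], data[header_len:]
```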
For the meta-data, meta_length appears to be an integer (possibly
unsigned) in the host's native format, and appears to include the size
in bytes of the meta_type code and of the meta_length integer itself.
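One way to test the meta_length observation is to walk the tuples under both readings and see which one ends cleanly. The following Python sketch uses the same assumptions: a 4-byte native int, and a 0x00 type byte as the terminator, as in the "0x00 0x00" seen above. walk_tuples is a hypothetical name.

```python
import struct

INT_FMT = "=i"                     # assumption: 4-byte int, native byte order
INT_SIZE = struct.calcsize(INT_FMT)
TLV_HEADER = 1 + INT_SIZE          # one type byte plus one length int


def walk_tuples(meta: bytes, length_is_inclusive: bool):
    """Hypothetical helper: walk (type, length, value) tuples.

    length_is_inclusive=True means meta_length counts the type byte and the
    length int as well as the value; False means it counts the value only.
    Returns (tuples, ok); ok is True if the walk ended on a 0x00 type byte
    or exactly at the end of the block, False if this reading overran it.
    """
    tuples = []
    pos = 0
    while pos < len(meta):
        mtype = meta[pos]
        if mtype == 0x00:              # terminator observed in real files
            return tuples, True
        if pos + TLV_HEADER > len(meta):
            return tuples, False       # truncated tuple header
        (mlen,) = struct.unpack_from(INT_FMT, meta, pos + 1)
        value_len = mlen - TLV_HEADER if length_is_inclusive else mlen
        end = pos + TLV_HEADER + value_len
        if value_len < 0 or end > len(meta):
            return tuples, False       # this reading does not fit the block
        tuples.append((mtype, meta[pos + TLV_HEADER:end]))
        pos = end
    return tuples, True
```

Suppose only one reading returns ok=True on real cache files, and its decoded values look sensible (for example, a tuple holding the URL). That is reasonable evidence for that reading. A wrong reading can occasionally land on a zero byte by chance, so one file is not conclusive.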
So, in effect, there are three different parsing regimes for
interpreting the file: one for the meta-data (type/length-driven,
within the initial portion of the file whose length is given by the
meta-data header); one for the HTTP headers (line-oriented); and one
for the object data itself (a binary read from the end of the HTTP
headers to EOF).
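The HTTP-header and object-data regimes can be sketched the same way, operating on whatever follows the meta-data block. split_http is a hypothetical name. It assumes the headers end at the first blank line, accepting either CRLF or a bare LF. It decodes the headers as ISO-8859-1 so that no byte can fail to decode.

```python
def split_http(rest: bytes):
    """Hypothetical helper: return (header_lines, body_bytes).

    Headers are line-oriented and end at the first blank line; everything
    after that blank line is returned untouched as binary object data.
    """
    # Find the earliest blank-line separator, CRLF style or bare-LF style.
    found = [(rest.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(i, sep) for i, sep in found if i != -1]
    if not found:
        raise ValueError("no blank line ending the HTTP headers")
    idx, sep = min(found)
    header_text = rest[:idx].decode("iso-8859-1")
    lines = header_text.replace("\r\n", "\n").split("\n")
    return lines, rest[idx + len(sep):]
```

The body is returned as raw bytes, matching a binary read to EOF, so binary objects pass through unchanged.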
Is this correct? My tool is running, so I am parsing these files at
least partially successfully, but I would like to know whether I'm
actually getting correct results, or how to fix the tool so that it is
trustworthy. I have tried to find where in the Squid sources I could
confirm this, but because all of the store routines are event-driven
and coroutine-style, I can't work out exactly what object type is
being written to disk, or even which routines write it.
A few words from someone who understands the storage structures would
be very helpful. Thanks,
-- Clifton
-- Clifton Royston -- LavaNet Systems Architect -- cliftonr@lava.net
"An absolute monarch would be absolutely wise and good. But no man is strong enough to have no interest. Therefore the best king would be Pure Chance. It is Pure Chance that rules the Universe; therefore, and only therefore, life is good." - AC

Received on Thu Jan 27 2000 - 03:13:32 MST