Thanks for the excellent advice, Robert.  More comments inline below.
Robert Collins wrote:
 > I've been working on several things in this area Joe....
 >
 > From a 'where to make the change' angle you have two choices:
 > client_side, where you make the change on every request (read: CPU
 > hog), or in http.c (aka server_side!) where you can modify the data
 > coming into squid.
Right you are.  Forgetting client_side.c and deleting that source tree.
  All wrong, and I'm glad I asked before going too far down it.
 > You can look at the changes to http.c in the te branch on
 > squid.sourceforge.net to see how to alter incoming data. The filter
 > model that Patrick McManus put together for te codings would also
 > make sense for in-squid data modifications (process the data
 > received chunk by received chunk). (Although it wouldn't be marked
 > as a te coding :-]).
 >
 > The advantage of altering the incoming data is that a) the
 > modifications get cached, and b) after the first retrieval, you can
 > recalculate the content-length for future requests, keeping http/1.0
 > persistent conns happy. I don't suggest you touch the te code just
 > yet, unless this is a medium term project :-]
 >
 > In the filter code you could use callbacks if you need external
 > helpers (I'm already considering the need for that), but you'll have
 > to split the http function that calls perform_te (for me - for you
 > performurlrewrite/...). If you want to head down that path let me
 > know and I'll split it up for you (save duplicate work).
I'll look at it right now and see if I can figure out what it's doing 
and what I would need to do with it.  I welcome any guidance/assistance 
you care to offer.
 >> Am I an idiot?  It appears to me that it is possible to read and work on
 >> all of the object in client_side.c, and the noanim patch posted here a
 >> few weeks ago does just that without problems.  But I very well could be
 >> missing something.
 >
 >
 > It doesn't cache the results. String matching on blocked data isn't
 > the cheapest operation, and doing it n times without caching the
 > results seems silly to me.
I /knew/ it!  I am an idiot, after all. ;-)  So I'll put it on the 
server side and cache the results.
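
A minimal sketch of the "filter once on the way in, then cache" idea, in Python rather than Squid's C. Every name here (FilteringCache, fetch, content_filter) is hypothetical and not part of Squid. It only illustrates that the filter runs once per object and that Content-Length can be recomputed from the stored, already-modified body.

```python
# Illustrative only: these names are assumptions, not Squid APIs.

class FilteringCache:
    """Fetch an object once, filter it once, and serve the stored result."""

    def __init__(self, fetch, content_filter):
        self._fetch = fetch              # callable: url -> bytes (origin fetch)
        self._filter = content_filter    # callable: bytes -> bytes (rewrite)
        self._store = {}                 # url -> (headers, filtered body)

    def get(self, url):
        if url not in self._store:
            # Server side: modify the data as it comes in, before storing it.
            body = self._filter(self._fetch(url))
            # The stored body is final, so its length is known for all
            # later requests.
            headers = {"Content-Length": str(len(body))}
            self._store[url] = (headers, body)
        # Cache hit: no string matching is repeated here.
        return self._store[url]
```

Doing the same rewrite on the client side would call the filter on every request instead of once per cached object.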
 >> Assuming it is possible, can I use the ACL interface to generate the
 >> match lists, or do I need to come up with a method to handle the match
 >> string and the replacement string?  It would be nice to have a named ACL
 >> for the match strings, and it seems reasonable that this would work.  So
 >> can I run /anything/, including whole html pages, through a regex or
 >> string matching ACL?  Anyone have pointers for how to tackle this one?
 >
 >
 > I think you need a new ACL - you may have data coming in and be
 > sitting on the block boundary
 > (ie s/jobloggs/johnloggs/ - the first block of data you receive may be
 >
 > asdasdasdasdasdjob
 > and the second block
 > loggs is a strange person
 >
 > you will need to buffer the possible string hit 'job' and not send it
 > on until you've seen the second block, or hit EOF and flushed any
 > buffered data).
Ok.  I think I can manage that, and it's something I didn't think of at 
all.  This is going to be a little more complicated than I'd 
anticipated.  But I think I can manage it.
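
The block-boundary buffering Robert describes can be sketched roughly as follows. This is a Python illustration under my own assumptions, not Squid code: the class name StreamReplacer and its feed/flush methods are hypothetical. It holds back the longest tail of the data seen so far that could still be the start of the match string, and only releases it once the next block (or EOF) settles the question.

```python
# Illustrative only: StreamReplacer is a hypothetical name, not a Squid API.

class StreamReplacer:
    """Replace `pattern` with `replacement` in data that arrives in blocks."""

    def __init__(self, pattern: bytes, replacement: bytes):
        if not pattern:
            raise ValueError("pattern must not be empty")
        self.pattern = pattern
        self.replacement = replacement
        self._pending = b""   # held-back bytes that may begin a match

    def _partial_match_len(self, tail: bytes) -> int:
        # Longest proper prefix of the pattern that `tail` ends with.
        for k in range(min(len(self.pattern) - 1, len(tail)), 0, -1):
            if tail.endswith(self.pattern[:k]):
                return k
        return 0

    def feed(self, chunk: bytes) -> bytes:
        """Process one block and return the bytes that are safe to send on."""
        data = self._pending + chunk
        out = []
        pos = 0
        while True:
            hit = data.find(self.pattern, pos)
            if hit < 0:
                break
            out.append(data[pos:hit])
            out.append(self.replacement)
            pos = hit + len(self.pattern)
        tail = data[pos:]
        keep = self._partial_match_len(tail)
        out.append(tail[:len(tail) - keep])
        self._pending = tail[len(tail) - keep:]
        return b"".join(out)

    def flush(self) -> bytes:
        """Call at EOF: whatever is still held back was not a match."""
        out, self._pending = self._pending, b""
        return out
```

With the s/jobloggs/johnloggs/ example, feeding "asdasdasdasdasdjob" sends on everything except "job", and feeding "loggs is a strange person" afterwards completes the match and emits the rewritten text. If EOF arrives right after "job", flush() returns it unchanged.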
I'm sure I'll be back with more questions once I've begun the 
implementation.
Thanks.
-- Joe Cooper <joe@swelltech.com> http://www.swelltech.com
Received on Mon Jan 15 2001 - 16:45:23 MST