On Thu, 16 Dec 2004, Rod Walker wrote:
> I'm hoping to use Squid for the retrieval and caching of large data
> files (~1GB) for High Energy Physics applications. One of the considerations
> is the file transfer rates relative to gridFTP, which can use multiple
> parallel streams to increase the transfer rate.
>
> On googling around a little I found several multi-streamed 'wget-like'
> http clients, e.g. aget, prozilla, that get transfer speeds comparable to
> gridftp. These do not respect the http_proxy environment variable and do not
> use the squid cache, probably for the very good reason that splitting a
> file into several chunks for transfer will make it very hard to cache.
It is in theory not that hard to cache, but Squid still lacks some of the
needed capabilities. What is needed to deal with this properly is the
ability to cache partial objects. There are also some minor implications
relating to the ETag header, but these are pretty minimal for this specific
question.
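
To make the Range/ETag interaction concrete, here is a rough sketch (Python,
with a made-up URL; 3128 is just Squid's default port) of what a single range
fetch through the proxy looks like. Each 206 reply carries an ETag and a
Content-Range, and a cache storing partial objects would have to key the
stored ranges on that ETag so chunks from different versions of the object
are never stitched together:

import os
import urllib.request

# Illustrative only: hypothetical data file, local Squid on its default port.
url = "http://data.example.org/events.raw"
os.environ.setdefault("http_proxy", "http://localhost:3128")

# Ask for the first 1 MB of the object.
req = urllib.request.Request(url, headers={"Range": "bytes=0-1048575"})
with urllib.request.urlopen(req) as resp:
    print(resp.status)                        # 206 if the origin honours Range
    print(resp.headers.get("ETag"))           # object version a cache must key on
    print(resp.headers.get("Content-Range"))  # e.g. bytes 0-1048575/1073741824
    first_chunk = resp.read()
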
Doing the stream splitting/merging within Squid in response to a single
request is not that feasible. The problem is that the proxy would need to
hold back possibly huge amounts of data destined for the client until all
the data preceding it has been received and sent. This causes a number of
problems with timeouts, buffering, bandwidth usage etc. It is by far best if
the client initiates the multi-stream transfer.
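
For what it is worth, "the client initiates the multi-stream transfer" could
look roughly like the sketch below (URL, chunk size and stream count are made
up, and this is not an existing tool): each range is an ordinary HTTP request
sent through whatever proxy http_proxy points at, so every piece is something
a proxy could in principle cache once partial-object support exists.

import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Sketch of a client-driven multi-stream HTTP download. All names and
# sizes here are illustrative only.
URL = "http://data.example.org/events.raw"
os.environ.setdefault("http_proxy", "http://localhost:3128")
CHUNK = 64 * 1024 * 1024          # bytes per range request
STREAMS = 4                       # parallel connections

def total_size(url):
    # HEAD request to learn the object length.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])

def fetch_range(url, start, end):
    # Fetch one byte range; the proxy sees an ordinary HTTP request.
    req = urllib.request.Request(url,
                                 headers={"Range": "bytes=%d-%d" % (start, end)})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()

def download(url, out_path):
    size = total_size(url)
    ranges = [(off, min(off + CHUNK, size) - 1) for off in range(0, size, CHUNK)]
    with open(out_path, "wb") as out, \
         ThreadPoolExecutor(max_workers=STREAMS) as pool:
        for start, data in pool.map(lambda r: fetch_range(url, *r), ranges):
            out.seek(start)
            out.write(data)

if __name__ == "__main__":
    download(URL, "events.raw")

Note that stitching the pieces back together is the client's problem here;
the proxy just sees N independent range requests, which is exactly why this
is easier on the Squid side than merging streams for a single request.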
Regards
Henrik