I'll stop posting soon I think... :-b
> I guess what stephen is talking about here is an extent based
> filesystem.
>
> There's a few points to keep in mind. We're not implementing a general
> purpose filesystem here...
>
> #1. We delete as often as we create.
> #2. We generally know how big things are going to end up before we
> create them.
> #3. We normally sit fairly close to full. (i.e. 95% used).
> #4. Objects are normally fairly small. (i.e. 5 orders of mag less than
> the size of the disk or better).
> #5. Most objects are 'soft', in that we're allowed to delete them
> almost any time we like if we really need to.
>
> #6. We never do anything other than sequential reading or writing.
> #7. We never need to partial truncate. We only ever delete entirely.
> #8. We only ever append to a file (never seek into the middle and
> write).
>
> These points have some interesting consequences.
>
> #2 means that we can be very very successful at avoiding
> fragmentation.
> #4 means that we should have a very high success rate at finding
> places to put objects such that the entire object is in a
> single extent.
> #5 says that we can 'fix' fragmentation if it gets disastrous.
> #6 says that extents are really groovy, because it makes sense to say
> "1234 blocks starting at 34" rather than "1234, 1235, 1236, ... "
> #7 further justifies extents.
>
> Noting that there's a LOT of literature on fragmentation, and a decent
> amount on extent based filesystems.
>
> Noting that there's no need to embed the extent meta info into the
> extent itself. It would be more normal for an inode like:
> typedef struct {
>     unsigned long start;    /* first block of the extent */
>     unsigned long len;      /* number of blocks in the extent */
> } Extent;
>
> struct inode {
>     ....
>     Extent e[10];   /* direct extents */
>     Extent ie;      /* single-indirect: an extent of Extent structures */
>     Extent die;     /* doubly-indirect extents */
>     Extent tie;     /* triply-indirect extents */
> };
>
> where 'ie' is an extent of Extent structures, die is a doubly indirected
> block of extent structures etc. etc.
>
> Noting that in 99.9999% of cases it would not even need an 'ie', let
> alone a 'die' or a 'tie' (because you know how big the object will be,
> so you simply put it into free space that is large enough to contain
> it, so zero fragmentation. Only a very small fraction of the time will
> you ever need to fragment [by virtue of #3, #4 and even #5 at a
> pinch]).
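
A rough sketch of the allocation idea described above: since the object's
final size is known up front, allocation reduces to finding one free extent
large enough for the whole object. The free-list representation and names
below are illustrative only, not actual Squid code.

    #include <stdlib.h>

    typedef struct {
        unsigned long start;    /* first block of the extent */
        unsigned long len;      /* number of blocks in the extent */
    } Extent;

    typedef struct FreeExtent {
        Extent ext;
        struct FreeExtent *next;
    } FreeExtent;

    /* First-fit: carve 'blocks_needed' blocks out of the first free region
     * large enough to hold them.  Returns 0 on success, -1 if no single
     * free region is big enough (the rare case where the object would
     * actually have to be split across more than one extent). */
    static int
    alloc_one_extent(FreeExtent **free_list, unsigned long blocks_needed,
                     Extent *out)
    {
        FreeExtent **pp;

        for (pp = free_list; *pp != NULL; pp = &(*pp)->next) {
            FreeExtent *f = *pp;
            if (f->ext.len < blocks_needed)
                continue;
            out->start = f->ext.start;
            out->len = blocks_needed;
            if (f->ext.len == blocks_needed) {
                *pp = f->next;                  /* exact fit: unlink and free */
                free(f);
            } else {
                f->ext.start += blocks_needed;  /* shrink the free region */
                f->ext.len   -= blocks_needed;
            }
            return 0;
        }
        return -1;  /* only now would the file need a second extent */
    }
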
If you apply all your points #1 - #8 then there's no need even for an
extent based filesystem, so why not just do something like CNFS (the
Cyclical News File System)?
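
For reference, the CNFS idea is essentially a cyclic buffer: objects are
appended at a single write pointer that wraps around a fixed-size region,
silently overwriting the oldest data. A minimal sketch (illustrative names
only, not the real CNFS on-disk format):

    /* Cyclic store: no free list and no fragmentation, but also no
     * persistence guarantee: old objects are simply overwritten. */
    typedef struct {
        unsigned long size;     /* total blocks in the cyclic region */
        unsigned long head;     /* next block to write */
    } CyclicStore;

    /* Returns the starting block for an object of 'blocks' blocks
     * (assumes blocks <= size).  Whatever previously occupied those
     * blocks is considered gone. */
    static unsigned long
    cyclic_alloc(CyclicStore *cs, unsigned long blocks)
    {
        unsigned long start;

        if (cs->head + blocks > cs->size)
            cs->head = 0;       /* wrap rather than split across the end */
        start = cs->head;
        cs->head += blocks;
        return start;
    }
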
Extent based filesystems work very well for a small number of transfers of
large files, such as database operations.
As the files get smaller the win from using extents starts to drop
dramatically. Once we get into the range of lots of little files (i.e.
squid) there will be a need to constantly move stuff about (or delete, as
you say) to prevent excessive fragmentation. The extent based filesystem I
know of in common use (Irix EFS) has problems with lots of little files
because it spends all of its time moving stuff about to keep fragmentation
from hurting the bigger files. Solaris's UFS was found to be 4 times faster
than EFS on a news server in testing we performed at UniMelb.
If, as you say, you start deleting files in order to reduce fragmentation,
you still have to inform the user process that you've done that.
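
As a purely hypothetical illustration of what 'informing the user process'
could look like at the interface level (none of these names exist in Squid
or any real filesystem API):

    /* Hypothetical notification hook: when the filesystem unilaterally
     * reclaims a 'soft' object, it calls back into the application so the
     * cache index can forget that object too. */
    typedef void (*reclaim_cb)(const char *object_name, void *app_data);

    struct reclaim_hook {
        reclaim_cb cb;
        void *app_data;
    };

    /* The storage layer would call this once per object it deleted. */
    static void
    notify_reclaimed(const struct reclaim_hook *hook, const char *object_name)
    {
        if (hook && hook->cb)
            hook->cb(object_name, hook->app_data);
    }
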
I'm not saying it can't be done or that it won't work. I'm voicing my
doubts. A CNFS style filesystem is the ultimate solution if you pursue the
EFS path with the points you have raised. If you still want to maintain
some semblance of the current file persistence behaviour then a stripped
down UFS will give you that with only a small performance hit compared to
an all out CNFS style solution.
Stew.
--
Stewart Forster (Snr. Development Engineer)
connect.com.au pty ltd, Level 9, 114 Albert Rd, Sth Melbourne, VIC 3205, Aust.
Email: slf@connect.com.au   Phone: +61 3 9251-3684   Fax: +61 3 9251-3666