

BootSector (Joined: 2018 Feb 18)
Is there an HFS expert in the house?

Since I've been working with HFS disk images a lot more lately, I've started studying the file system itself in detail. Understanding all the data structures and where the data lives has various advantages, especially if I end up encountering a damaged disk where partial or complete recovery might be possible. (I might also like to write some utilities someday.)

I've made a lot of progress but there is still one question I have that I can't come up with a satisfying answer for. I'm hoping someone out there might already know.

The documentation in Inside Macintosh: Files assigns a CNID to both the Catalog and the Extents Overflow files. These never appear in the catalog, since the two files are special and don't show up as normal files on the disk. But if the catalog file grows large and fragmented enough, records for the catalog file can appear in the extents file under the catalog's CNID. I've seen this happen and it seems to work just fine.
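
For reference, here's a rough C sketch of the reserved IDs and the extents overflow key layout as I understand them from Inside Macintosh: Files. The enum names are my own; the xdr/xkr field names are what IM uses, but double-check against the real interfaces before trusting any of this (and remember everything is big-endian and packed on disk, so this is for illustration, not for reading raw bytes directly):

#include <stdint.h>

/* Reserved catalog node IDs, as I read them in IM: Files. */
enum {
    kRootParentCNID   = 1,  /* fsRtParID: "parent" of the root directory */
    kRootDirCNID      = 2,  /* fsRtDirID: the root directory itself      */
    kExtentsFileCNID  = 3,  /* the extents overflow file                 */
    kCatalogFileCNID  = 4,  /* the catalog file                          */
    kBadBlockFileCNID = 5,  /* the bad allocation block file             */
    kFirstUserCNID    = 16  /* first ID given to ordinary files/folders  */
};

/* One extent: a run of contiguous allocation blocks. */
typedef struct {
    uint16_t xdrStABN;      /* first allocation block of the run */
    uint16_t xdrNumABlks;   /* length of the run in blocks       */
} ExtDescriptor;

typedef ExtDescriptor ExtDataRec[3];   /* extent records come in threes */

/* Key of a leaf record in the extents overflow B*-tree.  A record
   describing the catalog file's own fragments is keyed with
   xkrFNum == kCatalogFileCNID, which is what I've seen on disk. */
typedef struct {
    uint8_t  xkrKeyLen;     /* key length (7 for HFS)                  */
    uint8_t  xkrFkType;     /* 0x00 = data fork, 0xFF = resource fork  */
    uint32_t xkrFNum;       /* CNID of the file the extents belong to  */
    uint16_t xkrFABN;       /* file allocation block where it starts   */
} ExtKeyRec;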

In principle at least, it seems possible that the extents file itself could also become large and fragmented enough that it would need extent overflow records of its own. This is obviously a horrible fringe case and creates all kinds of complexity, like having to make sure the records for later extents are always stored in the earlier extents of the file (or they couldn't be found). But the existence of the CNID seems to imply this is possible.

I've tried to create this state with a worst-case disk where two large files were interleaved so that each sector was an individual extent, and it still didn't come close to filling up the three "root" extents for the file stored in the MDB.

So it seems like the system is designed such that the overflow file is always over-provisioned to ensure this never actually happens. Does anyone know for sure?

It just seems odd that the ID was assigned if it's never used, but I guess this could just be for completeness. If this ever does happen, it seems like such an obscure case that a utility could probably just treat it as unsupported and refuse to work with the image. Or maybe just defragment the image at that point, which kind of makes the whole thing moot anyway.

Thanks!!

Comments

OpenSourceMac (Joined: 2019 Jan 21)

I was a bit underwhelmed by iDefrag ( https://macintoshgarden.org/apps/idefrag-173 ) until I realized that you don't have to do the SUPER-SLOW "Optimize" function, but instead can do a quick "Online" defrag and then the "Metadata" defrag that does exactly what you are describing -- it is like night and day. I have one drive with 1,460 files on it, and even when I use only one processor it will populate all icons in about 1/10th of a second after defragging this way. Normally it takes like 20 seconds, so a HUGE improvement.

BootSector (Joined: 2018 Feb 18)

I'm guessing what is happening in this case, with the "quick online" defragment, is that it just defragments the catalog file and possibly rebalances the catalog B-tree. (B-trees are supposed to be self-balancing, but in my experiments with HFS disks it seems the catalog tree quickly becomes suboptimal as you delete files.) This would be pretty quick to do even for a large disk and would probably have a noticeable impact on reading the contents of folders.

The "optimize" function is probably a more traditional defragment which completely joins all the pieces of each file so that each file is in a single contiguous block. This would speed up reading all the individual files but each file has to be done separately on each file (twice if it has data and resource forks) to see any benefit so it would take much longer.

It's funny you'd mention this, since as I've been working on HFS I got to thinking about how, back in the day, we never talked about defragmenting Macintosh disks at all. I can find no official Apple utility that ever did this, and searching Macintosh Garden for it gives very few results beyond the iDefrag you already pointed out. I always remember running defrags on my PC's FAT disks but never on my Macs. Odd, since unlike more modern file systems, HFS seems just as prone to fragmentation as FAT is.

That said, defragmenting wasn't my direct question. Basically, what I want to know is whether the special file HFS uses to keep track of all the fragments of files on the disk can itself become fragmented enough that it also needs to keep track of its own pieces.

krausjxotv (Joined: 2019 Mar 31)

I often used DiskExpress to defragment hard drives, https://macintoshgarden.org/apps/diskexpress-ii-version-22

OpenSourceMac (Joined: 2019 Jan 21)

Actually, 'Quick' fully defrags the files but doesn't compact them (a different option). Then if you run 'Metadata' it defrags the catalog, Spotlight, and the B-tree.

Kinda Rocks Actually.

sfp1954 (Joined: 2013 Dec 29)

I used Norton's Speed Disk many times. So everyone I knew defragmented their drives on a regular basis.

24bit (Joined: 2010 Nov 19)

Exactly what I thought. Wink

adespoton (Joined: 2015 Feb 15)

If you look at how Speed Disk works, it does multiple passes for different parts of the drive structure. It defragments the catalog and the B-tree, and then not only defragments files but also sorts them on the physical disk by file type, so that smaller files are stored near the center of the spindle and large files near the outside. This means that when you need fast I/O loading lots of little files, you'll find them on the part of the platter that gets you back to the start of the file faster.

Most of what I knew about HFS physical and logical storage I forgot well over a decade ago though.

BootSector (Joined: 2018 Feb 18)

For anyone else who may care now or in the future, I've continued to experiment with this and I've concluded that the Extent Overflow File itself cannot have more than the three extents that are stored in the MDB. This means it will never contain any records referring to itself and so any process reading the structure doesn't need to look for additional later extents beyond what it can find in the MDB. Likewise, code trying to update the file should never try to create a fourth extent or create any records in the extent file that refer back to it.

First, as I alluded to earlier, upon initialization of the volume the first extent created for the file is large enough that it's unlikely to ever fill up completely under typical usage patterns. If the second or third extent does need to be created, the system tries to make it the same size as the first (there has to be an open space at least that large on the disk). It appears that if the three extents stored in the MDB are all full-sized, that's just enough space to hold the records for the case where every allocation block on the disk is an individual extent -- an extreme worst case that's impractical to ever approach.
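
For anyone who wants to sanity-check that "just enough space" claim, here's the back-of-the-envelope arithmetic I'd use, as a C sketch. All the constants are my assumptions from Inside Macintosh (512-byte B-tree nodes, 8-byte extent keys, 12-byte extent data records, roughly 14 bytes of node descriptor and a 2-byte offset per record), and it ignores index nodes, the header node and the map, so treat the result as a rough estimate to compare against drXTClpSiz on a real image, not an exact figure:

#include <stdio.h>

/* Rough estimate of the overflow space the pathological case needs:
   every allocation block on the volume is a one-block extent. */
static unsigned long WorstCaseOverflowBytes(unsigned long numAllocBlocks)
{
    const unsigned long kNodeSize    = 512;          /* HFS B*-tree node     */
    const unsigned long kNodeHeader  = 14;           /* node descriptor      */
    const unsigned long kRecordBytes = 8 + 12 + 2;   /* key + data + offset  */
    unsigned long recsPerLeaf = (kNodeSize - kNodeHeader) / kRecordBytes;

    /* Each overflow record holds three extents.  I'm ignoring that the
       first three extents of each fork already live in the catalog
       record, which roughly cancels against ignoring the index nodes. */
    unsigned long records = (numAllocBlocks + 2) / 3;
    unsigned long leaves  = (records + recsPerLeaf - 1) / recsPerLeaf;
    return leaves * kNodeSize;
}

int main(void)
{
    /* 65,535 is the most allocation blocks an HFS volume can have. */
    printf("worst case needs roughly %lu bytes of leaf nodes\n",
           WorstCaseOverflowBytes(65535UL));
    return 0;
}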

Where you might run into trouble, under more unusual amounts of fragmentation, is when there aren't any gaps large enough to create the second or third extents at full size. In that case you could potentially fill the file up to the point where a fourth extent would be needed, but that would be difficult to reach through typical means, either intentionally or unintentionally. You'd probably have to incrementally lengthen a number of files many times so that they became all intertwined. That's not something that could be easily done, or that I even attempted.

Instead, I tried to create this state artificially by manipulating the data structures on the disk. One thing I tried was modifying the size of the extents file so that each extent was smaller and much easier to fill up. I also tried a different approach where I left the file itself alone but marked a tight pattern of used blocks in the volume bitmap, so that when I copied a file in it quickly became highly fragmented. Then I just removed the flags I had set and was able to copy in more files, which also became highly fragmented. This also had the effect of forcing the second and third extents of the overflow file to be much smaller, due to the lack of a large opening on the disk.
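
In case anyone wants to reproduce the volume-bitmap trick, this is the gist of it as a C sketch working on a raw copy of the bitmap pulled out of the image. Reading it from the sector given by drVBMSt and writing it back afterwards is left out, I'm assuming the usual Mac most-significant-bit-first bit order, and the exact pattern (every other block here) doesn't matter much:

#include <stdint.h>
#include <stddef.h>

/* Mark every other allocation block as "used" so anything written to
   the volume afterwards lands in one-block fragments. */
static void MarkAlternateBlocksUsed(uint8_t *bitmap, size_t numAllocBlocks)
{
    size_t abn;
    for (abn = 0; abn < numAllocBlocks; abn += 2)
        bitmap[abn / 8] |= (uint8_t)(0x80u >> (abn % 8));
}

/* Undo it: clear only the bits that were free before we touched them,
   so blocks genuinely allocated in the meantime stay marked.
   'savedOriginal' is a copy of the bitmap taken before marking. */
static void ClearFakeBits(uint8_t *bitmap, const uint8_t *savedOriginal,
                          size_t numAllocBlocks)
{
    size_t abn;
    for (abn = 0; abn < numAllocBlocks; abn += 2) {
        uint8_t mask = (uint8_t)(0x80u >> (abn % 8));
        if (!(savedOriginal[abn / 8] & mask))
            bitmap[abn / 8] &= (uint8_t)~mask;
    }
}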

I tried each approach a number of times and the result was always the same. The system never created a fourth extent, which I could tell by comparing the size in the drXTFlSize field to the number of blocks in the drXTExtRec field of the MDB. Whenever I tried to save a file that would have required creating the fourth extent, the operation failed and I got a popup with the cryptic error message "Unknown Error." If the second or third extents didn't already exist, they were created after that, but otherwise there were no logical changes to the file system. I take that to mean the lower-level code errored out when it realized there wasn't enough overflow space to store all the required extents. Since this should basically never happen, it's not surprising it isn't handled more gracefully.

In conclusion, I think the system is designed such that enough space is preallocated in the overflow file that it basically never needs to grow larger than what can be referenced directly from the MDB. This avoids the complexities that would arise with reading and writing the file if it contained entries for its own later extents. I did all this testing in Mini vMac II running System 7.5.5, mostly with an image the size of a high-density floppy disk.
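
To put that conclusion in concrete terms, here's roughly the lookup a reader would do, sketched in C. drXTExtRec is the MDB field from Inside Macintosh; the function and the ExtDescriptor layout are just my own illustration:

#include <stdint.h>

typedef struct {
    uint16_t xdrStABN;      /* first allocation block of the run */
    uint16_t xdrNumABlks;   /* length of the run in blocks       */
} ExtDescriptor;

/* Map file allocation block 'fabn' of the extents overflow file to a
   volume allocation block using only the three extents in drXTExtRec.
   If the conclusion above holds, this never has to fall back to the
   overflow tree itself; returning -1 would mean a fourth extent exists
   (which I never saw happen) or the structures are damaged. */
long ExtentsFileBlockToVolumeBlock(const ExtDescriptor drXTExtRec[3],
                                   uint16_t fabn)
{
    uint16_t blocksSeen = 0;
    int i;
    for (i = 0; i < 3; i++) {
        uint16_t runLen = drXTExtRec[i].xdrNumABlks;
        if (fabn < blocksSeen + runLen)
            return (long)drXTExtRec[i].xdrStABN + (fabn - blocksSeen);
        blocksSeen += runLen;
    }
    return -1;
}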

If I write any utilities in the future, I plan to validate the image by checking that the drXTFlSize and drXTExtRec fields agree on the length of the overflow file. I think the System Software might also make this same check and reject the disk as uninitialized if they don't match.
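
In code, the check I have in mind would look something like this. The field names drXTFlSize, drXTExtRec and drAlBlkSiz are MDB fields as documented in Inside Macintosh, but whether the System Software really performs exactly this comparison is only my guess:

#include <stdint.h>

typedef struct {
    uint16_t xdrStABN;
    uint16_t xdrNumABlks;
} ExtDescriptor;

/* The physical length of the extents overflow file recorded in
   drXTFlSize should match the blocks covered by drXTExtRec times the
   allocation block size.  If it doesn't, either a fourth extent exists
   somewhere or the MDB is damaged, and I'd refuse to touch the image. */
static int ExtentsFileLooksConsistent(uint32_t drXTFlSize,
                                      uint32_t drAlBlkSiz,
                                      const ExtDescriptor drXTExtRec[3])
{
    uint32_t blocks = 0;
    int i;
    for (i = 0; i < 3; i++)
        blocks += drXTExtRec[i].xdrNumABlks;
    return drXTFlSize == blocks * drAlBlkSiz;
}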

I still wonder about why the CNID is assigned for the overflow file in Inside Macintosh though.