In computing, file system fragmentation, sometimes called file system aging, is the inability of a file system to lay out related data sequentially (contiguously). It is an inherent phenomenon in storage-backed file systems that allow in-place modification of their contents, and a special case of data fragmentation. File system fragmentation increases disk head movement, or seeks, which are known to hinder throughput. Existing fragmentation is corrected by compacting files and free space back into contiguous areas, a process called defragmentation.
Why fragmentation occurs
When a file system is first initialized on a partition (the partition is formatted for the file system), the entire space allotted is empty.[1] This means that the allocator algorithm is completely free to position newly created files anywhere on the disk. For some time after creation, files on the file system can be laid out near-optimally. When the operating system and applications are installed or other archives are unpacked, laying out separate files sequentially also means that related files are likely to be positioned close to each other.
However, as existing files are deleted or truncated, new regions of free space are created. When existing files are appended to, it is often impossible to resume the write exactly where the file used to end, because another file may already be allocated there, so a new fragment has to be allocated. As time goes on and the same factors remain at work, free space and frequently appended files tend to become increasingly fragmented. Shorter regions of free space also mean that the allocator is no longer able to allocate new files contiguously and has to break them into fragments. This is especially true when the file system is fuller, because longer contiguous regions of free space are less likely to occur.
Consider the following scenario, as shown by the image on the right:
A blank disk has 5 files, A, B, C, D and E, each using 10 blocks of space (for this section, a block is the allocation unit of the file system; it could be 1K, 100K or 1 megabyte and is not any specific size). On a blank disk, all of these files will be allocated one after the other. (Example (1) on the image.) If file B is deleted, there are two options: leave the space for B empty and use it again later, or move all the files after B back so that the empty space ends up after them. Moving could be time-consuming if there were hundreds or thousands of files to move, so in general the empty space is simply left there, marked in a table as available for later use, then used again as needed.[2] (Example (2) on the image.) Now, if a new file, F, is allocated 7 blocks of space, it can be placed into the first 7 blocks of the space formerly holding file B, and the 3 blocks following it will remain available. (Example (3) on the image.) If another new file, G, is added and needs only three blocks, it could then occupy the space after F and before C. (Example (4) on the image.) Now, if F subsequently needs to be expanded, the space immediately following it is no longer available, so there are two options: (1) add a new block somewhere else and indicate that F has a second extent, or (2) move F somewhere else where it can be stored as one contiguous file of the new, larger size. The latter may not be possible, as the file may be larger than any single contiguous free space, or the file could be so large that the move would take an undesirably long time. The usual practice is therefore simply to create an extent somewhere else and chain the new extent onto the old one. (Example (5) on the image.) Repeat this practice hundreds or thousands of times and eventually the file system has many free segments in many places, and many files may be spread over many extents. If, as a result of free space fragmentation, a newly created file (or a file which has been extended) has to be placed in a large number of extents, access time for that file (or for all files) may become excessively long.
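The A to G scenario can be replayed with a toy allocator. This is a minimal sketch, assuming a 60-block disk and an allocator that always hands out the lowest-numbered free blocks; the names `allocate`, `delete` and `extents` are hypothetical helpers, not any real file system's code.

```python
# Toy block map: each list slot is one block, holding a file name or FREE.
# Assumption: the allocator always takes the lowest-numbered free blocks.
FREE = "."


def allocate(disk: list[str], name: str, blocks: int) -> None:
    """Give `name` `blocks` more blocks, taking the first free ones found."""
    placed = 0
    for i, owner in enumerate(disk):
        if placed == blocks:
            break
        if owner == FREE:
            disk[i] = name
            placed += 1
    if placed < blocks:
        raise RuntimeError("disk full")


def delete(disk: list[str], name: str) -> None:
    """Mark the file's blocks as free; no other file is moved."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = FREE


def extents(disk: list[str], name: str) -> list[tuple[int, int]]:
    """Return (start, length) runs of consecutive blocks owned by `name`."""
    runs: list[tuple[int, int]] = []
    for i, owner in enumerate(disk):
        if owner != name:
            continue
        if runs and runs[-1][0] + runs[-1][1] == i:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)   # extend the current run
        else:
            runs.append((i, 1))              # start a new extent
    return runs


disk = [FREE] * 60
for name in "ABCDE":
    allocate(disk, name, 10)      # (1) five files laid out back to back
delete(disk, "B")                 # (2) B's 10 blocks become a hole
allocate(disk, "F", 7)            # (3) F fills the first 7 blocks of the hole
allocate(disk, "G", 3)            # (4) G fills the remaining 3 blocks
allocate(disk, "F", 5)            # (5) F grows, but its neighbour is taken
print(extents(disk, "F"))         # [(10, 7), (50, 5)]
```

F ends up with a second extent at block 50, after E, exactly as in step (5).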
To summarize, factors that typically cause or facilitate fragmentation include:
★ low free space.
★ frequent deletion, truncation or extension of files.
★ overuse of sparse files.
Performance implications
File system fragmentation is projected to become more problematic with newer hardware because of the increasing disparity between the sequential access speed and the rotational delay (and, to a lesser extent, seek time) of consumer-grade hard disks, on which file systems are usually placed. Thus, fragmentation is an important problem in recent file system research and design. The containment of fragmentation depends not only on the on-disk format of the file system, but also heavily on its implementation.
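The growing disparity can be made concrete with a rough cost model in which every fragment costs one seek plus, on average, half a rotation before data can flow. This is a simplified sketch; the seek time, spindle speed and transfer rate defaults are illustrative assumptions, not measurements of any particular drive.

```python
def read_time_ms(size_mb: float, fragments: int,
                 seek_ms: float = 9.0, rpm: int = 7200,
                 transfer_mb_s: float = 150.0) -> float:
    """Estimated time to read a file stored in `fragments` pieces.

    All default parameter values are assumptions chosen for illustration.
    """
    half_rotation_ms = 0.5 * 60_000 / rpm              # average rotational delay
    positioning_ms = fragments * (seek_ms + half_rotation_ms)
    transfer_ms = size_mb / transfer_mb_s * 1000       # pure sequential transfer
    return positioning_ms + transfer_ms


for frags in (1, 10, 100):
    print(frags, round(read_time_ms(10, frags)))
# 1 80
# 10 198
# 100 1383
```

Raising `transfer_mb_s` shrinks only the transfer term, while the per-fragment positioning cost stays the same, so the share of time lost to fragmentation grows as sequential speed improves.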
In simple file system benchmarks, the fragmentation factor is often omitted, as realistic aging and fragmentation are difficult to model. Rather, for simplicity of comparison, file system benchmarks are often run on empty file systems, and unsurprisingly, their results may differ considerably from performance under real-life access patterns.
Types of fragmentation
File system fragmentation may occur on several levels:
★ Fragmentation within individual files and their metadata.
★ Free space fragmentation, making it increasingly difficult to lay out new files contiguously.
★ The decrease of locality of reference between separate but related files.
File fragmentation
Individual file fragmentation occurs when a single file has been broken into multiple pieces (called extents on extent-based file systems). While disk file systems attempt to keep individual files contiguous, this is often not possible without significant performance penalties. File system check and defragmentation tools typically account only for file fragmentation in their "fragmentation percentage" statistic.
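One way such a statistic could be computed is to count the extents of each file and report the share of files that have more than one. This is a minimal sketch under that assumed definition; actual tools differ in how they define and weight the percentage, and the file names and block lists below are hypothetical.

```python
def count_extents(blocks: list[int]) -> int:
    """Number of runs of consecutive block numbers (blocks in file order)."""
    if not blocks:
        return 0
    breaks = sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur != prev + 1)
    return 1 + breaks


def fragmentation_percentage(files: dict[str, list[int]]) -> float:
    """Assumed metric: percentage of files stored in more than one extent."""
    if not files:
        return 0.0
    fragmented = sum(1 for blocks in files.values() if count_extents(blocks) > 1)
    return 100.0 * fragmented / len(files)


files = {                            # hypothetical file -> block numbers
    "a.txt": [0, 1, 2, 3],           # one extent
    "b.log": [4, 5, 9, 10, 20],      # three extents
    "c.bin": [11, 12],               # one extent
    "d.db": [13, 30],                # two extents
}
print(fragmentation_percentage(files))   # 50.0
```

Note that this metric says nothing about free space or about how far apart related files are, which is why the other two types of fragmentation go unreported by such tools.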
Free space fragmentation
Free (unallocated) space fragmentation occurs when the unused space of the file system is split into several separate areas where new files or metadata can be written. Unwanted free space fragmentation is generally caused by deletion or truncation of files, but file systems may also intentionally insert fragments ("bubbles") of free space in order to facilitate extending nearby files (see proactive techniques below).
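Free space fragmentation shows up when the free-block bitmap is scanned for runs: the total amount of free space can be large while the longest run is short. The sketch below assumes a simple list-of-booleans bitmap (True meaning "in use"); real file systems use packed bitmaps, extent trees or B-trees instead.

```python
def free_extents(bitmap: list[bool]) -> list[int]:
    """Lengths of the runs of free blocks; True means the block is in use."""
    runs: list[int] = []
    current = 0
    for used in bitmap:
        if used:
            if current:
                runs.append(current)   # a free run just ended
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)           # free run reaching the end of the disk
    return runs


layout = "##..#...##.#..."             # '#' = used block, '.' = free block
bitmap = [c == "#" for c in layout]
runs = free_extents(bitmap)
print(runs, sum(runs), max(runs))      # [2, 3, 1, 3] 9 3
```

Here 9 blocks are free, yet a new 4-block file cannot be stored contiguously anywhere.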
Related file fragmentation
Related file fragmentation, also called application-level (file) fragmentation, refers to the lack of locality of reference between related files. Unlike the previous two types of fragmentation, related file fragmentation is a much vaguer concept, as it heavily depends on the access pattern of specific applications. This also makes objectively measuring or estimating it very difficult. Arguably, however, it is the most critical type of fragmentation, as studies have found that the most frequently accessed files tend to be small compared to available disk throughput per second.[A Large-Scale Study of File-System Contents, John R. Douceur and William J. Bolosky, ACM SIGMETRICS Performance Evaluation Review]
To avoid related file fragmentation and improve locality of reference, assumptions about the operation of applications have to be made. A very common assumption is that it is worthwhile to keep smaller files within a single directory together, and to lay them out in the natural file system order. While this is often a reasonable assumption, it does not always hold. For example, an application might read several different files, perhaps in different directories, in exactly the same order in which they were written. In that case, a file system that simply lays out all writes successively might work faster for that application.
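The trade-off can be illustrated by measuring head travel for one read order under two layouts. This is a simplified sketch: each file is assumed to occupy one unit of disk space, the head is assumed to start at position 0, and the paths and write order are hypothetical.

```python
def seek_distance(read_order: list[str], layout: list[str]) -> int:
    """Total head travel, in file-sized units, to read files in `read_order`.

    Assumes each file occupies one unit at its index in `layout`.
    """
    position = {name: i for i, name in enumerate(layout)}
    head, travel = 0, 0
    for name in read_order:
        start = position[name]
        travel += abs(start - head)
        head = start + 1                 # head ends just past the file read
    return travel


written = ["cfg/a", "data/x", "cfg/b", "data/y"]   # hypothetical write order
by_directory = sorted(written)                      # keep each directory together
by_write_order = list(written)                      # lay out writes successively

print(seek_distance(written, by_directory))    # 4
print(seek_distance(written, by_write_order))  # 0
```

For an application that reads files back in the order it wrote them, the directory-grouped layout forces back-and-forth movement that the write-order layout avoids; for a workload that reads one directory at a time, the result would be reversed.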
Techniques for mitigating fragmentation
Several techniques have been developed to fight fragmentation. They can usually be classified into two categories: proactive and retroactive. Because access patterns are hard to predict, these techniques are most often heuristic in nature and may degrade performance under unexpected workloads.
Proactive techniques
Proactive techniques attempt to keep fragmentation to a minimum at the time data is written to the disk. Perhaps the simplest such technique is appending data to an existing fragment in place where possible, instead of allocating new blocks to a new fragment.
Many of today's file systems attempt to preallocate longer chunks, or chunks from different free space fragments, called extents, to files that are actively appended to. This mainly avoids file fragmentation when several files are being appended to concurrently, keeping them from becoming excessively intertwined.
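The effect of preallocation on concurrent appends can be simulated with two files that take turns writing one block at a time. This is a simplified model: the hypothetical `chunk` parameter is the size of each reservation, `chunk=1` stands for "no preallocation", and unused reservations are simply abandoned here, whereas a real file system would typically reclaim them.

```python
def count_extents(blocks: list[int]) -> int:
    """Number of runs of consecutive block numbers (blocks in file order)."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


def append_interleaved(n_blocks: int, chunk: int) -> dict[str, list[int]]:
    """Files P and Q alternately append one block each, `n_blocks` times.

    Whenever a file's reservation runs out, it reserves `chunk` consecutive
    blocks at the next free position. Returns each file's block numbers.
    """
    next_free = 0
    reserved: dict[str, list[int]] = {"P": [], "Q": []}
    blocks: dict[str, list[int]] = {"P": [], "Q": []}
    for _ in range(n_blocks):
        for name in ("P", "Q"):
            if not reserved[name]:
                reserved[name] = list(range(next_free, next_free + chunk))
                next_free += chunk
            blocks[name].append(reserved[name].pop(0))
    return blocks


for chunk in (1, 8):
    layout = append_interleaved(16, chunk)
    print(chunk, count_extents(layout["P"]), count_extents(layout["Q"]))
# 1 16 16
# 8 2 2
```

Without reservations the two files interleave block by block, so each 16-block file ends up in 16 extents; with 8-block reservations each file needs only 2.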
A relatively recent technique is delayed allocation in XFS and ZFS; the same technique is also called allocate-on-flush in reiser4 and ext4. When the file system is written to, file system blocks are reserved, but the locations of specific files are not yet laid down. Later, when the file system is forced to flush changes as a result of memory pressure or a transaction commit, the allocator has much better knowledge of the files' characteristics. Most file systems with this approach try to flush files in a single directory contiguously. Assuming that multiple reads from a single directory are common, locality of reference is improved. Reiser4 also orders the layout of files according to the directory hash table, so that when files are accessed in the natural file system order (as dictated by readdir), they are always read sequentially.[The Reiser4 Filesystem, Hans Reiser]
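The idea behind delayed allocation can be sketched as a class that only counts how many blocks each file needs at write time and chooses block numbers at flush time. This is a toy model: the class name `DelayedAllocator`, the bump-pointer allocation and the sort-by-directory policy are assumptions for illustration, not how XFS, ZFS, reiser4 or ext4 actually implement it.

```python
from collections import defaultdict


class DelayedAllocator:
    """Toy allocate-on-flush model.

    write() records only how many blocks each file needs; flush() picks the
    block numbers, laying out each file contiguously and grouping files by
    directory (an assumed policy).
    """

    def __init__(self) -> None:
        self.pending: dict[str, int] = defaultdict(int)   # path -> blocks owed
        self.layout: dict[str, range] = {}                # path -> blocks given
        self.next_free = 0

    def write(self, path: str, blocks: int) -> None:
        self.pending[path] += blocks        # reserve space, choose no location

    def flush(self) -> None:
        # Sort by (directory, name) so files of one directory become adjacent.
        for path in sorted(self.pending, key=lambda p: p.rsplit("/", 1)):
            n = self.pending[path]
            self.layout[path] = range(self.next_free, self.next_free + n)
            self.next_free += n
        self.pending.clear()


alloc = DelayedAllocator()
alloc.write("src/a", 1)     # small writes from several files, interleaved
alloc.write("doc/x", 1)
alloc.write("src/a", 1)
alloc.write("src/b", 2)
alloc.flush()
print(alloc.layout)
# {'doc/x': range(0, 1), 'src/a': range(1, 3), 'src/b': range(3, 5)}
```

Although the writes to `src/a` were interleaved with writes to other files, it still receives one contiguous range, because no block was chosen until all of its data was known.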
BitTorrent and some other peer-to-peer file-sharing clients have an "anti-fragmentation" feature that allocates the full space needed for a file when a download is initiated.
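On POSIX systems, one way a program could reserve the full size of a file before any data arrives is `posix_fallocate`, which Python exposes as `os.posix_fallocate`. This is a sketch of the general technique, not the code of any particular client. The helper name `preallocate` and the file name are hypothetical. The `ftruncate` fallback only sets the file length and may leave a sparse file, so it does not by itself prevent fragmentation.

```python
import errno
import os
import tempfile


def preallocate(path: str, size: int) -> None:
    """Reserve `size` bytes for `path` up front (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        try:
            # Ask the file system to allocate real blocks for the whole range.
            os.posix_fallocate(fd, 0, size)
        except AttributeError:
            # posix_fallocate is not available on some platforms.
            os.ftruncate(fd, size)
        except OSError as exc:
            # Some file systems reject fallocate; anything else is a real error.
            if exc.errno not in (errno.EINVAL, errno.EOPNOTSUPP):
                raise
            os.ftruncate(fd, size)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "download.part")   # hypothetical file name
    preallocate(target, 4 * 1024 * 1024)
    size_after = os.path.getsize(target)
print(size_after)  # 4194304
```

Because the space is requested in one call, the file system can try to find a single large free region instead of growing the file piece by piece as pieces of the download arrive out of order.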
Retroactive techniques
Retroactive techniques attempt to reduce fragmentation, or the negative effects of fragmentation, after it has occurred. Many file systems provide defragmentation tools, which attempt to reorder fragments of files, and often also increase locality of reference by keeping smaller files in directories, or directory trees, close to each other on the disk.
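The target layout such a tool aims for can be computed from a block map: every file contiguous, all free space gathered at the end. This is a minimal sketch using a one-character-per-block map (the hypothetical `defragment` function). It only computes the end state; a real defragmenter must also move the data safely in place, often with little free space to work in.

```python
def defragment(disk: list[str], free: str = ".") -> list[str]:
    """Return a compacted layout: each file contiguous, free space at the end.

    Files are placed in sorted name order, so with path-like names, files
    sharing a directory prefix also end up next to each other.
    """
    counts: dict[str, int] = {}
    for owner in disk:
        if owner != free:
            counts[owner] = counts.get(owner, 0) + 1
    compacted: list[str] = []
    for name in sorted(counts):
        compacted.extend([name] * counts[name])
    return compacted + [free] * (len(disk) - len(compacted))


disk = list("AAFFG.CCFF..")       # F is split in two and free space is scattered
result = "".join(defragment(disk))
print(result)                     # AACCFFFFG...
```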
The HFS Plus file system transparently defragments files that are less than 20 MiB in size and are broken into 8 or more fragments when the file is opened.[Mac OS X Internals: A Systems Approach, Amit Singh, Addison Wesley]
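The two thresholds stated above can be written as a simple predicate. This sketch encodes only those two conditions; the function name is hypothetical, and HFS Plus itself may check further conditions before relocating a file.

```python
MIB = 1024 * 1024
SIZE_LIMIT = 20 * MIB     # files must be smaller than 20 MiB
MIN_EXTENTS = 8           # and broken into 8 or more fragments


def defrag_on_open(size_bytes: int, extents: int) -> bool:
    """True if a file meets both thresholds described in the text."""
    return size_bytes < SIZE_LIMIT and extents >= MIN_EXTENTS


print(defrag_on_open(5 * MIB, 8))    # True
print(defrag_on_open(5 * MIB, 7))    # False: too few fragments
print(defrag_on_open(20 * MIB, 12))  # False: not smaller than 20 MiB
```

Limiting the rule to small, badly fragmented files keeps the cost of the on-open copy low while targeting the files that benefit most.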
See also
★ Fragmentation
★ Defragmentation
★ File system
★ Locality of reference
Notes and references
1. The partition is not completely empty: some internal file system structures are always created. However, these are typically contiguous, and their effect is negligible. Some file systems, such as NTFS and ext2+, might also preallocate empty contiguous regions for special purposes.
2. The practice of leaving the empty space behind after a file is deleted, marked in a table as available for later use, then used again as needed, is why undelete programs were able to work: they simply recovered a file whose name had been deleted from the directory but whose contents were still on disk.