What Can Metadata Tell You About Your Files?
When trying to determine if a file is still in use, IT administrators often rely on metadata as the source of truth. File metadata includes a lot of information, but for understanding activity, there are two fields that are the most important: Modified Time (M-Time) and Accessed Time (A-Time).
These fields are self-explanatory. The M-Time is the date and time when the file was last modified, while the A-Time is the date and time when the file was last accessed. One or both of these fields will give an IT admin the information they need to decide whether a file should remain where it is, be moved into archive storage, or, conversely, be moved into a higher-performance storage tier.
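As a minimal sketch of what these two fields look like in practice, the Python snippet below reads both timestamps for a single file with the standard library's os.stat. The helper name file_times is hypothetical and only used for illustration.

```python
import os
from datetime import datetime, timezone


def file_times(path):
    """Return (modified, accessed) for `path` as timezone-aware UTC datetimes.

    `file_times` is an illustrative helper name, not part of any product API.
    """
    st = os.stat(path)  # reading metadata with stat() does not itself update A-Time
    mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    atime = datetime.fromtimestamp(st.st_atime, tz=timezone.utc)
    return mtime, atime
```

Calling file_times on a path returns the M-Time first and the A-Time second, so the two values can be compared directly.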
However, deciding which field to rely on (and when) involves a few factors. Some are technical, and others are simply business logic.
Which Field Matters More?
When determining which data is still in use, we recommend that organizations almost always rely on A-Time. Here is why:
Most organizations rely on reference data for their workflows, especially in industries like life sciences or geospatial, where specific workflows produce large amounts of data that is never modified after creation. In fact, the same reference data is often reused in future workflows for months or years after creation. To identify data that can be archived, these organizations should focus on finding files that haven’t been accessed recently. In these cases, A-Time is the source of truth for identifying which data is hot and which is cold.
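The idea of finding files that haven't been accessed recently can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product scans: the function name cold_files and the age threshold passed as days are hypothetical, and a real cutoff is a business decision.

```python
import os
import time


def cold_files(root, days, now=None):
    """Yield paths under `root` whose A-Time is more than `days` days old.

    `cold_files` and the `days` threshold are illustrative assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                yield path
```

For example, list(cold_files("/data/reference", 365)) would collect every file under a hypothetical /data/reference directory that has not been read in roughly a year, even if it has never been modified since creation.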
What about organizations whose NAS systems aren’t configured to update A-Time? Updating A-Time can be resource-intensive for some systems or workloads, which leads some organizations to leave A-Time “disabled”. In these cases, A-Time is still listed in the metadata; it will just always be set to equal M-Time. Since this makes A-Time and M-Time interchangeable, it doesn’t matter which one you use.
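One rough way to check whether a system behaves this way is to sample some files and measure how often A-Time matches M-Time. The sketch below assumes a one-second tolerance and uses a hypothetical helper name, atime_mtime_match_ratio. A ratio near 1.0 on a large, varied sample is only a hint that A-Time is not being tracked separately, since files that were written once and never read will also match.

```python
import os


def atime_mtime_match_ratio(paths, tolerance=1.0):
    """Return the fraction of readable `paths` whose A-Time is within
    `tolerance` seconds of M-Time (0.0 if nothing could be checked).

    The helper name and the default tolerance are illustrative assumptions.
    """
    checked = 0
    matches = 0
    for path in paths:
        try:
            st = os.stat(path)
        except OSError:
            continue  # skip files that cannot be inspected
        checked += 1
        if abs(st.st_atime - st.st_mtime) <= tolerance:
            matches += 1
    return matches / checked if checked else 0.0
```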
So when does M-Time matter more for defining hot and cold data? The answer: when you know you have processes that update A-Time in ways that don’t reflect actual file usage.
Many existing workflows (e.g., backup solutions, security applications, or services that rely on scans) can update A-Time during their automated operations. In these cases, A-Time would incorrectly mark datasets as recently accessed when in reality they are not in use by any real workflow. In these circumstances, organizations should rely on M-Time to determine which data is hot and which is cold.
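The choice between the two fields can be expressed as a simple policy switch. In the sketch below, trust_atime is a hypothetical flag that an administrator would set to False when backup jobs or scanners are known to touch A-Time; the function names and the threshold are likewise assumptions for illustration.

```python
import os
import time


def last_activity(path, trust_atime=True):
    """Return the timestamp (seconds since the epoch) used to judge activity.

    Uses A-Time when it is trusted, otherwise falls back to M-Time.
    `trust_atime` is an illustrative policy flag, not a system setting.
    """
    st = os.stat(path)
    return st.st_atime if trust_atime else st.st_mtime


def is_cold(path, days, trust_atime=True, now=None):
    """Return True if the chosen timestamp is more than `days` days old."""
    now = time.time() if now is None else now
    return last_activity(path, trust_atime) < now - days * 86400
```

With trust_atime=False, a file that a nightly scan read yesterday but that nobody has modified in two years is still reported as cold.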
Viewing A-Time and M-Time
The good news: solutions like Igneous DataDiscover make it easy to view your unstructured data, sorted by either A-Time or M-Time.
Regardless of which workflows your organization uses, you can easily find all of your unused datasets and archive them to a cost-effective long-term storage solution, like the public cloud.
Want to Learn More About Igneous DataDiscover?
Igneous delivers a high-performance, protocol- and platform-agnostic file system scan using our proprietary AdaptiveSCAN™ technology. Scan your complete system using 75% fewer IOPS than Linux, regardless of scale. DataDiscover collects and aggregates file-system metadata from any source into the Igneous InfiniteINDEX™. This data catalog, with limitless scalability, simplifies how you locate datasets, files, and objects. Answer meaningful questions that lead to better unstructured data management strategies with Igneous.