Filesystems on Rockfish
Rockfish uses a combination of high-performance and research-tier file systems to support a wide range of workloads. Most storage is backed by IBM Spectrum Scale (GPFS), with additional storage from Krieger IT (VAST) for eligible researchers.
Storage on Rockfish is intended solely for research and educational purposes. Users are expected to manage their data responsibly, and storage quotas are enforced per group.
General Guidelines
Data stored on ARCH-managed filesystems is not backed up by default.
Users are responsible for maintaining their own backups or purchasing backup services.
Storage increases are granted on a case-by-case basis, based on need and system capacity.
ARCH reserves the right to delete or move data as necessary to maintain system stability.
Temporary storage for large projects is available — please contact ARCH staff.
Important
Data subject to restrictions - including but not limited to, HIPAA, PHI, or CUI is not permitted on Rockfish. If your research involves an IRB and the data is de-identified, please reach out to help@arch.jhu.edu for further guidance, prior to storing or processing any data.
Filesystems at a Glance
File System |
System Type |
Total Size |
Block Size |
Default Quota |
Files per TB |
Backed Up? |
|---|---|---|---|---|---|---|
/home/ |
NVMe SSD (ZFS) |
20 TB |
128 KB |
50 GB per user |
N/A |
Limited |
/scratch4/ |
IBM GPFS |
3.8 PB |
4 MB |
1 TB per group |
2M |
No |
/scratch16/ |
IBM GPFS |
3.6 PB |
16 MB |
By request |
1M |
No |
/data/ |
IBM GPFS |
5.1 PB |
16 MB |
1 TB per group |
400K |
No |
/vast/ |
VAST |
N/A |
32 KB |
By request |
N/A |
No |
Local Scratch
Each compute node has a local 1+ TB NVMe hard drive mounted as “/tmp”. The latency to these NVMe flash drives is orders of magnitude lower than for spinning disk (GPFS), usually microseconds versus milliseconds. Users who read/write small files may want to use this space instead of the scratch file sets. It will provide better performance. Make sure you write files back to “scratch” or “data” before the job ends. Likewise, make sure you delete files and directories at the end of jobs.
/home/
Each user receives 50 GB of storage in /home/, backed by high-speed NVMe SSDs and ZFS. This area is intended for frequently used code, scripts, and configuration files.
Warning
/home/ is not intended for I/O from jobs. Use /scratch instead.
Limited file recovery may be possible, but is not guaranteed.
/scratch4/
This default scratch space is optimized for high file-count and smaller file sizes using a 4 MB block size.
1 TB per group (default)
Suitable for: genomics, bioinformatics, mechanical engineering
Purged automatically after 90 days of inactivity (based on access time)
Not backed up or recoverable
/scratch16/
This scratch space is optimized for sequential I/O and streaming workloads.
No default allocation. Available by request with justification
16 MB block size
Suitable for: physics, large-scale simulations, chemistry
Same 90-day purge policy applies
Not backed up or recoverable
/data/
This area is ideal for storing high-value data generated during or after computation, including:
1 TB per group (default)
Processed results
Intermediate analysis
Files you want to retain longer than 30 days
/data/ is not backed up, so users must implement their own preservation strategy.
/vast/
This all-flash storage is provided by Krieger IT for researchers who have purchased space.
Mounted at /vast/ on Rockfish
Available to all JHU researchers
Request form and pricing info:
Quota Reporting with quotas.py
ARCH provides a command-line tool called quotas.py to help users monitor their disk usage across the /home, /data, /scratch4, /scratch16, and /vast filesystems.
This tool runs automatically at login and displays the current usage for your home directory and your research group’s shared allocations. However, you can manually run it at any time to check your usage or monitor quotas for your research group.
Usage:
quotas.py
Example Output:
[root@login01 ~]# quotas.py
+---------------------------------------------------------------------------------+
| Home Usage for user <your_username> as of Tue Apr 15 15:00:06 2025 |
+---------------------+-------------------+-------------------+-------------------+
| Used | Quota | Percent | Files |
+---------------------+-------------------+-------------------+-------------------+
| XX.XX GB | 50.00 GB | 68.56% | XXX,XXX |
+---------------------+-------------------+-------------------+-------------------+
+-----------------------------------------------------------------------------------------------+
| GPFS Usage for Group <group_name> as of Tue Apr 15 15:00:17 2025 |
+-------------+------------+-------------+----------+--------------+----------------+-----------+
| FS | Used | Quota | Used % | Files | Files Quota | Files % |
+-------------+------------+-------------+----------+--------------+----------------+-----------+
| data | XX.XX TB | 10.00 TB | XX.XX% | X,XXX,XXX | 40,960,000 | XX.XX% |
| scratch4 | XX.XX TB | 10.00 TB | XX.XX% | X,XXX,XXX | 20,480,000 | XX.XX% |
| scratch16 | XX.XX TB | 10.00 TB | XX.XX% | X,XXX,XXX | 10,240,000 | XX.XX% |
+-------------+------------+-------------+----------+--------------+----------------+-----------+
Fields:
Used: Current usage for the filesystem
Quota: Allocated quota for the user or group
Percent: Percentage of usage relative to quota
Files: Number of files currently stored
Files Quota: Maximum allowed number of files
Files %: Percent of file quota used
Tip
File quotas are just as important as storage size. Exceeding your file quota may prevent new files from being written even if space remains.