Filesystems on Rockfish

Rockfish uses a combination of high-performance and research-tier file systems to support a wide range of workloads. Most storage is backed by IBM Spectrum Scale (GPFS), with additional storage from Krieger IT (VAST) for eligible researchers.

Storage on Rockfish is intended solely for research and educational purposes. Users are expected to manage their data responsibly. Storage quotas are enforced per user on /home/ and per group on the shared filesystems.

General Guidelines

  • Data stored on ARCH-managed filesystems is not backed up by default.

  • Users are responsible for maintaining their own backups or purchasing backup services.

  • Storage increases are granted on a case-by-case basis, based on need and system capacity.

  • ARCH reserves the right to delete or move data as necessary to maintain system stability.

  • Temporary storage for large projects is available on request; please contact ARCH staff.

Important

Data subject to restrictions, including but not limited to HIPAA, PHI, or CUI, is not permitted on Rockfish. If your research involves an IRB and the data is de-identified, please contact help@arch.jhu.edu for guidance before storing or processing any data.

Filesystems at a Glance

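+-------------+----------------+------------+------------+----------------+--------------+------------+
| File System | System Type    | Total Size | Block Size | Default Quota  | Files per TB | Backed Up? |
+-------------+----------------+------------+------------+----------------+--------------+------------+
| /home/      | NVMe SSD (ZFS) | 20 TB      | 128 KB     | 50 GB per user | N/A          | Limited    |
| /scratch4/  | IBM GPFS       | 3.8 PB     | 4 MB       | 1 TB per group | 2M           | No         |
| /scratch16/ | IBM GPFS       | 3.6 PB     | 16 MB      | By request     | 1M           | No         |
| /data/      | IBM GPFS       | 5.1 PB     | 16 MB      | 1 TB per group | 400K         | No         |
| /vast/      | VAST           | N/A        | 32 KB      | By request     | N/A          | No         |
+-------------+----------------+------------+------------+----------------+--------------+------------+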
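The "Files per TB" figures can be read as a file-count limit that scales with a group's storage quota. The following is a minimal sketch of that arithmetic, assuming the rounded table values (2M, 1M, 400K) are exact and that 1 TB means 10^12 bytes. The function names are illustrative and are not part of any ARCH tool. The Files Quota reported by quotas.py remains the authoritative figure.

```python
# Rounded "Files per TB" values from the table (assumed exact for this sketch).
FILES_PER_TB = {"scratch4": 2_000_000, "scratch16": 1_000_000, "data": 400_000}


def file_quota(filesystem: str, quota_tb: float) -> int:
    """Approximate file-count limit for a quota of `quota_tb` terabytes."""
    return int(FILES_PER_TB[filesystem] * quota_tb)


def break_even_file_size(filesystem: str) -> float:
    """Average file size in bytes at which the space quota and the file
    quota would run out together (1 TB taken as 10**12 bytes)."""
    return 10**12 / FILES_PER_TB[filesystem]


if __name__ == "__main__":
    for fs in FILES_PER_TB:
        size_mb = break_even_file_size(fs) / 1e6
        print(f"{fs}: {file_quota(fs, 1):,} files per 1 TB, "
              f"break-even average file size {size_mb:.1f} MB")
```

Under these assumptions, a group whose files on /scratch4/ average less than about 0.5 MB would reach the file limit before the space limit.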

Local Scratch

Each compute node has a local NVMe SSD of 1 TB or more, mounted as “/tmp”. Latency to these NVMe flash drives is orders of magnitude lower than to the spinning-disk GPFS filesystems, usually microseconds versus milliseconds, so jobs that read or write many small files will generally perform better here than on the scratch filesystems. Copy any results you need back to “scratch” or “data” before the job ends, and delete your files and directories from “/tmp” when the job finishes.
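The stage-in, compute, stage-out, clean-up pattern described above can be sketched as follows. This is an illustration under stated assumptions, not an ARCH-provided utility: run_on_local_scratch and the work callable are hypothetical names, and the sketch requires Python 3.8 or later for dirs_exist_ok.

```python
import shutil
import tempfile
from pathlib import Path


def run_on_local_scratch(input_dir, output_dir, work, local_root="/tmp"):
    """Stage input to node-local scratch, run `work`, copy results back.

    `work(src, dst)` is a hypothetical callable that reads from the
    directory `src` and writes its results into the directory `dst`.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    # TemporaryDirectory deletes the staging area when the block exits,
    # even if `work` raises, so nothing is left behind on the node.
    with tempfile.TemporaryDirectory(dir=local_root) as tmp:
        src = Path(tmp) / "input"
        dst = Path(tmp) / "output"
        shutil.copytree(input_dir, src)   # stage in
        dst.mkdir()
        work(src, dst)                    # small-file I/O happens locally
        # Stage out to persistent storage before the job ends.
        shutil.copytree(dst, output_dir, dirs_exist_ok=True)
    return output_dir
```

Note that if work raises an exception, the staging area is still removed but no results are copied back, so partial output is lost unless work handles its own errors.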

/home/

Each user receives 50 GB of storage in /home/, backed by high-speed NVMe SSDs and ZFS. This area is intended for frequently used code, scripts, and configuration files.

Warning

/home/ is not intended for I/O from jobs. Use /scratch4/ or /scratch16/ instead.

Limited file recovery may be possible, but is not guaranteed.

/scratch4/

This default scratch space uses a 4 MB block size and is optimized for high file counts and smaller files.

  • 1 TB per group (default)

  • Suitable for: genomics, bioinformatics, mechanical engineering

  • Purged automatically after 90 days of inactivity (based on access time)

  • Not backed up or recoverable
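Because purged files cannot be recovered, it can help to see which files are approaching the 90-day window. The sketch below lists files whose last access time is older than a chosen number of days. It assumes the purge is evaluated against the access time that os.stat reports (st_atime), which this page does not confirm in detail, and the function name is illustrative.

```python
import os
import time


def files_idle_longer_than(root, days, now=None):
    """Yield (path, idle_days) for files under `root` whose last access
    time is more than `days` days in the past."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # stat reads metadata only; it does not update atime.
                atime = os.stat(path, follow_symlinks=False).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                yield path, (now - atime) / 86400
```

For example, calling files_idle_longer_than(scratch_dir, 75) on your group's scratch directory would list files with roughly two weeks left before reaching 90 days of inactivity.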

/scratch16/

This scratch space is optimized for sequential I/O and streaming workloads.

  • No default allocation. Available by request with justification

  • 16 MB block size

  • Suitable for: physics, large-scale simulations, chemistry

  • Same 90-day purge policy applies

  • Not backed up or recoverable

/data/

This area is ideal for storing high-value data generated during or after computation. Each group receives 1 TB by default. Typical contents include:

  • Processed results

  • Intermediate analysis

  • Files you want to retain longer than 30 days

/data/ is not backed up, so users must implement their own preservation strategy.
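One part of such a strategy is confirming that a copy kept elsewhere still matches the original. The sketch below compares SHA-256 checksums between a source tree and a copy. It assumes the copy is reachable as an ordinary directory, and the function names are illustrative. Where and how the copy is made is outside the scope of this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to `root` to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(source_root, copy_root):
    """Return relative paths that are missing from the copy or differ."""
    src, dst = build_manifest(source_root), build_manifest(copy_root)
    return sorted(name for name in src if dst.get(name) != src[name])
```

An empty list from verify_copy means every file in the source tree has an identical counterpart in the copy.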

/vast/

This all-flash storage is provided by Krieger IT for researchers who have purchased space.

  • Mounted at /vast/ on Rockfish

  • Available to all JHU researchers

  • Request form and pricing info:

📄 Request VAST Storage & View Pricing

Quota Reporting with quotas.py

ARCH provides a command-line tool called quotas.py to help users monitor their disk usage across the /home, /data, /scratch4, /scratch16, and /vast filesystems.

This tool runs automatically at login and displays the current usage for your home directory and your research group’s shared allocations. You can also run it manually at any time to check your own usage or monitor your research group’s quotas.

Usage:

quotas.py

Example Output:

[root@login01 ~]# quotas.py
+---------------------------------------------------------------------------------+
|         Home Usage for user <your_username> as of Tue Apr 15 15:00:06 2025     |
+---------------------+-------------------+-------------------+-------------------+
|         Used        |       Quota       |      Percent      |       Files       |
+---------------------+-------------------+-------------------+-------------------+
|       XX.XX GB      |      50.00 GB     |      68.56%       |      XXX,XXX      |
+---------------------+-------------------+-------------------+-------------------+

+-----------------------------------------------------------------------------------------------+
|         GPFS Usage for Group <group_name> as of Tue Apr 15 15:00:17 2025                      |
+-------------+------------+-------------+----------+--------------+----------------+-----------+
|      FS     |    Used    |    Quota    |  Used %  |    Files     |  Files Quota   |  Files %  |
+-------------+------------+-------------+----------+--------------+----------------+-----------+
|     data    |  XX.XX TB  |  10.00 TB   |  XX.XX%  |  X,XXX,XXX   |   40,960,000   |   XX.XX%  |
|   scratch4  |  XX.XX TB  |  10.00 TB   |  XX.XX%  |  X,XXX,XXX   |   20,480,000   |   XX.XX%  |
|  scratch16  |  XX.XX TB  |  10.00 TB   |  XX.XX%  |  X,XXX,XXX   |   10,240,000   |   XX.XX%  |
+-------------+------------+-------------+----------+--------------+----------------+-----------+

Fields:

  • Used: Current usage for the filesystem

  • Quota: Allocated quota for the user or group

  • Percent (shown as Used % in the GPFS table): Percentage of usage relative to quota

  • Files: Number of files currently stored

  • Files Quota: Maximum allowed number of files

  • Files %: Percent of file quota used

Tip

File quotas are just as important as storage size. Exceeding your file quota may prevent new files from being written even if space remains.
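When the Files % figure is high, the next question is usually which directories hold the most files. The sketch below counts files under each immediate subdirectory of a given directory. It is a generic directory walk rather than an ARCH tool, and the function name is illustrative.

```python
import os
from collections import Counter


def file_counts_by_subdir(root):
    """Count files under each immediate subdirectory of `root`.

    Files that sit directly in `root` are counted under ".".
    """
    counts = Counter()
    root = os.path.abspath(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Attribute every nested directory to its top-level ancestor.
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts
```

Calling file_counts_by_subdir(path).most_common(10) shows the ten largest contributors. On a large tree the walk itself issues many metadata operations, so it is better run occasionally than in a loop.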