Filesystems on DSAI

DSAI uses a combination of high-performance and research-tier file systems to support a wide range of workloads. Most storage is backed by IBM Spectrum Scale (GPFS), with additional storage from Krieger IT (VAST) for eligible researchers.

Storage on DSAI is intended solely for research and educational purposes. Users are expected to manage their data responsibly, and storage quotas are enforced per group.

General Guidelines

  • Data stored on ARCH-managed filesystems is not backed up by default.

  • Users are responsible for maintaining their own backups or purchasing backup services.

  • Storage increases are granted on a case-by-case basis, based on need and system capacity.

  • ARCH reserves the right to delete or move data as necessary to maintain system stability.

  • Temporary storage for large projects is available on request; please contact ARCH staff.

Important

Data subject to restrictions, including but not limited to HIPAA, PHI, or CUI, is not permitted on DSAI. If your research involves an IRB and the data is de-identified, please contact help@arch.jhu.edu for guidance before storing or processing any data.

Filesystems at a Glance

File System   System Type   Total Size   Default Quota      Backed Up?
/home/        WEKA          10 TB        50 GB per user     Limited
/scratch/     WEKA          800 TB       20 TB per group    No

Local Scratch

Each compute node has a local NVMe flash drive of 1 TB or more, mounted at /tmp. The latency to these NVMe flash drives is orders of magnitude lower than for spinning disk (GPFS), usually microseconds versus milliseconds. Jobs that read or write many small files will generally perform better using this space than the scratch file sets. Local scratch is only a working area: copy any files you need back to “scratch” or “data” before the job ends, and delete your files and directories from /tmp at the end of each job.
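The stage-in, process, copy-back, clean-up pattern described above can be sketched in Python. This is a minimal illustration, not an ARCH-provided tool: the function name run_on_local_scratch, the process callback, and the "out" subdirectory are hypothetical, and the only detail taken from this page is that local scratch is mounted at /tmp.

```python
import shutil
import tempfile
from pathlib import Path


def run_on_local_scratch(input_file, output_dir, process, local_root="/tmp"):
    """Stage input_file to node-local scratch, run process, copy results back.

    process(staged_input, results_dir) is a hypothetical user callback that
    writes its output files into results_dir.
    """
    input_file = Path(input_file)
    output_dir = Path(output_dir)
    # Private working directory on the node-local drive (assumed to be /tmp).
    work_dir = Path(tempfile.mkdtemp(prefix="job_", dir=local_root))
    try:
        staged = work_dir / input_file.name
        shutil.copy2(input_file, staged)      # stage in
        results = work_dir / "out"
        results.mkdir()
        process(staged, results)              # small-file I/O stays local
        output_dir.mkdir(parents=True, exist_ok=True)
        copied = []
        for item in sorted(results.iterdir()):
            if item.is_file():
                # Copy results back before the job ends.
                copied.append(Path(shutil.copy2(item, output_dir / item.name)))
        return copied
    finally:
        # Always remove the working directory, even if process() fails.
        shutil.rmtree(work_dir, ignore_errors=True)
```

In a job, output_dir would typically be a directory under your group's /scratch/ allocation. The try/finally block is what ensures the local files are removed even when the processing step raises an error.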

/home/

Each user receives 50 GB of storage in /home/, backed by WEKA. This area is intended for frequently used code, scripts, and configuration files.

Warning

/home/ is not intended for I/O from jobs. Use /scratch/ instead.

/scratch/

Each group receives 10 TB of storage in /scratch/, backed by WEKA.

Quota Reporting with quotas.py

ARCH provides a command-line tool called quotas.py to help users monitor their disk usage across the /home and /scratch filesystems.

This tool runs automatically at login and displays the current usage for your home directory and your research group’s shared allocations. You can also run it manually at any time to check your own usage or your group’s quotas.

Usage:

quotas.py

Example Output:

[user@dsailogin ~]$ quotas.py
+---------------------------------------------------------------------+
|          Usage for user as of Thu May  1 10:01:05 2025              |
+---------------------------+-------------+-------------+-------------+
|             FS            |     Used    |    Quota    |    Used %   |
+---------------------------+-------------+-------------+-------------+
|      /home/$user/         |   64.33 MB  |   50.00 GB  |    0.13%    |
|     /scratch/$PI/         |   83.27 GB  |   10.00 TB  |      0%     |
+---------------------------+-------------+-------------+-------------+

Fields:

  • FS: Filesystem Path

  • Used: Current usage for the filesystem

  • Quota: Allocated quota for the user or group

  • Used %: Percentage of usage relative to quota
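To make the relationship between the Used, Quota, and Used % fields concrete, the sketch below recomputes the percentage from one table row. It is illustrative only and is not how quotas.py itself is implemented: the function names are hypothetical, and it assumes binary units (1 GB = 1024 MB), which this page does not state.

```python
# Assumed binary unit multipliers; the tool's actual convention is not documented here.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(text):
    """Convert a size such as '64.33 MB' to a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit.upper()]


def usage_percent(row):
    """Return (filesystem, percent used) for one '|'-delimited table row."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    fs, used, quota = cells[0], cells[1], cells[2]
    return fs, 100.0 * to_bytes(used) / to_bytes(quota)


row = "|      /home/$user/          |   64.33 MB  |   50.00 GB  |    0.13%    |"
fs, pct = usage_percent(row)
# 64.33 MB of 50 GB is about 0.13%, matching the Used % column for this row.
```

Applied to the /scratch/ row, the same calculation gives roughly 0.81%, whereas the sample output shows 0%, so the tool's own rounding may differ from this sketch.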