We offer multiple compute levels that let users optimize the cost-to-performance ratio.

| Level | Description | Charge factor |
| --- | --- | --- |
| Debug | limited compute resources with little to no wait time, at a cost premium | 2.0 |
| Ordinary | meant for most production tasks; extensive compute resources at the base rate | 1.0¹ |
| Saving | significantly lower rate by utilizing idle compute resources; resources may be terminated at any time depending on data center load | 0.2 |
| Premium | premium-quality resources (e.g. low-latency interconnect) | 2.7 |
¹ For the GOF queue the charge factor is 8.8; for the GPOF queue (available on Azure as of 2018-09-18) it is 5.5.
We advise using the Debug level while prototyping your calculations, Ordinary for mission-critical tasks, and Saving for restartable runs that can tolerate interruptions (e.g. check-pointed relaxation runs). A worked cost example follows below.
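The sketch below illustrates how the charge factors above translate into cost. The base rate of 1.0 cost units per node-hour is an assumed illustrative value, not an actual platform price:

```python
# Illustrative worked example: estimating job cost from the compute-level
# charge factor. BASE_RATE is an assumed value for illustration only.
BASE_RATE = 1.0  # cost units per node-hour at the Ordinary level (assumed)

CHARGE_FACTORS = {
    "debug": 2.0,
    "ordinary": 1.0,
    "saving": 0.2,
    "premium": 2.7,
}

def estimate_cost(node_hours: float, level: str) -> float:
    """Estimated cost = node-hours x base rate x level charge factor."""
    return node_hours * BASE_RATE * CHARGE_FACTORS[level]

# The same 10 node-hour run is 5x cheaper at the Saving level:
print(estimate_cost(10, "ordinary"))  # 10.0
print(estimate_cost(10, "saving"))    # 2.0
```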
"Saving" compute level and compute resources termination
The concept of saving resources is very similar to spot instances introduced by AWS. When the data center is under increased load, some or all saving compute servers may be terminated. We attempt to restart the affected calculations by resubmitting the corresponding jobs to the resource manager queue. Currently, no charge is incurred for the first whole hour when a compute resource is terminated. More information is available here.
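A minimal sketch of the checkpoint-restart pattern that makes a run tolerant to Saving-level termination is shown below. The function names (`init_state`, `run_step`) and the checkpoint file are hypothetical placeholders for application-specific logic:

```python
# Minimal checkpoint-restart sketch suited to the Saving level: if the job
# is terminated and resubmitted, it resumes from the last saved checkpoint.
import os
import pickle

CHECKPOINT = "state.pkl"  # hypothetical checkpoint file name

def init_state():
    return 0  # placeholder for application-specific initialization

def run_step(state):
    return state + 1  # placeholder for one unit of restartable work

def run_with_checkpoints(total_steps: int, every: int = 10) -> None:
    # Resume from the last checkpoint if a previous attempt was terminated.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            step, state = pickle.load(f)
    else:
        step, state = 0, init_state()

    while step < total_steps:
        state = run_step(state)
        step += 1
        if step % every == 0:  # persist progress periodically
            with open(CHECKPOINT, "wb") as f:
                pickle.dump((step, state), f)

run_with_checkpoints(100)
```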
Depending on the size and urgency of a simulation task, users can direct it to different submission queues to optimize the cost/efficiency ratio.
| Name | Level | Meaning | Nodes/job | Charge policy | Max nodes¹ | Cores/node | GPUs/node |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GOF | Ordinary | GPU-enabled ordinary fast | ≤50 | node-hours⁵ | 100 | MAX³ | 1 |
| G4OF | Ordinary | GPU-enabled ordinary fast | ≤50 | node-hours⁵ | 100 | MAX³ | 4 |
| G8OF | Ordinary | GPU-enabled ordinary fast | ≤50 | node-hours⁵ | 100 | MAX³ | 8 |
| GPOF⁶ | Ordinary | GPU-enabled ordinary fast, P100 | ≤50 | node-hours⁵ | 100 | MAX³ | 1 |
| GP2OF | Ordinary | GPU-enabled ordinary fast, P100 | ≤50 | node-hours⁵ | 100 | MAX³ | 2 |
| GP4OF | Ordinary | GPU-enabled ordinary fast, P100 | ≤50 | node-hours⁵ | 100 | MAX³ | 4 |
1. Maximum number per single cluster; may be administratively adjusted depending on load.
2. Jobs are charged according to the number of core-seconds consumed.
3. The maximum number of cores per node depends on the cluster and is shown in the platform here.
4. Jobs are charged according to the number of node-seconds consumed.
5. Jobs are charged according to the number of whole (integer) node-hours consumed.
6. For GPU-enabled queues, the second non-numeric character in the name stands for the type of GPU used, according to the NVIDIA classification. Thus, the GPOF queue uses NVIDIA P100 instances. By default (e.g. for the GOF queue), V100 is used. Readers can find more about the types of GPU instances available from the supported cloud providers using the Links section below.
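The sketch below illustrates the arithmetic behind the charge policies in the footnotes above. The rates are assumed values, and rounding whole node-hours up (charging each started hour) is an assumption, since the exact rounding rule is not specified here:

```python
# Illustrative sketch of the three charge policies from the footnotes.
# All rates are assumed values; the node-hours policy is assumed to round up.
import math

def charge_core_seconds(cores: int, seconds: float, rate: float) -> float:
    """Footnote 2: charge by core-seconds consumed."""
    return cores * seconds * rate

def charge_node_seconds(nodes: int, seconds: float, rate: float) -> float:
    """Footnote 4: charge by node-seconds consumed."""
    return nodes * seconds * rate

def charge_node_hours(nodes: int, seconds: float, rate: float) -> float:
    """Footnote 5: charge by whole (integer) node-hours (assumed rounded up)."""
    return nodes * math.ceil(seconds / 3600) * rate

# A 90-minute, 2-node job on a node-hours queue is billed for 2 whole hours
# per node:
print(charge_node_hours(2, 90 * 60, rate=1.0))  # 4.0
```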
Approximate wait times for a single job to start (unless data center capacity is exceeded):
- Debug: no wait
- Ordinary (on-demand): 5 min
- Saving: 10-15 min
The Premium compute level multiplier is applied based on the underlying compute hardware. We isolate resources by cluster, so when a premium-level compute cluster is used, the Premium multiplier is applied together with any queue-based multipliers.
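As a hedged sketch of how these factors might combine: assuming the multipliers compose multiplicatively (an assumption based on the text above, not a confirmed billing formula), a GOF job on a premium cluster would be charged as follows:

```python
# Hedged sketch: combining the Premium cluster multiplier with a
# queue-specific charge factor. Multiplicative combination is an assumption.
PREMIUM_MULTIPLIER = 2.7
GOF_CHARGE_FACTOR = 8.8  # from footnote 1 to the compute-levels table above

def effective_rate(base_rate: float, cluster_premium: bool,
                   queue_factor: float) -> float:
    factor = PREMIUM_MULTIPLIER if cluster_premium else 1.0
    return base_rate * factor * queue_factor

# GOF queue on a premium cluster (illustrative base rate of 1.0):
print(effective_rate(1.0, cluster_premium=True,
                     queue_factor=GOF_CHARGE_FACTOR))  # 23.76
```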