So you've heard about high-performance computing (HPC) and want to explore it with your own data. Great!

The first thing to consider is whether your analysis approach is suitable for HPC. A typical CPU core in an HPC system is not much faster than a high-end desktop processor; the performance gain comes from using many processors and running computations in parallel. Your analysis pipeline therefore needs to be written and executed in a parallel fashion to benefit from HPC.
Roughly, parallelization can be divided into task parallelization and data parallelization. In the first, you perform different tasks simultaneously on the same data, whereas in the latter the data is split into subsets that all undergo the same procedure.
If you want to analyze your data on a cluster or grid, you need to think of a way to parallelize.
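
As a minimal sketch of data parallelization (not specific to the Maastricht setup, and using a placeholder `analyze` function), the same procedure can be applied to independent subsets of the data on multiple CPU cores with Python's standard library:

```python
from multiprocessing import Pool

def analyze(chunk):
    """Placeholder analysis step; each chunk is processed independently."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))

    # Data parallelization: split the data into subsets of roughly equal size.
    n_chunks = 8
    chunks = [data[i::n_chunks] for i in range(n_chunks)]

    # Each worker core runs the same procedure on its own subset.
    with Pool(processes=n_chunks) as pool:
        partial_results = pool.map(analyze, chunks)

    # Combine the partial results into the final answer.
    print(sum(partial_results))
```

Task parallelization would instead run different functions on the same data at the same time, for example one worker computing statistics while another generates plots.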

The HPC infrastructure can be used in two ways. The local cluster set up in Maastricht consists of 2 compute nodes, each with 64 CPU cores. Further specifications are: 512 GB of system memory, 10 TB of scratch space and 40 TB of storage. This compute cluster provides a low-threshold entry point to high-performance computing, since the environment and the procedure to follow resemble a local Linux installation.

However, the local cluster is limited in capacity, so if you really want to go HPC, you need to scale up to the grid. The main prerequisite is that your analysis is embarrassingly parallel (thousands of simultaneous jobs), since a lot of overhead and queuing is involved.
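
To give an impression of what "embarrassingly parallel" looks like in practice, the sketch below splits a workload into one self-contained job per input file. The directory layout, the file names and the `run_analysis.py` script are hypothetical, and the actual way such a job list is submitted depends on the scheduler or grid middleware in use.

```python
from pathlib import Path

# Hypothetical input files; in a real analysis these would be your samples.
input_files = sorted(Path("data").glob("sample_*.txt"))

with open("joblist.txt", "w") as joblist:
    for infile in input_files:
        outfile = Path("results") / (infile.stem + ".out")
        # Each line is one independent job: no job depends on another,
        # so thousands of them can be queued and run simultaneously.
        joblist.write(f"python run_analysis.py {infile} {outfile}\n")
```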