Most high-performance computing projects are related to scientific goals such as weather forecasting, mathematical modeling, molecular simulation or drug discovery, which can make defining the project’s requirements difficult. As a result, properly defining user requirements becomes the biggest obstacle in an HPC project.
The complex nature of HPC hardware and software further complicates end users’ understanding of its requirements. When it comes to aspects like HPC architectures, memory sizes, software, bandwidth and latency requirements, users often find themselves at sea.
“HPC user teams are unable to estimate their exact computing requirements at times,” said R P Thangavelu, a scientist with the CSIR Centre for Mathematical Modeling and Computer Simulation (CMMACS) at National Aerospace Laboratories. “Existing facilities may often be inadequate, so their thinking will be limited to those setups. They are not able to think ‘quick and big’, which becomes an issue.” In such instances, a loose definition of requirements can blur the final goal.
User expectations can throw a spanner in the works, so it’s important to deal with them during the initial stages. Users often tend to set the project’s final scientific goal higher than is practically required or feasible from a technology or budgetary perspective, leading to oversizing. Additionally, some users consider a huge HPC system as the panacea for all requirements. “Many users feel that since a particular task takes time on a desktop PC, they need a supercomputer,” said N Seetharama Krishna, head of operations at Computational Research Laboratories (CRL) Limited. “On further discussion, you realize [the user’s] expectation is not to run more than two jobs per day. This translates to underutilization of an HPC system, which can handle 100 jobs per day. You should be extremely careful when users provide requirements.”
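The arithmetic behind Krishna's example can be sketched in a few lines of Python. This is only an illustration: the function name `utilization` is hypothetical, and the two figures (two jobs per day of demand, 100 jobs per day of capacity) are taken from the quote above.

```python
def utilization(jobs_demanded_per_day: float, capacity_jobs_per_day: float) -> float:
    """Return the fraction of daily job capacity that the stated demand would use."""
    if capacity_jobs_per_day <= 0:
        raise ValueError("capacity must be a positive number of jobs per day")
    return jobs_demanded_per_day / capacity_jobs_per_day


# Figures from the quote: the user expects 2 jobs per day,
# while the system can handle 100 jobs per day.
share = utilization(2, 100)
print(f"Expected utilization: {share:.0%}")  # prints "Expected utilization: 2%"
```

A check this simple is worth running during the first user interview, since it shows early whether the requested system is far larger than the stated workload needs.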
Finally, users also tend to skip pre- or post-processing steps while calculating the total processing time required for a particular task.
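To make the point about skipped steps concrete, here is a minimal sketch that adds pre- and post-processing to the core compute time. The class name `JobStages` and all hour values are hypothetical, chosen only for illustration; they do not come from any project described in this article.

```python
from dataclasses import dataclass


@dataclass
class JobStages:
    """Hours spent in each stage of one job (illustrative values only)."""
    pre_hours: float    # e.g. mesh generation or input preparation
    solve_hours: float  # the core computation users usually quote
    post_hours: float   # e.g. visualization or data reduction

    def total_hours(self) -> float:
        # The total time is the sum of all three stages, not just the solve.
        return self.pre_hours + self.solve_hours + self.post_hours

    def jobs_per_day(self) -> float:
        # How many such jobs fit in 24 hours if they run back to back.
        return 24.0 / self.total_hours()


job = JobStages(pre_hours=1.5, solve_hours=6.0, post_hours=2.5)
print(job.total_hours())   # 10.0, compared with 6.0 for the solve alone
print(job.jobs_per_day())  # 2.4
```

In this made-up case the solve stage accounts for only 60 percent of the total time, so an estimate based on the solve alone would overstate throughput.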
Starting on the Right Note
The first objective of defining user requirements is to extract and understand the scientific goals for the HPC project. Even though this sounds like an easy step, it’s the beginning of the end in many failed HPC efforts.
A suitable manager or auditor is critical at this point, since he or she will be in charge of the entire requirement analysis. Such an auditor should have a fair understanding of the specific domain. If this is not feasible, the manager or auditor should be capable of conducting specific background research before interviewing users. He or she should be able to balance domain knowledge and technology in order to translate user requirements into the most suitable computing requirements.
Requirement analysis can be conducted using various methods, including user interviews, questionnaires and FAQs. However, it’s best to perform these as face-to-face discussions. “You should interact with computational scientists one-on-one to properly understand their requirements,” said Thangavelu. “It’s not like you can call a general meeting and everyone will be able to come up with their requirements. It doesn’t work that way.”
Some of the parameters to be captured for a new project are:
- Scientific project in mind
- End goal of the project
- Size of the required experiments. If the aim is to simulate climate patterns, does the user require simulations for one day or spread over many months?
- Do user teams have preferred applications in mind for the scientific project?
- Is in-house development required? Is an application unavailable for the requirements?
- Expectations from a benchmarking perspective
- Short-term or intermediate goals for the project. For example, what’s the user’s goal in the next six months to a year? What stage of the project is expected to be completed in six months?
- Data sets the job will use
- Number of users
- Frequency of job submissions
- User access roles
- Required redundancy levels
- Information security requirements
- Compliance requirements—regulatory or otherwise
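One way to keep track of the parameters listed above is a simple record with one field per item, plus a check for items that have not yet been answered. The sketch below is an assumption about how an auditor might organize notes, not a prescribed format; the class name `ProjectRequirements`, the field names, and the sample answers are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProjectRequirements:
    """One field per parameter in the checklist; None means 'not yet captured'."""
    scientific_project: Optional[str] = None
    end_goal: Optional[str] = None
    experiment_size: Optional[str] = None
    preferred_applications: Optional[str] = None
    in_house_development: Optional[bool] = None
    benchmark_expectations: Optional[str] = None
    intermediate_goals: Optional[str] = None
    data_sets: Optional[str] = None
    number_of_users: Optional[int] = None
    job_submission_frequency: Optional[str] = None
    user_access_roles: Optional[str] = None
    redundancy_levels: Optional[str] = None
    security_requirements: Optional[str] = None
    compliance_requirements: Optional[str] = None

    def missing(self) -> List[str]:
        # Names of all parameters still unanswered after the interview.
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical notes from a first interview.
notes = ProjectRequirements(
    scientific_project="regional climate simulation",
    experiment_size="simulations spread over many months",
    number_of_users=12,
)
print(notes.missing())  # the 11 parameters to raise in a follow-up discussion
```

Listing the unanswered items after each one-on-one discussion gives the auditor a ready agenda for the next conversation.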
If the project requires an upgrade, the following additional aspects should be examined when defining requirements:
- Details of existing HPC infrastructure (clusters, storage, interconnects, power, cooling, etc.)
- Application software in use
- Application behavior patterns/trends
- Planned application software (if a new application is proposed for use)
- Data center architecture
- Benchmark performance—existing and expected
- Documentation of existing solution and development exercises, if available
Look Up, Follow Up
Once essential requirements have been collected, it’s the auditor’s job to document them. “After requirement analysis, I research if similar projects have been implemented elsewhere,” said Krishna. “I also coordinate with my contacts in that specific domain to find out the ways in which I can meet that requirement. Feasibility is revealed as well during this step.” Figure 1 shows the flow of defining requirements in an HPC project. If a user has lofty goals for the project, you should have an open discussion.
Figure 1: HPC requirement definition feedback loop
Typical challenges that arise at this point involve aspects such as cost, scope, time and the feasibility of available technologies. Depending on this discussion, you may need to rework the project’s requirements. Scope changes are often restricted to modifications on the scalability front.
“People want faster solutions and [want to] constantly run bigger problems,” said Subram Natarajan, executive, Deep Computing, Systems and Technology Group at IBM India/SA. “The scope usually changes only in those two dimensions, and not from a domain-specific point of view.”
HPC professionals stress that discussions on requirement modifications should be treated as an educational process for the end user. That way, the HPC team empowers end users to make informed decisions and reduce risks. This is the best way to create a mutually acceptable set of requirements that will ensure the overall success of the HPC project.