
Put supercomputing on curriculum, says Earlham Institute’s Tim Stitt

Tim Stitt, the head of scientific computing at the Earlham Institute, says children should be learning supercomputing and data analysis concepts from a young age

Children should be learning about supercomputing and data analysis as well as computational thinking, says the head of scientific computing at the Earlham Institute, Tim Stitt.

Stitt says that by the time students get to higher education they have little knowledge about supercomputing concepts because they have been taught linear coding rather than parallel programming.

These habits take time to break, and Stitt admits that those already working in the field often have very little time to train newcomers.

“They have no idea what they’re doing, to be fair,” he says. “They can write programs, probably in Python, which is fine if you’re working on a desktop or laptop. But when you’re given a large machine with 2,500 or 3,000 processor cores on it and 30TB of RAM and are told to write code that uses all of this resource, that’s when they begin to struggle.”
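Using all of those cores means splitting the work across processes rather than running one sequential script. A minimal sketch in Python, using the standard-library multiprocessing module — the GC-counting task and the DNA fragments are illustrative, chosen to echo the institute’s genomics workloads:

```python
from multiprocessing import Pool
import os

def count_gc(seq):
    """Count G and C bases in a DNA fragment (a CPU-bound toy task)."""
    return sum(base in "GC" for base in seq)

if __name__ == "__main__":
    fragments = ["ATGCGC", "TTAACG", "GGGCCC", "ATATAT"]

    # A pool of worker processes, one per available core, maps the
    # task over the fragments in parallel instead of one at a time.
    with Pool(processes=os.cpu_count()) as pool:
        counts = pool.map(count_gc, fragments)

    print(counts)       # per-fragment GC counts
    print(sum(counts))  # total across all fragments
```

On a laptop the serial and parallel versions finish in similar time; the difference only shows once the data and the core count scale up, which is exactly the transition Stitt describes.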

The Earlham Institute, formerly the Genomic Analysis Centre (TGAC), has over eight petabytes of storage on its campus in Norwich, and its scientists generate huge amounts of data every week.

But the raw data produced from sequencing extracted genomes then has to be analysed, which not only produces more data but also requires parallel programming processes to properly utilise the site’s computing power.

The need for data scientists

“We have biologists who aren’t necessarily as computer-savvy as physicists or chemists who use supercomputing resources at undergraduate level,” Stitt says.

“They need to start writing parallel programs, but they’re not taught this when they’re doing their studies.”

As most modern processors are multicore, coders without parallel programming skills cannot write programs that use all the cores at once, leaving most of the machine idle and making their work extremely inefficient.
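The gap between the two styles can be shown in a few lines. This sketch (assumed, not from the institute’s codebase) runs the same CPU-bound task serially on one core and then spread across all cores with the standard-library concurrent.futures module:

```python
import os
from concurrent.futures import ProcessPoolExecutor

def busy(n):
    """A CPU-bound task: the sum of squares below n."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [200_000] * 8

    # Serial version: one core works through the list, the rest sit idle.
    serial = [busy(n) for n in jobs]

    # Parallel version: the same jobs distributed over every available core.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
        parallel = list(executor.map(busy, jobs))

    assert serial == parallel  # identical results, computed very differently
```

The results are identical; only the structure of the computation changes, and it is that restructuring habit that graduates trained solely in sequential code lack.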

Stitt calls this the centre’s biggest skills challenge. Teaching graduates the skills they need to understand data analysis and parallel computing involves a “steep learning curve”.

“We’re generating much, much more data and we have more to analyse,” he says. “Unless we start using these large high-performance computers effectively, we’ll be generating lots of data and we won’t be able to analyse it in any reasonable time.”


Computational thinking

In September 2014 a new computing curriculum was introduced in the UK making it mandatory for children between the ages of five and 16 to be taught computational thinking.

But Stitt says this may “compound the issue”, as children will be taught serial rather than parallel programming skills, making supercomputing concepts harder to learn later on.

“If they’re just taught traditional sequential programming at a young age, then this issue could be worse because they have even more years of conditioning in sequential programming,” Stitt says.

“We need more programmers by all means, but we need to start producing more programmers who can work in a parallel nature.”

The younger, the better

Stitt believes that children taught only serial programming from a young age will find parallel programming difficult to pick up later: it will seem “unnatural”, because they have been trained to solve problems one step at a time rather than through multitasking.

The earlier the age at which children are introduced to alternative ways to program, the better, believes Stitt.

“The younger you learn it, the more comfortable you become with it,” he says. “And then once they get to graduate school and start working on real projects it’s second nature to them, but the curriculum just isn’t set up that way at the moment.

“We generate lots of computer scientists and lots of software engineers through universities, but very few of them ever learn parallel programming.”

The creative side

Research has found that many girls are turned away from computing and scientific careers because they believe these subjects are too difficult.

Stitt suspects this is because of an emphasis on coding in the current curriculum, and a misconception about the kinds of people who usually fill coding jobs.

“Girls tend to be better at problem solving,” he says. “Nowadays it’s all about programming but it’s more than that, programming is a small part of it.”

Many believe that one reason so few people go into science, technology, engineering or maths (Stem) careers is that they are unaware of the range of roles available, such as design, project management or creative roles.

Stitt suggests women would be well suited to roles designing the initial plans for projects, rather than just focusing on coding. “We need them for problem solving,” he says.

The next generation of data scientists

Some organisations are turning to upskilling to get the skilled workers they need, by teaching internal candidates to take on unfilled roles.

But Stitt says this is not always possible for the scientists at the Earlham Institute due to time constraints.

“We have researchers who live and die by publications,” he explains. “In reality they don’t necessarily have time to sit and mentor or train a new faculty member on tech, particularly on the high-performance computing side, because they’re not really experts themselves.

“In an ideal world they would love to have the time to be able to mentor new people coming through.”

Collaborative approach

Collaboration between industry, the government and education establishments is often cited as the best way to ensure that schools and universities are providing students with the skills they need for a career in Stem.

Stitt says the institute is working with Intel, which supplies much of the lab’s equipment, to bring in new recruits who will help people to learn how to use Intel tools.

But he warns: “Those schemes are few and far between, and there are only a few recruits and a lot of HPC centres that need them.”

Current collaborative efforts such as the Centre of Advanced Knowledge of Engineering (Cake) are designed to use education and industry collaboration to teach people what skills businesses really need, and create a pipeline of data scientists for industry.

Stitt hopes such initiatives will begin to target even younger students, and that they are combined with practical experience to further cement knowledge.
