With Australia accounting for more than a third of the world’s modern mammal extinctions over the past two centuries, the country’s researchers are speeding up genomics research to protect endangered animals.
For the past decade, Carolyn Hogg, a senior research manager for the Australasian Wildlife Genomics Group in the University of Sydney’s science faculty, has been assembling and annotating genomes for the Tasmanian devil, a rare marsupial that is threatened with extinction by a contagious cancer.
But identifying the function and location of specific genes in a genome can be laborious and resource-intensive. Hogg likened the process to putting together a 5,000-piece jigsaw puzzle with no picture to work with.
Solving the puzzle would involve spreading out all the pieces and finding the edges. “Slowly, you start to slot bits together, and you contract the space used by the other pieces. We’re often working with more than a billion pieces of jigsaw and no guide,” she said.
To address the challenge, the University of Sydney teamed up with Amazon Web Services (AWS) on a trial project in 2019 to tap cloud-based services, such as Amazon EC2 and S3 storage, to process, analyse and categorise genomic data.
Within 12 weeks, Hogg’s team managed to analyse and process the data in more than 50 data pipelines – the same task would have taken longer if they had relied on the university’s high-performance computing (HPC) infrastructure.
“One of our most commonly employed pipelines used to take us up to a week with HPC because we would have to split it up into different commands and then wait in the queue for each one,” said Parice Brandies, a doctoral student on the team who worked on the data pipelines.
“And, if there was an error, we would have to start again. We’ve got it down to under three hours from start to finish with Amazon,” she said.
That said, Hogg’s team still uses Australia’s National Computational Infrastructure (NCI) to crunch certain workloads depending on the size of the data pipeline.
“The thing that AWS gives us is the ability to scale the size of the machine quite rapidly to the size of the data that we have,” said Hogg. “Historically, we’ve had to look at the size of the data and work out where we can break the pipeline to optimise our processing time.”
The genetic research is already being used to support a wildlife conservation programme initiated by the Tasmanian government to maintain a healthy population of Tasmanian devils on Maria Island off mainland Tasmania.
Specifically, Hogg’s team did a genetic assessment to ascertain which Tasmanian devils could be put on the island, and subsequently monitored changes in their genes. Then, they identified those that could be moved back to Tasmania to improve the genetic make-up of the diseased animals on the mainland.
“The diseased population in mainland Tasmania receives new genes, and that will help them to be more resilient in the future,” Hogg said.
Now, Hogg’s team is starting a new project to assemble and annotate the genomes of 40 to 50 of Australia’s most threatened species.
The researchers will be sharing this genome data on the AWS Public Dataset Program, an initiative designed to give researchers anywhere in the world access to scientific datasets, with the aim of accelerating scientific discovery.
At the same time, Hogg’s team is planning to put up a database of immune genes in an Amazon S3 bucket that will be publicly searchable by other researchers globally. Hogg added that the Covid-19 pandemic is a testament to the need to better understand genomes.
“The reason we’re able to sequence genomes so quickly during the pandemic and be able to understand the differences between the virus and other species that we can potentially come into contact with is because we’ve got access to genomic data,” she said.