Genomics England, the UK government-owned organisation aiming to map 100,000 genomes by 2017, has recently recruited its first patients at St Mary’s Hospital in Manchester. The hospital is part of that city’s NHS Genomic Medicine Centre, one of 11 around England that will take part.
It has also diagnosed its first patients, through pilot work carried out at Newcastle Hospitals NHS Foundation Trust and the University of Newcastle.
Leslie Hedley, whose first kidney transplant failed, saw his father, brother and uncle die of kidney failure. His daughter, Terri Parker, has early signs of kidney damage, and her daughter might also be affected. Sequencing of Hedley’s genome found a particular genetic variant, which will allow his family to be tested and, where necessary, for their blood pressure to be controlled through appropriate drugs.
Also in Newcastle, brothers William and Allan Carpenter have been diagnosed with inherited nerve damage and may join a treatment trial which could prevent other family members from developing the condition.
But Genomics England is also developing the IT infrastructure that will be needed to carry out big data analysis on the immense quantities of data produced by genomics. This will replace its current interim systems, funded by the Medical Research Council, with infrastructure as a service (IaaS) from Skyscape, and Ark as the datacentre provider.
Planning for this second phase is well advanced, says Augusto Rendon, director of bioinformatics and genome analysis at Genomics England. “It will move us from the interim private cloud solution, which may not support all the needs we have, to a much more focused, assigned datacentre,” he says. It is planned to be in place by the end of March 2016.
This work took an important step forward in March 2015, when Genomics England announced that 10 pharmaceutical companies, including AstraZeneca, GlaxoSmithKline and Roche, had joined its Genomics Expert Network for Enterprises (Gene) Consortium for a year-long trial.
“It’s about building trust that we’ve collected data to the standard they can use, and familiarise themselves with the data,” says Rendon. “It’s also the beginning of deciding and scoping the kinds of environment that will be available.” He says the trial will help it work out what the companies need. “We recognise some have quite sophisticated informatics in-house. They know what they are doing, so we’re keen to learn from that.”
John Reynders is vice-president of bioinformatics for Alexion Pharmaceuticals, a US-based member of the Gene Consortium. “It’s been a very collaborative set of discussions, where across the 10 industry participants we are able to provide a converged view,” he says. “It’s been great to be involved in the pilot stage, while Genomics England is bringing its capabilities online. We’ve future-proofed.”
Companies, in common with academic researchers, will not be able to take possession of Genomics England’s raw data – they have to run their analysis within the organisation’s virtual environment. The lack of direct access means that Genomics England is keen to explore how federation of data will work – “The ability to run analysis across data silos, without violating the individual silos,” Rendon says. “We’re putting a lot of effort into that right now.”
Genomics England’s use of data federation also makes sense given the size of genomic data sets – a full DNA sequence is around 200GB, and in total Genomics England expects to gather 12-15PB (petabytes). Researchers will need to compare it to reference data held elsewhere, with different rules. “How do you not copy all that into our datacentre? These are things we’re just starting to explore,” says Rendon. “They are quite fundamental changes in how we do things.” Some of this will take place through the Global Alliance for Genomics and Health, which is working to set standards in this field.
“There was a tipping point when the data became so large,” says Alexion’s Reynders. “The approach of bringing the analysis to the data protects privacy and is more efficient. It’s a far more effective way to do genomics analysis than pulling in all the data to your servers.”
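The idea of bringing the analysis to the data can be illustrated with a minimal Python sketch. It is not Genomics England's design: the `Silo` class, the `federated_query` helper and the `has_variant` field are all hypothetical names invented for illustration. Each silo runs the analysis internally and returns only aggregate counts, so no raw records are copied out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]
Analysis = Callable[[List[Record]], Dict[str, int]]


@dataclass
class Silo:
    """A hypothetical data silo that never hands out its raw records."""
    name: str
    _records: List[Record]

    def run(self, analysis: Analysis) -> Dict[str, int]:
        # The analysis executes inside the silo; only its aggregate leaves.
        return analysis(self._records)


def count_variant_carriers(records: List[Record]) -> Dict[str, int]:
    """Example analysis: count records flagged with a (made-up) variant."""
    carriers = sum(1 for r in records if r["has_variant"])
    return {"carriers": carriers, "total": len(records)}


def federated_query(silos: List[Silo], analysis: Analysis) -> Dict[str, int]:
    """Send the same analysis to every silo and add up the aggregates."""
    combined: Dict[str, int] = {}
    for silo in silos:
        for key, value in silo.run(analysis).items():
            combined[key] = combined.get(key, 0) + value
    return combined


# Two illustrative silos with invented data.
silo_a = Silo("silo_a", [{"has_variant": True}, {"has_variant": False}])
silo_b = Silo("silo_b", [{"has_variant": True}, {"has_variant": True},
                         {"has_variant": False}])

result = federated_query([silo_a, silo_b], count_variant_carriers)
print(result)  # {'carriers': 3, 'total': 5}
```

In a real federation each silo would also apply its own access rules before agreeing to run an analysis, which is where the different rules Rendon mentions would come in.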
Alexion has reflected this shift in its own IT choices, he adds. “In the past, people would be building up significant high-performance computing systems on-premise. But there are now ‘genomics as a service’ environments. It’s the very rare cases where a company might elect to do something on-premise. For all of us to try to build these capabilities, it’s a fairly heavy lift.”
Varying types of data analysis
Genomics England not only has to design processes to support researchers from companies and academia in using many people’s data, but also to support clinicians. “They are slightly different types of analysis,” says Rendon. Researchers are more likely to want to compare thousands of patients with thousands of control samples, whereas clinicians need swift information on their own patients – although he adds that many users have both roles, and these will have to be managed in terms of information governance.
“One of the challenges for us is to have a multi-tenant environment, where data is stored for different sets of people,” says Rendon. “How do you avoid duplicating databases?” Personal identification also has to be removed from the data. Rendon says several safeguards apply: raw data stays within Genomics England’s systems, participants must sign up to data access agreements, and an audit trail records who sees and uses what information. Together, these mean “you’ve got multiple levels of protection”.
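As a rough illustration of how layered protection of this kind might fit together, the Python sketch below combines an agreement check with an audit trail. Everything in it is an assumption made for the example: the user names, the `read_record` function and the in-memory stores are hypothetical and do not describe Genomics England's actual systems.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical users who have signed a data access agreement.
SIGNED_AGREEMENT: Set[str] = {"researcher_1", "clinician_7"}

# Hypothetical de-identified records, keyed by a pseudonymous sample ID.
RECORDS: Dict[str, Dict[str, str]] = {
    "sample-001": {"condition": "rare disease"},
}

# Every access attempt is appended here, whether or not it succeeds.
audit_log: List[Dict[str, str]] = []


def read_record(user: str, sample_id: str) -> Dict[str, str]:
    """Return a de-identified record, logging the attempt first."""
    allowed = user in SIGNED_AGREEMENT
    audit_log.append({
        "user": user,
        "sample_id": sample_id,
        "allowed": str(allowed),
        "time": datetime.now(timezone.utc).isoformat(),
    })
    if not allowed:
        raise PermissionError(f"{user} has not signed a data access agreement")
    return dict(RECORDS[sample_id])  # a copy, so callers cannot alter the store


record = read_record("researcher_1", "sample-001")
try:
    read_record("visitor_3", "sample-001")
except PermissionError as err:
    print(err)

print(len(audit_log))  # 2
```

The refused request is logged as well as the successful one, so the trail shows who tried to see what, not only who succeeded.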
Rendon says Genomics England is in an exploratory collaboration on this work with Hadoop software firm Cloudera, which is contributing expertise on the Apache HBase database and the Apache Parquet storage format.
The data from each sample will undergo a standard basic analysis, with researchers expected to build on this as a starting point. “There is a tension – everyone thinks they can do it better,” says Rendon. However, “it’s wasteful if everyone is doing the same thing,” given that the basic analysis will take around 1,000 core hours to produce.
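A back-of-envelope calculation shows why repeating the basic analysis would be wasteful. The Python sketch below uses the two figures quoted in this article (about 1,000 core hours per sample and a target of 100,000 genomes) and assumes one sample per genome. The number of repeating groups and the cluster size are purely hypothetical inputs.

```python
CORE_HOURS_PER_SAMPLE = 1_000  # figure quoted in the article
TARGET_GENOMES = 100_000       # project target quoted in the article


def total_core_hours(samples: int, repeats: int = 1) -> int:
    """Core hours needed if the basic analysis is run `repeats` times."""
    return samples * CORE_HOURS_PER_SAMPLE * repeats


def wall_clock_days(core_hours: float, cores: int) -> float:
    """Elapsed days on a cluster of `cores` cores, assuming perfect scaling."""
    return core_hours / cores / 24


once = total_core_hours(TARGET_GENOMES)
ten_groups = total_core_hours(TARGET_GENOMES, repeats=10)  # hypothetical

print(once)        # 100000000
print(ten_groups)  # 1000000000
print(round(wall_clock_days(once, cores=10_000), 1))  # hypothetical cluster
```

On these assumptions a single pass over the full target comes to 100 million core hours, so each group that redoes the work from scratch adds the same amount again.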
As well as DNA samples, the analysis requires data on patients’ conditions and medical histories. The NHS trusts involved have options on how they provide this – some will use a standard interface, but those with systems capable of it will build automated extraction tools drawing on their electronic patient record systems. Some trusts made successful bids to the Department of Health’s Integrated Digital Care Technology fund to enable this.
Researching rare diseases
Genomics England’s initial focus is on rare diseases, where the germline DNA – that which a person has from birth – is analysed. Cancer treatment requires samples of the cancerous cells’ DNA, which are much harder to obtain.
Tim Hubbard, head of genomics analysis at Genomics England, describes current ways of obtaining samples as “not very DNA-friendly – there has been no reason for them to be”. To change that, pathology labs may need to alter the temperature of their processes and their use of dyes, and to gather larger samples than they have previously.
Hubbard adds that the organisation is also looking at how it collects data from the NHS in a consistent fashion and, in return, provides “the best clinical interpretation we can”, for example by improving collaboration across multi-disciplinary teams, which currently do not exist outside a few major centres.
In June, Genomics England chose four companies to provide clinical interpretation services from August, subject to their passing a test phase and agreeing a contract: Congenica and Omicia in rare disease, Nanthealth in cancer, and Wuxi Nextcode in both areas, with a joint bid from Lockheed Martin and Cypher Genomics in reserve.
Alexion’s John Reynders believes that what Genomics England is doing is unique. “You’re looking not only at a very ambitious research effort, but also clinical interpretation sites. You have the connection from the genome to how the disease presents itself,” which will help clinical understanding.
The work benefits from its scale, he says. “If I look at other countries, you might have research in hospital networks, but Genomics England is looking at a national scale, exploring how we can introduce this into national practice.
“I don’t know of any other nation that is this far ahead with the scale of the genomics, the scale of interpretation and the ambition to introduce it to clinical practice across the nation. It’s creating a new archetype in genomics-based medicine,” he concludes.