
Data scientists could learn a lot from zebrafish

Never mind the number crunching. Nick Booth thinks data scientists should learn to start from the bottom up, like Dr Bill Chaudhry at the Institute of Genetic Medicine.

Recently, I downloaded some big data into my limited-capacity brain from the more sophisticated database residing in the hippocampus system of Dr Bill Chaudhry of Newcastle University. We were networked together through the efficient portal of Harriet Southwood of the British Heart Foundation.

Chaudhry is using zebrafish to save people with congenital heart disease. This condition is diagnosed 12 times a day in newborn babies, so around 4,400 new humans with this heart condition are added to society every year.

Having seen how this affected my cousin, her family and her professional carers in her short life, I can appreciate how valuable this work is. The suffering it causes families - and the resulting strain on the NHS - could be alleviated through a better understanding of what goes wrong when our hearts are in development.

Dr Chaudhry’s investigation into the genetic causes, effects and possible resolutions exemplifies how Big Data should be done. What’s interesting - from an IT industry standpoint - is that the investigation starts from the bottom up.

It’s been claimed that ‘data scientist’ will be the sexiest job title of the 21st century and that everybody will want to be one. No, it won’t and no, they won’t. Neither is data journalism romantic. Who wants to be a half man-half desk? This faulty logic is a perfect example of how the IT industry gets it wrong, over and over again. They assume we all want to learn to speak machine language. Humans should never have to work around the nuances of the IT system - it should be the other way around.

People should be experts in their fields, not experts in the proprietary command systems peculiar to each manufacturer. Somehow we’ve lost the plot and started to believe that ‘the IT is the business’. No, it isn’t. The ‘business’ revolves around the people - lawyers, geologists, pharmacologists - who know all the right questions to ask of the data. Not the people who know all the right commands, but wouldn’t know a catheter from a cathode ray tube.

I don’t care how many Teradata tools you can use, how many terabytes your SAP HANA can crunch through in memory, or how often you can get Splunky with your customer behaviour. You can have the best kit in the world, but if you’ve got no basic instincts or understanding you’ll never be much use in any game.

Imagine how, say, a ‘data scientist’ would analyse the story of the British Heart Foundation’s sponsorship of Dr Chaudhry’s research.

These are some of the variable pieces of data about the British Heart Foundation’s work that might be in its Oracle Advanced Big Data Appliance.

Zebrafish zygotes. Shirts. Newcastle University. Compact discs. Mice. Single-chambered hearts. The Institute of Genetic Medicine. November 13th. Old coats. Premature deaths. Right ventricle. Valves. Charity shops.

If you asked a data scientist to spot the story there, they’d probably conclude that gangs of mice and zebrafish in Newcastle shirts are killing pensioners for their CDs, by grabbing them by the heart. That’s because data scientists like to ‘drill down’ into data, often without a clue what the relationships are between all the different types of information. They know all the commands, but none of the complexities.

The story from the bottom up is very different. This is because Dr Chaudhry, who is leading the investigation into congenital heart conditions at Newcastle University’s Institute of Genetic Medicine, knows the relationships between all the variables from the start.

Zebrafish embryos are known for the punctuality of their development. Once fertilised, their eggs develop with almost perfect predictability. You could almost set your watch by a zebrafish zygote. You could certainly use them for a time-based study of how embryos develop.

Though zebrafish have evolved different hearts from ours (ours ended up with four chambers, two atria and two ventricles, while theirs evolved a simpler structure), our zygotes all start out with more or less the same template. A fully grown zebrafish heart doesn’t have two ventricles, but the beginnings of a second one are there. For some evolutionary reason, the zebrafish seems to have abandoned the idea of needing two ventricles, which has worked out all right for them. They have the option to have hearts like humans one day, though.

This means that zebrafish zygotes have hearts comparable to our own. Since their growth is uniform over time, extracting material from these zygotes can give researchers an insight into how the human heart develops. A study of the genetic material shows which genes, for example, might be responsible for the components of the heart. From that, Chaudhry and his team are able to patiently piece together the story of which genes are responsible for which actions, and work out what might have gone wrong in cases of human congenital heart disease. Why, for example, was my cousin born with a ‘hole in the heart’ (as they called it then)? Why do some people have faulty valves in their hearts? What are the genes or gene sequences that do the coding for this? How come they can be rendered faulty in 4,400 cases a year? What caused that genetic malfunction and can it be switched off? These are all questions that Chaudhry’s study is close to answering. “Every time I think I’m nearly there, something unexpected arises and I have to question all my assumptions,” says Chaudhry.

The study has much wider significance since all humans have a variety of faulty genes.

The lesson for tomorrow’s data scientists, and the manufacturers of the systems, is that it’s far more important to know the subject than to have the most powerful tools. A human hippocampus still trumps HANA, no matter how much memory and processing power the machine is supposed to have.

Chaudhry’s study is one of several being funded by the money raised by the British Heart Foundation. All the sales of your donated shoes and CDs in charity shops pay for this type of vital research. This study alone could mean life or death to the 4,400 people born with congenital heart disease every year.

This week the BHF has announced a drive to raise £750,000 to fund more research, such as Chaudhry’s continuing work into the Development and Maintenance of The Arterial Valves.
