Emil Eifrem is CEO of Neo4j, the company behind the world’s leading graph database.
Emil Eifrem examines how life science researchers at Munich’s German Center for Diabetes Research are uncovering insights in their data with a new way of working with complex data
Diabetes is one of the most widespread diseases worldwide. Both type 1 and type 2 forms of the condition will increasingly present major healthcare challenges for our ageing population in the coming years; type 2 diabetes in children, for instance, has risen 40% in three years amid Britain’s obesity epidemic.
Clearly, investigating the causes of diabetes and, through new scientific findings, developing effective prevention and treatment measures to halt its emergence or progression are priorities for policymakers, citizens and the research community in all advanced economies.
Germany is no exception, with a dedicated national resource for diabetes study: the German Center for Diabetes Research (DZD). With its head office in Munich, DZD brings together experts from across the Federal Republic and from multiple disciplines to develop effective prevention and treatment measures for diabetes, and to investigate which treatments and lifestyle interventions the latest biomedical technologies may offer citizens dealing with the condition.
The Center is an instructive example of what is possible in diabetes research when a new way of tackling the problem is explored. That’s because, in order to better understand diabetes’ causes, its scientists examine the disease from as many different angles as they can, since one discipline is insufficient to answer a multifaceted biomedical question. Its researchers do this by combining basic research data sources (including genetics, epigenetics, metabolic pathways and other work) with data from clinical studies. Connecting this highly heterogeneous data is a challenge, but necessary in order to answer biomedical questions across disciplines.
Besides connecting this research data from various disciplines, locations and species, DZD’s leadership also wants clear visualisation of the data and straightforward querying, so that its scientists can readily identify potential breakthroughs. The result is a master database that consolidates currently disconnected stores of information and provides its 400-strong team of scientists with a holistic view, enabling them to gain valuable insights into the causes and progression of diabetes.
In search of a suitable data tool to build such a system, DZD’s Dr Alexander Jarasch, the Center’s Head of Bioinformatics and Data Management, drew on experience gleaned from previous work on a project at Munich’s Helmholtz Zentrum. That work had used graph database technology, an approach to working with complex data in which the aim is to better understand the relationships within the data. The experience proved so positive that he decided to repeat the experiment at the DZD. The graph software used here was Neo4j’s graph platform.
As a result, Dr Jarasch has offered his colleagues a new internal tool, DZDconnect, which is being built using graph technology that sits as a layer over the various relational databases and links disparate DZD systems and data silos. While it is not yet fully implemented, DZD staffers can already access metadata from clinical studies in the DZDconnect prototype, and are particularly impressed by the visualisation and the easy querying it has made possible.
Many researchers wonder whether graph databases (the technology that powered the Paradise Papers investigation, among other examples of cracking big data problems) could give the pharma industry valuable, previously unobtainable insight with the potential to improve our lives. That’s because graph technology is not only ideally suited to depicting hidden relationships and discovering both “known unknowns” and “unknown unknowns” at big data scale; it can also handle dynamic, constantly evolving data, something that is vital in scientific and bioinformatics research.
That’s useful because real-world data comes in different and often highly unstructured formats. Big data life science research must therefore go beyond simply managing, analysing and storing data that fits neatly into a specific discipline, and find new ways to achieve its objectives. This realisation has led researchers to revisit the tools historically used for the purpose, including SQL and relational database technology. Unfortunately, traditional relational database methods cannot cope with the volume and inconsistency of the data needed for the impactful, large-scale diabetes research we really need to start doing. Medical data is by its nature very heterogeneous, running from detailed cell-level data to macro-scale disease network tracking, all within the same research.
Often, scientists want to link the two ends of that spectrum, as that is where the interesting results can lie, but modelling this is a real challenge, especially as actual breakthroughs tend to lie in the hidden relationships in the data. Graph database technology has emerged as a viable and powerful alternative because, unlike a relational database, which stores data in rows and columns, a graph database stores the connections between data points directly, essentially joining the dots to create a picture of the relationships between them. Such patterns are difficult to detect not only with SQL-based RDBMSs but also with other, non-SQL approaches such as Hadoop.
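To make the idea of “joining the dots” concrete, here is a minimal sketch in plain Python rather than in any particular graph database. It stores relationships as first-class records and follows them from a cell-level entity to a disease-level one. Every entity, relationship name and function in it is invented for illustration; none of it is DZD data, a real biomedical finding, or the Neo4j API.

```python
from collections import deque

# Hypothetical, invented entities and relationships for illustration only;
# none of this is DZD data or a real biomedical finding.
EDGES = [
    ("Gene:G1", "ENCODES", "Protein:P1"),
    ("Protein:P1", "PARTICIPATES_IN", "Pathway:PW1"),
    ("Pathway:PW1", "MEASURED_BY", "Parameter:glucose"),
    ("Parameter:glucose", "RECORDED_IN", "Study:S1"),
    ("Study:S1", "INVESTIGATES", "Disease:type_2_diabetes"),
    ("Gene:G2", "ENCODES", "Protein:P2"),
]


def build_adjacency(edges):
    """Index every node's relationships so they can be followed in either direction."""
    adjacency = {}
    for source, relationship, target in edges:
        adjacency.setdefault(source, []).append((relationship, target))
        adjacency.setdefault(target, []).append((relationship, source))
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search: return the shortest chain of nodes from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relationship, neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(find_path(graph, "Gene:G1", "Disease:type_2_diabetes"))
```

In a tabular model, the same question would typically require one join per hop between separate tables; here each hop is a direct lookup of a node’s neighbours, which is the property the graph approach relies on.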
As human beings, we also like to model connections between data elements visually, building up an intuitive model. The relational data model does not match our mental visualisation of the problem, a gap technically described as “object-relational impedance mismatch”. Taking a data model based on relationships and pushing it into a tabular framework, as a relational database does, ultimately creates a disconnect that can not only cost valuable time but also cause potentially useful patterns and leads to be missed.
The promise, then, is that the more detailed the information, the easier it is to identify relationships and patterns. The idea is eventually to enable DZD’s researchers to pose useful questions in natural-language form, such as: How many blood samples have we received from male patients under 69, and which parameters did we measure? Which studies are the samples from? How has the glucose level changed in the long term? Can the change be attributed to a healthier diet, a drug or a hitherto unknown factor?
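As a rough illustration of what answering the first two of those questions involves, the sketch below follows the links from patients to samples to studies over a handful of invented records. It is plain Python under stated assumptions, not how DZDconnect works: the record types, field names, values and the function `blood_samples_from_men_under` are all hypothetical, and a real system would presumably express this in a graph query language instead.

```python
from dataclasses import dataclass

# Invented records for illustration only; this is not DZD data.


@dataclass(frozen=True)
class Patient:
    patient_id: str
    sex: str
    age: int


@dataclass(frozen=True)
class Sample:
    sample_id: str
    patient_id: str   # link from the sample back to its patient
    study: str        # link from the sample to the study it belongs to
    sample_type: str
    parameters: tuple  # names of the parameters measured on this sample


PATIENTS = {
    "p1": Patient("p1", "male", 54),
    "p2": Patient("p2", "female", 61),
    "p3": Patient("p3", "male", 72),
    "p4": Patient("p4", "male", 68),
}

SAMPLES = [
    Sample("s1", "p1", "Study A", "blood", ("glucose", "hba1c")),
    Sample("s2", "p2", "Study A", "blood", ("glucose",)),
    Sample("s3", "p3", "Study B", "blood", ("insulin",)),
    Sample("s4", "p4", "Study B", "blood", ("glucose", "insulin")),
    Sample("s5", "p1", "Study B", "urine", ("albumin",)),
]


def blood_samples_from_men_under(max_age):
    """Return (sample count, measured parameters, source studies) for the matching samples."""
    matches = [
        sample
        for sample in SAMPLES
        if sample.sample_type == "blood"
        and PATIENTS[sample.patient_id].sex == "male"
        and PATIENTS[sample.patient_id].age < max_age
    ]
    parameters = sorted({name for sample in matches for name in sample.parameters})
    studies = sorted({sample.study for sample in matches})
    return len(matches), parameters, studies


if __name__ == "__main__":
    count, parameters, studies = blood_samples_from_men_under(69)
    print(count, parameters, studies)
```

The point of the sketch is that a single question spans patient attributes, sample metadata and study membership at once, which is exactly the kind of cross-silo link the article describes.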
Jarasch confirms: “With graph we were able to combine and query data across various locations. Even though only part of the data has been integrated, queries have already shown interesting connections which will now be further researched by our scientists”. In the long term, as much DZD data as possible should be integrated into the graph database, Jarasch believes. The next step is to see how human data from clinical research will be complemented with highly standardised data from animal models, such as mice, to find commonalities.
Interestingly, graph technology is not the only technique being applied to complex data at DZD. AI techniques such as machine learning will play a key role going forward, says DZD; one particular area of interest is building a system able to “read” scientific texts and integrate them into the database, ready for analysis. “Technology makes it easier to view medical issues from different perspectives and across indications,” Jarasch points out. “This also makes it possible to identify correlations between various common diseases.”
The kind of innovative data management and analysis approach DZD is pioneering could well be the way forward in precision medicine and in the prevention and treatment of diabetes and, perhaps, of other diseases. We know of at least eight significant cancer research projects that are using graph databases right now, for instance.
Further work to build graph-based data structures for research will give ever more highly trained specialists access to data, in a form they can work with, earlier in their research. And ultimately, the power to dive deeper and make the unknowns known, uncovering the potential in real-world data, is a compelling asset in life science research.