
Big Data Analytics: The Microscope and Telescope for Pharma/CRO


This article was produced in consultation with industry experts and key opinion leaders (KOLs) in India and across the world. It highlights how predictive analytics algorithms are poised to provide useful analytics to the pharma industry.


In today’s highly competitive market, it is extremely important for contract research organizations (CROs) and contract manufacturing organizations (CMOs) to have access to information that lets them target the specific segments of the pharmaceutical and biotechnology industry that are looking to outsource the services they provide.

Understanding shifts in annual outsourcing budgets and spending can help CROs and CMOs to better position themselves for capturing business, particularly at a time when many – if not all – pharma and biotech companies are looking to cut costs and streamline operations. 

According to a recent report by NiceInsights, Analytical Services, Clinical Research, and Clinical Monitoring will be the most in-demand services over the next 12 to 18 months. For CROs and CMOs looking to increase partnership opportunities with pharma and biotech companies, one thing is certain: knowledge is power.

Predictive Models for Drug Discovery, especially Biotherapeutics- The vast majority of pertinent life sciences data is unstructured, residing in the text of scientific papers and journals, Web pages, electronic medical records, e-mails, textual information embedded in structured databases, and product registration documents and filings.

The inability to search text for relevant information has undoubtedly led to countless missed opportunities for faster and less expensive drug discovery and development. Predictive analytics can also be used to model and predict the behaviour of biomolecules, and is increasingly being used in the search for new chemical entities (NCEs) to fill the drug discovery pipelines of the pharmaceutical and biotechnology industries. We define ‘predictive bioanalytics’ as the use of homology-driven methods at the atomic scale that integrate heterogeneous biological data sources to identify multiscale relations between biomolecules as interaction signatures, which can then be used to assess the probability of a compound becoming a drug for a particular indication or disease.

New data-mining applications feature expanded analytics, user-friendly interfaces, and powerful algorithms that allow researchers to analyze both structured and unstructured data. By leveraging the diversity of available molecular and clinical data, predictive modeling could help identify new candidate molecules with a high probability of being successfully developed into drugs that act on biological targets safely and effectively.

Predictive Models for Patient/Subject Discovery- Today a researcher might, for example, use data mining to find clusters of disease subtypes, in the hope of identifying a subtype to target specifically or of enabling a more precise course of treatment.

Attribute-importance algorithms now help researchers select, for instance, the subset of genes most useful for discriminating between types of cancer. Researchers can also use predictive analytics to find factors associated with a disease or to predict which patients might respond best to an experimental treatment.
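To make the idea of attribute importance concrete, here is a minimal sketch. It ranks genes with a simple signal-to-noise score (difference in class means divided by the summed within-class standard deviations), which is only one of many possible importance measures and not necessarily what any particular product uses. The gene names, subtype labels and expression values are all made up for illustration.

```python
from statistics import mean, stdev


def rank_genes(expression, labels):
    """Rank genes by how well they separate two classes of samples.

    expression: dict mapping gene name -> list of expression values, one per sample
    labels: list of class labels (exactly two distinct values), one per sample
    Returns a list of (gene, score) pairs sorted by descending absolute score.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    first, second = classes
    scores = []
    for gene, values in expression.items():
        group_1 = [v for v, lab in zip(values, labels) if lab == first]
        group_2 = [v for v, lab in zip(values, labels) if lab == second]
        spread = stdev(group_1) + stdev(group_2)
        # Signal-to-noise: gap between class means relative to within-class spread.
        score = (mean(group_1) - mean(group_2)) / spread if spread else 0.0
        scores.append((gene, score))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: three genes measured in six tumour samples.
labels = ["subtype_1"] * 3 + ["subtype_2"] * 3
expression = {
    "GENE_A": [9.1, 8.7, 9.4, 2.0, 2.3, 1.8],  # clearly separates the subtypes
    "GENE_B": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],  # almost identical in both
    "GENE_C": [3.0, 6.0, 4.5, 5.5, 2.5, 4.0],  # noisy, little separation
}
ranking = rank_genes(expression, labels)
top_genes = [gene for gene, _ in ranking[:1]]
```

In this toy set `GENE_A` comes out on top. With real expression data one would also need to correct for the thousands of genes tested at once, which this sketch ignores.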

 Carefully conducted clinical trials are performed in human volunteers to provide answers to questions such as: 

 Does a treatment work? 

Does it work better than other treatments? 

Does it have side effects? 

Clinical trials also provide important information on the cost-effectiveness of a treatment, the clinical value of a diagnostic test and how a treatment improves quality of life. Ever wondered how patients get selected for clinical trials without these advanced algorithms? Traditionally, physicians have selected trials by manually analysing patients’ data. Reviews of the resulting selections have shown that physicians usually do not check all available clinical trials and occasionally miss an appropriate one.

So far, only a few web-based systems address this problem, and only partially. The industry has developed near-expert systems that help select trials for each patient. Such a system prompts a clinician to enter the results of medical tests and uses them to identify appropriate trials. If the available records do not provide enough data, the system suggests additional tests. This remains a cumbersome way to find eligible patients and does little to reduce the related costs. With predictive analytics models, patients can be identified for enrolment in clinical trials from more sources than doctors’ visits alone (social media, for example). Furthermore, the criteria for including patients in a trial could take significantly more factors into account (genetic information, for instance) to target specific populations, thereby enabling trials that are smaller, shorter, less expensive, and more powerful.
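The rule-based trial-selection workflow described above (enter test results, match them against trial criteria, suggest missing tests) can be sketched in a few lines. This is only an illustration under simple assumptions: every criterion is a numeric range, and the `Trial` class, trial names and test names are hypothetical, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    name: str
    # Each criterion maps a test name to an inclusive (minimum, maximum) range.
    criteria: dict


def screen_patient(results, trials):
    """Match one patient's test results against the criteria of each trial.

    results: dict of test name -> measured value for the tests done so far.
    Returns (eligible, needs_tests): the names of trials whose criteria are all
    met, and a dict mapping each undecided trial to the tests still missing.
    """
    eligible, needs_tests = [], {}
    for trial in trials:
        missing, failed = [], False
        for test, (low, high) in trial.criteria.items():
            if test not in results:
                missing.append(test)
            elif not low <= results[test] <= high:
                failed = True
                break
        if failed:
            continue  # a known result already rules this trial out
        if missing:
            needs_tests[trial.name] = missing
        else:
            eligible.append(trial.name)
    return eligible, needs_tests


# Hypothetical trials and patient record.
trials = [
    Trial("TRIAL-A", {"age": (18, 65), "hemoglobin_g_dl": (10.0, 18.0)}),
    Trial("TRIAL-B", {"age": (40, 80), "egfr_ml_min": (60, 200)}),
    Trial("TRIAL-C", {"age": (18, 30)}),
]
patient = {"age": 52, "hemoglobin_g_dl": 12.5}
eligible, needs_tests = screen_patient(patient, trials)
```

Here the patient qualifies for `TRIAL-A`, is excluded from `TRIAL-C` on age, and would need one more test before `TRIAL-B` can be decided. A predictive model would go further and rank candidate patients by likelihood of eligibility before any manual data entry.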

Predictive Models for Performance and Operational Analytics- For a mature CRO, determining the productivity, utilization and profitability of clinical research initiatives can help the CRO allocate the most effective team members to each project. Trends tied to the client that contracts the study can be tracked, such as invoice-to-cash cycle times and the client's propensity to request amendments to the research deliverables. The same analysis helps ensure that the scheduling of clinical researchers and clinical laboratory space is realistic, based on the nature and volume of projects. One can also get a sneak peek at trial site performance in order to select the best investigators, eliminate non-performing sites and reduce enrolment timelines. With such insights, planning and forecasting clinical enrolment performance, rescuing off-track trials and optimizing contingency plans for those trials become much easier.

Predictive Models for Risk Monitoring- Traditional monitoring typically allocates resources equally among study sites, regardless of clinical data or the risk to patients.

Routine visits to all clinical sites with 100% source data verification (SDV, comparing all data points on every case report form to all subjects’ medical records) are common, and they are the largest cost driver in clinical trial budgets. Predictive analytics allows sponsors to assess investigator risk and allocate monitoring resources where they are needed most. Real-time predictive modelling in the form of risk-based monitoring enables a study sponsor to adjust the level of monitoring as risk changes at individual sites.
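As a rough illustration of the allocation step in risk-based monitoring, the sketch below scores each site with a weighted sum of risk indicators and splits a fixed budget of monitoring visits in proportion to that score. The indicator names, weights, site IDs and values are assumptions made up for this example; a real monitoring plan would derive them from study data and would normally guarantee every site a minimum level of oversight.

```python
import math


def site_risk(indicators, weights):
    """Weighted sum of a site's risk indicators (each already scaled to 0..1)."""
    return sum(weights[name] * indicators[name] for name in weights)


def allocate_visits(sites, weights, total_visits):
    """Split a fixed visit budget across sites in proportion to their risk score."""
    scores = {name: site_risk(ind, weights) for name, ind in sites.items()}
    total = sum(scores.values())
    if total == 0:
        # No risk signal at all: fall back to an equal split.
        scores = {name: 1.0 for name in sites}
        total = float(len(sites))
    exact = {name: total_visits * score / total for name, score in scores.items()}
    visits = {name: math.floor(share) for name, share in exact.items()}
    # Largest-remainder rounding so the whole-number visits add up to the budget.
    leftover = total_visits - sum(visits.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - visits[n], reverse=True)
    for name in by_remainder[:leftover]:
        visits[name] += 1
    return visits


# Hypothetical weights and per-site indicators.
weights = {"query_rate": 0.5, "protocol_deviations": 0.3, "late_data_entry": 0.2}
sites = {
    "SITE-01": {"query_rate": 0.1, "protocol_deviations": 0.0, "late_data_entry": 0.1},
    "SITE-02": {"query_rate": 0.8, "protocol_deviations": 0.6, "late_data_entry": 0.5},
    "SITE-03": {"query_rate": 0.3, "protocol_deviations": 0.2, "late_data_entry": 0.4},
}
visits = allocate_visits(sites, weights, total_visits=12)
```

Re-running the allocation whenever the indicators are refreshed is one simple way to adjust the level of monitoring as risk changes at individual sites.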

Predictive Models for Financial Modelling and Cost-effectiveness- If you have a rich data asset, predictive analytics lets you benchmark costs and visualise data in that context, helping sponsors forecast, budget and negotiate the cost of outsourcing clinical trials.

Predictive Models for Manufacturing Outcomes- Predictive analytics modules can be used for process improvement analysis to improve scheduling and throughput. Take the example of a biotech company that operates the only final-stage bio-manufacturing facility in the world.

The company was struggling to meet rapidly increasing customer demand. Repeated unanticipated production delays and starvation at critical parts of the operation were causing not only late and missed deliveries, but also the expiration of batches of product at a cost of approximately $1 million per batch.

Predictive analytics can be deployed in such a facility to uncover the root causes of the unanticipated delays behind the late and missed deliveries; to anticipate such bottlenecks in advance; to run “what if” scenarios that test whether additional capacity from a new facility would be required and whether the facility could produce two more lots per month; and to support long-term expansion planning by predicting when and where more line capacity will be required in future.

Whether you're a renowned ACO, a Medicare Shared Savings Program participant, a risk-bearing provider or a health plan collaborating with providers in risk-based contracts, you need to understand the risk profile and experiences of your risk-based populations. This can only be achieved by combining clinical, financial and predictive analytics with robust technology and expert data management services across multiple players. My last post covered predictive analytics for better patient outcomes.


 Use Cases: 

Cell-Based Assays for Drug Target Discovery: Lessons from Transcriptomic Studies With Human Lymphoblastoid Cell Lines

Noam Shomron, Ph.D., Associate Professor, Head, Functional Genomics Laboratory, Faculty of Medicine, Tel-Aviv University

Genome-wide pharmacogenomic studies for developing targeted therapies offer the advantage of hypothesis-free search for tentative drug response biomarkers (efficacy and safety). However, they require large patient cohorts and are therefore very costly. This talk presents our experience with an alternative approach, based on genome-wide transcriptomic profiling of a panel of human lymphoblastoid cell lines (LCLs) representing unrelated healthy donors.

Using Big Data Analytics in the Cloud to Improve Vaccine Yields

Jerry Megaro, Director, Manufacturing Advanced Analytics and Innovation, Merck; Craig Sutherland, Executive Director, Technology & Data Science, Life Sciences and Health Practice, Booz Allen Hamilton

Here we present a proof-of-concept experiment where big data tools and techniques were applied to integrate and analyze 12 years' worth of vaccine manufacturing data from 16 data sources to identify characteristics that influence yield. Additionally, the talk will describe how we leveraged shared data lake platform services operated within an Amazon Web Services Virtual Private Cloud (VPC) and scaled up Elastic Compute Cloud (EC2) services on demand to support the analysis effort.

From Big Data to Smart Data: Using Quantitative Systems Pharmacology for De-risking R&D Projects in CNS R&D

Hugo Geerts, Ph.D., CSO, Computational Neuropharmacology, In Silico Biosciences

Despite heavy investment in CNS R&D and increased available information in multiple -omics databases, the clinical failure rate remains above 90%. The ‘reductionist’ trap that focuses on very detailed bits and pieces of data makes it a huge challenge to generate actionable knowledge in an integrative and useful way. We propose Quantitative Systems Pharmacology as an example of a completely new generation of deep analytics approaches that can be a powerful and valuable tool for knowledge generation in the context of going from ‘Big Data to Smart Data’. The use of computer-based mechanistic modeling, based upon the physiology of (human) brain networks, functional imaging of genetics and pharmacology of drug-receptor interaction, and parametrized with clinical data as a common language, allows integrating a wide diversity of information into an actionable platform that can be interrogated at different levels. We will show examples in Alzheimer’s disease and schizophrenia where this approach could have made a substantial impact on clinical trial success rate. Similar to other ‘engineering’ industries, the platform can also be developed as a knowledge repository that conserves and expands ‘tacit’ corporate scientific knowledge.

An Integrative Platform for Discovery of Drugs and Small Chemicals Associated With Autism Spectrum Disorder

Adam Brown, Ph.D. Student, Biological and Biomedical Sciences, Harvard University

Repositioning approved drug molecules in novel therapeutic areas is of key interest to the pharmaceutical industry. To aid in this effort, several large databases have been compiled in an attempt to aggregate gene-level information about drug molecules (e.g. Drugbank, the Comparative Toxicogenomics Database, and the Connectivity Map, among others). However, no pipelines exist to comprehensively query gene expression databases for associations between drugs and diseases. To address these issues, we developed a generalized Kolmogorov-Smirnov enrichment test-based methodology for identifying compounds that modulate genes derived from expert-curated and publicly available datasets. We demonstrated our approach by correctly identifying two out of three frontline prostate cancer therapies from a database of over 7,000 drug and non-drug compounds. We predict a list of eight drugs that modulate genes that are also perturbed in autism spectrum disorder (ASD), seven of which target genes that have previously been implicated in ASD. Finally, we show that our method significantly enriches for drugs from a database containing both drug and non-drug compounds, demonstrating its utility as a hypothesis-generating tool for drug repositioning studies.
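For readers unfamiliar with Kolmogorov-Smirnov enrichment, the sketch below shows the basic, unweighted running-sum statistic on which such tests are built. It is not the generalized methodology from the talk, whose details are not given here, and it leaves out significance testing entirely. The gene names and the ranking are hypothetical.

```python
def ks_enrichment(ranked_genes, gene_set):
    """Unweighted KS-style enrichment score in the range [-1, 1].

    ranked_genes: genes ordered from most up-regulated to most down-regulated
                  by a compound (hypothetical input for this sketch).
    gene_set: the genes of interest, e.g. genes perturbed in a disease.
    A score near +1 means the set clusters at the top of the ranking,
    a score near -1 means it clusters at the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise; both walks end at zero.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking of ten genes by a compound's expression signature.
ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"]
score_top = ks_enrichment(ranked, {"G1", "G2", "G4"})
score_bottom = ks_enrichment(ranked, {"G8", "G9", "G10"})
```

A compound whose signature pushes disease genes in the direction opposite to the disease would be a repositioning candidate. In practice the score is compared against permuted gene sets to judge whether it is larger than chance.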

The Next Generation of Systems and Network Biology to Support Gene Variant and Expression Co-Analysis

Chris Willis, Ph.D., Solution Scientist, Discovery and Translational Sciences, Thomson Reuters

To date, drug development has explored targeting only 10% of the coding human genome. Systems biology approaches aim to increase this number through understanding molecular disruptions at the pathway level in disease. This talk will discuss the challenges associated with analyzing the exponential growth in biomedical research knowledge and multi-omics data. Furthermore, this talk will demonstrate the use of the Thomson Reuters MetaCore™ platform to analyze gene variant and transcriptomic data simultaneously. Lastly, the development of bioinformatic algorithms and the value of taking a combined knowledge- and data-driven approach to data analysis will highlight the roadmap for MetaCore™.


 References 

Market segmentation. BusinessDictionary.com. WebFinance, Inc. http://www.businessdictionary.com/definition/market-segmentation.html. Accessed December 29, 2014.

Spending Trends in the CMO and CRO Market. PharmPro, January 2012- NiceInsight

BigData Analytics and solutions conference, October 2014, Boston

Promodel- Formulation and Filling - Scheduling and Throughput Analysis

Higher Quality at Lower Cost: The Benefits of Portfolio-Wide Predictive Analytics in Clinical Trial Monitoring. Aug 26, 2013. By Brian Slizgi, Robert Franco, Mark Hronec. Applied Clinical Trials

How big data can revolutionize pharmaceutical R&D. April 2013. By Jamie Cattell, Sastry Chilukuri, and Michael Levy

Analyzing Big Data: the path to competitive advantage- Geoconnexion

Predictive Analysis Is Data Mining’s Future- BioITWorld

Mining for Information Gold in Unstructured Claims Data. By Steve Holcomb

Big Data and Pharmaceutical R&D: The Value Equation. By Tim Carruthers, February 27, 2014

http://www.fda.gov/Drugs/DevelopmentApprovalProcess/ucm284077.htm

Citations

1 www.sims.berkeley.edu/research/projects/how-much-info/

2 www.nlm.nih.gov/pubs/factsheets/medline.html

3 Berry, M.J.A. and Linoff, G. Data Mining Techniques for Marketing, Sales and Customer Support. Wiley, 1997.

4 Berger, A. and Berger, C.R. “Data mining as a tool for research and knowledge development in nursing.” CIN May/June 2004.

5 Stephens, S. and Tamayo, P. “Supervised and unsupervised data mining techniques for life sciences.” Curr Drug Disc June 2003.

6 Agosta, L. “The future of data mining -- predictive analytics.” DM Rev August 2004.

7 Blumberg, R. and Atre, S. “The problem with unstructured data.” DM Rev February 2003.

8 Alonso, O. and Ford, R. Text Mining with Oracle Text. January 2004 (PDF at http://www.oracle.com/

Image credits: Business Insights- Drug Discovery- http://www.business-insights.com/drug-discovery
