Overview of Work Packages 6 and 7:
WP6: ND4BB Information Centre
This WP has delivered an antibiotic research data repository, along with the solutions to format and enter data into, and query/pull summary data out of, such an Information Centre. This ND4BB Information Centre (InfoCentre) combines legacy data on successful and failed approaches to antibiotic drug discovery from European Federation of Pharmaceutical Industries and Associations (EFPIA) and public partners with the data, and high-level descriptors of the data, generated within ND4BB Topics (see Home Page). Data stored within the repository are broad even by pharmaceutical industry standards, encompassing: High-Throughput Screening, Hit-to-Lead and Lead-to-Candidate “Discovery” studies; “Early Discovery” in vitro, in vivo and modelling data; and “Development” data including certain pre-clinical and clinical trial findings.
A detailed selection process was undertaken to identify a suitable electronic laboratory notebook (ELN) solution for the public partners in the TRANSLOCATION consortium. A scoping exercise included vendor demos to the user group and collection of user requirements from all public partners. The public partners within TRANSLOCATION represent a highly diverse range of potential users. Although it was apparent that no single ELN would meet 100% of the requirements, the evaluation focused on which solution best aligned with the functional requirements and met the majority of user needs within a reasonable financial framework.
The main drivers for the final decision were:
- Some end users are not familiar with ELNs, so the selected solution should be easy to use in order to get these users “on board”
- Users should concentrate on storing correct data in the correct format in the ELN rather than on using highly sophisticated features
- Basic functionality available out of the box, with as little configuration as possible
- Easy to install and maintain
- Web interface available to support Linux and Mac OS users without local installations of middleware like Citrix
- Experience of the vendor with a hosted solution at an international provider
- Possibility to use the ELN after the five-year project period
- Proposed installation timeframe
- Costs per user and year for academic and commercial users.
Following this evaluation process, the cloud-hosted BIOVIA solution (formerly known as iLabber by Contur/Accelrys) was proposed. The proposal was presented to the Steering Committee and approved. Training of super users started at the end of May, with training for end users starting in parallel. The go-live date for all consortium members was mid-June 2013.
After more than three years of operation, the number of projects has increased to 39, and around 100 users have created more than 2300 experiments in the ELN.
While we received 214 support requests in the first six months, mainly about creating new user accounts or projects, the number decreased significantly to 81 requests in the following six months. Overall, the total number of requests (492) is lower than expected, and the number of questions about technical issues in particular was extremely low.
After more than two and a half years of operation we conducted a user acceptance survey, which will be published soon.
Data access rules are a basic requirement for developing a secure, valuable and sustainable Information Centre. All participants in the project, i.e. data owners, data analysts and internal and external IT personnel working on the data repositories, must be aware of the value and the sensitivity of the data they are dealing with. Access to repositories can be controlled and restricted by technical as well as organizational means. Technical methods are normally coupled with high maintenance efforts, such as documenting who has access to which data, for how long and why. Organizational measures are much easier to implement but need additional effort, e.g. audits, to verify correct handling. Especially for the long-term utilization of the repositories, the procedures should be easy to handle and control, and maintenance should be kept to a minimum.
In parallel, a separate document explaining the overall information governance was created, describing the procedures required to handle data files and folders in a common way. Finally, separate swimlane diagrams were drafted showing the procedures required to grant access and upload data to the repositories. Both additional documents were compiled to explain the required procedures in more detail.
This is an ongoing activity. The different processes are documented for a later publication.
The installed solution was selected according to the following requirements:
- Store data/files from EFPIA companies until uploaded into the final Information Centre
- Upload more than one file at a time
- Structure the uploaded files in a folder structure maintained by the user
- Upload files without size limits
- Upload files of all ‘common’ formats
- All files will be deleted after uploading the data into the Information Centre
- Simple installation of the long term solution
- Low costs for hardware/software and maintenance
- High security standards
- Out of the box solution
- Short term availability
- Flexible user access
- Simple User interface available for all operating systems
For administration of FTP and LDAP accounts, a web management interface was implemented at https://iic.nd4bb.eu. Access to the FTP server is granted only over SSL/TLS, which also protects file transfers over the internet; all other protocols are explicitly excluded. Secure transfer within the University of Cagliari is established by separating the network via a dedicated virtual LAN (VLAN). Failover is provided by two independent RAID6 storage systems with a redundant network configuration. The FTP server name is registered under the DNS domain nd4bb.eu (iic.nd4bb.eu). For additional security, a certificate for FTP and HTTP transfer was registered with an independent Certification Authority (CA).
The open source FTP client “FileZilla” was selected and described as the end user interface, but other clients should also work with this configuration.
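As an illustration of what "other clients" could look like, the following is a minimal sketch of a scripted upload using Python's standard ftplib module. It assumes the server accepts explicit FTP over TLS (the source only states that SSL/TLS is required), and the function names, folder names and credentials are hypothetical placeholders, not part of the project's documented procedure.

```python
import ssl
from ftplib import FTP_TLS, error_perm
from pathlib import Path, PurePosixPath


def remote_path_for(local_root, local_file, remote_root):
    """Return the remote path that mirrors local_file's position under local_root."""
    relative = Path(local_file).relative_to(Path(local_root))
    return str(PurePosixPath(remote_root).joinpath(*relative.parts))


def remote_dirs(remote_path):
    """List the parent directories of remote_path, outermost first."""
    parents = [p for p in PurePosixPath(remote_path).parents if p.name]
    return [str(p) for p in reversed(parents)]


def upload_folder(host, user, password, local_root, remote_root):
    """Upload every file under local_root, keeping the user's folder structure.

    Assumption: the server supports explicit FTPS (AUTH TLS).
    """
    context = ssl.create_default_context()  # verifies the server certificate
    with FTP_TLS(host=host, context=context) as ftps:
        ftps.login(user=user, passwd=password)  # TLS is negotiated before credentials are sent
        ftps.prot_p()  # encrypt the data channel as well as the control channel
        for local_file in sorted(Path(local_root).rglob("*")):
            if not local_file.is_file():
                continue
            target = remote_path_for(local_root, local_file, remote_root)
            for directory in remote_dirs(target):
                try:
                    ftps.mkd(directory)
                except error_perm:
                    pass  # most likely the directory already exists
            with open(local_file, "rb") as handle:
                ftps.storbinary(f"STOR {target}", handle)
```

A call such as upload_folder("iic.nd4bb.eu", "my-user", "my-password", "export", "/incoming") would then transfer several files in one run while preserving the local folder layout; the account and the folder names in this call are placeholders.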
The procedure to upload files to the Interim Information Centre can be found here.
Work instructions for administration of the Interim Information Centre are available here.
The value of the Information Centre is not the technical solution or the data itself, but the chance to evaluate data across companies and across all phases of research & development, to combine data from private companies with data from academia, and to explore the data in an innovative and flexible way. According to our investigation at project start, there was no existing platform able to handle all kinds of expected data.
The major criteria for selection of the implemented solution are:
- The data uploaded to the Information Centre will be no primary/raw data but rather curated data
- No update/delete procedures for individual data records need to be implemented
- Access to the final data pool will be read only
- Publicly available data will not directly be uploaded to the Information Centre
- Focus of this project is on scientific results, not on technology
- Before clinical data can be included in the Information Centre, security and data privacy settings as well as corresponding workflows must demonstrate that the Information Centre is capable of handling sensitive data accordingly
- To build the Information Centre only limited funding is available
- The timeline to set up a functional Information Centre is quite narrow
- Time and resources are limited to develop and test completely new tools
- Although expected data volume will not exceed several terabytes, the complexity of the data sets would make technologies used for “big data” useful to consider
The main task of the final InfoCentre is data analysis. Transactions are expected to be mainly bulk uploads, and changes or updates to data are minimal. The final Information Centre solution is therefore optimized for data analysis. The data for analysis are primarily structured data. It is anticipated that unstructured data (like images or 3D structures) are analysed via their metadata, not via the original data directly. A structured database focused on analysis seems to be the most appropriate solution. Thus OLAP (On-line Analytical Processing) systems are the preferred solutions.
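The access pattern described above (bulk upload once, then read-only aggregate queries) can be sketched with Python's built-in sqlite3 module. This is only an illustration of the pattern: the table layout, the column names and the demo rows are invented placeholders and do not reflect the actual schemas of the Information Centre.

```python
import sqlite3


def build_demo_store(rows):
    """Create an in-memory table and load all rows in a single bulk operation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE result (study TEXT, phase TEXT, assay TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def summarise_by_phase(conn):
    """Read-only analytical query: number of results and mean value per phase."""
    cursor = conn.execute(
        "SELECT phase, COUNT(*), AVG(value) FROM result "
        "GROUP BY phase ORDER BY phase"
    )
    return cursor.fetchall()


# Placeholder rows for illustration only; these are not project data.
DEMO_ROWS = [
    ("study-A", "discovery", "assay-1", 2.0),
    ("study-A", "discovery", "assay-1", 4.0),
    ("study-B", "preclinical", "assay-2", 10.0),
]

if __name__ == "__main__":
    for phase, count, mean in summarise_by_phase(build_demo_store(DEMO_ROWS)):
        print(phase, count, mean)
```

Because no row is updated or deleted after loading, such a store can be tuned for queries that aggregate across studies and phases rather than for transactional writes.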
The solution finally selected is a combination of two databases: tranSMART for patient and ~omics data and LSP for discovery/preclinical data. One major advantage of the proposed solution is that it builds the combined repository from existing, available databases. The advantages are:
- Well established databases
- Detailed documentation available
- Existing procedures to upload external data
- Functional user interfaces (UIs) available
- Trained staff
Besides these advantages, it was recognised that, when using multiple databases, the following should be avoided:
- duplicated structures, i.e. tables or columns for identical data in different schemas
- incompatible extensions, i.e. additional linking tables/procedures should be backward compatible with the originating systems
This approach allows data stored in the Information Centre to be analysed with the adopted analysis tools (see Action 7) across all database schemas, alongside the individual tools developed for each originating database.
The appropriate infrastructure hosting option for the productive Information Centre was assessed and defined based on the following four issues:
The installation and configuration of the operating system started in April 2014. At the same time a first version of a tranSMART-Spotfire connector (see Action 7) was developed. The installation of the tranSMART system and a ported version of LSP started. A functional version of the Information Centre was available in early October.
Processing of the data according to the requirements of other topics of the ND4BB program started immediately after the first data arrived. Selecting dictionaries and tools and defining processes for data curation and data upload is an ongoing process (see Action 3) and depends on the quality of the data extracted and prepared by the data owners for import to the Information Centre.
The functionality expected by the end users is a flexible interface combining data according to the user’s specification. The focus is on an easy-to-use data selection process rather than on easy exploitation functionality (single-key visualization/dashboard-style interface). As end user frontend applications, two widely used tools were selected for the initial phase: Spotfire and R. These tools are frequently used by the pharmaceutical industry (Spotfire) and academic partners (R) for analyzing data sets from heterogeneous sources. It is anticipated that, with more data uploaded to the system and more users evaluating available data sets, new tools will be demanded by the user community.
The fully implemented InfoCentre solution provides a framework to govern data sharing with granular (individual) security and access/privilege controls (such as read only or create, update and delete), which provide hierarchical levels of access to different parts of the system’s content. Internal users have full read-only access to the InfoCentre along with instructor-led training and supporting documentation. Online help is provided to internal users, and a help desk is available to guide users or resolve issues.
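The idea of granular privileges with hierarchical levels of access can be sketched as follows. This is a hypothetical illustration, not the InfoCentre's implementation: the class name, the user names and the content paths are assumptions, and the rule that the closest grant in the hierarchy wins is one possible design choice.

```python
from dataclasses import dataclass, field

READ, CREATE, UPDATE, DELETE = "read", "create", "update", "delete"


@dataclass
class AccessPolicy:
    """Per-user privileges attached to nodes of a content hierarchy (hypothetical)."""

    grants: dict = field(default_factory=dict)  # (user, node) -> set of privileges

    def grant(self, user, node, privileges):
        self.grants[(user, node.strip("/"))] = set(privileges)

    def allowed(self, user, node, privilege):
        # Walk from the requested node up towards the root; the closest grant wins.
        parts = node.strip("/").split("/")
        while parts:
            key = (user, "/".join(parts))
            if key in self.grants:
                return privilege in self.grants[key]
            parts.pop()
        return False  # no grant anywhere on the path: deny by default


# Hypothetical users and content areas, for illustration only.
policy = AccessPolicy()
policy.grant("analyst", "clinical", {READ})
policy.grant("data-manager", "clinical", {READ, CREATE, UPDATE, DELETE})
```

With these grants, policy.allowed("analyst", "clinical/study-1", READ) is true because the grant on "clinical" is inherited, while any write privilege for the same user is refused.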
Upload of data will be done solely by trained data managers, who are responsible for data harmonisation, transformation and consistency checking. Additional annotation of data will be done in close cooperation with data owners. This will guarantee a consistent, high-quality data set ready for cross-study analysis.
Besides the technical implementation, all required maintenance and monitoring processes and supporting functionalities were installed, tested and documented. Management of the whole system is split, according to knowledge and available resources, between the partners of the work package, namely The Hyve, GRITsystems and Fraunhofer.
For external (non-consortium) users, an InfoCentre-based web interface will give the scientific community access to high-level overviews of data collected within individual discovery, pre-clinical and clinical programs. An easy-to-use system will allow external users to request analysis of data from the internal InfoCentre. All requests will be reviewed by the TRANSLOCATION team and approved on the basis of scientific merit. The analysis will be created by internal data custodians trained in highly sophisticated statistical methods. Help will be provided to external users by a help desk, working as a single point of contact (SPoC).
Post-funding sustainability of the Information Centre, as a service for merging and evaluating data from private and public partners along the research and development chain beyond the life span of the project, is a key factor in proving that the selected and developed solution meets essential requirements of the scientific community. A sustainable Information Centre will demonstrate that
- a shared repository is demanded by a community
- the solution adds value to the scientific community
- it is in compliance with H2020 demands on sustainability and collective intelligence
and finally that the funding was well invested.
Sustainability in the context of a full service platform describes the long-term availability and promotion of tools and assistance to support user demands. A sustainable service cannot be defined in advance by fixing key requirements, but needs to be continuously reviewed and adapted to add new value for a changing user community. On the other hand, initiating a sustainable service requires some basic features: a functional system, a supporting community and, most importantly, a user community that demands the system for its daily work. All three basic requirements can be met for the ND4BB Information Centre. The next step in building a viable, long-term service will be to find and select an organisation which is able to promote the service to a more international community.
The preferred model describes ongoing support by the existing project partners, who are familiar with all the technical and operational requirements, while the management processes will be moved from the ND4BB TRANSLOCATION project to an internationally working organisation like JPIAMR. Besides moving the management activities from a temporary project group to a long-term board of experts in antimicrobial research, this will give an exceptional opportunity to broaden the user community by inviting international partners of JPIAMR to join the Information Centre user community.
Currently we are working on several parallel streams to increase the visibility and the power of the InfoCentre. The team is:
- Mobilizing an active user community
- Adding functionality to the InfoCentre demanded by the user community
- Uploading additional data to the InfoCentre
- Simplifying the annotation process for data sets
Note: Output of Task 6 is the input of WP7
WP7: Combining R&D experience to develop best practice and avoid duplication
In this WP the TRANSLOCATION team analyses the data collected in the ND4BB Information Centre (WP6) and delivers recommendations and best practices for more efficient antibacterial drug discovery and development efforts. One of the central tenets of the ND4BB Programme is that unprecedented open sharing, analysis, and integration of knowledge and experience amongst programme members can lessen discovery and development barriers and thereby improve the efficiency and productivity of antibiotic pipelines. The goal of WP7 is to help realize this aspiration through the development and implementation of a mining, analysis, and modelling framework capable of harnessing the collective power contained in the wealth of preclinical and clinical knowledge that will be summarized in the Information Centre (WP6). As the work progresses, collaborators will distil data and learnings into concrete best practices, recommendations and models in order to foster more efficient and cost-effective antibacterial drug discovery. These findings will subsequently be disseminated to the broader antibiotic research community.
Besides the ELN and the Information Centre, an Antibiotic Database was developed. For more information, see the LINKS page.