Open Access Research Article

Interactive Big Data Analytics Platform for Healthcare and Clinical Services

D Chrimes1*, MH Kuo2, AW Kushniruk2 and B Moa3

1Database Integration and Management, Vancouver Island Health Authority, Canada

2School of Health Information Science, University of Victoria, Canada

3Advanced Research Computing, University of Victoria, Canada

*Corresponding Author: D Chrimes

Received Date: August 20, 2018;  Published Date: September 20, 2018

Abstract

A Big Data Analytics (BDA) platform was established with Hadoop/MapReduce technologies distributed over HBase (a key-value NoSQL database store) to generate hospitalization metadata and to test functionality and performance. Performance tests retrieved results from simulated patient records using Apache tools in Hadoop’s ecosystem. At optimized iteration, Hadoop Distributed File System (HDFS) ingestion with HBase sustained database integrity over hundreds of iterations; however, completing the bulk loading via MapReduce to HBase required a month. Generating HBase data files over the framework took a week for one billion records (10 TB) and a month for three billion records (30 TB). Apache Spark and Apache Drill showed high performance. However, inconsistencies in MapReduce limited the capacity to generate data. A hospital system based on a patient encounter-centric database was very difficult to establish because the data profiles have complex relationships. Key-value storage should be considered for healthcare when analyzing large volumes of data over simplified clinical event models.

Keywords: Adaptable architectures; Big data; Data mining; Distributed filing system; Distributed data structures; Healthcare informatics; Hospital systems; Metadata; Relational database
