Interactive Big Data Analytics Platform for Healthcare and Clinical Services
2School of Health Information Science, University of Victoria, Canada
3Advanced Research Computing, University of Victoria, Canada
Received Date: August 20, 2018; Published Date: September 20, 2018
A Big Data Analytics (BDA) platform built on Hadoop/MapReduce technologies, distributed over HBase (a key-value NoSQL database) and used to generate hospitalization metadata, was established to test functionality and performance. Performance tests retrieved results from simulated patient records using Apache tools in the Hadoop ecosystem. Once optimized, Hadoop Distributed File System (HDFS) ingestion with HBase sustained database integrity over hundreds of iterations; however, completing the bulk load to HBase via MapReduce required a month. Running the framework over the generated HBase data files took a week for one billion (10 TB) and a month for three billion (30 TB) records. Apache Spark and Apache Drill showed high performance; however, inconsistencies in MapReduce limited the capacity to generate data. A hospital system based on a patient encounter-centric database was very difficult to establish because the data profiles have complex relationships. Key-value storage should be considered for healthcare when analyzing large volumes of data over simplified clinical event models.
Keywords: Adaptable architectures; Big data; Data mining; Distributed filing system; Distributed data structures; Healthcare informatics; Hospital systems; Metadata; Relational database