Research Article

Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services

Box 1

Information from interviewed groups involved in clinical reporting at Vancouver Island Health Authority (VIHA).
Group 1 - Architect
(i) Focus on the current demographics using standardized metadata of CIHI, hospitalization, and readmission
for BC and VIHA.
(ii) CIHI requests hospitals to submit data based on set data collection targets and standards. Much of the
collected data used is from DAD, with some from ADT; therefore, combining the two databases to form a NoSQL
database is representative.
(iii) ADT covers location, medical service, and inpatient to discharge, so we can add those columns; diagnosis
and procedure are stored separately but can still be added to the patient encounter.
(iv) As requested and regulated by CIHI, all metadata associations can be based on the encounter and MRN at
the hospital level, with PHN as a primary key.
(v) It is the most important system that holds the patient’s non-clinical information. These are based at the
patient encounter level and are represented by columns and rows in the existing database for our NoSQL database.
(vi) ADT is collected while the patient is still in the hospital, but DAD data is recorded after the patient leaves
the healthcare facility. Combining ADT and DAD is already done at the hospital level, and the hospital system
can be represented via a non-relational database.
(vii) DAD contains the clinical information that is collected; ADT is the location, date and time of the visit, and
patient personal information. Data elements are based on profiles at the metadata level, and
there is a data dictionary that we can simulate.
(viii) Patients are identified using their PHN, MRN, and encounter number. Encounter-level queries are important,
as is hospital-level patient metadata; it is possible to represent encounters as rows in the database.
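Group 1's point that patients are identified by PHN, MRN, and encounter number, with encounters represented as rows, can be illustrated with a composite row key. The following is only a minimal sketch: the field order, the fixed width of 10, the "|" separator, and the integer identifiers are assumptions made for illustration and are not taken from the interviews. Zero-padding is used because HBase keeps rows sorted by the bytes of the row key, so padded numeric fields sort in numeric order.

```python
from typing import NamedTuple


class EncounterKey(NamedTuple):
    phn: int        # Personal Health Number
    mrn: int        # Medical Record Number
    encounter: int  # encounter number


WIDTH = 10  # assumed fixed field width (hypothetical, not from the source)
SEP = "|"   # assumed field separator (hypothetical, not from the source)


def make_row_key(key: EncounterKey) -> bytes:
    """Build a composite row key: PHN first, then MRN, then encounter number."""
    parts = []
    for value in key:
        if value < 0 or len(str(value)) > WIDTH:
            raise ValueError(f"identifier out of range: {value}")
        # Zero-padding keeps byte-wise ordering consistent with numeric ordering.
        parts.append(str(value).zfill(WIDTH))
    return SEP.join(parts).encode("ascii")


def parse_row_key(raw: bytes) -> EncounterKey:
    """Recover the three identifiers from a key built by make_row_key."""
    phn, mrn, encounter = raw.decode("ascii").split(SEP)
    return EncounterKey(int(phn), int(mrn), int(encounter))
```

With PHN as the leading field, all encounters of one patient end up adjacent in key order, which is one possible way to support the encounter-level queries mentioned above.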
Group 2 - Reporting
(i) Produce standard reports hourly, daily, weekly, monthly, and yearly with no errors. For reporting, the
metadata are supposed to be standardized at the enterprise architecture level. Dependencies in the data can be
simulated with the correct metadata.
(ii) ADT is implemented from a vendor, is the source of truth, and is automated; DAD is an abstraction and utilizes
that source; therefore, the two databases are already linked. Combining ADT and DAD is possible and representative
of the hospital system while supporting clinical reporting and benchmarking our simulations.
(iii) Reporting to CIHI is of significant relevance; the simulation can show similar queries.
(iv) Standardized reporting is available, so the simulation can show similar queries.
(v) Primary keys are important for data integrity and for error-free linking of encounters to patients. Database
keys need to be represented.
(vi) Encounter-level data are important to standard reporting and data integrity. Simulate patient encounters
at the hospital level to represent clinical reporting.
(vii) Key stores are important for indexing data because the foundation of the system is the patient encounter.
Technologies are needed to create key stores and unique indexes of the encounters to query the data.
(viii) Important queries need to be incorporated as a proof of concept, with certain fields from hospital systems:
(a) Frequency of Diagnosis (Dx) Code with LOS, Diagnosis
Code with Discharge date and Discharge time, Diagnosis Code with Unit Transfer Occurrence, Diagnosis
Code with Location building, Location Unit, Location Room, Location Bed, Discharge Disposition,
Diagnosis Code with Encounter Type and LOS, Diagnosis Code with Medical Services and LOS, Highest LOS
for MRNs with Admit date, Frequency (or number) of Admit category with Discharge_Date,
Provider Service with Diagnosis codes.
(ix) Combining the columns, we need to be able to perform these basic calculations:
(a) [Discharge time/date] – [Admission time/date] = length of stay (LOS); [Current date] – [Birth date] = Age
(b) [Left Emergency Department (ED) date/time] – [Admission to ED date/time] = Wait time in ED
(c) Intervention start date/time = should be between [Admission time/date] and [Discharge time/date]
(d) (Intervention) Episode Duration = should be less than LOS
(e) Transfer In/Out Date = should be between [Admission time/date] and [Discharge time/date]
(f) Days in Unit = should be less than or equal to LOS.
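The basic calculations and range checks listed by Group 2 can be expressed directly with date arithmetic. The sketch below is illustrative only: the `Encounter` class and its field names are hypothetical, "between" is assumed to be inclusive, and age is assumed to be reported in whole years, none of which is specified in the interviews.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import List, Optional


@dataclass
class Encounter:
    # Field names are hypothetical; they mirror the bracketed items above.
    admit: datetime
    discharge: datetime
    ed_admit: Optional[datetime] = None
    ed_left: Optional[datetime] = None
    intervention_start: Optional[datetime] = None
    episode_duration: Optional[timedelta] = None
    transfer: Optional[datetime] = None
    days_in_unit: Optional[timedelta] = None


def length_of_stay(e: Encounter) -> timedelta:
    """LOS = discharge date/time minus admission date/time."""
    return e.discharge - e.admit


def age_in_years(birth: date, today: date) -> int:
    """Age = current date minus birth date, in whole years (assumption)."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - (1 if before_birthday else 0)


def ed_wait_time(e: Encounter) -> Optional[timedelta]:
    """Wait time in ED = left-ED date/time minus admission-to-ED date/time."""
    if e.ed_admit is None or e.ed_left is None:
        return None
    return e.ed_left - e.ed_admit


def consistency_problems(e: Encounter) -> List[str]:
    """Return a description of every check (c) to (f) that the encounter fails."""
    los = length_of_stay(e)
    problems = []
    if e.intervention_start is not None and not (
        e.admit <= e.intervention_start <= e.discharge
    ):
        problems.append("intervention start outside admission/discharge window")
    if e.episode_duration is not None and not (e.episode_duration < los):
        problems.append("episode duration not less than LOS")
    if e.transfer is not None and not (e.admit <= e.transfer <= e.discharge):
        problems.append("transfer date outside admission/discharge window")
    if e.days_in_unit is not None and not (e.days_in_unit <= los):
        problems.append("days in unit exceeds LOS")
    return problems
```

Returning a list of failed checks, rather than raising on the first failure, is a design choice made here so that a simulated record can be reported with all of its inconsistencies at once.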
Group 3 - Data Warehouse
(i) As with key stores, we need the dependencies in our database to be representative of the existing system
relevant to hospital operations.
(ii) Certain data elements with standardized metadata are necessary for the data to be accurate. The process
needs to generate the same metadata with accurate dependencies.
(iii) Integration is not necessary for the system to work, only to query the data ad hoc or correctly, and there is
currently no real-time or streaming data. Integration depends on patient healthcare numbers from the system at
each encounter and on linkage between ADT and DAD via indexed rows.
(iv) Medical Services is not currently utilized in clinical reporting because it is not DAD abstracted, but it could
be utilized in the data warehouse, because CIHI’s data standards can integrate medical services and other
metadata from ADT with direct linkage to metadata from DAD.
(v) Transfers are important to ADT and to the flow of patients in the system as their encounters progress and change.
We can use transfers and locations in the database as simulated metadata of known profiles from the hospital.
(vi) Combining columns against encounter rows is already implemented at the hospital level; therefore, the ADT and
DAD combination is relevant and the simulation is valuable.
(vii) Groupings allow the database to be built by adding columns progressively based on the encounter.
(viii) Diagnosis is important because it is the health outcome of the hospital. Groupings are important as
performance metrics. Queries are simulated based on encounters.
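One of the proof-of-concept queries requested by Group 2, frequency of diagnosis code with LOS, combined with the grouping over encounter rows described by Group 3, could be simulated along the following lines. This is a minimal in-memory sketch rather than an HBase query: the function name, the row fields `dx_code` and `los_days`, and the choice to summarize LOS as a mean are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def dx_frequency_with_los(rows: Iterable[Mapping]) -> Dict[str, dict]:
    """Group encounter rows by diagnosis code; report the count and mean LOS."""
    # For each diagnosis code, accumulate [number of encounters, total LOS].
    acc = defaultdict(lambda: [0, 0.0])
    for row in rows:
        entry = acc[row["dx_code"]]
        entry[0] += 1
        entry[1] += row["los_days"]
    return {
        dx: {"frequency": n, "mean_los": total / n}
        for dx, (n, total) in acc.items()
    }
```

Each input row stands for one encounter, so the frequency is a count of encounters per diagnosis code, consistent with the encounter-level view emphasized by all three groups.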