Applying machine learning to patient EMR
I recently came across this article about an engineer tasked with applying machine learning to patient electronic medical records. Of course I was interested to see how he would solve this problem, as the challenges of dealing with a heterogeneous set of formatted data might give some insight into how to approach a similar system, one where data from the medical literature, conferences, meetings, personal notes, and video and audio media could be processed by machine learning so that relevant information could be extracted on demand, in real time, for the clinician.
Alas, the author of the article was not successful, but he was able to post the reasons why.
First, the data was "fragmented". By this the author means that the separate systems did not cooperate with one another, and the data was spread across different formats (and presumably different structures), so that there was no feasible way to be sure that a needed record existed in the required format or data structure. If you needed to know a specific parameter about a patient, it might be in the hospital's database, or it might be in the records of a different clinic or diagnostic imaging center, etc. These systems did not communicate, as they had no reason or directive to do so. Therefore, obtaining quality data was not going to be easy.
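As a rough illustration of the fragmentation problem, the sketch below looks up one lab value for a patient across two systems that store it under different field names and units. Everything in it (the source names, the field names such as `creat` and `creatinine_umol`, and the `normalize_*` helpers) is hypothetical and not taken from the article; it only shows why each source needs its own translation step before the data can be used together.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Creatinine:
    """Common (hypothetical) schema: serum creatinine in mg/dL."""
    patient_id: str
    value_mg_dl: float
    source: str


def normalize_hospital(rec: dict) -> Optional[Creatinine]:
    # Hypothetical hospital export: value already in mg/dL under "creat".
    if "creat" not in rec:
        return None
    return Creatinine(rec["mrn"], float(rec["creat"]), "hospital")


def normalize_clinic(rec: dict) -> Optional[Creatinine]:
    # Hypothetical clinic export: value in umol/L under "creatinine_umol".
    if "creatinine_umol" not in rec:
        return None
    # 1 mg/dL of creatinine is about 88.4 umol/L.
    mg_dl = round(float(rec["creatinine_umol"]) / 88.4, 2)
    return Creatinine(rec["patient"], mg_dl, "clinic")


def find_creatinine(patient_id: str, hospital_recs: list, clinic_recs: list) -> list:
    """Return every normalized value found for the patient, from any source."""
    found = []
    for normalize, recs in ((normalize_hospital, hospital_recs),
                            (normalize_clinic, clinic_recs)):
        for rec in recs:
            item = normalize(rec)
            if item is not None and item.patient_id == patient_id:
                found.append(item)
    return found
```

If the hospital record for a patient holds no creatinine at all, the value can only be recovered when the clinic's export is also available and someone has written the matching translation for it, which is exactly the cooperation the fragmented systems lacked.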
Much of the rest of the article deals with quirks of how data is entered into the system, and with the fact that medical workflow is carried out by humans, not machines. The factors that determine triage are not necessarily understandable to a machine, which operates by algorithm, whereas human healthcare providers can approach a complex problem from various algorithmic starting points, and this confounds data mining efforts. Also, the reasons why things are the way they are in medicine could be due to governmental regulation, to the natural history of a disease, or to doctors' and nurses' workflow practices.
Some parallels can be drawn to the effort of extracting relevant data from the medical literature. Where would a breakthrough result be located? If it's brand-new, it would be in reports released by medical companies during or shortly after a major meeting. These are usually news releases to major outlets, but they could be on social media platforms. Major study results would not be in the meeting abstract book, however - you have to wait until after the meeting and presentations. The full study often comes out several months later, but an abstract with commentary from various thought leaders is usually available within weeks. This can be of value to a clinician, especially for practice-changing results. But how would a machine know this and learn to retrieve it? Heck, even humans have difficulty. Still, it is important to eventually get the final publication with all the details, including the supplementary data, because when you are treating a patient according to the study you need to know details such as how long after the end of chemotherapy radiation was initiated, or what the researchers did when there were unexpected hepatic or renal abnormalities. The devil is in the details.
Deploying AI in Medicine
The second article that I will discuss is this one, which covers the role of AI in medicine. The author discusses the threat of AI in the field of radiology, where software-guided diagnostic imaging is already in use. If accurate pattern recognition is the goal of the machine learning, then indeed, the radiologist will find formidable competition. But someone has to program the system with the abnormalities that need to be recognized and considered worthy of further evaluation. The imaging technology will also evolve, and software will need to be trained against the new dataset. Examples of AI failure are then discussed, including the notorious failure of IBM's Watson Health system. The examples cited by the author are simply early efforts, and I have no doubt that more capable technology will improve on them. But it is disappointing that, even for the relatively restricted tasks given to existing software and computing platforms, we are nowhere near being able to deploy this technology in the clinic.
The prospect of an intelligent and capable medical knowledge retrieval platform is likely still at least a decade away, if a system has to work with the data as they are now. The goal would be achieved much sooner if efforts were made to format the information into something a computer could incorporate easily. The HL7 standards were developed to help standardize the exchange of electronic clinical and administrative data, so that systems like Cerner and Epic can interoperate.
Unfortunately, no one (as far as I can tell) is even thinking about doing this for medical research data, much less working on a standard.
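To make the idea concrete, here is a minimal sketch of what a machine-readable study record might look like. It is not based on any existing standard; the schema, the field names, and the placeholder values are all invented for illustration. The point is only that the protocol details a clinician needs become simple lookups once they are stored as structured fields rather than buried in a supplement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProtocolDetail:
    """One structured protocol fact (hypothetical schema)."""
    key: str       # e.g. "chemo_to_radiation_interval_days"
    value: str
    location: str  # where in the publication the detail was stated


@dataclass
class StudyRecord:
    """A study result at one stage of its publication life cycle."""
    study_id: str
    stage: str  # "press_release", "abstract", or "full_publication"
    details: List[ProtocolDetail] = field(default_factory=list)

    def lookup(self, key: str) -> Optional[ProtocolDetail]:
        # Return the first detail stored under this key, or None if absent.
        for detail in self.details:
            if detail.key == key:
                return detail
        return None


# Placeholder values only; these do not describe any real trial.
record = StudyRecord(
    study_id="EXAMPLE-001",
    stage="full_publication",
    details=[
        ProtocolDetail(
            key="chemo_to_radiation_interval_days",
            value="28",
            location="supplementary appendix",
        ),
    ],
)

interval = record.lookup("chemo_to_radiation_interval_days")
missing = record.lookup("renal_abnormality_management")
```

Here `interval` carries both the value and where it was stated, while `missing` is `None`, which tells the system that the detail has not yet been published at this stage or was never captured.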