My Google GenAI Capstone project

One site bookmarked by most clinical oncologists is the Guidelines page of the National Comprehensive Cancer Network (NCCN). First published in 2002, the guidelines are a comprehensive set of disease-specific consensus treatment summaries, written to provide practicing oncologists with clear summaries of evidence-based treatment, distilled through the many thought leaders in each disease site. The guidelines consist of flow diagrams that give the clinician algorithmic diagnostic and treatment recommendations based on the findings of the initial diagnostic evaluation. They also contain a discussion section summarizing the studies from which the recommendations were formulated.

When one wishes to formulate a treatment plan for a newly diagnosed case of cancer, it is common to refer to these guidelines, not only for the reassurance that the treatment plan is reasonable, but also because insurance companies will sometimes decline to pre-approve treatments involving expensive agents if they are not included in the guidelines. The guidelines themselves are detailed, and I have found that while the flow diagrams are convenient, it is frequently essential to review the rationale for these recommendations in the discussion section as well. There are a limited number of hyperlinks in the document, and it is not convenient to find immediate answers to specific clinical questions.

By now, most people are familiar with a search engine interface, and most have also had experience with ChatGPT. While LLMs are subject to hallucinations, queries answered by a chat model grounded in a specific set of documents, such as the NCCN guidelines, are much less likely to suffer from this limitation. For this purpose, I wrote a Python app that reads the NCCN guideline PDFs one might have stored in a folder and provides a query interface to the data. While this has been very helpful, there have been many situations where the information I want does not reside in the guidelines, and the app resorts to a general web search. All one needs to use the app is an OpenAI API key and an account with the NCCN. The app uses the gpt-4o-mini model, which is very inexpensive, costing only a few cents per million tokens.
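
For the curious, the core loop looks something like this - a minimal sketch, not the actual app code, assuming the openai and pypdf packages; the folder path and function names are illustrative, and a real version would chunk and retrieve passages rather than stuff the whole corpus into one prompt:

```python
# Minimal sketch of the PDF-query loop (illustrative, not the app's actual code).
from pathlib import Path

from openai import OpenAI
from pypdf import PdfReader

NCCN_DIR = Path("~/Documents/NCCN").expanduser()  # hypothetical folder of guideline PDFs
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def load_guideline_text(folder: Path) -> str:
    """Concatenate the text of every PDF in the folder."""
    pages = []
    for pdf in sorted(folder.glob("*.pdf")):
        reader = PdfReader(pdf)
        pages.extend(page.extract_text() or "" for page in reader.pages)
    return "\n".join(pages)

def ask(query: str, context: str) -> str:
    """Answer a clinical question from the guideline text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer strictly from the NCCN guideline text below. "
                        "If the answer is not present, say so.\n\n" + context},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

# In practice the full guideline corpus exceeds the context window, so the
# real app chunks the text and retrieves only the relevant passages.
```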

The NCCN Guidelines are updated at least annually, and for those cancers in which relevant practice-changing research becomes available more frequently and warrants more urgent incorporation, the guidelines are updated during interim meetings. New updates can occur as often as monthly for common neoplasms, and many busy oncologists may not find time to keep their copies of the guidelines current. The legal consequences of basing treatment on outdated recommendations have not, to my knowledge, been explored, but I would hate to learn that I did not provide up-to-date care to my patient.

Recently, I participated in a five-day intensive Generative AI course, sponsored by Google and Kaggle. I learned how to apply Retrieval-Augmented Generation (RAG), few-shot prompting, agents for organizing coding tasks, and grounding an app with search. I decided to apply what I learned to updating my NCCN app, working in the Google universe.

This app requires a Google Gen AI API key. One must install the necessary libraries and store API keys, usernames, and passwords in the Kaggle notebook. For this app, I used the Gemini-2.5-Pro model, as smaller models did not have the context length to handle the few-shot prompts or the PDF processing. Model selection is set up in the setup_genai function.
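
A minimal sketch of what setup_genai might look like, assuming the google-genai SDK and Kaggle's secrets store (the secret label is whatever you saved the key under):

```python
# Illustrative sketch of setup_genai; the secret label is an assumption.
from google import genai
from kaggle_secrets import UserSecretsClient

MODEL_NAME = "gemini-2.5-pro"  # smaller models lacked the needed context length

def setup_genai() -> genai.Client:
    """Fetch the API key from Kaggle's secrets store and build a client."""
    api_key = UserSecretsClient().get_secret("GOOGLE_API_KEY")
    return genai.Client(api_key=api_key)

# A trivial generation call fails fast if the key is missing or invalid.
client = setup_genai()
client.models.generate_content(model=MODEL_NAME, contents="ping")
```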

The main module of this app is the cancer_info_chatbot function, which orchestrates several agents. Once the query has been received, it calls the setup_genai agent to make sure the API key is available and valid. The next step is to determine what kind of cancer is implicit in the query. Sometimes it is explicitly stated, but at other times one may use abbreviations, such as "DLBCL" for "diffuse large B-cell lymphoma". For this reason, the query goes to detect_cancer_type, which passes it through the generate_content_with_model agent; this wraps the text in a request to the LLM, which intelligently parses the intended tumor type and categorizes it into one of NCCN's designated labels. Once the tumor type is identified, the label is sent to the get_nccn_pdf agent. This agent poses as a browser to interact with the NCCN website, logs in with the supplied credentials, and uses the BeautifulSoup Python library to parse the pages. From the category page it follows the link for the appropriate cancer type, which leads to the page containing the PDF file. The guideline file is identified, downloaded, and sent to the extract_text_from_pdf agent to extract the textual elements, and this text is sent to the LLM, along with the query, to formulate the answer. This is the RAG portion of the process.
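
Here are hedged sketches of two of these agents, reusing the client and MODEL_NAME from the setup sketch above; the label list and prompt wording are illustrative, and the get_nccn_pdf login-and-scrape step is omitted since it is specific to the NCCN site:

```python
# Illustrative sketches of two agents; NCCN_LABELS and the prompt text
# are placeholders, not the notebook's exact contents.
from pypdf import PdfReader

NCCN_LABELS = ["B-Cell Lymphomas", "Non-Small Cell Lung Cancer", "Breast Cancer"]  # truncated list

def generate_content_with_model(client, prompt: str) -> str:
    """Thin wrapper that sends a text prompt to the LLM and returns the reply."""
    response = client.models.generate_content(model=MODEL_NAME, contents=prompt)
    return response.text.strip()

def detect_cancer_type(client, query: str) -> str:
    """Map a free-text query (including abbreviations like DLBCL) to an NCCN label."""
    prompt = (
        "Which one of these NCCN guideline categories does this question concern? "
        f"Categories: {', '.join(NCCN_LABELS)}. "
        "Answer with the category name only.\n\n"
        f"Question: {query}"
    )
    return generate_content_with_model(client, prompt)

def extract_text_from_pdf(pdf_path: str) -> str:
    """Pull the textual elements out of the downloaded guideline PDF."""
    return "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
```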

The query is also searched on Google, using the GoogleSearch tool. This is called grounding with Google Search, and it improves the accuracy and recency of responses. Few-shot prompting with examples was provided to help refine the format and content of the responses.
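
A sketch of how the grounded, few-shot call might be wired up with the google-genai GoogleSearch tool; FEW_SHOT_EXAMPLES stands in for the worked question-and-answer pairs in the notebook:

```python
# Illustrative sketch of grounding plus few-shot prompting; the example
# text is a placeholder, not the notebook's actual examples.
from google.genai import types

FEW_SHOT_EXAMPLES = """Q: What is the preferred first-line regimen for ...?
A: According to the NCCN guidelines ... (category 1 recommendation).
"""  # placeholder for the real worked examples

def grounded_answer(client, query: str) -> str:
    """Answer with Google Search grounding, formatted like the examples."""
    prompt = (
        "Answer in the style of the examples below.\n\n"
        f"{FEW_SHOT_EXAMPLES}\nQ: {query}\nA:"
    )
    response = client.models.generate_content(
        model=MODEL_NAME,
        contents=prompt,
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    return response.text
```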

The result of this effort is an app that allows me to obtain the latest staging and treatment information from the NCCN guidelines database, with search backup if needed. No longer will I have to remember to download the latest PDF to my document folder. This illustrates the utility of using agents, RAG, few-shot prompting and grounding with Google search.


If you don't have a Kaggle account and are interested in giving this a try yourself, it's also on my GitHub:

What do physicians do for information management?

Recently, someone posted on Reddit (r/medicine) a query about what systems other physicians use for knowledge management.
The statement "Or, like me, do you feel like your knowledge peaked the day you left residency?" is scary. Do we really want physicians to feel this way? What about continuing medical education? How do we improve? I think the author was looking for more than just an online textbook, since it's a given that resources like that will be used. But how do you really curate your knowledge base, and store the documents you read for easy retrieval?

The respondents' replies were all over the place. The first response was "pure chaos" - just using txt files. Several used Obsidian. I tried it for a while, but the interface is not immediately intuitive, and even as I became more familiar with it, I found I was still having to hunt for basic functionality. Since I didn't use it that often, it was easy to forget which icon provided access to the feature I needed. It has a markdown editor, but as a place to store PDFs, it didn't have the folder structure I was looking for.

Several commented on storage locales where one could just dump files, like Dropbox, OneNote or Google Drive.

Some suggested note-taking apps, like Joplin or Notion.

I use Zotero a lot, and highly recommend it. It's like an electronic file cabinet. The best feature is that when you are browsing on the web, and come across something worth saving, you can click on a button, and the document is saved in Zotero. That's convenient!
I just wish that there were more formatting options in the folder structure, so I could highlight a particularly important reference.

But it's clear that most doctors don't have a great way to store their new pieces of knowledge, all in one place: things from journals, books, meetings, notes, etc.

And then, once you store it in Zotero, how do you locate what you want? You have to take pains to lay out your folder structure meaningfully so that you don't have to spend a lot of time searching for a particular article.

However, once there is an LLM system where you can load external material (e.g., PDFs) and query it with a chat interface, I will probably stop using Zotero. There are many RAG (retrieval-augmented generation) utilities, but they are still very clumsy to use. The system needs to be private and offline, because it could hold private notes. But as of today, you still need to open a terminal, load the LLM, and load the RAG pipeline. You need an API key from OpenAI. The RAG model must have access to a vector database, which you need to set up separately, and doctors need to know how they want to chunk the material they feed into it. The vast majority of doctors can't be bothered with setting all this up. How many doctors are familiar with Pinecone, Weaviate, prompt engineering, Python, and Jupyter notebooks?
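
To make the point concrete, here is a back-of-the-envelope sketch of the retrieval half of such a system, assuming the sentence-transformers package for local embeddings; the chunk size and model name are arbitrary choices, and this is exactly the kind of plumbing most doctors will not want to build themselves:

```python
# A minimal local-RAG retrieval sketch; all names and parameters are
# illustrative choices, not a recommended configuration.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small model, runs locally

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks with overlap - one of the decisions
    doctors shouldn't have to make themselves."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def build_index(documents: list[str]):
    """Embed every chunk of every document into a vector matrix."""
    chunks = [c for doc in documents for c in chunk(doc)]
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    return chunks, vectors

def retrieve(query: str, chunks, vectors, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks would then be passed, with the query, to a locally
# hosted LLM, so that no private notes ever leave the machine.
```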

There's still a lot of work to be done to make this user-friendly for a segment of people who need this very badly.

Remarks on some recent "A.I. in healthcare" articles

In the year since ChatGPT first opened up to new accounts, it seems that all businesses are looking for ways to incorporate it into their practice, not only to increase efficiency and save money, but perhaps also to appear forward-looking, join the trend, and not miss out. I've mentioned that medicine and healthcare delivery are among the areas where the executive suite has been looking to incorporate it into their business - or practice. It's helpful to remember that artificial intelligence is not just chatbots or Natural Language Processing. It is also computer vision and deep learning, which itself comprises classification, modeling, and prediction. We've seen how computer vision advances have helped radiologists, cardiologists, and neurologists interpret their visual clinical data. Time-sequence models, such as LSTMs, have helped predict usage trends, which is of value to accountants and planners. The New England Journal of Medicine, in their article on Artificial Intelligence in U.S. Health Care Delivery, summarized the state of the technology in healthcare in 2023. As I have discussed before, one area where it has been implemented is insurance reimbursement, to help with claims tracking and audits. The article discusses how modeling has helped improve efficiencies in operating room utilization. In the clinical world, the uses of A.I. discussed in the article involve deep learning models that predict sepsis, or predict clinical outcomes in the ICU or emergency department, looking for factors that predict readmission or death. These are the low-hanging-fruit scenarios, and mainly represent the application of data science techniques to various deep learning architectures, rendering operations more efficient.

The article mentioned the slow adoption of A.I. in healthcare delivery. The reasons for this? One is the "variability and heterogeneity" of the data, which is understandable. This includes data generated by the numerous sensors and imaging modalities (sometimes with audio as well as visual components), and the massive corpus of text information (both handwritten and printed). Before any data can be used by an A.I. system, it must be preprocessed: everything has to be translated into a form that the learning architecture, and a database, can process. This usually means conversion to vectors (or tensors), but the question then becomes, whose format shall be used? There is now a trend toward vector databases, which would be especially helpful in the medical world, since the old system of classifying things by human-created categories is laborious and slow. Furthermore, it would be difficult to correlate someone whose disease was given one of the vague R-category ICD-10 codes with patients given more specific and precise codes. Vector or tensor databases promise to hold multimedia data and enable trainable queries that will find hidden associations between conditions. At present, I suspect that most stored information is human-entered, and represents only a small subset of the information generated in patient encounters. All other data is still likely represented in formats standard to their media, such as PDF, JPG, WAV or MP4 files. EHR vendors, such as Epic, still store patient data in SQL-based servers; however, Oracle is one of their database providers, and Oracle is investing in vector databases, so the technology may change.

But apart from this is a lack of trust in artificial intelligence, from both patients and doctors. In this article from the New York Times, there is brief mention of Google's Med-PaLM 2 chat model, which is geared to help healthcare workers, but concerns were raised over privacy and informed consent. I worry about the data on which it was trained. Content gets out of date quickly, and these models will need continual updating to stay current enough to be trusted by clinicians. Internet connections are now being incorporated through Retrieval-Augmented Generation with tools like LangChain, so that problem may be tractable. But there is still a way to go before generative A.I. is ready to replace the physician. In an emergency department in Boston, GPT-4 (so far, the best-in-class chat model) was so-so in diagnosing a woman's painful knee. The correct diagnosis was considered by the chatbot, but so was the human physician's differential diagnosis. At this time, it is probably true that a physician with A.I. is better than a physician without A.I., but this mainly applies to formulating a diagnosis. I don't think it's quite where it needs to be in terms of accuracy in predicting outcomes or prescribing treatment. It can provide suggestions, but skilled and experienced physicians may not find the occasion to consult the chatbot very often. And chatbots still struggle to explain why they came up with a conclusion, although the hallucination problem is improving with new fine-tuning algorithms.

At this time, I think it's still difficult to know how far chatbots and Natural Language Processing will go in performing the duties of clinicians. A promising effort was made previously with the app Babylon, but it failed spectacularly. Sadly, it was going down in flames just as the new chat technology was in ascendance. I suspect the developers were unable to convince investors to give them more time to utilize the new transformer model in Natural Language Processing, which was showing the world just how capable it was compared to previous chatbot technology.

What will medicine look like in the era of artificial intelligence?

Nature featured an article recently, weighing in on what the future of medicine might look like now that artificial intelligence is taking off. Before discussing this article, I would like to review where things are with artificial intelligence and machine learning. The most significant development for the public, and the one that has garnered the most media coverage, has got to be generative chat A.I. It's hard to believe that back in November 2022, there were no users of ChatGPT. Within a week, there were a million users.
[Chart of ChatGPT user growth, from https://research.aimultiple.com/chatgpt/]

Since then, a plenitude of other models have made use of the transformer architecture to develop similar natural language processing (NLP) interfaces, and many people now use chat software routinely in their daily lives. Advances in machine learning and data science have been less visible, and have been utilized mainly in the corporate world to get a better handle on the business and planning side of things. Computer vision advances have allowed for things like autonomous vehicles, but even in non-autonomous cars, computer vision apps help display speed limits to drivers, as well as warnings about entering school zones and other road events. So I was eager to read about how the world of medicine has benefited from all this, especially for the practicing clinician.

The trees of artificial intelligence and machine learning have borne fruit mainly in the fields of computer vision (specifically, imaging assistance for the radiologist) and sensor technology (interpretation of rhythms, heart sounds, and hemodynamic data).
[Chart of FDA-cleared healthcare A.I. algorithms, from https://healthexec.com/topics/artificial-intelligence/fda-has-now-cleared-more-500-healthcare-ai-algorithms]

Other areas where deep learning has made inroads are clinical decision support and the interpretation of sensor data generated by neurologic and hematologic lab devices. Natural language processing has had little visible impact so far. It has the potential to turn unstructured physician notes - with all the jargon, ungrammatical text, and barely-readable handwriting - into searchable and indexable vector databases for analysis and retrieval. This could unlock a large corpus of data previously resistant to easy parsing. Doctors could use NLP apps to transcribe their notes more accurately, right into the EHR, without having to dictate them for transcription later. An NLP-driven chat interface could help them retrieve information, not only over the Internet, but from their own notes and their own private libraries of PDF articles saved on their computers.

So what did the Nature reporter have to say? Once again, the dominant inroad has been radiology and the interpretation of imaging. It is of interest that some radiologists have soured on A.I. assistance, claiming that it did not really help them and sometimes resulted in more time spent. They didn't think there were enough cases where the algorithm detected a significant finding that the human radiologist missed. We still aren't clear on the impact of an incorrect A.I.-generated diagnosis. Who gets the blame? How does a radiologist defend his/her opinion if he/she incorrectly defers to the algorithm?

The article glosses over other imaging applications, then moves on to eye (retinal) imaging, which is essentially the same concept: using machine learning to interpret findings on ophthalmologic imaging.

I would like to believe that there is more to the future of medicine that can be enhanced by artificial intelligence than just help with interpreting medical imaging. Hospitals are using it to help with billing and logistics. Insurance companies are using data science techniques to look at practice patterns and elicit cost-saving interventions. But efforts to help the poor clinician do his/her work more efficiently and more quickly are still lagging.

Advances in chat that could provide natural and quick retrieval from large datasets still elude the clinician. The main reasons are that the interface still does not have the precision and recall needed for mission-critical situations, and chat models still hallucinate. Such drawbacks could lead to a wrong and harmful clinical decision. And much of the needed information resides behind proprietary digital walls, such as journal paywalls and even the National Library of Medicine itself. Having an NLP interface, like Bing has with its search engine, would be interesting to experiment with, but one doesn't want to take a legal risk with wrong or incomplete searches in the medical world. With each medical query, physicians still need to peruse a collection of articles to determine whether a clinical study is appropriate to their patient, by scanning the study methodology and patient population, as well as the statistics and limitations. At this time, no chat model will produce all this information.

Another greatly-needed application is clinical trial search. It is truly heartbreaking that patients have to have special skills and contacts to be able to locate appropriate and life-saving clinical trials. This is something that might benefit from a combination of vector databases and natural language processing to make trial searching easier and faster. These are some of the areas in which I had hoped there would be breakthroughs.

New LLMs can pass medical exams - should human doctors be worried?

Progress in large language models (LLMs) has been rapid lately and, I suspect, is moving faster than our understanding of what these models are really capable of. OpenAI's GPT-4 has exhibited evidence of a deeper world-model understanding than even GPT-3.5, which is scary as well as exhilarating.

For the application of helping physicians in practice, an enterprising startup has put out a chat-based app, Nabla, that promises to help physicians with their chart notes. I am not sure that LLM technology is mature enough to deploy for this application. First of all, the software runs on a cloud server, and this is always a concern. The company claims that it is "HIPAA-eligible" and "GDPR-compliant", but it will have to be approved by hospital or clinic security before it can be deployed. From what I can see, it outputs rather simple statements based on patient input, and seems akin to a voice dictation system that is just able to pad snippets into regular sentences. It won't create the kind of chart notes that I am accustomed to generating, especially in the Assessment and Plan section, which depends on a knowledge of the literature and interpretation of clinical findings and lab results, and sets down my line of thinking. So far, I've not encountered software that will save me that effort. As this software isn't asked to be creative, there is probably little risk of hallucinations or other unwanted side-effects of more complex generative chat. Never before has a physician dictated a chart note with confidential and sensitive information to a startup corporate entity. As protected information will be exchanged, will each user's input be stored for use in a future training set? If so, how is protected information censored?

In the area of expert systems, great strides have been made by Google with LLMs as expert systems. However, it has been recognized that:

The problem is that the medical domain is a special domain. In contrast to other fields, there are different issues and even greater safety concerns. As we have seen, models like ChatGPT can also hallucinate and spread misinformation.

In machine learning, the performance of a model is compared to human-level performance, while human performance is itself compared to a theoretical optimum: the lowest error any classifier could possibly achieve on the task, known as the Bayes optimal error. What the AI developer aims for is a model whose error falls below human-level error and approaches the Bayes error (Bayes error ≤ model error < human error).
Google has been working with a whole slew of language models, but the top performers are the ones based on PaLM and FLAN. These models have been tested side by side, and while Flan-PaLM had the edge in taking exams, Med-PaLM scored higher on questions likely to be asked by consumers. This might be because PaLM was trained on sources like Wikipedia and social media.

But it's amazing that the LLMs could answer medical licensing exam questions at all.

But although these models can pass medical licensing board exams, I don't feel that they are ready to be deployed in the clinic.
I've not seen much written about whether the problems reported with other LLMs, such as ChatGPT or GPT-3/4 - hallucination, bias, and toxicity - also afflict these models. I have questions about how to "edit" the information that trains the 540 billion parameters of PaLM. For example, if you train it on a medical document that is found to be erroneous or false, how do you remove this knowledge from the model so that it doesn't make decisions based on that information again? How does one update the model with new information? Training is a time-intensive process, and a large model requires hardware not readily available in a doctor's office. Smaller models might provide "good-enough" accuracy with reasonable training time, since in the medical world, training needs to occur regularly. A model like Flan-PaLM might beat a human oncologist today, but a few years later, an expert oncologist might defeat a model that has not been updated, retrained, and revalidated.

Right now, it appears that companies like Google see their models deployed in the consumer space, to help with diagnostics and to provide answers to basic health questions. While I applaud these efforts, I would like to see some effort made to help the beleaguered clinician who has to parse mountains of new data each month. Another worthy aim for AI would be to complement human clinicians, and be a true "peripheral brain", as we used to call the software apps on our PDAs of years past.

Could a ChatGPT-type app be the indispensable physician's assistant?

In late December 2022, Google reportedly raised a Code Red alarm regarding the successes seen with the ChatGPT app developed by OpenAI.  

ChatGPT was tested against Google in fielding various queries. The results were surprisingly good, and the weaknesses were primarily in retrieving images, videos, and tweets. But when it was accurate, the testers were satisfied with getting relevant answers that provided the information they were looking for, instead of a list of links they had to investigate themselves. CNBC also found ChatGPT to be superior to Google. These are just a few instances of the recognition that Google's business model might fall apart very soon, considering the velocity at which technology like this develops.

It has been noted that ChatGPT seems to have a better understanding of the semantics of the query than the Google search engine, which matters for retrieving accurate and relevant medical information, where there are different ways of referring to medical conditions, different abbreviations, etc. How many times have I had to manually sort through a list of search results for "small cell lung cancer", weeding out "non small cell lung cancer" results, because the search engine couldn't understand the query? Even PubMed, with its MeSH term system, still fails to distinguish between the two.

Andrew Ng, of DeepLearning.AI, points out that search based on querying Large Language Models (LLMs) is, at this time, limited by memory requirements. The amount of memory needed to store the billions of parameters the model needs to perform could be as much as 800 GB (as a rough check, 175 billion parameters at four bytes each already comes to about 700 GB), which is far greater than something like Wikipedia, which only takes up about 150 GB. (You can even download Wikipedia if you want to.) But the size of available medical and scientific knowledge is likely to be far larger. At this time, it would be too much to ask for ChatGPT to be the dream medical assistant. It also has a bad habit of hallucinating knowledge it does not have, and doesn't seem to be aware that it is doing so. To do this in a medical context would be devastating.

Still, I will be watching this space closely, and hope to get involved in a project that will spur this along.  

Update:
Some are already impressed with the potential of OpenAI's models in the diagnostic space. So am I, although this is just one example, and the disturbing tendency of these models to confabulate is a legitimate concern. Efforts are already being made to remedy this.

When a search engine loses its focus - the limitations of Google

Google has changed

Retrieving useful search results from a search engine is not always reliable. At this time, I look to Google as the standard-bearer of how well a search engine should perform. The original PageRank algorithm was impressive, and showed how poorly other search engines performed. However, Google's business model has changed, and it is clear that its algorithm now takes other factors into account. They got into the business of filter-bubbling, preferentially showing you results you are likely to click on. The end result is that your search results are often different from other people's. Also, the more you click on certain links, the more this is taken into account by the algorithm, and this affects what you will see the next time you enter the search terms. And now, Google has taken it upon themselves to decide whether a website contains "disinformation" or information that might be harmful to their preferred political candidate. All this decreases the usefulness of Google, but it is still the default search engine used by most people. Gonna be a tough habit to break.

It can't parse metonyms or understand the nuance behind the query

A blogger named Sandy Maguire recently posted (https://reasonablypolymorphic.com/blog/monotonous-web/index.html) about his frustrations with Google. First, he decried that Google was not smart enough to understand metonymy. He gives the example of really enjoying the vibe at Koh Lanta, Thailand, and so asking Google to find the "Koh Lanta of Croatia". He expected Google to figure out that he was searching for the corresponding place in Croatia that would give him the same "vibe" as Koh Lanta. Not surprisingly, Google failed. I mean, c'mon.

But having worked for Google, he knows that Google also rewards a website for how long you stay on the search result. (Did you know that?)  Websites designed with SEO in mind will rank higher, even though their content is not what you wanted.  

Then he describes how Google, being the most-used search engine and under constant government scrutiny, modifies its search results accordingly. He yearns for the Google of 2006, when you got good search results, as you expected, instead of what you get now. Sandy feels that Google is the victim of its own success, and would like to see another search engine that ignores the tactics of "aggressive SEO" and boosts personal blogs, where people often have more interesting things to say. A search engine "by the people and for the people".

But we don't even have anything approaching a Google-type search engine for the medical field

So how shall one design a search engine that will pull up relevant, quality results for a physician who needs this information to treat patients? One tactic would be to avoid casting a net over the entire Web, and instead focus on select sites likely to contain the specific and relevant information. This would be medical journals and published guidelines, but also meeting summaries from relevant symposia as well as smaller meetings. Of course, the content from these meetings is the property of certain groups and academic centers, and they will not likely yield up their intellectual property without remuneration and possibly some editorial control. This limitation will be difficult to overcome without financial backing and some political clout.

Only after medical journals and meeting organizers allow their content to be indexed and retrieved by some entity other than PubMed will it be possible to study the benefit of more sophisticated search strategies for retrieving relevant documents for a query. Google already does index some content in medical journals, so I do pay attention to what Google is doing. So far, however, they are not much use as a clinical query tool. At least not for physicians, nurses, and other healthcare providers who want content not intended for the lay audience.

Search giant enters the healthcare business; The scientific ontology of sentiment; Combining RSS and Slack

Alphabet enters the healthcare field

Google's parent company, Alphabet, recently announced that they were entering the healthcare business. Wonder what took them so long. Alphabet, as everyone knows, is in the business of building a database of people. They already know what people search for, where they go during their travels, what apps they have installed on their phones, what websites they visit, etc. But knowing details about their health would be a treasure trove. Of course, they wouldn't be able to get identifying information about an individual, as this would violate HIPAA, unless consent was given somehow. (Hmm, when was the last time you **really** read the end-user agreement?) But they might obtain interesting information about healthcare trends and provider prescribing patterns. They might get involved in Clinical Decision Support, and maybe guide/influence healthcare practices. Might their efforts lead to software that helps the user-physician make clinical decisions?

The scientific ontology of sentiment

Speaking of Google, one blogger describes the lack of a "scientific ontology of sentiment" in academic papers. If I understand him correctly, he laments not being able to determine quickly whether an academic paper agrees or disagrees with another journal article. You have to read the article first and make that determination manually. He mentions Google Scholar as being helpful in displaying not only the references of a journal article, but also the other papers that cite it - something the old Science Citation Index used to do, which was very helpful. But annotating each paper with additional information, such as a rank of its clinical importance, from minor observation to practice-changing result, might help a system trying to determine in real time whether to retrieve an article and display it as high-priority to a busy physician, who can read only so much in limited time.

Will RSS find new life with Slack?

My Heme-Onc.news site works on a system similar to RSS. This system was all the rage around 2007, and is now used here and there by dedicated users. Google dropping its Reader app led to a significant decline in the use of this technology. A recent blog posting suggested that RSS might find a resurgence through a feature of Slack, namely the Alerts channel, which would trigger an alert whenever an RSS feed was updated. This could be helpful if a feed sent out notifications only for reports of high priority and importance (see above), but not if it was updated with every single change event. That would get annoying quickly. Still, it's an innovative way of combining two technologies, which might be refined and made useful in some other incarnation. I would not want to install Slack or get a Slack account solely for the purpose of getting notifications. Not at this time, when there is no RSS feed to take advantage of it. Same reason for not utilizing the WebSub protocol: journal publishers have not embraced it, choosing instead to email subscribers an excerpt of their table of contents each month.


Search engine challenges

The Next Google

I came across this article from a writer named DKB, called The Next Google. The article is about how some innovators, looking to create the next great search engine, are doing more than providing a list of links from an indexed database. The first search engine discussed, Kagi, is different in that it offers toggle switches to activate (or deactivate) searches of videos, news sites, discussion boards, Wikipedia, and listicle-type articles. There is also a feature called Kagi Instant Answers which, apparently driven by Kagi AI, helps you "find exactly what you are looking for". There is a lens filter that directs searches to academic domains (with a .edu TLD), discussion sites, or non-commercial sites. The concept is still very much a work in progress, but the goal is to add some structure to search results. Then there's Neeva, which calls itself the Everything Engine. This project was started by a former Google worker and a couple of others, with the aim of being free of ads and private. Of course, these are of little value if the search results are crummy. In an interview, the founders describe how they determine authority by whether Reddit posts or (in the case of tech topics) Stack Exchange point to the article in question. In medicine, this would be more challenging; although one could design a system based on journal impact, such as the SJR ranking, sometimes the most important data are so fresh that they exist only in abstract form until something more definitive is available. What determines impact and importance is often the opinion of thought leaders on the research result. How will that be indexed?

How can we identify quality items?

Other up-and-coming search engines are also discussed, each trying to improve on Google by promising better privacy and customization and emphasizing content from more "trusted" sources, but these have to play out before we know whether the strategies are successful. But how to tease out what is important in a clinical context? I don't see how to search for and assign an abstract impact score based on opinion pieces published after major conferences. Maybe some enterprising search engine will be able to tag these opinion pieces with something like Google's Tag My Knowledge so that they show up during organic searches.

Getting notified of new developments

This developer wanted to be notified whenever there was a new release of a particular app. Much like how, as a physician, I might want to be notified that there was something new and important in terms of cancer management. It turns out that the solution for him was simple - it was built into GitHub. I wish there was something similar for medicine (esp. oncology).

Science is hard - or, the important stuff is getting diluted with non-important stuff

And finally, an article that purports that Science (with a capital S) is getting harder, in the sense that we're seeing fewer "big" discoveries and more little ones. We're not seeing as many research papers that describe Nobel prize-winning work as before. As the literature becomes populated with lesser-impact articles, a decreasing percentage of papers end up being "most cited". Reasons for this are not given, but it is stated that scientists "all seem to have an increasing preference for the work of the past, relative to the present". Also, there is the "burden of knowledge", where new discoveries require new knowledge. If there is more chaff than wheat, we really need something that will help us home in on the quality research data.

Artificial intelligence in medicine

Applying machine learning to patient EMR

I recently came across this article about an engineer tasked with trying to apply machine learning to patient electronic medical records. Of course I was interested to see how he would solve this problem, as the challenges of dealing with a heterogeneous set of formatted data might give some insight into how to approach a similar system, where data from the medical literature, conferences, meetings, personal notes, and video and audio media could be processed by machine learning in a way that relevant on-demand information could be extracted in real time for the clinician.

Alas, the author of the article was not successful, but he was able to post the reasons why.

First, the data was "fragmented": by this the author means that each system was not cooperative with the others, and the data was spread across different formats (and presumably different structures), so that one could not be sure the necessary record was present in a usable format. If you needed to know a specific parameter about a patient, it might be in the hospital's database, but it might be in the records of a different clinic or diagnostic imaging center, etc. These systems did not communicate, as they had no reason or directive to do so. Therefore, obtaining quality data was not going to be easy.

Much of the rest of the article deals with quirks of how data is entered into the system, and with the fact that medical workflow is done by humans, not machines. Factors that determine triage are not necessarily understandable to a machine, which operates by algorithm, whereas human healthcare providers can approach a complex problem through various algorithmic starting points, and this confounds data mining efforts. Also, the reasons things are the way they are in medicine may stem from governmental regulation, from the natural history of a disease, or from doctors' and nurses' workflow practices.

Some parallels can be drawn to the effort of extracting relevant data from the medical literature. Where would a breakthrough result be located? If it's brand-new, it would be in reports released by medical companies during or shortly after a major meeting. These are usually news releases to major outlets, but they could be on social media platforms. Major study results would not be in the meeting abstract book, however - you have to wait until after the meeting and presentations. The full study often comes out several months later, but an abstract with commentary from various thought leaders is usually available within weeks. This can be of value to a clinician, especially for practice-changing results. But how would a machine know this and learn to retrieve it? Heck, even humans have difficulty. But it is important to eventually get the final publication, with all the details, including the supplementary data, because this is required when you are treating a patient according to the study and want to know details such as how long after the end of chemotherapy radiation was initiated, or what the researchers did when there were unexpected hepatic or renal abnormalities. The devil is in the details.

Deploying AI in Medicine

The second article that I will discuss is this one, on the role of AI in medicine. The author discusses the threat of AI in the field of radiology, where software-guided diagnostic imaging is already in use. If accurate pattern recognition is the goal of the machine learning, then indeed, the radiologist will find formidable competition. But someone has to program the system with the abnormalities that need to be recognized and considered worthy of further evaluation. The imaging technology will also evolve, and software will need to be trained against the new datasets. Examples of AI failure are then discussed, including the notorious failure of IBM's Watson Health system. The examples cited by the author are simply early efforts, and I have no doubt that more capable technology will improve on them. But it is disappointing that even on relatively restricted tasks given to existing software and computing platforms, we are nowhere near being able to deploy this technology in the clinic.

The prospect of an intelligent and capable medical knowledge retrieval platform is likely still at least a decade away if such a system has to work with the data as they are now. The goal would be achieved much sooner if efforts were made to format the information into something a computer could incorporate easily. The HL7 standard was developed to standardize electronic clinical and administrative data, so that systems like Cerner and Epic can interoperate.

Unfortunately, no one (that I can tell) is even thinking about doing this for medical research data, much less working on a standard.