Literature Review of "First Do No Harm" paper and lecture by Drs. Cohen and Gordon
Some areas of my profession take a cooler view of AI than I do. I break down the paper here.
Before I first started writing on Substack, I was reluctant. I had plenty to say, but I knew that not everyone would like what I had to say. I often disagree with prevailing opinions in my profession, and I tend to disagree - shall we say - emphatically and enthusiastically. I tend to play the game at maximum velocity, and that, along with a habit of sledgehammer-direct communication, has frequently rubbed folks the wrong way. I don’t know that my career is better for it, but sometimes my conscience is. As with many things in life, I’m working to get better.
I talked to a coach about the idea of writing, and she said, “So what? Own that identity. You’re not the guy who gets along with everybody.” I took her advice and decided to accept the potential of being disliked and to put forth the opinions and perspective I’ve developed over almost a decade of clinical practice.
This is one of those times when I felt I had to write. It is the way I enter the world and the best way for me to represent myself and my profession.
Even when I write critically about aspects of veterinary medicine, I endeavor to limit the critique to actions and omit my opinion of the individuals. I will not do so here, because I think it’s important to convey my respect and admiration for Dr. Ira Gordon. I haven’t met Dr. Cohen (though I hold considerable professional esteem for anyone who can complete a residency and pass specialty boards), but I have met Dr. Gordon. He’s friendly, insightful, wildly accomplished, and enthusiastic about veterinary medicine. I was honored to meet him.
While my criticisms of the positions represented by this paper and lecture will sometimes be strongly worded, I maintain a high opinion of the authors. Their professional accomplishments are considerable and worthy of respect, and even if they weren’t, they would still be worthy of respect as people. We are not enemy combatants on opposite sides of the battlefield; we are simply colleagues who disagree. What we have in common far exceeds those areas of dissension.
The paper was written by Dr. Eli Cohen and Dr. Ira Gordon, who are well-credentialed and accomplished specialists. The lecture, based on the paper’s research, was given by Dr. Cohen and is available on the AVMA Axon platform.
First, Do No Harm
“In order for artificial intelligence (AI) to be trustfully adopted in veterinary medicine, it needs to be lawful, ethical, and robust. As veterinary medicine evolves with new technologies, veterinarians are charged with upholding and adhering to ethical conduct. This is particularly important in the realm of AI because technology innovators may be unfamiliar with, or insensitive to healthcare policy or medical and research ethics.”
Cohen, E. B. and Gordon, I. K. (2022). First, do no harm: Ethical and legal issues of artificial intelligence and machine learning in veterinary radiology and radiation oncology. Veterinary Radiology & Ultrasound, 63(S1), 840-850. https://doi.org/10.1111/vru.13171
There is a lot of this paper with which I disagree, but some of it is factually untrue or written in a fashion that seems intended to create fear and doubt and to obscure the truth. Some of the factual mistakes made here are likely innocent, the result of inadequate research by the authors or of the rapid advancement of technology in the year or so since the paper’s research and publication.
First, I do not believe that it’s the responsibility of veterinarians to ensure that artificial intelligence is “lawful, ethical, and robust.” I am a practitioner of medicine, not a legislator. Most of the laws concerning the use of artificial intelligence haven’t been written yet. We haven’t come up with a legal framework as it pertains to AI use in medicine, and certainly not in veterinary medicine. Physicians Eric Topol and Bertalan Meskó wrote an interesting paper on the topic of regulation and seemed to conclude that an entirely new regulatory framework will need to be created. “Lawful” generally means “in accordance with the law,” and AI software and its use in veterinary medicine are, at this time, entirely lawful.
As for it being ethical, I refer to Dr. Cohen’s own lecture on artificial intelligence, in which the title of his first content slide reads, “A.I. Has No Conscience.” He is certainly correct. Artificial intelligence is a thing, and has no conscience. Neither does my scalpel blade, cell phone, Subaru, or stethoscope. Things do not have a conscience, but our use of them can be ethical or unethical. We do not need to define or deem artificial intelligence as “ethical” but rather put forth a standard for its ethical use. As veterinary medicine has done so often before, we are rightly leaving it in the hands of practitioners to determine if its use is ethical or not.
Finally, the last mandate is “robust.” I turn to statistics on ChatGPT, which is, as of the time of this writing, the foremost large-language model in the world. It is built on hundreds of billions of parameters, is constantly honed by its more than 50,000,000 unique visitors every day, and works in some 95 human languages. It is the most advanced large-language model available to the general public. It has passed medical exams, business exams, and the Turing Test. That surely must meet any definition of “strong and healthy; vigorous,” as it might refer to artificial intelligence. And with each iteration accelerating its learning, it’s only getting better.
“Our first tenet in veterinary medicine is ‘Primum non nocere’ or ‘first do no harm.’ This is the Hippocratic Oath and should guide what we do,” says Dr. Cohen in his lecture titled “First do no harm: Ethical and Legal Considerations of A.I.”
To be blunt: that’s not true.
The first sentence of the Veterinarian’s Oath is, “Being admitted to the profession of veterinary medicine, I solemnly swear to use my scientific knowledge and skills for the benefit of society through the protection of animal health and welfare, the prevention and relief of animal suffering, the conservation of animal resources, the promotion of public health, and the advancement of medical knowledge.” It goes on, but nowhere does it say or suggest “do no harm.”
The Hippocratic Oath does not contain the words “do no harm,” although an early surviving version, circa 245 AD, seems to acknowledge the principle of “non-maleficence.” It is also sworn by “Apollo Healer, by Asclepius, by Hygieia, by Panacea, and by all the gods and goddesses,” so perhaps what is essentially a prayer to an ancient pagan pantheon is not as universally applicable to modern medicine as is suggested. Although I acknowledge that most medical schools use some version of an oath, many do not include the words “do no harm.” (Several openly condemn euthanasia and abortion, too.)
This isn’t a matter of oaths sworn or broken; it’s one of thoughtful, intentional, considered scientific study and application, something that is often accomplished most effectively with the help of thorough research and study.
Respectfully to Drs. Cohen and Gordon, I believe that the experimentation, use, and trials of artificial intelligence will serve to enhance the knowledge and skills for the benefit of society through the protection of animal health and welfare, the prevention and relief of animal suffering, the conservation of animal resources, the promotion of public health, and the advancement of medical knowledge. That it threatens the business of teleradiology ought to be a subordinate concern.
What is unethical, to my mind, is publishing - in a medical journal - false statements of fact that could have been easily verified at any stage of research, authorship, or peer-review.
AI Influence on Clinical Decision-Making
If “bad AI” can influence radiologists to poor performance, do we think the danger is with the use of AI? Or with the training and performance of the radiologists?
Dr. Cohen cites this paper in his Axon lecture. The paper is behind a paywall, but the results are available. The work notes the influence that AI has on radiologists, not general practitioners, in interpreting certain types of radiographic images.
When the radiologists are aided by “good AI,” they are correct about 80% of the time. When they are aided by “bad AI,” they are influenced to worse performance, ranging from 19.8% correct for inexperienced radiologists to 45.5% correct for very experienced radiologists. Troublingly, the study does not include a control group in which radiologists are not aided by AI.
If “bad AI” can influence radiologists to poor performance, do we think the danger is with the use of AI? Or with the training and performance of the radiologists? Who is accountable for the outcome? It’s the doctor.
It seems to me that this doesn’t so much demonstrate the risk of using AI as the risk of using AI while asleep at the wheel. Dr. Cohen’s notion of “a radiologist in the loop” sounds pleasant in theory, but if we use Dr. Cohen’s own cited study as evidence of the risk of AI influence on outcomes, how much value can we expect from mandating expert involvement?
(I actually agree with Dr. Cohen in principle on this point, but I think we have to be more intentional than simply mandating involvement.)
I’ve been pretty blunt - and borderline repetitive - in saying that I don’t believe artificial intelligence is a replacement. It is, however, an augmentation, in the same way stethoscopes and email and gasoline-powered automobiles are augmentations to human work. With controversial exceptions, no one proposes holding an inanimate object accountable for how a human uses it, and such should be the case here.
Consultations hold a unique place in the practice of medicine, it’s true, as the VCPR remains with me rather than a distant specialist whom I ask for advice or insight. I remain responsible for where I source my information. If I rely on a myopic radiologist (who, using “good AI,” may only get things right 80% of the time) or a scattered internist or an incorrect message board reply or an outdated textbook, the credit, success, fault, failure and responsibility remain with me.
I can’t imagine a serious clinician who would attempt to dodge responsibility, blaming faulty information and believing themselves absolved of guilt. Maybe a specialty that experiences a high rate of lawsuits might believe differently, but ownership of the VCPR is the standard in the day-to-day life of a clinician.
I wouldn’t have it any other way.
Hallucination
Offering a six-month-old use case of a chatbot hallucination as evidence of the limitations or dangers of using a large-language model is modestly helpful in illustrating risk, but not a path to drawing meaningful conclusions about current utility.
The demonstration is akin to searching for something on Google, not getting the desired result, and deeming the whole project too dangerous to continue. ChatGPT is not a search engine; being troubled by its failure in that capacity is like blaming a baseball bat for damage caused when it’s used to hit someone on the head.
I wish Dr. Cohen had used this as an opportunity to teach the audience about responsible use rather than an extended way to paint the software as dangerous. I worry that his demonstration of a large-language model’s failure represents a fundamental misunderstanding of its structure and function.
Nowhere are the vast improvements in hallucination rates in later iterations of ChatGPT noted.
Testing and Validation
The assertion that we should not practice medicine if we lack complete understanding represents a complete lack of understanding of the practice of medicine.
Did John Snow know all the effects of temperature on ether when he tested it as an anesthetic? Had he ever seen the organism Vibrio cholerae when he convinced the authorities to remove the Broad Street pump handle?
Did Edward Jenner understand the genome of the virus when he tested cowpox as a smallpox vaccine?
Did Alexander Fleming understand beta-lactam antibiotics and drug resistance when he noticed a contaminated bacterial culture?
Medicine is forever a work in progress. We are always improving and always getting better. Sometimes we have to try things that might work before we have things that will work. Holding back innovation until the invention works perfectly is a good way to freeze progress and adopt the status quo as a finished product. How could we possibly accept that in medicine of all things?
Regulation
As current regulations do not pertain to veterinary medicine, I’m choosing to abstain from commenting at length on what I believe to be little more than a straw man argument. If Drs. Cohen and Gordon find the medical device or Software as a Medical Device (SaMD) regulation to be inadequate, I encourage them to write their respective legislators. Their argument is entirely hypothetical, speculative, and not offered from a position of professional expertise.
None of the regulations discussed in Dr. Cohen’s lecture or the paper have any bearing on the use of artificial intelligence in veterinary medicine. Dr. Cohen adds his own opinion of what does and does not constitute a medical device. His expertise is limited to veterinary medicine and radiology and does not extend to artificial intelligence, medical devices, or regulatory interpretation.
Data Ownership and Privacy
This is another area where I will refrain from commenting at length. There is considerable variety between states in the legislation and regulations governing data ownership of veterinary medical records, and I have neither the inclination nor the expertise to search and verify case law and legislation on the matter fifty times over.
I believe that this will become more important and better defined in the future, but it isn’t yet.
General Recommendations
The points in italics are taken from the original paper verbatim. Comments beneath are mine.
AI technology does no harm.
A functional impossibility. Abiding by this rule as it is written, we would not have scalpels, antibiotics, or vaccinations, to name a few.
There’s a time in the lifespan of everything that works when it doesn’t yet. Why would only artificial intelligence be subject to a standard of absolute perfection?
Radiologists and other domain experts should be “in the loop” from start to finish of development, deployment, and supervision of AI products.
The study cited in Dr. Cohen’s paper demonstrated the weaknesses of subject-matter and domain experts in the use of these products. While I agree in principle, I believe we are limited in our own expertise and value. Expert opinion, as we’ve proven, is fallible.
AI companies and their products should be transparent, and provide/disclose information relating to data use, validation and training, calibration, outcomes, and errors.
This would remove much of the financial incentive to develop these technologies and would, quite likely, eliminate all but the academic endeavors to do so.
AI products should be subject to peer review (ideally prior to entry into the market for clinical use) and guided by position/white papers by domain experts (e.g. ACVR/ECVDI) when available.
Again, I encourage anyone who believes this to write their legislators. This is far beyond the authority of any veterinary body to regulate. Further, should we invite such regulation (and we absolutely should not), it should apply to all medical devices and SaMD.
When medical errors occur, a root-cause analysis should be performed to identify to points at which decision making was faulty. Ideally this would be shared on a national database. Companies should be transparent when errors occur.
AI works differently from traditional software; applying the same means of analysis to probabilistic software that we would use to evaluate deterministic software is likely to be ineffective (a sketch below these comments illustrates the difference).
A national database on technology that evolves this fast would be meaningless.
Training already occurs on AI bots. A generative pre-trained transformer (GPT) is what powers ChatGPT, and it is constantly receiving feedback.
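To make that contrast concrete, here is a minimal Python sketch under my own illustrative assumptions (the 10.7 VHS threshold, the function names, and the 80% mock sensitivity are placeholders of mine, not any vendor’s figures): a deterministic rule can be regression-tested and a single failure traced to a specific line, while a probabilistic model can only be judged statistically across many cases.

```python
import random

# Deterministic software: the same input always yields the same output,
# so a single failing case can be traced back to a specific rule.
def flag_cardiomegaly(vhs: float) -> bool:
    return vhs > 10.7  # fixed, reproducible threshold (illustrative value)

assert flag_cardiomegaly(11.2)  # this regression test passes on every run

# Probabilistic software: the output is effectively a sample from a
# distribution, so any single answer, right or wrong, tells us little.
def mock_ai_read(sensitivity: float = 0.80) -> str:
    """Hypothetical stand-in for an AI classifier reading a lesion-positive study."""
    return "lesion" if random.random() < sensitivity else "normal"

# Meaningful evaluation here is statistical: accuracy over many cases,
# not a root-cause trace of one output.
reads = [mock_ai_read() == "lesion" for _ in range(10_000)]
print(f"Observed sensitivity over 10,000 reads: {sum(reads) / len(reads):.1%}")
```

The root-cause analysis the paper recommends makes sense for the first kind of system; for the second, the unit of analysis has to be the distribution of outputs, not an individual read.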
Until further progress is made, the profession should strive to have radiologists involved in final imaging diagnosis in conjunction with AI, rather than by AI alone.
Respectfully, I am a veterinarian and have read thousands of radiographs without the assistance of artificial intelligence or a radiologist’s support in the past decade. I have been grateful for the opportunity to consult with specialists when requested and I value teleradiology services. But very experienced radiologists are, with the help of “good AI,” correct little more than 80% of the time. In my opinion, that falls short of absolutely requiring their advice on every case.
Conflict of Interest
Radiology is perhaps the field most threatened by, and most poised for disruption from, the advancement and widespread use of artificial intelligence. This paper was written by two members of the ACVR who claim no conflicts of interest with the use of artificial intelligence. However, one author owns and operates a teleradiology company and the other has a variety of business interests that may be threatened by, or benefit from, AI. The authors’ declaration of “no conflict of interest” is not entirely truthful.
Conclusion
I have a decidedly unfair advantage in my criticisms, namely that artificial intelligence has advanced mightily in the six and twelve months since the publication of the lecture and the paper, respectively. However, as these works are still accessible and have not been retracted, I believe I am justified in offering critique. The world has advanced, and the works persist in their original form.
My perspective is different. I’m a clinician, so I experience my patients in living color, rather than the black-and-white of a radiologist’s screen. And my particular line of work is markedly less threatened by the advancements in artificial intelligence than those common to Diplomates of the American College of Veterinary Radiology.
The lack of recent citations regarding the advances of artificial intelligence and its use in medicine, despite an abundance in the scientific literature, could indicate to the cynical mind that the authors sought to demean the potential of AI or to inspire fear of its use. Less than a quarter of the scientific references listed in this 2022 paper were published after 2018. In a field moving as fast as artificial intelligence, we must endeavor to keep up with the advancements.
In defense of Dr. Cohen’s concerns, I feel he should have included specific examples of AI-powered teleradiology providers that make wild assertions about the power and value of their products. I did something similar in an article about veterinary telemedicine. Direct citations of false, misleading, or disingenuous claims that threaten the quality of care provided to patients would’ve been compelling, much more so than references to science fiction movies. These companies’ promotion of a world without oversight or responsibility in medical practice is dangerous. (For example, one AI teleradiology company advertises “99% Accuracy” in its sponsored Google advertisement but makes it quite difficult to find a white paper or peer-reviewed publication on its website supporting that claim.)
I believe that my critique boils down to this: to do no harm is a foolish, simplistic mandate. There is always risk. A patient could be allergic to an antibiotic, could suffer an anesthetic complication, could slip on the floor of the hospital and injure themselves, or could be so startled by my knock at the exam room door as to have a cardiac event. Should I not use medications, have a hospital, or knock on doors? Or should I accept that the world is a wicked one and do my best with the tools and information available? We live in a world where things can go wrong, and to insist that nothing go wrong when creating something new is an unattainable standard.
The authors seem to have intentionally avoided saying anything good about the power and potential of artificial intelligence, and weighing benefit against risk is crucial in the evaluation of ethics, especially medical ethics. Risk, sacrifice, failure, and catastrophe are unfortunate parts of scientific advancement, and failing to acknowledge the many existing applications of artificial intelligence that benefit us and our patients is a failure on the part of the authors and the peer-reviewed publication. Omitting all the good that can be done with this technology appears a deliberate and disappointing choice when discussing perhaps the most potent aid to medical practice since the internet.
The lack of noted conflicts is something I also find troubling. The authors’ professional business interests are or were in competition with uses and users of existing and developing technology. I do not believe that a conflict of interest can be denied with integrity.
I believe that the authors are experts in radiology, but it is clear that they are not experts in artificial intelligence. However, so far as it is possible, I am. I don’t perceive AI to be the dangerous existential threat to practice and patients that it’s made out to be in this paper and subsequent lecture. If it were, well, I’ll trade the authors’ Potter Stewart quote for my own, “I know it when I see it, and [this] is not that.”
Author’s Note: I reached out to Dr. Cohen and Dr. Gordon prior to publication. Dr. Cohen did not respond, though it was late in the day when I reached out. Dr. Gordon did respond and was utterly gracious and professional in his reply.
The possibility of AI influencing clinical specialists is an interesting one. I previously wrote about a different (albeit older, 2021, which is ancient in AI timelines) study that found essentially the *opposite*: that non-experts (ER and IM docs) were more susceptible to incorrect advice when told it was AI-based, while radiologists were highly resistant and able to see through the errors (so-called algorithm aversion).
https://open.substack.com/pub/allscience/p/5-minute-paper-do-as-ai-say?r=1nnpgl&utm_medium=ios&utm_campaign=post
This has important implications because many of the radiology AI start-ups are actually NOT marketed for use within radiologist workflows, but rather to cut them out and directly provide advice to general practitioners. In my view, that does put a much higher burden of proof in terms of accuracy and guardrails.
I have much less of an issue with assistive functions like the already on-market IDEXX radiology AI (disclaimer: my former employer, though I was not involved in those products in any way) that automate measurement of a VHS score or TPLO measurements. This saves time and improves accuracy, can be accessed by either the radiologist or the GP, and, critically, can be overridden by humans if it appears that flawed landmark assessment created an incorrect measurement.
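As a thumbnail of what that kind of assistive measurement looks like, here is a minimal sketch with entirely hypothetical landmark coordinates and a deliberately simplified VHS formula (real tools count vertebral bodies caudally from T4 rather than dividing by a single vertebral length, and this is in no way IDEXX’s actual implementation):

```python
from math import dist

def vertebral_heart_score(long_axis, short_axis, t4_body):
    """Simplified vertebral heart score: cardiac long and short axes summed,
    expressed in units of T4 vertebral-body length. Each argument is a pair
    of (x, y) pixel coordinates, e.g. as placed by an AI landmark detector."""
    return (dist(*long_axis) + dist(*short_axis)) / dist(*t4_body)

# Hypothetical landmarks as an AI model might place them (pixel coordinates).
ai_landmarks = {
    "long_axis": ((120, 200), (340, 410)),
    "short_axis": ((180, 350), (330, 240)),
    "t4_body": ((400, 90), (400, 139)),
}
print(f"AI-suggested VHS: {vertebral_heart_score(**ai_landmarks):.1f}")

# The human-override step: a clinician who disagrees with a landmark simply
# replaces it, and the measurement is recomputed from the corrected points.
ai_landmarks["t4_body"] = ((400, 90), (400, 142))  # corrected vertebral length
print(f"Clinician-corrected VHS: {vertebral_heart_score(**ai_landmarks):.1f}")
```

The design point is in the last two lines: the software proposes a number, and the human retains the final say over the landmarks that produced it.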