If AI Can Detect Race, What About Bias?

This transcript has been edited for clarity.

Eric J. Topol, MD: Hello. I’m Eric Topol for Medicine and the Machine on Medscape. I’m with my co-host Abraham Verghese. We have a terrific podcast today with Dr Judy Gichoya from Emory University. Dr Gichoya is a radiologist and data scientist.

She has a remarkable background. She received her medical degree at Moi University in Kenya, went on to earn a master’s in science at Indiana University-Purdue University in Indianapolis, trained in diagnostic radiology at Indiana University, studied interventional radiology at Oregon Health & Science University, and has now joined the faculty at Emory University in Atlanta.

She was named the most influential radiology researcher of 2021 by AuntMinnie.com, the premier radiology website.

Today, we want to talk about what I consider one of the most important papers on artificial intelligence (AI). The paper was published in May 2022 in Lancet Digital Health by Judy and her colleagues. Welcome, Judy.

Judy Gichoya, MD: Thank you for the invitation.

Topol: This research looking at the ability to detect race with medical imaging is extraordinary. It builds on a few papers that showed how retinal imaging can detect sex accurately. In fact, human retinal experts can only pick up sex from a photo 50% of the time, whereas deep learning has 97% accuracy. This wasn’t anticipated. Can you give us the background of your study and your impressions?

Gichoya: Eric, I would like to have a big, majestic story telling you about how I was inspired to work on this. But really, it was an accident. You may not appreciate this because you’re a prolific writer, but the paper was rejected initially.

Two years ago, if you were looking at the special issues that were coming out, they were mainly focused on social justice in addition to COVID and all the issues that were affecting us with systemic racism.

The Journal of the American College of Radiology had a special issue to talk about bias in medical imaging. And I thought, this is a good time. I had participated in a data conference the previous year with some students from Singapore. And I realized that the chest x-ray dataset for the MIMIC database was underutilized.

I said, why don’t we look at this problem with this public MIMIC dataset? I found some of the earlier work that had been done by a team from Toronto who are now collaborators and friends. They had shown that we have very high rates of underdiagnosis when you look at the 14 chest x-ray labels in the MIMIC dataset.

When I found out that work had been done with that dataset, I said, OK, why don’t we look at the Emory dataset, which has an equal population of 50% Black persons and White persons?

I wrote to the Toronto authors and said, let’s repeat your study with Emory data. I was already seeing what their conclusion would be — that if you publish more diverse datasets, they will show bias.

Unfortunately, when we ran the preliminary results, we saw that the amplitude went down. So if you’re looking at the false-positive rate, with whatever AI Fairness 360 metric you chose, it went down but it wasn’t eliminated. That was concerning for us. In the process, we started having discussions about what could be going on. We had already given the model diverse datasets, but bias wasn’t completely eliminated.

One of our collaborators, Po-Chih Kuo from Taiwan, came back and said, these models are learning the patient’s race as part of their prediction. And we shamed him and said, of course not, you made a mistake. Go back. We even took the code and ran it again, and persistently we found this.

So now the project changed. This work had already been rejected for publication as an abstract submission. But now we were excited. Was this real? What could be causing it? We thought maybe there were confounders. Maybe in these models, all the Black patients are sick, they have cardiomegaly, and this is what the model is learning.

I don’t want to give away the ending, but we haven’t found the reason why, although this performance represents superhuman ability. It was a happy accident, and an amazing group of collaborators that led us to ask and answer this question.

Abraham Verghese, MD: Dr Gichoya, welcome to the program. I must confess that I’m not a math or computer person, which is probably why I’m in medicine. This may be a naive question, but in a way, it seems to me a blessing in disguise in the sense that if AI detects our race, the consequences aren’t as apparent as with, for example, housing loans or mortgage refinancing, where clearly the providers’ bias on the previous datasets is creeping in. Is that the same problem we might anticipate?

Gichoya: This is a great question. Why does this matter? But first, to answer the question you didn’t ask, I’ll say that one thing I learned from this whole project is the importance of communicating new science. This work straddles the clinicians and computer scientists. The computer scientists and mathematicians, they want a simple problem. They want to say, this case is biased, and I’m going to work on my math and fix it.

Why does race matter? Now, we understand even without algorithms that race matters in terms of pain outcomes, maternal mortality, and so on. But this ability for the AI models to see your self-reported race in medical imaging, at least in our research, is important for two reasons.

One, because we cannot really understand why. That may be OK. As a radiologist, I don’t really understand how magnetic resonance imaging works, but I do understand when it fails. So maybe the answer is to move away from trying to understand why to understanding when it matters.

Second, when we change the medical images so that I can show you just a gray image and tell you this was a chest x-ray, the AI models still provide a surprisingly good performance, better than humans.

By contrast, if you show me five images on which a skin prediction algorithm doesn’t work, I can say, well, these belong to dark-skinned patients. In radiology, you really cannot tell, because when we radiologists tried to do this task, we performed at chance, 50%-55%.

On the other hand, we see two algorithms that work well. One is the Mirai algorithm from Harvard that says, just give me the mammogram image, and I will tell you the 3- and 5-year breast cancer risk for this patient. I don’t need to look at any clinical information. And you start to see these models perform way better than even the Tyrer-Cuzick model or the Gail model for Black patients. So it’s exciting.

The second is the osteoarthritis prediction algorithm from Ziad Obermeyer, which looks at the knee image and grades and correlates it to the pain score. All this is to say, this is a mess. We don’t understand when it hurts and when it helps. But that’s where the science needs to go.

It’s still early to be able to figure out why this matters. What I can say is that the approach of the computer scientists and mathematicians is too simplistic to harness the power of AI, at least for image-based models moving forward.

Topol: You mentioned that you created saliency maps and tried to deconstruct the model to see if you could find features that would help you understand how it’s detecting race. But you couldn’t find anything. Is that right?

Gichoya: Right. We couldn’t find anything.

Topol: Across many different types of imaging — chest x-rays, computed tomographic scans, mammograms, and more. Did you have any confirmation with Asian ancestry?

Gichoya: The Emory datasets do not have a big population of Asians, but the Stanford datasets do. We are in a partnership of eight federated learning centers where we are doing this work again. One of those centers is in India, and another is in Taiwan.

In our preliminary inference, what was surprising for our model is that I could send it to you and you don’t need to fine-tune it. You don’t need to train it. You just run the inference, and you have 94%-95% accuracy. That was quite surprising, because typically, when you bring a model to new data, the performance drops.

Someone tested it on a Japanese cohort and the performance was terrible; it was around 20%. That’s the only time. We never had enough data, which is why we went to this federated learning model to figure out why. When it was tested on the Taiwanese population, it was good. What we haven’t been able to do is look at race in Black or African-American persons.

But it’s also heterogeneous. For example, I’m from the African continent. Would we see a difference or a drop in performance? People suggested that we look at the performance of prior failures of pulse oximeters, for example, because it could be an equipment and calibration mechanism. Maybe that’s what’s causing this phenomenon. But we really don’t know.

Verghese: In Eric’s last book, Deep Medicine, AI was featured heavily. Eric’s conclusion was that, in a way, AI plus humans will be a formula that’s going to make us much more astute clinicians than AI or humans alone. But I’m not yet seeing the full application of that partnership of humans with AI. We seem to see these pure AI papers. They’re handed over to people like me who are much more patient-centered. Where do these two streams come together to make us better at doing what we do?

Gichoya: There is a technical answer to that question, and then there’s a sort of reality answer to that question. Most of the acceleration in terms of AI has been driven by funding from venture capitalists. I believe some of the companies start off with domain experts and then drop them off, trying to accelerate to fail.

Initially, I did not believe we would see this autonomous AI that works without humans in the loop. But in Europe, an AI tool developed by Oxipit was approved this year, and it reads chest x-rays without a radiologist. This tells me that this is the best time to be doing this type of work and research.

We need to understand what a human-machine partnership would look like. So let’s look through the radiology workflow and find an AI algorithm that could be used to prioritize which studies I should dictate fast and which ones I shouldn’t. As the referring provider, you may say, I don’t care, I’m an emergency medicine doctor, I just want the fastest read. In that case, for the emergency medicine doctor, speed is the utmost value.

But if you’re a cancer expert, then the detail is important — you may even want one specific radiologist to interpret your studies because you sit on tumor boards with that radiologist and discuss these cases, and there’s this trust. We know this happens.

I may have trainees with different skill levels. And I may want to read the easiest study so that they have a chance to see a rare case or a more complex study. All those human values. I know there’s the value system design. But nothing has been done in terms of designing the values that meet the needs of the radiologists, the patients, and their referring providers. This is an area where it will be interesting to see what comes up.

If you have an autonomous AI, it generates a report. We are in an era where all patients should get their reports within about 24 hours. When they have a question, who are they going to call? If I disagree, what am I going to tell my patient?

We know all these impacts are coming that we haven’t even started to think about — maybe we are thinking about them — but I don’t believe we have tried to think about the scale or the burden that will come with this human-machine partnership and what a successful partnership would look like.

Topol: That’s such a critical point about implementation and how we’re not really prepared in so many ways as this goes forward. It’s extraordinary what machines can be trained to see accurately. For example, going back to the retina, how it can pick up the coronary artery calcification score from the retinal vessels and predict heart risk, and many other things, like kidney disease, hepatobiliary disease, Alzheimer’s disease, blood pressure control, and glucose control.

You need to have an unlimited imagination about what you may be able to pick up. One of the striking things about your recent study is that it was almost unimaginable that machines could have eyes like this.

Your background is striking. You’re one of the rare physician scientists who is jointly trained in radiology (interventional radiology, no less) and as a data scientist in AI. There aren’t many of you in the world. You talk about machine eyes, but your human eyes have transdisciplinary expertise, which puts you in an unusual category. How many physicians are there with backgrounds like yours?

Gichoya: The number is increasing, but it’s still not many people. It’s a small circle, so you get to know everyone. But we’ve seen from medical schools quite an appetite for people with computer science backgrounds.

Unfortunately, medical school can kill all those other interests. Or when you have other interests, people assume that you are not a good physician. But I may be biased because I tend to work with these people, and we’re starting to nurture some people who are coming up in this field.

The second thing is that maybe the needs are shifting. Perhaps we don’t need people who can program. One thing I love to do is look at what Google AI is coming up with, in their blogs or when they’re presenting. Think about Amazon’s or Microsoft’s internal process and how long it takes for an internet protocol to come through. If you see what they’re working on and when they publish it, for me it’s like a proxy to tell me where this field is going. So thinking that you’re going to come up with a new metric is false.

This morning, I saw that Google AI has now revealed how they’re going to use chest x-ray embedding and use just 500 images to train a COVID model. That’s crazy if you think about what that means in our field. Or recently when Amazon said, hey, why don’t you come work with us? Our expertise is to figure out how to inject adverts at runtime.

So I do not decide that this movie should have this advert before it runs. Instead, I see Eric browsing the internet for socks. So when he’s watching this movie, I can inject a socks advert. These are unimaginable things.

These physician scientists are not supported in their academic institutions, because I may generate more money for the university when I work as an interventional radiologist than an informatician. That means you do need to find a home that supports the informatician to increase the number of these people.

Also, there need to be new skills. It may be that your strong validators bring big domain expertise. But you can speak to the computer scientists instead of dying trying to learn the math and program. Because validation and being able to pick up these ideas and quickly test them is the most critical piece. As we start to think about the ethical implications of these human-machine collaborations, we’ll need different minds from those we have right now in the workforce.

Verghese: Dr Gichoya, I’m fascinated by your story. You and I come from the same continent. I was born in Addis Ababa, not far from you in Nairobi, and began my medical school there before the revolution, when my studies were interrupted.

I’m intrigued by the journey you’ve taken to come to America, and I’m also reminded about the richness that international medical graduates bring. Not only are they necessary in terms of man- and womanpower, but they also bring a richness that people don’t often appreciate. Your life story certainly is a nice illustration. Talk about your origins and your journey to get here.

Gichoya: I was born east of Nairobi. I’m the first physician in our family. I was growing up when computers were new. In fact, I bought my first computer when my home did not have any electricity. This was in high school. I had to keep it at my uncle’s place and incentivize the usage by playing music and movies from there.

I bet that you also have this same feeling of opportunity and gratitude of the chances afforded to you. When I learned how to program, which was in Pascal, I had to copy all my answers from a floppy disk drive and bring them to my computer, and then come up with new questions, and then go back to the city and pay for internet, because there wasn’t widely available internet and we didn’t have cell phones.

The Kenyan way is that if you do well in high school, then you most likely will go to medical school. I was drawn to Moi University in western Kenya, because they used a problem-based learning that allowed a lot of curiosity. And so, at the last minute, I agreed to attend that school and I ended up enjoying my time there.

Out of laziness, I started to connect colleagues’ computers so that we could exchange notes and movies, and any materials that we wanted — videos we had recorded, pictures. Consequently, I ended up enjoying computers.

When I was in medical school, we were grappling with the HIV pandemic. There was a lot of emphasis on electronic medical records (EMRs) and trying to figure out where the patients were dying. I got involved with this and then pivoted to health informatics. Later, I would come to the United States to specialize, but I’d done quite a lot of work by then.

It’s been great. I’ve enjoyed working at this intersection of medicine and technology. I’ve had fantastic mentors and friends, and my cup keeps overflowing. I’m trying to pay it forward.

Topol: You’ve also worked at the National Institutes of Health (NIH). Tell us about your experience with the National Institute of Biomedical Imaging and Bioengineering (NIBIB).

Gichoya: About 2 years ago, the NIH was trying to build capacity and rejuvenate this focus on AI and data science. It was clear to me that the NIH was behind on this area. Most of the investments into this have been made by the National Science Foundation.

One of the ways to accomplish this rejuvenation was to bring experts as data scholars to the NIH to work and learn. This is also a way to accelerate expertise, to learn about the NIH, bring new voices, and allow bidirectional learning. So I had this opportunity, sponsored by NIBIB.

In fact, I don’t work for NIBIB; I work for the Fogarty International Center, which is investing $75 million in Africa to harness data science for health. That’s been amazing. It’s also the right time. This was just funded; we are in our first year. It’s sort of like the amazingness of reverse innovation, if you think about it.

For example, how do you conduct multidrug resistance surveillance in a big continent? How do you think about genomics? How do you think about climate change? How do you think about maternal and child health? There are seven funded hubs. I support the coordinating center and the open data science platform, trying to figure out how to harness data science for health, building a community of data scientists, and now working to expand partnerships and writing about the current data science status and what it will be in 10 years in terms of priorities. That’s going to be published in Nature.

It has been an amazing experience for me. You cannot imagine how the NIH works. Also, I’m more comforted when I don’t get grants. I don’t take it personally now. I see it’s a tough world out there.

Verghese: Where is all this heading? You’ve hinted at that, but in terms of diagnostic radiology or radiology in general and AI, how will it all unfold?

You would think things like echocardiography and ultrasound would make us better at the bedside, better diagnosticians. But I may be the last noncardiologist who can pick up mitral stenosis cold. Even some cardiologists would struggle with it. You might say, who needs that? But I wonder, how is all this AI going to make us better physicians?

Gichoya: People are realizing that doing this work is extremely difficult. So we’re starting to see consolidation. I believe more of the investments will go to platforms. If you think about when EMRs were first being introduced into the healthcare space, you saw a lot of people trying to lock you into their platform.

We see this a lot in the marketplace. Everyone wants you to install their platform — Judy’s platform — so that I can then distribute all the AI models. If you’ve tried to work through such a program, you know how difficult it is, so you’re never going to change the platform. We see that type of market consolidation coming.

Another area where AI has potential — but we need to understand what it means — is similar to the work we did for this reading race paper, seeing these hidden signals that radiologists may not really appreciate.

There’s new research that shows that you can use the same models to tell you what the patient’s healthcare costs will be. That’s more concerning in my opinion; I think this research shows that, since we don’t have enough audit tools, there’s potential for confounders.

But if you can start to tell what the healthcare costs will be from imaging alone, I think this implies hidden signals. We’ve done some work that shows that you can use the chest radiograph to complete the patient’s problem list. It’s going to tell you that this patient has cardiomegaly or congestive heart failure. And when you audit the charts, you find a missing code.

I believe these population health types of projects will have a bigger impact because they provide opportunistic screening. They provide triage for ambulatory surgeries, as we start to see the work on body composition coming in, telling you the frailty of a given patient. It is indirectly helping you make a diagnosis.

Another area, apart from population health and opportunistic screening, is triage. Most radiologists are not in academia; they are in private practice, so they must read more studies. I would bet my money that we’re going to see adoption of these AI technologies in these markets where you are reading more studies. I don’t know how enjoyable the job will be if you only read complex studies because all the normal studies have been read by the AI algorithm. Now your day is just full of difficult studies.

But I believe that the biggest threat for radiology is not even AI; it is market consolidation, these buyouts, and venture capital money injections. The venture capitalists will have a very low threshold for improving productivity and output.

So, when we have the AI tools that can do that — and we’re starting to see some of these companies also buy the AI companies that are building software — it will be market forces more than the immediate needs of the patient that we will address, because of what’s happening in the bigger space.

Topol: I think the teleradiology businesses in India and many other places are going to be machine radiology businesses. Before we wrap up, I do want to get your sense about bias and AI.

I think the public and the medical community tend to believe there’s something intrinsically wrong with AI and that it’s biased, whereas in many of the studies, and even some that you touched on, the bias wasn’t about the algorithm but rather the data that were input; they were terribly biased, and oftentimes missed.

Where is the culprit? We’re never going to eradicate bias, but how can we improve this situation and the predicament we face?

Gichoya: Everyone is putting a lot of effort into this — a lot of NIH investments to bring more datasets. Even I have had to learn a lot more about this. People think that just because you are included, that’s enough. When I get my arms around this, I hope to be able to say that representation is not enough. Just because you include a person does not mean that you eliminate the bias either in the data or in the algorithm.

Bias is being found everywhere — in glomerular filtration rates, O2 saturations. Research by Dr Celi’s group has found racial differences in pulse oximetry readings that lead to lower levels of oxygen supplementation in Asian, Black, and Hispanic patients.

There are also these things that are not really behaviors, but patterns that the models can assess. For example, as an interventional radiologist, I tend to do more embolizations for gastrointestinal bleeds at night. We know that if you are short-staffed on the night shift, people will say, call the interventional radiologist more than during the day when they are sufficiently staffed to do endoscopy. So we’re starting to see these patterns. What I worry about is that maybe people will be turned off from looking at these problems.

There are two things I think we should do. One is to not shame, and instead encourage uncovering these biases. The second is to figure out the consequences and discuss the implications. Because when you publish a paper that says that oxygen supplementation is different among races, the goal at minimum is that unconsciously, you’re thinking about this.

I don’t know how to disseminate that. But Dr Verghese is an expert. Maybe he can make that one of his next keynote presentations. How do we communicate this in a nonthreatening way to make sure that people are not shying away from trying to figure out some of the patterns? It’s an unintended consequence, but the datasets show what’s going on.

My final comment about this is that we need to offer incentives for people to work on data. The funders don’t fund this. We’ve seen some really good work, with support from the Moore Foundation and the Lacuna Fund. But it’s not the mainstream bodies that understand the importance of just working on good dataset curation. This has been humbling for our group for the past 3 years. It’s a thankless job.

Verghese: It’s such a pleasure to get to talk to someone like you. Whatever you’re doing, please keep doing it. Because you clearly are breaking new ground. I’m delighted to have had this chance to chat with you.

Gichoya: Thank you.

Topol: Speaking of algorithmic AI models, you’re a model for the future of medicine. We’re going to be following your career with great interest. In so many ways, you’re advancing the field, and you’re still young; you’re just getting started.

We hope to see more physicians pursue joint disciplines like you, because you have insights that do not happen when you’re siloed. Your work, and not just what we predominantly discussed today, is already a great contribution, and we know you’re going to keep building on that.

Thank you for joining us today.


