Real Gen AI Use Cases in Healthcare | Matthew Woo, Co-Founder Summer Health
Show Notes
Matthew Woo, Co-Founder and Head of Product at Summer Health, joins the thinksquad, aka Danielle and Nikhil, to unpack how Summer Health builds an AI-first company. He breaks down how they think through operationalizing AI, how they instill an AI-first culture throughout their team, and how you can start applying it now.
This episode is sponsored by Out of Pocket, because no one is prouder of us than us.
You should also check out our courses, including ones taught by yours truly (How to Build A Healthcare Call Center and Healthcare 101).
Referenced websites:
OpenAI Customer Story:
https://openai.com/customer-stories/summer-health
---
Hosts:
Nikhil Krishnan (twitter)
Danielle Poreh (LinkedIn)
Guest:
Matthew Woo (LinkedIn)
---
Timestamps
(00:00) Intro to Matthew
(02:10) About Summer Health
(03:04) Achieving SLA for SMS Response Time
(04:28) Evolution of SLA and Scaling
(05:55) Improving SLA through Staffing and Routing
(08:42) Managing On-Call Operations
(09:36) AI Integration in Healthcare
(16:19) Adopting GPT Models in Summer Health
(19:44) Process of Obtaining a BAA with OpenAI
(21:29) Availability and Cost of BAAs
(23:49) Exploring Other Large Language Models
(26:34) Designing Effective Prompts for AI
(29:53) Embedding AI in Company Operations
(32:43) AI Tools and Workflow Optimization
(36:51) Expanding to Multimodal AI
(43:02) Scoring Empathy in Conversations
(44:44) Choosing the Right Problems for AI
(46:09) Bringing AI into an Organization
(47:07) Starting Small and Proving Use Cases
(48:05) Implementing AI into Clinical Workflow
(49:03) Monitoring and Reviewing AI Output
(50:01) Doctors' Excitement and Adoption of AI
(51:27) Takeaways
Podcast Transcript
[00:00:00] Nikhil: So I feel like 2023 was the year that people only talked about two things in healthcare: GLP-1s, or GPT-3.5 or 4, depending on how much money we're willing to shell out.
[00:00:17] Danielle: You're the only one who's talking about GLP-1s this much and, like, I can't tell if it's because you're on this weight loss journey that no one knows about.
[00:00:24] Nikhil: This is why we're doing a podcast, no one can see that. But I do feel like everyone was talking about large language models. How do we bring them to our orgs? Like, what does an AI-first organization look like? It's all these hand-wavy buzzwords, but it was cool to talk to Matthew, who, you know, at Summer Health is trying to make this practically an AI-first organization, and some of the things they've been trying to do have worked really well.
[00:00:44] Danielle: I feel like Matthew could grow, like, a Dumbledore-sized beard and talk about AI all day and I would just, like, sit there and listen, like, as a pupil. He's just, like, so wise.
[00:00:54] Nikhil: Only you talk about people as the Dumbledore of things.
[00:00:58] Danielle: That's true, that's the second time I've done that. [00:01:00] Exactly. No, but for real, he was, like, spitting all this wisdom and just, like, simple reframes. Like, he was like, look at GPT as your intern. And I was like, oh shit, I would do that so differently now. So I think he just, like, reframed so much of the hand-wavy stuff into actually useful ways to use AI.
[00:01:16] Nikhil: Totally. I think there's a lot of really practical stuff people can take away from this episode, especially if they're trying to prototype a lot of new products. Hopefully you take a listen and some of this applies to things you're trying to do within your own org. So take a listen, let us know what you think, and we're excited to do more of these.
[00:01:35] Danielle: Welcome, Matthew. Stoked to have you on here. Are you happy to be here? Are you ready to have fun?
[00:01:39] Matthew: Yeah, I'm ready. I'm ready to start my creator journey. My influencer journey. I'm joking. I just want to talk about the nitty-gritty stuff. It should be fun with you guys.
[00:01:49] Nikhil: Step one, not doing podcast interviews from a phone booth. Just kidding.
[00:01:55] Danielle: Step two, get a microphone and headphones that work.
[00:01:59] Nikhil: Just [00:02:00] kidding, just kidding. We're not going to roast you the whole time, I promise.
[00:02:04] Danielle: Matthew, maybe you can start us off by sharing a bit about what Summer Health is and what y'all do.
[00:02:10] Matthew: Yeah, so Summer Health is an asynchronous messaging platform, and we focus on SMS as the primary channel. We really focus on helping parents turn primary care into everyday care, especially with, you know, the shortage of pediatricians. Parents can easily reach a doctor within, on average, like, two and a half minutes. And on top of that, we've built not only access to primary care, but also specialists: everything from nutrition, lactation, sleep, and, coming soon, behavioral. So that's, in a nutshell, what Summer Health is.
[00:02:46] Nikhil: Can you tell me a little bit about how you hit that SLA of, like, under X number of minutes for SMS? Like, is that pulled out of your ass kind of thing? Or is there, like, science behind it?
[00:02:56] Matthew: No. Well, I would say there are two parts to that question: one, how did we come up with an SLA that we thought was compelling, and then two, what the actual SLA is. Basically, when we started Summer Health and did user research, we heard from people that the most similar kind of analogy is calling a nurse hotline. Where, one, they would have to wait to get in touch with a nurse, and then two, they would have to wait to hear back from an actual doctor, and typically that process took about two to three hours. And we thought to ourselves, well, if you wanted to make this ten times better, what would that be? And we landed at about a 15-minute SLA. The second thing is, we track this religiously, and it's one of our goals that every single month we hit a 95 percent SLA; currently it's around 96.7 percent. And yeah, as of November, 2.87 minutes was the average wait time for someone to get in touch with a doctor. Obviously we can go into the specifics of how we make that happen, but that's kind of the story.
[00:04:00] Nikhil: That's a better response time than my parents and my friends. Which is impressive.
[00:04:07] Danielle: My mom answers that fast, but it's usually, like, one-word answers that are totally off base and not related to anything that I said before.
[00:04:14] Nikhil: SLA achieved if you just say, okay.
[00:04:17] Danielle: Yeah, can you give me a sense of your scale? So 2.87, or whatever that number is now, what was it like on day zero? And what was some of the evolution of that SLA? And on top of that, if you could share a bit more on just, like, how many texts we're talking about here, in general volume.
[00:04:37] Matthew: Yeah. So, you know, we've been around for about a year and a half and we've done over 10,000 visits. So a pretty sizable sample size of our ability to deliver on the SLA. When we first started out, our SLA was about 85 percent in terms of 15 minutes, and actually the funny [00:05:00] story is that we went into a board meeting with our investors and they're like, if you promise a 15-minute SLA, it better be over 90 percent. And then we spent the next three months just optimizing our operations to really make sure that we could hit that. And kudos to the investor, it was Alfred Lin at Sequoia, kudos to him. Once we really focused on that metric, we really saw retention and engagement improve, and people started to use this in a different way. It's similar to how, when people first started using Uber, it was really just a taxi service or maybe a ride to the airport, but because it was so easy to use, people started to use it for grocery runs and, like, nights out on the town. So we kind of saw a very similar behavior as we started to really improve the speed at which we were able to deliver care.
[00:05:47] Nikhil: Can you talk a little bit about, like, what the hardest things were to go from the 85 percent SLA to where you are now? Like, what things had to change? Is it, like, triaging requests and inbound? Is it [00:06:00] figuring out where you introduce tech? Is it staffing? Like, what are the main levers you used to basically reduce the SLA time?
[00:06:06] Matthew: I would say it's two things. One was the staffing, and then two was just the systems we built around routing. And the last thing I'll say, maybe to start off, is we approached it from a very naive perspective. Maybe it's because no one on the team aside from our CEO, Ellen, had worked in healthcare before, but we didn't really think, like, oh, let's supplement this with nurse practitioners or PAs. We really wanted to keep to the promise of getting people to doctors immediately, regardless of how big or small the question was. So the way we approached it was, one, from a staffing perspective: how do we make sure that there is a provider on at any given time? What we ended up doing was we started to really focus on providers that had multiple licenses, and really started to [00:07:00] recruit heavily on that. I think at this point, we've probably cornered the market for pediatricians that have 50-state licenses. I think we probably have 80 percent of the pediatricians that have that type of coverage, and that therefore allows us to basically have someone on at any given time. So that's more of a staffing and scheduling solution there.
The second thing that we did was, again, I'm not sure how other people do it, but we built our entire routing system from the ground up. It started off with a very simple round robin. But what we would do is, every time we tried to route to a particular doctor, if they didn't respond within a given window, we would route to the next doctor that was available. If no doctor was available at the 10-minute mark, then we would actually page on-call operations. At the beginning, that was me and Tim, my head of ops, and from there we would start calling our doctors, like, literally just [00:08:00] pulling up the phone, calling them, and being like, hey, can you get online and take this? We weren't relying on some type of app to just set a push notification. We actually set it up very similar to how engineers have on-call: we basically have PagerDuty for doctors, where if they didn't respond after a given time, we would first send them an SMS. If they didn't respond to the SMS, we would then basically call them using PagerDuty, obviously our own version of PagerDuty internally. So we approached it from a very traditional engineering, software type of approach when it comes to on-call.
[00:08:35] Nikhil: Are you getting calls in, like, the middle of the night? Like, are you getting these escalations and you're like, oh, shit, it's 2 a.m., I gotta call someone back now?
[00:08:42] Matthew: Definitely, definitely in the beginning, you know, the first couple of months. It wasn't great for my relationship, but now I would say it's probably closer to maybe once a month, and we now do it on rotation between three different people on ops.
[00:08:56] Danielle: Wait, isn't your partner a doctor?
[00:08:57] Matthew: My partner is a doctor. [00:09:00] That being said, she is a dermatologist, so she doesn't quite get the on-call situation like I do. But I run it!
[00:09:07] Danielle: You're in bed and you're just like, can you take this one?
[00:09:08] Nikhil: The true ops hack no one wants to talk about: marry a doctor.
[00:09:16] Matthew: The other thing I'll say is, definitely, if you're planning to do this on-call, have an Apple Watch, or something that's gonna buzz you instead of using an actual phone alarm. It would definitely help with, you know, the conflicts with your partner.
[00:09:30] Danielle: So do you send all your doctors an Apple Watch now?
[00:09:32] Matthew: Uh, no, no. Well, maybe that's something we should think about, but no, it's mainly for the people on the ops team.
[00:09:41] Danielle: So I'm going to summarize that back, because I think there was so much value in there before we move on. The first thing I heard you say was staffing: you hired folks with built-in redundancy, because they could basically take a patient from anywhere, which means you wouldn't have to delegate that task out to folks that could only service [00:10:00] one particular geography. That's a super, super smart way to think about building in redundancy. And then the second thing was the routing system. You were incredibly tight around SLA and escalations, in really, really short intervals, to make sure somebody was aware of it and that an action was taken. Are those sort of, like, the two big pieces of it?
[00:10:22] Matthew: Yeah, I think those are the two big pieces of it. And obviously there's data analysis that goes into when you're experiencing the most SLA misses, and making sure that you have redundancies there. But, you know, that's the initial way we approached it.
[00:10:35] Danielle: What are some of those edge cases, or times of high volume, that break your systems?
[00:10:40] Matthew: Yeah. So what we found was that the times that broke our system in the early days were between 7 p.m. and about 12 a.m. That makes a lot of sense, because parents are just getting off work and, you know, the pediatric office is often closed, or even the urgent care, frankly. And then also in the morning: when they wake up and their kid's suddenly sick and they can't bring the kid to [00:11:00] school, they're now reaching out to us. So 8 a.m. to 11 a.m. was also a really busy time for us as well. Kind of like that sweet spot.
[00:11:09] Nikhil: I'm curious, just to follow up on one of the things you said before about how behavior change happened as you shortened the SLAs over time. Can you give us some examples? Like, here's an emergent behavior that came out that we didn't really expect, or, as we shortened things down, we saw these new types of interactions that we had to basically deal with.
[00:11:29] Matthew: Yeah, so just to share, you know, we started off with a very simple hypothesis, which is: if you were the fastest way to get care, that's how we would win the market. And while that was true for urgent care, as you mentioned, we started to see other types of behaviors. So instead of people just coming in and sharing, hey, my kid had a fever, X, Y, and Z symptoms, should I bring them into the urgent care, we started to get [00:12:00] parents, especially in the middle of the day, just having some type of question around, hey, I'm trying to get my kid to sleep longer, how should I think about it? Or, hey, I'm planning to transition them from breastfeeding to formula, how should I think about that? What are some of the side effects? Or, hey, my kid was exposed to XYZ, is that something I should be concerned about? Or, hey, I think he might have this allergy, how do I go about testing that? And so I think that was where we started to see the need to bring in specialists to help answer some of those questions.
In triage, we actually had kind of an internal e-consult model, but we wanted to give that access to parents and really empower them. I think the second emergent behavior that was really interesting, and I think this is how we won some of our lifelong fans of Summer Health, is that sometimes when you have to go into urgent care or the emergency room, you're [00:13:00] really scared as a parent. The doctor's asking you a lot of questions, and there's a lot of anxiety you have of, like, hey, am I explaining the situation correctly? So because of our asynchronous SMS model, parents actually brought us into the emergency room. We would have four-to-five-hour sessions where it would be like, hey, the doctor said X, Y, Z, what should I ask? How should I explain what's going on? And then they would explain to our Summer pediatricians, hey, the doctor said this, should I be concerned or not? So I think it just enabled a very different interaction model that just doesn't exist in the market, really.
[00:13:39] Nikhil: Wow. Yeah, that's super interesting. I'm sure a lot of people listening to this have been in a situation where they're sitting in the ER waiting for the next appointment, or they're kind of not listening super closely because they're dealing with something relatively traumatic, and having someone who's almost a champion in the room with you, [00:14:00] I feel like it's the having-a-doctor-friend or having-a-doctor-parent thing, right? That's kind of the dream a little bit.
[00:14:06] Matthew: Exactly.
[00:14:08] Danielle: I think the overarching takeaway for me is, like, the quote, the best ability is availability, and you've just nailed that by being constantly available. Are there any kind of scheduling considerations? My ops brain is going off a little bit, because you're talking about four-hour sessions, not just what traditional async can feel like, which is, like, one-off things. So how do you manage longer interactions? And then from there, we can totally jump into AI and pick your brain on that.
[00:14:34] Matthew: Yeah. The interesting thing about SMS, and maybe just broadly asynchronous care, at least what we have found so far, and there's probably other factors we can get into, is that because it's not like a video call or an in-person visit, doctors don't have to be on [00:15:00] for the entire time. They can manage multiple conversations at the same time, so they don't really feel this pressure of, hey, I need to fit this session into 15 minutes because I have to hit a certain quota to make X amount of dollars. And I think the second thing is that messaging is so much easier than having to get ready for a video call. When you're on a video call, you often have to show a certain level of emotion and expression, and that actually takes a lot of energy and effort. But with SMS, it's pretty effortless. In fact, some of our doctors, probably not the best work-life balance, sometimes provide care while they're having dinner with their family. It's that easy. So when an interaction lasts four hours, we actually haven't heard from our doctors that they have a problem with that, and we don't pay them any more for that session than we do for something that maybe lasts, [00:16:00] on average, typically 15 to 20 minutes. So I think asynchronous is beneficial not just for the caregiver, but also for the doctor.
[00:16:04] Nikhil: I guess maybe switching gears now, we can talk a little bit about AI, since that's what the people want to hear. So you started, you said, a year and a half ago, right? So I'm guessing the large language model craze and hype had already taken hold at that point. First of all, did Summer Health kind of come out of the idea that, hey, AI might play a core role here? Or is it like, you started, and this came out, and you're like, oh, we should totally be doing this for a lot more stuff internally?
[00:16:36] Matthew: Yeah, I would say we were excited about the developments in the GPT space. I think around the time we started, GPT-3, maybe not even GPT-3, GPT-2.5 was kind of in the wild. We had a hypothesis that probably in the next five years, maybe ten years, this would happen, and that you would need a lot of [00:17:00] interactions and medical data to basically be able to train your models. So in some ways, we were always kind of building towards that vision, but we just didn't know how fast that vision would become actuality. And so when we saw the performance of GPT-4 on a bunch of the medical license exams, that's when we got really excited, because that vision that we thought was five years out, that had all these prerequisites, suddenly came to the fold. So it was always part of the plan, but we just didn't think the plan would happen so soon, and we thought we needed a lot of other things to be there for that to happen.
[00:17:41] Nikhil: So you see GPT-4. Like, what's the first thing where you're like, oh, we definitely need to use that here?
[00:17:45] Matthew: Basically, just to be really clear, what got us really excited was obviously, you know, the explosion of ChatGPT. So I don't want to say that I'm some type of oracle that knew this ahead of time; I didn't really understand the full application of it. But once ChatGPT came [00:18:00] about and you started to play around with it, you're like, wow, this is really amazing, what can we do with the API? And for about a month, there was nothing that we really could do. All I started to do was ask my CEO for permission to use her conversations on Summer Health to basically see what was possible. So I was trying to do things like: can I generate a medical note? Can I have it try to diagnose some of the symptoms? What's the difference between ChatGPT answering a parent's concerned question versus a real doctor? Really just trying to understand the use cases. I didn't know what would happen, but I was just really interested in that. Then probably about a month and a half later, because I think the ChatGPT craze started towards the tail end of November, so probably around late January, I heard some rumblings that OpenAI might offer a BAA for HIPAA. And so immediately I had a few friends who worked there as early engineers reach out: hey, can you put me in touch with whoever is reviewing these applications? And [00:19:00] that's when I got put in touch with their head of go-to-market, and a month later, probably towards the end of February, early March, we had signed a BAA with OpenAI. And so that's kind of how that transpired.
[00:19:14] Danielle: Wow. I'm, like, sitting low on that wait list to get on GPT-4 right now.
[00:19:20] Nikhil: Well, I feel like independent, you know, people are probably not high on their priority list. I don't think they're ever going to give it to me.
[00:19:27] Matthew: Sorry, you don't matter. I'll try to put in a good word for you, Danielle.
[00:19:32] Danielle: Yeah, hook it up, come on. I'm sure everyone that's listening to this is like, oh my god, how do I get a BAA with OpenAI? That sounds like the holy grail. What did that process look like, aside from texting those friends and engineers? And have you seen it done in a not-backdoor way since then?
[00:19:49] Matthew: Yeah, so the process looked like: one, make an intro; two, jump on a call. They really cared a lot about, obviously, user risk and safety. They didn't want people to be [00:20:00] creating these AI doctors that would basically replace the need to actually talk to medical professionals. So they really vetted my use case, making sure it's a low-risk use case where we're planning to use it first. That's partly why we first jumped into medical note summarization, because that's fairly low risk; we still keep a human in the loop. And so they really got into the details of how we were planning to implement this, and how do you plan to mitigate risk. And that's, I think, what helped push us over the finish line. So that's kind of where it started. We're actually going to be launching a case study on Summer Health with OpenAI next week, so, you know, stay tuned for that.
[00:20:37] Nikhil: Guardrails. So they put guardrails around the use cases that you're allowed to use it for. And basically, like, hey, if you do anything outside of this, no dice, we're going to cut you off?
[00:20:44] Matthew: Yeah, yeah. So basically, after the call with the head of GTM there, he sent me a document of the types of use cases they would allow, just to make sure that we're actually doing this, and then I had to write another email that reinforced what we were actually [00:21:00] planning to use it for, and that if that materially changed, I would reach out to them. And then it basically went into an application process, and I think the CTO at that time was reviewing every single application to make sure that it was a safe use case.
[00:21:13] Nikhil: Wow. That's crazy.
[00:21:15] Danielle: Since then, have there been other BAAs in place that you've seen from other digital health companies? Or has that slowed down, and demand's too high, and everybody's sitting on that wait list like me?
[00:21:26] Matthew: Yeah. So after March, let's see, maybe two months later, so that's now April or so, I was at a dinner and someone's like, oh, you know, what are you doing? I talked a little bit about the BAA with OpenAI. I think around that time you could still get it, but it would then cost you money. So we didn't pay anything for our BAA with OpenAI, because we were just so early. I think at that time they were starting to charge closer to $10,000 or so. I don't know the exact figure, but somewhere in that range. Now it sounds like, again, I haven't really been keeping in touch, I probably should, it sounds like it's really [00:22:00] difficult to get. But yeah, that's all I know as of right now.
[00:22:07] Nikhil: It's kind of an interesting question, too, that's open, which is: depending on how much they charge for this, there's still an 'is this cheaper than a human being doing the same task' kind of question, especially depending on whether you can outsource it overseas and all this kind of stuff, right? Like, when it's free, you can just mess around, anything's possible. But I do think there has to be a cost inflection point. And also maybe a PHI question around how the data flows, for it to actually be usable in a lot of these contexts people have been dreaming of, right? Because, listen, if it's $7 or $10, whatever, per interaction with this thing, that's not feasible.
[00:22:54] Matthew: Yeah, I mean, I think about it two ways: there's the fixed [00:23:00] cost of signing the BAA, and then there's the actual API usage cost that's charged on a per-token basis. And I'm totally going to butcher this, so someone should just look it up and verify, but it's something like 0.00001 cents per token. The amount of API calls we've made over the course of, I don't know, four to five months of doing tens of thousands of visits has been, like, $500, right? And if you were to use one of the vendors out there that could do this auto-generation, you're paying close to a hundred K in some cases. So yeah, there's definitely that cost inflection point for sure. But if you do have the technical capabilities and are able to sign a BAA, I think it's significantly cheaper, personally.
[00:23:49] Nikhil: Did you consider any of the other large language models when you were looking for a partner on this? And also, like, now, sort of maybe post the OpenAI debacle, do [00:24:00] you think about diversifying the partner base that you work with for this kind of stuff? Or is it just better to go all in on one?
[00:24:07] Matthew: Yeah, so I would say in the beginning, we didn't consider any other large language models as a base foundational model. I think at that time OpenAI had, to be honest, caught everyone else off guard, so they were kind of the only game in town. I mean, ChatGPT kind of caught even them off guard. I would say now, looking forward, we are in discussions with Google as well. We played around with PaLM; it does seem to hallucinate a little bit more than GPT-4. That being said, they do have a medically based language model, Med-PaLM, and we're currently on the wait list. I think hopefully by the end of the year we're supposed to be off that wait list and able to start using it, and we're really curious to see how it performs relative to OpenAI on things like medical summarizations. The interesting thing, though, talking to [00:25:00] the team at Google, is that there is a difference in how different large language models perform based on the use case. So, for example, what they've seen internally is that PaLM is much better at summarizing the parent or patient side of the conversation, and Med-PaLM is much better at summarizing and synthesizing insights from the doctor. So we definitely see a world where we'll start to use different models for different types of use cases, and also for speed. For example, if there is a huge volume of traffic on a certain large language model, the speed at which we get a response back might be too slow. So we might want to look into something like Llama, where we can have a little bit more control over the compute. So internally, we're looking at more of a platform in terms of how we can leverage multiple different large language models, and really fine-tune them to the use case.
[00:25:57] Danielle: One thing I've heard you talk about is the [00:26:00] sophistication and the nuance of prompt engineering, and how you worked a lot on making sure you were asking the right questions, which probably saves you a lot of money. Well, the API calls are just better, because you know exactly what you want out of them. Can you give us the story on the whole saga and build-up towards designing the right prompt, and the original thinking around using AI for clinical notes?
[00:26:25] Nikhil: All my homies are chaining prompts nowadays, you know. It's like the new pastime.
[00:26:28] Matthew: Yeah, yeah, yeah. I would say there were a couple of things that were really helpful for us to get started. One was, again, obviously ask for permission to use certain types of information, but if you're comfortable sharing your own medical information in ChatGPT, just start there. Just figure out: hey, if I were to paste X amount of data into ChatGPT and give it a particular prompt, what would I get as the output? That's generally enough to [00:27:00] generate some type of excitement within the company. Based on that, I'd suggest to your CEO, hey, let's have a hackathon. Let's really try to see what we could possibly do in just two days. And so from there, one of our engineers, I didn't think of this, but he basically set up what we call a Python notebook, I think it's pretty standard in the data science community, where it made it really easy for us to create a template our team could use to start playing around: hey, if you were to input X conversation and use this prompt, what would the output be? So we had that setup for the hackathon, and during the hackathon our head of clinical worked really closely with our engineer, starting with the most basic thing: hey, here's a conversation, please summarize it in X presentation. And then the next iteration was like, okay, that was a three out of five in terms of evaluation, let's get a little bit more specific: for the diagnosis, please try to use grade-five language terminology, [00:28:00] remove pronouns. Very, very specific. I think one analogy or metaphor you can use for how you interface with GPT is to think of it not as an expert, but as an intern. Right, like, you can ask an intern to help save you time, but you really need to probe them both on the specifics and also on how they came to a particular conclusion. And as you iterate through that type of framework, you get to a point where you can basically write a very detailed instruction set for an intern to produce a very reproducible output. And so for us, it took probably 20 iterations over the span of four to five hours to come out with a prompt that our head of medical felt, when we tested it on internal use cases, was accurate 86 to 87 percent of the time. And at that point, that was a small sample size of, like, 20 conversations. Now, when we actually look at the real data, [00:29:00] which is how often providers edit the medical note generated from the prompt, it's more or less, like, 84 to 85 percent. So, you know, it's interesting how heuristics kind of play out at a larger scale as well.
[00:29:15] Nikhil: You must've had much more competent interns than I've managed in the past. Just kidding. All my interns, please don't email me.
[00:29:25] Matthew: You guys still have jobs is what Nikhil is saying. That's true. That's true.
[00:29:30] Danielle: What's your general take on AI in your company now?
[00:29:33] Matthew: I would say there are a couple of different angles, or layers, to look at this. One is just: how do you embed AI into every part of your company, in terms of its operating cadence and systems? There are so many tools that many startups use now, whether that be Notion or Loom or Zoom. Just [00:30:00] turn on the AI features. I know it costs more, but it's going to save a ton of time. There was a time where I would do a user research interview, record it on Zoom, watch it again, pull out the key insights, and share them with the team. That literally took me maybe 30 to 40 minutes. Now all I do is copy the transcript, put it into Notion, and ask Notion to summarize it, and within 10 minutes I have all the insights I need to share with the team. The second thing that was really interesting is: how do you start enabling better cross-collaboration across different functional areas, and actually allow everyone in your company to up-level themselves? The really cool thing we saw internally was that now our head of clinical, or even our head of ops, has GPT to help them analyze and understand our data better, by using it to generate queries in SQL. The trick is to pair them with a data scientist, so that if they ever do get stuck, there is a real human being who can unblock them. But most of the time now, 80 percent of the time, they're able [00:31:00] to figure out exactly what they need on their own, versus in the past, when you just needed a massive data science team. Our data scientist, interestingly, started to work much more closely with our engineers, and she basically became a data engineer slash backend engineer who created a lot of applications for us, to do things like the auto-generation of care plans that are hyper-personalized to the user. So the point being: you can really allow almost everyone in your company to operate at the top of their license, kind of like, in medical terms, NPs or PAs acting, in some ways, as doctors, internally at your company, because of AI. And then the third thing, obviously, is just how we approach every single product problem. Right now it's more of a distributed expectation that everyone is constantly thinking about how we integrate AI into all of our workflows, so there is no specialist. One of my big hot takes is that, in the past, whether you invested in AI was a strategic choice; right now, it's just an imperative. That being said, as we get more sophisticated, now that there are not only multiple language models but multiple [00:32:00] modalities in which large language models can play, I'd definitely say within the next six to 12 months we'll probably hire someone more focused on this particular space, to constantly help us be at the leading edge.
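As a sketch of the text-to-SQL workflow Matthew describes, again assuming the OpenAI Python SDK; the schema, prompt, and function are made up for the example:

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set

client = OpenAI()

# Hypothetical table descriptions an ops lead might paste in; not a real schema.
SCHEMA = """visits(id, parent_id, provider_id, started_at, first_response_seconds)
providers(id, name, specialty)"""

def text_to_sql(question: str) -> str:
    """Turn a plain-English ops question into a draft SQL query for review."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": f"Write one PostgreSQL query for this schema:\n{SCHEMA}\n"
                           "Return only SQL, no explanation.",
            },
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Example usage:
#   text_to_sql("What share of last month's visits got a first response within 15 minutes?")
# As described above, a data scientist stays paired with the user to sanity-check
# the generated query and unblock them when it's wrong.
```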
[00:32:27] Nikhil: Well, that's interesting. One thing I wanted to ask is, for example, in that Notion use case, right, of, hey, we're going to record a user interview, bring it to Notion, ask Notion to summarize: do you check to see, like, okay, how good was the summarization versus, if I did it myself, I would have maybe done X, Y, and Z? Because one thing I think about is, if everyone is using a lot of these tools, AI kind of spits out the most, I don't know how to say this, midway [00:33:00] answers, right? Like the exact average-quality answer in most cases. Versus, I feel like a lot of really important startup insights come from these random, fringe, emergent behaviors, or weird things that maybe you yourself think are important but the AI does not, for example. So I'm curious, do you battle-test AI features? Have you noticed, for example, that some companies have done a better job with their turn-on AI tools?
[00:33:28] Matthew: Yeah. So there are a couple of questions there: one, how effective is it? Two, can you pull out the nuance? And three, are there certain tools that are better than others? On the first question: yeah, in the beginning I would write notes during the user research session, and then I would compare them against what I was able to pull out using, let's say, Notion's AI. I would say it got 80 percent of the way [00:34:00] there, and I think that's true of all of these AI task automations. I think it's a fool's errand to try to get it to a point where it truly replaces the need for a real user researcher. If anything, it's an accelerant, or an augmentation of your ability to focus on the things that matter most, because some basic things are going to be captured in every single user research session: the background of a particular person, what the care setup is at home, where they live, what their profession is. All those psychographic things can just be pulled in through the summary tool, and you can really focus on the nuances. When you're reading the summary the AI came up with, you can just add to it where you really want to focus your time, versus wasting time capturing the basic information that's typical. The second thing is around battle testing. I wouldn't say battle testing, but what I've done, and this is kind of the common theme, is constantly iterate on the prompt. If you notice it missed something, just [00:35:00] iterate on the prompt, and then you can save the template. Actually, one thing I've been playing around with now is ChatGPT's newly launched ability to create your own GPTs. I've been creating my own prompts there and sharing them with my team, so it's easier for them to do things like user research or data analysis and whatnot. And the third thing, the tools: I'm going to say Notion's pretty good, it's really flexible, and, I mean, people have probably seen their new Q&A tool. Loom has been fantastic; it's just saved so much time. In the past I would write a launch post; now I just record a Loom about a launch and it summarizes it in a way people can just consume, which is great. And if you're a little bit more advanced and want to play around with it in more of your daily tools, there's a tool called Raycast, a cool command launcher: assuming your company has access to OpenAI's GPT, you can get one of the API keys, put it in there, [00:36:00] and use it in any app you're using, whether that be Google Sheets, your email, whatever. It can get the context to help you automate a lot of things, or at least give you an initial starting point for most things.
[00:36:13] Danielle: On your point, Nikhil, around this idea of catching the nuance: I think a lot of the challenge we have in care delivery is just getting access to objective data right when you need it. Like, so many use cases that I have, and I can't access them because I'm on the wait list. So maybe me and you can sign an NDA and then I can use one of your logins.
[00:36:37] Matthew: Yeah, yeah, yeah.
[00:36:38] Danielle: Do you sign BAAs with individuals? But anyways, just the sheer information-retrieval problem that we have with high amounts of data that otherwise just sit in Zendesk, Notion, XYZ: being able to aggregate it and ask simple questions is so much value in and of itself. And I [00:37:00] imagine that three minutes buys you quite a lot of time to do good information retrieval and good summarization of problems, even engineering problems, which you don't have the luxury of when you're on a phone call. It's crazy to me to think about it like that, because SMS has now just unlocked time for you to do better back-office functions. In my head, that is worth its weight in gold. Did you know that it would be so valuable?
[00:37:27] Nikhil: There's also, just to riff on that, this open question now: how many tools do you actually need to create libraries and databases on the backend to keep things really meticulous and organized? I know for companies I've worked with, or worked at, you use a combination of Airtable and Notion, and you try your best to get everyone to follow these really neat processes to keep things detailed and organized so retrieval is easy. But if large language models make it super easy to search across this stuff, it's almost like, [00:38:00] hey, just dump it into one place, we'll figure it out later.
[00:38:02] Matthew: Right. So yeah, a hundred percent. And you know, Danielle, one of the questions you had asked me previously was, is there an idea that someone should run with if they wanted to start a company? Someone should just rethink this whole tracking-health-metrics space. For the parents out there, there's, I think, BabyCenter, there's Huckleberry, there are all these different apps that allow you to track all these very specific metrics. But everyone does it for maybe the first week, first two weeks, and it becomes so tedious that you just don't do it. There should be a way for you to just use your voice to basically say unstructured information and have it turned into structured information. I think that's the other way to think about your question, Nikhil, which is the inverse: even the input should be unstructured, and then the AI basically structures it in a way that makes it useful for you in the future. [00:39:00] There's a really cool consumer app called AudioPen. I love listening to audiobooks, and I always forget to write down my thoughts, so after a chapter I'll just talk for a minute, probably 90 percent gibberish, but it takes the 10 percent that's not gibberish and structures it in a way that makes sense for me to reference later. So I think someone should do that for healthcare. And that would probably be really impactful.
[00:39:26] Nikhil: I can just imagine you on, like, public transit, just talking for a minute into your phone while everyone's staring at you, like, what's this dude doing?
[00:39:32] Danielle: Everyone's focused on longevity, and Matthew's like, I'm just going to AI the shit out of my life right now.
[00:39:36] Matthew: Yeah.
[00:39:38] Nikhil: Yeah, exactly. That's super interesting. I'm curious: obviously we've talked a lot about text as one of the big modalities, and you've built a ton of asynchronous stuff. OpenAI has been coming out with all this new, more multimodal stuff, right? Images, and video soon, and audio, and all these other things. Have you [00:40:00] considered other use cases where you might bring some of these multimodal models in? Or are you really just hyper-focused on text and building those workflows?
[00:40:09] Matthew: Yeah, I would say two things. One, continue to push on text. One thing that was really interesting, and I think this is a phenomenon for everyone who has been leveraging the outputs of GPT generation, obviously with some editing and accuracy checking, is that patients actually like the medical notes written by GPT more than they like the medical notes written by doctors. I don't know if anyone's ever gotten an actually useful note from a doctor, but it's usually one or two sentences, whereas GPT really dives into the conversation and pulls out more like five or six sentences that really help the patient remember exactly how the conversation went, and vice versa. [00:41:00] The reason why I pull that out is that one thing that's been challenging for us in text is that words only make up, what, 10 percent of communication. The tonality and body language are obviously missing in SMS. So we're really trying to push the envelope on how we actually help doctors communicate empathy and understanding to a parent. And I think that's something we haven't quite cracked yet, but that's something we're actively thinking about. In terms of the other modalities, we're really excited about that. On a similar line, I don't know if anyone's tried the new ChatGPT voice interaction, but it sounds incredibly natural. There's an API that they expose that allows you to do that, it's called Whisper, and we're starting to play around with it. Is there a way that we can help doctors communicate empathy by creating a synthesized voice for them? These are some really interesting things. It's just hard to communicate via text unless you're using the right emojis, and that's always a challenge, because emojis can have many different meanings, as we know.
[00:41:57] Nikhil: I just respond with the upside-down [00:42:00] smiley face one each time, because no one knows what it means, which is great.
[00:42:03] Matthew: Right, right.
[00:42:04] Danielle: You just never respond, as you know.
[00:42:07] Matthew: This SLA is terrible. I have no SLAs. I have no SLAs. Exactly. Yeah. And then the other things, obviously: image is going to be big, and same with video as well. We're really excited about how we can help doctors get a preliminary analysis of images, and also give them higher confidence, so that if there is a situation where we can act on that information, we can, versus sending the patient in for an in-person visit. And so not only are we talking to OpenAI about how we might leverage that, but we're also talking to Google, because they also have a multimodal, medically based LLM that we're really excited to potentially use.
[00:42:45] Danielle: You alluded a few times to the word empathy, and providers, and the intersection of those two. And I understand that you're using a little bit of AI now to do some sort of scoring. Can you touch on that a little bit and what you're working on?
[00:42:58] Matthew: Yeah. [00:43:00] So this is one of the places where it's been helpful. Every quarter, it used to be that our head of clinical and another provider would basically sample 30 conversations across all of our providers to get a better understanding of how care is being delivered. And so one thing that we tried during the hackathon was: is there a way we can actually use AI to help us score and understand how empathetic a conversation was? Much like what we did before, we basically used the LLM to generate a score, and we had a doctor evaluate it. And I would say the false-positive rate is much higher in this case, so it wasn't the highest accuracy; it was closer to 70, 72 percent of the time accurate. But it allows us to get a better pulse and find outliers, certain providers where the empathy score was just really, really low. That being said, because a lot of these things are more [00:44:00] probabilistic, not deterministic, you really have to not just get a score, but get the LLM to explain why it scored it at a particular level. And that gave us better nuance, because, for example, if a doctor's following up and the patient responds, thank you, it's going to get a really low empathy score, because the length of the conversation was really short and there wasn't a lot of, hey, I understand how you're feeling. So having the score was helpful, but getting into the nuance was more important to actually make it useful.
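A rough sketch of that score-plus-rationale pattern, assuming the OpenAI Python SDK and a JSON-mode-capable model; the rubric, scale, and output shape are invented for illustration:

```python
import json
from openai import OpenAI  # assumes OPENAI_API_KEY is set

client = OpenAI()

# Asking for the rationale alongside the score is the key move described above:
# the number alone is noisy, and the explanation surfaces false positives like
# a short "thank you" exchange getting scored as low empathy.
RUBRIC = """Score the provider's empathy in this conversation from 1 to 5.
Respond in JSON: {"score": <int>, "rationale": "<why, citing the transcript>"}.
If the conversation is too short to judge, say so in the rationale."""

def score_empathy(conversation: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative: JSON mode needs a model that supports it
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": conversation},
        ],
        temperature=0,
        response_format={"type": "json_object"},  # ask for parseable JSON back
    )
    return json.loads(response.choices[0].message.content)
```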
[00:44:31] Danielle: How do you find the right problems to solve with AI? And have you developed some sort of framework for the right business cases to apply AI to?
[00:44:39] Matthew: Yeah. There are basically two axes that we look at. One is: how impactful is this particular problem, if you were able to solve it in a much more personalized and higher-quality way? And then: what level of AI autonomy is there? Anything that's in the upper-right corner, [00:45:00] high impact and high AI autonomy, we try to really quickly validate through some quick prototyping, and then we focus on making it much more scalable, or, like, systematizing it. The second thing I'll say is that one thing we're just starting to think about is how we internally create a really flexible way for the clinical team to leverage LLMs to, I don't know what the right term is, help automate a lot of the flows. You can almost think of it like Zapier, right? There's a lot of clinical information in the FHIR standard. And so, for example, when the patient closes a visit, based on the patient's medical history, blah, blah, blah, here's a prompt: generate this particular task for a provider to do X, Y, Z later. And so we can start to [00:46:00] help the clinical team have more control in guiding certain types of patterns for doctors, to have more proactive and quality care, if that makes sense. There's more to come with that, and maybe a couple months from now I can share a little bit more about how we're building that platform internally.
[00:46:16] Nikhil: Awesome. Well, we're sort of coming up on the end of our time, so one last question to close things out. For a lot of companies that are maybe thinking, hey, I want to bring AI into my org, or I want to become an AI-first company: do you have any suggestions on, one, first small ways they can do that? And two, what are the big things to think about to make that transition? Oh, and also, when should a company even think about doing that? Is everyone just an AI-first company now?
[00:46:47] Matthew: Well, I think everyone should be, to answer that question directly. And on where to start, I think it's always about finding either yourself or someone in the company who has shown a [00:47:00] lot of passion for it, and that might be because they're already commenting in your Slack about all the different use cases. Task that person with trying to find some early quick wins by leveraging things that don't have to be HIPAA-compliant initially, just using ChatGPT to prove out some of the use cases that could be powerful. And then from there, yeah, it might be difficult to get a BAA, but there are ways in which you can strip out PHI in a way that might be helpful, or you can look at some of the open-source models like Llama, which you can host on your own servers, and that way you don't actually have to worry about HIPAA. So that would be the kind of third step. And the fourth step, assuming that you've generated some excitement, I know I've said this a few times, but actually hosting a hackathon and bringing some of this to life with real clinical data would go a long way. And the beauty of this is, assuming that you have some type of interface to [00:48:00] display the output, it took us, like, two days after the hackathon to implement it into our clinical workflow. It's not like some of the other prototypes in the past, where it would take you maybe months to get it onto the roadmap. Some of these things, once you get to a point that makes sense, as long as there is a human in the loop and somewhere you can display the output, it probably gets 80 percent of the way there.
[00:48:18] Danielle: 80 percent is, like, your general benchmark for how successful AI is, it seems like.
[00:48:22] Matthew: The 80/20, the Pareto principle. Exactly. You only need to get 80 percent of the way there. I think it's a fool's errand to ever try to get to a hundred percent.
[00:48:30] Danielle: My last question on that is maybe just how you solve for that other 20 percent. Do you always have a review step before something goes live, especially if it's going to face a patient? And what are some of the guardrails you have in place around the 20 percent failure mode?
[00:48:45] Matthew: Yeah, right now there's always a human in the loop, and there are no plans to ever remove that step. If we were to ever remove that step, we would just have [00:49:00] to be much more diligent in trying to understand what the right metric is, and it's really difficult to know what the right metric is. So for now, human in the loop, always. We do track, not try to track, we track how often the doctors actually edit the medical note, and look for anomalies where, literally, a doctor is just hitting sign every single time without ever editing. And then we talk to them, like, hey, it's really important that you review this type of information and make sure that the AI is actually generating the right output. So that's mainly how we monitor it internally right now.
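One way to operationalize that kind of monitoring, sketched with pandas; the column names and the 'never edits' threshold are assumptions for illustration:

```python
import pandas as pd

# Hypothetical log of signed notes: one row per note, with a flag for whether
# the provider edited the AI draft before signing.
notes = pd.DataFrame({
    "provider": ["dr_a", "dr_a", "dr_b", "dr_b", "dr_b", "dr_c"],
    "edited":   [True,   False,  False,  False,  False,  True],
})

# Per-provider edit rate: how often the AI draft gets touched before signing.
edit_rate = notes.groupby("provider")["edited"].mean()

# Flag providers who essentially never edit (the cutoff is an assumed value),
# as candidates for the "hey, please actually review these" conversation.
ALWAYS_SIGNS_THRESHOLD = 0.05
flagged = edit_rate[edit_rate < ALWAYS_SIGNS_THRESHOLD]
print(flagged)
```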
[00:49:36] Nikhil: So, I know I said last question, but I have one more follow-up to that, which is: the docs that you work with, are they generally excited about using AI tools for this stuff? I feel like, you know, when you're as internet-brained as I am and you spend too much time online, it's very polarizing; all the docs are like, AI bad, right? I assume, I mean, on the one hand, I'm sure there's a self-[00:50:00]selecting group of docs that choose to work with you all; if they weren't excited about AI, they probably wouldn't gravitate towards you. But I'm just curious, are the docs excited about it? Is there some archetype of doctor that you've seen who's particularly really leaning into this stuff?
[00:50:15] Matthew: I would say that our doctors love it. They think it's magic. Especially when it comes to clinical administrative work: it used to take doctors probably five to ten minutes to write a medical note, and they'd usually batch them up, so collectively it would take maybe hours. Now it takes, like, one to two minutes. And in terms of the type of doctor, honestly, we have doctors across the board. We have ones that are purely focused on telemedicine, and we have ones that have practices and just want to be able to provide care without all the administrative burden. And I think across the board, all of them are excited [00:51:00] about the ways in which AI is making their lives easier, so they can focus on the thing they care about most, which is just delivering care. So, at least for the doctors on the Summer Health platform, they love it.
[00:51:14] Nikhil: That's awesome. Well, I think those are pretty much all the questions that we have. I want to thank you for coming to chat. I think people are going to learn a lot about how they can hopefully implement a lot of these new AI tools. So thanks for coming to chat with us.
[00:51:29] Matthew: Oh, no problem. Hopefully it's helpful. And then, you know, people can always reach me at matthew@summerhealth.com if they ever want to, that's my email. Hopefully I don't get a flood. If not, you'll be getting some chatbot responses. A custom GPT for yourself, a hundred percent. The SLA would be much slower, though. It would be, like, 24 hours, not 2.87 minutes. Purposeful typos, to make it seem legit. It's really funny, because [00:52:00] people ask us, like, oh, do people think your doctors are bots? And we're like, no, because our doctors make so many spelling errors and typos that they're like, of course this is a human being.
[00:52:08] Nikhil: Of course, once the bot figures that out, though, it's over for us.
[00:52:13] Matthew: True Turing test stuff, you know? Exactly. Exactly. Yeah. Awesome. Well, Matthew, thanks for coming to join us. Appreciate it.