BJJ Podcasts

Does performance-based remuneration improve outcomes in the treatment of hip fracture: results from the WHiTE multicentre hip fracture cohort

May 01, 2021 The Bone & Joint Journal Episode 37

Listen to Andrew Duckworth, Xavier Griffin & Matt Costa discuss the paper 'Does performance-based remuneration improve outcomes in the treatment of hip fracture: results from the WHiTE multicentre hip fracture cohort' published in the May 2021 issue of The Bone & Joint Journal.

[00:00:00] Welcome everyone to our BJJ podcast for the month of May. I am Andrew Duckworth and a warm welcome from your team here at The Bone & Joint Journal. As always, we'd like to start by thanking all of you for your continued comments and support, as well as a big thank you to our many authors and colleagues who take part.

We hope that you're continuing to enjoy our podcast and all our knowledge translation work delivered so far this year. With regards to our podcasts, we've covered some excellent studies published here in the journal, including a meta-analysis on the effect of antibiotic-loaded bone cement on the risk of revision following hip and knee arthroplasty, as well as the large UK multicentre trial comparing the X-Bolt device with the sliding hip screw for trochanteric fractures of the hip. We also want to highlight our new special edition podcasts that are with our specialty editors here at The Journal. These started in March with Sam Oussedik, our Specialty Editor for Knee, and we do hope these are providing some insight into the great work our specialty editors do here at The Journal, and also to highlight to everyone listening that the papers we discussed in those podcasts have temporary free access.

So back to this month, as you all know, over the next 20 to 30 minutes or so we'll cover a range of aspects of the chosen paper emphasising the important points of how the study has been designed and put together [00:01:00] as well as some key messages from the paper and how these potentially fit into each of your day-to-day clinical practices.

So today I have the pleasure of being joined firstly by Professor Xavier Griffin from Queen Mary University in London to discuss their study entitled "Does performance-based remuneration improve outcomes in the treatment of hip fracture: results from the WHiTE multicentre hip fracture cohort", which has been published in the May edition of the BJJ.

Welcome Xavier and a big thank you for taking the time to join us today. 

Pleasure. Thank you very much. 

Xavier and I are delighted to be joined by his co-author and our specialty editor for trauma here at the journal, Professor Matt Costa from Oxford. Matt, great to have you back with us. 

Great. Thanks Andrew. 

So guys, if we first of all look at the aim of this study, that was to determine whether the national standards of best practice are associated with improved health-related quality of life outcomes in hip fracture patients. So Matt, if I could start with you, if you could give us maybe a brief introduction to the study and some background to the literature regarding pay-for-performance initiatives and how they've been used effectively in healthcare systems in an attempt to generate and drive improvements in patient outcomes.

[00:02:00] Yeah. So I guess this goes all the way back to the very basic question of how do you manage the delivery of healthcare, Andrew. So we know that in order to manage healthcare, you need to be able to measure healthcare, and that is an old business saying from many years ago, but it's equally true of the healthcare system. So we've been collecting data on healthcare delivery for years. But just collecting data in itself isn't really enough to change the way that healthcare is delivered to improve outcomes for patients. So if 20% of patients have an intervention X, and you can measure that, that's great, but that doesn't tell you whether they're getting better or worse, or whether that 20% should be 50% or indeed 0% should be getting that treatment. So just the measurements don't by themselves help you. You need to link the measurement to a standard, some sort of measurement that indicates improved healthcare delivery or ideally improved health outcomes for patients themselves. So creating standards against which to measure is kind of the essence of health [00:03:00] data collection. And that's something that's done all around the world and has been for years.

The next step on from setting quality standards for delivery of healthcare is to see whether you can incentivize them. So can you pay people more, in this case, to actually deliver better care? And again, that's not a new idea; that's been around for a long time. And there's some evidence to say that if you pay patients, they actually get better outcomes. For instance, in smoking cessation, if you pay the individual patient, there are randomized trials that suggest they are better at giving up smoking. And equally there's some evidence to say that if you pay a healthcare system for delivering care against particular standards, they get better outcomes for patients than those that don't.

Now the evidence is a bit mixed. So in the US they tried to incentivize certain treatments for heart attacks in patients within their system and had slightly mixed results about whether the actual outcomes were improved. The patients seemed to have a lower mortality, but there were other problems in the healthcare system. So you can produce perverse incentives, driving changes in healthcare to the [00:04:00] detriment of other patient groups.

In the UK, there's been an awful lot of ongoing debate about payment incentives around quality standards in primary care, so the Quality and Outcomes Framework that many of the readers will be familiar with, linked to chronic diseases like asthma and diabetes, and there is some evidence to say they improve those outcomes, but that they may also harm other aspects of care.

So what about the trauma side? Well, hip fracture is where we have probably our best data in trauma from registries in the UK and around the world. And our group and others around the world have previously published work to show that by linking quality standards to payments you can actually drive down mortality. We've shown that in interrupted time series work that *inaudible* from our group published a couple of years back in BJJ, and others have reproduced that sort of work around the world.

So the next step, though, was really: well, mortality is only one measure, but not necessarily the most important one to patients, certainly for the hip fracture group. So the question we really wanted to ask was can we measure [00:05:00] improvements based on these quality standards and the payments linked to them in terms of the thing that is most important to patients after a hip fracture, which is quality of life. And that's where the WHiTE study came in.

That's a really useful overview. And like you say in your paper, you know, the mortality for patients with hip fracture, since the introduction of the best practice tariff, dropped from 8.4% in 2012 to 7.1% in 2017, which is a remarkable reduction really, isn't it, in many ways, in just a very short period of time.
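As a quick worked check of the two mortality figures quoted above (8.4% in 2012 and 7.1% in 2017), the short sketch below separates the absolute reduction from the relative one. The function name is purely illustrative and does not come from the paper; only the two percentages are taken from the discussion.

```python
def mortality_reduction(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = before_pct - after_pct          # difference in percentage points
    relative = absolute / before_pct * 100     # reduction as a share of the starting rate
    return absolute, relative


# Figures quoted in the discussion: 8.4% (2012) and 7.1% (2017).
absolute, relative = mortality_reduction(8.4, 7.1)
print(f"Absolute reduction: {absolute:.1f} percentage points")
print(f"Relative reduction: {relative:.1f}%")
```

On these figures the absolute drop is 1.3 percentage points, which is roughly a 15.5% relative reduction from the 2012 rate.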

So Xavier, if I come to you next, before we move on to the meat of the study, you know, obviously this is based around the WHiTE cohort data. Just for our listeners, I'm sure many are aware of it, but could you give just a brief description of the WHiTE cohort and which data sets were used for this study?

So the WHiTE cohort is a multicentre study that we've run across the whole of the UK. The data that are published previously with the journal and here are relevant to just a snapshot in time within that quite organic, growing study. [00:06:00] And so the data that are particularly being discussed today come from 20 different sites, and they're actually just from hospitals in England and Wales, Andrew, because those were the sites open at the time. And we're talking about the period of time between May 2014 and April 2017. So you see that this has been quite a long-term project. And we recruited in that window, across these 20 sites, 8,673 participants. So those are the people that this study is particularly looking at.


We did previously do some work to see to what degree those 8,000 or 8,500 patients reflect the wider population of hip fractures, and we're getting the best part of 70,000 people with hip fracture across the UK per year. So this is just a sample of those people. And we showed that actually the degree to which these patients in the study look like patients in the broader population is very good. [00:07:00] So we've got a very good fit. If anything, the sample is slightly fitter, but it's a very small difference, and there's a slightly disproportionate number of teaching and large major trauma centres represented. But I think overall, in the previous work that we've done, we've shown that those effects are very small. So we're very hopeful that this is representative of the whole UK population.

Yeah, absolutely. Absolutely. So if we move on to the study itself, like you say, it's a multicentre cohort study, 20 UK NHS hospitals in England and Wales treating hip fracture patients. Just before we move on to the data you used for this study, could you just remind people of the eligibility criteria for the WHiTE cohort, the pathway they normally follow and what sort of data you routinely collect for it?

Yeah. So that's a great question. So the underlying thrust of the cohort is that we want to reflect, as broadly as possible, the fragility fracture population that have hip fracture. So we've got really straightforward eligibility. So if you're 60 years or older [00:08:00] and you have a hip fracture, then you can go into the study, and we'd like to see conceivably everyone in the study. So that means, importantly, whether you do or do not have acute or chronic cognitive impairment at the time of presentation. So maybe you come in with a UTI, you've got acute confusion, you can still go in the study. Maybe you have dementia, you can still go in the study, or maybe you're actually very fit and you've come off your bicycle. Those people can go in the study too. So it's a very generalizable group of patients.

Now we had to hone in on a subgroup of those eight and a half thousand that I mentioned before, because we needed data at baseline, but we also needed data at four months because that's the key timepoint for hip fracture. And so we're actually looking at a subgroup of the 8,000 or 8,500, down to 6,532. So we're talking ballpark 6,500 people generating this data.

Now in terms of how they were treated, we tried to [00:09:00] reflect practice as it's happening in the real world, in the UK or in England or Wales. And so we didn't specify how you should treat every single individual patient, but we did say that they should be treated in accordance with your local pathways. And actually those local pathways are well reflected in the NICE guidance. And most hospitals will follow the NICE guidance. So when you look at the NHS reports, the degree to which NICE guidance is followed is somewhere between 85 and 95% across the recommendations, excepting the key recommendation about total hip arthroplasty, which remains relatively controversial.

So I think in terms of the treatment that patients are receiving, it's essentially NICE-recommended treatment. And what did we collect? Well, everyone will be familiar with the National Hip Fracture Database. We essentially mirrored the NHFD data set at baseline. And so that's all the things that you would expect about the patient information and their mobility, what their residential status was. [00:10:00] But the key extra thing that we collected was their baseline health-related quality of life. And then at four months we repeated the recommended NHFD dataset and we had the health-related quality of life again.

So if you imagine we've got 6,500 people treated in the NICE pathway, across 20 sites. And they are telling us everything about the NHFD but they're also telling us their quality of life for those two timepoints.

That's great. Really, really nice overview of how the data has been put together. If we could just touch upon what you've mentioned there. So obviously the primary outcome here, like Matt said, you've looked at mortality before; it's health-related quality of life now. And you've used the EQ-5D-5L for that. And just looking into that, because obviously these are hip fracture patients, as we all know: how have you dealt with the issues, you know, of people who can't complete it, whether they are cognitively impaired or whatever? And also, obviously you [00:11:00] collect retrospective pre-injury scores as well, and how accurate do you feel that they are?

So this is really important. So the key thing that's special about this study compared to all the other studies is the fact that we're collecting quality-of-life data, because essentially it's difficult to do. And that's the key USP of this study. So we use the EQ-5D-5L, and we use that because that's the hip fracture core outcome set recommended tool for measuring what's important for patients. So what we're trying to really ask is, does best practice tariff influence the outcomes that patients value?

So what is EQ-5D-5L? Well, it's a quality of life score and it asks five questions and it generates what's called a health state, which is a sort of summary of your answers to those five questions. What we then have to do is we have to transform it into a number that we can use for analyses. And that's something called utility. What is [00:12:00] utility? Well, that is just a process that EQ-5D have gone through with a UK population to say, here's a health state. How much do you value that health state? And they do that process outside of the study. And then we use that algorithm to produce a number, and that number ranges from perfect health through to worst possible health. And it's interesting that death is not actually the worst possible health state. Death is fixed at zero. So one is best health, zero is death, and negative values are possible, which represent health states that people think are worse than death.

And you could imagine that if you had a hip fracture and you're lying in bed and you're confused and you're in a hospital that you don't know and you're racked with pain and you don't think you can walk again, that might be conceivably worse than death. So it's not surprising that hip fracture patients do report those health states, particularly early on.
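To make the utility scale just described concrete, here is a minimal sketch that places a utility value relative to the three anchors mentioned in the discussion: 1 is best health, 0 is death, and negative values are states valued as worse than death. The function name and the example values are hypothetical, and the sketch does not reproduce the EQ-5D value set itself, which is derived outside the study.

```python
def describe_utility(utility: float) -> str:
    """Place an EQ-5D utility value relative to the anchors described above."""
    if utility > 1.0:
        raise ValueError("utility cannot exceed 1.0 (best health)")
    if utility == 1.0:
        return "best health"
    if utility > 0.0:
        return "between death and best health"
    if utility == 0.0:
        return "equivalent to death"
    return "valued as worse than death"


# Hypothetical example values, not data from the study.
for value in (1.0, 0.6, 0.0, -0.2):
    print(f"{value:+.1f}: {describe_utility(value)}")
```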

Yeah, absolutely. 

In terms of why we think EQ-5D is very helpful: our [00:13:00] population has about 30% of people that can't give a self-report of any patient-reported measures. And EQ-5D is validated both in hip fracture and outside hip fracture for proxy reporting, so being reported by someone that knows the patient.

And we've done lots of work that is published extensively in the journal to show that if you compare it to things like the Oxford Hip Score or other types of quality-of-life scores, EQ-5D behaves very well statistically and seems to yield similar information.

You asked about the retrospective recall. Obviously if we could, we'd like to know what those patients had before they had their fracture, but we don't know about them. So we can't do that. So what we do is they come into hospital and we ask them two or three days ago, before your hip fracture, what was your quality of life? And we go through the score.  

There's some biases that can creep in when you do that. But what we've done is we've done a check [00:14:00] of what would the similar age and sex matched group of people give on average in the UK as their scores? And we're finding very similar scores in our hip fracture patients, although just a little bit below the average, which is probably what you'd expect because hip fracture patients tend to be a little bit frailer than the average age and sex-adjusted population. 

I think just finally, the last thing I'd say about the EQ-5D is we're going to talk about some numbers later on. So what do the numbers on the scale mean? So some context, probably a difference of 0.05 through to 0.07 or 0.08 is what the patients would value as a minimum difference. And if you look, you look at what does that mean in other health contexts? Well, patients with asthma report a utility drop that is in the ballpark of 0.05. And then through acute heart attack and chronic illness associated with diabetes mellitus, [00:15:00] you could see scores of 0.06, 0.07. So if we're finding differences like that, you can contextualize those in the context of other diseases that we understand well.
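The range quoted above, roughly 0.05 to 0.08 as the minimum difference patients would value, can be turned into a simple check on any observed difference in utility. This is only a sketch: the function name, the labels it returns and the example differences are assumptions for illustration, and only the two bounds come from the discussion.

```python
MID_LOWER = 0.05  # lower end of the quoted minimum important difference
MID_UPPER = 0.08  # upper end of the quoted minimum important difference


def compare_to_mid(difference: float) -> str:
    """Classify the size of a utility difference against the quoted range."""
    size = abs(difference)  # direction is ignored; only magnitude is compared
    if size < MID_LOWER:
        return "below the quoted minimum important difference"
    if size <= MID_UPPER:
        return "within the quoted minimum important difference range"
    return "above the quoted minimum important difference range"


# Hypothetical differences, not results from the paper.
for diff in (0.03, 0.06, 0.10):
    print(f"{diff:.2f}: {compare_to_mid(diff)}")
```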

Yeah, no, I think that's really useful actually, to put it into the context of those other common conditions. And before we move on to the results, as we alluded to, I mean, obviously the key part of this is the best practice tariff indicators. I suppose, for those who are not familiar with them, what are they and how were they assessed during the study period?

So we followed the laid-down, mandated best practice criteria. We didn't invent these. And again, they are relatively live and organic, so they're changing with time. At the time that the study was conducted, we were lucky, no more than just lucky, that the criteria didn't change. And so we have seven different criteria and we can just briefly whip through those. So: surgery in under 36 hours; joint care under the consultant [00:16:00] orthopaedic surgeon and geriatrician; admitted using an agreed MDT protocol; assessed by a geriatrician within 72 hours; post-operative MDT rehabilitation; falls risk and bone health assessments; and pre- and post-op delirium assessment. So those are your seven. Keep them in your head because we'll be coming back to them. Just out of interest, on remuneration under the best practice tariff, for international listeners who are not intimately familiar with it in terms of funding their service: you have to get all seven of those to get your money. If you fail on one of them, you don't get anything. But we looked at each individually in the process of the study, but from a remuneration point of view you're trying to get all seven in the bag.
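The all-or-nothing rule described above (all seven criteria must be met for the tariff to be paid) can be sketched as a simple check. The class and field names below are illustrative labels for the seven criteria as listed in the discussion; they are not taken from the NHFD data set or the paper.

```python
from dataclasses import dataclass, fields


@dataclass
class TariffCriteria:
    """One patient's attainment of the seven criteria (names are illustrative)."""
    surgery_within_36h: bool
    joint_orthogeriatric_care: bool
    agreed_admission_protocol: bool
    geriatrician_within_72h: bool
    postop_mdt_rehabilitation: bool
    falls_and_bone_health_assessment: bool
    pre_and_postop_delirium_assessment: bool


def criteria_missed(patient: TariffCriteria) -> list[str]:
    """List the names of any criteria that were not achieved."""
    return [f.name for f in fields(patient) if not getattr(patient, f.name)]


def tariff_paid(patient: TariffCriteria) -> bool:
    """All-or-nothing: payment only if every one of the seven criteria is met."""
    return not criteria_missed(patient)


# Hypothetical patient who met everything except surgery within 36 hours.
delayed = TariffCriteria(False, True, True, True, True, True, True)
print(tariff_paid(delayed), criteria_missed(delayed))
```

Under this rule, a single missed criterion such as delayed surgery means no payment for that patient, even though six of the seven were achieved.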

Yeah. And just to be clear, so five of the seven are coded directly, as you have stated in your paper, and whether they're under joint care, that's attained according to the GMC number, if they're available. And for the cognitive assessment [00:17:00] they need two valid AMT scores to report. That's right, isn't it? Yeah.

Absolutely right. Yep. So some of the questions, the five that you mentioned, we asked directly: did you do this? The others we extracted from various different parts of the dataset. Yes.

Perfect. So if we move on to the results, maybe you can talk about some of the analyses you performed there. So just to reiterate, you know, in terms of the numbers: just over 6,500 patients, like you said, who had both baseline and four-month EQ-5Ds. The mean age of the cohort was 83 years, about three quarters female, as you say, very consistent with the hip fracture population as a whole. And just over a thousand died prior to the four-month follow-up. So first of all, what did you find in terms of the overall attainment of each of those best practice tariff criteria?

So in terms of the attainment, actually overall, if you look at each individual criterion, we're not too bad across the piece. So [00:18:00] many of the criteria are being achieved in greater than 90 or 95% of all the patients. There is an obvious standout there, which is surgery in under 36 hours. We're under 80%, and so, you know, that continues to be a challenge across the 20 centres.

However, when you look at how many patients achieved all of the seven criteria together, only 57% of the patients had all seven criteria.

Yes. It's quite interesting, isn't it? That's a really interesting table, that table two. Like you say, a lot of them, apart from the delay, are well above 90%, you know, but that overall one is the keystone in that figure.

So in terms of those indicators and the individual indicators, how were they associated and which ones had the largest and clinically relevant increase in the four month EQ-5D?

Yes, so there's a few things here. First is [00:19:00] that for each individual criterion, achieving, you know, a "yes" on the best practice tariff, or ticking the box of that criterion, was associated with a better EQ-5D in all cases. That's the first thing. If you then burrow into that a little bit more, three of the criteria stand out because they showed statistically significant benefits. So there was a real difference in achieving it in terms of the health-related quality of life. And those were joint care, bone health and falls assessment, and doing a cognitive assessment pre-op and post-op. Those are the three that were really significant. Now, if you look at how much benefit the patients realize for each of those, you know, how big was the effect compared to those sorts of numbers we were talking about before, it's probably a relatively small or moderate effect for each individual criterion.

If you then look at the patients that achieved, at a minimum, those three, compared to the patients that didn't achieve all three of those, [00:20:00] now unsurprisingly we got a statistically significant difference, but now that effect is actually really very substantial. So it's bigger than the effect that we see when someone has an acute heart attack. It is bigger than the chronic illness associated with diabetes. So achieving these three things together for a patient is associated with a very substantial and statistically significant benefit in the HRQoL.

Yeah. That difference in the EQ-5D is quite marked, isn't it, as you say? And if we just hone down maybe on the delay to surgery, because obviously, again, that was the indicator which was below 80%, you did a bit of further analysis trying to tease out, you know, the subgroups that were there, and trying to control for the confounding factors. What did you find there?

So this is really interesting. As Matt has already said, we've known for some time, really through the registries and essentially the NHFD, [00:21:00] that a delay of greater than 36 hours is associated with an increased risk of death. So that's not new information. And we found the same thing here in our group.

What we were able to do over and above that was ask the question: okay, this patient was delayed. Why were they delayed? Was it a medical delay? Was it because you had to do something to optimize your patient prior to surgery? That we called medical delay. Or was it administrative delay, where you ran out of operating time, you didn't have a surgeon, et cetera, et cetera.

When you look at the groups of people that were delayed, and then you break it into the two subgroups, medical versus administrative delays, what you see is the patients in those groups are completely different. Totally, totally different. So people that are delayed for medical reasons are sicker, they have lower baseline quality of life, and they have a very, very high four-month mortality; it's just over 20%.

If you look at the patients that are delayed due to [00:22:00] administrative reasons, actually on average, they're fitter than patients in the whole group and they actually do better than the average patient. And they have overall reduced risk of mortality at four months and have a better HRQoL at four months. So these two groups of patients are completely different from each other. And there's probably some systematic reasons. It's not just random that they look like that.

Do you think? Yeah, that's interesting. Do you think, with the latter group, the more healthy group, is that potentially anything to do with maybe waiting for the person to do the total hip replacement? I dunno... I suppose you can't tease that out of the data really? Can we or not?

So I think that almost certainly surgeons or other members of the MDT are selecting patients to be in one or the other group. So we're selecting our patients who need medical delay. And it's no surprise that they're the sicker patients. Equally we're selecting our patients who [00:23:00] are going to experience an administrative delay, and I suspect what's happening is that people are making sensible decisions. I've got two patients that I need to operate on; this patient is fitter so they can more easily sustain one more day before that surgery. Clearly, you know, you'd like to do them both today, but if you can't, who's going to be delayed? The fitter patient.

Or there's some systematic health system problem where you can't deliver the treatment you want to, like a total hip, and following, you know, the good principles of GIRFT, you want someone that's confident doing a total hip replacement for that patient. And so you wait for the appropriate surgeon so that they get the first operation being the right operation.

I think the conclusion from that is that whilst none of these data take away from the humanitarian considerations that, you know, spending more than the minimum period of time in bed with an untreated hip fracture is painful and very distressing, it's probably the case that it is safe to [00:24:00] delay for the right surgery, from the right surgeon, at the right time, in daylight hours. And being delayed for a medical cause almost certainly means that you're at very high risk of having a poor outcome. Now we should think carefully about when it's appropriate to delay, and the specialist societies and associations have put out good guidance for that. I'm not suggesting we should delay patients who are sick, but that ultimately those patients are likely to have a bad outcome almost irrespective of what happens. It may not be causally linked to that delay as such.

Yeah. No, absolutely. Absolutely. So Matt, if I come back to you, I mean, putting the study into context, you know, the strengths of the study are without question, you know, in terms of the size, the methodology used, the robust analysis performed. It's a large multicentre cohort study, very likely very representative throughout the UK. And it does provide, I think, very strong and compelling evidence about the association between best practice tariff and the outcome for our hip fracture patients. But what [00:25:00] do you feel, in terms of the context of the current literature, and really in terms of our day-to-day clinical practice, the key take-away messages of the study are, and maybe in context any limitations of the study as well?

Sure. So well, as you said, I mean, I guess the key message is that if you achieve the quality standards, if you provide the level of care that we think is best practice, then essentially you will have better outcomes for your patients. All of those quality indicators were linked to improved health-related quality of life. As Xavier mentioned though, if you dig into that data, the quality indicators that really drove the big improvements in quality of life for patients were interesting because they were really the multidisciplinary care quality indicators. So it's being assessed by the geriatrician and early assessment by a physio. So some people say, well, how can being assessed for falls risk actually improve the quality of life? How does that work? Well, of course it's not the assessment for falls risk. It's the fact that a senior geriatrician has taken an interest in your ongoing medical care. So it's the [00:26:00] review, assessment and treatment of conditions that lead to falls that, it seems to me, improves quality of life. And I think that's a really, really powerful message. This is the first time I think we've had really good quality data to say that that sort of care that we think is important is linked not just to mortality, which in itself is important, but even more so from the patient perspective is now linked to quality of life for patients afterwards, which is what they tell us is most important.

I guess the big caveat and the real limitation, and Xavier has mentioned it, is that this is observational data. So we can't take causality from this. It's an association. Now it's a strong one. We've got pretty good data with a fairly robust and repeated statistical analysis, and we've really looked at this and we think it's a real effect, but we can't say that seeing the geriatrician improves your quality of life. We can say that assessment by a geriatrician is associated with improvement in your quality of life. It's not a randomized trial where we can eliminate all of the unknown confounding factors here. But that criticism is true of [00:27:00] every single observational data set that's ever reviewed by the journal or any journal around the world. So I think we did the best we could, but you can't assume causality here; that is the big message.

No, no, absolutely. But like you say, in terms of the size of the study and the analysis performed, I think it's as good as it gets in terms of the associations you can make from observational data. 

And Matt, if I just pick up on the best practice tariff indicators, were you surprised by the figures? How did you interpret that? And do you think that's changed at all? Obviously this study finished in 2017, or the data did, sorry. Do you think that's changed since then?

Yeah. Well, I mean, in terms of the analysis it's quite helpful. So having half your group in one category achieving best practice is quite useful for the distributions. But yeah, I mean, in a way it's disappointing that only, you know, just over half of the patients achieve all of these quality indicators, or we are achieving [00:28:00] those quality indicators on their behalf, but I wasn't too surprised. And the key thing here, and maybe an implication for future practice, is that you need to change these indicators. You need to tailor them to what really matters to patients. So now we know that for some indicators, for instance having a protocol for admission, everyone achieves that. So that's no longer a discriminator. It doesn't help you anymore if everyone's doing it. I'm not going to say it wasn't useful at the beginning, but now you change it and you improve those. You've got to continually assess these quality indicators, and perhaps more importantly, what is a sensible and useful quality indicator that improves quality of life in the UK may be very different in other parts of the world. So in middle-income countries, the processes of delivery are very different. So having a target of less than 36 hours to surgery from admission makes a lot of sense in the context of the UK. But if, in a low- or middle-income country, you've got a seven-day delay before you get to hospital, the 36 hours after you arrive doesn't make a huge amount of [00:29:00] sense in that context. You've got to tailor your quality standards to your health care system; that is perhaps the big message for all the people around the world who are listening to this podcast and reading the paper.

Yeah, no, absolutely. I think that was something I took from the paper is that actually this data can be useful, like you say, these indicators can be dynamic and they can continuously evolve. And like you say, drop some off if they just become standard almost. Everybody's doing it so it doesn't really matter. And then adding others in that we think can really make an impact.

Xavier, if I just come back to you maybe to finish off, I mean, one of the things you do mention in the study and the discussion, which I think is really interesting because obviously this is about, you know, it's almost pay-related best practice tariff, but what do we know about the cost-effectiveness of these indicators? Is there any work out there and is that an area for future research do you think? 

So, unfortunately our study wasn't really able to address that and so we have been quite up front about that in the limitations. We would have liked to have done that, of [00:30:00] course. Cost-effectiveness is a big part of how we plan our NHS service delivery, globally and in the UK.

Why didn't we do it? Well, because it's actually quite complicated. So we've got an outcome for the patient. We've got the EQ-5D and the utility, and we'd like to know how much this costs for those patients. But the issue is that, as everyone knows, the service is built for all the patients, and having an orthogeriatrician or having a joint protocol and all the work that is wrapped around that, the costs for providing it are not really administered on an individual patient basis. They are administered at the service level. And so it's a hugely important question, and I think it's a valuable thing to tackle next. But it was really outside the scope of what we were able to do here within the WHiTE cohort.

Sure. No, absolutely. Well guys, that's all we have time for actually, but thank you so much for both of you taking the time to join us and congratulations on a really interesting and [00:31:00] excellent study. And obviously the WHiTE studies as a whole. It was great to have you both with us. 

Thanks, Andrew.

And to our listeners, we do hope you've enjoyed joining us. And we encourage you to share your thoughts and comments through social media and the like. Feel free to tweet or post about anything we've discussed here today. And thanks again for joining us. Take care everyone.