Our aim is to determine the interobserver reliability for surgeons to detect Hill-Sachs lesions on magnetic resonance imaging (MRI), the certainty of judgement, and the effects of surgeon characteristics on agreement.
MRIs of twenty-nine patients with Hill-Sachs lesions or other lesions with a similar appearance were presented to 20 surgeons without any patient characteristics. The surgeons answered questions on the presence of Hill-Sachs lesions and the certainty of their diagnosis. Interobserver agreement was assessed using Fleiss' kappa (κ) and the percentage of agreement. Agreement between subgroups of surgeons was compared using a technique similar to the pairwise t-test for means, based on a large-sample linear approximation of Fleiss' kappa, with Bonferroni correction.
The agreement between surgeons in detecting Hill-Sachs lesions on MRI was fair (69% agreement; κ, 0.304; p<0.001). In 84% of the cases, surgeons were certain or highly certain about the presence of a Hill-Sachs lesion.
Although surgeons reported high levels of certainty for their ability to detect Hill-Sachs lesions, there was only a fair amount of agreement between surgeons in detecting Hill-Sachs lesions on MRI. This indicates that clear criteria for defining Hill-Sachs lesions are lacking, which hampers accurate diagnosis and can compromise treatment.
During anterior shoulder dislocation, the head of the humerus can be pressed against the antero-inferior part of the glenoid rim and cause an impression fracture of the posterior superior lateral humeral head, known as a Hill-Sachs lesion [
A Hill-Sachs lesion can be detected on radiographic imaging, but computed tomography (CT) and magnetic resonance imaging (MRI) are more sensitive [
This gap in the literature is critical, as discordant diagnoses among healthcare professionals can adversely affect patient care and recovery. If reliability is low, healthcare providers do not agree on the presence of Hill-Sachs lesions, and patients with (and without) Hill-Sachs lesions may be diagnosed and treated differently depending on the surgeon. Additionally, the reported incidence of Hill-Sachs lesions varies in the literature, largely due to differences in clinical judgement. We were specifically interested in the radiological judgement of the treating surgeon rather than that of an expert radiologist, because surgeons always assess MRIs before discussing treatment options with the patient.
Halma et al. [
This is the fourth study on this important topic, and we aimed to provide further insight into the role of MRI as a diagnostic instrument for surgeons. Specifically, we aimed to determine: (1) the interobserver reliability for surgeons to detect Hill-Sachs lesions on MRI, (2) the certainty of surgeons regarding their judgement, and (3) the effects of surgeon characteristics on agreement. To achieve this, we asked a sizable group of surgeons with varying levels of expertise to assess multiple MRIs with and without Hill-Sachs lesions, with no additional patient characteristics for context. We hypothesized that agreement would be fair, that certainty would be high, and that agreement would increase with level of expertise.
This study has been approved by the Institutional Review Board of the OLVG Hospital (IRB No. WO 16.052).
Our hospital database was screened for available shoulder MRIs of patients with shoulder instability based on diagnosis codes. The medical records of these patients were manually screened by two researchers (HA and AS) for MRIs with Hill-Sachs lesions (n=19) or other defects with a similar appearance (n=10). These other defects were visible at the typical location for a Hill-Sachs lesion but were not Hill-Sachs lesions as reported by the musculoskeletal radiologist. Such lesions included bone cysts, cartilage erosion, small grooves, and the bare area of the humeral head [
The MRI results were uploaded to a secure online survey platform (
We did not provide any patient characteristics in order to isolate and assess the role of the MRI, which is just one of the available diagnostic tools. Because age, sex, and a history of recurrent instability can predispose the assessor toward a Hill-Sachs or alternative diagnosis in regular clinical practice, withholding this information allowed the research question to be assessed based purely on the MRI.
Sample size was based on expert opinion and on the numbers of MRIs and respondents in previous studies [
Interobserver variability was determined using Fleiss' kappa, a statistical measure of agreement among a fixed number of three or more observers. The kappa (κ) value is interpreted as poor (<0), slight (0.01–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), or almost perfect (0.81–1.00) agreement. Kappa values were calculated for each MRI and overall, indicating the extent to which surgeons agreed on the presence or absence of a Hill-Sachs lesion. All surgeon characteristics were presented as absolute numbers and percentages, and surgeons were grouped according to these characteristics. A technique similar to the classical pairwise t-test for means, based on a large-sample linear approximation of Fleiss' kappa, was used to test differences in interobserver agreement [
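For readers who wish to reproduce the analysis, Fleiss' kappa for a subjects-by-categories table of rating counts can be computed from the standard formula as follows. This is a generic sketch, not the authors' code; the example counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] = number of observers who assigned subject i to category j;
    every subject must be rated by the same number of observers n.
    """
    N = len(counts)            # number of subjects (MRIs)
    n = sum(counts[0])         # observers per subject (constant)
    k = len(counts[0])         # number of categories
    # category proportions across all ratings
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # per-subject observed agreement
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N                 # mean observed agreement
    P_e = sum(pj * pj for pj in p)       # chance agreement
    return (P_bar - P_e) / (1 - P_e)

def interpret(kappa):
    """Interpretation bands used in the text (Landis-Koch style)."""
    for upper, label in [(0.0, "poor"), (0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial"),
                         (1.0, "almost perfect")]:
        if kappa <= upper:
            return label

# Hypothetical example: 20 observers judging 2 MRIs as Hill-Sachs yes/no;
# MRI 1 rated 14 yes / 6 no, MRI 2 rated 5 yes / 15 no.
k = fleiss_kappa([[14, 6], [5, 15]])
print(round(k, 3), interpret(k))  # prints: 0.161 slight
```

Note that the overall study kappa of 0.304 falls in the "fair" band under this scheme.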
We invited 106 surgeons in total, of whom 20 completed the survey (19%). The majority were employed in Europe and specialized in shoulder and elbow surgery. Among the three surgeons with another specialty, two specialized in orthopedic traumatology.
The observer answers are summarized in
Responses for evaluating the presence of a Hill-Sachs lesion indicated that 32% of answers were very certain, 52% were certain, 16% expressed some doubt, and 0% were very uncertain.
Surgeons with 11–20 years of experience had better agreement than surgeons with 6–10 years of experience (11–20 years: 90% agreement; κ=0.703 vs. 6–10 years: 66% agreement; κ=0.235, p=0.005). Having 0–5 years of experience did not influence agreement in comparison with 6–10 years (71% agreement, κ=0.363 vs. 66% agreement, κ=0.235, p=0.046) or 11–20 years (71% agreement; κ=0.363 vs. 90% agreement; κ=0.703, p=0.05). Country of specialty, shoulder and elbow specialty, and involvement in resident or fellowship training did not affect the level of agreement within subgroups of surgeons, as detailed in
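The subgroup comparisons above can be sketched with a large-sample z-test on the difference between two independent kappas, Bonferroni-adjusted. The variance formula below (the Fleiss–Nee–Landis approximation) is an assumption on our part, since the exact procedure of the cited method is not reproduced in the text, and the count tables are hypothetical.

```python
from math import erf, sqrt

def kappa_and_se(counts):
    """Fleiss' kappa with a large-sample standard error
    (Fleiss-Nee-Landis approximation; an assumption, see lead-in)."""
    N, n, k = len(counts), sum(counts[0]), len(counts[0])
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    q = [1 - pj for pj in p]
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in counts) / N
    P_e = sum(pj * pj for pj in p)
    kappa = (P_bar - P_e) / (1 - P_e)
    spq = sum(pj * qj for pj, qj in zip(p, q))
    var = 2 * (spq ** 2 - sum(pj * qj * (qj - pj)
                              for pj, qj in zip(p, q))) \
          / (N * n * (n - 1) * spq ** 2)
    return kappa, sqrt(var)

def compare_kappas(counts_a, counts_b, n_comparisons=1):
    """Two-sided z-test for two independent kappas, Bonferroni-adjusted."""
    ka, sa = kappa_and_se(counts_a)
    kb, sb = kappa_and_se(counts_b)
    z = (ka - kb) / sqrt(sa ** 2 + sb ** 2)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return ka, kb, min(1.0, p_value * n_comparisons)
```

With three experience subgroups, `n_comparisons=3` mirrors the Bonferroni adjustment described in the methods.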
This study showed fair interobserver reliability to detect Hill-Sachs lesions on MRI, indicating that MRI alone should be interpreted with caution in clinical decision making. Although the surgeons were mostly (84%) certain or very certain regarding their decision about the presence of a Hill-Sachs lesion, the degree of agreement between surgeons in detecting a Hill-Sachs lesion on MRI was only fair. In this sample of 20 surgeons, agreement was not affected consistently by surgeon’s country of specialty, years of experience, specialty, or fellowship training.
The fair agreement on the presence of Hill-Sachs lesions could be attributed to differences in interpretation of the transition zone between cartilage and bone. Lack of cartilage can have the same appearance as an impression fracture and could be mistaken for a Hill-Sachs lesion, or vice versa. Moreover, the articular surface of the humeral head is smallest in the superior-posterior segment, which is the typical location of a Hill-Sachs lesion [
The fact that the two surgeons with 11–20 years of experience had better agreement when assessing the presence of a Hill-Sachs lesion supports the value of subspecialization. Our results show slightly higher agreement among surgeons with less than 5 years of experience than among those with 6–10 years, but both levels of agreement were fair, with a difference of only 5%, which limits the clinical relevance of this finding. The combination of fair agreement with a high level of confidence about the presence of a Hill-Sachs lesion indicates that surgeons cannot rely on their personal sense of certainty for these diagnostic and treatment decisions.
We included a representative mix of MRIs, consisting of smaller and larger Hill-Sachs lesions as well as lesions of similar appearance, to simulate the clinical setting. Adding these similar-appearing lesions likely limits agreement between surgeons, but we deemed their inclusion important for adequately assessing agreement, as these cases provide a relevant simulation of the clinical population. Agreement varied across individual MRIs, ranging from poor to good, but the overall agreement was fair. We think the overall agreement best represents the clinical setting, which does not consist only of cases in which lesions are easily distinguished from one another.
There are some limitations to interpreting the results of this study. First, our response rate was only 19%, which may limit the generalizability of our data to all surgeons. Second, we did not confirm the Hill-Sachs lesions by arthroscopy. However, the accuracy of MRI and its correlation with arthroscopic findings have been documented in previous studies [
Another limitation is that we considered the surgeons' years of experience rather than the volume of shoulder and elbow procedures they had performed. Years of experience might be biased, as young, subspecialized shoulder surgeons may perform many more shoulder procedures than older surgeons with a wider scope of interest. Finally, some of the MRIs were performed with intra-articular contrast. To our knowledge, there is no known difference in the assessment of Hill-Sachs lesions between MRIs with and without contrast.
A strength of this study is that a widely used interobserver agreement measure (kappa) was used to assess the degree of consensus between surgeons regarding the presence and treatment of Hill-Sachs lesions, augmented with the percentage of agreement, which is easier to interpret. Moreover, we assessed consensus based on MRIs, which are the most commonly used modality to detect pathology causing glenohumeral instability [
Future research could address the observed disagreements by evaluating and defining the criteria surgeons use to diagnose Hill-Sachs lesions. These criteria could then be considered for inclusion in guideline development. Furthermore, an important and trending topic is evaluating the most reliable measurement of glenoid and humeral bone loss [
Shoulder and Elbow Center (collaborators): Gregory R. Waryasz; Matthijs R. Krijnen; Pierre Mansat; Sven A.F. Tulner; Christian M. Fortanier; Carola F. van Eck; Ruud P. van Hove; Christiaan J.A. van Bergen; John N. Trantalis; Paul Hoogervorst; Tjarco D.W. Alta; Guus J.M. Janus; Alexander van Tongel; Diederik J.W. Meijer; Ronald N. Wessel; Mark Schnetzke; John Cheung; Derek F.P. van Deurzen.
None.
None.
Examples of magnetic resonance imaging (MRI) included in this study. (A) An MRI of a shoulder with a large Hill-Sachs lesion. (B) An MRI of a shoulder with a small Hill-Sachs lesion. (C) An MRI with intra-articular contrast of a shoulder with a small Hill-Sachs lesion.
Studies since 2000 that have assessed MRI accuracy and reliability
Study | Observers | No. of MRIs in which Hill-Sachs lesions were studied | Kappa | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|---|
Beason et al. (2019) [ | 22 Shoulder/sports medicine fellowship-trained orthopedic surgeons | 20 | 0.33 | - | - |
Halma et al. (2012) [ | 2 Radiologists (R1, R2), 1 orthopedic surgeon (OS) | 50 | R1 vs. R2: 0.21; R1 vs. OS: 0.31; R2 vs. OS: –0.01 | 0–33 | 72.3–95.7 |
Saqib et al. (2017) [ | Radiologist | 194 | - | 91 | 91 |
Kalson et al. (2011) [ | Shoulder radiologist or musculoskeletal radiologist | 95 | - | 71 | 85 |
Hayes et al. (2010) [ | 2 Radiologists | 87 | - | 96.3 | 90.6 |
Theodoropoulos et al. (2010) [ | Community-based radiologists; fellowship-trained radiologists | 238 | Unenhanced MRI: 1; MR arthrogram: 0.788 | Unenhanced MRI: 0; MR arthrogram: 50–70 | Unenhanced MRI: 85–100; MR arthrogram: 99–100 |
Probyn et al. (2007) [ | Musculoskeletal radiologist or musculoskeletal imaging fellow | 15 | - | 100 | - (only 1 patient did not have a Hill-Sachs lesion) |
van Grinsven et al. (2007) [ | 2 Radiologists | 61 | 0.45 | - | - |
Kirkley et al. (2003) [ | 2 Musculoskeletal radiologists | 16 | - | 100 | 100 |
van Grinsven et al. (2015) [ | 4 Radiologists and 4 orthopedic surgeons (2 teams of radiologists and 2 of orthopedic surgeons) | 45 | Between radiologists: 0.51 and 0.46; between orthopedic surgeons: 0.46 and 0.41 | 41.1–73.8 | 81.3–88.5 |
Chauvin et al. (2013) [ | 3 Radiologists with experience in musculoskeletal disorders | 66 | - | 100 | 94 |
Mahmoud et al. (2013) [ | 2 Musculoskeletal radiologists | 31 | - | 81.8 | 95.2 |
O'Brien et al. (2012) [ | 2 Musculoskeletal radiologists | 165 | 0.964 | - | - |
Simão et al. (2012) [ | 3 Radiologists | 56 | 0.64 | 100 | 78 |
Results per MRI
MRI | Hill-Sachs present: Yes (%) | Hill-Sachs present: No (%) | Certainty: Very uncertain (%) | Certainty: Some doubts (%) | Certainty: Certain (%) | Certainty: Absolutely certain (%) |
---|---|---|---|---|---|---|
1 | 90 | 10 | 0 | 0 | 45 | 55 |
2 | 45 | 55 | 0 | 15 | 45 | 40 |
3 | 90 | 10 | 0 | 5 | 70 | 25 |
4 | 95 | 5 | 0 | 15 | 50 | 35 |
5 | 5 | 95 | 0 | 5 | 55 | 40 |
6 | 80 | 20 | 0 | 20 | 40 | 40 |
7 | 90 | 10 | 0 | 15 | 30 | 55 |
8 | 30 | 70 | 0 | 25 | 45 | 30 |
9 | 90 | 10 | 0 | 5 | 60 | 35 |
10 | 60 | 40 | 5 | 25 | 50 | 20 |
11 | 80 | 20 | 0 | 15 | 40 | 45 |
12 | 60 | 40 | 0 | 30 | 40 | 30 |
13 | 55 | 45 | 0 | 20 | 55 | 25 |
14 | 20 | 80 | 0 | 20 | 65 | 15 |
15 | 30 | 70 | 0 | 15 | 65 | 20 |
16 | 75 | 25 | 0 | 25 | 45 | 30 |
17 | 100 | 0 | 0 | 10 | 40 | 50 |
18 | 55 | 45 | 0 | 35 | 50 | 15 |
19 | 90 | 10 | 0 | 15 | 45 | 40 |
20 | 60 | 40 | 0 | 35 | 45 | 20 |
21 | 95 | 5 | 0 | 0 | 55 | 45 |
22 | 90 | 10 | 0 | 10 | 60 | 30 |
23 | 75 | 25 | 0 | 15 | 60 | 25 |
24 | 5 | 95 | 0 | 5 | 75 | 20 |
25 | 70 | 30 | 0 | 30 | 40 | 30 |
26 | 50 | 50 | 0 | 15 | 70 | 15 |
27 | 100 | 0 | 0 | 10 | 45 | 45 |
28 | 90 | 10 | 0 | 15 | 45 | 40 |
29 | 60 | 40 | 5 | 10 | 70 | 15 |
MRI: magnetic resonance imaging.
Agreement by surgeon characteristics on presence of Hill-Sachs lesions
Variable | Agreement (%) | Fleiss’ kappa (κ) | p-value |
---|---|---|---|
Country of specialty | | | |
Europe (n=15, 75%) | 70 | 0.323 | 0.863 (vs. United States); 0.067 (vs. other) |
United States (n=2, 10%) | 66 | 0.289 | 0.394 (vs. other) |
Other (n=3, 15%) | 66 | 0.114 | |
Years in practice | | | |
0–5 (n=8, 40%) | 71 | 0.363 | 0.046 (vs. 6–10); 0.050 (vs. 11–20) |
6–10 (n=10, 50%) | 66 | 0.235 | 0.005* (vs. 11–20) |
11–20 (n=2, 10%) | 90 | 0.703 | |
Specialty | | | 0.876 |
Shoulder and elbow surgery (n=17, 85%) | 69 | 0.298 | |
Other (n=3, 15%) | 68 | 0.276 | |
Involved in resident or fellow training | | | 0.172 |
Yes (n=13, 65%) | 67 | 0.259 | |
No (n=7, 35%) | 72 | 0.366 | |

*Statistically significant.