Determining the Minimally Important Change of the Michigan Hand outcomes Questionnaire in patients undergoing trigger finger release

Introduction: The Michigan Hand outcomes Questionnaire (MHQ) is a widely used instrument to evaluate treatment results for hand conditions. Establishing the Minimally Important Change (MIC) is essential for interpreting change in outcome that is clinically relevant. Purpose of the Study: The purpose of this study was to determine the MIC of the MHQ total and subscale scores in patients undergoing trigger finger release. Study Design: This is a prospective cohort study conducted between December 2011 and February 2020. Methods: Patients completed the MHQ prior to surgery and 3 months postoperatively. The MIC of the MHQ was determined using 5 anchor-based methods (ie, 2 anchor mean change methods and 3 receiver operating characteristic methods). The median MIC value was determined to represent the triangulated MIC. Results: A total of 1814 patients were included. The MIC for the MHQ total score ranged from 7.7 to 10.9, with a triangulated estimate of 9.3. The MIC estimates for 5 of 6 of the MHQ subscales ranged from 7.7 to 20.0. No MICs could be determined for the MHQ subscale “aesthetics” due to low correlations between the anchor questions and MHQ change scores. Conclusions: These MIC estimates can contribute to the interpretation of clinical outcomes following trigger finger release and for assessment of power in prospective trials. © 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )


Introduction
In recent years, patient-reported outcome measures (PROMs) have gained increasing interest in clinical research and the evaluation of treatment outcomes. 1 PROMs allow patients to provide direct information about their perceived health outcomes. This enables clinicians to gain insight into the effects of treatment on patients' symptoms and functioning. Yet, to interpret whether a statistically significant change in a PROM is also clinically relevant, the concept of the Minimally Important Change (MIC), defined as "the smallest difference in an outcome measure that patients perceive as important," is essential. 2 , 3 The MIC can be determined using multiple methods, categorized as distribution-based or anchor-based. Distribution-based methods determine the MIC based on the variability in outcome at baseline. In contrast, anchor-based methods determine the MIC based on a separate anchor question in which patients rate the change in outcome on a Likert scale. For example, an anchor question related to change in hand function might be, "Since the intervention, how has your hand function changed?" The MIC is then determined as the change in outcome corresponding to a minimal change on the anchor question. A limitation of distribution-based methods is that the MIC is not based on a patient-reported change in outcome. Therefore, anchor-based methods are the appropriate way to estimate the MIC directly. [4][5][6] Two frequently used anchor-based methods are the anchor mean change (AMC) method and the receiver operating characteristic (ROC) method. As the best methodology for determining the MIC remains a topic of debate, it is recommended to combine MICs from multiple methods to achieve the best estimate. 4 , 5
To evaluate treatment outcomes in hand conditions, the Michigan Hand outcomes Questionnaire (MHQ) 7 is often used and is recommended in standardized outcome sets, such as the one recently proposed by the International Consortium for Health Outcomes Measurement (ICHOM). 8 The MHQ evaluates various aspects of hand function, including overall hand function, activities of daily living, work performance, pain, aesthetics, and satisfaction (with scores ranging from 0 to 100). 7 The MIC of the MHQ has been estimated in 3 studies, [9][10][11] in which the study samples included patients with a variety of hand conditions, including carpal tunnel syndrome, osteoarthritis, Dupuytren's contracture, and tendonitis. Only one study estimated the MIC for the MHQ total score, which was found to be 10.8. 11 The MICs for the subscales ranged from 3 to 33, although none of these studies reported the MIC for the subscale "aesthetics." [9][10][11] Within these mixed populations, MIC estimates for specific subscales have been shown to differ between hand conditions (eg, the MIC for the MHQ subscale pain was found to be 23 in patients with a carpal tunnel release and 3 in patients undergoing silicone MCP arthroplasty). 10 This indicates that MIC values cannot be reliably generalized to other hand conditions, such as trigger finger.
Trigger finger is a highly prevalent condition with a lifetime incidence of 3%. 12 The most common complaints are pain and locking of the finger. Initial management typically consists of nonsurgical treatment (eg, activity modification, splinting, or a steroid injection). 13 In cases of failed nonsurgical treatment, severe locking, or a long symptom duration, surgical trigger finger release (TFR) can be considered. As TFR is one of the most commonly performed procedures of the hand, 14 more insight into relevant change from the patient's perspective is essential to evaluate treatment outcomes adequately. Therefore, the aim of this study was to determine the MIC for the MHQ total score and subscale scores in patients who underwent TFR, using 5 anchor-based methods.

Study design
This is a cohort study, based on data from the Hand and Wrist Study cohort, 15 using prospectively acquired data from a consecutive population-based sample, reported following the STROBE statement. 16

Setting
Between December 2011 and February 2020, data were collected at Xpert Clinic Hand and Wrist Care and Xpert Clinic Hand Therapy as part of routine outcome measurements. 15 Xpert Clinic comprises 26 locations, 23 European Board-certified hand surgeons, and over 150 hand therapists. The local medical ethics review board approved the study. All patients provided written informed consent.

Participants
All patients who were scheduled for TFR were screened for eligibility. Patients were excluded if (1) they were aged < 18 years; (2) there was recurrence following prior TFR; (3) there was a concomitant treatment of the affected hand (eg, TFR of another finger during the same session); (4) there was a subsequent surgical treatment of the same hand < 6 months or the contralateral hand < 3 months; or (5) there was missing data in the MHQ at baseline or 3 months postoperatively. Furthermore, patients with a baseline score of ≥90 on the MHQ total score were excluded from the analysis to prevent ceiling effects ( Fig. 1 ), since this could result in an underestimation of the MIC. 11 For the MHQ subscales, patients with a baseline score ≥90 on a specific MHQ subscale were excluded from the analysis of that subscale. This resulted in different sample sizes per subscale, as patients with a ceiling effect on one subscale may not have a ceiling effect on another.
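The per-scale ceiling exclusion described above can be illustrated with a minimal sketch. The record layout, the function name `eligible_for`, and the example scores are hypothetical; only the cut-off of 90 points comes from the text.

```python
# Hypothetical patient records: baseline scores (0-100) per scale.
# The field names and values are illustrative, not study data.
CEILING = 90  # baseline scores at or above this value are excluded (from the text)


def eligible_for(scale, patients, ceiling=CEILING):
    """Return the patients whose baseline score on `scale` is below the ceiling."""
    return [p for p in patients if p["baseline"][scale] < ceiling]


patients = [
    {"id": 1, "baseline": {"total": 62.0, "pain": 45.0, "aesthetics": 95.0}},
    {"id": 2, "baseline": {"total": 91.0, "pain": 80.0, "aesthetics": 100.0}},
    {"id": 3, "baseline": {"total": 70.0, "pain": 92.0, "aesthetics": 74.0}},
]

# Each scale is filtered independently, so the sample size differs per scale.
sample_sizes = {
    scale: len(eligible_for(scale, patients))
    for scale in ("total", "pain", "aesthetics")
}
print(sample_sizes)
```

Because each scale is filtered on its own baseline score, a patient can be excluded from one subscale analysis and retained in another, which is why the sample sizes differ.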

Treatment
All surgeries were performed by Federation of European Societies for Surgery of the Hand certified hand surgeons. Patients underwent open release of the A1 pulley under local anesthesia following common practice. 17 Patients received standard postoperative care, consisting of a bandage for 3 days and hand therapy. Sutures were removed between 10 and 14 days postoperatively. All patients had a follow-up appointment with their hand surgeon approximately 3 months postoperatively.

Variables and measurements
The MHQ consists of 37 items evaluating 6 subdomains: overall hand function, activities of daily living, work performance, pain, aesthetics, and satisfaction with hand function (score 0-100, higher scores indicate better performance except for the subscale pain). 18 For interpretability, we converted the pain subscale in this study so that higher scores indicate less pain (ie, after conversion, a score of 100 indicates no pain and 0 indicates the worst possible pain). The primary outcome of this study was the MIC of the MHQ total score, whereas MICs of the MHQ subscale scores were secondary outcomes. The MHQ has been evaluated in mixed populations of hand conditions and has good reliability (ie, intraclass correlation coefficient of 0.95 for the MHQ total score and approximately 0.90 for the MHQ subscales) and validity. 3 , 18-21 The MHQ was assessed at baseline and 3 months postoperatively. Additional baseline measurements included age, sex, comorbidity, type of work, hand dominance, treatment side, symptom duration, and prior steroid injection. Furthermore, the question "How is your satisfaction with the treatment result?" was assessed at 3 months postoperatively; this question has recently been shown to have good construct validity and high test-retest reliability. 22

Study size
We found no recommendations in the literature on sample size calculation for determining the MIC. However, we considered the sample of 1814 patients included in this study to be sufficient.

Statistical methods
Missing value analysis on outcomes at 3 months showed a nonsignificant Little's test ( P = .140), suggesting that missing values were missing completely at random. 23 , 24 For further evaluation of missing data at 3 months, demographic characteristics at baseline were compared between patients with and without the primary outcome at 3 months (Supplementary Table 1). No significant differences were found. The MICs for the MHQ total and subscale scores were calculated using 5 anchor-based methods (ie, 2 AMC methods and 3 ROC methods). As it is strongly recommended to use multiple independent anchors, 4 we used 2 measures of satisfaction as anchors, providing an indication of treatment success. The first anchor was based on the question: "How is your satisfaction with the treatment result?" Responses were limited to one of the following items on a Likert scale: "excellent," "good," "fair," "moderate," or "poor." The second anchor was based on the satisfaction domain of the MHQ, in which patients rated their satisfaction with hand function. 18 Since the latter anchor is inherently correlated with the MHQ total score and the MHQ satisfaction subscale, we did not use this anchor when estimating the MICs for these two scores. We performed a Spearman correlation analysis between the MHQ total and subscale scores and the anchors to determine whether the anchors were suitable for further analysis. An anchor was considered suitable if it showed a correlation of ≥0.3 with the MHQ score of the specific subscale. 4

Anchor mean change methods
AMC methods use an anchor question to compare outcomes of patients who respond differently to the anchor question. 25 The first AMC method was conducted using the question "How is your satisfaction with the treatment result?" as an external anchor. The mean change in MHQ score for patients rating their satisfaction as "fair" was taken to represent the MIC. In the second AMC method, the satisfaction domain of the MHQ was used as an anchor question. 9-11 , 26 For this, the MHQ satisfaction score was transformed to a Likert scale by dividing the total score into 5 equal categories of 20 points. The mean change in MHQ score for patients categorized in the group "somewhat satisfied" was taken to represent the MIC.
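As an illustration of the second AMC method, the sketch below bins a 0-100 satisfaction score into five 20-point categories and takes the mean MHQ change in the "somewhat satisfied" category. Several details are assumptions rather than the authors' implementation: the labels of the three lower categories, the handling of scores that fall exactly on a category boundary (upper edges are treated as inclusive here), and the use of the follow-up satisfaction score for binning. The records are invented for illustration.

```python
import math
from statistics import mean

# Only "somewhat satisfied" and "very satisfied" are named in the text;
# the three lower labels are assumptions for illustration.
BANDS = [
    "very dissatisfied",
    "somewhat dissatisfied",
    "neutral",
    "somewhat satisfied",
    "very satisfied",
]


def satisfaction_band(score):
    """Map a 0-100 satisfaction score to one of five 20-point categories.

    Assumption: upper edges are inclusive, so 60 falls in "neutral" and
    any score above 60 up to 80 falls in "somewhat satisfied".
    """
    index = max(0, math.ceil(score / 20) - 1)
    return BANDS[min(index, len(BANDS) - 1)]


def anchor_mean_change(records, target_band):
    """Mean MHQ change (follow-up minus baseline) within one anchor category."""
    changes = [
        r["followup"] - r["baseline"]
        for r in records
        if satisfaction_band(r["satisfaction"]) == target_band
    ]
    return mean(changes)


# Invented example records (not study data).
records = [
    {"baseline": 50.0, "followup": 62.0, "satisfaction": 70.0},  # change 12
    {"baseline": 55.0, "followup": 63.0, "satisfaction": 75.0},  # change 8
    {"baseline": 40.0, "followup": 70.0, "satisfaction": 95.0},  # change 30
    {"baseline": 60.0, "followup": 61.0, "satisfaction": 30.0},  # change 1
]

mic_amc = anchor_mean_change(records, "somewhat satisfied")
print(mic_amc)
```

The first AMC method follows the same pattern, with the external anchor's "fair" response taking the place of the "somewhat satisfied" category.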

Receiver operating characteristic methods
ROC methods examine the diagnostic ability of different MIC thresholds to discriminate between patients classified as satisfied and dissatisfied on the anchor question. 5 , 27 ROC methods require dichotomization of the anchor question, in this case into satisfied or dissatisfied. The first ROC method used the satisfaction with treatment result question. Patients rating their satisfaction as "excellent," "good," or "fair" were classified as satisfied, whereas patients rating their satisfaction as "moderate" or "poor" were classified as dissatisfied. In the second ROC method, the satisfaction domain of the MHQ was used as an anchor. Patients with a score exceeding 60 points (ie, patients rating their satisfaction as "very satisfied" or "somewhat satisfied" on the transformed Likert scale) were classified as satisfied, whereas the remaining patients were classified as dissatisfied. For the third ROC method, the change in the satisfaction domain of the MHQ at 3 months compared to baseline was calculated. Patients demonstrating an effect size of at least 0.5 were classified as satisfied. [9][10][11] For the MHQ total score and its subscales, we evaluated the accuracy of all 3 ROC methods in discriminating between satisfied and dissatisfied patients using the area under the curve (AUC). The minimal threshold for discriminative ability was set at an AUC of 0.75. 28 For scales with sufficient discriminative ability, the change in MHQ score with the greatest combined sensitivity and specificity for identifying satisfied patients was determined using Youden's index and was considered the MIC. 29 , 30

Triangulated estimate of the MIC
It is recommended to triangulate (ie, to create a weighted mean) MICs from multiple methods to achieve a single value or small range of values to represent a best estimate of the MIC. 4 , 5 For triangulation, it is recommended to assign most weight to MICs determined with anchor-based methods and MICs calculated from anchors with most proximity to the target PROM. 4 Since the current study solely used anchor-based methods and both anchor questions are based on satisfaction, equal weight was assigned to all MICs. Therefore, the median MIC was taken to represent the triangulated MIC. Furthermore, we used a distribution plot to provide insight into the distribution of change in MHQ total scores for satisfied and dissatisfied patients in relation to the triangulated MIC. 31 , 32
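The ROC procedure described above (an AUC check followed by a Youden-based cut-off) can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, the data are invented, and the convention that a patient counts as "predicted satisfied" when the change score is at or above the threshold is an assumption. Statistical packages may instead report cut-offs midway between observed values.

```python
AUC_CUTOFF = 0.75  # minimal AUC for discriminative ability (from the text)


def auc(changes, satisfied):
    """AUC as the probability that a satisfied patient shows a larger change
    than a dissatisfied patient (ties count as one half)."""
    pos = [c for c, s in zip(changes, satisfied) if s]
    neg = [c for c, s in zip(changes, satisfied) if not s]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(changes, satisfied):
    """Return (threshold, sensitivity, specificity) for every observed change.

    Assumption: a patient is predicted satisfied when change >= threshold.
    """
    n_pos = sum(1 for s in satisfied if s)
    n_neg = len(satisfied) - n_pos
    points = []
    for t in sorted(set(changes)):
        tp = sum(1 for c, s in zip(changes, satisfied) if s and c >= t)
        tn = sum(1 for c, s in zip(changes, satisfied) if not s and c < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def mic_from_roc(changes, satisfied, auc_cutoff=AUC_CUTOFF):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1,
    or None when the AUC is below the cut-off."""
    if auc(changes, satisfied) < auc_cutoff:
        return None
    best = max(roc_points(changes, satisfied), key=lambda p: p[1] + p[2] - 1)
    return best[0]


# Invented change scores and dichotomized anchor responses (not study data).
changes = [2, 5, 6, 9, 11, 14, 18, 22]
satisfied = [False, False, True, False, True, True, True, True]

mic = mic_from_roc(changes, satisfied)
print(mic, round(auc(changes, satisfied), 3))
```

Returning `None` when the AUC is below 0.75 mirrors the rule that no MIC is reported for scales without sufficient discriminative ability.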

Results
Following application of the eligibility criteria, 1814 patients were included in the primary analysis ( Fig. 1 ). The baseline characteristics and MHQ scores at baseline and follow-up are listed in Tables 1 and 2 , respectively. An overview of the estimated MICs using the 5 methods and the triangulated MICs is shown in Table 3 and Figure 2 . For the MHQ total score, we found a triangulated MIC of 9.3. The distribution plot shows that 66% of patients are correctly classified as satisfied (ie, sensitivity) and that 73% of patients are correctly classified as dissatisfied (ie, specificity) when using a MIC threshold of 9.3 ( Fig. 3 ). Triangulated MICs for the remaining MHQ subscales ranged from 7.7 to 20.0. Yet, we were unable to estimate the MICs for the subscale "aesthetics," due to low correlations between the anchor questions and MHQ scores on this subscale (Supplementary Table 2).
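The anchor suitability criterion (a Spearman correlation of at least 0.3 between the anchor and the MHQ score) explains why no MIC was estimated for the aesthetics subscale. A minimal sketch of such a check is shown below. It assumes the anchor is coded so that higher values mean greater satisfaction and that change scores are the quantity being correlated; the helper names and data are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of average ranks.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def anchor_is_suitable(anchor, change, minimum=0.3):
    """Apply the >= 0.3 correlation criterion from the text."""
    return spearman(anchor, change) >= minimum


# Invented data: anchor coded 1 ("poor") to 5 ("excellent"); not study data.
anchor = [1, 2, 2, 3, 4, 5, 5]
change = [-2, 3, 8, 5, 12, 20, 15]
print(round(spearman(anchor, change), 3), anchor_is_suitable(anchor, change))
```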

Anchor mean change method
Mean change in MHQ total score ( Fig. 4 ) and MHQ subscale scores ( Supplementary Figs. 1 and 2) was higher for groups with increasing satisfaction. For the MHQ total score, we only used the satisfaction with treatment result anchor and found a MIC of 10.9 ( Table 4 ). The MICs for the MHQ subscales ranged from 7.7 to 23.3 using the satisfaction with treatment result anchor. The satisfaction with hand function anchor resulted in MICs ranging from 9.8 to 18.4.

Receiver operating characteristic methods
To estimate the MIC for the MHQ total score, we only used the satisfaction with treatment result anchor. This resulted in an AUC of 0.75 and a MIC of 7.7 ( Table 5 ). No MICs were determined for MHQ subscales without discriminative ability (ie, AUC < 0.75). MICs for the subscales with sufficient discriminative ability ranged from 7.5 to 17.5.
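With equal weights, triangulation reduces to taking the median of the available estimates. For the MHQ total score, the text reports an AMC estimate of 10.9 and a ROC estimate of 7.7, both from the satisfaction with treatment result anchor. Assuming these are the only two estimates that contribute for the total score, the short check below reproduces the triangulated value; the dictionary keys are descriptive labels chosen for illustration.

```python
from statistics import median

# MIC estimates for the MHQ total score as reported in the text.
mic_estimates_total = {
    "AMC, satisfaction with treatment result": 10.9,
    "ROC, satisfaction with treatment result": 7.7,
}

# With an even number of estimates, the median is the mean of the middle two.
triangulated = median(mic_estimates_total.values())
print(round(triangulated, 1))  # 9.3
```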

Discussion
In this study, we found that the triangulated MIC for the MHQ total score was 9.3 (range: 7.7-10.9) in patients surgically treated for trigger finger. Triangulated MICs could be determined for 5 of the 6 MHQ subscales and ranged from 7.7 to 20.0.
Although studies examining the MIC for the MHQ are scarce, our findings are in line with prior literature. One study estimated the MIC for the MHQ total score and found a MIC of 10.8, 11 which is in line with the MIC of 9.3 in the present study. It should be noted, however, that they used the MHQ satisfaction subscale to estimate the MIC. Since the MHQ total score is partially derived from the MHQ satisfaction score, this prompts cautious interpretation of the estimated MIC due to the inherent correlation with the anchor question. Additionally, although it is widely accepted that the MHQ yields acceptable psychometric properties under the classical test theory, 3 , 18-21 a recently published study using Rasch analyses found contrary results. 33 They concluded that the MHQ subscales evaluate different domains of hand function, indicating that it may not be appropriate to sum all 37 items into a total score. Although this does not necessarily imply that the estimated MIC for the MHQ total score is inaccurate, the study by Farzad et al highlights that the MHQ total score should be interpreted with caution.
Considering the MHQ subscales, the estimated MICs are also consistent with prior literature. [9][10][11] For example, in a population with 6 different hand conditions, the triangulated MICs for the MHQ subscales ranged from 10.9 to 21.3. 11 However, the authors of that study were unable to estimate the MIC for the aesthetics subscale due to a low correlation with the anchor question. In the present study, we also found a low correlation between the aesthetics subscale and both anchor questions. A plausible explanation may be that the aesthetics subscale is of less relevance in this specific population, as patients with trigger finger primarily complain of pain and locking. This is also supported by the high baseline scores on this subscale. Specifically, approximately 50% of patients were excluded from the analyses due to a baseline score > 90 points, and the remaining patients still had a mean baseline score of 74 points. Hence, although a strength of the MHQ is that it evaluates various aspects of hand function, these findings suggest that not all subscales are relevant to certain hand conditions and may be subject to ceiling effects. This indicates that clinicians should consider using only certain subscales of the MHQ, depending on the hand condition being evaluated. Furthermore, the MHQ satisfaction subscale was an outlier in both studies, with MIC values of 21.3 in the prior study 11 and 20.0 in the present study. As both studies used an anchor based on satisfaction, one might suggest that these estimated MICs for the MHQ satisfaction subscale are overestimates. Yet, we found a mean change of 28 points on this subscale for the entire group, whereas the mean change on the other MHQ subscales ranged from 9 to 19 points. This indicates that the high MIC for the MHQ satisfaction subscale results from the substantial change in scores on this subscale from baseline to 3 months postoperatively.

Fig. 2. Estimated Minimally Important Changes (MICs) using 5 anchor-based methods and triangulated estimated MICs for the Michigan Hand outcomes Questionnaire total score and subscale scores. The triangulated MICs (shown in black) were calculated as the median of the estimated MICs using the anchor-based methods. The MHQ satisfaction anchors were not used to estimate the MICs for the MHQ total score and satisfaction subscale due to the inherent correlation with these specific subscales. For ROC methods with an area under the curve below 0.75 (ie, minimal threshold for discriminative ability), no MIC could be estimated. No MICs could be estimated for the MHQ subscale aesthetics due to low correlations with the anchor question. Abbreviations: MIC = Minimally Important Change; MHQ = Michigan Hand outcomes Questionnaire; AMC = anchor mean change; ROC = receiver operating characteristic.

The distribution plot of the MHQ total scores highlights that a MIC value may not apply to each individual patient, as the meaningfulness of improvement can vary between individuals. Application of the triangulated MIC for the MHQ total score resulted in a sensitivity of 66%, indicating that 34% of the satisfied patients were misclassified as dissatisfied (ie, not reaching the MIC threshold). Following the same principle, the specificity of 73% indicated that 27% of the dissatisfied patients were misclassified as satisfied by the triangulated MIC. We therefore advocate using MICs primarily for the evaluation of treatment outcomes at the group level. However, when interpreting MIC values at the individual level, a distribution plot may provide guidance, since this plot shows the proportions of satisfied and dissatisfied patients with a specific change in outcome. 32 , 34 For example, a patient with a change in MHQ total score of 20 is most likely satisfied, since the proportion of patients with this change score is almost 5 times higher in the group of satisfied patients than in the group of dissatisfied patients. Hence, these distribution plots may provide insight into change scores related to satisfaction in individual patients.
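The relation between sensitivity, specificity, and the misclassified proportions discussed above can be made concrete with a short sketch. It assumes that a patient is counted as reaching the MIC when the change score is at or above the threshold; the function name and the change scores are invented for illustration. The final lines restate the reported percentages.

```python
def classify_at_threshold(changes, satisfied, mic):
    """Sensitivity and specificity when 'change >= mic' predicts satisfaction."""
    pairs = list(zip(changes, satisfied))
    tp = sum(1 for c, s in pairs if s and c >= mic)
    fn = sum(1 for c, s in pairs if s and c < mic)
    tn = sum(1 for c, s in pairs if not s and c < mic)
    fp = sum(1 for c, s in pairs if not s and c >= mic)
    return tp / (tp + fn), tn / (tn + fp)


# Invented change scores (not study data); first four satisfied, last four not.
changes = [12.0, 20.5, 4.0, 15.0, 3.0, 11.0, -2.0, 8.0]
satisfied = [True, True, True, True, False, False, False, False]

sensitivity, specificity = classify_at_threshold(changes, satisfied, mic=9.3)
print(sensitivity, specificity)

# Reported values for the MHQ total score: the misclassified shares are the
# complements of the sensitivity (66%) and the specificity (73%).
satisfied_below_mic = round(1 - 0.66, 2)     # satisfied but below the MIC
dissatisfied_above_mic = round(1 - 0.73, 2)  # dissatisfied but at/above the MIC
print(satisfied_below_mic, dissatisfied_above_mic)
```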
A strength of this study is the large study sample of 1814 patients. Furthermore, only anchor-based methods were used in this study. This is a strength, since distribution-based methods do not provide information about change in outcome and do not take into account clinically relevant changes from the patient's perspective. In contrast, anchor-based methods determine the MIC based on change in outcome perceived by patients as clinically relevant. Therefore, anchor-based methods may provide a more appropriate estimate of the MIC. 4 , 6 In the present study, we used 5 anchor-based methods to provide a range of MIC estimates, serving as a reference to most accurately estimate the MIC.
However, several limitations of this study should also be considered. First, it is important to acknowledge that dichotomization of satisfaction levels is a relatively arbitrary method. In the present study, patients reporting their satisfaction with treatment result as "fair" were categorized as satisfied. If this category had been classified as dissatisfied, this would probably have led to a higher MIC. Nevertheless, since the results of the different methods were comparable, it is likely that this categorization was a proper reflection of patients' experienced satisfaction. Second, due to the absence of test-retest measurements, the smallest detectable change (SDC) for the MHQ could not be determined. Hence, we were unable to compare our MIC values with SDC scores. However, since random measurement error due to low reliability may cancel out at the group level, SDC values are mainly relevant at the individual patient level, whereas MICs should mainly be used to interpret the clinical relevance of change at the group level. Third, although the prevention of ceiling effects reduces the risk of underestimating the MIC, a substantial number of patients were excluded from the analysis. Additionally, the observational design of this study was associated with a substantial proportion of missing data. Yet, nonresponder analysis suggested that missing data were missing completely at random. Moreover, an advantage of our observational design compared to a more artificial, experimental design is its ecological validity, making the MIC values calculated in this study more generalizable to actual patients undergoing TFR.

Conclusions
This study found a MIC of 9.3 for the MHQ total score in patients surgically treated for trigger finger. The MICs for 5 of 6 of the MHQ subscales ranged from 7.7 to 20.0. These MIC estimates may assist in treatment evaluation following TFR, for example when determining the proportion of patients with a meaningful improvement in comparative effectiveness studies, the development of clinical prediction models, or assessment of power in prospective studies. Future research should determine MIC values for patients with other hand conditions to improve the interpretability of treatment outcomes.