Reliability of Hallux Rigidus Radiographic Grading System

Introduction. The purpose of this study was to determine the inter- and intra-observer reliability of a clinical radiographic scale for hallux rigidus.
Methods. A total of 80 patients were retrospectively selected from the patient population of two foot and ankle orthopaedic surgeons. Each corresponding series of radiographic images (weight-bearing anteroposterior, weight-bearing lateral, and oblique of the foot) was randomized and evaluated. Four orthopaedic foot and ankle surgeons graded each patient. The images were then re-randomized and renumbered, and each rater reclassified them three weeks later.
Results. Sixty-one out of 80 patients (76%) were included in this study. For intra-observer reliability, three of the four raters showed "excellent" agreement and one rater showed "substantial" agreement. For inter-observer reliability, only 14 out of 61 cases (23%) showed total agreement among the eight readings from the four surgeons, and 11 out of the 14 cases (79%) were grade 3 hallux rigidus. One of the raters tended to assign higher grades, which resulted in poorer agreement. When this rater was excluded, the results demonstrated "substantial" agreement with this classification.
Conclusion. The hallux rigidus radiographic grading system should be used with caution. Although there is an "excellent" level of intra-observer agreement, there is only a "moderate" to "substantial" level of inter-observer reliability.
KS J Med 2015;8(4):125-134.

Introduction
Osteoarthrosis of the first metatarsophalangeal (MTP) joint of the foot may cause significant pain, disability, and difficulty wearing footwear. The term hallux rigidus describes a condition commonly associated with degenerative arthritis of the first MTP joint with osteophyte formation, which results in a painful joint with reduced range of motion, especially dorsiflexion.[2][3] Hallux rigidus is a progressive condition and may present in early or late stages with varying degrees of stiffness and osteophytic thickening of the joint. Chronic MTP joint inflammation leads to capsular distention and eventually to a loss of capsular and collateral ligament integrity.
Throughout the literature on foot and ankle disabilities, multiple classification methods for hallux rigidus have been described, based on clinical findings, radiographic findings, or a combination of both. The role of these classification systems is to help a physician choose an appropriate method of treatment and to provide a reasonably precise estimate of the outcome of that treatment. Some researchers have used these classification systems to compare the results of different studies and treatment procedures.[17-26] For a classification system to be useful, it must produce the same results time after time in the hands of any physician or researcher who attempts to use it.[39-48] Beeson et al.[26] performed an exhaustive literature review on hallux rigidus classification systems and found a total of 18 different classification systems, none of which had been studied for reliability.[7-14] Giannini et al.[5] and Coughlin et al.[10] presented a reasonable summary of the various radiographic grading systems. To our knowledge, there has not been a study that specifically addressed the reliability of radiographic grading for hallux rigidus. The purpose of this study was to determine the inter- and intra-observer reliability of a clinical radiographic scale for hallux rigidus.

Methods
Participants. A total of 80 patients were selected retrospectively from the patient population of two orthopaedic surgeons who specialized in foot and ankle surgery in a Midwestern city. The study sample was selected based on three radiographs (weight-bearing anterior-posterior (AP), oblique, and lateral) of patients who were diagnosed with hallux rigidus. Patients with poor-quality or inadequate radiographs, or with evidence of prior surgery, were excluded. Patients with inter-metatarsal angles greater than 15 degrees (normal is 9 degrees) or hallux valgus angles greater than 20 degrees (normal is 15 degrees) also were excluded.
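The exclusion criteria above can be written as a simple screening rule. The following is a minimal sketch only: the record layout and field names are hypothetical, and the study itself screened radiographs by hand rather than with software.

```python
from dataclasses import dataclass

# Thresholds taken from the exclusion criteria stated in the text.
MAX_INTERMETATARSAL_ANGLE_DEG = 15.0
MAX_HALLUX_VALGUS_ANGLE_DEG = 20.0


@dataclass
class Candidate:
    """Hypothetical record for one screened patient (field names are assumptions)."""
    patient_id: str
    intermetatarsal_angle_deg: float
    hallux_valgus_angle_deg: float
    prior_surgery: bool
    radiographs_adequate: bool  # all three views present and of readable quality


def is_eligible(c: Candidate) -> bool:
    """Return True if the candidate passes every stated exclusion criterion."""
    if not c.radiographs_adequate or c.prior_surgery:
        return False
    # "Greater than" the threshold is excluded, so the threshold itself is allowed.
    if c.intermetatarsal_angle_deg > MAX_INTERMETATARSAL_ANGLE_DEG:
        return False
    if c.hallux_valgus_angle_deg > MAX_HALLUX_VALGUS_ANGLE_DEG:
        return False
    return True
```

A cohort would then be filtered with `[c for c in candidates if is_eligible(c)]`.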
Instruments and Procedures. This study was approved by the Human Subjects Committees as minimal risk, with a waiver of consent and a waiver of HIPAA authorization. For each selected patient, the three hard-copy radiographs that had been used for clinical decision-making were obtained. These included weight-bearing AP, weight-bearing lateral, and oblique views of the hallux. The radiographs were de-identified, enhanced, and converted to black and white using Kodak EasyShare software (Version 8.2, Kodak, Rochester, NY). Each corresponding series of radiographic images (weight-bearing AP, weight-bearing lateral, and oblique) was randomized, given a number, and recorded on a CD-ROM disk of images.
Inter- and intra-observer reliability in classifying hallux rigidus was assessed by adjusting the observed proportion of agreement among observers for the proportion of agreement expected by chance.
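The chance correction described here is the idea behind Cohen's Kappa. As a reference sketch in its standard unweighted form (the symbols below are notation introduced for illustration and do not appear in the original paper):

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \sum_{i} p_{ii},
\qquad
p_e = \sum_{i} p_{i+}\, p_{+i}
```

Here $p_{ij}$ is the proportion of cases graded $i$ in one reading and $j$ in the other, $p_o$ is the observed proportion of agreement, and $p_e$ is the agreement expected by chance from the marginal proportions $p_{i+}$ and $p_{+i}$. A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance.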
To evaluate inter-observer variability, four attending orthopaedic surgeons who were trained in foot and ankle surgery were asked to classify the group of radiographic images independently according to the Giannini-modified Coughlin and Shurnas' classification system (Table 1). Each attending orthopaedic surgeon was given a packet that contained descriptions and diagrams of Giannini's modification of Coughlin and Shurnas' grading system,[5] a score sheet, a CD-ROM disk of radiographic images, and a return mail envelope. To evaluate intra-observer reliability, each rater scored the images in two rounds three weeks apart, with the radiographic images re-randomized and re-numbered between rounds.
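The two-round blinding procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the function names, the seeds, and the idea of keeping a per-round key from image-set number back to patient are all hypothetical.

```python
import random


def randomize_series(patient_ids, seed):
    """Shuffle patients and return a key: image-set number (1-based) -> patient ID.

    The seed is an assumption added so that the key can be regenerated.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return {number: pid for number, pid in enumerate(shuffled, start=1)}


def pair_readings(key_round1, grades_round1, key_round2, grades_round2):
    """Align one rater's two rounds of grades by patient ID.

    Each grades dict maps image-set number -> grade for that round.
    Returns patient ID -> (round 1 grade, round 2 grade).
    """
    by_patient_1 = {key_round1[n]: g for n, g in grades_round1.items()}
    by_patient_2 = {key_round2[n]: g for n, g in grades_round2.items()}
    return {pid: (by_patient_1[pid], by_patient_2[pid]) for pid in by_patient_1}
```

Because each round uses a different numbering, a rater cannot match an image set to the grade given three weeks earlier; only the study coordinator, who holds both keys, can pair the readings.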

Table 1. Giannini-modified Coughlin and Shurnas' radiographic grading system for hallux rigidus.
Grade 0: Normal
Grade 1: Dorsal osteophyte is main finding, minimal joint space narrowing, minimal periarticular sclerosis, minimal flattening of the metatarsal head with a lateral spur
Grade 2: Dorsal, lateral, and possibly medial osteophytes with a flattened appearance of the metatarsal head, no more than ¼ of dorsal joint space involved on the lateral radiograph, and mild to moderate joint space narrowing and sclerosis, sesamoids usually not involved
Grade 3: Substantial joint space narrowing, periarticular cystic changes, more than ¼ of dorsal joint space involved, sesamoids are enlarged, cystic, and/or irregular

Statistics. The inter- and intra-observer reliability for classifying hallux rigidus was calculated with weighted Kappa coefficients using SPSS software (Version 16.0; SPSS Inc., Chicago, IL). According to guidelines described by Landis and Koch,[49] a value of ≤ 0.20 indicates "poor" or "slight" agreement, 0.21 to 0.40 is "fair" agreement, 0.41 to 0.60 is "moderate" agreement, 0.61 to 0.80 is "substantial" agreement, and > 0.80 is "excellent" agreement. In addition, the percentage of patients for whom all four examiners agreed on the grade was determined.
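For readers who want to reproduce this kind of analysis without SPSS, a weighted Kappa for two sets of ordinal grades can be sketched as follows. This is an illustrative implementation, not the authors' computation: the paper does not state whether linear or quadratic weights were used, so the `weights` parameter and its default are assumptions, as are the function names. The interpretation bands follow the thresholds quoted above.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories=4, weights="linear"):
    """Weighted Cohen's Kappa for two equal-length lists of grades 0..n_categories-1.

    The weighting scheme is an assumption; the source does not specify it.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k, n = n_categories, len(ratings_a)
    if any(r not in range(k) for r in list(ratings_a) + list(ratings_b)):
        raise ValueError("grades must be integers in 0..n_categories-1")

    # Observed joint proportions of (grade from A, grade from B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    expected_dis = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    if expected_dis == 0:
        raise ValueError("Kappa is undefined when both raters use one identical grade")
    return 1.0 - observed_dis / expected_dis


def interpret_kappa(kappa):
    """Map a Kappa value to the agreement labels quoted in the text."""
    if kappa <= 0.20:
        return "poor or slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "excellent"
```

Intra-observer reliability would compare one rater's first and second rounds, while inter-observer reliability would compare pairs of raters. With weighted Kappa, a disagreement of one grade is penalized less than a disagreement of two or three grades, which suits an ordinal scale such as Grades 0 to 3.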

Results
Of the 80 patients diagnosed with hallux rigidus from the two foot and ankle surgeons' patient populations, 61 patients (76%) met the required criteria and were included in this study. For intra-observer reliability, most of the attending surgeons showed "excellent" agreement when using the Giannini-modified Coughlin and Shurnas' classification system to grade hallux rigidus (mean weighted Kappa coefficient: 0.82 ± 0.07; range: 0.72-0.88; Table 2). These results imply that each rater agreed well with his or her own earlier readings when grading the same radiographs at different time points. Only one of the raters had "substantial" agreement (weighted Kappa coefficient of 0.72). For inter-observer reliability, only 14 out of the 61 cases (23%) showed total agreement among the eight readings from the four surgeons, and 11 out of the 14 cases (79%) were grade 3 hallux rigidus. Figures 1 and 2 illustrate "excellent" agreement cases for Grade 2 and Grade 3 hallux rigidus, respectively. Most of the cases (53 out of 61, 87%) showed agreement within one grade, and the mean weighted Kappa was 0.64 ± 0.13 (range: 0.44-0.83). Figure 3 shows an example of poor agreement. One of the raters had a tendency to grade the hallux rigidus radiographs at a higher grade than the other three raters, resulting in poorer agreement. When this rater was excluded, the results showed "substantial" agreement with this classification (mean weighted Kappa coefficient: 0.76 ± 0.06; range: 0.68-0.83).
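The case-level counts reported above (total agreement and agreement within one grade) can be derived from the raw readings as sketched below. The function names are hypothetical, and the sketch assumes that "within one grade" means the highest and lowest of a case's eight readings differ by at most one grade, which the text does not state explicitly.

```python
def percent(part, whole):
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * part / whole)


def summarize_agreement(readings_by_case):
    """Count cases with identical readings and cases whose readings span <= 1 grade.

    readings_by_case maps a case ID to its list of grades
    (eight per case in this study: four raters, two rounds each).
    """
    total = len(readings_by_case)
    exact = sum(1 for g in readings_by_case.values() if max(g) == min(g))
    within_one = sum(1 for g in readings_by_case.values() if max(g) - min(g) <= 1)
    return {
        "cases": total,
        "exact": exact,
        "within_one": within_one,
        "exact_pct": percent(exact, total),
        "within_one_pct": percent(within_one, total),
    }
```

As a check on the reported figures, `percent(61, 80)`, `percent(14, 61)`, `percent(11, 14)`, and `percent(53, 61)` reproduce the 76%, 23%, 79%, and 87% given in the text.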

Discussion
Hallux rigidus is a common form of osteoarthrosis in the foot.[50] Radiographic examination, including weight-bearing AP and lateral radiographs, usually finds asymmetric joint narrowing and a flattened metatarsal head. The lateral radiographs usually are the most revealing. With advancement of the disease, more of the joint surface is involved. Subchondral cysts, sclerosis, and bony proliferation at the joint margins occur and the joint narrowing progresses.[19,31,51] With the use of a radiographic grading system, orthopaedic surgeons should be able to provide optimum care to patients who have these common acquired disorders of the foot.
The Giannini-modified Coughlin and Shurnas' classification system, like all other classification systems, is intended to aid clinical decision-making for treatment as well as to provide a reasonably precise estimate of the treatment outcome for hallux rigidus. Many other hallux rigidus classification systems exist, and they are very similar to each other. This study used the Giannini-modified Coughlin and Shurnas' classification system because it is widely referenced in the literature. However, this classification system relies on radiographic findings alone, without regard to subjective and clinical findings. To be useful, a classification should have at least moderate rater consistency. The results of this study indicated that this particular grading system should be used with caution, as only 75% of the raters (three of four) reached "excellent" intra-observer agreement, and inter-observer agreement was only "moderate" to "substantial". The practical utility of a system with high intra-observer reliability alone is questionable, and it likely would not help communication between physicians or researchers regarding the population in their studies.
A major point of concern with a radiograph-only system for hallux rigidus is that radiographs are only one part of the evaluation of a patient with hallux rigidus. Coughlin et al.[10] addressed this concern and included a fourth category for patients with pain in the midrange of motion (a clinical finding) and grade 3 radiographic changes. An ideal study of this subject would have both a radiographic and a clinical exam component, which would reproduce the clinician's experience treating this disorder more closely. The logistics of such a study likely would be difficult.
To achieve optimal results, surgical treatment should be individualized with use of different surgical techniques depending upon the degree of arthritis and other clinical considerations.
Non-operative treatment, including modifications of shoe wear, use of a shoe insert, and use of anti-inflammatory medication, should be discussed in detail with the patient in accordance with the degree of symptoms.[10,52] If non-operative measures fail, operative intervention, such as arthrodesis, arthroplasty, cheilectomy, proximal phalanx osteotomy, dorsal closing wedge osteotomy, Waterman-Green, Youngswick, Reverdin-Green, distal oblique sliding osteotomy, sagittal Z osteotomy, and Drago procedures, may be indicated.[53] Cheilectomy, which essentially consists of a debridement arthroplasty of the joint, may be appropriate.[54,55] Once more extensive involvement has occurred, arthrodesis is preferred for younger patients, whereas resection arthroplasty may be more appropriate for elderly patients who have a less active lifestyle.[56] Taranow et al.[57] recently presented a different classification system and surgical algorithm for treatment of the varied manifestations of hallux rigidus. This classification includes radiographic findings, motion restriction, and location of pain to better guide surgical choices. They also recommended procedures to preserve motion, when present, and to address the significance of mid-motion and sesamoid pain.
This study had several limitations. First, this was a pilot study that addressed an area where further research is needed. The sample size was relatively small, and patients were drawn only from the practices of two local foot and ankle surgeons. In addition, only four raters were included, so the bias of a single outlier could affect the inter-observer reliability substantially. Furthermore, each rater evaluated the hallux rigidus radiographs on only two occasions.
This study also was limited by the presence of fewer "normal" than "abnormal" radiographs. It was a retrospective study evaluating a single radiographic classification system. Further research should include a larger sample size and multiple foot and ankle surgeons, and patients should be followed prospectively to assess the validity of the classification system with respect to treatment outcome and to establish guidelines that would allow orthopaedists to allocate their treatment more efficiently.

Conclusion
To our knowledge, this study was the first to evaluate the reliability of any hallux rigidus radiographic grading system. Overall, this hallux rigidus radiographic grading system should be used with caution: although there is an "excellent" level of intra-observer agreement, there is only a "moderate" to "substantial" level of inter-observer reliability. As is common with many orthopaedic grading systems, the overall reliability of this grading system was not "excellent", and thus it may cause confusion in communication in the literature regarding the treatment of hallux rigidus. Further studies are encouraged and needed to support the conclusion of this study.