
Clinical instrument to retrospectively capture levels of EDSS

Published: December 03, 2019. DOI: https://doi.org/10.1016/j.msard.2019.101884

      Highlights

      • The EDSS is a vital outcome measure in MS, but is only obtained prospectively.
      • CIRCLE was developed to extract a retrospective EDSS level from clinical notes.
      • This work validates the accuracy and precision of CIRCLE compared to a formal EDSS.

      Abstract

      Background

      The Expanded Disability Status Scale (EDSS), a common outcome measure in Multiple Sclerosis (MS), is obtained prospectively through a direct standardized evaluation. The objective of this study is to develop and validate an algorithm to derive EDSS scores from previous neurological clinical documentation.

      Methods

      The algorithm uses data from the history, review of systems, and physical examination. EDSS scores formally obtained from research patients were compared with captured EDSS (c-EDSS) scores. To test inter-rater reliability, a second investigator captured scores from a subset of patients. Agreement between formal and c-EDSS scores was assessed using a weighted kappa. Clinical concordance was defined as a difference of no more than one step in EDSS (0.5) or functional system (1.0) scores.

      Results

      Clinical documentation from 92 patients (EDSS range 0.0–8.5) was assessed. Substantial agreement between the c‐EDSS and formal EDSS (kappa 0.80; 95% CI 0.74–0.86) was observed. The mean difference between scores was 0.16. The clinical concordance was 78%. Near-perfect agreement was found between the two raters (kappa 0.89; 95% CI 0.84–0.95). The mean inter-rater difference in c-EDSS was 0.23.

      Conclusions

      This algorithm reliably captures EDSS scores retrospectively, showing substantial agreement with the formal EDSS and high inter-rater reliability. It may have practical implications in the clinic, in MS research, and in clinical trials.


      1. Background

      Quantifying the degree of disability caused by multiple sclerosis (MS) is a cornerstone of MS research and clinical trials (Cohen et al., Disability outcome measures in multiple sclerosis clinical trials: current status and future prospects). The Expanded Disability Status Scale (EDSS) is the most widely used disability outcome measure for MS and is accepted by regulatory authorities for use in research studies and clinical trials. It is obtained through direct standardized evaluation of the person with MS (pwMS) (Kurtzke, Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS)). Seven Functional Systems (FS) (vision, brainstem, pyramidal, cerebellar, sensory, bowel/bladder, and cerebral) are evaluated and scored. The overall EDSS score, which ranges from 0 to 10 in half-point steps above 1.0, is based on FS scores and ambulatory status (Kurtzke, Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS)).
      Although current guidelines recommend documenting a quantitative measure of disability in the clinical setting (Rae-Grant et al., Quality improvement in neurology: multiple sclerosis quality measures: executive summary), the EDSS is not easily assessed and tracked in clinical practice because it requires specialized training and is complex and time-intensive (Rae-Grant et al., Quality improvement in neurology: multiple sclerosis quality measures: executive summary; Baldassari et al., Streamlined EDSS for use in multiple sclerosis clinical practice: development and cross-sectional comparison to EDSS). Attempts to streamline the EDSS have shown relatively good agreement with formal EDSS scores (Baldassari et al., Streamlined EDSS for use in multiple sclerosis clinical practice: development and cross-sectional comparison to EDSS). Other approaches using patient-derived data (either by written questionnaire or telephone interview) have shown variable agreement with the formal EDSS, particularly at lower scores (Bowen et al., Self-administered expanded disability status scale with functional system scores correlates well with a physician-administered test; Collins et al., A comparative analysis of patient-reported expanded disability status scale tools; Huda et al., Nurse led telephone assessment of expanded disability status scale assessment in MS patients at high levels of disability; Ingram et al., Validity of patient-derived disability and clinical data in multiple sclerosis; Lechner-Scott et al., Can the expanded disability status scale be assessed by telephone?). Retrospective capture of the EDSS based on symptoms recalled by pwMS has been useful primarily for establishing major benchmarks (Ingram et al., Validity of patient-derived disability and clinical data in multiple sclerosis).
      Even with the specialized training often required for the EDSS, its inter-rater variability has been high (Ontaneda et al., Clinical trials in progressive multiple sclerosis: lessons learned and future perspectives; Kappos et al., Siponimod versus placebo in secondary progressive multiple sclerosis (EXPAND): a double-blind, randomised, phase 3 study; Noseworthy et al., Interrater variability with the expanded disability status scale (EDSS) and functional systems (FS) in a multiple sclerosis clinical trial. The Canadian Cooperation MS Study Group; Landis and Koch, The measurement of observer agreement for categorical data; Plemel et al., Remyelination therapies: a new direction and challenge in multiple sclerosis; Amato et al., Interrater reliability in assessing functional systems and disability on the Kurtzke scale in multiple sclerosis; Francis et al., An assessment of disability rating scales used in multiple sclerosis; Goodkin et al., Inter- and intrarater scoring agreement using grades 1.0 to 3.5 of the Kurtzke expanded disability status scale (EDSS). Multiple Sclerosis Collaborative Research Group), especially at lower EDSS levels (Landis and Koch, The measurement of observer agreement for categorical data). The more rigorous Neurostatus EDSS was therefore developed to improve inter-rater reliability (Cohen et al., Disability outcome measures in multiple sclerosis clinical trials: current status and future prospects; Kappos et al., On the origin of Neurostatus), and an algorithmic electronic scoring approach to the Neurostatus has further improved its inter-rater reliability and consistency (D’Souza et al., Neurostatus e-Scoring improves consistency of expanded disability status scale assessments: a proof of concept study).
      To our knowledge, no validated tool is commonly used to retrospectively capture EDSS scores from prior clinical documentation (Cohen et al., Disability outcome measures in multiple sclerosis clinical trials: current status and future prospects; Ontaneda et al., Clinical trials in progressive multiple sclerosis: lessons learned and future perspectives). This gap matters for retrospective chart review studies and for clinical trials that require documented EDSS progression (or confirmed stability) for study entry (Ontaneda et al., Clinical trials in progressive multiple sclerosis: lessons learned and future perspectives; Kappos et al., Siponimod versus placebo in secondary progressive multiple sclerosis (EXPAND): a double-blind, randomised, phase 3 study). We aimed to develop and validate a standardized algorithm to retrospectively derive an EDSS score from neurology clinical documentation, addressing this gap in retrospective disability evaluation.

      2. Methods

      2.1 Development of algorithm

      To develop the Clinical Instrument to Retrospectively Capture Levels of the EDSS (CIRCLE), all available objective and subjective data from a clinical note (including neurologic examination, history, and review of systems) were matched to elements belonging to each FS (Kurtzke, Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS)). Several principles were defined to guide the development of the algorithm:
      • Objective findings from the neurologic examination are prioritized, but subjective data are used when objective information is missing, or for certain FS (e.g., bowel/bladder problems, fatigue).
      • Because worse signs and symptoms produce higher FS scores, the algorithm resolves duplicate data by prioritizing the more severe and disabling abnormalities within each FS and for the overall EDSS (e.g., ambulation is assessed first because restricted ambulation produces a high EDSS score).
      • To address inconsistent or missing data, a default severity of “moderate” is applied when the severity of a symptom or sign is not clear from the records.
      • If no abnormality within an FS is mentioned in the chart, that FS is scored as normal.
      • Optional adjustments are included for comparison with the Neurostatus scoring system (i.e., adjustments to the sensory and bowel/bladder FS, and inclusion of optic disc pallor and fatigue) (Ontaneda et al., Clinical trials in progressive multiple sclerosis: lessons learned and future perspectives).
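      The three data-handling principles above (default to “moderate” when severity is unclear, treat an unmentioned abnormality as normal, keep the most severe of duplicate findings) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CIRCLE scoring sheet itself: the four-level severity vocabulary and the function name `resolve_severity` are hypothetical, and the optional Neurostatus adjustments are not modeled.

```python
from typing import Iterable, Optional

# Hypothetical ordinal severity vocabulary, for illustration only.
SEVERITY_ORDER = ["none", "mild", "moderate", "severe"]


def resolve_severity(mentions: Iterable[Optional[str]]) -> str:
    """Collapse all chart mentions of one abnormality into a single severity.

    - No mention at all -> "none" (the element is treated as normal).
    - A mention with unclear severity (None) -> default of "moderate".
    - Duplicate or conflicting mentions -> the most severe one is kept.
    """
    resolved = [m if m is not None else "moderate" for m in mentions]
    if not resolved:
        return "none"
    # Rank by position in the ordinal vocabulary; unknown labels raise ValueError.
    return max(resolved, key=SEVERITY_ORDER.index)
```

      For example, `resolve_severity(["mild", None])` returns "moderate": the unclear mention defaults to moderate, which outranks the mild one.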

      2.2 Use of algorithm

      Full CIRCLE instructions are available in Appendix A. Briefly, users review the office note and mark all abnormalities on a scoring sheet (Appendix B). In addition to the neurologic examination, users review the subjective and review of systems portions of the note. For example, data on visual acuity, visual field testing and any mention of visual symptoms are pertinent components for the visual FS.
      Using data from the scoring sheet, abnormalities within each FS are considered in a stepwise fashion in order of severity (e.g., paraplegia is considered before abnormal reflexes; see Fig. 1 for an example of scoring the pyramidal FS). When documented findings within an FS differ in severity, the algorithm prioritizes the more severe abnormality (with the exception of ambulation). When no abnormalities are mentioned, the FS score is zero.
      Fig. 1. Example of pyramidal system scoring. The pyramidal FS is scored by first checking whether any muscle weakness is present; more profound weakness, or weakness of more limbs, leads to a higher FS score. If no muscle weakness is present, other symptoms including motor fatigability, gait difficulty, and any disability are assessed; if any are present, the FS is scored as 2. If none of these symptoms are endorsed, signs on examination alone may lead to an FS of 1; otherwise, the FS is scored as 0. Once an FS score is established (denoted by the asterisk), there is no need to proceed further within that FS.
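      The decision order described for the pyramidal FS can be written as a short Python function. This sketch assumes the weakness branch has already been graded elsewhere (the mapping from depth of weakness and number of limbs to a score is given in Appendix A and is not reproduced here), so that grade is passed in as `weakness_fs`; the function and parameter names are hypothetical.

```python
from typing import Optional


def pyramidal_fs(
    weakness_fs: Optional[int],
    motor_fatigability: bool,
    gait_difficulty: bool,
    disability: bool,
    exam_signs_only: bool,
) -> int:
    """Walk the pyramidal decision order from top to bottom, stopping at the first hit.

    weakness_fs: grade already assigned for documented muscle weakness
    (graded per Appendix A, not reproduced here), or None if no weakness.
    """
    # Step 1: weakness present -> its grade stands; stop here.
    if weakness_fs is not None:
        return weakness_fs
    # Step 2: no weakness, but fatigability, gait difficulty, or disability -> 2.
    if motor_fatigability or gait_difficulty or disability:
        return 2
    # Step 3: abnormal signs on examination only (e.g., reflex changes) -> 1.
    if exam_signs_only:
        return 1
    # Step 4: nothing documented -> normal.
    return 0
```

      Returning at the first matching step mirrors the rule that, once an FS score is established, the rest of the branch is skipped.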
      The instrument also generates an ambulation score, allowing direct comparison to be made with Neurostatus ambulation scores. Similar to the derivation of a formal EDSS, FS scores are used to calculate an overall captured EDSS (c-EDSS) score using the EDSS CIRCLE and ambulation table (Fig. 2, Appendix C).
      Fig. 2. EDSS CIRCLE. The EDSS CIRCLE is used to calculate a c-EDSS score by first determining the highest FS score (starting in the innermost ring). The frequency of the highest FS score and of lesser FS scores then determines which final c-EDSS score applies (outermost ring in the corresponding sector). For patients with restricted ambulation, the ambulation table determines the c-EDSS score (Appendix C); individual FS scores may still be calculated for these patients using the scoring sheet (Appendix B).
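      The "highest FS first, then frequency" logic can be sketched for fully ambulatory patients using Kurtzke's published step definitions for EDSS 0 to 3.5. This is an approximation, not a reproduction of the CIRCLE figure: it does not cover FS grades of 4 or more, combinations beyond step 3.5, the ambulation table, or refinements such as the special handling of Cerebral grade 1, and it returns `None` in those cases. The function name `c_edss_from_fs` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def c_edss_from_fs(fs_scores: Sequence[int]) -> Optional[float]:
    """Map FS grades to an EDSS step for fully ambulatory patients (0 to 3.5).

    Returns None when the combination falls outside the steps covered here
    (any FS grade >= 4, or combinations exceeding step 3.5).
    """
    counts = Counter(fs_scores)
    highest = max(fs_scores, default=0)  # start from the highest FS grade
    n2, n3 = counts[2], counts[3]
    if highest == 0:
        return 0.0
    if highest == 1:
        # Minimal signs in one FS vs. in more than one FS.
        return 1.0 if counts[1] == 1 else 1.5
    if highest == 2:
        # Number of grade-2 FS (others 0 or 1); six or more is not covered.
        return {1: 2.0, 2: 2.5, 3: 3.0, 4: 3.0, 5: 3.5}.get(n2)
    if highest == 3:
        if n3 == 1 and n2 == 0:
            return 3.0
        if (n3 == 1 and n2 in (1, 2)) or (n3 == 2 and n2 == 0):
            return 3.5
    return None
```

      For instance, seven FS grades of `[3, 2, 2, 0, 0, 0, 0]` map to 3.5, while any grade of 4 returns `None` and would require the full CIRCLE.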

      2.3 Initial testing

      An initial version of the algorithm was tested in 20 pwMS (from investigator-initiated studies) at the University of Pennsylvania, and the c-EDSS scores were directly compared with formal EDSS scores. Minor adjustments were made to correct systematic score deviations and to compensate for missing data; specifically, the use of subjective complaints when objective findings were lacking was added. The final version of the algorithm is the version tested and validated herein.

      2.4 Patient selection and chart review

      Patients were drawn from a convenience sample of research participants in three investigator-initiated studies at Washington University in St. Louis (WUSM): two longitudinal imaging studies and one study of dalfampridine for vision. All 138 patients were screened for study eligibility. Many of these patients are also managed clinically at the same institution. All pwMS in these studies had formal research evaluations, including formal EDSS scoring, as part of their research protocols, and all were sequentially assessed for inclusion in this analysis.
      For each patient, the clinical documentation most proximate to the date of the formal EDSS was reviewed. PwMS without any clinical documentation, or with only distant documentation (i.e., more than one month before or six months after the formal EDSS), were excluded. We attempted to confirm clinical stability (no worsening disability or relapses) in the interval period. For patients in whom clinical stability could not be reliably confirmed, a sensitivity analysis was performed as detailed below.
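      The note-selection rule can be sketched as below. The day counts standing in for "one month before" (30) and "six months after" (180) are assumptions, as are the function name and the tie-breaking rule; the sketch also assumes that "most proximate" is judged among notes that fall inside the window.

```python
from datetime import date
from typing import Optional, Sequence

DAYS_BEFORE = 30   # assumed length of "one month before"
DAYS_AFTER = 180   # assumed length of "six months after"


def select_note(edss_date: date, note_dates: Sequence[date]) -> Optional[date]:
    """Return the clinical note closest to the formal EDSS within the window.

    Returns None when no note falls inside the window (patient excluded).
    """
    eligible = [
        d for d in note_dates
        if -DAYS_BEFORE <= (d - edss_date).days <= DAYS_AFTER
    ]
    if not eligible:
        return None
    # Most proximate note first; ties go to the earlier note (hypothetical rule).
    return min(eligible, key=lambda d: (abs((d - edss_date).days), d))
```

      With a formal EDSS on 31 January 2019, a note from 1 December 2018 (61 days before) is ignored and a note from 1 March 2019 (29 days after) is selected.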
      The primary rater (JC) reviewed the clinical neurology note and recorded the data into the CIRCLE scoring sheet. To test inter-rater reliability, a second rater (SC) captured scores from the first 50 consecutive eligible pwMS. CIRCLE raters were blinded to the formal EDSS scores and to the other rater's c-EDSS scores.

      2.5 Statistical analysis

      Agreement between the formal EDSS scores and the c-EDSS scores was assessed using a weighted kappa with 95% confidence interval (CI); the same comparison was made for each FS score. Prior studies have suggested that a one-step change in the EDSS or FS is not likely to be clinically significant (Noseworthy et al., Interrater variability with the expanded disability status scale (EDSS) and functional systems (FS) in a multiple sclerosis clinical trial. The Canadian Cooperation MS Study Group). Clinical concordance was therefore defined as a difference of no more than one step in either the EDSS (0.5 points) or an individual FS (1.0 points). Kappas and 95% CIs were also evaluated in a sensitivity analysis to assess the effect of the time interval between the formal EDSS and the clinical visit used for c-EDSS determination. A kappa of 0.00–0.20 is considered slight agreement; 0.21–0.40 fair; 0.41–0.60 moderate; 0.61–0.80 substantial; and 0.81–1.00 almost perfect (Landis and Koch, The measurement of observer agreement for categorical data). Inter-rater agreement was assessed using a weighted kappa and 95% CI for the EDSS score and each FS score, and the clinical concordance measure defined above is reported as well.
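      For readers who want to reproduce the agreement statistics, the following is a minimal, dependency-free sketch of Cohen's weighted kappa together with the Landis and Koch labels. The source does not state which weighting scheme was used, so linear weights are an assumption here; categories are taken as the distinct scores given by either rater, and the 95% CI is not computed. Values below zero are labeled "poor", as in Landis and Koch.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[float], rater_b: Sequence[float]) -> float:
    """Cohen's weighted kappa with linear disagreement weights (assumed)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need non-empty, equally long lists of paired scores")
    # Ordinal categories: every distinct score given by either rater, in order.
    cats = sorted(set(rater_a) | set(rater_b))
    idx = {c: i for i, c in enumerate(cats)}
    k, n = len(cats), len(rater_a)
    # Observed joint proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[idx[a]][idx[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]                       # rater A marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals
    # Linear weights: a disagreement costs |i - j| category steps.
    obs_dis = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        return float("nan")  # a single category was observed: kappa is undefined
    return 1.0 - obs_dis / exp_dis


def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

      Passing the paired c-EDSS and formal EDSS lists to `weighted_kappa` and the result to `landis_koch` yields labels on the same scale used in this paper (e.g., 0.80 is "substantial" and 0.89 is "almost perfect").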
      To maximize the sample size, clinical notes from one month before to six months after the formal EDSS date were included. However, because clinical stability cannot be reliably confirmed for visits before a formal EDSS or for visits relatively distant after it, a sensitivity analysis was also performed: for some pwMS whose clinical notes were relatively distant from the formal EDSS date, or whose clinical stability relative to that date was not confirmed, a c-EDSS from a second clinical note was substituted into the analysis (Fig. 3).
      Fig. 3. Notes included in sensitivity analysis. For patients whose most proximal clinical note was not authored within one week (before or after) of the formal EDSS, the CIRCLE algorithm was applied to a second clinical note, and the results were substituted into a secondary analysis to assess how the interval between note and formal EDSS affected agreement. If the proximal note was recorded one month to one week before the formal EDSS, the algorithm was repeated on a second clinical note from after the formal EDSS (regardless of interval). Similarly, for notes recorded more than one week after the formal EDSS, a second clinical note from before the formal EDSS (regardless of interval) was substituted into the sensitivity analysis; an exception could be made for a proximal note within one month after the formal EDSS that explicitly documented stability.
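      The substitution rule for the sensitivity analysis reduces to a small decision function. In this sketch, `interval_days` is the clinical note date minus the formal EDSS date (negative when the note came first); treating one week as 7 days and one month as 30 days, the function name, and the returned labels are all assumptions for illustration.

```python
WEEK_DAYS = 7    # assumed length of "one week"
MONTH_DAYS = 30  # assumed length of "one month"


def sensitivity_action(interval_days: int, stability_documented: bool = False) -> str:
    """Return how the proximal note is handled in the sensitivity analysis.

    interval_days: clinical note date minus formal EDSS date, in days
    (negative when the note precedes the formal EDSS). Notes more than one
    month before the formal EDSS were already excluded from the study.
    """
    if abs(interval_days) <= WEEK_DAYS:
        return "keep"  # within one week either side: used unchanged
    if interval_days < 0:
        # One month to one week before: substitute a note from after the formal EDSS.
        return "substitute_later_note"
    if interval_days <= MONTH_DAYS and stability_documented:
        # Exception: within one month after, with stability explicitly documented.
        return "keep"
    # More than one week after: substitute a note from before the formal EDSS.
    return "substitute_earlier_note"
```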

      3. Results

      3.1 Patients

      A total of 138 consecutive WUSM study patients with MS were screened for this analysis. After the inclusion criteria were applied, 46 pwMS were excluded, either because they had no usable clinical documentation (3 pwMS) or because their clinical notes fell outside the time criteria (43 pwMS); 92 pwMS were included in the analysis. Thirteen different neurologists authored the clinical documentation on which the c-EDSS scores were based. The clinical note was 44 ± 53 (mean ± standard deviation) days from the date of the formal EDSS. All MS subtypes were included, covering a broad range of disability levels (EDSS 0.0–8.5) (Table 1).
      Table 1. Patient characteristics.
      Number of patients: 92
      Age, years: mean (SD) 49.7 (12.2); range 21.6–74.0
      Interval between formal EDSS and clinical note, days: mean (SD) 44.3 (53.1); range -30 to 172
      Gender, n (%)
        Male: 21 (22.8%)
        Female: 71 (77.2%)
      EDSS, n (%)
        0–2.5: 27 (29.3%)
        3.0–5.5: 35 (38.0%)
        6.0–8.5: 30 (32.6%)
      MS subtype, n (%)
        RRMS: 57 (62.0%)
        PPMS: 15 (16.3%)
        SPMS: 20 (21.7%)

      3.2 Algorithm performance/agreement

      Substantial agreement between the c-EDSS obtained using the algorithm and the formal EDSS (kappa 0.80, 95% CI 0.75–0.86) was achieved (see Table 2, Fig. 4). The mean and median differences between c-EDSS and formal EDSS scores were 0.16 and 0.5, respectively; the mean absolute difference (i.e., the difference between the two scores irrespective of which is higher) was 0.43. In 44 of the 92 pwMS (47.8%), the c-EDSS matched the formal EDSS exactly. When the c-EDSS did not match the formal EDSS, the most frequent difference was 0.5 (30.4%). Agreement for individual FS ranged from fair (kappa 0.37 for Cerebral) to substantial (kappa 0.62 for Pyramidal). When evaluated for clinically relevant concordance (overall EDSS within 0.5 and individual FS within 1.0), the c-EDSS was clinically concordant with the formal EDSS in 78% of pwMS. With the exception of Vision (79%), all individual Functional Systems were clinically concordant at least 80% of the time.
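      The descriptive statistics reported above (mean and median difference, mean absolute difference, exact agreement, and clinical concordance) can be computed with a short helper. This sketch assumes differences are taken as c-EDSS minus formal EDSS; the source does not state the sign convention, nor whether the reported median refers to signed or absolute differences, so the signed median is shown. The function name is hypothetical. Exact-match comparison of floats is safe here because scores in 0.5 steps are exactly representable.

```python
from statistics import mean, median
from typing import Dict, Sequence


def agreement_summary(
    c_edss: Sequence[float], formal: Sequence[float], tolerance: float = 0.5
) -> Dict[str, float]:
    """Descriptive agreement between paired c-EDSS and formal scores.

    tolerance: one step, i.e. 0.5 for the overall EDSS or 1.0 for a single FS.
    """
    if len(c_edss) != len(formal) or len(c_edss) == 0:
        raise ValueError("need non-empty, equally long lists of paired scores")
    diffs = [c - f for c, f in zip(c_edss, formal)]  # signed: c-EDSS minus formal
    n = len(diffs)
    return {
        "mean_difference": mean(diffs),
        "median_difference": median(diffs),
        "mean_absolute_difference": mean([abs(d) for d in diffs]),
        "exact_agreement": sum(d == 0 for d in diffs) / n,
        "clinical_concordance": sum(abs(d) <= tolerance for d in diffs) / n,
    }
```

      For individual Functional Systems, the same helper applies with `tolerance=1.0`.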
      Table 2. Validation of c-EDSS (comparison to formal EDSS).
      Measure | Primary analysis kappa (95% CI) | SE | Exact agreement, n/N (%) | Clinical concordance*, n/N (%) | Sensitivity analysis kappa (95% CI)
      EDSS | 0.80 (0.74–0.86) | 0.03 | 44/92 (47.8%) | 72/92 (78.3%) | 0.79 (0.73–0.85)
      Functional Systems
      Vision | 0.47 (0.33–0.61) | 0.07 | 51/92 (55.4%) | 73/92 (79.3%) | 0.62 (0.50–0.74)
      Brainstem | 0.46 (0.31–0.62) | 0.08 | 60/92 (65.2%) | 78/92 (84.8%) | 0.48 (0.32–0.64)
      Pyramidal | 0.62 (0.51–0.72) | 0.05 | 51/92 (55.4%) | 84/92 (91.3%) | 0.67 (0.57–0.76)
      Cerebellar | 0.50 (0.37–0.64) | 0.07 | 52/92 (56.5%) | 78/92 (84.8%) | 0.58 (0.46–0.70)
      Sensory | 0.47 (0.34–0.60) | 0.07 | 45/92 (48.9%) | 80/92 (87.0%) | 0.40 (0.27–0.53)
      Bowel/Bladder** | 0.44 (0.31–0.58) | 0.07 | 40/91 (44.0%) | 76/91 (83.5%) | 0.44 (0.30–0.57)
      Cerebral | 0.37 (0.20–0.54) | 0.09 | 57/92 (62.0%) | 74/92 (80.4%) | 0.36 (0.20–0.53)
      * Clinical concordance defined as within 0.5 (inclusive) of overall EDSS and within 1.0 (inclusive) of individual Functional Systems.
      ** One Bowel/Bladder FS score missing from source data for formal EDSS.
      Fig. 4. Agreement of c-EDSS and formal EDSS. There is substantial agreement between the c-EDSS obtained using the algorithm and the formal EDSS (kappa 0.80, 95% CI 0.75–0.86). In 47.8% of patients, the c-EDSS and formal EDSS scores agree exactly.
      Of the 92 patients included in the analysis, 40 had a clinical note within one week before or after the formal EDSS. One patient had a note within one month after the formal EDSS that explicitly documented stability, and that note was therefore retained in the sensitivity analysis. For the remaining 51 pwMS, a more distant note from the formal EDSS was evaluated using the CIRCLE algorithm and substituted for the sensitivity analysis. Results of the sensitivity analysis were comparable to those of the primary analysis (kappa 0.79, 95% CI 0.73–0.85), with a mean difference between c-EDSS and formal EDSS of 0.45 (median difference 0.5). Agreement for individual FS scores was similar to the primary analysis (Table 2).

      3.3 Inter-rater agreement

      For the 50 pwMS reviewed by a second investigator, almost perfect agreement between the two raters (kappa 0.89, 95% CI 0.84–0.95) was achieved (see Table 3, Fig. 5). The mean difference in c-EDSS between the two raters was 0.23. In 34 of the 50 pwMS (68%), the two raters arrived at the same c-EDSS (median difference between raters, 0.0). When the two raters did not agree exactly, the most frequent difference was 0.5, the smallest possible incremental difference in EDSS scores. Agreement on individual Functional Systems ranged from kappa 0.56 (Brainstem) to kappa 0.93 (Vision). Although Ambulation scores were not formally obtained in research patients, the algorithm also allows Ambulation to be scored and compared between raters; the two raters agreed almost perfectly in this domain (kappa 0.94). Using the measure of clinically relevant concordance, EDSS scores from the two raters were 92% concordant, and all individual FS scores from the two raters were clinically concordant with each other more than 80% of the time.
      Table 3. Inter-rater comparison of c-EDSS.
      Measure | Kappa (95% CI) | SE | Exact agreement, n/N (%) | Clinical concordance*, n/N (%)
      EDSS | 0.89 (0.84–0.95) | 0.03 | 34/50 (68%) | 46/50 (92%)
      Functional Systems
      Ambulation** | 0.94 (0.88–1.00) | 0.03 | 45/49 (92%) | 46/49 (94%)
      Vision | 0.93 (0.86–1.00) | 0.04 | 46/50 (92%) | 50/50 (100%)
      Brainstem | 0.56 (0.34–0.78) | 0.11 | 36/50 (72%) | 42/50 (84%)
      Pyramidal | 0.83 (0.75–0.91) | 0.04 | 38/50 (76%) | 50/50 (100%)
      Cerebellar | 0.59 (0.40–0.78) | 0.10 | 41/50 (82%) | 41/50 (82%)
      Sensory | 0.73 (0.62–0.85) | 0.06 | 35/50 (70%) | 46/50 (92%)
      Bowel/Bladder | 0.63 (0.46–0.81) | 0.09 | 33/50 (66%) | 49/50 (98%)
      Cerebral | 0.63 (0.41–0.84) | 0.11 | 41/50 (82%) | 43/50 (86%)
      * Clinical concordance defined as within 0.5 (inclusive) of overall EDSS and within 1.0 (inclusive) of individual Functional Systems.
      ** Included in Neurostatus scoring; one Ambulatory score missing from source data for formal EDSS.
      Fig. 5. Inter-rater agreement. There is almost perfect agreement between the c-EDSS scores obtained by two independent raters (kappa 0.89, 95% CI 0.84–0.95). In 68% of patients, the two raters arrived at exactly the same c-EDSS.

      4. Discussion

      This study introduces a new method to accurately obtain EDSS scores from prior clinical documentation. Scores obtained retrospectively with the CIRCLE algorithm showed substantial agreement with formal EDSS scores. Furthermore, inter-rater agreement for CIRCLE was almost perfect, supporting the reproducibility of the algorithm.
      This algorithm may be a useful resource for clinical and research purposes. In clinical practice, it can serve as a template to standardize current and prior neurological evaluations (including subjective and objective findings), allowing comparisons between providers and across time points. A more standardized patient assessment may allow earlier and more accurate detection of subtle signs of progression, which can affect disease management (Rae-Grant et al., Quality improvement in neurology: multiple sclerosis quality measures: executive summary).
      Many opportunities exist for use of the algorithm in research and clinical trials. Chart review studies, which previously lacked a reliable disability outcome measure, can use the algorithm to obtain an accurate assessment of disability from prior notes. Similarly, standardized prospective or retrospective documentation of disability scores can help establish disease subtype more accurately and confirm the presence (or absence) of disability progression, enhancing investigators’ ability to confirm that patients meet eligibility criteria for certain interventional clinical trials (Kappos et al., Siponimod versus placebo in secondary progressive multiple sclerosis (EXPAND): a double-blind, randomised, phase 3 study). With the increasing focus on progressive MS subtypes in clinical trials, and the introduction of more trials focused on repair (Plemel et al., Remyelination therapies: a new direction and challenge in multiple sclerosis), this algorithm may prove a timely tool for improving recruitment into such trials.
      The algorithm performed consistently well in a varied sample of pwMS with different disease subtypes and disability levels, and on notes recorded by 13 different neurologists with varying documentation styles and examination templates. The mean difference of only 0.16 between the c-EDSS and formal EDSS scores suggests that the algorithm does not systematically over- or under-score disability. In addition, the instrument requires only minimal training, and a trained rater can complete the note review and EDSS scoring in a few minutes. Simplifying the approach to the EDSS, removing redundancy, and assessing first the systems and symptoms that lead to higher c-EDSS scores (e.g., ambulation) all contribute to these time savings. The algorithmic backbone of the instrument also allows the FS and EDSS score calculations to be automated. Other groups have suggested the potential for automation and integration of novel, streamlined EDSS tools into electronic medical record (EMR) systems (Baldassari et al., Streamlined EDSS for use in multiple sclerosis clinical practice: development and cross-sectional comparison to EDSS). Our colleagues at Washington University, using EDSS data from the Combi-Rx trial, notably streamlined the EDSS to only the elements that may be seen in a clinical evaluation and still demonstrated moderate agreement with the formally obtained EDSS (kappa 0.57) in a retrospective validation study of nearly 1000 patients (Baldassari et al., Streamlined EDSS for use in multiple sclerosis clinical practice: development and cross-sectional comparison to EDSS). Their work and ours suggest the feasibility of embedding a streamlined, standardized EDSS into EMR systems to simultaneously produce an examination in the office note and automatically calculate an EDSS score for the visit (Rae-Grant et al., Quality improvement in neurology: multiple sclerosis quality measures: executive summary).
      Other attempts to streamline or simplify the EDSS using patient-derived data has been shown to be generally precise, but may be inaccurate; one study demonstrated that patients tended to over-score themselves by 0.5–0.7 on the EDSS on average (56–65% scored within 0.5 of the formal EDSS, and 77–82% scored within 1 point of the formal EDSS) (
      • Bowen J.
      • Gibbons L.
      • Gianas A.
      • Kraft G.H.
      Self-administered expanded disability status scale with functional system scores correlates well with a physician-administered test.
      ). An analysis of this and other scoring systems confirmed that, at EDSS < 6, patient-derived rating scales generally tend to overestimate the EDSS by about 0.5 (
      • Collins C.D.
      • Ivry B.
      • Bowen J.D.
      • Cheng E.M.
      • Dobson R.
      • Goodin D.S.
      • et al.
      A comparative analysis of patient-reported expanded disability status scale tools.
      ). Another group using patient-derived data via written questionnaire or telephone interview was able to achieve substantial correlation with the most current formally-obtained EDSS (coefficient of correlation: 0.79), but attempts to retrospectively capture an EDSS based on the patient's recollection of symptoms were limited to major benchmarks in the stepwise scoring system (i.e., when EDSS = 6 [use of assistive device for ambulation]) rather than the full continuous EDSS scale (
      • Ingram G.
      • Colley E.
      • Ben-Shlomo Y.
      • Cossburn M.
      • Hirst C.L.
      • Pickersgill T.P.
      • et al.
      Validity of patient-derived disability and clinical data in multiple sclerosis.
      ). Other attempts to capture an EDSS over the phone have either been limited to those patients with a baseline EDSS ≥ 6.0 (
      • Huda S.
      • Cavey A.
      • Izat A.
      • Mattison P.
      • Boggild M.
      • Palace J.
      Nurse led telephone assessment of expanded disability status scale assessment in MS patients at high levels of disability.
      ), where steps between EDSS values are explicitly determined by impairment in ambulation, or demonstrated poorer correlation at lower overall EDSS values or select Functional Systems (
      • Lechner-Scott J.
      • Kappos L.
      • Hofman M.
      • Polman C.H.
      • Ronner H.
      • Montalban X.
      • et al.
      Can the expanded disability status scale be assessed by telephone?
      ).
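The studies above describe two different kinds of error in patient-derived scores: a systematic shift (patients over-scoring by 0.5–0.7 on average) and the share of patients who land within a given tolerance of the physician score. The Python sketch below shows one way these two summaries could be computed from paired scores. It is illustrative only: the function name `self_report_agreement` and the example scores are hypothetical and are not data from the cited studies.

```python
from statistics import mean


def self_report_agreement(patient_scores, physician_scores, thresholds=(0.5, 1.0)):
    """Compare patient-derived EDSS scores with physician-obtained scores.

    Returns (bias, within):
      bias:   mean signed difference, patient minus physician;
              a positive value means patients over-score on average.
      within: {threshold: fraction of patients whose absolute
              difference is <= threshold EDSS points}.
    EDSS values are multiples of 0.5, which floats represent exactly,
    so the <= comparisons are not affected by rounding error.
    """
    if len(patient_scores) != len(physician_scores) or not patient_scores:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [p - f for p, f in zip(patient_scores, physician_scores)]
    bias = mean(diffs)
    within = {t: sum(abs(d) <= t for d in diffs) / len(diffs) for t in thresholds}
    return bias, within


# Hypothetical example with four patients (not study data).
bias, within = self_report_agreement([2.0, 3.5, 1.5, 6.0], [1.5, 2.5, 1.5, 4.5])
print(bias, within)  # 0.75 {0.5: 0.5, 1.0: 0.75}
```

Keeping the two numbers separate matters: a tool can have a modest spread of errors yet still sit systematically above the physician score, and only the signed bias reveals that.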
      An important limitation of this study is that it does not address the limitations of the EDSS itself. The inter-rater reliability of the EDSS has known shortcomings, which have been examined in many older studies (
      • D’Souza M.
      • Yaldizli O.
      • John R.
      • Vogt D.R.
      • Papadopoulou A.
      • Lucassen E.
      • et al.
      Neurostatus e-Scoring improves consistency of expanded disability status scale assessments: a proof of concept study.
      ;
      • Noseworthy J.H.
      • Vandervoort M.K.
      • Wong C.J.
      • Ebers G.C.
      Interrater variability with the expanded disability status scale (EDSS) and functional systems (FS) in a multiple sclerosis clinical trial. The Canadian Cooperation MS Study Group.
      ;
      • Amato M.P.
      • Fratiglioni L.
      • Groppi C.
      • Siracusa G.
      • Amaducci L.
      Interrater reliability in assessing functional systems and disability on the Kurtzke scale in multiple sclerosis.
      ;
      • Francis D.A.
      • Bain P.
      • Swan A.V.
      • Hughes R.A.
      An assessment of disability rating scales used in multiple sclerosis.
      ;
      • Goodkin D.E.
      • Cookfair D.
      • Wende K.
      • Bourdette D.
      • Pullicino P.
      • Scherokman B.
      • et al.
      Inter- and intrarater scoring agreement using grades 1.0 to 3.5 of the Kurtzke expanded disability status scale (EDSS). Multiple Sclerosis Collaborative Research Group.
      ;
      • Noseworthy J.H.
      Clinical scoring methods for multiple sclerosis.
      ;
      • Sharrack B.
      • Hughes R.A.
      Clinical scales for multiple sclerosis.
      ;
      • Verdier-Taillefer M.H.
      • Zuber M.
      • Lyon-Caen O.
      • Clanet M.
      • Gout O.
      • Louis C.
      • et al.
      Observer disagreement in rating neurologic impairment in multiple sclerosis: facts and consequences.
      ) (i.e., prior to the development of the Neurostatus EDSS (
      • Kappos L.
      • D’Souza M.
      • Lechner-Scott J.
      • Lienert C.
      On the origin of Neurostatus.
      )). In one trial, only 69% of a subset of patients evaluated by multiple physicians on the same day had perfect agreement on the overall EDSS score, with equivalent or lower degrees of agreement for each FS category (
      • Noseworthy J.H.
      • Vandervoort M.K.
      • Wong C.J.
      • Ebers G.C.
      Interrater variability with the expanded disability status scale (EDSS) and functional systems (FS) in a multiple sclerosis clinical trial. The Canadian Cooperation MS Study Group.
      ); the same trial found that only 62% of the agreement between two raters was attributable to factors other than chance. Reproducible scoring across raters is even more challenging at lower EDSS scores; in another study of patients with EDSS 1.0–3.5, scores from experienced raters differed by as much as 1.5 EDSS points or 3.0 individual FS points (
      • Goodkin D.E.
      • Cookfair D.
      • Wende K.
      • Bourdette D.
      • Pullicino P.
      • Scherokman B.
      • et al.
      Inter- and intrarater scoring agreement using grades 1.0 to 3.5 of the Kurtzke expanded disability status scale (EDSS). Multiple Sclerosis Collaborative Research Group.
      ).
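One common way to express agreement "other than by chance" is Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected if both raters scored independently at their own marginal rates. Because the EDSS is ordinal, a weighted kappa, which gives partial credit to near-misses, is often preferred. The pure-Python sketch below is illustrative: the function names are hypothetical, and offering linear or quadratic weights is an assumption, since the weighting scheme used in this study and in the cited trials is not stated here. `landis_koch` maps a kappa value to the descriptive bands of Landis and Koch (1977), listed in the references.

```python
def cohen_kappa(ratings_1, ratings_2, categories, weights=None):
    """Chance-corrected agreement between two raters (Cohen's kappa).

    categories must list every possible score in ascending order.
    weights=None gives unweighted kappa (every disagreement counts fully);
    weights="linear" or "quadratic" gives weighted kappa, in which a
    disagreement is penalised by its distance along the ordered categories.
    """
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("need two equal-length, non-empty rating lists")
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_1)

    # counts[i][j]: subjects scored category i by rater 1 and j by rater 2.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_1, ratings_2):
        counts[index[a]][index[b]] += 1
    row = [sum(counts[i]) for i in range(k)]                       # rater 1 marginals
    col = [sum(counts[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals

    def penalty(i, j):
        # 0 for exact agreement, rising to 1 for the most distant categories.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(penalty(i, j) * counts[i][j] for i, j in cells) / n
    expected = sum(penalty(i, j) * row[i] * col[j] for i, j in cells) / n**2
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    # Unweighted case: 1 - (1 - p_o) / (1 - p_e) == (p_o - p_e) / (1 - p_e).
    return 1 - observed / expected


def landis_koch(kappa):
    """Descriptive band for a kappa value (Landis & Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings of four patients by two raters (not study data).
k = cohen_kappa([1.0, 1.0, 1.5, 1.5], [1.0, 1.5, 1.5, 1.5], [1.0, 1.5])
print(k, landis_koch(k))  # 0.5 moderate
```

With only two categories the weighted and unweighted forms coincide; the weights start to matter on a multi-level scale such as the EDSS, where a 0.5-point disagreement is far less serious than a 3-point one.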
      The agreement between CIRCLE and formal EDSS scores in our study was at least equal to (if not better than) that reported in prior inter-rater studies. In addition to strong inter-rater reliability, our instrument demonstrates functionally relevant agreement with formal EDSS and FS scores. Defined as no more than a one-step change in EDSS or FS scores (
      • Noseworthy J.H.
      • Vandervoort M.K.
      • Wong C.J.
      • Ebers G.C.
      Interrater variability with the expanded disability status scale (EDSS) and functional systems (FS) in a multiple sclerosis clinical trial. The Canadian Cooperation MS Study Group.
      ), clinical concordance of CIRCLE was at least 80% for the EDSS and nearly all FS scores.
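Clinical concordance asks whether two scores lie no more than one step apart. The Python sketch below shows one way that check could be computed. It counts steps along the ordered list of permitted levels rather than raw point differences. This is our assumption about how "one step" might be operationalised: for the EDSS it coincides with the 0.5-point rule everywhere except between 0.0 and 1.0, which are adjacent levels (the scale has no 0.5) but 1.0 point apart; for integer FS grades one step is 1.0 point. The names `EDSS_LEVELS`, `FS_LEVELS`, `within_one_step`, and `concordance` are hypothetical, and FS grades are taken as 0–6 for illustration even though some functional systems top out at 5.

```python
# Permitted Kurtzke EDSS levels: 0.0, then 1.0 to 10.0 in 0.5 steps (no 0.5 level).
EDSS_LEVELS = [0.0] + [x / 2 for x in range(2, 21)]
# FS grades are integers; 0-6 is used here for illustration only
# (some functional systems have a maximum grade of 5).
FS_LEVELS = list(range(7))


def within_one_step(a, b, levels):
    """True if scores a and b are the same or adjacent levels.

    levels.index raises ValueError for a score that is not a permitted
    level (e.g. an EDSS of 0.5), which doubles as input validation.
    """
    return abs(levels.index(a) - levels.index(b)) <= 1


def concordance(scores_a, scores_b, levels):
    """Fraction of paired scores that lie within one step of each other."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    hits = sum(within_one_step(a, b, levels) for a, b in zip(scores_a, scores_b))
    return hits / len(scores_a)


# Hypothetical captured vs formal EDSS for five patients (not study data).
print(concordance([0.0, 1.0, 2.0, 3.5, 6.5], [1.0, 2.0, 2.0, 2.5, 6.0], EDSS_LEVELS))  # 0.6
```

In this example the pair (0.0, 1.0) counts as concordant because the levels are adjacent, whereas a plain "difference ≤ 0.5" rule would reject it; which convention applies should follow the study protocol.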
      We attempted to use the clinical note closest in time to the formal EDSS and to confirm clinical stability in that note. It is possible that uncaptured relapses, subtle progression, or the potential salutary effect of dalfampridine (18% of patients included in this study were enrolled in a dalfampridine trial) during the interval confounded some of the results (i.e., agreement might have been even higher had all notes been from the same day as the formal EDSS). However, our sensitivity analysis (using a temporally distant note for some patients) produced results similar to those of the primary analysis, suggesting that, in our sample, the interval between the formal evaluation and the clinical note played a negligible role.
      This study validated the instrument at a single academic medical center. Furthermore, most of the notes used in this study were authored by MS specialists, whose histories and exams may focus more on MS-specific abnormalities. However, we believe that the instrument would perform well on any neurology note, because the depth of descriptive information in the notes used in this study varied widely and the algorithm is designed to use all available data within an office note.
      Future directions for this tool may include formal assessment at other clinical institutions and in select patient populations (e.g., low vs. high disability, MS subtype, disease duration). We did not perform subgroup analyses because of the relatively small number of patients in this study and the inherent problems of performing multiple subgroup analyses. Our group plans a contemporaneous comparison of CIRCLE with the traditional EDSS to further confirm the validity of the tool in clinical practice.

      5. Conclusion

      The CIRCLE algorithm is a quick and simple tool that achieved substantial agreement with formal EDSS scores. Its scores fall within a smaller margin than the known inter-rater variability of the original EDSS. The algorithm performed well using clinical documentation from 13 different neurologists, in pwMS of different subtypes, and across a wide range of disability levels (EDSS 0.0–8.5). Using the CIRCLE tool to capture EDSS scores from clinical documents may have implications for the clinical care of pwMS, retrospective research studies, and clinical trials, and the process could potentially be automated.

      Funding

      This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

      CRediT authorship contribution statement

      John Robert Ciotti: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review & editing. Noah Sanders: Conceptualization, Investigation, Writing - review & editing. Amber Salter: Formal analysis, Writing - review & editing. Joseph R. Berger: Writing - review & editing, Supervision. Anne Haney Cross: Writing - review & editing, Supervision. Salim Chahin: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review & editing, Supervision.

      Declaration of Competing Interest

      The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: John Ciotti has nothing to disclose. Noah Sanders has nothing to disclose. Amber Salter reports consulting fees for statistical reviews for Circulation: Cardiovascular Imaging. Joseph Berger reports grants and personal fees from Biogen, grants from TEVA, personal fees from Genentech/Roche, personal fees from Genzyme, personal fees from Millennium/Takeda, personal fees from Novartis, personal fees from Inhibikase, personal fees from ExcisionBio, personal fees from Roche, personal fees from Amgen, personal fees from Astra-Zeneca, personal fees from Alkermes, personal fees from Bayer. Anne Cross reports consulting honoraria from Biogen, Celgene, EMD Serono, Genentech/Roche, Novartis, and TG Therapeutics, and receives research support from Genentech and EMD Serono. Salim Chahin reports consulting and/or speaking honoraria from Biogen, Genentech, Sanofi Genzyme, Novartis, and Teva Neuroscience.

      Acknowledgements

      The authors would like to acknowledge the principal investigators of the studies from which formal EDSS scores were obtained: Drs. Robert Naismith at Washington University in St. Louis and Clyde Markowitz and Dina Jacobs at the University of Pennsylvania.

      Appendix. Supplementary materials

      References

        • Cohen J.A.
        • Reingold S.C.
        • Polman C.H.
        • Wolinsky J.S.
        International Advisory Committee on Clinical Trials in Multiple Sclerosis. Disability outcome measures in multiple sclerosis clinical trials: current status and future prospects.
        Lancet Neurol. 2012; 11: 467-476
        • Kurtzke J.F.
        Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS).
        Neurology. 1983; 33: 1444-1452
        • Rae-Grant A.
        • Bennett A.
        • Sanders A.E.
        • Phipps M.
        • Cheng E.
        • Bever C.
        Quality improvement in neurology: multiple sclerosis quality measures: executive summary.
        Neurology. 2015; 85: 1904-1908
        • Baldassari L.E.
        • Salter A.R.
        • Longbrake E.E.
        • Cross A.H.
        • Naismith R.T.
        Streamlined EDSS for use in multiple sclerosis clinical practice: development and cross-sectional comparison to EDSS.
        Mult. Scler. 2018; 24: 1347-1355
        • Bowen J.
        • Gibbons L.
        • Gianas A.
        • Kraft G.H.
        Self-administered expanded disability status scale with functional system scores correlates well with a physician-administered test.
        Mult. Scler. 2001; 7: 201-206
        • Collins C.D.
        • Ivry B.
        • Bowen J.D.
        • Cheng E.M.
        • Dobson R.
        • Goodin D.S.
        • et al.
        A comparative analysis of patient-reported expanded disability status scale tools.
        Mult. Scler. 2016; 22: 1349-1358
        • Huda S.
        • Cavey A.
        • Izat A.
        • Mattison P.
        • Boggild M.
        • Palace J.
        Nurse led telephone assessment of expanded disability status scale assessment in MS patients at high levels of disability.
        J. Neurol. Sci. 2016; 362: 66-68
        • Ingram G.
        • Colley E.
        • Ben-Shlomo Y.
        • Cossburn M.
        • Hirst C.L.
        • Pickersgill T.P.
        • et al.
        Validity of patient-derived disability and clinical data in multiple sclerosis.
        Mult. Scler. 2010; 16: 472-479
        • Lechner-Scott J.
        • Kappos L.
        • Hofman M.
        • Polman C.H.
        • Ronner H.
        • Montalban X.
        • et al.
        Can the expanded disability status scale be assessed by telephone?
        Mult. Scler. 2003; 9: 154-159
        • Kappos L.
        • D’Souza M.
        • Lechner-Scott J.
        • Lienert C.
        On the origin of Neurostatus.
        Mult. Scler. Relat. Disord. 2015; 4: 182-185
        • D’Souza M.
        • Yaldizli O.
        • John R.
        • Vogt D.R.
        • Papadopoulou A.
        • Lucassen E.
        • et al.
        Neurostatus e-Scoring improves consistency of expanded disability status scale assessments: a proof of concept study.
        Mult. Scler. 2017; 23: 597-603
        • Ontaneda D.
        • Fox R.J.
        • Chataway J.
        Clinical trials in progressive multiple sclerosis: lessons learned and future perspectives.
        Lancet Neurol. 2015; 14: 208-223
        • Kappos L.
        • Bar-Or A.
        • Cree B.A.C.
        • Fox R.J.
        • Giovannoni G.
        • Gold R.
        • et al.
        Siponimod versus placebo in secondary progressive multiple sclerosis (EXPAND): a double-blind, randomised, phase 3 study.
        Lancet. 2018; 391: 1263-1273
        • Noseworthy J.H.
        • Vandervoort M.K.
        • Wong C.J.
        • Ebers G.C.
        Interrater variability with the expanded disability status scale (EDSS) and functional systems (FS) in a multiple sclerosis clinical trial. The Canadian Cooperation MS Study Group.
        Neurology. 1990; 40: 971-975
        • Landis J.R.
        • Koch G.G.
        The measurement of observer agreement for categorical data.
        Biometrics. 1977; 33: 159-174
        • Plemel J.R.
        • Liu W.Q.
        • Yong V.W.
        Remyelination therapies: a new direction and challenge in multiple sclerosis.
        Nat. Rev. Drug Discov. 2017; 16: 617-634
        • Amato M.P.
        • Fratiglioni L.
        • Groppi C.
        • Siracusa G.
        • Amaducci L.
        Interrater reliability in assessing functional systems and disability on the Kurtzke scale in multiple sclerosis.
        Arch. Neurol. 1988; 45: 746-748
        • Francis D.A.
        • Bain P.
        • Swan A.V.
        • Hughes R.A.
        An assessment of disability rating scales used in multiple sclerosis.
        Arch. Neurol. 1991; 48: 299-301
        • Goodkin D.E.
        • Cookfair D.
        • Wende K.
        • Bourdette D.
        • Pullicino P.
        • Scherokman B.
        • et al.
        Inter- and intrarater scoring agreement using grades 1.0 to 3.5 of the Kurtzke expanded disability status scale (EDSS). Multiple Sclerosis Collaborative Research Group.
        Neurology. 1992; 42: 859-863
        • Noseworthy J.H.
        Clinical scoring methods for multiple sclerosis.
        Ann. Neurol. 1994; 36: S80-S85
        • Sharrack B.
        • Hughes R.A.
        Clinical scales for multiple sclerosis.
        J. Neurol. Sci. 1996; 135: 1-9
        • Verdier-Taillefer M.H.
        • Zuber M.
        • Lyon-Caen O.
        • Clanet M.
        • Gout O.
        • Louis C.
        • et al.
        Observer disagreement in rating neurologic impairment in multiple sclerosis: facts and consequences.
        Eur. Neurol. 1991; 31: 117-119