Open Access Research Article

A Nonparametric Random-Effects Meta-Analysis Model for Diagnostic Accuracy Studies with Multiple Thresholds of Quantitative Biomarkers

Aeilko H Zwinderman*

Department of Epidemiology and Data Science, Amsterdam University Medical Center, Netherland

Corresponding Author

Received Date:March 14, 2023;  Published Date:May 30, 2023

Abstract

We introduce a nonparametric random-effects model for the meta-analysis of a series of diagnostic accuracy studies of a quantitative biomarker that reported sensitivity-and specificity-values at multiple thresholds of the biomarker. The model is based on the observed numbers of cases and controls in between cutoff-values of the biomarker.

Observed numbers of cases and controls within studies were modeled using multinomial-normal or multinomial-dirichlet distributions. Parameters of our new model were Estimated using an MCMC-algorithm in a Bayesian framework. We provide code to run our method within the R statistical software. With our new model two example datasets were analyzed and compared to the method implemented in the R package diagmeta.

With two example datasets our approach gave comparable results as the method implemented in diagmeta. Results are illustrated using estimated density and cumulative distribution function of the biomarker and associated credibility intervals. Transformations thereof such as ROCcurve, area under the ROC curve, and the Youden-index curve are also estimated together with credibility intervals.

We developed a new model for meta-analysis of diagnostic studies evaluating multiple thresholds of a quantitative biomarker. The new model provided comparable results as an existing method but with less assumptions.

Keywords:Diagnostic test; Meta-analysis; Multiple thresholds

Abbreviations:AUC: Area Under the Curve; CI: Common random Intercept; CS: Common random Slope; DICS: Different random Intercepts and Common random Slope; DIDS: Different random Intercepts and Different random Slopes; ELF: enhanced liver fibrosis; FENO: fractional exhaled nitric oxide; HA: hyaluronic acid; JAGS: Just Another Gibbs Sampler; MCMC: Multi Chain Monte Carlo; PIIINP: Procollagen III N-terminal Propeptide; ROC: Receiver Operator Curve; TIMP: tissue inhibitor of metalloproteinase-1

Introduction

Steinhauser et al [1] developed a method for the meta-analysis of multiple diagnostic test accuracy studies of a quantitative biomarker that reported sensitivity- and specificity-values at multiple thresholds of the biomarker. Their method is implemented in a user-friendly and free [2] and is very quick, depending a little bit on choice of the specific model and size of the data. Steinhauser et al made specific choices for the analysis model which validity may be questioned for some biomarkers. Their first choice concerned the distributions of the biomarker, which was assumed to be either normal or logistic. Obviously, this assumption may not be valid in particular cases. Second choice concerned the application of standard linear mixedeffects regression models of transformed sensitivity- and specificity- values on the biomarker-thresholds. The assumption about the linear relationship between transformed sensitivity and specificity and thresholds need not be correct. Lastly, the residuals of the models for transformed sensitivities/specificities associated with different thresholds in the same study (i.e., esi and fsi in the Steinhauser paper) were assumed to be independent. This assumption is peculiar because different realisations of the sensitivities/specificities in the same study are almost certainly correlated (with increasing threshold, sensitivity will -necessarily- decrease and specificity will similarly increase). A final concern is that the method is based on the normal distribution approximation of the binomial distribution of the numbers of true positives and true negatives (with the usual 0.5-trick when sensitivity or specificity is zero or 1). Steinhauser et al did point to this concern but did not as yet implement the binomial distribution of sensitivity/specificity in their R-package.

To address these issues we extended the work of Steinhauser et al for data of the type that was considered by Steinhauser et al. We suggest to use a nonparametric method based on the multinomial distribution. In the next sections we elaborate the type of (meta-) data we aim to analyse, we specify our statistical model and estimation method. To illustrate we show results of two data sets; a new set concerning results on the enhanced liver fibrosis (ELF) biomarker that might be useful to diagnose advanced liver fibrosis and a set that was published by [4] on diagnostic accuracy of fractional exhaled nitric oxide for the diagnosis of asthma. We finish with a discussion section.

Materials and Methods

Type of data

We consider a series of N diagnostic studies of a quantitative biomarker Y. In study i there were included ni1 ‘cases’ and ni0 ‘controls’. In the report of study i the sensitivities and specificities to distinguish cases and controls were reported for ki different cutoff-values (ki≥1):ξi1i2,....,ξiki Hence, for cutoff-value ξij the observed sensitivity- and specificity-values were seij and spij (see Table 1 with example data). From these reported results we calculated in every study the observed numbers of cases and controls with biomarker-values in between cutoff-values: yi1j=ni1(se-1-seij) is the number of cases with ξij-1< Y ≤ ξij and yi0j=ni0(spi,j+1-spij)is the number of controls with ,ξi,j-1< Y≤ ξij. Sensitivity sei0 and specificity spi,ki+1 were defined as zero and could therefore be considered to be associated with cutoff-values ξi0=-∞ and ξi,ki+1 = ∞ , or any biological or theoretical lower- and upperbound of Y .

So, for study i we translated published results into a vector of observed numbers of cases in the ki+1 categories of Y in study i , denoted as yi1=(yi10,...,yi1ki) and a vector of observed numbers of controls in the same ki + 1 categories of Y , denoted as yi0=(yi00,...,yi0ki). It is important to stress that in general the number of cutoff-values,ki , (likely) varies between studies and also that the specific cutoff-values themselves, ξi1,....,ξi,ki , vary between studies. It is also important to stress that the biomarker Y is represented in this way as a categorical variable through the cutoff-values, and that the ki+1 categories can be considered as ordered

Statistical models

We considered the distributions of the vectors yi1 and yi0 to be multinomials given the numbers of cases and controls:

irispublishers-openaccess-biostatistics-biometric-applications

irispublishers-openaccess-biostatistics-biometric-applications
are vectors of length ki+1 of probabilities:  πi1j and  πi0j are the probabilities that a random case or control in study i had biomarker value ξij-1< Y ≤ ξij Notice that in this model the covariance between two realisations yi1 j and yiil in the group of cases in study i equals −πi1j πi1l/ni1 (and similarly in the group of controls), and therefore the asymptotic covariance between for instance sei1i1 and sei2i1i2 is equal to sei1(1-sei2)/ni1 − se n , which is not necessarily zero (as is assumed in the diagmeta-package by Steinhauser et al).

irispublishers-openaccess-biostatistics-biometric-applications

There are several options to further model πi10i11,...,πi1m and πi00i00,...,πi0m and we explored two ways, namely by mixing the multinomial distributions with a multivariate normal or with the Dirichlet distribution. For the multinomial-normal mixture we first transformed πi10,πi11,...,πi1m and πi00i00,...,πi0m using the soft-max transformation as:

irispublishers-openaccess-biostatistics-biometric-applications

(and ai10 = ai00 = 0 ), and then we assumed that the vector of parameters of study ωi=(ai11,....,ai1m,ai01,...,ai0m), was a random draw from the multivariate normal distribution with mean μ (of length 2m) and covariance matrix Σ (of dimensions 2m). With this assumption our model is a random-effects meta- analysis model. A fixed-effects model is obtained by assuming Σ = 0 , or actually by constraining ai1j =a1j and aioj=aoj for all i .

This model is an extension of the bivariate meta-analysis model. If there is only one cutoff-value (θ ) , then πi10i00i11 are the probabilities of false negative, true negative, true positive and false positive outcomes in study i , respectively, and i11 a and i01 a are the logit-transformed sensitivity and 1-specificity of study i . In such case the parameter μ represents the average logit-transformed sensitivity and 1-specificity in the population of diagnostic accuracy studies of this biomarker [3].

Like the bivariate model, our extended model can be generalized with (study-specific) covariates, and given the ordinal character of the biomarker categories it seems sensible to use a ordinal logistic model, such as for instance a proportional odds model:

irispublishers-openaccess-biostatistics-biometric-applications

where xi is the observed value of a vector of covariates of study i . The vector of covariates may be different for cases and controls and the values of the covariates may even differ between biomarker- categories. The regression coefficients β1i and β0i may also vary over categories of the biomarker, but then the ordinal character of the biomarker-categories is not used in the statistical modeling. It may be useful too to assume that these regression coefficients are fixed parameters and not vary over studies (i.e.βi1 = β1 and βi0 = β0 for all studies). For now we will consider the situation without covariates.

irispublishers-openaccess-biostatistics-biometric-applications

where B(…) are beta-functions. This mixture model is also a random-effects meta-analysis model. Notice that the study-specific probabilities of cases (πi1j) and of controls (πi0j) were supposed to be drawn from different Dirichlet distributions. This was chosen because of ease of computations, but is not absolutely necessary. It is different from the multinomial-normal mixture that is specified above, where ai0j and ai0j were sampled from a single multivariate normal distribution, but that choice can be amended too by drawing ai1j and ai0j from different normal distributions. The multinomial- Dirichlet mixture model can also be generalized to include covariates (by modeling α1j and αoj as functions of covariates x ), but here we only considered the model without covariates.

Estimation

We first considered the multinomial-normal mixture model. Given ωi=(ai1,ai0) the conditional log likelihood of (yi1,yi0) equals

irispublishers-openaccess-biostatistics-biometric-applications

where

irispublishers-openaccess-biostatistics-biometric-applications
is the multivariate normal distribution function with mean and covariance matrix Σ . Summing contributions
irispublishers-openaccess-biostatistics-biometric-applications
over all N studies gives the total marginal log-likelihood.

The integral in the marginal likelihood is multidimensional of dimension 2m and has no analytical solution. Optimizing the total log marginal likelihood is difficult if m is large, therefore based solely on convenience-arguments we decided to use MCMC-methods in a Bayesian framework to estimate the parameters of interest, i.e., ( ) 0 1 μ = π ,π andΣ (where 0 π and 1 πare the vectors of category- probabilities averaged over studies for controls and cases, respectively). In this framework we used as hyperprior distributions

irispublishers-openaccess-biostatistics-biometric-applications

where S was a diagonal symmetric matrix with entries ”0.001”, and R was a diagonal symmetric matrix with entries ”1”. These hyperprior distributions were considered to be (almost) uninformative.

The model was implemented in JAGS [7] and was run through the R-interface of JAGS. Below is the JAGS-syntax in the Additional Material-section. We chose to apply 50.000 burn-in iterations and then checked convergence. When converged we drew 50.000 val ues from the posterior distributions with a thinning parameter of 10 to reduce autocorrelation between sampled values. Afterwards the posterior distributions were graphically illustrated and summarised with the mean, median and 2.5th and 97.5th percentiles. We also derived posterior distributions of transformations of (π 0 ,π1 ) , such as sensitivities, specificities and Youden index values.

Estimation of the multinomial-Dirichlet mixture was slightly easier because integrals involved in the likelihood can be solved analytically. Given ( 10 ,...,  1 ) π i π i ki the conditional likelihood of ( ) 10 1 ,..., i i ki y y equals

irispublishers-openaccess-biostatistics-biometric-applications
irispublishers-openaccess-biostatistics-biometric-applications

A similar total marginal log-likelihood was obtained for the data of the control-subjects in the N studies. These two likelihoods were independent, hence the decision to use two Dirichlet distributions led to separate analyses for the case- and control-data.

Like with the multinomial-normal mixture we used MCMC- methods in a Bayesian frame-work to estimate the parameters of interest in this approach too, i.e.,α00,....,α0m and α10,...,α1m . Convenient is the possibility to obtain posterior distributions of any function of α10,...,α1m and α00,....,α0m, such as averaged category-probabilities for cases and controls, sensitivities, specificities and Youden-index values. In this frame-work we used as hyperprior distributions independent gamma distributions: α1j gamma , and α0j  gamma for all j = 0,..., m).

Results

We analyzed data coming from N = 10 studies on the diagnostic value of the enhanced liver fibrosis (ELF) biomarker for diagnosis of advanced liver fibrosis. ELF is a weighted combination of type III procollagen peptide (PIIINP), hyaluronic acid (HA), and tissue inhibitor of metalloproteinase-1 (TIMP1) measured in blood. The data retrieved from the 10 published papers is summarised in Table 1. Results are available of 28 study-specific cutoffs and m = 24 unique thresholds of the ELF biomarker.

Table 1:Study cutoff values, and observed sensitivities and specificities in the various studies on the diagnostic value of the enhanced live fibrosis (ELF) biomarker.

irispublishers-openaccess-biostatistics-biometric-applications

Using the multinomial-normal mixture approach, estimated averaged category-probabilities (i.e., 1 for cases and 0 for controls) were calculated and those are reported in the top panel of Figure 1: on visual inspection the biomarker did not seem to be normally distributed, especially not in the controls. The empirical cumulative distribution functions are illustrated in the lower panel of Figure 1, separately for the cases and the controls. In these figures every study is represented by colored thin lines, and the averaged cumulative distribution functions are represented by thick black lines.

irispublishers-openaccess-biostatistics-biometric-applications

From the estimated average distribution functions we calculated the average ROC curve and the average Youden index-values for all thresholds. Both are given in Figure 2. Youden index was highest for the cutoff-value Y=9.8 (Youden index=0.49, 95 % CI: 0.437-0.541), but a similar value was found for Y=9.0 (Youden index= 0.488, 95% CI: 0.383-0.606). Estimated averaged sensitivities and specificities for all unique thresh-olds are reported in Table 2.

Table 2:Cutoff values and estimated averaged sensitivities and specificities.

irispublishers-openaccess-biostatistics-biometric-applications
irispublishers-openaccess-biostatistics-biometric-applications

The area under the ROC (AUC) was estimated as 0:816 with 95% credibility inter-val 0:788 0:851. Convergence of the model seemed to be sufficient; as an exam-ple the trace- and density-plots of the AUC-statistic are given in Figure 3. Using the multinomial- Dirichlet mixture we found similar results, although in general with wider credibility intervals; estimated area under the ROC was 0:792 with 95% credibility in-terval 0:700 0:866. Estimated area under the ROC was 0.796 (95% ci: 0.68-0.88) according to the least constrained DIDS model of Steinhauser et al, but the DIDS algorithm did not converge. Only with the simpler CS- and CI-models did their algorithm converge but then the AUC estimates were 0.747 and 0.696, respectively.

irispublishers-openaccess-biostatistics-biometric-applications

As a second example we re-analyzed the data of [4] on the diagnostic accuracy of fractional exhaled nitric oxide (FENO) for the diagnosis of asthma. The data were from 29 studies reporting sensitivity and specificity results of 150 cut-offs, of which 53 thresholds were unique. The data is available on: https://data.mendeley. com/datasets/fndpn5bnps/1.

irispublishers-openaccess-biostatistics-biometric-applications

The random-effects model did not converge within a reasonable time-frame, and therefore we performed a fixed-effects analysis. The results of the multinomial-normal and of the multinomial- Dirichlet mixture models were again very comparable; we report therefore only the results of the multinomial-normal mixture model. Observed and estimated average cumulative distributions of FENO in controls and cases per study are reported in Figure 4 (top panel) and the associated ROC curves are reported in the lower panel. The thin coloured lines are associated with the individual studies, the thick black lines represent averages (together with 95% credibility intervals). Area-under-the-averaged-ROC-curve (AUC) was estimated as 0.750 (95% CI: 0.732-0.767). AUC was estimated as 0.778 (95% ci: 0.710-0.834) according to the DIDS-model of Steinhauser et al [5].

Discussion

We extended the work of Steinhauser et al for the meta-analysis of the results of a series of diagnostic studies of a quantitative biomarker Y where results were reported with sensitivity and specificity values at multiple and varying cutoff-values across the studies. Our model is based on the observed numbers of cases and controls in the various categories of the biomarker Y defined by the cutoff-values used by the different studies and we specified a random- effects, meta-analytic, structure. Our approach gave similar results as Steinhauser et al’s method for the two example datasets that we considered, but that may not be the case in general. Our model requires less assumptions and therefore is likely more robust.

Both our and Steinhauser et al’s method ignore the reasons why the specific cutoff-values were chosen in the studies, so, basically, the cutoff-values were considered to be ‘fixed’. In practice cutoff- values are probably highly coincidental and the sampling-variation underlying these choices is ignored, which likely leads to too small credibility or confidence intervals.

That our approach does not require any distributional assumption for the biomarker is a nice feature, but it has as a downside that statistics (like Youden, sensitivity, specificity) can only be calculated for cutoff-values that have been used in at least one study. This is unlike Steinhauser et al’s method where the underlying parametric model also allows estimation of performance statistics at thresholds that not have been used by any of the included studies. Our model can perhaps be smoothed in order to facilitate interpolation, for instance by penalizing squared differences between adjacent categories (i.e.,πi1j−πi1,j-1)2 − ), but that is outside the scope of the present paper.

Our approach is implemented in the standalone computerprogram JAGS and runs through the rjags-library in R. It is much less fast than the approach of Steinhauser et al, needing hours of computation time (versus seconds by Steinhauser et al’s method). This is consequence of both the nonparametric character of our model and our decision to use a Bayesian estimation algorithm. Perhaps calculations can be sped up by using Hamiltonian MCMC (as implemented in the Stan program), but the large(r) number of parameters in our model that needs to be estimated is the main issue. For the liv er-biomarker example there were 24 different thresholds, meaning that there were 25 categories in the biomarker-distributions in cases and controls. Therefore, there were 2*(25 −1) = 48mean-parameters to be estimated (i.e.,μ ) and 48*(48 −1) 2 =1128 variances and covariances. In Steinhauser et al’s approach there were 4 mean and 4*(4 −1) 2 = 6 (co-)variance parameters in their largest DIDS-model which is best comparable to ours. Moreover, the number of parameters remains the same in their model whereas the number of parameters in our model is a direct function of the num-ber of different cutoff-values reported in the set of studies. It took a few seconds to analyse Schneider et all’s FENO data, whereas it took several hours with our approach.

Computation time of our approach for the liver-biomarker data was about 30 minutes compared to a few seconds for Steinhauser et al’s approach. However, Steinhauser et al’s software reported convergence-problems for most of the complex models. It appeared that correlations between some of the random effects were close to unity. Models can be simplified in Steinhauser et al’s approach but when doing that we found that results were quite varying. We found that AUC’s varied between 0.696/0.747 according to Steinhauser et al’s CI/CS models that converged without warnings to 0.796/0.809 according to the DIDS/DICS-models that failed to converge with singular fits. Because of the Bayesian algorithm that we used, such convergence problems did not arise with our approach. Convergence was fine as judged by trace-plots [6,7].

As described earlier, fixed-effects model-variants are easily formulated by dropping the study-subscript i from the transformed category-parameters i0 j a and i1 j a and actually specifying i1 j 1 j a = a and i0 j 0 j a = a for all i =1,..., N and j =1,..., m. This leads to a minor adaptation of the JAGS-syntax. The results of the fixed-effects model for the two examples were comparable, but the averaged density- and cumulative distribution functions were less smooth (and therefore the averaged ROC-curves too).

Conclusion

We developed a new model for meta-analysis of diagnostic studies evaluating multiple thresholds of a quantitative biomarker.

The new model provided comparable results as an existing method but with less assumptions.

Acknowledgement

None.

Conflict of Interest

The author declares that he has no competing interests.

References

    1. Steinhauser S, Schumacher M, Rucker G (2016) Modelling multiple thresholds in meta-analysis of diagnostic test accuracy studies. BMC Medical Research Methodology 16(1): 97.
    2. Rucker G, Steinhauser S, Kolampally S, Schwarzer G. R-package Diagmeta: Meta-Analysis of Diagnostic Accuracy Studies with Several Cutpoints. Version 0.3-0. https://github.com/guido-s/diagmeta.
    3. Patel PJ, Connoley D, Rhodes F, Srivastava A, Rosenberg W, et al. (2020) A review of the clinical utility of the Enhanced Liver Fibrosis test in multiple aetiologies of chronic liver disease. Ann Clin Biochem 57(1): 36-43.
    4. Schneider A, Linde K, Reitsma JB, Steinhauser S, Rucker G, et al. (2017) A novel statistical model for analyzing data of a systematic review generates optimal cutoff values for fractional exhaled nitric oxide for asthma diagnosis. Journal of Clinical Epidemi-ology 92: 69-78.
    5. Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, et al. (2005) Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 58(10): 982-90.
    6. Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JA, et al. (2007) A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 8(2): 239-51.
    7. Plummer M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling, 2003. Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), March 2022, Vienna, Austria. http://mcmc-jags.sourceforge.net.
Citation
Keywords
Signup for Newsletter
Scroll to Top