Malignant Lymphomas
1 Department of Scientific and Engineering Simulation, Nagoya Institute of Technology, Nagoya, Japan
2 Division of Molecular Medicine, Aichi Cancer Center Research Institute, Nagoya, Japan
3 Division of Information Engineering, Graduate School of Engineering, Mie University, Tsu, Japan
4 Department of Pathology, State Key Laboratory of Cancer Biology, Xijing Hospital, Fourth Military Medical University, Shaanxi, P.R.China
5 Department of Cancer Genetics, Nagoya University Graduate School of Medicine at Aichi Cancer Center, Nagoya, Japan
Correspondence: Ichiro Takeuchi, Ph.D. Department of Scientific and Engineering Simulation, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, Aichi 466-8555, Japan. E-mail:takeuchi.ichiro{at}nitech.ac.jp
Design and Methods: Data on copy number gains and losses for 46 diffuse large B-cell lymphomas and 29 mantle cell lymphomas were used. Gene expression in the diffuse large B-cell lymphoma cases was profiled, and hierarchical clustering revealed that 28 of them were of the activated B-cell type and 18 were of the germinal center-B-cell type. Using these data, we developed a computer algorithm to classify lymphoma diseases or subtypes on the basis of copy number gains and losses.
Results: The method correctly classified 88% of the diffuse large B-cell lymphomas and mantle cell lymphomas, and 83% of the activated B-cell and germinal center-B-cell subtypes. These results demonstrate that copy number gains and losses detected by array CGH can be used for classifying lymphomas into biologically and clinically distinct diseases or subtypes.
Conclusions: Our computer algorithm based on array CGH data successfully classified diffuse large B-cell lymphomas and mantle cell lymphomas and activated B-cell and germinal center-B-cell subtypes with high accuracy. An important finding is that the regions automatically identified by the computer algorithm were located in the critical regions that are likely to be involved in the development of lymphoma.
Key words: diffuse large B-cell lymphoma, mantle cell lymphoma, array CGH, genome profile, lymphoma classification.
Several recent studies have shown the power of gene-expression analysis for the classification of malignant lymphoma diseases and subtypes.8–12 In these studies, computer algorithms were developed to select differentially expressed genes and use them to construct the classifier. In the study presented here, we examined whether genomic copy number gains and losses detected by array CGH could also be used for the classification of malignant lymphomas and developed a computer algorithm for this purpose. This algorithm is similar to the ones used in gene expression-based classification,11 but is slightly modified to deal with array CGH data. We applied the algorithm to the classification of 75 cases of malignant lymphoma into 46 cases of DLBCL and 29 of MCL, as well as to the further classification of the 46 DLBCL cases into 28 of the activated-B-cell (ABC) subtype and 18 of the germinal center-B-cell (GCB) subtype.4,6
MCL is a single disease entity characterized by the translocation t(11;14)(q13;q32) accompanied by over-expression of CCND1.1 DLBCL is known to be the most common tumor and accounts for 40% of all malignant lymphomas.1 Gene expression analysis of DLBCL has demonstrated that these lymphomas comprise distinct tumor subtypes such as the ABC and GCB subtypes.8 ABC DLBCL is an aggressive lymphoma and the overall survival rate of patients with this subtype is inferior to that of patients with the GCB subtype.8,9 We recently demonstrated that ABC and GCB DLBCL have distinct patterns of genomic alterations.6 However, although we demonstrated that each disease entity has a characteristic pattern of genomic alterations, it was not clear whether the array CGH data could be used for classification, because the alterations vary from case to case among patients with the same disease entity. In the current study, we investigated whether genomic copy number gains and losses detected by array CGH could reliably distinguish different lymphoma diseases (DLBCL and MCL) as well as different subtypes (ABC and GCB). We hypothesized that an analysis of genomic copy number gains and losses would provide useful information for accurate and reproducible diagnosis of malignant lymphomas.
Array comparative genomic hybridization-based classifier
We developed a fully automatic computer algorithm for the array CGH-based classification of lymphoma subtypes. This algorithm is similar to those employed in the classification of malignant lymphomas using gene expression profiles.11 Linear predictor scores were computed for each case on the basis of copy number gains and losses detected by array CGH. The scaling factors (coefficients) of the linear predictor scores were set to the (signed) negative log of the p values obtained with Fisher's exact test. Only those clones with the most significant differences according to Fisher's exact test were used to produce the linear predictor scores, with the optimal number of clones determined empirically (see below). The distribution of the linear predictor scores for each of the two disease entities (DLBCL and MCL) was approximated by a normal distribution. The means and variances of these normal distributions were estimated from the linear predictor scores calculated for the cases with each disease entity. For a new case, we estimated the likelihood of it belonging to each of the disease entities and then classified it by applying Bayes' rule. The formal description of the array CGH-based linear compound Bayes classifier is provided in the Online Supplementary Appendix.
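The steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, whose formal description is in the Online Supplementary Appendix. The sketch assumes that each case is a list of 0/1 flags (1 = gain or loss at that clone), that the two classes have equal prior probabilities, and that a small variance floor is acceptable; all function names (`fisher_one_sided`, `clone_weights`, `fit`, `posterior_class1`) are hypothetical.

```python
import math
from statistics import mean, pvariance

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher p value P(X >= a) for the 2x2 table [[a, b], [c, d]].
    Rows are the two classes; columns are altered / not altered."""
    n1, n2, k = a + b, c + d, a + c
    total = math.comb(n1 + n2, k)
    return sum(math.comb(n1, x) * math.comb(n2, k - x)
               for x in range(a, min(n1, k) + 1)) / total

def clone_weights(X, y):
    """Per clone: (p value, signed -log p). Positive sign means the clone is
    altered more often in class 1, negative sign in class 0."""
    n1 = sum(y)
    n0 = len(y) - n1
    out = []
    for j in range(len(X[0])):
        a = sum(row[j] for row, lab in zip(X, y) if lab == 1)
        c = sum(row[j] for row, lab in zip(X, y) if lab == 0)
        p1 = fisher_one_sided(a, n1 - a, c, n0 - c)
        p0 = fisher_one_sided(c, n0 - c, a, n1 - a)
        out.append((p1, -math.log(p1)) if p1 <= p0 else (p0, math.log(p0)))
    return out

def fit(X, y, n_clones):
    """Keep the n_clones most significant clones and fit one normal
    distribution of linear predictor scores per class."""
    w = clone_weights(X, y)
    top = sorted(range(len(w)), key=lambda j: w[j][0])[:n_clones]
    coef = {j: w[j][1] for j in top}
    scores = [sum(coef[j] * row[j] for j in coef) for row in X]
    params = {}
    for lab in (0, 1):
        s = [sc for sc, l in zip(scores, y) if l == lab]
        # Variance floor (an assumption) avoids a degenerate distribution.
        params[lab] = (mean(s), max(pvariance(s), 1e-6))
    return coef, params

def log_normal_pdf(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def posterior_class1(model, row):
    """Bayes' rule with equal priors (an assumption), computed on the
    log scale so that extreme scores do not overflow."""
    coef, params = model
    s = sum(coef[j] * row[j] for j in coef)
    d = log_normal_pdf(s, *params[1]) - log_normal_pdf(s, *params[0])
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

# Toy data: clone 0 is altered only in class 1, clone 1 only in class 0.
X = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [1, 0, 0],
     [0, 1, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = fit(X, y, n_clones=2)
print(posterior_class1(model, [1, 0, 0]))
```

A case would be assigned to class 1 when the returned posterior probability exceeds 0.5. Working with log densities rather than raw densities is a numerical convenience of this sketch and does not change the decision rule.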
Validation
Leave-one-out cross-validation (LOOCV) was used to estimate the performance of the classifier. As discussed in recent publications,14–16 LOOCV can produce a more reliable measure of classification accuracy than validating the performance with an independent validation set. We also used LOOCV to determine the optimal number of clones used to form linear predictor scores. For this purpose, we used nested-LOOCV with the outer loop to estimate the classification accuracy and the inner loop to determine the optimal number of clones. We also performed classification analyses by dividing the cases into training (60%) and validation (40%) sets. The classifier was then constructed with the training set and tested with the validation set. Results of the classification performances were not significantly different (p=0.05) from those of the LOOCV analyses for both the DLBCL-MCL and the ABC-GCB classifications.
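The nested validation scheme described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the procedure documented in the Online Supplementary Appendix: `fit` and `predict` are placeholders for any classifier, `toy_fit` and `toy_predict` are hypothetical stand-ins that rank clones by the difference in alteration frequency, and ties between candidate clone numbers are resolved in favour of the first candidate listed.

```python
def loocv_accuracy(X, y, fit, predict, k):
    """Plain leave-one-out accuracy for a fixed number of clones k."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], k)
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)

def nested_loocv(X, y, fit, predict, candidate_ks):
    """Outer loop estimates accuracy; inner loop picks k without ever
    seeing the case that the outer loop has left out."""
    hits, chosen = 0, []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # max() returns the first best candidate when accuracies tie.
        best_k = max(candidate_ks,
                     key=lambda k: loocv_accuracy(X_tr, y_tr, fit, predict, k))
        model = fit(X_tr, y_tr, best_k)
        hits += predict(model, X[i]) == y[i]
        chosen.append(best_k)
    return hits / len(X), chosen

def toy_fit(X, y, k):
    """Hypothetical stand-in classifier: weight the k clones with the largest
    difference in alteration frequency between the classes.
    Assumes both classes are present in the training data."""
    n1 = sum(y)
    n0 = len(y) - n1
    diffs = []
    for j in range(len(X[0])):
        f1 = sum(r[j] for r, l in zip(X, y) if l == 1) / n1
        f0 = sum(r[j] for r, l in zip(X, y) if l == 0) / n0
        diffs.append(f1 - f0)
    top = sorted(range(len(diffs)), key=lambda j: -abs(diffs[j]))[:k]
    return {j: diffs[j] for j in top}

def toy_predict(model, row):
    return int(sum(w * row[j] for j, w in model.items()) > 0)

# Toy data: clone 0 is altered only in class 1, clone 1 only in class 0.
X = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [1, 0, 0],
     [0, 1, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
accuracy, chosen_ks = nested_loocv(X, y, toy_fit, toy_predict, [1, 2])
print(accuracy, chosen_ks)
```

The point of the nesting is that the number of clones is re-selected inside every outer fold, so the left-out case influences neither the clone selection nor the fitted classifier.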
The significance of the differences in copy number alterations at each clone was evaluated using Fisher's exact test, the false discovery rate and the family-wise error rate. The last two measures take into account multiple comparisons. We performed 10,000 label permutations to compute the false discovery rate and the family-wise error rate. The validation strategy and the computations for the significance measures are explained in detail in the Online Supplementary Appendix.
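One common way to obtain such permutation-based measures is sketched below. The exact computations used in this study are given in the Online Supplementary Appendix and may differ; the sketch assumes a single-step minimum-p adjustment for the family-wise error rate and a simple permutation estimate of the false discovery rate, and the function names and the `seed` parameter are hypothetical.

```python
import math
import random

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher p value P(X >= a) for the 2x2 table [[a, b], [c, d]]."""
    n1, n2, k = a + b, c + d, a + c
    total = math.comb(n1 + n2, k)
    return sum(math.comb(n1, x) * math.comb(n2, k - x)
               for x in range(a, min(n1, k) + 1)) / total

def clone_pvalues(X, y):
    """Smaller of the two one-sided p values for each clone."""
    n1 = sum(y)
    n0 = len(y) - n1
    ps = []
    for j in range(len(X[0])):
        a = sum(r[j] for r, l in zip(X, y) if l == 1)
        c = sum(r[j] for r, l in zip(X, y) if l == 0)
        ps.append(min(fisher_one_sided(a, n1 - a, c, n0 - c),
                      fisher_one_sided(c, n0 - c, a, n1 - a)))
    return ps

def permutation_error_rates(X, y, n_perm=10000, seed=0):
    """Return observed p values with permutation estimates of the
    family-wise error rate and false discovery rate for each clone."""
    rng = random.Random(seed)
    tol = 1e-12  # guards against rounding when comparing equal p values
    obs = clone_pvalues(X, y)
    fwer_hits = [0] * len(obs)
    null_counts = [0] * len(obs)
    labels = list(y)
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the link between labels and genomes
        null = clone_pvalues(X, labels)
        min_p = min(null)
        for j, p in enumerate(obs):
            fwer_hits[j] += min_p <= p + tol
            null_counts[j] += sum(q <= p + tol for q in null)
    fwer = [h / n_perm for h in fwer_hits]
    fdr = []
    for j, p in enumerate(obs):
        called = sum(q <= p + tol for q in obs)  # at least 1 (the clone itself)
        fdr.append(min(1.0, null_counts[j] / n_perm / called))
    return obs, fwer, fdr

# Toy data: clone 0 is altered only in class 1, clone 1 only in class 0.
X = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [1, 0, 0],
     [0, 1, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
p_obs, fwer, fdr = permutation_error_rates(X, y, n_perm=200)
print(p_obs, fwer, fdr)
```

This pure-Python version is meant only to make the logic explicit; with thousands of clones and 10,000 permutations a vectorized implementation would be needed in practice.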
Figure 1. Performance of the array-CGH based classifier for the classification of diffuse large B-cell lymphomas and mantle cell lymphomas. (A) Probability of the 75 malignant lymphoma cases being diagnosed as diffuse large B-cell lymphoma or mantle cell lymphoma using the array CGH-based classifier. (B) Top 25 clones that showed gains or losses more frequently in diffuse large B-cell lymphoma than in mantle cell lymphoma. (C) Top 25 clones that showed gains or losses more frequently in mantle cell lymphoma than in diffuse large B-cell lymphoma.
Table 1. Results of the classification of diffuse large B-cell lymphoma and mantle cell lymphoma using the array CGH-based classifier.
Table 3A. Clones with the most significant differences in copy number gains or losses between diffuse large B-cell lymphoma and mantle cell lymphoma. Top 25 clones with more gains and losses in diffuse large B-cell lymphoma than in mantle cell lymphoma (DLBCL-specific clones).
Figure 2. Performance of the array-CGH based classifier for the classification of activated-B-cell and germinal center-B-cell subtypes of diffuse large B-cell lymphoma. (A) Probability of the 46 malignant diffuse large B-cell lymphoma cases being diagnosed as activated-B-cell or germinal center-B-cell subtype using the array CGH-based classifier. (B) Top 25 clones which showed gains or losses more frequently in the activated-B-cell subtype than in the germinal center-B-cell subtype. (C) Top 25 clones which showed gains or losses more frequently in the germinal center-B-cell subtype than in the activated-B-cell subtype. Note that the subtypes activated-B-cell (represented as A) and germinal center-B-cell (represented as G) were defined using clustering analysis based on gene expression profiles.6
Table 2. Results of the classification of activated-B-cell and germinal center-B-cell subtypes of diffuse large B-cell lymphoma using the array CGH-based classifier and a gene-expression based classifier.
These differences in frequency were determined using the one-sided Fisher's exact test. Figures 2B and 2C also show gains and losses observed in 25×2 clones for all 46 patients. As can be seen from the detailed information on these 50 clones listed in Table 4, the p values (from the one-sided Fisher's exact test) were below 1.1×10⁻², and the false discovery rate was below 7.9×10⁻². In the entire LOOCV analysis, these 50 clones accounted for 92.5% of the clones used for the classifications.
Table 4A. Clones with the most significant differences in copy number gains or losses between activated-B-cell and germinal center-B-cell subtypes. Top 25 clones with more gains and losses in the activated-B-cell subtype than in the germinal center-B-cell subtype (ABC-specific clones).
Several studies8–12 have succeeded in demonstrating the power of gene expression profiling for the classification of lymphoma diseases and subtypes. In addition, genomic analysis has also been shown to be suitable for diagnostic purposes.15,16 As demonstrated in our previous studies, smaller amounts of DNA can be used for analysis without amplification procedures.2–7 Furthermore, greater stability and easier availability of DNA in comparison with RNA could be expected to make array CGH more reliable for diagnostic purposes. When we applied our method to the classification of different lymphoma entities (DLBCL and MCL) as well as different subtypes (ABC and GCB), the results showed that copy number gains and losses at a few dozen clones were effective for differentiating between disease entities as well as DLBCL subtypes. This study demonstrates that only a small subset of clones is required for a highly accurate classification.
The concordance between the ABC and GCB classification made by means of the hierarchical clustering method and that made by the classifier method described by Wright et al. was 91.3% (Online Supplementary Figure S1). Given that two classifications based on the same gene expression data agree in only 91.3% of cases, the 83% accuracy achieved using array CGH data can be considered high. It remains to be determined which method of expression profiling classification is suitable for array CGH data classification.
The list of clones used for the classification of DLBCL and MCL diseases is provided in Table 3. The first 25 clones showed more frequent gains and losses in DLBCL than in MCL, and we designated them as DLBCL-specific clones. The other 25 clones showed more frequent gains and losses in MCL than in DLBCL, and we designated them as MCL-specific clones. Among the top 25 MCL-specific clones, seven were in the 11q22 region, one of which was BAC RP11-241D13, which contains the ATM gene. It is known that the ATM gene is a tumor suppressor and that, when this gene is inactivated, DNA repair mechanisms are not activated properly.20,21 Gene mutations and loss of heterozygosity have been identified in 56% of MCL.21 However, neither loss of heterozygosity nor deletion of 11q22 was observed in DLBCL, according to a previous report.6 The loss of 11q22 may, therefore, be strongly associated with the pathogenesis of MCL, and its presence or absence is also important for discriminating between DLBCL and MCL.
The list of clones used for the classification of ABC and GCB subtypes is supplied in Table 4. The first 25 clones showed more frequent gains and losses in the ABC subtype than in the GCB subtype, and we designated them as ABC-specific clones. The other 25 clones showed more frequent gains and losses in the GCB subtype than in the ABC subtype, and we designated them as GCB-specific clones. Clones containing the BCL2 and MALT1 genes were selected as ABC-specific clones. MALT1 gene gain was previously suggested to play an important role in DLBCL.22 Dierlamm et al. recently reported that the gain of 18q/MALT1 is associated with the ABC subtype of DLBCL.23 The fact that two ABC cases in the present study showed MALT1 gains without any BCL2 gain could indicate that MALT1 is the gene implicated in this region in the ABC subtype of DLBCL. Several clones at 3q25-qter were selected as ABC-specific in the present study. This is in accordance with the report by Bea et al., who revealed that 65% of cases with 3q27 had 18q21–q22 gains among ABC subtype DLBCL.24 These findings demonstrate that DLBCL subtyping by means of expression profiling is based on genomic alterations. The differential diagnosis of DLBCL and MCL by means of array CGH is less important because immunohistological markers for MCL, such as cyclin D1, already exist, although some cases of MCL can be misdiagnosed if cyclin D1 does not stain clearly. More importantly, the clones selected with the algorithm used in our study are clearly associated with regions that are known to be characteristic of the disease entities.
These include the 11q22 and 9q34.3 regions for MCL21,25,26 and 18q21 and 19q13 for DLBCL.6 Deletion of 9q34 has been reported to be a predictor of poor survival in patients with MCL.25,26 This seems to suggest that selected markers may play an important role in the pathogenesis and/or clinicopathological features of the various lymphoma entities. As some of the genetically altered areas have not yet been fully characterized at the molecular level, it is important to recognize that critical genes involved in disease development and progression still remain to be discovered.
Although it is important to identify such responsible genes, the identification of characteristic regions by means of a computer algorithm may be much more important than successful differential diagnosis based on array CGH data.
In summary, the results of our study show that genomic copy number gains and losses, detected by array CGH, can be used for the accurate diagnosis of different malignant lymphoma diseases and their subtypes. It was further demonstrated that copy number imbalances in only a few dozen clones differentiate different diseases and subtypes. Some clones used for the classification contained genes known to be strongly associated with tumor pathogenesis. This indicates that new target genes may be identified by using the classification procedure presented here.
Table 3B. Top 25 clones with more gains and losses in mantle cell lymphoma than in diffuse large B-cell lymphoma (MCL-specific clones).
Table 4B. Top 25 clones with more gains and losses in the germinal center-B-cell subtype than in the activated-B-cell subtype (GCB-specific clones).
IT: designed and performed the data analysis and wrote the paper; HT: performed experiments on array CGH and wrote the paper; AT: contributed to application of the software for data analysis; MK-S: performed the gene-expression profiling experiments; YG: contributed to the pathological review and wrote the paper; MS: organized the research and wrote the paper. The authors reported no potential conflicts of interest.
Received for publication February 28, 2008. Revision received August 19, 2008. Accepted for publication September 8, 2008.