Transformer Architecture Overview
Our diagnostic system is built on a custom transformer architecture specifically designed for tabular epigenomic data. Unlike traditional transformers used in natural language processing, our model is optimized to process DNA methylation patterns from Illumina arrays.
Clinical Workflow Overview
EpiClassify is designed to fit into routine clinical practice and to provide objective diagnostic support for ME/CFS and Long COVID. The workflow comprises blood sample collection, sample processing, data upload, AI analysis, clinical report generation, and decision support:
- Blood Sample Collection: A standard 8.5mL EDTA blood tube is collected during a routine visit.
- Sample Processing: DNA is extracted and analyzed on an Illumina EPIC or 450K array, generating Green and Red channel IDAT files from which per-probe beta-values are derived (see the sketch after this list).
- Data Upload: The two IDAT files (*_Grn.idat and *_Red.idat) are uploaded through the research data portal.
- AI Analysis: A transformer-based model processes the methylation data, identifying key epigenetic markers.
- Report Generation: A comprehensive clinical report is produced with detailed visualizations and pathway analysis.
- Decision Support: Integrated results assist clinicians in diagnosis and treatment planning with high confidence scores.
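The downstream model consumes per-probe beta-values rather than raw channel intensities. As a point of reference, the sketch below shows the conventional conversion from methylated/unmethylated intensities to beta-values, using the standard Illumina offset of 100; it is purely illustrative and not a description of the exact preprocessing pipeline.

```python
import numpy as np

def beta_values(meth: np.ndarray, unmeth: np.ndarray, offset: float = 100.0) -> np.ndarray:
    """Convert methylated/unmethylated probe intensities to beta-values in [0, 1].

    Uses the standard Illumina convention beta = M / (M + U + offset); the offset
    stabilises the ratio for low-intensity probes.
    """
    return meth / (meth + unmeth + offset)

# Toy example: three probes from one sample.
meth = np.array([1200.0, 300.0, 5000.0])
unmeth = np.array([400.0, 2700.0, 250.0])
print(beta_values(meth, unmeth))  # approximately [0.71, 0.10, 0.93]
```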
The transformer processes methylation data through several specialized components:
- Input Embedding: Converts methylation beta-values into a high-dimensional representation suitable for the transformer.
- Self-Attention: Captures relationships between different CpG sites across the genome.
- Feed-Forward Network: Processes the attention outputs through specialized expert networks.
- Classification Head: Produces final diagnostic probabilities for ME/CFS, Long COVID, and Control.
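To make the data flow concrete, here is a minimal PyTorch sketch of how these four components fit together. The layer sizes, number of CpG features, and module structure are illustrative assumptions rather than the production configuration, and the Mixture-of-Experts feed-forward variant described later is replaced by the standard encoder FFN for brevity.

```python
import torch
import torch.nn as nn

class EpiTransformerSketch(nn.Module):
    """Illustrative pipeline: input embedding -> self-attention encoder -> classification head."""

    def __init__(self, d_model: int = 256, n_heads: int = 8, n_layers: int = 4, n_classes: int = 3):
        super().__init__()
        # Input embedding: each beta-value is projected into a d_model-dimensional vector.
        self.embed = nn.Linear(1, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        # Self-attention + feed-forward blocks (standard FFN here, not the MoE variant).
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Classification head: pooled representation -> ME/CFS, Long COVID, Control logits.
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, betas: torch.Tensor) -> torch.Tensor:
        # betas: (batch, n_cpgs) matrix of beta-values in [0, 1].
        x = self.embed(betas.unsqueeze(-1))   # (batch, n_cpgs, d_model)
        x = self.encoder(x)                   # self-attention across CpG sites
        return self.head(x.mean(dim=1))       # mean-pool, then classify

model = EpiTransformerSketch()
probs = torch.softmax(model(torch.rand(2, 500)), dim=-1)  # class probabilities for 2 toy samples
```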
Self-Supervised Masked Pretraining
Before fine-tuning on diagnostic labels, we pretrain our transformer using a self-supervised approach inspired by BERT's masked language modeling. This technique allows the model to learn the inherent structure of methylation data even with limited labeled samples.
- Step 1 (Masking): Randomly mask 15% of CpG methylation values in the input data.
- Step 2 (Prediction): Train the transformer to predict the original values of the masked CpG sites.
- Step 3 (Optimization): Minimize the mean squared error between predicted and actual methylation values.
- Step 4 (Fine-tuning): Transfer the learned weights to the diagnostic model and fine-tune on labeled data.
Masked Pretraining Loss Function:
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{\sum_{i,j} M_{ij}} \sum_{i,j} M_{ij}\left(\hat{x}_{ij} - x_{ij}\right)^2$$

where $M_{ij} = 1$ if CpG site $j$ of sample $i$ was masked (and 0 otherwise), $x_{ij}$ is the true beta-value, and $\hat{x}_{ij}$ is the model's prediction. Our ablation studies showed that this pretraining approach improved final classification accuracy by approximately 6%, with the most significant gains observed in cases with ambiguous methylation patterns.
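A minimal sketch of steps 1-3 under this loss is shown below, assuming beta-values arranged as a (samples x CpGs) matrix and zero used as the mask token; both choices are illustrative rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def masked_pretraining_step(model, betas: torch.Tensor, mask_ratio: float = 0.15) -> torch.Tensor:
    """One self-supervised step: mask ~15% of CpG values and reconstruct them.

    betas: (batch, n_cpgs) matrix of methylation beta-values.
    model: maps the masked input back to per-CpG predictions of the same shape.
    """
    mask = torch.rand_like(betas) < mask_ratio       # M_ij = 1 where the value is masked
    masked_input = betas.masked_fill(mask, 0.0)      # replace masked values with a mask token (0.0)
    preds = model(masked_input)                      # predicted beta-values
    # MSE averaged over masked positions only, matching the loss above.
    return F.mse_loss(preds[mask], betas[mask])
```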
Mixture-of-Experts Feed-Forward Network
Traditional transformers use a single feed-forward network (FFN) for all inputs. Our architecture implements a Mixture-of-Experts (MoE) approach that dynamically routes each input through specialized expert networks based on learned gating functions.
MoE Feed-Forward Network:
$$\mathrm{MoE\text{-}FFN}(z) = \sum_{e=1}^{E} g_e(z) \cdot \mathrm{FFN}_e(z)$$

where $E$ is the number of experts and $g_e(z)$ is the learned gating weight assigned to expert $e$ for input $z$. This approach allows different parts of the network to specialize in different methylation patterns, such as:
- Immune Response: Patterns related to immune-system genes (e.g., the HLA complex, cytokines).
- Energy Metabolism: Methylation in genes related to cellular energy production.
- Neurological Function: Patterns in genes associated with neurological pathways.
- Stress Response: Methylation in stress-response genes such as NR3C1.
The MoE approach improved classification accuracy by approximately 4% compared to a standard transformer with a single feed-forward network of equivalent parameter count.
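The equation above can be sketched as a dense mixture in which every expert is evaluated and weighted by a softmax gate. The expert count and hidden sizes below are illustrative assumptions; a production MoE layer would typically use sparse top-k routing so that only a few experts run per input.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Mixture-of-Experts FFN: a gate-weighted sum of expert FFN outputs."""

    def __init__(self, d_model: int = 256, d_hidden: int = 512, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)  # learned gating function g_e(z)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, tokens, d_model)
        gates = torch.softmax(self.gate(z), dim=-1)                          # (batch, tokens, E)
        expert_out = torch.stack([ffn(z) for ffn in self.experts], dim=-1)   # (batch, tokens, d_model, E)
        # sum over experts: g_e(z) * FFN_e(z)
        return (expert_out * gates.unsqueeze(-2)).sum(dim=-1)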
Adaptive Computation Time
Not all samples require the same amount of processing to reach a confident diagnosis. Our transformer implements Adaptive Computation Time (ACT), allowing it to dynamically decide how many processing steps to apply to each sample.
| Sample Type | Average Processing Steps | Confidence Score |
| --- | --- | --- |
| Clear ME/CFS | 2.3 | 0.94 |
| Clear Long COVID | 2.5 | 0.91 |
| Clear Control | 1.8 | 0.97 |
| Ambiguous Cases | 4.7 | 0.82 |
Adaptive Computation Time:
$$p_j^{(t)} = \sigma\left(w_{\mathrm{act}}^{\top} h_j^{(t-1)} + b_{\mathrm{act}}\right)$$

where $p_j^{(t)}$ is the halting probability for sample $j$ at step $t$, computed from its hidden state $h_j^{(t-1)}$ after the previous step. ACT allows the model to allocate computational resources more efficiently, spending more time on difficult cases while quickly processing clear-cut samples. This approach improved overall accuracy by 3% and was particularly effective for borderline cases.
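The halting rule can be sketched as below, assuming a single shared encoder block and a per-sample halting unit; the halting threshold, maximum step count, and the omission of ACT's ponder-cost term are simplifications for illustration.

```python
import torch
import torch.nn as nn

class ACTEncoder(nn.Module):
    """Apply a shared block repeatedly, halting each sample once its cumulative halting probability saturates."""

    def __init__(self, block: nn.Module, d_model: int = 256, max_steps: int = 6, threshold: float = 0.99):
        super().__init__()
        self.block = block                  # e.g. a single transformer encoder layer
        self.halt = nn.Linear(d_model, 1)   # w_act, b_act
        self.max_steps = max_steps
        self.threshold = threshold

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, tokens, d_model); halting is decided per sample from the pooled state.
        cum_p = torch.zeros(h.size(0), device=h.device)
        running = torch.ones(h.size(0), dtype=torch.bool, device=h.device)
        for _ in range(self.max_steps):
            # p_j = sigmoid(w_act^T h_j + b_act), computed from the pooled previous state.
            p = torch.sigmoid(self.halt(h.mean(dim=1))).squeeze(-1)
            h_new = self.block(h)
            h = torch.where(running[:, None, None], h_new, h)  # only update still-running samples
            cum_p = cum_p + p * running.float()
            running = running & (cum_p < self.threshold)
            if not running.any():
                break
        return h
```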
Attention Visualization and Interpretability
A key advantage of our transformer architecture is its interpretability. By analyzing attention weights, we can identify which CpG sites are most important for classification decisions.
Our analysis revealed several key genomic regions with high attention weights:
| Gene | Function | Methylation Pattern | Condition |
| --- | --- | --- | --- |
| HLA-DRB1 | Immune regulation | Hypomethylated | ME/CFS |
| IFNG | Interferon signaling | Hypermethylated | ME/CFS |
| NR3C1 | Stress response (HPA axis) | Hypomethylated | ME/CFS |
| IFITM3 | Interferon-induced antiviral | Variable methylation | Long COVID |
| PDK2 | Metabolic regulation | Hypermethylated | Both |
These findings align with current theories about the underlying pathophysiology of ME/CFS and Long COVID, where immune dysregulation and metabolic imbalances are central features.
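For readers who want to reproduce this kind of analysis, a minimal sketch of ranking CpG sites by the attention they receive is shown below. The attention array layout and the CpG identifier list are assumptions about how such intermediates could be exposed, not a description of EpiClassify's actual API.

```python
import numpy as np

def top_attended_cpgs(attn: np.ndarray, cpg_ids: list, k: int = 10) -> list:
    """Rank CpG sites by the attention they receive, averaged over heads and query positions.

    attn: (n_heads, n_cpgs, n_cpgs) attention matrix from one encoder layer, where
          attn[h, i, j] is how strongly query site i attends to key site j.
    """
    received = attn.mean(axis=(0, 1))            # average attention received per CpG site
    order = np.argsort(received)[::-1][:k]       # indices of the k most-attended sites
    return [(cpg_ids[j], float(received[j])) for j in order]
```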
Performance Comparison
We compared our transformer-based approach with several baseline methods using the same dataset and evaluation metrics.
Our transformer model significantly outperformed all baseline methods, with the largest improvements seen in:
- Distinguishing between ME/CFS and Long COVID (often confused in other models)
- Correctly classifying borderline cases with subtle methylation changes
- Maintaining high specificity (low false positive rate) for healthy controls
- Generalizing to samples from different cohorts and array platforms