Transformer Architecture Overview

Our diagnostic system is built on a custom transformer architecture specifically designed for tabular epigenomic data. Unlike traditional transformers used in natural language processing, our model is optimized to process DNA methylation patterns from Illumina arrays.

Figure 1: High-level overview of the EpiClassify transformer architecture

Clinical Workflow Overview

EpiClassify integrates seamlessly into clinical practice, providing objective diagnostic insights for ME/CFS and Long COVID. The streamlined workflow comprises six steps:

  1. Blood Sample Collection: A standard 8.5mL EDTA blood tube is collected during a routine visit.
  2. Sample Processing: DNA is extracted and analyzed using the Illumina EPIC or 450K arrays, generating Green and Red channel IDAT files.
  3. Data Upload: The two IDAT files (*_Grn.idat and *_Red.idat) are uploaded through the research data portal (a file-pairing sketch follows this list).
  4. AI Analysis: A transformer-based model processes the methylation data, identifying key epigenetic markers.
  5. Report Generation: A comprehensive clinical report is produced with detailed visualizations and pathway analysis.
  6. Decision Support: Integrated results assist clinicians in diagnosis and treatment planning with high confidence scores.
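
As a concrete illustration of step 3, the sketch below pairs each sample's Green and Red channel files before upload. It is a minimal stand-alone example; the function name and directory layout are our own assumptions, not part of the EpiClassify portal:

```python
from pathlib import Path

def find_idat_pairs(data_dir: str):
    """Group *_Grn.idat / *_Red.idat files by their shared sample prefix."""
    pairs = {}
    for idat in Path(data_dir).glob("*.idat"):
        for channel, suffix in (("Grn", "_Grn.idat"), ("Red", "_Red.idat")):
            if idat.name.endswith(suffix):
                prefix = idat.name[: -len(suffix)]
                pairs.setdefault(prefix, {})[channel] = idat
    # Only samples with both channels are ready for upload.
    complete = {p: c for p, c in pairs.items() if len(c) == 2}
    missing = sorted(set(pairs) - set(complete))
    return complete, missing

complete, missing = find_idat_pairs("idats/")  # hypothetical directory
print(f"{len(complete)} complete pair(s); incomplete samples: {missing}")
```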


The transformer processes methylation data through four specialized components; a code sketch follows this list.

  • Input Embedding: Converts methylation beta-values into a high-dimensional representation suitable for the transformer.
  • Self-Attention: Captures relationships between different CpG sites across the genome.
  • Feed-Forward Network: Processes the attention outputs through specialized expert networks.
  • Classification Head: Produces the final diagnostic probabilities for ME/CFS, Long COVID, and Control.
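
The snippet below is a minimal PyTorch sketch of how these four components fit together. All dimensions, layer counts, and the mean-pooling step are illustrative assumptions, not the published EpiClassify configuration:

```python
import torch
import torch.nn as nn

class EpiClassifySketch(nn.Module):
    """Skeleton of the four components: embed -> attention -> FFN -> head."""

    def __init__(self, d_model: int = 256, n_heads: int = 8,
                 n_layers: int = 4, n_classes: int = 3):
        super().__init__()
        # Input embedding: project each site's beta-value into d_model dims
        # (positional/site embeddings are omitted here for brevity).
        self.embed = nn.Linear(1, d_model)
        # Self-attention + feed-forward stack. A standard encoder layer is
        # used here; the MoE feed-forward variant is sketched further below.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Classification head: ME/CFS, Long COVID, Control.
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, betas: torch.Tensor) -> torch.Tensor:
        # betas: (batch, n_sites) methylation beta-values in [0, 1]
        z = self.embed(betas.unsqueeze(-1))   # (batch, n_sites, d_model)
        z = self.encoder(z)                   # self-attention over CpG sites
        return self.head(z.mean(dim=1))       # pooled -> 3 class logits
```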

Self-Supervised Masked Pretraining

Before fine-tuning on diagnostic labels, we pretrain our transformer using a self-supervised approach inspired by BERT's masked language modeling. This technique allows the model to learn the inherent structure of methylation data even with limited labeled samples.

  1. Masking: Randomly mask 15% of the CpG methylation values in the input data (see the code sketch after this list).
  2. Prediction: Train the transformer to predict the original values of the masked CpG sites.
  3. Optimization: Minimize the mean squared error between the predicted and actual methylation values.
  4. Fine-tuning: Transfer the learned weights to the diagnostic model and fine-tune on labeled data.
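
A minimal sketch of the masking step, assuming masked sites are replaced with a zero placeholder (a learned mask token, as in BERT, would be a common alternative):

```python
import torch

def mask_betas(betas: torch.Tensor, mask_frac: float = 0.15):
    """Mask a random 15% of CpG values; return inputs, targets, and mask."""
    mask = torch.rand_like(betas) < mask_frac  # True where a site is masked
    masked = betas.clone()
    masked[mask] = 0.0                         # zero placeholder (assumption)
    return masked, betas, mask
```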

Masked Pretraining Loss Function:

\mathcal{L}_{\mathrm{MSE}} = \frac{1}{\sum_{i,j} M_{ij}} \sum_{i,j} M_{ij} \left( \hat{x}_{ij} - x_{ij} \right)^2

where M_{ij} is the mask indicator (1 if masked, 0 otherwise), x_{ij} is the true methylation value, and \hat{x}_{ij} is the predicted value.
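
In code, this loss reduces to a squared error averaged over the masked positions only; the helper below is our own illustration:

```python
import torch

def masked_mse(pred: torch.Tensor, target: torch.Tensor,
               mask: torch.Tensor) -> torch.Tensor:
    """Squared error summed over masked sites, normalized by their count."""
    m = mask.float()                           # M_ij: 1 if masked, else 0
    return (m * (pred - target) ** 2).sum() / m.sum().clamp(min=1.0)
```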

Our ablation studies showed that this pretraining approach improved final classification accuracy by approximately 6%, with the most significant gains observed in cases with ambiguous methylation patterns.

Mixture-of-Experts Feed-Forward Network

Traditional transformers use a single feed-forward network (FFN) for all inputs. Our architecture implements a Mixture-of-Experts (MoE) approach that dynamically routes each input through specialized expert networks based on learned gating functions.

MoE Feed-Forward Network:

\mathrm{MoE\text{-}FFN}(z) = \sum_{e=1}^{E} g_e(z) \cdot \mathrm{FFN}_e(z)

where g_e(z) is the gating function for expert e, \mathrm{FFN}_e is the e-th expert network, and E is the total number of experts (4 in our implementation).
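
The sketch below implements this equation with E = 4 dense experts and a softmax gate. Hidden sizes are assumptions, and dense routing (every expert sees every token) is used for clarity; sparse top-k routing is a common efficiency variant:

```python
import torch
import torch.nn as nn

class MoEFFN(nn.Module):
    """MoE-FFN(z) = sum_e g_e(z) * FFN_e(z) with a softmax gate."""

    def __init__(self, d_model: int = 256, d_hidden: int = 1024,
                 n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # produces g_e(z)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, n_sites, d_model); gate weights sum to 1 per token.
        g = torch.softmax(self.gate(z), dim=-1)                   # (.., E)
        out = torch.stack([f(z) for f in self.experts], dim=-1)   # (.., d, E)
        return (out * g.unsqueeze(-2)).sum(dim=-1)                # mixture
```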

This approach allows different parts of the network to specialize in different methylation patterns, such as:

  • Immune Response: Patterns related to immune-system genes (e.g., the HLA complex, cytokines)
  • Energy Metabolism: Methylation in genes related to cellular energy production
  • Neurological Function: Patterns in genes associated with neurological pathways
  • Stress Response: Methylation in stress-response genes such as NR3C1

The MoE approach improved classification accuracy by approximately 4% compared to a standard transformer with a single feed-forward network of equivalent parameter count.

Adaptive Computation Time

Not all samples require the same amount of processing to reach a confident diagnosis. Our transformer implements Adaptive Computation Time (ACT), allowing it to dynamically decide how many processing steps to apply to each sample.

Sample Type      | Average Processing Steps | Confidence Score
Clear ME/CFS     | 2.3                      | 0.94
Clear Long COVID | 2.5                      | 0.91
Clear Control    | 1.8                      | 0.97
Ambiguous Cases  | 4.7                      | 0.82

Adaptive Computation Time:

p_j^{(t)} = \sigma\left( w_{\mathrm{act}}^{\top} h_j^{(t-1)} + b_{\mathrm{act}} \right)

where p_j^{(t)} is the halting probability for sample j at step t, h_j^{(t-1)} is the hidden state from the previous step, and \sigma is the sigmoid function.
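
The loop below sketches this halting rule in PyTorch. The halting threshold and the whole-batch stopping criterion are simplifying assumptions, and the weighted-state accumulation of full ACT is omitted for brevity:

```python
import torch
import torch.nn as nn

class ACTEncoderSketch(nn.Module):
    """Applies one encoder step repeatedly until halting probability accrues."""

    def __init__(self, d_model: int = 256, max_steps: int = 8,
                 threshold: float = 0.99):
        super().__init__()
        self.step = nn.TransformerEncoderLayer(d_model, nhead=8,
                                               batch_first=True)
        self.halt = nn.Linear(d_model, 1)  # holds w_act and b_act
        self.max_steps = max_steps
        self.threshold = threshold         # assumed stopping level

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, n_sites, d_model)
        cum_p = torch.zeros(h.size(0), device=h.device)
        for _ in range(self.max_steps):
            # p^(t) = sigmoid(w_act^T h^(t-1) + b_act), pooled over sites
            p = torch.sigmoid(self.halt(h.mean(dim=1))).squeeze(-1)
            h = self.step(h)
            cum_p = cum_p + p
            if bool((cum_p >= self.threshold).all()):
                break                      # every sample in the batch halted
        return h
```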

ACT allows the model to allocate computational resources more efficiently, spending more time on difficult cases while quickly processing clear-cut samples. This approach improved overall accuracy by 3% and was particularly effective for borderline cases.

Attention Visualization and Interpretability

A key advantage of our transformer architecture is its interpretability. By analyzing attention weights, we can identify which CpG sites are most important for classification decisions.
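
As an illustration of this analysis, the sketch below ranks CpG sites by the average attention they receive across heads. It is our own minimal example rather than the paper's exact procedure; with a trained model, the attention module and embeddings would come from the diagnostic network instead of the random stand-ins used here:

```python
import torch
import torch.nn as nn

d_model, n_sites = 256, 1000  # illustrative sizes

# Stand-ins for the trained model's attention module and CpG embeddings.
mha = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
z = torch.randn(1, n_sites, d_model)

# attn: (1, n_sites, n_sites), averaged over heads.
_, attn = mha(z, z, z, need_weights=True, average_attn_weights=True)

# Average over query positions -> attention mass each CpG site receives.
site_importance = attn.mean(dim=1).squeeze(0)   # (n_sites,)
top_sites = torch.topk(site_importance, k=10).indices
print("Highest-attention CpG site indices:", top_sites.tolist())
```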

Figure 2: Attention weight visualization highlighting key methylation sites

Our analysis revealed several key genomic regions with high attention weights:

Gene     | Function                     | Methylation Pattern  | Condition
HLA-DRB1 | Immune regulation            | Hypomethylated       | ME/CFS
IFNG     | Interferon signaling         | Hypermethylated      | ME/CFS
NR3C1    | Stress response (HPA axis)   | Hypomethylated       | ME/CFS
IFITM3   | Interferon-induced antiviral | Variable methylation | Long COVID
PDK2     | Metabolic regulation         | Hypermethylated      | Both

These findings align with current theories about the underlying pathophysiology of ME/CFS and Long COVID, where immune dysregulation and metabolic imbalances are central features.

Performance Comparison

We compared our transformer-based approach with several baseline methods using the same dataset and evaluation metrics.

Classification Accuracy by Method

Method                  | Accuracy
EpiClassify Transformer | 97.06%
Random Forest           | 80.2%
Logistic Regression     | 75.4%
XGBoost                 | 78.1%
Neural Network          | 82.3%

Our transformer model significantly outperformed all baseline methods, with the largest improvements seen in:

  • Distinguishing between ME/CFS and Long COVID (often confused in other models)
  • Correctly classifying borderline cases with subtle methylation changes
  • Maintaining high specificity (low false positive rate) for healthy controls
  • Generalizing to samples from different cohorts and array platforms
