This project implements a comprehensive comparison of quantum and classical SVM algorithms in the context of brain tumor classification (GBM and LGG) based on TCGA genetic data. The study includes robustness analysis of different quantum feature maps against various types and levels of noise in medical data. https://michalkasprowicz.com

Michał Kasprowicz e78bce39b2 Update README.md		2025-09-16 18:53:21 +00:00

Analysis of genomic data for cancer mutations using the quantum support vector machine (QSVM) algorithm.

Project Overview

This project implements a comprehensive comparison of quantum and classical SVM algorithms in the context of brain tumor classification (GBM and LGG) based on TCGA genetic data. The study includes robustness analysis of different quantum feature maps against various types and levels of noise in medical data.

Main Research Objectives

Performance comparison of quantum and classical SVM algorithms
Robustness analysis of different quantum feature maps against data noise
Practicality assessment of quantum algorithms in real-world medical applications

Experiment Scope

11 datasets: 1 clean + 10 noisy (additive and substitutional, 1-20%)
4 experiment types: ZZ, Pauli, Z, Amplitude
6 quantum feature maps: ZZ1, ZZ2, Pauli1, Pauli2, Z1, Z2
3 C parameters: 0.1, 1.0, 10.0
10-fold cross-validation for each combination
Classical SVM as baseline
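
The scope above implies a grid of runs for every dataset. A minimal sketch of enumerating that grid, assuming the six feature-map names and three C values are combined exhaustively; the `build_grid` helper is hypothetical and not taken from `qsvm.py`:

```python
from itertools import product

# Names and values copied from the scope list; the helper itself is illustrative.
FEATURE_MAPS = ["ZZ1", "ZZ2", "Pauli1", "Pauli2", "Z1", "Z2"]
C_VALUES = [0.1, 1.0, 10.0]
N_FOLDS = 10


def build_grid(feature_maps, c_values):
    """Return every (feature_map, C) pair to evaluate on one dataset."""
    return [
        {"feature_map": fm, "C": c}
        for fm, c in product(feature_maps, c_values)
    ]


grid = build_grid(FEATURE_MAPS, C_VALUES)
fits_per_dataset = len(grid) * N_FOLDS  # one model fit per cross-validation fold
print(len(grid), "configurations,", fits_per_dataset, "fits per dataset")
```

Under these assumptions each dataset requires 18 configurations, which is why the number of quantum kernel evaluations grows quickly with every added dataset or parameter value.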

Project Structure

├── 📁 Data
│   ├── qsvm.py                # Main experiment controller
│   ├── qsvm1_zz.py            # Experiment 1: ZZ Feature Maps
│   ├── qsvm2_pauli.py         # Experiment 2: Pauli Feature Maps
│   ├── qsvm3_z.py             # Experiment 3: Z Feature Maps
│   ├── qsvm4_amplitude.py     # Experiment 4: Amplitude Encoding
│   └── dane/                  # TCGA datasets
│       ├── TCGA_GBM_LGG_Mutations_all.csv   # high dimensional data
│       ├── TCGA_GBM_LGG_Mutations_clean.csv # low dimensional data
│       ├── zaszumione/                      # Substitutional noise
│       └── zaszumione_rozszerzone/          # Additive noise
│
├── 📁 Results
│
├── 📁 Configuration
│   ├── environment.yml       # Conda environment
│   └── requirements.txt      # Python dependencies
│
└── 📁 Side experiments
    ├── experiments.py        # Experiments file 
    └── run_experiment.sh     # Shell script that runs the experiments in the cloud with the multi-threaded option

Side Experiments Overview

The directory contains side experiments that extend the main quantum brain tumor classification project. The experiments focus on analyzing the impact of genetic data complexity and different gene subsets on the effectiveness of quantum SVM algorithms.

Main Research Objectives

Genetic complexity analysis: Impact of mutation count on classification performance
Gene subset testing: Comparison of different gene groups (frequently/moderately/rarely mutated)
Multi-core optimization: Utilizing VAST.AI cloud computing
Feature map comparison: Testing different quantum feature maps on diversified data

Experiment Scope

2 main experiments: Complexity analysis + Gene subsets
4 gene subsets: All, frequently mutated, moderately mutated, rarely mutated
14 quantum feature maps: Pauli, Z, Amplitude with different parameters
3 complexity levels: Low, medium, high (based on quartiles)
Multi-core processing: Optimization for cloud computing
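
The gene subsets listed above could be derived from per-gene mutation frequencies. The sketch below is one possible way to do it, not the logic in `experiments.py`: the function names, the placeholder gene names, the toy data and both frequency thresholds are assumptions made for illustration.

```python
def gene_frequencies(rows, gene_names):
    """Fraction of cases in which each gene is mutated (rows hold 0/1 flags)."""
    n_cases = len(rows)
    return {
        gene: sum(row[i] for row in rows) / n_cases
        for i, gene in enumerate(gene_names)
    }


def split_gene_subsets(freqs, frequent_min=0.10, rare_max=0.02):
    """Group genes by mutation frequency; both thresholds are hypothetical."""
    subsets = {"all": list(freqs), "frequent": [], "moderate": [], "rare": []}
    for gene, freq in freqs.items():
        if freq >= frequent_min:
            subsets["frequent"].append(gene)
        elif freq <= rare_max:
            subsets["rare"].append(gene)
        else:
            subsets["moderate"].append(gene)
    return subsets


# Toy data: 4 cases x 3 placeholder genes (not real TCGA values).
genes = ["GENE_A", "GENE_B", "GENE_C"]
rows = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
freqs = gene_frequencies(rows, genes)
subsets = split_gene_subsets(freqs, frequent_min=0.6, rare_max=0.25)
print(subsets)
```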

VAST.AI Cloud Configuration

Recommended Configuration:

ID: m:33614
Host: 166946
Processor: AMD EPYC 7C13 64-Core Processor
Cores: 32.0/128 cpu
RAM: 8 GB
Cost: $0.144/hr
DLPerf: 15.6

Launch Instructions:

File transfer:

./transfer_files.sh <IP_ADDRESS>

Server connection:

ssh root@<IP_ADDRESS>

Experiment launch:

cd /root
./run_experiment.sh

Experimental Methodology

Experiment: Genetic Complexity Analysis

Complexity Definition

Genetic data complexity is defined based on the number of genetic mutations per case:

# X is assumed to be a pandas DataFrame of 0/1 mutation flags (rows = cases)
# Calculate mutation count for each case
mutation_counts = X.sum(axis=1)

# Classification into complexity levels based on quartiles
low_threshold = mutation_counts.quantile(0.25)    # first quartile (25th percentile)
high_threshold = mutation_counts.quantile(0.75)   # third quartile (75th percentile)

Complexity Levels

Low Complexity
- Criterion: mutation_counts ≤ first quartile (25th percentile)
- Characteristics: Cases with few mutations
- Expectations: Better performance of linear algorithms
Medium Complexity
- Criterion: first quartile < mutation_counts < third quartile
- Characteristics: Cases with moderate mutation count
- Expectations: Greatest advantage of quantum algorithms
High Complexity
- Criterion: mutation_counts ≥ third quartile (75th percentile)
- Characteristics: Cases with many mutations
- Expectations: Better performance of nonlinear algorithms
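
The three criteria above can be turned into a labelling function. The sketch below uses only the standard library rather than pandas; `complexity_levels` is a hypothetical helper, and `method="inclusive"` is chosen because it interpolates linearly between data points, which should agree with the pandas default.

```python
from statistics import quantiles


def complexity_levels(mutation_counts):
    """Label each case as low / medium / high using the quartile criteria."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _median, q3 = quantiles(mutation_counts, n=4, method="inclusive")
    levels = []
    for count in mutation_counts:
        if count <= q1:
            levels.append("low")
        elif count >= q3:
            levels.append("high")
        else:
            levels.append("medium")
    return levels


# Toy mutation counts, one per case (not real TCGA values).
counts = [1, 2, 3, 4, 5, 6, 7, 8, 9]
levels = complexity_levels(counts)
print(list(zip(counts, levels)))
```

Because the boundaries are inclusive on both sides, ties at a quartile value are assigned to the low or high group, so the three groups need not be exactly 25% / 50% / 25% of the cases.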

Parameter Configuration

You can customize parameters in the experiments.py file:

# Experiment selection
RUN_GENE_SUBSETS_EXPERIMENT = True    # Gene subsets experiment
RUN_COMPLEXITY_EXPERIMENT = False     # Complexity experiment
RUN_FEATURE_MAPPINGS_EXPERIMENT = False  # Feature mappings experiment

# Quantum parameters
QUANTUM_SHOTS = 50                    # Number of shots (reduced for performance)
QUANTUM_TIMEOUT = 300                 # 5-minute timeout
MAX_FEATURE_DIMENSION = 8             # Maximum feature dimension

# Multi-core parameters
USE_MULTIPROCESSING = True            # Enable parallel processing
MAX_WORKERS = None                    # Automatic core detection
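
How `MAX_WORKERS = None` might translate into a concrete worker count is sketched below. This is an assumption about the intended behaviour, not code from `experiments.py`; `resolve_workers` and `run_tasks` are hypothetical names.

```python
import os
from concurrent.futures import ProcessPoolExecutor

USE_MULTIPROCESSING = True
MAX_WORKERS = None  # None means "detect the core count automatically"


def resolve_workers(max_workers=MAX_WORKERS, enabled=USE_MULTIPROCESSING):
    """Translate the two settings into a concrete worker count (at least 1)."""
    if not enabled:
        return 1
    if max_workers is None:
        return os.cpu_count() or 1  # cpu_count() can return None
    return max(1, int(max_workers))


def run_tasks(func, tasks):
    """Run func over tasks, in parallel when more than one worker is available.

    func must be defined at module level so that it can be pickled.
    """
    workers = resolve_workers()
    if workers == 1:
        return [func(task) for task in tasks]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, tasks))


print("workers:", resolve_workers())
```

On a rented cloud instance the detected count reflects the cores visible to the container, so setting `MAX_WORKERS` explicitly can be useful when the allocation is smaller than the host.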

Installation and Configuration

System Requirements

Python: 3.9
RAM: Minimum 8GB (16GB recommended)
CPU: Multi-core processor (for parallel processing)
Disk: ~5GB free space

Method 1: Conda (Recommended)

# Clone repository
git clone <repository-url>
cd kod_sierpien

# Create conda environment
conda env create -f environment.yml

# Activate environment
conda activate MK_QSVM

Method 2: Manual Installation

# Create environment
conda create -n MK_QSVM python=3.9
conda activate MK_QSVM

# Install basic libraries
conda install -c conda-forge numpy=1.24.3 pandas=2.0.3 scikit-learn=1.3.0
conda install -c conda-forge matplotlib=3.7.2 seaborn=0.12.2 jupyter=1.0.0

# Install quantum libraries
pip install qiskit==0.44.1 qiskit-aer==0.12.2 qiskit-machine-learning==0.6.0
pip install dimod==0.12.8 umap-learn==0.5.3 plotly==5.16.1

Method 3: Requirements.txt

# Create virtual environment
python -m venv qsvm-env
source qsvm-env/bin/activate  # Linux/Mac
# or
qsvm-env\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

Running Experiments

Basic Launch

# Activate environment
conda activate MK_QSVM

# Run all experiments
python qsvm.py

Running Individual Experiments

# Experiment 1: ZZ Feature Maps
python qsvm1_zz.py

# Experiment 2: Pauli Feature Maps
python qsvm2_pauli.py

# Experiment 3: Z Feature Maps
python qsvm3_z.py

# Experiment 4: Amplitude Encoding
python qsvm4_amplitude.py

Parameter Configuration

You can customize parameters in the qsvm.py file:

# Data parameters
DATA_FILES = [
    'dane/TCGA_GBM_LGG_Mutations_all.csv',
    # Add or remove files as needed
]

# Experiment parameters
RUN_CLASSIC_SVM = True      # Classical SVM
RUN_QUANTUM_SVM = True      # Quantum SVM
RUN_HYBRID_APPROACH = True  # Hybrid approach

# Dimensionality reduction parameters
USE_PCA = True
PCA_COMPONENTS = 12

Experimental Methodology

1. Data Preparation

Source: TCGA (The Cancer Genome Atlas) - GBM/LGG data
Target variable: Primary_Diagnosis
Features: Genetic mutations, demographic data, clinical features
Processing: Standardization, dimensionality reduction (PCA), train/test split
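
A dependency-free sketch of two of the processing steps listed above, the train/test split and standardization. It is illustrative only: the project presumably relies on scikit-learn for these steps, PCA is left out to keep the example short, and all function names and toy values are assumptions. The scaler is fitted on the training rows only, so no information from the test rows leaks into it.

```python
import random
from statistics import mean, pstdev


def split_train_test(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly and split rows and labels together."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, ids):
        return [seq[i] for i in ids]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


def fit_scaler(rows):
    """Per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant columns keep scale 1
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data (not real TCGA values): one binary and one numeric feature.
X = [[0, 10], [1, 20], [0, 30], [1, 40], [0, 50]]
y = ["GBM", "LGG", "GBM", "LGG", "GBM"]

X_train, X_test, y_train, y_test = split_train_test(X, y, test_fraction=0.2, seed=0)
means, stds = fit_scaler(X_train)
X_train_std = transform(X_train, means, stds)
X_test_std = transform(X_test, means, stds)
print(len(X_train_std), "training rows,", len(X_test_std), "test rows")
```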

2. Quantum Feature Maps

ZZFeatureMap

Structure: Hadamard gates + Z rotations + ZZ entangling gates
Properties: Local encoding with quantum correlations
Implementation: ZZFeatureMap from Qiskit

PauliFeatureMap

Structure: Utilizes all Pauli axes (X, Y, Z)
Properties: Richer encoding with stronger entanglement
Implementation: PauliFeatureMap from Qiskit

ZFeatureMap

Structure: Only Hadamard gates and Z rotations
Properties: Simpler, more stable
Implementation: ZFeatureMap from Qiskit
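
For reference, the three Qiskit feature maps above share one general form. The expressions below follow the Qiskit circuit library documentation as understood here, with the default data map and ignoring the constant rotation factor; the project's scripts may override these defaults, so treat this as a summary under those assumptions.

```latex
U_{\Phi}(\mathbf{x}) = \exp\!\Big( i \sum_{S \in \mathcal{I}} \phi_S(\mathbf{x}) \prod_{k \in S} P_k \Big),
\qquad
\phi_S(\mathbf{x}) =
\begin{cases}
x_k & \text{if } S = \{k\} \\
\prod_{k \in S} (\pi - x_k) & \text{if } |S| > 1
\end{cases}

|\Phi(\mathbf{x})\rangle = \big( U_{\Phi}(\mathbf{x})\, H^{\otimes n} \big)^{r} \, |0\rangle^{\otimes n},
\qquad
K(\mathbf{x}, \mathbf{y}) = \big| \langle \Phi(\mathbf{x}) \,|\, \Phi(\mathbf{y}) \rangle \big|^{2}
```

Here n is the number of qubits (one per feature), r is the number of repetitions, and P_k is a Pauli operator acting on qubit k. ZFeatureMap keeps only single-qubit Z terms, ZZFeatureMap adds two-qubit ZZ terms, and PauliFeatureMap allows a configurable set of Pauli strings. K is the fidelity kernel that a quantum SVM passes to the classical optimizer.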

Amplitude Encoding

Structure: Amplitude encoding with different normalizations
Properties: Custom kernel K(x,y) = (x·y)²
Implementation: Custom AmplitudeKernel class
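
The kernel K(x,y) = (x·y)² can be evaluated classically once the inputs are normalized. The sketch below assumes L2 normalization, which is only one of the normalizations the project may use, and it is not the project's `AmplitudeKernel` class; the function names are hypothetical.

```python
import math


def l2_normalize(x):
    """Scale x to unit Euclidean length, as amplitude encoding requires."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot amplitude-encode the zero vector")
    return [v / norm for v in x]


def amplitude_kernel(x, y):
    """K(x, y) = (x . y)^2 evaluated on L2-normalized inputs."""
    xn, yn = l2_normalize(x), l2_normalize(y)
    return sum(a * b for a, b in zip(xn, yn)) ** 2


def gram_matrix(rows_a, rows_b):
    """Kernel values for every pair (a, b) with a in rows_a and b in rows_b."""
    return [[amplitude_kernel(a, b) for b in rows_b] for a in rows_a]


samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = gram_matrix(samples, samples)
for row in K:
    print([round(v, 3) for v in row])
```

A Gram matrix of this kind can be handed to a classical SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.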

3. Validation and Metrics

Cross-validation: 10-fold for QSVM, 5-fold for classical SVM
Metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC
Comparison: Classical SVM vs Quantum SVM
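
Four of the metrics listed above follow directly from the confusion-matrix counts. The sketch below treats GBM as the positive class, which is an assumption; ROC-AUC is omitted because it needs decision scores rather than hard labels. The function name and the toy labels are illustrative.

```python
def binary_metrics(y_true, y_pred, positive="GBM"):
    """Accuracy, precision, recall and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy predictions (not project results).
y_true = ["GBM", "GBM", "LGG", "LGG", "GBM"]
y_pred = ["GBM", "LGG", "LGG", "GBM", "GBM"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```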

Results Analysis

Running Analysis

# Complete analysis of all results
python analyze_results.py

Key Results

Analysis of 81 experiments shows:

Classical SVM: 100% accuracy in all experiments
AMPLITUDE: 96.7% accuracy
ZZ: 86.1% accuracy
PAULI: 81.4% accuracy
Z: 66.0% accuracy

Troubleshooting

Library Errors

# If version conflicts occur
conda clean --all
conda env remove -n MK_QSVM
conda env create -f environment.yml

Memory Issues

Reduce data size in DATA_FILES
Disable some feature maps
Reduce number of PCA components
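
A rough estimate shows why the measures above help. The sketch assumes one qubit per input feature (as in Qiskit's ZZ, Pauli and Z feature maps), a statevector simulator that stores complex128 amplitudes, and a dense float64 kernel matrix; actual memory use depends on the simulator and backend settings. The function names are hypothetical.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for one statevector: 2^n complex128 amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


def kernel_pairs(n_samples):
    """Distinct off-diagonal kernel entries of a symmetric Gram matrix."""
    return n_samples * (n_samples - 1) // 2


def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    """Memory for a dense n x n float64 kernel matrix."""
    return n_samples * n_samples * bytes_per_entry


for n_qubits in (8, 12, 20):
    print(n_qubits, "qubits:", statevector_bytes(n_qubits), "bytes per statevector")

for n_samples in (100, 1000):
    print(n_samples, "samples:", kernel_pairs(n_samples), "kernel evaluations,",
          kernel_matrix_bytes(n_samples), "bytes for the matrix")
```

The statevector size doubles with every additional PCA component, while the number of kernel evaluations grows quadratically with the number of samples, so both settings have a strong effect on runtime and memory.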

Installation Check

# Check local simulator
python -c "from qiskit_aer import Aer; print('Local simulator OK')"

# Check quantum libraries
python -c "from qiskit_machine_learning.algorithms import QSVC; print('QSVC OK')"
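
The same checks can be run from a single script that also reports installed versions. This is a sketch using only the standard library; the helper names are hypothetical, and the module and distribution names are taken from the install commands in this README.

```python
import importlib
from importlib import metadata


def check_modules(module_names):
    """Try to import each module and record whether it succeeded."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:  # a broken install can raise more than ImportError
            status[name] = False
    return status


def installed_version(dist_name):
    """Installed version of a distribution, or None when it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


modules = ["qiskit", "qiskit_aer", "qiskit_machine_learning.algorithms"]
for module, ok in check_modules(modules).items():
    print(f"{module}: {'OK' if ok else 'MISSING'}")

for dist in ["qiskit", "qiskit-aer", "qiskit-machine-learning"]:
    print(f"{dist}: {installed_version(dist)}")
```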

Academic Context

Research Area

Quantum Machine Learning
Medical Classification
Noise Robustness
Quantum Optimization

Key files to check:

environment.yml - environment configuration
qsvm.py - experiment parameters
Cache files - experiment progress
Result files - detailed logs

License

This project is intended for research and educational purposes. All data comes from publicly available TCGA sources.

Acknowledgments

TCGA for providing genetic data
IBM Qiskit for quantum framework
VAST.AI for cloud platform
Adam Mickiewicz University for research support

Last update: 2025-01-09
Version: 1.0
Status: Ready for experiment execution