Digital Pathology

Multimodal Deep Learning & Spatial Analysis in Ovarian Cancer

2025-05 – 2025-08

Data Science Intern

Regeneron Pharmaceuticals

Weakly Supervised Learning
Image Registration
Python Package
OpenCV
scikit-image

Overview

This project developed a computational pathology framework for investigating molecular signatures in ovarian cancer. The objective was to link unstructured histological image data to patient-level clinical endpoints. The solution combined an automated pipeline for registering serial tissue sections (H&E and IHC) with a downstream analysis workflow that used foundation-model embeddings to correlate morphological features with clinical and molecular attributes: Response, Mismatch Repair (MMR) status, and Tumor Mutational Burden (TMB).

Module 1: Automated Co-Registration Engineering

To enable multimodal analysis, the H&E and IHC slides of each serial section pair had to be accurately aligned. A contour-based registration tool was engineered to handle common challenges such as sectioning distortions and artifacts.

Algorithm Design: The tool fits an affine transformation driven by tissue contours rather than pixel intensity, which avoids overfitting to local warping.
Optimization Logic:
- Metric: Alignment quality was evaluated using Intersection over Union (IoU) and Mean Squared Error (MSE).
- Systematic Shift Search: A circular shift search algorithm was implemented to correct for contour extraction discrepancies and identify the optimal starting alignment.
- Fine-Tuning: Post-processing involved iterative scaling, centroid translation, and rotation searches ($\pm 15^{\circ}$).
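
The shift search above can be illustrated with a minimal sketch. It assumes both tissue contours have already been resampled to the same number of points, and it interprets the circular shift as a cyclic re-indexing of the contour's starting point; that interpretation, the least-squares affine fit, and the names `fit_affine`, `apply_affine`, and `circular_shift_search` are illustrative assumptions, not the internal package's actual code.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine map taking src points to dst points, both (N, 2)."""
    A = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coordinates, (N, 3)
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # (3, 2) affine matrix
    return M

def apply_affine(M, pts):
    """Apply a (3, 2) affine matrix to (N, 2) points."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

def circular_shift_search(moving, fixed):
    """Try every cyclic start index of the moving contour; keep the lowest-MSE fit.

    Returns (mse, shift, affine_matrix).
    """
    best = (np.inf, 0, None)
    for k in range(len(moving)):
        shifted = np.roll(moving, -k, axis=0)      # contour restarted at point k
        M = fit_affine(shifted, fixed)
        residual = apply_affine(M, shifted) - fixed
        mse = float(np.mean(np.sum(residual ** 2, axis=1)))
        if mse < best[0]:
            best = (mse, k, M)
    return best

# Synthetic check: an irregular closed contour, warped and re-indexed.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
r = 1 + 0.3 * np.cos(t) + 0.2 * np.sin(2 * t) + 0.1 * np.cos(3 * t)
fixed = np.c_[r * np.cos(t), r * np.sin(t)]
warp = np.array([[1.1, 0.2], [-0.1, 0.9], [5.0, -3.0]])   # hypothetical affine
moving = np.roll(apply_affine(warp, fixed), 10, axis=0)   # start point offset by 10

mse, shift, M = circular_shift_search(moving, fixed)
print(shift, mse)
```

Because the two contours are extracted independently, their point lists need not start at the same anatomical location; scanning all start indices removes that ambiguity before any finer scaling, translation, or rotation search.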
Performance & Deployment:
- The pipeline achieved a mean IoU of 0.915 across the validation set.
- The optimization step ran in roughly 30 ms, and total processing time per slide pair stayed under 150 seconds.
- The tool was encapsulated into an internal pip-installable Python package to facilitate scalable deployment.
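
For reference, the IoU figure reported above can be computed from binary tissue masks as in the sketch below. It assumes each registered slide pair is available as two boolean NumPy arrays of equal shape; the function names `iou` and `mean_iou` are hypothetical, and treating two empty masks as a perfect match is a convention chosen here, not something stated in the project.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union of two binary masks with identical shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return float(np.logical_and(a, b).sum() / union)

def mean_iou(mask_pairs):
    """Average IoU over an iterable of (fixed_mask, registered_mask) pairs."""
    return float(np.mean([iou(a, b) for a, b in mask_pairs]))

# Toy example: two 4x4 masks that overlap on a single row.
a = np.zeros((4, 4), dtype=bool)
a[0:2, :] = True
b = np.zeros((4, 4), dtype=bool)
b[1:3, :] = True
print(iou(a, b))                      # 4 shared pixels / 12 in the union
print(mean_iou([(a, b), (a, a)]))
```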

Module 2: Foundation Models & Patient-Level Modeling

Following registration, a feature extraction pipeline was implemented to relate tissue morphology to clinical outcomes.

Pipeline Architecture

The analysis workflow processed Whole Slide Images (WSIs) through the following stages:

- Tiling: WSIs were divided into fixed-size patches to handle gigapixel-resolution data.
- Feature Extraction: The CTransPath foundation model served as the feature extractor, generating a high-dimensional embedding for each tissue patch.
- Dimensionality Reduction: UMAP and t-SNE were applied to the extracted features to identify and visualize distinct morphological clusters among the histopathology patches.
- Clinical Correlation: Extracted histopathology features were statistically correlated with clinical attributes, specifically Response, MMR status, and TMB.
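
The tiling and clinical-correlation stages can be sketched as follows. This is a simplified illustration under several assumptions: patch embeddings are taken as given (the CTransPath model itself is not loaded here), patches are mean-pooled into one vector per patient, and the correlation is a plain Pearson coefficient against a numeric label (point-biserial when the label is binary, such as MMR status). The function names and the mean-pooling choice are hypothetical and are not taken from the original pipeline.

```python
import numpy as np

def tile_coords(width, height, tile, stride=None):
    """Top-left (x, y) of every full tile that fits in a width x height image."""
    stride = stride or tile
    return [(x, y)
            for y in range(0, height - tile + 1, stride)
            for x in range(0, width - tile + 1, stride)]

def pool_by_patient(embeddings, patient_ids):
    """Mean-pool patch embeddings (N, D) into one vector per patient."""
    emb = np.asarray(embeddings, dtype=float)
    pid = np.asarray(patient_ids)
    ids = sorted(set(patient_ids))
    pooled = np.stack([emb[pid == p].mean(axis=0) for p in ids])
    return ids, pooled

def feature_label_correlation(pooled, labels):
    """Pearson r between each feature column and a numeric patient label.

    Constant features (zero variance) get r = 0 instead of NaN.
    """
    x = np.asarray(pooled, dtype=float)
    x = x - x.mean(axis=0)
    y = np.asarray(labels, dtype=float)
    y = y - y.mean()
    num = (x * y[:, None]).sum(axis=0)
    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    r = np.zeros_like(num)
    np.divide(num, denom, out=r, where=denom > 0)
    return r

# Toy data: 4 patches from 2 patients, 2-dimensional stand-in embeddings.
coords = tile_coords(width=1000, height=600, tile=256)
ids, pooled = pool_by_patient([[1, 0], [3, 0], [10, 2], [20, 4]],
                              ["A", "A", "B", "B"])
r = feature_label_correlation(pooled, labels=[0, 1])  # hypothetical binary label
print(len(coords), ids, r)
```

With only a handful of patients such correlations are unstable, so in practice each coefficient would be paired with a significance test and a multiple-comparison correction across features.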

Spatial Analysis & Interpretability

Features that correlated strongly with the clinical attributes were projected back onto the original whole slide images to examine their spatial distribution. This reverse mapping made it possible to check whether the statistical correlations coincided with recognizable biological structures.
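
One simple way to perform such a projection is to write each patch's feature value into a grid cell at the patch's position, producing a low-resolution heatmap that can be overlaid on the slide. The sketch below assumes non-overlapping tiles aligned to a regular grid and one scalar value per tile; `feature_heatmap` and its parameters are illustrative names, not the project's API.

```python
import numpy as np

def feature_heatmap(coords, values, tile, width, height):
    """Place one scalar per tile on a (rows, cols) grid; cells without a tile stay NaN."""
    rows = -(-height // tile)   # ceiling division
    cols = -(-width // tile)
    heat = np.full((rows, cols), np.nan)
    for (x, y), v in zip(coords, values):
        heat[y // tile, x // tile] = v
    return heat

# Three tissue tiles on a 512 x 512 region; the fourth cell is background.
coords = [(0, 0), (256, 0), (0, 256)]
values = [0.1, 0.9, 0.5]        # e.g. one selected embedding dimension per tile
heat = feature_heatmap(coords, values, tile=256, width=512, height=512)
print(heat)
```

Leaving background cells as NaN lets a plotting library render them as transparent, so only tissue regions are colored when the heatmap is drawn over a slide thumbnail.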