ML/DL

Toxic Comment Detection

2025-01 – 2025-05

ML Engineer

Harvard University

BERT
DeBERTa
TensorFlow

Overview

Built a reproducible pipeline for detecting toxic social media content, focusing on efficiency-accuracy trade-offs among several transformer architectures.

Key highlights

Benchmarked BERT, DeBERTa-v3, and DistilBERT using automated SLURM jobs with fault-tolerant checkpointing.
Handled severe class imbalance using stratified splits and Dice loss, achieving a 99.49% validation AUPRC.
Reduced false negatives by 12% through optimized preprocessing and normalization of noisy social media tokens.
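The Dice loss named in the highlights can be illustrated with a minimal sketch of one common soft Dice formulation. It assumes a single binary toxic/non-toxic label and model outputs already passed through a sigmoid. The function name and the `smooth` term are illustrative assumptions. The project's actual variant (for example, a squared denominator or per-label averaging) is not stated.

```python
from typing import Sequence


def soft_dice_loss(
    y_true: Sequence[float], y_prob: Sequence[float], smooth: float = 1.0
) -> float:
    """Soft Dice loss: 1 - (2 * sum(p * y) + s) / (sum(p) + sum(y) + s).

    Hypothetical sketch for a single binary label; `smooth` (s) keeps the
    ratio defined when a batch has no positives and no predicted mass.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    # Overlap between predicted probability mass and true positives.
    intersection = sum(t * p for t, p in zip(y_true, y_prob))
    # Total predicted mass plus total true positives.
    denom = sum(y_prob) + sum(y_true)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)
```

True negatives (y = 0, p near 0) contribute essentially nothing to either the numerator or the denominator. As a result, the abundant non-toxic comments cannot dominate the loss the way they can in unweighted cross-entropy.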
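Stratified splitting keeps the rare toxic class at roughly the same share in the training and validation sets. In practice this is usually done with scikit-learn's `train_test_split(..., stratify=labels)`. The sketch below shows the same idea in standard-library Python and assumes one label per comment. A multi-label setup would need a scheme such as iterative stratification instead. The function name and defaults are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], val_frac: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, val_idx) with each label's share preserved per split."""
    if not 0.0 < val_frac < 1.0:
        raise ValueError("val_frac must be in (0, 1)")
    # Group example indices by their label.
    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_idx: List[int] = []
    val_idx: List[int] = []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        if len(idxs) < 2:
            train_idx.extend(idxs)  # too rare to split; keep for training
            continue
        n_val = round(len(idxs) * val_frac)
        n_val = min(len(idxs) - 1, max(1, n_val))  # at least one on each side
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)
```

With 90 non-toxic and 10 toxic examples and `val_frac=0.2`, the validation set receives 18 non-toxic and 2 toxic comments. This preserves the 10% positive rate, whereas an unstratified random split could by chance leave the validation set with almost no positives.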
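The highlights do not list the exact normalization rules. The following is therefore a hypothetical sketch of rules commonly applied to noisy social media text:

- placeholder tokens for URLs and @-mentions,
- capping letter elongations at two repeats,
- collapsing runs of whitespace.

The placeholder strings `<url>` and `<user>` and every regex here are assumptions, not the project's rules.

```python
import re

# Hypothetical normalization rules for noisy social media comments.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")          # skips emails like a@b.com
ELONGATION_RE = re.compile(r"([^\W\d_])\1{2,}")  # 3+ repeats of a letter only
WHITESPACE_RE = re.compile(r"\s+")


def normalize_comment(text: str) -> str:
    """Replace URLs/mentions with placeholders, shorten elongations, tidy spaces."""
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    text = ELONGATION_RE.sub(r"\1\1", text)  # "sooooo" -> "soo"; digits untouched
    text = WHITESPACE_RE.sub(" ", text)
    return text.strip()
```

For example, `normalize_comment("Check   https://example.com/x NOW @bob you are sooooo dumb!!!")` returns `"Check <url> NOW <user> you are soo dumb!!!"`. Casing and punctuation are deliberately left alone in this sketch, because a cased tokenizer can treat all-caps shouting or repeated exclamation marks as a toxicity signal.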
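Fault-tolerant checkpointing for SLURM jobs typically follows a resume-from-last-checkpoint pattern, sketched minimally below. The sketch assumes training state small enough to serialize as JSON. In a TensorFlow project, the real model and optimizer state would more likely go through `tf.train.Checkpoint` with `tf.train.CheckpointManager`. The functions and the `stop_after` argument, which simulates a preemption or wall-clock timeout, are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    """Write state atomically so a killed job never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path: str) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def train(path: str, total_epochs: int, stop_after: Optional[int] = None) -> int:
    """Resume from the last completed epoch; stop_after simulates preemption."""
    state = load_checkpoint(path) or {"epoch": 0}
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch >= stop_after:
            break  # simulated SLURM preemption or time limit
        # ... one epoch of training would run here ...
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)  # checkpoint after every completed epoch
    return state["epoch"]
```

Writing to a temporary file and then calling `os.replace` means a job killed mid-write leaves the previous checkpoint intact. A requeued job therefore resumes from the last completed epoch instead of starting over.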