ML/DL
Toxic Comment Detection
- BERT
- DeBERTa
- TensorFlow
Overview
Built a reproducible pipeline to detect toxic social media content, focusing on the efficiency-accuracy trade-offs across transformer architectures.
Key highlights
- Benchmarked BERT, DeBERTa-v3, and DistilBERT using automated SLURM jobs with fault-tolerant checkpointing.
- Handled severe class imbalance with stratified splits and Dice loss, reaching a validation AUPRC of 99.49%.
- Reduced false negatives by 12% through optimized preprocessing and normalization of noisy social media tokens.
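To make the Dice loss mentioned in the highlights concrete, here is a minimal sketch of a soft Dice loss for binary labels in plain Python. It is an illustration, not the project's actual TensorFlow code: the function name `soft_dice_loss` and the `smooth` parameter are assumptions, and the real pipeline may have used a different variant.

```python
def soft_dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss for binary labels.

    loss = 1 - (2 * sum(p * y) + smooth) / (sum(p) + sum(y) + smooth)

    probs  -- predicted probabilities of the positive (toxic) class
    labels -- ground-truth labels, 0 or 1
    smooth -- hypothetical smoothing term that avoids division by zero
              when a batch contains no positives
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")

    # Overlap between predictions and the positive class.
    intersection = sum(p * y for p, y in zip(probs, labels))
    # Total predicted mass plus total number of positives.
    total = sum(probs) + sum(labels)

    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)


if __name__ == "__main__":
    labels = [1, 0, 0, 0]  # one toxic comment among four
    print(soft_dice_loss([1.0, 0.0, 0.0, 0.0], labels))  # perfect prediction
    print(soft_dice_loss([0.0, 0.0, 0.0, 0.0], labels))  # misses the toxic comment
```

Because the loss is driven by overlap with the positive class, a model that predicts "not toxic" for everything is still penalized when a batch contains even one toxic comment, which is why this family of losses is often tried on imbalanced data.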