ML/DL

Toxic Comment Detection

  • BERT
  • DeBERTa
  • TensorFlow
Overview

Built a reproducible pipeline to detect toxic social media content, focusing on the efficiency-accuracy trade-offs across transformer architectures.

Key highlights
  • Benchmarked BERT, DeBERTa-v3, and DistilBERT using automated SLURM jobs with fault-tolerant checkpointing.
  • Handled severe class imbalance using stratified splits and Dice Loss, achieving a validation AUPRC of 99.49%.
  • Reduced false negatives by 12% through optimized preprocessing and normalization of noisy social media tokens.
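
As a rough illustration of two of the highlights above, the sketch below shows one way noisy social media tokens might be normalized and how a soft Dice loss can be computed for binary labels. It is a minimal, framework-free sketch, not the project's actual code: the function names `normalize_text` and `soft_dice_loss`, the placeholder tokens `<url>` and `<user>`, the specific normalization rules, and the `smooth` default are all assumptions made for the example.

```python
import re

# Hypothetical normalization rules; the project's real rules are not specified.
_URL = re.compile(r"https?://\S+|www\.\S+")
_MENTION = re.compile(r"@\w+")
_REPEAT = re.compile(r"(.)\1{2,}")  # three or more repeats of one character
_SPACE = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Lowercase, mask URLs and mentions, and squeeze elongated characters."""
    text = text.lower()
    text = _URL.sub("<url>", text)       # replace links with a placeholder
    text = _MENTION.sub("<user>", text)  # replace @handles with a placeholder
    text = _REPEAT.sub(r"\1\1", text)    # "sooooo" -> "soo", "!!!" -> "!!"
    return _SPACE.sub(" ", text).strip()


def soft_dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss for binary labels: 1 - (2*|P.Y| + s) / (|P| + |Y| + s).

    probs  : predicted probabilities of the positive (toxic) class
    labels : ground-truth labels, 0 or 1
    smooth : assumed smoothing constant that avoids division by zero
    """
    intersection = sum(p * y for p, y in zip(probs, labels))
    denom = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


if __name__ == "__main__":
    print(normalize_text("@Bob you are sooooo DUMB!!! http://x.co/abc"))
    print(soft_dice_loss([0.9, 0.1, 0.2], [1, 0, 0]))
```

Because the Dice term is driven by overlap with the positive class rather than by per-example accuracy, the large number of easy negatives contributes little to it, which is why it is often preferred over plain cross-entropy on heavily imbalanced data.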