Hello! I am a Ph.D. student at Harvard University, advised by Flavio du Pin Calmon. My expected graduation date is May 2026! I completed my undergraduate degree in Math and Computer Science at NYU Courant and have interned at Citadel LLC and Meta.

My research interests lie in Responsible and Trustworthy Machine Learning, and my work spans LLM watermarking, algorithmic fairness, predictive multiplicity, and more. I study the impact of ML algorithms on various domains of society and on different (exponentially many) groups of people, using tools and frameworks from Information Theory, Probability, and Statistics. I am always open to collaborations and can be reached via email!

Publications

  • HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions
    Dor Tsur*, Carol Xuan Long*, Claudio M. Verdun, Hsiang Hsu, Chen-Fu Chen, Haim Permuter, Sajani Vithana, Flavio P. Calmon
    Under Review, 2025.
    TL;DR

    Our goal is to design watermarks that optimally use side information to maximize detection accuracy and minimize distortion of the generated text. We propose two watermarks, **HeavyWater** and **SimplexWater**, that achieve SOTA performance. Our theoretical analysis also reveals surprising new connections between LLM watermarking and **coding theory**.
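
    As a rough illustration of the detection side (a generic score-based template, not our actual scheme; the hash-based score and threshold below are assumptions): the detector shares a secret key with the generator, recomputes a pseudorandom per-token score, and runs a one-sided z-test.

    ```python
    import hashlib
    import math
    from statistics import NormalDist

    def token_score(key: bytes, context: tuple, token: int) -> float:
        """Pseudorandom score in [0, 1) derived from the shared key and local context."""
        digest = hashlib.sha256(key + repr((context, token)).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def is_watermarked(key: bytes, tokens: list[int], window: int = 4,
                       alpha: float = 1e-3) -> bool:
        """One-sided z-test: without a watermark, scores behave like i.i.d. Uniform(0, 1)."""
        scores = [token_score(key, tuple(tokens[max(0, t - window):t]), tokens[t])
                  for t in range(len(tokens))]
        n = len(scores)
        z = (sum(scores) / n - 0.5) / math.sqrt(1 / (12 * n))  # Var of Uniform(0,1) is 1/12
        return z > NormalDist().inv_cdf(1 - alpha)
    ```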

  • Optimized Couplings for Watermarking Large Language Models (slides)
    Carol Xuan Long*, Dor Tsur*, Claudio M. Verdun, Hsiang Hsu, Haim Permuter, Flavio P. Calmon
    IEEE International Symposium on Information Theory (ISIT), 2025.
    TL;DR

    We argue that a key component in watermark design is generating a coupling between the side information shared with the watermark detector and a random partition of the LLM vocabulary. Our analysis identifies the optimal coupling and randomization strategy under the worst-case LLM next-token distribution that satisfies a min-entropy constraint. We propose the **Correlated-Channel watermarking scheme**, a closed-form scheme that achieves high detection power at zero distortion.
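
    A toy version of the coupling idea (a simple maximal coupling over a 2-way pseudorandom split; our optimized scheme differs): the shared key yields a side bit and a vocabulary partition, and the token is sampled so that its marginal stays exactly the model's next-token distribution while its bin agrees with the side bit as often as possible. The detector, who knows the key, counts bin/side-bit agreements.

    ```python
    import random

    def watermark_step(p: dict[int, float], key_rng: random.Random,
                       samp_rng: random.Random) -> tuple[int, bool]:
        """Sample one token from next-token distribution p, coupled with a side bit."""
        bins = {v: key_rng.random() < 0.5 for v in p}  # pseudorandom vocabulary partition
        s = key_rng.random() < 0.5                     # uniform side bit, shared with detector
        q = sum(pr for v, pr in p.items() if bins[v])  # p-mass of the "True" bin
        # Maximal coupling of the side bit with the token's bin; averaging over s,
        # P(bin is True) stays exactly q, so the token's marginal distribution is p.
        p_true = min(1.0, 2 * q) if s else max(0.0, 2 * q - 1.0)
        b = samp_rng.random() < p_true
        toks, ws = zip(*[(v, pr) for v, pr in p.items() if bins[v] == b])
        return samp_rng.choices(toks, weights=ws)[0], s
    ```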

  • Kernel Multiaccuracy (slides)
    Carol Xuan Long, Wael Alghamdi, Alexander Glynn, Yixuan Wu, Flavio P. Calmon
    Foundations of Responsible Computing (FORC), 2025.
    TL;DR

    We connect multi-group fairness notions with *Integral Probability Metrics* and propose **KMAcc**, a non-iterative, one-step optimization that corrects multiaccuracy errors in kernel space.
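
    A schematic of what a single kernel-space correction step can look like (the RBF kernel and step size below are illustrative assumptions, not the exact KMAcc update): smooth the base model's residuals through the kernel and add the result back in one step.

    ```python
    import numpy as np

    def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float = 1.0) -> np.ndarray:
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq_dists)

    def one_step_kernel_correction(X, y, f_scores, eta=0.5, gamma=1.0):
        """Single non-iterative update: f + eta * (kernel-smoothed residuals)."""
        residuals = y - f_scores            # where the base model is miscalibrated
        update = rbf_kernel(X, X, gamma) @ residuals / len(y)
        return np.clip(f_scores + eta * update, 0.0, 1.0)
    ```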

  • Predictive Churn with the Set of Good Models
    Jamelle Watson-Daniels, Flavio P. Calmon, Alexander D’Amour, Carol Xuan Long, David C. Parkes, Berk Ustun
    Under Review, 2024.
    TL;DR

    We study predictive churn, i.e., flips in predictions across ML model updates, through the lens of predictive multiplicity: the prevalence of conflicting predictions over the set of near-optimal models (the ε-Rashomon set).
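
    Both quantities are straightforward to estimate on held-out data (illustrative helper names, following the standard definitions):

    ```python
    import numpy as np

    def churn(pred_old: np.ndarray, pred_new: np.ndarray) -> float:
        """Fraction of examples whose predicted label flips after a model update."""
        return float(np.mean(pred_old != pred_new))

    def ambiguity(pred_base: np.ndarray, rashomon_preds: list[np.ndarray]) -> float:
        """Fraction of examples flipped by at least one model in the ε-Rashomon set."""
        flipped = np.zeros(pred_base.shape, dtype=bool)
        for preds in rashomon_preds:        # predictions of each near-optimal model
            flipped |= preds != pred_base
        return float(np.mean(flipped))
    ```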

  • Multi-Group Proportional Representation in Retrieval
    Alex Oesterling, Claudio M. Verdun, Carol Xuan Long, Alexander Glynn, Lucas Monteiro Paes, Sajani Vithana, Martina Cardone, Flavio P. Calmon
    Advances in Neural Information Processing Systems (NeurIPS), 2024.
    TL;DR

    We introduce Multi-Group Proportional Representation (MPR), a novel metric that measures representation across intersectional groups. We propose practical methods and algorithms for estimating and ensuring MPR in image retrieval, with minimal compromise in retrieval accuracy.
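
    For intuition, a simplified stand-in for the metric (MPR itself takes a supremum over a richer function class than the indicator groups enumerated here): compare each intersectional group's share of the retrieved set against its share of a reference population.

    ```python
    import itertools
    import numpy as np

    def representation_gap(retrieved: np.ndarray, population: np.ndarray) -> float:
        """Worst-case gap in group shares; rows are items, columns are binary attributes."""
        k = population.shape[1]
        worst = 0.0
        for combo in itertools.product([0, 1], repeat=k):  # all 2^k intersectional groups
            share_ret = np.mean((retrieved == combo).all(axis=1))
            share_pop = np.mean((population == combo).all(axis=1))
            worst = max(worst, abs(share_ret - share_pop))
        return worst
    ```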

  • Individual Arbitrariness and Group Fairness
    Carol Xuan Long, Hsiang Hsu, Wael Alghamdi, Flavio P. Calmon
    Advances in Neural Information Processing Systems (NeurIPS), 2023, Spotlight Paper.
    TL;DR

    Fairness interventions in machine learning optimized solely for group fairness and accuracy can exacerbate predictive multiplicity. A third axis of "arbitrariness" should be considered when deploying models to aid decision-making in applications with individual-level impact.

  • On the epistemic limits of personalized prediction
    Lucas Monteiro Paes*, Carol Xuan Long*, Berk Ustun, Flavio P. Calmon (* Equal Contribution)
    Advances in Neural Information Processing Systems (NeurIPS), 2022.
    TL;DR

    It is impossible to reliably verify that a personalized classifier with $k \geq 19$ binary group attributes will benefit every group that provides personal data, even using a dataset of $n = 8 \times 10^9$ samples, one for each person in the world.
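
    Back-of-envelope intuition (not the paper's formal argument): with $k = 19$ binary attributes there are $2^{19}$ intersectional groups, so even a dataset covering the entire world population averages only about $1.5 \times 10^4$ samples per group, and the smallest groups can be far smaller or empty:

    $$2^{19} = 524{,}288 \quad\Rightarrow\quad \frac{8 \times 10^{9}}{2^{19}} \approx 1.5 \times 10^{4} \ \text{samples per group on average.}$$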

Misc

Outside of work, I am a globetrotter, dancer, and music lover. Having grown up as a swimmer, I enjoy sports; between completing a half-marathon and recovering from an ACL injury, I have many stories to tell, for better or worse. Of course, I also love cooking Cantonese/Singaporean food and reading away in the comfort of home!