Hello! I am a final-year Ph.D. candidate at Harvard University, advised by Flavio du Pin Calmon. I develop robust and reliable solutions that enable trustworthy adoption of AI across critical domains. I am currently on the job market for positions starting in September 2026, with a preference for opportunities in Europe.

My research focuses on reliable, responsible, and trustworthy Machine Learning, and my work spans GenAI Agents, supply chain management, LLM watermarking, algorithmic fairness, multiplicity, and more. I use tools and frameworks from Optimization, Information Theory, Probability, and Statistics. I am open to collaborations and can be reached via email!

I received my undergraduate degree in Math and Computer Science from NYU Courant, and my industry experience includes Citadel LLC and Meta.

Publications

  • HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions
    Dor Tsur*, Carol Xuan Long*, Claudio M. Verdun, Hsiang Hsu, Chen-Fu Chen, Haim Permuter, Sajani Vithana, Flavio P Calmon
    Under Review, 2025.
    TL/DR

    Our goal is to design watermarks that optimally use side information to maximize detection accuracy and minimize distortion of the generated text. We propose two watermarks, **HeavyWater** and **SimplexWater**, that achieve SOTA performance. Our theoretical analysis also reveals surprising new connections between LLM watermarking and **coding theory**.

  • Optimized Couplings for Watermarking Large Language Models (slides)
    Carol Xuan Long*, Dor Tsur*, Claudio M. Verdun, Hsiang Hsu, Haim Permuter, Flavio P Calmon
    IEEE International Symposium on Information Theory (ISIT), 2025.
    TL/DR

    We argue that a key component in watermark design is generating a coupling between the side information shared with the watermark detector and a random partition of the LLM vocabulary. Our analysis identifies the optimal coupling and randomization strategy under the worst-case LLM next-token distribution satisfying a min-entropy constraint. We propose the **Correlated-Channel watermarking scheme**, a closed-form scheme that achieves high detection power at zero distortion.
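
    To make the coupling idea concrete, below is a minimal, hypothetical sketch of a generic partition-based detector (in the spirit of common green-list schemes, not the Correlated-Channel scheme itself; all function names and parameters are illustrative). The shared key seeds a pseudorandom split of the vocabulary, and detection scores how often tokens land in the keyed half.

    ```python
    import numpy as np

    def partition_vocab(key: int, vocab_size: int) -> np.ndarray:
        # Pseudorandomly split the vocabulary in half, seeding with the
        # shared key (the side information available to the detector).
        rng = np.random.default_rng(key)
        mask = np.zeros(vocab_size, dtype=bool)
        mask[rng.choice(vocab_size, size=vocab_size // 2, replace=False)] = True
        return mask  # True marks the "keyed" half

    def detection_zscore(tokens: list[int], key: int, vocab_size: int) -> float:
        # Under the null (unwatermarked text), each token lands in the keyed
        # half with probability 1/2, so hits ~ Binomial(n, 1/2).
        keyed = partition_vocab(key, vocab_size)
        hits = int(sum(keyed[t] for t in tokens))
        n = len(tokens)
        return (hits - n / 2) / (n / 4) ** 0.5
    ```

    A watermarked generator would bias sampling toward the keyed half, so watermarked text yields a large z-score while ordinary text does not.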

  • Kernel Multiaccuracy (slides)
    Carol Xuan Long, Wael Alghamdi, Alexander Glynn, Yixuan Wu, Flavio P Calmon
    Foundations of Responsible Computing (FORC), 2025.
    TL/DR

    We connect multi-group fairness notions with *Integral Probability Metrics* and propose **KMAcc**, a non-iterative, one-step optimization that corrects multiaccuracy errors in kernel space.
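
    A rough sketch of the flavor of such a one-step kernel correction (my simplification with an assumed RBF kernel and step size, not the paper's exact KMAcc procedure): take the RKHS "witness" function most correlated with the residuals and nudge the predictions along it.

    ```python
    import numpy as np

    def rbf_kernel(X, Y, gamma=1.0):
        # Gaussian RBF kernel matrix between the rows of X and the rows of Y.
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def kernel_multiaccuracy_step(X, y, preds, gamma=1.0, eta=0.5):
        # The witness h(.) = (1/n) sum_i r_i k(x_i, .) points in the direction
        # of the largest multiaccuracy violation; evaluate it at the data and
        # shift the predictions a step eta along it.
        r = y - preds                 # residuals visible to the auditor
        K = rbf_kernel(X, X, gamma)
        h = K @ r / len(r)            # witness evaluated at each x_j
        return np.clip(preds + eta * h, 0.0, 1.0)
    ```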

  • Predictive Churn with the Set of Good Models
    Jamelle Watson-Daniels, Flavio P Calmon, Alexander D’Amour, Carol Xuan Long, David C. Parkes, Berk Ustun
    Under Review, 2024.
    TL/DR

    We study the effect of predictive churn (flips in predictions across ML model updates) through the lens of predictive multiplicity, i.e., the prevalence of conflicting predictions over the set of near-optimal models (the ε-Rashomon set).
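
    For concreteness, hypothetical one-liners for the two quantities (array names and shapes are mine, not the paper's code):

    ```python
    import numpy as np

    def churn(preds_old: np.ndarray, preds_new: np.ndarray) -> float:
        # Fraction of examples whose predicted label flips across a model update.
        return float(np.mean(preds_old != preds_new))

    def ambiguity(rashomon_preds: np.ndarray) -> float:
        # Multiplicity over an (already enumerated) epsilon-Rashomon set:
        # fraction of points on which near-optimal models disagree.
        # rashomon_preds: 0/1 labels with shape (num_models, num_points).
        return float(np.mean(rashomon_preds.min(axis=0) != rashomon_preds.max(axis=0)))
    ```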

  • Multi-Group Proportional Representation in Retrieval
    Alex Oesterling, Claudio M Verdun, Carol Xuan Long, Alexander Glynn, Lucas Monteiro Paes, Sajani Vithana, Martina Cardone, Flavio P Calmon
    Advances in Neural Information Processing Systems (NeurIPS), 2024.
    TL/DR

    We introduce Multi-Group Proportional Representation (MPR), a novel metric that measures representation across intersectional groups. We propose practical methods and algorithms for estimating and ensuring MPR in image retrieval, with minimal compromise in retrieval accuracy.
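
    As a toy illustration (the paper measures representation over a richer function class; restricting to a finite set of explicit group-indicator columns is my simplification):

    ```python
    import numpy as np

    def mpr_gap(retrieved: np.ndarray, population: np.ndarray) -> float:
        # retrieved / population: 0-1 matrices of shape (num_items, num_groups),
        # where column g marks membership in (possibly intersectional) group g.
        # The gap is the worst-case deviation between each group's share of the
        # retrieved items and its share of the reference population.
        gaps = np.abs(retrieved.mean(axis=0) - population.mean(axis=0))
        return float(gaps.max())
    ```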

  • Individual Arbitrariness and Group Fairness
    Carol Xuan Long, Hsiang Hsu, Wael Alghamdi, Flavio P Calmon
    Advances in Neural Information Processing Systems (NeurIPS), 2023, Spotlight Paper.
    TL/DR

    Fairness interventions in machine learning optimized solely for group fairness and accuracy can exacerbate predictive multiplicity. A third axis, "arbitrariness", should be considered when deploying models to aid decision-making in applications with individual-level impact.

  • On the epistemic limits of personalized prediction
    Lucas Monteiro Paes*, Carol Xuan Long*, Berk Ustun, Flavio P Calmon (* Equal Contribution)
    Advances in Neural Information Processing Systems (NeurIPS), 2022.
    TL/DR

    It is impossible to reliably verify that a personalized classifier with $k \geq 19$ binary group attributes will benefit every group that provides personal data, even using a dataset of $n = 8 \times 10^9$ samples, one for each person in the world.
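
    Back-of-the-envelope intuition (the paper's $k \geq 19$ threshold comes from a hypothesis-testing argument, not this simple count): with $k$ binary attributes there are $2^k$ intersectional groups, so group-level verification quickly outruns any dataset.

    $$
    2^{19} = 524{,}288 \text{ groups}, \qquad 2^{33} \approx 8.6 \times 10^{9} \; > \; 8 \times 10^{9} = n.
    $$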

Misc

Outside of work, I am a globetrotter, dancer, and music-lover. Having grown up as a swimmer, I enjoy sports. From completing a half-marathon to recovering from an ACL injury, I’ve collected many stories to tell (for better or worse!). Whenever I can, I head outdoors: my top three U.S. national parks are Yellowstone, the Grand Canyon, and Mount Rainier.