Hello! I am a 4th-year Ph.D. student at Harvard, advised by Flavio du Pin Calmon. My expected graduation date is May 2026! I completed my undergraduate degree in Math and Computer Science at NYU Courant and I interned at Citadel LLC and Meta.

My research interests lie in Responsible and Trustworthy Machine Learning, and my work spans LLM watermarking (ongoing work!), algorithmic fairness, multiplicity, and more. I contemplate the impacts of ML algorithms on various domains of society for different (exponentially many) groups of people. I use tools and frameworks from Information Theory, Probability, and Statistics. I am always open to collaborations and can be reached via email!


  • Kernel Multiaccuracy
    Carol Xuan Long, Wael Alghamdi, Alexander Glynn, Yixuan Wu, Flavio P Calmon
    Under Review, 2024.

  • Predictive Churn with the Set of Good Models
    Jamelle Watson-Daniels, Flavio P Calmon, Alexander D’Amour, Carol Xuan Long, David C. Parkes, Berk Ustun
    Under Review, 2024.
    TL/DR: We study predictive churn (flips in predictions across ML model updates) through the lens of predictive multiplicity, i.e., the prevalence of conflicting predictions over the set of near-optimal models (the ε-Rashomon set).

  • Multi-Group Proportional Representation in Retrieval
    Alex Osterling, Claudio Mayrink Verdun, Carol Xuan Long, Alexander Glynn, Lucas Monteiro Paes, Sajani Vithana, Martina Cardone, Flavio P Calmon
    Advances in Neural Information Processing Systems (NeurIPS), 2024.
    TL/DR: We introduce Multi-Group Proportional Representation (MPR), a novel metric that measures representation across intersectional groups. We propose practical methods and algorithms for estimating and ensuring MPR in image retrieval, with minimal compromise in retrieval accuracy.

  • Individual Arbitrariness and Group Fairness
    Carol Xuan Long, Hsiang Hsu, Wael Alghamdi, Flavio P Calmon
    Advances in Neural Information Processing Systems (NeurIPS), 2023, Spotlight Paper.
    TL/DR: Fairness interventions in machine learning optimized solely for group fairness and accuracy can exacerbate predictive multiplicity. A third axis of “arbitrariness” should be considered when deploying models to aid decision-making in applications of individual-level impact.

  • On the epistemic limits of personalized prediction
    Lucas Monteiro Paes*, Carol Long*, Berk Ustun, Flavio Calmon (* Equal Contribution)
    Advances in Neural Information Processing Systems (NeurIPS), 2022
    TL/DR: It is impossible to reliably verify that a personalized classifier with $k \geq 19$ binary group attributes will benefit every group that provides personal data using a dataset of $n = 8 \times 10^9$ samples, one for each person in the world.


Outside of work, I am a pianist and dancer with a deep appreciation for all art forms, especially classical music and ballet/contemporary dance. Having grown up as a swimmer, I enjoy sports. Between completing a half-marathon and recovering from an ACL injury, I have, for better or worse, many stories to tell. Of course, I also love cooking Chinese/Singaporean food and reading (Antifragile is my recent favorite!) in the comfort of home!