MLOps Research & Development Innovation

With more than 50 years of combined industry and academic experience in AI and ML operations, Arthur is the only company to adopt a research-led approach to product development.

Our expert researchers and experimental approach drive exclusive capabilities in computer vision, NLP, bias mitigation, and other critical areas.

Together, we can shape the future of model operations while optimizing ML models for accuracy, explainability, and fairness to ensure compliance in highly regulated industries.

From the lab to the boardroom, we partner with data scientists, ML directors, and AI Center of Excellence leaders around the world to launch real-world solutions. As enterprises progress along their AI maturity journey, we share research insights, develop whiteboard ideas into practice, promote best practices, benchmark industry metrics, and contribute to thought leadership.


University Research Experience

Carnegie Mellon University
University of Washington
University of Texas
University of Maryland
Brown University
Columbia University
Cornell University
Georgetown University
Harvard University
New York University
Northwestern University
UC Berkeley

Team Members

John Dickerson

Chief Scientist & Co-Founder

John is co-founder and Chief Scientist at Arthur, the AI performance monitoring company, as well as a tenured professor of Computer Science at the University of Maryland. His research centers on solving practical economic problems using techniques from computer science, stochastic optimization, and machine learning. He received his PhD in computer science from Carnegie Mellon University (SCS CSD PhD '16).

Keegan Hines

VP of Machine Learning

Keegan is the Vice President of Machine Learning at Arthur and an Adjunct Professor at Georgetown University. He completed his PhD at the University of Texas in the lab of Rick Aldrich, where he focused on bringing powerful statistical and computational methods to bear on the study of protein biophysics. He is broadly interested in how machine learning can be used in a reliable and trustworthy way.

Max Cembalest

Machine Learning Engineer

Max is a researcher at Arthur focused on simplifying and explaining machine learning models. Previously, he received an M.S. in Data Science from Harvard University, where he concentrated on interpretability and graph-based models. He is particularly excited about recent advances in applying abstract algebra, topology, and category theory to neural network design.

Jessica Dai

Machine Learning Engineer

Jessica is a first-year PhD student in Computer Science at UC Berkeley, co-advised by Nika Haghtalab and Ben Recht. She previously spent two years at Arthur in engineering, research, and miscellaneous other roles, and received an Sc.B. in Computer Science from Brown University.

Teresa Datta

Machine Learning Engineer

Teresa is a researcher at Arthur interested in the transparency and social impact of algorithmic systems from a human-centered lens. She is interested in use-case evaluations of tools for AI transparency and in context-based mechanisms for accountability. Previously, she worked on XAI and HCI projects while completing her M.S. in Data Science at Harvard University.

Valentine d’Hauteville

Machine Learning Engineer

Valentine is a researcher at Arthur and is currently interested in data-centric approaches to improving model performance, as well as algorithmic and design approaches to making AI broadly usable. She comes from a data science background. She recently completed a master's in Computer Science at Columbia University and holds an undergraduate degree in Physics from UPenn.

Daniel Nissani

Machine Learning Engineer

Daniel is a researcher at Arthur interested in the ethical design and implementation of machine learning systems. Previously, he worked on synthetic data generation, specifically for unstructured text, at Gretel. He received a dual master's degree in Information Systems and Applied Information Sciences from Cornell Tech and a B.S. in Mathematics and Secondary Education from Northwestern University.

Avi Schwarzschild

Machine Learning Engineer

Avi is a research fellow at Arthur and a fifth-year PhD student in the Applied Math and Scientific Computation program at the University of Maryland. His work at Arthur focuses on explainability tools for neural networks. At the University of Maryland, he is advised by Tom Goldstein on his work in deep learning. His general interests range from security to generalization and interpretability, and he is working to expand our understanding of when and why neural networks work.

ML Research

Arthur offers enterprise-grade monitoring of models. Some aspects of monitoring are well understood, industry standard, and “from the book.” Yet, much of what we do—scalable very-high-dimensional drift detection, understanding the context in which fair machine learning should be offered (if at all), explainability for novel model types and input data types, understanding what robustness means, interaction with existing or future legal frameworks, and so on—necessitates deep interaction with the academic and policy communities. Toward that end, since our inception, our Research Fellows program has recruited and curated relationships with top AI, ML, policy, and legal junior researchers, who spend a summer or semester with Arthur building toward a joint goal of public dissemination of a research result. If you are a strong junior researcher interested in shaping the trustworthy and performant AI space, get in touch at

Michelle Bao

Michelle conducts research on interdisciplinary theories of AI ethics and on practical tools for fairness, and hopes to better understand how each might inform the other. In addition to her time at Arthur, she has enjoyed doing research with various organizations, including the Stanford NLP Group, the ACLU, and the Stanford ML Group, and teaching and designing curricula for CS classes at Stanford.

Naveen Durvasula

Naveen is an undergraduate at UC Berkeley. His research interests lie broadly at the intersection of theoretical computer science, machine learning, and economics. In particular, he's excited about applications of learning to mechanism design and new economic paradigms for data exchange. Naveen has worked on projects with applications in kidney exchange, e-commerce, matching theory, theoretical statistics, fairness, and machine learning operations. He has collaborated with researchers at the University of Maryland, UC Berkeley, and Harvard University.

Lizzie Kumar

Lizzie Kumar is a Ph.D. candidate in Computer Science at Brown University. Her research analyzes computational and regulatory strategies for evaluating machine learning models from an interdisciplinary perspective. Previously, she developed actuarial risk models on the Data Science team at MassMutual. Lizzie holds an M.S. in Computer Science from the University of Massachusetts at Amherst and a B.A. in Mathematics from Scripps College.

Kweku Kwegyir-Aggrey

Kweku is broadly interested in machine learning and statistics, with a specific focus on the design of algorithms that audit machine learning models for fairness and robustness. He is interested in questions that rigorously examine and critique data-driven technological solutionism. He is a PhD candidate in the Brown University Department of Computer Science and received his bachelor's degree in Computer Science & Mathematics from the University of Maryland.

Sahil Verma

Sahil is a PhD student in the Department of Computer Science and Engineering at the University of Washington, Seattle. He is interested in answering questions related to explainability and fairness in ML models. In the past, Sahil has worked on developing novel techniques to generate counterfactual explanations for ML classifiers and also spearheaded a team that wrote a large and comprehensive survey paper on counterfactual explanations. Currently, Sahil is interested in problems of explainability in recommender systems and fairness in LLMs.

Publication Library

Achieving Downstream Fairness with Geometric Repair

In this work, we propose a preliminary approach to the problem of producing fair probabilities such that fairness can be guaranteed for downstream users of the model, which we term all-threshold fairness.

Amortized Generation of Sequential Counterfactuals for Black Box Models

We propose a novel stochastic-control-based approach for generating sequential Algorithmic Recourses (ARs) that is model-agnostic and treats the model as a black box.

Counterfactual Explanations for Machine Learning: Challenges Revisited

Leveraging recent work outlining desirable properties of counterfactual explanations (CFEs) and our experience running the ML wing of a model monitoring startup, we identify outstanding obstacles hindering CFE deployment in industry.

Counterfactual Explanations for Machine Learning: A Review

NeurIPS 2020 Workshop on ML Retrospectives, Best Paper Award.
Modern approaches to counterfactual explainability in machine learning draw connections to established legal doctrine in many countries, making them appealing for fielded systems in high-impact areas such as finance and healthcare.

From Publishing to Practice: Bringing AI Model Monitoring to a Healthcare Setting (FAccT 2021)

The FATE and robustness in AI/ML communities continue to develop techniques for measuring and partially mitigating forms of bias. Yet, translation of those techniques to “boots on the ground” healthcare settings comes with challenges.

Tensions Between the Proxies of Human Values in AI

Over the past decade, the AI community has discovered time and again the tensions within and between popular formulations of fairness, explainability, and privacy. This position paper advocates for a redesign of these human value proxies via new research in context-aware machine learning systems.

Characterizing Anomalies with Explainable Classifiers

We devise a novel method combining ideas from the fields of anomaly detection and interpretability to detect and characterize clusters of drifted points. We also show how simple rules can be extracted to generate database queries for anomalous data and to detect anomalous data in the future. 

On the Generalizability and Predictability of Recommender Systems

We create RecZilla, a meta-learning approach to recommender systems that uses a model to predict the best algorithm and hyperparameters for new, unseen datasets. We also release our code and pretrained RecZilla models, as well as all of our raw experimental results.

Robustness Disparities in Commercial Face Detection

We present a first-of-its-kind detailed benchmark of the robustness of three facial detection and analysis systems: Amazon Rekognition, Microsoft Azure, and Google Cloud Platform. We use both standard and recently released academic facial datasets to quantitatively analyze robustness trends for each system.



February 8-10, 2023
Raleigh, NC

Conference Highlights & Talks


Bringing AI Model Governance to a Healthcare Setting

From academic publishing to real-world practice, learn how Humana and Arthur worked together to transform the third-largest health insurance provider in the nation.