Meet Our Summer 2022 Research Fellows
Research is core to who we are at Arthur. It guides our approach to product development and helps us build differentiated capabilities in computer vision, NLP, bias mitigation, and other critical areas of machine learning monitoring. From presentations to publications, our diverse research team is always at the cutting edge of what’s happening in ML.

On that note, we’re excited to introduce our four ML Research Fellows for 2022—some who are new to Arthur and others who are returning for a second summer with us. We talked with them to learn more about their research interests and to get some insight into what they’ll be working on over the next few months.

Naveen Durvasula

We’re really excited to be working with you at Arthur! Please tell us a little bit about your background and research interests.

I’m an undergraduate at the University of California, Berkeley, and I’m primarily interested in theoretical contributions at the intersection of economics, optimization, and statistics. I’m fortunate to be advised by Profs. Nika Haghtalab, Scott Kominers, John Dickerson, and Aravind Srinivasan. Some of the projects I’m currently working on include developing online no-regret algorithms for auction design, forecasting patient outcomes in kidney exchange, and determining whether data-sharing can be done in a way that better respects user interests. I find problems in this area really interesting because they allow mathematical insights to translate into tangible real-world impact. 

What interested you about working at Arthur?

I’m really excited about working at Arthur because I’m very interested in working on problems that can actually bring about positive real-world change. I’m still trying to figure out exactly what I want to do after college, and while I’m pretty much set on pursuing a PhD, I’m not sure about the extent to which I want to be in academia vs. industry. One of my fears from my (so far limited) exposure to academia is that my perception of what problems are important and interesting will differ greatly from the problems that actually matter, resulting in my work remaining purely in the realm of academic interest. I’m excited to learn about the kinds of problems that are not only societally relevant, but also important enough that people might be willing to pay for a solution, and figure out how I can play a role in developing those solutions.

For this summer, what are some areas of research that you’re interested in pursuing?

I think there are a lot of interesting open problems related to distributional shift, explainability, and fairness. It also seems like there are some key ideas that are common to these areas. One paper that was presented in a reading group that I’m in showed that distributional shift is connected to the multiaccuracy and multicalibration problems studied in the algorithmic fairness literature. I’m interested in learning more about these fields, and studying the relevant statistical, algorithmic, and game-theoretic problems that arise. 

Michelle Bao

Hey Michelle! Please tell us a little bit about your research background and interests.

I’m studying Symbolic Systems, Ethics in Technology, and Computer Science at Stanford—basically, a mix of computer science, philosophy, and human-centered AI. I’ve built ML systems in industry on the Research & Development team at The New York Times and researched ML systems through the Stanford NLP Group, the ACLU, and the Stanford ML Group. I’m passionate about critically shaping AI to be more fair and just through interdisciplinary lenses. I’m particularly interested in how AI ethics research can be impactful for AI used in real-world applications, which is why I think Arthur is a great fit for me. I’m also broadly interested in the impact of technology on society through the lenses of philosophy of science, anthropology, STS, comparative race and ethnicity studies, sociology, and art history/practice.

Your recent paper “It’s COMPASlicated” was fascinating and brought together a really interesting group of researchers. Can you tell us a little bit about how that project was conceived and what it was like to work on it?

Yes, the team on the paper was just stellar and I learned so much from them—I got to work with experts in AI ethics, criminal justice, psychology, statistics, and data science. I got involved when I joined as an intern on the Data Analytics team at the ACLU, and my manager had gotten involved through a Twitter conversation (funnily enough, I also found out about Arthur through Twitter). The coolest part of the project was combining so many differing perspectives to shape our overview of the complexities of using criminal justice risk assessment datasets for AI fairness. The paper is certainly about how the sociotechnical systems in which datasets exist, the criminal justice system in particular, may bias benchmark datasets or lead to unexpected real-world impacts through downstream decision-making. However, it is also about how current practices of AI researchers are not conducive to meaningful progress. The paper ends with a call to arms—it is not simply a critique but a demand for shifting objectives, values, and priorities.

For this summer, what are some areas of research that you’re interested in pursuing?

I’m still very much in the process of deciding, but there are a few areas I’m interested in. I would love to explore how critical theory perspectives on fairness and justice can shape practical uses of data science, algorithms, and AI, or how practical issues like data drift can add complexity to the fairness and justice objectives.

Kweku Kwegyir-Aggrey

You’re joining us for a second summer and we’re thrilled to continue working with you. To remind everyone, what are some of your academic research interests that you’ve been pursuing at Brown?

Really excited to spend another summer with Arthur! At Brown, I’ve been interested in developing reliable, sample-efficient techniques that can be used to regulate machine learning models by verifying properties of their performance. Lately, this has taken the form of studying what a ‘good’ audit protocol would look like between a hypothetical algorithmic auditor and an auditee. We ask questions like: Can we design a working audit protocol if the auditee is trying to cheat the auditor? What are the minimum assumptions we’d have to make for an audit in this type of adversarial setting to be usable in practice? Before this line of research, I worked on problems in algorithmic fairness and on developing flexible methods for fair classification.

Tell us a bit about what you worked on last summer at Arthur and where that is headed.

Last summer we worked on a really interesting and challenging problem in the algorithmic fairness space. At a high level, we considered a setting where some ‘upstream’ model developer is tasked with creating a model that will be used by several distinct ‘downstream’ clients, each of whom may want to use the classifier for the same prediction task where fairness is a priority, but in slightly different ways. To address this upstream/downstream setting, we found a really interesting application of optimal transport that makes fair classification possible for this problem and performs well in practice. Research on this problem should be wrapping up soon, and I’m excited to share those results in the near future!

The technique we developed has a lot of potential to be applied in some related but slightly different domains, so I’m looking forward to exploring and expanding on the work we’ve already done to home in on some of those new directions.

Are there new areas of research that you’re keen to explore this summer?

A little bit of everything! One of the things that’s very exciting about the state of fairness research and Arthur’s product/goal is that there are a lot of unasked and unanswered questions about exactly what a future with machine learning looks like. It feels like there’s an endless amount of new work in the fairness domain related to issues of explainability, interpretability, causality, or algorithmic recourse that I haven’t gotten the chance to dig into yet, so I’m mostly hoping to keep my eyes and ears open and hopefully find some new and exciting problems at the intersections of some of these areas!

Avi Schwarzschild

Welcome Avi! Tell us a bit about yourself.

I am a fourth-year Ph.D. student in the Applied Math and Scientific Computation program at the University of Maryland. I am advised by Tom Goldstein on my work in deep learning. My general interests range from security and privacy to generalization and my work focuses on expanding our understanding of when and why neural networks work. My specific interest in data security and model vulnerability has led to work on adversarial attacks and data poisoning. I am also investigating neural networks’ ability to extrapolate from easy training tasks to more difficult problems at test time.

What do you think were some of the most interesting papers/results last year in the field of model security and robustness?

Recent research into the security of federated learning has raised serious questions about whether the increasingly popular technique is following through on its promise of privacy. Federated learning is touted, in part, as a way to train large models on private data in a distributed fashion. Since practitioners may be employing federated learning for its privacy benefits, I think it’s critical to understand exactly how much data protection it really provides. Several recent papers investigate these types of attacks, showing that for a variety of architectures there are methods to recover training data at different points in the federated learning pipeline. This line of work is exciting to me and raises the question: What can be changed about federated learning routines to follow through on the promise of protecting training data?

For this summer, what are some areas of research that you’re interested in pursuing?

I am interested in the privacy and security of machine learning systems. This summer, I’m hoping to deepen my own understanding of where and when we can expect data to remain private. I’m also looking forward to empirical investigations and evaluations of attacks and defenses in the privacy space. With adequate attention to preserving privacy, I believe we can have high-performing AI without compromising private information from the people whose data makes it all work.