Mohammad Safarzadeh

New York, NY

I work on the evaluation of code-generation LLMs, reasoning models, and retrieval-augmented generation (RAG) systems, with a focus on making benchmarks more reliable, detecting data leakage, and improving NL2SQL evaluation. At Oracle, I also work on conflict resolution in RAG pipelines for financial-domain applications, as well as on large language models, generative AI systems, and domain adaptation for high-impact use cases, including healthcare.

Before Oracle, I worked at Perceive, where I focused on quantization-aware training, efficient neural network inference, and lightweight models for edge deployment.

Before that, I was at FICO, building machine learning models for credit card fraud detection and other high-stakes decision systems.

My academic background is in astrophysics. I earned my PhD from Johns Hopkins University and held postdoctoral research positions at ASU, UCSC, Harvard, and NASA before moving into applied machine learning. That research training still shapes how I approach modeling, experimentation, and scientific rigor in modern AI systems.

Download CV (PDF) | Google Scholar profile

Toy multi-agent financial advising framework

A small Streamlit and LangGraph prototype for experimenting with multi-agent financial-advice workflows, designed as an entry point for building more complex scenarios with structured outputs, retrieval, knowledge-graph evidence, memory reuse, and LLM-as-judge conflict detection.

View project on GitHub

selected publications

EMNLP
Evaluating NL2SQL via SQL2NL

Mohammadtaher Safarzadeh, Afshin Oroojlooyjadid, and Dan Roth

In Findings of EMNLP, 2025

We propose a schema-aligned paraphrasing framework that leverages SQL-to-NL (SQL2NL) to automatically generate semantically equivalent, lexically diverse queries for robust evaluation of NL2SQL models. Our analysis reveals that state-of-the-art models are far more brittle than standard benchmarks suggest.

@inproceedings{safarzadeh2025nl2sql,
  title = {Evaluating {NL2SQL} via {SQL2NL}},
  author = {Safarzadeh, Mohammadtaher and Oroojlooyjadid, Afshin and Roth, Dan},
  booktitle = {Findings of EMNLP},
  year = {2025},
  url = {https://aclanthology.org/2025.findings-emnlp.1031},
}

ACL

When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias

X. Zou, R. Sridhar, M. Safarzadeh, and 1 more author

In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2026

Main Conference

@inproceedings{zou2026informativeness,
  title = {When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias},
  author = {Zou, X. and Sridhar, R. and Safarzadeh, M. and Roth, D.},
  booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics},
  note = {Main Conference},
  year = {2026},
  eprint = {2604.17768},
  archiveprefix = {arXiv},
}

ACL

SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks

M. Safarzadeh, H. Laxmichand Patel, A. Oroojlooy, and 2 more authors

In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2026

Main Conference

@inproceedings{safarzadeh2026spence,
  title = {{SPENCE}: A Syntactic Probe for Detecting Contamination in {NL2SQL} Benchmarks},
  author = {Safarzadeh, M. and Patel, H. Laxmichand and Oroojlooy, A. and Horwood, G. and Roth, D.},
  booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics},
  note = {Main Conference},
  year = {2026},
  eprint = {2604.17771},
  archiveprefix = {arXiv},
}