Mohammad Safarzadeh

Principal Applied Scientist at Oracle · Building and evaluating AI systems · PhD in Astrophysics, Johns Hopkins University.

New York, NY

mtsafarzadeh@gmail.com

I work on evaluating code-generation LLMs, reasoning models, and retrieval-augmented generation (RAG) systems, with a focus on making benchmarks more reliable, detecting data leakage, and improving NL2SQL evaluation. At Oracle, I also work on conflict resolution in RAG pipelines for financial-domain applications, as well as on large language models, generative AI systems, and domain adaptation for high-impact use cases, including healthcare.

Before Oracle, I worked at Perceive, where I focused on quantization-aware training, efficient neural network inference, and lightweight models for edge deployment.

Before that, I was at FICO, building machine learning models for credit card fraud detection and other high-stakes decision systems.

My academic background is in astrophysics. I earned my PhD from Johns Hopkins University and held postdoctoral research positions at ASU, UCSC, Harvard, and NASA before moving into applied machine learning. That research training still shapes how I approach modeling, experimentation, and scientific rigor in modern AI systems.

Download CV (PDF) · Google Scholar profile

Toy multi-agent financial advising framework

A small Streamlit and LangGraph prototype for experimenting with multi-agent financial-advice workflows, designed as an entry point for building more complex scenarios with structured outputs, retrieval, knowledge-graph evidence, memory reuse, and LLM-as-judge conflict detection.

View project on GitHub

Selected publications

  1. EMNLP
    Evaluating NL2SQL via SQL2NL
    Mohammadtaher Safarzadeh, Afshin Oroojlooyjadid, and Dan Roth
    In Findings of EMNLP, 2025
  2. ACL
    When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias
    X. Zou, R. Sridhar, M. Safarzadeh, and 1 more author
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2026
    Main Conference
  3. ACL
    SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks
    M. Safarzadeh, H. Laxmichand Patel, A. Oroojlooy, and 2 more authors
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2026
    Main Conference