Karan Bhanot, Ph.D.

About me

I am Karan Bhanot, a Senior Data Scientist at Norstella with a Ph.D. in Computer Science from Rensselaer Polytechnic Institute. I build production-grade AI systems for healthcare using LLMs, Machine Learning, and agentic workflows across structured and unstructured data.

My work centers on end-to-end AI pipelines, from data ingestion and retrieval to prompt engineering, context engineering, structured extraction, and multi-layer validation. I focus on building systems that are reliable, observable, and deployable at scale.

I also bring a strong research foundation, with 10+ publications in leading AI conferences and journals across Generative AI, fairness, synthetic data, trustworthy machine learning, and healthcare AI.

Experience

Senior Data Scientist, Norstella | 2025-Present
  • Architected a configuration-driven data ingestion framework integrating Amazon Redshift and OpenSearch, enabling dynamic cohort selection across ICD, NDC, and multi-source queries while scaling to 50M+ clinical notes per run.
  • Reduced end-to-end processing time from hours to minutes by optimizing query execution, modularizing workflows, and standardizing ingestion patterns across projects.
  • Built end-to-end LLM-powered extraction pipelines spanning prompt engineering, context engineering, structured outputs, and validation, increasing review throughput from roughly 20-30 to 300-500 notes per cycle using LLM-as-judge and human-in-the-loop workflows.
  • Applied deep clinical domain expertise across oncology and multi-disease use cases to extract complex entities such as biomarkers, genes, medications, and imaging findings with stronger contextual accuracy and reliability.
  • Developed AI agents and automation workflows using tools such as Claude and Cursor to eliminate repetitive tasks, enable rapid prototyping, and reduce execution time for reusable skills while enforcing secure guardrails.
  • Built AI-driven dashboards and monitoring systems that surfaced missing deliverables, delays, and project risks, turning previously untracked workflow gaps into actionable delivery insights.
  • Drove cross-functional collaboration and technical leadership by mentoring junior engineers, accelerating adoption of new tools and features, and improving cost efficiency across select workflows.
Data Scientist, Norstella | 2023-2025
  • Managed multiple concurrent projects end-to-end with consistent on-time delivery, including taking over critical responsibilities during team transitions and extended absences.
  • Built and deployed hybrid ML and LLM solutions that enabled extraction of insights not feasible with traditional methods or available data alone, unlocking new analytical capabilities.
  • Delivered production-grade ML models processing approximately 20K records weekly, generating critical diagnostic insights for clients and transforming previously inaccessible data into actionable outputs.
  • Developed LLM-based data processing workflows spanning ingestion, prompt engineering, and output structuring, turning raw unstructured inputs into clean client-ready formats for downstream use.
AI Research Scholar, IBM | 2022-2023
  • Spearheaded end-to-end development of a modular, extensible, and maintainable Python framework, improving the process of evaluating Machine Learning algorithm fairness across 81 data scenarios by 60%.
  • Authored a successful two-year grant proposal that secured $400,000 in funding for research and development with IBM, supporting architecture design, software development, and research dissemination for robust and responsible AI applications.
  • Published 5 first-author papers on Machine Learning and Data at peer-reviewed conferences and journals, presenting complex technical ideas to multiple interdisciplinary audiences.
  • Collaborated with academic and industry ML engineers, researchers, and experts to identify and integrate open-source libraries, reducing overall development time by 80%.
Graduate Research Assistant, Rensselaer Polytechnic Institute | 2019-2021
  • Collaborated with cross-functional teams in data science, engineering, privacy, and fairness to develop 4 production-ready software applications across 40% of the development cycle.
  • Analyzed large-scale private Electronic Health Records data (300,000 records, 100 features) on secure cloud servers, covering statistical analysis, exploratory data analysis, ML model training, and visualization.
  • Managed a team of 30 students for the development, deployment, and monitoring of the award-winning MortalityMinder visualization application, earning third place and a $15,000 prize.
  • Mentored 50 students on programming fundamentals, code design, code reviews, and documentation, resulting in a 30% decrease in code-related issues across 6 months.
  • Performed data collection, feature engineering, exploratory data analysis, and data aggregation across multiple datasets, facilitating access to real-world data for 120 students for projects and research.
AI Fairness Research Extern, IBM | 2021
  • Created two comprehensive datasets by aggregating 20 CSV files to evaluate bias in ML models, identifying variability of 20% across multiple experiments.
  • Summarized findings from 30 articles on AI ethics and fairness, informing decision-making processes within the team.
  • Contributed actively to weekly team meetings by presenting research findings, proposing solutions to mitigate bias, and fostering collaboration among team members.
Software Engineer, Cvent | 2018-2019
  • Developed and implemented 100 new features for the Appointments product, resulting in a 20% increase in product functionality.
  • Collaborated with 4 cross-functional teams in agile development, ensuring seamless integration of new features into the software.
  • Conducted thorough code reviews, resulting in a 40% decrease in bugs and improved overall code quality.
  • Engaged in 50 community-building sessions and product demos, fostering collaboration across 5 departments.

Education

  • Doctor of Philosophy (Ph.D.) in Computer Science, Rensselaer Polytechnic Institute, 2019-2023
  • Master of Science (M.Sc.) in Computer Science, Rensselaer Polytechnic Institute, 2019-2021
  • Bachelor of Technology (B.Tech.) in Computer Science, Punjab Engineering College, 2014-2018

Collaborations

My Ph.D. thesis, "Synthetic Data Generation and Evaluation for Fairness," was completed at Rensselaer Polytechnic Institute under the guidance of my advisor and mentor, Dr. Kristin P. Bennett.

I have also collaborated with experts across academia and industry including Dr. Isabelle Guyon (ChaLearn, Google), Dr. John S. Erickson (RPI), Dr. Ioana Baldini (IBM), Dr. Dennis Wei (IBM), Dr. Jiaming Zeng (formerly IBM, now AKASA), Dr. Yooyoung Park (formerly IBM, now Moderna), and Thilanka Munasinghe (RPI).

Research

  • K. Bhanot, I. Baldini, D. Wei, J. Zeng, K. P. Bennett, "Stress-testing Bias Mitigation Algorithms to Understand Fairness Vulnerabilities", AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023. (Paper)
  • K. Bhanot, I. Baldini, D. Wei, J. Zeng, K. P. Bennett, "Stress-testing Fairness Mitigation Techniques under Distribution Shift using Synthetic Data", Knowledge Discovery in Databases (KDD), 2022. (Paper)
  • K. Bhanot, I. Baldini, D. Wei, J. Zeng, K. P. Bennett, "Downstream Fairness Caveats with Synthetic Healthcare Data", Conference on Health, Inference, and Learning (CHIL), 2022.
  • K. Bhanot, I. Baldini, D. Wei, J. Zeng, K. P. Bennett, "Evaluating Fairness of Synthetic Healthcare Data Models", AMIA Annual Symposium, 2022. (Paper)
  • J. S. Franklin, K. Bhanot, M. Ghalwash, K. P. Bennett, J. McCusker, D. L. McGuinness, "An Ontology for Fairness Metrics", AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022. (Paper)
  • K. Bhanot, J. Pedersen, I. Guyon, K. P. Bennett, "Investigating synthetic medical time-series resemblance", Neurocomputing, 2022. (Paper)
  • K. Bhanot, S. Dash, J. Pedersen, I. Guyon, K. P. Bennett, "Quantifying Resemblance of Synthetic Medical Time-Series", European Symposium on Artificial Neural Networks (ESANN), 2021. (Paper)
  • K. Bhanot, M. Qi, J. S. Erickson, I. Guyon, K. P. Bennett, "The problem of fairness in synthetic healthcare data", Entropy Journal 2021 Special Issue - Representation Learning: Theory, Applications and Ethical Issues, 2021. (Paper)
  • K. Bhanot, J. McConnon, S. Jacobson, L. Ngweta, J. S. Erickson, K. P. Bennett, "Investigating social determinants of premature mortality in the United States", Institute for Data Exploration and Applications (IDEA), 2021. (Paper)
  • T. Munasinghe, A. N. Maheshwarkar, K. Bhanot, "Socioeconomic and Geographic Variations that Impacts the Spread of Malaria", AAAI Fall 2020 Symposium on AI for Social Good, 2020. (Paper)
  • K. P. Bennett, L. Ngweta, K. Bhanot, J. S. Erickson, "MortalityMinder: A Web Tool for Visualizing and Investigating Social Determinants of Premature Mortality in the United States", AMIA Annual Symposium, 2020.
  • K. Bhanot, D. Schroeder, I. Llewellyn, N. Luczak, T. Munasinghe, "Dengue Spread Information System (DSIS)", International Conference on Medical and Health Informatics (ICMHI), 2020. (Paper)
  • L. Ngweta, K. Bhanot, A. Maharaj, I. Bogle, T. Munasinghe, "Identifying the Relationship between Precipitation and Zika Outbreaks in Argentina", IEEE International Conference on Big Data, 2019. (Paper)