I don't just build models — I own what happens after they ship. My background spans biosystems engineering, financial ML at scale, and production AI for global digital health products.

At every role I've pushed beyond the brief — shaping how AI gets evaluated, deployed, and measured against outcomes that matter.

Chehak Arora
🤖 RAG pipelines
📊 500K+ users
⚡ MLOps · CI/CD
500K+
Users on production AI
100M+
Records processed at PNC
30%
ETL runtime reduction
20%
User engagement lift
Where I've worked

Experience

Jun 2025 – Present
Alcon
Fort Worth, TX
Data Scientist, Digital Health MLOps
  • Architected conversational AI systems for global digital health platforms using serverless AWS, serving 500K+ users with privacy-aware design.
  • Designed RAG-based semantic retrieval pipelines integrating structured and unstructured healthcare data to power AI chatbots.
  • Built A/B testing and causal inference frameworks that supported statistically validated product decisions and a 20% lift in user engagement.
  • Owned end-to-end MLOps: MLflow, LLM-as-a-judge eval, Spark pipelines, CI/CD via GitHub Actions.
500K+ Users served globally
20% User engagement lift
Jan 2025 – Jun 2025
PNC Bank
Pittsburgh, PA
Data Scientist
  • Built Apache Spark and SQL pipelines processing 100M+ time series and text records for Product, Operations, and Compliance.
  • Implemented HDBSCAN clustering for anomaly detection on live system metrics, surfacing risk signals rule-based monitoring couldn't catch.
  • Partnered with compliance teams to define AI safety thresholds and stress-test against adversarial edge cases before deployment.
100M+ Records processed
30% Faster ETL runtime
Jun 2022 – Nov 2023
Pfizer (via CRB)
Kalamazoo, MI
Project Engineer
  • Identified a gap in vendor evaluation and independently designed an end-to-end RFP documentation framework, standardizing assessment of 10+ enterprise vendors.
  • Built financial models and dashboards to forecast Revenue Cycle Management performance at scale.
10+ Vendors standardized

Additional Experience & Research

Undergrad Research Assistant · IQ Health Science & Engineering, MSU · Modeled circadian rhythm genetic feedback loops
Undergrad Research Assistant · University of Michigan, STEM Ed · K–12 STEM program outcomes research
Innovation Intern · Greenvayu · UI/UX research for clean-air product usability
Product Manager · Startupparty · Roadmap and cross-functional coordination
Graphic & Web Designer · Leonard Gelfand Center, CMU · STEM outreach visual assets
Graduate Teaching Assistant · Calculus 2 · Recitations and office hours
Tutor · PLUStutor · One-on-one math support for middle schoolers
Extern · AT&T · Telecom enterprise operations shadowing

Tools & technologies
Tech Stack

Core Language
Python
ML pipelines · scripting
Deep Learning
PyTorch · HuggingFace
transformers · fine-tuning
LLM / RAG
LangChain · FAISS
RAG · vector search · LLM eval
NLP
BERTopic · SBERT
topic modeling · embeddings
Classical ML
scikit-learn · XGBoost
HDBSCAN · anomaly detection
Big Data
Apache Spark · dbt
100M+ records · ETL · SQL
Cloud
AWS Lambda · S3 · Sage
serverless · privacy-aware
MLOps
MLflow · GitHub Actions
CI/CD · Docker · tracking
Stats & Causal
A/B Testing · Causal Inf.
Bayesian · time series
Visualization
Streamlit · Plotly
dashboards · Seaborn
Languages
R · SQL · Bash · JS
stats · querying
Domain
Healthcare AI · FinML
HIPAA-aware · compliance

What I've built

Projects

LLM · NLP · Sentiment
NVIDIA · Jan – May 2025
Social Listening for Physical AI

End-to-end NLP pipeline tracking social media sentiment across generative, physical, and industrial AI for NVIDIA's automotive and robotics verticals. Combines transformer-based sentiment analysis with BERTopic topic modeling, delivered via Streamlit dashboards.

GenAI · Streamlit · Recommender
TartanHacks · Carnegie Mellon · 2025
Closet Coordinator

Attribute-based fashion recommendation app with a digital wardrobe browser and an AI-powered outfit recommender that suggests matches based on color theory and style attributes.

BERT · Vector Search · FAISS
Carnegie Mellon Libraries · Dec 2024 – Jan 2025

Semantic search and clustering using BERT Sentence Transformers and FAISS to map relationships across hundreds of academic theses. Showcased at CMU's Love Data Week.

Data Modeling · Process Optimization
Perrigo Pharmaceuticals
Line Speed Optimization

Analyzed manufacturing telemetry to identify production bottlenecks, then used data models to simulate process adjustments, increasing throughput without quality trade-offs.

Academic background

Education

MS, Data Analytics for Science
Carnegie Mellon University
Aug 2024 – May 2025 · Pittsburgh, PA
BS, Biosystems Engineering
Michigan State University
2018 – 2022 · East Lansing, MI
Community & Impact

Leadership

Graduation Speaker
Selected to represent the student body at the graduation ceremony
Vice President
Indian Grad Student Association, CMU
Graduate Student Representative
MS in Data Analytics for Science, CMU
Director of PR & Events
International Students Association, MSU
Global Ambassador
Bridged international students with campus resources at MSU
Career Peer
College of Engineering, MSU — networking events and mentorship
Resident Assistant
Michigan State University
Beyond the screen

Off-Duty

✈️
Aviation Enthusiast
Got behind the controls of a plane and discovered a love for flying.
🍲
Home Chef
I love cooking and can whip up a proper meal off the clock.
🎤
Performer at Heart
Always down for dancing and singing whenever there's a chance.
Get in touch

Let's build something.

If you're working on something interesting, I'd love to hear about it.