Ali Yassine

Applied AI Engineer

Building production RAG systems, fine-tuned LLMs, and inference APIs for real clients.

About

I'm an AI engineer focused on shipping production GenAI features. Most of my work has been building RAG systems and fine-tuning LLMs for client knowledge bases, taking AI features from prototype to deployed service on Azure and GCP. I'm an NVIDIA-Certified Associate in Generative AI / LLMs, and I'm currently open to applied AI and forward-deployed engineering roles in SoCal or remote.

Production RAG | Fine-tuning (LoRA/QLoRA) | Azure / GCP | NVIDIA-Certified | Based in SoCal

Experience

AI Engineer Intern

Product Perfect

Mar 2025 – Sep 2025 | Brea, CA

• Optimized computer vision inference pipelines (Detectron2, Stable Diffusion) by moving to async FastAPI and ONNX Runtime, cutting tail latency by ~70%.
• Profiled GPU workloads with Nsight Systems to identify bottlenecks, reducing peak VRAM usage and enabling larger batch sizes.
• Containerized inference services with Docker and integrated them into CI/CD for one-command deploys.

AI Engineer Consultant

Sidereal Solutions

Aug 2024 – Present | Remote, CA

• Built and shipped production RAG pipelines for client knowledge bases, integrating LangChain and vector retrieval on Azure GPU instances; cut p95 query latency by ~50%.
• Fine-tuned open-source LLMs (Llama 3, Mistral) with LoRA/QLoRA, deploying behind FastAPI with streaming responses and structured-output guardrails.
• Migrated CPU-bound preprocessing to GPU-accelerated workflows using NVIDIA RAPIDS (cuDF), reducing batch processing from hours to minutes.

Personal Projects

Lectern

Educational GenAI platform with fine-tuned Llama 3 and end-to-end RAG

Beach Finder

Catan AI

Skills

AI / ML

Languages

Frameworks

Infra & MLOps

Education & Certifications

California State University, Fullerton

NVIDIA-Certified Associate

Resume

Contact