Ali Yassine
Applied AI Engineer
Building production RAG systems, fine-tuned LLMs, and inference APIs for real clients.
About
I'm an AI engineer focused on shipping production GenAI features. Most of my work has been building RAG systems and fine-tuned LLMs for client knowledge bases, taking AI features from prototype to deployed service on Azure and GCP. I hold the NVIDIA-Certified Associate certification in Generative AI / LLMs. Currently open to applied AI and forward-deployed engineering roles in SoCal or remote.
Experience
AI Engineer Intern
Product Perfect
- Optimized computer vision inference pipelines (Detectron2, Stable Diffusion) by moving to async FastAPI and ONNX Runtime, cutting tail latency by ~70%.
- Profiled GPU workloads with Nsight Systems to identify bottlenecks, reducing peak VRAM usage and enabling larger batch sizes.
- Containerized inference services with Docker and integrated them into CI/CD for one-command deploys.
AI Engineer Consultant
Sidereal Solutions
- Built and shipped production RAG pipelines for client knowledge bases, integrating LangChain and vector retrieval on Azure GPU instances; cut p95 query latency by ~50%.
- Fine-tuned open-source LLMs (Llama 3, Mistral) with LoRA/QLoRA and deployed them behind FastAPI with streaming responses and structured-output guardrails.
- Migrated CPU-bound preprocessing to GPU-accelerated workflows using NVIDIA RAPIDS (cuDF), reducing batch processing from hours to minutes.
Personal Projects
Lectern
Educational GenAI platform with fine-tuned Llama 3 and end-to-end RAG
- Built an end-to-end RAG application with vector retrieval and a Next.js front end; supports document and video uploads streamed to an inference API for on-demand study material generation.
- Designed prompt-level guardrails and retrieval filters to scope generated content to uploaded source material, with refusal behavior when relevant context isn't found in the corpus.

Beach Finder
Real-time AI insights app combining government weather/ocean data with LLM-generated forecasts
- Shipped a full-stack app combining government-provided live weather and ocean safety data with LLM-generated summaries and ML-based condition predictions; deployed to GCP.
- Cached and parallelized external API calls to cut response time by ~50%, making generated forecasts feel real-time to the user instead of buffered.

Catan AI
Game-playing AI for Settlers of Catan using strategic decision-making algorithms
- Built an AI opponent that evaluates game states and makes strategic decisions across resource trading, settlement placement, and development card play.
- Supports 3-4 player simulations with multiple AI-controlled opponents competing simultaneously using alpha-beta pruning and custom heuristics.

Skills
AI / ML
Languages
Frameworks
Infra & MLOps
Education & Certifications
California State University, Fullerton
B.S. Computer Science
Aug 2021 – May 2025
Machine Learning, Advanced Neural Networks, Parallel Computing, Cloud Computing & Distributed Systems