Ali Yassine

Applied AI Engineer

Building production RAG systems, fine-tuned LLMs, and inference APIs for real clients.

About

I'm an AI engineer focused on shipping production GenAI features. Most of my work has been building RAG systems and fine-tuned LLMs for client knowledge bases — taking AI features from prototype to deployed service on Azure and GCP. NVIDIA-Certified Associate in Generative AI / LLMs. Currently open to applied AI and forward-deployed engineering roles in SoCal or remote.

Production RAG | Fine-tuning (LoRA/QLoRA) | Azure / GCP | NVIDIA-Certified | Based in SoCal

Experience

AI Engineer Intern

Product Perfect

Mar 2025 – Sept 2025 | Brea, CA
  • Optimized computer vision inference pipelines (Detectron2, Stable Diffusion) by moving to async FastAPI and ONNX Runtime, cutting tail latency by ~70%.
  • Profiled GPU workloads with Nsight Systems to identify bottlenecks, reducing peak VRAM usage and enabling larger batch sizes.
  • Containerized inference services with Docker and integrated into CI/CD for one-command deploys.

AI Engineer Consultant

Sidereal Solutions

Aug 2024 – Present | Remote, CA
  • Built and shipped production RAG pipelines for client knowledge bases, integrating LangChain and vector retrieval on Azure GPU instances; cut p95 query latency by ~50%.
  • Fine-tuned open-source LLMs (Llama 3, Mistral) with LoRA/QLoRA, deploying behind FastAPI with streaming responses and structured-output guardrails.
  • Migrated CPU-bound preprocessing to GPU-accelerated workflows using NVIDIA RAPIDS (cuDF), reducing batch processing from hours to minutes.

Personal Projects

Lectern

Educational GenAI platform with fine-tuned Llama 3 and end-to-end RAG

Next.js | Llama 3 | LangChain | Vector DB | FastAPI
  • Built an end-to-end RAG application with vector retrieval and a Next.js front end; supports document and video uploads streamed to an inference API for on-demand study material generation.
  • Designed prompt-level guardrails and retrieval filters to scope generated content to uploaded source material, with refusal behavior when relevant context isn't found in the corpus.
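The scoping-and-refusal behavior described above can be sketched roughly as follows. This is a minimal illustration, not Lectern's actual code: the `Chunk` type, the `scoped_prompt` helper, the `0.75` threshold, and the refusal wording are all hypothetical, and it assumes the retriever returns similarity scores where higher means more relevant.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical refusal message shown when no relevant context is retrieved.
REFUSAL = "I couldn't find that in the uploaded material."


@dataclass
class Chunk:
    text: str
    score: float  # assumed similarity score; higher means more relevant


def scoped_prompt(question: str, retrieved: List[Chunk],
                  min_score: float = 0.75) -> Optional[str]:
    """Return a prompt restricted to relevant chunks, or None to signal a refusal."""
    # Retrieval filter: keep only chunks that clear the relevance threshold.
    relevant = [c for c in retrieved if c.score >= min_score]
    if not relevant:
        return None  # caller should answer with REFUSAL instead of calling the model
    context = "\n---\n".join(c.text for c in relevant)
    # Prompt-level guardrail: instruct the model to stay inside the context.
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def respond(question: str, retrieved: List[Chunk]) -> str:
    prompt = scoped_prompt(question, retrieved)
    # A real service would send `prompt` to the model; here it is returned as-is.
    return REFUSAL if prompt is None else prompt
```

Filtering before the prompt is built means the refusal path never calls the model, so an off-topic question cannot be answered from the model's general knowledge.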

Beach Finder

Real-time AI insights app combining government weather/ocean data with LLM-generated forecasts

FastAPI | React | GCP | ML Predictions
  • Shipped a full-stack app combining government-provided live weather and ocean safety data with LLM-generated summaries and ML-based condition predictions; deployed to GCP.
  • Cached and parallelized external API calls to cut response time by ~50%, making generated forecasts feel real-time to the user instead of buffered.
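The caching and parallelization pattern mentioned above might look something like the sketch below. It is an illustration under assumptions rather than Beach Finder's real code: `TTLCache`, `cached_fetch`, `fetch_weather`, `fetch_ocean`, and the returned fields are hypothetical, and the upstream calls are simulated with `asyncio.sleep` instead of real HTTP requests.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # entry is stale, drop it
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic(), value)


async def cached_fetch(cache: TTLCache, key: str,
                       fetch: Callable[[], Awaitable[Any]]) -> Any:
    """Return the cached value for key, calling fetch() only on a miss."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = await fetch()
    cache.set(key, value)
    return value


# Hypothetical upstream sources; the sleep stands in for network latency.
async def fetch_weather(beach: str) -> Dict[str, Any]:
    await asyncio.sleep(0.05)
    return {"beach": beach, "temp_f": 72}


async def fetch_ocean(beach: str) -> Dict[str, Any]:
    await asyncio.sleep(0.05)
    return {"beach": beach, "wave_ft": 3}


async def beach_report(cache: TTLCache, beach: str) -> Dict[str, Any]:
    # gather() runs both lookups concurrently instead of one after the other.
    weather, ocean = await asyncio.gather(
        cached_fetch(cache, f"weather:{beach}", lambda: fetch_weather(beach)),
        cached_fetch(cache, f"ocean:{beach}", lambda: fetch_ocean(beach)),
    )
    return {"weather": weather, "ocean": ocean}
```

With independent upstream calls, the concurrent version waits roughly as long as the slowest call rather than the sum of all of them, and repeat requests inside the TTL window skip the network entirely.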

Catan AI

Game-playing AI for Settlers of Catan using strategic decision-making algorithms

Python | Game Theory | Heuristic Evaluation | Pygame
  • Built an AI opponent that evaluates game states and makes strategic decisions across resource trading, settlement placement, and development card play.
  • Supports 3-4 player simulations in which multiple AI-controlled opponents compete against each other, selecting moves with alpha-beta pruning and custom heuristics.
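For readers unfamiliar with alpha-beta pruning, here is the textbook two-player version on a toy tree. It is only a sketch of the general technique: Catan has more than two players and dice rolls, so the project's search would need adaptations that are not shown here, and the nested-list tree and the `alphabeta` helper are hypothetical stand-ins for real game states and a heuristic evaluator.

```python
import math
from typing import List, Union

# A node is either a leaf score (int) or a list of child nodes.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf) -> float:
    """Minimax value of node, skipping branches that cannot change the result."""
    if isinstance(node, int):
        return node  # leaf: in a real game this would be a heuristic evaluation
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing player already has a better option elsewhere
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing player already has a better option elsewhere
    return best


toy_tree = [[3, 5], [2, 9], [0, 7]]
print(alphabeta(toy_tree, maximizing=True))  # prints 3
```

In the toy tree, once the second branch yields a 2 the search stops exploring it, because the maximizing player is already guaranteed a 3 from the first branch.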

Skills

AI / ML

RAG | LangChain | Pinecone | Hugging Face Transformers | Llama / Mistral | GPT-4o | LoRA / QLoRA | Prompt Engineering | Braintrust | LLM Evaluation

Languages

Python | TypeScript / JavaScript | Java | C/C++ | Go | SQL

Frameworks

PyTorch | TensorFlow | ONNX Runtime | FastAPI | Next.js | React | scikit-learn

Infra & MLOps

Docker | Azure | GCP | Git | CI/CD | NVIDIA RAPIDS | Nsight Systems

Education & Certifications

California State University, Fullerton

B.S. Computer Science

Aug 2021 – May 2025

Machine Learning, Advanced Neural Networks, Parallel Computing, Cloud Computing & Distributed Systems

NVIDIA-Certified Associate

Generative AI LLMs

Verify on Credly
