Hi, I'm Saif Mahmoud

AI Engineer | Systems and Inference

ML Systems Engineer and Researcher. Currently at Al Ain University.

About Me

Learn more about my background, education, and courses

Education

Bachelor of Science in Software Engineering

Al Ain University

Abu Dhabi, UAE

Sep 2023 – May 2027 (Expected)

GPA: 3.81 (Honors list)

Courses

Work Experience

My professional journey and contributions

Undergraduate Research Assistant

Al Ain University

UAE
November 2025 – Present

Research on efficient ML systems and LLM inference under adverse long-context and batching conditions. Supervised by Dr. Yazeed Ghadi and Dr. Armagan Elibol.

Attention
KV Caching
Speculative Decoding
Vision Transformers
Sparsification

AI Engineering Intern

LUXAI

UAE
July 2025 – March 2026

Social media intelligence pipeline under on-device inference constraints. Custom Triton fused-attention kernel for Turing GPUs and GraphQL API interception for data extraction.

Triton
ONNX Runtime
Faster-Whisper
VADER
GraphQL
FastAPI

Software Engineering Intern

Smart Navigation Systems

UAE
May 2025 – November 2025

Himaya71, a campus safety program with on-device YOLOv8s and ESP32 telemetry via Django REST API polling. Won 3rd place at the UAEU I2P 2025 Program.

Python
Django
YOLOv8s
PostgreSQL
IoT

Skills & Technologies

Tools and technologies I work with

Languages & Frameworks

Python
TypeScript
Bash
C++
Java
FastAPI
Django
Node.js

Inference & Optimization

Triton
CUDA
Triton Inference Server
LoRA
Quantization
ONNX Runtime
TensorRT
CTranslate2

AI & Machine Learning

PyTorch
spaCy
Hugging Face
OpenCV
scikit-learn
NLTK
NumPy
Pandas

DevOps & Tools

Docker
Linux
GitHub Actions
Alembic
Ruff
PostgreSQL
pgvector
Playwright
Redis

Featured Projects

A brief showcase

Reelwise

Featured Project

High-fidelity content intelligence pipeline. I intercepted internal network traffic to fetch raw metadata, then routed the videos through a custom VADER triage gate (sub-35 ms latency). This filtering reduced GPU load by 60%, allowing the heavier INT8 Faster-Whisper model to run within a strict 190 MB VRAM constraint. I also authored a custom Triton kernel that increased cold-start throughput by 71%, bypassing caching overhead while maintaining 0.96x sustained throughput.

Python
Playwright
Faster-Whisper
VADER
GraphQL
Next.js
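
For readers curious about the triage-gate idea in Reelwise, here is a minimal sketch of the pattern, not the project's actual code. It assumes the gate forwards an item to the expensive stage only when a cheap text score clears a threshold. The lexicon, the `toy_score` and `triage` names, and the 0.5 threshold are all hypothetical stand-ins; a real scorer such as VADER would replace `toy_score`.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical toy lexicon standing in for a real sentiment scorer such as VADER.
TOY_LEXICON = {"great": 1.0, "love": 1.0, "awful": -1.0, "hate": -1.0}


def toy_score(text: str) -> float:
    """Mean lexicon value of the known words, in [-1, 1]; 0.0 if none match."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def triage(
    items: Iterable[str],
    scorer: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, chosen only for illustration
) -> Tuple[List[str], List[str]]:
    """Split items into (forwarded, skipped) by the absolute cheap score."""
    forwarded: List[str] = []
    skipped: List[str] = []
    for text in items:
        # Only strongly scored items go on to the expensive GPU stage.
        if abs(scorer(text)) >= threshold:
            forwarded.append(text)
        else:
            skipped.append(text)
    return forwarded, skipped


captions = [
    "I love this great phone",
    "unboxing video part 3",
    "awful battery, hate it",
]
forwarded, skipped = triage(captions, toy_score)
```

Because the scorer is passed in as a function, the same gate can be reused with any cheap classifier, and the threshold becomes the single knob that trades GPU load against recall.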

Oryx Intelligence

Featured Project

Bilingual SEO engine that drove a 288% traffic surge. I engineered a pipeline that combines unsupervised K-Means clustering for niche identification with a perplexity-based validation gate (Sentence-Transformers) that acts as an "LLM-as-a-judge" to prevent low-quality content generation.

Python
Gemini API
Sentence-Transformers
scikit-learn
K-Means
TF-IDF
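
The clustering step named above (TF-IDF features grouped with K-Means) can be illustrated with a small standard-library sketch. This is not the Oryx Intelligence code: the queries, the `tfidf` and `kmeans` helpers, the smoothed IDF formula, and the "first k documents" initialisation are all assumptions made to keep the example deterministic and dependency-free. In practice scikit-learn's `TfidfVectorizer` and `KMeans` would do this work.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf(docs: List[str]) -> List[Vector]:
    """L2-normalised TF-IDF vectors for whitespace-tokenised documents."""
    tokenised = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenised for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        # Smoothed IDF so every term keeps a positive weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def kmeans(vectors: List[Vector], k: int, iters: int = 10) -> List[int]:
    """Plain Lloyd iterations; centroids start at the first k vectors."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total: Counter = Counter()
                for m in members:
                    total.update(m)
                centroids[j] = {t: s / len(members) for t, s in total.items()}
    return labels


# Hypothetical search queries covering two niches.
queries = [
    "cheap flights to dubai",
    "best biryani recipe",
    "dubai flights booking deals",
    "chicken biryani recipe tips",
]
labels = kmeans(tfidf(queries), k=2)
```

Queries that share vocabulary end up with the same label, which is the sense in which clusters can stand in for content niches. Real data would need a better initialisation (for example k-means++) and a principled choice of k.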

Get In Touch

I'm always open to new opportunities and collaborations

Let's connect!

Whether you have a question, want to discuss a project, or just want to say hi, I'll try my best to get back to you!