Nguyen Hong Quang
DevOps Engineer | AI Platform Architecture
0964848403 | nhq0810@gmail.com | https://nhq.id.vn/ | Vietnam
EDUCATION
University of Information Technology (UIT) - VNU - HCM | Bachelor of Science in Information Security | 2018 – 2023
- Languages & Certifications: English (TOEIC L&R 600, S&W 230; professional working proficiency); Vietnamese (native).
PROFESSIONAL EXPERIENCE
KienlongBank | Tech Lead AI & DevOps/AIOps Engineer | Dec 2024 – Present
- Architected and managed multi-cloud infrastructure (AWS: EC2, EKS, VPC, RDS, Bedrock, SageMaker; GCP: Vertex AI) serving 15+ production microservices for thousands of banking users.
- Spearheaded the Enterprise AI Platform (KILOBA), leading the team to deploy 10+ AI chatbots from research to production with multi-model LLM integration (OpenAI, Gemini, Bedrock).
- Orchestrated a zero-downtime AWS Organizational Unit (OU) migration for the core production environment, improving IAM posture and availability.
- Engineered GitOps CI/CD pipelines (ArgoCD, GitLab CI) and n8n workflows, reducing deployment cycles from days to minutes and saving 10+ hours/week.
- Implemented FinOps practices and a unified observability stack (Prometheus, Loki, Tempo), reducing cloud compute costs by ~30% and cutting incident resolution time from hours to minutes.
- Pioneered AIOps (Claude AI + MCP) and LightRAG knowledge graph architectures (Neo4j, Pinecone) to facilitate autonomous monitoring and high-accuracy retrieval.
- Optimized microservice communication by deploying high-throughput API Gateways (APISIX, Traefik, Nginx) and Redis caching, ensuring low-latency banking operations.
- Secured platform architecture by enforcing Zero-Trust networking, Kubernetes RBAC, and strict IAM governance across hybrid cloud environments.
CMC Telecom | Cloud Engineer Intern | Jun 2024 – Nov 2024
- Deployed resilient Site-to-Site VPNs (OpenVPN), ensuring secure, encrypted connections between client premises and AWS VPCs across high-availability environments.
- Troubleshot complex issues in CMC Cloud and AWS (EKS, EC2, RDS) while strictly adhering to enterprise SLAs.
- Monitored infrastructure and deployment health using the EFK stack (Elasticsearch, Fluentd, Kibana), Prometheus, Jenkins, and ArgoCD.
VNPro | Cloud Intern | Feb 2024 – Jun 2024
- Administered and hardened AWS training infrastructure supporting hundreds of CCNA, CCNP, and AWS Solutions Architect Associate (SAA) trainee environments.
- Developed practical training modules and authored in-depth technical documentation for AWS administration training classes.
TECHNICAL SKILLS
- Cloud Platforms: AWS (EC2, VPC, EKS, S3, RDS, IAM, Route53, Organizations), GCP (Vertex AI, Cloud Run), OCI.
- Containerization & Orchestration: Kubernetes (K8s, Ingress, StatefulSets, Deployments), AWS EKS, Docker (Buildx), Helm.
- CI/CD & Automation: GitLab CI/CD, AWS CodePipeline, ArgoCD, Jenkins, Terraform, Ansible, Git, Python, Bash.
- Monitoring & Observability: Prometheus, Grafana, Loki, Tempo, EFK stack (Elasticsearch, Fluentd, Kibana), AWS CloudWatch.
- Network & API Gateway: ALB/ELB, Nginx, APISIX, Traefik, VPC Peering, Security Groups, VPN (OpenVPN), SSL/TLS.
- Storage & Databases: S3, EBS, RDS (MySQL, PostgreSQL), MongoDB, Redis, Pinecone (VectorDB), Neo4j.
- OS & Environment: Linux (Ubuntu, Amazon Linux), WSL, SSH, Cloud CLI (AWS/gcloud/kubectl/eksctl).
- AI/ML Integration (Ecosystem): LLMs (Bedrock, Vertex AI, OpenAI, Gemini, Claude), Architectures (RAG, LightRAG), Workflows (n8n, MCP, Agents), Tools (Copilot, Kiro CLI, Prompt Engineering).
SOFT SKILLS
- Leadership & Ownership, Fast Technology Adoption, Cross-team Collaboration, Problem-Solving under Pressure, Mentoring, Adaptability, Critical Thinking.
APPENDIX: TECHNICAL DEEP DIVE & PORTFOLIO
Leadership & AI-Driven Work Culture (Tech Lead)
- Served as Tech Lead of the AI team, responsible for technical direction, architecture decisions, task delegation, and code/system review — guiding the team from research phase through production deployment of 10+ AI services.
- Leveraged AI tools (ChatGPT, Claude, Gemini, GitHub Copilot) extensively in daily DevOps work: generating IaC templates, debugging infrastructure issues, writing automation scripts, and drafting technical documentation — significantly accelerating delivery speed and reducing human error.
- Championed a culture of continuous technology adoption: proactively researched and evaluated emerging AI technologies (LightRAG, KAG, multi-modal AI, AI Agents), then led POC implementations to validate feasibility before production rollout.
- Mentored team members on DevOps best practices, cloud architecture, and AI integration patterns — building team capability to operate independently.
KILOBA System Structure Breakdown
KILOBA is the bank's centralized AI platform, powering all AI initiatives: chatbots, call scoring, computer vision, and workflow automation. It runs on a multi-cloud architecture (AWS + Google Cloud) and serves thousands of users bank-wide.
DevOps & Infrastructure Capabilities:
- Owned the full AWS infrastructure: provisioned and managed EC2, EKS clusters, VPC networking (peering, security groups, ELB), S3, RDS, and IAM across multiple accounts using AWS Organizations.
- Led zero-downtime migration of the entire production environment to a new AWS Organizational Unit (OU), re-configuring compute, networking, and IAM.
- Spearheaded GCP onboarding as the second cloud provider: set up the project hierarchy, IAM policies, and billing, and deployed Vertex AI / Gemini API.
- Deployed and managed Kubernetes clusters (EKS + on-premises K8s) running 10+ microservices with Helm charts, rolling updates, HPA, and resource quotas.
- Designed centralized Observability Stack (Grafana + Prometheus + Loki + Tempo) for all AI services.
- Implemented FinOps: built cost dashboards and billing alerts, right-sized instances across AWS and GCP, and delivered monthly reports to management.
CI/CD & Automation Capabilities:
- Built end-to-end CI/CD pipelines (GitLab CI, AWS CodePipeline, ArgoCD) for the platform and 5+ projects (VNPay, Insurance, QA/QC) — reducing deployment cycle from days to minutes.
- Designed 5+ n8n automation workflows: user analytics, feedback aggregation, conversation extraction, usage reporting.
- Containerized all services with Docker (multi-stage builds), managed Helm charts, maintained consistent environments from dev to production.
AI/ML Engineering Capabilities:
- Engineered the RAG pipeline end-to-end: data training workflows, document embedding into the Pinecone vector database, and a Neo4j graph database for the knowledge graph.
- Designed the middleware layer connecting OpenAI, Gemini, Pinecone, and Neo4j into a unified architecture with request routing and failover.
- Led Pilot-to-GoLive integration of Google Gemini — enabling multi-model AI capabilities in production.
- Migrated from traditional RAG to LightRAG architecture — improving retrieval accuracy and reducing token cost.
Detailed Projects Portfolio (KILOBA Sub-systems):
- AI Chatbots (10+): Automate customer support and internal Q&A for HR, Training, and Recruitment, reducing manual workload for bank staff. Impact: Led the team; built, deployed, and maintained 10+ chatbots from Dev → UAT → Production.
- AI Call Scoring: Automatically score customer service (CS) phone calls to evaluate service quality, replacing the manual call review process. Impact: Took ownership of the project, re-architected the infrastructure, and set up the evaluation pipeline.
- AIOps Automation: Enable autonomous DevOps (AI agents auto-monitor logs/metrics, detect incidents, propose solutions, and evaluate results). Impact: Leading architecture design, building Claude AI + MCP + sub-agents system (In Progress).
- CamAI: Computer Vision for the bank's physical security and operations. Impact: Led deployment of full infrastructure (compute, monitoring, alerting), production handover.
- KILOBA Web UI: Provide a user-friendly chat interface for all bank employees to interact with AI agents. Impact: Led deployment, operated, and maintained the web platform (Open-WebUI).
- OCR System: Automate document extraction to eliminate manual data entry. Impact: Led deployment of OCR pipeline for automated document processing.
- AI Agents: Extend AI capabilities beyond chatbots, enabling complex multi-step task automation. Impact: Led research and prototyped the AI agent architecture.
Financial Infrastructure & Partner Projects
Critical infrastructure and CI/CD development for the bank's internal core systems and B2B integrations.
- VNPay Integration Infrastructure: Architected highly available multi-cloud routing environments to serve as the secure integration bridge between the bank's core and external VNPay services.
- Insurance Partner Gateways: Engineered dedicated secure deployment pipelines and strict IAM isolation rules for insurance cross-selling partner platforms to protect transaction security.
- QA/QC Core Environments: Built completely isolated CI/CD pipelines and infrastructure segments, allowing QA testing teams to perform rigorous integration testing without impacting banking production traffic.
Comprehensive Tools & Technologies
- AWS Services: EC2, EKS, VPC, S3, RDS, ELB, IAM, Organizations, CodePipeline, CloudWatch, Route53
- GCP Services: Vertex AI, Cloud Run, IAM, AI Studio, Cloud Billing
- Cloud CLI: AWS CLI, gcloud CLI, kubectl, eksctl
- Containerization: Docker, Docker Compose, Docker Buildx, Docker Manifest, Helm
- CI/CD Platforms: GitLab CI/CD, AWS CodePipeline, ArgoCD, Jenkins
- IaC & Configuration: Terraform, Ansible
- Monitoring & Observability: Grafana, Prometheus, Loki, Tempo, Jaeger, Kibana, CloudWatch
- Scripting & Automation: Python, Bash, n8n, Cron
- AI Workflows (Daily): ChatGPT, Claude AI (Skills, Sub-agents, Co-working), Gemini, GitHub Copilot, MCP
- AI CLI Tools: Kiro CLI, Amazon Q CLI, Gemini CLI
- Version Control: Git, GitLab, GitHub
- OS & Environment: Linux (Ubuntu, Amazon Linux), WSL, SSH
- Project Management: Redmine, Agile/Scrum
Engineering Philosophy & Work Ethic
- Leadership & Ownership: Led the AI team as Tech Lead, drove critical architecture decisions, and took full ownership of complex projects from initial research to stable production.
- Fast Technology Adoption: Continuously evaluated and integrated emerging technologies (LightRAG, KAG, Gemini, MCP, AI Agents) into production pipelines within weeks.
- Cross-team Collaboration: Partnered closely with Customer Service, HR, and Recruitment departments to deliver AI solutions that addressed real business bottlenecks.
- Problem-Solving under Pressure: Re-architected critical systems (e.g., Call Scoring, AWS OU zero-downtime migration) while strictly maintaining daily banking operations.
- Mentoring & Knowledge Sharing: Guided cross-functional team members on DevOps methodologies, cloud architecture, and AI integration patterns, enabling them to operate independently.
- Self-motivated & Proactive: Independently researched new technologies, initiated POCs, and proposed architectural upgrades before being asked, driving continuous innovation within the platform.
- Critical Thinking (Trade-offs): Weighed cost (FinOps), performance, and reliability trade-offs when making large-scale engineering and infrastructure decisions.