Technology

The full stack, from CUDA to call.

Seven vertically integrated layers, all built on NVIDIA accelerated computing. Every layer is owned and engineered in-house. Nothing is rented from a third-party LLM API.

07

Application — Talkif.ai

Enterprise voice agent platform. Visual flow builder, campaign engine, real-time monitoring. In production with paying customers.

06

Orchestration — Triton Inference Server

Auto-scaling GPU pools across H100 / Blackwell. In-flight batching, dynamic ensemble routing, gRPC streaming. Millions of concurrent calls.
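The batching described above can be sketched as a Triton model config. The model name, batch sizes, and instance counts here are hypothetical, not our production settings; note also that with the TensorRT-LLM backend, in-flight batching is managed by the backend itself, so the generic `dynamic_batching` block below only illustrates Triton's scheduler.

```protobuf
# Hypothetical Triton config.pbtxt sketch; all values illustrative.
name: "bl_voice_1"
backend: "tensorrtllm"
max_batch_size: 64

# Queue requests briefly so the scheduler can form larger batches.
dynamic_batching {
  preferred_batch_size: [ 16, 32, 64 ]
  max_queue_delay_microseconds: 1000
}

# Two execution instances per GPU in the pool.
instance_group [
  { count: 2, kind: KIND_GPU }
]
```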

05

Inference runtime — NVIDIA TensorRT-LLM

FP8 engines compiled per GPU SKU. Speculative decoding with a 1B draft model. Continuous batching with paged attention. Custom plugins for our interleaved audio + text token format.
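The speculative-decoding loop can be illustrated with a toy greedy-verification sketch: a cheap draft model proposes `k` tokens, the target model checks them, and the longest agreeing prefix is accepted. The "models" below are stand-in functions, not BL-Voice-1, and the names are ours for illustration.

```python
def draft_next(ctx):
    # Stand-in for the cheap 1B draft model: guesses the next token.
    return (ctx[-1] + 1) % 10

def target_next(ctx):
    # Stand-in for the full target model: the ground truth here.
    return (ctx[-1] + 1) % 10 if ctx[-1] != 4 else 7

def speculative_step(ctx, k=4):
    # 1) Draft proposes k tokens autoregressively (cheap, sequential).
    proposal, d_ctx = [], list(ctx)
    for _ in range(k):
        t = draft_next(d_ctx)
        proposal.append(t)
        d_ctx.append(t)

    # 2) Target verifies the whole proposal (one batched pass on GPU;
    #    sequential here for clarity). Keep the agreeing prefix, and on
    #    the first mismatch emit the target's token instead and stop.
    accepted, v_ctx = [], list(ctx)
    for t in proposal:
        expect = target_next(v_ctx)
        if t != expect:
            accepted.append(expect)
            break
        accepted.append(t)
        v_ctx.append(t)
    return ctx + accepted

print(speculative_step([0]))              # → [0, 1, 2, 3, 4]
print(speculative_step([0, 1, 2, 3, 4]))  # → [0, 1, 2, 3, 4, 7]
```

When draft and target agree, each step emits up to `k` tokens for one target pass, which is where the latency win comes from.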

04

Custom CUDA kernels

Streaming attention with rolling KV windows. Fused codec decode kernel. Audio frame de-interleaver. RVQ logits sampler. All hand-written for SM_90 (Hopper) and SM_100 (Blackwell).
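A pure-Python reference for the de-interleaver's logic may help: the CUDA kernel does this on-GPU, splitting one interleaved token stream into text and audio sub-streams. The token layout assumed here (text ids below an `AUDIO_BASE` offset, rebased codec ids above it) is our illustration; the real interleaved format is not described above.

```python
AUDIO_BASE = 32_000  # hypothetical: first audio-codec token id

def deinterleave(tokens):
    """Split an interleaved stream into text and audio sub-streams,
    keeping each token's original position for re-alignment."""
    text, audio = [], []
    for pos, tok in enumerate(tokens):
        if tok >= AUDIO_BASE:
            audio.append((pos, tok - AUDIO_BASE))  # rebase to codec range
        else:
            text.append((pos, tok))
    return text, audio

stream = [17, 32_005, 32_011, 42, 32_001]
text, audio = deinterleave(stream)
print(text)   # → [(0, 17), (3, 42)]
print(audio)  # → [(1, 5), (2, 11), (4, 1)]
```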

03

Model — BL-Voice-1 (7B, decoder-only)

Speech-to-speech foundation model. ASR + dialog reasoning + expressive TTS in a single transformer. Trained from scratch by Bulut Labs.

02

Training — NVIDIA NeMo + Megatron-Core

Multi-node H100 / H200 clusters. Tensor + pipeline + data parallelism. FP8 mixed precision via Transformer Engine. SLURM job orchestration. NCCL over InfiniBand HDR.
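The three parallelism axes above factorize the cluster as world_size = TP × PP × DP. A minimal sketch of that arithmetic, with illustrative sizes (the real cluster layout is not stated above):

```python
def parallel_layout(world_size, tp, pp):
    """Factor a GPU cluster into tensor / pipeline / data parallel
    groups, Megatron-style: data parallelism gets the remainder."""
    assert world_size % (tp * pp) == 0, "TP * PP must divide world size"
    dp = world_size // (tp * pp)
    return {"tensor": tp, "pipeline": pp, "data": dp}

# e.g. 16 nodes x 8 H100s = 128 GPUs: 8-way TP inside each node
# (over NVLink), 4-way PP across nodes, data parallel for the rest.
print(parallel_layout(128, tp=8, pp=4))  # → {'tensor': 8, 'pipeline': 4, 'data': 4}
```

Keeping tensor parallelism within a node and pipeline stages across nodes matches the bandwidth hierarchy: NVLink for the chatty TP collectives, InfiniBand for the sparser pipeline traffic.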

01

Hardware — NVIDIA H100 / H200 / Blackwell + Jetson Orin

DGX-class clusters in the cloud for training and inference. Jetson Orin for on-premise enterprise edge deployments. One model family, one toolchain, every layer of the platform.

NVIDIA Software Stack in Production

NeMo
Megatron-Core
TensorRT-LLM
Triton Inference Server
CUDA 12.6
Transformer Engine (FP8)
cuBLAS / cuDNN
NCCL
Riva
Jetson Orin SDK
DCGM
NIM Microservices