Technology
The full stack, from CUDA to call.
Seven vertically integrated layers, all built on NVIDIA accelerated computing. Every layer is owned and engineered in-house; nothing is rented from a third-party LLM API.
Application — Talkif.ai
Enterprise voice agent platform: visual flow builder, campaign engine, real-time monitoring. In production with paying customers.
Orchestration — Triton Inference Server
Auto-scaling GPU pools across H100 / Blackwell. In-flight batching, dynamic ensemble routing, gRPC streaming. Millions of concurrent calls.
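In-flight (continuous) batching is the scheduling trick behind this layer's throughput: a finished request frees its batch slot immediately and a queued request takes it on the next step, instead of the whole batch draining to its slowest member. A toy pure-Python sketch of that loop, with all names hypothetical (the real scheduler lives inside Triton / TensorRT-LLM):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int
    remaining: int                    # decode steps this request still needs
    output: list = field(default_factory=list)

def inflight_batching(requests, max_batch=4):
    """Toy decode loop with in-flight batching.

    Finished requests retire mid-flight and freed slots are refilled
    from the queue, so the batch stays full instead of waiting for
    the slowest request.
    """
    queue = deque(requests)
    active, done, step = [], [], 0
    while queue or active:
        # refill freed slots from the queue (the "in-flight" part)
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        # one fused decode step over the whole active batch
        for req in active:
            req.output.append(f"tok{step}")
            req.remaining -= 1
        # retire finished requests without stalling the others
        done += [r for r in active if r.remaining == 0]
        active = [r for r in active if r.remaining > 0]
        step += 1
    return done, step

reqs = [Request(rid=i, remaining=n) for i, n in enumerate([2, 5, 3, 1, 4])]
done, steps = inflight_batching(reqs, max_batch=2)
```

With a batch of 2, the five requests above (15 decode steps of total work) finish in 9 fused steps, versus 12 for static batching and 15 sequentially.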
Inference runtime — NVIDIA TensorRT-LLM
FP8 engines compiled per GPU SKU. Speculative decoding with a 1B draft model. Continuous batching with paged attention. Custom plugins for our interleaved audio + text token format.
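Speculative decoding works by letting the small draft model propose a short run of tokens that the large target model then verifies in a single pass, keeping the longest agreeing prefix plus one corrected token. A greedy toy sketch with stand-in "models" (hypothetical; production TensorRT-LLM verifies against full token distributions, not a single greedy pick):

```python
def speculative_decode(draft_next, target_next, prompt, k=4, max_new=12):
    """Greedy speculative decoding.

    draft_next / target_next: callables mapping a token sequence to the
    next token (stand-ins for the 1B draft and 7B target models).
    The draft proposes k tokens cheaply; the target checks them all in
    one pass, keeps the longest matching prefix, and emits one
    corrected token at the first disagreement.
    """
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # draft model guesses k tokens autoregressively (cheap)
        guess, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            guess.append(t)
            ctx.append(t)
        # target verifies all k positions in one (conceptual) big pass
        accepted, ctx = [], list(out)
        for g in guess:
            t = target_next(ctx)
            if t != g:
                accepted.append(t)    # target's correction; stop here
                break
            accepted.append(g)
            ctx.append(g)
        out += accepted
    return out[len(prompt):][:max_new]

# toy "models": target counts up mod 7; draft agrees except on multiples of 5
target = lambda seq: (seq[-1] + 1) % 7
draft  = lambda seq: 0 if (seq[-1] + 1) % 5 == 0 else (seq[-1] + 1) % 7
tokens = speculative_decode(draft, target, prompt=[0], k=4, max_new=10)
```

The output is identical to running the target model alone, which is the point: the draft only accelerates decoding, it never changes the result.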
Custom CUDA kernels
Streaming attention with rolling KV windows. Fused codec decode kernel. Audio frame de-interleaver. RVQ logits sampler. All hand-written for SM_90 (Hopper) and SM_100 (Blackwell).
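Streaming attention with a rolling KV window keeps a fixed-size cache: once the window is full, each new key/value pair evicts the oldest one, so memory stays constant however long the call runs. A pure-Python sketch of the cache bookkeeping (names hypothetical; the actual kernels fuse this with attention on-chip):

```python
from collections import deque

class RollingKVCache:
    """Fixed-capacity key/value cache for streaming attention.

    Appending beyond `window` evicts the oldest entry, so the attention
    span (and memory footprint) is bounded no matter how long the
    audio stream runs.
    """
    def __init__(self, window):
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, k, v):
        self.keys.append(k)           # deque(maxlen=...) drops the oldest
        self.values.append(v)

    def attend(self, query):
        # toy scalar dot-product attention over whatever is in the window
        scores = [query * k for k in self.keys]
        total = sum(scores) or 1.0
        return sum(s / total * v for s, v in zip(scores, self.values))

cache = RollingKVCache(window=3)
for step in range(1, 6):              # 5 streaming steps, window of 3
    cache.append(k=float(step), v=float(step * 10))
# only the 3 most recent entries survive: keys 3, 4, 5
```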
Model — BL-Voice-1 (7B, decoder-only)
Speech-to-speech foundation model. ASR + dialog reasoning + expressive TTS in a single transformer. Trained from scratch by Bulut Labs.
Training — NVIDIA NeMo + Megatron-Core
Multi-node H100 / H200 clusters. Tensor + pipeline + data parallelism. FP8 mixed precision via Transformer Engine. SLURM job orchestration. NCCL over InfiniBand HDR.
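Tensor parallelism splits a single layer's weight matrix across GPUs: in a column-parallel matmul, each device holds a vertical slice of the weights and computes its slice of the output, and concatenating the slices reproduces the full result. A pure-Python sketch with lists standing in for device shards (hypothetical names; real training uses Megatron-Core's parallel layers with NCCL collectives doing the gather):

```python
def matmul(a, b):
    """Plain row-major matmul: a is (m x k), b is (k x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def column_parallel_matmul(x, w, num_gpus):
    """Column-parallel linear layer.

    Each "GPU" holds a vertical slice of w and computes its slice of
    the output independently; the final concat plays the role of the
    all-gather that reassembles the full activation.
    """
    n = len(w[0])
    per = n // num_gpus
    shards = [[row[g * per:(g + 1) * per] for row in w]
              for g in range(num_gpus)]               # shard the weights
    partials = [matmul(x, shard) for shard in shards]  # one matmul per "GPU"
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1.0, 2.0]]                      # 1 x 2 activation
w = [[1.0, 2.0, 3.0, 4.0],            # 2 x 4 weight, split over 2 "GPUs"
     [5.0, 6.0, 7.0, 8.0]]
y_parallel = column_parallel_matmul(x, w, num_gpus=2)
y_full = matmul(x, w)                 # single-device reference
```

The sharded and single-device results match exactly, which is why a 7B-parameter layer can span several H100s without changing the math.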
Hardware — NVIDIA H100 / H200 / Blackwell + Jetson Orin
DGX-class clusters in the cloud for training and inference. Jetson Orin for on-premise enterprise edge deployments. One model family, one toolchain, every layer of the platform.
NVIDIA Software Stack in Production