GPU-Native Voice Intelligence.

Bulut Labs is a deep-tech AI research lab building proprietary speech foundation models and a CUDA-optimized real-time inference stack on NVIDIA accelerated computing — turning the global call center industry into an autonomous, GPU-powered system.

Research Program

A foundation model company, not a wrapper.

Most voice AI startups stitch third-party APIs together. We don't. Bulut Labs trains its own speech-to-speech foundation models on NVIDIA H100 and H200 clusters, writes custom CUDA kernels for the streaming inference path, and ships an end-to-end accelerated stack built on NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and Riva. Every millisecond of latency we remove is the result of GPU engineering, not prompt engineering.

01

Proprietary Speech Foundation Models

We train end-to-end multilingual speech-to-speech models using the NVIDIA NeMo framework on multi-node H100 / H200 clusters, with FP8 mixed-precision and Megatron-style tensor and pipeline parallelism. Our models jointly learn ASR, dialog reasoning, and expressive TTS in a single architecture — eliminating the cascaded latency and error compounding of legacy STT→LLM→TTS pipelines.
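To make the parallelism arithmetic concrete, here is a minimal sketch of how a Megatron-style 3D-parallel layout divides a GPU cluster. The cluster size and parallelism degrees below are illustrative assumptions, not our actual training configuration.

```python
# Illustrative sketch of a Megatron-style parallel layout on a
# multi-node H100 cluster. All numbers are assumptions for illustration.

def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """GPUs left over after tensor and pipeline sharding form the data-parallel dimension."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by tensor_parallel * pipeline_parallel")
    return world_size // model_parallel

# Example: 4 nodes x 8 H100s = 32 GPUs, TP=8 within a node, PP=2 across nodes.
dp = data_parallel_size(world_size=32, tensor_parallel=8, pipeline_parallel=2)
print(dp)  # → 2
```

Keeping tensor parallelism within a node (NVLink bandwidth) and pipeline stages across nodes is the standard Megatron layout; the data-parallel replicas then scale throughput linearly with cluster size.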

02

CUDA-Accelerated Real-Time Inference

Our serving stack is built on TensorRT-LLM and Triton Inference Server with custom CUDA kernels for streaming attention, speculative decoding, and continuous batching. We push end-to-end voice latency below 150 ms on a single H100 and serve millions of concurrent autonomous calls on Blackwell-class GPUs — performance that is only possible on NVIDIA accelerated computing.
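Continuous batching is the key scheduling idea in that list: a finished sequence frees its batch slot immediately and a queued request takes it on the next decode step, instead of the whole batch draining before new work is admitted. The toy simulation below illustrates the mechanism only; a real scheduler such as TensorRT-LLM's in-flight batcher also manages KV-cache memory and token budgets.

```python
from collections import deque

def continuous_batching(requests, max_batch: int):
    """Toy simulation of continuous batching.

    `requests` is a list of (request_id, num_decode_steps); returns the
    decode step at which each request completes. Illustrative only.
    """
    queue = deque(requests)
    active = {}   # request_id -> remaining decode steps
    done = {}
    step = 0
    while queue or active:
        # Refill freed slots from the queue before each decode step.
        while queue and len(active) < max_batch:
            rid, steps = queue.popleft()
            active[rid] = steps
        step += 1
        # One decode step for every active sequence.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                done[rid] = step
                del active[rid]
    return done

print(continuous_batching([("a", 2), ("b", 5), ("c", 3)], max_batch=2))
# → {'a': 2, 'b': 5, 'c': 5}
```

With static batching, request "c" could not start until the whole first batch finished at step 5 and would complete at step 8; here it slips into the slot "a" frees and finishes at step 5.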

03

From Cloud to Edge with Jetson

The same models that run in our DGX-class data centers are quantized and distilled down to NVIDIA Jetson Orin for on-premise enterprise deployments where data sovereignty, air-gapped operation, and sub-100 ms local latency are non-negotiable. One model family, one toolchain, one accelerated computing platform — from training to the edge.
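The quantization step in that pipeline can be sketched in its simplest form: symmetric per-tensor INT8, mapping float weights onto [-127, 127] with a single scale. This is a toy illustration of the idea, not our production scheme; edge deployments typically use calibrated, per-channel quantization through TensorRT.

```python
# Toy symmetric per-tensor INT8 quantization — illustrative only.

def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.5, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err)
```

The reconstruction error is bounded by half the scale per weight, which is why INT8 preserves accuracy well when the weight range is tight and calibration picks the scale carefully.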

Applied Research in Production

Research that ships at scale.

The call center is a GPU problem disguised as a labor problem.

The global contact center industry employs over 17 million people executing scripted, repetitive cognitive work. Replacing it is not a chatbot problem — it is a real-time, multilingual, sub-second reasoning problem that can only be solved with purpose-built foundation models running on accelerated hardware. This is the workload Bulut Labs was built for, and NVIDIA GPUs are the only substrate it runs on.

01 · Live on NVIDIA GPUs

Talkif.ai

Talkif.ai is the first commercial deployment of our research stack: an enterprise-grade autonomous voice agent platform powered end-to-end by Bulut Labs' own speech foundation models, served from NVIDIA H100 and Blackwell GPUs through TensorRT-LLM and Triton. It is already in production with paying customers across multiple verticals.

Sub-150 ms end-to-end voice latency on a single H100, achieved through custom CUDA streaming kernels, speculative decoding, and continuous batching in TensorRT-LLM.

Proprietary multilingual speech-to-speech models trained on multi-node H100 clusters with NVIDIA NeMo, FP8 precision, and tensor + pipeline parallelism.

Production-grade campaign engine orchestrating millions of concurrent autonomous calls on Triton Inference Server with auto-scaling GPU pools.

02 · In Stealth — Training on H200

Our second deployment applies the same GPU-native foundation model approach to a new high-volume vertical. Pre-training is currently running on an H200 cluster, with the first commercial pilots scheduled for this year.

We build the model, the kernels, and the stack.

Bulut Labs is not a software agency and not an API reseller. We are GPU engineers and ML researchers who train our own foundation models, write our own CUDA kernels, and own every layer from the silicon up. NVIDIA accelerated computing is not a vendor choice for us — it is the only platform on which our research is physically possible, and we are committed to building on it for the next decade.

Open Research Roles

Speech Foundation Model Researcher (NeMo / Megatron) · research@bulutlabs.com
CUDA & TensorRT-LLM Inference Engineer · research@bulutlabs.com
GPU Infrastructure Engineer (H100 / H200 / Blackwell) · research@bulutlabs.com