GPU-Native Voice Intelligence.

Bulut Labs is a deep-tech AI research lab building proprietary speech foundation models and a CUDA-optimized real-time inference stack on NVIDIA accelerated computing — turning the global call center industry into an autonomous, GPU-powered system.

Research Program

A foundation model company, not a wrapper.

Most voice AI startups stitch third-party APIs together. We don't. Bulut Labs trains its own speech-to-speech foundation models on NVIDIA H100 and H200 clusters, writes custom CUDA kernels for the streaming inference path, and ships an end-to-end accelerated stack built on NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and Riva. Every millisecond of latency we remove is the result of GPU engineering, not prompt engineering.

01

Proprietary Speech Foundation Models

We train end-to-end multilingual speech-to-speech models using the NVIDIA NeMo framework on multi-node H100 / H200 clusters, with FP8 mixed-precision and Megatron-style tensor and pipeline parallelism. Our models jointly learn ASR, dialog reasoning, and expressive TTS in a single architecture — eliminating the cascaded latency and error compounding of legacy STT→LLM→TTS pipelines.
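To make the parallelism arithmetic concrete, here is a minimal sketch of how a Megatron-style 3D-parallel layout divides a GPU cluster. The cluster size and parallelism degrees below are illustrative assumptions, not our actual training configuration.

```python
# Illustrative sketch of a Megatron-style parallel layout on a
# multi-node H100 cluster. All numbers are assumptions for illustration.

def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """GPUs left over after tensor and pipeline sharding form the data-parallel dimension."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by tensor_parallel * pipeline_parallel")
    return world_size // model_parallel

# Example: 4 nodes x 8 H100s = 32 GPUs, TP=8 within a node, PP=2 across nodes.
dp = data_parallel_size(world_size=32, tensor_parallel=8, pipeline_parallel=2)
print(dp)  # → 2
```

Keeping tensor parallelism within a node (NVLink bandwidth) and pipeline stages across nodes is the standard Megatron layout; the data-parallel replicas then scale throughput linearly with cluster size.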

02

CUDA-Accelerated Real-Time Inference

Our serving stack is built on TensorRT-LLM and Triton Inference Server with custom CUDA kernels for streaming attention, speculative decoding, and continuous batching. We push end-to-end voice latency below 150 ms on a single H100 and serve millions of concurrent autonomous calls on Blackwell-class GPUs — performance that is only possible on NVIDIA accelerated computing.
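Continuous batching is the key scheduling idea in that list: a finished sequence frees its batch slot immediately and a queued request takes it on the next decode step, instead of the whole batch draining before new work is admitted. The toy simulation below illustrates the mechanism only; a real scheduler such as TensorRT-LLM's in-flight batcher also manages KV-cache memory and token budgets.

```python
from collections import deque

def continuous_batching(requests, max_batch: int):
    """Toy simulation of continuous batching.

    `requests` is a list of (request_id, num_decode_steps); returns the
    decode step at which each request completes. Illustrative only.
    """
    queue = deque(requests)
    active = {}   # request_id -> remaining decode steps
    done = {}
    step = 0
    while queue or active:
        # Refill freed slots from the queue before each decode step.
        while queue and len(active) < max_batch:
            rid, steps = queue.popleft()
            active[rid] = steps
        step += 1
        # One decode step for every active sequence.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                done[rid] = step
                del active[rid]
    return done

print(continuous_batching([("a", 2), ("b", 5), ("c", 3)], max_batch=2))
# → {'a': 2, 'b': 5, 'c': 5}
```

With static batching, request "c" could not start until the whole first batch finished at step 5 and would complete at step 8; here it slips into the slot "a" frees and finishes at step 5.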

03

From Cloud to Edge with Jetson

The same models that run in our DGX-class data centers are quantized and distilled down to NVIDIA Jetson Orin for on-premise enterprise deployments where data sovereignty, air-gapped operation, and sub-100 ms local latency are non-negotiable. One model family, one toolchain, one accelerated computing platform — from training to the edge.
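The quantization step in that pipeline can be sketched in its simplest form: symmetric per-tensor INT8, mapping float weights onto [-127, 127] with a single scale. This is a toy illustration of the idea, not our production scheme; edge deployments typically use calibrated, per-channel quantization through TensorRT.

```python
# Toy symmetric per-tensor INT8 quantization — illustrative only.

def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.5, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err)
```

The reconstruction error is bounded by half the scale per weight, which is why INT8 preserves accuracy well when the weight range is tight and calibration picks the scale carefully.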

Applied Research in Production

Research that ships at scale.

The call center is a GPU problem disguised as a labor problem.

The global contact center industry employs over 17 million people executing scripted, repetitive cognitive work. Replacing it is not a chatbot problem — it is a real-time, multilingual, sub-second reasoning problem that can only be solved with purpose-built foundation models running on accelerated hardware. This is the workload Bulut Labs was built for, and NVIDIA GPUs are the only substrate it runs on.

01 · Live on NVIDIA GPUs

Talkif.ai

Talkif.ai is the first commercial deployment of our research stack: an enterprise-grade autonomous voice agent platform powered end-to-end by Bulut Labs' own speech foundation models, served from NVIDIA H100 and Blackwell GPUs through TensorRT-LLM and Triton. It is already in production with paying customers across multiple verticals.

Sub-150 ms end-to-end voice latency on a single H100, achieved through custom CUDA streaming kernels, speculative decoding, and continuous batching in TensorRT-LLM.

Proprietary multilingual speech-to-speech models trained on multi-node H100 clusters with NVIDIA NeMo, FP8 precision, and tensor + pipeline parallelism.

Production-grade campaign engine orchestrating millions of concurrent autonomous calls on Triton Inference Server with auto-scaling GPU pools.

02 · In Stealth — Training on H200

Our second deployment applies the same GPU-native foundation model approach to a new high-volume vertical. Pre-training is currently running on an H200 cluster, with the first commercial pilots scheduled for this year.

We build the model, the kernels, and the stack.

Bulut Labs is not a software agency and not an API reseller. We are GPU engineers and ML researchers who train our own foundation models, write our own CUDA kernels, and own every layer from the silicon up. NVIDIA accelerated computing is not a vendor choice for us — it is the only platform on which our research is physically possible, and we are committed to building on it for the next decade.

Open Research Roles

Speech Foundation Model Researcher (NeMo / Megatron) · research@bulutlabs.com
CUDA & TensorRT-LLM Inference Engineer · research@bulutlabs.com
GPU Infrastructure Engineer (H100 / H200 / Blackwell) · research@bulutlabs.com